{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Taxonomist: Application Detection through Rich Monitoring Data\n", "### Emre Ates<sup>1</sup>, Ozan Tuncer<sup>1</sup>, Ata Turk<sup>1</sup>, Vitus J. Leung<sup>2</sup>, Jim Brandt<sup>2</sup>, Manuel Egele<sup>1</sup>, Ayse K. Coskun<sup>1</sup>\n", "<sup>1</sup> Department of Electrical and Computer Engineering, Boston University\n", "\n", "<sup>2</sup> Sandia National Laboratories\n", "\n", "In the 24th International European Conference on Parallel and Distributed Computing\n", "\n", "This Jupyter notebook contains a subset of the raw data collected for the Euro-Par paper of the same title, along with the code needed to generate and test models for detecting applications. Our Euro-Par paper can be accessed from www.bu.edu/peaclab/publications after the camera-ready paper submission.\n", "\n", "The accompanying files are:\n", "* `requirements.txt`: A list of the required Python packages, which can be installed with the command `pip install -r requirements.txt`. Python 3 is required, and this document was created using Python 3.6.5.\n", "* `README.pdf`: The documentation for setting up this notebook\n", "* `notebook.ipynb`: The interactive Jupyter Notebook\n", "* `notebook.html`: The static Jupyter Notebook\n", "* `data/`: The monitoring data collected from different applications executed on Volta.\n", "  - `metadata.csv`: A CSV file listing each run, the IDs of the nodes each run executed on, which application was executed with which inputs, etc.\n", "  - `timeseries.tar.bz2`: A bzip2-compressed tar archive containing the data collected from the nodes. The unpacked size is **16 GB**, so unpacking is optional for running this notebook.\n", "  - `features.hdf`: An [HDF5 file](https://support.hdfgroup.org/HDF5/) containing pre-calculated features. The calculation process is also included in this notebook.\n", "\n", "The following are not included in this notebook:\n", "* The code required for tuning the classifiers. 
The tuning can be performed by using `sklearn.model_selection.GridSearchCV` as the estimator for `Taxonomist`.\n", "* The code for generating the results for the baselines. The power metric used is `power(W)_cray_aries_r`, which is included in the data. However, generating the features for the baselines requires R packages, which are difficult to set up, so we excluded that part.\n", "\n", "Please refer to `README.pdf` for a full list of differences between our Euro-Par paper and the artifacts in this notebook." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Setup of the Notebook\n", "This section imports the required packages and defines the constants that the user of the notebook can set." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "# Import the required packages\n", "from ast import literal_eval\n", "import multiprocessing\n", "import os\n", "from pathlib import Path\n", "import warnings\n", "\n", "import numpy as np\n", "import pandas as pd\n", "import matplotlib.pyplot as plt\n", "import scipy.stats\n", "from sklearn.ensemble import RandomForestClassifier\n", "from sklearn.exceptions import UndefinedMetricWarning\n", "from sklearn.metrics import precision_recall_fscore_support\n", "from sklearn.metrics import f1_score\n", "from sklearn.model_selection import StratifiedKFold\n", "from tqdm import tqdm_notebook as tqdm\n", "from tqdm import TqdmSynchronisationWarning\n", "\n", "import taxonomist\n", "\n", "warnings.filterwarnings(\"ignore\", category=TqdmSynchronisationWarning)\n", "warnings.filterwarnings(\"ignore\", category=UndefinedMetricWarning)" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "# Constants used\n", "# The path that the package is extracted to\n", "DATA_DIR = Path('./data').expanduser().absolute()\n", "# The number of processes to use for parallel parts\n", "N_JOBS = multiprocessing.cpu_count()\n", "# The base estimator for taxonomist. 
Can be anything that implements\n", "# a `fit` method and a `predict_proba` or `decision_function` method.\n", "BASE_ESTIMATOR = RandomForestClassifier(n_estimators=200)\n", "# Whether to use the pre-generated features or generate new ones\n", "GENERATE_FEATURES = False\n", "# Whether to unpack the timeseries and plot them or not\n", "UNPACK_TIMESERIES = False\n", "# The (name, function) tuples for the features used. These are\n", "# the ones used in the paper, but new features can be added.\n", "FEATURE_TUPLES = [\n", "    ('max', np.max),\n", "    ('min', np.min),\n", "    ('mean', np.mean),\n", "    ('std', np.std),\n", "    ('skew', scipy.stats.skew),\n", "    ('kurt', scipy.stats.kurtosis),\n", "    ('perc05', lambda x: np.percentile(x, 5)),\n", "    ('perc25', lambda x: np.percentile(x, 25)),\n", "    ('perc50', lambda x: np.percentile(x, 50)),\n", "    ('perc75', lambda x: np.percentile(x, 75)),\n", "    ('perc95', lambda x: np.percentile(x, 95))\n", "]\n", "# The default figure size\n", "plt.rcParams['figure.figsize'] = [10, 5]\n", "# Turn on grid for figures\n", "plt.rcParams['axes.grid'] = True" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "# Helper Functions\n", "def report_results(test_labels, predictions):\n", "    # Print the weighted-average precision, recall, and F-score\n", "    print(\"Precision: {0:.3f}, Recall: {1:.3f}, F-Score: {2:.3f}\".format(\n", "        *precision_recall_fscore_support(test_labels, predictions,\n", "                                         average='weighted')))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Overview of the Data\n", "\n", "The `metadata.csv` file contains metadata about each application run included in this artifact." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
| run_id | platform | app | input | node_ids | start_time | end_time | unwanted | duration |
|---|---|---|---|---|---|---|---|---|
| 59e5012b57f3f44ead63f567 | volta | ft | Y | [59e5012b57f3f44ead63f563, 59e5012b57f3f44ead6... | 1507373566 | 1507374450 | 0.0 | 884 |
| 59e5012b57f3f44ead63f56c | volta | ft | Y | [59e5012b57f3f44ead63f568, 59e5012b57f3f44ead6... | 1507374465 | 1507375341 | 0.0 | 876 |
| 59e5012c57f3f44ead63f571 | volta | ft | Y | [59e5012c57f3f44ead63f56d, 59e5012c57f3f44ead6... | 1506904997 | 1506905863 | 0.0 | 866 |
| 59e5012d57f3f44ead63f576 | volta | ft | Y | [59e5012c57f3f44ead63f572, 59e5012c57f3f44ead6... | 1506797409 | 1506798296 | 0.0 | 887 |
| 59e5012d57f3f44ead63f57b | volta | ft | Y | [59e5012d57f3f44ead63f577, 59e5012d57f3f44ead6... | 1506904157 | 1506905026 | 0.0 | 869 |
| #Time | AR_NIC_NETMON_ORB_EVENT_CNTR_REQ_FLITS_metric_set_nic | AR_NIC_NETMON_ORB_EVENT_CNTR_REQ_PKTS_metric_set_nic | AR_NIC_NETMON_ORB_EVENT_CNTR_REQ_STALLED_metric_set_nic | AR_NIC_RSPMON_PARB_EVENT_CNTR_AMO_BLOCKED_metric_set_nic | AR_NIC_RSPMON_PARB_EVENT_CNTR_AMO_FLITS_metric_set_nic | AR_NIC_RSPMON_PARB_EVENT_CNTR_AMO_PKTS_metric_set_nic | AR_NIC_RSPMON_PARB_EVENT_CNTR_BTE_RD_BLOCKED_metric_set_nic | AR_NIC_RSPMON_PARB_EVENT_CNTR_BTE_RD_FLITS_metric_set_nic | AR_NIC_RSPMON_PARB_EVENT_CNTR_BTE_RD_PKTS_metric_set_nic | AR_NIC_RSPMON_PARB_EVENT_CNTR_IOMMU_BLOCKED_metric_set_nic | ... | procs_running_procstat | slabs_scanned_vmstat | softirq_count_procstat | softirq_procstat | sys_procstat | unevictable_pgs_culled_vmstat | unevictable_pgs_mlocked_vmstat | unevictable_pgs_munlocked_vmstat | unevictable_pgs_rescued_vmstat | user_procstat |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1507387966 | 0 | 0 | 0 | 0 | 159 | 159 | 0 | 0 | 0 | 0 | ... | 1 | 0 | 22690 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1507387967 | 9 | 3 | 0 | 0 | 159 | 159 | 0 | 2 | 2 | 0 | ... | 1 | 0 | 22625 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1507387968 | 0 | 0 | 0 | 0 | 159 | 159 | 0 | 0 | 0 | 0 | ... | 1 | 0 | 22675 | 0 | 2 | 0 | 0 | 0 | 0 | 0 |
| 1507387969 | 0 | 0 | 0 | 0 | 159 | 159 | 0 | 0 | 0 | 0 | ... | 1 | 0 | 22458 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1507387970 | 0 | 0 | 0 | 0 | 159 | 159 | 0 | 0 | 0 | 0 | ... | 1 | 0 | 22641 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
5 rows × 563 columns
|  | max_AR_NIC_NETMON_ORB_EVENT_CNTR_REQ_FLITS_metric_set_nic | min_AR_NIC_NETMON_ORB_EVENT_CNTR_REQ_FLITS_metric_set_nic | mean_AR_NIC_NETMON_ORB_EVENT_CNTR_REQ_FLITS_metric_set_nic | std_AR_NIC_NETMON_ORB_EVENT_CNTR_REQ_FLITS_metric_set_nic | skew_AR_NIC_NETMON_ORB_EVENT_CNTR_REQ_FLITS_metric_set_nic | kurt_AR_NIC_NETMON_ORB_EVENT_CNTR_REQ_FLITS_metric_set_nic | perc05_AR_NIC_NETMON_ORB_EVENT_CNTR_REQ_FLITS_metric_set_nic | perc25_AR_NIC_NETMON_ORB_EVENT_CNTR_REQ_FLITS_metric_set_nic | perc50_AR_NIC_NETMON_ORB_EVENT_CNTR_REQ_FLITS_metric_set_nic | perc75_AR_NIC_NETMON_ORB_EVENT_CNTR_REQ_FLITS_metric_set_nic | ... | min_user_procstat | mean_user_procstat | std_user_procstat | skew_user_procstat | kurt_user_procstat | perc05_user_procstat | perc25_user_procstat | perc50_user_procstat | perc75_user_procstat | perc95_user_procstat |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 59e5012b57f3f44ead63f563 | 251708171 | 0 | 8.488983e+07 | 9.734840e+07 | 0.633594 | -1.215991 | 0.0 | 47.0 | 31054906.0 | 175712369.0 | ... | 3167 | 3197.250980 | 9.724157 | -0.062776 | 0.334579 | 3182.2 | 3191.0 | 3198.0 | 3203.0 | 3214.0 |
| 59e5012b57f3f44ead63f564 | 251707494 | 0 | 8.488958e+07 | 9.247342e+07 | 0.595364 | -1.207735 | 0.0 | 8.0 | 46449508.0 | 173588664.0 | ... | 3167 | 3197.156863 | 10.912796 | -0.042226 | -0.226500 | 3180.0 | 3189.0 | 3199.0 | 3205.0 | 3215.0 |
| 59e5012b57f3f44ead63f565 | 251707555 | 0 | 8.488960e+07 | 9.244100e+07 | 0.595058 | -1.207605 | 0.0 | 10.0 | 47638390.0 | 172981795.0 | ... | 3168 | 3196.980392 | 9.004916 | -0.030884 | 0.830260 | 3179.2 | 3193.0 | 3197.0 | 3201.0 | 3214.0 |
| 59e5012b57f3f44ead63f566 | 251707494 | 0 | 8.488958e+07 | 1.045421e+08 | 0.667807 | -1.315681 | 0.0 | 8.0 | 76.0 | 198788583.0 | ... | 3174 | 3197.345098 | 7.810576 | -0.027800 | 0.599669 | 3184.2 | 3193.0 | 3199.0 | 3200.0 | 3212.0 |
| 59e5012b57f3f44ead63f568 | 251708171 | 0 | 8.578694e+07 | 9.444057e+07 | 0.595700 | -1.219924 | 0.0 | 47.0 | 45645214.0 | 171000617.0 | ... | 3166 | 3197.034346 | 8.536433 | 0.119504 | 0.763120 | 3183.8 | 3192.0 | 3197.0 | 3201.0 | 3212.0 |
5 rows × 6193 columns
\n", "