%0 Generic
%A Ates, Emre
%A Tuncer, Ozan
%A Turk, Ata
%A Leung, Vitus J.
%A Brandt, Jim
%A Egele, Manuel
%A Coskun, Ayse K.
%D 2018
%T Artifact for Taxonomist: Application Detection through Rich Monitoring Data
%U https://springernature.figshare.com/articles/dataset/Artifact_for_Taxonomist_Application_Detection_through_Rich_Monitoring_Data/6384248
%R 10.6084/m9.figshare.6384248.v1
%2 https://springernature.figshare.com/ndownloader/files/11748524
%2 https://springernature.figshare.com/ndownloader/files/11748527
%2 https://springernature.figshare.com/ndownloader/files/11748515
%2 https://springernature.figshare.com/ndownloader/files/11748680
%2 https://springernature.figshare.com/ndownloader/files/11748521
%2 https://springernature.figshare.com/ndownloader/files/11748512
%2 https://springernature.figshare.com/ndownloader/files/12297428
%K Application Detection
%K Supercomputing
%K Monitoring
%K Security
%K Cryptocurrency
%K HPC
%K supervised learning
%K machine learning
%K classifier
%K taxonomist
%K monitoring data
%X Code, documentation, data and Jupyter Notebook associated with the publication "Taxonomist: Application Detection Through Rich Monitoring Data" for the European Conference on Parallel Processing 2018.
The related study develops a technique named 'Taxonomist' to identify applications running on supercomputers, using machine learning to classify known applications and detect unknown applications. The technique uses monitoring data such as CPU and memory usage metrics and hardware counters collected from supercomputers. The aims of this technique include providing an alternative to 'naive' application detection methods based on names of processes and scripts, and helping prevent fraud, waste and abuse in supercomputers.
- metadata.csv: A CSV file listing each run, the IDs of the nodes on which the run executed, the application and inputs used, the start and end times, and the run duration.
- timeseries.tar.bz2: A bzip2-compressed file containing the collected data. The uncompressed size is 16 GB; uncompressing it is not necessary for most of the notebook.
- features.hdf: An HDF5 file containing the pre-calculated features. The calculation process is included in the notebook.