Springer Nature
15 files

C++ GPU implementation of the Boolean matrix factorization algorithm C-Salt

dataset
posted on 2017-12-29, 18:19 authored by Sibylle Hess, Katharina Morik
The dataset comprises the C++ GPU implementation of the Boolean matrix factorization algorithm C-Salt. The included ReadMe file describes how to set up and use the implementation.

Two of the files, the IJulia notebooks CSaltEvalRealWorld.ipynb and CSaltGenerateSynthData.ipynb, can be used to generate synthetic data as proposed in the paper and to evaluate the quality measures for results on the submitted text data.

The data includes two IPython notebook (.ipynb) files, six tab-separated value (.tsv) files, two .hpp files, and two .cu files. The .ipynb files can be exported to HTML, PDF, reStructuredText, and LaTeX formats. The .tsv files can be opened with any open-source text editor. .hpp is a header file format used by C++, and .cu files are CUDA source files associated with the NVIDIA CUDA Toolkit.

Abstract:
Given labelled data represented by a binary matrix, we consider the task of deriving a Boolean matrix factorization which identifies commonalities and specifications among the classes. While existing works focus on rank-one factorizations which are either specific or common to the classes, we derive class-specific alterations from common factorizations as well. Therewith, we broaden the applicability of our new method to datasets whose class-dependencies have a more complex structure. On the basis of synthetic and real-world datasets, we show on the one hand that our method is able to alter structure which corresponds to our model assumption, and on the other hand that our model assumption is justified in real-world application. Our method is parameter-free.
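
For readers unfamiliar with the term, the following minimal C++ sketch shows what a Boolean matrix factorization approximates: a binary data matrix D is reconstructed as the Boolean product of two binary factor matrices, where addition is replaced by logical OR (so 1 + 1 = 1). This is not taken from the C-Salt implementation in this dataset and does not model the class-specific alterations described above. The names BinMatrix, booleanProduct, and reconstructionError are hypothetical and chosen for illustration only.

```cpp
#include <cstddef>
#include <vector>

// Binary matrix stored row-major with entries 0 or 1 (illustrative type).
using BinMatrix = std::vector<std::vector<int>>;

// Boolean product of Y (m x r) and the transpose of X (n x r):
// result(j, i) = OR over s of (Y(j, s) AND X(i, s)).
// Assumes every row of Y and X has the same length r.
BinMatrix booleanProduct(const BinMatrix& Y, const BinMatrix& X) {
    const std::size_t m = Y.size();
    const std::size_t n = X.size();
    const std::size_t r = m > 0 ? Y[0].size() : 0;
    BinMatrix D(m, std::vector<int>(n, 0));
    for (std::size_t j = 0; j < m; ++j) {
        for (std::size_t i = 0; i < n; ++i) {
            // Stop scanning as soon as one factor covers the entry.
            for (std::size_t s = 0; s < r && D[j][i] == 0; ++s) {
                D[j][i] = Y[j][s] & X[i][s];
            }
        }
    }
    return D;
}

// Number of entries in which two equally sized binary matrices differ.
std::size_t reconstructionError(const BinMatrix& A, const BinMatrix& B) {
    std::size_t errors = 0;
    for (std::size_t j = 0; j < A.size(); ++j) {
        for (std::size_t i = 0; i < A[j].size(); ++i) {
            if (A[j][i] != B[j][i]) {
                ++errors;
            }
        }
    }
    return errors;
}
```

A factorization of rank r is considered good when reconstructionError(D, booleanProduct(Y, X)) is small for the observed data matrix D. Each column pair of Y and X then describes one rank-one pattern, that is, a set of rows sharing a set of columns.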

Research Data Support

Research data support provided by Springer Nature.