Documentation and Support
This page provides documentation for getting started and describes the expected data layout for the various algorithms. At the moment, it is embarrassingly small.
Installation
Please look at the INSTALL file in the top-level directory.
Dataset Layout
MaTEx supports a sparse data format. The dataset is expected to have one sample (vector) per line, with fields separated by , or :.
Sample Data Layout for Classification Algorithms
Datasets for classification algorithms are expected to adhere to the libsvm sparse data format. An example is here, where each line in the dataset is expected to look as follows:
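As a rough illustration of the libsvm layout, a line usually carries a class label followed by space-separated index:value pairs, for example `-1 3:1 11:1 14:0.5`. The Python sketch below parses one such line. The function name `parse_libsvm_line` is hypothetical and not part of MaTEx, and the sample line is made up for illustration.

```python
def parse_libsvm_line(line):
    """Parse one libsvm-style line into (label, {index: value}).

    Assumes the common libsvm layout: a class label followed by
    whitespace-separated index:value pairs.
    """
    tokens = line.split()
    label = float(tokens[0])
    features = {}
    for token in tokens[1:]:
        index, value = token.split(":", 1)
        features[int(index)] = float(value)
    return label, features


# Illustrative line, not taken from a real dataset.
label, features = parse_libsvm_line("-1 3:1 11:1 14:0.5")
print(label, features)
```

Only the non-zero columns appear on a line, which is what makes the format sparse.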
Sample Data Layout for Clustering Algorithms
Datasets for clustering algorithms are expected to follow a sparse data format. Since datasets for clustering do not have a class variable, each line in the dataset is expected to look as follows:
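One way to picture a label-free sparse line is to write a dense vector as index:value pairs, omitting zeros. The Python sketch below does this under two assumptions that this page does not confirm: column indices start at 1 (the libsvm convention) and pairs are separated by spaces. The helper name `to_sparse_line` is hypothetical.

```python
def to_sparse_line(vector):
    """Format a dense vector as a label-free sparse line.

    Assumptions (not confirmed by MaTEx): 1-based column indices
    and space-separated index:value pairs.
    """
    parts = []
    for position, value in enumerate(vector, start=1):
        if value != 0:
            # :g drops trailing zeros, so 2.0 is written as 2.
            parts.append(f"{position}:{value:g}")
    return " ".join(parts)


print(to_sparse_line([0.0, 1.5, 0.0, 2.0]))
```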
Sample Data Layout for Association Rule Mining (ARM) Algorithms
Datasets for ARM are expected to follow a sparse data format. ARM datasets are not expected to have a val associated with each column. Hence, each line in the dataset is expected to look like:
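Without values, a line reduces to a list of column identifiers, which can be read as one transaction. The Python sketch below parses such a line into a sorted list of item ids. It assumes integer ids and tolerates either commas or whitespace as separators, since the exact separator is not spelled out here. The function name `parse_transaction` is hypothetical.

```python
import re


def parse_transaction(line):
    """Parse one value-free line into a sorted list of item ids.

    Assumptions (not confirmed by MaTEx): ids are integers and are
    separated by commas and/or whitespace.
    """
    tokens = [t for t in re.split(r"[,\s]+", line.strip()) if t]
    # A set removes duplicate ids within one transaction.
    return sorted({int(t) for t in tokens})


print(parse_transaction("3 7 12"))
print(parse_transaction("3,7,12"))
```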
Running MaTEx algorithms
Each algorithm requires different parameters:
SVM Example
SVM requires a training set and a testing set in the libsvm format (see above). The hyperparameters C and sigmasqr must be provided as well. For example, to run SVM with 16 processes on the adult training set (a9a) and testing set (a9a.t), with C = 32 and sigmasqr = 64:
mpirun -np 16 ./smo a9a a9a.t 32 64
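When trying several values of C and sigmasqr, it can help to assemble the command line programmatically. The Python sketch below only builds and prints the command shown above; it does not launch anything. The helper name `build_smo_command` is hypothetical, and the argument order is taken from the example on this page.

```python
import shlex


def build_smo_command(num_procs, train_path, test_path, c, sigmasqr,
                      binary="./smo"):
    """Return the mpirun argument list for one SVM run.

    Argument order follows the example on this page:
    training set, testing set, C, sigmasqr.
    """
    return ["mpirun", "-np", str(num_procs), binary,
            train_path, test_path, str(c), str(sigmasqr)]


command = build_smo_command(16, "a9a", "a9a.t", 32, 64)
print(shlex.join(command))
```

The returned list can be passed to `subprocess.run` once the binary and datasets are in place.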