The examples here use a subset of the Reuters dataset, which we include with precomputed character-level n-gram features and binary labels indicating inclusion in the acq topic. The data is arranged in LIBSVM format, which is compatible with all the learning algorithms used in these examples. The results shown here should be easily reproducible and serve as a good first exercise.
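For reference, each line of a LIBSVM-format file holds one example: a label followed by sparse `index:value` pairs with increasing indices. The two lines below are a purely hypothetical illustration of the layout, not an excerpt from the acq files:

```
+1 3:0.25 17:0.5 148:0.125
-1 2:0.75 17:0.25 9041:0.1
```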

Note that these exercises have been constructed to demonstrate the mechanics behind using the automatic kernel selection tools. The kernels and parameters used here are NOT necessarily the best choices for this particular dataset.

**Correlation Kernel Example:** The correlation-based kernel weights each input feature by a quantity proportional to its correlation with the training labels. The following command will generate the weighted features:

$ klweightfeatures --weight_type=corr --features --sparse --num_train=300 acq.10-gram 1 > acq.10-gram.corr
INFO: Loaded 466 datapoints.
INFO: Selecting 300 training examples...
INFO: Using 148465 features.

The `--features` flag forces the output of explicit feature vectors, rather than the kernel matrix, and the `--sparse` flag forces the use of sparse data structures; both are desirable in this case since the n-gram features are sparse. The `--num_train` flag indicates that the kernel selection algorithm should use only the first 300 data points for training, and thus allows us to use the remaining points as a holdout set for evaluating performance. The `--alg_reg` flag lets the kernel selection algorithm know the regularization value of the learning algorithm that will be used in the subsequent training step. Finally, the `--weight_type` flag selects which type of kernel selection algorithm is used. The first positional argument indicates the input dataset and the second argument regularizes the kernel, which, in the case of the correlation kernel, restricts the kernel trace to equal 1.
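To make the idea concrete, the following numpy sketch shows one way correlation-based feature weighting can be done. It only illustrates the concept, not the actual klweightfeatures implementation, and the exact scaling it uses is an assumption:

```python
import numpy as np

def correlation_weighted_features(X, y):
    """Sketch of correlation-based feature weighting.

    X: (n_examples, n_features) training matrix; y: labels in {+1, -1}.
    Each feature is scaled by the magnitude of its correlation with y,
    then the result is rescaled so the induced linear kernel has trace 1.
    """
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    # Per-feature Pearson correlation with the labels.
    corr = (Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc) + 1e-12)
    Xw = X * np.abs(corr)
    # Trace normalization: trace(Xw Xw^T) = ||Xw||_F^2 = 1 after rescaling.
    return Xw / np.linalg.norm(Xw)

# Toy usage with dense random data (the real n-gram features are sparse).
X = np.random.rand(20, 5)
y = np.where(np.random.rand(20) > 0.5, 1.0, -1.0)
Xw = correlation_weighted_features(X, y)
```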

The weighted features can then be used to train and test an SVM model via libsvm or liblinear:

Separate train and test:

$ head -n300 acq.10-gram.corr > acq.10-gram.corr.train
$ tail -n+301 acq.10-gram.corr > acq.10-gram.corr.test

Train (-s 0 selects C-SVC, -t 0 a linear kernel, and -c 2048 the cost parameter):

$ svm-train -s 0 -t 0 -c 2048 acq.10-gram.corr.train model

Test:

$ svm-predict acq.10-gram.corr.test model out
Accuracy = 85.5422% (142/166) (classification)

**L2 Regularized Linear Combination:** Here we optimally weight the input features in order to maximize the kernel ridge regression (KRR) objective, subject to the L2 regularization constraint `||mu - mu0|| < ker_reg`, where `mu` is the vector of squared weights and `mu0` and `ker_reg` are user-specified arguments.

$ klweightfeatures --weight_type=lin2 --features --sparse --num_train=300 --tol=0.001 --alg_reg=2 acq.4-gram 1 > acq.4-gram.lin2
...
INFO: iter: 11 obj: 9.68861 gap: 0.00396578
INFO: iter: 12 obj: 9.71034 gap: 0.00199578
INFO: iter: 13 obj: 9.72107 gap: 0.000998093

The algorithm will iterate until the tolerance, which is set by the `--tol` flag, is reached or the maximum number of iterations is met. In this case `mu0` is equal to zero (the default) and `ker_reg` is specified by the second positional argument. Since this selection is algorithm specific, we should also specify the regularization parameter we will use in the second step via the `--alg_reg` flag.

(NOTE: the time taken for each iteration will be improved soon. There still remain some optimizations to be made to the sparse-matrix data structure.)

We then train and test using kernel ridge regression (KRR), with input and output arguments that have been made to closely resemble libsvm. One main difference is that the user must explicitly request sparse data structures. If the data is in fact dense, it is better to use the highly efficient dense BLAS routines instead by omitting the `--sparse` flag. To see a full list of command line arguments, run krr-train without any parameters.

Separate train and test:

$ head -n300 acq.4-gram.lin2 > acq.4-gram.lin2.train
$ tail -n+301 acq.4-gram.lin2 > acq.4-gram.lin2.test

Train:

$ krr-train --sparse acq.4-gram.lin2.train 2 model

Test:

$ krr-predict --sparse acq.4-gram.lin2.test model out
INFO: Using primal solution to make predictions...
INFO: RMSE: 0.761885
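For intuition about what krr-train and krr-predict compute, here is a minimal dense numpy sketch of kernel ridge regression with a linear kernel. It is purely illustrative: the actual tools operate on sparse data and have their own options, and treating the trailing 2 in the krr-train command as the ridge regularization parameter is an assumption:

```python
import numpy as np

def krr_fit(X, y, lam):
    """Dual kernel ridge regression with a linear kernel K = X X^T.
    Returns dual coefficients alpha = (K + lam * I)^{-1} y."""
    K = X @ X.T
    return np.linalg.solve(K + lam * np.eye(len(y)), y)

def krr_predict(X_train, alpha, X_test):
    """Predictions f(x) = sum_i alpha_i <x_i, x>."""
    return (X_test @ X_train.T) @ alpha

# Toy usage with random data.
X_tr, y_tr = np.random.rand(30, 5), np.random.rand(30)
alpha = krr_fit(X_tr, y_tr, lam=2.0)
preds = krr_predict(X_tr, alpha, np.random.rand(10, 5))
```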

**Linear Combination of N-gram Kernels:** In this example we find the best linear combination of 10 character-level n-gram kernels (1-gram, 2-gram, ..., 10-gram) with respect to the SVM objective.
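Conceptually, the learned combination corresponds to a kernel of the form K = sum_k mu_k K_k. The sketch below only illustrates forming such a combination from precomputed kernel matrices; the weights mu_k here are placeholders, whereas klcombinefeatures chooses them by optimizing the SVM objective:

```python
import numpy as np

def random_psd(n):
    """Random positive semi-definite stand-in for a precomputed n-gram kernel matrix."""
    A = np.random.rand(n, n)
    return A @ A.T

# Hypothetical 1-gram through 10-gram kernel matrices on 50 examples.
kernels = [random_psd(50) for _ in range(10)]

# Placeholder weights; the kernel selection algorithm would learn these.
mu = np.ones(len(kernels)) / len(kernels)

# Weighted linear combination of the base kernels.
K_combined = sum(m * K for m, K in zip(mu, kernels))
```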

klcombinefeatures ...

This will produce feature vectors with many features, but which are sparse, so liblinear is a good choice for training a model:

Separate train and test:

$ head -n1000 electronics.class.corr > electronics.class.corr.train
$ tail -n+1001 electronics.class.corr > electronics.class.corr.test

Train:

train ...

Test:

predict ...

-- AfshinRostamizadeh - 24 Aug 2009
