Difference: LearningKernels (1 vs. 13)

Revision 13 (2010-04-09) - CyrilAllauzen

Line: 1 to 1
 

Learning Kernels

Added:
>
>
  -- AfshinRostamizadeh - 24 Aug 2009

Revision 12 (2009-09-24) - AfshinRostamizadeh

Line: 1 to 1
 

Learning Kernels

Changed:
<
<
>
>
  -- AfshinRostamizadeh - 24 Aug 2009
Added:
>
>
META TOPICMOVED by="AfshinRostamizadeh" date="1253752022" from="Kernel.AutomaticKernelSelection" to="Kernel.LearningKernels"

Revision 11 (2009-09-18) - AfshinRostamizadeh

Line: 1 to 1
Changed:
<
<

Automatic Kernel Selection

>
>

Learning Kernels

 
Deleted:
<
<
TODO: move into its own place
 -- AfshinRostamizadeh - 24 Aug 2009

Revision 10 (2009-09-16) - AfshinRostamizadeh

Line: 1 to 1
 

Automatic Kernel Selection

Added:
>
>
TODO: move into its own place
 

-- AfshinRostamizadeh - 24 Aug 2009

Revision 9 (2009-09-10) - AfshinRostamizadeh

Line: 1 to 1
 

Automatic Kernel Selection

Deleted:
<
<
Here we give some examples showing how to automatically create custom kernels using data. The kernels are generally created from a combination of base kernels, which can be specified by the user.
 
Changed:
<
<
The examples here will use the electronics category from the sentiment analysis dataset of Blitzer et al., which we include with precomputed word level ngram features and binary as well as regression labels (1-5 stars). The data is arranged in LIBSVM format, which is compatible with all the learning algorithms used in these examples. The results shown here should be easily reproducible and serve as a good first exercise.

Note that these exercises have been constructed to illustrate the mechanics behind using the automatic kernel selection tools. The kernels and parameters used here are NOT necessarily the best for this particular dataset.

Feature Weighted Kernels

The examples below consider the case when each base kernel corresponds to a single feature. Such a set of base kernels occurs naturally when, for example, learning rational kernels as explained in Cortes et al. (MLSP 2008).

Correlation Kernel Example: The correlation-based kernel weights each input feature by a quantity proportional to its correlation with the training labels. The following command will generate weighted features:

$ klweightfeatures  --weight_type=corr --features --sparse --num_train=1000 elec.2-gram 1 \
  > elec.2-gram.corr
INFO: Loaded 2000 datapoints.
INFO: Selecting 1000 training examples...
INFO: Using 57466 features.

The --features flag forces the output of explicit feature vectors, rather than the kernel matrix, and the --sparse flag forces the use of sparse data-structures; both are desirable in this case since the ngram features are sparse. The --num_train flag indicates that the kernel selection algorithm should use only the first 1000 data-points for training, and thus allows us to use the remaining points as a holdout set for evaluating performance. The --alg_reg flag (used in later examples) lets the kernel selection algorithm know the regularization value of the learning algorithm that will be applied in the second step. Finally, the --weight_type flag selects which type of kernel selection algorithm is used. The first argument indicates the input dataset and the second argument regularizes the kernel which, in the case of the correlation kernel, restricts the kernel trace to equal 1.
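To make the weighting concrete, here is a small numpy sketch of the underlying idea (an illustration only, not the actual klweightfeatures implementation): each feature column is scaled by a weight proportional to its absolute correlation with the labels, and the result is normalized so that the induced linear kernel has unit trace, matching the trace constraint of 1 above.

import numpy as np

def correlation_weighted_features(X, y):
    # Sketch only: scale each feature by its absolute Pearson correlation
    # with the labels, then normalize so the induced linear kernel
    # K = Xw Xw^T has unit trace.
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc) + 1e-12
    corr = np.abs(Xc.T @ yc) / denom            # per-feature correlation
    Xw = X * corr                               # weight each feature column
    Xw = Xw / np.sqrt(np.sum(Xw ** 2))          # trace(Xw Xw^T) == 1
    return Xw

# toy usage with hypothetical dense data (the real ngram data are sparse)
X = np.random.default_rng(0).random((6, 4))
y = np.array([1.0, -1.0, 1.0, -1.0, 1.0, -1.0])
Xw = correlation_weighted_features(X, y)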

The weighted features can then be used to train and test an svm model via libsvm or liblinear:

Separate train and test:

$ head -n1000 elec.2-gram.corr > elec.2-gram.corr.train
$ tail -n+1001 elec.2-gram.corr > elec.2-gram.corr.test

Train:

$ svm-train -s 0 -t 0 -c 4096 elec.2-gram.corr.train model

Test:

$ svm-predict elec.2-gram.corr.test model pred
Accuracy = 80% (800/1000) (classification)

L2 Regularized Linear Combination: Here we optimally weight the input features in order to maximize the kernel ridge regression (KRR) objective, subject to the L2 regularization constraint ||mu - mu0|| <= ker_reg, where mu is the vector of squared weights and mu0 and ker_reg are user-specified arguments.

$ klweightfeatures --weight_type=lin2 --features --sparse --num_train=1000 \
    --alg_reg=4 --offset=1 --tol=1e-4 elec.1-gram.reg 1 > elec.1-gram.lin2
INFO: Loaded 2000 datapoints.
INFO: Selecting 1000 training examples...
...
INFO: iter: 18 obj: 333.919 gap: 0.000123185
INFO: iter: 19 obj: 333.922 gap: 6.15492e-05
INFO: Using 12876 features.

The algorithm iterates until the tolerance, set by the --tol flag, is reached or the maximum number of iterations is met. In this case mu0 is equal to zero (the default) and ker_reg is specified by the second argument to the command. Since this selection is algorithm specific, we also specify the regularization parameter that will be used in the second step via the --alg_reg flag. The --offset flag adds the indicated constant offset to the dataset input if one is not already included. Finally, the --tol flag sets the precision at which the iterative method stops.
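For intuition about the obj/gap lines printed above, the following numpy sketch shows one plausible alternating scheme for this kind of L2-constrained weight learning; it is an assumption-laden illustration, not the actual update rule used by klweightfeatures. Here lam plays the role of --alg_reg and ker_reg of the second positional argument.

import numpy as np

def l2_kernel_weights(Ks, y, lam, ker_reg, mu0=None, eta=0.5, iters=20):
    # Hedged sketch: alternate between (1) solving KRR for the current
    # combined kernel and (2) moving mu toward the point of the ball
    # ||mu - mu0|| <= ker_reg that is best for the current dual solution.
    # Ks is a list of base kernel matrices; for feature-weighted kernels
    # each base kernel is the rank-one matrix x_k x_k^T of a single feature.
    p, n = len(Ks), len(y)
    mu0 = np.zeros(p) if mu0 is None else mu0
    mu = mu0 + ker_reg / np.sqrt(p)                       # start inside the ball
    for _ in range(iters):
        K = sum(m * Kk for m, Kk in zip(mu, Ks))
        alpha = np.linalg.solve(K + lam * np.eye(n), y)   # KRR dual vector
        v = np.array([alpha @ Kk @ alpha for Kk in Ks])   # ascent direction
        target = mu0 + ker_reg * v / (np.linalg.norm(v) + 1e-12)
        mu = (1.0 - eta) * mu + eta * target              # damped update
    return mu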

(NOTE: the time taken for each iteration will be improved soon. There still remains some optimization to be made to the sparse-matrix data-structure).

We then train and test using kernel ridge regression (KRR), with input and output arguments that have been made to closely resemble libsvm. One main difference is that the user must explicitly request sparse data-structures with the --sparse flag. If the data is dense, it is better to use the highly efficient dense BLAS routines instead by omitting the --sparse flag. To see a full list of command line arguments, run krr-train without any parameters.
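For reference, the "primal" and "dual" messages printed by krr-predict correspond to the two standard closed-form KRR solutions. A textbook sketch (not the krr-train source):

import numpy as np

def krr_train_primal(X, y, lam):
    # Explicit features: w = (X^T X + lam I)^{-1} X^T y; predict with X_test @ w.
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def krr_train_dual(K, y, lam):
    # Precomputed kernel: alpha = (K + lam I)^{-1} y; predict with K_test_train @ alpha.
    n = len(y)
    return np.linalg.solve(K + lam * np.eye(n), y)

The primal form is preferable when the number of features is small relative to the number of training points; the dual form is used when only a kernel matrix is available.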

Separate train and test:

$ head -n1000 elec.1-gram.lin2 > elec.1-gram.lin2.train
$ tail -n+1001 elec.1-gram.lin2 > elec.1-gram.lin2.test

Train:

$ krr-train --sparse elec.1-gram.lin2.train 4 model

Test:

$ krr-predict --sparse elec.1-gram.lin2.test model pred
INFO: Using primal solution to make predictions...
INFO: RMSE: 1.34898

Kernel Combinations with Explicit Features

Here we consider the case of combining several general base kernels that admit explicit feature mappings. When these features are sparse, for example, combinations in high-dimensional feature spaces can be computed very efficiently.
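The efficiency comes from a simple identity: a nonnegative linear combination of base kernels with explicit feature maps is itself the linear kernel of the concatenated, rescaled feature blocks. A small numpy/scipy sketch with toy data (illustration only, not klcombinefeatures):

import numpy as np
import scipy.sparse as sp

def combine_explicit_features(feature_blocks, mu):
    # sum_k mu_k K_k is the linear kernel of [sqrt(mu_1) Phi_1, ..., sqrt(mu_p) Phi_p].
    scaled = [np.sqrt(m) * Phi for m, Phi in zip(mu, feature_blocks)]
    return sp.hstack(scaled).tocsr()

# toy check that the combined linear kernel matches the weighted sum of base kernels
blocks = [sp.random(5, 8, density=0.3, random_state=1),
          sp.random(5, 3, density=0.5, random_state=2)]
mu = np.array([0.7, 0.3])
Phi = combine_explicit_features(blocks, mu)
assert np.allclose((Phi @ Phi.T).toarray(),
                   sum(m * (B @ B.T).toarray() for m, B in zip(mu, blocks)))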

In this example we find the best linear combination of 5 character-level ngram kernels (1-gram, 2-gram, ..., 5-gram) with respect to the SVM objective.

$ klcombinefeatures --weight_type=lin1 --sparse --num_train=1000 \
    --alg_reg=0.1 elec.list elec.comb
...
INFO: iter: 9 constraint: -5.65079 theta: -5.65044 gap: 6.16132e-05
     16:   objval =  -5.650437235e+00   infeas =   1.000000000e+00 (0)
     17:   objval =  -5.650600323e+00   infeas =   0.000000000e+00 (0)
OPTIMAL SOLUTION FOUND
.*
optimization finished, #iter = 12
Objective value = -5.650654
nSV = 792
INFO: iter: 10 constraint: -5.65065 theta: -5.6506 gap: 9.48414e-06

Here the argument elec.list is a file with the path to each base kernel written on a separate line, and the combined kernel is written to the file elec.comb. The flag --alg_reg indicates the regularization parameter that will be used with SVM.

This produces a kernel with many features, but they are sparse, so liblinear is a good choice for training a model:

Separate train and test:

$ head -n1000 elec.comb > elec.comb.train
$ tail -n+1001 elec.comb > elec.comb.test

Train:

$ train -s 3 -c 0.1 -B -1 elec.comb.train model

Test:

$ predict elec.comb.test model pred
Accuracy = 83.5% (835/1000)

General Kernel Combinations

The final example concerns general combinations of kernels, where we combine the kernel matrices of the 5 ngram kernels. In practice, this general kernel combination should be used when easy-to-represent explicit feature mappings are not available, as with Gaussian kernels.
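When only kernel matrices are available, the combination itself is just a weighted sum of the base matrices. A brief numpy sketch with hypothetical Gaussian base kernels (illustration of the concept, not klcombinekernels):

import numpy as np

def gaussian_kernel(X, Z, gamma):
    # K[i, j] = exp(-gamma * ||x_i - z_j||^2)
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * d2)

def combine_kernel_matrices(Ks, mu):
    # Weighted sum of precomputed base kernel matrices.
    return sum(m * K for m, K in zip(mu, Ks))

# toy usage: combine two Gaussian kernels of different widths
X = np.random.default_rng(0).random((6, 3))
Ks = [gaussian_kernel(X, X, g) for g in (0.5, 2.0)]
K = combine_kernel_matrices(Ks, np.array([0.6, 0.4]))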

$ klcombinekernels --weight_type=lin2 --num_train=1000 \
    --alg_reg=0.0001 --tol=1e-3 elec.kernel.list elec.kernel.comb
...
INFO: iter: 13 obj: 2980.56 gap: 0.00217053
INFO: iter: 14 obj: 2980.47 gap: 0.00108526
INFO: iter: 15 obj: 2980.42 gap: 0.00054263

Here elec.kernel.list is a file with the path to each base kernel written on a separate line.

Separate train and test:

$ head -n1000  elec.kernel.comb > elec.kernel.comb.train
$ tail -n+1001 elec.kernel.comb > elec.kernel.comb.test

Train:

$ krr-train --kernel elec.kernel.comb.train 0.0001 model

Test:

$ krr-predict --kernel elec.kernel.comb.test model pred
INFO: Using dual solution to make predictions...
INFO: Making predicitons...
INFO: RMSE: 1.36997
>
>
  -- AfshinRostamizadeh - 24 Aug 2009

Revision 8 (2009-08-27) - AfshinRostamizadeh

Line: 1 to 1
 

Automatic Kernel Selection

Here we give some examples showing how to automatically create custom kernels using data. The kernels are generally created from a combination of base kernels, which can be specified by the user.
Line: 77 to 77
 INFO: RMSE: 1.34898
Changed:
<
<

General Kernel Combinations

The final example concerns general combinations of kernels, where we combine the kernel matrices of the 5 ngram kernels. In practice, this general kernel combination should be used when easy-to-represent explicit feature mappings are not available, as with Gaussian kernels.
>
>

Kernel Combinations with Explicit Features

Here we consider the case of combining several general base kernels that admit explicit feature mappings. When these features are sparse, for example, combinations in high-dimensional feature spaces can be computed very efficiently.

In this example we find the best linear combination of 5 character-level ngram kernels (1-gram, 2-gram, ..., 5-gram) with respect to the SVM objective.

 
Changed:
<
<
klcombinekernels ... $ klcombinekernels --weight_type=lin2 --num_train=1000 --alg_reg=0.0001 --tol=1e-3 elec.kernel.list elec.kernel.comb
>
>
$ klcombinefeatures --weight_type=lin1 --sparse --num_train=1000 --alg_reg=0.1 elec.list elec.comb
 ...
Changed:
<
<
INFO: iter: 13 obj: 2980.56 gap: 0.00217053 INFO: iter: 14 obj: 2980.47 gap: 0.00108526 INFO: iter: 15 obj: 2980.42 gap: 0.00054263
>
>
INFO: iter: 9 constraint: -5.65079 theta: -5.65044 gap: 6.16132e-05 16: objval = -5.650437235e+00 infeas = 1.000000000e+00 (0) 17: objval = -5.650600323e+00 infeas = 0.000000000e+00 (0) OPTIMAL SOLUTION FOUND .* optimization finished, #iter = 12 Objective value = -5.650654 nSV = 792 INFO: iter: 10 constraint: -5.65065 theta: -5.6506 gap: 9.48414e-06
 
Changed:
<
<
Here elec.kernel.list is a file with the path to each base kernel written on a separate line.
>
>
Here the argument elec.list is a file with the path to each base kernel written on a separate line, and the combined kernel is written to the file elec.comb. The flag --alg_reg indicates the regularization parameter that will be used with SVM.

This produces a kernel with many features, but they are sparse, so liblinear is a good choice for training a model:

  Separate train and test:
Changed:
<
<
$ head -n1000 elec.kernel.comb > elec.kernel.comb.train $ tail -n+1001 elec.kernel.comb > elec.kernel.comb.test
>
>
$ head -n1000 elec.comb > elec.comb.train $ tail -n+1001 elec.comb > elec.comb.test
 

Train:

Changed:
<
<
$ krr-train --kernel elec.kernel.comb.train 0.0001 model
>
>
$ train -s 3 -c 0.1 -B -1 elec.comb.train model
 

Test:

Changed:
<
<
$ krr-predict --kernel elec.kernel.comb.test model pred INFO: Using dual solution to make predictions... INFO: Making predicitons... INFO: RMSE: 1.36997
>
>
$ predict elec.comb.test model pred Accuracy = 83.5% (835/1000)
 
Changed:
<
<

--- Under Construction ---

Kernel Combinations w/ Explicit Features

Here we consider the case of combining several general base kernels that admit explicit feature mappings. In the case that these features are sparse, for example, we are able to very efficiently compute combinations in high dimensional features spaces.

In this example we find the best linear combination of 5 character-level ngram kernels (1-gram, 2-gram, ..., 5-gram) with respect to the SVM objective.

>
>

General Kernel Combinations

The final example concerns general combinations of kernels, where we combine the kernel matrices of the 5 ngram kernels. In practice, this general kernel combination should be used when easy-to-represent explicit feature mappings are not available, as with Gaussian kernels.
 
Changed:
<
<
klcombinefeatures ...
>
>
klcombinekernels ... $ klcombinekernels --weight_type=lin2 --num_train=1000 --alg_reg=0.0001 --tol=1e-3 elec.kernel.list elec.kernel.comb ... INFO: iter: 13 obj: 2980.56 gap: 0.00217053 INFO: iter: 14 obj: 2980.47 gap: 0.00108526 INFO: iter: 15 obj: 2980.42 gap: 0.00054263
 
Changed:
<
<
This will produce a kernels with many features, but which are sparse, thus liblinear is a good choice for training a model:
>
>
Here elec.kernel.list is a file with the path to each base kernel written on a separate line.
  Separate train and test:
Changed:
<
<
head -n1000 electronics.class.corr > electronics.class.corr.train tail -n+1001 electronics.class.corr > electronics.class.corr.test
>
>
$ head -n1000 elec.kernel.comb > elec.kernel.comb.train $ tail -n+1001 elec.kernel.comb > elec.kernel.comb.test
 

Train:

Changed:
<
<
train ...
>
>
$ krr-train --kernel elec.kernel.comb.train 0.0001 model
 

Test:

Changed:
<
<
predict ...
>
>
$ krr-predict --kernel elec.kernel.comb.test model pred INFO: Using dual solution to make predictions... INFO: Making predicitons... INFO: RMSE: 1.36997
 
Deleted:
<
<
 -- AfshinRostamizadeh - 24 Aug 2009

Revision 7 (2009-08-26) - CyrilAllauzen

Line: 1 to 1
Changed:
<
<

Automatic Kernel Selection

>
>

Automatic Kernel Selection

 Here we give some examples showing how to automatically create custom kernels using data. The kernels are generally created from a combination of base kernels, which can be specified by the user.

The examples here will use the electronics category from the sentiment analysis dataset of Blitzer et al., which we include with precomputed word level ngram features and binary as well as regression labels (1-5 stars). The data is arranged in LIBSVM format, which is compatible with all the learning algorithms used in these examples. The results shown here should be easily reproducible and serve as a good first exercise.

Note that these exercises have been constructed to illustrate the mechanics behind using the automatic kernel selection tools. The kernels and parameters used here are NOT necessarily the best for this particular dataset.

Changed:
<
<

Feature Weighted Kernels

>
>

Feature Weighted Kernels

 The examples below consider the case when each base kernel corresponds to a single feature. Such a set of base kernels occurs naturally when, for example, learning rational kernels as explained in Cortes et al. (MLSP 2008).

Correlation Kernel Example: The correlation based kernel weights each input feature by a quantity proportional to its correlation with the training labels. The following command will generate weighted features:

Line: 77 to 77
 INFO: RMSE: 1.34898
Changed:
<
<

General Kernel Combinations

>
>

General Kernel Combinations

 The final example concerns general combinations of kernels, where we combine the kernel matrices of the 5 ngram kernels. In practice, this general kernel combination should be used when easy-to-represent explicit feature mappings are not available, as with Gaussian kernels.

Revision 6 (2009-08-26) - AfshinRostamizadeh

Line: 1 to 1
 

Automatic Kernel Selection

Here we give some examples showing how to automatically create custom kernels using data. The kernels are generally created from a combination of base kernels, which can be specified by the user.
Line: 77 to 77
 INFO: RMSE: 1.34898
Added:
>
>

General Kernel Combinations

The final example concerns general combinations of kernels, where we combine the kernel matrices of the 5 ngram kernels. In practice, this general kernel combination should be used when easy-to-represent explicit feature mappings are not available, as with Gaussian kernels.

klcombinekernels ...
$ klcombinekernels --weight_type=lin2 --num_train=1000 \
    --alg_reg=0.0001 --tol=1e-3 elec.kernel.list elec.kernel.comb
...
INFO: iter: 13 obj: 2980.56 gap: 0.00217053
INFO: iter: 14 obj: 2980.47 gap: 0.00108526
INFO: iter: 15 obj: 2980.42 gap: 0.00054263

Here elec.kernel.list is a file with the path to each base kernel written on a separate line.

Separate train and test:

$ head -n1000  elec.kernel.comb > elec.kernel.comb.train
$ tail -n+1001 elec.kernel.comb > elec.kernel.comb.test
 
Changed:
<
<

--- Under Construction ---

>
>
Train:
$ krr-train --kernel elec.kernel.comb.train 0.0001 model

Test:

$ krr-predict --kernel elec.kernel.comb.test model pred
INFO: Using dual solution to make predictions...
INFO: Making predicitons...
INFO: RMSE: 1.36997

--- Under Construction ---

 

Kernel Combinations w/ Explicit Features

Here we consider the case of combining several general base kernels that admit explicit feature mappings. In the case that these features are sparse, for example, we are able to very efficiently compute combinations in high dimensional features spaces.
Line: 107 to 140
 
Deleted:
<
<

General Kernel Combinations

The final example listed here is regarding general combinations of kernels
 

-- AfshinRostamizadeh - 24 Aug 2009

Revision 5 (2009-08-26) - AfshinRostamizadeh

Line: 1 to 1
 

Automatic Kernel Selection

Here we give some examples showing how to automatically create custom kernels using data. The kernels are generally created from a combination of base kernels, which can be specified by the user.
Changed:
<
<
The examples here will use a subset of the reuters dataset dataset, which we include with precomputed character level ngram features and binary labels indicating inclusion in the acq topic subset. The data is arranged in LIBSVM format, which is compatible with all the learning algorithms used in these examples. The results shown here should be easily reproducible and serve as a good first exercise.
>
>
The examples here will use the electronics category from the sentiment analysis dataset of Blitzer et al., which we include with precomputed word level ngram features and binary as well as regression labels (1-5 stars). The data is arranged in LIBSVM format, which is compatible with all the learning algorithms used in these examples. The results shown here should be easily reproducible and serve as a good first exercise.
  Note that these exercises have been constructed to illustrate the mechanics behind using the automatic kernel selection tools. The kernels and parameters used here are NOT necessarily the best for this particular dataset.
Line: 12 to 12
 Correlation Kernel Example: The correlation based kernel weights each input feature by a quantity proportional to its correlation with the training labels. The following command will generate weighted features:
Changed:
<
<
$ klweightfeatures --weight_type=corr --features --sparse --num_train=300 acq.10-gram 1 > acq.10-gram.corr INFO: Loaded 466 datapoints. INFO: Selecting 300 training examples... INFO: Using 148465 features.
>
>
$ klweightfeatures --weight_type=corr --features --sparse --num_train=1000 elec.2-gram 1 > elec.2-gram.corr INFO: Loaded 2000 datapoints. INFO: Selecting 1000 training examples... INFO: Using 57466 features.
 
Changed:
<
<
The --features flag forces the output of explicit feature vectors, rather than the kernel matrix, and the --sparse flag forces the use of sparse data-structure, which are both desirable in this case since the ngram-features are sparse. The --num_train flag indicates that the kernel selection algorithm should use only the first 300 data-points for training, and thus allows us to use the remaining points as a holdout set for evaluating performance. The --alg_reg flag lets the kernel selection algorithm know the regularization value of the Finally, the --weight_type flag selects which type of kernel selection algorithm is used. The first argument indicates the input dataset and the second argument regularizes the kernel which, in the case of the correlation kernel, restricts the kernel trace to equal 1.
>
>
The --features flag forces the output of explicit feature vectors, rather than the kernel matrix, and the --sparse flag forces the use of sparse data-structure, which are both desirable in this case since the ngram-features are sparse. The --num_train flag indicates that the kernel selection algorithm should use only the first 1000 data-points for training, and thus allows us to use the remaining points as a holdout set for evaluating performance. The --alg_reg flag lets the kernel selection algorithm know the regularization value of the Finally, the --weight_type flag selects which type of kernel selection algorithm is used. The first argument indicates the input dataset and the second argument regularizes the kernel which, in the case of the correlation kernel, restricts the kernel trace to equal 1.
  The weighted features can then be used to train and test an svm model via libsvm or liblinear:

Separate train and test:

Changed:
<
<
$ head -n300 acq.10-gram.corr > acq.10-gram.corr.train $ tail -n+301 acq.10-gram.corr > acq.10-gram.corr.test
>
>
$ head -n1000 elec.2-gram.corr > elec.2-gram.corr.train $ tail -n+1001 elec.2-gram.corr > elec.2-gram.corr.test
 

Train:

Changed:
<
<
$ svm-train -s 0 -t 0 -c 2048 acq.10-gram.corr.train model
>
>
$ svm-train -s 0 -t 0 -c 4096 elec.2-gram.corr.train model
 

Test:

Changed:
<
<
$ svm-predict acq.10-gram.corr.test model out Accuracy = 85.5422% (142/166) (classification)
>
>
$ svm-predict elec.2-gram.corr.test model pred Accuracy = 80% (800/1000) (classification)
 

L2 Regularized Linear Combination: Here we optimally weight the input features in order to maximize the kernel ridge regression (KRR) objective, subject to the L2 regularization constraint ||mu - mu0|| <= ker_reg, where mu is the vector of squared weights and mu0 and ker_reg are user-specified arguments.

Changed:
<
<
$ klweightfeatures --weight_type=lin2 --features --sparse --num_train=300 --tol=0.001 --alg_reg=2 acq.4-gram 1 > alg_reg=4-gram.lin2
>
>
$ klweightfeatures --weight_type=lin2 --features --sparse --num_train=1000 --alg_reg=4 --offset=1 --tol=1e-4 elec.1-gram.reg 1 > elec.1-gram.lin2 INFO: Loaded 2000 datapoints. INFO: Selecting 1000 training examples...
 ...
Changed:
<
<
INFO: iter: 11 obj: 9.68861 gap: 0.00396578 INFO: iter: 12 obj: 9.71034 gap: 0.00199578 INFO: iter: 13 obj: 9.72107 gap: 0.000998093
>
>
INFO: iter: 18 obj: 333.919 gap: 0.000123185 INFO: iter: 19 obj: 333.922 gap: 6.15492e-05 INFO: Using 12876 features.
 
Changed:
<
<
The algorithm will iterate until the tolerance, which is set by the --tol flag, or maximum number of iterations is met. In this case mu0 is equal to zero (the default) and ker_reg is specified by the second argument to the function. Since this selection is algorithm specific, we should also specify the regularization parameter we will use in the second step via the --alg_reg flag.
>
>
The algorithm iterates until the tolerance, set by the --tol flag, is reached or the maximum number of iterations is met. In this case mu0 is equal to zero (the default) and ker_reg is specified by the second argument to the command. Since this selection is algorithm specific, we also specify the regularization parameter that will be used in the second step via the --alg_reg flag. The --offset flag adds the indicated constant offset to the dataset input if one is not already included. Finally, the --tol flag sets the precision at which the iterative method stops.
 
Changed:
<
<
(NOTE: the time taken for each iteration will be improved soon. Their still remains some optimization to be made to the sparse-matrix data-structure).
>
>
(NOTE: the time taken for each iteration will be improved soon. There still remains some optimization to be made to the sparse-matrix data-structure).
 
Changed:
<
<
We then train and test use (KRR), with input and output arguments that have been made to closely resemble libsvm. One main difference is that the user must specify to use sparse data-structures. If the data is indeed dense, it is better to use highly efficient dense blas routines instead by omitting the --sparse flag. To see a full list of command line arguments, run krr-train without any parameters.
>
>
We then train and test using kernel ridge regression (KRR), with input and output arguments that have been made to closely resemble libsvm. One main difference is that the user must specify to use sparse data-structures. If the data is dense, it is better to use highly efficient dense blas routines instead by omitting the --sparse flag. To see a full list of command line arguments, run krr-train without any parameters.
  Separate train and test:
Changed:
<
<
$ head -n300 acq.4-gram.lin2 > acq.4-gram.lin2.train $ tail -n+301 acq.4-gram.lin2 > acq.4-gram.lin2.test
>
>
$ head -n1000 elec.1-gram.lin2 > elec.1-gram.lin2.train $ tail -n+1001 elec.1-gram.lin2 > elec.1-gram.lin2.test
 

Train:

Changed:
<
<
$ krr-train --sparse acq.4-gram.lin2.train 2 model
>
>
$ krr-train --sparse elec.1-gram.lin2.train 4 model
 

Test:

Changed:
<
<
$ krr-predict --sparse acq.4-gram.lin2.test model out
>
>
$ krr-predict --sparse elec.1-gram.lin2.test model pred
 INFO: Using primal solution to make predictions...
Changed:
<
<
INFO: RMSE: 0.761885
>
>
INFO: RMSE: 1.34898
 
Added:
>
>

--- Under Construction ---

 

Kernel Combinations w/ Explicit Features

Here we consider the case of combining several general base kernels that admit explicit feature mappings. In the case that these features are sparse, for example, we are able to very efficiently compute combinations in high dimensional features spaces.
Changed:
<
<
In this example we find the best linear combination of 10 character-level ngram kernels (1-gram, 2-gram, ..., 10-gram) with respect to the SVM objective.
>
>
In this example we find the best linear combination of 5 character-level ngram kernels (1-gram, 2-gram, ..., 5-gram) with respect to the SVM objective.
 
klcombinefeatures ...
Line: 104 to 108
 

General Kernel Combinations

Changed:
<
<
>
>
The final example listed here is regarding general combinations of kernels
 

-- AfshinRostamizadeh - 24 Aug 2009

Revision 4 (2009-08-26) - AfshinRostamizadeh

Line: 1 to 1
 

Automatic Kernel Selection

Changed:
<
<
Here we give some examples showing how to automatically create custom kernels using data. The kernels are generally created from a certain combination of base kernels, which can be specified by the user.
>
>
Here we give some examples showing how to automatically create custom kernels using data. The kernels are generally created from a combination of base kernels, which can be specified by the user.
 
Changed:
<
<
The examples here will use the electonics sentiment analysis dataset, which we include with precomputed ngram features and binary as well as real-valued labels. The data is arranged in LIBSVM format, which is compatible with all the learning algorithms used in these examples. The results shown here should be easily reproducible and serve as a good first exercise.
>
>
The examples here will use a subset of the reuters dataset dataset, which we include with precomputed character level ngram features and binary labels indicating inclusion in the acq topic subset. The data is arranged in LIBSVM format, which is compatible with all the learning algorithms used in these examples. The results shown here should be easily reproducible and serve as a good first exercise.

Note that these exercises have been constructed to illustrate the mechanics behind using the automatic kernel selection tools. The kernels and parameters used here are NOT necessarily the best for this particular dataset.

 

Feature Weighted Kernels

The examples below consider the case when each base kernel corresponds to a single feature. Such a set of base kernels occurs naturally when, for example, learning rational kernels as explained in Cortes et al. (MLSP 2008).
Changed:
<
<
Correlation Kernel Example: The correlation based kernel weights each input feature by a quantity proportional to its correlation with the training labels. The following command will generate weighed features:
>
>
Correlation Kernel Example: The correlation based kernel weights each input feature by a quantity proportional to its correlation with the training labels. The following command will generate weighted features:
 
Changed:
<
<
klweightfeatures --weight_type=corr --features --sparse --num_train=1000 electronics.class 0.1 > electronics.class.corr
>
>
$ klweightfeatures --weight_type=corr --features --sparse --num_train=300 acq.10-gram 1 > acq.10-gram.corr INFO: Loaded 466 datapoints. INFO: Selecting 300 training examples... INFO: Using 148465 features.
 
Changed:
<
<
The --features flag forces the output of explicit feature vectors, rather than the kernel matrix, and the --sparse flag forces the use of sparse data-structure, which are both desirable in this case since the ngram-features are sparse. The --num_train flag indicates that the kernel selection algorithm should use only the first 1000 data-points for training, and thus allows us to use the remaining points as a holdout set for evaluating performance. The --weight_type flag selects which type of kernel selection algorithm is used. The first argument indicates the input dataset and the second argument regularizes the kernel which, in the case of the correlation kernel, restricts the kernel trace to equal 0.1.
>
>
The --features flag forces the output of explicit feature vectors, rather than the kernel matrix, and the --sparse flag forces the use of sparse data-structure, which are both desirable in this case since the ngram-features are sparse. The --num_train flag indicates that the kernel selection algorithm should use only the first 300 data-points for training, and thus allows us to use the remaining points as a holdout set for evaluating performance. The --alg_reg flag lets the kernel selection algorithm know the regularization value of the Finally, the --weight_type flag selects which type of kernel selection algorithm is used. The first argument indicates the input dataset and the second argument regularizes the kernel which, in the case of the correlation kernel, restricts the kernel trace to equal 1.
  The weighted features can then be used to train and test an svm model via libsvm or liblinear:

Separate train and test:

Changed:
<
<
head -n1000 electronics.class.corr > electronics.class.corr.train tail -n+1001 electronics.class.corr > electronics.class.corr.test
>
>
$ head -n300 acq.10-gram.corr > acq.10-gram.corr.train $ tail -n+301 acq.10-gram.corr > acq.10-gram.corr.test
 

Train:

Changed:
<
<
svm-train ...
>
>
$ svm-train -s 0 -t 0 -c 2048 acq.10-gram.corr.train model
 

Test:

Changed:
<
<
svm-predict ...
>
>
$ svm-predict acq.10-gram.corr.test model out Accuracy = 85.5422% (142/166) (classification)
 

L2 Regularized Linear Combination: Here we optimally weight the input features in order to maximize the kernel ridge regression (KRR) objective, subject to the L2 regularization constraint ||mu - mu0|| <= ker_reg, where mu is the vector of squared weights and mu0 and ker_reg are user-specified arguments.

Changed:
<
<
klweightfeatures --weight_type=lin2 --features --sparse --num_train=1000 electronics.reg 1024 > electronics.reg.lin2
>
>
$ klweightfeatures --weight_type=lin2 --features --sparse --num_train=300 --tol=0.001 --alg_reg=2 acq.4-gram 1 > alg_reg=4-gram.lin2 ... INFO: iter: 11 obj: 9.68861 gap: 0.00396578 INFO: iter: 12 obj: 9.71034 gap: 0.00199578 INFO: iter: 13 obj: 9.72107 gap: 0.000998093
 
Changed:
<
<
We then train and test use (KRR), with input and output arguments that have been made to closely resemble libsvm. To see a full list of command line arguments, run krr-train without any parameters.
>
>
The algorithm will iterate until the tolerance, which is set by the --tol flag, or maximum number of iterations is met. In this case mu0 is equal to zero (the default) and ker_reg is specified by the second argument to the function. Since this selection is algorithm specific, we should also specify the regularization parameter we will use in the second step via the --alg_reg flag.

(NOTE: the time taken for each iteration will be improved soon. Their still remains some optimization to be made to the sparse-matrix data-structure).

We then train and test use (KRR), with input and output arguments that have been made to closely resemble libsvm. One main difference is that the user must specify to use sparse data-structures. If the data is indeed dense, it is better to use highly efficient dense blas routines instead by omitting the --sparse flag. To see a full list of command line arguments, run krr-train without any parameters.

  Separate train and test:
Changed:
<
<
head -n1000 electronics.class.corr > electronics.class.corr.train tail -n+1001 electronics.class.corr > electronics.class.corr.test
>
>
$ head -n300 acq.4-gram.lin2 > acq.4-gram.lin2.train $ tail -n+301 acq.4-gram.lin2 > acq.4-gram.lin2.test
 

Train:

Changed:
<
<
krr-train ...
>
>
$ krr-train --sparse acq.4-gram.lin2.train 2 model
 

Test:

Changed:
<
<
krr-predict ...
>
>
$ krr-predict --sparse acq.4-gram.lin2.test model out INFO: Using primal solution to make predictions... INFO: RMSE: 0.761885
 

Kernel Combinations w/ Explicit Features

Here we consider the case of combining several general base kernels that admit explicit feature mappings. In the case that these features are sparse, for example, we are able to very efficiently compute combinations in high dimensional features spaces.
Changed:
<
<
In this example we find the best linear combination of 5 ngram kernels (1-gram, 2-gram, ..., 5-gram) with respect to the SVM objective.
>
>
In this example we find the best linear combination of 10 character-level ngram kernels (1-gram, 2-gram, ..., 10-gram) with respect to the SVM objective.
 
klcombinefeatures ...

Revision 3 (2009-08-25) - AfshinRostamizadeh

Line: 1 to 1
 

Automatic Kernel Selection

Here we give some examples showing how to automatically create custom kernels using data. The kernels are generally created from a certain combination of base kernels, which can be specified by the user.
Line: 34 to 34
 svm-predict ...
Changed:
<
<
L2 Regularized Linear Combination: Here we optimally weight the input features in order to maximize the kernel ridge regression (KRR) objective, subject to the L2 regularization constraint: ||mu - mu0|| < Lambda||, where mu is the vector of squared weights and mu0 and Lambda are user specified arguments.
>
>
L2 Regularized Linear Combination: Here we optimally weight the input features in order to maximize the kernel ridge regression (KRR) objective, subject to the L2 regularization constraint: ||mu - mu0|| < ker_reg||, where mu is the vector of squared weights and mu0 and ker_reg are user specified arguments.
 
klweightfeatures  --weight_type=lin2 --features --sparse --num_train=1000 electronics.reg 1024

Revision 2 (2009-08-24) - AfshinRostamizadeh

Line: 1 to 1
 

Automatic Kernel Selection

Added:
>
>
Here we give some examples showing how to automatically create custom kernels using data. The kernels are generally created from a certain combination of base kernels, which can be specified by the user.

The examples here will use the electonics sentiment analysis dataset, which we include with precomputed ngram features and binary as well as real-valued labels. The data is arranged in LIBSVM format, which is compatible with all the learning algorithms used in these examples. The results shown here should be easily reproducible and serve as a good first exercise.

Feature Weighted Kernels

The examples below consider the case when each base kernel corresponds to a single features. Such a set of base kernels occurs naturally when, for example, learning rational kernels as explained in Cortes et al. (MLSP 2008).

Correlation Kernel Example: The correlation based kernel weights each input feature by a quantity proportional to its correlation with the training labels. The following command will generate weighed features:

klweightfeatures  --weight_type=corr --features --sparse --num_train=1000 electronics.class 0.1
  > electronics.class.corr

The --features flag forces the output of explicit feature vectors, rather than the kernel matrix, and the --sparse flag forces the use of sparse data-structure, which are both desirable in this case since the ngram-features are sparse. The --num_train flag indicates that the kernel selection algorithm should use only the first 1000 data-points for training, and thus allows us to use the remaining points as a holdout set for evaluating performance. The --weight_type flag selects which type of kernel selection algorithm is used. The first argument indicates the input dataset and the second argument regularizes the kernel which, in the case of the correlation kernel, restricts the kernel trace to equal 0.1.

The weighted features can then be used to train and test an svm model via libsvm or liblinear:

Separate train and test:

head -n1000 electronics.class.corr > electronics.class.corr.train
tail -n+1001 electronics.class.corr > electronics.class.corr.test

Train:

svm-train ...

Test:

svm-predict ...

L2 Regularized Linear Combination: Here we optimally weight the input features in order to maximize the kernel ridge regression (KRR) objective, subject to the L2 regularization constraint: ||mu - mu0|| < Lambda||, where mu is the vector of squared weights and mu0 and Lambda are user specified arguments.

klweightfeatures  --weight_type=lin2 --features --sparse --num_train=1000 electronics.reg 1024
  > electronics.reg.lin2

We then train and test use (KRR), with input and output arguments that have been made to closely resemble libsvm. To see a full list of command line arguments, run krr-train without any parameters.

Separate train and test:

head -n1000 electronics.class.corr > electronics.class.corr.train
tail -n+1001 electronics.class.corr > electronics.class.corr.test

Train:

krr-train ...

Test:

krr-predict ...

Kernel Combinations w/ Explicit Features

Here we consider the case of combining several general base kernels that admit explicit feature mappings. In the case that these features are sparse, for example, we are able to very efficiently compute combinations in high dimensional features spaces.

In this example we find the best linear combination of 5 ngram kernels (1-gram, 2-gram, ..., 5-gram) with respect to the SVM objective.

klcombinefeatures ...

This will produce a kernels with many features, but which are sparse, thus liblinear is a good choice for training a model:

Separate train and test:

head -n1000 electronics.class.corr > electronics.class.corr.train
tail -n+1001 electronics.class.corr > electronics.class.corr.test

Train:

train ...

Test:

predict ...

General Kernel Combinations

 
Deleted:
<
<
Here we give some examples showing how to automatically create custom kernels using data. The examples here will use the electonics sentiment analysis dataset, and the results should be easily reproducible.
  -- AfshinRostamizadeh - 24 Aug 2009

Revision 1 (2009-08-24) - AfshinRostamizadeh

Line: 1 to 1
Added:
>
>

Automatic Kernel Selection

Here we give some examples showing how to automatically create custom kernels using data. The examples here will use the electonics sentiment analysis dataset, and the results should be easily reproducible.

-- AfshinRostamizadeh - 24 Aug 2009

 