Line: 1 to 1  

Learning Kernels
 
Added:  
> > 
 
 AfshinRostamizadeh  24 Aug 2009 
Line: 1 to 1  

Learning Kernels  
Changed:  
< < 
 
> > 
 
 AfshinRostamizadeh  24 Aug 2009  
Added:  
> > 

Line: 1 to 1  

Changed:  
< <  Automatic Kernel Selection  
> >  Learning Kernels  
 
Deleted:  
< <  TODO: move into its own place
 
 AfshinRostamizadeh  24 Aug 2009 
Line: 1 to 1  

Automatic Kernel Selection
 
Added:  
> >  TODO: move into its own place  
 AfshinRostamizadeh  24 Aug 2009 
Line: 1 to 1  

Automatic Kernel Selection  
Deleted:  
< <  Here we give some examples showing how to automatically create custom kernels using data. The kernels are generally created from a combination of base kernels, which can be specified by the user.  
Changed:  
< <  The examples here will use the electronics category from the sentiment analysis dataset of Blitzer et al., which we include with precomputed word-level n-gram features and binary as well as regression labels (1-5 stars). The data is arranged in LIBSVM format, which is compatible with all the learning algorithms used in these examples. The results shown here should be easily reproducible and serve as a good first exercise.
Note that these exercises have been constructed to demonstrate the mechanics behind using the automatic kernel selection tools. The kernels and parameters used here are NOT necessarily the best for this dataset.
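For readers unfamiliar with the format: each LIBSVM-format line stores a label followed by sparse index:value pairs. The Python snippet below is an illustration only (it is not part of the distributed tools); it parses such lines and splits off the first 1000 examples for training, mirroring the head/tail commands used throughout these examples.

def parse_libsvm_line(line):
    # "<label> <index>:<value> <index>:<value> ..." with increasing feature indices
    parts = line.split()
    label = float(parts[0])
    features = {int(i): float(v) for i, v in (p.split(":") for p in parts[1:])}
    return label, features

with open("elec.2gram") as f:   # file name taken from the examples below
    data = [parse_libsvm_line(l) for l in f if l.strip()]
train, test = data[:1000], data[1000:]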
Feature Weighted Kernels
The examples below consider the case when each base kernel corresponds to a single feature. Such a set of base kernels occurs naturally when, for example, learning rational kernels as explained in Cortes et al. (MLSP 2008).
Correlation Kernel Example: The correlation-based kernel weights each input feature by a quantity proportional to its correlation with the training labels. The following command will generate weighted features:
$ klweightfeatures weight_type=corr features sparse num_train=1000 elec.2gram 1 \ > elec.2gram.corr INFO: Loaded 2000 datapoints. INFO: Selecting 1000 training examples... INFO: Using 57466 features.
The weighted features can then be used to train and test an SVM model via libsvm or liblinear:
Separate train and test: $ head -n1000 elec.2gram.corr > elec.2gram.corr.train $ tail -n+1001 elec.2gram.corr > elec.2gram.corr.test
Train: $ svmtrain -s 0 -t 0 -c 4096 elec.2gram.corr.train model
Test: $ svmpredict elec.2gram.corr.test model pred Accuracy = 80% (800/1000) (classification)
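To make the idea behind the correlation weighting concrete, here is a rough Python sketch. It is illustrative only and is not the klweightfeatures implementation; the exact weighting and normalization used by the tool may differ. Each feature column is scaled by its correlation with the training labels, and the result is rescaled so that the induced linear kernel on the training portion has trace 1, as described above.

import numpy as np

def correlation_weighted_features(X_train, y_train, X_all):
    # Pearson correlation of each feature column with the training labels.
    Xc = X_train - X_train.mean(axis=0)
    yc = y_train - y_train.mean()
    denom = np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc) + 1e-12
    corr = Xc.T @ yc / denom
    # Weight every example's features, then normalize so the training
    # Gram matrix (linear kernel) has unit trace.
    W = X_all * np.abs(corr)
    trace = np.sum(W[: X_train.shape[0]] ** 2)
    return W / np.sqrt(trace)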
L2 Regularized Linear Combination: Here we optimally weight the input features in order to maximize the kernel ridge regression (KRR) objective, subject to the L2 regularization constraint:
$ klweightfeatures weight_type=lin2 features sparse num_train=1000 \ alg_reg=4 offset=1 tol=1e-4 elec.1gram.reg 1 > elec.1gram.lin2 INFO: Loaded 2000 datapoints. INFO: Selecting 1000 training examples... ... INFO: iter: 18 obj: 333.919 gap: 0.000123185 INFO: iter: 19 obj: 333.922 gap: 6.15492e-05 INFO: Using 12876 features.
The algorithm will iterate until the tolerance, which is set by the tol flag, or the maximum number of iterations is met. In this case mu0 is equal to zero (the default) and ker_reg is specified by the second argument to the function. Since this selection is algorithm specific, we should also specify the regularization parameter we will use in the second step via the alg_reg flag. The offset flag adds the indicated constant offset to the dataset input if one is not already included. Finally, the tol flag indicates at what precision the iterative method should stop. (NOTE: the time taken for each iteration will be improved soon. There still remains some optimization to be made to the sparse-matrix data structure.)
We then train and test using kernel ridge regression (KRR), with input and output arguments that have been made to closely resemble libsvm. One main difference is that the user must specify to use sparse data structures. If the data is dense, it is better to use the highly efficient dense blas routines instead by omitting the sparse flag. To see a full list of command line arguments, run krrtrain without any parameters.
Separate train and test: $ head -n1000 elec.1gram.lin2 > elec.1gram.lin2.train $ tail -n+1001 elec.1gram.lin2 > elec.1gram.lin2.test
Train: $ krrtrain sparse elec.1gram.lin2.train 4 model
Test: $ krrpredict sparse elec.1gram.lin2.test model pred INFO: Using primal solution to make predictions... INFO: RMSE: 1.34898
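For reference, kernel ridge regression itself has a simple closed form. The Python sketch below is illustrative only (it is not the krrtrain/krrpredict code); it shows the primal solution used when explicit features are available, which is presumably what the "Using primal solution" message above refers to, and lam presumably corresponds to the regularization value 4 passed to krrtrain.

import numpy as np

def krr_train_primal(X, y, lam):
    # Ridge regression in the primal: w = (X^T X + lam I)^(-1) X^T y.
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def krr_predict(X_test, w):
    return X_test @ w

def rmse(pred, y):
    return np.sqrt(np.mean((pred - y) ** 2))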
Kernel Combinations with Explicit Features
Here we consider the case of combining several general base kernels that admit explicit feature mappings. In the case that these features are sparse, for example, we are able to very efficiently compute combinations in high-dimensional feature spaces.
In this example we find the best linear combination of 5 character-level n-gram kernels (1-gram, 2-gram, ..., 5-gram) with respect to the SVM objective.
$ klcombinefeatures weight_type=lin1 sparse num_train=1000 \ alg_reg=0.1 elec.list elec.comb ... INFO: iter: 9 constraint: 5.65079 theta: 5.65044 gap: 6.16132e-05 16: objval = 5.650437235e+00 infeas = 1.000000000e+00 (0) 17: objval = 5.650600323e+00 infeas = 0.000000000e+00 (0) OPTIMAL SOLUTION FOUND .* optimization finished, #iter = 12 Objective value = 5.650654 nSV = 792 INFO: iter: 10 constraint: 5.65065 theta: 5.6506 gap: 9.48414e-06
Here the argument elec.list is a file with the path to each base kernel written on a separate line, and the combined kernel is written to the file elec.comb. The flag alg_reg indicates the regularization parameter that will be used with SVM.
This will produce a kernel with many features, but which are sparse, thus liblinear is a good choice for training a model:
Separate train and test: $ head -n1000 elec.comb > elec.comb.train $ tail -n+1001 elec.comb > elec.comb.test
Train: $ train -s 3 -c 0.1 -B 1 elec.comb.train model
Test: $ predict elec.comb.test model pred Accuracy = 83.5% (835/1000)
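The computational point being exploited here is that a nonnegative linear combination of base kernels with explicit feature maps is itself a kernel with an explicit feature map: scale each block of features by the square root of its weight and concatenate the blocks. The Python sketch below illustrates only that construction; learning the combination weights against the SVM objective, which is what klcombinefeatures does iteratively above, is not reproduced here.

import numpy as np
import scipy.sparse as sp

def combine_explicit_features(feature_blocks, mu):
    # feature_blocks: one sparse matrix per base kernel (e.g. one per n-gram order).
    # mu: nonnegative combination weights, one per block.
    # The combined kernel sum_k mu_k K_k has feature map [sqrt(mu_1) Phi_1, sqrt(mu_2) Phi_2, ...].
    scaled = [np.sqrt(m) * sp.csr_matrix(F) for F, m in zip(feature_blocks, mu)]
    return sp.hstack(scaled).tocsr()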
General Kernel Combinations
The final example listed here concerns general combinations of kernels, where we combine the kernel matrices of the 5 n-gram kernels. Of course, in practice this general kernel combination should be used when explicit feature mappings that are easy to represent are not available, such as with Gaussian kernels.
klcombinekernels ... $ klcombinekernels weight_type=lin2 num_train=1000 \ alg_reg=0.0001 tol=1e-3 elec.kernel.list elec.kernel.comb ... INFO: iter: 13 obj: 2980.56 gap: 0.00217053 INFO: iter: 14 obj: 2980.47 gap: 0.00108526 INFO: iter: 15 obj: 2980.42 gap: 0.00054263
Here elec.kernel.list is a file with the path to each base kernel written on a separate line.
Separate train and test: $ head -n1000 elec.kernel.comb > elec.kernel.comb.train $ tail -n+1001 elec.kernel.comb > elec.kernel.comb.test
Train: $ krrtrain kernel elec.kernel.comb.train 0.0001 model
Test: $ krrpredict kernel elec.kernel.comb.test model pred INFO: Using dual solution to make predictions... INFO: Making predicitons... INFO: RMSE: 1.36997
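When only Gram matrices are available, the combined kernel is simply the weighted sum of the base kernel matrices, and KRR is then trained in the dual (presumably the "Using dual solution" message above). A minimal Python sketch of those two steps follows; it is illustrative only, the weight optimization performed by klcombinekernels is not shown, and the default 0.0001 simply mirrors the regularization value used above.

import numpy as np

def combine_kernels(gram_matrices, mu):
    # Weighted sum of precomputed Gram matrices (e.g. the five n-gram kernels).
    return sum(m * K for m, K in zip(mu, gram_matrices))

def krr_train_dual(K_train, y_train, lam=0.0001):
    # Dual KRR: alpha = (K + lam I)^(-1) y; predict with K_test_train @ alpha.
    n = K_train.shape[0]
    return np.linalg.solve(K_train + lam * np.eye(n), y_train)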
> > 
 
 AfshinRostamizadeh  24 Aug 2009 
Line: 1 to 1  

Automatic Kernel SelectionHere we give some examples showing how to automatically create custom kernels using data. The kernels are generally created from a combination of base kernels, which can be specified by the user.  
Line: 77 to 77  
INFO: RMSE: 1.34898  
Changed:  
< <  General Kernel CombinationsThe final example listed here is regarding general combinations of kernels, where we combine the kernel matrices of the 5 ngram kernels. Of course, in practice this general kernel combination should be used when easy to represent explicit feature mapping are not available, such as with Guassian kernels.  
> >  Kernel Combinations with Explicit Features
Here we consider the case of combining several general base kernels that admit explicit feature mappings. In the case that these features are sparse, for example, we are able to very efficiently compute combinations in high-dimensional feature spaces.
In this example we find the best linear combination of 5 character-level n-gram kernels (1-gram, 2-gram, ..., 5-gram) with respect to the SVM objective.  
Changed:  
< <  klcombinekernels ... $ klcombinekernels weight_type=lin2 num_train=1000 alg_reg=0.0001 tol=1e3 elec.kernel.list elec.kernel.comb  
> >  $ klcombinefeatures weight_type=lin1 sparse num_train=1000 alg_reg=0.1 elec.list elec.comb  
...  
Changed:  
< <  INFO: iter: 13 obj: 2980.56 gap: 0.00217053 INFO: iter: 14 obj: 2980.47 gap: 0.00108526 INFO: iter: 15 obj: 2980.42 gap: 0.00054263  
> >  INFO: iter: 9 constraint: 5.65079 theta: 5.65044 gap: 6.16132e-05 16: objval = 5.650437235e+00 infeas = 1.000000000e+00 (0) 17: objval = 5.650600323e+00 infeas = 0.000000000e+00 (0) OPTIMAL SOLUTION FOUND .* optimization finished, #iter = 12 Objective value = 5.650654 nSV = 792 INFO: iter: 10 constraint: 5.65065 theta: 5.6506 gap: 9.48414e-06  
Changed:  
< <  Here elec.kernel.list with the path to each base kernel written on a line.  
> >  Here the argument elec.list is a file with the paths to each base kernel written on a separate line, and the combined kernel is written to the file elec.comb. The flag alg_reg indicates the regularization parameter that will be used with SVM.
This will produce a kernel with many features, but which are sparse, thus liblinear is a good choice for training a model:  
Separate train and test:  
Changed:  
< <  $ head n1000 elec.kernel.comb > elec.kernel.comb.train $ tail n+1001 elec.kernel.comb > elec.kernel.comb.test  
> >  $ head -n1000 elec.comb > elec.comb.train $ tail -n+1001 elec.comb > elec.comb.test  
Train:  
Changed:  
< <  $ krrtrain kernel elec.kernel.comb.train 0.0001 model  
> >  $ train -s 3 -c 0.1 -B 1 elec.comb.train model  
Test:  
Changed:  
< <  $ krrpredict kernel elec.kernel.comb.test model pred INFO: Using dual solution to make predictions... INFO: Making predicitons... INFO: RMSE: 1.36997  
> >  $ predict elec.comb.test model pred Accuracy = 83.5% (835/1000)  
Changed:  
< <   Under Construction Kernel Combinations w/ Explicit FeaturesHere we consider the case of combining several general base kernels that admit explicit feature mappings. In the case that these features are sparse, for example, we are able to very efficiently compute combinations in high dimensional features spaces.In this example we find the best linear combination of 5 characterlevel ngram kernels (1gram, 2gram, ..., 5gram) with respect to the SVM objective.  
> >  General Kernel Combinations
The final example listed here concerns general combinations of kernels, where we combine the kernel matrices of the 5 n-gram kernels. Of course, in practice this general kernel combination should be used when explicit feature mappings that are easy to represent are not available, such as with Gaussian kernels.  
Changed:  
< <  klcombinefeatures ...  
> >  klcombinekernels ... $ klcombinekernels weight_type=lin2 num_train=1000 alg_reg=0.0001 tol=1e-3 elec.kernel.list elec.kernel.comb ... INFO: iter: 13 obj: 2980.56 gap: 0.00217053 INFO: iter: 14 obj: 2980.47 gap: 0.00108526 INFO: iter: 15 obj: 2980.42 gap: 0.00054263  
Changed:  
< <  This will produce a kernels with many features, but which are sparse, thus liblinear is a good choice for training a model:  
> >  Here elec.kernel.list is a file with the path to each base kernel written on a separate line.  
Separate train and test:  
Changed:  
< <  head n1000 electronics.class.corr > electronics.class.corr.train tail n+1001 electronics.class.corr > electronics.class.corr.test  
> >  $ head -n1000 elec.kernel.comb > elec.kernel.comb.train $ tail -n+1001 elec.kernel.comb > elec.kernel.comb.test  
Train:  
Changed:  
< <  train ...  
> >  $ krrtrain kernel elec.kernel.comb.train 0.0001 model  
Test:  
Changed:  
< <  predict ...  
> >  $ krrpredict kernel elec.kernel.comb.test model pred INFO: Using dual solution to make predictions... INFO: Making predicitons... INFO: RMSE: 1.36997  
Deleted:  
< <  
 AfshinRostamizadeh  24 Aug 2009 
Line: 1 to 1  

Changed:  
< <  Automatic Kernel Selection  
> >  Automatic Kernel Selection  
Here we give some examples showing how to automatically create custom kernels using data. The kernels are generally created from a combination of base kernels, which can be specified by the user.
The examples here will use the electronics category from the sentiment analysis dataset of Blitzer et al., which we include with precomputed word-level n-gram features and binary as well as regression labels (1-5 stars). The data is arranged in LIBSVM format, which is compatible with all the learning algorithms used in these examples. The results shown here should be easily reproducible and serve as a good first exercise. Note that these exercises have been constructed to demonstrate the mechanics behind using the automatic kernel selection tools. The kernels and parameters used here are NOT necessarily the best for this dataset.  
Changed:  
< <  Feature Weighted Kernels  
> >  Feature Weighted Kernels  
The examples below consider the case when each base kernel corresponds to a single feature. Such a set of base kernels occurs naturally when, for example, learning rational kernels as explained in Cortes et al. (MLSP 2008).
Correlation Kernel Example: The correlation based kernel weights each input feature by a quantity proportional to its correlation with the training labels. The following command will generate weighted features:  
Line: 77 to 77  
INFO: RMSE: 1.34898  
Changed:  
< <  General Kernel Combinations  
> >  General Kernel Combinations  
The final example listed here concerns general combinations of kernels, where we combine the kernel matrices of the 5 n-gram kernels. Of course, in practice this general kernel combination should be used when explicit feature mappings that are easy to represent are not available, such as with Gaussian kernels. 
Line: 1 to 1  

Automatic Kernel SelectionHere we give some examples showing how to automatically create custom kernels using data. The kernels are generally created from a combination of base kernels, which can be specified by the user.  
Line: 77 to 77  
INFO: RMSE: 1.34898  
Added:  
> >  General Kernel Combinations
The final example listed here concerns general combinations of kernels, where we combine the kernel matrices of the 5 n-gram kernels. Of course, in practice this general kernel combination should be used when explicit feature mappings that are easy to represent are not available, such as with Gaussian kernels.
klcombinekernels ... $ klcombinekernels weight_type=lin2 num_train=1000 \ alg_reg=0.0001 tol=1e-3 elec.kernel.list elec.kernel.comb ... INFO: iter: 13 obj: 2980.56 gap: 0.00217053 INFO: iter: 14 obj: 2980.47 gap: 0.00108526 INFO: iter: 15 obj: 2980.42 gap: 0.00054263
Here elec.kernel.list is a file with the path to each base kernel written on a separate line. Separate train and test: $ head -n1000 elec.kernel.comb > elec.kernel.comb.train $ tail -n+1001 elec.kernel.comb > elec.kernel.comb.test  
Changed:  
< <   Under Construction   
> >  Train:
$ krrtrain kernel elec.kernel.comb.train 0.0001 model Test: $ krrpredict kernel elec.kernel.comb.test model pred INFO: Using dual solution to make predictions... INFO: Making predicitons... INFO: RMSE: 1.36997
 Under Construction   
Kernel Combinations w/ Explicit Features
Here we consider the case of combining several general base kernels that admit explicit feature mappings. In the case that these features are sparse, for example, we are able to very efficiently compute combinations in high-dimensional feature spaces.  
Line: 107 to 140  
Deleted:  
< <  General Kernel CombinationsThe final example listed here is regarding general combinations of kernels  
 AfshinRostamizadeh  24 Aug 2009 
Line: 1 to 1  

Automatic Kernel SelectionHere we give some examples showing how to automatically create custom kernels using data. The kernels are generally created from a combination of base kernels, which can be specified by the user.  
Changed:  
< <  The examples here will use a subset of the reuters dataset dataset, which we include with precomputed character level ngram features and binary labels indicating inclusion in the acq topic subset. The data is arranged in LIBSVM format, which is compatible with all the learning algorithms used in these examples. The results shown here should be easily reproducible and serve as a good first exercise.  
> >  The examples here will use the electronics category from the sentiment analysis dataset of Blitzer et al., which we include with precomputed word-level n-gram features and binary as well as regression labels (1-5 stars). The data is arranged in LIBSVM format, which is compatible with all the learning algorithms used in these examples. The results shown here should be easily reproducible and serve as a good first exercise.  
Note that these exercises have been constructed to demonstrate the mechanics behind using the automatic kernel selection tools. The kernels and parameters used here are NOT necessarily the best for this dataset.  
Line: 12 to 12  
Correlation Kernel Example: The correlation based kernel weights each input feature by a quantity proportional to its correlation with the training labels. The following command will generate weighted features:  
Changed:  
< <  $ klweightfeatures weight_type=corr features sparse num_train=300 acq.10gram 1 > acq.10gram.corr INFO: Loaded 466 datapoints. INFO: Selecting 300 training examples... INFO: Using 148465 features.  
> >  $ klweightfeatures weight_type=corr features sparse num_train=1000 elec.2gram 1 > elec.2gram.corr INFO: Loaded 2000 datapoints. INFO: Selecting 1000 training examples... INFO: Using 57466 features.  
Changed:  
< <  The features flag forces the output of explicit feature vectors, rather than the kernel matrix, and the sparse flag forces the use of sparse datastructure, which are both desirable in this case since the ngramfeatures are sparse. The num_train flag indicates that the kernel selection algorithm should use only the first 300 datapoints for training, and thus allows us to use the remaining points as a holdout set for evaluating performance. The alg_reg flag lets the kernel selection algorithm know the regularization value of the Finally, the weight_type flag selects which type of kernel selection algorithm is used. The first argument indicates the input dataset and the second argument regularizes the kernel which, in the case of the correlation kernel, restricts the kernel trace to equal 1.  
> >  The features flag forces the output of explicit feature vectors, rather than the kernel matrix, and the sparse flag forces the use of sparse data structures, which are both desirable in this case since the n-gram features are sparse. The num_train flag indicates that the kernel selection algorithm should use only the first 1000 datapoints for training, and thus allows us to use the remaining points as a holdout set for evaluating performance. The alg_reg flag lets the kernel selection algorithm know the regularization value of the learning algorithm that will be used in the second step. Finally, the weight_type flag selects which type of kernel selection algorithm is used. The first argument indicates the input dataset and the second argument regularizes the kernel which, in the case of the correlation kernel, restricts the kernel trace to equal 1.  
The weighted features can then be used to train and test an svm model via libsvm or liblinear:
Separate train and test:  
Changed:  
< <  $ head n300 acq.10gram.corr > acq.10gram.corr.train $ tail n+301 acq.10gram.corr > acq.10gram.corr.test  
> >  $ head -n1000 elec.2gram.corr > elec.2gram.corr.train $ tail -n+1001 elec.2gram.corr > elec.2gram.corr.test  
Train:  
Changed:  
< <  $ svmtrain s 0 t 0 c 2048 acq.10gram.corr.train model  
> >  $ svmtrain -s 0 -t 0 -c 4096 elec.2gram.corr.train model  
Test:  
Changed:  
< <  $ svmpredict acq.10gram.corr.test model out Accuracy = 85.5422% (142/166) (classification)  
> >  $ svmpredict elec.2gram.corr.test model pred Accuracy = 80% (800/1000) (classification)  
L2 Regularized Linear Combination: Here we optimally weight the input features in order to maximize the kernel ridge regression (KRR) objective, subject to the L2 regularization constraint:  
Changed:  
< <  $ klweightfeatures weight_type=lin2 features sparse num_train=300 tol=0.001 alg_reg=2 acq.4gram 1 > alg_reg=4gram.lin2  
> >  $ klweightfeatures weight_type=lin2 features sparse num_train=1000 alg_reg=4 offset=1 tol=1e-4 elec.1gram.reg 1 > elec.1gram.lin2 INFO: Loaded 2000 datapoints. INFO: Selecting 1000 training examples...  
...  
Changed:  
< <  INFO: iter: 11 obj: 9.68861 gap: 0.00396578 INFO: iter: 12 obj: 9.71034 gap: 0.00199578 INFO: iter: 13 obj: 9.72107 gap: 0.000998093  
> >  INFO: iter: 18 obj: 333.919 gap: 0.000123185 INFO: iter: 19 obj: 333.922 gap: 6.15492e-05 INFO: Using 12876 features.  
Changed:  
< <  The algorithm will iterate until the tolerance, which is set by the tol flag, or maximum number of iterations is met. In this case mu0 is equal to zero (the default) and ker_reg is specified by the second argument to the function. Since this selection is algorithm specific, we should also specify the regularization parameter we will use in the second step via the alg_reg flag.  
> >  The algorithm will iterate until the tolerance, which is set by the tol flag, or the maximum number of iterations is met. In this case mu0 is equal to zero (the default) and ker_reg is specified by the second argument to the function. Since this selection is algorithm specific, we should also specify the regularization parameter we will use in the second step via the alg_reg flag. The offset flag adds the indicated constant offset to the dataset input if one is not already included. Finally, the tol flag indicates at what precision the iterative method should stop.  
Changed:  
< <  (NOTE: the time taken for each iteration will be improved soon. Their still remains some optimization to be made to the sparsematrix datastructure).  
> >  (NOTE: the time taken for each iteration will be improved soon. There still remains some optimization to be made to the sparsematrix datastructure).  
Changed:  
< <  We then train and test use (KRR), with input and output arguments that have been made to closely resemble libsvm. One main difference is that the user must specify to use sparse datastructures. If the data is indeed dense, it is better to use highly efficient dense blas routines instead by omitting the sparse flag. To see a full list of command line arguments, run krrtrain without any parameters.  
> >  We then train and test using kernel ridge regression (KRR), with input and output arguments that have been made to closely resemble libsvm. One main difference is that the user must specify to use sparse datastructures. If the data is dense, it is better to use highly efficient dense blas routines instead by omitting the sparse flag. To see a full list of command line arguments, run krrtrain without any parameters.  
Separate train and test:  
Changed:  
< <  $ head n300 acq.4gram.lin2 > acq.4gram.lin2.train $ tail n+301 acq.4gram.lin2 > acq.4gram.lin2.test  
> >  $ head -n1000 elec.1gram.lin2 > elec.1gram.lin2.train $ tail -n+1001 elec.1gram.lin2 > elec.1gram.lin2.test  
Train:  
Changed:  
< <  $ krrtrain sparse acq.4gram.lin2.train 2 model  
> >  $ krrtrain sparse elec.1gram.lin2.train 4 model  
Test:  
Changed:  
< <  $ krrpredict sparse acq.4gram.lin2.test model out  
> >  $ krrpredict sparse elec.1gram.lin2.test model pred  
INFO: Using primal solution to make predictions...  
Changed:  
< <  INFO: RMSE: 0.761885  
> >  INFO: RMSE: 1.34898  
Added:  
> > 
 Under Construction   
Kernel Combinations w/ Explicit Features
Here we consider the case of combining several general base kernels that admit explicit feature mappings. In the case that these features are sparse, for example, we are able to very efficiently compute combinations in high-dimensional feature spaces.  
Changed:  
< <  In this example we find the best linear combination of 10 characterlevel ngram kernels (1gram, 2gram, ..., 10gram) with respect to the SVM objective.  
> >  In this example we find the best linear combination of 5 character-level n-gram kernels (1-gram, 2-gram, ..., 5-gram) with respect to the SVM objective.  
klcombinefeatures ...  
Line: 104 to 108  
General Kernel Combinations  
Changed:  
< <  
> >  The final example listed here is regarding general combinations of kernels  
 AfshinRostamizadeh  24 Aug 2009 
Line: 1 to 1  

Automatic Kernel Selection  
Changed:  
< <  Here we give some examples showing how to automatically create custom kernels using data. The kernels are generally created from a certain combination of base kernels, which can be specified by the user.  
> >  Here we give some examples showing how to automatically create custom kernels using data. The kernels are generally created from a combination of base kernels, which can be specified by the user.  
Changed:  
< <  The examples here will use the electonics sentiment analysis dataset, which we include with precomputed ngram features and binary as well as realvalued labels. The data is arranged in LIBSVM format, which is compatible with all the learning algorithms used in these examples. The results shown here should be easily reproducible and serve as a good first exercise.  
> >  The examples here will use a subset of the Reuters dataset, which we include with precomputed character-level n-gram features and binary labels indicating inclusion in the acq topic subset. The data is arranged in LIBSVM format, which is compatible with all the learning algorithms used in these examples. The results shown here should be easily reproducible and serve as a good first exercise.
Note that these exercises have been constructed to demonstrate the mechanics behind using the automatic kernel selection tools. The kernels and parameters used here are NOT necessarily the best for this dataset.  
Feature Weighted KernelsThe examples below consider the case when each base kernel corresponds to a single features. Such a set of base kernels occurs naturally when, for example, learning rational kernels as explained in Cortes et al. (MLSP 2008).  
Changed:  
< <  Correlation Kernel Example: The correlation based kernel weights each input feature by a quantity proportional to its correlation with the training labels. The following command will generate weighed features:  
> >  Correlation Kernel Example: The correlation based kernel weights each input feature by a quantity proportional to its correlation with the training labels. The following command will generate weighted features:  
Changed:  
< <  klweightfeatures weight_type=corr features sparse num_train=1000 electronics.class 0.1 > electronics.class.corr  
> >  $ klweightfeatures weight_type=corr features sparse num_train=300 acq.10gram 1 > acq.10gram.corr INFO: Loaded 466 datapoints. INFO: Selecting 300 training examples... INFO: Using 148465 features.  
Changed:  
< <  The features flag forces the output of explicit feature vectors, rather than the kernel matrix, and the sparse flag forces the use of sparse datastructure, which are both desirable in this case since the ngramfeatures are sparse. The num_train flag indicates that the kernel selection algorithm should use only the first 1000 datapoints for training, and thus allows us to use the remaining points as a holdout set for evaluating performance. The weight_type flag selects which type of kernel selection algorithm is used. The first argument indicates the input dataset and the second argument regularizes the kernel which, in the case of the correlation kernel, restricts the kernel trace to equal 0.1.  
> >  The features flag forces the output of explicit feature vectors, rather than the kernel matrix, and the sparse flag forces the use of sparse data structures, which are both desirable in this case since the n-gram features are sparse. The num_train flag indicates that the kernel selection algorithm should use only the first 300 datapoints for training, and thus allows us to use the remaining points as a holdout set for evaluating performance. The alg_reg flag lets the kernel selection algorithm know the regularization value of the learning algorithm that will be used in the second step. Finally, the weight_type flag selects which type of kernel selection algorithm is used. The first argument indicates the input dataset and the second argument regularizes the kernel which, in the case of the correlation kernel, restricts the kernel trace to equal 1.  
The weighted features can then be used to train and test an svm model via libsvm or liblinear:
Separate train and test:  
Changed:  
< <  head n1000 electronics.class.corr > electronics.class.corr.train tail n+1001 electronics.class.corr > electronics.class.corr.test  
> >  $ head -n300 acq.10gram.corr > acq.10gram.corr.train $ tail -n+301 acq.10gram.corr > acq.10gram.corr.test  
Train:  
Changed:  
< <  svmtrain ...  
> >  $ svmtrain -s 0 -t 0 -c 2048 acq.10gram.corr.train model  
Test:  
Changed:  
< <  svmpredict ...  
> >  $ svmpredict acq.10gram.corr.test model out Accuracy = 85.5422% (142/166) (classification)  
L2 Regularized Linear Combination: Here we optimally weight the input features in order to maximize the kernel ridge regression (KRR) objective, subject to the L2 regularization constraint:  
Changed:  
< <  klweightfeatures weight_type=lin2 features sparse num_train=1000 electronics.reg 1024 > electronics.reg.lin2  
> >  $ klweightfeatures weight_type=lin2 features sparse num_train=300 tol=0.001 alg_reg=2 acq.4gram 1 > acq.4gram.lin2 ... INFO: iter: 11 obj: 9.68861 gap: 0.00396578 INFO: iter: 12 obj: 9.71034 gap: 0.00199578 INFO: iter: 13 obj: 9.72107 gap: 0.000998093  
Changed:  
< <  We then train and test use (KRR), with input and output arguments that have been made to closely resemble libsvm. To see a full list of command line arguments, run krrtrain without any parameters.  
> >  The algorithm will iterate until the tolerance, which is set by the tol flag, or maximum number of iterations is met. In this case mu0 is equal to zero (the default) and ker_reg is specified by the second argument to the function. Since this selection is algorithm specific, we should also specify the regularization parameter we will use in the second step via the alg_reg flag.
(NOTE: the time taken for each iteration will be improved soon. There still remains some optimization to be made to the sparse-matrix data structure.)
We then train and test using kernel ridge regression (KRR), with input and output arguments that have been made to closely resemble libsvm. One main difference is that the user must specify to use sparse data structures. If the data is indeed dense, it is better to use highly efficient dense blas routines instead by omitting the sparse flag.  
Separate train and test:  
Changed:  
< <  head n1000 electronics.class.corr > electronics.class.corr.train tail n+1001 electronics.class.corr > electronics.class.corr.test  
> >  $ head -n300 acq.4gram.lin2 > acq.4gram.lin2.train $ tail -n+301 acq.4gram.lin2 > acq.4gram.lin2.test  
Train:  
Changed:  
< <  krrtrain ...  
> >  $ krrtrain sparse acq.4gram.lin2.train 2 model  
Test:  
Changed:  
< <  krrpredict ...  
> >  $ krrpredict sparse acq.4gram.lin2.test model out INFO: Using primal solution to make predictions... INFO: RMSE: 0.761885  
Kernel Combinations w/ Explicit FeaturesHere we consider the case of combining several general base kernels that admit explicit feature mappings. In the case that these features are sparse, for example, we are able to very efficiently compute combinations in high dimensional features spaces.  
Changed:  
< <  In this example we find the best linear combination of 5 ngram kernels (1gram, 2gram, ..., 5gram) with respect to the SVM objective.  
> >  In this example we find the best linear combination of 10 character-level n-gram kernels (1-gram, 2-gram, ..., 10-gram) with respect to the SVM objective.  
klcombinefeatures ... 
Line: 1 to 1  

Automatic Kernel Selection  
Added:  
> >  Here we give some examples showing how to automatically create custom kernels using data. The kernels are generally created from a certain combination of base kernels, which can be specified by the user.
The examples here will use the electronics sentiment analysis dataset, which we include with precomputed n-gram features and binary as well as real-valued labels. The data is arranged in LIBSVM format, which is compatible with all the learning algorithms used in these examples. The results shown here should be easily reproducible and serve as a good first exercise.
Feature Weighted Kernels
The examples below consider the case when each base kernel corresponds to a single feature. Such a set of base kernels occurs naturally when, for example, learning rational kernels as explained in Cortes et al. (MLSP 2008).
Correlation Kernel Example: The correlation-based kernel weights each input feature by a quantity proportional to its correlation with the training labels. The following command will generate weighted features:
klweightfeatures weight_type=corr features sparse num_train=1000 electronics.class 0.1 > electronics.class.corr
The weighted features can then be used to train and test an SVM model via libsvm or liblinear: Separate train and test: head -n1000 electronics.class.corr > electronics.class.corr.train tail -n+1001 electronics.class.corr > electronics.class.corr.test Train: svmtrain ... Test: svmpredict ...
L2 Regularized Linear Combination: Here we optimally weight the input features in order to maximize the kernel ridge regression (KRR) objective, subject to the L2 regularization constraint:
klweightfeatures weight_type=lin2 features sparse num_train=1000 electronics.reg 1024 > electronics.reg.lin2 We then train and test using kernel ridge regression (KRR), with input and output arguments that have been made to closely resemble libsvm. To see a full list of command line arguments, run krrtrain without any parameters. Separate train and test: head -n1000 electronics.class.corr > electronics.class.corr.train tail -n+1001 electronics.class.corr > electronics.class.corr.test Train: krrtrain ... Test: krrpredict ...
Kernel Combinations w/ Explicit Features
Here we consider the case of combining several general base kernels that admit explicit feature mappings. In the case that these features are sparse, for example, we are able to very efficiently compute combinations in high-dimensional feature spaces. In this example we find the best linear combination of 5 n-gram kernels (1-gram, 2-gram, ..., 5-gram) with respect to the SVM objective.
klcombinefeatures ... This will produce a kernel with many features, but which are sparse, thus liblinear is a good choice for training a model: Separate train and test: head -n1000 electronics.class.corr > electronics.class.corr.train tail -n+1001 electronics.class.corr > electronics.class.corr.test Train: train ... Test: predict ...
General Kernel Combinations  
Deleted:  
< <  Here we give some examples showing how to automatically create custom kernels using data. The examples here will use the electonics sentiment analysis dataset, and the results should be easily reproducible.  
 AfshinRostamizadeh  24 Aug 2009 
Line: 1 to 1  

Added:  
> >  Automatic Kernel SelectionHere we give some examples showing how to automatically create custom kernels using data. The examples here will use the electronics sentiment analysis dataset, and the results should be easily reproducible.  AfshinRostamizadeh  24 Aug 2009 