
mySVM - a support vector machine

Author: unknown. Date: 2004-12-09

About mySVM

mySVM is an implementation of the Support Vector Machine introduced by V. Vapnik (see [Vapnik/98a]). It is based on the optimization algorithm of SVMlight as described in [Joachims/99a]. mySVM can be used for pattern recognition, regression and distribution estimation.

License

This software is free only for non-commercial use. It must not be modified and distributed without prior permission of the author. The author is not responsible for implications from the use of this software.

If you are using mySVM for research purposes, please cite the software manual available from this site in your publications (Stefan Rüping (2000): mySVM-Manual, University of Dortmund, Lehrstuhl Informatik 8, http://www-ai.cs.uni-dortmund.de/SOFTWARE/MYSVM/).

Installation

Installation under Unix

  • Download mySVM.
  • Create a new directory, change into it and unpack the files into this directory.
  • On typical UN*X systems simply type make to compile mySVM. On other systems you have to call your C++ compiler manually.
If everything went right you should have a new subdirectory named bin and, in a subdirectory of it, the two files mysvm and predict. On some systems you might get an error message about sys/times.h. If you do, open the file globals.h and uncomment the line #undef use_time.

Installation under Windows

If you get the source code version, you have to compile mySVM yourself. First edit the file globals.h and uncomment the line #define windows 1. Then compile the file learn.cpp to get the learning program and predict.cpp to get the model application program. mySVM was tested under Visual C++ 6.0. You can also get the binary version.

Using mySVM

For a complete reference of mySVM have a look at the mySVM manual (Postscript, PDF). Here is a short user's guide:
  • mysvm is used for training an SVM on a given example set and testing the results.
  • predict is used for predicting the functional value of new examples based on an already trained SVM.
The input of mySVM consists of a parameter definition, a kernel definition and one or more example sets, each described below. Input lines starting with "#" are treated as comments. The input can be given in one or more files. If no filename or the filename "-" is given, the input is read from stdin. mysvm trains an SVM on the first example set given. The following example sets are used for testing (if their classification is given), or the functional value of their examples is computed (if no classification is given).
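The input layout described above can be illustrated with a small reader. This is a minimal sketch and not mySVM's own parser: the function name split_sections is hypothetical, and it assumes that a section runs from a line starting with "@" to the next such line and that blank lines can be ignored. The toy input uses only keywords that appear in this guide.

```python
def split_sections(lines):
    """Group input lines into (section_name, body_lines) pairs.

    Assumptions (not confirmed by the mySVM documentation): a section
    starts at a line beginning with '@' and runs until the next such
    line; blank lines are skipped. Lines starting with '#' are comments.
    """
    sections = []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # comment or empty line
        if line.startswith("@"):
            sections.append((line[1:], []))  # open a new section
        elif sections:
            sections[-1][1].append(line)  # body line of the current section
    return sections


# Toy input; the values are illustrative only.
text = """# comment lines are ignored
@parameters
pattern
@kernel
# kernel settings would go here
@examples
format yx
1 0.5 0.25
"""

sections = split_sections(text.splitlines())
print([name for name, _ in sections])
```

Because each "@examples" line opens a new entry, several example sets in one input would appear as separate pairs in the returned list.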

Parameter definition

The parameter definition lets the user choose the type of loss function, the optimizer parameters and the training algorithm to use. The parameter definition starts with the line @parameters.

Global parameters:

pattern           use SVM for pattern recognition; y has to be in {-1,1}
regression        use regression SVM (default)
nu float          use nu-SVM with the given value of nu instead of the normal SVM (see [Schoelkopf/etal/2000a] for details on nu-SVMs)
distribution      estimate the support of the distribution of the training examples (see [Schoelkopf/etal/99a]); nu must be set
verbosity [1..5]  ranges from 1 (no messages) through 3 (default) to 5 (flood, for debugging only)
scale             scale the training examples to mean 0 and variance 1 (default)
no_scale          do not scale the training examples (may be numerically less stable)
format            set the default example file format (see the format parameter of the example sets)
delimiter         set the default attribute delimiter (see the delimiter parameter of the example sets)
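To make the effect of the scale option concrete, the following sketch standardizes each attribute to mean 0 and variance 1. It is an illustration only: the function name standardize is hypothetical, and the use of the population variance (division by n) and the handling of constant attributes are assumptions, since the guide does not say how mySVM does this internally.

```python
def standardize(rows):
    """Scale each attribute (column) to mean 0 and variance 1.

    Assumptions: population variance (divide by n); attributes with
    zero variance are mapped to 0.0 to avoid a division by zero.
    """
    n = len(rows)
    dim = len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(dim)]
    variances = [sum((r[j] - means[j]) ** 2 for r in rows) / n
                 for j in range(dim)]
    scaled = []
    for r in rows:
        scaled.append([
            (r[j] - means[j]) / variances[j] ** 0.5 if variances[j] > 0 else 0.0
            for j in range(dim)
        ])
    return scaled, means, variances


scaled, means, variances = standardize([[1.0, 10.0], [3.0, 10.0]])
print(scaled)
```

The returned means and variances would have to be kept so that new examples can be scaled the same way before prediction.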

Loss function:

C float         the SVM complexity constant. If not set, 1/avg(K(x,x)) is used.
L+ float        penalize positive deviation (prediction too high) by this factor
L- float        penalize negative deviation (prediction too low) by this factor
epsilon float   insensitivity constant. No loss if the prediction lies this close to the true value
epsilon+ float  epsilon for positive deviation only
epsilon- float  epsilon for negative deviation only
quadraticLoss+  use quadratic loss for positive deviation
quadraticLoss-  use quadratic loss for negative deviation
quadraticLoss   use quadratic loss for both positive and negative deviation
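One way to read the loss parameters above is as an asymmetric epsilon-insensitive loss. The sketch below is an interpretation of the parameter descriptions, not code taken from mySVM: the function name, the argument names and the default values are hypothetical, and the exact scaling of the quadratic variant (for example a factor of 1/2) is not stated in this guide.

```python
def svm_loss(prediction, target, eps_plus=0.1, eps_minus=0.1,
             l_plus=1.0, l_minus=1.0, quad_plus=False, quad_minus=False):
    """Asymmetric epsilon-insensitive loss for a single example.

    eps_plus / eps_minus play the role of epsilon+ / epsilon-,
    l_plus / l_minus the role of L+ / L-, and quad_plus / quad_minus
    the role of quadraticLoss+ / quadraticLoss-. Defaults are
    illustrative assumptions.
    """
    deviation = prediction - target
    if deviation > eps_plus:
        # Prediction too high: only the part beyond epsilon+ is penalized.
        excess = deviation - eps_plus
        return l_plus * (excess ** 2 if quad_plus else excess)
    if deviation < -eps_minus:
        # Prediction too low: only the part beyond epsilon- is penalized.
        excess = -deviation - eps_minus
        return l_minus * (excess ** 2 if quad_minus else excess)
    return 0.0  # inside the insensitive zone


print(svm_loss(1.05, 1.0))              # within epsilon, no loss
print(svm_loss(1.5, 1.0, l_plus=2.0))   # linear loss, prediction too high
```

Setting l_plus and l_minus to different values makes over- and under-prediction cost different amounts, which is what the L+ and L- parameters are described as doing.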

Optimizer parameters:

working_set_size int       optimize this many examples in each iteration (default: 10)
max_iterations int         stop after this many iterations
shrink_const int           fix a variable to the bound if it is optimal for this many iterations
is_zero float              numerical precision (default: 1e-10)
descend float              make this much descent on the target function in each iteration
convergence_epsilon float  precision on the KKT conditions (default: 1e-3 for pattern recognition and 1e-4 for regression)
kernel_cache int           size of the cache for kernel evaluations in MB (default: 40)

Training algorithms:

cross_validation int  do cross validation on the training examples with the given number of chunks
cv_inorder            do cross validation in the order the examples are given in
cv_window int         do cross validation by moving a window of the given number of chunks over the training data (implies cv_inorder)
search_C [am]         find an optimal C in the range of cmin to cmax by adding (a) cdelta to the current C or multiplying (m) the current C by cdelta
cmin                  lower bound for search_C
cmax                  upper bound for search_C
cdelta                step size for search_C
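The search_C and cv_inorder options can be pictured with two small helpers. This is a sketch under stated assumptions rather than mySVM's implementation: the function names are hypothetical, the search is assumed to start at cmin and include cmax when it is hit exactly, and the chunks are assumed to be contiguous and of near-equal size.

```python
def c_candidates(cmin, cmax, cdelta, mode):
    """List the C values visited between cmin and cmax.

    mode "a" adds cdelta to the current C, mode "m" multiplies the
    current C by cdelta. The argument checks guard against an
    endless loop.
    """
    if mode == "a":
        if cdelta <= 0:
            raise ValueError("additive search needs cdelta > 0")
    elif mode == "m":
        if cdelta <= 1 or cmin <= 0:
            raise ValueError("multiplicative search needs cdelta > 1 and cmin > 0")
    else:
        raise ValueError("mode must be 'a' or 'm'")
    values = []
    c = cmin
    while c <= cmax:
        values.append(c)
        c = c + cdelta if mode == "a" else c * cdelta
    return values


def inorder_chunks(n_examples, n_chunks):
    """Split example indices 0..n-1 into contiguous chunks, in input order."""
    bounds = [i * n_examples // n_chunks for i in range(n_chunks + 1)]
    return [list(range(bounds[i], bounds[i + 1])) for i in range(n_chunks)]


print(c_candidates(1, 1000, 10, "m"))
print(inorder_chunks(5, 2))
```

In a cross validation run each chunk would serve once as the test set while the remaining chunks are used for training. With floating-point step sizes the additive variant can miss cmax by a rounding error, so a small tolerance may be needed in practice.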

Kernel definition

The kernel definition lets you choose the type of kernel function to use and its parameters. It starts with the line @kernel.

name              kernel type                                  parameters
dot               inner product                                none
polynomial        polynomial (x*y+1)^d                         degree int
radial            radial basis function exp(-gamma ||x-y||^2)  gamma float
neural            two-layered neural net tanh(a x*y+b)         a float, b float
anova             (RBF) anova kernel                           gamma float, degree int
user              user-definable kernel                        param_i_1 ... param_i_5 int, param_f_1 ... param_f_5 float
user2             user-definable kernel 2                      param_i, param_f
sum_aggregation   sum of other kernels                         number_parts int, range int int, followed by number_parts kernel definitions
prod_aggregation  product of other kernels                     number_parts int, range int int, followed by number_parts kernel definitions
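The formulas given in the table can be written out directly. The sketch below covers the dot, polynomial, radial and neural kernels and also the default C of 1/avg(K(x,x)) mentioned under the loss function parameters. The Python function names are hypothetical and the code is a plain transcription of the formulas, not an excerpt from mySVM.

```python
import math


def dot(x, y):
    """Inner product x*y."""
    return sum(a * b for a, b in zip(x, y))


def polynomial(x, y, degree):
    """Polynomial kernel (x*y + 1)^d."""
    return (dot(x, y) + 1) ** degree


def radial(x, y, gamma):
    """Radial basis function exp(-gamma ||x-y||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


def neural(x, y, a, b):
    """Two-layered neural net kernel tanh(a x*y + b)."""
    return math.tanh(a * dot(x, y) + b)


def default_c(examples, kernel):
    """Default complexity constant 1 / avg(K(x, x)) over the examples."""
    average = sum(kernel(x, x) for x in examples) / len(examples)
    return 1.0 / average


print(polynomial([1, 2], [3, 4], 2))
print(default_c([[1, 0], [0, 2]], dot))
```

Kernels that take parameters can be passed to default_c by wrapping them, for example `lambda x, y: radial(x, y, 0.5)`.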
Example sets

An example set consists of the learning attributes for each example, its classification (for pattern recognition, -1 or 1) or functional value (for regression) and its Lagrangian multiplier. You do not need to supply the Lagrangian multiplier for training, and you do not even have to supply the functional value for prediction, but you can. The examples can be given in two different formats: dense and sparse. Note that you can change the data format with the format parameter described below.

The example set definition starts with @examples. Note that each example has to be on its own line.

WARNING: When giving real numbers you can also use a comma instead of a decimal point ("1234,56" instead of "1234.56", German style). Therefore something like "1,234.56" does not work!
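The reason for this warning becomes clear when the rule is written down. The following sketch treats every comma as a decimal mark, which is an assumption about how the rule could be implemented and not mySVM's actual parsing code; parse_real is a hypothetical name.

```python
def parse_real(token):
    """Parse a real number written with '.' or ',' as the decimal mark.

    Every comma is read as a decimal mark, so a comma used as a
    thousands separator yields two decimal marks and is rejected.
    """
    return float(token.replace(",", "."))


print(parse_real("1234,56"))
print(parse_real("1234.56"))
try:
    parse_real("1,234.56")
except ValueError:
    print("1,234.56 is not a valid number under this rule")
```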

common parameters:

format F        format of the examples, where F is either "sparse" or a string containing "x", "y" or "a". The format string defines the position of the attributes x, the functional value y and the Lagrangian multiplier a in an example. "x" has to be set. The default format is "yx", but you can set another default in the parameter definition.
dimension int   number of attributes. If the dimension is not given, it is set from the examples (maximum dimension in sparse format, dimension of the first line in dense format).
number int      total number of examples. A warning is issued when a wrong number of examples is given.
b float         additional constant of the hyperplane
delimiter char  character by which the attributes of an example are separated (default: space). You can set a default in the parameter definition. Be careful if you set the delimiter to "," or "."!

sparse format:

In the sparse data format, only non-zero attributes have to be given. For each non-zero attribute you give its attribute number (starting at 1) and its value, separated by a colon. The functional value is given by y:float (the "y:" is optional here!) and the lagrangian multiplier by a:float.

Example: The following lines all define the same example:

  • 1:-1 2:0 3:1.2 y:2 a:0
  • 3:1.2 y:2 1:-1
  • 3:1.2 2 1:-1
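The equivalence of the three lines above can be checked with a small parser. This is a sketch of the sparse format as described here, not mySVM's reader: parse_sparse is a hypothetical name, the dimension is passed in explicitly, and a missing Lagrangian multiplier is assumed to default to 0.

```python
def parse_sparse(line, dimension):
    """Parse one sparse-format example into (x, y, a).

    Assumptions: tokens are separated by whitespace, the dimension is
    known in advance, and a missing multiplier a defaults to 0.0.
    """
    x = [0.0] * dimension  # attributes not listed are zero
    y = None
    a = 0.0
    for token in line.split():
        key, sep, value = token.partition(":")
        if not sep:
            y = float(key)  # bare number: functional value without "y:"
        elif key == "y":
            y = float(value)
        elif key == "a":
            a = float(value)
        else:
            x[int(key) - 1] = float(value)  # attribute numbers start at 1
    return x, y, a


for line in ["1:-1 2:0 3:1.2 y:2 a:0", "3:1.2 y:2 1:-1", "3:1.2 2 1:-1"]:
    print(parse_sparse(line, 3))
```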

dense format:

The dense format consists of all attributes and (if defined so) the functional values and the lagrangian multipliers listed in the order given by the format parameter.

Example: The following lines all define the same example as above:

  • With "format yx" (default) it is "2 -1 0 1.2"
  • With "format xya" it is "-1 0 1.2 2 0"
  • And with "format xy" and "delimiter ','" the example reads "-1,,1.2,2"
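A dense-format reader driven by the format string could look like the sketch below. It is an illustration under assumptions, not mySVM's code: parse_dense is a hypothetical name, an empty field (as in "-1,,1.2,2") is assumed to mean 0 because the guide states that this line defines the same example, and a missing multiplier is assumed to default to 0.

```python
def parse_dense(line, fmt="yx", delimiter=None):
    """Parse one dense-format example into (x, y, a).

    fmt is a string over "x", "y" and "a" giving the field order; "x"
    stands for all attributes at once. delimiter=None splits on
    whitespace. Assumptions: empty fields count as 0.0 and a missing
    multiplier a defaults to 0.0.
    """
    tokens = line.split(delimiter)
    values = [float(t) if t.strip() else 0.0 for t in tokens]
    x_pos = fmt.index("x")  # raises ValueError if "x" is missing
    before, after = fmt[:x_pos], fmt[x_pos + 1:]
    n_x = len(values) - len(before) - len(after)  # attributes take the rest
    fields = dict(zip(before, values[:len(before)]))
    x = values[len(before):len(before) + n_x]
    fields.update(zip(after, values[len(before) + n_x:]))
    return x, fields.get("y"), fields.get("a", 0.0)


print(parse_dense("2 -1 0 1.2"))
print(parse_dense("-1 0 1.2 2 0", fmt="xya"))
print(parse_dense("-1,,1.2,2", fmt="xy", delimiter=","))
```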

References

Schoelkopf/etal/2000a Schölkopf, Bernhard and Smola, Alex J. and Williamson, Robert C. and Bartlett, Peter L. (2000). New Support Vector Algorithms. Neural Computation, 12, pages 1207-1245.
Schoelkopf/etal/99a Schölkopf, Bernhard and Williamson, Robert C. and Smola, Alex J. and Shawe-Taylor, John (2000). SV Estimation of a Distribution's Support. In Solla, S.A. and Leen, T.K. and Müller, K.-R., editor(s), Neural Information Processing Systems 12. MIT Press.
Joachims/99a Joachims, Thorsten (1999). Making large-Scale SVM Learning Practical. In Advances in Kernel Methods - Support Vector Learning, chapter 11. MIT Press.
Scheffer/Joachims/99a Scheffer, Tobias and Joachims, Thorsten (1999). Expected Error Analysis for Model Selection. In International Conference on Machine Learning (ICML).
Vapnik/98a V. Vapnik (1998). Statistical Learning Theory. Wiley.