----------------------------------
--- Python interface of LIBSVM ---
----------------------------------

Table of Contents
=================

- Introduction
- Installation
- Quick Start
- Quick Start with Scipy
- Design Description
- Data Structures
- Utility Functions
- Additional Information

Introduction
============

Python (http://www.python.org/) is a programming language suitable for rapid
development. This tool provides a simple Python interface to LIBSVM, a library
for support vector machines (http://www.csie.ntu.edu.tw/~cjlin/libsvm). The
interface is very easy to use, as the usage is the same as that of LIBSVM. The
interface is developed with the built-in Python library "ctypes."

Installation
============

On Unix systems, type

> make

The interface needs only the LIBSVM shared library, which is generated by
the above command. We assume that the shared library is in the LIBSVM
main directory or in the system path.

For Windows, the shared library libsvm.dll for 64-bit Python is ready
in the directory `..\windows'. To regenerate the shared library,
please follow the instructions for building Windows binaries in the LIBSVM
README.

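If svm.py and svmutil.py are not in your working directory, one option (a
minimal sketch; the path below is only a placeholder for your actual LIBSVM
location) is to extend sys.path before importing the interface:

>>> import sys
>>> sys.path.append('/path/to/libsvm/python')  # placeholder path to this interface
>>> from svmutil import *
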
Quick Start
===========

"Quick Start with Scipy" is in the next section.

There are two levels of usage. The high-level one uses utility
functions in svmutil.py and commonutil.py (shared with LIBLINEAR and
imported by svmutil.py). The usage is the same as the LIBSVM MATLAB
interface.

>>> from svmutil import *
# Read data in LIBSVM format
>>> y, x = svm_read_problem('../heart_scale')
>>> m = svm_train(y[:200], x[:200], '-c 4')
>>> p_label, p_acc, p_val = svm_predict(y[200:], x[200:], m)

# Construct problem in Python format
# Dense data
>>> y, x = [1,-1], [[1,0,1], [-1,0,-1]]
# Sparse data
>>> y, x = [1,-1], [{1:1, 3:1}, {1:-1,3:-1}]
>>> prob  = svm_problem(y, x)
>>> param = svm_parameter('-t 0 -c 4 -b 1')
>>> m = svm_train(prob, param)

# Precomputed kernel data (-t 4)
# Dense data
>>> y, x = [1,-1], [[1, 2, -2], [2, -2, 2]]
# Sparse data
>>> y, x = [1,-1], [{0:1, 1:2, 2:-2}, {0:2, 1:-2, 2:2}]
# isKernel=True must be set for precomputed kernel
>>> prob  = svm_problem(y, x, isKernel=True)
>>> param = svm_parameter('-t 4 -c 4 -b 1')
>>> m = svm_train(prob, param)
# For the format of precomputed kernel, please read the LIBSVM README.

# Other utility functions
>>> svm_save_model('heart_scale.model', m)
>>> m = svm_load_model('heart_scale.model')
>>> p_label, p_acc, p_val = svm_predict(y, x, m, '-b 1')
>>> ACC, MSE, SCC = evaluations(y, p_label)

# Getting online help
>>> help(svm_train)

The low-level usage directly calls the C interfaces imported by svm.py. Note that
all arguments and return values are in ctypes format, so you need to handle them
carefully.

>>> from svm import *
>>> prob = svm_problem([1,-1], [{1:1, 3:1}, {1:-1,3:-1}])
>>> param = svm_parameter('-c 4')
>>> m = libsvm.svm_train(prob, param) # m is a ctypes pointer to an svm_model
# Convert a Python-format instance to svm_nodearray, a ctypes structure
>>> x0, max_idx = gen_svm_nodearray({1:1, 3:1})
>>> label = libsvm.svm_predict(m, x0)

Quick Start with Scipy
======================

Make sure you have Scipy installed to proceed in this section.
If numba (http://numba.pydata.org) is installed, some operations will be much faster.

There are two levels of usage. The high-level one uses utility functions
in svmutil.py and the usage is the same as the LIBSVM MATLAB interface.

>>> import scipy
>>> from svmutil import *
# Read data in LIBSVM format
>>> y, x = svm_read_problem('../heart_scale', return_scipy=True) # y: ndarray, x: csr_matrix
>>> m = svm_train(y[:200], x[:200, :], '-c 4')
>>> p_label, p_acc, p_val = svm_predict(y[200:], x[200:, :], m)

# Construct problem in Scipy format
# Dense data: numpy ndarray
>>> y, x = scipy.asarray([1,-1]), scipy.asarray([[1,0,1], [-1,0,-1]])
# Sparse data: scipy csr_matrix((data, (row_ind, col_ind)))
>>> y, x = scipy.asarray([1,-1]), scipy.sparse.csr_matrix(([1, 1, -1, -1], ([0, 0, 1, 1], [0, 2, 0, 2])))
>>> prob  = svm_problem(y, x)
>>> param = svm_parameter('-t 0 -c 4 -b 1')
>>> m = svm_train(prob, param)

# Precomputed kernel data (-t 4)
# Dense data: numpy ndarray
>>> y, x = scipy.asarray([1,-1]), scipy.asarray([[1,2,-2], [2,-2,2]])
# Sparse data: scipy csr_matrix((data, (row_ind, col_ind)))
>>> y, x = scipy.asarray([1,-1]), scipy.sparse.csr_matrix(([1, 2, -2, 2, -2, 2], ([0, 0, 0, 1, 1, 1], [0, 1, 2, 0, 1, 2])))
# isKernel=True must be set for precomputed kernel
>>> prob  = svm_problem(y, x, isKernel=True)
>>> param = svm_parameter('-t 4 -c 4 -b 1')
>>> m = svm_train(prob, param)
# For the format of precomputed kernel, please read the LIBSVM README.

# Apply data scaling in Scipy format
>>> y, x = svm_read_problem('../heart_scale', return_scipy=True)
>>> scale_param = csr_find_scale_param(x, lower=0)
>>> scaled_x = csr_scale(x, scale_param)

# Other utility functions
>>> svm_save_model('heart_scale.model', m)
>>> m = svm_load_model('heart_scale.model')
>>> p_label, p_acc, p_val = svm_predict(y, x, m, '-b 1')
>>> ACC, MSE, SCC = evaluations(y, p_label)

# Getting online help
>>> help(svm_train)

The low-level usage directly calls the C interfaces imported by svm.py. Note that
all arguments and return values are in ctypes format, so you need to handle them
carefully.

>>> from svm import *
>>> prob = svm_problem(scipy.asarray([1,-1]), scipy.sparse.csr_matrix(([1, 1, -1, -1], ([0, 0, 1, 1], [0, 2, 0, 2]))))
>>> param = svm_parameter('-c 4')
>>> m = libsvm.svm_train(prob, param) # m is a ctypes pointer to an svm_model
# Convert a tuple of ndarray (index, data) to svm_nodearray, a ctypes structure
# Note that the index starts from 0, though the following example is converted to 1:1, 3:1 internally
>>> x0, max_idx = gen_svm_nodearray((scipy.asarray([0,2]), scipy.asarray([1,1])))
>>> label = libsvm.svm_predict(m, x0)

Design Description
==================

There are two files, svm.py and svmutil.py, which respectively correspond to
low-level and high-level use of the interface.

In svm.py, we adopt the Python built-in library "ctypes," so that
Python can directly access the C structures and interface functions defined
in svm.h.

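For illustration, a ctypes binding looks roughly like the following (a generic
sketch, not the actual code in svm.py; the library path is a placeholder):

    >>> from ctypes import CDLL, c_double
    >>> lib = CDLL('../libsvm.so.2')  # placeholder path to the LIBSVM shared library
    >>> lib.svm_get_svr_probability.restype = c_double  # declare the C return type
    # afterwards, lib.svm_get_svr_probability(model_ptr) is callable like a Python function
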
While advanced users can use the structures/functions in svm.py, to
avoid handling ctypes structures, in svmutil.py we provide some easy-to-use
functions. The usage is similar to the LIBSVM MATLAB interface.

Data Structures
===============

Four data structures derived from svm.h are svm_node, svm_problem, svm_parameter,
and svm_model. They all contain fields with the same names as in svm.h. Access
these fields carefully because you are directly using a C structure instead of a
Python object. For svm_model, accessing the fields directly is not recommended.
Programmers should use the interface functions or the methods of the svm_model
class in Python to get the values. The following description introduces additional
fields and methods.

Before using the data structures, execute the following command to load the
LIBSVM shared library:

    >>> from svm import *

- class svm_node:

    Construct an svm_node.

    >>> node = svm_node(idx, val)

    idx: an integer indicating the feature index.

    val: a float indicating the feature value.

    Show the index and the value of a node.

    >>> print(node)

- Function: gen_svm_nodearray(xi [,feature_max=None [,isKernel=False]])

    Generate a feature vector from a Python list/tuple/dictionary, a numpy
    ndarray, or a tuple of (index, data):

    >>> xi_ctype, max_idx = gen_svm_nodearray({1:1, 3:1, 5:-2})

    xi_ctype: the returned svm_nodearray (a ctypes structure)

    max_idx: the maximal feature index of xi

    feature_max: if feature_max is assigned, features with indices larger than
                 feature_max are removed (see the example after this list).

    isKernel: if isKernel == True, the list index starts from 0 for a precomputed
              kernel. Otherwise, the list index starts from 1. The default
              value is False.

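    For example (an illustrative call, not from the original README), assigning
    feature_max drops higher-indexed features:

    >>> xi_ctype, max_idx = gen_svm_nodearray({1:1, 3:1, 5:-2}, feature_max=4)
    # the feature with index 5 (> feature_max) is removed
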
- class svm_problem:

    Construct an svm_problem instance.

    >>> prob = svm_problem(y, x)

    y: a Python list/tuple/ndarray of l labels (type must be int/double).

    x: 1. a list/tuple of l training instances. The feature vector of
          each training instance is a list/tuple or dictionary.

       2. an l * n numpy ndarray or scipy spmatrix (n: number of features).

    Note that if your x contains sparse data (i.e., dictionaries), the internal
    ctypes data format is still sparse.

    For a precomputed kernel, the isKernel flag should be set to True:

    >>> prob = svm_problem(y, x, isKernel=True)

    Please read the LIBSVM README for more details of precomputed kernels.

- class svm_parameter:

    Construct an svm_parameter instance.

    >>> param = svm_parameter('training_options')

    If 'training_options' is empty, LIBSVM default values are applied.

    Set param to LIBSVM default values.

    >>> param.set_to_default_values()

    Parse a string of options.

    >>> param.parse_options('training_options')

    Show values of parameters.

    >>> print(param)

- class svm_model:

    There are two ways to obtain an instance of svm_model:

    >>> model = svm_train(y, x)
    >>> model = svm_load_model('model_file_name')

    Note that the structure returned by the interface functions
    libsvm.svm_train and libsvm.svm_load_model is a ctypes pointer to an
    svm_model, which is different from the svm_model object returned
    by svm_train and svm_load_model in svmutil.py. We provide a
    function toPyModel for the conversion:

    >>> model_ptr = libsvm.svm_train(prob, param)
    >>> model = toPyModel(model_ptr)

    If you obtain a model in a way other than the above approaches,
    handle it carefully to avoid memory leaks or segmentation faults.

    Some interface functions to access LIBSVM models are wrapped as
    members of the class svm_model (see the example after this list):

    >>> svm_type = model.get_svm_type()
    >>> nr_class = model.get_nr_class()
    >>> svr_probability = model.get_svr_probability()
    >>> class_labels = model.get_labels()
    >>> sv_indices = model.get_sv_indices()
    >>> nr_sv = model.get_nr_sv()
    >>> is_prob_model = model.is_probability_model()
    >>> support_vector_coefficients = model.get_sv_coef()
    >>> support_vectors = model.get_SV()

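    For example (an illustrative snippet, not from the original README), these
    members can be used to check that a model supports probability estimates
    before requesting them from svm_predict:

    >>> if model.is_probability_model():
    ...     p_labels, p_acc, p_vals = svm_predict(y, x, model, '-b 1')
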
Utility Functions
=================

To use the utility functions, type

    >>> from svmutil import *

The above command loads
    svm_train()            : train an SVM model.
    svm_predict()          : predict testing data.
    svm_read_problem()     : read the data from a LIBSVM-format file.
    svm_load_model()       : load a LIBSVM model.
    svm_save_model()       : save a model to a file.
    evaluations()          : evaluate prediction results.
    csr_find_scale_param() : find scaling parameters for data in csr format.
    csr_scale()            : apply data scaling to data in csr format.

- Function: svm_train

    There are three ways to call svm_train():

    >>> model = svm_train(y, x [, 'training_options'])
    >>> model = svm_train(prob [, 'training_options'])
    >>> model = svm_train(prob, param)

    y: a list/tuple/ndarray of l training labels (type must be int/double).

    x: 1. a list/tuple of l training instances. The feature vector of
          each training instance is a list/tuple or dictionary.

       2. an l * n numpy ndarray or scipy spmatrix (n: number of features).

    training_options: a string in the same form as that for LIBSVM command
                      mode.

    prob: an svm_problem instance generated by calling
          svm_problem(y, x).
          For a precomputed kernel, you should use
          svm_problem(y, x, isKernel=True).

    param: an svm_parameter instance generated by calling
           svm_parameter('training_options').

    model: the returned svm_model instance. See svm.h for details of this
           structure. If '-v' is specified, cross validation is
           conducted and the returned model is just a scalar: cross-validation
           accuracy for classification or mean squared error for regression.

    To train the same data many times with different
    parameters, the second and the third ways should be faster.

    Examples:

    >>> y, x = svm_read_problem('../heart_scale')
    >>> prob = svm_problem(y, x)
    >>> param = svm_parameter('-s 3 -c 5 -h 0')
    >>> m = svm_train(y, x, '-c 5')
    >>> m = svm_train(prob, '-t 2 -c 5')
    >>> m = svm_train(prob, param)
    >>> CV_ACC = svm_train(y, x, '-v 3')

- Function: svm_predict

    To predict testing data with a model, use

    >>> p_labs, p_acc, p_vals = svm_predict(y, x, model [,'predicting_options'])

    y: a list/tuple/ndarray of l true labels (type must be int/double).
       It is used for calculating the accuracy. Use [] if true labels are
       unavailable.

    x: 1. a list/tuple of l testing instances. The feature vector of
          each testing instance is a list/tuple or dictionary.

       2. an l * n numpy ndarray or scipy spmatrix (n: number of features).

    predicting_options: a string of predicting options in the same format as
                        that of LIBSVM.

    model: an svm_model instance.

    p_labels: a list of predicted labels.

    p_acc: a tuple including accuracy (for classification), mean
           squared error, and squared correlation coefficient (for
           regression).

    p_vals: a list of decision values or probability estimates (if '-b 1'
            is specified). If k is the number of classes in the training data,
            for decision values, each element includes the results of predicting
            k(k-1)/2 binary-class SVMs. For classification, k = 1 is a
            special case: decision value [+1] is returned for each testing
            instance, instead of an empty list.
            For probabilities, each element contains k values indicating
            the probability that the testing instance is in each class.
            Note that the order of classes is the same as the 'model.label'
            field in the model structure (see the sketch after the example
            below).

    Example:

    >>> m = svm_train(y, x, '-c 5')
    >>> p_labels, p_acc, p_vals = svm_predict(y, x, m)

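    The following sketch (variable names are illustrative, not from the
    original README) pairs each probability estimate in p_vals with its class
    label when '-b 1' is used for both training and prediction:

    >>> m = svm_train(y, x, '-c 5 -b 1')
    >>> p_labels, p_acc, p_vals = svm_predict(y, x, m, '-b 1')
    >>> labels = m.get_labels()
    # for the i-th testing instance, p_vals[i][j] is the estimated
    # probability that the instance belongs to class labels[j]
    >>> prob_of_first_instance = dict(zip(labels, p_vals[0]))
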
- Functions: svm_read_problem/svm_load_model/svm_save_model

    See the usage by examples:

    >>> y, x = svm_read_problem('data.txt')
    >>> m = svm_load_model('model_file')
    >>> svm_save_model('model_file', m)

- Function: evaluations

    Calculate some evaluations using the true values (ty) and the predicted
    values (pv); a short example follows the descriptions below:

    >>> (ACC, MSE, SCC) = evaluations(ty, pv, useScipy)

    ty: a list/tuple/ndarray of true values.

    pv: a list/tuple/ndarray of predicted values.

    useScipy: convert ty and pv to ndarray, and use scipy functions for the
              evaluation.

    ACC: accuracy.

    MSE: mean squared error.

    SCC: squared correlation coefficient.

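    For instance (illustrative values only):

    >>> ty = [1, -1, 1, 1]
    >>> pv = [1, -1, -1, 1]
    >>> ACC, MSE, SCC = evaluations(ty, pv)
    # ACC is the accuracy in percent; here three of the four predictions are correct
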
- Functions: csr_find_scale_param/csr_scale

    Scale data in csr format (a typical usage example follows the
    descriptions below).

    >>> param = csr_find_scale_param(x [, lower=l, upper=u])
    >>> x = csr_scale(x, param)

    x: a csr_matrix of data.

    l: x scaling lower limit; default -1.

    u: x scaling upper limit; default 1.

    The scaling process is: x * diag(coef) + ones(l, 1) * offset'

    param: a dictionary of scaling parameters, where param['coef'] = coef and
           param['offset'] = offset.

    coef: a scipy array of scaling coefficients.

    offset: a scipy array of scaling offsets.

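    A common pattern (a sketch; the file names below are placeholders) is to
    compute the scaling parameters on the training set and reuse them on the
    testing set:

    >>> train_y, train_x = svm_read_problem('train_file', return_scipy=True)
    >>> test_y, test_x = svm_read_problem('test_file', return_scipy=True)
    >>> scale_param = csr_find_scale_param(train_x, lower=0)
    >>> scaled_train_x = csr_scale(train_x, scale_param)
    >>> scaled_test_x = csr_scale(test_x, scale_param)  # same parameters applied to the test data
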
Additional Information
======================

This interface was written by Hsiang-Fu Yu from the Department of Computer
Science, National Taiwan University. If you find this tool useful, please
cite LIBSVM as follows:

Chih-Chung Chang and Chih-Jen Lin, LIBSVM : a library for support
vector machines. ACM Transactions on Intelligent Systems and
Technology, 2:27:1--27:27, 2011. Software available at
http://www.csie.ntu.edu.tw/~cjlin/libsvm

For any questions, please contact Chih-Jen Lin <cjlin@csie.ntu.edu.tw>,
or check the FAQ page:

http://www.csie.ntu.edu.tw/~cjlin/libsvm/faq.html