1% TiMBL 6.3 API
2
3\documentclass{report}
4\usepackage{epsf}
5\usepackage{a4wide}
6\usepackage{palatino}
7\usepackage{fullname}
8\usepackage{url}
9
10\newcommand{\chisq}{{$ \chi^2 $}}
11
12\author{Ko van der Sloot\\ \ \\ Induction of Linguistic Knowledge\\
13        Computational Linguistics\\ Tilburg University \\ \ \\
14        P.O. Box 90153, NL-5000 LE, Tilburg, The Netherlands \\ URL:
15        http://ilk.uvt.nl}
16
17\title{{\huge TiMBL: Tilburg Memory-Based Learner} \\ \vspace*{0.5cm}
18{\bf version 6.3} \\ \vspace*{0.5cm}{\huge API Reference Guide}\\
19\vspace*{1cm} {\it ILK Technical Report -- ILK 10-03}}
20
%no paragraph indentation, extra space between paragraphs
22\parindent 0pt
23\parskip 9pt
24
25
26\begin{document}
27
28\maketitle
29
30\tableofcontents
31
32\chapter*{Preface}
33
This is a brief description of the TimblAPI class, the application
programming interface to the Timbl\footnote{\url{http://ilk.uvt.nl/timbl}} software package, and its main
functions. For an introduction to Timbl, consult the Timbl Reference
Guide \cite{Daelemans+10}. Although most of the API can be
traced in the {\tt TimblAPI.h} file, the reverse is not true; some
functions in {\tt TimblAPI.h} are still ``work in progress'' and some others
are artefacts that simplify the implementation of the TiMBL main
program\footnote{Timbl.cxx is therefore {\em not} a good example of
  how to use the API.}.
43
To learn more about using the API, you should study programs such as
{\tt classify.cxx}, {\tt tse.cxx}, and the examples given in this
manual, which can all be found in the {\tt demos} directory of this
distribution. As you can readily gather from these examples, to get
access to the TimblAPI functions you need to include {\tt TimblAPI.h}
in your program and add {\tt libTimbl.a} to your linking path.
51
{\bf Important note}: The described functions return a result (mostly
a bool) to indicate success or failure. To simplify the examples, we
ignore these return values. This is, of course, bad practice, to be avoided in
real life programming.\footnote{As stated by commandment 6 of ``The
  Ten Commandments for C Programmers'' by Henry Spencer:

If a function be advertised to return an error code in the event of
difficulties, thou shalt check for that code, yea, even though the
checks triple the size of thy code and produce aches in thy typing
fingers, for if thou thinkest ``it cannot happen to me'', the gods
shall surely punish thee for thy arrogance.}
63
{\bf Warning}: Although the TiMBL internals perform some sanity
checking, it is quite possible to combine API functions in such a way
that some undetermined state is reached, or even a conflict
arises. The effect of the {\tt SetOptions()} function, for instance,
might be quite surprising. If you have created your own program
with the API, it is wise to test it against well-known data to see if
the results make sense.
71
72\chapter{Changes}
73\label{changes}
74
75\section{From version 6.2 to 6.3}
76
No changes were made to the API for this release. This manual has been
brought up to date (and keeps its beta status).
79
80\section{From version 6.1 to 6.2}
81
82In version 6.2, some additional functions were added to the API: {\tt
83  matchDepth()}, {\tt matchedAtLeaf()}, {\tt WriteMatrices()}, {\tt
84  GetMatrices()} and {\tt ShowStatistics()}. These reflect the
85additional functionality of Timbl 6.2.  The API is still experimental,
86and contains more functions than described in this manual. Using these
87`undocumented' features is, as usual, unwise.
88
89\section{From version 5.1 to 6.1}
90
The major change in 6.0 is the introduction of the {\tt neighborSet}
class, with some special Classify functions.  We added Classify
functions that deliver pointers into Timbl's internal data. This is
fast, but dangerous.  Also, a {\tt WriteInstanceBaseXml()} function was
added, which comes in handy when you want to know more about the
instance base.  Two more examples demonstrating neighborSets and such
were added in Appendix B. From version 6.0 to 6.1, the API has not changed.
98
99\section{From version 5.0 to 5.1}
100
The API is quite stable at the moment. Most TiMBL changes did not
affect the API. The only real API change is in the {\tt GetWeights()}
function (see the section on storing and retrieving intermediate
results).  A few options were added to Timbl, influencing the table in
Appendix A. We have also changed and enhanced the examples in Appendix
B.
107
108\chapter{Quick-start}
109\section{Setting up an experiment}
110
111There is just one way to start a TiMBL experiment, which is to call
112the TimblAPI constructor:
113
114\begin{footnotesize}
115\begin{verbatim}
116  TimblAPI( const std::string& args, const std::string& name ="" );
117\end{verbatim}
118\end{footnotesize}
119
{\tt args} is used as a ``command line'' and is parsed for all kinds of
options, which are used to create the right kind of experiment with the
desired settings for metric, weighting, etc. If something is wrong with
the settings, {\em no}\/ object is created.
124
The most important option is {\tt -a}, which sets the kind of algorithm,
e.g.\ {\tt -a IB1} to invoke an IB1 experiment or {\tt -a IGTREE} to invoke an IGTREE
experiment. A list of possible options is given in Appendix A.
128
129The optional name can be useful if you have multiple experiments.
130In case of warnings or errors, this name is appended to the message.
131
132For example:
133
134\begin{footnotesize}
135\begin{verbatim}
136  TimblAPI *My_Experiment = new TimblAPI( "-a IGTREE +vDI+DB",
137                                          "test1" );
138\end{verbatim}
139\end{footnotesize}
140
{\tt My\_Experiment} is created as an IGTREE experiment with the name
``test1'', and the verbosity is set to DI+DB, meaning that the output
will contain DIstance and DistriBution information.
144
145The counterpart to creation is the {\tt \~{ }TimblAPI()} destructor,
146which is called when you delete an experiment:
147
148\begin{footnotesize}
149\begin{verbatim}
150  delete My_Experiment;
151\end{verbatim}
152\end{footnotesize}
153
154\section{Running an experiment}
155
156Assuming that we have appropriate datafiles (such as the example files {\tt
157dimin.train} and {\tt dimin.test} in the TiMBL package), we can get
158started right away with the functions {\tt Learn()} and {\tt Test()}.
159
160\subsection{Training}
161\begin{footnotesize}
162\begin{verbatim}
163  bool Learn( const std::string& f );
164\end{verbatim}
165\end{footnotesize}
166
This function takes a file named 'f' and gathers information
such as the number of features, and the number and frequency of feature
values and class names. These data are then used to calculate
a lot of statistical information, which will be used for
testing. Finally, an InstanceBase is created, tuned to the current
algorithm.
173
174\subsection{Testing}
175\begin{footnotesize}
176\begin{verbatim}
177  bool Test( const std::string& in,
178             const std::string& out,
179             const std::string& perc = "" );
180\end{verbatim}
181\end{footnotesize}
182
Tests the file given by 'in' and writes the results to 'out'. If 'perc' is
not empty, a percentage score is written to the file 'perc'.
185
186For example:
187
188\begin{footnotesize}
189\begin{verbatim}
190  My_Experiment->Learn( "dimin.train" );
191  My_Experiment->Test( "dimin.test", "my_first_test" );
192\end{verbatim}
193\end{footnotesize}
194
An InstanceBase will be created from dimin.train; then dimin.test is
tested against that InstanceBase, and output is written to
my\_first\_test.
198
199\subsection{Special cases of {\tt Learn()} and {\tt Test()}}
200
201There are special cases where {\tt Learn()} behaves differently:
202
203\begin{itemize}
204\item When the algorithm is IB2, {\tt Learn()} will automatically take
205  the first $n$ lines of f (set with the {\tt -b n} option) to
206  bootstrap itself, and then the rest of f for IB2-learning. After
207  Learning IB2, you can use {\tt Test()} as usual.
208
209\item When the algorithm is CV, {\tt Learn()} is not defined, and all
210  work is done in a special version of {\tt Test()}. 'f' is assumed to
211  give the name of a file, which, on separate lines, gives the names
212  of the files to be cross-validated.
213
  Also, if {\em featureWeights}\/ or {\em probabilities}\/ are read from
  user-defined datafiles, a special {\tt CVprepare()} function must be called,
  to make the weighting, weightFilename and probabilityFileName known to the
{\tt Test()} function (see the sketch after this list).
218
219See Appendix B for a complete CV example (program {\tt api\_test3}).
220
%TODO: add an example with CVprepare!
222
223\end{itemize}
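A minimal sketch of a CV run with user-defined weights and
probabilities is given below. The {\tt CVprepare()} signature and the
filenames shown are assumptions for illustration only; consult {\tt
TimblAPI.h} for the authoritative declaration.

\begin{footnotesize}
\begin{verbatim}
  TimblAPI *My_Experiment = new TimblAPI( "-t cross_validate" );
  // assumed signature: CVprepare( weightFile, Weighting, probFile )
  My_Experiment->CVprepare( "my.wgt", Timbl::GR, "my.prob" );
  My_Experiment->Test( "cross_val.test" );
  delete My_Experiment;
\end{verbatim}
\end{footnotesize}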
224
225\section{More about settings}
226
227After an experiment is set up with the TimblAPI constructor, many
228options can be changed "on the fly" with:
229
230\begin{footnotesize}
231\begin{verbatim}
232  bool SetOptions( const std::string& opts );
233\end{verbatim}
234\end{footnotesize}
235
Here, `opts' is interpreted as a list of option settings, just like in
the TimblAPI constructor. When an error in the opts string is found,
{\tt SetOptions()} returns false. Whether any options are really set
or changed in that case is undefined. Note that a few options can only
be set {\em once}\/, when creating the experiment, most notably the
algorithm. Any attempt to change these options will result in a
failure.  See Appendix A for all valid options and information on
whether they can be changed within a running experiment.
244
Note: {\tt SetOptions()} is lazy; changes are cached until the
moment they are really needed, so you can do several {\tt SetOptions()}
calls, even with different values for the same option. Only the last
value seen will be used for running the experiment.
249
250To see which options are in effect, you can use the calls {\tt ShowOptions()}
251and {\tt ShowSettings()}.
252
253\begin{footnotesize}
254\begin{verbatim}
255  bool ShowOptions( std::ostream& );
256\end{verbatim}
257\end{footnotesize}
258
259Shows all options with their possible and current values.
260
261\begin{footnotesize}
262\begin{verbatim}
263  bool ShowSettings( std::ostream& );
264\end{verbatim}
265\end{footnotesize}
266
Shows all options and their current values.
268
269For example:
270
271\begin{footnotesize}
272\begin{verbatim}
273  My_Experiment->SetOptions( "-w2 -m:M" );
274  My_Experiment->SetOptions( "-w3 -v:DB" );
  My_Experiment->ShowSettings( cout );
276\end{verbatim}
277\end{footnotesize}
278
279See Appendix B (program {\tt api\_test1}) for the output.
280
281\section{Storing and retrieving intermediate results}
282
To speed up testing, or to manipulate what is happening internally, we
can store and retrieve several important parts of our experiment: the
InstanceBase, the FeatureWeights, the ProbabilityArrays, and the
ValueDistance matrices.
286
287Saving is done with:
288
289\begin{footnotesize}
290\begin{verbatim}
291  bool WriteInstanceBase( const std::string& f );
292  bool SaveWeights( const std::string& f );
293  bool WriteArrays( const std::string& f );
294  bool WriteMatrices( const std::string& f );
295\end{verbatim}
296\end{footnotesize}
297
298Retrieve with their counterparts:
299
300\begin{footnotesize}
301\begin{verbatim}
302  bool GetInstanceBase( const std::string& f );
303  bool GetWeights( const std::string& f, Weighting w );
304  bool GetArrays( const std::string& f );
305  bool GetMatrices( const std::string& f );
306\end{verbatim}
307\end{footnotesize}
308
309All use `f' as a filename for storing/retrieving. {\tt GetWeights} needs
310information to decide {\em which}\/ weighting to retrieve.
311Weighting is defined as the enumerated type:
312
313\begin{footnotesize}
314\begin{verbatim}
315  enum Weighting { UNKNOWN_W, UD, NW, GR, IG, X2, SV };
316\end{verbatim}
317\end{footnotesize}
318
319Some notes:
320
321\begin{enumerate}
\item The InstanceBase is stored in an internal format, with or without
hashing, depending on the {\tt -H} option. The format is described in the
TiMBL manual. Remember that it is a bad idea to edit this file in any way.
\item {\tt GetWeights()} can be used to override the weights that
{\tt Learn()} calculated. {\tt UNKNOWN\_W} should not be used.
\item The probability arrays are described in the TiMBL manual. They can be
manipulated to tune the MVDM similarity metric.
\end{enumerate}
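For example, a minimal sketch that stores the weights calculated by
{\tt Learn()} and reloads them later as GainRatio weights (the filename
is arbitrary, and we assume the {\tt Weighting} values live in the
{\tt Timbl} namespace):

\begin{footnotesize}
\begin{verbatim}
  My_Experiment->Learn( "dimin.train" );
  My_Experiment->SaveWeights( "dimin.wgt" );
  // ... later, possibly after editing the weights file:
  My_Experiment->GetWeights( "dimin.wgt", Timbl::GR );
\end{verbatim}
\end{footnotesize}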
330
If you like, you may dump the InstanceBase in an XML format. No retrieve
function is available for this format.
333
334\begin{footnotesize}
335\begin{verbatim}
336  bool WriteInstanceBaseXml( const std::string& f );
337\end{verbatim}
338\end{footnotesize}
339
340\chapter{Classify functions}
341
342\section{Classify functions: Elementary}
343After an experiment is trained with {\tt Learn()}, we do not have to use
344{\tt Test()} to do bulk-testing on a file.
345We can create our own tests with the {\tt Classify} functions:
346
347\begin{footnotesize}
348\begin{verbatim}
349  bool Classify( const std::string& Line, std::string& result );
350  bool Classify( const std::string& Line, std::string& result,
351                 double& distance );
352  bool Classify( const std::string& Line, std::string& result,
353                 std::string& Distrib, double& distance );
354\end{verbatim}
355\end{footnotesize}
356
Results are stored in 'result' (the assigned class). 'distance' will
receive the calculated distance, and 'Distrib' the distribution at
'distance' that was used to calculate 'result'.  Distrib will be a
string like ``\{ NP 2, PP 6 \}''. It is up to you to parse and
interpret this. (In this case: there were 8 instances at
'distance', 2 classified NP and 6 classified PP, giving a 'result' of ``PP''.)
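For example, a minimal sketch (the instance line is taken from the
{\tt dimin} data):

\begin{footnotesize}
\begin{verbatim}
  std::string cls, distrib;
  double dist;
  My_Experiment->Classify( "=,=,=,=,+,k,e,=,-,r,@,l,T",
                           cls, distrib, dist );
  std::cout << cls << " " << distrib << " " << dist << std::endl;
\end{verbatim}
\end{footnotesize}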
363
If you want to perform analyses on these distributions, it might be a
good idea to read the next section about the other family of {\tt
Classify()} functions.
367
A main disadvantage compared to using {\tt Test()} is that {\tt
  Test()} is optimized: {\tt Classify()} has to check the sanity of
its input, and also whether a {\tt SetOptions()} call has been
performed. This slows down the process.
372
373A good example of the use of {\tt Classify()} is the {\tt
374 classify.cxx} program in the TiMBL Distribution.
375
376Depending on the Algorithm and Verbosity setting, it may be possible
377to get some extra information on the details of each classification
378using:
379
380\begin{footnotesize}
381\begin{verbatim}
382   const bool ShowBestNeighbors( std::ostream& os, bool distr ) const;
383\end{verbatim}
384\end{footnotesize}
385
Provided that the option {\tt +v n} or {\tt +v k} is set and we use
IB1 or IB2, output is produced similar to what we see in the TiMBL
program.  When 'distr' is true, the distributions of the neighbors are
also displayed.  Bear in mind: the {\tt +vn} option is expensive in time
and memory and does not work for IGTREE, TRIBL, and TRIBL2.
391
392Two other functions provide the results as given by the {\tt +vmd} verbosity
393option:
394
395\begin{footnotesize}
396\begin{verbatim}
397    size_t matchDepth() const;
398    bool matchedAtLeaf() const;
399\end{verbatim}
400\end{footnotesize}
401
The first returns the matching depth in the InstanceBase; the second
flags whether the match ended at a leaf or at a non-terminal node.
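A small sketch, to be called right after a classification:

\begin{footnotesize}
\begin{verbatim}
  std::string cls;
  My_Experiment->Classify( "=,=,=,=,+,k,e,=,-,r,@,l,T", cls );
  std::cout << "matched at depth " << My_Experiment->matchDepth()
            << ( My_Experiment->matchedAtLeaf() ?
                 " (leaf)" : " (non-terminal)" ) << std::endl;
\end{verbatim}
\end{footnotesize}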
404
405\section{Classify functions: Advanced}
406
A faster, but more dangerous version of Classify is also available.
It is faster because it returns pointers into Timbl's internal
datastructures. It is dangerous because it returns pointers into
Timbl's internal datastructures (using 'const' pointers, so it is
fortunately difficult to really damage Timbl).
412
413\begin{footnotesize}
414\begin{verbatim}
415  const TargetValue *Classify( const std::string& );
416  const TargetValue *Classify( const std::string&,
417                               const ValueDistribution *& );
418  const TargetValue *Classify( const std::string&, double& );
419  const TargetValue *Classify( const std::string&,
420                               const ValueDistribution *&,
421                               double& );
422\end{verbatim}
423\end{footnotesize}
424
A ValueDistribution is a list-like object (but it is not a real list!)
that contains TargetValue objects and weights. It is the result of
combining all nearest neighbors and applying the desired weightings.
Timbl chooses a best TargetValue from this ValueDistribution, and the
Classify functions return that as their main result.
430
431{\bf Important}: Because these functions return pointers into Timbl's
432internal representation, the results are only valid until the next
433Classify function is called (or the experiment is deleted).
434
435Both the TargetValue and ValueDistribution objects have output
436operators defined, so you can print them.  TargetValue also has a {\tt
437  Name()} function, which returns a std::string so you can collect
438results.  ValueDistribution has an iterator-like interface which makes
439it possible to walk through the Distribution.
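For example, a minimal sketch of the pointer-returning variant, again
with a {\tt dimin} instance:

\begin{footnotesize}
\begin{verbatim}
  const Timbl::TargetValue *tv;
  const Timbl::ValueDistribution *vd;
  double dist;
  tv = My_Experiment->Classify( "=,=,=,=,+,k,e,=,-,r,@,l,T",
                                vd, dist );
  std::cout << tv->Name() << " at distance " << dist << std::endl;
\end{verbatim}
\end{footnotesize}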
440
441An iterator on a {\tt ValueDistribution *vd} is created like this:
442\begin{footnotesize}
443\begin{verbatim}
444  ValueDistribution::dist_iterator it=vd->begin();
445\end{verbatim}
446\end{footnotesize}
447
Unfortunately, the iterator cannot be printed or used directly.
It walks through a map-like structure with pairs of values, of which
only the {\tt second} part is of interest to you.
You may print it, or extract its {\tt Value()} (which happens to be a
TargetValue pointer), or extract its {\tt Weight()}, which is a {\tt double}.
453
454Like this:
455\begin{footnotesize}
456\begin{verbatim}
457  while ( it != vd->end() ){
458    cout << it->second << " has a value: ";
    cout << it->second->Value() << " and a weight of "
460         << it->second->Weight() << endl;
461    ++it;
462  }
463\end{verbatim}
464\end{footnotesize}
465
466Printing {\tt it->second} is the same as printing the
467TargetValue plus its Weight.
468
In the {\em demos}\/ directory you will find a complete example in {\tt api\_test6}.
470
471{\bf Warning}: it is possible to search the Timbl code for the
472internal representation of the TargetValue and ValueDistribution
473objects, but please DON'T DO THAT.  The representation might change
474between Timbl versions.
475
476\section{Classify functions: neighborSets}
477
478A more flexible way of classifying is to use one of these functions:
479
480\begin{footnotesize}
481\begin{verbatim}
482  const neighborSet *classifyNS( const std::string& );
483  bool classifyNS( const std::string&, neighborSet& );
484\end{verbatim}
485\end{footnotesize}
486
The first function will classify an instance and return a pointer to a
{\tt neighborSet} object. This object may be seen as a container
which holds both distances and distributions up to a certain depth
(which is {\em at least}\/ the number of neighbors, the {\tt -k} option,
that was used for the classification task).  It is a const object, so you
cannot directly manipulate its internals, but there are some
functions defined to get useful information out of the neighborSet.
494
{\bf Important}: the neighborSet {\em will be overwritten}\/ on the next
call to any of the classify functions. Be sure to get all the
results out before that happens.
498
To make life easy, a second variant can be used, which fills a
neighborSet object that you provide (the same could be achieved by
copying the result of the first function).

{\bf Note}: neighborSets can be large, and copying them is therefore
expensive, so you should only do this if you really have to.
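In code, a minimal sketch of both variants:

\begin{footnotesize}
\begin{verbatim}
  const Timbl::neighborSet *nb
    = My_Experiment->classifyNS( "=,=,=,=,+,k,e,=,-,r,@,l,T" );
  // or fill a copy of our own that survives the next call:
  Timbl::neighborSet myNb;
  My_Experiment->classifyNS( "=,=,=,=,+,k,e,=,-,r,@,l,T", myNb );
\end{verbatim}
\end{footnotesize}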
505
506\subsection{How to get results from a neighborSet}
507
No metric functions (such as exponential decay and the like) are
performed on the neighborSet. You are free to insert your own metrics, or
use Timbl's built-in metrics.
511
512\begin{footnotesize}
513\begin{verbatim}
514  double getDistance( size_t n ) const;
515  double bestDistance() const;
516  const ValueDistribution *getDistribution( size_t n ) const;
517  ValueDistribution *bestDistribution( const decayStruct * ds=0,
518                                       size_t n=0 ) const ;
519\end{verbatim}
520\end{footnotesize}
521
522{\tt getDistance( n )} will return the distance of the neighbor(s) at n.
523{\tt bestDistance()} is simply {\tt getDistance(0)}.
524
525{\tt getDistribution( n )} will return the distribution of neighbor(s) at
526n.
527
{\tt bestDistribution()} will return the weighted distribution
calculated using the first n elements in the container and a metric
specified by the {\tt decayStruct}.  The default n=0 means: use the
whole container. A null decayStruct (the default) means zeroDecay.
532
533The returned ValueDistribution object is handed to you, and you are
534responsible for deleting it after using it (see the previous section
535for more details about ValueDistributions).
536
537A decayStruct is one of:
538
539\begin{footnotesize}
540\begin{verbatim}
541  class zeroDecay();
542  class invLinDecay();
543  class invDistDecay();
544  class expDecay( double alpha );
545  class expDecay( double alpha, double beta );
546\end{verbatim}
547\end{footnotesize}
548
For example, to get a ValueDistribution from a neighborSet {\tt nb}, using
3 neighbors and exponential decay with alpha=0.3, you can do:

\begin{footnotesize}
\begin{verbatim}
  decayStruct *dc = new expDecay(0.3);
  ValueDistribution *vd = nb->bestDistribution( dc, 3 );
  // ... use vd, then clean up: we own the returned object
  delete vd; delete dc;
\end{verbatim}
\end{footnotesize}
558
559
560\subsection{Useful operations on neighborSet objects}
561
562You can print neighborSet objects:
563
564\begin{footnotesize}
565\begin{verbatim}
566    std::ostream& operator<<( std::ostream&, const neighborSet& );
567    std::ostream& operator<<( std::ostream&, const neighborSet * );
568\end{verbatim}
569\end{footnotesize}
570
You may create neighborSet objects yourself, and copy, assign, and delete them:
572
573\begin{footnotesize}
574\begin{verbatim}
575    neighborSet();
576    neighborSet( const neighborSet& );
577    neighborSet& operator=( const neighborSet& );
578    ~neighborSet();
579\end{verbatim}
580\end{footnotesize}
581
If you create a neighborSet, you might want to reserve space for it,
to avoid needless reallocations. A neighborSet can also be cleared, and
you can ask its size (just like with normal containers):
585
586\begin{footnotesize}
587\begin{verbatim}
588    void reserve( size_t );
589    void clear();
590    size_t size() const;
591\end{verbatim}
592\end{footnotesize}
593
594Two neighborSets can be merged:
595
596\begin{footnotesize}
597\begin{verbatim}
598    void merge( const neighborSet& );
599\end{verbatim}
600\end{footnotesize}
601
A neighborSet can be truncated at a certain level. This is useful
after merging neighborSets: merging sets with depths k and n will
result in a set with a depth somewhere in the range $[\max(k,n), k+n]$.
605
606\begin{footnotesize}
607\begin{verbatim}
608    void truncate( size_t );
609\end{verbatim}
610\end{footnotesize}
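For example, a sketch that combines two sets and keeps at most five
levels (the sets {\tt setA} and {\tt setB} are assumed to have been
filled elsewhere, e.g.\ by {\tt classifyNS()}):

\begin{footnotesize}
\begin{verbatim}
  Timbl::neighborSet setA, setB;  // filled elsewhere
  Timbl::neighborSet total;
  total.merge( setA );
  total.merge( setB );
  total.truncate( 5 );
\end{verbatim}
\end{footnotesize}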
611
612\chapter{Advanced Functions}
613
614\section{Modifying the InstanceBase}
615
The InstanceBase can be modified with the following functions:
617
618\begin{footnotesize}
619\begin{verbatim}
620  bool Increment( const std::string& Line );
621  bool Decrement( const std::string& Line );
622\end{verbatim}
623\end{footnotesize}
624
These functions add an instance (as described by Line) to the
InstanceBase, or remove it.  This can only be done for IB1-like
experiments (IB1, IB2, CV and LOO), and forces a lot of
statistical recalculation.
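For example, with an instance line from the {\tt dimin} data:

\begin{footnotesize}
\begin{verbatim}
  My_Experiment->Increment( "=,=,=,=,+,k,e,=,-,r,@,l,T" );
  // ... classify or test ...
  My_Experiment->Decrement( "=,=,=,=,+,k,e,=,-,r,@,l,T" );
\end{verbatim}
\end{footnotesize}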
629
630More sophisticated are:
631
632\begin{footnotesize}
633\begin{verbatim}
634  bool Expand( const std::string& File  );
635  bool Remove( const std::string& File );
636\end{verbatim}
637\end{footnotesize}
638
which use the contents of File to do Increments or Decrements in bulk,
recalculating only afterwards.
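A minimal sketch (the filenames are arbitrary):

\begin{footnotesize}
\begin{verbatim}
  My_Experiment->Expand( "extra-instances.train" );
  My_Experiment->Remove( "obsolete-instances.train" );
\end{verbatim}
\end{footnotesize}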
641
642\section{Getting more information out of Timbl}
643
644There are a few convenience functions to get extra information on
645TiMBL and its behaviour:
646
647\begin{footnotesize}
648\begin{verbatim}
649  bool WriteNamesFile( const std::string& f );
650\end{verbatim}
651\end{footnotesize}
652
Creates a file which resembles a C4.5 names file.
654
655\begin{footnotesize}
656\begin{verbatim}
  Algorithm Algo();
658\end{verbatim}
659\end{footnotesize}
660
Gives the current algorithm as a value of the enumerated type
Algorithm, which is declared as:
663
664\begin{footnotesize}
665\begin{verbatim}
666  enum Algorithm { UNKNOWN_ALG, IB1, IB2, IGTREE,
667                   TRIBL, TRIBL2, LOO, CV };
668\end{verbatim}
669\end{footnotesize}
670
671This can be printed with the helper function:
672
673\begin{footnotesize}
674\begin{verbatim}
  const std::string to_string( const Algorithm );
676\end{verbatim}
677\end{footnotesize}
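For example, a small sketch (assuming the Algorithm values and {\tt
to\_string()} live in the {\tt Timbl} namespace):

\begin{footnotesize}
\begin{verbatim}
  if ( My_Experiment->Algo() == Timbl::IGTREE )
    std::cout << "algorithm: "
              << Timbl::to_string( My_Experiment->Algo() )
              << std::endl;
\end{verbatim}
\end{footnotesize}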
678
679\begin{footnotesize}
680\begin{verbatim}
  Weighting CurrentWeighting();
682\end{verbatim}
683\end{footnotesize}
684
Gives the current weighting as a value of the enumerated type Weighting.
686
687Declaration of Weighting:
688
689\begin{footnotesize}
690\begin{verbatim}
691  enum Weighting { UNKNOWN_W, UD, NW, GR, IG, X2, SV };
692\end{verbatim}
693\end{footnotesize}
694
695This can be printed with the helper function:
696
697\begin{footnotesize}
698\begin{verbatim}
  const std::string to_string( const Weighting );
700\end{verbatim}
701\end{footnotesize}
702
703
704\begin{footnotesize}
705\begin{verbatim}
  Weighting CurrentWeightings( std::vector<double>& v );
707\end{verbatim}
708\end{footnotesize}
709
Returns the current weighting as a value of the enumerated type Weighting,
and also fills the vector v with all the current values of this weighting.
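A small sketch that prints the weight of each feature (again assuming
the {\tt Timbl} namespace):

\begin{footnotesize}
\begin{verbatim}
  std::vector<double> w;
  Timbl::Weighting wgh = My_Experiment->CurrentWeightings( w );
  std::cout << "weighting: " << Timbl::to_string( wgh ) << std::endl;
  for ( size_t i = 0; i < w.size(); ++i )
    std::cout << "feature " << i+1 << " : " << w[i] << std::endl;
\end{verbatim}
\end{footnotesize}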
712
713\begin{footnotesize}
714\begin{verbatim}
  std::string& ExpName();
716\end{verbatim}
717\end{footnotesize}
718
Returns the value of 'name' given at the construction of the experiment.
720
721\begin{footnotesize}
722\begin{verbatim}
  static std::string VersionInfo( bool full = false );
724\end{verbatim}
725\end{footnotesize}
726
Returns a string containing the version number, the revision, and the
revision string of the current API implementation. If full is true,
information about the date and time of compilation is also included.
730
731\chapter{Server mode}
732\label{Using TiMBL as a Server}
733
734\begin{footnotesize}
735\begin{verbatim}
736  bool StartServer( const int port, const int max_c );
737\end{verbatim}
738\end{footnotesize}
739
Starts a TimblServer on 'port', allowing at most 'max\_c' concurrent
connections to it. Starting a server makes sense only after the
experiment is trained.
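For example (the port number and connection limit are arbitrary):

\begin{footnotesize}
\begin{verbatim}
  TimblAPI *My_Experiment = new TimblAPI( "-a IGTREE", "server1" );
  My_Experiment->Learn( "dimin.train" );
  My_Experiment->StartServer( 7000, 10 );
\end{verbatim}
\end{footnotesize}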
743
744\clearpage
745\chapter{Annotated example programs}
746
747\subsection{example 1, {\tt api\_test1.cxx}}
748\begin{footnotesize}
749\begin{verbatim}
750#include "TimblAPI.h"
751int main(){
752  TimblAPI My_Experiment( "-a IGTREE +vDI+DB+F", "test1" );
753  My_Experiment.SetOptions( "-w3 -vDB" );
754  My_Experiment.ShowSettings( std::cout );
755  My_Experiment.Learn( "dimin.train" );
756  My_Experiment.Test( "dimin.test", "my_first_test.out" );
757  My_Experiment.SetOptions( "-mM" );
758  My_Experiment.Test( "dimin.test", "my_first_test.out" );
759}
760\end{verbatim}
761\end{footnotesize}
762
763
764Output:
765\begin{footnotesize}
766\begin{verbatim}
767Current Experiment Settings :
768FLENGTH              : 0
769MAXBESTS             : 500
770TRIBL_OFFSET         : 0
771INPUTFORMAT          : Unknown
772TREE_ORDER           : Unknown
773ALL_WEIGHTS          : false
774WEIGHTING            : x2                               [Note 1]
775BIN_SIZE             : 20
776IB2_OFFSET           : 0
777KEEP_DISTRIBUTIONS   : false
778DO_SLOPPY_LOO        : false
779TARGET_POS           : 18446744073709551615
780DO_SILLY             : false
781DO_DIVERSIFY         : false
782DECAY                : Z
783SEED                 : -1
784BEAM_SIZE            : 0
785DECAYPARAM_A         : 1.00000
786DECAYPARAM_B         : 1.00000
787NORMALISATION        : None
788NORMFACTOR           : 1.00000
789EXEMPLAR_WEIGHTS     : false
790IGNORE_EXEMPLAR_WEIGHTS : true
791NO_EXEMPLAR_WEIGHTS_TEST : true
792VERBOSITY            : F+DI                             [Note 2]
793EXACT_MATCH          : false
794HASHED_TREE          : true
795GLOBAL_METRIC        : O
796METRICS              :
797MVD_LIMIT            : 1
798NEIGHBORS            : 1
799PROGRESS             : 100000
800CLIP_FACTOR          : 10
801
802Examine datafile 'dimin.train' gave the following results:
803Number of Features: 12
804InputFormat       : C4.5
805
806-test1-Phase 1: Reading Datafile: dimin.train
807-test1-Start:          0 @ Mon May 31 11:03:34 2010
808-test1-Finished:    2999 @ Mon May 31 11:03:34 2010
809-test1-Calculating Entropy         Mon May 31 11:03:34 2010
810Lines of data     : 2999
811DB Entropy        : 1.6178929
812Number of Classes : 5
813
814Feats   Vals    X-square        Variance        InfoGain        GainRatio
815    1      3    128.41828       0.021410184     0.030971064     0.024891536
816    2     50    364.75812       0.030406645     0.060860038     0.027552191
817    3     19    212.29804       0.017697402     0.039562857     0.018676787
818    4     37    449.83823       0.037499019     0.052541227     0.052620750
819    5      3    288.87218       0.048161417     0.074523225     0.047699231
820    6     61    415.64113       0.034648310     0.10604433      0.024471911
821    7     20    501.33465       0.041791818     0.12348668      0.034953203
822    8     69    367.66021       0.030648567     0.097198760     0.043983864
823    9      2    169.36962       0.056475363     0.045752381     0.046816705
824   10     64    914.61906       0.076243669     0.21388759      0.042844587
825   11     18    2807.0418       0.23399815      0.66970458      0.18507018
826   12     43    7160.3682       0.59689631      1.2780762       0.32537181
827
828Feature Permutation based on Chi-Squared :
829< 12, 11, 10, 7, 4, 6, 8, 2, 5, 3, 9, 1 >
830-test1-Phase 2: Building index on Datafile: dimin.train
831-test1-Start:          0 @ Mon May 31 11:03:34 2010
832-test1-Finished:    2999 @ Mon May 31 11:03:34 2010
833-test1-
834Phase 3: Learning from Datafile: dimin.train
835-test1-Start:          0 @ Mon May 31 11:03:34 2010
836-test1-Finished:    2999 @ Mon May 31 11:03:34 2010
837
838Size of InstanceBase = 148 Nodes, (5920 bytes), 99.61 % compression
839Examine datafile 'dimin.test' gave the following results:
840Number of Features: 12
841InputFormat       : C4.5
842
843
844Starting to test, Testfile: dimin.test
845Writing output in:          my_first_test.out
846Algorithm     : IGTree
847Weighting     : Chi-square
848Feature 1        : 128.418283576224439
849Feature 2        : 364.758115277811896
850Feature 3        : 212.298037236345095
851Feature 4        : 449.838231470681876
852Feature 5        : 288.872176256387263
853Feature 6        : 415.641126446691771
854Feature 7        : 501.334653478280984
855Feature 8        : 367.660212489714240
856Feature 9        : 169.369615106487458
857Feature 10       : 914.619058199288816
858Feature 11       : 2807.041753278295346
859Feature 12       : 7160.368151902808677
860
861-test1-Tested:      1 @ Mon May 31 11:03:34 2010
862-test1-Tested:      2 @ Mon May 31 11:03:34 2010
863-test1-Tested:      3 @ Mon May 31 11:03:34 2010
864-test1-Tested:      4 @ Mon May 31 11:03:34 2010
865-test1-Tested:      5 @ Mon May 31 11:03:34 2010
866-test1-Tested:      6 @ Mon May 31 11:03:34 2010
867-test1-Tested:      7 @ Mon May 31 11:03:34 2010
868-test1-Tested:      8 @ Mon May 31 11:03:34 2010
869-test1-Tested:      9 @ Mon May 31 11:03:34 2010
870-test1-Tested:     10 @ Mon May 31 11:03:34 2010
871-test1-Tested:    100 @ Mon May 31 11:03:34 2010
872-test1-Ready:     950 @ Mon May 31 11:03:34 2010
873Seconds taken: 0.1331 (7135.13 p/s)
874
875overall accuracy:        0.962105  (914/950)
876Examine datafile 'dimin.test' gave the following results:
877Number of Features: 12
878InputFormat       : C4.5
879
880Warning:-test1-Metric must be Overlap for IGTree test.     [Note 3]
881
882\end{verbatim}
883\end{footnotesize}
884
885
886Notes:
887\begin{enumerate}
\item The {\tt -w3} of the first {\tt SetOptions()} overrules the
  default weighting, resulting in a weighting of 3, Chi-squared
  (shown as {\tt x2} in the settings).
\item The constructor sets the verbosity with {\tt +vDI+DB+F}. The
first {\tt SetOptions()}, however, subtracts {\tt DB} with {\tt -vDB},
and the resulting verbosity is therefore {\tt F+DI}.
\item The second {\tt SetOptions()} ({\tt -mM}) sets the default metric to
MVDM; this is however not applicable to IGTREE. This raises a warning
when we start to test.
896\end{enumerate}
897
898Result in my\_first\_test.out (first 20 lines):
899\begin{footnotesize}
900\begin{verbatim}
901=,=,=,=,=,=,=,=,+,p,e,=,T,T        6619.8512628162
902=,=,=,=,+,k,u,=,-,bl,u,m,E,P        2396.8557978603
903+,m,I,=,-,d,A,G,-,d,},t,J,J        6619.8512628162
904-,t,@,=,-,l,|,=,-,G,@,n,T,T        6619.8512628162
905-,=,I,n,-,str,y,=,+,m,E,nt,J,J        6619.8512628162
906=,=,=,=,=,=,=,=,+,br,L,t,J,J        6619.8512628162
907=,=,=,=,+,zw,A,=,-,m,@,r,T,T        6619.8512628162
908=,=,=,=,-,f,u,=,+,dr,a,l,T,T        6619.8512628162
909=,=,=,=,=,=,=,=,+,l,e,w,T,T        13780.219414719
910=,=,=,=,+,tr,K,N,-,k,a,rt,J,J        6619.8512628162
911=,=,=,=,+,=,o,=,-,p,u,=,T,T        3812.8095095379
912=,=,=,=,=,=,=,=,+,l,A,m,E,E        3812.8095095379
913=,=,=,=,=,=,=,=,+,l,A,p,J,J        6619.8512628162
914=,=,=,=,=,=,=,=,+,sx,E,lm,P,P        6619.8512628162
915+,l,a,=,-,d,@,=,-,k,A,st,J,J        6619.8512628162
916-,s,i,=,-,f,E,r,-,st,O,k,J,J        6619.8512628162
917=,=,=,=,=,=,=,=,+,sp,a,n,T,T        6619.8512628162
918=,=,=,=,=,=,=,=,+,st,o,t,J,J        6619.8512628162
919=,=,=,=,+,sp,a,r,-,b,u,k,J,J        6619.8512628162
920+,h,I,N,-,k,@,l,-,bl,O,k,J,J        6619.8512628162
921\end{verbatim}
922\end{footnotesize}
923\clearpage
924
925\subsection{example 2, {\tt api\_test2.cxx}}
926
927This demonstrates IB2 learning. Our example program:
928
929\begin{footnotesize}
930\begin{verbatim}
931#include "TimblAPI.h"
932int main(){
933  TimblAPI *My_Experiment = new TimblAPI( "-a IB2 +vF+DI+DB" ,
934                                          "test2" );
935  My_Experiment->SetOptions( "-b100" );
936  My_Experiment->ShowSettings( std::cout );
937  My_Experiment->Learn( "dimin.train" );
938  My_Experiment->Test( "dimin.test", "my_second_test.out" );
939  delete My_Experiment;
  exit(0);
941}
942\end{verbatim}
943\end{footnotesize}
944
945We create an experiment for the IB2 algorithm, with the {\tt -b} option set
946to 100, so the first 100 lines of {\tt dimin.train} will be used to
947bootstrap the learning, as we can see from the output:
948
949\begin{footnotesize}
950\begin{verbatim}
951Current Experiment Settings :
952FLENGTH              : 0
953MAXBESTS             : 500
954TRIBL_OFFSET         : 0
955INPUTFORMAT          : Unknown
956TREE_ORDER           : G/V
957ALL_WEIGHTS          : false
958WEIGHTING            : gr
959BIN_SIZE             : 20
960IB2_OFFSET           : 100
961KEEP_DISTRIBUTIONS   : false
962DO_SLOPPY_LOO        : false
963TARGET_POS           : 4294967295
964DO_SILLY             : false
965DO_DIVERSIFY         : false
966DECAY                : Z
967SEED                 : -1
968BEAM_SIZE            : 0
969DECAYPARAM_A         : 1.00000
970DECAYPARAM_B         : 1.00000
971NORMALISATION        : None
972NORM_FACTOR          : 1.00000
973EXEMPLAR_WEIGHTS     : false
974IGNORE_EXEMPLAR_WEIGHTS : true
975NO_EXEMPLAR_WEIGHTS_TEST : true
976VERBOSITY            : F+DI+DB
977EXACT_MATCH          : false
978HASHED_TREE          : true
979GLOBAL_METRIC        : O
980METRICS              :
981MVD_LIMIT            : 1
982NEIGHBORS            : 1
983PROGRESS             : 100000
984CLIP_FACTOR          : 10
985
986Examine datafile 'dimin.train' gave the following results:
987Number of Features: 12
988InputFormat       : C4.5
989
990-test2-Phase 1: Reading Datafile: dimin.train
991-test2-Start:          0 @ Mon May 31 11:03:34 2010
992-test2-Finished:    2999 @ Mon May 31 11:03:34 2010
993-test2-Calculating Entropy         Mon May 31 11:03:34 2010
994Lines of data     : 2999                                  [Note 1]
995DB Entropy        : 1.6178929
996Number of Classes : 5
997
998Feats	Vals	InfoGain	GainRatio
999    1      3	0.030971064	0.024891536
1000    2     50	0.060860038	0.027552191
1001    3     19	0.039562857	0.018676787
1002    4     37	0.052541227	0.052620750
1003    5      3	0.074523225	0.047699231
1004    6     61	0.10604433	0.024471911
1005    7     20	0.12348668	0.034953203
1006    8     69	0.097198760	0.043983864
1007    9      2	0.045752381	0.046816705
1008   10     64	0.21388759	0.042844587
1009   11     18	0.66970458	0.18507018
1010   12     43	1.2780762	0.32537181
1011
1012Feature Permutation based on GainRatio/Values :
1013< 9, 5, 11, 1, 12, 7, 4, 3, 10, 8, 2, 6 >
1014-test2-Phase 2: Learning from Datafile: dimin.train
1015-test2-Start:          0 @ Mon May 31 11:03:34 2010
1016-test2-Finished:     100 @ Mon May 31 11:03:34 2010
1017
1018Size of InstanceBase = 954 Nodes, (38160 bytes), 26.62 % compression
1019-test2-Phase 2: Appending from Datafile: dimin.train (starting at line 101)
1020-test2-Start:        101 @ Mon May 31 11:03:34 2010
1021-test2-Learning:     101 @ Mon May 31 11:03:34 2010	 added:0
1022-test2-Learning:     102 @ Mon May 31 11:03:34 2010	 added:0
1023-test2-Learning:     103 @ Mon May 31 11:03:34 2010	 added:0
1024-test2-Learning:     104 @ Mon May 31 11:03:34 2010	 added:0
1025-test2-Learning:     105 @ Mon May 31 11:03:34 2010	 added:0
1026-test2-Learning:     106 @ Mon May 31 11:03:34 2010	 added:0
1027-test2-Learning:     107 @ Mon May 31 11:03:34 2010	 added:0
1028-test2-Learning:     108 @ Mon May 31 11:03:34 2010	 added:0
1029-test2-Learning:     109 @ Mon May 31 11:03:34 2010	 added:0
1030-test2-Learning:     110 @ Mon May 31 11:03:34 2010	 added:0
1031-test2-Learning:     200 @ Mon May 31 11:03:34 2010	 added:9
1032-test2-Learning:    1100 @ Mon May 31 11:03:34 2010	 added:66
1033-test2-Finished:    2999 @ Mon May 31 11:03:35 2010
1034
1035in total added 173 new entries                                      [Note 2]
1036
1037Size of InstanceBase = 2232 Nodes, (89280 bytes), 32.40 % compression
1038DB Entropy        : 1.61789286
1039Number of Classes : 5
1040
1041Feats	Vals	InfoGain	GainRatio
1042    1      3	0.03097106	0.02489154
1043    2     50	0.06086004	0.02755219
1044    3     19	0.03956286	0.01867679
1045    4     37	0.05254123	0.05262075
1046    5      3	0.07452322	0.04769923
1047    6     61	0.10604433	0.02447191
1048    7     20	0.12348668	0.03495320
1049    8     69	0.09719876	0.04398386
1050    9      2	0.04575238	0.04681670
1051   10     64	0.21388759	0.04284459
1052   11     18	0.66970458	0.18507018
1053   12     43	1.27807625	0.32537181
1054
1055Examine datafile 'dimin.test' gave the following results:
1056Number of Features: 12
1057InputFormat       : C4.5
1058
1059
1060Starting to test, Testfile: dimin.test
1061Writing output in:          my_second_test.out
1062Algorithm     : IB2
1063Global metric : Overlap
1064Deviant Feature Metrics:(none)
1065Weighting     : GainRatio
1066Feature 1	 : 0.026241147173103
1067Feature 2	 : 0.030918769841214
1068Feature 3	 : 0.021445836516602
1069Feature 4	 : 0.056561885447060
1070Feature 5	 : 0.048311436541460
1071Feature 6	 : 0.027043360641622
1072Feature 7	 : 0.037453180788027
1073Feature 8	 : 0.044999091421718
1074Feature 9	 : 0.048992032381874
1075Feature 10	 : 0.044544230779268
1076Feature 11	 : 0.185449683494634
1077Feature 12	 : 0.324719540921155
1078
1079-test2-Tested:      1 @ Mon May 31 11:03:35 2010
1080-test2-Tested:      2 @ Mon May 31 11:03:35 2010
1081-test2-Tested:      3 @ Mon May 31 11:03:35 2010
1082-test2-Tested:      4 @ Mon May 31 11:03:35 2010
1083-test2-Tested:      5 @ Mon May 31 11:03:35 2010
1084-test2-Tested:      6 @ Mon May 31 11:03:35 2010
1085-test2-Tested:      7 @ Mon May 31 11:03:35 2010
1086-test2-Tested:      8 @ Mon May 31 11:03:35 2010
1087-test2-Tested:      9 @ Mon May 31 11:03:35 2010
1088-test2-Tested:     10 @ Mon May 31 11:03:35 2010
1089-test2-Tested:    100 @ Mon May 31 11:03:35 2010
1090-test2-Ready:     950 @ Mon May 31 11:03:35 2010
1091Seconds taken: 0.0456 (20826.48 p/s)
1092
1093overall accuracy:        0.941053  (894/950), of which 15 exact matches
1094                                                         [Note 3]
1095There were 43 ties of which 32 (74.42%) were correctly resolved
1096\end{verbatim}
1097\end{footnotesize}
1098
1099
1100Notes:
1101\begin{enumerate}
1102\item IB2 is bootstrapped with 100 lines, but for the statistics all 2999
1103 lines are used.
\item As we see here, 173 entries from the input file had a mismatch,
and were therefore entered into the InstanceBase.
1106\item We see that IB2 scores 94.11 \%, compared to 96.21 \% for IGTREE
1107  in our first example.  For this data, IB2 is not a good
1108  algorithm. However, it saves a lot of space, and is faster than
1109  IB1. Yet, IGTREE is both faster and better. Had we used IB1, the
1110  score would have been 96.84 \%.
1111\end{enumerate}
1112\clearpage
1113
1114\subsection{example 3, {\tt api\_test3.cxx}}
1115
1116This demonstrates Cross Validation. Let's try the following program:
1117
1118\begin{footnotesize}
1119\begin{verbatim}
1120#include "TimblAPI.h"
1121using Timbl::TimblAPI;
1122
1123int main(){
1124  TimblAPI *My_Experiment = new TimblAPI( "-t cross_validate" );
1125  My_Experiment->Test( "cross_val.test" );
1126  delete My_Experiment;
1127  exit(0);
1128}
1129\end{verbatim}
1130\end{footnotesize}
1131
This program creates an experiment, which defaults to IB1 and, because
of the special option ``-t cross\_validate'', will start a
cross-validation experiment.\\
{\tt Learn()} is not possible now; we must use a special form of {\tt Test()}.
1136
1137``cross\_val.test'' is a file with the following content:
1138\begin{footnotesize}
1139\begin{verbatim}
1140small_1.train
1141small_2.train
1142small_3.train
1143small_4.train
1144small_5.train
1145\end{verbatim}
1146\end{footnotesize}
1147
1148
All these files contain an equal part of a bigger dataset, and
My\_Experiment will run a cross-validation test over these files.
Note that the output filenames are generated automatically; you cannot
influence them.
1153
1154The output of this program is:
1155
1156\begin{footnotesize}
1157\begin{verbatim}
1158Starting Cross validation test on files:
1159small_1.train
1160small_2.train
1161small_3.train
1162small_4.train
1163small_5.train
1164Examine datafile 'small_1.train' gave the following results:
1165Number of Features: 8
1166InputFormat       : C4.5
1167
1168
1169Starting to test, Testfile: small_1.train
1170Writing output in:          small_1.train.cv
1171Algorithm     : CV
1172Global metric : Overlap
1173Deviant Feature Metrics:(none)
1174Weighting     : GainRatio
1175
1176Tested:      1 @ Mon May 31 11:03:35 2010
1177Tested:      2 @ Mon May 31 11:03:35 2010
1178Tested:      3 @ Mon May 31 11:03:35 2010
1179Tested:      4 @ Mon May 31 11:03:35 2010
1180Tested:      5 @ Mon May 31 11:03:35 2010
1181Tested:      6 @ Mon May 31 11:03:35 2010
1182Tested:      7 @ Mon May 31 11:03:35 2010
1183Tested:      8 @ Mon May 31 11:03:35 2010
1184Tested:      9 @ Mon May 31 11:03:35 2010
1185Tested:     10 @ Mon May 31 11:03:35 2010
1186Ready:      10 @ Mon May 31 11:03:35 2010
1187Seconds taken: 0.0006 (16207.46 p/s)
1188
1189overall accuracy:        0.800000  (8/10)
1190Examine datafile 'small_2.train' gave the following results:
1191Number of Features: 8
1192InputFormat       : C4.5
1193
1194
1195Starting to test, Testfile: small_2.train
1196Writing output in:          small_2.train.cv
1197Algorithm     : CV
1198Global metric : Overlap
1199Deviant Feature Metrics:(none)
1200Weighting     : GainRatio
1201
1202Tested:      1 @ Mon May 31 11:03:35 2010
1203Tested:      2 @ Mon May 31 11:03:35 2010
1204Tested:      3 @ Mon May 31 11:03:35 2010
1205Tested:      4 @ Mon May 31 11:03:35 2010
1206Tested:      5 @ Mon May 31 11:03:35 2010
1207Tested:      6 @ Mon May 31 11:03:35 2010
1208Tested:      7 @ Mon May 31 11:03:35 2010
1209Tested:      8 @ Mon May 31 11:03:35 2010
1210Tested:      9 @ Mon May 31 11:03:35 2010
1211Tested:     10 @ Mon May 31 11:03:35 2010
1212Ready:      10 @ Mon May 31 11:03:35 2010
1213Seconds taken: 0.0005 (19646.37 p/s)
1214
1215overall accuracy:        0.800000  (8/10)
1216Examine datafile 'small_3.train' gave the following results:
1217Number of Features: 8
1218InputFormat       : C4.5
1219
1220
1221Starting to test, Testfile: small_3.train
1222Writing output in:          small_3.train.cv
1223Algorithm     : CV
1224Global metric : Overlap
1225Deviant Feature Metrics:(none)
1226Weighting     : GainRatio
1227
1228Tested:      1 @ Mon May 31 11:03:35 2010
1229Tested:      2 @ Mon May 31 11:03:35 2010
1230Tested:      3 @ Mon May 31 11:03:35 2010
1231Tested:      4 @ Mon May 31 11:03:35 2010
1232Tested:      5 @ Mon May 31 11:03:35 2010
1233Tested:      6 @ Mon May 31 11:03:35 2010
1234Tested:      7 @ Mon May 31 11:03:35 2010
1235Tested:      8 @ Mon May 31 11:03:35 2010
1236Tested:      9 @ Mon May 31 11:03:35 2010
1237Tested:     10 @ Mon May 31 11:03:35 2010
1238Ready:      10 @ Mon May 31 11:03:35 2010
1239Seconds taken: 0.0005 (20202.02 p/s)
1240
1241overall accuracy:        0.900000  (9/10)
1242Examine datafile 'small_4.train' gave the following results:
1243Number of Features: 8
1244InputFormat       : C4.5
1245
1246
1247Starting to test, Testfile: small_4.train
1248Writing output in:          small_4.train.cv
1249Algorithm     : CV
1250Global metric : Overlap
1251Deviant Feature Metrics:(none)
1252Weighting     : GainRatio
1253
1254Tested:      1 @ Mon May 31 11:03:35 2010
1255Tested:      2 @ Mon May 31 11:03:35 2010
1256Tested:      3 @ Mon May 31 11:03:35 2010
1257Tested:      4 @ Mon May 31 11:03:35 2010
1258Tested:      5 @ Mon May 31 11:03:35 2010
1259Tested:      6 @ Mon May 31 11:03:35 2010
1260Tested:      7 @ Mon May 31 11:03:35 2010
1261Tested:      8 @ Mon May 31 11:03:35 2010
1262Tested:      9 @ Mon May 31 11:03:35 2010
1263Tested:     10 @ Mon May 31 11:03:35 2010
1264Ready:      10 @ Mon May 31 11:03:35 2010
1265Seconds taken: 0.0005 (19880.72 p/s)
1266
1267overall accuracy:        0.800000  (8/10)
1268Examine datafile 'small_5.train' gave the following results:
1269Number of Features: 8
1270InputFormat       : C4.5
1271
1272
1273Starting to test, Testfile: small_5.train
1274Writing output in:          small_5.train.cv
1275Algorithm     : CV
1276Global metric : Overlap
1277Deviant Feature Metrics:(none)
1278Weighting     : GainRatio
1279
1280Tested:      1 @ Mon May 31 11:03:35 2010
1281Tested:      2 @ Mon May 31 11:03:35 2010
1282Tested:      3 @ Mon May 31 11:03:35 2010
1283Tested:      4 @ Mon May 31 11:03:35 2010
1284Tested:      5 @ Mon May 31 11:03:35 2010
1285Tested:      6 @ Mon May 31 11:03:35 2010
1286Tested:      7 @ Mon May 31 11:03:35 2010
1287Tested:      8 @ Mon May 31 11:03:35 2010
1288Ready:       8 @ Mon May 31 11:03:35 2010
1289Seconds taken: 0.0004 (19093.08 p/s)
1290
1291overall accuracy:        1.000000  (8/8)
1292\end{verbatim}
1293\end{footnotesize}
1294
1295
1296What has happened here?
1297
1298\begin{enumerate}
\item TiMBL trained itself with the input files small\_2.train through
small\_5.train (in fact using the {\tt Expand()} API call).
\item Then TiMBL tested small\_1.train against the InstanceBase.
\item Next, small\_2.train is removed from the database (API call {\tt
Remove()}) and small\_1.train is added.
1304\item Then small\_2.train is tested against the InstanceBase.
1305\item And so forth with small\_3.train $\ldots$
1306\end{enumerate}
1307\clearpage
1308
1309\subsection{example 4, {\tt api\_test4.cxx}}
1310
This program demonstrates adding to and deleting from the InstanceBase.  It
also proves that weights are (re)calculated correctly each time (which
also explains why this is a time-consuming thing to do). After running
this program, wg.1.wgt should be equal to wg.5.wgt, and wg.2.wgt equal to
wg.4.wgt. Note also that, since we do not use a weighting of X2 or SV
here, only the ``simple'' weights are calculated and stored.

Further, arr.1.arr should be equal to arr.5.arr, and arr.2.arr should be
equal to arr.4.arr.
1321
1322First the program:
1323
1324\begin{footnotesize}
1325\begin{verbatim}
1326#include <iostream>
1327#include "TimblAPI.h"
1328
1329int main(){
1330  TimblAPI *My_Experiment = new TimblAPI( "-a IB1 +vDI+DB +mM" ,
1331                                          "test4" );
1332  My_Experiment->ShowSettings( std::cout );
1333  My_Experiment->Learn( "dimin.train" );
1334  My_Experiment->Test( "dimin.test", "inc1.out" );
1335  My_Experiment->SaveWeights( "wg.1.wgt" );
1336  My_Experiment->WriteArrays( "arr.1.arr" );
1337  My_Experiment->Increment( "=,=,=,=,+,k,e,=,-,r,@,l,T" );
1338  My_Experiment->Test( "dimin.test", "inc2.out" );
1339  My_Experiment->SaveWeights( "wg.2.wgt" );
1340  My_Experiment->WriteArrays( "arr.2.arr" );
1341  My_Experiment->Increment( "+,zw,A,rt,-,k,O,p,-,n,O,n,E" );
1342  My_Experiment->Test( "dimin.test", "inc3.out" );
1343  My_Experiment->SaveWeights( "wg.3.wgt" );
1344  My_Experiment->WriteArrays( "arr.3.arr" );
1345  My_Experiment->Decrement( "+,zw,A,rt,-,k,O,p,-,n,O,n,E" );
1346  My_Experiment->Test( "dimin.test", "inc4.out" );
1347  My_Experiment->SaveWeights( "wg.4.wgt" );
1348  My_Experiment->WriteArrays( "arr.4.arr" );
1349  My_Experiment->Decrement( "=,=,=,=,+,k,e,=,-,r,@,l,T" );
1350  My_Experiment->Test( "dimin.test", "inc5.out" );
1351  My_Experiment->SaveWeights( "wg.5.wgt" );
1352  My_Experiment->WriteArrays( "arr.5.arr" );
1353  delete My_Experiment;
  exit(0);
1355}
1356\end{verbatim}
1357\end{footnotesize}
1358
1359
1360This produces the following output:
1361
1362\begin{footnotesize}
1363\begin{verbatim}
1364Current Experiment Settings :
1365FLENGTH              : 0
1366MAXBESTS             : 500
1367TRIBL_OFFSET         : 0
1368IG_THRESHOLD         : 1000
1369INPUTFORMAT          : Unknown
1370TREE_ORDER           : G/V
1371ALL_WEIGHTS          : false
1372WEIGHTING            : gr
1373BIN_SIZE             : 20
1374IB2_OFFSET           : 0
1375KEEP_DISTRIBUTIONS   : false
1376DO_SLOPPY_LOO        : false
1377TARGET_POS           : 18446744073709551615
1378DO_SILLY             : false
1379DO_DIVERSIFY         : false
1380DECAY                : Z
1381SEED                 : -1
1382BEAM_SIZE            : 0
1383DECAYPARAM_A         : 1.00000
1384DECAYPARAM_B         : 1.00000
1385NORMALISATION        : None
1386NORM_FACTOR          : 1.00000
1387EXEMPLAR_WEIGHTS     : false
1388IGNORE_EXEMPLAR_WEIGHTS : true
1389NO_EXEMPLAR_WEIGHTS_TEST : true
1390VERBOSITY            : DI+DB
1391EXACT_MATCH          : false
1392HASHED_TREE          : true
1393GLOBAL_METRIC        : M
1394METRICS              :
1395MVD_LIMIT            : 1
1396NEIGHBORS            : 1
1397PROGRESS             : 100000
1398CLIP_FACTOR          : 10
1399
1400Examine datafile 'dimin.train' gave the following results:
1401Number of Features: 12
1402InputFormat       : C4.5
1403
1404-test4-Phase 1: Reading Datafile: dimin.train
1405-test4-Start:          0 @ Mon May 31 11:03:35 2010
1406-test4-Finished:    2999 @ Mon May 31 11:03:35 2010
1407-test4-Calculating Entropy         Mon May 31 11:03:35 2010
1408Feature Permutation based on GainRatio/Values :
1409< 9, 5, 11, 1, 12, 7, 4, 3, 10, 8, 2, 6 >
1410-test4-Phase 2: Learning from Datafile: dimin.train
1411-test4-Start:          0 @ Mon May 31 11:03:35 2010
1412-test4-Finished:    2999 @ Mon May 31 11:03:35 2010
1413
1414Size of InstanceBase = 19231 Nodes, (769240 bytes), 49.77 % compression
1415Examine datafile 'dimin.test' gave the following results:
1416Number of Features: 12
1417InputFormat       : C4.5
1418
1419
1420Starting to test, Testfile: dimin.test
1421Writing output in:          inc1.out
1422Algorithm     : IB1
1423Global metric : Value Difference, Prestored matrix
1424Deviant Feature Metrics:(none)
1425Size of value-matrix[1] = 168 Bytes
1426Size of value-matrix[2] = 968 Bytes
1427Size of value-matrix[3] = 968 Bytes
1428Size of value-matrix[4] = 168 Bytes
1429Size of value-matrix[5] = 168 Bytes
1430Size of value-matrix[6] = 1904 Bytes
1431Size of value-matrix[7] = 1904 Bytes
1432Size of value-matrix[8] = 504 Bytes
1433Size of value-matrix[9] = 104 Bytes
1434Size of value-matrix[10] = 2904 Bytes
1435Size of value-matrix[11] = 1728 Bytes
1436Size of value-matrix[12] = 1248 Bytes
1437Total Size of value-matrices 12736 Bytes
1438
1439Weighting     : GainRatio
1440
1441-test4-Tested:      1 @ Mon May 31 11:03:35 2010
1442-test4-Tested:      2 @ Mon May 31 11:03:35 2010
1443-test4-Tested:      3 @ Mon May 31 11:03:35 2010
1444-test4-Tested:      4 @ Mon May 31 11:03:35 2010
-test4-Tested:      5 @ Mon May 31 11:03:35 2010
-test4-Tested:      6 @ Mon May 31 11:03:35 2010
-test4-Tested:      7 @ Mon May 31 11:03:35 2010
-test4-Tested:      8 @ Mon May 31 11:03:35 2010
-test4-Tested:      9 @ Mon May 31 11:03:35 2010
-test4-Tested:     10 @ Mon May 31 11:03:35 2010
-test4-Tested:    100 @ Mon May 31 11:03:35 2010
-test4-Ready:     950 @ Mon May 31 11:03:35 2010
Seconds taken: 0.0791 (12003.74 p/s)

overall accuracy:        0.964211  (916/950), of which 62 exact matches
There were 6 ties of which 6 (100.00%) were correctly resolved
-test4-Saving Weights in wg.1.wgt
-test4-Saving Probability Arrays in arr.1.arr
Examine datafile 'dimin.test' gave the following results:
Number of Features: 12
InputFormat       : C4.5


Starting to test, Testfile: dimin.test
Writing output in:          inc2.out
Algorithm     : IB1
Global metric : Value Difference, Prestored matrix
Deviant Feature Metrics:(none)
Size of value-matrix[1] = 168 Bytes
Size of value-matrix[2] = 968 Bytes
Size of value-matrix[3] = 968 Bytes
Size of value-matrix[4] = 168 Bytes
Size of value-matrix[5] = 168 Bytes
Size of value-matrix[6] = 1904 Bytes
Size of value-matrix[7] = 1904 Bytes
Size of value-matrix[8] = 504 Bytes
Size of value-matrix[9] = 104 Bytes
Size of value-matrix[10] = 2904 Bytes
Size of value-matrix[11] = 1728 Bytes
Size of value-matrix[12] = 1248 Bytes
Total Size of value-matrices 12736 Bytes

Weighting     : GainRatio

-test4-Tested:      1 @ Mon May 31 11:03:35 2010
-test4-Tested:      2 @ Mon May 31 11:03:35 2010
-test4-Tested:      3 @ Mon May 31 11:03:35 2010
-test4-Tested:      4 @ Mon May 31 11:03:35 2010
-test4-Tested:      5 @ Mon May 31 11:03:35 2010
-test4-Tested:      6 @ Mon May 31 11:03:35 2010
-test4-Tested:      7 @ Mon May 31 11:03:35 2010
-test4-Tested:      8 @ Mon May 31 11:03:35 2010
-test4-Tested:      9 @ Mon May 31 11:03:35 2010
-test4-Tested:     10 @ Mon May 31 11:03:35 2010
-test4-Tested:    100 @ Mon May 31 11:03:35 2010
-test4-Ready:     950 @ Mon May 31 11:03:35 2010
Seconds taken: 0.0866 (10965.92 p/s)

overall accuracy:        0.964211  (916/950), of which 62 exact matches
There were 6 ties of which 6 (100.00%) were correctly resolved
-test4-Saving Weights in wg.2.wgt
-test4-Saving Probability Arrays in arr.2.arr
Examine datafile 'dimin.test' gave the following results:
Number of Features: 12
InputFormat       : C4.5


Starting to test, Testfile: dimin.test
Writing output in:          inc3.out
Algorithm     : IB1
Global metric : Value Difference, Prestored matrix
Deviant Feature Metrics:(none)
Size of value-matrix[1] = 168 Bytes
Size of value-matrix[2] = 968 Bytes
Size of value-matrix[3] = 968 Bytes
Size of value-matrix[4] = 168 Bytes
Size of value-matrix[5] = 168 Bytes
Size of value-matrix[6] = 1904 Bytes
Size of value-matrix[7] = 1904 Bytes
Size of value-matrix[8] = 504 Bytes
Size of value-matrix[9] = 104 Bytes
Size of value-matrix[10] = 2904 Bytes
Size of value-matrix[11] = 1728 Bytes
Size of value-matrix[12] = 1248 Bytes
Total Size of value-matrices 12736 Bytes

Weighting     : GainRatio

-test4-Tested:      1 @ Mon May 31 11:03:35 2010
-test4-Tested:      2 @ Mon May 31 11:03:35 2010
-test4-Tested:      3 @ Mon May 31 11:03:35 2010
-test4-Tested:      4 @ Mon May 31 11:03:35 2010
-test4-Tested:      5 @ Mon May 31 11:03:35 2010
-test4-Tested:      6 @ Mon May 31 11:03:35 2010
-test4-Tested:      7 @ Mon May 31 11:03:35 2010
-test4-Tested:      8 @ Mon May 31 11:03:35 2010
-test4-Tested:      9 @ Mon May 31 11:03:35 2010
-test4-Tested:     10 @ Mon May 31 11:03:35 2010
-test4-Tested:    100 @ Mon May 31 11:03:35 2010
-test4-Ready:     950 @ Mon May 31 11:03:35 2010
Seconds taken: 0.0740 (12844.09 p/s)

overall accuracy:        0.964211  (916/950), of which 62 exact matches
There were 6 ties of which 6 (100.00%) were correctly resolved
-test4-Saving Weights in wg.3.wgt
-test4-Saving Probability Arrays in arr.3.arr
Examine datafile 'dimin.test' gave the following results:
Number of Features: 12
InputFormat       : C4.5


Starting to test, Testfile: dimin.test
Writing output in:          inc4.out
Algorithm     : IB1
Global metric : Value Difference, Prestored matrix
Deviant Feature Metrics:(none)
Size of value-matrix[1] = 168 Bytes
Size of value-matrix[2] = 968 Bytes
Size of value-matrix[3] = 968 Bytes
Size of value-matrix[4] = 168 Bytes
Size of value-matrix[5] = 168 Bytes
Size of value-matrix[6] = 1904 Bytes
Size of value-matrix[7] = 1904 Bytes
Size of value-matrix[8] = 504 Bytes
Size of value-matrix[9] = 104 Bytes
Size of value-matrix[10] = 2904 Bytes
Size of value-matrix[11] = 1728 Bytes
Size of value-matrix[12] = 1248 Bytes
Total Size of value-matrices 12736 Bytes

Weighting     : GainRatio

-test4-Tested:      1 @ Mon May 31 11:03:36 2010
-test4-Tested:      2 @ Mon May 31 11:03:36 2010
-test4-Tested:      3 @ Mon May 31 11:03:36 2010
-test4-Tested:      4 @ Mon May 31 11:03:36 2010
-test4-Tested:      5 @ Mon May 31 11:03:36 2010
-test4-Tested:      6 @ Mon May 31 11:03:36 2010
-test4-Tested:      7 @ Mon May 31 11:03:36 2010
-test4-Tested:      8 @ Mon May 31 11:03:36 2010
-test4-Tested:      9 @ Mon May 31 11:03:36 2010
-test4-Tested:     10 @ Mon May 31 11:03:36 2010
-test4-Tested:    100 @ Mon May 31 11:03:36 2010
-test4-Ready:     950 @ Mon May 31 11:03:36 2010
Seconds taken: 0.0727 (13075.49 p/s)

overall accuracy:        0.964211  (916/950), of which 62 exact matches
There were 6 ties of which 6 (100.00%) were correctly resolved
-test4-Saving Weights in wg.4.wgt
-test4-Saving Probability Arrays in arr.4.arr
Examine datafile 'dimin.test' gave the following results:
Number of Features: 12
InputFormat       : C4.5


Starting to test, Testfile: dimin.test
Writing output in:          inc5.out
Algorithm     : IB1
Global metric : Value Difference, Prestored matrix
Deviant Feature Metrics:(none)
Size of value-matrix[1] = 168 Bytes
Size of value-matrix[2] = 968 Bytes
Size of value-matrix[3] = 968 Bytes
Size of value-matrix[4] = 168 Bytes
Size of value-matrix[5] = 168 Bytes
Size of value-matrix[6] = 1904 Bytes
Size of value-matrix[7] = 1904 Bytes
Size of value-matrix[8] = 504 Bytes
Size of value-matrix[9] = 104 Bytes
Size of value-matrix[10] = 2904 Bytes
Size of value-matrix[11] = 1728 Bytes
Size of value-matrix[12] = 1248 Bytes
Total Size of value-matrices 12736 Bytes

Weighting     : GainRatio

-test4-Tested:      1 @ Mon May 31 11:03:36 2010
-test4-Tested:      2 @ Mon May 31 11:03:36 2010
-test4-Tested:      3 @ Mon May 31 11:03:36 2010
-test4-Tested:      4 @ Mon May 31 11:03:36 2010
-test4-Tested:      5 @ Mon May 31 11:03:36 2010
-test4-Tested:      6 @ Mon May 31 11:03:36 2010
-test4-Tested:      7 @ Mon May 31 11:03:36 2010
-test4-Tested:      8 @ Mon May 31 11:03:36 2010
-test4-Tested:      9 @ Mon May 31 11:03:36 2010
-test4-Tested:     10 @ Mon May 31 11:03:36 2010
-test4-Tested:    100 @ Mon May 31 11:03:36 2010
-test4-Ready:     950 @ Mon May 31 11:03:36 2010
Seconds taken: 0.0732 (12975.31 p/s)

overall accuracy:        0.964211  (916/950), of which 62 exact matches
There were 6 ties of which 6 (100.00%) were correctly resolved
-test4-Saving Weights in wg.5.wgt
-test4-Saving Probability Arrays in arr.5.arr
\end{verbatim}
\end{footnotesize}
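
The five near-identical blocks above come from five consecutive
test-and-save rounds on the same experiment: each round tests {\tt
  dimin.test}, reports the same accuracy, and saves the current
weights and probability arrays under the next sequence number. The
driving loop is roughly like the following sketch; this is
illustrative only, the real code is {\tt api\_test4.cxx} in the {\tt
  demos} directory, and {\tt My\_Experiment} is assumed to be a
trained {\tt TimblAPI*} as in the other examples:

\begin{footnotesize}
\begin{verbatim}
// illustrative sketch, not the literal demo code;
// needs <sstream> for std::ostringstream
for ( int i = 1; i <= 5; ++i ){
  std::ostringstream out, wgt, arr;
  out << "inc" << i << ".out";
  wgt << "wg." << i << ".wgt";
  arr << "arr." << i << ".arr";
  My_Experiment->Test( "dimin.test", out.str() );
  My_Experiment->SaveWeights( wgt.str() );
  My_Experiment->WriteArrays( arr.str() );
}
\end{verbatim}
\end{footnotesize}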
\clearpage

\subsection{example 5, {\tt api\_test5.cxx}}

This program demonstrates how neighborSets can be used to classify
instances and to store the results. It also demonstrates some
neighborSet basics: merging, truncating, assignment, and copy
construction.

\begin{footnotesize}
\begin{verbatim}
#include <iostream>
#include <string>
#include "TimblAPI.h"

using std::endl;
using std::cout;
using std::string;
using namespace Timbl;

int main(){
  // -a IB1: the IB1 algorithm; +vDI+DB+n: show distances, distributions
  // and nearest neighbors; +mM: MVDM metric; +k4: 4 nearest neighbors
  TimblAPI *My_Experiment = new TimblAPI( "-a IB1 +vDI+DB+n +mM +k4 " ,
                                          "test5" );
  My_Experiment->Learn( "dimin.train" );
  {
    string line =  "=,=,=,=,+,k,e,=,-,r,@,l,T";
    // this classifyNS() variant returns a pointer to a neighborSet
    // that is kept inside the experiment
    const neighborSet *neighbours1 = My_Experiment->classifyNS( line );
    if ( neighbours1 ){
      cout << "Classify OK on " << line << endl;
      cout << neighbours1;
    } else
      cout << "Classify failed on " << line << endl;
    neighborSet neighbours2;
    line = "+,zw,A,rt,-,k,O,p,-,n,O,n,E";
    // this variant fills a caller-owned neighborSet
    if ( My_Experiment->classifyNS( line, neighbours2 ) ){
      cout << "Classify OK on " << line << endl;
      cout << neighbours2;
    } else
      cout << "Classify failed on " << line << endl;
    line = "+,z,O,n,-,d,A,xs,-,=,A,rm,P";
    const neighborSet *neighbours3 = My_Experiment->classifyNS( line );
    if ( neighbours3 ){
      cout << "Classify OK on " << line << endl;
      cout << neighbours3;
    } else
      cout << "Classify failed on " << line << endl;
    neighborSet uit2;
    {
      neighborSet uit;
      uit.setShowDistance(true);
      uit.setShowDistribution(true);
      cout << " before first merge " << endl;
      cout << uit;
      uit.merge( *neighbours1 );
      cout << " after first merge " << endl;
      cout << uit;
      uit.merge( *neighbours3 );
      cout << " after second merge " << endl;
      cout << uit;
      uit.merge( neighbours2 );
      cout << " after third merge " << endl;
      cout << uit;
      uit.truncate( 3 );
      cout << " after truncate " << endl;
      cout << uit;
      cout << " test assignment" << endl;
      uit2 = *neighbours1;
    }
    cout << "assignment result: " << endl;
    cout << uit2;
    {
      cout << " test copy construction" << endl;
      neighborSet uit(uit2);
      cout << "result: " << endl;
      cout << uit;
    }
    cout << "almost done!" << endl;
  }
  delete My_Experiment;
  cout << "done!" << endl;
}
\end{verbatim}
\end{footnotesize}

Its expected output is given below; a few remarks follow the listing:

\begin{footnotesize}
\begin{verbatim}
Examine datafile 'dimin.train' gave the following results:
Number of Features: 12
InputFormat       : C4.5

-test5-Phase 1: Reading Datafile: dimin.train
-test5-Start:          0 @ Mon May 31 11:03:36 2010
-test5-Finished:    2999 @ Mon May 31 11:03:36 2010
-test5-Calculating Entropy         Mon May 31 11:03:36 2010
Feature Permutation based on GainRatio/Values :
< 9, 5, 11, 1, 12, 7, 4, 3, 10, 8, 2, 6 >
-test5-Phase 2: Learning from Datafile: dimin.train
-test5-Start:          0 @ Mon May 31 11:03:36 2010
-test5-Finished:    2999 @ Mon May 31 11:03:36 2010

Size of InstanceBase = 19231 Nodes, (769240 bytes), 49.77 % compression
Classify OK on =,=,=,=,+,k,e,=,-,r,@,l,T
# k=1 { T 1.00000 } 0.0000000000000
# k=2 { T 1.00000 } 0.0031862902473388
# k=3 { T 1.00000 } 0.0034182315118303
# k=4 { T 1.00000 } 0.0037433772844615
Classify OK on +,zw,A,rt,-,k,O,p,-,n,O,n,E
# k=1 { E 1.00000 } 0.0000000000000
# k=2 { E 1.00000 } 0.056667880327190
# k=3 { E 1.00000 } 0.062552636617742
# k=4 { E 1.00000 } 0.064423860361889
Classify OK on +,z,O,n,-,d,A,xs,-,=,A,rm,P
# k=1 { P 1.00000 } 0.059729836255170
# k=2 { P 1.00000 } 0.087740769132651
# k=3 { P 1.00000 } 0.088442788919723
# k=4 { P 1.00000 } 0.097058649951429
 before first merge
 after first merge
# k=1 { P 1.00000 } 0.059729836255170
# k=2 { P 1.00000 } 0.087740769132651
# k=3 { P 1.00000 } 0.088442788919723
# k=4 { P 1.00000 } 0.097058649951429
 after second merge
# k=1 { P 2.00000 } 0.059729836255170
# k=2 { P 2.00000 } 0.087740769132651
# k=3 { P 2.00000 } 0.088442788919723
# k=4 { P 2.00000 } 0.097058649951429
 after third merge
# k=1 { E 1.00000 } 0.0000000000000
# k=2 { E 1.00000 } 0.056667880327190
# k=3 { P 2.00000 } 0.059729836255170
# k=4 { E 1.00000 } 0.062552636617742
# k=5 { E 1.00000 } 0.064423860361889
# k=6 { P 2.00000 } 0.087740769132651
# k=7 { P 2.00000 } 0.088442788919723
# k=8 { P 2.00000 } 0.097058649951429
 after truncate
# k=1 { E 1.00000 } 0.0000000000000
# k=2 { E 1.00000 } 0.056667880327190
# k=3 { P 2.00000 } 0.059729836255170
 test assignment
assignment result:
# k=1 { P 1.00000 } 0.059729836255170
# k=2 { P 1.00000 } 0.087740769132651
# k=3 { P 1.00000 } 0.088442788919723
# k=4 { P 1.00000 } 0.097058649951429
 test copy construction
result:
# k=1 { P 1.00000 } 0.059729836255170
# k=2 { P 1.00000 } 0.087740769132651
# k=3 { P 1.00000 } 0.088442788919723
# k=4 { P 1.00000 } 0.097058649951429
almost done!
done!
\end{verbatim}
\end{footnotesize}
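
In this output, each {\tt \#~k=} line gives the rank of the neighbor,
the class distribution found at that distance, and the distance
itself. Note also what the merge results reveal: the parameterless
{\tt classifyNS()} returns a pointer to a neighborSet that lives
inside the experiment and is overwritten by every subsequent call. At
merge time, {\tt neighbours1} and {\tt neighbours3} therefore both
hold the result of the {\em last} classification (the {\tt P}
neighbors), which is why the second merge merely doubles the
weights. To keep a result alive, either use the two-argument variant
(as done for {\tt neighbours2}) or copy the set, as in this minimal
sketch (reusing the names from the listing above):

\begin{footnotesize}
\begin{verbatim}
// assignment makes an independent copy, as the "test assignment"
// part of the listing demonstrates
const neighborSet *tmp = My_Experiment->classifyNS( line );
neighborSet saved;
if ( tmp )
  saved = *tmp;   // 'saved' survives later classifyNS() calls
\end{verbatim}
\end{footnotesize}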
\clearpage

\subsection{example 6, {\tt api\_test6.cxx}}

This program demonstrates the use of ValueDistributions, TargetValues,
and neighborSets for classification, and shows how a decay function
can be applied when a distribution is extracted from a neighborSet.

\begin{footnotesize}
\begin{verbatim}
#include <iostream>
#include "TimblAPI.h"

using std::cout;
using std::endl;
using namespace Timbl;

int main(){
  TimblAPI My_Experiment( "-a IB1 +vDI+DB -k3", "test6" );
  My_Experiment.Learn( "dimin.train" );
  const ValueDistribution *vd;
  // Classify() returns the best target and lets vd point at the
  // matching distribution, which is owned by the experiment
  const TargetValue *tv
    = My_Experiment.Classify( "-,=,O,m,+,h,K,=,-,n,I,N,K", vd );
  cout << "resulting target: " << tv << endl;
  cout << "resulting Distribution: " << vd << endl;
  ValueDistribution::dist_iterator it=vd->begin();
  while ( it != vd->end() ){
    cout << it->second << " OR ";
    cout << it->second->Value() << " " << it->second->Weight() << endl;
    ++it;
  }

  cout << "the same with neighborSets" << endl;
  const neighborSet *nb = My_Experiment.classifyNS( "-,=,O,m,+,h,K,=,-,n,I,N,K" );
  // bestDistribution() allocates a fresh ValueDistribution;
  // the caller must delete it
  ValueDistribution *vd2 = nb->bestDistribution();
  cout << "default answer " << vd2 << endl;
  // weight the neighbors by exponential decay with alpha = 0.3
  decayStruct *dc = new  expDecay(0.3);
  delete vd2;
  vd2 = nb->bestDistribution( dc );
  delete dc;
  cout << "with exponential decay, alpha = 0.3 " << vd2 << endl;
  delete vd2;
}
\end{verbatim}
\end{footnotesize}

This is the output produced:

\begin{footnotesize}
\begin{verbatim}
Examine datafile 'dimin.train' gave the following results:
Number of Features: 12
InputFormat       : C4.5

-test6-Phase 1: Reading Datafile: dimin.train
-test6-Start:          0 @ Mon May 31 11:03:36 2010
-test6-Finished:    2999 @ Mon May 31 11:03:36 2010
-test6-Calculating Entropy         Mon May 31 11:03:36 2010
Feature Permutation based on GainRatio/Values :
< 9, 5, 11, 1, 12, 7, 4, 3, 10, 8, 2, 6 >
-test6-Phase 2: Learning from Datafile: dimin.train
-test6-Start:          0 @ Mon May 31 11:03:36 2010
-test6-Finished:    2999 @ Mon May 31 11:03:36 2010

Size of InstanceBase = 19231 Nodes, (769240 bytes), 49.77 % compression
resulting target: K
resulting Distribution: { E 1.00000, K 7.00000 }
E 1 OR E 1
K 7 OR K 7
the same with neighborSets
default answer { E 1.00000, K 7.00000 }
with exponential decay, alpha = 0.3 { E 0.971556, K 6.69810 }
\end{verbatim}
\end{footnotesize}
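
Since a {\tt ValueDistribution} exposes its entries through a {\tt
  dist\_iterator}, as the {\tt while} loop in the listing shows,
derived statistics are easy to compute. The following minimal sketch,
assuming {\tt vd} was obtained as above, normalizes the stored weights
into class probabilities:

\begin{footnotesize}
\begin{verbatim}
// sum the weights, then print a probability per class;
// uses only the iterator interface shown in api_test6.cxx
double total = 0.0;
for ( ValueDistribution::dist_iterator it = vd->begin();
      it != vd->end(); ++it )
  total += it->second->Weight();
for ( ValueDistribution::dist_iterator it = vd->begin();
      it != vd->end(); ++it )
  cout << it->second->Value() << " p = "
       << it->second->Weight() / total << endl;
\end{verbatim}
\end{footnotesize}

For the distribution {\tt \{ E 1.00000, K 7.00000 \}} shown above,
this prints a probability of $0.125$ for {\tt E} and $0.875$ for
{\tt K}.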

\end{document}