% TiMBL 6.3 API

\documentclass{report}
\usepackage{epsf}
\usepackage{a4wide}
\usepackage{palatino}
\usepackage{fullname}
\usepackage{url}

\newcommand{\chisq}{{$ \chi^2 $}}

\author{Ko van der Sloot\\ \ \\ Induction of Linguistic Knowledge\\
  Computational Linguistics\\ Tilburg University \\ \ \\
  P.O. Box 90153, NL-5000 LE, Tilburg, The Netherlands \\ URL:
  http://ilk.uvt.nl}

\title{{\huge TiMBL: Tilburg Memory-Based Learner} \\ \vspace*{0.5cm}
{\bf version 6.3} \\ \vspace*{0.5cm}{\huge API Reference Guide}\\
\vspace*{1cm} {\it ILK Technical Report -- ILK 10-03}}

%better paragraph indentation
\parindent 0pt
\parskip 9pt


\begin{document}

\maketitle

\tableofcontents

\chapter*{Preface}

This is a brief description of the TimblAPI class, the application
programming interface to the Timbl\footnote{\url{http://ilk.uvt.nl/timbl}}
software package, and its main functions. For an introduction to Timbl,
consult the Timbl Reference Guide \cite{Daelemans+10}. Although most of
the API can be traced in the {\tt TimblAPI.h} file, the reverse is not
true; some functions in {\tt TimblAPI.h} are still ``work in progress''
and some others are artefacts that simplify the implementation of the
TiMBL main program\footnote{Timbl.cxx is therefore {\em not} a good
  example of how to use the API.}.

To learn more about using the API, you should study programs such as
{\tt classify.cxx}, {\tt tse.cxx}, and the examples given in this
manual, which can all be found in the {\tt demos} directory of this
distribution. As you can readily gather from these examples, the basic
things you need to do to get access to the TimblAPI functions are to
include {\tt TimblAPI.h} in your program and to add {\tt libTimbl.a}
to your linking path.

{\bf Important note}: The functions described here return a result
(mostly a bool) to indicate success or failure.
To simplify the examples, we ignore these return values. This is, of
course, bad practice, to be avoided in real-life
programming.\footnote{As stated by commandment 6 of ``The Ten
  Commandments for C Programmers'' by Henry Spencer:

If a function be advertised to return an error code in the event of
difficulties, thou shalt check for that code, yea, even though the
checks triple the size of thy code and produce aches in thy typing
fingers, for if thou thinkest ``it cannot happen to me'', the gods
shall surely punish thee for thy arrogance.}

{\bf Warning}: Although the TiMBL internals perform some sanity
checking, it is quite possible to combine API functions in such a way
that some undetermined state is reached, or even a conflict
arises. The effect of the {\tt SetOptions()} function, for instance,
might be quite surprising. If you have created your own program with
the API, it is wise to test it against well-known data to see whether
the results make sense.

\chapter{Changes}
\label{changes}

\section{From version 6.2 to 6.3}

No changes to the API were made for this release. This manual has been
brought up to date (preserving the beta state).

\section{From version 6.1 to 6.2}

In version 6.2, some additional functions were added to the API: {\tt
  matchDepth()}, {\tt matchedAtLeaf()}, {\tt WriteMatrices()}, {\tt
  GetMatrices()} and {\tt ShowStatistics()}. These reflect the
additional functionality of Timbl 6.2. The API is still experimental,
and contains more functions than described in this manual. Using these
`undocumented' features is, as usual, unwise.

\section{From version 5.1 to 6.1}

The major change in 6.0 is the introduction of the {\tt neighborSet}
class, with some special Classify functions. We added Classify
functions that deliver pointers into Timbl's internal data. This is
fast, but dangerous.
Also, a {\tt WriteInstanceBaseXml()} function was added, which comes
in handy when you want to know more about the instance base. Two more
examples demonstrating neighborSets and related functionality were
added in Appendix B. From version 6.0 to 6.1, the API has not changed.

\section{From version 5.0 to 5.1}

The API is quite stable at the moment. Most TiMBL changes did not
affect the API. The only real API change is in the {\tt GetWeights()}
function (see the section on storing and retrieving intermediate
results). A few options were added to Timbl, influencing the table in
Appendix A. We have also changed and enhanced the examples in Appendix
B.

\chapter{Quick-start}
\section{Setting up an experiment}

There is just one way to start a TiMBL experiment, which is to call
the TimblAPI constructor:

\begin{footnotesize}
\begin{verbatim}
   TimblAPI( const std::string& args, const std::string& name ="" );
\end{verbatim}
\end{footnotesize}

`args' is used as a ``command line'' and is parsed for all kinds of
options, which are used to create the right kind of experiment with
the desired settings for metric, weighting, etc. If something is wrong
with the settings, {\em no}\/ object is created.

The most important option is {\tt -a} to set the kind of algorithm,
e.g.\ {\tt -a IB1} to invoke an IB1 experiment or {\tt -a IGTREE} to
invoke an IGTREE experiment. A list of possible options is given in
Appendix A.

The optional name can be useful if you have multiple experiments.
In case of warnings or errors, this name is appended to the message.
For example:

\begin{footnotesize}
\begin{verbatim}
   TimblAPI *My_Experiment = new TimblAPI( "-a IGTREE +vDI+DB",
                                           "test1" );
\end{verbatim}
\end{footnotesize}

{\tt My\_Experiment} is created as an IGTREE experiment with the name
``test1'', and the verbosity is set to DI+DB, meaning that the output
will contain DIstance and DistriBution information.

The counterpart to creation is the {\tt \~{ }TimblAPI()} destructor,
which is called when you delete an experiment:

\begin{footnotesize}
\begin{verbatim}
   delete My_Experiment;
\end{verbatim}
\end{footnotesize}

\section{Running an experiment}

Assuming that we have appropriate datafiles (such as the example files
{\tt dimin.train} and {\tt dimin.test} in the TiMBL package), we can
get started right away with the functions {\tt Learn()} and {\tt
  Test()}.

\subsection{Training}
\begin{footnotesize}
\begin{verbatim}
   bool Learn( const std::string& f );
\end{verbatim}
\end{footnotesize}

This function takes a file with name `f', and gathers information such
as the number of features, the number and frequency of feature values,
and the same for class names. After that, these data are used to
calculate a lot of statistical information, which will be used for
testing. Finally, an InstanceBase is created, tuned to the current
algorithm.

\subsection{Testing}
\begin{footnotesize}
\begin{verbatim}
   bool Test( const std::string& in,
              const std::string& out,
              const std::string& perc = "" );
\end{verbatim}
\end{footnotesize}

Tests a file given by `in' and writes the results to `out'. If `perc'
is not empty, a percentage score is written to the file `perc'.
For example:

\begin{footnotesize}
\begin{verbatim}
   My_Experiment->Learn( "dimin.train" );
   My_Experiment->Test( "dimin.test", "my_first_test" );
\end{verbatim}
\end{footnotesize}

An InstanceBase will be created from dimin.train; then dimin.test is
tested against that InstanceBase and the output is written to
my\_first\_test.

\subsection{Special cases of {\tt Learn()} and {\tt Test()}}

There are special cases where {\tt Learn()} behaves differently:

\begin{itemize}
\item When the algorithm is IB2, {\tt Learn()} will automatically take
  the first $n$ lines of f (set with the {\tt -b n} option) to
  bootstrap itself, and then use the rest of f for IB2-learning. After
  learning IB2, you can use {\tt Test()} as usual.

\item When the algorithm is CV, {\tt Learn()} is not defined, and all
  work is done in a special version of {\tt Test()}. `f' is assumed to
  give the name of a file which, on separate lines, gives the names
  of the files to be cross-validated.

  Also, if {\em featureWeights}\/ or {\em probabilities}\/ are read from
  user-defined datafiles, a special {\tt CVprepare()} function must be
  called, to make the weighting, weightFilename and probabilityFileName
  known to the {\tt Test()} function.

  See Appendix B for a complete CV example (program {\tt api\_test3}).

%TODO: add an example with CVprepare!

\end{itemize}

\section{More about settings}

After an experiment is set up with the TimblAPI constructor, many
options can be changed ``on the fly'' with:

\begin{footnotesize}
\begin{verbatim}
   bool SetOptions( const std::string& opts );
\end{verbatim}
\end{footnotesize}

Here, `opts' is interpreted as a list of option settings, just like in
the TimblAPI constructor. When an error in the opts string is found,
{\tt SetOptions()} returns false.
Whether any options are actually set or changed in that case is
undefined. Note that a few options can only be set {\em once}\/, when
creating the experiment, most notably the algorithm. Any attempt to
change these options will result in a failure. See Appendix A for all
valid options and information about the possibility of changing them
within a running experiment.

Note: {\tt SetOptions()} is lazy; changes are cached until the moment
they are really needed, so you can do several {\tt SetOptions()}
calls, even with different values for the same option. Only the last
one seen will be used for running the experiment.

To see which options are in effect, you can use the calls {\tt
  ShowOptions()} and {\tt ShowSettings()}.

\begin{footnotesize}
\begin{verbatim}
   bool ShowOptions( std::ostream& );
\end{verbatim}
\end{footnotesize}

Shows all options with their possible and current values.

\begin{footnotesize}
\begin{verbatim}
   bool ShowSettings( std::ostream& );
\end{verbatim}
\end{footnotesize}

Shows all options and their current values.

For example:

\begin{footnotesize}
\begin{verbatim}
   My_Experiment->SetOptions( "-w2 -m:M" );
   My_Experiment->SetOptions( "-w3 -v:DB" );
   My_Experiment->ShowSettings( cout );
\end{verbatim}
\end{footnotesize}

See Appendix B (program {\tt api\_test1}) for the output.

\section{Storing and retrieving intermediate results}

To speed up testing, or to manipulate what is happening internally, we
can store and retrieve several important parts of our experiment: the
InstanceBase, the FeatureWeights, the ProbabilityArrays and the
ValueDistance Matrices.
Saving is done with:

\begin{footnotesize}
\begin{verbatim}
   bool WriteInstanceBase( const std::string& f );
   bool SaveWeights( const std::string& f );
   bool WriteArrays( const std::string& f );
   bool WriteMatrices( const std::string& f );
\end{verbatim}
\end{footnotesize}

Retrieving is done with their counterparts:

\begin{footnotesize}
\begin{verbatim}
   bool GetInstanceBase( const std::string& f );
   bool GetWeights( const std::string& f, Weighting w );
   bool GetArrays( const std::string& f );
   bool GetMatrices( const std::string& f );
\end{verbatim}
\end{footnotesize}

All use `f' as a filename for storing/retrieving. {\tt GetWeights}
needs extra information to decide {\em which}\/ weighting to retrieve.
Weighting is defined as the enumerated type:

\begin{footnotesize}
\begin{verbatim}
   enum Weighting { UNKNOWN_W, UD, NW, GR, IG, X2, SV };
\end{verbatim}
\end{footnotesize}

Some notes:

\begin{enumerate}
\item The InstanceBase is stored in an internal format, with or without
hashing, depending on the {\tt -H} option. The format is described in
the TiMBL manual. Remember that it is a bad idea to edit this file in
any way.
\item {\tt GetWeights()} can be used to override the weights that
{\tt Learn()} calculated. {\tt UNKNOWN\_W} should not be used.
\item The Probability arrays are described in the TiMBL manual. They
can be manipulated to tune the MVDM similarity metric.
\end{enumerate}

If you like, you may dump the InstanceBase in an XML format. No
retrieve function is available for this format.

\begin{footnotesize}
\begin{verbatim}
   bool WriteInstanceBaseXml( const std::string& f );
\end{verbatim}
\end{footnotesize}

\chapter{Classify functions}

\section{Classify functions: Elementary}
After an experiment is trained with {\tt Learn()}, we do not have to
use {\tt Test()} to do bulk-testing on a file.
We can create our own tests with the {\tt Classify} functions:

\begin{footnotesize}
\begin{verbatim}
   bool Classify( const std::string& Line, std::string& result );
   bool Classify( const std::string& Line, std::string& result,
                  double& distance );
   bool Classify( const std::string& Line, std::string& result,
                  std::string& Distrib, double& distance );
\end{verbatim}
\end{footnotesize}

The assigned class is stored in `result'. `distance' will get the
calculated distance, and `Distrib' the distribution at `distance'
which was used to calculate `result'. Distrib will be a string like
``\{ NP 2, PP 6 \}''. It is up to you to parse and interpret this. (In
this case: there were 8 classes assigned at `distance', 2 NP's and 6
PP's, giving a `result' of ``PP''.)

If you want to perform analyses on these distributions, it might be a
good idea to read the next section about the other range of Classify()
functions.

A main disadvantage compared to using {\tt Test()} is that {\tt
  Test()} is optimized. {\tt Classify()} has to check the sanity of
its input and also whether a {\tt SetOptions()} has been
performed. This slows down the process.

A good example of the use of {\tt Classify()} is the {\tt
  classify.cxx} program in the TiMBL distribution.

Depending on the Algorithm and Verbosity settings, it may be possible
to get some extra information on the details of each classification
using:

\begin{footnotesize}
\begin{verbatim}
   const bool ShowBestNeighbors( std::ostream& os, bool distr ) const;
\end{verbatim}
\end{footnotesize}

Provided that the option {\tt +v n} or {\tt +v k} is set and we use
IB1 or IB2, output is produced similar to what we see in the TiMBL
program. When `distr' is true, the neighbors' distributions are also
displayed.
Bear in mind: the {\tt +vn} option is expensive in time and memory,
and does not work for IGTREE, TRIBL, and TRIBL2.

Two other functions provide the results as given by the {\tt +vmd}
verbosity option:

\begin{footnotesize}
\begin{verbatim}
   size_t matchDepth() const;
   bool matchedAtLeaf() const;
\end{verbatim}
\end{footnotesize}

The first returns the matching depth in the InstanceBase; the second
flags whether the match was at a leaf or at a non-terminal node.

\section{Classify functions: Advanced}

A faster, but more dangerous version of Classify is also available.
It is faster because it returns pointers into Timbl's internal
datastructures. It is dangerous for exactly the same reason (the
pointers are `const', so fortunately it is difficult to really damage
Timbl).

\begin{footnotesize}
\begin{verbatim}
   const TargetValue *Classify( const std::string& );
   const TargetValue *Classify( const std::string&,
                                const ValueDistribution *& );
   const TargetValue *Classify( const std::string&, double& );
   const TargetValue *Classify( const std::string&,
                                const ValueDistribution *&,
                                double& );
\end{verbatim}
\end{footnotesize}

A ValueDistribution is a list-like object (but it is not a real list!)
that contains TargetValue objects and weights. It is the result of
combining all nearest neighbors and applying the desired weightings.
Timbl chooses a best TargetValue from this ValueDistribution, and the
Classify functions return that as their main result.

{\bf Important}: Because these functions return pointers into Timbl's
internal representation, the results are only valid until the next
Classify function is called (or the experiment is deleted).

Both the TargetValue and ValueDistribution objects have output
operators defined, so you can print them.
TargetValue also has a {\tt Name()} function, which returns a
std::string, so you can collect results. ValueDistribution has an
iterator-like interface which makes it possible to walk through the
distribution.

An iterator on a {\tt ValueDistribution *vd} is created like this:
\begin{footnotesize}
\begin{verbatim}
   ValueDistribution::dist_iterator it=vd->begin();
\end{verbatim}
\end{footnotesize}

Unfortunately, the iterator cannot be printed or used directly.
It walks through a map-like structure with pairs of values, of which
only the {\tt second} part is of interest to you.
You may print it, or extract its {\tt Value()} (which happens to be a
TargetValue pointer) or its {\tt Weight()}, which is a {\tt double}.

Like this:
\begin{footnotesize}
\begin{verbatim}
   while ( it != vd->end() ){
     cout << it->second << " has a value: ";
     cout << it->second->Value() << " and a weight of "
          << it->second->Weight() << endl;
     ++it;
   }
\end{verbatim}
\end{footnotesize}

Printing {\tt it->second} is the same as printing the TargetValue plus
its Weight.

In the {\em demos}\/ directory you will find a complete example in
api\_test6.

{\bf Warning}: it is possible to search the Timbl code for the
internal representation of the TargetValue and ValueDistribution
objects, but please DON'T DO THAT. The representation might change
between Timbl versions.

\section{Classify functions: neighborSets}

A more flexible way of classifying is to use one of these functions:

\begin{footnotesize}
\begin{verbatim}
   const neighborSet *classifyNS( const std::string& );
   bool classifyNS( const std::string&, neighborSet& );
\end{verbatim}
\end{footnotesize}

The first function will classify an instance and return a pointer to a
{\tt neighborSet} object.
This object may be seen as a container which holds both distances and
distributions up to a certain depth, which is {\em at least}\/ the
number of neighbors (the {\tt -k} option) that was used for the
classification task. It is a const object, so you cannot directly
manipulate its internals, but there are some functions defined to get
useful information out of the neighborSet.

Important: the neighborSet {\em will be overwritten}\/ on the next
call to any of the classify functions. Be sure to get all the results
out before that happens.

To make life easy, a second variant can be used, which fills a
neighborSet object that you provide (the same could be achieved by
copying the result of the first function).

{\bf Note}: neighborSets can be large, and copying them is therefore
expensive, so you should only do this if you really have to.

\subsection{How to get results from a neighborSet}

No metric functions (such as exponential decay and the like) are
performed on the neighborSet. You are free to insert your own metrics,
or use Timbl's built-in metrics.

\begin{footnotesize}
\begin{verbatim}
   double getDistance( size_t n ) const;
   double bestDistance() const;
   const ValueDistribution *getDistribution( size_t n ) const;
   ValueDistribution *bestDistribution( const decayStruct * ds=0,
                                        size_t n=0 ) const ;
\end{verbatim}
\end{footnotesize}

{\tt getDistance( n )} will return the distance of the neighbor(s) at
n. {\tt bestDistance()} is simply {\tt getDistance(0)}.

{\tt getDistribution( n )} will return the distribution of the
neighbor(s) at n.

{\tt bestDistribution()} will return the weighted distribution
calculated using the first n elements in the container and a metric
specified by the {\tt decayStruct}. The default n=0 means: use the
whole container. An empty decay struct means zeroDecay.
The returned ValueDistribution object is handed to you, and you are
responsible for deleting it after use (see the previous section for
more details about ValueDistributions).

A decayStruct is one of:

\begin{footnotesize}
\begin{verbatim}
   class zeroDecay();
   class invLinDecay();
   class invDistDecay();
   class expDecay( double alpha );
   class expDecay( double alpha, double beta );
\end{verbatim}
\end{footnotesize}

For example, to get a ValueDistribution from a neighborSet {\tt nb},
using 3 neighbors and exponential decay with alpha=0.3, you can do:

\begin{footnotesize}
\begin{verbatim}
   decayStruct *dc = new expDecay(0.3);
   ValueDistribution *vd = nb->bestDistribution( dc, 3 );
\end{verbatim}
\end{footnotesize}


\subsection{Useful operations on neighborSet objects}

You can print neighborSet objects:

\begin{footnotesize}
\begin{verbatim}
   std::ostream& operator<<( std::ostream&, const neighborSet& );
   std::ostream& operator<<( std::ostream&, const neighborSet * );
\end{verbatim}
\end{footnotesize}

You may create neighborSets yourself, and assign and delete them:

\begin{footnotesize}
\begin{verbatim}
   neighborSet();
   neighborSet( const neighborSet& );
   neighborSet& operator=( const neighborSet& );
   ~neighborSet();
\end{verbatim}
\end{footnotesize}

If you create a neighborSet, you might want to reserve space for it,
to avoid needless reallocations.
It can also be cleared, and you can ask its size (just like with
normal containers):

\begin{footnotesize}
\begin{verbatim}
   void reserve( size_t );
   void clear();
   size_t size() const;
\end{verbatim}
\end{footnotesize}

Two neighborSets can be merged:

\begin{footnotesize}
\begin{verbatim}
   void merge( const neighborSet& );
\end{verbatim}
\end{footnotesize}

A neighborSet can be truncated at a certain level. This is useful
after merging neighborSets: merging sets with depth k and n will
result in a set with a depth somewhere within the range
$[max(k,n), k+n]$.

\begin{footnotesize}
\begin{verbatim}
   void truncate( size_t );
\end{verbatim}
\end{footnotesize}

\chapter{Advanced Functions}

\section{Modifying the InstanceBase}

The InstanceBase can be modified with the functions:

\begin{footnotesize}
\begin{verbatim}
   bool Increment( const std::string& Line );
   bool Decrement( const std::string& Line );
\end{verbatim}
\end{footnotesize}

These functions add an Instance (as described by Line) to the
InstanceBase, or remove it. This can only be done for IB1-like
experiments (IB1, IB2, CV and LOO), and it enforces a lot of
statistical recalculations.

More sophisticated are:

\begin{footnotesize}
\begin{verbatim}
   bool Expand( const std::string& File );
   bool Remove( const std::string& File );
\end{verbatim}
\end{footnotesize}

which use the contents of File to do a bulk of Increments or
Decrements, and recalculate afterwards.

\section{Getting more information out of Timbl}

There are a few convenience functions to get extra information on
TiMBL and its behaviour:

\begin{footnotesize}
\begin{verbatim}
   bool WriteNamesFile( const std::string& f );
\end{verbatim}
\end{footnotesize}

Creates a file which resembles a C4.5 names file.
\begin{footnotesize}
\begin{verbatim}
   Algorithm Algo()
\end{verbatim}
\end{footnotesize}

Gives the current algorithm as a value of the enum type Algorithm,
which is declared as:

\begin{footnotesize}
\begin{verbatim}
   enum Algorithm { UNKNOWN_ALG, IB1, IB2, IGTREE,
                    TRIBL, TRIBL2, LOO, CV };
\end{verbatim}
\end{footnotesize}

This can be printed with the helper function:

\begin{footnotesize}
\begin{verbatim}
   const std::string to_string( const Algorithm )
\end{verbatim}
\end{footnotesize}

\begin{footnotesize}
\begin{verbatim}
   Weighting CurrentWeighting()
\end{verbatim}
\end{footnotesize}

Gives the current weighting as a value of the enum type Weighting,
which is declared as:

\begin{footnotesize}
\begin{verbatim}
   enum Weighting { UNKNOWN_W, UD, NW, GR, IG, X2, SV };
\end{verbatim}
\end{footnotesize}

This can be printed with the helper function:

\begin{footnotesize}
\begin{verbatim}
   const std::string to_string( const Weighting )
\end{verbatim}
\end{footnotesize}


\begin{footnotesize}
\begin{verbatim}
   Weighting CurrentWeightings( std::vector<double>& v )
\end{verbatim}
\end{footnotesize}

Returns the current weighting as a value of the enum type Weighting,
and fills the vector v with all the current values of this weighting.

\begin{footnotesize}
\begin{verbatim}
   std::string& ExpName()
\end{verbatim}
\end{footnotesize}

Returns the value of `name' given at the construction of the
experiment.

\begin{footnotesize}
\begin{verbatim}
   static std::string VersionInfo( bool full = false )
\end{verbatim}
\end{footnotesize}

Returns a string containing the version number, the revision and the
revision string of the current API implementation. If full is true,
information about the date and time of compilation is also included.
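The introspection functions above can be combined into a small
diagnostic routine. The following is a sketch only, not a tested
program: it assumes {\tt TimblAPI.h} and {\tt libTimbl.a} are
available, that the helper {\tt to\_string()} functions are visible in
the current namespace, and it reuses the {\tt dimin.train} file from
the earlier examples:

\begin{footnotesize}
\begin{verbatim}
   #include "TimblAPI.h"
   #include <iostream>
   #include <vector>

   int main(){
     TimblAPI exp( "-a IB1", "info-demo" );
     exp.Learn( "dimin.train" );  // return value ignored for brevity
     std::cout << "version   : " << TimblAPI::VersionInfo( true )
               << std::endl;
     std::cout << "name      : " << exp.ExpName() << std::endl;
     std::cout << "algorithm : " << to_string( exp.Algo() )
               << std::endl;
     std::cout << "weighting : " << to_string( exp.CurrentWeighting() )
               << std::endl;
     std::vector<double> weights;
     exp.CurrentWeightings( weights );  // one value per feature
     for ( size_t i = 0; i < weights.size(); ++i )
       std::cout << "  feature " << i+1 << " : " << weights[i]
                 << std::endl;
     return 0;
   }
\end{verbatim}
\end{footnotesize}

As always, check the return values of {\tt Learn()} and friends in
real programs.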
\chapter{Server mode}
\label{Using TiMBL as a Server}

\begin{footnotesize}
\begin{verbatim}
   bool StartServer( const int port, const int max_c );
\end{verbatim}
\end{footnotesize}

Starts a TimblServer on `port' with maximally `max\_c' concurrent
connections to it. Starting a server only makes sense after the
experiment has been trained.

\clearpage
\chapter{Annotated example programs}

\subsection{example 1, {\tt api\_test1.cxx}}
\begin{footnotesize}
\begin{verbatim}
#include "TimblAPI.h"
int main(){
  TimblAPI My_Experiment( "-a IGTREE +vDI+DB+F", "test1" );
  My_Experiment.SetOptions( "-w3 -vDB" );
  My_Experiment.ShowSettings( std::cout );
  My_Experiment.Learn( "dimin.train" );
  My_Experiment.Test( "dimin.test", "my_first_test.out" );
  My_Experiment.SetOptions( "-mM" );
  My_Experiment.Test( "dimin.test", "my_first_test.out" );
}
\end{verbatim}
\end{footnotesize}


Output:
\begin{footnotesize}
\begin{verbatim}
Current Experiment Settings :
FLENGTH                  : 0
MAXBESTS                 : 500
TRIBL_OFFSET             : 0
INPUTFORMAT              : Unknown
TREE_ORDER               : Unknown
ALL_WEIGHTS              : false
WEIGHTING                : x2           [Note 1]
BIN_SIZE                 : 20
IB2_OFFSET               : 0
KEEP_DISTRIBUTIONS       : false
DO_SLOPPY_LOO            : false
TARGET_POS               : 18446744073709551615
DO_SILLY                 : false
DO_DIVERSIFY             : false
DECAY                    : Z
SEED                     : -1
BEAM_SIZE                : 0
DECAYPARAM_A             : 1.00000
DECAYPARAM_B             : 1.00000
NORMALISATION            : None
NORMFACTOR               : 1.00000
EXEMPLAR_WEIGHTS         : false
IGNORE_EXEMPLAR_WEIGHTS  : true
NO_EXEMPLAR_WEIGHTS_TEST : true
VERBOSITY                : F+DI         [Note 2]
EXACT_MATCH              : false
HASHED_TREE              : true
GLOBAL_METRIC            : O
METRICS                  :
MVD_LIMIT                : 1
NEIGHBORS                : 1
PROGRESS                 : 100000
CLIP_FACTOR              : 10

Examine datafile 'dimin.train' gave the following results:
Number of Features: 12
InputFormat       : C4.5

-test1-Phase 1: Reading Datafile: dimin.train
-test1-Start:     0 @ Mon May 31 11:03:34 2010
-test1-Finished:  2999 @ Mon May 31 11:03:34 2010
-test1-Calculating Entropy    Mon May 31 11:03:34 2010
Lines of data     : 2999
DB Entropy        : 1.6178929
Number of Classes : 5

Feats  Vals  X-square     Variance     InfoGain     GainRatio
    1     3  128.41828    0.021410184  0.030971064  0.024891536
    2    50  364.75812    0.030406645  0.060860038  0.027552191
    3    19  212.29804    0.017697402  0.039562857  0.018676787
    4    37  449.83823    0.037499019  0.052541227  0.052620750
    5     3  288.87218    0.048161417  0.074523225  0.047699231
    6    61  415.64113    0.034648310  0.10604433   0.024471911
    7    20  501.33465    0.041791818  0.12348668   0.034953203
    8    69  367.66021    0.030648567  0.097198760  0.043983864
    9     2  169.36962    0.056475363  0.045752381  0.046816705
   10    64  914.61906    0.076243669  0.21388759   0.042844587
   11    18  2807.0418    0.23399815   0.66970458   0.18507018
   12    43  7160.3682    0.59689631   1.2780762    0.32537181

Feature Permutation based on Chi-Squared :
< 12, 11, 10, 7, 4, 6, 8, 2, 5, 3, 9, 1 >
-test1-Phase 2: Building index on Datafile: dimin.train
-test1-Start:     0 @ Mon May 31 11:03:34 2010
-test1-Finished:  2999 @ Mon May 31 11:03:34 2010
-test1-
Phase 3: Learning from Datafile: dimin.train
-test1-Start:     0 @ Mon May 31 11:03:34 2010
-test1-Finished:  2999 @ Mon May 31 11:03:34 2010

Size of InstanceBase = 148 Nodes, (5920 bytes), 99.61 % compression
Examine datafile 'dimin.test' gave the following results:
Number of Features: 12
InputFormat       : C4.5


Starting to test, Testfile: dimin.test
Writing output in:          my_first_test.out
Algorithm     : IGTree
Weighting     : Chi-square
Feature  1    : 128.418283576224439
Feature  2    : 364.758115277811896
Feature  3    : 212.298037236345095
Feature  4    : 449.838231470681876
Feature  5    : 288.872176256387263
Feature  6    : 415.641126446691771
Feature  7    : 501.334653478280984
Feature  8    : 367.660212489714240
Feature  9    : 169.369615106487458
Feature 10    : 914.619058199288816
Feature 11    : 2807.041753278295346
Feature 12    : 7160.368151902808677

-test1-Tested:      1 @ Mon May 31 11:03:34 2010
-test1-Tested:      2 @ Mon May 31 11:03:34 2010
-test1-Tested:      3 @ Mon May 31 11:03:34 2010
-test1-Tested:      4 @ Mon May 31 11:03:34 2010
-test1-Tested:      5 @ Mon May 31 11:03:34 2010
-test1-Tested:      6 @ Mon May 31 11:03:34 2010
-test1-Tested:      7 @ Mon May 31 11:03:34 2010
-test1-Tested:      8 @ Mon May 31 11:03:34 2010
-test1-Tested:      9 @ Mon May 31 11:03:34 2010
-test1-Tested:     10 @ Mon May 31 11:03:34 2010
-test1-Tested:    100 @ Mon May 31 11:03:34 2010
-test1-Ready:     950 @ Mon May 31 11:03:34 2010
Seconds taken: 0.1331 (7135.13 p/s)

overall accuracy: 0.962105  (914/950)
Examine datafile 'dimin.test' gave the following results:
Number of Features: 12
InputFormat       : C4.5

Warning:-test1-Metric must be Overlap for IGTree test.  [Note 3]

\end{verbatim}
\end{footnotesize}


Notes:
\begin{enumerate}
\item The {\tt -w2} of the first {\tt SetOptions()} is overruled by
  the {\tt -w3} from the second {\tt SetOptions()}, resulting in a
  weighting of 3, or Chi-Square.
\item The first {\tt SetOptions()} sets the verbosity with {\tt
  +F+DI+DB}. The second {\tt SetOptions()}, however, sets the
  verbosity with {\tt -vDB}, and the resulting verbosity is therefore
  {\tt F+DI}.
\item Due to the second {\tt SetOptions()}, the default metric is set
  to MVDM --- this is however not applicable to IGTREE. This raises a
  warning when we start to test.
\end{enumerate}

Result in my\_first\_test.out (first 20 lines):
\begin{footnotesize}
\begin{verbatim}
=,=,=,=,=,=,=,=,+,p,e,=,T,T         6619.8512628162
=,=,=,=,+,k,u,=,-,bl,u,m,E,P        2396.8557978603
+,m,I,=,-,d,A,G,-,d,},t,J,J         6619.8512628162
-,t,@,=,-,l,|,=,-,G,@,n,T,T         6619.8512628162
-,=,I,n,-,str,y,=,+,m,E,nt,J,J      6619.8512628162
=,=,=,=,=,=,=,=,+,br,L,t,J,J        6619.8512628162
=,=,=,=,+,zw,A,=,-,m,@,r,T,T        6619.8512628162
=,=,=,=,-,f,u,=,+,dr,a,l,T,T        6619.8512628162
=,=,=,=,=,=,=,=,+,l,e,w,T,T         13780.219414719
=,=,=,=,+,tr,K,N,-,k,a,rt,J,J       6619.8512628162
=,=,=,=,+,=,o,=,-,p,u,=,T,T         3812.8095095379
=,=,=,=,=,=,=,=,+,l,A,m,E,E         3812.8095095379
=,=,=,=,=,=,=,=,+,l,A,p,J,J         6619.8512628162
=,=,=,=,=,=,=,=,+,sx,E,lm,P,P       6619.8512628162
+,l,a,=,-,d,@,=,-,k,A,st,J,J        6619.8512628162
-,s,i,=,-,f,E,r,-,st,O,k,J,J        6619.8512628162
=,=,=,=,=,=,=,=,+,sp,a,n,T,T        6619.8512628162
=,=,=,=,=,=,=,=,+,st,o,t,J,J        6619.8512628162
=,=,=,=,+,sp,a,r,-,b,u,k,J,J        6619.8512628162
+,h,I,N,-,k,@,l,-,bl,O,k,J,J        6619.8512628162
\end{verbatim}
\end{footnotesize}
\clearpage

\subsection{example 2, {\tt api\_test2.cxx}}

This demonstrates IB2 learning.
Our example program:

\begin{footnotesize}
\begin{verbatim}
#include "TimblAPI.h"
int main(){
  TimblAPI *My_Experiment = new TimblAPI( "-a IB2 +vF+DI+DB" ,
                                          "test2" );
  My_Experiment->SetOptions( "-b100" );
  My_Experiment->ShowSettings( std::cout );
  My_Experiment->Learn( "dimin.train" );
  My_Experiment->Test( "dimin.test", "my_second_test.out" );
  delete My_Experiment;
  exit(1);
}
\end{verbatim}
\end{footnotesize}

We create an experiment for the IB2 algorithm, with the {\tt -b} option set
to 100, so the first 100 lines of {\tt dimin.train} will be used to
bootstrap the learning, as we can see from the output:

\begin{footnotesize}
\begin{verbatim}
Current Experiment Settings :
FLENGTH                  : 0
MAXBESTS                 : 500
TRIBL_OFFSET             : 0
INPUTFORMAT              : Unknown
TREE_ORDER               : G/V
ALL_WEIGHTS              : false
WEIGHTING                : gr
BIN_SIZE                 : 20
IB2_OFFSET               : 100
KEEP_DISTRIBUTIONS       : false
DO_SLOPPY_LOO            : false
TARGET_POS               : 4294967295
DO_SILLY                 : false
DO_DIVERSIFY             : false
DECAY                    : Z
SEED                     : -1
BEAM_SIZE                : 0
DECAYPARAM_A             : 1.00000
DECAYPARAM_B             : 1.00000
NORMALISATION            : None
NORM_FACTOR              : 1.00000
EXEMPLAR_WEIGHTS         : false
IGNORE_EXEMPLAR_WEIGHTS  : true
NO_EXEMPLAR_WEIGHTS_TEST : true
VERBOSITY                : F+DI+DB
EXACT_MATCH              : false
HASHED_TREE              : true
GLOBAL_METRIC            : O
METRICS                  :
MVD_LIMIT                : 1
NEIGHBORS                : 1
PROGRESS                 : 100000
CLIP_FACTOR              : 10

Examine datafile 'dimin.train' gave the following results:
Number of Features: 12
InputFormat : C4.5

-test2-Phase 1: Reading Datafile: dimin.train
-test2-Start:     0 @ Mon May 31 11:03:34 2010
-test2-Finished:  2999 @ Mon May 31 11:03:34 2010
-test2-Calculating Entropy   Mon May 31 11:03:34 2010
Lines of data     : 2999          [Note 1]
DB Entropy        : 1.6178929
Number of Classes : 5

Feats   Vals    InfoGain        GainRatio
  1       3     0.030971064     0.024891536
  2      50     0.060860038     0.027552191
  3      19     0.039562857     0.018676787
  4      37     0.052541227     0.052620750
  5       3     0.074523225     0.047699231
  6      61     0.10604433      0.024471911
  7      20     0.12348668      0.034953203
  8      69     0.097198760     0.043983864
  9       2     0.045752381     0.046816705
 10      64     0.21388759      0.042844587
 11      18     0.66970458      0.18507018
 12      43     1.2780762       0.32537181

Feature Permutation based on GainRatio/Values :
< 9, 5, 11, 1, 12, 7, 4, 3, 10, 8, 2, 6 >
-test2-Phase 2: Learning from Datafile: dimin.train
-test2-Start:     0 @ Mon May 31 11:03:34 2010
-test2-Finished:  100 @ Mon May 31 11:03:34 2010

Size of InstanceBase = 954 Nodes, (38160 bytes), 26.62 % compression
-test2-Phase 2: Appending from Datafile: dimin.train (starting at line 101)
-test2-Start:     101 @ Mon May 31 11:03:34 2010
-test2-Learning:  101 @ Mon May 31 11:03:34 2010   added:0
-test2-Learning:  102 @ Mon May 31 11:03:34 2010   added:0
-test2-Learning:  103 @ Mon May 31 11:03:34 2010   added:0
-test2-Learning:  104 @ Mon May 31 11:03:34 2010   added:0
-test2-Learning:  105 @ Mon May 31 11:03:34 2010   added:0
-test2-Learning:  106 @ Mon May 31 11:03:34 2010   added:0
-test2-Learning:  107 @ Mon May 31 11:03:34 2010   added:0
-test2-Learning:  108 @ Mon May 31 11:03:34 2010   added:0
-test2-Learning:  109 @ Mon May 31 11:03:34 2010   added:0
-test2-Learning:  110 @ Mon May 31 11:03:34 2010   added:0
-test2-Learning:  200 @ Mon May 31 11:03:34 2010   added:9
-test2-Learning:  1100 @ Mon May 31 11:03:34 2010   added:66
-test2-Finished:  2999 @ Mon May 31 11:03:35 2010

in total added 173 new entries          [Note 2]

Size of InstanceBase = 2232 Nodes, (89280 bytes), 32.40 % compression
DB Entropy        : 1.61789286
Number of Classes : 5

Feats   Vals    InfoGain        GainRatio
  1       3     0.03097106      0.02489154
  2      50     0.06086004      0.02755219
  3      19     0.03956286      0.01867679
  4      37     0.05254123      0.05262075
  5       3     0.07452322      0.04769923
  6      61     0.10604433      0.02447191
  7      20     0.12348668      0.03495320
  8      69     0.09719876      0.04398386
  9       2     0.04575238      0.04681670
 10      64     0.21388759      0.04284459
 11      18     0.66970458      0.18507018
 12      43     1.27807625      0.32537181

Examine datafile 'dimin.test' gave the following results:
Number of Features: 12
InputFormat : C4.5


Starting to test, Testfile: dimin.test
Writing output in:          my_second_test.out
Algorithm     : IB2
Global metric : Overlap
Deviant Feature Metrics:(none)
Weighting     : GainRatio
Feature  1 : 0.026241147173103
Feature  2 : 0.030918769841214
Feature  3 : 0.021445836516602
Feature  4 : 0.056561885447060
Feature  5 : 0.048311436541460
Feature  6 : 0.027043360641622
Feature  7 : 0.037453180788027
Feature  8 : 0.044999091421718
Feature  9 : 0.048992032381874
Feature 10 : 0.044544230779268
Feature 11 : 0.185449683494634
Feature 12 : 0.324719540921155

-test2-Tested: 1 @ Mon May 31 11:03:35 2010
-test2-Tested: 2 @ Mon May 31 11:03:35 2010
-test2-Tested: 3 @ Mon May 31 11:03:35 2010
-test2-Tested: 4 @ Mon May 31 11:03:35 2010
-test2-Tested: 5 @ Mon May 31 11:03:35 2010
-test2-Tested: 6 @ Mon May 31 11:03:35 2010
-test2-Tested: 7 @ Mon May 31 11:03:35 2010
-test2-Tested: 8 @ Mon May 31 11:03:35 2010
-test2-Tested: 9 @ Mon May 31 11:03:35 2010
-test2-Tested: 10 @ Mon May 31 11:03:35 2010
-test2-Tested: 100 @ Mon May 31 11:03:35 2010
-test2-Ready: 950 @ Mon May 31 11:03:35 2010
Seconds taken: 0.0456 (20826.48 p/s)

overall accuracy: 0.941053 (894/950), of which 15 exact matches
          [Note 3]
There were 43 ties of which 32 (74.42%) were correctly resolved
\end{verbatim}
\end{footnotesize}


Notes:
\begin{enumerate}
\item IB2 is bootstrapped with 100 lines, but for the statistics all 2999
  lines are used.
\item As we see here, 173 entries from the input file had a mismatch,
  and were therefore entered in the InstanceBase.
\item We see that IB2 scores 94.11 \%, compared to 96.21 \% for IGTREE
  in our first example. For this data, IB2 is not a good
  algorithm. However, it saves a lot of space, and is faster than
  IB1. Yet, IGTREE is both faster and better. Had we used IB1, the
  score would have been 96.84 \%.
\end{enumerate}
\clearpage

\subsection{example 3, {\tt api\_test3.cxx}}

This demonstrates Cross Validation. Let's try the following program:

\begin{footnotesize}
\begin{verbatim}
#include "TimblAPI.h"
using Timbl::TimblAPI;

int main(){
  TimblAPI *My_Experiment = new TimblAPI( "-t cross_validate" );
  My_Experiment->Test( "cross_val.test" );
  delete My_Experiment;
  exit(0);
}
\end{verbatim}
\end{footnotesize}

This program creates an experiment that defaults to IB1 but, because of the
special option ``-t cross\_validate'', will start a CrossValidation
experiment.\\
{\tt Learn()} is not possible now; we must use a special form of {\tt Test()}.

``cross\_val.test'' is a file with the following content:
\begin{footnotesize}
\begin{verbatim}
small_1.train
small_2.train
small_3.train
small_4.train
small_5.train
\end{verbatim}
\end{footnotesize}


All these files contain an equal part of a bigger dataset, and
My\_Experiment will run a CrossValidation test between these files.
Note that the output filenames are generated, and that you cannot influence
that.

The output of this program is:

\begin{footnotesize}
\begin{verbatim}
Starting Cross validation test on files:
small_1.train
small_2.train
small_3.train
small_4.train
small_5.train
Examine datafile 'small_1.train' gave the following results:
Number of Features: 8
InputFormat : C4.5


Starting to test, Testfile: small_1.train
Writing output in:          small_1.train.cv
Algorithm     : CV
Global metric : Overlap
Deviant Feature Metrics:(none)
Weighting     : GainRatio

Tested: 1 @ Mon May 31 11:03:35 2010
Tested: 2 @ Mon May 31 11:03:35 2010
Tested: 3 @ Mon May 31 11:03:35 2010
Tested: 4 @ Mon May 31 11:03:35 2010
Tested: 5 @ Mon May 31 11:03:35 2010
Tested: 6 @ Mon May 31 11:03:35 2010
Tested: 7 @ Mon May 31 11:03:35 2010
Tested: 8 @ Mon May 31 11:03:35 2010
Tested: 9 @ Mon May 31 11:03:35 2010
Tested: 10 @ Mon May 31 11:03:35 2010
Ready: 10 @ Mon May 31 11:03:35 2010
Seconds taken: 0.0006 (16207.46 p/s)

overall accuracy: 0.800000 (8/10)
Examine datafile 'small_2.train' gave the following results:
Number of Features: 8
InputFormat : C4.5


Starting to test, Testfile: small_2.train
Writing output in:          small_2.train.cv
Algorithm     : CV
Global metric : Overlap
Deviant Feature Metrics:(none)
Weighting     : GainRatio

Tested: 1 @ Mon May 31 11:03:35 2010
Tested: 2 @ Mon May 31 11:03:35 2010
Tested: 3 @ Mon May 31 11:03:35 2010
Tested: 4 @ Mon May 31 11:03:35 2010
Tested: 5 @ Mon May 31 11:03:35 2010
Tested: 6 @ Mon May 31 11:03:35 2010
Tested: 7 @ Mon May 31 11:03:35 2010
Tested: 8 @ Mon May 31 11:03:35 2010
Tested: 9 @ Mon May 31 11:03:35 2010
Tested: 10 @ Mon May 31 11:03:35 2010
Ready: 10 @ Mon May 31 11:03:35 2010
Seconds taken: 0.0005 (19646.37 p/s)

overall accuracy: 0.800000 (8/10)
Examine datafile 'small_3.train' gave the following results:
Number of Features: 8
InputFormat : C4.5


Starting to test, Testfile: small_3.train
Writing output in:          small_3.train.cv
Algorithm     : CV
Global metric : Overlap
Deviant Feature Metrics:(none)
Weighting     : GainRatio

Tested: 1 @ Mon May 31 11:03:35 2010
Tested: 2 @ Mon May 31 11:03:35 2010
Tested: 3 @ Mon May 31 11:03:35 2010
Tested: 4 @ Mon May 31 11:03:35 2010
Tested: 5 @ Mon May 31 11:03:35 2010
Tested: 6 @ Mon May 31 11:03:35 2010
Tested: 7 @ Mon May 31 11:03:35 2010
Tested: 8 @ Mon May 31 11:03:35 2010
Tested: 9 @ Mon May 31 11:03:35 2010
Tested: 10 @ Mon May 31 11:03:35 2010
Ready: 10 @ Mon May 31 11:03:35 2010
Seconds taken: 0.0005 (20202.02 p/s)

overall accuracy: 0.900000 (9/10)
Examine datafile 'small_4.train' gave the following results:
Number of Features: 8
InputFormat : C4.5


Starting to test, Testfile: small_4.train
Writing output in:          small_4.train.cv
Algorithm     : CV
Global metric : Overlap
Deviant Feature Metrics:(none)
Weighting     : GainRatio

Tested: 1 @ Mon May 31 11:03:35 2010
Tested: 2 @ Mon May 31 11:03:35 2010
Tested: 3 @ Mon May 31 11:03:35 2010
Tested: 4 @ Mon May 31 11:03:35 2010
Tested: 5 @ Mon May 31 11:03:35 2010
Tested: 6 @ Mon May 31 11:03:35 2010
Tested: 7 @ Mon May 31 11:03:35 2010
Tested: 8 @ Mon May 31 11:03:35 2010
Tested: 9 @ Mon May 31 11:03:35 2010
Tested: 10 @ Mon May 31 11:03:35 2010
Ready: 10 @ Mon May 31 11:03:35 2010
Seconds taken: 0.0005 (19880.72 p/s)

overall accuracy: 0.800000 (8/10)
Examine datafile 'small_5.train' gave the following results:
Number of Features: 8
InputFormat : C4.5


Starting to test, Testfile: small_5.train
Writing output in:          small_5.train.cv
Algorithm     : CV
Global metric : Overlap
Deviant Feature Metrics:(none)
Weighting     : GainRatio

Tested: 1 @ Mon May 31 11:03:35 2010
Tested: 2 @ Mon May 31 11:03:35 2010
Tested: 3 @ Mon May 31 11:03:35 2010
Tested: 4 @ Mon May 31 11:03:35 2010
Tested: 5 @ Mon May 31 11:03:35 2010
Tested: 6 @ Mon May 31 11:03:35 2010
Tested: 7 @ Mon May 31 11:03:35 2010
Tested: 8 @ Mon May 31 11:03:35 2010
Ready: 8 @ Mon May 31 11:03:35 2010
Seconds taken: 0.0004 (19093.08 p/s)

overall accuracy: 1.000000 (8/8)
\end{verbatim}
\end{footnotesize}


What has happened here?

\begin{enumerate}
\item TiMBL trained itself on the input files small\_2.train through
small\_5.train (in fact using the {\tt Expand()} API call).
\item Then TiMBL tested small\_1.train against the InstanceBase.
\item Next, small\_2.train is removed from the InstanceBase (API call {\tt
Remove()}) and small\_1.train is added.
\item Then small\_2.train is tested against the InstanceBase.
\item And so forth with small\_3.train $\ldots$
\end{enumerate}
\clearpage

\subsection{example 4, {\tt api\_test4.cxx}}

This program demonstrates adding to and deleting from the InstanceBase. It
also shows that the weights are (re)calculated correctly each time (which
also explains why this is a time-consuming thing to do). After running
this program, wg.1.wgt should be equal to wg.5.wgt, and wg.2.wgt equal to
wg.4.wgt. Note also that, because we do not use a weighting of X2 or SV
here, only the ``simple'' weights are calculated and stored.

Further, arr.1.arr should be equal to arr.5.arr, and arr.2.arr should be
equal to arr.4.arr.

First the program:

\begin{footnotesize}
\begin{verbatim}
#include <iostream>
#include "TimblAPI.h"

int main(){
  TimblAPI *My_Experiment = new TimblAPI( "-a IB1 +vDI+DB +mM" ,
                                          "test4" );
  My_Experiment->ShowSettings( std::cout );
  My_Experiment->Learn( "dimin.train" );
  My_Experiment->Test( "dimin.test", "inc1.out" );
  My_Experiment->SaveWeights( "wg.1.wgt" );
  My_Experiment->WriteArrays( "arr.1.arr" );
  My_Experiment->Increment( "=,=,=,=,+,k,e,=,-,r,@,l,T" );
  My_Experiment->Test( "dimin.test", "inc2.out" );
  My_Experiment->SaveWeights( "wg.2.wgt" );
  My_Experiment->WriteArrays( "arr.2.arr" );
  My_Experiment->Increment( "+,zw,A,rt,-,k,O,p,-,n,O,n,E" );
  My_Experiment->Test( "dimin.test", "inc3.out" );
  My_Experiment->SaveWeights( "wg.3.wgt" );
  My_Experiment->WriteArrays( "arr.3.arr" );
  My_Experiment->Decrement( "+,zw,A,rt,-,k,O,p,-,n,O,n,E" );
  My_Experiment->Test( "dimin.test", "inc4.out" );
  My_Experiment->SaveWeights( "wg.4.wgt" );
  My_Experiment->WriteArrays( "arr.4.arr" );
  My_Experiment->Decrement( "=,=,=,=,+,k,e,=,-,r,@,l,T" );
  My_Experiment->Test( "dimin.test", "inc5.out" );
  My_Experiment->SaveWeights( "wg.5.wgt" );
  My_Experiment->WriteArrays( "arr.5.arr" );
  delete My_Experiment;
  exit(1);
}
\end{verbatim}
\end{footnotesize}


This produces the following output:

\begin{footnotesize}
\begin{verbatim}
Current Experiment Settings :
FLENGTH                  : 0
MAXBESTS                 : 500
TRIBL_OFFSET             : 0
IG_THRESHOLD             : 1000
INPUTFORMAT              : Unknown
TREE_ORDER               : G/V
ALL_WEIGHTS              : false
WEIGHTING                : gr
BIN_SIZE                 : 20
IB2_OFFSET               : 0
KEEP_DISTRIBUTIONS       : false
DO_SLOPPY_LOO            : false
TARGET_POS               : 18446744073709551615
DO_SILLY                 : false
DO_DIVERSIFY             : false
DECAY                    : Z
SEED                     : -1
BEAM_SIZE                : 0
DECAYPARAM_A             : 1.00000
DECAYPARAM_B             : 1.00000
NORMALISATION            : None
NORM_FACTOR              : 1.00000
EXEMPLAR_WEIGHTS         : false
IGNORE_EXEMPLAR_WEIGHTS  : true
NO_EXEMPLAR_WEIGHTS_TEST : true
VERBOSITY                : DI+DB
EXACT_MATCH              : false
HASHED_TREE              : true
GLOBAL_METRIC            : M
METRICS                  :
MVD_LIMIT                : 1
NEIGHBORS                : 1
PROGRESS                 : 100000
CLIP_FACTOR              : 10

Examine datafile 'dimin.train' gave the following results:
Number of Features: 12
InputFormat : C4.5

-test4-Phase 1: Reading Datafile: dimin.train
-test4-Start:     0 @ Mon May 31 11:03:35 2010
-test4-Finished:  2999 @ Mon May 31 11:03:35 2010
-test4-Calculating Entropy   Mon May 31 11:03:35 2010
Feature Permutation based on GainRatio/Values :
< 9, 5, 11, 1, 12, 7, 4, 3, 10, 8, 2, 6 >
-test4-Phase 2: Learning from Datafile: dimin.train
-test4-Start:     0 @ Mon May 31 11:03:35 2010
-test4-Finished:  2999 @ Mon May 31 11:03:35 2010

Size of InstanceBase = 19231 Nodes, (769240 bytes), 49.77 % compression
Examine datafile 'dimin.test' gave the following results:
Number of Features: 12
InputFormat : C4.5


Starting to test, Testfile: dimin.test
Writing output in:          inc1.out
Algorithm     : IB1
Global metric : Value Difference, Prestored matrix
Deviant Feature Metrics:(none)
Size of value-matrix[1] = 168 Bytes
Size of value-matrix[2] = 968 Bytes
Size of value-matrix[3] = 968 Bytes
Size of value-matrix[4] = 168 Bytes
Size of value-matrix[5] = 168 Bytes
Size of value-matrix[6] = 1904 Bytes
Size of value-matrix[7] = 1904 Bytes
Size of value-matrix[8] = 504 Bytes
Size of value-matrix[9] = 104 Bytes
Size of value-matrix[10] = 2904 Bytes
Size of value-matrix[11] = 1728 Bytes
Size of value-matrix[12] = 1248 Bytes
Total Size of value-matrices 12736 Bytes

Weighting     : GainRatio

-test4-Tested: 1 @ Mon May 31 11:03:35 2010
-test4-Tested: 2 @ Mon May 31 11:03:35 2010
-test4-Tested: 3 @ Mon May 31 11:03:35 2010
-test4-Tested: 4 @ Mon May 31 11:03:35 2010
-test4-Tested: 5 @ Mon May 31 11:03:35 2010
-test4-Tested: 6 @ Mon May 31 11:03:35 2010
-test4-Tested: 7 @ Mon May 31 11:03:35 2010
-test4-Tested: 8 @ Mon May 31 11:03:35 2010
-test4-Tested: 9 @ Mon May 31 11:03:35 2010
-test4-Tested: 10 @ Mon May 31 11:03:35 2010
-test4-Tested: 100 @ Mon May 31 11:03:35 2010
-test4-Ready: 950 @ Mon May 31 11:03:35 2010
Seconds taken: 0.0791 (12003.74 p/s)

overall accuracy: 0.964211 (916/950), of which 62 exact matches
There were 6 ties of which 6 (100.00%) were correctly resolved
-test4-Saving Weights in wg.1.wgt
-test4-Saving Probability Arrays in arr.1.arr
Examine datafile 'dimin.test' gave the following results:
Number of Features: 12
InputFormat : C4.5


Starting to test, Testfile: dimin.test
Writing output in:          inc2.out
Algorithm     : IB1
Global metric : Value Difference, Prestored matrix
Deviant Feature Metrics:(none)
Size of value-matrix[1] = 168 Bytes
Size of value-matrix[2] = 968 Bytes
Size of value-matrix[3] = 968 Bytes
Size of value-matrix[4] = 168 Bytes
Size of value-matrix[5] = 168 Bytes
Size of value-matrix[6] = 1904 Bytes
Size of value-matrix[7] = 1904 Bytes
Size of value-matrix[8] = 504 Bytes
Size of value-matrix[9] = 104 Bytes
Size of value-matrix[10] = 2904 Bytes
Size of value-matrix[11] = 1728 Bytes
Size of value-matrix[12] = 1248 Bytes
Total Size of value-matrices 12736 Bytes

Weighting     : GainRatio

-test4-Tested: 1 @ Mon May 31 11:03:35 2010
-test4-Tested: 2 @ Mon May 31 11:03:35 2010
-test4-Tested: 3 @ Mon May 31 11:03:35 2010
-test4-Tested: 4 @ Mon May 31 11:03:35 2010
-test4-Tested: 5 @ Mon May 31 11:03:35 2010
-test4-Tested: 6 @ Mon May 31 11:03:35 2010
-test4-Tested: 7 @ Mon May 31 11:03:35 2010
-test4-Tested: 8 @ Mon May 31 11:03:35 2010
-test4-Tested: 9 @ Mon May 31 11:03:35 2010
-test4-Tested: 10 @ Mon May 31 11:03:35 2010
-test4-Tested: 100 @ Mon May 31 11:03:35 2010
-test4-Ready: 950 @ Mon May 31 11:03:35 2010
Seconds taken: 0.0866 (10965.92 p/s)

overall accuracy: 0.964211 (916/950), of which 62 exact matches
There were 6 ties of which 6 (100.00%) were correctly resolved
-test4-Saving Weights in wg.2.wgt
-test4-Saving Probability Arrays in arr.2.arr
Examine datafile 'dimin.test' gave the following results:
Number of Features: 12
InputFormat : C4.5


Starting to test, Testfile: dimin.test
Writing output in:          inc3.out
Algorithm     : IB1
Global metric : Value Difference, Prestored matrix
Deviant Feature Metrics:(none)
Size of value-matrix[1] = 168 Bytes
Size of value-matrix[2] = 968 Bytes
Size of value-matrix[3] = 968 Bytes
Size of value-matrix[4] = 168 Bytes
Size of value-matrix[5] = 168 Bytes
Size of value-matrix[6] = 1904 Bytes
Size of value-matrix[7] = 1904 Bytes
Size of value-matrix[8] = 504 Bytes
Size of value-matrix[9] = 104 Bytes
Size of value-matrix[10] = 2904 Bytes
Size of value-matrix[11] = 1728 Bytes
Size of value-matrix[12] = 1248 Bytes
Total Size of value-matrices 12736 Bytes

Weighting     : GainRatio

-test4-Tested: 1 @ Mon May 31 11:03:35 2010
-test4-Tested: 2 @ Mon May 31 11:03:35 2010
-test4-Tested: 3 @ Mon May 31 11:03:35 2010
-test4-Tested: 4 @ Mon May 31 11:03:35 2010
-test4-Tested: 5 @ Mon May 31 11:03:35 2010
-test4-Tested: 6 @ Mon May 31 11:03:35 2010
-test4-Tested: 7 @ Mon May 31 11:03:35 2010
-test4-Tested: 8 @ Mon May 31 11:03:35 2010
-test4-Tested: 9 @ Mon May 31 11:03:35 2010
-test4-Tested: 10 @ Mon May 31 11:03:35 2010
-test4-Tested: 100 @ Mon May 31 11:03:35 2010
-test4-Ready: 950 @ Mon May 31 11:03:35 2010
Seconds taken: 0.0740 (12844.09 p/s)

overall accuracy: 0.964211 (916/950), of which 62 exact matches
There were 6 ties of which 6 (100.00%) were correctly resolved
-test4-Saving Weights in wg.3.wgt
-test4-Saving Probability Arrays in arr.3.arr
Examine datafile 'dimin.test' gave the following results:
Number of Features: 12
InputFormat : C4.5


Starting to test, Testfile: dimin.test
Writing output in:          inc4.out
Algorithm     : IB1
Global metric : Value Difference, Prestored matrix
Deviant Feature Metrics:(none)
Size of value-matrix[1] = 168 Bytes
Size of value-matrix[2] = 968 Bytes
Size of value-matrix[3] = 968 Bytes
Size of value-matrix[4] = 168 Bytes
Size of value-matrix[5] = 168 Bytes
Size of value-matrix[6] = 1904 Bytes
Size of value-matrix[7] = 1904 Bytes
Size of value-matrix[8] = 504 Bytes
Size of value-matrix[9] = 104 Bytes
Size of value-matrix[10] = 2904 Bytes
Size of value-matrix[11] = 1728 Bytes
Size of value-matrix[12] = 1248 Bytes
Total Size of value-matrices 12736 Bytes

Weighting     : GainRatio

-test4-Tested: 1 @ Mon May 31 11:03:36 2010
-test4-Tested: 2 @ Mon May 31 11:03:36 2010
-test4-Tested: 3 @ Mon May 31 11:03:36 2010
-test4-Tested: 4 @ Mon May 31 11:03:36 2010
-test4-Tested: 5 @ Mon May 31 11:03:36 2010
-test4-Tested: 6 @ Mon May 31 11:03:36 2010
-test4-Tested: 7 @ Mon May 31 11:03:36 2010
-test4-Tested: 8 @ Mon May 31 11:03:36 2010
-test4-Tested: 9 @ Mon May 31 11:03:36 2010
-test4-Tested: 10 @ Mon May 31 11:03:36 2010
-test4-Tested: 100 @ Mon May 31 11:03:36 2010
-test4-Ready: 950 @ Mon May 31 11:03:36 2010
Seconds taken: 0.0727 (13075.49 p/s)

overall accuracy: 0.964211 (916/950), of which 62 exact matches
There were 6 ties of which 6 (100.00%) were correctly resolved
-test4-Saving Weights in wg.4.wgt
-test4-Saving Probability Arrays in arr.4.arr
Examine datafile 'dimin.test' gave the following results:
Number of Features: 12
InputFormat : C4.5


Starting to test, Testfile: dimin.test
Writing output in:          inc5.out
Algorithm     : IB1
Global metric : Value Difference, Prestored matrix
Deviant Feature Metrics:(none)
Size of value-matrix[1] = 168 Bytes
Size of value-matrix[2] = 968 Bytes
Size of value-matrix[3] = 968 Bytes
Size of value-matrix[4] = 168 Bytes
Size of value-matrix[5] = 168 Bytes
Size of value-matrix[6] = 1904 Bytes
Size of value-matrix[7] = 1904 Bytes
Size of value-matrix[8] = 504 Bytes
Size of value-matrix[9] = 104 Bytes
Size of value-matrix[10] = 2904 Bytes
Size of value-matrix[11] = 1728 Bytes
Size of value-matrix[12] = 1248 Bytes
Total Size of value-matrices 12736 Bytes

Weighting     : GainRatio

-test4-Tested: 1 @ Mon May 31 11:03:36 2010
-test4-Tested: 2 @ Mon May 31 11:03:36 2010
-test4-Tested: 3 @ Mon May 31 11:03:36 2010
-test4-Tested: 4 @ Mon May 31 11:03:36 2010
-test4-Tested: 5 @ Mon May 31 11:03:36 2010
-test4-Tested: 6 @ Mon May 31 11:03:36 2010
-test4-Tested: 7 @ Mon May 31 11:03:36 2010
-test4-Tested: 8 @ Mon May 31 11:03:36 2010
-test4-Tested: 9 @ Mon May 31 11:03:36 2010
-test4-Tested: 10 @ Mon May 31 11:03:36 2010
-test4-Tested: 100 @ Mon May 31 11:03:36 2010
-test4-Ready: 950 @ Mon May 31 11:03:36 2010
Seconds taken: 0.0732 (12975.31 p/s)

overall accuracy: 0.964211 (916/950), of which 62 exact matches
There were 6 ties of which 6 (100.00%) were correctly resolved
-test4-Saving Weights in wg.5.wgt
-test4-Saving Probability Arrays in arr.5.arr
\end{verbatim}
\end{footnotesize}
\clearpage

\subsection{example 5, {\tt api\_test5.cxx}}

This program demonstrates the use of neighborSets to classify and
store results. It also demonstrates some neighborSet basics.

\begin{footnotesize}
\begin{verbatim}
#include <iostream>
#include <string>
#include "TimblAPI.h"

using std::endl;
using std::cout;
using std::string;
using namespace Timbl;

int main(){
  TimblAPI *My_Experiment = new TimblAPI( "-a IB1 +vDI+DB+n +mM +k4 " ,
                                          "test5" );
  My_Experiment->Learn( "dimin.train" );
  {
    string line = "=,=,=,=,+,k,e,=,-,r,@,l,T";
    const neighborSet *neighbours1 = My_Experiment->classifyNS( line );
    if ( neighbours1 ){
      cout << "Classify OK on " << line << endl;
      cout << neighbours1;
    } else
      cout << "Classify failed on " << line << endl;
    neighborSet neighbours2;
    line = "+,zw,A,rt,-,k,O,p,-,n,O,n,E";
    if ( My_Experiment->classifyNS( line, neighbours2 ) ){
      cout << "Classify OK on " << line << endl;
      cout << neighbours2;
    } else
      cout << "Classify failed on " << line << endl;
    line = "+,z,O,n,-,d,A,xs,-,=,A,rm,P";
    const neighborSet *neighbours3 = My_Experiment->classifyNS( line );
    if ( neighbours3 ){
      cout << "Classify OK on " << line << endl;
      cout << neighbours3;
    } else
      cout << "Classify failed on " << line << endl;
    neighborSet uit2;
    {
      neighborSet uit;
      uit.setShowDistance(true);
      uit.setShowDistribution(true);
      cout << " before first merge " << endl;
      cout << uit;
      uit.merge( *neighbours1 );
      cout << " after first merge " << endl;
      cout << uit;
      uit.merge( *neighbours3 );
      cout << " after second merge " << endl;
      cout << uit;
      uit.merge( neighbours2 );
      cout << " after third merge " << endl;
      cout << uit;
      uit.truncate( 3 );
      cout << " after truncate " << endl;
      cout << uit;
      cout << " test assignment" << endl;
      uit2 = *neighbours1;
    }
    cout << "assignment result: " << endl;
    cout << uit2;
    {
      cout << " test copy construction" << endl;
      neighborSet uit(uit2);
      cout << "result: " << endl;
      cout << uit;
    }
    cout << "almost done!" << endl;
  }
  delete My_Experiment;
  cout << "done!" << endl;
}
\end{verbatim}
\end{footnotesize}

Its expected output is (without further comment):

\begin{footnotesize}
\begin{verbatim}
Examine datafile 'dimin.train' gave the following results:
Number of Features: 12
InputFormat : C4.5

-test5-Phase 1: Reading Datafile: dimin.train
-test5-Start:     0 @ Mon May 31 11:03:36 2010
-test5-Finished:  2999 @ Mon May 31 11:03:36 2010
-test5-Calculating Entropy   Mon May 31 11:03:36 2010
Feature Permutation based on GainRatio/Values :
< 9, 5, 11, 1, 12, 7, 4, 3, 10, 8, 2, 6 >
-test5-Phase 2: Learning from Datafile: dimin.train
-test5-Start:     0 @ Mon May 31 11:03:36 2010
-test5-Finished:  2999 @ Mon May 31 11:03:36 2010

Size of InstanceBase = 19231 Nodes, (769240 bytes), 49.77 % compression
Classify OK on =,=,=,=,+,k,e,=,-,r,@,l,T
# k=1   { T 1.00000 }   0.0000000000000
# k=2   { T 1.00000 }   0.0031862902473388
# k=3   { T 1.00000 }   0.0034182315118303
# k=4   { T 1.00000 }   0.0037433772844615
Classify OK on +,zw,A,rt,-,k,O,p,-,n,O,n,E
# k=1   { E 1.00000 }   0.0000000000000
# k=2   { E 1.00000 }   0.056667880327190
# k=3   { E 1.00000 }   0.062552636617742
# k=4   { E 1.00000 }   0.064423860361889
Classify OK on +,z,O,n,-,d,A,xs,-,=,A,rm,P
# k=1   { P 1.00000 }   0.059729836255170
# k=2   { P 1.00000 }   0.087740769132651
# k=3   { P 1.00000 }   0.088442788919723
# k=4   { P 1.00000 }   0.097058649951429
 before first merge
 after first merge
# k=1   { P 1.00000 }   0.059729836255170
# k=2   { P 1.00000 }   0.087740769132651
# k=3   { P 1.00000 }   0.088442788919723
# k=4   { P 1.00000 }   0.097058649951429
 after second merge
# k=1   { P 2.00000 }   0.059729836255170
# k=2   { P 2.00000 }   0.087740769132651
# k=3   { P 2.00000 }   0.088442788919723
# k=4   { P 2.00000 }   0.097058649951429
 after third merge
# k=1   { E 1.00000 }   0.0000000000000
# k=2   { E 1.00000 }   0.056667880327190
# k=3   { P 2.00000 }   0.059729836255170
# k=4   { E 1.00000 }   0.062552636617742
# k=5   { E 1.00000 }   0.064423860361889
# k=6   { P 2.00000 }   0.087740769132651
# k=7   { P 2.00000 }   0.088442788919723
# k=8   { P 2.00000 }   0.097058649951429
 after truncate
# k=1   { E 1.00000 }   0.0000000000000
# k=2   { E 1.00000 }   0.056667880327190
# k=3   { P 2.00000 }   0.059729836255170
 test assignment
assignment result:
# k=1   { P 1.00000 }   0.059729836255170
# k=2   { P 1.00000 }   0.087740769132651
# k=3   { P 1.00000 }   0.088442788919723
# k=4   { P 1.00000 }   0.097058649951429
 test copy construction
result:
# k=1   { P 1.00000 }   0.059729836255170
# k=2   { P 1.00000 }   0.087740769132651
# k=3   { P 1.00000 }   0.088442788919723
# k=4   { P 1.00000 }   0.097058649951429
almost done!
done!
\end{verbatim}
\end{footnotesize}
\clearpage

\subsection{example 6, {\tt api\_test6.cxx}}

This program demonstrates the use of ValueDistributions, TargetValues
and neighborSets for classification.

\begin{footnotesize}
\begin{verbatim}
#include <iostream>
#include "TimblAPI.h"

using std::cout;
using std::endl;
using namespace Timbl;

int main(){
  TimblAPI My_Experiment( "-a IB1 +vDI+DB -k3", "test6" );
  My_Experiment.Learn( "dimin.train" );
  const ValueDistribution *vd;
  const TargetValue *tv
    = My_Experiment.Classify( "-,=,O,m,+,h,K,=,-,n,I,N,K", vd );
  cout << "resulting target: " << tv << endl;
  cout << "resulting Distribution: " << vd << endl;
  ValueDistribution::dist_iterator it = vd->begin();
  while ( it != vd->end() ){
    cout << it->second << " OR ";
    cout << it->second->Value() << " " << it->second->Weight() << endl;
    ++it;
  }

  cout << "the same with neighborSets" << endl;
  const neighborSet *nb = My_Experiment.classifyNS( "-,=,O,m,+,h,K,=,-,n,I,N,K" );
  ValueDistribution *vd2 = nb->bestDistribution();
  cout << "default answer " << vd2 << endl;
  decayStruct *dc = new expDecay(0.3);
  delete vd2;
  vd2 = nb->bestDistribution( dc );
  delete dc;
  cout << "with exponential decay, alpha = 0.3 " << vd2 << endl;
  delete vd2;
}
\end{verbatim}
\end{footnotesize}

This is the output produced:

\begin{footnotesize}
\begin{verbatim}
Examine datafile 'dimin.train' gave the following results:
Number of Features: 12
InputFormat : C4.5

-test6-Phase 1: Reading Datafile: dimin.train
-test6-Start:     0 @ Mon May 31 11:03:36 2010
-test6-Finished:  2999 @ Mon May 31 11:03:36 2010
-test6-Calculating Entropy   Mon May 31 11:03:36 2010
Feature Permutation based on GainRatio/Values :
< 9, 5, 11, 1, 12, 7, 4, 3, 10, 8, 2, 6 >
-test6-Phase 2: Learning from Datafile: dimin.train
-test6-Start:     0 @ Mon May 31 11:03:36 2010
-test6-Finished:  2999 @ Mon May 31 11:03:36 2010

Size of InstanceBase = 19231 Nodes, (769240 bytes), 49.77 % compression
resulting target: K
resulting Distribution: { E 1.00000, K 7.00000 }
E 1 OR E 1
K 7 OR K 7
the same with neighborSets
default answer { E 1.00000, K 7.00000 }
with exponential decay, alpha = 0.3 { E 0.971556, K 6.69810 }
\end{verbatim}
\end{footnotesize}

\end{document}