#!/usr/local/bin/perl -w

=head1 NAME

kocos.pl - Find the Kth order co-occurrences of a word

=head1 SYNOPSIS

This program finds the Kth order co-occurrences of a given word.

=head1 DESCRIPTION

=head2 1. What are Kth order co-occurrences?

Co-occurrences are the words which occur together in the same context. All
words which co-occur with a given target word are called its co-occurrences.
The concept of 2nd order co-occurrences is explained in the paper Automatic
Word Sense Discrimination [Schutze98]. According to this paper, the words
which co-occur with the co-occurring words of a target word are called the
2nd order co-occurrences of that word.

So with each increasing order of co-occurrences, we introduce an extra level
of indirection and find words co-occurring with the previous order
co-occurrences.

We generalize the concept of 2nd order co-occurrences from [Schutze98] to find
the Kth order co-occurrences of a word. These are the words that co-occur
with the (K-1)th order co-occurrences of a given target word.

We have also found [Niwa&Nitta94] to be related to kocos. While we do not
exactly reimplement the co-occurrence vectors they propose, we feel that
kocos is at least similar in spirit.

=head2 2. Usage

Usage: kocos.pl [OPTIONS] BIGRAM

=head2 3. Input

=head3 3.1 BIGRAM

Specify the BIGRAM file name on the command line after the program name and
options (if any) as shown in the usage note.

BIGRAM should be bigram output (normal or extended) created by the NSP
programs count.pl, statistic.pl or combig.pl. When count.pl and statistic.pl
are run for creating bigrams (--ngram set to 2 or not specified), the programs
list the bigrams of all words which co-occur together (within a certain window).
So we can say that if a bigram 'word1<>word2<>' is listed in the output of the
count.pl or statistic.pl program, the words word1 and word2 are
co-occurrences of each other.

In general you may want to consider the use of stop lists (--stop option
in count.pl) to remove very common words such as "the" and "for", and
also eliminate low frequency bigrams (--remove option in count.pl). The
stop list is particularly important as high frequency words such as "the"
or "for" will co-occur with many different words, and greatly expand the
search needed to find kth order co-occurrences.

If you want to run kocos.pl on a source file not created by either the count
or statistic program of this package, just make sure that each line of the
BIGRAM file lists two words WORD1 and WORD2 as

    WORD1<>WORD2<>

The program minimally requires that there are exactly two words and that they
are separated by the delimiter '<>', with an extra delimiter '<>' after the
second word. So you may convert any non-NSP input to this format, where two
words occurring in the same context are '<>' separated, and provide it to kocos.

Controlling scope of the context

You may like to call two words co-occurrences of each other if they occur
within a specific distance from each other. In this case we encourage you to
use the --window w option of the NSP program count.pl while creating a BIGRAM
file. This will create bigrams of all words which co-occur within a distance w
from each other. Thus --window w sets the maximum distance allowed between two
words to call them co-occurrences of each other.

Note that if the --window option is not used while creating BIGRAM input
for kocos, only those words which come immediately next to each other will
be considered co-occurrences (the default window size being 2 for bigrams).

=head2 4.
Options

=head3 4.1 --literal WORD

With this option, the target WORD whose kth order co-occurrences are to be
found can be directly specified on the command line.

e.g.

    kocos.pl --literal line test.input

will find the 1st order co-occurrences (by default) of the word 'line' using
the bigrams listed in the file test.input.

    kocos.pl --literal , --order 3 test.input

will find the 3rd order co-occurrences of ',' from the file test.input.

=head3 4.2 --regex REGEXFILE

With this option, the target word can be specified using one or more Perl
regular expressions. The regexes should be written in a file, and multiple
regexes should either appear on separate lines or be separated by the Perl
'OR' operator (|).

We provide this option to allow the user to specify various morphological
variants of the target word, e.g. line, lines, Line, Lines etc.

e.g.

(1) Let test.regex contain a regular expression for the target word, which is

    /^[Ll]ines?$/

To use this for finding kocos, run kocos.pl with the command

    kocos.pl --regex test.regex --order K test.input

(2) To find, say, the 2nd order co-occurrences of any general target word
which occurs in <head> tags in the data, as in the Senseval format, we use a
regular expression

    /^<head.*>\w+</head>$/

in our regex file, say test.regex, and run kocos.pl using the command

    kocos.pl --regex test.regex --order 2 eng-lex-sample.training.xml

(3) To find the 3rd order co-occurrences of the period '.' (a literal is
compared exactly, so this targets the token '.' itself), run kocos.pl using

    kocos.pl --literal . \
        --order 3 test.input

Or, to target any word that contains a period, write a regex /\./ in a file,
say test.regex, and run kocos using

    kocos.pl --regex test.regex --order 3 test.input

(4) To find the 2nd order co-occurrences of all words that are numbers,
write a regex like /^\d+$/ to a regex file, say test.regex, and run kocos
using

    kocos.pl --regex test.regex --order 2 test.input

Note: writing a regex /\d+/ will also match words like line20.1.cord or
art%10.fine456 that include numbers.

Regexes that should match target words exactly should be delimited by
^ and $ as in /^[Ll]ines?$/. Specifying something like /[Ll]ines?/ will
also match 'incline'.

Note: The program kocos.pl requires that the target word is specified using
one of the options --literal or --regex.

=head3 4.3 --order K

If the value of K is specified using the command line option --order K,
kocos.pl will find the Kth order co-occurrences of the target word. K can
take any integer value greater than 0. If the value of K is not specified,
the program will set K to 1 and will simply find the co-occurrences of the
target (the word co-occurrence generally means first order co-occurrence).

=head3 4.4 --trace TRACEFILE

To see a detailed report of how each Kth order co-occurrence is reached as a
sequence of K words, specify the name of a TRACEFILE on the command line
using the --trace TRACEFILE option.

TRACEFILE will show the chains of K+1 words where the first word is the TARGET
word and every ith word in the chain is an (i-1)th order co-occurrence of the
target which co-occurs with the (i-1)th word in the chain. So a chain of K+1
words,

    TARGET->COC1->COC2->COC3....->COCK-1->COCK

shows that COC1 is a first order co-occurrence of the TARGET.

    COC2 is a second order co-occurrence such that COC2 co-occurs with
    COC1 which in turn co-occurs with the TARGET.
    COC3 is a third order co-occurrence such that COC3 co-occurs with
    COC2 which in turn co-occurs with COC1 which co-occurs with TARGET.

and so on.

=head3 4.5 --help

This option will display the help message.

=head3 4.6 --version

This option will display version information of the program.

=head2 5. Output

The program will display a list of Kth order co-occurrences to standard
output such that each co-occurrence occurs on a separate line and is
followed by '<>' (just to be compatible with other programs in NSP).

Note that the output of kocos.pl could be directly used by the program
nsp2regex of the SenseTools Package (by Satanjeev Banerjee and Ted
Pedersen) to convert Senseval data instances into feature vectors in ARFF
format where our Kth order co-occurrences are used as features.

For more information on SenseTools you can refer to its README:
http://www.d.umn.edu/~tpederse/sensetools.html

    IMPORTANT NOTE

If there are some kth order co-occurrences which are also ith order
co-occurrences (0<i<k) of the target word, kocos.pl will not
display them as Kth order co-occurrences. kocos.pl displays only those
words as Kth order co-occurrences whose minimum distance from the target word
is K in the co-occurrence graph.
[The co-occurrence graph is a network of words where a word is connected to
all words it co-occurs with.]

=head2 6. Usage examples

(a) Using the default value of order

To find the (1st order) co-occurrences of a word 'line' from the BIGRAM file
test.input, run kocos.pl using the following command.

    kocos.pl --literal line test.input

(b) Using the --order option

To find the 2nd order co-occurrences of a word 'line' from the BIGRAM file
test.input, run kocos.pl using the following command.

    kocos.pl --literal line --order 2 test.input

(c) Using the trace option

To see how the 4th order co-occurrences of a word 'line' are reached as a
sequence of words which form a co-occurrence chain, run kocos.pl using the
following command.

    kocos.pl --literal line --order 4 --trace test.trace test.input

(d) Using a regex to specify the target word

To find the Kth order co-occurrences of a target word 'line' which is
specified as a Perl regular expression, say /^[Ll]ines?$/, in a file
test.regex, run kocos.pl using

    kocos.pl --regex test.regex --order K test.input

(e) Using a generic regex for data like Senseval-2

To find the 2nd order co-occurrences of a target word that occurs in <head>
tags in the data file eng-lex-sample.training.xml, use a regular expression
like /<head>\w+</head>/ from a file, say test.regex, and run kocos.pl using

    kocos.pl --regex test.regex --order 2 test.input

=head2 7. General Recommendations

(a) Create a BIGRAM file using the programs count.pl, statistic.pl or
combig.pl of the NSP package.

(b) Use the --window W option of count.pl to specify the scope of the
context. Any word that occurs within a distance W from a target word will be
treated as its co-occurrence.

(c) Use either the --literal or the --regex option to specify the target
word. We recommend using regex support to detect forms of the target word
other than its base form.

=head2 8. Examples of Kth order co-occurrences

In all the following examples, we assume that the input comes from the file
test.input and the word 'line' is the target word.

    test.input =>
    ----------------
    print<>in<>    |
    print<>line<>  |
    text<>the<>    |
    text<>line<>   |
    file<>the<>    |
    file<>in<>     |
    line<>file<>   |
    ----------------

(Note that test.input does not look like complete count.pl or statistic.pl
output, because kocos.pl minimally requires only two words WORD1 and WORD2
separated by '<>' with an extra '<>' after WORD2, as described in Section 3.1.)

(a) The 1st order co-occurrences of the word 'line' can be found by
running kocos.pl with either of the following commands:

    kocos.pl --literal line test.input

OR

    kocos.pl --order 1 --literal line test.input

This will display the co-occurrences of 'line' to standard output as shown
below in the box.

    --------
    text<> |
    file<> |
    print<>|
    --------

This is because the program finds the bigrams

    print<>line<>
    text<>line<>
    line<>file<>

where the word 'line' co-occurs with the words print, text and file, which
become the 1st order co-occurrences.

(b) The 2nd order co-occurrences of the word 'line' can be found by
running kocos.pl with the following command:

    kocos.pl --literal line --order 2 test.input

This will display the 2nd order co-occurrences of 'line' to standard output
as shown below in the box.

    --------
    the<>  |
    in<>   |
    --------

This is because the program finds the words print, text and file as the
first order co-occurrences (as explained in case a) and finds the bigrams

    print<>in<>
    text<>the<>
    file<>the<>
    file<>in<>

where 'the' and 'in' co-occur with the words print, text and file.

(c) To see how the 2nd order co-occurrences of the word 'line' are reached,
run the program using the following command:

    kocos.pl --literal line --order 2 --trace test.trace test.input

This will display the 2nd order co-occurrences of 'line' to standard output
as shown below in the box.

    --------
    the<>  |
    in<>   |
    --------

and a detailed report of co-occurrence chains in the test.trace file as shown
in the box below.

    test.trace =>

    ----------------
    line->text->the|
    line->file->the|
    line->file->in |
    line->print->in|
    ----------------

where the first line shows that the word 'line' co-occurred with 'text' which
co-occurred with 'the'. Hence 'the' became a 2nd order co-occurrence.
Similarly, 'line' co-occurred with 'file' which in turn co-occurred with
'the' and 'in', which are therefore 2nd order co-occurrences of 'line'.

=head2 9. References

[Niwa&Nitta94] Y. Niwa and Y. Nitta. Co-occurrence vectors from corpora
vs. distance vectors from dictionaries. COLING-1994.

[Schutze98] H. Schutze. Automatic word sense discrimination. Computational
Linguistics, 24(1):97-123, 1998.

=head1 AUTHORS

    Amruta Purandare, pura0010@umn.edu
    Ted Pedersen, tpederse@umn.edu

    Last updated on 12/05/2003 by TDP

This work has been partially supported by a National Science Foundation
Faculty Early CAREER Development award (#0092784).

=head1 BUGS

=head1 SEE ALSO

http://www.d.umn.edu/~tpederse/nsp.html

=head1 COPYRIGHT

Copyright (C) 2002-2003, Amruta Purandare and Ted Pedersen

This program is free software; you can redistribute it and/or modify it under
the terms of the GNU General Public License as published by the Free Software
Foundation; either version 2 of the License, or (at your option) any later
version.

This program is distributed in the hope that it will be useful, but WITHOUT
ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with
this program; if not, write to

The Free Software Foundation, Inc.,
59 Temple Place - Suite 330,
Boston, MA 02111-1307, USA.

=cut

###############################################################################

#                               Changelogs

# Date          Version  By       Changes                   Code

# 03/30/2003    0.03     Amruta   Regex support for         ADP.03.1
#                                 specifying target
#                                 word
#
# 07/02/2003    0.05     Amruta   Redesigned algorithm      ADP.05
#                                 to improve performance
#
###############################################################################

#                           THE CODE STARTS HERE

###############################################################################

#                  ================================
#                   COMMAND LINE OPTIONS AND USAGE
#                  ================================

# show minimal usage message if no arguments
if($#ARGV<0)
{
    &showminimal();
    exit;
}

# command line options
use Getopt::Long;
GetOptions ("help","version","order=i","trace=s","literal=s","regex=s");
# show help option
if(defined $opt_help)
{
    $opt_help=1;
    &showhelp();
    exit;
}

# show version information
if(defined $opt_version)
{
    $opt_version=1;
    &showversion();
    exit;
}
# if the order is specified
if(defined $opt_order)
{
    $order=$opt_order;
}
# otherwise set default to 1
else
{
    $order=1;
}

# trace report will show how
# a Kth order co-occurrence is
# reached via a chain of
# lower order co-occurrences
if(defined $opt_trace)
{
    $trace=$opt_trace;
}

# ----------------
# ADP.03.1 start
# ----------------
# this part has been added during NSP version 0.55 release

# target word is specified via --literal
if(defined $opt_literal)
{
    $target=$opt_literal;
}
# target specified as Perl regex/s in a file
if(defined $opt_regex)
{
    $regex_file=$opt_regex;
    if(!(-e $regex_file))
    {
        print STDERR "ERROR($0):
        Regex file <$regex_file> doesn't exist.\n";
        exit;
    }
    open(REG,"<",$regex_file) || die "ERROR($0):
        Error(error code=$!) in opening Regex File <$regex_file>.\n";
    undef $target;
    while(<REG>)
    {
        chomp;
        # strip leading and trailing whitespace
        s/^\s+//;
        s/\s+$//;
        # skip blank lines
        if(/^\s*$/)
        {
            next;
        }
        # each regex must be enclosed in '/' delimiters,
        # which are removed before it is stored
        if(/^\//)
        {
            s/^\///;
        }
        else
        {
            print STDERR "ERROR($0):
        Regular Expression <$_> should start with '/'\n";
            exit;
        }
        if(/\/$/)
        {
            s/\/$//;
        }
        else
        {
            print STDERR "ERROR($0):
        Regular Expression <$_> should end with '/'\n";
            exit;
        }
        # join all regexes into one alternation
        $target.="(".$_.")|";
    }
    if(defined $target)
    {
        # remove the trailing '|'
        chop $target;
    }
    else
    {
        print STDERR "ERROR($0):
        No valid Perl regex found in Regex file <$regex_file>.\n";
        exit;
    }
}
# --------------
# ADP.03.1 end
# --------------

#############################################################################

#                       ================================
#                          INITIALIZATION AND INPUT
#                       ================================

#$0 contains the program name along with
#a complete path. Extract just the program
#name and use in error messages
$0=~s/.*\/(.+)/$1/;

if(!defined $ARGV[0])
{
    print STDERR "ERROR($0):
        Please specify the SOURCE file name...\n";
    exit;
}
#accept the input file name
$infile=$ARGV[0];
#check if exists
if(!-e $infile)
{
    print STDERR "ERROR($0):
        Source file <$infile> doesn't exist...\n";
    exit;
}
#open if exists
open(IN,"<",$infile) || die "Error($0):
        Error(code=$!)
in opening <$infile> file.\n";

#check if the target word exists
if(!defined $target)
{
    print STDERR "ERROR($0):
        Please specify the target word using one of the --literal or --regex
        options.\n";
    exit;
}

#check if order is valid
if($order<1)
{
    print STDERR "ERROR($0):
        Order should be greater than or equal to 1.\n";
    exit;
}
#check if --trace is used
if(defined $trace)
{
    $ans="n";
    #check if the TRACE_FILE already exists
    if(-e $trace)
    {
        print STDERR "WARNING($0):
        Trace file <$trace> already exists, overwrite(y/n)? ";
        $ans=<STDIN>;
    }
    #overwrite only if the answer starts with y or Y
    if(!-e $trace || $ans=~/^\s*y/i)
    {
        #open the TRACE_FILE
        open(TRACE,">",$trace) || die "Error($0):
        Error(code=$!) in opening Trace file <$trace>.\n";
    }
    else
    {
        undef $trace;
    }
}
##############################################################################

#                       =============================
#                        Reading and Storing Bigrams
#                       =============================

$line_num=0;
#creating a coc_store data structure for storing all bigram strings from SOURCE
#so that co-occurrences can be found looking at this data structure
while($line=<IN>)
{
    $line_num++;
    chomp $line;
    # handling blank lines
    if($line=~/^\s*$/)
    {
        next;
    }
    #store the bigram strings
    if($line=~/<>/)
    {
        #checking if the SOURCE is a valid NSP output for Bigrams
        $check_bigram=$line;
        $cnt=0;
        #count how many times <> occurs
        while($check_bigram=~/<>/)
        {
            $cnt++;
            $check_bigram=$';
        }
        #should be 2 for bigrams
        if($cnt!=2)
        {
            print STDERR "ERROR($0):
        SOURCE file <$infile> is not a valid Bigram output of NSP at line
        <$line_num>.\n";
            exit;
        }
        # store bigram in coc_store
        push @coc_store,$line;
    }
}

############################################################################

#
#               ===========================================
#                Ranking words according to their distance
#                         From the target word
#               ===========================================

# start with target word which
# is at 0th level (rank)
$rank{$target}=0;
$word=$target;
while($rank{$word}<$order)
{
    # rank my co-occurrences
    &rank_cocs($word,$rank{$word});
    # no more words in queue
    if($#words < 0)
    {
        if($rank{$word}<$order)
        {
            print "No co-occurrences at $order th level.\n";
        }
        last;
    }
    else
    {
        # get the first word from queue
        $word=shift @words;
    }
}

#############################################################################

#                       ==============================
#                           Printing Trace Report
#                       ==============================

# print trace report
if(defined $trace)
{
    # trace each kth order co-occurrence back
    foreach $word (@kocs)
    {
        # get all parents until the
        # target word is reached
        @chain=();
        push @chain,$word;
        if(defined $regex_file)
        {
            while($word !~ /$target/)
            {
                push @chain,$parent{$word};
                $word=$parent{$word};
            }
        }
        else
        {
            while($word ne $target)
            {
                push @chain,$parent{$word};
                $word=$parent{$word};
            }
        }
        # print reverse chain
        @reversed=reverse @chain;
        print TRACE join("->",@reversed);
        print TRACE "\n";
    }
}
###########################################################################

#                       ==========================
#                          SUBROUTINE SECTION
#                       ==========================

#-----------------------------------------------------------------------------

# ranks and queues the co-occurrences of a given word
sub rank_cocs
{
    my $word=$_[0];
    # co-occurrences of given word
    # will be at rank(word)+1
    my $level=$_[1]+1;
    #string from the coc_store
    my $coc_string="";
    # check bigrams and rank words
    # co-occurring with the given
    # word
    foreach $coc_string (@coc_store)
    {
        @parts=split(/<>/,$coc_string);
        $word1=$parts[0];
        $word2=$parts[1];
        # reset the result for this bigram
        undef $got_coc;
        # current word is the target word
        # specified via regex option
        if($level==1 && defined $regex_file)
        {
            # if exactly one of the words matches
            # the target, extract the other
            if($word1=~/$word/ && !defined $rank{$word2} && $word2!~/$target/)
            {
                $got_coc=$word2;
                $parent=$word1;
            }
            elsif($word2=~/$word/ && !defined $rank{$word1} && $word1!~/$target/)
            {
                $got_coc=$word1;
                $parent=$word2;
            }
        }
        elsif(defined $regex_file)
        {
            # if one of the words matches the
            # given word and other doesn't match
            # the target word
            if($word1 eq $word && !defined $rank{$word2} && $word2!~/$target/)
            {
                $got_coc=$word2;
                $parent=$word1;
            }
            elsif($word2 eq $word && !defined $rank{$word1} && $word1!~/$target/)
            {
                $got_coc=$word1;
                $parent=$word2;
            }
        }
        else
        {
            # one of the words matches the
            # given word
            if($word1 eq $word && !defined $rank{$word2})
            {
                $got_coc=$word2;
                $parent=$word1;
            }
            elsif($word2 eq $word && !defined $rank{$word1})
            {
                $got_coc=$word1;
                $parent=$word2;
            }
        }
        if(defined $got_coc)
        {
            # rank the obtained coc
            $rank{$got_coc}=$level;
            # queue the coc
            push @words,$got_coc;
            # print if level is K
            if($level==$order)
            {
                print "$got_coc<>\n";
            }
            # store link to parent for tracing
            if(defined $trace)
            {
                $parent{$got_coc}=$parent;
                if($level==$order)
                {
                    push @kocs,$got_coc;
                }
            }
        }
    }
}

#-----------------------------------------------------------------------------
#show minimal usage message
sub showminimal()
{
    print "Usage: kocos.pl [OPTIONS] BIGRAM";
    print "\nTYPE kocos.pl --help for help\n";
}

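=pod

The level-by-level ranking that rank_cocs and the main loop perform can be
illustrated with a small standalone sketch. It is not the code used by
kocos.pl: the names kth_order_sketch, %neighbours and @found are hypothetical,
and the sketch builds an adjacency hash once instead of rescanning the stored
bigrams for every word. It assumes a literal target (no regex support) and
uses the example bigrams from Section 8.

    use strict;
    use warnings;

    # Hypothetical helper (not part of kocos.pl): returns the words whose
    # minimum distance from $target in the co-occurrence graph is $order.
    sub kth_order_sketch
    {
        my ($target, $order, @bigrams) = @_;

        # adjacency list: word => words it co-occurs with
        my %neighbours;
        foreach my $bigram (@bigrams)
        {
            my ($w1, $w2) = split /<>/, $bigram;
            push @{ $neighbours{$w1} }, $w2;
            push @{ $neighbours{$w2} }, $w1;
        }

        # breadth-first search; the target is at rank 0
        my %rank  = ($target => 0);
        my @queue = ($target);
        my @found;
        while (@queue)
        {
            my $word = shift @queue;
            # words already at rank K are not expanded further
            next if $rank{$word} >= $order;
            foreach my $coc (@{ $neighbours{$word} || [] })
            {
                # a word keeps the rank at which it was first reached
                next if defined $rank{$coc};
                $rank{$coc} = $rank{$word} + 1;
                push @queue, $coc;
                push @found, $coc if $rank{$coc} == $order;
            }
        }
        return @found;
    }

    my @bigrams = qw(print<>in<> print<>line<> text<>the<> text<>line<>
                     file<>the<> file<>in<> line<>file<>);
    # prints: in the
    print join(" ", kth_order_sketch("line", 2, @bigrams)), "\n";

Because every word is ranked the first time it is reached, a word that is
already a lower order co-occurrence is never reported again at order K,
which corresponds to the minimum distance rule described in Section 5.

=cut
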
#-----------------------------------------------------------------------------
#show help
sub showhelp()
{
    print "Usage:  kocos.pl [OPTIONS] BIGRAM
Displays the kth order Co-occurrences of a given target word.
Target word should be specified via --literal or --regex option.

BIGRAM
        A list of bigrams formatted like the output (extended or normal)
        of NSP programs count.pl or statistic.pl.

OPTIONS:
--literal LITERAL
        Specify the target word directly on command line as a literal.

--regex REGEXFILE
        Specify a file containing Perl regular expression/s that define
        the target word.

--order K
        Specify the value of K (K>0) to find the kth order co-occurrences.
        A Kth order co-occurrence is a word that co-occurs with a (K-1)th
        order co-occurrence of the target word.

        By default, the value of K is set to 1 which simply lists the
        words that co-occur with a given target word. When K is 2, all words
        that co-occur with the words that co-occur with the target word are
        shown, and so on for higher orders.

--trace TRACEFILE
        Specify the name of a TRACEFILE to see a detailed trace report
        showing the chains of co-occurrences. A chain shows how a Kth
        order co-occurrence is reached as a sequence of K lower order
        co-occurrences.
        e.g. WORD->First->Second->Third..->Kth
        shows that 'First' is a first order co-occurrence of WORD,
        'Second' is a second order co-occurrence of WORD which co-occurs
        with 'First'. 'Third' is a third order co-occurrence of WORD which
        co-occurs with 'Second' and so on until K is reached.
--help
        To display this message.

--version
        To display the version information.\n";
}

#------------------------------------------------------------------------------
#version information
sub showversion()
{
    print "kocos.pl      -       version 0.05\n";
    print "Copyright (C) 2002-2003, Amruta Purandare & Ted Pedersen\n";
    print "Date of Last Update:     07/01/2003\n";
}

#############################################################################


=head1 AUTHORS

    Amruta Purandare, University of Minnesota, Duluth, pura0010@d.umn.edu
    Ted Pedersen, University of Minnesota, Duluth, tpederse@umn.edu

=head1 BUGS

=head1 SEE ALSO

http://www.d.umn.edu/~tpederse/nsp.html

=head1 COPYRIGHT

Copyright (C) 2002-2003, Amruta Purandare & Ted Pedersen

This program is free software; you can redistribute it and/or modify it
under the terms of the GNU General Public License as published by the Free
Software Foundation; either version 2 of the License, or (at your option)
any later version.

This program is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
for more details.

You should have received a copy of the GNU General Public License along
with this program; if not, write to

The Free Software Foundation, Inc.,
59 Temple Place - Suite 330,
Boston, MA 02111-1307, USA.

=cut