=head1 NAME

Text::NSP::Measures::2D::CHI::phi - Perl module that implements the Phi
coefficient measure for bigrams.

=head1 SYNOPSIS

=head3 Basic Usage

  use Text::NSP::Measures::2D::CHI::phi;

  my $npp = 60; my $n1p = 20; my $np1 = 20; my $n11 = 10;

  my $phi_value = calculateStatistic( n11=>$n11,
                                      n1p=>$n1p,
                                      np1=>$np1,
                                      npp=>$npp);

  if( (my $errorCode = getErrorCode()) )
  {
    print STDERR $errorCode." - ".getErrorMessage()."\n";
  }
  else
  {
    print getStatisticName()." value for bigram is ".$phi_value."\n";
  }

=head1 DESCRIPTION

This function computes the square of the traditional formulation of
the Phi Coefficient.

Assume that the frequency count data associated with a bigram
<word1><word2> is stored in a 2x2 contingency table:

          word2   ~word2
  word1    n11      n12 | n1p
 ~word1    n21      n22 | n2p
           --------------
           np1      np2   npp

where n11 is the number of times <word1><word2> occur together,
n12 is the number of times <word1> occurs with some word other than
word2, and n1p is the total number of times that word1 occurs as
the first word in a bigram.

 PHI^2 = ((n11 * n22) - (n12 * n21))^2 / (n1p * np1 * np2 * n2p)

Note that the value of PHI^2 is equivalent to Pearson's Chi-Squared
statistic divided by the sample size, that is:

 Chi-Squared = npp * PHI^2

We use PHI^2 rather than PHI since PHI^2 was employed for collocation
identification in:

Church, K. (1991) Concordances for Parallel Text, Seventh Annual
Conference of the UW Centre for the New OED and Text Research, Oxford,
England.
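
As a rough check of the arithmetic, the sketch below computes PHI^2
directly from the four counts used in the SYNOPSIS, using the cross-product
difference n11*n22 - n12*n21. The helper name phi_squared_from_counts is
hypothetical and is not part of Text::NSP; it assumes the counts are
mutually consistent and performs none of the module's error checking.

```perl
  use strict;
  use warnings;

  # Hypothetical helper (not part of Text::NSP): derive PHI^2 from the
  # same four counts that calculateStatistic() accepts.
  sub phi_squared_from_counts {
      my (%c) = @_;
      my ($n11, $n1p, $np1, $npp) = @c{qw(n11 n1p np1 npp)};

      # Fill in the remaining cells and marginals of the 2x2 table.
      my $n12 = $n1p - $n11;
      my $n21 = $np1 - $n11;
      my $n2p = $npp - $n1p;
      my $np2 = $npp - $np1;
      my $n22 = $n2p - $n21;

      # PHI^2 is undefined when any marginal total is zero.
      my $denominator = $n1p * $np1 * $np2 * $n2p;
      return if $denominator == 0;

      return (($n11 * $n22 - $n12 * $n21) ** 2) / $denominator;
  }

  my $phi2 = phi_squared_from_counts(n11 => 10, n1p => 20, np1 => 20, npp => 60);
  printf "PHI^2 = %.4f, Chi-Squared = %.4f\n", $phi2, 60 * $phi2;
```

With the SYNOPSIS counts this gives PHI^2 = 0.0625, and multiplying by
npp = 60 gives a Chi-Squared value of 3.75.
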

=over

=cut


package Text::NSP::Measures::2D::CHI::phi;


use Text::NSP::Measures::2D::CHI;
use strict;
use Carp;
use warnings;
no warnings 'redefine';
require Exporter;

our ($VERSION, @EXPORT, @ISA);

@ISA  = qw(Exporter);

@EXPORT = qw(initializeStatistic calculateStatistic
             getErrorCode getErrorMessage getStatisticName);

$VERSION = '0.97';


=item calculateStatistic() - method to calculate the Phi Coefficient

INPUT PARAMS  : %values             .. Hash of count values (n11, n1p,
                                       np1, npp) computed by the
                                       count.pl program.

RETURN VALUES : $phi                .. PHI^2 value for this bigram.

=cut

sub calculateStatistic
{
  my %values = @_;

  # getValues() computes the observed and expected values from
  # the frequency combination values. It returns 0 if there is an
  # error in the computation or the values are inconsistent.
  if( !(Text::NSP::Measures::2D::CHI::getValues(\%values)) ) {
    # No value is returned on error; callers check getErrorCode().
    return;
  }

  # Now calculate the phi coefficient: sum the per-cell terms over the
  # observed ($nXY) and expected ($mXY) values, then divide by npp.
  my $phi = 0;

  $phi += Text::NSP::Measures::2D::CHI::computeVal($n11, $m11);
  $phi += Text::NSP::Measures::2D::CHI::computeVal($n12, $m12);
  $phi += Text::NSP::Measures::2D::CHI::computeVal($n21, $m21);
  $phi += Text::NSP::Measures::2D::CHI::computeVal($n22, $m22);

  return $phi/$values{npp};
}



=item getStatisticName() - Returns the name of this statistic

INPUT PARAMS  : none

RETURN VALUES : $name      .. Name of the measure.

=cut

sub getStatisticName
{
  return "Phi Coefficient";
}



1;
__END__


=back

=head1 AUTHOR

Ted Pedersen,                University of Minnesota Duluth
                             E<lt>tpederse@d.umn.eduE<gt>

Satanjeev Banerjee,          Carnegie Mellon University
                             E<lt>satanjeev@cmu.eduE<gt>

Amruta Purandare,            University of Pittsburgh
                             E<lt>amruta@cs.pitt.eduE<gt>

Bridget Thomson-McInnes,     University of Minnesota Twin Cities
                             E<lt>bthompson@d.umn.eduE<gt>

Saiyam Kohli,                University of Minnesota Duluth
                             E<lt>kohli003@d.umn.eduE<gt>

=head1 HISTORY

Last updated: $Id: phi.pm,v 1.12 2006/06/21 11:10:52 saiyam_kohli Exp $

=head1 BUGS


=head1 SEE ALSO

  @inproceedings{GaleC91,
          author = {Gale, W. and Church, K.},
          title = {A Program for Aligning Sentences in Bilingual Corpora},
          booktitle = {Proceedings of the 29th Annual Meeting of the
                       Association for Computational Linguistics},
          address = {Berkeley, CA},
          year = {1991},
          url = L<http://www.cs.mu.oz.au/acl/J/J93/J93-1004.pdf>}


L<http://groups.yahoo.com/group/ngram/>

L<http://www.d.umn.edu/~tpederse/nsp.html>


=head1 COPYRIGHT

Copyright (C) 2000-2006, Ted Pedersen, Satanjeev Banerjee, Amruta
Purandare, Bridget Thomson-McInnes and Saiyam Kohli

This program is free software; you can redistribute it and/or modify it
under the terms of the GNU General Public License as published by the Free
Software Foundation; either version 2 of the License, or (at your option)
any later version.

This program is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
for more details.

You should have received a copy of the GNU General Public License along
with this program; if not, write to

    The Free Software Foundation, Inc.,
    59 Temple Place - Suite 330,
    Boston, MA  02111-1307, USA.

Note: a copy of the GNU General Public License is available on the web
at L<http://www.gnu.org/licenses/gpl.txt> and is included in this
distribution as GPL.txt.

=cut