1NAME
2 DBD::google - Treat Google as a datasource for DBI
3
4SYNOPSIS
5 use DBI;
6
7 my $dbh = DBI->connect("dbi:google:", $KEY);
8 my $sth = $dbh->prepare(qq[
9 SELECT title, URL FROM google WHERE q = "perl"
10 ]);
11
12 while (my $r = $sth->fetchrow_hashref) {
13 ...
14
15DESCRIPTION
16 DBD::google allows you to use Google as a datasource; google can be
17 queried using SQL *SELECT* statements, and iterated over using standard
18 DBI conventions.
19
20 WARNING: This is still alpha-quality software. It works for me, but that
21 doesn't really mean anything.
22
23WHY?
24 For general queries, what better source of information is there than
25 Google?
26
27BASIC USAGE
28 For the most part, use "DBD::google" like you use any other DBD, except
29 instead of going through the trouble of building and installing (or
30 buying!) database software, and employing a DBA to manage your data, you
31 can take advantage of Google's ability to do this for you. Think of it
32 as outsourcing your DBA, if you like.
33
34 Connection Information
35
36 The connection string should look like: "dbi:google:" (DBI requires the
37 trailing ":").
38
39 Your Google API key should be specified in the username portion (the
40 password is currently ignored; do whatever you want with it, but be
41 warned that I might put that field to use some day):
42
43 my $dbh = DBI->connect("dbi:google:", "my key", undef, \%opts);
44
45 Alternatively, you can specify a filename in the user portion; the first
46 line of that file will be treated as the key:
47
48 my $dbh =DBI->connect("dbi:google:",
49 File::Spec->catfile($ENV{HOME}, ".googlekey"))
50
51 In addition to the standard DBI options, the fourth argument to connect
52 can also include the following "DBD::google" specific options, the full
53 details of each of which can be found in the Net::Google manpage:
54
55 ie Input Encoding. String, e.g., "utf-8".
56
57 oe Output Encoding. String, e.g., "utf-8".
58
59 safe Should safe mode be on. Boolean.
60
61 filter Should results be filtered. Boolean.
62
63 lr Something to do with language. Arrayref.
64
65 debug Should "Net::Google" be put into debug mode or not?
66 Boolean.
67
68 Supported SQL Syntax and Random Notes Thereon
69
70 The only supported SQL statement type is the *SELECT* statement. Since
71 there is no real "table" involved, I've created a hypothetical table,
72 called *google*; this table has one queryable field, *q* (just like the
73 public web-based interface). The available columns are currently
74 dictated by the data available from the underlying transport, which is
75 the Google SOAP API (see http://www.google.com/apis), as implemented by
76 Aaron Straup Cope's "Net::Google" module.
77
78 The basic SQL syntax supported looks like:
79
80 SELECT @fields FROM google WHERE q = '$query'
81
82 There is also an optional LIMIT clause, the syntax of which is similar
83 to that of MySQL's LIMIT clause; it takes a pair: offset from 0, number
84 of results. In practice, Google returns 10 results at a time by default,
85 so specifying a high LIMIT clause at the beginning might make sense for
86 many queries.
87
88 The list of available fields in the *google* table includes:
89
90 title Returns the title of the result, as a string.
91
92 URL Returns the URL of the result, as a (non-HTML encoded!)
93 string.
94
95 snippet Returns a snippet of the result.
96
97 cachedSize Returns a string indicating the size of the cached
98 version of the document.
99
100 directoryTitle Returns a string.
101
102 summary Returns a summary of the result.
103
104 hostName Returns the hostname of the result.
105
106 directoryCategory
107 Returns the directory category of the result.
108
109 The column specifications can include aliases:
110
111 SELECT directoryCategory as DC FROM google WHERE...
112
113 Finally, "DBD::google" supports functions:
114
115 SELECT title, html_encode(url) FROM google WHERE q = '$stuff'
116
117 There are several available functions available by default:
118
119 uri_escape This comes from the "URI::Escape" module.
120
121 html_escape This wraps around "HTML::Entities::encode_entities".
122
123 html_strip This removes HTML from a field. Some fields, such as
124 title, summary, and snippet, have the query terms
125 highlighted with <b> tags by Google; this function can
126 be used to undo that damage.
127
128 Finally, "DBD::google" also supports arbitrary functions, specified
129 using a fully qualified Perl package identifier:
130
131 SELECT title, Digest::MD5::md5_hex(title) FROM google WHERE ...
132
133 Functions and aliases can be combined:
134
135 SELECT html_strip(snippet) as stripped_snippet FROM google...
136
137 Unsupported SQL includes ORDER BY clauses (Google does this, and
138 provides no interface to modify it), HAVING clauses, JOINs of any type
139 (there's only 1 "table", after all), sub-SELECTS (I can't even imagine
140 of what use they would be here), and, actually, anything not explicitly
141 mentioned above.
142
143INSTALLATION
144 "DBD::google" is pure perl, and has a few module requirements:
145
146 Net::Google This is the heart of the module; "DBD::google" is
147 basically a DBI-compliant wrapper around "Net::Google".
148
149 HTML::Entities, URI::Escape
150 These two modules provide the uri_escape and html_escape
151 functions.
152
153 DBI Duh.
154
155 To install:
156
157 $ perl Makefile.PL
158 $ make
159 $ make test
160 # make install
161 $ echo 'I love your module!' | mail darren@cpan.org -s "DBD::google"
162
163 The last step is optional; the others are not.
164
165EXAMPLES
166 Here is a complete script that takes a query from the command line and
167 formats the results nicely:
168
169 #!/usr/bin/perl -w
170
171 use strict;
172
173 use DBI;
174 use Text::TabularDisplay;
175
176 my $query = "@ARGV" || "perl";
177
178 # Set up SQL statement -- note the multiple lines
179 my $sql = qq~
180 SELECT
181 title, URL, hostName
182 FROM
183 google
184 WHERE
185 q = "$query"
186 ~;
187
188 # DBI/DBD options:
189 my %opts = ( RaiseError => 1, # Standard DBI options
190 PrintError => 0,
191 lr => [ 'en' ], # DBD::google options
192 oe => "utf-8",
193 ie => "utf-8",
194 );
195
196 # Get API key
197 my $keyfile = glob "~/.googlekey";
198
199 # Get database handle
200 my $dbh = DBI->connect("dbi:google:", $keyfile, undef, \%opts);
201
202 # Create Text::TabularDisplay instance, and set the columns
203 my $table = Text::TabularDisplay->new;
204 $table->columns("Title", "URL", "Hostname");
205
206 # Do the query
207 my $sth = $dbh->prepare($sql);
208 $sth->execute;
209 while (my @row = $sth->fetchrow_array) {
210 $table->add(@row);
211 }
212 $sth->finish;
213
214 print $table->render;
215
216TODO
217 These are listed in the order in which I'd like to implement them.
218
219 More tests!
220 I'm particularly unimpressed with the test suite for the SQL parser;
221 I think it is pretty pathetic. It needs much better testing, with
222 more edge cases and more things I'm not expecting to find.
223
224 I've specifically avoided including tests that actually query
225 Google, because the free API keys have a daily limit to the number
226 of requests that will be answered. My original test suite did a few
227 dozen queries each time it was run; if you run the test suite a few
228 dozen times in a day (easy to do if you are actively developing the
229 software or changing the feature set), your daily quota can be eaten
230 up very easily.
231
232 Integration of search metadata
233 There are several pieces of metadata that come back with searches;
234 access to the via the statement handle ($sth) would be nice:
235
236 my $search_time = $sth->searchTime();
237 my $total = $sth->estimatedTotalResultsNumber();
238
239 The metadata includes:
240
241 o documentFiltering
242
243 o searchTime
244
245 o estimatedTotalResultsNumber
246
247 o estimateIsExact
248
249 o searchTips
250
251 o searchTime
252
253 These are described in the Net::Google::Response manpage.
254
255 Extensible functions
256 Unknown functions that look like Perl package::function names should
257 probably be treated as such, and AUTOLOADed:
258
259 SELECT Foo::frob(title) FROM google WHERE q = "perl"
260
261 Would do, effectively:
262
263 require Foo;
264 $title = Foo::frob($title);
265
266 I'm slightly afraid of where this could lead, though:
267
268 SELECT title, LWP::Simple::get(url) as WholeDamnThing
269 FROM google
270 WHERE q = "perl apache"
271 LIMIT 0, 100
272
273 Elements return objects, instead of strings
274 It would be interesting for columns like URL and hostName to return
275 "URI" and "Net::hostent" objects, respectively.
276
277 On the other hand, this is definitely related to the previous item;
278 the parser could be extended to accept function names in method
279 format:
280
281 SELECT title, URI->new(URL), Net::hostent->new(hostName)
282 FROM google WHERE q = "perl"
283
284 DESCRIBE statement on the "google" table
285 It would be nice to provide a little introspection.
286
287CAVEATS, BUGS, IMPROVEMENTS, SUGGESTIONS, FOIBLES, ETC
288 I've only tested this using my free, 1000-uses-per-day API key, so I
289 don't know how well/if this software will work for those of you who have
290 purchased real licenses for unlimited usage.
291
292 Placeholders are currently unsupported. They won't do any good, but
293 would be nice to have for consistency with other DBDs. I'll get around
294 to it someday.
295
296 There are many Interesting Things that can be done with this module, I
297 think -- suggestions as to what those things might actually be are very
298 welcome. Patches implementing said Interesting Things are also welcome,
299 of course.
300
301 More specifically, queries that the SQL parser chokes on would be very
302 useful, so I can refine the test suite (and the parser itself, of
303 course).
304
305 There are probably a few bugs, though I don't know of any. Please report
306 them via the DBD::google queue at <http://rt.cpan.org/>.
307
308SEE ALSO
309 the DBI manpage, the DBI::DBD manpage, the Net::Google manpage, the
310 URI::Escape manpage, the HTML::Entities manpage
311
312AUTHOR
313 darren chamberlain <darren@cpan.org>
314
315