• Home
  • History
  • Annotate
Name Date Size #Lines LOC

..03-May-2022-

lib/H15-Mar-2009-1,440480

t/H15-Mar-2009-328224

ChangesH A D23-Apr-20081.9 KiB4433

MANIFESTH A D15-Mar-2009368 2019

META.ymlH A D15-Mar-2009585 2625

Makefile.PLH A D15-Mar-2009571 3125

READMEH A D23-Apr-200810.4 KiB315217

README

1NAME
2    DBD::google - Treat Google as a datasource for DBI
3
4SYNOPSIS
5        use DBI;
6
7        my $dbh = DBI->connect("dbi:google:", $KEY);
8        my $sth = $dbh->prepare(qq[
9            SELECT title, URL FROM google WHERE q = "perl"
10        ]);
11
12        while (my $r = $sth->fetchrow_hashref) {
13            ...
14
15DESCRIPTION
16    DBD::google allows you to use Google as a datasource; google can be
17    queried using SQL *SELECT* statements, and iterated over using standard
18    DBI conventions.
19
20    WARNING: This is still alpha-quality software. It works for me, but that
21    doesn't really mean anything.
22
23WHY?
24    For general queries, what better source of information is there than
25    Google?
26
27BASIC USAGE
28    For the most part, use "DBD::google" like you use any other DBD, except
29    instead of going through the trouble of building and installing (or
30    buying!) database software, and employing a DBA to manage your data, you
31    can take advantage of Google's ability to do this for you. Think of it
32    as outsourcing your DBA, if you like.
33
34  Connection Information
35
36    The connection string should look like: "dbi:google:" (DBI requires the
37    trailing ":").
38
39    Your Google API key should be specified in the username portion (the
40    password is currently ignored; do whatever you want with it, but be
41    warned that I might put that field to use some day):
42
43      my $dbh = DBI->connect("dbi:google:", "my key", undef, \%opts);
44
45    Alternatively, you can specify a filename in the user portion; the first
46    line of that file will be treated as the key:
47
48      my $dbh =DBI->connect("dbi:google:",
49            File::Spec->catfile($ENV{HOME}, ".googlekey"))
50
51    In addition to the standard DBI options, the fourth argument to connect
52    can also include the following "DBD::google" specific options, the full
53    details of each of which can be found in the Net::Google manpage:
54
55    ie              Input Encoding. String, e.g., "utf-8".
56
57    oe              Output Encoding. String, e.g., "utf-8".
58
59    safe            Should safe mode be on. Boolean.
60
61    filter          Should results be filtered. Boolean.
62
63    lr              Something to do with language. Arrayref.
64
65    debug           Should "Net::Google" be put into debug mode or not?
66                    Boolean.
67
68  Supported SQL Syntax and Random Notes Thereon
69
70    The only supported SQL statement type is the *SELECT* statement. Since
71    there is no real "table" involved, I've created a hypothetical table,
72    called *google*; this table has one queryable field, *q* (just like the
73    public web-based interface). The available columns are currently
74    dictated by the data available from the underlying transport, which is
75    the Google SOAP API (see http://www.google.com/apis), as implemented by
76    Aaron Straup Cope's "Net::Google" module.
77
78    The basic SQL syntax supported looks like:
79
80      SELECT @fields FROM google WHERE q = '$query'
81
82    There is also an optional LIMIT clause, the syntax of which is similar
83    to that of MySQL's LIMIT clause; it takes a pair: offset from 0, number
84    of results. In practice, Google returns 10 results at a time by default,
85    so specifying a high LIMIT clause at the beginning might make sense for
86    many queries.
87
88    The list of available fields in the *google* table includes:
89
90    title           Returns the title of the result, as a string.
91
92    URL             Returns the URL of the result, as a (non-HTML encoded!)
93                    string.
94
95    snippet         Returns a snippet of the result.
96
97    cachedSize      Returns a string indicating the size of the cached
98                    version of the document.
99
100    directoryTitle  Returns a string.
101
102    summary         Returns a summary of the result.
103
104    hostName        Returns the hostname of the result.
105
106    directoryCategory
107                    Returns the directory category of the result.
108
109    The column specifications can include aliases:
110
111      SELECT directoryCategory as DC FROM google WHERE...
112
113    Finally, "DBD::google" supports functions:
114
115      SELECT title, html_encode(url) FROM google WHERE q = '$stuff'
116
117    There are several available functions available by default:
118
119    uri_escape      This comes from the "URI::Escape" module.
120
121    html_escape     This wraps around "HTML::Entities::encode_entities".
122
123    html_strip      This removes HTML from a field. Some fields, such as
124                    title, summary, and snippet, have the query terms
125                    highlighted with <b> tags by Google; this function can
126                    be used to undo that damage.
127
128    Finally, "DBD::google" also supports arbitrary functions, specified
129    using a fully qualified Perl package identifier:
130
131      SELECT title, Digest::MD5::md5_hex(title) FROM google WHERE ...
132
133    Functions and aliases can be combined:
134
135      SELECT html_strip(snippet) as stripped_snippet FROM google...
136
137    Unsupported SQL includes ORDER BY clauses (Google does this, and
138    provides no interface to modify it), HAVING clauses, JOINs of any type
139    (there's only 1 "table", after all), sub-SELECTS (I can't even imagine
140    of what use they would be here), and, actually, anything not explicitly
141    mentioned above.
142
143INSTALLATION
144    "DBD::google" is pure perl, and has a few module requirements:
145
146    Net::Google     This is the heart of the module; "DBD::google" is
147                    basically a DBI-compliant wrapper around "Net::Google".
148
149    HTML::Entities, URI::Escape
150                    These two modules provide the uri_escape and html_escape
151                    functions.
152
153    DBI             Duh.
154
155    To install:
156
157      $ perl Makefile.PL
158      $ make
159      $ make test
160      # make install
161      $ echo 'I love your module!' | mail darren@cpan.org -s "DBD::google"
162
163    The last step is optional; the others are not.
164
165EXAMPLES
166    Here is a complete script that takes a query from the command line and
167    formats the results nicely:
168
169      #!/usr/bin/perl -w
170
171      use strict;
172
173      use DBI;
174      use Text::TabularDisplay;
175
176      my $query = "@ARGV" || "perl";
177
178      # Set up SQL statement -- note the multiple lines
179      my $sql = qq~
180        SELECT
181          title, URL, hostName
182        FROM
183          google
184        WHERE
185          q = "$query"
186      ~;
187
188      # DBI/DBD options:
189      my %opts = ( RaiseError => 1,  # Standard DBI options
190                   PrintError => 0,
191                   lr => [ 'en' ],   # DBD::google options
192                   oe => "utf-8",
193                   ie => "utf-8",
194                 );
195
196      # Get API key
197      my $keyfile = glob "~/.googlekey";
198
199      # Get database handle
200      my $dbh = DBI->connect("dbi:google:", $keyfile, undef, \%opts);
201
202      # Create Text::TabularDisplay instance, and set the columns
203      my $table = Text::TabularDisplay->new;
204      $table->columns("Title", "URL", "Hostname");
205
206      # Do the query
207      my $sth = $dbh->prepare($sql);
208      $sth->execute;
209      while (my @row = $sth->fetchrow_array) {
210          $table->add(@row);
211      }
212      $sth->finish;
213
214      print $table->render;
215
216TODO
217    These are listed in the order in which I'd like to implement them.
218
219    More tests!
220        I'm particularly unimpressed with the test suite for the SQL parser;
221        I think it is pretty pathetic. It needs much better testing, with
222        more edge cases and more things I'm not expecting to find.
223
224        I've specifically avoided including tests that actually query
225        Google, because the free API keys have a daily limit to the number
226        of requests that will be answered. My original test suite did a few
227        dozen queries each time it was run; if you run the test suite a few
228        dozen times in a day (easy to do if you are actively developing the
229        software or changing the feature set), your daily quota can be eaten
230        up very easily.
231
232    Integration of search metadata
233        There are several pieces of metadata that come back with searches;
234        access to the via the statement handle ($sth) would be nice:
235
236          my $search_time = $sth->searchTime();
237          my $total = $sth->estimatedTotalResultsNumber();
238
239        The metadata includes:
240
241        o   documentFiltering
242
243        o   searchTime
244
245        o   estimatedTotalResultsNumber
246
247        o   estimateIsExact
248
249        o   searchTips
250
251        o   searchTime
252
253        These are described in the Net::Google::Response manpage.
254
255    Extensible functions
256        Unknown functions that look like Perl package::function names should
257        probably be treated as such, and AUTOLOADed:
258
259          SELECT Foo::frob(title) FROM google WHERE q = "perl"
260
261        Would do, effectively:
262
263          require Foo;
264          $title = Foo::frob($title);
265
266        I'm slightly afraid of where this could lead, though:
267
268          SELECT title, LWP::Simple::get(url) as WholeDamnThing
269          FROM   google
270          WHERE  q = "perl apache"
271          LIMIT  0, 100
272
273    Elements return objects, instead of strings
274        It would be interesting for columns like URL and hostName to return
275        "URI" and "Net::hostent" objects, respectively.
276
277        On the other hand, this is definitely related to the previous item;
278        the parser could be extended to accept function names in method
279        format:
280
281          SELECT title, URI->new(URL), Net::hostent->new(hostName)
282          FROM google WHERE q = "perl"
283
284    DESCRIBE statement on the "google" table
285        It would be nice to provide a little introspection.
286
287CAVEATS, BUGS, IMPROVEMENTS, SUGGESTIONS, FOIBLES, ETC
288    I've only tested this using my free, 1000-uses-per-day API key, so I
289    don't know how well/if this software will work for those of you who have
290    purchased real licenses for unlimited usage.
291
292    Placeholders are currently unsupported. They won't do any good, but
293    would be nice to have for consistency with other DBDs. I'll get around
294    to it someday.
295
296    There are many Interesting Things that can be done with this module, I
297    think -- suggestions as to what those things might actually be are very
298    welcome. Patches implementing said Interesting Things are also welcome,
299    of course.
300
301    More specifically, queries that the SQL parser chokes on would be very
302    useful, so I can refine the test suite (and the parser itself, of
303    course).
304
305    There are probably a few bugs, though I don't know of any. Please report
306    them via the DBD::google queue at <http://rt.cpan.org/>.
307
308SEE ALSO
309    the DBI manpage, the DBI::DBD manpage, the Net::Google manpage, the
310    URI::Escape manpage, the HTML::Entities manpage
311
312AUTHOR
313    darren chamberlain <darren@cpan.org>
314
315