Name                        Date          Size        #Lines   LOC
..                          03-May-2022   -
Collect/                    16-Oct-2021   -           1,653    1,107
Core/                       16-Oct-2021   -           13,110   10,230
IndexSearch/                16-Oct-2021   -           14,295   10,447
Monitor/                    16-Oct-2021   -           1,131    737
SQL/                        16-Oct-2021   -           3,788    2,694
Tokenize/                   16-Oct-2021   -           9,555    6,473
UI/                         16-Oct-2021   -           15,877   12,496
Utils/                      16-Oct-2021   -           4,659    3,212
po/                         16-Oct-2021   -           14,450   11,087
scripts/bash/               16-Oct-2021   -           263      201
AUTHORS                     16-Oct-2021   4.6 KiB     117      107
COPYING                     16-Oct-2021   17.6 KiB    341      281
ChangeLog                   16-Oct-2021   36.5 KiB    1,138    786
ChangeLog-dijon             16-Oct-2021   30 KiB      823      641
ChangeLog-svn               16-Oct-2021   713.1 KiB   19,632   15,868
FAQ                         16-Oct-2021   4.4 KiB     86       73
INSTALL                     16-Oct-2021   9.3 KiB     237      179
LICENSE                     16-Oct-2021   17.6 KiB    340      281
Makefile.am                 16-Oct-2021   6.2 KiB     117      107
NEWS                        16-Oct-2021   45.7 KiB    1,184    1,128
README                      16-Oct-2021   23.4 KiB    586      446
TODO                        16-Oct-2021   5.1 KiB     109      97
acinclude.m4                16-Oct-2021   1 KiB       43       36
aclocal.m4                  16-Oct-2021   391.4 KiB   11,002   9,944
autogen.sh                  16-Oct-2021   1.6 KiB     62       49
configure.in                16-Oct-2021   12.2 KiB    434      401
globalconfig.xml            16-Oct-2021   955         31       13
ltmain.sh                   16-Oct-2021   316.6 KiB   11,150   7,980
mkinstalldirs               16-Oct-2021   1.9 KiB     112      85
pinot-dbus-daemon.desktop   16-Oct-2021   379         18       17
pinot-prefs.desktop         16-Oct-2021   215         10       9
pinot.desktop               16-Oct-2021   1.3 KiB     36       35
pinot.spec.in               16-Oct-2021   4.3 KiB     136      121

README

Pinot
Copyright 2005-2021 Fabrice Colin <fabrice dot colin at gmail dot com>

Homepage - https://github.com/FabriceColin/pinot
           previously hosted at http://code.google.com/p/pinot-search/
           and http://pinot.berlios.de/
Translations - https://translations.launchpad.net/pinot/trunk/+pots/pinot


1. What is Pinot
2. Available engines
3. Indexes
4. Indexing and monitoring
5. Searching
6. Viewing cached results
7. File formats
8. File patterns
9. Digging deeper
10. Saving results
11. D-Bus service & daemon
12. CJKV support
13. Environment variables and aliases
14. How to reset indexes
15. Compiling


1. What is Pinot


  Pinot combines desktop search and metasearch. It consists of :
    * a D-Bus service daemon that crawls, indexes and monitors your documents,
      and that plugs into the GNOME Shell search system ("pinot-dbus-daemon")
    * a GTK3-based user interface that lets you query the index built by
      the service as well as Web engines, and which can display and analyze
      the results ("pinot")
    * other command-line tools

  It was developed and tested on GNU/Linux and should work on other Unix-like
  systems.


2. Available engines


  One of the main functionalities of Pinot is metasearch. This lets you query
  a variety of sources, including Web-based search engines. By default, the
  list of available engines is hidden and queries are run against the internal
  indexes (see section "3. Indexes"). To show the list of engines, click on
  the Show All Search Engines button, next to the Query field immediately
  below the menu bar. Click on the same button again to hide the list.

  Any number of engines or engine groups may be selected at any one time.
  Multi-selection works as in any other application. All queries are run
  against the list of currently selected engines.

  Pinot supports both Sherlock and OpenSearch Description plugins. They are
  installed in $PREFIX/share/pinot/engines/, where PREFIX is usually /usr.
  Additional engines can be installed in that directory or in ~/.pinot/engines.
  Note that ~/.pinot/engines is not created automatically.

  Sherlock is what Firefox and the Mozilla Suite use. Chances are that somebody
  wrote a plugin for the engine you are interested in. Beware that a lot are
  out of date and will require some changes. Use pinot-search on the
  command-line to run a quick check on a plugin, eg
  $ pinot-search sherlock $PREFIX/share/pinot/engines/Bozo.src "clowns"

  Plugins are categorized by channels. For Sherlock plugins, the routeType
  element under SEARCH specifies the name of the channel the plugin belongs to.

  As for OpenSearch, Pinot should work with OpenSearch Description 1.0 and 1.1
  (draft 2) plugins. Keep in mind that the spec doesn't describe how to parse
  the results pages returned by search engines, therefore Pinot assumes that
  engines return results formatted according to the OpenSearch Response
  standard.
  In practice, this means that plugins that don't stick to the following rules
  will be ignored or won't show any result :
    * For Description 1.1 plugins, the type attribute on the Url field must be
      set to "application/atom+xml" or "application/rss+xml" (default).
      "text/html" will be rejected.
    * The search engine's results page content type must be some form of XML,
      otherwise Pinot won't attempt parsing it.
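
  As a rough illustration of the first rule, the Python sketch below lists
  the Url templates of an OpenSearch Description 1.1 document whose type would
  be accepted. It is not Pinot's code : the function name and the sample
  document are made up for this example, and treating a missing type attribute
  as "application/rss+xml" is an assumption based on the "(default)" note.

```python
import xml.etree.ElementTree as ET

# Namespace used by OpenSearch Description 1.1 documents.
OPENSEARCH_NS = "{http://a9.com/-/spec/opensearch/1.1/}"
# Types accepted according to the rules listed in the README.
ACCEPTED_TYPES = {"application/atom+xml", "application/rss+xml"}


def accepted_url_templates(description_xml):
    """Return the templates of the Url elements that have an accepted type."""
    root = ET.fromstring(description_xml)
    templates = []
    for url in root.iter(OPENSEARCH_NS + "Url"):
        # Assumption: a Url without a type attribute is treated as RSS.
        url_type = url.get("type", "application/rss+xml")
        if url_type in ACCEPTED_TYPES:
            templates.append(url.get("template"))
    return templates


# Hypothetical plugin with one rejected (HTML) and one accepted (RSS) Url.
EXAMPLE = """<OpenSearchDescription xmlns="http://a9.com/-/spec/opensearch/1.1/">
  <ShortName>Example</ShortName>
  <Url type="text/html" template="http://example.com/?q={searchTerms}"/>
  <Url type="application/rss+xml"
       template="http://example.com/?q={searchTerms}&amp;format=rss"/>
</OpenSearchDescription>"""

print(accepted_url_templates(EXAMPLE))
```

  A plugin for which this list comes back empty would fall in the "ignored or
  won't show any result" category described above.
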
  Pinot differs from the Description spec in that it interprets the Tags field
  as a channel name. The standard defines Tags as a "space-delimited set of
  words that are used as keywords to identify and categorize this search
  content".

  The "Xapian Omega" plugin allows querying a locally installed instance of
  Xapian Omega at http://localhost/. If Omega is installed elsewhere, edit
  $PREFIX/share/pinot/engines/OmegaDescription.xml.


3. Indexes


  Pinot has two internal indexes. My Documents is populated by the D-Bus
  service and contains documents found on your computer. My Web Pages is
  populated by the UI whenever you :
    * import an external document, using the Index, Import URL menu
    * index results returned by Web engines, using the Results, Index menu
      or through a Stored Query
  Both indexes may contain any of the file types listed in section
  "7. File formats".

  Indexes built by any other Xapian-based tools can be added to Pinot. To add
  an external index, click the + button at the bottom of the engines list.
  It can either be local, in which case you will have to select the directory
  where it is found, or served from a remote machine by xapian-tcpsrv. See
  the manual page for xapian-tcpsrv(1).

  All indexes are grouped together under the channel Current User in the
  engines list.


4. Indexing and monitoring


  Pinot can index any directory configured under the Indexing tab of the
  Preferences box. Monitoring is optional and should be disabled for
  directories whose contents seldom change, eg $PREFIX/share/doc.
  Indexing and monitoring of directories are handled by the D-Bus service.
  The number of files and directories that can be monitored is capped at
  the value of /proc/sys/fs/inotify/max_user_watches minus 1024.
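
  To get a feel for that cap on a given machine, here is a small Python
  sketch. The function names are invented for this example, and clamping the
  result at zero is an assumption : the README only states the subtraction.

```python
from pathlib import Path

# Kernel setting named in the README (Linux only).
INOTIFY_LIMIT = Path("/proc/sys/fs/inotify/max_user_watches")
# Margin subtracted from the kernel limit, per the README.
RESERVED_WATCHES = 1024


def monitor_cap(max_user_watches):
    """Return how many files and directories can be monitored."""
    # Assumption: never report a negative cap on very low limits.
    return max(max_user_watches - RESERVED_WATCHES, 0)


def read_max_user_watches(path=INOTIFY_LIMIT):
    """Read the per-user inotify watch limit from procfs."""
    return int(path.read_text().strip())


if __name__ == "__main__":
    if INOTIFY_LIMIT.exists():
        print(monitor_cap(read_max_user_watches()))
```

  With a kernel limit of 8192, for instance, this leaves 7168 watches.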

  Symlinks are not followed but are still indexed, with the MIME type
  "inode/symlink".

  While Pinot is not currently able to get to and index application-specific
  data held in dot-directories, it can index common file formats as listed
  in section "7. File formats".

  All files and directories with a name that starts with a dot, eg
  ".thunderbird", are skipped and their content is not indexed. If you wish
  to include the contents of some dot-directory, create a symlink to it in a
  directory that is configured in Preferences. For instance, if "~/Documents"
  is configured for indexing, create the symlink "~/Documents/TMail" and have
  it point to "~/.thunderbird". For this to work, the dot-directory must not
  be in a directory configured for indexing.

  If you want to exclude any specific files or directories from indexing, use
  patterns as described in section "8. File patterns".

  Pinot supports stopword removal. While no stopword lists are provided by
  default, they can easily be found on the Internet. Each language has its
  own list, for instance a stopword list for English should be copied to
  $PREFIX/share/pinot/stopwords/stopwords.en

  Language detection is done with libexttextcat. Ensure that the paths listed
  in /etc/pinot/textcat_conf.txt are correct.

  The pinot-index program allows indexing and peeking at documents' properties
  from the command-line. Using the -i/--index option with the My Documents or
  My Web Pages index is not recommended. For more details, see the manual page
  for pinot-index(1).


5. Searching


  Searches are run differently based on the type of engine being queried.

  When querying a Web engine, Pinot assumes this engine understands the query,
  which is sent as is. No pre-processing is performed on the text of the query,
  and the results list is more or less presented as retrieved from the Web
  engine.

  When querying an index, things are somewhat different. Queries can be
  expressed in a very natural way, using a combination of operators, filters
  and ranges. This query syntax is the syntax supported natively by Xapian's
  QueryParser and is documented at http://www.xapian.org/docs/queryparser.html
  For instance, the query "type:text/html AND lang:en AND (tcp NEAR ip)" will
  look for HTML files in English that mention TCP/IP. Note that all operators
  should be specified in capitals, eg "AND" not "and". The latter will be
  treated as a regular term.

  Pinot supports these query filters :
      "site" for host name, eg "site:github.com"
      "file" for file name, eg "file:index.html"
      "ext" for file extension, eg "ext:html"
      "title" for title, eg "title:pinot"
      "url" for URL, eg "url:https://github.com/"
      "dir" for directory, eg "dir:/home/fabrice"
      "inurl" for documents embedded in a URL, eg "inurl:file:///home/fabrice/Documents/backup.tar.gz"
      "lang" for ISO language code, eg "lang:en"
      "type" for MIME type, eg "type:text/html"
      "class" for MIME type classification, eg "class:text"
      "label" for label, eg "label:Important"

  The directory filter is recursive, ie it applies to sub-directories.
  Allowed language codes are "da", "nl", "en", "fi", "fr", "de", "hu", "it",
  "nn", "pt", "ro", "ru", "es", "sv" and "tr".

  Stemming is available to stored queries for which a stemming language is
  defined. If such a query doesn't return any exact match, the query terms are
  stemmed and the query is run again. Stopwords are also then removed if a
  stopwords list was found for the stemming language.

  The values of "file", "url", "dir" and "label" may be double-quoted. It's also
  worth pointing out that the query "dir:/X/Y" will return files and directories
  located in /X/Y, but not Y itself, which is what "dir:/X file:Y" would do.
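
  If you generate queries from a script, the rules above can be captured in a
  few lines of Python. This is only a sketch : build_query is a hypothetical
  helper, not part of Pinot, and quoting a value only when it contains a space
  is a choice made for this example.

```python
# Filters whose values may be double-quoted, per the README.
QUOTABLE = {"file", "url", "dir", "label"}


def build_query(filters, text=""):
    """Join filters with AND, then append free text in parentheses."""
    parts = []
    for name, value in filters.items():
        # Example policy: quote only when the value contains a space.
        if name in QUOTABLE and " " in value:
            value = '"' + value + '"'
        parts.append(name + ":" + value)
    if text:
        parts.append("(" + text + ")")
    # Operators must be in capitals, so the separator is "AND", not "and".
    return " AND ".join(parts)


print(build_query({"type": "text/html", "lang": "en"}, "tcp NEAR ip"))
print(build_query({"dir": "/home/fabrice/My Documents", "ext": "html"}))
```

  The first call rebuilds the example query quoted earlier in this section.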

  In addition, these ranges are supported :
      "YYYYMMDD..YYYYMMDD" for date ranges, eg "20070801..20070831"
      "HHMMSS..HHMMSS" for time ranges, eg "090000..180000"
      "size0..size1b" for size in bytes, eg "0..10240b"
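
  The date range format is easy to produce and check programmatically. The
  helpers below are a hypothetical Python sketch of the format only; they say
  nothing about how Xapian or Pinot parse ranges internally.

```python
from datetime import date, datetime


def make_date_range(start, end):
    """Format two dates as a "YYYYMMDD..YYYYMMDD" range."""
    return "{:%Y%m%d}..{:%Y%m%d}".format(start, end)


def parse_date_range(text):
    """Parse a "YYYYMMDD..YYYYMMDD" range into a (start, end) pair."""
    start_text, separator, end_text = text.partition("..")
    if not separator:
        raise ValueError("expected START..END")
    start = datetime.strptime(start_text, "%Y%m%d").date()
    end = datetime.strptime(end_text, "%Y%m%d").date()
    if start > end:
        raise ValueError("range start is after range end")
    return start, end


print(make_date_range(date(2007, 8, 1), date(2007, 8, 31)))
print(parse_date_range("20070801..20070831"))
```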

  See the manual page for pinot-search(1) for examples.


6. Viewing cached results


  Results returned by search engines can be viewed "live" by selecting the View
  menuitem under Results. This opens whatever application is defined for the
  result's MIME type and/or protocol scheme.
  In addition, Pinot lets you view the page as cached by Google and the Wayback
  Machine. Cache providers are configured in globalconfig.xml, located
  in /etc/pinot/. For instance :
  <cache>
    <name>Google</name>
    <location>http://www.google.com/search?q=cache:%url0</location>
    <protocols>http, https</protocols>
  </cache>

  This is self-explanatory :-) Here it configures a cache provider called
  "Google" that handles both http and https. The location field supports
  two parameters that are substituted to obtain the URL to open :
    * %url is the result's URL as displayed by the UI, eg
      https://github.com/FabriceColin/pinot
    * %url0 is the result's URL without the protocol, eg
      github.com/FabriceColin/pinot
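
  The substitution can be pictured with the following Python sketch. It is an
  illustration, not Pinot's implementation : the function name is made up, and
  whether Pinot escapes the URL before inserting it is not stated here, so no
  escaping is done.

```python
def expand_cache_location(location, result_url):
    """Substitute %url and %url0 in a cache provider's location."""
    scheme, separator, rest = result_url.partition("://")
    # %url0 is the URL without its protocol prefix.
    url_without_protocol = rest if separator else result_url
    # Replace %url0 first, since %url is a prefix of it.
    expanded = location.replace("%url0", url_without_protocol)
    return expanded.replace("%url", result_url)


print(expand_cache_location(
    "http://www.google.com/search?q=cache:%url0",
    "https://github.com/FabriceColin/pinot"))
```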


7. File formats


  The following document types are supported internally :
    * plain text
    * HTML
    * XML
    * mbox, including attachments and embedded documents
    * MP3, Ogg Vorbis, FLAC
    * JPEG
    * common archive formats (tar, Z, gz, bzip2, deb)
    * ISO 9660 images

  The following document types are supported through external programs :
    * PDF (pdftotext required)
    * RTF (unrtf required)
    * reStructuredText (rst2txt required)
    * OpenDocument/StarOffice files (unzip required)
    * MS Word (antiword required)
    * PowerPoint (catppt required)
    * Excel (xls2csv required)
    * DVI (catdvi required)
    * DjVu (djvutxt required)
    * RPM (rpm required)

  For other document types, Pinot will only index metadata such as name
  and location. If you wish to add support for another document type, and
  know of a command-line program that can handle that type, add it to
  external-filters.xml, located in /etc/pinot/.


8. File patterns


  It is possible to skip indexing of files that match glob(3) patterns.
  These patterns are configured in the Indexing tab of the Preferences box,
  and can be used as a blacklist or a whitelist.

  Patterns apply to files and directories. For instance, blacklisting
  "*/Desktop*" will skip "~/Desktop" and neither crawl nor monitor this
  directory's contents. Similarly, a blacklist entry for "*.avi" means that
  Pinot will not attempt indexing the content of AVI files, and will ignore
  all monitor events related to these files.
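
  To try out a pattern before adding it to Preferences, something like this
  Python sketch can help. It relies on Python's fnmatch module, whose "*" also
  matches "/"; Pinot's own matching may use different flags, so treat the
  results as an approximation. The function name is made up for this example.

```python
from fnmatch import fnmatchcase


def is_blacklisted(path, patterns):
    """Return True if the path matches at least one blacklist pattern."""
    return any(fnmatchcase(path, pattern) for pattern in patterns)


BLACKLIST = ["*/Desktop*", "*.avi"]

for candidate in ("/home/fabrice/Desktop",
                  "/home/fabrice/Videos/clip.avi",
                  "/home/fabrice/Documents/notes.txt"):
    print(candidate, is_blacklisted(candidate, BLACKLIST))
```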

  If you have never run Pinot before, the list will be pre-configured to skip
  some picture, video and archive file types such as GIF, MPG and RAR.


9. Digging deeper


  Pinot offers two ways to dig deeper into your documents : More Like This
  suggests terms specific to documents that may help in finding related
  documents, and Search This For lets you search within results.
  Both features are enabled if one or more of the results currently selected
  is indexed, and only operate on those.

  When activated, More Like This will create a new Stored Query prefixed with
  "More Like". For instance, if you run a Stored Query with name "Me", the
  expanded query's name will be "More Like Me".

  Search This For will search those results for the Stored Query selected in
  the sub-menu and will present results in a new tab. For instance, running
  the Stored Query "Me" on a set of results will open a "Me In Results" tab.

  In addition to these, Pinot may suggest alternative spellings for queries
  that don't return any result. If it does, a new Stored Query prefixed with
  "Corrected" will be created.


10. Saving results


  Lists of results can be saved to disk by selecting the Save As menuitem
  under Results. Two output formats are available to choose from in the file
  selector opened by Save As :
    * CSV, a text format
      The semi-colon character (';') is used to delimit fields.
    * OpenSearch response, an XML/RSS format
      See https://en.wikipedia.org/wiki/OpenSearch for details.
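
  A saved CSV file can be read back with any tool that accepts a custom
  delimiter. Here is a minimal Python sketch; the two-column sample row is
  hypothetical, since the exact columns and quoting of Pinot's CSV output are
  not described in this README.

```python
import csv
import io


def read_results(csv_text):
    """Return the rows of a semicolon-delimited results file."""
    return list(csv.reader(io.StringIO(csv_text), delimiter=";"))


# Hypothetical row: the real column layout may differ.
SAMPLE = "Pinot;https://github.com/FabriceColin/pinot\n"
print(read_results(SAMPLE))
```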


11. D-Bus service & daemon


  Unless Pinot was built without support for D-Bus, the daemon program
  "pinot-dbus-daemon" implements the D-Bus service and should be
  auto-started through the desktop file installed at
  /etc/xdg/autostart/pinot-dbus-daemon.desktop.

  D-Bus activation makes sure the service is running whenever one of its
  methods is invoked by any consumer application. For instance, clicking
  OK on the Preferences box will call the service's Reload method, which
  should start the service. This method also causes the service to reload
  the configuration file.

  A few things to keep in mind :
    * when starting, the service will first crawl all configured locations
      and (re)index new and modified files. The daemon's scheduling priority
      is set very low (15, can be adjusted with --priority) so that it
      hopefully doesn't get in the way of other activities. Crawling is
      suspended while the system is on battery.
    * when finished crawling, the service will monitor some locations for
      changes (as per preferences) and should consume few resources, unless
      a huge quantity of files needs its attention.
    * any change detected by the monitor is queued and acted upon as soon as
      possible, eg by reindexing a file that was modified.
    * operations that involve communicating with the service, such as editing
      document metadata, may time out if the system is under heavy load and/or
      the daemon is busy. In most cases, the message will have been received
      by the daemon, but the reply may take longer than expected. The Pinot
      UI may report that the operation failed, even though it was queued for
      processing and will be acted upon by the daemon.

  See section "13. Environment variables and aliases" for some tips on how to
  query the D-Bus interface. A list of available D-Bus methods can be found
  in the file pinot-dbus-daemon.xml.

  Pinot v1.20 implements the GNOME Shell search provider interface to allow
  searching the contents of files the daemon found at locations it crawled,
  basically the My Documents index. Go to the GNOME Settings' Search screen
  to enable Pinot as a provider. For this to work, the file
  com.github.fabricecolin.Pinot.search-provider.ini should be in the folder
  $PREFIX/share/gnome-shell/search-providers/


12. CJKV support


  Pinot supports indexing and searching CJKV text.

  At search time, queries that include CJKV characters are processed in a manner
  compatible with the CJKV indexing scheme. There is no need to format the query
  in a specific way, ie no need to separate characters with spaces.
  For example, the query :
      Fabrice 你好 title:身体好吗
  will be modified internally to :
      Fabrice  (你 你好 好) title:身 title:身体 title:体 title:体好 title:好 title:好吗 title:吗
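
  Judging from this example alone, each run of CJKV characters is expanded
  into its single characters interleaved with the two-character sequences
  that start at each position. The Python sketch below reproduces that
  expansion; it is inferred from the example and is not taken from Pinot's
  tokenizer, and both function names are made up.

```python
def cjkv_tokens(text):
    """Expand a run of CJKV characters into unigrams and bigrams."""
    tokens = []
    for index, char in enumerate(text):
        tokens.append(char)
        # Add the bigram starting here, if there is a next character.
        if index + 1 < len(text):
            tokens.append(text[index:index + 2])
    return tokens


def expand_filter(name, value):
    """Apply a filter prefix to every token of a CJKV value."""
    return " ".join(name + ":" + token for token in cjkv_tokens(value))


print(cjkv_tokens("你好"))
print(expand_filter("title", "身体好吗"))
```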

  It is recommended that filters (eg "title") be used at the end of the query
  for it to be processed as expected.

  You can get a list of documents in which CJKV characters were detected
  by the indexer with the special filter "tokens:CJKV".


13. Environment variables and aliases


  Pinot tries to provide reasonable defaults for most systems, but there may be
  situations where you want to tweak these values through environment variables :
    * PINOT_SPELLING_DB
      By default, Pinot builds indexes with a spelling database. This spelling
      database may make up as much as a third of the size of the index.
      If your system is low on disk space, you can disable this with
      $ export PINOT_SPELLING_DB=NO
      Make sure this is set for your login session, ie whenever the daemon is
      auto-started. You will also have to reset indexes, as described in
      section "14. How to reset indexes".
    * PINOT_MINIMUM_DISK_SPACE
      The daemon will stop crawling and indexing files when the partition on
      which the index resides runs out of free space. By default, this means
      less than 50 MB. To change this value to 100 MB for instance, use
      $ export PINOT_MINIMUM_DISK_SPACE=100
    * PINOT_MAXIMUM_INDEX_THREADS
      This sets the maximum number of concurrent indexing threads used by the
      daemon. The default value is 1.
    * PINOT_MAXIMUM_NESTED_SIZE
      This limits the extraction of documents nested inside others, such as
      archives or mail messages, based on their size. By default, this is
      set to 0, which deactivates the limit.
    * PINOT_MAXIMUM_QUERY_RESULTS
      This overrides the number of results returned by queries run through
      the UI's Query field as well as the number of results initially set
      for new stored queries.
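
  For scripts that need to mirror these settings, the documented defaults can
  be summarized as in the Python sketch below. It is not how Pinot reads its
  environment : the function and key names are invented, and comparing
  PINOT_SPELLING_DB case-insensitively against "NO" is an assumption.

```python
import os


def read_pinot_settings(environ=os.environ):
    """Collect Pinot-related variables, using the documented defaults."""
    return {
        # Only the value "NO" is documented; anything else keeps it enabled.
        "spelling_db": environ.get("PINOT_SPELLING_DB", "YES").upper() != "NO",
        # Expressed in megabytes.
        "minimum_disk_space_mb": int(environ.get("PINOT_MINIMUM_DISK_SPACE", "50")),
        "maximum_index_threads": int(environ.get("PINOT_MAXIMUM_INDEX_THREADS", "1")),
        # 0 means the nested size limit is deactivated.
        "maximum_nested_size": int(environ.get("PINOT_MAXIMUM_NESTED_SIZE", "0")),
    }


print(read_pinot_settings({"PINOT_SPELLING_DB": "NO",
                           "PINOT_MINIMUM_DISK_SPACE": "100"}))
```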

  Another environment variable that you may want to tweak comes from Xapian.
  XAPIAN_FLUSH_THRESHOLD can be set to the number of documents after which
  Xapian is to flush changes to the index. The default value is set to 10000
  at the time of writing this.
  Lowering this value should decrease the amount of memory used to cache
  changes to the index.

  Pinot provides a "tagged cd" script that lets you change a shell's
  current directory to the directory that matches the path elements passed
  as parameters. For instance, after setting :
  $ alias pcd='. $PREFIX/share/pinot/pinot-cd.sh'
  if ~/Documents is configured for indexing in Preferences, the following
  command would change the current directory to ~/Documents/Web/Stats :
  $ pcd Documents Stats
  If other directories match the given paths, pinot-cd.sh will display a list
  of matches. Future work will focus on disambiguation.

  If you have dbus-send installed, you may also want to set the following
  aliases :
  $ alias pinot-stats='dbus-send --session --print-reply --type=method_call \
    --dest=com.github.fabricecolin.Pinot /com/github/fabricecolin/Pinot com.github.fabricecolin.Pinot.GetStatistics'
  $ alias pinot-stop='dbus-send --session --print-reply --type=method_call \
    --dest=com.github.fabricecolin.Pinot /com/github/fabricecolin/Pinot com.github.fabricecolin.Pinot.Stop'
  The first will start the service daemon by calling its GetStatistics method,
  while the second alias will send it a request to stop and exit.


14. How to reset indexes


  You may wish to reset one of the indexes and start from scratch. There
  are several ways to do this, depending on which index it is.

  If you want to reset My Web Pages, you can either :
    * use Pinot to unindex every single document by selecting them all
      and choosing Unindex in the Index menu
    * or stop Pinot and delete ~/.pinot/index recursively

  If you want to reset My Documents, special considerations apply because
  of the historical data maintained by the daemon. There are two ways to
  proceed, and both require that the daemon be stopped.

  The manual way is to delete the index with
  $ rm -rf ~/.pinot/daemon
  and remove historical data with
  $ sqlite3 ~/.pinot/history-daemon "delete from CrawlHistory; delete from CrawlSources; delete from ActionQueue;"
  If you want to start from scratch and drop metadata (eg labels) that may
  exist on some documents, remove the history file altogether with
  $ rm -f ~/.pinot/history-daemon
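
  The same clean-up of historical data can be scripted. The Python sketch
  below runs the three delete statements listed above in one transaction; the
  function name is made up for this example, and as with the manual way the
  daemon should be stopped first.

```python
import sqlite3

# Tables named in the sqlite3 command above.
HISTORY_TABLES = ("CrawlHistory", "CrawlSources", "ActionQueue")


def clear_daemon_history(db_path):
    """Delete all rows from the daemon's history tables."""
    connection = sqlite3.connect(db_path)
    try:
        # The context manager commits on success and rolls back on error.
        with connection:
            for table in HISTORY_TABLES:
                connection.execute("DELETE FROM " + table)
    finally:
        connection.close()


# Typical call, left commented out so that nothing is deleted by accident:
# clear_daemon_history(os.path.expanduser("~/.pinot/history-daemon"))
```

  Using a single transaction means that either all three tables are emptied
  or none is, which avoids leaving the history half-cleared.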

  The automated way is to tell the daemon to reindex everything by launching
  it with the "--reindex" option, ie
  $ pinot-dbus-daemon --reindex
  It may be useful to take a look at the log file located at
  ~/.pinot/pinot-dbus-daemon.log.


15. Compiling


  Pinot's configure understands the following optional switches.

  --enable-debug        enable debug [default=no]
  --enable-dbus         enable DBus support [default=yes]
  --enable-libnotify    enable libnotify support [default=no]
  --enable-mempool      enable memory pool [default=no]
  --enable-libarchive   enable the libarchive filter [default=no]
  --enable-chmlib       enable the chmlib filter [default=no]

  Enable support for libarchive and chmlib if the necessary
  libraries are available. Enable libnotify support when building
  on BSD systems. Other switches should most likely stay unchanged.

  See the list below for dependencies. The version numbers indicate
  the minimum version Pinot has been tested with; older versions may
  or may not work.

---------------------------------------------------------------
Libraries and tools                     Version
---------------------------------------------------------------
SQLite                                  3.3.1
http://www.sqlite.org/

xapian-core                             1.4.10
http://www.xapian.org/

 zlib                                   1.2.0
 http://www.gzip.org/zlib/

curl (1)                                7.13.1
http://curl.haxx.se/
- OR -
neon (1)                                0.24.7
http://www.webdav.org/neon/

gdbus-codegen-glibmm (2)
https://github.com/Pelagicore/gdbus-codegen-glibmm

gtkmm                                   3.24
http://www.gtkmm.org/

libxml++                                2.12.0
http://libxmlplusplus.sourceforge.net/

libexttextcat                           3.2
http://cgit.freedesktop.org/libreoffice/libexttextcat/

gmime (3)                               2.6.0
http://spruce.sourceforge.net/gmime

boost (4)                               1.75
http://www.boost.org/

D-Bus with GLib bindings                0.61
http://www.freedesktop.org/wiki/Software/dbus

shared-mime-info                        0.17
http://freedesktop.org/Software/shared-mime-info

desktop-file-utils                      0.10
http://www.freedesktop.org/software/desktop-file-utils

TagLib                                  1.4
http://ktown.kde.org/~wheeler/taglib/

libarchive (5)                          2.6.2
http://people.freebsd.org/~kientzle/libarchive/

exiv2                                   0.21
http://www.exiv2.org/

chmlib (6)                              0.40
http://www.jedrea.com/chmlib/

openssh-askpass (7)                     4.3
http://www.openssh.com/portable.html

---------------------------------------------------------------
External filter programs
---------------------------------------------------------------
unzip
http://www.info-zip.org/pub/infozip/UnZip.html

pdftotext
http://www.foolabs.com/xpdf/
http://poppler.freedesktop.org/

antiword
http://www.winfield.demon.nl/

unrtf
http://www.gnu.org/software/unrtf/unrtf.html

rst2txt
https://github.com/stephenfin/rst2txt

djvutxt
http://djvu.sourceforge.net/

catdvi
http://catdvi.sourceforge.net/

catppt
xls2csv
http://www.wagner.pp.ru/~vitus/software/catdoc/

---------------------------------------------------------------------
Notes :
(1) enabled with "./configure --with-http=neon|curl"
(2) only to regenerate DBus code, with "make dbus-code"
(3) for gmime 2.4.0 support, edit configure.in
(4) for building only
    with boost > 1.48 and < 1.54, turning off memory pooling with "./configure --enable-mempool=no" may be preferable
(5) optional - enabled with "./configure --enable-libarchive=yes"
(6) optional - enabled with "./configure --enable-chmlib=yes"
(7) experimental - required only if _SSH_TUNNEL is set
---------------------------------------------------------------------

