\documentclass[final]{ltugboat}
\usepackage{graphicx,microtype}
\usepackage[breaklinks,colorlinks,linkcolor=black,citecolor=black,
           urlcolor=black]{hyperref}
\def\BibLaTeX{\Bib\LaTeX}
\interlinepenalty = -300

\title{Biber\Dash{}the next generation \\backend processor for \BibLaTeX}
\author{Philip L. Kime}
\address{Z\"{u}rich, Switzerland}
\netaddress{Philip (at) kime dot org dot uk}
\personalURL{http://biblatex-biber.sourceforge.net}

\begin{document}
\maketitle
\begin{abstract}
 For many, particularly those
 writing in the humanities,
 Philipp Lehman's \BibLaTeX\ package has been a much welcomed
 innovation in \LaTeX\ bibliography
 preparation. The ability to avoid the \BibTeX\ stack language and to
 write sophisticated bibliography styles using a very rich set of
 \LaTeX\ macros is a considerable advantage. Up until 2009, however,
 \BibLaTeX\ still relied on \BibTeX\ to sort the bibliography, construct labels and
 create the \verb|.bbl|. The requirement for a dedicated backend processor
 to do such tasks was not going to go away, as doing complex, fast sorting
 in \TeX\ is not a particularly amusing task. It was clear that in the
 future, the \BibLaTeX\ backend processor needed
 to be able to handle full Unicode, and many feature requests were being
 raised for things which the backend had to do and which were
 either impossible or nightmarish to do with \BibTeX. Biber was created to
 address these issues and this article is about how it works and
 the many rather nice things it can do. Biber is the recommended backend
 processor for \BibLaTeX, replacing \BibTeX. There will come a time
 (probably around \BibLaTeX\ 2) when \BibTeX\ is deprecated for
 use with \BibLaTeX, so read on \ldots
\end{abstract}

\section{History}
Fran\c{c}ois Charette originally started to write Biber in 2008. After I
realised that an \acro{APA} style I was writing for \BibLaTeX\ required some
fundamental changes to the backend processor, and that \BibTeX\ wasn't going
to be it (for why, see below), I had a look at the early Biber. I played
with it for a while, found a small bug and submitted it. Things escalated
and development entered a very rapid period where Fran\c{c}ois and I knocked
Biber into a releasable shape quite quickly. After a year or so, the
vicissitudes of life pulled Fran\c{c}ois away and I was left
to my own devices, with Biber gaining users rapidly, particularly in
Germany. This was probably due to Philipp Lehman's involvement with the
development, as we soon realised we had to coordinate \BibLaTeX\ and Biber releases.
This coordination continues and \BibLaTeX\ and Biber are now so closely linked that it is
fair to say that they are essentially one product. As we approach the
\BibLaTeX~2.0 release, the plan is to drop \BibTeX\ support altogether as
there are so many features now which are marked ``Biber only'' in the
\BibLaTeX\ manual. It's those features which I will describe~below.

Fran\c{c}ois says that the name comes from the national animal of the
last country he lived in, translated into the language of the country
he currently lives in. It also sounds a bit bibliographical.

\section{What Biber does}

Biber is used just as you would use \BibTeX. It's designed to be a drop-in
replacement for \BibLaTeX\ users. It uses a \BibTeX-compatible C
library called ``\verb|btparse|'' and so existing \verb|.bib| files should
work as-is. When \BibLaTeX\ is told that it's using Biber instead of
\BibTeX\ as the backend processor, it outputs a special \verb|.bcf| file.
This is nothing more than a fancy \verb|.aux| file in \XML\ which describes
all of the necessary options, citation keys and data sources which Biber
uses to construct the \verb|.bbl|. \XML\ was a natural choice as the options
can get quite complex (particularly for sorting). Biber reads the
\verb|.bcf| file, looks for the required data sources, reads them
and looks for the citation keys also mentioned in the \verb|.bcf|. Then it
constructs a \verb|.bbl| and writes it. Sounds simple? It's not. Biber is about
20,000 lines of mostly object-oriented Perl and some of the things it does
are quite tricky.
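
As a minimal sketch of this round trip, assume a document called
\verb|paper.tex| and a data source called \verb|refs.bib| (both file
names and the citation key are made up for illustration). The preamble
selects Biber with the \verb|backend| option; older \BibLaTeX\ releases
may name the data source with \verb|\bibliography| rather than
\verb|\addbibresource|.

\begin{verbatim}
% preamble of paper.tex (assumed name)
\usepackage[backend=biber]{biblatex}
\addbibresource{refs.bib}
% document body
See \cite{knuth1984}.
\printbibliography
\end{verbatim}

\noindent The run sequence then mirrors the familiar \BibTeX\ one:

\begin{verbatim}
pdflatex paper
biber paper
pdflatex paper
\end{verbatim}

\noindent The first \LaTeX\ run writes \verb|paper.bcf|, Biber reads it
and writes \verb|paper.bbl|, and the second \LaTeX\ run typesets the
bibliography from that file.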

\section{Distributing Biber}

Biber is written in Perl. This is an ideal language for such a task, as
Perl 5.14 (which is what Biber uses now) has full Unicode 6.0 support
and some really superb modules for collating \acro{UTF-8} which have
\acro{CLDR}\footnote{Common Locale Data Repository} support, allowing
sorting to be tailored automatically to the idiosyncrasies of
particular languages. The \verb|Text::BibTeX| module makes
parsing \BibTeX\ files easy but I had to change the underlying \verb|btparse| C
library a little bit to make it deal with \acro{UTF-8} when forming
initials out of names and to address a few other things which are the
inevitable consequences of a library written probably fifteen years
ago; other than that, the library has proven to be a solid foundational
element of Biber. I have to thank Alberto Manuel Brand\~{a}o
Sim\~{o}es, the current \verb|Text::BibTeX| maintainer, for being so
flexible and releasing new versions so quickly after my hacks.

Distributing Perl programs with such module dependencies is not easy
and was a major stumbling block to early adoption of Biber. Then I
came across the marvellous \verb|PAR::Packer| module which allows one
to package an entire Perl tree with all dependencies into one
executable which is indistinguishable from a ``real'' executable. One
virtualised build farm later and Biber had an automated build
procedure for most major platforms and was swiftly put into
\TeX{}\ Live. Now all users have to do is to update their \acro{TL}
installation and type ``\verb|biber|''.
SourceForge\footnote{\url{https://sourceforge.net/projects/biblatex-biber}}
is home to regular updates of the development binaries and
GitHub\footnote{\url{https://github.com/plk/biber}} is home to the
Perl source which can be used instead of the binary versions if you
don't mind installing some Perl modules (in fact, I only ever use the
Perl source version myself).

\section{Unicode and sorting}

One of the main issues with the original \BibTeX\ is that it is \acro{ASCII} only.
There is an 8-bit version, \verb|bibtex8|, but that's not really enough these
days. There is also a newish Unicode version, \verb|bibtexu|, but that
doesn't help with \BibLaTeX's myriad other needs for its backend and it doesn't
help with \acro{CLDR} and the hard problem of complex sorting.

Biber is Unicode 6.0 compliant throughout, including the file names it reads and
the citation keys themselves. This means that your data sources can be pure
\acro{UTF-8} which is particularly nice if you are using a \acro{UTF-8} engine like
\XeTeX\ or Lua\TeX. In fact, Biber will look at the locale settings passed
by \BibLaTeX\ (or those found in the environment or passed on the command
line) and automatically (re)encode things to output a \verb|.bbl| in
whatever encoding you want. It will even automatically convert \acro{UTF-8} to and
from \LaTeX\ character macros/symbols in case you are using a
not-quite-Unicode engine like pdf\TeX.

Sorting is one of the most important things that Biber does. Sorting the
bibliography is done by default using the \acro{UCA} (Unicode Collation Algorithm)
via the excellent \verb|Unicode::Collate| module. This is \acro{CLDR} aware and so
it will take notice of the locale from various sources and tailor the sort
accordingly. Swedes hate it when \"{a} sorts before \r{a} and \acro{CLDR} support
avoids upsetting Swedes. Sorting a bibliography means dealing with
sorting requirements such as:

\begin{quote}
 ``Sort first by name (or editor if there is no name, or translator
 if there is no editor) and then descending by year and month (or by
 original year and month of publication if there is no year) and then by
 just the last two digits of the volume and then by title (but case
 insensitive for title). Oh, and if there is a special shorthand for the
 entry, sort by that instead and ignore everything else.''
\end{quote}

\noindent Biber does this in complete generality using a multi-field sorting algorithm
allowing case sensitivity, direction and substrings to be specified on a
per-field basis. \BibLaTeX\ defines many common sorting schemes (such as
name/year/title, etc.)\ but you are free to define your own using a nice
\LaTeX\ macro interface. This interface makes \BibLaTeX\ write a section in
the \XML\ \verb|.bcf| which Biber reads to construct the sorting scheme it
uses to sort the entries before writing the \verb|.bbl|. I am not aware of
any bibliography system that has better sorting but that may be wishful
thinking born of spending so much time getting it to work \ldots
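
As a rough sketch of how the requirement quoted above might be written
with that macro interface, the following declares a scheme under the
made-up name \verb|demo|. It assumes a \BibLaTeX\ release in which the
macro is called \verb|\DeclareSortingScheme| and name lists are given
with \verb|\field|; some releases spell these differently, so check the
manual for your version. The month fallback is left out for brevity.

\begin{verbatim}
\DeclareSortingScheme{demo}{
  % a shorthand, if present, ends sorting
  \sort[final]{
    \field{shorthand}
  }
  % first of these that is defined
  \sort{
    \field{author}
    \field{editor}
    \field{translator}
  }
  % year, else original year, descending
  \sort[direction=descending]{
    \field{year}
    \field{origyear}
  }
  % last two characters of the volume
  \sort{
    \field[strside=right,strwidth=2]{volume}
  }
  % title, ignoring case
  \sort[sortcase=false]{
    \field{title}
  }
}
\ExecuteBibliographyOptions{sorting=demo}
\end{verbatim}

\noindent Each \verb|\sort| block contributes one sort key, taken from
the first listed field that the entry actually has, which is how the
``or editor \ldots\ or translator'' fallbacks are expressed.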

\section{Data sources and output}

It may have struck readers as strange that I refer to their \verb|.bib| files
as ``data sources''. This is because Biber can read more than just \BibTeX\
format files. It has a modular data source reading/writing architecture and
so new drivers can be written relatively easily to implement the ability to
read new data sources and write new output formats. Data sources are read
and internal entry objects constructed so that the data is processed in a
source-neutral format internally. Currently, Biber can also read files in
\acro{RIS} format, Zotero \acro{XML/RDF} format and Endnote \XML\ format but support for
these formats is experimental, partly due to weaknesses in the formats
themselves, it has to be said. There is support for remote data sources for
all formats by specifying a \acro{URL} that returns a file in the format. This is
quite useful with services such as CiteULike, which has a \verb|.bib|
gateway.
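
By way of illustration only, and assuming a \BibLaTeX\ release whose
\verb|\addbibresource| accepts the \verb|datatype| and \verb|location|
options, a non-\BibTeX\ file and a remote source might be declared as
follows. The file name and the \acro{URL} are invented.

\begin{verbatim}
% local file in RIS format
\addbibresource[datatype=ris]{refs.ris}
% remote .bib source (made-up URL)
\addbibresource[location=remote]{http://example.org/refs.bib}
\end{verbatim}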

Biber normally outputs a \verb|.bbl| file but it can also output a GraphViz
\verb|.dot| file which allows you to visualise your data. This is mainly
useful for checking complex cross-reference inheritance and other
entry-linking semantics. Biber can also output \BibLaTeX{}\acro{ML} which is an
experimental \XML\ data format specially tuned for \BibLaTeX\ (of course it
can read this too).

A very nice feature of Biber is the ``sourcemap'' option. It is often the
case that users would like to massage their data sources but they
have no control over the actual source. Biber allows you to
specify data mapping rules which are applied to the data as it is read,
effectively altering the data stream which it sees, but without changing
the source itself. For example, you can:

\begin{itemize}
\item Drop all \acro{ABSTRACT} fields as the entries are read so that their strange
 formatting doesn't break \LaTeX.
\item Add or modify a \acro{KEYWORDS} field in all \acro{BOOK} or \acro{INBOOK}
 entries which come from a data source called ``\nolinkurl{references.bib}'' whose
 \acro{TITLE} field matches ``Collected Works'' so that you can split your
 bibliography using \BibLaTeX\ filters.
\item Use full Perl regular expressions to match\slash replace in any field in
 the entry to regularise messy variants of a name so that the same-author
 disambiguation features of \BibLaTeX\ work nicely.
\end{itemize}

\noindent The ``sourcemap'' option is quite general and provides a
linear mapping interface where you can specify a chain of rules to
apply to each entry as it is read from the data source. The Biber
\PDF\ manual has many examples.
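
The three rules listed above can be sketched as follows. This assumes a
\BibLaTeX\ release that exposes source mapping through the
\verb|\DeclareSourcemap| macro (otherwise equivalent rules go in the
Biber configuration file); the keyword value and the author name are
invented for the example.

\begin{verbatim}
\DeclareSourcemap{
  \maps[datatype=bibtex, overwrite]{
    % 1. drop every abstract field
    \map{
      \step[fieldset=abstract, null]
    }
    % 2. tag matching books in one file
    \map{
      \perdatasource{references.bib}
      \pertype{book}
      \pertype{inbook}
      \step[fieldsource=title,
            match={Collected Works}, final]
      \step[fieldset=keywords,
            fieldvalue={collected}]
    }
    % 3. regularise a name variant
    \map{
      \step[fieldsource=author,
            match=\regexp{Smith,\s+J\.},
            replace={Smith, John}]
    }
  }
}
\end{verbatim}

\noindent The steps in a \verb|\map| run in order. In the second map,
\verb|final| stops processing for entries whose title does not match,
so only matching entries receive the keyword. The \verb|overwrite|
option lets rules change fields that already exist, so use it with care.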

\section{Uniqueness}

A major feature is the automated disambiguation system. Depending on the
options which you set in \BibLaTeX, Biber will automatically disambiguate
names by using either initials or, if necessary, full names. Even better,
it can, if you like, disambiguate lists of names which have been truncated using
``et al.''\ by expanding them past the ``et al.''\ to the point of minimal
unambiguity. (This is a requirement for \acro{APA} style and the very feature I
needed when I started looking at Biber. It took two years to get this
implemented.) This is fairly deep magic as it interacts with name
disambiguation in an unbounded loop sort of way.

The disambiguation system can be asked to do more subtle types of work
too, such as
disambiguating citations just enough to make them unambiguous pointers into
the bibliography but not enough to make every single individual author
unambiguous, etc. These are quite fine points and make sense when you read
the section of the \BibLaTeX\ manual which covers this, with examples.
Again, I don't know of any other bibliography system that has automated
this.
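
As a small sketch, the options involved can be set when loading the
package. The style and the name-list limits here are arbitrary choices
for illustration, not recommendations.

\begin{verbatim}
\usepackage[backend=biber,
            style=authoryear,
            maxnames=2, minnames=1,
            uniquename=full,
            uniquelist=true]{biblatex}
\end{verbatim}

\noindent With \verb|uniquename=init|, names are disambiguated with
initials only, while \verb|full| also allows full names where initials
are not enough. With \verb|uniquelist=true|, truncated name lists are
expanded past the ``et al.''\ when two citations would otherwise look
the same. The \BibLaTeX\ manual describes further values for the
subtler ``just enough'' behaviour, such as \verb|uniquename=minfull|
and \verb|uniquelist=minyear|.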

\section{Other features}

The following features are all due to feature requests by \BibLaTeX\
users and some were quite complex to implement. Some of them are
waiting until \BibLaTeX~2.x for a macro interface to expose them to
users as this is when it is planned to deprecate \BibTeX\ support in
\BibLaTeX.

\hfuzz=1.1pt
\begin{itemize}
\item Many \BibLaTeX\ options can be set on a per-entrytype basis so you
 can, for example, choose to truncate name lists of five or more authors
with ``et al.''\ for \acro{BOOK} entries and choose a different limit
for \acro{ARTICLE} entries.
\newpage
\item Biber only needs one run to do everything, including processing
 multiple sections.
\item You can create an entry ``set'' (a group of entries which are
 referenced/cited together) dynamically, just using \BibLaTeX\
 macros. With \BibTeX, this requires changes to the data source.
\item ``Syntactic'' inheritance via a new \acro{XDATA} entrytype and
 field. This can be thought of as a field-based generalisation of the
 \BibTeX\ \verb+@STRING+ functionality (which is also
 supported). \acro{XDATA} entries can cascade so you can inherit
 specific fields defining a particular publisher or journal, for
 example.
\item ``Semantic'' inheritance via a generalisation of the \BibTeX\
 cross-reference mechanism using the \acro{CROSSREF} field. This is
 highly customisable by the user\Dash{}it is possible to choose which
 fields to inherit for which entrytypes and to inherit fields under
 different names, etc. Nested cross-references are also supported.
\item Support for related entries, to enable generic treatment of things
 like ``translated as'', ``reprint\-ed as'', ``reprint of'',
 etc. (\BibLaTeX\ 2.x)
\item Customisable bibliography labels for styles which use labels
 (\BibLaTeX\ 2.x)
\item Multiple bibliography lists in the same section with different
 sorting and filtering.\\(\BibLaTeX~2.x)
\item No more restriction to a static data model of specific fields and
 entrytypes. (\BibLaTeX~2.x)
\item Structural validation of the data against the data model with a
 customisable validation model (\BibLaTeX\ 2.x)
\end{itemize}
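
Three of these features can be sketched with macros from the \BibLaTeX\
manual. The limits, the set name and the entry keys below are invented,
and the macros assume a release recent enough to provide them.

\begin{verbatim}
% per-entrytype truncation limits
\ExecuteBibliographyOptions[book]{maxnames=4}
\ExecuteBibliographyOptions[article]{maxnames=2}
% a dynamic entry set (made-up keys)
\defbibentryset{set1}{key1,key2,key3}
% inherit a field under another name
\DeclareDataInheritance{collection}{incollection}{
  \inherit{title}{booktitle}
}
\end{verbatim}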

\smallskip
\noindent Feature requests and bug reports are always welcome via the
SourceForge tracker.

\bigskip
\advance\signaturewidth by 3pc
\makesignature
\end{document}