\documentclass[final]{ltugboat}
\usepackage{graphicx,microtype}
\usepackage[breaklinks,colorlinks,linkcolor=black,citecolor=black,
 urlcolor=black]{hyperref}
\def\BibLaTeX{\Bib\LaTeX}
\interlinepenalty = -300

\title{Biber\Dash{}the next generation \\backend processor for \BibLaTeX}
\author{Philip L. Kime}
\address{Z\"{u}rich, Switzerland}
\netaddress{Philip (at) kime dot org dot uk}
\personalURL{http://biblatex-biber.sourceforge.net}

\begin{document}
\maketitle
\begin{abstract}
  For many, particularly those
  writing in the humanities,
  Philipp Lehman's \BibLaTeX\ package has been a much-welcomed
  innovation in \LaTeX\ bibliography
  preparation. The ability to avoid the \BibTeX\ stack language and to be
  able to write sophisticated bibliography styles using a very rich set of
  \LaTeX\ macros is a considerable advantage. Up until 2009, however,
  \BibLaTeX\ still relied on \BibTeX\ to sort the bibliography, construct
  labels and create the \verb|.bbl|. The requirement for a dedicated backend
  processor to do such tasks was not going to go away, as doing complex, fast
  sorting in \TeX\ is not a particularly amusing task. It was clear that the
  future \BibLaTeX\ backend processor needed
  to be able to handle full Unicode, and many feature requests were being
  raised for things which the backend had to do and which were
  either impossible or nightmarish to do with \BibTeX. Biber was created to
  address these issues and this article is about how it works and
  the many rather nice things it can do. Biber is the recommended backend
  processor for \BibLaTeX, replacing \BibTeX.
  There will come a time
  (probably around \BibLaTeX\ 2) when \BibTeX\ is deprecated for
  use with \BibLaTeX, so read on \ldots
\end{abstract}

\section{History}
Fran\c{c}ois Charette originally started to write Biber in 2008. After I
realised that an \acro{APA} style I was writing for \BibLaTeX\ required some
fundamental changes to the backend processor and that \BibTeX\ wasn't going
to be it (for why, see below), I had a look at the early Biber. I played
with it for a while, found a small bug and submitted it. Things escalated
and development entered a very rapid period where Fran\c{c}ois and I knocked
Biber into a releasable shape quite quickly. After a year or so, the
vicissitudes of life pulled Fran\c{c}ois away and I was left
to my own devices with Biber gaining users rapidly, particularly in
Germany, probably due to Philipp Lehman's involvement with the development
as we soon realised we had to coordinate \BibLaTeX\ and Biber releases.
This continues, and \BibLaTeX\ and Biber are now so closely linked that it is
fair to say that they are essentially one product. As we approach the
\BibLaTeX~2.0 release, the plan is to drop \BibTeX\ support altogether as
there are so many features now which are marked ``Biber only'' in the
\BibLaTeX\ manual. It's those features which I will describe~below.

Fran\c{c}ois says that the name comes from the national animal of the
last country he lived in, translated into the language of the country
he currently lives in. It also sounds a bit bibliographical.

\section{What Biber does}

Biber is used just as you would use \BibTeX. It's designed to be a drop-in
replacement for \BibLaTeX\ users. It uses a \BibTeX\ compatible C
library called ``\verb|btparse|'' and so existing \verb|.bib| files should
work as-is. When \BibLaTeX\ is told that it's using Biber instead of
\BibTeX\ as the backend processor, it outputs a special \verb|.bcf| file.
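A minimal sketch of a preamble that selects Biber, assuming the
\verb|backend=biber| package option of \BibLaTeX; the data source
\verb|refs.bib|, the citation key \verb|somekey| and the file name
\verb|paper.tex| are hypothetical and not taken from this article:
\begin{verbatim}
\documentclass{article}
% Ask biblatex to write a .bcf for Biber
\usepackage[backend=biber]{biblatex}
% Hypothetical data source
\addbibresource{refs.bib}
\begin{document}
Some text \cite{somekey}.
\printbibliography
\end{document}
\end{verbatim}
With this saved as \verb|paper.tex|, the usual cycle would be
\verb|pdflatex paper|, then \verb|biber paper|, then \verb|pdflatex paper|
again; it is the first \LaTeX\ run that writes the \verb|.bcf| file.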
The \verb|.bcf| file is nothing more than a fancy \verb|.aux| file in \XML\
which describes
all of the necessary options, citation keys and data sources which Biber
uses to construct the \verb|.bbl|. \XML\ was a natural choice as the options
can get quite complex (particularly for sorting). Biber reads the
\verb|.bcf| file, looks for the required data sources, reads them
and looks for the citation keys also mentioned in the \verb|.bcf|. Then it
constructs a \verb|.bbl| and writes it. Sounds simple? It's not. Biber is about
20,000 lines of mostly object-oriented Perl and some of the things it does
are quite tricky.

\section{Distributing Biber}

Biber is written in Perl. This is an ideal language for such a task, as
Perl 5.14 (which is what Biber uses now) has full Unicode 6.0 support
and some really superb modules for collating \acro{UTF-8} which have
\acro{CLDR}\footnote{Common Locale Data Repository} support, allowing
sorting to be tailored automatically to the idiosyncrasies of
particular languages. The \verb|Text::BibTeX| module makes
parsing \BibTeX\ files easy but I had to change the underlying \verb|btparse| C
library a little bit to make it deal with \acro{UTF-8} when forming
initials out of names and to address a few other things which are the
inevitable consequences of a library written probably fifteen years
ago; other than that, the library has proven to be a solid foundational
element of Biber. I have to thank Alberto Manuel Brand\~{a}o
Sim\~{o}es, the current \verb|Text::BibTeX| maintainer, for being so
flexible and releasing new versions so quickly after my hacks.

Distributing Perl programs with such module dependencies is not easy
and was a major stumbling block to early adoption of Biber. Then I
came across the marvellous \verb|PAR::Packer| module which allows one
to package an entire Perl tree with all dependencies into one
executable which is indistinguishable from a ``real'' executable.
One
virtualised build farm later and Biber had an automated build
procedure for most major platforms and was swiftly put into
\TeX\ Live. Now all users have to do is to update their \acro{TL}
installation and type ``\verb|biber|''.
SourceForge\footnote{\url{https://sourceforge.net/projects/biblatex-biber}}
is home to regular updates of the development binaries and
GitHub\footnote{\url{https://github.com/plk/biber}} is home to the
Perl source which can be used instead of the binary versions if you
don't mind installing some Perl modules (in fact, I only ever use the
Perl source version myself).

\section{Unicode and sorting}

One of the main issues with the original \BibTeX\ is that it is \acro{ASCII} only.
There is an 8-bit version \verb|bibtex8| but that's not really enough these
days. There is also a newish Unicode version \verb|bibtexu| but that
doesn't help \BibLaTeX's myriad of other needs for its backend and it doesn't
help with \acro{CLDR} and the hard problem of complex sorting.

Biber is Unicode 6.0 compliant throughout, even the file names it reads and
the citation keys themselves. This means that your data sources can be pure
\acro{UTF-8} which is particularly nice if you are using a \acro{UTF-8} engine like
\XeTeX\ or Lua\TeX. In fact, Biber will look at the locale settings passed
by \BibLaTeX\ (or those found in the environment or passed on the command
line) and automatically (re)encode things to output a \verb|.bbl| in
whatever encoding you want. It will even automatically convert \acro{UTF-8} to and
from \LaTeX\ character macros/symbols in case you are using a
not-quite-Unicode engine like pdf\TeX.

Sorting is one of the most important things that Biber does. Sorting the
bibliography is done by default using the \acro{UCA} (Unicode Collation Algorithm)
via the excellent \verb|Unicode::Collate| module.
This is \acro{CLDR} aware and so
it will take notice of the locale from various sources and tailor the sort
accordingly. Swedes hate it when \"{a} sorts before \r{a} and \acro{CLDR} support
avoids upsetting Swedes. Sorting a bibliography means dealing with
sorting requirements such as:

\begin{quote}
  ``Sort first by name (or editor if there is no name or translator
  if there is no editor) and then descending by year and month (or by
  original year and month of publication if there is no year) and then by
  just the last two digits of the volume and then by title (but case
  insensitive for title). Oh, and if there is a special shorthand for the
  entry, sort by that instead and ignore everything else.''
\end{quote}

\noindent Biber does this in complete generality using a multi-field sorting algorithm
allowing case sensitivity, direction and substrings to be specified on a
per-field basis. \BibLaTeX\ defines many common sorting schemes (such as
name/year/title, etc.)\ but you are free to define your own using a nice
\LaTeX\ macro interface. This interface makes \BibLaTeX\ write a section in
the \XML\ \verb|.bcf| which Biber reads to construct the sorting scheme it
uses to sort the entries before writing the \verb|.bbl|. I am not aware of
any bibliography system that has better sorting but that may be wishful
thinking born of spending so much time getting it to work \ldots

\section{Data sources and output}

It may have struck readers as strange that I refer to their \verb|.bib| files
as ``data sources''. This is because Biber can read more than just \BibTeX\
format files. It has a modular data source reading/writing architecture and
so new drivers can be written relatively easily to implement the ability to
read new data sources and write new output formats.
Data sources are read
and internal entry objects constructed so that the data is processed in a
source-neutral format internally. Currently, Biber can also read files in
\acro{RIS} format, Zotero \acro{XML/RDF} format and EndNote \XML\ format but support for
these formats is experimental, partly due to weaknesses in the formats
themselves, it has to be said. There is support for remote data sources for
all formats by specifying a \acro{URL} that returns a file in the format. This is
quite useful with services such as CiteULike which has a \verb|.bib|
gateway.

Biber normally outputs a \verb|.bbl| file but it can also output a GraphViz
\verb|.dot| file which allows you to visualise your data. This is mainly
useful for checking complex cross-reference inheritance and other
entry-linking semantics. Biber can also output \BibLaTeX{}\acro{ML} which is an
experimental \XML\ data format specially tuned for \BibLaTeX\ (of course it
can read this too).

A very nice feature of Biber is the ``sourcemap'' option. It is often the
case that users would like to massage their data sources but they
have no control over the actual source. Biber allows you to
specify data mapping rules which are applied to the data as it is read,
effectively altering the data stream which it sees, but without changing
the source itself. For example, you can:

\begin{itemize}
\item Drop all \acro{ABSTRACT} fields as the entries are read so that their strange
  formatting doesn't break \LaTeX.
\item Add or modify a \acro{KEYWORD} field in all \acro{BOOK} or \acro{INBOOK}
  entries which come from a data source called ``\nolinkurl{references.bib}'' whose
  \acro{TITLE} field matches ``Collected Works'' so that you can split your
  bibliography using \BibLaTeX\ filters.
\item Use full Perl regular expressions to match\slash replace in any field in
  the entry to regularise messy variants of a name so that the same-author
  disambiguation features of \BibLaTeX\ work nicely.
\end{itemize}

\noindent The ``sourcemap'' option is quite general and provides a
linear mapping interface where you can specify a chain of rules to
apply to each entry as it is read from the data source. The Biber
\PDF\ manual has many examples.

\section{Uniqueness}

A major feature is the automated disambiguation system. Depending on the
options which you set in \BibLaTeX, Biber will automatically disambiguate
names by using either initials or, if necessary, full names. Even better,
it can, if you like, disambiguate lists of names which have been truncated using
``et al.''\ by expanding them past the ``et al.''\ to the point of minimal
unambiguity. (This is a requirement for \acro{APA} style and the very feature I
needed when I started looking at Biber. It took two years to get this
implemented.) This is fairly deep magic as it interacts with name
disambiguation in an unbounded loop sort of way.

The disambiguation system can be asked to do more subtle types of work
too, such as
disambiguating citations just enough to make them unambiguous pointers into
the bibliography but not enough to make every single individual author
unambiguous, etc. These are quite fine points and make sense when you read
the section of the \BibLaTeX\ manual which covers this, with examples.
Again, I don't know of any other bibliography system that has automated
this.

\section{Other features}

The following features are all due to feature requests by \BibLaTeX\
users and some were quite complex to implement. Some of them are
waiting until \BibLaTeX~2.x for a macro interface to expose them to
users as this is when it is planned to deprecate \BibTeX\ support in
\BibLaTeX.

\hfuzz=1.1pt
\begin{itemize}
\item Many \BibLaTeX\ options can be set on a per-entrytype basis so you
  can, for example, choose to truncate name lists of five or more authors
  with ``et al.''\ for \acro{BOOK} entries and choose a different limit
  for \acro{ARTICLE} entries.
\newpage
\item Biber only needs one run to do everything, including processing
  multiple sections.
\item You can create an entry ``set'' (a group of entries which are
  referenced/cited together) dynamically, just using \BibLaTeX\
  macros. With \BibTeX, this requires changes to the data source.
\item ``Syntactic'' inheritance via a new \acro{XDATA} entrytype and
  field. This can be thought of as a field-based generalisation of the
  \BibTeX\ \verb+@STRING+ functionality (which is also
  supported). \acro{XDATA} entries can cascade so you can inherit
  specific fields defining a particular publisher or journal, for
  example.
\item ``Semantic'' inheritance via a generalisation of the \BibTeX\
  cross-reference mechanism using the \acro{CROSSREF} field. This is
  highly customisable by the user\Dash{}it is possible to choose which
  fields to inherit for which entrytypes and to inherit fields under
  different names etc. Nested cross-references are also supported.
\item Support for related entries, to enable generic treatment of things
  like ``translated as'', ``reprint\-ed as'', ``reprint of''
  etc. (\BibLaTeX\ 2.x)
\item Customisable bibliography labels for styles which use labels
  (\BibLaTeX\ 2.x)
\item Multiple bibliography lists in the same section with different
  sorting and filtering.\\(\BibLaTeX~2.x)
\item No more restriction to a static data model of specific fields and
  entrytypes.
  (\BibLaTeX~2.x)
\item Structural validation of the data against the data model with a
  customisable validation model (\BibLaTeX\ 2.x)
\end{itemize}

\smallskip
\noindent Feature requests and bug reports are always welcome via the
SourceForge tracker.

\bigskip
\advance\signaturewidth by 3pc
\makesignature
\end{document}