\name{bibConvert}
\alias{bibConvert}
\title{Convert between bibliography formats}
\description{

  Read a bibliography file in one of the supported formats, convert it
  to another format, and write it to a file.

}
\usage{
bibConvert(infile, outfile, informat, outformat, \dots, tex, encoding,
           options)
}
\arguments{
  \item{infile}{input file, a character string.}
  \item{outfile}{output file, a character string.}
  \item{informat}{
    input format, a character string, see sections \dQuote{Supported
    formats} and \dQuote{Details}.
  }
  \item{outformat}{
    output format, a character string, see sections \dQuote{Supported
    formats} and \dQuote{Details}.
  }
  \item{...}{not used.}
  \item{tex}{TeX specific options, a character vector, see Details.}
  \item{encoding}{
    \code{character(2)}, a length two vector specifying input and output
    encodings. The default for both is \code{"utf8"}, see Details.
  }
  \item{options}{
    mainly for debugging: additional options for the converters, see
    Details.
  }
}
\details{

  Arguments \code{informat} and \code{outformat} can usually be omitted,
  since \code{bibConvert} infers them from the extensions of the names
  of the input and output files, see section \dQuote{Supported formats}
  below. However, the extension \code{"bib"} is ambiguous, since it
  is used for both BibTeX and BibLaTeX entries. For this extension, the
  default for both \code{informat} and \code{outformat} is
  \code{"bibtex"}.

  Package \pkg{rbibutils} supports format \code{"bibentry"}, in addition
  to the formats supported by the bibutils library. A \code{bibentry}
  object contains one or more references.  Two file formats are
  supported for \code{"bibentry"}, for both input and output: a
  \code{bibentry} object previously saved to a file using \code{saveRDS}
  (default extension \code{"rds"}), or an R source file containing one
  or more \code{bibentry} commands. The \code{"rds"} file is just read
  in and should contain a \code{bibentry} object.

  When \code{bibConvert} outputs to an R source file, two variants are
  supported: \code{"R"} and \code{"Rstyle"}.  With \code{outformat =
  "R"}, there is one \code{bibentry} call for each reference, just as
  each reference is a single entry in a BibTeX file.  \code{outformat =
  "Rstyle"} uses the format of \code{print(be, style = "R")}, i.e., the
  \code{bibentry} calls are output as a comma separated sequence wrapped
  in \code{c()}. For input, it is not necessary to specify which
  variant is used.

  % Such a file can be used as input to \code{bibConvert} (\code{informat
  % = "R"}). For input \code{bibConvert} accepts also R code containing
  % additional instructions. The input file is (effectively)
  % \code{source}'d, all bibentry objects created by it are collected and
  % merged into a single \code{bibentry} object.

  Note that when the input and output formats are identical, the
  conversion is not necessarily a null operation (except for
  \verb{xml}, and even that may change). For example, depending on the
  arguments, the character encoding may change. Also, input BibTeX files
  may contain additional instructions, such as journal abbreviations,
  which are expanded and incorporated in the references but not
  exported.  It should also be remembered that there may be loss of
  information when converting from one format to another.

  For a complete list of supported bibliography formats, see section
  \dQuote{Supported formats} below. The documentation of the original
  bibutils library (Putnam 2020) gives further details.

  Argument \code{encoding} is a character vector containing 2 elements,
  specifying the encoding of the input and output files.  If the
  encodings are the same, a length one vector can be supplied. The
  default encoding is UTF-8 for both input and output. A large number of
  familiar encodings are supported, e.g. \code{"latin1"} and
  \code{"cp1251"} (Windows Cyrillic). Some encodings have two or more
  aliases, which are also accepted. If an unknown encoding is
  requested, a list of all supported encodings will be printed.

  Argument \code{tex} is an unnamed character vector containing switches
  for BibTeX input and output (mostly output). Currently, the following
  are available:

  \describe{
    \item{uppercase}{write BibTeX tags/types in upper case.}
    \item{no_latex}{
      do not convert LaTeX-style character combinations to letters.
    }
    \item{brackets}{use brackets, not quotation marks surrounding data.}
    \item{dash}{
      use one dash \code{"-"}, not two \code{"--"}, in page ranges.
    }
    \item{fc}{add final comma to BibTeX output.}
  }

  By default, LaTeX encodings for accented characters are converted to
  letters. This may be a problem if the output encoding is not UTF-8,
  since some characters created by this process may be invalid in that
  encoding. For example, a BibTeX file which otherwise contains only
  Cyrillic and Latin characters may have a few entries with authors
  containing Latin accented characters represented using the TeX
  convention.  If those characters are not converted to Unicode letters,
  they can be exported to \code{"cp1251"} (Windows Cyrillic), for
  example. Specifying the option \code{no_latex} should solve the
  problem in such cases.

  Argument \code{options} is mostly for debugging and mimics the command
  line options of the bibutils binaries. The argument is a named
  character vector and is supplied as \code{c(tag1 = val1, tag2 = val2,
  ...)}, where each tag is the name of an option and the value is the
  corresponding value. The value for options that do not require one is
  ignored and can be set to \code{""}. Some of the available options
  are:

  \describe{

    \item{h}{help, show all available options.}
    \item{nb}{do not write Byte Order Mark in UTF-8 output.}

    \item{verbose}{print intermediate output.}
    \item{debug}{print even more intermediate output.}
  }


}

\section{Supported formats}{

  If an input or output format is not specified by arguments, it is
  inferred, if possible, from the file extension.

  In the table below, column Abbreviation shows the abbreviation for
  arguments \code{informat} and \code{outformat}, column FileExt gives
  the default file extension for that format, and column Input (Output)
  contains TRUE if the format is supported for input (output) and FALSE
  otherwise. Column Description gives a basic description of the format.

  % \tabular{ll}{%
  %   ads      \tab ADS reference format \cr
  %   bib      \tab BibTeX  \cr
  %   bibtex   \tab BibTeX  \cr
  %   biblatex \tab BibLaTeX  \cr
  %   copac    \tab COPAC format references   \cr
  %   ebi                                   \cr
  %   end      \tab EndNote (Refer format)  \cr
  %   endx     \tab EndNote XML  \cr
  %   isi      \tab ISI web of science  \cr
  %   med      \tab Pubmed XML references  \cr
  %   nbib     \tab Pubmed/National Library of Medicine nbib format  \cr
  %   ris      \tab RIS format  \cr
  %   R        \tab R source file containing \code{bibentry} commands  \cr
  %   r        \tab R source file containing \code{bibentry} commands  \cr
  %   Rstyle   \tab R source file containing \code{bibentry} commands  \cr
  %   rds      \tab bibentry object in a binary file created by \code{saveRDS()}   \cr
  %   xml      \tab MODS XML intermediate  \cr
  %   wordbib  \tab Word 2007 bibliography format
  % }

\Sexpr[stage=build,results=rd]{paste("\\\\tabular{lllll}{", paste0(paste("\\\\strong{", colnames(rbibutils::rbibutils_formats), "}", collapse = " \\\\tab ")), "\\\\cr ", paste(rbibutils::rbibutils_formats[ , 1], rbibutils::rbibutils_formats[ , 2], rbibutils::rbibutils_formats[ , 3], rbibutils::rbibutils_formats[ , 4], rbibutils::rbibutils_formats[ , 5], sep = " \\\\tab ", collapse = "\\\\cr "), "\n}")}

  The file \code{"easyPubMedvig.xml"} used in the examples for Pubmed
  XML (\code{"med"}) was obtained using code from the vignette in
  package \pkg{easyPubMed} (Fantini 2019).

}


\value{
  The function is used for the side effect of creating a file in the
  requested format. It returns a list, currently containing the
  following components:

  \item{infile}{name of the input file,}
  \item{outfile}{name of the output file,}
  \item{nref_in}{number of references read from the input file,}
  \item{nref_out}{number of references written to the output file.}

  Normally, \code{nref_in} and \code{nref_out} are the same. If some
  references were imported successfully but failed on export,
  \code{nref_out} may be smaller than \code{nref_in}. In such cases
  informative messages are printed during processing. (If this happens
  silently, it is probably a bug; please create an issue on GitHub.)

}
\author{Georgi N. Boshnakov}
%\note{
%%%  ~~further notes~~
%}



%% ~Make other sections like Warning with \section{Warning }{....} ~
\references{

  % bibentry: Rpackage:easyPubMed
Damiano Fantini (2019).
\dQuote{easyPubMed: Search and Retrieve Scientific Publication Records from PubMed.}
R package version 2.13, \url{https://CRAN.R-project.org/package=easyPubMed}.
% end:bibentry:  Rpackage:easyPubMed

  % bibentry: bibutils6.10
Chris Putnam (2020).
\dQuote{Library bibutils, version 6.10.}
\url{https://sourceforge.net/projects/bibutils/}.
% end:bibentry:  bibutils6.10

}
%\seealso{
%%% ~~objects to See Also as \code{\link{help}}, ~~~
%}
\examples{
fn_biblatex <- system.file("bib", "ex0.biblatex",  package = "rbibutils")
fn_biblatex
## file.show(fn_biblatex)

## convert a biblatex file to xml
modl <- tempfile(fileext = ".xml")
bibConvert(infile = fn_biblatex, outfile = modl, informat = "biblatex", outformat = "xml")
## file.show(modl)

## convert a biblatex file to bibtex
bib <- tempfile(fileext = ".bib")
bib2 <- tempfile(fileext = ".bib")
bibConvert(infile = fn_biblatex, outfile = bib, informat = "biblatex", outformat = "bib")
## file.show(bib)
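
## A sketch of the 'tex' switches described in Details; 'bib_up' is only an
## illustrative temporary file name. Here the tags/types are requested in
## upper case and the data is wrapped in brackets instead of quotation marks.
bib_up <- tempfile(fileext = ".bib")
bibConvert(infile = fn_biblatex, outfile = bib_up, informat = "biblatex",
           outformat = "bib", tex = c("uppercase", "brackets"))
## file.show(bib_up)
unlink(bib_up)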

## convert a biblatex file to bibentry
rds <- tempfile(fileext = ".rds")
fn_biblatex
rds
be <- bibConvert(fn_biblatex, rds, "biblatex", "bibentry")
bea <- bibConvert(fn_biblatex, rds, "biblatex") # same
readRDS(rds)

## convert to R source file
r <- tempfile(fileext = ".R")
bibConvert(fn_biblatex, r, "biblatex")
## file.show(r)
cat(readLines(r), sep = "\n")
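
## The "Rstyle" variant described in Details can be requested explicitly;
## 'rstyle' is only an illustrative temporary file name. The bibentry calls
## should then be written as a comma separated sequence wrapped in c().
rstyle <- tempfile(fileext = ".R")
bibConvert(fn_biblatex, rstyle, "biblatex", "Rstyle")
## file.show(rstyle)
unlink(rstyle)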

fn_cyr_utf8 <- system.file("bib", "cyr_utf8.bib",  package = "rbibutils")

## Can't have files with different encodings in the package, so below
## first convert a UTF-8 file to something else.
##
## the input here contains Cyrillic (UTF-8), the output is in Windows
## Cyrillic; notice the "no_latex" option
a <- bibConvert(fn_cyr_utf8, bib, encoding = c("utf8", "cp1251"), tex = "no_latex")

## now take the bib file and convert it to UTF-8
bibConvert(bib, bib2, encoding = c("cp1251", "utf8"))

## Latin-1 example: Author and Title fields contain Latin-1 accented
##   characters, not real names. As above, the file is in UTF-8
fn_latin1_utf8  <- system.file("bib", "latin1accents_utf8.bib", package = "rbibutils")
## convert to Latin-1, by default the accents are converted to TeX combinations:
b <- bibConvert(fn_latin1_utf8, bib , encoding = c("utf8", "latin1"))
cat(readLines(bib), sep = "\n")
## use "no_latex" option to keep them Latin-1:
c <- bibConvert(fn_latin1_utf8, bib , encoding = c("utf8", "latin1"), tex = "no_latex")
## this will show properly in Latin-1 locale (or suitable text editor):
##cat(readLines(bib), sep = "\n")

## gb18030 example (Chinese)
##
## prepare some filenames for the examples below:
xeCJK_utf8    <- system.file("bib/xeCJK_utf8.bib", package = "rbibutils")
xeCJK_gb18030 <- system.file("bib/xeCJK_gb18030.bib", package = "rbibutils")
fn_gb18030 <- tempfile(fileext = ".bib")
fn_rds <- tempfile(fileext = ".rds")

## input bib file utf8, output bib file gb18030:
bibConvert(xeCJK_utf8, fn_gb18030, encoding = c("utf8", "gb18030"))

## input bib file utf8, output file rds (and the rds object is returned)
bibConvert(xeCJK_utf8, fn_rds)


## a Pubmed file
fn_med <- system.file("bib/easyPubMedvig.xml", package = "rbibutils")
## convert a Pubmed file to bibtex:
bibConvert(fn_med, bib, informat = "med")
## convert a Pubmed file to rds and import:
bibConvert(fn_med, rds, informat = "med")
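
## The returned list (see Value) can be used to check that no references
## were lost; 'res' is only an illustrative name. Normally the two counts
## are equal.
res <- bibConvert(fn_med, bib, informat = "med")
res$nref_in
res$nref_out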

unlink(c(modl, bib, bib2, r, rds))
unlink(c(fn_gb18030, fn_rds))
}
\keyword{documentation}
% use one of  RShowDoc("KEYWORDS")
