=head1 NAME

bt_format_names - formatting BibTeX names for consistent output

=head1 SYNOPSIS

   bt_name_format * bt_create_name_format (char * parts,
                                           boolean abbrev_first);
   void bt_free_name_format (bt_name_format * format);
   void bt_set_format_text (bt_name_format * format,
                            bt_namepart part,
                            char * pre_part,
                            char * post_part,
                            char * pre_token,
                            char * post_token);
   void bt_set_format_options (bt_name_format * format,
                               bt_namepart part,
                               boolean abbrev,
                               bt_joinmethod join_tokens,
                               bt_joinmethod join_part);
   char * bt_format_name (bt_name * name, bt_name_format * format);

=head1 DESCRIPTION

After splitting a name into its component parts (represented as a
C<bt_name> structure), you often want to put it back together again as a
single string in a consistent way.  B<btparse> provides a very flexible
way to do this, generally in two stages: first, you create a "name
format" which describes how to put the tokens and parts of any name back
together, and then you apply the format to a particular name with
C<bt_format_name()>, which returns a new string.

The "name format" is encapsulated in a C<bt_name_format> structure,
which is created with C<bt_create_name_format()>.  This function
includes some clever trickery that means you can usually get away with
calling it alone, without customizing the format any further.
If you do need to customize the format, though, C<bt_set_format_text()>
and C<bt_set_format_options()> provide that capability.

The format controls the following:

=over 4

=item *

which name parts are printed, and in what order (e.g. "first von last
jr", or "von last jr first")

=item *

the text that precedes and follows each part (e.g. if the first name
follows the last name, you probably want a comma before the `first'
part: "Smith, John" rather than "Smith John")

=item *

the text that precedes and follows each token (e.g. if the first name is
abbreviated, you may want a period after each token: "J. R. Smith"
rather than "J R Smith")

=item *

the method used to join the tokens of each part together

=item *

the method used to join each part to the following part

=back

All of these except the list of parts to format are kept in arrays
indexed by name part: for example, the structure has a field

   char * post_token[BT_MAX_NAMEPARTS]

and C<post_token[BTN_FIRST]> (C<BTN_FIRST> is from the C<bt_namepart>
C<enum>) is the string to be added after each token in the first
name---for example, C<"."> if the first name is to be abbreviated in the
conventional way.
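
To make the array-per-part layout concrete, here is a minimal sketch in
C.  The enum C<toy_namepart>, the struct C<toy_format>, and the helper
C<append_token()> are hypothetical stand-ins invented for illustration;
they are not the btparse declarations, whose full field list is only
partly described here.  The point is that one lookup, indexed by part,
decides how a token is wrapped.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for bt_namepart: one index per name part. */
typedef enum { TOY_FIRST, TOY_VON, TOY_LAST, TOY_JR, TOY_MAX_PARTS } toy_namepart;

/* Hypothetical cut-down format: each field is an array indexed by part. */
typedef struct {
    const char *pre_token[TOY_MAX_PARTS];
    const char *post_token[TOY_MAX_PARTS];
} toy_format;

/* Append one token of `part` to the NUL-terminated string in buf,
   wrapped in that part's pre-token and post-token text.
   Returns 0 on success, -1 if the result would not fit in size bytes. */
static int append_token(char *buf, size_t size, const toy_format *fmt,
                        toy_namepart part, const char *token)
{
    size_t used = strlen(buf);
    int n;

    assert(part < TOY_MAX_PARTS);
    assert(used < size);
    n = snprintf(buf + used, size - used, "%s%s%s",
                 fmt->pre_token[part], token, fmt->post_token[part]);
    return (n < 0 || (size_t) n >= size - used) ? -1 : 0;
}
```

Joining the tokens (the space, tie, or nothing between C<"J."> and
C<"R.">) is deliberately left out of this sketch; in btparse that is the
job of the join methods.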

Yet another C<enum>, C<bt_joinmethod>, describes the available methods
for joining tokens together.  Note that there are I<two> sets of join
methods in a name format: between tokens within a single part, and
between the tokens of two different parts.  The first allows you, for
example, to change C<"J R Smith"> (first name abbreviated with no
post-token text but tokens joined by a space) to C<"JR Smith"> (the
same, but first-name tokens jammed together).  The second is mainly used
to ensure that "von" and "last" name-parts may be joined with a tie:
C<"de~Roche"> rather than C<"de Roche">.

The token join methods are:

=over 4

=item BTJ_MAYTIE

Insert a "discretionary tie" between tokens.  That is, either a space or
a "tie" is inserted, depending on context.  (A "tie," otherwise known as
a non-breaking space, is currently hard-coded as C<"~">, borrowed from TeX.)

=item BTJ_SPACE

Always insert a space between tokens.

=item BTJ_FORCETIE

Always insert a "tie" (C<"~">) between tokens.

=item BTJ_NOTHING

Insert nothing between tokens---just jam them together.

=back
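
The four methods boil down to a choice of separator string.  The
following sketch assumes the caller has already applied the context
rules for discretionary ties and passes the outcome in as C<want_tie>.
C<toy_joinmethod> mirrors the C<bt_joinmethod> names but is a
hypothetical type, and C<join_separator()> is not a btparse function.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical mirror of bt_joinmethod. */
typedef enum { TOY_MAYTIE, TOY_SPACE, TOY_FORCETIE, TOY_NOTHING } toy_joinmethod;

/* Return the text to insert between two tokens.  want_tie is consulted
   only for TOY_MAYTIE, where the surrounding context decides between a
   tie and a space; the other methods ignore it. */
static const char *join_separator(toy_joinmethod method, bool want_tie)
{
    switch (method) {
    case TOY_MAYTIE:   return want_tie ? "~" : " ";
    case TOY_SPACE:    return " ";
    case TOY_FORCETIE: return "~";
    case TOY_NOTHING:  return "";
    }
    assert(!"unknown join method");
    return "";
}
```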

Tokens are joined together, and thus the choice of whether to insert a
"discretionary tie" is made, in two places: within a part and between
two parts.  Naturally, this choice only arises when C<BTJ_MAYTIE> was
supplied as the join method; C<BTJ_SPACE> and C<BTJ_FORCETIE> always
insert a space or a tie, respectively, and C<BTJ_NOTHING> never inserts
anything between tokens.  Within a part, a tie is added after the first
token if it is less than three characters long, and before the last
token.  Between parts, a tie is added only if the preceding part
consisted of a single token that was less than three characters long.
In all other cases, spaces are inserted.  (This implementation slavishly
follows BibTeX.)
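
The tie rules just described can be sketched as two small predicates
plus a joiner for one part.  This follows the rules as stated in this
document rather than btparse's source.  The function names are
hypothetical, and "less than three characters" is approximated with
C<strlen()>, which counts bytes rather than printed characters (an
assumption that matters for accented or multibyte tokens).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Within a part of n tokens, should the gap after tokens[i] be a tie?
   Yes after a short (< 3 chars) first token, and yes for the gap just
   before the last token; otherwise a space. */
static bool tie_after_token(const char *const tokens[], size_t n, size_t i)
{
    assert(i + 1 < n);                    /* a following token must exist */
    if (i == 0 && strlen(tokens[0]) < 3)
        return true;
    return i + 1 == n - 1;
}

/* Between parts: tie only if the preceding part is one short token. */
static bool tie_between_parts(const char *const prev[], size_t n_prev)
{
    return n_prev == 1 && strlen(prev[0]) < 3;
}

/* Join one part's tokens into buf using the rule above.
   Output is silently truncated if buf is too small. */
static void join_maytie(char *buf, size_t size,
                        const char *const tokens[], size_t n)
{
    size_t i;

    assert(size > 0);
    buf[0] = '\0';
    for (i = 0; i < n; i++) {
        strncat(buf, tokens[i], size - strlen(buf) - 1);
        if (i + 1 < n)
            strncat(buf, tie_after_token(tokens, n, i) ? "~" : " ",
                    size - strlen(buf) - 1);
    }
}
```

Under these rules the tokens C<"de">, C<"la">, C<"Fontaine"> join as
C<"de~la~Fontaine">, while C<"Jean">, C<"Paul">, C<"Marie"> join as
C<"Jean Paul~Marie">.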

=head1 FUNCTIONS

=over 4

=item bt_create_name_format()

   bt_name_format * bt_create_name_format (char * parts,
                                           boolean abbrev_first)

Creates a name format for a given set of parts, with parameters for the
two most common customizations: the order of parts and whether to
abbreviate the first name.

The C<parts> parameter specifies which parts to include in a formatted
name, as well as the order in which to format them.  C<parts> must be a
string of four or fewer characters, each of which denotes one of the
four name parts: for instance, C<"vljf"> means to format all four parts
in "von last jr first" order.  No characters outside of the set
C<"fvlj"> are allowed, and no characters may be repeated.
C<abbrev_first> controls whether the `first' part will be abbreviated
(i.e., only the first letter from each token will be printed).
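
The constraints on C<parts> can be restated as a small C predicate.
C<parts_string_ok()> is a hypothetical helper written for illustration,
not part of btparse, and it implies nothing about how btparse itself
reacts to an invalid string.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical check mirroring the documented constraints on `parts`:
   at most four characters, each one of "fvlj", none repeated. */
static bool parts_string_ok(const char *parts)
{
    static const char letters[] = "fvlj";
    bool seen[4] = { false, false, false, false };
    size_t len, i;

    assert(parts != NULL);
    len = strlen(parts);
    if (len > 4)
        return false;                     /* more than four parts */
    for (i = 0; i < len; i++) {
        const char *hit = strchr(letters, parts[i]);
        size_t idx;

        if (hit == NULL)
            return false;                 /* not one of f, v, l, j */
        idx = (size_t) (hit - letters);
        if (seen[idx])
            return false;                 /* repeated part */
        seen[idx] = true;
    }
    return true;
}
```

C<"vljf"> and C<"fl"> pass; C<"ffl"> (repeat), C<"fx"> (bad character),
and C<"fvljx"> (too long) fail.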

In addition to simply setting the list of parts to format and the
"abbreviate" flag for the first name, C<bt_create_name_format()>
initializes the entire format structure so as to minimize the need for
further customizations:

=over 4

=item *

The "token join method" (what to insert between tokens of the same
part) is set to C<BTJ_MAYTIE> (discretionary tie) for all parts.

=item *

The "part join method" (what to insert after the final token of a
particular part, assuming there are more parts to come) is set to
C<BTJ_SPACE> for the `first', `last', and `jr' parts.  If the `von' part
is present and immediately precedes the `last' part (which will almost
always be the case), C<BTJ_MAYTIE> is used to join `von' to `last';
otherwise, `von' also gets C<BTJ_SPACE> for the inter-part join method.

=item *

The abbreviation flag is set to C<FALSE> for the `von', `last', and `jr'
parts; for `first', the abbreviation flag is set to whatever you pass in
as C<abbrev_first>.

=item *

Initially, all "surrounding text" (pre-part, post-part, pre-token, and
post-token) for all parts is set to the empty string.  Then a few tweaks
are done, depending on the C<abbrev_first> flag and the order of
parts.  First, if C<abbrev_first> is C<TRUE>, the post-token text for
the first name is set to C<".">, which changes C<"J R Smith"> to
C<"J. R. Smith">, usually the desired form.  (If you I<don't>
want the periods, you'll have to set the post-token text yourself with
C<bt_set_format_text()>.)

Then, if `jr' is present and immediately after `last' (almost always the
case), the pre-part text for `jr' is set to C<", ">, and the inter-part
join method for `last' is set to C<BTJ_NOTHING>.  This changes
C<"John Smith Jr"> (where the space following C<"Smith"> comes from
formatting the last name with a C<BTJ_SPACE> inter-part join method) to
C<"John Smith, Jr"> (where the C<", "> is now associated with
C<"Jr">, so that if there is no `jr' part, the C<", "> will
not be printed).

Finally, if `first' is present and immediately follows either `jr' or
`last' (which will usually be the case in "last-name first" formats),
the same sort of trickery is applied: the pre-part text for `first' is
set to C<", ">, and the part join method for the preceding part (either
`jr' or `last') is set to C<BTJ_NOTHING>.

=back
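
The default rules above can be condensed into a short initializer.  This
is a minimal sketch under stated assumptions: C<toy_format> is a
hypothetical, cut-down structure rather than the real
C<bt_name_format>, and C<parts> is assumed to have already passed the
validity rules given earlier.  It shows how the `von'/`last' tie and the
two comma tricks fall out of looking only at adjacent pairs in the
C<parts> string.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef enum { TOY_FIRST, TOY_VON, TOY_LAST, TOY_JR, TOY_MAX_PARTS } toy_namepart;
typedef enum { TOY_MAYTIE, TOY_SPACE, TOY_FORCETIE, TOY_NOTHING } toy_joinmethod;

/* Hypothetical, cut-down stand-in for bt_name_format. */
typedef struct {
    bool           abbrev[TOY_MAX_PARTS];
    const char    *pre_part[TOY_MAX_PARTS];
    const char    *post_token[TOY_MAX_PARTS];
    toy_joinmethod join_tokens[TOY_MAX_PARTS];
    toy_joinmethod join_part[TOY_MAX_PARTS];
} toy_format;

/* Map one character of a (valid) parts string to its name part. */
static toy_namepart part_of(char c)
{
    switch (c) {
    case 'f': return TOY_FIRST;
    case 'v': return TOY_VON;
    case 'l': return TOY_LAST;
    default:  assert(c == 'j'); return TOY_JR;
    }
}

/* Fill fmt with the defaults described above for the given part order. */
static void toy_init_defaults(toy_format *fmt, const char *parts, bool abbrev_first)
{
    size_t i, n = strlen(parts);
    int p;

    for (p = 0; p < TOY_MAX_PARTS; p++) {     /* baseline for every part */
        fmt->abbrev[p] = false;
        fmt->pre_part[p] = "";
        fmt->post_token[p] = "";
        fmt->join_tokens[p] = TOY_MAYTIE;
        fmt->join_part[p] = TOY_SPACE;
    }
    fmt->abbrev[TOY_FIRST] = abbrev_first;
    if (abbrev_first)
        fmt->post_token[TOY_FIRST] = ".";      /* "J R" becomes "J. R." */

    for (i = 1; i < n; i++) {                  /* inspect adjacent pairs */
        toy_namepart prev = part_of(parts[i - 1]);
        toy_namepart cur = part_of(parts[i]);

        if (prev == TOY_VON && cur == TOY_LAST)
            fmt->join_part[TOY_VON] = TOY_MAYTIE;      /* "de~Roche" */
        if (prev == TOY_LAST && cur == TOY_JR) {
            fmt->pre_part[TOY_JR] = ", ";              /* "Smith, Jr" */
            fmt->join_part[TOY_LAST] = TOY_NOTHING;
        }
        if (cur == TOY_FIRST && (prev == TOY_JR || prev == TOY_LAST)) {
            fmt->pre_part[TOY_FIRST] = ", ";           /* "Smith, John" */
            fmt->join_part[prev] = TOY_NOTHING;
        }
    }
}
```

For C<"vljf"> with C<abbrev_first> true, this yields C<", "> before both
`jr' and `first', C<TOY_NOTHING> after `last' and `jr', C<TOY_MAYTIE>
after `von', and C<"."> after each first-name token.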

While all these rules are rather complicated, they mean that you are
usually freed from having to do any customization of the name format.
Certainly this is the case if you only need C<"fvlj"> and C<"vljf"> part
orders, only want to abbreviate the first name, want periods after
abbreviated tokens, non-breaking spaces in the "right" places, and
commas in the conventional places.

If you want something out of the ordinary (for instance, abbreviated
tokens jammed together with no punctuation, or abbreviated last
names), you'll need to customize the name format a bit with
C<bt_set_format_text()> and C<bt_set_format_options()>.

=item bt_free_name_format()

   void bt_free_name_format (bt_name_format * format)

Frees a name format created by C<bt_create_name_format()>.

=item bt_set_format_text()

   void bt_set_format_text (bt_name_format * format,
                            bt_namepart part,
                            char * pre_part,
                            char * post_part,
                            char * pre_token,
                            char * post_token)

Allows you to customize some or all of the surrounding text for a single
name part.  Supply C<NULL> for any chunk of text that you don't want to
change.

For instance, say you want a name format that will abbreviate first
names, but without any punctuation after the abbreviated
tokens.  You could create and customize the format as follows:

   bt_name_format * format = bt_create_name_format ("fvlj", TRUE);
   bt_set_format_text (format,
                       BTN_FIRST,       /* name-part to customize */
                       NULL, NULL,      /* pre- and post- part text */
                       NULL, "");       /* empty string for post-token */

Without the C<bt_set_format_text()> call, C<format> would result in
names formatted like C<"J. R. Smith">.  After setting the post-token
text for first names to C<"">, this name would become C<"J R Smith">.
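
The "C<NULL> means leave unchanged" convention amounts to a setter that
overwrites only its non-C<NULL> arguments.  In this hypothetical sketch,
C<toy_text> and C<toy_set_text()> stand in for the real structure and
function.  The sketch simply stores the caller's pointers; whether
btparse copies the strings is not stated here, so it is safest to pass
strings that outlive the format (string literals, for example).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum { TOY_FIRST, TOY_VON, TOY_LAST, TOY_JR, TOY_MAX_PARTS } toy_namepart;

/* Hypothetical holder for the four kinds of surrounding text. */
typedef struct {
    const char *pre_part[TOY_MAX_PARTS];
    const char *post_part[TOY_MAX_PARTS];
    const char *pre_token[TOY_MAX_PARTS];
    const char *post_token[TOY_MAX_PARTS];
} toy_text;

/* Overwrite only the non-NULL arguments; NULL means "keep current value".
   Pointers are stored as-is, not copied. */
static void toy_set_text(toy_text *t, toy_namepart part,
                         const char *pre_part, const char *post_part,
                         const char *pre_token, const char *post_token)
{
    assert(t != NULL && part < TOY_MAX_PARTS);
    if (pre_part != NULL)   t->pre_part[part] = pre_part;
    if (post_part != NULL)  t->post_part[part] = post_part;
    if (pre_token != NULL)  t->pre_token[part] = pre_token;
    if (post_token != NULL) t->post_token[part] = post_token;
}
```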

=item bt_set_format_options()

   void bt_set_format_options (bt_name_format * format,
                               bt_namepart part,
                               boolean abbrev,
                               bt_joinmethod join_tokens,
                               bt_joinmethod join_part)

Allows further customization of a name format: you can set the
abbreviation flag and the two join methods (between the tokens of the
part, and after the part's final token).  Alas, there is no mechanism
for leaving a value unchanged; every call to C<bt_set_format_options()>
must supply all three values.

For example, let's say that just dropping periods from abbreviated
tokens in the first name isn't enough; you I<really> want to save
space by jamming the abbreviated tokens together: C<"JR Smith"> rather
than C<"J R Smith">.  Assuming the two calls in the above example have
been done, the following will finish the job:

   bt_set_format_options (format, BTN_FIRST,
                          TRUE,         /* keep same value for abbrev flag */
                          BTJ_NOTHING,  /* jam tokens together */
                          BTJ_SPACE);   /* space after final token of part */

Note that we unfortunately had to know (and supply) the current values
for the abbreviation flag and post-part join method, even though we were
only setting the intra-part join method.
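
The combined effect of the three first-name settings used in this
example (abbreviation, post-token text, and the token join method) can
be sketched with a one-part renderer.  C<render_part()> and
C<toy_joinmethod> are hypothetical and greatly simplified: abbreviation
keeps only the first byte of each token, and C<TOY_MAYTIE> is treated as
a plain space rather than applying the discretionary-tie rules.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef enum { TOY_MAYTIE, TOY_SPACE, TOY_FORCETIE, TOY_NOTHING } toy_joinmethod;

/* Render the tokens of one part into buf (silently truncated if buf is
   too small).  If abbrev is true, only the first byte of each token is
   kept: a simplification that ignores multibyte characters and TeX
   accents.  post_token follows every token.  TOY_MAYTIE is treated as
   a plain space here to keep the sketch short. */
static void render_part(char *buf, size_t size,
                        const char *const tokens[], size_t n,
                        bool abbrev, const char *post_token,
                        toy_joinmethod join)
{
    size_t i;

    assert(size > 0);
    buf[0] = '\0';
    for (i = 0; i < n; i++) {
        size_t room = size - strlen(buf) - 1;
        size_t keep = abbrev ? 1 : strlen(tokens[i]);

        if (keep > room)
            keep = room;
        strncat(buf, tokens[i], keep);
        strncat(buf, post_token, size - strlen(buf) - 1);
        if (i + 1 < n) {
            const char *sep = join == TOY_NOTHING  ? ""
                            : join == TOY_FORCETIE ? "~"
                            : " ";
            strncat(buf, sep, size - strlen(buf) - 1);
        }
    }
}
```

Rendering the tokens C<"John"> and C<"Ronald"> with abbreviation on
gives C<"J. R."> with post-token C<"."> and C<TOY_SPACE>, C<"J R"> with
post-token C<"">, and C<"JR"> once the join method is C<TOY_NOTHING>,
mirroring the progression described above.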

=item bt_format_name()

   char * bt_format_name (bt_name * name, bt_name_format * format)

Once a name format has been created and customized to your heart's
content, you can use it to format any number of names that have been
split with C<bt_split_name()> (see L<bt_split_names>).  Simply pass the
name structure and name format structure, and a newly-allocated string
containing the formatted name will be returned to you.  It is your
responsibility to C<free()> this string.
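
Because the returned string belongs to the caller, every call should be
paired with a C<free()>.  The sketch below illustrates that ownership
pattern, and also how the C<", "> pre-part text for `jr' vanishes when
the part is absent, using a hypothetical C<assemble_parts()> that works
on already-rendered part strings.  It is not btparse's implementation;
in particular, treating an empty string as an absent part is an
assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* A part counts as present if it is non-NULL and non-empty. */
static bool present(const char *s)
{
    return s != NULL && s[0] != '\0';
}

/* Is any part after index i present? */
static bool later_part_present(const char *const parts[], size_t n, size_t i)
{
    for (i = i + 1; i < n; i++)
        if (present(parts[i]))
            return true;
    return false;
}

/* Hypothetical assembler.  parts[] holds rendered part strings in output
   order (NULL or "" when absent); pre_part[i] is printed before part i,
   and join_after[i] after it only if a later part is present.
   Returns a newly malloc'd string (NULL if allocation fails) that the
   caller must free(). */
static char *assemble_parts(const char *const parts[],
                            const char *const pre_part[],
                            const char *const join_after[], size_t n)
{
    size_t i, total = 1;                  /* 1 byte for the final NUL */
    char *out;

    assert(parts != NULL && pre_part != NULL && join_after != NULL);
    for (i = 0; i < n; i++)               /* upper bound on output length */
        if (present(parts[i]))
            total += strlen(pre_part[i]) + strlen(parts[i])
                     + strlen(join_after[i]);
    out = malloc(total);
    if (out == NULL)
        return NULL;
    out[0] = '\0';
    for (i = 0; i < n; i++) {
        if (!present(parts[i]))
            continue;                     /* absent: skip its text as well */
        strcat(out, pre_part[i]);
        strcat(out, parts[i]);
        if (later_part_present(parts, n, i))
            strcat(out, join_after[i]);
    }
    return out;
}
```

With pre-part text C<", "> on `jr' and nothing joining `last' to it,
C<{"John", NULL, "Smith", "Jr"}> assembles to C<"John Smith, Jr">, while
dropping the `jr' entry gives plain C<"John Smith">.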

=back

=head1 SEE ALSO

L<btparse>, L<bt_split_names>

=head1 AUTHOR

Greg Ward <gward@python.net>
