=head1 NAME

bt_format_names - formatting BibTeX names for consistent output

=head1 SYNOPSIS

   bt_name_format * bt_create_name_format (char * parts,
                                           boolean abbrev_first);
   void bt_free_name_format (bt_name_format * format);
   void bt_set_format_text (bt_name_format * format,
                            bt_namepart part,
                            char * pre_part,
                            char * post_part,
                            char * pre_token,
                            char * post_token);
   void bt_set_format_options (bt_name_format * format,
                               bt_namepart part,
                               boolean abbrev,
                               bt_joinmethod join_tokens,
                               bt_joinmethod join_part);
   char * bt_format_name (bt_name * name, bt_name_format * format);

=head1 DESCRIPTION

After splitting a name into its component parts (represented as a
C<bt_name> structure), you often want to put it back together again as a
single string in a consistent way.  B<btparse> provides a very flexible
way to do this, generally in two stages: first, you create a "name
format" which describes how to put the tokens and parts of any name back
together, and then you apply the format to a particular name.

The "name format" is encapsulated in a C<bt_name_format> structure,
which is created with C<bt_create_name_format()>.  This function
includes some clever trickery that means you can usually get away with
calling it alone, and not need to do any customization of the format.
If you do need to customize the format, though, C<bt_set_format_text()>
and C<bt_set_format_options()> provide that capability.

The format controls the following:

=over 4

=item *

which name parts are printed, and in what order (e.g. "first von last
jr", or "von last jr first")

=item *

the text that precedes and follows each part (e.g. if the first name
follows the last name, you probably want a comma before the `first'
part: "Smith, John" rather than "Smith John")

=item *

the text that precedes and follows each token (e.g.
if the first name is
abbreviated, you may want a period after each token: "J. R. Smith"
rather than "J R Smith")

=item *

the method used to join the tokens of each part together

=item *

the method used to join each part to the following part

=back

All of these except the list of parts to format are kept in arrays
indexed by name part: for example, the structure has a field

   char * post_token[BT_MAX_NAMEPARTS]

and C<post_token[BTN_FIRST]> (C<BTN_FIRST> is from the C<bt_namepart>
C<enum>) is the string to be added after each token in the first
name---for example, C<"."> if the first name is to be abbreviated in the
conventional way.

The format is then applied to a particular name by C<bt_format_name()>,
which returns a new string.

Yet another C<enum>, C<bt_joinmethod>, describes the available methods
for joining tokens together.  Note that there are I<two> sets of join
methods in a name format: between tokens within a single part, and
between the tokens of two different parts.  The first allows you, for
example, to change C<"J R Smith"> (first name abbreviated with no
post-token text but tokens joined by a space) to C<"JR Smith"> (the
same, but first-name tokens jammed together).  The second is mainly used
to ensure that "von" and "last" name-parts may be joined with a tie:
C<"de~Roche"> rather than C<"de Roche">.

The token join methods are:

=over 4

=item BTJ_MAYTIE

Insert a "discretionary tie" between tokens.  That is, either a space or
a "tie" is inserted, depending on context.  (A "tie," otherwise known as
an unbreakable space, is currently hard-coded as C<"~">---from TeX.)

=item BTJ_SPACE

Always insert a space between tokens.

=item BTJ_FORCETIE

Always insert a "tie" (C<"~">) between tokens.

=item BTJ_NOTHING

Insert nothing between tokens---just jam them together.

=back

Tokens are joined together, and thus the choice of whether to insert a
"discretionary tie" is made, at two places: within a part and between
two parts.  Naturally, this only applies when C<BTJ_MAYTIE> was supplied
as the token-join method; C<BTJ_SPACE> and C<BTJ_FORCETIE> always insert
either a space or tie, and C<BTJ_NOTHING> always adds nothing between
tokens.  Within a part, ties are added after the first token if it is
less than three characters long, and before the last token.  Between
parts, a tie is added only if the preceding part consisted of a single
token that was less than three characters long.  In all other cases,
spaces are inserted.  (This implementation slavishly follows BibTeX.)

=head1 FUNCTIONS

=over 4

=item bt_create_name_format()

   bt_name_format * bt_create_name_format (char * parts,
                                           boolean abbrev_first)

Creates a name format for a given set of parts, with variations for the
most common forms of customization---the order of parts and whether to
abbreviate the first name.

The C<parts> parameter specifies which parts to include in a formatted
name, as well as the order in which to format them.  C<parts> must be a
string of four or fewer characters, each of which denotes one of the
four name parts: for instance, C<"vljf"> means to format all four parts
in "von last jr first" order.  No characters outside of the set
C<"fvlj"> are allowed, and no characters may be repeated.
C<abbrev_first> controls whether the `first' part will be abbreviated
(i.e., only the first letter from each token will be printed).
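
The constraints on C<parts> are easy to check before calling
C<bt_create_name_format()>.  The following is a minimal sketch and is
not part of B<btparse>: the helper name C<parts_string_is_valid()> is
hypothetical, and it uses the C99 C<bool> type rather than the library's
C<boolean>.  The text above only gives an upper bound on the length, so
this sketch accepts the empty string; whether the library itself does is
not stated here.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not a btparse function): returns true if `parts`
 * meets the documented constraints for bt_create_name_format(), i.e. at
 * most four characters, all drawn from "fvlj", none repeated. */
bool parts_string_is_valid(const char *parts)
{
    static const char allowed[] = "fvlj";
    bool seen[4] = { false, false, false, false };
    size_t i;

    if (parts == NULL || strlen(parts) > 4)
        return false;

    for (i = 0; parts[i] != '\0'; i++) {
        /* Locate this character in the allowed set. */
        const char *hit = strchr(allowed, parts[i]);

        if (hit == NULL)
            return false;              /* not one of f, v, l, j */
        if (seen[hit - allowed])
            return false;              /* part letter repeated */
        seen[hit - allowed] = true;
    }
    return true;
}
```

A caller might run this check on a user-supplied part order and fall
back to C<"fvlj"> when it fails.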

In addition to simply setting the list of parts to format and the
"abbreviate" flag for the first name, C<bt_create_name_format()>
initializes the entire format structure so as to minimize the need for
further customizations:

=over 4

=item *

The "token join method"---what to insert between tokens of the same
part---is set to C<BTJ_MAYTIE> (discretionary tie) for all parts.

=item *

The "part join method"---what to insert after the final token of a
particular part, assuming there are more parts to come---is set to
C<BTJ_SPACE> for the `first', `last', and `jr' parts.  If the `von' part
is present and immediately precedes the `last' part (which will almost
always be the case), C<BTJ_MAYTIE> is used to join `von' to `last';
otherwise, `von' also gets C<BTJ_SPACE> for the inter-part join method.

=item *

The abbreviation flag is set to C<FALSE> for the `von', `last', and `jr'
parts; for `first', the abbreviation flag is set to whatever you pass in
as C<abbrev_first>.

=item *

Initially, all "surrounding text" (pre-part, post-part, pre-token, and
post-token) for all parts is set to the empty string.  Then a few tweaks
are done, depending on the C<abbrev_first> flag and the order of
parts.  First, if C<abbrev_first> is C<TRUE>, the post-token text for
the first name is set to C<".">---this changes C<"J R Smith"> to
C<"J. R. Smith">, which is usually the desired form.  (If you I<don't>
want the periods, you'll have to set the post-token text yourself with
C<bt_set_format_text()>.)

Then, if `jr' is present and immediately after `last' (almost always the
case), the pre-part text for `jr' is set to C<", ">, and the inter-part
join method for `last' is set to C<BTJ_NOTHING>.
This changes
C<"John Smith Jr"> (where the space following C<"Smith"> comes from
formatting the last name with a C<BTJ_SPACE> inter-part join method) to
C<"John Smith, Jr"> (where the C<", "> is now associated with
C<"Jr">---that way, if there is no `jr' part, the C<", "> will
not be printed).

Finally, if `first' is present and immediately follows either `jr' or
`last' (which will usually be the case in "last-name first" formats),
the same sort of trickery is applied: the pre-part text for `first' is
set to C<", ">, and the part join method for the preceding part (either
`jr' or `last') is set to C<BTJ_NOTHING>.

=back

While all these rules are rather complicated, they mean that you are
usually freed from having to do any customization of the name format.
Certainly this is the case if you only need C<"fvlj"> and C<"vljf"> part
orders, only want to abbreviate the first name, want periods after
abbreviated tokens, non-breaking spaces in the "right" places, and
commas in the conventional places.

If you want something out of the ordinary---for instance, abbreviated
tokens jammed together with no punctuation, or abbreviated last
names---you'll need to customize the name format a bit with
C<bt_set_format_text()> and C<bt_set_format_options()>.

=item bt_free_name_format()

   void bt_free_name_format (bt_name_format * format)

Frees a name format created by C<bt_create_name_format()>.

=item bt_set_format_text()

   void bt_set_format_text (bt_name_format * format,
                            bt_namepart part,
                            char * pre_part,
                            char * post_part,
                            char * pre_token,
                            char * post_token)

Allows you to customize some or all of the surrounding text for a single
name part.  Supply C<NULL> for any chunk of text that you don't want to
change.

For instance, say you want a name format that will abbreviate first
names, but without any punctuation after the abbreviated
tokens.  You could create and customize the format as follows:

   bt_name_format * format;

   format = bt_create_name_format ("fvlj", TRUE);
   bt_set_format_text (format,
                       BTN_FIRST,       /* name-part to customize */
                       NULL, NULL,      /* pre- and post- part text */
                       NULL, "");       /* empty string for post-token */

Without the C<bt_set_format_text()> call, C<format> would result in
names formatted like C<"J. R. Smith">.  After setting the post-token
text for first names to C<"">, this name would become C<"J R Smith">.

=item bt_set_format_options()

   void bt_set_format_options (bt_name_format * format,
                               bt_namepart part,
                               boolean abbrev,
                               bt_joinmethod join_tokens,
                               bt_joinmethod join_part)

Allows further customization of a name format: you can set the
abbreviation flag and the two token-join methods.  Alas, there is no
mechanism for leaving a value unchanged; you must set everything with
C<bt_set_format_options()>.

For example, let's say that just dropping periods from abbreviated
tokens in the first name isn't enough; you I<really> want to save
space by jamming the abbreviated tokens together: C<"JR Smith"> rather
than C<"J R Smith">.  Assuming the two calls in the above example have
been done, the following will finish the job:

   bt_set_format_options (format, BTN_FIRST,
                          TRUE,         /* keep same value for abbrev flag */
                          BTJ_NOTHING,  /* jam tokens together */
                          BTJ_SPACE);   /* space after final token of part */

Note that we unfortunately had to know (and supply) the current values
for the abbreviation flag and post-part join method, even though we were
only setting the intra-part join method.
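
To see how the abbreviation flag, post-token text, and token-join method
interact for a single part, here is a small stand-alone model.  It is an
illustration only and not B<btparse>'s implementation:
C<format_part_model()> and the C<join_model> enum are hypothetical
names, C<BTJ_MAYTIE> is left out because its result depends on context,
and abbreviation is approximated by keeping the first byte of each
token.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for BTJ_SPACE, BTJ_FORCETIE and BTJ_NOTHING. */
typedef enum { JOIN_SPACE, JOIN_FORCETIE, JOIN_NOTHING } join_model;

/* Model of formatting the tokens of one name part into `out`.
 * Each token is optionally cut to its first byte, followed by
 * `post_token`, and joined to the next token as `join` dictates.
 * Returns 0 on success, -1 if `out` (capacity `outsize`) is too small. */
int format_part_model(const char *const *tokens, size_t ntokens,
                      int abbrev, const char *post_token,
                      join_model join, char *out, size_t outsize)
{
    size_t post_len = strlen(post_token);
    size_t used = 0;
    size_t i;

    if (outsize == 0)
        return -1;
    out[0] = '\0';

    for (i = 0; i < ntokens; i++) {
        size_t tok_len = strlen(tokens[i]);
        const char *sep = "";
        size_t sep_len;

        if (abbrev && tok_len > 1)
            tok_len = 1;               /* crude abbreviation: first byte */

        if (i + 1 < ntokens) {         /* separator only between tokens */
            if (join == JOIN_SPACE)
                sep = " ";
            else if (join == JOIN_FORCETIE)
                sep = "~";
        }
        sep_len = strlen(sep);

        if (used + tok_len + post_len + sep_len + 1 > outsize)
            return -1;

        memcpy(out + used, tokens[i], tok_len);
        used += tok_len;
        memcpy(out + used, post_token, post_len);
        used += post_len;
        memcpy(out + used, sep, sep_len);
        used += sep_len;
        out[used] = '\0';
    }
    return 0;
}
```

With the tokens C<"John"> and C<"Robert"> and abbreviation turned on,
a post-token text of C<"."> joined by a space gives C<"J. R.">, an empty
post-token text joined by a space gives C<"J R">, and an empty
post-token text with nothing between tokens gives C<"JR">, matching the
progression described above.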

=item bt_format_name()

   char * bt_format_name (bt_name * name, bt_name_format * format)

Once a name format has been created and customized to your heart's
content, you can use it to format any number of names that have been
split with C<bt_split_name> (see L<bt_split_names>).  Simply pass the
name structure and name format structure, and a newly-allocated string
containing the formatted name will be returned to you.  It is your
responsibility to C<free()> this string.

=back

=head1 SEE ALSO

L<btparse>, L<bt_split_names>

=head1 AUTHOR

Greg Ward <gward@python.net>