Wikimedia Internationalization Library
======================================

This library provides interfaces and value objects for internationalization (i18n)
of applications in PHP.

It is based on the i18n code used in MediaWiki, and is also intended to be
compatible with [jQuery.i18n], a JavaScript i18n library.

Concepts
--------

Any text string that is needed in an application is a **message**. This might
be something like a button label, a sentence, or a longer text. Each message is
assigned a **message key**, which is used as the identifier in code.

Each message is translated into various languages, each represented by a
**language code**. The message's text (as translated into each language) can
contain **placeholders**, each of which represents a place in the message where
a **parameter** is to be inserted, and **formatting commands**. It might be plain
text other than these placeholders and formatting commands, or it might be in a
**markup language** such as wikitext or Markdown.

A **formatter** is used to convert the message key and parameters into a text
representation in a particular language and **output format**.

The library itself imposes few restrictions on all of these concepts; this
document contains recommendations to help various implementations operate in
compatible ways.

Usage
-----

<pre lang="php">
use Wikimedia\Message\MessageValue;
use Wikimedia\Message\ParamType;
use Wikimedia\Message\ScalarParam;

// Constructor interface
$message = new MessageValue( 'message-key', [
    'parameter',
    new MessageValue( 'another-message' ),
    // MessageParam is abstract, so a concrete ScalarParam is constructed here
    new ScalarParam( ParamType::NUM, 12345 ),
] );

// Fluent interface
$message = ( new MessageValue( 'message-key' ) )
    ->params( 'parameter', new MessageValue( 'another-message' ) )
    ->numParams( 12345 );

// Formatting
// $serviceContainer stands for whatever service container the application uses
$messageFormatter = $serviceContainer->get( 'MessageFormatterFactory' )->getTextFormatter( 'de' );
$output = $messageFormatter->format( $message );
</pre>

Class Overview
--------------

### Messages

Messages and their parameters are represented by newable value objects.

**MessageValue** represents an instance of a message, holding the key and any
parameters. It is mutable in that parameters can be added to the object after
creation.

**MessageParam** is an abstract value class representing a parameter to a message.
It has a type (using constants defined in the **ParamType** class) and a value. It
has two implementations:

- **ScalarParam** represents a single-valued parameter, such as a text string, a
  number, or another message.
- **ListParam** represents a list of values, which will be joined together with
  appropriate separators. It has a "list type" (using constants defined in the
  **ListType** class) defining the desired separators.

#### Machine-readable messages

**DataMessageValue** represents a message with additional machine-readable
data. In addition to the key and message parameters, it holds a "code" and
structured data that would be a useful representation of the message in an API
response or the like.

For example, a message for an "integer out of range" error might have one of
three different keys depending on whether the range has a minimum, maximum, or
both. But all should have the same code (representing the concept of "integer
out of range") and should likely have structured data representing the range
directly as `[ 'min' => 1, 'max' => 10 ]` rather than as a flat array of
MessageParam objects.

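As a rough illustration of the "integer out of range" example, the sketch below
uses a plain array instead of the library's classes. The function name, the
message keys, and the `outofrange` code are hypothetical; only the shape of the
result follows the description above.

```php
<?php
// Hypothetical helper, not part of this library: the message key depends on
// which bounds exist, while the code and the structured data stay the same.
function describeOutOfRange( int $value, ?int $min, ?int $max ): array {
    if ( $min === null && $max === null ) {
        throw new InvalidArgumentException( 'At least one bound is required' );
    }
    if ( $min !== null && $max !== null ) {
        $key = 'example-outofrange-minmax';
        $params = [ $value, $min, $max ];
    } elseif ( $min !== null ) {
        $key = 'example-outofrange-min';
        $params = [ $value, $min ];
    } else {
        $key = 'example-outofrange-max';
        $params = [ $value, $max ];
    }
    return [
        'key' => $key,
        'params' => $params,
        // Same code for all three keys
        'code' => 'outofrange',
        // The range is exposed directly, not as a flat parameter list
        'data' => [ 'min' => $min, 'max' => $max ],
    ];
}
```

An API consumer can then branch on the code and read the range from the data
without having to know which of the three keys was chosen.
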
### Formatters

A formatter for a particular language is obtained from an implementation of
**IMessageFormatterFactory**. No implementation of this interface is provided by
this library. If an environment needs its formatters to vary behavior on things
other than the language code, for example selecting among multiple sources of
messages or markup language used for processing message texts, it should define
a MessageFormatterFactoryFactory of some sort to provide appropriate
IMessageFormatterFactory implementations.

There is no one base interface for all formatters; the intent is that type
hinting will ensure that the formatter being used will produce output in the
expected output format. The defined output formats are:

- **ITextFormatter** produces plain text output.

No implementation of these interfaces is provided by this library.

Formatter implementations are expected to perform the following procedure to
generate the output string:

1. Fetch the message's translation in the formatter's language. Details of this
   fetching are unspecified here.
   - If no translation is found in the formatter's language, it should attempt
     to fall back to appropriate other languages. Details of the fallback are
     unspecified here.
   - If no translation can be found in any fallback language, a string should
     be returned that indicates at minimum the message key that was unable to
     be found.
2. Replace placeholders with parameter values.
   - Note that placeholders must not be replaced recursively. That is, if a
     parameter's value contains text that looks like a placeholder, it must not
     be replaced as if it really were a placeholder.
   - Certain types of parameters are not substituted directly at this stage.
     Instead their placeholders must be replaced with an opaque representation
     that will not be misinterpreted during later stages.
     - Parameters of type RAW or PLAINTEXT
     - TEXT parameters with a MessageValue as the value
     - LIST parameters with any late-substituted value as one of their values.
3. Process any formatting commands.
4. Process the source markup language to produce a string in the desired output
   format. This may be a no-op, and may be combined with the previous step if
   the markup language implements compatible formatting commands.
5. Replace any opaque representations from step 2 with the actual values of
   the corresponding parameters.

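Steps 2 to 5 of the procedure can be sketched for a small subset of the rules.
This is an illustration under stated assumptions rather than a reference
implementation: `formatSketch()`, the `'text'` and `'raw'` type strings, the
marker format, and the `$process` callback standing in for steps 3 and 4 are
all hypothetical. Translation lookup (step 1) is left out, and only `$1` to
`$9` are handled.

```php
<?php
// Hypothetical sketch, not part of this library.
// $params is a list of [ 'type' => 'text'|'raw', 'value' => string ].
function formatSketch( string $translation, array $params, ?callable $process = null ): string {
    $late = [];
    // Step 2: one pass over the translation. Replacement text is not scanned
    // again, so placeholder-like text inside a value stays untouched.
    $text = preg_replace_callback(
        '/\$([1-9])/',
        static function ( array $m ) use ( $params, &$late ): string {
            $index = (int)$m[1] - 1;
            if ( !isset( $params[$index] ) ) {
                // No such parameter: leave the text unchanged
                return $m[0];
            }
            if ( $params[$index]['type'] === 'raw' ) {
                // Late-substituted parameter: insert an opaque marker for now
                $marker = "\x7fLATE-" . count( $late ) . "\x7f";
                $late[$marker] = $params[$index]['value'];
                return $marker;
            }
            return $params[$index]['value'];
        },
        $translation
    );
    // Steps 3 and 4: formatting commands and markup processing (stand-in)
    if ( $process !== null ) {
        $text = $process( $text );
    }
    // Step 5: swap the opaque markers for the real values
    return strtr( $text, $late );
}

// The text parameter is escaped by the stand-in processor, the raw one is not
echo formatSketch(
    'Hello $1, see $2',
    [
        [ 'type' => 'text', 'value' => '$2 & co' ],
        [ 'type' => 'raw', 'value' => '<b>here</b>' ],
    ],
    'htmlspecialchars'
);
```
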
Guidelines for Interoperability
-------------------------------

Besides allowing for libraries to safely supply their own translations for
every app using them, and apps to easily use libraries' translations instead of
having to retranslate everything, following these guidelines will also help
open source projects use [translatewiki.net] for crowdsourced volunteer
translation into many languages.

### Language codes

[BCP 47] language tags should be used for language codes. If a supplied
language tag is not recognized, at minimum the corresponding tag with all
optional subtags stripped should be tried as a fallback.

All messages must have a translation in English (code "en"). All languages
should fall back to English as a last resort.

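A minimal sketch of such a fallback order is shown here. The function
`fallbackChain()` is hypothetical, and a real formatter may insert further
language-specific fallbacks between the stripped tag and English. Tags are
lowercased because BCP 47 tags are case-insensitive.

```php
<?php
// Hypothetical helper, not part of this library: returns the language codes
// to try, in order.
function fallbackChain( string $tag ): array {
    $tag = strtolower( $tag );
    // The tag with all optional subtags stripped is the primary language subtag
    $primary = explode( '-', $tag )[0];
    // English is always the last resort; duplicates are dropped
    return array_values( array_unique( [ $tag, $primary, 'en' ] ) );
}

print_r( fallbackChain( 'de-AT' ) );
```
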
The English translations should use `{{PLURAL:...}}` and `{{GENDER:...}}` even
when English doesn't make a grammatical distinction, to signal to translators
that plural/gender support is available.

Language code "qqq" is reserved for documenting messages. Documentation should
describe the context in which the message is used and the values of all
parameters used with the message. Generally this is written in English.
Attempting to obtain a message formatter for "qqq" should return one for "en"
instead.

Language code "qqx" is reserved for debugging. Rather than retrieving
translations from some underlying storage, every key should act as if it were
translated as something like `(key-name: $1, $2, $3)`, with the number of
placeholders depending on how many parameters are included in the
MessageValue.

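The "qqx" pseudo-translation could be generated along these lines. The
function name is hypothetical, and the output for a message without parameters
is an assumption, since the text above only describes the case with parameters.

```php
<?php
// Hypothetical helper, not part of this library: builds the text that a "qqx"
// formatter would treat as the translation of $key.
function qqxTranslation( string $key, int $paramCount ): string {
    if ( $paramCount === 0 ) {
        // Assumption: with no parameters only the key is shown
        return '(' . $key . ')';
    }
    $placeholders = [];
    for ( $i = 1; $i <= $paramCount; $i++ ) {
        $placeholders[] = '$' . $i;
    }
    return '(' . $key . ': ' . implode( ', ', $placeholders ) . ')';
}
```

The result still contains placeholders, so it would go through the usual
substitution step afterwards.
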
### Message keys

Message keys intended for use with external implementations should follow
certain guidelines for interoperability:

- Keys should be restricted to the regular expression `/^[a-z][a-z0-9-]*$/`.
  That is, a key should consist of lowercase ASCII letters, digits, and hyphens
  only, and should begin with a letter.
- Keys should be prefixed to help avoid collisions. For example, a library
  named "ApplePicker" should prefix its message keys with "applepicker-".
- Common values needing translation, such as names of months and weekdays,
  should not be prefixed by each library. Libraries needing these should use
  keys from the [Common Locale Data Repository][CLDR] and document this
  requirement, and environments should provide these messages.

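A library could check its own keys against the first two guidelines with
something like the following. The function name is hypothetical. The `D`
modifier is added to the pattern so that `$` does not also match before a
trailing newline.

```php
<?php
// Hypothetical helper, not part of this library: checks the character
// restrictions and the library prefix of a message key.
function isInteroperableKey( string $key, string $prefix ): bool {
    return preg_match( '/^[a-z][a-z0-9-]*$/D', $key ) === 1
        && strpos( $key, $prefix ) === 0;
}

var_dump( isInteroperableKey( 'applepicker-title', 'applepicker-' ) );
var_dump( isInteroperableKey( 'ApplePicker_Title', 'applepicker-' ) );
```
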
### Message format

Placeholders are represented by `$1`, `$2`, `$3`, and so on. Text like `$100`
is interpreted as a placeholder for parameter 100 if 100 or more parameters
were supplied, as a placeholder for parameter 10 followed by text "0" if
between ten and 99 parameters were supplied, and as a placeholder for parameter
1 followed by text "00" if between one and nine parameters were supplied.

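One way to get this behavior is PHP's `strtr()` with an array of replacement
pairs: it tries the longest keys first and never rescans text it has already
replaced. The wrapper function below is a hypothetical sketch and ignores
late-substituted parameter types.

```php
<?php
// Hypothetical helper, not part of this library: replaces $1, $2, ... using
// only as many placeholders as there are parameters.
function replacePlaceholders( string $text, array $params ): string {
    $pairs = [];
    foreach ( array_values( $params ) as $i => $value ) {
        $pairs['$' . ( $i + 1 )] = (string)$value;
    }
    // With three parameters the only keys are $1, $2 and $3, so "$100" is
    // read as "$1" followed by the text "00".
    return strtr( $text, $pairs );
}

echo replacePlaceholders( 'Got $100', [ 'a', 'b', 'c' ] );
```
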
All formatting commands look like `{{NAME:$value1|$value2|$value3|...}}`. Braces
are to be balanced, e.g. `{{NAME:foo|{{bar|baz}}}}` has $value1 as "foo" and
$value2 as "{{bar|baz}}". The name is always case-insensitive.

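The brace balancing can be handled by tracking nesting depth while splitting on
`|`. The parser below is a hypothetical sketch that accepts a string consisting
of exactly one command; locating commands inside a longer message is left out.
Names are normalized to upper case because they are case-insensitive.

```php
<?php
// Hypothetical helper, not part of this library: splits one command into its
// name and its values, treating nested {{...}} as part of a value.
function parseCommand( string $command ): ?array {
    if ( substr( $command, 0, 2 ) !== '{{' || substr( $command, -2 ) !== '}}' ) {
        return null;
    }
    $inner = substr( $command, 2, -2 );
    $colon = strpos( $inner, ':' );
    if ( $colon === false ) {
        return null;
    }
    $name = strtoupper( substr( $inner, 0, $colon ) );
    $rest = substr( $inner, $colon + 1 );
    $args = [];
    $current = '';
    $depth = 0;
    $length = strlen( $rest );
    for ( $i = 0; $i < $length; $i++ ) {
        $two = substr( $rest, $i, 2 );
        if ( $two === '{{' ) {
            // Entering a nested construct: copy both braces
            $depth++;
            $current .= $two;
            $i++;
        } elseif ( $two === '}}' && $depth > 0 ) {
            // Leaving a nested construct
            $depth--;
            $current .= $two;
            $i++;
        } elseif ( $rest[$i] === '|' && $depth === 0 ) {
            // A top-level pipe ends the current value
            $args[] = $current;
            $current = '';
        } else {
            $current .= $rest[$i];
        }
    }
    $args[] = $current;
    return [ 'name' => $name, 'args' => $args ];
}

print_r( parseCommand( '{{name:foo|{{bar|baz}}}}' ) );
```
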
Anything syntactically resembling a placeholder or formatting command that does
not correspond to an actual parameter or known command should be left unchanged
for processing by the markup language processor.

Libraries providing messages for use by externally-defined formatters should
generally assume no markup language will be applied, and should avoid
constructs used by common markup languages unless they also make sense when
read as plain text.

### Formatting commands

The following formatting commands should be supported.

#### PLURAL

`{{PLURAL:$count|$formA|$formB|...}}` is used to produce plurals.

$count is a number, which may have been formatted with ParamType::NUM.

The number of forms and which count corresponds to which form depend on the
language; for example, English uses `{{PLURAL:$1|one|other}}` while Arabic uses
`{{PLURAL:$1|zero|one|two|few|many|other}}`. Details are defined in
[CLDR][CLDR plurals].

It is not possible to "skip" positions while still supplying later ones. If too
few values are supplied, the final form is repeated for subsequent positions.

If there is an explicit plural form to be given for a specific number, it may
be specified with syntax like `{{PLURAL:$1|one egg|$1 eggs|12=a dozen eggs}}`.

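Form selection might look roughly like this. Both functions are hypothetical:
`selectPlural()` expects the caller to supply the position of the CLDR category
for the count in the current language, and `englishPluralIndex()` covers only
the English rule for integers. Placeholders are assumed to have been replaced
already, and a localized $count would need to be parsed back into a number
first.

```php
<?php
// Hypothetical helper, not part of this library.
// $forms are the values after the count; $ruleIndex is the 0-based position of
// the plural category that applies to $count in the target language.
function selectPlural( int $count, array $forms, int $ruleIndex ): string {
    $positional = [];
    foreach ( $forms as $form ) {
        // Explicit forms such as "12=a dozen eggs" win over the language rule
        if ( preg_match( '/^(\d+)=(.*)$/s', $form, $m ) ) {
            if ( (int)$m[1] === $count ) {
                return $m[2];
            }
            continue;
        }
        $positional[] = $form;
    }
    if ( $positional === [] ) {
        return '';
    }
    // Too few forms: the final form is repeated for later positions
    return $positional[ min( $ruleIndex, count( $positional ) - 1 ) ];
}

// English for integers: 1 is "one" (position 0), anything else is "other"
function englishPluralIndex( int $count ): int {
    return $count === 1 ? 0 : 1;
}

$forms = [ 'one egg', '3 eggs', '12=a dozen eggs' ];
echo selectPlural( 3, $forms, englishPluralIndex( 3 ) );
```
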
#### GENDER

`{{GENDER:$name|$masculine|$feminine|$unspecified}}` is used to handle
grammatical gender, typically when messages refer to user accounts.

This supports three grammatical genders: "male", "female", and a third option
for cases where the gender is unspecified, unknown, or neither male nor female.
It does not attempt to handle animate-inanimate or [T-V] distinctions.

$name is a user account name or other similar identifier. If the name given
does not correspond to any known user account, it should probably use the
$unspecified gender.

If $feminine and/or $unspecified is not specified, the value of $masculine
is normally used in its place.

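The fallback between forms can be sketched as below. The function is
hypothetical and takes the gender directly; looking up the gender for a given
$name depends on the environment and is not shown.

```php
<?php
// Hypothetical helper, not part of this library.
// $gender is 'male', 'female', or anything else for unspecified;
// $forms are the values after $name, in the order given above.
function selectGenderForm( string $gender, array $forms ): string {
    if ( $forms === [] ) {
        return '';
    }
    $masculine = $forms[0];
    // Missing forms fall back to the masculine one
    $feminine = $forms[1] ?? $masculine;
    $unspecified = $forms[2] ?? $masculine;
    switch ( $gender ) {
        case 'male':
            return $masculine;
        case 'female':
            return $feminine;
        default:
            return $unspecified;
    }
}
```

For an unknown account the caller would pass something other than `'male'` or
`'female'`, which selects the $unspecified form when one was supplied.
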
#### GRAMMAR

`{{GRAMMAR:$form|$term}}` converts a term to an appropriate grammatical form.

If no mapping for $term to $form exists, $term should be returned unchanged.

See [jQuery.i18n § Grammar][jQuery.i18n grammar] for details.

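In its simplest form this is a table lookup with the term itself as the
default. The function and the table layout below are hypothetical, and the
table contents are made-up sample data; jQuery.i18n describes a richer,
rule-based format.

```php
<?php
// Hypothetical helper, not part of this library.
// $rules maps a grammatical form to a map of term => converted term.
function applyGrammar( string $form, string $term, array $rules ): string {
    // No mapping for this form and term: return the term unchanged
    return $rules[$form][$term] ?? $term;
}

// Sample data for illustration only
$rules = [ 'genitive' => [ 'Wikipedia' => 'Wikipedian' ] ];
echo applyGrammar( 'genitive', 'Wikipedia', $rules );
```
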
#### BIDI

`{{BIDI:$text}}` applies directional isolation to the wrapped text, to attempt
to avoid errors where directionally-neutral characters are wrongly displayed
when between LTR and RTL content.

This should output U+202A (left-to-right embedding) or U+202B (right-to-left
embedding) before the text, depending on the directionality of the first
strongly-directional character in $text, and U+202C (pop directional
formatting) after, or do something equivalent for the target output format.

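For plain text output, a simplified sketch could look like this. The function
is hypothetical and makes two assumptions that are not stated above: only
Hebrew- and Arabic-script letters are treated as strongly right-to-left (every
other letter counts as left-to-right), and text without any letter is returned
unchanged. A complete implementation would consult the Unicode bidirectional
class of each character instead.

```php
<?php
// Hypothetical helper, not part of this library.
function bidiEmbed( string $text ): string {
    // Find the first letter; digits, spaces and punctuation are skipped
    if ( preg_match( '/\p{L}/u', $text, $m ) !== 1 ) {
        // Assumption: nothing strongly directional, so nothing to wrap
        return $text;
    }
    // Simplification: Hebrew and Arabic script only
    $isRtl = preg_match( '/[\p{Hebrew}\p{Arabic}]/u', $m[0] ) === 1;
    // U+202B right-to-left embedding, U+202A left-to-right embedding
    $open = $isRtl ? "\u{202B}" : "\u{202A}";
    // U+202C pops the directional formatting again
    return $open . $text . "\u{202C}";
}
```
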
### Supplying translations

Code intending its messages to be used by externally-defined formatters should
supply the translations as described by
[jQuery.i18n § Message File Format][jQuery.i18n file format].

In brief, the base directory of the library should contain a directory named
"i18n". This directory should contain JSON files named by code such as
"en.json", "de.json", "qqq.json", each with contents like:

```json
{
    "@metadata": {
        "authors": [
            "Alice",
            "Bob",
            "Carol",
            "David"
        ],
        "last-updated": "2012-09-21"
    },
    "appname-title": "Example Application",
    "appname-sub-title": "An example application",
    "appname-header-introduction": "Introduction",
    "appname-about": "About this application",
    "appname-footer": "Footer text"
}
```

Formatter implementations should be able to consume message data supplied in
this format, either directly via registration of i18n directories to check or
by providing tooling to incorporate it during a build step.

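Reading one such file takes only a few lines. The loader below is a
hypothetical sketch: the function name and the choice to return an empty array
for missing or invalid files are assumptions, and caching and registration of
several i18n directories are left out.

```php
<?php
// Hypothetical helper, not part of this library: returns key => text for one
// language code, without the "@metadata" entry.
function loadMessages( string $i18nDir, string $code ): array {
    $path = $i18nDir . '/' . $code . '.json';
    if ( !is_file( $path ) ) {
        return [];
    }
    $data = json_decode( file_get_contents( $path ), true );
    if ( !is_array( $data ) ) {
        // Not valid JSON, or not a JSON object
        return [];
    }
    // "@metadata" describes the file and is not a message
    unset( $data['@metadata'] );
    return $data;
}
```

A formatter could call this once per code in its fallback order and use the
first file that contains the requested key.
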
### Machine-readable data

Libraries producing MessageValues as error messages should generally produce
DataMessageValues instead. Codes should be similar to message keys but need
not be prefixed. Data should be restricted to values that will produce valid
output when passed to `json_encode()`.

Libraries producing MessageValues in other contexts should consider whether the
same applies to those contexts.

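A library could guard against unsuitable data with a check like the following.
The function name is hypothetical; the check relies only on `json_encode()` and
`json_last_error()`.

```php
<?php
// Hypothetical helper, not part of this library: reports whether $data can be
// passed to json_encode() without an error.
function isJsonSafe( $data ): bool {
    json_encode( $data );
    return json_last_error() === JSON_ERROR_NONE;
}

var_dump( isJsonSafe( [ 'min' => 1, 'max' => 10 ] ) );
// Strings that are not valid UTF-8 cannot be encoded
var_dump( isJsonSafe( [ 'label' => "\xB1\x31" ] ) );
```
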

---
[jQuery.i18n]: https://github.com/wikimedia/jquery.i18n
[BCP 47]: https://tools.ietf.org/rfc/bcp/bcp47.txt
[CLDR]: http://cldr.unicode.org/
[CLDR plurals]: https://www.unicode.org/cldr/charts/latest/supplemental/language_plural_rules.html
[jQuery.i18n grammar]: https://github.com/wikimedia/jquery.i18n#grammar
[jQuery.i18n file format]: https://github.com/wikimedia/jquery.i18n#message-file-format
[translatewiki.net]: https://translatewiki.net/wiki/Translating:New_project
[T-V]: https://en.wikipedia.org/wiki/T%E2%80%93V_distinction