import Kernel, except: [length: 1]

defmodule String do
  @moduledoc ~S"""
5  Strings in Elixir are UTF-8 encoded binaries.
6
  Strings in Elixir are a sequence of Unicode characters,
  typically written between double quotes, such
  as `"hello"` and `"héllò"`.
10
  If a string must itself contain a double quote,
  the double quote must be escaped with a backslash,
  for example: `"this is a string with \"double quotes\""`.
14
15  You can concatenate two strings with the `<>/2` operator:
16
17      iex> "hello" <> " " <> "world"
18      "hello world"
19
20  ## Interpolation
21
22  Strings in Elixir also support interpolation. This allows
23  you to place some value in the middle of a string by using
24  the `#{}` syntax:
25
26      iex> name = "joe"
27      iex> "hello #{name}"
28      "hello joe"
29
30  Any Elixir expression is valid inside the interpolation.
31  If a string is given, the string is interpolated as is.
  If any other value is given, Elixir will attempt to convert
  it to a string using the `String.Chars` protocol. This
  allows, for example, an integer to be output from the interpolation:
35
36      iex> "2 + 2 = #{2 + 2}"
37      "2 + 2 = 4"
38
  If the value you want to interpolate cannot be converted
  to a string because it has no human-readable textual
  representation, a protocol error will be raised.
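
  As a small illustration (not an exhaustive rule), `String.Chars.impl_for/1`
  reports which implementation, if any, would be used for a value, and
  `Kernel.inspect/1` can be used to interpolate values that have none:

      iex> String.Chars.impl_for(42)
      String.Chars.Integer
      iex> String.Chars.impl_for({:ok, 42})
      nil
      iex> "result: #{inspect({:ok, 42})}"
      "result: {:ok, 42}"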
42
43  ## Escape characters
44
45  Besides allowing double-quotes to be escaped with a backslash,
46  strings also support the following escape characters:
47
48    * `\a` - Bell
49    * `\b` - Backspace
50    * `\t` - Horizontal tab
51    * `\n` - Line feed (New lines)
52    * `\v` - Vertical tab
53    * `\f` - Form feed
54    * `\r` - Carriage return
55    * `\e` - Command Escape
56    * `\#` - Returns the `#` character itself, skipping interpolation
57    * `\xNN` - A byte represented by the hexadecimal `NN`
58    * `\uNNNN` - A Unicode code point represented by `NNNN`
59
60  Note it is generally not advised to use `\xNN` in Elixir
61  strings, as introducing an invalid byte sequence would
62  make the string invalid. If you have to introduce a
63  character by its hexadecimal representation, it is best
  to work with Unicode code points, such as `\uNNNN`. In fact,
  understanding Unicode code points can be essential when doing
  low-level manipulations of strings, so let's explore them in
  detail next.
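
  A brief sketch of the difference, using `String.valid?/1` to check
  the result: a lone `\xFF` byte is not valid UTF-8, while `\u00E9`
  always produces the properly encoded two-byte sequence for "é":

      iex> String.valid?("\xFF")
      false
      iex> String.valid?("\u00E9")
      true
      iex> byte_size("\u00E9")
      2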
68
  ## Code points and grapheme clusters
70
71  The functions in this module act according to the Unicode
72  Standard, version 13.0.0.
73
74  As per the standard, a code point is a single Unicode Character,
75  which may be represented by one or more bytes.
76
77  For example, although the code point "é" is a single character,
78  its underlying representation uses two bytes:
79
80      iex> String.length("é")
81      1
82      iex> byte_size("é")
83      2
84
85  Furthermore, this module also presents the concept of grapheme cluster
86  (from now on referenced as graphemes). Graphemes can consist of multiple
87  code points that may be perceived as a single character by readers. For
88  example, "é" can be represented either as a single "e with acute" code point
89  or as the letter "e" followed by a "combining acute accent" (two code points):
90
91      iex> string = "\u0065\u0301"
92      iex> byte_size(string)
93      3
94      iex> String.length(string)
95      1
96      iex> String.codepoints(string)
97      ["e", "́"]
98      iex> String.graphemes(string)
99      ["é"]
100
101  Although the example above is made of two characters, it is
102  perceived by users as one.
103
104  Graphemes can also be two characters that are interpreted
105  as one by some languages. For example, some languages may
106  consider "ch" as a single character. However, since this
107  information depends on the locale, it is not taken into account
108  by this module.
109
  In general, the functions in this module rely on the Unicode
  Standard, but do not contain any locale-specific behaviour.
112  More information about graphemes can be found in the [Unicode
113  Standard Annex #29](https://www.unicode.org/reports/tr29/).
114
115  For converting a binary to a different encoding and for Unicode
116  normalization mechanisms, see Erlang's `:unicode` module.
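
  As a rough sketch of what the `:unicode` module offers, the decomposed
  "é" from the example above can be composed into a single code point,
  and a string can be converted to Latin-1:

      iex> composed = :unicode.characters_to_nfc_binary("\u0065\u0301")
      iex> composed == "\u00E9"
      true
      iex> String.codepoints(composed) |> length()
      1
      iex> :unicode.characters_to_binary("ol\u00E1", :utf8, :latin1)
      <<111, 108, 225>>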
117
118  ## String and binary operations
119
120  To act according to the Unicode Standard, many functions
121  in this module run in linear time, as they need to traverse
122  the whole string considering the proper Unicode code points.
123
124  For example, `String.length/1` will take longer as
125  the input grows. On the other hand, `Kernel.byte_size/1` always runs
126  in constant time (i.e. regardless of the input size).
127
128  This means often there are performance costs in using the
129  functions in this module, compared to the more low-level
130  operations that work directly with binaries:
131
132    * `Kernel.binary_part/3` - retrieves part of the binary
133    * `Kernel.bit_size/1` and `Kernel.byte_size/1` - size related functions
    * `Kernel.is_bitstring/1` and `Kernel.is_binary/1` - type-check functions
135    * Plus a number of functions for working with binaries (bytes)
136      in the [`:binary` module](`:binary`)
137
138  There are many situations where using the `String` module can
139  be avoided in favor of binary functions or pattern matching.
140  For example, imagine you have a string `prefix` and you want to
141  remove this prefix from another string named `full`.
142
143  One may be tempted to write:
144
145      iex> take_prefix = fn full, prefix ->
146      ...>   base = String.length(prefix)
147      ...>   String.slice(full, base, String.length(full) - base)
148      ...> end
149      iex> take_prefix.("Mr. John", "Mr. ")
150      "John"
151
152  Although the function above works, it performs poorly. To
153  calculate the length of the string, we need to traverse it
154  fully, so we traverse both `prefix` and `full` strings, then
155  slice the `full` one, traversing it again.
156
157  A first attempt at improving it could be with ranges:
158
159      iex> take_prefix = fn full, prefix ->
160      ...>   base = String.length(prefix)
161      ...>   String.slice(full, base..-1)
162      ...> end
163      iex> take_prefix.("Mr. John", "Mr. ")
164      "John"
165
166  While this is much better (we don't traverse `full` twice),
167  it could still be improved. In this case, since we want to
168  extract a substring from a string, we can use `Kernel.byte_size/1`
169  and `Kernel.binary_part/3` as there is no chance we will slice in
170  the middle of a code point made of more than one byte:
171
172      iex> take_prefix = fn full, prefix ->
173      ...>   base = byte_size(prefix)
174      ...>   binary_part(full, base, byte_size(full) - base)
175      ...> end
176      iex> take_prefix.("Mr. John", "Mr. ")
177      "John"
178
179  Or simply use pattern matching:
180
181      iex> take_prefix = fn full, prefix ->
182      ...>   base = byte_size(prefix)
183      ...>   <<_::binary-size(base), rest::binary>> = full
184      ...>   rest
185      ...> end
186      iex> take_prefix.("Mr. John", "Mr. ")
187      "John"
188
189  On the other hand, if you want to dynamically slice a string
190  based on an integer value, then using `String.slice/3` is the
191  best option as it guarantees we won't incorrectly split a valid
192  code point into multiple bytes.
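
  To illustrate the difference (a sketch with an arbitrary offset of `2`),
  slicing by graphemes keeps the two-byte "é" intact, while slicing
  by bytes at the same offset cuts it in half and yields invalid UTF-8:

      iex> String.slice("h\u00E9llo", 0, 2) == "h\u00E9"
      true
      iex> binary_part("h\u00E9llo", 0, 2)
      <<104, 195>>
      iex> String.valid?(binary_part("h\u00E9llo", 0, 2))
      false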
193
194  ## Integer code points
195
196  Although code points are represented as integers, this module
197  represents code points in their encoded format as strings.
198  For example:
199
200      iex> String.codepoints("olá")
201      ["o", "l", "á"]
202
203  There are a couple of ways to retrieve the character code point.
204  One may use the `?` construct:
205
206      iex> ?o
207      111
208
209      iex> ?á
210      225
211
212  Or also via pattern matching:
213
214      iex> <<aacute::utf8>> = "á"
215      iex> aacute
216      225
217
218  As we have seen above, code points can be inserted into
219  a string by their hexadecimal code:
220
221      iex> "ol\u00E1"
222      "olá"
223
  Finally, to convert a string into a list of integer
  code points, known as "charlists" in Elixir, you can call
  `String.to_charlist/1`:
227
228      iex> String.to_charlist("olá")
229      [111, 108, 225]
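
  The conversions above can be reversed. As a short sketch, `List.to_string/1`
  turns a charlist back into a string and the `utf8` modifier encodes a
  single integer code point:

      iex> List.to_string([111, 108, 225])
      "olá"
      iex> <<225::utf8>>
      "á"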
230
231  ## Self-synchronization
232
233  The UTF-8 encoding is self-synchronizing. This means that
234  if malformed data (i.e., data that is not possible according
235  to the definition of the encoding) is encountered, only one
236  code point needs to be rejected.
237
238  This module relies on this behaviour to ignore such invalid
239  characters. For example, `length/1` will return
240  a correct result even if an invalid code point is fed into it.
241
  In other words, this module expects invalid data to be detected
  elsewhere, usually when retrieving data from an external source.
  For example, a driver that reads strings from a database will be
  responsible for checking the validity of the encoding. `String.chunk/2`
  can be used for breaking a string into valid and invalid parts.
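
  For instance, given a string with a stray `0xFF` byte in the middle
  (chosen here only for illustration), the invalid byte is counted as
  one character and `String.chunk/2` isolates it:

      iex> string = "a" <> <<0xFF>> <> "b"
      iex> String.valid?(string)
      false
      iex> String.length(string)
      3
      iex> String.chunk(string, :valid)
      ["a", <<255>>, "b"]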
247
248  ## Compile binary patterns
249
250  Many functions in this module work with patterns. For example,
251  `String.split/3` can split a string into multiple strings given
252  a pattern. This pattern can be a string, a list of strings or
253  a compiled pattern:
254
255      iex> String.split("foo bar", " ")
256      ["foo", "bar"]
257
258      iex> String.split("foo bar!", [" ", "!"])
259      ["foo", "bar", ""]
260
261      iex> pattern = :binary.compile_pattern([" ", "!"])
262      iex> String.split("foo bar!", pattern)
263      ["foo", "bar", ""]
264
265  The compiled pattern is useful when the same match will
266  be done over and over again. Note though that the compiled
267  pattern cannot be stored in a module attribute as the pattern
268  is generated at runtime and does not survive compile time.
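
  A common way around this, sketched below with a hypothetical
  `MyTokenizer` module (not part of Elixir), is to compile the pattern
  inside a function at runtime and reuse it across many calls:

      defmodule MyTokenizer do
        # Hypothetical helper: the pattern is compiled once per call
        # and then reused for every line in the list.
        def split_lines(lines) do
          pattern = :binary.compile_pattern([" ", ","])
          Enum.map(lines, &String.split(&1, pattern))
        end
      end

      MyTokenizer.split_lines(["1,2 3", "4 5,6"])
      #=> [["1", "2", "3"], ["4", "5", "6"]]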
269  """
270
271  @typedoc """
272  A UTF-8 encoded binary.
273
  The types `String.t()` and `binary()` are equivalent to analysis tools.
  However, for those reading the documentation, `String.t()` implies
  it is a UTF-8 encoded binary.
277  """
278  @type t :: binary
279
280  @typedoc "A single Unicode code point encoded in UTF-8. It may be one or more bytes."
281  @type codepoint :: t
282
283  @typedoc "Multiple code points that may be perceived as a single character by readers"
284  @type grapheme :: t
285
286  @typedoc "Pattern used in functions like `replace/4` and `split/3`"
287  @type pattern :: t | [t] | :binary.cp()
288
289  @conditional_mappings [:greek, :turkic]
290
291  @doc """
292  Checks if a string contains only printable characters up to `character_limit`.
293
294  Takes an optional `character_limit` as a second argument. If `character_limit` is `0`, this
295  function will return `true`.
296
297  ## Examples
298
299      iex> String.printable?("abc")
300      true
301
302      iex> String.printable?("abc" <> <<0>>)
303      false
304
305      iex> String.printable?("abc" <> <<0>>, 2)
306      true
307
308      iex> String.printable?("abc" <> <<0>>, 0)
309      true
310
311  """
312  @spec printable?(t, 0) :: true
313  @spec printable?(t, pos_integer | :infinity) :: boolean
314  def printable?(string, character_limit \\ :infinity)
315      when is_binary(string) and
316             (character_limit == :infinity or
317                (is_integer(character_limit) and character_limit >= 0)) do
318    recur_printable?(string, character_limit)
319  end
320
321  defp recur_printable?(_string, 0), do: true
322  defp recur_printable?(<<>>, _character_limit), do: true
323
324  for char <- 0x20..0x7E do
325    defp recur_printable?(<<unquote(char), rest::binary>>, character_limit) do
326      recur_printable?(rest, decrement(character_limit))
327    end
328  end
329
330  for char <- '\n\r\t\v\b\f\e\d\a' do
331    defp recur_printable?(<<unquote(char), rest::binary>>, character_limit) do
332      recur_printable?(rest, decrement(character_limit))
333    end
334  end
335
336  defp recur_printable?(<<char::utf8, rest::binary>>, character_limit)
337       when char in 0xA0..0xD7FF
338       when char in 0xE000..0xFFFD
339       when char in 0x10000..0x10FFFF do
340    recur_printable?(rest, decrement(character_limit))
341  end
342
343  defp recur_printable?(_string, _character_limit) do
344    false
345  end
346
347  defp decrement(:infinity), do: :infinity
348  defp decrement(character_limit), do: character_limit - 1
349
350  @doc ~S"""
351  Divides a string into substrings at each Unicode whitespace
352  occurrence with leading and trailing whitespace ignored. Groups
353  of whitespace are treated as a single occurrence. Divisions do
354  not occur on non-breaking whitespace.
355
356  ## Examples
357
358      iex> String.split("foo bar")
359      ["foo", "bar"]
360
361      iex> String.split("foo" <> <<194, 133>> <> "bar")
362      ["foo", "bar"]
363
364      iex> String.split(" foo   bar ")
365      ["foo", "bar"]
366
367      iex> String.split("no\u00a0break")
368      ["no\u00a0break"]
369
370  """
371  @spec split(t) :: [t]
372  defdelegate split(binary), to: String.Break
373
374  @doc ~S"""
375  Divides a string into parts based on a pattern.
376
377  Returns a list of these parts.
378
379  The `pattern` may be a string, a list of strings, a regular expression, or a
380  compiled pattern.
381
382  The string is split into as many parts as possible by
383  default, but can be controlled via the `:parts` option.
384
385  Empty strings are only removed from the result if the
386  `:trim` option is set to `true`.
387
388  When the pattern used is a regular expression, the string is
389  split using `Regex.split/3`.
390
391  ## Options
392
393    * `:parts` (positive integer or `:infinity`) - the string
394      is split into at most as many parts as this option specifies.
395      If `:infinity`, the string will be split into all possible
396      parts. Defaults to `:infinity`.
397
398    * `:trim` (boolean) - if `true`, empty strings are removed from
399      the resulting list.
400
401  This function also accepts all options accepted by `Regex.split/3`
402  if `pattern` is a regular expression.
403
404  ## Examples
405
406  Splitting with a string pattern:
407
408      iex> String.split("a,b,c", ",")
409      ["a", "b", "c"]
410
411      iex> String.split("a,b,c", ",", parts: 2)
412      ["a", "b,c"]
413
414      iex> String.split(" a b c ", " ", trim: true)
415      ["a", "b", "c"]
416
417  A list of patterns:
418
419      iex> String.split("1,2 3,4", [" ", ","])
420      ["1", "2", "3", "4"]
421
422  A regular expression:
423
424      iex> String.split("a,b,c", ~r{,})
425      ["a", "b", "c"]
426
427      iex> String.split("a,b,c", ~r{,}, parts: 2)
428      ["a", "b,c"]
429
430      iex> String.split(" a b c ", ~r{\s}, trim: true)
431      ["a", "b", "c"]
432
433      iex> String.split("abc", ~r{b}, include_captures: true)
434      ["a", "b", "c"]
435
436  A compiled pattern:
437
438      iex> pattern = :binary.compile_pattern([" ", ","])
439      iex> String.split("1,2 3,4", pattern)
440      ["1", "2", "3", "4"]
441
442  Splitting on empty string returns graphemes:
443
444      iex> String.split("abc", "")
445      ["", "a", "b", "c", ""]
446
447      iex> String.split("abc", "", trim: true)
448      ["a", "b", "c"]
449
450      iex> String.split("abc", "", parts: 1)
451      ["abc"]
452
453      iex> String.split("abc", "", parts: 3)
454      ["", "a", "bc"]
455
456  Be aware that this function can split within or across grapheme boundaries.
457  For example, take the grapheme "é" which is made of the characters
458  "e" and the acute accent. The following will split the string into two parts:
459
460      iex> String.split(String.normalize("é", :nfd), "e")
461      ["", "́"]
462
463  However, if "é" is represented by the single character "e with acute"
464  accent, then it will split the string into just one part:
465
466      iex> String.split(String.normalize("é", :nfc), "e")
467      ["é"]
468
469  """
470  @spec split(t, pattern | Regex.t(), keyword) :: [t]
471  def split(string, pattern, options \\ [])
472
473  def split(string, %Regex{} = pattern, options) when is_binary(string) and is_list(options) do
474    Regex.split(pattern, string, options)
475  end
476
477  def split(string, "", options) when is_binary(string) and is_list(options) do
478    parts = Keyword.get(options, :parts, :infinity)
479    index = parts_to_index(parts)
480    trim = Keyword.get(options, :trim, false)
481
482    if trim == false and index != 1 do
483      ["" | split_empty(string, trim, index - 1)]
484    else
485      split_empty(string, trim, index)
486    end
487  end
488
489  def split(string, pattern, options) when is_binary(string) and is_list(options) do
490    parts = Keyword.get(options, :parts, :infinity)
491    trim = Keyword.get(options, :trim, false)
492
493    case {parts, trim} do
494      {:infinity, false} ->
495        :binary.split(string, pattern, [:global])
496
497      _ ->
498        pattern = maybe_compile_pattern(pattern)
499        split_each(string, pattern, trim, parts_to_index(parts))
500    end
501  end
502
503  defp parts_to_index(:infinity), do: 0
504  defp parts_to_index(n) when is_integer(n) and n > 0, do: n
505
506  defp split_empty("", true, 1), do: []
507  defp split_empty(string, _, 1), do: [string]
508
509  defp split_empty(string, trim, count) do
510    case next_grapheme(string) do
511      {h, t} -> [h | split_empty(t, trim, count - 1)]
512      nil -> split_empty("", trim, 1)
513    end
514  end
515
516  defp split_each("", _pattern, true, 1), do: []
517  defp split_each(string, _pattern, _trim, 1) when is_binary(string), do: [string]
518
519  defp split_each(string, pattern, trim, count) do
520    case do_splitter(string, pattern, trim) do
521      {h, t} -> [h | split_each(t, pattern, trim, count - 1)]
522      nil -> []
523    end
524  end
525
526  @doc """
527  Returns an enumerable that splits a string on demand.
528
529  This is in contrast to `split/3` which splits the
530  entire string upfront.
531
532  This function does not support regular expressions
533  by design. When using regular expressions, it is often
534  more efficient to have the regular expressions traverse
535  the string at once than in parts, like this function does.
536
537  ## Options
538
    * `:trim` - when `true`, does not emit empty strings
540
541  ## Examples
542
543      iex> String.splitter("1,2 3,4 5,6 7,8,...,99999", [" ", ","]) |> Enum.take(4)
544      ["1", "2", "3", "4"]
545
546      iex> String.splitter("abcd", "") |> Enum.take(10)
547      ["", "a", "b", "c", "d", ""]
548
549      iex> String.splitter("abcd", "", trim: true) |> Enum.take(10)
550      ["a", "b", "c", "d"]
551
552  A compiled pattern can also be given:
553
554      iex> pattern = :binary.compile_pattern([" ", ","])
555      iex> String.splitter("1,2 3,4 5,6 7,8,...,99999", pattern) |> Enum.take(4)
556      ["1", "2", "3", "4"]
557
558  """
559  @spec splitter(t, pattern, keyword) :: Enumerable.t()
560  def splitter(string, pattern, options \\ [])
561
562  def splitter(string, "", options) when is_binary(string) and is_list(options) do
563    if Keyword.get(options, :trim, false) do
564      Stream.unfold(string, &next_grapheme/1)
565    else
566      Stream.unfold(:match, &do_empty_splitter(&1, string))
567    end
568  end
569
570  def splitter(string, pattern, options) when is_binary(string) and is_list(options) do
571    pattern = maybe_compile_pattern(pattern)
572    trim = Keyword.get(options, :trim, false)
573    Stream.unfold(string, &do_splitter(&1, pattern, trim))
574  end
575
576  defp do_empty_splitter(:match, string), do: {"", string}
577  defp do_empty_splitter(:nomatch, _string), do: nil
578  defp do_empty_splitter("", _), do: {"", :nomatch}
579  defp do_empty_splitter(string, _), do: next_grapheme(string)
580
581  defp do_splitter(:nomatch, _pattern, _), do: nil
582  defp do_splitter("", _pattern, false), do: {"", :nomatch}
583  defp do_splitter("", _pattern, true), do: nil
584
585  defp do_splitter(bin, pattern, trim) do
586    case :binary.split(bin, pattern) do
587      ["", second] when trim -> do_splitter(second, pattern, trim)
588      [first, second] -> {first, second}
589      [first] -> {first, :nomatch}
590    end
591  end
592
593  defp maybe_compile_pattern(pattern) when is_tuple(pattern), do: pattern
594  defp maybe_compile_pattern(pattern), do: :binary.compile_pattern(pattern)
595
596  @doc """
597  Splits a string into two at the specified offset. When the offset given is
598  negative, location is counted from the end of the string.
599
600  The offset is capped to the length of the string. Returns a tuple with
601  two elements.
602
  Note: keep in mind this function splits on graphemes and, as such, it
  has to linearly traverse the string. If you want to split a string or
  a binary based on the number of bytes, use `Kernel.binary_part/3`
  instead.
607
608  ## Examples
609
610      iex> String.split_at("sweetelixir", 5)
611      {"sweet", "elixir"}
612
613      iex> String.split_at("sweetelixir", -6)
614      {"sweet", "elixir"}
615
616      iex> String.split_at("abc", 0)
617      {"", "abc"}
618
619      iex> String.split_at("abc", 1000)
620      {"abc", ""}
621
622      iex> String.split_at("abc", -1000)
623      {"", "abc"}
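
  When a byte offset is already known, a byte-based split can be sketched
  with `Kernel.binary_part/3`. This assumes the offset (`5` here) does not
  fall in the middle of a multi-byte code point:

      iex> string = "sweetelixir"
      iex> {binary_part(string, 0, 5), binary_part(string, 5, byte_size(string) - 5)}
      {"sweet", "elixir"}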
624
625  """
626  @spec split_at(t, integer) :: {t, t}
627  def split_at(string, position)
628
629  def split_at(string, position)
630      when is_binary(string) and is_integer(position) and position >= 0 do
631    do_split_at(string, position)
632  end
633
634  def split_at(string, position)
635      when is_binary(string) and is_integer(position) and position < 0 do
636    position = length(string) + position
637
638    case position >= 0 do
639      true -> do_split_at(string, position)
640      false -> {"", string}
641    end
642  end
643
644  defp do_split_at(string, position) do
645    {byte_size, rest} = String.Unicode.split_at(string, position)
646    {binary_part(string, 0, byte_size), rest || ""}
647  end
648
649  @doc ~S"""
650  Returns `true` if `string1` is canonically equivalent to `string2`.
651
652  It performs Normalization Form Canonical Decomposition (NFD) on the
653  strings before comparing them. This function is equivalent to:
654
655      String.normalize(string1, :nfd) == String.normalize(string2, :nfd)
656
657  If you plan to compare multiple strings, multiple times in a row, you
658  may normalize them upfront and compare them directly to avoid multiple
659  normalization passes.
660
661  ## Examples
662
663      iex> String.equivalent?("abc", "abc")
664      true
665
666      iex> String.equivalent?("man\u0303ana", "mañana")
667      true
668
669      iex> String.equivalent?("abc", "ABC")
670      false
671
672      iex> String.equivalent?("nø", "nó")
673      false
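
  When the same strings are compared repeatedly, the normalization can be
  done once upfront, as in this sketch (the variable names are illustrative):

      iex> words = Enum.map(["man\u0303ana", "ma\u00F1ana"], &String.normalize(&1, :nfd))
      iex> Enum.uniq(words) |> length()
      1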
674
675  """
676  @spec equivalent?(t, t) :: boolean
677  def equivalent?(string1, string2) when is_binary(string1) and is_binary(string2) do
678    normalize(string1, :nfd) == normalize(string2, :nfd)
679  end
680
681  @doc """
682  Converts all characters in `string` to Unicode normalization
683  form identified by `form`.
684
  Invalid Unicode code points are skipped and the remainder of
  the string is converted. If you want the algorithm to stop
  and return on an invalid code point, use `:unicode.characters_to_nfd_binary/1`,
  `:unicode.characters_to_nfc_binary/1`, `:unicode.characters_to_nfkd_binary/1`,
  and `:unicode.characters_to_nfkc_binary/1` instead.
690
691  Normalization forms `:nfkc` and `:nfkd` should not be blindly applied
692  to arbitrary text. Because they erase many formatting distinctions,
693  they will prevent round-trip conversion to and from many legacy
694  character sets.
695
696  ## Forms
697
698  The supported forms are:
699
700    * `:nfd` - Normalization Form Canonical Decomposition.
701      Characters are decomposed by canonical equivalence, and
702      multiple combining characters are arranged in a specific
703      order.
704
705    * `:nfc` - Normalization Form Canonical Composition.
706      Characters are decomposed and then recomposed by canonical equivalence.
707
708    * `:nfkd` - Normalization Form Compatibility Decomposition.
709      Characters are decomposed by compatibility equivalence, and
710      multiple combining characters are arranged in a specific
711      order.
712
713    * `:nfkc` - Normalization Form Compatibility Composition.
714      Characters are decomposed and then recomposed by compatibility equivalence.
715
716  ## Examples
717
718      iex> String.normalize("yêṩ", :nfd)
719      "yêṩ"
720
721      iex> String.normalize("leña", :nfc)
722      "leña"
723
      iex> String.normalize("ﬁ", :nfkd)
725      "fi"
726
727      iex> String.normalize("fi", :nfkc)
728      "fi"
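
  Invalid bytes are kept as they are while the valid parts around them
  are normalized. A small sketch with a stray `0xFF` byte (chosen only
  for illustration):

      iex> String.normalize("a" <> <<0xFF>> <> "b", :nfd)
      <<97, 255, 98>>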
729
730  """
731  def normalize(string, form)
732
733  def normalize(string, :nfd) when is_binary(string) do
734    case :unicode.characters_to_nfd_binary(string) do
735      string when is_binary(string) -> string
736      {:error, good, <<head, rest::binary>>} -> good <> <<head>> <> normalize(rest, :nfd)
737    end
738  end
739
740  def normalize(string, :nfc) when is_binary(string) do
741    case :unicode.characters_to_nfc_binary(string) do
742      string when is_binary(string) -> string
743      {:error, good, <<head, rest::binary>>} -> good <> <<head>> <> normalize(rest, :nfc)
744    end
745  end
746
747  def normalize(string, :nfkd) when is_binary(string) do
748    case :unicode.characters_to_nfkd_binary(string) do
749      string when is_binary(string) -> string
750      {:error, good, <<head, rest::binary>>} -> good <> <<head>> <> normalize(rest, :nfkd)
751    end
752  end
753
754  def normalize(string, :nfkc) when is_binary(string) do
755    case :unicode.characters_to_nfkc_binary(string) do
756      string when is_binary(string) -> string
757      {:error, good, <<head, rest::binary>>} -> good <> <<head>> <> normalize(rest, :nfkc)
758    end
759  end
760
761  @doc """
762  Converts all characters in the given string to uppercase according to `mode`.
763
764  `mode` may be `:default`, `:ascii`, `:greek` or `:turkic`. The `:default` mode considers
765  all non-conditional transformations outlined in the Unicode standard. `:ascii`
766  uppercases only the letters a to z. `:greek` includes the context sensitive
767  mappings found in Greek. `:turkic` properly handles the letter i with the dotless variant.
768
769  ## Examples
770
771      iex> String.upcase("abcd")
772      "ABCD"
773
774      iex> String.upcase("ab 123 xpto")
775      "AB 123 XPTO"
776
777      iex> String.upcase("olá")
778      "OLÁ"
779
780  The `:ascii` mode ignores Unicode characters and provides a more
781  performant implementation when you know the string contains only
782  ASCII characters:
783
784      iex> String.upcase("olá", :ascii)
785      "OLá"
786
787  And `:turkic` properly handles the letter i with the dotless variant:
788
789      iex> String.upcase("ıi")
790      "II"
791
792      iex> String.upcase("ıi", :turkic)
793      "Iİ"
794
795  """
796  @spec upcase(t, :default | :ascii | :greek | :turkic) :: t
797  def upcase(string, mode \\ :default)
798
799  def upcase("", _mode) do
800    ""
801  end
802
803  def upcase(string, :default) when is_binary(string) do
804    String.Casing.upcase(string, [], :default)
805  end
806
807  def upcase(string, :ascii) when is_binary(string) do
808    IO.iodata_to_binary(upcase_ascii(string))
809  end
810
811  def upcase(string, mode) when is_binary(string) and mode in @conditional_mappings do
812    String.Casing.upcase(string, [], mode)
813  end
814
815  defp upcase_ascii(<<char, rest::bits>>) when char >= ?a and char <= ?z,
816    do: [char - 32 | upcase_ascii(rest)]
817
818  defp upcase_ascii(<<char, rest::bits>>), do: [char | upcase_ascii(rest)]
819  defp upcase_ascii(<<>>), do: []
820
821  @doc """
822  Converts all characters in the given string to lowercase according to `mode`.
823
824  `mode` may be `:default`, `:ascii`, `:greek` or `:turkic`. The `:default` mode considers
825  all non-conditional transformations outlined in the Unicode standard. `:ascii`
826  lowercases only the letters A to Z. `:greek` includes the context sensitive
827  mappings found in Greek. `:turkic` properly handles the letter i with the dotless variant.
828
829  ## Examples
830
831      iex> String.downcase("ABCD")
832      "abcd"
833
834      iex> String.downcase("AB 123 XPTO")
835      "ab 123 xpto"
836
837      iex> String.downcase("OLÁ")
838      "olá"
839
840  The `:ascii` mode ignores Unicode characters and provides a more
841  performant implementation when you know the string contains only
842  ASCII characters:
843
844      iex> String.downcase("OLÁ", :ascii)
845      "olÁ"
846
847  The `:greek` mode properly handles the context sensitive sigma in Greek:
848
849      iex> String.downcase("ΣΣ")
850      "σσ"
851
852      iex> String.downcase("ΣΣ", :greek)
853      "σς"
854
855  And `:turkic` properly handles the letter i with the dotless variant:
856
857      iex> String.downcase("Iİ")
858      "ii̇"
859
860      iex> String.downcase("Iİ", :turkic)
861      "ıi"
862
863  """
864  @spec downcase(t, :default | :ascii | :greek | :turkic) :: t
865  def downcase(string, mode \\ :default)
866
867  def downcase("", _mode) do
868    ""
869  end
870
871  def downcase(string, :default) when is_binary(string) do
872    String.Casing.downcase(string, [], :default)
873  end
874
875  def downcase(string, :ascii) when is_binary(string) do
876    IO.iodata_to_binary(downcase_ascii(string))
877  end
878
879  def downcase(string, mode) when is_binary(string) and mode in @conditional_mappings do
880    String.Casing.downcase(string, [], mode)
881  end
882
883  defp downcase_ascii(<<char, rest::bits>>) when char >= ?A and char <= ?Z,
884    do: [char + 32 | downcase_ascii(rest)]
885
886  defp downcase_ascii(<<char, rest::bits>>), do: [char | downcase_ascii(rest)]
887  defp downcase_ascii(<<>>), do: []
888
889  @doc """
890  Converts the first character in the given string to
891  uppercase and the remainder to lowercase according to `mode`.
892
893  `mode` may be `:default`, `:ascii`, `:greek` or `:turkic`. The `:default` mode considers
894  all non-conditional transformations outlined in the Unicode standard. `:ascii`
895  capitalizes only the letters A to Z. `:greek` includes the context sensitive
896  mappings found in Greek. `:turkic` properly handles the letter i with the dotless variant.
897
898  ## Examples
899
900      iex> String.capitalize("abcd")
901      "Abcd"
902
      iex> String.capitalize("ﬁn")
904      "Fin"
905
906      iex> String.capitalize("olá")
907      "Olá"
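
  The `:ascii` mode only changes the letters a to z and A to Z, leaving
  any other character untouched, as this short sketch shows:

      iex> String.capitalize("hELLO", :ascii)
      "Hello"

      iex> String.capitalize("éCOLE")
      "École"

      iex> String.capitalize("éCOLE", :ascii)
      "école"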
908
909  """
910  @spec capitalize(t, :default | :ascii | :greek | :turkic) :: t
911  def capitalize(string, mode \\ :default)
912
913  def capitalize(<<char, rest::binary>>, :ascii) do
914    char = if char >= ?a and char <= ?z, do: char - 32, else: char
915    <<char>> <> downcase(rest, :ascii)
916  end
917
918  def capitalize(string, mode) when is_binary(string) do
919    {char, rest} = String.Casing.titlecase_once(string, mode)
920    char <> downcase(rest, mode)
921  end
922
923  @doc false
924  @deprecated "Use String.trim_trailing/1 instead"
925  defdelegate rstrip(binary), to: String.Break, as: :trim_trailing
926
927  @doc false
928  @deprecated "Use String.trim_trailing/2 with a binary as second argument instead"
929  def rstrip(string, char) when is_integer(char) do
930    replace_trailing(string, <<char::utf8>>, "")
931  end
932
933  @doc """
  Replaces all leading occurrences of `match` by `replacement` in `string`.
935
936  Returns the string untouched if there are no occurrences.
937
938  If `match` is `""`, this function raises an `ArgumentError` exception: this
939  happens because this function replaces **all** the occurrences of `match` at
940  the beginning of `string`, and it's impossible to replace "multiple"
941  occurrences of `""`.
942
943  ## Examples
944
945      iex> String.replace_leading("hello world", "hello ", "")
946      "world"
947      iex> String.replace_leading("hello hello world", "hello ", "")
948      "world"
949
950      iex> String.replace_leading("hello world", "hello ", "ola ")
951      "ola world"
952      iex> String.replace_leading("hello hello world", "hello ", "ola ")
953      "ola ola world"
954
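  A minimal sketch of two typical cases, with illustrative inputs: stripping
  repeated leading zeros, and rescuing the `ArgumentError` raised for an
  empty `match`:

  ```elixir
  # Every leading "0" is replaced, not just the first one.
  stripped = String.replace_leading("000123", "0", "")

  # Each occurrence is replaced individually, so three zeros become three spaces.
  spaced = String.replace_leading("000123", "0", " ")

  # An empty match is rejected with an ArgumentError.
  empty_match =
    try do
      String.replace_leading("abc", "", "x")
    rescue
      ArgumentError -> :argument_error
    end

  IO.inspect({stripped, spaced, empty_match})
  ```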
955  """
956  @spec replace_leading(t, t, t) :: t
957  def replace_leading(string, match, replacement)
958      when is_binary(string) and is_binary(match) and is_binary(replacement) do
959    if match == "" do
960      raise ArgumentError, "cannot use an empty string as the match to replace"
961    end
962
963    prefix_size = byte_size(match)
964    suffix_size = byte_size(string) - prefix_size
965    replace_leading(string, match, replacement, prefix_size, suffix_size, 0)
966  end
967
968  defp replace_leading(string, match, replacement, prefix_size, suffix_size, acc)
969       when suffix_size >= 0 do
970    case string do
971      <<prefix::size(prefix_size)-binary, suffix::binary>> when prefix == match ->
972        replace_leading(
973          suffix,
974          match,
975          replacement,
976          prefix_size,
977          suffix_size - prefix_size,
978          acc + 1
979        )
980
981      _ ->
982        prepend_unless_empty(duplicate(replacement, acc), string)
983    end
984  end
985
986  defp replace_leading(string, _match, replacement, _prefix_size, _suffix_size, acc) do
987    prepend_unless_empty(duplicate(replacement, acc), string)
988  end
989
990  @doc """
991  Replaces all trailing occurrences of `match` by `replacement` in `string`.
992
993  Returns the string untouched if there are no occurrences.
994
995  If `match` is `""`, this function raises an `ArgumentError` exception: this
996  happens because this function replaces **all** the occurrences of `match` at
997  the end of `string`, and it's impossible to replace "multiple" occurrences of
998  `""`.
999
1000  ## Examples
1001
1002      iex> String.replace_trailing("hello world", " world", "")
1003      "hello"
1004      iex> String.replace_trailing("hello world world", " world", "")
1005      "hello"
1006
1007      iex> String.replace_trailing("hello world", " world", " mundo")
1008      "hello mundo"
1009      iex> String.replace_trailing("hello world world", " world", " mundo")
1010      "hello mundo mundo"
1011
1012  """
1013  @spec replace_trailing(t, t, t) :: t
1014  def replace_trailing(string, match, replacement)
1015      when is_binary(string) and is_binary(match) and is_binary(replacement) do
1016    if match == "" do
1017      raise ArgumentError, "cannot use an empty string as the match to replace"
1018    end
1019
1020    suffix_size = byte_size(match)
1021    prefix_size = byte_size(string) - suffix_size
1022    replace_trailing(string, match, replacement, prefix_size, suffix_size, 0)
1023  end
1024
1025  defp replace_trailing(string, match, replacement, prefix_size, suffix_size, acc)
1026       when prefix_size >= 0 do
1027    case string do
1028      <<prefix::size(prefix_size)-binary, suffix::binary>> when suffix == match ->
1029        replace_trailing(
1030          prefix,
1031          match,
1032          replacement,
1033          prefix_size - suffix_size,
1034          suffix_size,
1035          acc + 1
1036        )
1037
1038      _ ->
1039        append_unless_empty(string, duplicate(replacement, acc))
1040    end
1041  end
1042
1043  defp replace_trailing(string, _match, replacement, _prefix_size, _suffix_size, acc) do
1044    append_unless_empty(string, duplicate(replacement, acc))
1045  end
1046
1047  @doc """
1048  Replaces prefix in `string` by `replacement` if it matches `match`.
1049
1050  Returns the string untouched if there is no match. If `match` is an empty
1051  string (`""`), `replacement` is just prepended to `string`.
1052
1053  ## Examples
1054
1055      iex> String.replace_prefix("world", "hello ", "")
1056      "world"
1057      iex> String.replace_prefix("hello world", "hello ", "")
1058      "world"
1059      iex> String.replace_prefix("hello hello world", "hello ", "")
1060      "hello world"
1061
1062      iex> String.replace_prefix("world", "hello ", "ola ")
1063      "world"
1064      iex> String.replace_prefix("hello world", "hello ", "ola ")
1065      "ola world"
1066      iex> String.replace_prefix("hello hello world", "hello ", "ola ")
1067      "ola hello world"
1068
1069      iex> String.replace_prefix("world", "", "hello ")
1070      "hello world"
1071
1072  """
1073  @spec replace_prefix(t, t, t) :: t
1074  def replace_prefix(string, match, replacement)
1075      when is_binary(string) and is_binary(match) and is_binary(replacement) do
1076    prefix_size = byte_size(match)
1077
1078    case string do
1079      <<prefix::size(prefix_size)-binary, suffix::binary>> when prefix == match ->
1080        prepend_unless_empty(replacement, suffix)
1081
1082      _ ->
1083        string
1084    end
1085  end
1086
1087  @doc """
1088  Replaces suffix in `string` by `replacement` if it matches `match`.
1089
1090  Returns the string untouched if there is no match. If `match` is an empty
1091  string (`""`), `replacement` is just appended to `string`.
1092
1093  ## Examples
1094
1095      iex> String.replace_suffix("hello", " world", "")
1096      "hello"
1097      iex> String.replace_suffix("hello world", " world", "")
1098      "hello"
1099      iex> String.replace_suffix("hello world world", " world", "")
1100      "hello world"
1101
1102      iex> String.replace_suffix("hello", " world", " mundo")
1103      "hello"
1104      iex> String.replace_suffix("hello world", " world", " mundo")
1105      "hello mundo"
1106      iex> String.replace_suffix("hello world world", " world", " mundo")
1107      "hello world mundo"
1108
1109      iex> String.replace_suffix("hello", "", " world")
1110      "hello world"
1111
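  The difference from `replace_trailing/3` can be sketched as follows. The
  file name is only an assumption for the example:

  ```elixir
  path = "archive.gz.gz"

  # replace_suffix/3 replaces at most one occurrence at the end.
  once = String.replace_suffix(path, ".gz", "")

  # replace_trailing/3 keeps going while the string still ends with the match.
  all = String.replace_trailing(path, ".gz", "")

  IO.inspect({once, all})
  ```

  Here `once` is `"archive.gz"` and `all` is `"archive"`.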
1112  """
1113  @spec replace_suffix(t, t, t) :: t
1114  def replace_suffix(string, match, replacement)
1115      when is_binary(string) and is_binary(match) and is_binary(replacement) do
1116    suffix_size = byte_size(match)
1117    prefix_size = byte_size(string) - suffix_size
1118
1119    case string do
1120      <<prefix::size(prefix_size)-binary, suffix::binary>> when suffix == match ->
1121        append_unless_empty(prefix, replacement)
1122
1123      _ ->
1124        string
1125    end
1126  end
1127
1128  @compile {:inline, prepend_unless_empty: 2, append_unless_empty: 2}
1129
1130  defp prepend_unless_empty("", suffix), do: suffix
1131  defp prepend_unless_empty(prefix, suffix), do: prefix <> suffix
1132
1133  defp append_unless_empty(prefix, ""), do: prefix
1134  defp append_unless_empty(prefix, suffix), do: prefix <> suffix
1135
1136  @doc false
1137  @deprecated "Use String.trim_leading/1 instead"
1138  defdelegate lstrip(binary), to: String.Break, as: :trim_leading
1139
1140  @doc false
1141  @deprecated "Use String.trim_leading/2 with a binary as second argument instead"
1142  def lstrip(string, char) when is_integer(char) do
1143    replace_leading(string, <<char::utf8>>, "")
1144  end
1145
1146  @doc false
1147  @deprecated "Use String.trim/1 instead"
1148  def strip(string) do
1149    trim(string)
1150  end
1151
1152  @doc false
1153  @deprecated "Use String.trim/2 with a binary second argument instead"
1154  def strip(string, char) do
1155    trim(string, <<char::utf8>>)
1156  end
1157
1158  @doc ~S"""
1159  Returns a string where all leading Unicode whitespaces
1160  have been removed.
1161
1162  ## Examples
1163
1164      iex> String.trim_leading("\n  abc   ")
1165      "abc   "
1166
1167  """
1168  @spec trim_leading(t) :: t
1169  defdelegate trim_leading(string), to: String.Break
1170
1171  @doc """
  Returns a string where all leading occurrences of `to_trim` have been removed.
1173
1174  ## Examples
1175
1176      iex> String.trim_leading("__ abc _", "_")
1177      " abc _"
1178
1179      iex> String.trim_leading("1 abc", "11")
1180      "1 abc"
1181
1182  """
1183  @spec trim_leading(t, t) :: t
1184  def trim_leading(string, to_trim)
1185      when is_binary(string) and is_binary(to_trim) do
1186    replace_leading(string, to_trim, "")
1187  end
1188
1189  @doc ~S"""
  Returns a string where all trailing Unicode whitespaces
  have been removed.
1192
1193  ## Examples
1194
1195      iex> String.trim_trailing("   abc\n  ")
1196      "   abc"
1197
1198  """
1199  @spec trim_trailing(t) :: t
1200  defdelegate trim_trailing(string), to: String.Break
1201
1202  @doc """
  Returns a string where all trailing occurrences of `to_trim` have been removed.
1204
1205  ## Examples
1206
1207      iex> String.trim_trailing("_ abc __", "_")
1208      "_ abc "
1209
1210      iex> String.trim_trailing("abc 1", "11")
1211      "abc 1"
1212
1213  """
1214  @spec trim_trailing(t, t) :: t
1215  def trim_trailing(string, to_trim)
1216      when is_binary(string) and is_binary(to_trim) do
1217    replace_trailing(string, to_trim, "")
1218  end
1219
1220  @doc ~S"""
1221  Returns a string where all leading and trailing Unicode whitespaces
1222  have been removed.
1223
1224  ## Examples
1225
1226      iex> String.trim("\n  abc\n  ")
1227      "abc"
1228
1229  """
1230  @spec trim(t) :: t
1231  def trim(string) when is_binary(string) do
1232    string
1233    |> trim_leading()
1234    |> trim_trailing()
1235  end
1236
1237  @doc """
  Returns a string where all leading and trailing occurrences of `to_trim`
  have been removed.
1240
1241  ## Examples
1242
1243      iex> String.trim("a  abc  a", "a")
1244      "  abc  "
1245
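  Note that `to_trim` is matched as a whole string rather than as a set of
  characters. A short sketch with made-up inputs:

  ```elixir
  # A single-character to_trim removes every repetition on both sides.
  dashes = String.trim("--hello--", "-")

  # A longer to_trim is removed only where the exact sequence repeats.
  pairs = String.trim("abab hello ab", "ab")

  # "ba" is not "ab", so nothing is trimmed here.
  no_match = String.trim("ba hello", "ab")

  IO.inspect({dashes, pairs, no_match})
  ```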
1246  """
1247  @spec trim(t, t) :: t
1248  def trim(string, to_trim) when is_binary(string) and is_binary(to_trim) do
1249    string
1250    |> trim_leading(to_trim)
1251    |> trim_trailing(to_trim)
1252  end
1253
1254  @doc ~S"""
1255  Returns a new string padded with a leading filler
1256  which is made of elements from the `padding`.
1257
1258  Passing a list of strings as `padding` will take one element of the list
1259  for every missing entry. If the list is shorter than the number of inserts,
1260  the filling will start again from the beginning of the list.
1261  Passing a string `padding` is equivalent to passing the list of graphemes in it.
1262  If no `padding` is given, it defaults to whitespace.
1263
  When `count` is less than or equal to the length of `string`,
  the given `string` is returned unchanged.
1266
1267  Raises `ArgumentError` if the given `padding` contains a non-string element.
1268
1269  ## Examples
1270
1271      iex> String.pad_leading("abc", 5)
1272      "  abc"
1273
1274      iex> String.pad_leading("abc", 4, "12")
1275      "1abc"
1276
1277      iex> String.pad_leading("abc", 6, "12")
1278      "121abc"
1279
1280      iex> String.pad_leading("abc", 5, ["1", "23"])
1281      "123abc"
1282
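  A common use is zero-padding numbers to a fixed width. The sketch below
  assumes a width of three and uses illustrative values:

  ```elixir
  # Strings already at least as long as the requested width are returned unchanged.
  padded =
    Enum.map([7, 42, 1234], fn n ->
      String.pad_leading(Integer.to_string(n), 3, "0")
    end)

  IO.inspect(padded)
  ```

  This yields `["007", "042", "1234"]`.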
1283  """
1284  @spec pad_leading(t, non_neg_integer, t | [t]) :: t
1285  def pad_leading(string, count, padding \\ [" "])
1286
1287  def pad_leading(string, count, padding) when is_binary(padding) do
1288    pad_leading(string, count, graphemes(padding))
1289  end
1290
1291  def pad_leading(string, count, [_ | _] = padding)
1292      when is_binary(string) and is_integer(count) and count >= 0 do
1293    pad(:leading, string, count, padding)
1294  end
1295
1296  @doc ~S"""
1297  Returns a new string padded with a trailing filler
1298  which is made of elements from the `padding`.
1299
1300  Passing a list of strings as `padding` will take one element of the list
1301  for every missing entry. If the list is shorter than the number of inserts,
1302  the filling will start again from the beginning of the list.
1303  Passing a string `padding` is equivalent to passing the list of graphemes in it.
1304  If no `padding` is given, it defaults to whitespace.
1305
  When `count` is less than or equal to the length of `string`,
  the given `string` is returned unchanged.
1308
1309  Raises `ArgumentError` if the given `padding` contains a non-string element.
1310
1311  ## Examples
1312
1313      iex> String.pad_trailing("abc", 5)
1314      "abc  "
1315
1316      iex> String.pad_trailing("abc", 4, "12")
1317      "abc1"
1318
1319      iex> String.pad_trailing("abc", 6, "12")
1320      "abc121"
1321
1322      iex> String.pad_trailing("abc", 5, ["1", "23"])
1323      "abc123"
1324
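  The sketch below, with made-up inputs, shows how a padding list is cycled
  and that `count` is measured in graphemes rather than bytes:

  ```elixir
  # Five entries are missing, so the list is cycled: "-", "=", "-", "=", "-".
  ruled = String.pad_trailing("abc", 8, ["-", "="])

  # "olá" has 3 graphemes (but 4 bytes), so 2 fillers are added to reach 5.
  accented = String.pad_trailing("olá", 5, ".")

  IO.inspect({ruled, accented})
  ```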
1325  """
1326  @spec pad_trailing(t, non_neg_integer, t | [t]) :: t
1327  def pad_trailing(string, count, padding \\ [" "])
1328
1329  def pad_trailing(string, count, padding) when is_binary(padding) do
1330    pad_trailing(string, count, graphemes(padding))
1331  end
1332
1333  def pad_trailing(string, count, [_ | _] = padding)
1334      when is_binary(string) and is_integer(count) and count >= 0 do
1335    pad(:trailing, string, count, padding)
1336  end
1337
1338  defp pad(kind, string, count, padding) do
1339    string_length = length(string)
1340
1341    if string_length >= count do
1342      string
1343    else
1344      filler = build_filler(count - string_length, padding, padding, 0, [])
1345
1346      case kind do
1347        :leading -> [filler | string]
1348        :trailing -> [string | filler]
1349      end
1350      |> IO.iodata_to_binary()
1351    end
1352  end
1353
1354  defp build_filler(0, _source, _padding, _size, filler), do: filler
1355
1356  defp build_filler(count, source, [], size, filler) do
1357    rem_filler =
1358      rem(count, size)
1359      |> build_filler(source, source, 0, [])
1360
1361    filler =
1362      filler
1363      |> IO.iodata_to_binary()
1364      |> duplicate(div(count, size) + 1)
1365
1366    [filler | rem_filler]
1367  end
1368
1369  defp build_filler(count, source, [elem | rest], size, filler)
1370       when is_binary(elem) do
1371    build_filler(count - 1, source, rest, size + 1, [filler | elem])
1372  end
1373
1374  defp build_filler(_count, _source, [elem | _rest], _size, _filler) do
1375    raise ArgumentError, "expected a string padding element, got: #{inspect(elem)}"
1376  end
1377
1378  @doc false
1379  @deprecated "Use String.pad_leading/2 instead"
1380  def rjust(subject, length) do
1381    rjust(subject, length, ?\s)
1382  end
1383
1384  @doc false
1385  @deprecated "Use String.pad_leading/3 with a binary padding instead"
1386  def rjust(subject, length, pad) when is_integer(pad) and is_integer(length) and length >= 0 do
1387    pad(:leading, subject, length, [<<pad::utf8>>])
1388  end
1389
1390  @doc false
1391  @deprecated "Use String.pad_trailing/2 instead"
1392  def ljust(subject, length) do
1393    ljust(subject, length, ?\s)
1394  end
1395
1396  @doc false
1397  @deprecated "Use String.pad_trailing/3 with a binary padding instead"
1398  def ljust(subject, length, pad) when is_integer(pad) and is_integer(length) and length >= 0 do
1399    pad(:trailing, subject, length, [<<pad::utf8>>])
1400  end
1401
1402  @doc ~S"""
1403  Returns a new string created by replacing occurrences of `pattern` in
1404  `subject` with `replacement`.
1405
1406  The `subject` is always a string.
1407
1408  The `pattern` may be a string, a list of strings, a regular expression, or a
1409  compiled pattern.
1410
1411  The `replacement` may be a string or a function that receives the matched
1412  pattern and must return the replacement as a string or iodata.
1413
1414  By default it replaces all occurrences but this behaviour can be controlled
1415  through the `:global` option; see the "Options" section below.
1416
1417  ## Options
1418
1419    * `:global` - (boolean) if `true`, all occurrences of `pattern` are replaced
1420      with `replacement`, otherwise only the first occurrence is
1421      replaced. Defaults to `true`
1422
1423  ## Examples
1424
1425      iex> String.replace("a,b,c", ",", "-")
1426      "a-b-c"
1427
1428      iex> String.replace("a,b,c", ",", "-", global: false)
1429      "a-b,c"
1430
1431  The pattern may also be a list of strings and the replacement may also
1432  be a function that receives the matches:
1433
1434      iex> String.replace("a,b,c", ["a", "c"], fn <<char>> -> <<char + 1>> end)
1435      "b,b,d"
1436
1437  When the pattern is a regular expression, one can give `\N` or
1438  `\g{N}` in the `replacement` string to access a specific capture in the
1439  regular expression:
1440
1441      iex> String.replace("a,b,c", ~r/,(.)/, ",\\1\\g{1}")
1442      "a,bb,cc"
1443
  Note that the backslash itself must be escaped in the replacement string
  (that is, write `\\N` rather than `\N`, and likewise `\\g{N}`). By
  giving `\0`, one can inject the whole match in the replacement string.
1447
1448  A compiled pattern can also be given:
1449
1450      iex> pattern = :binary.compile_pattern(",")
1451      iex> String.replace("a,b,c", pattern, "[]")
1452      "a[]b[]c"
1453
  When an empty string is provided as a `pattern`, the function will treat it as
  an implicit empty string between each grapheme and the string will be
  interspersed. If the `replacement` is also an empty string, the `subject`
  is returned unchanged:
1458
1459      iex> String.replace("ELIXIR", "", ".")
1460      ".E.L.I.X.I.R."
1461
1462      iex> String.replace("ELIXIR", "", "")
1463      "ELIXIR"
1464
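  The options above can be combined in a few ways. The following is a small
  sketch with illustrative inputs, not an exhaustive tour:

  ```elixir
  # A function replacement receives each matched string.
  shouted = String.replace("hello world", ["hello", "world"], &String.upcase/1)

  # "\\0" in the replacement refers to the whole regex match.
  wrapped = String.replace("a,b", ~r/[a-z]/, "<\\0>")

  # With global: false only the first occurrence is replaced.
  first_only = String.replace("2021-01-02", "-", "/", global: false)

  IO.inspect({shouted, wrapped, first_only})
  ```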
1465  """
1466  @spec replace(t, pattern | Regex.t(), t | (t -> t | iodata), keyword) :: t
1467  def replace(subject, pattern, replacement, options \\ [])
1468      when is_binary(subject) and
1469             (is_binary(replacement) or is_function(replacement, 1)) and
1470             is_list(options) do
1471    replace_guarded(subject, pattern, replacement, options)
1472  end
1473
1474  defp replace_guarded(subject, %{__struct__: Regex} = regex, replacement, options) do
1475    Regex.replace(regex, subject, replacement, options)
1476  end
1477
1478  defp replace_guarded(subject, "", "", _) do
1479    subject
1480  end
1481
1482  defp replace_guarded(subject, "", replacement_binary, options)
1483       when is_binary(replacement_binary) do
1484    if Keyword.get(options, :global, true) do
1485      IO.iodata_to_binary([replacement_binary | intersperse_bin(subject, replacement_binary)])
1486    else
1487      replacement_binary <> subject
1488    end
1489  end
1490
1491  defp replace_guarded(subject, "", replacement_fun, options) do
1492    if Keyword.get(options, :global, true) do
1493      IO.iodata_to_binary([replacement_fun.("") | intersperse_fun(subject, replacement_fun)])
1494    else
1495      IO.iodata_to_binary([replacement_fun.("") | subject])
1496    end
1497  end
1498
1499  defp replace_guarded(subject, pattern, replacement, options) do
1500    if insert = Keyword.get(options, :insert_replaced) do
1501      IO.warn(
1502        "String.replace/4 with :insert_replaced option is deprecated. " <>
1503          "Please use :binary.replace/4 instead or pass an anonymous function as replacement"
1504      )
1505
1506      binary_options = if Keyword.get(options, :global) != false, do: [:global], else: []
1507      :binary.replace(subject, pattern, replacement, [insert_replaced: insert] ++ binary_options)
1508    else
1509      matches =
1510        if Keyword.get(options, :global, true) do
1511          :binary.matches(subject, pattern)
1512        else
1513          case :binary.match(subject, pattern) do
1514            :nomatch -> []
1515            match -> [match]
1516          end
1517        end
1518
1519      IO.iodata_to_binary(do_replace(subject, matches, replacement, 0))
1520    end
1521  end
1522
1523  defp intersperse_bin(subject, replacement) do
1524    case next_grapheme(subject) do
1525      {current, rest} -> [current, replacement | intersperse_bin(rest, replacement)]
1526      nil -> []
1527    end
1528  end
1529
1530  defp intersperse_fun(subject, replacement) do
1531    case next_grapheme(subject) do
1532      {current, rest} -> [current, replacement.("") | intersperse_fun(rest, replacement)]
1533      nil -> []
1534    end
1535  end
1536
1537  defp do_replace(subject, [], _, n) do
1538    [binary_part(subject, n, byte_size(subject) - n)]
1539  end
1540
1541  defp do_replace(subject, [{start, length} | matches], replacement, n) do
1542    prefix = binary_part(subject, n, start - n)
1543
1544    middle =
1545      if is_binary(replacement) do
1546        replacement
1547      else
1548        replacement.(binary_part(subject, start, length))
1549      end
1550
1551    [prefix, middle | do_replace(subject, matches, replacement, start + length)]
1552  end
1553
1554  @doc ~S"""
  Reverses the graphemes in the given string.
1556
1557  ## Examples
1558
1559      iex> String.reverse("abcd")
1560      "dcba"
1561
1562      iex> String.reverse("hello world")
1563      "dlrow olleh"
1564
1565      iex> String.reverse("hello ∂og")
1566      "go∂ olleh"
1567
1568  Keep in mind reversing the same string twice does
1569  not necessarily yield the original string:
1570
1571      iex> "̀e"
1572      "̀e"
1573      iex> String.reverse("̀e")
1574      "è"
1575      iex> String.reverse(String.reverse("̀e"))
1576      "è"
1577
1578  In the first example the accent is before the vowel, so
1579  it is considered two graphemes. However, when you reverse
1580  it once, you have the vowel followed by the accent, which
1581  becomes one grapheme. Reversing it again will keep it as
1582  one single grapheme.
1583  """
1584  @spec reverse(t) :: t
1585  def reverse(string) when is_binary(string) do
1586    do_reverse(next_grapheme(string), [])
1587  end
1588
1589  defp do_reverse({grapheme, rest}, acc) do
1590    do_reverse(next_grapheme(rest), [grapheme | acc])
1591  end
1592
1593  defp do_reverse(nil, acc), do: IO.iodata_to_binary(acc)
1594
1595  @compile {:inline, duplicate: 2}
1596
1597  @doc """
1598  Returns a string `subject` repeated `n` times.
1599
1600  Inlined by the compiler.
1601
1602  ## Examples
1603
1604      iex> String.duplicate("abc", 0)
1605      ""
1606
1607      iex> String.duplicate("abc", 1)
1608      "abc"
1609
1610      iex> String.duplicate("abc", 2)
1611      "abcabc"
1612
1613  """
1614  @spec duplicate(t, non_neg_integer) :: t
1615  def duplicate(subject, n) when is_binary(subject) and is_integer(n) and n >= 0 do
1616    :binary.copy(subject, n)
1617  end
1618
1619  @doc ~S"""
1620  Returns a list of code points encoded as strings.
1621
1622  To retrieve code points in their natural integer
1623  representation, see `to_charlist/1`. For details about
1624  code points and graphemes, see the `String` module
1625  documentation.
1626
1627  ## Examples
1628
1629      iex> String.codepoints("olá")
1630      ["o", "l", "á"]
1631
1632      iex> String.codepoints("оптими зации")
1633      ["о", "п", "т", "и", "м", "и", " ", "з", "а", "ц", "и", "и"]
1634
1635      iex> String.codepoints("ἅἪῼ")
1636      ["ἅ", "Ἢ", "ῼ"]
1637
1638      iex> String.codepoints("\u00e9")
1639      ["é"]
1640
1641      iex> String.codepoints("\u0065\u0301")
1642      ["e", "́"]
1643
1644  """
1645  @spec codepoints(t) :: [codepoint]
1646  defdelegate codepoints(string), to: String.Unicode
1647
1648  @doc ~S"""
1649  Returns the next code point in a string.
1650
1651  The result is a tuple with the code point and the
1652  remainder of the string or `nil` in case
1653  the string reached its end.
1654
1655  As with other functions in the `String` module, `next_codepoint/1`
1656  works with binaries that are invalid UTF-8. If the string starts
1657  with a sequence of bytes that is not valid in UTF-8 encoding, the
1658  first element of the returned tuple is a binary with the first byte.
1659
1660  ## Examples
1661
1662      iex> String.next_codepoint("olá")
1663      {"o", "lá"}
1664
1665      iex> invalid = "\x80\x80OK" # first two bytes are invalid in UTF-8
1666      iex> {_, rest} = String.next_codepoint(invalid)
1667      {<<128>>, <<128, 79, 75>>}
1668      iex> String.next_codepoint(rest)
1669      {<<128>>, "OK"}
1670
1671  ## Comparison with binary pattern matching
1672
1673  Binary pattern matching provides a similar way to decompose
1674  a string:
1675
1676      iex> <<codepoint::utf8, rest::binary>> = "Elixir"
1677      "Elixir"
1678      iex> codepoint
1679      69
1680      iex> rest
1681      "lixir"
1682
1683  though not entirely equivalent because `codepoint` comes as
1684  an integer, and the pattern won't match invalid UTF-8.
1685
1686  Binary pattern matching, however, is simpler and more efficient,
1687  so pick the option that better suits your use case.
1688  """
1689  @compile {:inline, next_codepoint: 1}
1690  @spec next_codepoint(t) :: {codepoint, t} | nil
1691  defdelegate next_codepoint(string), to: String.Unicode
1692
1693  @doc ~S"""
1694  Checks whether `string` contains only valid characters.
1695
1696  ## Examples
1697
1698      iex> String.valid?("a")
1699      true
1700
1701      iex> String.valid?("ø")
1702      true
1703
1704      iex> String.valid?(<<0xFFFF::16>>)
1705      false
1706
1707      iex> String.valid?(<<0xEF, 0xB7, 0x90>>)
1708      true
1709
1710      iex> String.valid?("asd" <> <<0xFFFF::16>>)
1711      false
1712
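  One possible use is validating external input before calling functions
  that assume well-formed text. The module and function names below are
  hypothetical and not part of `String`:

  ```elixir
  defmodule ValidationSketch do
    # Hypothetical helper: only measures binaries that are valid UTF-8.
    def safe_length(binary) when is_binary(binary) do
      if String.valid?(binary) do
        {:ok, String.length(binary)}
      else
        {:error, :invalid_utf8}
      end
    end
  end

  valid_result = ValidationSketch.safe_length("olá")
  invalid_result = ValidationSketch.safe_length(<<0xFF, 0xFE>>)

  IO.inspect({valid_result, invalid_result})
  ```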
1713  """
1714  @spec valid?(t) :: boolean
1715  def valid?(<<string::binary>>), do: valid_utf8?(string)
1716  def valid?(_), do: false
1717
1718  defp valid_utf8?(<<_::utf8, rest::bits>>), do: valid_utf8?(rest)
1719  defp valid_utf8?(<<>>), do: true
1720  defp valid_utf8?(_), do: false
1721
1722  @doc false
1723  @deprecated "Use String.valid?/1 instead"
1724  def valid_character?(string) do
1725    case string do
1726      <<_::utf8>> -> valid?(string)
1727      _ -> false
1728    end
1729  end
1730
1731  @doc ~S"""
1732  Splits the string into chunks of characters that share a common trait.
1733
1734  The trait can be one of two options:
1735
1736    * `:valid` - the string is split into chunks of valid and invalid
1737      character sequences
1738
1739    * `:printable` - the string is split into chunks of printable and
1740      non-printable character sequences
1741
1742  Returns a list of binaries each of which contains only one kind of
1743  characters.
1744
1745  If the given string is empty, an empty list is returned.
1746
1747  ## Examples
1748
1749      iex> String.chunk(<<?a, ?b, ?c, 0>>, :valid)
1750      ["abc\0"]
1751
1752      iex> String.chunk(<<?a, ?b, ?c, 0, 0xFFFF::utf16>>, :valid)
1753      ["abc\0", <<0xFFFF::utf16>>]
1754
1755      iex> String.chunk(<<?a, ?b, ?c, 0, 0x0FFFF::utf8>>, :printable)
1756      ["abc", <<0, 0x0FFFF::utf8>>]
1757
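  As a sketch of one way to use the `:valid` trait, the snippet below drops
  invalid bytes from a binary. The input is made up for the example:

  ```elixir
  # 0xFF can never appear in valid UTF-8, so it ends up in its own chunk.
  chunks = String.chunk(<<"ok", 0xFF, "go">>, :valid)

  # Keep only the valid chunks and join them back together.
  cleaned = chunks |> Enum.filter(&String.valid?/1) |> Enum.join()

  IO.inspect({chunks, cleaned})
  ```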
1758  """
1759  @spec chunk(t, :valid | :printable) :: [t]
1760
1761  def chunk(string, trait)
1762
1763  def chunk("", _), do: []
1764
1765  def chunk(string, trait) when is_binary(string) and trait in [:valid, :printable] do
1766    {cp, _} = next_codepoint(string)
1767    pred_fn = make_chunk_pred(trait)
1768    do_chunk(string, pred_fn.(cp), pred_fn)
1769  end
1770
1771  defp do_chunk(string, flag, pred_fn), do: do_chunk(string, [], <<>>, flag, pred_fn)
1772
1773  defp do_chunk(<<>>, acc, <<>>, _, _), do: Enum.reverse(acc)
1774
1775  defp do_chunk(<<>>, acc, chunk, _, _), do: Enum.reverse(acc, [chunk])
1776
1777  defp do_chunk(string, acc, chunk, flag, pred_fn) do
1778    {cp, rest} = next_codepoint(string)
1779
1780    if pred_fn.(cp) != flag do
1781      do_chunk(rest, [chunk | acc], cp, not flag, pred_fn)
1782    else
1783      do_chunk(rest, acc, chunk <> cp, flag, pred_fn)
1784    end
1785  end
1786
1787  defp make_chunk_pred(:valid), do: &valid?/1
1788  defp make_chunk_pred(:printable), do: &printable?/1
1789
1790  @doc ~S"""
  Returns Unicode graphemes in the string as per the Extended Grapheme
  Cluster algorithm.
1793
1794  The algorithm is outlined in the [Unicode Standard Annex #29,
1795  Unicode Text Segmentation](https://www.unicode.org/reports/tr29/).
1796
1797  For details about code points and graphemes, see the `String` module documentation.
1798
1799  ## Examples
1800
1801      iex> String.graphemes("Ńaïve")
1802      ["Ń", "a", "ï", "v", "e"]
1803
1804      iex> String.graphemes("\u00e9")
1805      ["é"]
1806
1807      iex> String.graphemes("\u0065\u0301")
1808      ["é"]
1809
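  To make the distinction between graphemes, code points and bytes concrete,
  the sketch below counts all three for a decomposed "é" (the variable names
  are illustrative):

  ```elixir
  # "e" followed by the combining acute accent U+0301.
  decomposed = "e\u0301"

  grapheme_count = decomposed |> String.graphemes() |> length()
  codepoint_count = decomposed |> String.codepoints() |> length()
  byte_count = byte_size(decomposed)

  IO.inspect({grapheme_count, codepoint_count, byte_count})
  ```

  The string is one grapheme, two code points and three bytes.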
1810  """
1811  @spec graphemes(t) :: [grapheme]
1812  defdelegate graphemes(string), to: String.Unicode
1813
1814  @compile {:inline, next_grapheme: 1, next_grapheme_size: 1}
1815
1816  @doc """
1817  Returns the next grapheme in a string.
1818
  The result is a tuple with the grapheme and the
  remainder of the string or `nil` in case
  the string reached its end.
1822
1823  ## Examples
1824
1825      iex> String.next_grapheme("olá")
1826      {"o", "lá"}
1827
1828      iex> String.next_grapheme("")
1829      nil
1830
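  Because the return value is either a `{grapheme, rest}` tuple or `nil`,
  this function has the shape expected by `Stream.unfold/2`. A minimal
  sketch of lazily walking a string this way:

  ```elixir
  # Unfold the whole string into its graphemes.
  all = "olá" |> Stream.unfold(&String.next_grapheme/1) |> Enum.to_list()

  # Only as many graphemes as requested are computed.
  first_two = "elixir" |> Stream.unfold(&String.next_grapheme/1) |> Enum.take(2)

  IO.inspect({all, first_two})
  ```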
1831  """
1832  @spec next_grapheme(t) :: {grapheme, t} | nil
1833  def next_grapheme(binary) when is_binary(binary) do
1834    case next_grapheme_size(binary) do
1835      {size, rest} -> {binary_part(binary, 0, size), rest}
1836      nil -> nil
1837    end
1838  end
1839
1840  @doc """
1841  Returns the size (in bytes) of the next grapheme.
1842
1843  The result is a tuple with the next grapheme size in bytes and
1844  the remainder of the string or `nil` in case the string
1845  reached its end.
1846
1847  ## Examples
1848
1849      iex> String.next_grapheme_size("olá")
1850      {1, "lá"}
1851
1852      iex> String.next_grapheme_size("")
1853      nil
1854
1855  """
1856  @spec next_grapheme_size(t) :: {pos_integer, t} | nil
1857  defdelegate next_grapheme_size(string), to: String.Unicode
1858
1859  @doc """
  Returns the first grapheme from a UTF-8 string,
  or `nil` if the string is empty.
1862
1863  ## Examples
1864
1865      iex> String.first("elixir")
1866      "e"
1867
1868      iex> String.first("եոգլի")
1869      "ե"
1870
1871      iex> String.first("")
1872      nil
1873
1874  """
1875  @spec first(t) :: grapheme | nil
1876  def first(string) when is_binary(string) do
1877    case next_grapheme(string) do
1878      {char, _} -> char
1879      nil -> nil
1880    end
1881  end
1882
1883  @doc """
  Returns the last grapheme from a UTF-8 string,
  or `nil` if the string is empty.
1886
1887  ## Examples
1888
1889      iex> String.last("elixir")
1890      "r"
1891
1892      iex> String.last("եոգլի")
1893      "ի"
1894
1895  """
1896  @spec last(t) :: grapheme | nil
1897  def last(string) when is_binary(string) do
1898    do_last(next_grapheme(string), nil)
1899  end
1900
1901  defp do_last({char, rest}, _) do
1902    do_last(next_grapheme(rest), char)
1903  end
1904
1905  defp do_last(nil, last_char), do: last_char
1906
1907  @doc """
1908  Returns the number of Unicode graphemes in a UTF-8 string.
1909
1910  ## Examples
1911
1912      iex> String.length("elixir")
1913      6
1914
1915      iex> String.length("եոգլի")
1916      5
1917
1918  """
1919  @spec length(t) :: non_neg_integer
1920  defdelegate length(string), to: String.Unicode
1921
1922  @doc """
  Returns the grapheme at the `position` of the given UTF-8 `string`.
  A negative `position` counts from the end of the string. If `position`
  is out of bounds, then it returns `nil`.
1925
1926  ## Examples
1927
1928      iex> String.at("elixir", 0)
1929      "e"
1930
1931      iex> String.at("elixir", 1)
1932      "l"
1933
1934      iex> String.at("elixir", 10)
1935      nil
1936
1937      iex> String.at("elixir", -1)
1938      "r"
1939
1940      iex> String.at("elixir", -10)
1941      nil
1942
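  Positions refer to graphemes, not bytes. The sketch below contrasts this
  with byte indexing, assuming the source is saved as UTF-8 with a
  precomposed "é":

  ```elixir
  word = "héllo"

  # Grapheme at position 1.
  grapheme = String.at(word, 1)

  # Byte at position 1: the first of the two bytes that encode "é".
  byte = :binary.at(word, 1)

  # Negative positions count from the end.
  last = String.at(word, -1)

  IO.inspect({grapheme, byte, last})
  ```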
1943  """
1944  @spec at(t, integer) :: grapheme | nil
1945
1946  def at(string, position) when is_binary(string) and is_integer(position) and position >= 0 do
1947    do_at(string, position)
1948  end
1949
1950  def at(string, position) when is_binary(string) and is_integer(position) and position < 0 do
1951    position = length(string) + position
1952
1953    case position >= 0 do
1954      true -> do_at(string, position)
1955      false -> nil
1956    end
1957  end
1958
1959  defp do_at(string, position) do
1960    case String.Unicode.split_at(string, position) do
1961      {_, nil} -> nil
1962      {_, rest} -> first(rest)
1963    end
1964  end
1965
1966  @doc """
1967  Returns a substring starting at the offset `start`, and of the given `length`.
1968
  If the offset is greater than or equal to the string length, then it returns `""`.
1970
1971  Remember this function works with Unicode graphemes and considers
1972  the slices to represent grapheme offsets. If you want to split
1973  on raw bytes, check `Kernel.binary_part/3` instead.
1974
1975  ## Examples
1976
1977      iex> String.slice("elixir", 1, 3)
1978      "lix"
1979
1980      iex> String.slice("elixir", 1, 10)
1981      "lixir"
1982
1983      iex> String.slice("elixir", 10, 3)
1984      ""
1985
1986      iex> String.slice("elixir", -4, 4)
1987      "ixir"
1988
1989      iex> String.slice("elixir", -10, 3)
1990      ""
1991
1992      iex> String.slice("a", 0, 1500)
1993      "a"
1994
1995      iex> String.slice("a", 1, 1500)
1996      ""
1997
1998      iex> String.slice("a", 2, 1500)
1999      ""
2000
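  To illustrate the difference between grapheme offsets and byte offsets,
  the following sketch assumes the `é` is encoded as the two-byte code
  point U+00E9, so `binary_part/3` counts it as two bytes while
  `String.slice/3` counts it as one grapheme:

      iex> String.slice("héllo", 1, 3)
      "éll"

      iex> binary_part("héllo", 1, 3)
      "él"
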
2001  """
  @spec slice(t, integer, non_neg_integer) :: t
2003
2004  def slice(_, _, 0) do
2005    ""
2006  end
2007
2008  def slice(string, start, length)
2009      when is_binary(string) and is_integer(start) and is_integer(length) and start >= 0 and
2010             length >= 0 do
2011    case String.Unicode.split_at(string, start) do
2012      {_, nil} ->
2013        ""
2014
2015      {start_bytes, rest} ->
2016        {len_bytes, _} = String.Unicode.split_at(rest, length)
2017        binary_part(string, start_bytes, len_bytes)
2018    end
2019  end
2020
2021  def slice(string, start, length)
2022      when is_binary(string) and is_integer(start) and is_integer(length) and start < 0 and
2023             length >= 0 do
2024    start = length(string) + start
2025
2026    case start >= 0 do
2027      true -> slice(string, start, length)
2028      false -> ""
2029    end
2030  end
2031
2032  @doc """
2033  Returns a substring from the offset given by the start of the
2034  range to the offset given by the end of the range.
2035
2036  If the start of the range is not a valid offset for the given
2037  string or if the range is in reverse order, returns `""`.
2038
2039  If the start or end of the range is negative, the whole string
2040  is traversed first in order to convert the negative indices into
2041  positive ones.
2042
2043  Remember this function works with Unicode graphemes and considers
2044  the slices to represent grapheme offsets. If you want to split
2045  on raw bytes, check `Kernel.binary_part/3` instead.
2046
2047  ## Examples
2048
2049      iex> String.slice("elixir", 1..3)
2050      "lix"
2051
2052      iex> String.slice("elixir", 1..10)
2053      "lixir"
2054
2055      iex> String.slice("elixir", -4..-1)
2056      "ixir"
2057
2058      iex> String.slice("elixir", -4..6)
2059      "ixir"
2060
  For ranges where `start > stop`, you need to explicitly
  mark them as increasing:
2063
2064      iex> String.slice("elixir", 2..-1//1)
2065      "ixir"
2066
2067      iex> String.slice("elixir", 1..-2//1)
2068      "lixi"
2069
  An out-of-bounds end is truncated to the end of the string, while
  an out-of-bounds start returns an empty string:
2071
2072      iex> String.slice("elixir", 10..3)
2073      ""
2074
2075      iex> String.slice("elixir", -10..-7)
2076      ""
2077
2078      iex> String.slice("a", 0..1500)
2079      "a"
2080
2081      iex> String.slice("a", 1..1500)
2082      ""
2083
2084  """
2085  @spec slice(t, Range.t()) :: t
2086  def slice(string, first..last//step = range) when is_binary(string) do
2087    # TODO: Deprecate negative steps on Elixir v1.16
2088    # TODO: There are two features we can add to slicing ranges:
2089    # 1. We can allow the step to be any positive number
    # 2. We can allow slice and reverse at the same time. However, we can't
    #    implement it right now. First we will have to raise if a decreasing
    #    range is given on Elixir v2.0.
2093    if step == 1 or (step == -1 and first > last) do
2094      slice_range(string, first, last)
2095    else
2096      raise ArgumentError,
2097            "String.slice/2 does not accept ranges with custom steps, got: #{inspect(range)}"
2098    end
2099  end
2100
2101  # TODO: Remove me on v2.0
2102  def slice(string, %{__struct__: Range, first: first, last: last} = range) do
2103    step = if first <= last, do: 1, else: -1
2104    slice(string, Map.put(range, :step, step))
2105  end
2106
2107  defp slice_range("", _, _), do: ""
2108
2109  defp slice_range(string, first, -1) when first >= 0 do
2110    case String.Unicode.split_at(string, first) do
2111      {_, nil} -> ""
2112      {start_bytes, _} -> binary_part(string, start_bytes, byte_size(string) - start_bytes)
2113    end
2114  end
2115
2116  defp slice_range(string, first, last) when first >= 0 and last >= 0 do
2117    if last >= first do
2118      slice(string, first, last - first + 1)
2119    else
2120      ""
2121    end
2122  end
2123
2124  defp slice_range(string, first, last) do
2125    {bytes, length} = acc_bytes(next_grapheme_size(string), [], 0)
2126    first = add_if_negative(first, length)
2127    last = add_if_negative(last, length)
2128
2129    if first < 0 or first > last or first > length do
2130      ""
2131    else
2132      last = min(last + 1, length)
2133      bytes = Enum.drop(bytes, length - last)
2134      first = last - first
2135      {length_bytes, start_bytes} = split_bytes(bytes, 0, first)
2136      binary_part(string, start_bytes, length_bytes)
2137    end
2138  end
2139
2140  defp acc_bytes({size, rest}, bytes, length) do
2141    acc_bytes(next_grapheme_size(rest), [size | bytes], length + 1)
2142  end
2143
2144  defp acc_bytes(nil, bytes, length) do
2145    {bytes, length}
2146  end
2147
2148  defp add_if_negative(value, to_add) when value < 0, do: value + to_add
2149  defp add_if_negative(value, _to_add), do: value
2150
2151  defp split_bytes(rest, acc, 0), do: {acc, Enum.sum(rest)}
2152  defp split_bytes([], acc, _), do: {acc, 0}
2153  defp split_bytes([head | tail], acc, count), do: split_bytes(tail, head + acc, count - 1)
2154
2155  @doc """
2156  Returns `true` if `string` starts with any of the prefixes given.
2157
2158  `prefix` can be either a string, a list of strings, or a compiled
2159  pattern.
2160
2161  ## Examples
2162
2163      iex> String.starts_with?("elixir", "eli")
2164      true
2165      iex> String.starts_with?("elixir", ["erlang", "elixir"])
2166      true
2167      iex> String.starts_with?("elixir", ["erlang", "ruby"])
2168      false
2169
2170  A compiled pattern can also be given:
2171
2172      iex> pattern = :binary.compile_pattern(["erlang", "elixir"])
2173      iex> String.starts_with?("elixir", pattern)
2174      true
2175
2176  An empty string will always match:
2177
2178      iex> String.starts_with?("elixir", "")
2179      true
2180      iex> String.starts_with?("elixir", ["", "other"])
2181      true
2182
2183  """
2184  @spec starts_with?(t, pattern) :: boolean
2185  def starts_with?(string, prefix) when is_binary(string) and is_binary(prefix) do
2186    starts_with_string?(string, byte_size(string), prefix)
2187  end
2188
2189  def starts_with?(string, prefix) when is_binary(string) and is_list(prefix) do
2190    string_size = byte_size(string)
2191    Enum.any?(prefix, &starts_with_string?(string, string_size, &1))
2192  end
2193
2194  def starts_with?(string, prefix) when is_binary(string) do
2195    Kernel.match?({0, _}, :binary.match(string, prefix))
2196  end
2197
2198  @compile {:inline, starts_with_string?: 3}
2199  defp starts_with_string?(string, string_size, prefix) when is_binary(prefix) do
2200    prefix_size = byte_size(prefix)
2201
2202    if prefix_size <= string_size do
2203      prefix == binary_part(string, 0, prefix_size)
2204    else
2205      false
2206    end
2207  end
2208
2209  @doc """
2210  Returns `true` if `string` ends with any of the suffixes given.
2211
  `suffix` can be either a single string or a list of strings.
2213
2214  ## Examples
2215
2216      iex> String.ends_with?("language", "age")
2217      true
2218      iex> String.ends_with?("language", ["youth", "age"])
2219      true
2220      iex> String.ends_with?("language", ["youth", "elixir"])
2221      false
2222
2223  An empty suffix will always match:
2224
2225      iex> String.ends_with?("language", "")
2226      true
2227      iex> String.ends_with?("language", ["", "other"])
2228      true
2229
2230  """
2231  @spec ends_with?(t, t | [t]) :: boolean
2232  def ends_with?(string, suffix) when is_binary(string) and is_binary(suffix) do
2233    ends_with_string?(string, byte_size(string), suffix)
2234  end
2235
2236  def ends_with?(string, suffix) when is_binary(string) and is_list(suffix) do
2237    string_size = byte_size(string)
2238    Enum.any?(suffix, &ends_with_string?(string, string_size, &1))
2239  end
2240
2241  @compile {:inline, ends_with_string?: 3}
2242  defp ends_with_string?(string, string_size, suffix) when is_binary(suffix) do
2243    suffix_size = byte_size(suffix)
2244
2245    if suffix_size <= string_size do
2246      suffix == binary_part(string, string_size - suffix_size, suffix_size)
2247    else
2248      false
2249    end
2250  end
2251
2252  @doc """
2253  Checks if `string` matches the given regular expression.
2254
2255  ## Examples
2256
2257      iex> String.match?("foo", ~r/foo/)
2258      true
2259
2260      iex> String.match?("bar", ~r/foo/)
2261      false
2262
2263  """
2264  @spec match?(t, Regex.t()) :: boolean
2265  def match?(string, regex) when is_binary(string) do
2266    Regex.match?(regex, string)
2267  end
2268
2269  @doc """
2270  Checks if `string` contains any of the given `contents`.
2271
2272  `contents` can be either a string, a list of strings,
2273  or a compiled pattern.
2274
2275  ## Examples
2276
2277      iex> String.contains?("elixir of life", "of")
2278      true
2279      iex> String.contains?("elixir of life", ["life", "death"])
2280      true
2281      iex> String.contains?("elixir of life", ["death", "mercury"])
2282      false
2283
2284  The argument can also be a compiled pattern:
2285
2286      iex> pattern = :binary.compile_pattern(["life", "death"])
2287      iex> String.contains?("elixir of life", pattern)
2288      true
2289
2290  An empty string will always match:
2291
2292      iex> String.contains?("elixir of life", "")
2293      true
2294      iex> String.contains?("elixir of life", ["", "other"])
2295      true
2296
2297  Be aware that this function can match within or across grapheme boundaries.
2298  For example, take the grapheme "é" which is made of the characters
2299  "e" and the acute accent. The following returns `true`:
2300
2301      iex> String.contains?(String.normalize("é", :nfd), "e")
2302      true
2303
  However, if "é" is represented by the single character "e with acute
  accent", then it will return `false`:
2306
2307      iex> String.contains?(String.normalize("é", :nfc), "e")
2308      false
2309
2310  """
2311  @spec contains?(t, pattern) :: boolean
2312  def contains?(string, []) when is_binary(string) do
2313    false
2314  end
2315
2316  def contains?(string, contents) when is_binary(string) and is_list(contents) do
2317    "" in contents or :binary.match(string, contents) != :nomatch
2318  end
2319
2320  def contains?(string, contents) when is_binary(string) do
2321    "" == contents or :binary.match(string, contents) != :nomatch
2322  end
2323
2324  @doc """
2325  Converts a string into a charlist.
2326
2327  Specifically, this function takes a UTF-8 encoded binary and returns a list of its integer
2328  code points. It is similar to `codepoints/1` except that the latter returns a list of code points as
2329  strings.
2330
2331  In case you need to work with bytes, take a look at the
2332  [`:binary` module](`:binary`).
2333
2334  ## Examples
2335
2336      iex> String.to_charlist("æß")
2337      'æß'
2338
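  The contrast with `codepoints/1` described above can be sketched as
  follows. The integers shown are the code points of `æ` (U+00E6) and
  `ß` (U+00DF):

      iex> String.to_charlist("æß") == [230, 223]
      true

      iex> String.codepoints("æß")
      ["æ", "ß"]
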
2339  """
2340  @spec to_charlist(t) :: charlist
2341  def to_charlist(string) when is_binary(string) do
2342    case :unicode.characters_to_list(string) do
2343      result when is_list(result) ->
2344        result
2345
2346      {:error, encoded, rest} ->
2347        raise UnicodeConversionError, encoded: encoded, rest: rest, kind: :invalid
2348
2349      {:incomplete, encoded, rest} ->
2350        raise UnicodeConversionError, encoded: encoded, rest: rest, kind: :incomplete
2351    end
2352  end
2353
2354  @doc """
2355  Converts a string to an atom.
2356
2357  Warning: this function creates atoms dynamically and atoms are
2358  not garbage-collected. Therefore, `string` should not be an
2359  untrusted value, such as input received from a socket or during
2360  a web request. Consider using `to_existing_atom/1` instead.
2361
2362  By default, the maximum number of atoms is `1_048_576`. This limit
2363  can be raised or lowered using the VM option `+t`.
2364
  The maximum atom size is 255 Unicode code points.
2366
2367  Inlined by the compiler.
2368
2369  ## Examples
2370
2371      iex> String.to_atom("my_atom")
2372      :my_atom
2373
2374  """
2375  @spec to_atom(String.t()) :: atom
2376  def to_atom(string) when is_binary(string) do
2377    :erlang.binary_to_atom(string, :utf8)
2378  end
2379
2380  @doc """
2381  Converts a string to an existing atom.
2382
  The maximum atom size is 255 Unicode code points.
2384
2385  Inlined by the compiler.
2386
2387  ## Examples
2388
2389      iex> _ = :my_atom
2390      iex> String.to_existing_atom("my_atom")
2391      :my_atom
2392
2393  """
2394  @spec to_existing_atom(String.t()) :: atom
2395  def to_existing_atom(string) when is_binary(string) do
2396    :erlang.binary_to_existing_atom(string, :utf8)
2397  end
2398
2399  @doc """
2400  Returns an integer whose text representation is `string`.
2401
2402  `string` must be the string representation of an integer.
2403  Otherwise, an `ArgumentError` will be raised. If you want
2404  to parse a string that may contain an ill-formatted integer,
2405  use `Integer.parse/1`.
2406
2407  Inlined by the compiler.
2408
2409  ## Examples
2410
2411      iex> String.to_integer("123")
2412      123
2413
2414  Passing a string that does not represent an integer leads to an error:
2415
2416      String.to_integer("invalid data")
2417      ** (ArgumentError) argument error
2418
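  For input that may not be a well-formed integer, `Integer.parse/1`,
  mentioned above, returns a tuple or `:error` instead of raising.
  A brief sketch of that alternative:

      iex> Integer.parse("123abc")
      {123, "abc"}

      iex> Integer.parse("invalid data")
      :error
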
2419  """
2420  @spec to_integer(String.t()) :: integer
2421  def to_integer(string) when is_binary(string) do
2422    :erlang.binary_to_integer(string)
2423  end
2424
2425  @doc """
2426  Returns an integer whose text representation is `string` in base `base`.
2427
2428  Inlined by the compiler.
2429
2430  ## Examples
2431
2432      iex> String.to_integer("3FF", 16)
2433      1023
2434
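  The same call works for other bases within the `2..36` range given
  in the spec. Two further illustrative cases, binary and octal:

      iex> String.to_integer("1010", 2)
      10

      iex> String.to_integer("777", 8)
      511
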
2435  """
2436  @spec to_integer(String.t(), 2..36) :: integer
2437  def to_integer(string, base) when is_binary(string) and is_integer(base) do
2438    :erlang.binary_to_integer(string, base)
2439  end
2440
2441  @doc """
2442  Returns a float whose text representation is `string`.
2443
  `string` must be the string representation of a float including a decimal point.
  Otherwise, an `ArgumentError` will be raised. To parse a string without a decimal
  point as a float, use `Float.parse/1` instead.
2447
2448  Inlined by the compiler.
2449
2450  ## Examples
2451
2452      iex> String.to_float("2.2017764e+0")
2453      2.2017764
2454
2455      iex> String.to_float("3.0")
2456      3.0
2457
2458      String.to_float("3")
2459      ** (ArgumentError) argument error
2460
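  By contrast, `Float.parse/1` accepts input without a decimal point
  and returns the unparsed remainder alongside the float. A short
  sketch of that alternative:

      iex> Float.parse("3")
      {3.0, ""}

      iex> Float.parse("3.5kg")
      {3.5, "kg"}
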
2461  """
2462  @spec to_float(String.t()) :: float
2463  def to_float(string) when is_binary(string) do
2464    :erlang.binary_to_float(string)
2465  end
2466
2467  @doc """
2468  Computes the bag distance between two strings.
2469
2470  Returns a float value between 0 and 1 representing the bag
2471  distance between `string1` and `string2`.
2472
2473  The bag distance is meant to be an efficient approximation
2474  of the distance between two strings to quickly rule out strings
2475  that are largely different.
2476
2477  The algorithm is outlined in the "String Matching with Metric
2478  Trees Using an Approximate Distance" paper by Ilaria Bartolini,
2479  Paolo Ciaccia, and Marco Patella.
2480
2481  ## Examples
2482
2483      iex> String.bag_distance("abc", "")
2484      0.0
2485      iex> String.bag_distance("abcd", "a")
2486      0.25
2487      iex> String.bag_distance("abcd", "ab")
2488      0.5
2489      iex> String.bag_distance("abcd", "abc")
2490      0.75
2491      iex> String.bag_distance("abcd", "abcd")
2492      1.0
2493
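  Because each string is treated as a bag (multiset) of graphemes,
  the result depends on which graphemes occur and how often, not on
  their order. Two illustrative cases under that reading:

      iex> String.bag_distance("abcd", "dcba")
      1.0
      iex> String.bag_distance("abcd", "abxy")
      0.5
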
2494  """
2495  @spec bag_distance(t, t) :: float
2496  @doc since: "1.8.0"
2497  def bag_distance(_string, ""), do: 0.0
2498  def bag_distance("", _string), do: 0.0
2499
2500  def bag_distance(string1, string2) when is_binary(string1) and is_binary(string2) do
2501    {bag1, length1} = string_to_bag(string1, %{}, 0)
2502    {bag2, length2} = string_to_bag(string2, %{}, 0)
2503
2504    diff1 = bag_difference(bag1, bag2)
2505    diff2 = bag_difference(bag2, bag1)
2506
2507    1 - max(diff1, diff2) / max(length1, length2)
2508  end
2509
2510  defp string_to_bag(string, bag, length) do
2511    case next_grapheme(string) do
2512      {char, rest} ->
2513        bag =
2514          case bag do
2515            %{^char => current} -> %{bag | char => current + 1}
2516            %{} -> Map.put(bag, char, 1)
2517          end
2518
2519        string_to_bag(rest, bag, length + 1)
2520
2521      nil ->
2522        {bag, length}
2523    end
2524  end
2525
2526  defp bag_difference(bag1, bag2) do
2527    Enum.reduce(bag1, 0, fn {char, count1}, sum ->
2528      case bag2 do
2529        %{^char => count2} -> sum + max(count1 - count2, 0)
2530        %{} -> sum + count1
2531      end
2532    end)
2533  end
2534
2535  @doc """
2536  Computes the Jaro distance (similarity) between two strings.
2537
  Returns a float value between `0.0` (equates to no similarity) and `1.0`
  (is an exact match) representing the [Jaro](https://en.wikipedia.org/wiki/Jaro-Winkler_distance)
  distance between `string1` and `string2`.
2541
2542  The Jaro distance metric is designed and best suited for short
2543  strings such as person names. Elixir itself uses this function
2544  to provide the "did you mean?" functionality. For instance, when you
2545  are calling a function in a module and you have a typo in the
2546  function name, we attempt to suggest the most similar function
2547  name available, if any, based on the `jaro_distance/2` score.
2548
2549  ## Examples
2550
2551      iex> String.jaro_distance("Dwayne", "Duane")
2552      0.8222222222222223
2553      iex> String.jaro_distance("even", "odd")
2554      0.0
2555      iex> String.jaro_distance("same", "same")
2556      1.0
2557
2558  """
2559  @spec jaro_distance(t, t) :: float
2560  def jaro_distance(string1, string2)
2561
2562  def jaro_distance(string, string), do: 1.0
2563  def jaro_distance(_string, ""), do: 0.0
2564  def jaro_distance("", _string), do: 0.0
2565
2566  def jaro_distance(string1, string2) when is_binary(string1) and is_binary(string2) do
2567    {chars1, len1} = chars_and_length(string1)
2568    {chars2, len2} = chars_and_length(string2)
2569
2570    case match(chars1, len1, chars2, len2) do
2571      {0, _trans} ->
2572        0.0
2573
2574      {comm, trans} ->
2575        (comm / len1 + comm / len2 + (comm - trans) / comm) / 3
2576    end
2577  end
2578
2579  @compile {:inline, chars_and_length: 1}
2580  defp chars_and_length(string) do
2581    chars = graphemes(string)
2582    {chars, Kernel.length(chars)}
2583  end
2584
2585  defp match(chars1, len1, chars2, len2) do
2586    if len1 < len2 do
2587      match(chars1, chars2, div(len2, 2) - 1)
2588    else
2589      match(chars2, chars1, div(len1, 2) - 1)
2590    end
2591  end
2592
2593  defp match(chars1, chars2, lim) do
2594    match(chars1, chars2, {0, lim}, {0, 0, -1}, 0)
2595  end
2596
2597  defp match([char | rest], chars, range, state, idx) do
2598    {chars, state} = submatch(char, chars, range, state, idx)
2599
2600    case range do
2601      {lim, lim} -> match(rest, tl(chars), range, state, idx + 1)
2602      {pre, lim} -> match(rest, chars, {pre + 1, lim}, state, idx + 1)
2603    end
2604  end
2605
2606  defp match([], _, _, {comm, trans, _}, _), do: {comm, trans}
2607
2608  defp submatch(char, chars, {pre, _} = range, state, idx) do
2609    case detect(char, chars, range) do
2610      nil ->
2611        {chars, state}
2612
2613      {subidx, chars} ->
2614        {chars, proceed(state, idx - pre + subidx)}
2615    end
2616  end
2617
2618  defp detect(char, chars, {pre, lim}) do
2619    detect(char, chars, pre + 1 + lim, 0, [])
2620  end
2621
2622  defp detect(_char, _chars, 0, _idx, _acc), do: nil
2623  defp detect(_char, [], _lim, _idx, _acc), do: nil
2624
2625  defp detect(char, [char | rest], _lim, idx, acc), do: {idx, Enum.reverse(acc, [nil | rest])}
2626
2627  defp detect(char, [other | rest], lim, idx, acc),
2628    do: detect(char, rest, lim - 1, idx + 1, [other | acc])
2629
2630  defp proceed({comm, trans, former}, current) do
2631    if current < former do
2632      {comm + 1, trans + 1, current}
2633    else
2634      {comm + 1, trans, current}
2635    end
2636  end
2637
2638  @doc """
2639  Returns a keyword list that represents an edit script.
2640
2641  Check `List.myers_difference/2` for more information.
2642
2643  ## Examples
2644
2645      iex> string1 = "fox hops over the dog"
2646      iex> string2 = "fox jumps over the lazy cat"
2647      iex> String.myers_difference(string1, string2)
2648      [eq: "fox ", del: "ho", ins: "jum", eq: "ps over the ", del: "dog", ins: "lazy cat"]
2649
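  A smaller case may make the script easier to read: the shared prefix
  is reported as `:eq`, followed by the grapheme to delete and the
  grapheme to insert:

      iex> String.myers_difference("abc", "abd")
      [eq: "ab", del: "c", ins: "d"]
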
2650  """
2651  @doc since: "1.3.0"
2652  @spec myers_difference(t, t) :: [{:eq | :ins | :del, t}]
2653  def myers_difference(string1, string2) when is_binary(string1) and is_binary(string2) do
2654    graphemes(string1)
2655    |> List.myers_difference(graphemes(string2))
2656    |> Enum.map(fn {kind, chars} -> {kind, IO.iodata_to_binary(chars)} end)
2657  end
2658
2659  @doc false
2660  @deprecated "Use String.to_charlist/1 instead"
2661  @spec to_char_list(t) :: charlist
2662  def to_char_list(string), do: String.to_charlist(string)
2663end
2664