import Kernel, except: [length: 1]

defmodule String do
  @moduledoc ~S"""
  Strings in Elixir are UTF-8 encoded binaries.

  Strings in Elixir are a sequence of Unicode characters,
  typically written between double quotes, such
  as `"hello"` and `"héllò"`.

  If a string must contain a double quote itself,
  the double quote must be escaped with a backslash,
  for example: `"this is a string with \"double quotes\""`.

  You can concatenate two strings with the `<>/2` operator:

      iex> "hello" <> " " <> "world"
      "hello world"

  ## Interpolation

  Strings in Elixir also support interpolation. This allows
  you to place some value in the middle of a string by using
  the `#{}` syntax:

      iex> name = "joe"
      iex> "hello #{name}"
      "hello joe"

  Any Elixir expression is valid inside the interpolation.
  If a string is given, the string is interpolated as is.
  If any other value is given, Elixir will attempt to convert
  it to a string using the `String.Chars` protocol. This
  allows, for example, an integer to be output from the interpolation:

      iex> "2 + 2 = #{2 + 2}"
      "2 + 2 = 4"

  In case the value you want to interpolate cannot be
  converted to a string, because it doesn't have a human
  textual representation, a protocol error will be raised.
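
  As a minimal sketch, a custom type can opt into interpolation by
  implementing the `String.Chars` protocol, whose only callback is
  `to_string/1`. The `Point` struct is hypothetical and exists only
  for illustration:

  ```elixir
  defmodule Point do
    defstruct [:x, :y]
  end

  # Tells interpolation (and Kernel.to_string/1) how to render a Point
  defimpl String.Chars, for: Point do
    def to_string(%Point{x: x, y: y}), do: "(#{x}, #{y})"
  end

  "origin at #{%Point{x: 0, y: 0}}"
  #=> "origin at (0, 0)"
  ```

  Without the `defimpl`, the same interpolation raises `Protocol.UndefinedError`.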

  ## Escape characters

  Besides allowing double-quotes to be escaped with a backslash,
  strings also support the following escape characters:

    * `\a` - Bell
    * `\b` - Backspace
    * `\t` - Horizontal tab
    * `\n` - Line feed (New lines)
    * `\v` - Vertical tab
    * `\f` - Form feed
    * `\r` - Carriage return
    * `\e` - Command Escape
    * `\#` - Returns the `#` character itself, skipping interpolation
    * `\xNN` - A byte represented by the hexadecimal `NN`
    * `\uNNNN` - A Unicode code point represented by `NNNN`

  Note it is generally not advised to use `\xNN` in Elixir
  strings, as introducing an invalid byte sequence would
  make the string invalid. If you have to introduce a
  character by its hexadecimal representation, it is best
  to work with Unicode code points, such as `\uNNNN`. In fact,
  understanding Unicode code points can be essential when doing
  low-level manipulations of strings, so let's explore them in
  detail next.

  ## Code points and grapheme cluster

  The functions in this module act according to the Unicode
  Standard, version 13.0.0.

  As per the standard, a code point is a single Unicode Character,
  which may be represented by one or more bytes.

  For example, although the code point "é" is a single character,
  its underlying representation uses two bytes:

      iex> String.length("é")
      1
      iex> byte_size("é")
      2

  Furthermore, this module also presents the concept of grapheme cluster
  (from now on referred to as graphemes). Graphemes can consist of multiple
  code points that may be perceived as a single character by readers.
  For example, "é" can be represented either as a single "e with acute" code point
  or as the letter "e" followed by a "combining acute accent" (two code points):

      iex> string = "\u0065\u0301"
      iex> byte_size(string)
      3
      iex> String.length(string)
      1
      iex> String.codepoints(string)
      ["e", "́"]
      iex> String.graphemes(string)
      ["é"]

  Although the example above is made of two code points, it is
  perceived by users as one.

  Graphemes can also be two characters that are interpreted
  as one by some languages. For example, some languages may
  consider "ch" as a single character. However, since this
  information depends on the locale, it is not taken into account
  by this module.

  In general, the functions in this module rely on the Unicode
  Standard, but do not contain any of the locale specific behaviour.
  More information about graphemes can be found in the [Unicode
  Standard Annex #29](https://www.unicode.org/reports/tr29/).

  For converting a binary to a different encoding and for Unicode
  normalization mechanisms, see Erlang's `:unicode` module.

  ## String and binary operations

  To act according to the Unicode Standard, many functions
  in this module run in linear time, as they need to traverse
  the whole string considering the proper Unicode code points.

  For example, `String.length/1` will take longer as
  the input grows. On the other hand, `Kernel.byte_size/1` always runs
  in constant time (i.e. regardless of the input size).
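
  As a rough illustration (the input below is arbitrary), both calls
  return a size for the same string, but only `String.length/1` has to
  walk it:

  ```elixir
  # 1_000 copies of "é" (U+00E9), which UTF-8 encodes in 2 bytes each
  string = String.duplicate("\u00E9", 1_000)

  # Linear: traverses the binary to count graphemes
  String.length(string)
  #=> 1000

  # Constant time: reads the size already stored with the binary
  byte_size(string)
  #=> 2000
  ```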

  This means there are often performance costs in using the
  functions in this module, compared to the more low-level
  operations that work directly with binaries:

    * `Kernel.binary_part/3` - retrieves part of the binary
    * `Kernel.bit_size/1` and `Kernel.byte_size/1` - size related functions
    * `Kernel.is_bitstring/1` and `Kernel.is_binary/1` - type-check functions
    * Plus a number of functions for working with binaries (bytes)
      in the [`:binary` module](`:binary`)

  There are many situations where using the `String` module can
  be avoided in favor of binary functions or pattern matching.
  For example, imagine you have a string `prefix` and you want to
  remove this prefix from another string named `full`.

  One may be tempted to write:

      iex> take_prefix = fn full, prefix ->
      ...>   base = String.length(prefix)
      ...>   String.slice(full, base, String.length(full) - base)
      ...> end
      iex> take_prefix.("Mr. John", "Mr. ")
      "John"

  Although the function above works, it performs poorly. To
  calculate the length of the string, we need to traverse it
  fully, so we traverse both `prefix` and `full` strings, then
  slice the `full` one, traversing it again.

  A first attempt at improving it could be with ranges:

      iex> take_prefix = fn full, prefix ->
      ...>   base = String.length(prefix)
      ...>   String.slice(full, base..-1)
      ...> end
      iex> take_prefix.("Mr. John", "Mr. ")
      "John"

  While this is much better (we don't traverse `full` twice),
  it could still be improved.
  In this case, since we want to extract a substring from a string,
  we can use `Kernel.byte_size/1` and `Kernel.binary_part/3` as there
  is no chance we will slice in the middle of a code point made of
  more than one byte:

      iex> take_prefix = fn full, prefix ->
      ...>   base = byte_size(prefix)
      ...>   binary_part(full, base, byte_size(full) - base)
      ...> end
      iex> take_prefix.("Mr. John", "Mr. ")
      "John"

  Or simply use pattern matching:

      iex> take_prefix = fn full, prefix ->
      ...>   base = byte_size(prefix)
      ...>   <<_::binary-size(base), rest::binary>> = full
      ...>   rest
      ...> end
      iex> take_prefix.("Mr. John", "Mr. ")
      "John"

  On the other hand, if you want to dynamically slice a string
  based on an integer value, then using `String.slice/3` is the
  best option as it guarantees we won't incorrectly split a valid
  code point into multiple bytes.

  ## Integer code points

  Although code points are represented as integers, this module
  represents code points in their encoded format as strings.
  For example:

      iex> String.codepoints("olá")
      ["o", "l", "á"]

  There are a couple of ways to retrieve a character's code point.
  One may use the `?` construct:

      iex> ?o
      111

      iex> ?á
      225

  Or also via pattern matching:

      iex> <<aacute::utf8>> = "á"
      iex> aacute
      225

  As we have seen above, code points can be inserted into
  a string by their hexadecimal code:

      iex> "ol\u00E1"
      "olá"

  Finally, to convert a String into a list of integer
  code points, known as "charlists" in Elixir, you can call
  `String.to_charlist/1`:

      iex> String.to_charlist("olá")
      [111, 108, 225]

  ## Self-synchronization

  The UTF-8 encoding is self-synchronizing.
  This means that if malformed data (i.e., data that is not possible
  according to the definition of the encoding) is encountered, only
  one code point needs to be rejected.

  This module relies on this behaviour to ignore such invalid
  characters. For example, `length/1` will return
  a correct result even if an invalid code point is fed into it.

  In other words, this module expects invalid data to be detected
  elsewhere, usually when retrieving data from an external source.
  For example, a driver that reads strings from a database will be
  responsible for checking the validity of the encoding. `String.chunk/2`
  can be used for breaking a string into valid and invalid parts.

  ## Compile binary patterns

  Many functions in this module work with patterns. For example,
  `String.split/3` can split a string into multiple strings given
  a pattern. This pattern can be a string, a list of strings or
  a compiled pattern:

      iex> String.split("foo bar", " ")
      ["foo", "bar"]

      iex> String.split("foo bar!", [" ", "!"])
      ["foo", "bar", ""]

      iex> pattern = :binary.compile_pattern([" ", "!"])
      iex> String.split("foo bar!", pattern)
      ["foo", "bar", ""]

  The compiled pattern is useful when the same match will
  be done over and over again. Note though that the compiled
  pattern cannot be stored in a module attribute as the pattern
  is generated at runtime and does not survive compile time.
  """

  @typedoc """
  A UTF-8 encoded binary.

  The types `String.t()` and `binary()` are equivalent to analysis tools.
  However, for those reading the documentation, `String.t()` signals that
  it is a UTF-8 encoded binary.
  """
  @type t :: binary

  @typedoc "A single Unicode code point encoded in UTF-8. It may be one or more bytes."
  @type codepoint :: t

  @typedoc "Multiple code points that may be perceived as a single character by readers"
  @type grapheme :: t

  @typedoc "Pattern used in functions like `replace/4` and `split/3`"
  @type pattern :: t | [t] | :binary.cp()

  @conditional_mappings [:greek, :turkic]

  @doc """
  Checks if a string contains only printable characters up to `character_limit`.

  Takes an optional `character_limit` as a second argument. If `character_limit` is `0`, this
  function will return `true`.

  ## Examples

      iex> String.printable?("abc")
      true

      iex> String.printable?("abc" <> <<0>>)
      false

      iex> String.printable?("abc" <> <<0>>, 2)
      true

      iex> String.printable?("abc" <> <<0>>, 0)
      true

  """
  @spec printable?(t, 0) :: true
  @spec printable?(t, pos_integer | :infinity) :: boolean
  def printable?(string, character_limit \\ :infinity)
      when is_binary(string) and
             (character_limit == :infinity or
                (is_integer(character_limit) and character_limit >= 0)) do
    recur_printable?(string, character_limit)
  end

  defp recur_printable?(_string, 0), do: true
  defp recur_printable?(<<>>, _character_limit), do: true

  for char <- 0x20..0x7E do
    defp recur_printable?(<<unquote(char), rest::binary>>, character_limit) do
      recur_printable?(rest, decrement(character_limit))
    end
  end

  for char <- '\n\r\t\v\b\f\e\d\a' do
    defp recur_printable?(<<unquote(char), rest::binary>>, character_limit) do
      recur_printable?(rest, decrement(character_limit))
    end
  end

  defp recur_printable?(<<char::utf8, rest::binary>>, character_limit)
       when char in 0xA0..0xD7FF
       when char in 0xE000..0xFFFD
       when char in 0x10000..0x10FFFF do
    recur_printable?(rest, decrement(character_limit))
  end

  defp recur_printable?(_string, _character_limit) do
    false
  end

  defp decrement(:infinity), do: :infinity
  defp decrement(character_limit), do: character_limit - 1

  @doc ~S"""
  Divides a string into substrings at each Unicode whitespace
  occurrence with leading and trailing whitespace ignored. Groups
  of whitespace are treated as a single occurrence. Divisions do
  not occur on non-breaking whitespace.

  ## Examples

      iex> String.split("foo bar")
      ["foo", "bar"]

      iex> String.split("foo" <> <<194, 133>> <> "bar")
      ["foo", "bar"]

      iex> String.split(" foo bar ")
      ["foo", "bar"]

      iex> String.split("no\u00a0break")
      ["no\u00a0break"]

  """
  @spec split(t) :: [t]
  defdelegate split(binary), to: String.Break

  @doc ~S"""
  Divides a string into parts based on a pattern.

  Returns a list of these parts.

  The `pattern` may be a string, a list of strings, a regular expression, or a
  compiled pattern.

  The string is split into as many parts as possible by
  default, but can be controlled via the `:parts` option.

  Empty strings are only removed from the result if the
  `:trim` option is set to `true`.

  When the pattern used is a regular expression, the string is
  split using `Regex.split/3`.

  ## Options

    * `:parts` (positive integer or `:infinity`) - the string
      is split into at most as many parts as this option specifies.
      If `:infinity`, the string will be split into all possible
      parts. Defaults to `:infinity`.

    * `:trim` (boolean) - if `true`, empty strings are removed from
      the resulting list.

  This function also accepts all options accepted by `Regex.split/3`
  if `pattern` is a regular expression.

  ## Examples

  Splitting with a string pattern:

      iex> String.split("a,b,c", ",")
      ["a", "b", "c"]

      iex> String.split("a,b,c", ",", parts: 2)
      ["a", "b,c"]

      iex> String.split(" a b c ", " ", trim: true)
      ["a", "b", "c"]

  A list of patterns:

      iex> String.split("1,2 3,4", [" ", ","])
      ["1", "2", "3", "4"]

  A regular expression:

      iex> String.split("a,b,c", ~r{,})
      ["a", "b", "c"]

      iex> String.split("a,b,c", ~r{,}, parts: 2)
      ["a", "b,c"]

      iex> String.split(" a b c ", ~r{\s}, trim: true)
      ["a", "b", "c"]

      iex> String.split("abc", ~r{b}, include_captures: true)
      ["a", "b", "c"]

  A compiled pattern:

      iex> pattern = :binary.compile_pattern([" ", ","])
      iex> String.split("1,2 3,4", pattern)
      ["1", "2", "3", "4"]

  Splitting on an empty string returns graphemes:

      iex> String.split("abc", "")
      ["", "a", "b", "c", ""]

      iex> String.split("abc", "", trim: true)
      ["a", "b", "c"]

      iex> String.split("abc", "", parts: 1)
      ["abc"]

      iex> String.split("abc", "", parts: 3)
      ["", "a", "bc"]

  Be aware that this function can split within or across grapheme boundaries.
  For example, take the grapheme "é" which is made of the code points
  "e" and the combining acute accent.
  The following will split the string into two parts:

      iex> String.split(String.normalize("é", :nfd), "e")
      ["", "́"]

  However, if "é" is represented by the single code point "e with acute
  accent", then it will split the string into just one part:

      iex> String.split(String.normalize("é", :nfc), "e")
      ["é"]

  """
  @spec split(t, pattern | Regex.t(), keyword) :: [t]
  def split(string, pattern, options \\ [])

  def split(string, %Regex{} = pattern, options) when is_binary(string) and is_list(options) do
    Regex.split(pattern, string, options)
  end

  def split(string, "", options) when is_binary(string) and is_list(options) do
    parts = Keyword.get(options, :parts, :infinity)
    index = parts_to_index(parts)
    trim = Keyword.get(options, :trim, false)

    if trim == false and index != 1 do
      ["" | split_empty(string, trim, index - 1)]
    else
      split_empty(string, trim, index)
    end
  end

  def split(string, pattern, options) when is_binary(string) and is_list(options) do
    parts = Keyword.get(options, :parts, :infinity)
    trim = Keyword.get(options, :trim, false)

    case {parts, trim} do
      {:infinity, false} ->
        :binary.split(string, pattern, [:global])

      _ ->
        pattern = maybe_compile_pattern(pattern)
        split_each(string, pattern, trim, parts_to_index(parts))
    end
  end

  defp parts_to_index(:infinity), do: 0
  defp parts_to_index(n) when is_integer(n) and n > 0, do: n

  defp split_empty("", true, 1), do: []
  defp split_empty(string, _, 1), do: [string]

  defp split_empty(string, trim, count) do
    case next_grapheme(string) do
      {h, t} -> [h | split_empty(t, trim, count - 1)]
      nil -> split_empty("", trim, 1)
    end
  end

  defp split_each("", _pattern, true, 1), do: []
  defp split_each(string, _pattern, _trim, 1) when is_binary(string), do: [string]

  defp split_each(string, pattern, trim, count) do
    case do_splitter(string, pattern, trim) do
      {h, t} -> [h | split_each(t, pattern, trim, count - 1)]
      nil -> []
    end
  end

  @doc """
  Returns an enumerable that splits a string on demand.

  This is in contrast to `split/3` which splits the
  entire string upfront.

  This function does not support regular expressions
  by design. When using regular expressions, it is often
  more efficient to have the regular expressions traverse
  the string at once rather than in parts, as this function does.

  ## Options

    * `:trim` - when `true`, does not emit empty strings

  ## Examples

      iex> String.splitter("1,2 3,4 5,6 7,8,...,99999", [" ", ","]) |> Enum.take(4)
      ["1", "2", "3", "4"]

      iex> String.splitter("abcd", "") |> Enum.take(10)
      ["", "a", "b", "c", "d", ""]

      iex> String.splitter("abcd", "", trim: true) |> Enum.take(10)
      ["a", "b", "c", "d"]

  A compiled pattern can also be given:

      iex> pattern = :binary.compile_pattern([" ", ","])
      iex> String.splitter("1,2 3,4 5,6 7,8,...,99999", pattern) |> Enum.take(4)
      ["1", "2", "3", "4"]

  """
  @spec splitter(t, pattern, keyword) :: Enumerable.t()
  def splitter(string, pattern, options \\ [])

  def splitter(string, "", options) when is_binary(string) and is_list(options) do
    if Keyword.get(options, :trim, false) do
      Stream.unfold(string, &next_grapheme/1)
    else
      Stream.unfold(:match, &do_empty_splitter(&1, string))
    end
  end

  def splitter(string, pattern, options) when is_binary(string) and is_list(options) do
    pattern = maybe_compile_pattern(pattern)
    trim = Keyword.get(options, :trim, false)
    Stream.unfold(string, &do_splitter(&1, pattern, trim))
  end

  defp do_empty_splitter(:match, string), do: {"", string}
  defp do_empty_splitter(:nomatch, _string), do: nil
  defp do_empty_splitter("", _), do: {"", :nomatch}
  defp do_empty_splitter(string, _), do: next_grapheme(string)

  defp do_splitter(:nomatch, _pattern, _), do: nil
  defp do_splitter("", _pattern, false), do: {"", :nomatch}
  defp do_splitter("", _pattern, true), do: nil

  defp do_splitter(bin, pattern, trim) do
    case :binary.split(bin, pattern) do
      ["", second] when trim -> do_splitter(second, pattern, trim)
      [first, second] -> {first, second}
      [first] -> {first, :nomatch}
    end
  end

  defp maybe_compile_pattern(pattern) when is_tuple(pattern), do: pattern
  defp maybe_compile_pattern(pattern), do: :binary.compile_pattern(pattern)

  @doc """
  Splits a string into two at the specified offset. When the given offset is
  negative, the position is counted from the end of the string.

  The offset is capped to the length of the string. Returns a tuple with
  two elements.

  Note: keep in mind this function splits on graphemes and for this reason
  it has to traverse the string linearly. If you want to split a string or
  a binary based on the number of bytes, use `Kernel.binary_part/3`
  instead.
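
  As a small illustration (the string is arbitrary), the same cut needs a
  grapheme offset here but a byte offset with `Kernel.binary_part/3`:

  ```elixir
  string = "héllo"

  # Grapheme offset: "é" counts as a single position
  String.split_at(string, 2)
  #=> {"hé", "llo"}

  # Byte offset: "é" takes two bytes in UTF-8, so the same cut is at byte 3
  binary_part(string, 0, 3)
  #=> "hé"
  ```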

  ## Examples

      iex> String.split_at("sweetelixir", 5)
      {"sweet", "elixir"}

      iex> String.split_at("sweetelixir", -6)
      {"sweet", "elixir"}

      iex> String.split_at("abc", 0)
      {"", "abc"}

      iex> String.split_at("abc", 1000)
      {"abc", ""}

      iex> String.split_at("abc", -1000)
      {"", "abc"}

  """
  @spec split_at(t, integer) :: {t, t}
  def split_at(string, position)

  def split_at(string, position)
      when is_binary(string) and is_integer(position) and position >= 0 do
    do_split_at(string, position)
  end

  def split_at(string, position)
      when is_binary(string) and is_integer(position) and position < 0 do
    position = length(string) + position

    case position >= 0 do
      true -> do_split_at(string, position)
      false -> {"", string}
    end
  end

  defp do_split_at(string, position) do
    {byte_size, rest} = String.Unicode.split_at(string, position)
    {binary_part(string, 0, byte_size), rest || ""}
  end

  @doc ~S"""
  Returns `true` if `string1` is canonically equivalent to `string2`.

  It performs Normalization Form Canonical Decomposition (NFD) on the
  strings before comparing them. This function is equivalent to:

      String.normalize(string1, :nfd) == String.normalize(string2, :nfd)

  If you plan to compare multiple strings, multiple times in a row, you
  may normalize them upfront and compare them directly to avoid multiple
  normalization passes.
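
  A minimal sketch of that approach, assuming a fixed set of known
  strings that incoming values are checked against repeatedly (the
  variable names are illustrative):

  ```elixir
  # Normalize the known strings once, up front
  known = MapSet.new(["ma\u00F1ana", "caf\u00E9"], &String.normalize(&1, :nfd))

  # Each check now normalizes only the incoming string
  String.normalize("man\u0303ana", :nfd) in known
  #=> true

  String.normalize("cafe\u0301", :nfd) in known
  #=> true
  ```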

  ## Examples

      iex> String.equivalent?("abc", "abc")
      true

      iex> String.equivalent?("man\u0303ana", "mañana")
      true

      iex> String.equivalent?("abc", "ABC")
      false

      iex> String.equivalent?("nø", "nó")
      false

  """
  @spec equivalent?(t, t) :: boolean
  def equivalent?(string1, string2) when is_binary(string1) and is_binary(string2) do
    normalize(string1, :nfd) == normalize(string2, :nfd)
  end

  @doc """
  Converts all characters in `string` to the Unicode normalization
  form identified by `form`.

  Invalid Unicode code points are left untouched and the remainder of
  the string is converted. If you want the algorithm to stop
  and return on an invalid code point, use `:unicode.characters_to_nfd_binary/1`,
  `:unicode.characters_to_nfc_binary/1`, `:unicode.characters_to_nfkd_binary/1`,
  and `:unicode.characters_to_nfkc_binary/1` instead.

  Normalization forms `:nfkc` and `:nfkd` should not be blindly applied
  to arbitrary text. Because they erase many formatting distinctions,
  they will prevent round-trip conversion to and from many legacy
  character sets.

  ## Forms

  The supported forms are:

    * `:nfd` - Normalization Form Canonical Decomposition.
      Characters are decomposed by canonical equivalence, and
      multiple combining characters are arranged in a specific
      order.

    * `:nfc` - Normalization Form Canonical Composition.
      Characters are decomposed and then recomposed by canonical equivalence.

    * `:nfkd` - Normalization Form Compatibility Decomposition.
      Characters are decomposed by compatibility equivalence, and
      multiple combining characters are arranged in a specific
      order.

    * `:nfkc` - Normalization Form Compatibility Composition.
      Characters are decomposed and then recomposed by compatibility equivalence.

  ## Examples

      iex> String.normalize("yêṩ", :nfd)
      "yêṩ"

      iex> String.normalize("leña", :nfc)
      "leña"

      iex> String.normalize("ﬁ", :nfkd)
      "fi"

      iex> String.normalize("ﬁ", :nfkc)
      "fi"

  """
  def normalize(string, form)

  def normalize(string, :nfd) when is_binary(string) do
    case :unicode.characters_to_nfd_binary(string) do
      string when is_binary(string) -> string
      {:error, good, <<head, rest::binary>>} -> good <> <<head>> <> normalize(rest, :nfd)
    end
  end

  def normalize(string, :nfc) when is_binary(string) do
    case :unicode.characters_to_nfc_binary(string) do
      string when is_binary(string) -> string
      {:error, good, <<head, rest::binary>>} -> good <> <<head>> <> normalize(rest, :nfc)
    end
  end

  def normalize(string, :nfkd) when is_binary(string) do
    case :unicode.characters_to_nfkd_binary(string) do
      string when is_binary(string) -> string
      {:error, good, <<head, rest::binary>>} -> good <> <<head>> <> normalize(rest, :nfkd)
    end
  end

  def normalize(string, :nfkc) when is_binary(string) do
    case :unicode.characters_to_nfkc_binary(string) do
      string when is_binary(string) -> string
      {:error, good, <<head, rest::binary>>} -> good <> <<head>> <> normalize(rest, :nfkc)
    end
  end

  @doc """
  Converts all characters in the given string to uppercase according to `mode`.

  `mode` may be `:default`, `:ascii`, `:greek` or `:turkic`. The `:default` mode considers
  all non-conditional transformations outlined in the Unicode standard. `:ascii`
  uppercases only the letters a to z. `:greek` includes the context sensitive
  mappings found in Greek. `:turkic` properly handles the letter i with the dotless variant.

  ## Examples

      iex> String.upcase("abcd")
      "ABCD"

      iex> String.upcase("ab 123 xpto")
      "AB 123 XPTO"

      iex> String.upcase("olá")
      "OLÁ"

  The `:ascii` mode ignores Unicode characters and provides a more
  performant implementation when you know the string contains only
  ASCII characters:

      iex> String.upcase("olá", :ascii)
      "OLá"

  And `:turkic` properly handles the letter i with the dotless variant:

      iex> String.upcase("ıi")
      "II"

      iex> String.upcase("ıi", :turkic)
      "Iİ"

  """
  @spec upcase(t, :default | :ascii | :greek | :turkic) :: t
  def upcase(string, mode \\ :default)

  def upcase("", _mode) do
    ""
  end

  def upcase(string, :default) when is_binary(string) do
    String.Casing.upcase(string, [], :default)
  end

  def upcase(string, :ascii) when is_binary(string) do
    IO.iodata_to_binary(upcase_ascii(string))
  end

  def upcase(string, mode) when is_binary(string) and mode in @conditional_mappings do
    String.Casing.upcase(string, [], mode)
  end

  defp upcase_ascii(<<char, rest::bits>>) when char >= ?a and char <= ?z,
    do: [char - 32 | upcase_ascii(rest)]

  defp upcase_ascii(<<char, rest::bits>>), do: [char | upcase_ascii(rest)]
  defp upcase_ascii(<<>>), do: []

  @doc """
  Converts all characters in the given string to lowercase according to `mode`.

  `mode` may be `:default`, `:ascii`, `:greek` or `:turkic`. The `:default` mode considers
  all non-conditional transformations outlined in the Unicode standard. `:ascii`
  lowercases only the letters A to Z. `:greek` includes the context sensitive
  mappings found in Greek. `:turkic` properly handles the letter i with the dotless variant.

  ## Examples

      iex> String.downcase("ABCD")
      "abcd"

      iex> String.downcase("AB 123 XPTO")
      "ab 123 xpto"

      iex> String.downcase("OLÁ")
      "olá"

  The `:ascii` mode ignores Unicode characters and provides a more
  performant implementation when you know the string contains only
  ASCII characters:

      iex> String.downcase("OLÁ", :ascii)
      "olÁ"

  The `:greek` mode properly handles the context sensitive sigma in Greek:

      iex> String.downcase("ΣΣ")
      "σσ"

      iex> String.downcase("ΣΣ", :greek)
      "σς"

  And `:turkic` properly handles the letter i with the dotless variant:

      iex> String.downcase("Iİ")
      "ii̇"

      iex> String.downcase("Iİ", :turkic)
      "ıi"

  """
  @spec downcase(t, :default | :ascii | :greek | :turkic) :: t
  def downcase(string, mode \\ :default)

  def downcase("", _mode) do
    ""
  end

  def downcase(string, :default) when is_binary(string) do
    String.Casing.downcase(string, [], :default)
  end

  def downcase(string, :ascii) when is_binary(string) do
    IO.iodata_to_binary(downcase_ascii(string))
  end

  def downcase(string, mode) when is_binary(string) and mode in @conditional_mappings do
    String.Casing.downcase(string, [], mode)
  end

  defp downcase_ascii(<<char, rest::bits>>) when char >= ?A and char <= ?Z,
    do: [char + 32 | downcase_ascii(rest)]

  defp downcase_ascii(<<char, rest::bits>>), do: [char | downcase_ascii(rest)]
  defp downcase_ascii(<<>>), do: []

  @doc """
  Converts the first character in the given string to
  uppercase and the remainder to lowercase according to `mode`.

  `mode` may be `:default`, `:ascii`, `:greek` or `:turkic`. The `:default` mode considers
  all non-conditional transformations outlined in the Unicode standard. `:ascii`
  capitalizes only the letters A to Z. `:greek` includes the context sensitive
  mappings found in Greek.
  `:turkic` properly handles the letter i with the dotless variant.

  ## Examples

      iex> String.capitalize("abcd")
      "Abcd"

      iex> String.capitalize("ﬁn")
      "Fin"

      iex> String.capitalize("olá")
      "Olá"

  """
  @spec capitalize(t, :default | :ascii | :greek | :turkic) :: t
  def capitalize(string, mode \\ :default)

  def capitalize(<<char, rest::binary>>, :ascii) do
    char = if char >= ?a and char <= ?z, do: char - 32, else: char
    <<char>> <> downcase(rest, :ascii)
  end

  def capitalize(string, mode) when is_binary(string) do
    {char, rest} = String.Casing.titlecase_once(string, mode)
    char <> downcase(rest, mode)
  end

  @doc false
  @deprecated "Use String.trim_trailing/1 instead"
  defdelegate rstrip(binary), to: String.Break, as: :trim_trailing

  @doc false
  @deprecated "Use String.trim_trailing/2 with a binary as second argument instead"
  def rstrip(string, char) when is_integer(char) do
    replace_trailing(string, <<char::utf8>>, "")
  end

  @doc """
  Replaces all leading occurrences of `match` by `replacement` in `string`.

  Returns the string untouched if there are no occurrences.

  If `match` is `""`, this function raises an `ArgumentError` exception: this
  happens because this function replaces **all** the occurrences of `match` at
  the beginning of `string`, and it's impossible to replace "multiple"
  occurrences of `""`.

  ## Examples

      iex> String.replace_leading("hello world", "hello ", "")
      "world"
      iex> String.replace_leading("hello hello world", "hello ", "")
      "world"

      iex> String.replace_leading("hello world", "hello ", "ola ")
      "ola world"
      iex> String.replace_leading("hello hello world", "hello ", "ola ")
      "ola ola world"

  """
  @spec replace_leading(t, t, t) :: t
  def replace_leading(string, match, replacement)
      when is_binary(string) and is_binary(match) and is_binary(replacement) do
    if match == "" do
      raise ArgumentError, "cannot use an empty string as the match to replace"
    end

    prefix_size = byte_size(match)
    suffix_size = byte_size(string) - prefix_size
    replace_leading(string, match, replacement, prefix_size, suffix_size, 0)
  end

  defp replace_leading(string, match, replacement, prefix_size, suffix_size, acc)
       when suffix_size >= 0 do
    case string do
      <<prefix::size(prefix_size)-binary, suffix::binary>> when prefix == match ->
        replace_leading(
          suffix,
          match,
          replacement,
          prefix_size,
          suffix_size - prefix_size,
          acc + 1
        )

      _ ->
        prepend_unless_empty(duplicate(replacement, acc), string)
    end
  end

  defp replace_leading(string, _match, replacement, _prefix_size, _suffix_size, acc) do
    prepend_unless_empty(duplicate(replacement, acc), string)
  end

  @doc """
  Replaces all trailing occurrences of `match` by `replacement` in `string`.

  Returns the string untouched if there are no occurrences.

  If `match` is `""`, this function raises an `ArgumentError` exception: this
  happens because this function replaces **all** the occurrences of `match` at
  the end of `string`, and it's impossible to replace "multiple" occurrences of
  `""`.

  ## Examples

      iex> String.replace_trailing("hello world", " world", "")
      "hello"
      iex> String.replace_trailing("hello world world", " world", "")
      "hello"

      iex> String.replace_trailing("hello world", " world", " mundo")
      "hello mundo"
      iex> String.replace_trailing("hello world world", " world", " mundo")
      "hello mundo mundo"

  """
  @spec replace_trailing(t, t, t) :: t
  def replace_trailing(string, match, replacement)
      when is_binary(string) and is_binary(match) and is_binary(replacement) do
    if match == "" do
      raise ArgumentError, "cannot use an empty string as the match to replace"
    end

    suffix_size = byte_size(match)
    prefix_size = byte_size(string) - suffix_size
    replace_trailing(string, match, replacement, prefix_size, suffix_size, 0)
  end

  defp replace_trailing(string, match, replacement, prefix_size, suffix_size, acc)
       when prefix_size >= 0 do
    case string do
      <<prefix::size(prefix_size)-binary, suffix::binary>> when suffix == match ->
        replace_trailing(
          prefix,
          match,
          replacement,
          prefix_size - suffix_size,
          suffix_size,
          acc + 1
        )

      _ ->
        append_unless_empty(string, duplicate(replacement, acc))
    end
  end

  defp replace_trailing(string, _match, replacement, _prefix_size, _suffix_size, acc) do
    append_unless_empty(string, duplicate(replacement, acc))
  end

  @doc """
  Replaces prefix in `string` by `replacement` if it matches `match`.

  Returns the string untouched if there is no match. If `match` is an empty
  string (`""`), `replacement` is just prepended to `string`.

  ## Examples

      iex> String.replace_prefix("world", "hello ", "")
      "world"
      iex> String.replace_prefix("hello world", "hello ", "")
      "world"
      iex> String.replace_prefix("hello hello world", "hello ", "")
      "hello world"

      iex> String.replace_prefix("world", "hello ", "ola ")
      "world"
      iex> String.replace_prefix("hello world", "hello ", "ola ")
      "ola world"
      iex> String.replace_prefix("hello hello world", "hello ", "ola ")
      "ola hello world"

      iex> String.replace_prefix("world", "", "hello ")
      "hello world"

  """
  @spec replace_prefix(t, t, t) :: t
  def replace_prefix(string, match, replacement)
      when is_binary(string) and is_binary(match) and is_binary(replacement) do
    prefix_size = byte_size(match)

    case string do
      <<prefix::size(prefix_size)-binary, suffix::binary>> when prefix == match ->
        prepend_unless_empty(replacement, suffix)

      _ ->
        string
    end
  end

  @doc """
  Replaces suffix in `string` by `replacement` if it matches `match`.

  Returns the string untouched if there is no match. If `match` is an empty
  string (`""`), `replacement` is just appended to `string`.

  ## Examples

      iex> String.replace_suffix("hello", " world", "")
      "hello"
      iex> String.replace_suffix("hello world", " world", "")
      "hello"
      iex> String.replace_suffix("hello world world", " world", "")
      "hello world"

      iex> String.replace_suffix("hello", " world", " mundo")
      "hello"
      iex> String.replace_suffix("hello world", " world", " mundo")
      "hello mundo"
      iex> String.replace_suffix("hello world world", " world", " mundo")
      "hello world mundo"

      iex> String.replace_suffix("hello", "", " world")
      "hello world"

  """
  @spec replace_suffix(t, t, t) :: t
  def replace_suffix(string, match, replacement)
      when is_binary(string) and is_binary(match) and is_binary(replacement) do
    suffix_size = byte_size(match)
    prefix_size = byte_size(string) - suffix_size

    case string do
      <<prefix::size(prefix_size)-binary, suffix::binary>> when suffix == match ->
        append_unless_empty(prefix, replacement)

      _ ->
        string
    end
  end

  @compile {:inline, prepend_unless_empty: 2, append_unless_empty: 2}

  defp prepend_unless_empty("", suffix), do: suffix
  defp prepend_unless_empty(prefix, suffix), do: prefix <> suffix

  defp append_unless_empty(prefix, ""), do: prefix
  defp append_unless_empty(prefix, suffix), do: prefix <> suffix

  @doc false
  @deprecated "Use String.trim_leading/1 instead"
  defdelegate lstrip(binary), to: String.Break, as: :trim_leading

  @doc false
  @deprecated "Use String.trim_leading/2 with a binary as second argument instead"
  def lstrip(string, char) when is_integer(char) do
    replace_leading(string, <<char::utf8>>, "")
  end

  @doc false
  @deprecated "Use String.trim/1 instead"
  def strip(string) do
    trim(string)
  end

  @doc false
  @deprecated "Use String.trim/2 with a binary second argument instead"
  def strip(string, char) do
    trim(string, <<char::utf8>>)
  end

  @doc ~S"""
  Returns a string where all leading Unicode whitespaces
  have been removed.

  ## Examples

      iex> String.trim_leading("\n abc ")
      "abc "

  """
  @spec trim_leading(t) :: t
  defdelegate trim_leading(string), to: String.Break

  @doc """
  Returns a string where all leading `to_trim` characters have been removed.

  ## Examples

      iex> String.trim_leading("__ abc _", "_")
      " abc _"

      iex> String.trim_leading("1 abc", "11")
      "1 abc"

  """
  @spec trim_leading(t, t) :: t
  def trim_leading(string, to_trim)
      when is_binary(string) and is_binary(to_trim) do
    replace_leading(string, to_trim, "")
  end

  @doc ~S"""
  Returns a string where all trailing Unicode whitespaces
  have been removed.

  ## Examples

      iex> String.trim_trailing(" abc\n ")
      " abc"

  """
  @spec trim_trailing(t) :: t
  defdelegate trim_trailing(string), to: String.Break

  @doc """
  Returns a string where all trailing `to_trim` characters have been removed.

  ## Examples

      iex> String.trim_trailing("_ abc __", "_")
      "_ abc "

      iex> String.trim_trailing("abc 1", "11")
      "abc 1"

  """
  @spec trim_trailing(t, t) :: t
  def trim_trailing(string, to_trim)
      when is_binary(string) and is_binary(to_trim) do
    replace_trailing(string, to_trim, "")
  end

  @doc ~S"""
  Returns a string where all leading and trailing Unicode whitespaces
  have been removed.

  ## Examples

      iex> String.trim("\n abc\n ")
      "abc"

  """
  @spec trim(t) :: t
  def trim(string) when is_binary(string) do
    string
    |> trim_leading()
    |> trim_trailing()
  end

  @doc """
  Returns a string where all leading and trailing `to_trim` characters have been
  removed.
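
  Note that `to_trim` is matched as a whole string, not as a set of characters.
  A small sketch, assuming the two-character `to_trim` value `"ab"`:

  ```elixir
  # "ab" is removed only where that exact sequence appears at either end
  trimmed = String.trim("abxba", "ab")
  # trimmed is "xba": the trailing "ba" is not the sequence "ab"
  ```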

  ## Examples

      iex> String.trim("a abc a", "a")
      " abc "

  """
  @spec trim(t, t) :: t
  def trim(string, to_trim) when is_binary(string) and is_binary(to_trim) do
    string
    |> trim_leading(to_trim)
    |> trim_trailing(to_trim)
  end

  @doc ~S"""
  Returns a new string padded with a leading filler
  which is made of elements from the `padding`.

  Passing a list of strings as `padding` will take one element of the list
  for every missing entry. If the list is shorter than the number of inserts,
  the filling will start again from the beginning of the list.
  Passing a string `padding` is equivalent to passing the list of graphemes in it.
  If no `padding` is given, it defaults to whitespace.

  When `count` is less than or equal to the length of `string`,
  given `string` is returned.

  Raises `ArgumentError` if the given `padding` contains a non-string element.

  ## Examples

      iex> String.pad_leading("abc", 5)
      "  abc"

      iex> String.pad_leading("abc", 4, "12")
      "1abc"

      iex> String.pad_leading("abc", 6, "12")
      "121abc"

      iex> String.pad_leading("abc", 5, ["1", "23"])
      "123abc"

  """
  @spec pad_leading(t, non_neg_integer, t | [t]) :: t
  def pad_leading(string, count, padding \\ [" "])

  def pad_leading(string, count, padding) when is_binary(padding) do
    pad_leading(string, count, graphemes(padding))
  end

  def pad_leading(string, count, [_ | _] = padding)
      when is_binary(string) and is_integer(count) and count >= 0 do
    pad(:leading, string, count, padding)
  end

  @doc ~S"""
  Returns a new string padded with a trailing filler
  which is made of elements from the `padding`.

  Passing a list of strings as `padding` will take one element of the list
  for every missing entry. If the list is shorter than the number of inserts,
  the filling will start again from the beginning of the list.
  Passing a string `padding` is equivalent to passing the list of graphemes in it.
  If no `padding` is given, it defaults to whitespace.

  When `count` is less than or equal to the length of `string`,
  given `string` is returned.

  Raises `ArgumentError` if the given `padding` contains a non-string element.

  ## Examples

      iex> String.pad_trailing("abc", 5)
      "abc  "

      iex> String.pad_trailing("abc", 4, "12")
      "abc1"

      iex> String.pad_trailing("abc", 6, "12")
      "abc121"

      iex> String.pad_trailing("abc", 5, ["1", "23"])
      "abc123"

  """
  @spec pad_trailing(t, non_neg_integer, t | [t]) :: t
  def pad_trailing(string, count, padding \\ [" "])

  def pad_trailing(string, count, padding) when is_binary(padding) do
    pad_trailing(string, count, graphemes(padding))
  end

  def pad_trailing(string, count, [_ | _] = padding)
      when is_binary(string) and is_integer(count) and count >= 0 do
    pad(:trailing, string, count, padding)
  end

  defp pad(kind, string, count, padding) do
    string_length = length(string)

    if string_length >= count do
      string
    else
      filler = build_filler(count - string_length, padding, padding, 0, [])

      case kind do
        :leading -> [filler | string]
        :trailing -> [string | filler]
      end
      |> IO.iodata_to_binary()
    end
  end

  defp build_filler(0, _source, _padding, _size, filler), do: filler

  defp build_filler(count, source, [], size, filler) do
    rem_filler =
      rem(count, size)
      |> build_filler(source, source, 0, [])

    filler =
      filler
      |> IO.iodata_to_binary()
      |> duplicate(div(count, size) + 1)

    [filler | rem_filler]
  end

  defp build_filler(count, source, [elem | rest], size, filler)
       when is_binary(elem) do
    build_filler(count - 1, source, rest, size + 1, [filler | elem])
  end

  defp build_filler(_count, _source, [elem | _rest], _size, _filler) do
    raise ArgumentError, "expected a string padding element, got: #{inspect(elem)}"
  end

  @doc false
  @deprecated "Use String.pad_leading/2 instead"
  def rjust(subject, length) do
    rjust(subject, length, ?\s)
  end

  @doc false
  @deprecated "Use String.pad_leading/3 with a binary padding instead"
  def rjust(subject, length, pad) when is_integer(pad) and is_integer(length) and length >= 0 do
    pad(:leading, subject, length, [<<pad::utf8>>])
  end

  @doc false
  @deprecated "Use String.pad_trailing/2 instead"
  def ljust(subject, length) do
    ljust(subject, length, ?\s)
  end

  @doc false
  @deprecated "Use String.pad_trailing/3 with a binary padding instead"
  def ljust(subject, length, pad) when is_integer(pad) and is_integer(length) and length >= 0 do
    pad(:trailing, subject, length, [<<pad::utf8>>])
  end

  @doc ~S"""
  Returns a new string created by replacing occurrences of `pattern` in
  `subject` with `replacement`.

  The `subject` is always a string.

  The `pattern` may be a string, a list of strings, a regular expression, or a
  compiled pattern.

  The `replacement` may be a string or a function that receives the matched
  pattern and must return the replacement as a string or iodata.

  By default it replaces all occurrences, but this behaviour can be controlled
  through the `:global` option; see the "Options" section below.

  ## Options

    * `:global` - (boolean) if `true`, all occurrences of `pattern` are replaced
      with `replacement`, otherwise only the first occurrence is
      replaced. Defaults to `true`.

  ## Examples

      iex> String.replace("a,b,c", ",", "-")
      "a-b-c"

      iex> String.replace("a,b,c", ",", "-", global: false)
      "a-b,c"

  The pattern may also be a list of strings and the replacement may also
  be a function that receives the matches:

      iex> String.replace("a,b,c", ["a", "c"], fn <<char>> -> <<char + 1>> end)
      "b,b,d"

  When the pattern is a regular expression, one can give `\N` or
  `\g{N}` in the `replacement` string to access a specific capture in the
  regular expression:

      iex> String.replace("a,b,c", ~r/,(.)/, ",\\1\\g{1}")
      "a,bb,cc"

  Note that we had to escape the backslash escape character (i.e., we used `\\N`
  instead of just `\N` to escape the backslash; same thing for `\\g{N}`). By
  giving `\0`, one can inject the whole match in the replacement string.

  A compiled pattern can also be given:

      iex> pattern = :binary.compile_pattern(",")
      iex> String.replace("a,b,c", pattern, "[]")
      "a[]b[]c"

  When an empty string is provided as a `pattern`, the function will treat it as
  an implicit empty string between each grapheme and the string will be
  interspersed. If an empty string is provided as `replacement`, the `subject`
  will be returned:

      iex> String.replace("ELIXIR", "", ".")
      ".E.L.I.X.I.R."

      iex> String.replace("ELIXIR", "", "")
      "ELIXIR"

  """
  @spec replace(t, pattern | Regex.t(), t | (t -> t | iodata), keyword) :: t
  def replace(subject, pattern, replacement, options \\ [])
      when is_binary(subject) and
             (is_binary(replacement) or is_function(replacement, 1)) and
             is_list(options) do
    replace_guarded(subject, pattern, replacement, options)
  end

  defp replace_guarded(subject, %{__struct__: Regex} = regex, replacement, options) do
    Regex.replace(regex, subject, replacement, options)
  end

  defp replace_guarded(subject, "", "", _) do
    subject
  end

  defp replace_guarded(subject, "", replacement_binary, options)
       when is_binary(replacement_binary) do
    if Keyword.get(options, :global, true) do
      IO.iodata_to_binary([replacement_binary | intersperse_bin(subject, replacement_binary)])
    else
      replacement_binary <> subject
    end
  end

  defp replace_guarded(subject, "", replacement_fun, options) do
    if Keyword.get(options, :global, true) do
      IO.iodata_to_binary([replacement_fun.("") | intersperse_fun(subject, replacement_fun)])
    else
      IO.iodata_to_binary([replacement_fun.("") | subject])
    end
  end

  defp replace_guarded(subject, pattern, replacement, options) do
    if insert = Keyword.get(options, :insert_replaced) do
      IO.warn(
        "String.replace/4 with :insert_replaced option is deprecated. " <>
          "Please use :binary.replace/4 instead or pass an anonymous function as replacement"
      )

      binary_options = if Keyword.get(options, :global) != false, do: [:global], else: []
      :binary.replace(subject, pattern, replacement, [insert_replaced: insert] ++ binary_options)
    else
      matches =
        if Keyword.get(options, :global, true) do
          :binary.matches(subject, pattern)
        else
          case :binary.match(subject, pattern) do
            :nomatch -> []
            match -> [match]
          end
        end

      IO.iodata_to_binary(do_replace(subject, matches, replacement, 0))
    end
  end

  defp intersperse_bin(subject, replacement) do
    case next_grapheme(subject) do
      {current, rest} -> [current, replacement | intersperse_bin(rest, replacement)]
      nil -> []
    end
  end

  defp intersperse_fun(subject, replacement) do
    case next_grapheme(subject) do
      {current, rest} -> [current, replacement.("") | intersperse_fun(rest, replacement)]
      nil -> []
    end
  end

  defp do_replace(subject, [], _, n) do
    [binary_part(subject, n, byte_size(subject) - n)]
  end

  defp do_replace(subject, [{start, length} | matches], replacement, n) do
    prefix = binary_part(subject, n, start - n)

    middle =
      if is_binary(replacement) do
        replacement
      else
        replacement.(binary_part(subject, start, length))
      end

    [prefix, middle | do_replace(subject, matches, replacement, start + length)]
  end

  @doc ~S"""
  Reverses the graphemes in the given string.
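
  Because reversal works on graphemes, combining marks stay attached to their
  base character. A sketch contrasting this with a naive code point reversal
  built from `codepoints/1` and `Enum.reverse/1`:

  ```elixir
  # "e" followed by U+0301 (combining acute accent) forms a single grapheme
  string = "e\u0301x"

  graphemes_reversed = String.reverse(string)
  codepoints_reversed = string |> String.codepoints() |> Enum.reverse() |> Enum.join()

  # graphemes_reversed is "xe\u0301": the accent still follows the "e"
  # codepoints_reversed is "x\u0301e": the accent now attaches to the "x"
  ```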

  ## Examples

      iex> String.reverse("abcd")
      "dcba"

      iex> String.reverse("hello world")
      "dlrow olleh"

      iex> String.reverse("hello ∂og")
      "go∂ olleh"

  Keep in mind reversing the same string twice does
  not necessarily yield the original string:

      iex> "̀e"
      "̀e"
      iex> String.reverse("̀e")
      "è"
      iex> String.reverse(String.reverse("̀e"))
      "è"

  In the first example the accent is before the vowel, so
  it is considered two graphemes. However, when you reverse
  it once, you have the vowel followed by the accent, which
  becomes one grapheme. Reversing it again will keep it as
  one single grapheme.
  """
  @spec reverse(t) :: t
  def reverse(string) when is_binary(string) do
    do_reverse(next_grapheme(string), [])
  end

  defp do_reverse({grapheme, rest}, acc) do
    do_reverse(next_grapheme(rest), [grapheme | acc])
  end

  defp do_reverse(nil, acc), do: IO.iodata_to_binary(acc)

  @compile {:inline, duplicate: 2}

  @doc """
  Returns a string `subject` repeated `n` times.

  Inlined by the compiler.

  ## Examples

      iex> String.duplicate("abc", 0)
      ""

      iex> String.duplicate("abc", 1)
      "abc"

      iex> String.duplicate("abc", 2)
      "abcabc"

  """
  @spec duplicate(t, non_neg_integer) :: t
  def duplicate(subject, n) when is_binary(subject) and is_integer(n) and n >= 0 do
    :binary.copy(subject, n)
  end

  @doc ~S"""
  Returns a list of code points encoded as strings.

  To retrieve code points in their natural integer
  representation, see `to_charlist/1`. For details about
  code points and graphemes, see the `String` module
  documentation.
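
  A short sketch comparing the string form returned here with the integer form
  returned by `to_charlist/1`:

  ```elixir
  # Same code points, once as one-code-point strings and once as integers
  as_strings = String.codepoints("olá")
  as_integers = String.to_charlist("olá")
  # as_strings is ["o", "l", "á"] and as_integers is [111, 108, 225]
  ```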

  ## Examples

      iex> String.codepoints("olá")
      ["o", "l", "á"]

      iex> String.codepoints("оптими зации")
      ["о", "п", "т", "и", "м", "и", " ", "з", "а", "ц", "и", "и"]

      iex> String.codepoints("ἅἪῼ")
      ["ἅ", "Ἢ", "ῼ"]

      iex> String.codepoints("\u00e9")
      ["é"]

      iex> String.codepoints("\u0065\u0301")
      ["e", "́"]

  """
  @spec codepoints(t) :: [codepoint]
  defdelegate codepoints(string), to: String.Unicode

  @doc ~S"""
  Returns the next code point in a string.

  The result is a tuple with the code point and the
  remainder of the string or `nil` in case
  the string reached its end.

  As with other functions in the `String` module, `next_codepoint/1`
  works with binaries that are invalid UTF-8. If the string starts
  with a sequence of bytes that is not valid in UTF-8 encoding, the
  first element of the returned tuple is a binary with the first byte.

  ## Examples

      iex> String.next_codepoint("olá")
      {"o", "lá"}

      iex> invalid = "\x80\x80OK" # first two bytes are invalid in UTF-8
      iex> {_, rest} = String.next_codepoint(invalid)
      {<<128>>, <<128, 79, 75>>}
      iex> String.next_codepoint(rest)
      {<<128>>, "OK"}

  ## Comparison with binary pattern matching

  Binary pattern matching provides a similar way to decompose
  a string:

      iex> <<codepoint::utf8, rest::binary>> = "Elixir"
      "Elixir"
      iex> codepoint
      69
      iex> rest
      "lixir"

  though not entirely equivalent because `codepoint` comes as
  an integer, and the pattern won't match invalid UTF-8.

  Binary pattern matching, however, is simpler and more efficient,
  so pick the option that better suits your use case.
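
  Because `next_codepoint/1` returns either a `{codepoint, rest}` tuple or `nil`,
  it can be passed to `Stream.unfold/2`. A sketch that walks a string starting
  with an invalid byte, which a `<<codepoint::utf8, rest::binary>>` pattern
  would not match:

  ```elixir
  # <<0x80>> is not valid UTF-8, so it comes back as a single-byte binary
  invalid = <<0x80, "ok">>

  codepoints =
    invalid
    |> Stream.unfold(&String.next_codepoint/1)
    |> Enum.to_list()

  # codepoints is [<<128>>, "o", "k"]
  ```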
  """
  @compile {:inline, next_codepoint: 1}
  @spec next_codepoint(t) :: {codepoint, t} | nil
  defdelegate next_codepoint(string), to: String.Unicode

  @doc ~S"""
  Checks whether `string` contains only valid characters.

  ## Examples

      iex> String.valid?("a")
      true

      iex> String.valid?("ø")
      true

      iex> String.valid?(<<0xFFFF::16>>)
      false

      iex> String.valid?(<<0xEF, 0xB7, 0x90>>)
      true

      iex> String.valid?("asd" <> <<0xFFFF::16>>)
      false

  """
  @spec valid?(t) :: boolean
  def valid?(<<string::binary>>), do: valid_utf8?(string)
  def valid?(_), do: false

  defp valid_utf8?(<<_::utf8, rest::bits>>), do: valid_utf8?(rest)
  defp valid_utf8?(<<>>), do: true
  defp valid_utf8?(_), do: false

  @doc false
  @deprecated "Use String.valid?/1 instead"
  def valid_character?(string) do
    case string do
      <<_::utf8>> -> valid?(string)
      _ -> false
    end
  end

  @doc ~S"""
  Splits the string into chunks of characters that share a common trait.

  The trait can be one of two options:

    * `:valid` - the string is split into chunks of valid and invalid
      character sequences

    * `:printable` - the string is split into chunks of printable and
      non-printable character sequences

  Returns a list of binaries, each of which contains only one kind of
  character.

  If the given string is empty, an empty list is returned.
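
  As a sketch of one possible use, `:printable` chunks can be filtered with
  `printable?/1` to drop non-printable characters (here a NUL byte):

  ```elixir
  string = "abc" <> <<0>> <> "def"
  chunks = String.chunk(string, :printable)

  # Keep only the printable chunks and glue them back together
  cleaned =
    chunks
    |> Enum.filter(&String.printable?/1)
    |> Enum.join()

  # chunks is ["abc", <<0>>, "def"] and cleaned is "abcdef"
  ```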

  ## Examples

      iex> String.chunk(<<?a, ?b, ?c, 0>>, :valid)
      ["abc\0"]

      iex> String.chunk(<<?a, ?b, ?c, 0, 0xFFFF::utf16>>, :valid)
      ["abc\0", <<0xFFFF::utf16>>]

      iex> String.chunk(<<?a, ?b, ?c, 0, 0x0FFFF::utf8>>, :printable)
      ["abc", <<0, 0x0FFFF::utf8>>]

  """
  @spec chunk(t, :valid | :printable) :: [t]

  def chunk(string, trait)

  def chunk("", _), do: []

  def chunk(string, trait) when is_binary(string) and trait in [:valid, :printable] do
    {cp, _} = next_codepoint(string)
    pred_fn = make_chunk_pred(trait)
    do_chunk(string, pred_fn.(cp), pred_fn)
  end

  defp do_chunk(string, flag, pred_fn), do: do_chunk(string, [], <<>>, flag, pred_fn)

  defp do_chunk(<<>>, acc, <<>>, _, _), do: Enum.reverse(acc)

  defp do_chunk(<<>>, acc, chunk, _, _), do: Enum.reverse(acc, [chunk])

  defp do_chunk(string, acc, chunk, flag, pred_fn) do
    {cp, rest} = next_codepoint(string)

    if pred_fn.(cp) != flag do
      do_chunk(rest, [chunk | acc], cp, not flag, pred_fn)
    else
      do_chunk(rest, acc, chunk <> cp, flag, pred_fn)
    end
  end

  defp make_chunk_pred(:valid), do: &valid?/1
  defp make_chunk_pred(:printable), do: &printable?/1

  @doc ~S"""
  Returns Unicode graphemes in the string as per the Extended Grapheme
  Cluster algorithm.

  The algorithm is outlined in the [Unicode Standard Annex #29,
  Unicode Text Segmentation](https://www.unicode.org/reports/tr29/).

  For details about code points and graphemes, see the `String` module documentation.
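
  A flag emoji shows the difference clearly. It is a single grapheme made of two
  regional indicator code points, each encoded in four bytes. A sketch using the
  United States flag:

  ```elixir
  flag = "\u{1F1FA}\u{1F1F8}"

  graphemes = String.graphemes(flag)
  codepoints = String.codepoints(flag)
  # graphemes has 1 element, codepoints has 2, and byte_size(flag) is 8
  ```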

  ## Examples

      iex> String.graphemes("Ńaïve")
      ["Ń", "a", "ï", "v", "e"]

      iex> String.graphemes("\u00e9")
      ["é"]

      iex> String.graphemes("\u0065\u0301")
      ["é"]

  """
  @spec graphemes(t) :: [grapheme]
  defdelegate graphemes(string), to: String.Unicode

  @compile {:inline, next_grapheme: 1, next_grapheme_size: 1}

  @doc """
  Returns the next grapheme in a string.

  The result is a tuple with the grapheme and the
  remainder of the string or `nil` in case
  the string reached its end.

  ## Examples

      iex> String.next_grapheme("olá")
      {"o", "lá"}

      iex> String.next_grapheme("")
      nil

  """
  @spec next_grapheme(t) :: {grapheme, t} | nil
  def next_grapheme(binary) when is_binary(binary) do
    case next_grapheme_size(binary) do
      {size, rest} -> {binary_part(binary, 0, size), rest}
      nil -> nil
    end
  end

  @doc """
  Returns the size (in bytes) of the next grapheme.

  The result is a tuple with the next grapheme size in bytes and
  the remainder of the string or `nil` in case the string
  reached its end.

  ## Examples

      iex> String.next_grapheme_size("olá")
      {1, "lá"}

      iex> String.next_grapheme_size("")
      nil

  """
  @spec next_grapheme_size(t) :: {pos_integer, t} | nil
  defdelegate next_grapheme_size(string), to: String.Unicode

  @doc """
  Returns the first grapheme from a UTF-8 string,
  `nil` if the string is empty.
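
  Since an empty string yields `nil`, callers may need to discard those results.
  A sketch that builds initials from a hypothetical list of names:

  ```elixir
  names = ["José", "", "Valim"]

  # String.first("") returns nil, so those entries are rejected before joining
  initials =
    names
    |> Enum.map(&String.first/1)
    |> Enum.reject(&(&1 == nil))
    |> Enum.join()

  # initials is "JV"
  ```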

  ## Examples

      iex> String.first("elixir")
      "e"

      iex> String.first("եոգլի")
      "ե"

      iex> String.first("")
      nil

  """
  @spec first(t) :: grapheme | nil
  def first(string) when is_binary(string) do
    case next_grapheme(string) do
      {char, _} -> char
      nil -> nil
    end
  end

  @doc """
  Returns the last grapheme from a UTF-8 string,
  `nil` if the string is empty.

  ## Examples

      iex> String.last("elixir")
      "r"

      iex> String.last("եոգլի")
      "ի"

  """
  @spec last(t) :: grapheme | nil
  def last(string) when is_binary(string) do
    do_last(next_grapheme(string), nil)
  end

  defp do_last({char, rest}, _) do
    do_last(next_grapheme(rest), char)
  end

  defp do_last(nil, last_char), do: last_char

  @doc """
  Returns the number of Unicode graphemes in a UTF-8 string.

  ## Examples

      iex> String.length("elixir")
      6

      iex> String.length("եոգլի")
      5

  """
  @spec length(t) :: non_neg_integer
  defdelegate length(string), to: String.Unicode

  @doc """
  Returns the grapheme at the `position` of the given UTF-8 `string`.
  If `position` is greater than or equal to the `string` length, then it returns `nil`.
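
  Positions count graphemes, not bytes. A sketch with a string where "á"
  occupies two bytes:

  ```elixir
  grapheme = String.at("olá!", 3)
  byte = binary_part("olá!", 3, 1)
  # grapheme is "!" while byte is <<161>>, the second byte of "á"
  ```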

  ## Examples

      iex> String.at("elixir", 0)
      "e"

      iex> String.at("elixir", 1)
      "l"

      iex> String.at("elixir", 10)
      nil

      iex> String.at("elixir", -1)
      "r"

      iex> String.at("elixir", -10)
      nil

  """
  @spec at(t, integer) :: grapheme | nil

  def at(string, position) when is_binary(string) and is_integer(position) and position >= 0 do
    do_at(string, position)
  end

  def at(string, position) when is_binary(string) and is_integer(position) and position < 0 do
    position = length(string) + position

    case position >= 0 do
      true -> do_at(string, position)
      false -> nil
    end
  end

  defp do_at(string, position) do
    case String.Unicode.split_at(string, position) do
      {_, nil} -> nil
      {_, rest} -> first(rest)
    end
  end

  @doc """
  Returns a substring starting at the offset `start`, and of the given `length`.

  If the offset is greater than or equal to the string length, then it returns `""`.

  Remember this function works with Unicode graphemes and considers
  the slices to represent grapheme offsets. If you want to split
  on raw bytes, check `Kernel.binary_part/3` instead.
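
  As a sketch of the difference, the first three graphemes of a string may span
  more than three bytes:

  ```elixir
  sliced = String.slice("olá mundo", 0, 3)
  # sliced is "olá", yet byte_size(sliced) is 4 because "á" takes two bytes
  ```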

  ## Examples

      iex> String.slice("elixir", 1, 3)
      "lix"

      iex> String.slice("elixir", 1, 10)
      "lixir"

      iex> String.slice("elixir", 10, 3)
      ""

      iex> String.slice("elixir", -4, 4)
      "ixir"

      iex> String.slice("elixir", -10, 3)
      ""

      iex> String.slice("a", 0, 1500)
      "a"

      iex> String.slice("a", 1, 1500)
      ""

      iex> String.slice("a", 2, 1500)
      ""

  """
  @spec slice(t, integer, non_neg_integer) :: grapheme

  def slice(_, _, 0) do
    ""
  end

  def slice(string, start, length)
      when is_binary(string) and is_integer(start) and is_integer(length) and start >= 0 and
             length >= 0 do
    case String.Unicode.split_at(string, start) do
      {_, nil} ->
        ""

      {start_bytes, rest} ->
        {len_bytes, _} = String.Unicode.split_at(rest, length)
        binary_part(string, start_bytes, len_bytes)
    end
  end

  def slice(string, start, length)
      when is_binary(string) and is_integer(start) and is_integer(length) and start < 0 and
             length >= 0 do
    start = length(string) + start

    case start >= 0 do
      true -> slice(string, start, length)
      false -> ""
    end
  end

  @doc """
  Returns a substring from the offset given by the start of the
  range to the offset given by the end of the range.

  If the start of the range is not a valid offset for the given
  string or if the range is in reverse order, returns `""`.

  If the start or end of the range is negative, the whole string
  is traversed first in order to convert the negative indices into
  positive ones.

  Remember this function works with Unicode graphemes and considers
  the slices to represent grapheme offsets. If you want to split
  on raw bytes, check `Kernel.binary_part/3` instead.

  ## Examples

      iex> String.slice("elixir", 1..3)
      "lix"

      iex> String.slice("elixir", 1..10)
      "lixir"

      iex> String.slice("elixir", -4..-1)
      "ixir"

      iex> String.slice("elixir", -4..6)
      "ixir"

  For ranges where `start > stop`, you need to explicitly
  mark them as increasing:

      iex> String.slice("elixir", 2..-1//1)
      "ixir"

      iex> String.slice("elixir", 1..-2//1)
      "lixi"

  If values are out of bounds, it returns an empty string:

      iex> String.slice("elixir", 10..3)
      ""

      iex> String.slice("elixir", -10..-7)
      ""

      iex> String.slice("a", 0..1500)
      "a"

      iex> String.slice("a", 1..1500)
      ""

  """
  @spec slice(t, Range.t()) :: t
  def slice(string, first..last//step = range) when is_binary(string) do
    # TODO: Deprecate negative steps on Elixir v1.16
    # TODO: There are two features we can add to slicing ranges:
    # 1. We can allow the step to be any positive number
    # 2. We can allow slice and reverse at the same time. However, we can't
    # implement so right now. First we will have to raise if a decreasing
    # range is given on Elixir v2.0.
    if step == 1 or (step == -1 and first > last) do
      slice_range(string, first, last)
    else
      raise ArgumentError,
            "String.slice/2 does not accept ranges with custom steps, got: #{inspect(range)}"
    end
  end

  # TODO: Remove me on v2.0
  def slice(string, %{__struct__: Range, first: first, last: last} = range) do
    step = if first <= last, do: 1, else: -1
    slice(string, Map.put(range, :step, step))
  end

  defp slice_range("", _, _), do: ""

  defp slice_range(string, first, -1) when first >= 0 do
    case String.Unicode.split_at(string, first) do
      {_, nil} -> ""
      {start_bytes, _} -> binary_part(string, start_bytes, byte_size(string) - start_bytes)
    end
  end

  defp slice_range(string, first, last) when first >= 0 and last >= 0 do
    if last >= first do
      slice(string, first, last - first + 1)
    else
      ""
    end
  end

  defp slice_range(string, first, last) do
    {bytes, length} = acc_bytes(next_grapheme_size(string), [], 0)
    first = add_if_negative(first, length)
    last = add_if_negative(last, length)

    if first < 0 or first > last or first > length do
      ""
    else
      last = min(last + 1, length)
      bytes = Enum.drop(bytes, length - last)
      first = last - first
      {length_bytes, start_bytes} = split_bytes(bytes, 0, first)
      binary_part(string, start_bytes, length_bytes)
    end
  end

  defp acc_bytes({size, rest}, bytes, length) do
    acc_bytes(next_grapheme_size(rest), [size | bytes], length + 1)
  end

  defp acc_bytes(nil, bytes, length) do
    {bytes, length}
  end

  defp add_if_negative(value, to_add) when value < 0, do: value + to_add
  defp add_if_negative(value, _to_add), do: value

  defp split_bytes(rest, acc, 0), do: {acc, Enum.sum(rest)}
  defp split_bytes([], acc, _), do: {acc, 0}
  defp split_bytes([head | tail], acc, count), do: split_bytes(tail, head + acc, count - 1)

  @doc """
  Returns `true` if `string` starts with any of the prefixes given.

  `prefix` can be either a string, a list of strings, or a compiled
  pattern.

  ## Examples

      iex> String.starts_with?("elixir", "eli")
      true
      iex> String.starts_with?("elixir", ["erlang", "elixir"])
      true
      iex> String.starts_with?("elixir", ["erlang", "ruby"])
      false

  A compiled pattern can also be given:

      iex> pattern = :binary.compile_pattern(["erlang", "elixir"])
      iex> String.starts_with?("elixir", pattern)
      true

  An empty string will always match:

      iex> String.starts_with?("elixir", "")
      true
      iex> String.starts_with?("elixir", ["", "other"])
      true

  """
  @spec starts_with?(t, pattern) :: boolean
  def starts_with?(string, prefix) when is_binary(string) and is_binary(prefix) do
    starts_with_string?(string, byte_size(string), prefix)
  end

  def starts_with?(string, prefix) when is_binary(string) and is_list(prefix) do
    string_size = byte_size(string)
    Enum.any?(prefix, &starts_with_string?(string, string_size, &1))
  end

  def starts_with?(string, prefix) when is_binary(string) do
    Kernel.match?({0, _}, :binary.match(string, prefix))
  end

  @compile {:inline, starts_with_string?: 3}
  defp starts_with_string?(string, string_size, prefix) when is_binary(prefix) do
    prefix_size = byte_size(prefix)

    if prefix_size <= string_size do
      prefix == binary_part(string, 0, prefix_size)
    else
      false
    end
  end

  @doc """
  Returns `true` if `string` ends with any of the suffixes given.

  `suffix` can be either a single suffix or a list of suffixes.
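
  Passing a list behaves like testing each suffix separately and
  returning `true` as soon as one of them matches. The following is a rough sketch of
  that equivalence using `Enum.any?/2`. The suffix list is only illustrative:

  ```elixir
  suffixes = ["youth", "age"]
  # Checks each suffix in turn and stops at the first one that matches.
  matched? = Enum.any?(suffixes, &String.ends_with?("language", &1))
  # matched? is true, the same result as String.ends_with?("language", suffixes)
  ```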

  ## Examples

      iex> String.ends_with?("language", "age")
      true
      iex> String.ends_with?("language", ["youth", "age"])
      true
      iex> String.ends_with?("language", ["youth", "elixir"])
      false

  An empty suffix will always match:

      iex> String.ends_with?("language", "")
      true
      iex> String.ends_with?("language", ["", "other"])
      true

  """
  @spec ends_with?(t, t | [t]) :: boolean
  def ends_with?(string, suffix) when is_binary(string) and is_binary(suffix) do
    ends_with_string?(string, byte_size(string), suffix)
  end

  def ends_with?(string, suffix) when is_binary(string) and is_list(suffix) do
    string_size = byte_size(string)
    Enum.any?(suffix, &ends_with_string?(string, string_size, &1))
  end

  @compile {:inline, ends_with_string?: 3}
  defp ends_with_string?(string, string_size, suffix) when is_binary(suffix) do
    suffix_size = byte_size(suffix)

    if suffix_size <= string_size do
      suffix == binary_part(string, string_size - suffix_size, suffix_size)
    else
      false
    end
  end

  @doc """
  Checks if `string` matches the given regular expression.

  ## Examples

      iex> String.match?("foo", ~r/foo/)
      true

      iex> String.match?("bar", ~r/foo/)
      false

  """
  @spec match?(t, Regex.t()) :: boolean
  def match?(string, regex) when is_binary(string) do
    Regex.match?(regex, string)
  end

  @doc """
  Checks if `string` contains any of the given `contents`.

  `contents` can be either a string, a list of strings,
  or a compiled pattern.
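
  Compiling a pattern with `:binary.compile_pattern/1` is mostly worthwhile
  when the same needles are searched for in many strings, because the
  compiled pattern can then be reused. The following is a small sketch. The phrases are
  made up for illustration:

  ```elixir
  # Compile the needles once, then reuse the compiled pattern for each string.
  pattern = :binary.compile_pattern(["life", "death"])
  phrases = ["elixir of life", "elixir of youth", "death and taxes"]
  matching = Enum.filter(phrases, &String.contains?(&1, pattern))
  # matching is ["elixir of life", "death and taxes"]
  ```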

  ## Examples

      iex> String.contains?("elixir of life", "of")
      true
      iex> String.contains?("elixir of life", ["life", "death"])
      true
      iex> String.contains?("elixir of life", ["death", "mercury"])
      false

  The argument can also be a compiled pattern:

      iex> pattern = :binary.compile_pattern(["life", "death"])
      iex> String.contains?("elixir of life", pattern)
      true

  An empty string will always match:

      iex> String.contains?("elixir of life", "")
      true
      iex> String.contains?("elixir of life", ["", "other"])
      true

  Be aware that this function can match within or across grapheme boundaries.
  For example, take the grapheme "é" which is made of the code points
  "e" and the combining acute accent. The following returns `true`:

      iex> String.contains?(String.normalize("é", :nfd), "e")
      true

  However, if "é" is represented by the single code point "e with acute
  accent", then it will return `false`:

      iex> String.contains?(String.normalize("é", :nfc), "e")
      false

  """
  @spec contains?(t, pattern) :: boolean
  def contains?(string, []) when is_binary(string) do
    false
  end

  def contains?(string, contents) when is_binary(string) and is_list(contents) do
    "" in contents or :binary.match(string, contents) != :nomatch
  end

  def contains?(string, contents) when is_binary(string) do
    "" == contents or :binary.match(string, contents) != :nomatch
  end

  @doc """
  Converts a string into a charlist.

  Specifically, this function takes a UTF-8 encoded binary and returns a list of its integer
  code points. It is similar to `codepoints/1` except that the latter returns a list of code points as
  strings.

  In case you need to work with bytes, take a look at the
  [`:binary` module](`:binary`).
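
  To make the difference concrete, here is the same string seen as a
  charlist, as code points, and as raw bytes via `:binary.bin_to_list/1`:

  ```elixir
  string = "æß"
  # Integer code points: U+00E6 and U+00DF
  charlist = String.to_charlist(string)
  # Code points as one-character strings
  code_points = String.codepoints(string)
  # UTF-8 bytes: each of these two code points is encoded with two bytes
  bytes = :binary.bin_to_list(string)
  # charlist    => [230, 223]
  # code_points => ["æ", "ß"]
  # bytes       => [195, 166, 195, 159]
  ```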

  ## Examples

      iex> String.to_charlist("æß")
      'æß'

  """
  @spec to_charlist(t) :: charlist
  def to_charlist(string) when is_binary(string) do
    case :unicode.characters_to_list(string) do
      result when is_list(result) ->
        result

      {:error, encoded, rest} ->
        raise UnicodeConversionError, encoded: encoded, rest: rest, kind: :invalid

      {:incomplete, encoded, rest} ->
        raise UnicodeConversionError, encoded: encoded, rest: rest, kind: :incomplete
    end
  end

  @doc """
  Converts a string to an atom.

  Warning: this function creates atoms dynamically and atoms are
  not garbage-collected. Therefore, `string` should not be an
  untrusted value, such as input received from a socket or during
  a web request. Consider using `to_existing_atom/1` instead.

  By default, the maximum number of atoms is `1_048_576`. This limit
  can be raised or lowered using the VM option `+t`.

  The maximum atom size is 255 Unicode code points.

  Inlined by the compiler.

  ## Examples

      iex> String.to_atom("my_atom")
      :my_atom

  """
  @spec to_atom(String.t()) :: atom
  def to_atom(string) when is_binary(string) do
    :erlang.binary_to_atom(string, :utf8)
  end

  @doc """
  Converts a string to an existing atom.

  The maximum atom size is 255 Unicode code points.

  Inlined by the compiler.

  ## Examples

      iex> _ = :my_atom
      iex> String.to_existing_atom("my_atom")
      :my_atom

  """
  @spec to_existing_atom(String.t()) :: atom
  def to_existing_atom(string) when is_binary(string) do
    :erlang.binary_to_existing_atom(string, :utf8)
  end

  @doc """
  Returns an integer whose text representation is `string`.

  `string` must be the string representation of an integer.
  Otherwise, an `ArgumentError` will be raised. If you want
  to parse a string that may contain an ill-formatted integer,
  use `Integer.parse/1`.

  Inlined by the compiler.

  ## Examples

      iex> String.to_integer("123")
      123

  Passing a string that does not represent an integer leads to an error:

      String.to_integer("invalid data")
      ** (ArgumentError) argument error

  """
  @spec to_integer(String.t()) :: integer
  def to_integer(string) when is_binary(string) do
    :erlang.binary_to_integer(string)
  end

  @doc """
  Returns an integer whose text representation is `string` in base `base`.

  Inlined by the compiler.

  ## Examples

      iex> String.to_integer("3FF", 16)
      1023

  """
  @spec to_integer(String.t(), 2..36) :: integer
  def to_integer(string, base) when is_binary(string) and is_integer(base) do
    :erlang.binary_to_integer(string, base)
  end

  @doc """
  Returns a float whose text representation is `string`.

  `string` must be the string representation of a float including a decimal point;
  otherwise, an `ArgumentError` will be raised. To parse a string without a decimal
  point as a float, use `Float.parse/1` instead.

  Inlined by the compiler.

  ## Examples

      iex> String.to_float("2.2017764e+0")
      2.2017764

      iex> String.to_float("3.0")
      3.0

  Passing a string without a decimal point leads to an error:

      String.to_float("3")
      ** (ArgumentError) argument error

  """
  @spec to_float(String.t()) :: float
  def to_float(string) when is_binary(string) do
    :erlang.binary_to_float(string)
  end

  @doc """
  Computes the bag distance between two strings.

  Returns a float value between 0 and 1 representing the bag
  distance between `string1` and `string2`.
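
  Concretely, both strings are treated as multisets ("bags") of graphemes.
  If `d1` counts the graphemes of `string1` left unmatched by `string2`,
  `d2` counts the reverse, and `n` is the grapheme length of the longer
  string, the result is `1 - max(d1, d2) / n`. The hypothetical module
  below is a sketch of that formula for illustration only. It is not the
  implementation used by this function:

  ```elixir
  # Hypothetical module: a sketch of the bag distance formula only.
  defmodule BagDistanceSketch do
    # Mirrors String.bag_distance/2, which returns 0.0 if either string is empty.
    def distance(a, b) when a == "" or b == "", do: 0.0

    def distance(a, b) do
      # Treat each string as a multiset of graphemes.
      bag_a = a |> String.graphemes() |> Enum.frequencies()
      bag_b = b |> String.graphemes() |> Enum.frequencies()
      longest = max(String.length(a), String.length(b))
      1 - max(excess(bag_a, bag_b), excess(bag_b, bag_a)) / longest
    end

    # Counts the graphemes in the first bag that have no counterpart in the second.
    defp excess(bag, other) do
      Enum.reduce(bag, 0, fn {grapheme, count}, acc ->
        acc + max(count - Map.get(other, grapheme, 0), 0)
      end)
    end
  end

  score = BagDistanceSketch.distance("abcd", "ab")
  # score => 0.5, since "c" and "d" are unmatched: 1 - 2 / 4
  ```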

  The bag distance is meant to be an efficient approximation
  of the distance between two strings to quickly rule out strings
  that are largely different.

  The algorithm is outlined in the "String Matching with Metric
  Trees Using an Approximate Distance" paper by Ilaria Bartolini,
  Paolo Ciaccia, and Marco Patella.

  ## Examples

      iex> String.bag_distance("abc", "")
      0.0
      iex> String.bag_distance("abcd", "a")
      0.25
      iex> String.bag_distance("abcd", "ab")
      0.5
      iex> String.bag_distance("abcd", "abc")
      0.75
      iex> String.bag_distance("abcd", "abcd")
      1.0

  """
  @spec bag_distance(t, t) :: float
  @doc since: "1.8.0"
  def bag_distance(_string, ""), do: 0.0
  def bag_distance("", _string), do: 0.0

  def bag_distance(string1, string2) when is_binary(string1) and is_binary(string2) do
    {bag1, length1} = string_to_bag(string1, %{}, 0)
    {bag2, length2} = string_to_bag(string2, %{}, 0)

    diff1 = bag_difference(bag1, bag2)
    diff2 = bag_difference(bag2, bag1)

    1 - max(diff1, diff2) / max(length1, length2)
  end

  defp string_to_bag(string, bag, length) do
    case next_grapheme(string) do
      {char, rest} ->
        bag =
          case bag do
            %{^char => current} -> %{bag | char => current + 1}
            %{} -> Map.put(bag, char, 1)
          end

        string_to_bag(rest, bag, length + 1)

      nil ->
        {bag, length}
    end
  end

  defp bag_difference(bag1, bag2) do
    Enum.reduce(bag1, 0, fn {char, count1}, sum ->
      case bag2 do
        %{^char => count2} -> sum + max(count1 - count2, 0)
        %{} -> sum + count1
      end
    end)
  end

  @doc """
  Computes the Jaro distance (similarity) between two strings.

  Returns a float value between `0.0` (equates to no similarity) and `1.0`
  (is an exact match) representing the [Jaro](https://en.wikipedia.org/wiki/Jaro-Winkler_distance)
  distance between `string1` and `string2`.

  The Jaro distance metric is designed and best suited for short
  strings such as person names. Elixir itself uses this function
  to provide the "did you mean?" functionality. For instance, when you
  call a function in a module and make a typo in the function name,
  Elixir attempts to suggest the most similar function name available,
  if any, based on the `jaro_distance/2` score.

  ## Examples

      iex> String.jaro_distance("Dwayne", "Duane")
      0.8222222222222223
      iex> String.jaro_distance("even", "odd")
      0.0
      iex> String.jaro_distance("same", "same")
      1.0

  """
  @spec jaro_distance(t, t) :: float
  def jaro_distance(string1, string2)

  def jaro_distance(string, string), do: 1.0
  def jaro_distance(_string, ""), do: 0.0
  def jaro_distance("", _string), do: 0.0

  def jaro_distance(string1, string2) when is_binary(string1) and is_binary(string2) do
    {chars1, len1} = chars_and_length(string1)
    {chars2, len2} = chars_and_length(string2)

    case match(chars1, len1, chars2, len2) do
      {0, _trans} ->
        0.0

      {comm, trans} ->
        (comm / len1 + comm / len2 + (comm - trans) / comm) / 3
    end
  end

  @compile {:inline, chars_and_length: 1}
  defp chars_and_length(string) do
    chars = graphemes(string)
    {chars, Kernel.length(chars)}
  end

  defp match(chars1, len1, chars2, len2) do
    if len1 < len2 do
      match(chars1, chars2, div(len2, 2) - 1)
    else
      match(chars2, chars1, div(len1, 2) - 1)
    end
  end

  defp match(chars1, chars2, lim) do
    match(chars1, chars2, {0, lim}, {0, 0, -1}, 0)
  end

  defp match([char | rest], chars, range, state, idx) do
    {chars, state} = submatch(char, chars, range, state, idx)

    case range do
      {lim, lim} -> match(rest, tl(chars), range, state, idx + 1)
      {pre, lim} -> match(rest, chars, {pre + 1, lim}, state, idx + 1)
    end
  end

  defp match([], _, _, {comm, trans, _}, _), do: {comm, trans}

  defp submatch(char, chars, {pre, _} = range, state, idx) do
    case detect(char, chars, range) do
      nil ->
        {chars, state}

      {subidx, chars} ->
        {chars, proceed(state, idx - pre + subidx)}
    end
  end

  defp detect(char, chars, {pre, lim}) do
    detect(char, chars, pre + 1 + lim, 0, [])
  end

  defp detect(_char, _chars, 0, _idx, _acc), do: nil
  defp detect(_char, [], _lim, _idx, _acc), do: nil

  defp detect(char, [char | rest], _lim, idx, acc), do: {idx, Enum.reverse(acc, [nil | rest])}

  defp detect(char, [other | rest], lim, idx, acc),
    do: detect(char, rest, lim - 1, idx + 1, [other | acc])

  defp proceed({comm, trans, former}, current) do
    if current < former do
      {comm + 1, trans + 1, current}
    else
      {comm + 1, trans, current}
    end
  end

  @doc """
  Returns a keyword list that represents an edit script.

  Check `List.myers_difference/2` for more information.
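
  The entries of the edit script are kept in order, so concatenating the
  `:eq` and `:ins` parts rebuilds `string2`, while the `:eq` and `:del`
  parts rebuild `string1`. The following small sketch uses the same strings as the
  example in this documentation:

  ```elixir
  script = String.myers_difference("fox hops over the dog", "fox jumps over the lazy cat")

  # `into: ""` concatenates the selected pieces into a single binary.
  new_text = for {kind, text} <- script, kind in [:eq, :ins], into: "", do: text
  old_text = for {kind, text} <- script, kind in [:eq, :del], into: "", do: text
  # new_text => "fox jumps over the lazy cat"
  # old_text => "fox hops over the dog"
  ```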

  ## Examples

      iex> string1 = "fox hops over the dog"
      iex> string2 = "fox jumps over the lazy cat"
      iex> String.myers_difference(string1, string2)
      [eq: "fox ", del: "ho", ins: "jum", eq: "ps over the ", del: "dog", ins: "lazy cat"]

  """
  @doc since: "1.3.0"
  @spec myers_difference(t, t) :: [{:eq | :ins | :del, t}]
  def myers_difference(string1, string2) when is_binary(string1) and is_binary(string2) do
    graphemes(string1)
    |> List.myers_difference(graphemes(string2))
    |> Enum.map(fn {kind, chars} -> {kind, IO.iodata_to_binary(chars)} end)
  end

  @doc false
  @deprecated "Use String.to_charlist/1 instead"
  @spec to_char_list(t) :: charlist
  def to_char_list(string), do: String.to_charlist(string)
end