\name{utf8_normalize}
\title{Text Normalization}
\alias{utf8_normalize}
\description{
  Transform text to normalized form, optionally mapping to lowercase
  and applying compatibility maps.
}
\usage{
utf8_normalize(x, map_case = FALSE, map_compat = FALSE,
               map_quote = FALSE, remove_ignorable = FALSE)
}
\arguments{
  \item{x}{character object.}

  \item{map_case}{a logical value indicating whether to apply Unicode
    case mapping to the text. For most languages, this transformation
    changes uppercase characters to their lowercase equivalents.}

  \item{map_compat}{a logical value indicating whether to apply
    Unicode compatibility mappings to the characters, that is, the
    mappings required for the NFKC and NFKD normal forms.}

  \item{map_quote}{a logical value indicating whether to replace curly
    single quotes and Unicode apostrophe characters with the ASCII
    apostrophe (U+0027).}

  \item{remove_ignorable}{a logical value indicating whether to remove
    Unicode "default ignorable" characters such as zero-width spaces
    and soft hyphens.}
}
\details{
  \code{utf8_normalize} converts the elements of a character object to
  Unicode normalized composed form (NFC) while applying the character
  maps specified by the \code{map_case}, \code{map_compat},
  \code{map_quote}, and \code{remove_ignorable} arguments.
}
\value{
  A character object with the same attributes as \code{x}, but with
  \code{Encoding} set to \code{"UTF-8"}.
}
\seealso{
  \code{\link{as_utf8}}.
}
\examples{
# three encodings of the same character: precomposed A-ring,
# "A" followed by combining ring above, and the angstrom sign
angstrom <- c("\u00c5", "\u0041\u030a", "\u212b")

# all three normalize to the precomposed form
utf8_normalize(angstrom) == "\u00c5"
}