Name | Date | Size
---|---|---
uniseg-0.1.0/ | 03-May-2022 | -
README.md | 30-Aug-2019 | 2.2 KiB
doc.go | 30-Aug-2019 | 235
go.mod | 30-Aug-2019 | 39
grapheme.go | 30-Aug-2019 | 8.4 KiB
properties.go | 30-Aug-2019 | 148.2 KiB
README.md
# Unicode Text Segmentation for Go

[![Godoc Reference](https://img.shields.io/badge/godoc-reference-blue.svg)](https://godoc.org/github.com/rivo/uniseg)
[![Go Report](https://img.shields.io/badge/go%20report-A%2B-brightgreen.svg)](https://goreportcard.com/report/github.com/rivo/uniseg)

This Go package implements Unicode Text Segmentation according to [Unicode Standard Annex #29](http://unicode.org/reports/tr29/) (Unicode version 12.0.0).

At this point, only the determination of grapheme cluster boundaries is implemented.

## Background

In Go, [strings are read-only slices of bytes](https://blog.golang.org/strings). They can be turned into Unicode code points by ranging over the string with a `for` loop or by converting it: `[]rune(str)`. However, multiple code points may be combined into one user-perceived character, which the Unicode specification calls a "grapheme cluster". Here are some examples:

|String|Bytes (UTF-8)|Code points (runes)|Grapheme clusters|
|-|-|-|-|
|Käse|6 bytes: `4b 61 cc 88 73 65`|5 code points: `4b 61 308 73 65`|4 clusters: `[4b],[61 308],[73],[65]`|
|🏳️‍🌈|14 bytes: `f0 9f 8f b3 ef b8 8f e2 80 8d f0 9f 8c 88`|4 code points: `1f3f3 fe0f 200d 1f308`|1 cluster: `[1f3f3 fe0f 200d 1f308]`|
|🇩🇪|8 bytes: `f0 9f 87 a9 f0 9f 87 aa`|2 code points: `1f1e9 1f1ea`|1 cluster: `[1f1e9 1f1ea]`|

This package provides a tool to iterate over these grapheme clusters. This may be used to determine the number of user-perceived characters, to split strings at their intended places, or to extract individual characters which form a unit.

## Installation

```bash
go get github.com/rivo/uniseg
```

## Basic Example

```go
package main

import (
	"fmt"

	"github.com/rivo/uniseg"
)

func main() {
	// Thumbs-up with a skin tone modifier, followed by an exclamation mark.
	gr := uniseg.NewGraphemes("👍🏼!")
	// Each call to Next advances to the next grapheme cluster.
	for gr.Next() {
		fmt.Printf("%x ", gr.Runes())
	}
	// Output: [1f44d 1f3fc] [21]
}
```

## Documentation

Refer to https://godoc.org/github.com/rivo/uniseg for the package's documentation.

## Dependencies

This package does not depend on any packages outside the standard library.

## Your Feedback

Add your issue here on GitHub. Feel free to get in touch if you have any questions.

## Version

Version tags will be introduced once Go modules are official. Consider this version 0.1.