| Name | Date | Size |
|-|-|-|
| .. | 03-May-2022 | - |
| uniseg-0.1.0/ | 03-May-2022 | - |
| README.md | 30-Aug-2019 | 2.2 KiB |
| doc.go | 30-Aug-2019 | 235 |
| go.mod | 30-Aug-2019 | 39 |
| grapheme.go | 30-Aug-2019 | 8.4 KiB |
| properties.go | 30-Aug-2019 | 148.2 KiB |

README.md

# Unicode Text Segmentation for Go

[![Godoc Reference](https://img.shields.io/badge/godoc-reference-blue.svg)](https://godoc.org/github.com/rivo/uniseg)
[![Go Report](https://img.shields.io/badge/go%20report-A%2B-brightgreen.svg)](https://goreportcard.com/report/github.com/rivo/uniseg)

This Go package implements Unicode Text Segmentation according to [Unicode Standard Annex #29](http://unicode.org/reports/tr29/) (Unicode version 12.0.0).

At this point, only the determination of grapheme cluster boundaries is implemented.

## Background

In Go, [strings are read-only slices of bytes](https://blog.golang.org/strings). They can be turned into Unicode code points using a `for ... range` loop or by converting: `[]rune(str)`. However, multiple code points may be combined into one user-perceived character, which the Unicode specification calls a "grapheme cluster". Here are some examples:

|String|Bytes (UTF-8)|Code points (runes)|Grapheme clusters|
|-|-|-|-|
|Käse|6 bytes: `4b 61 cc 88 73 65`|5 code points: `4b 61 308 73 65`|4 clusters: `[4b],[61 308],[73],[65]`|
|🏳️‍🌈|14 bytes: `f0 9f 8f b3 ef b8 8f e2 80 8d f0 9f 8c 88`|4 code points: `1f3f3 fe0f 200d 1f308`|1 cluster: `[1f3f3 fe0f 200d 1f308]`|
|🇩🇪|8 bytes: `f0 9f 87 a9 f0 9f 87 aa`|2 code points: `1f1e9 1f1ea`|1 cluster: `[1f1e9 1f1ea]`|

This package provides a tool to iterate over these grapheme clusters. It can be used to determine the number of user-perceived characters, to split strings at their intended places, or to extract individual characters which form a unit.

## Installation

```bash
go get github.com/rivo/uniseg
```

## Basic Example

```go
package main

import (
	"fmt"

	"github.com/rivo/uniseg"
)

func main() {
	// Iterate over the grapheme clusters of a thumbs-up emoji with a
	// skin tone modifier, followed by an exclamation mark.
	gr := uniseg.NewGraphemes("👍🏼!")
	for gr.Next() {
		// Runes() returns the code points of the current cluster.
		fmt.Printf("%x ", gr.Runes())
	}
	// Output: [1f44d 1f3fc] [21]
}
```

## Documentation

Refer to https://godoc.org/github.com/rivo/uniseg for the package's documentation.

## Dependencies

This package does not depend on any packages outside the standard library.

## Your Feedback

Add your issue here on GitHub. Feel free to get in touch if you have any questions.

## Version

Version tags will be introduced once Go modules are official. Consider this version 0.1.