# Unicode Text Segmentation for Go

This is an implementation of the Unicode Text Segmentation specification for Go.
Specifically, it currently includes only the "grapheme cluster" segmentation
algorithm.

## Unicode Version Support

Each major version of Unicode includes a set of tables that define how each
codepoint participates in the segmentation algorithms. Therefore any caller
of this library must select a specific version of Unicode to support.

To allow for each caller to make that decision separately even though
multiple callers may coexist in the same program, there is a separate
major release of this module for each supported major Unicode version.
Therefore you can select the specific version you want by module
path. For example, to use the algorithm and tables defined by Unicode
version 13:

```
go get github.com/apparentlymart/go-textseg/v13
```

```go
import (
	"github.com/apparentlymart/go-textseg/v13/textseg"
)
```

However, each release of Go also includes some Unicode-version-specific
functionality and you may prefer to use the text segmentation definitions
that are relevant to the version of Unicode that your Go runtime is
using elsewhere. To enable that, this repository has a special separate
module which uses the current Go runtime version to select a suitable
versioned implementation automatically:

```
go get github.com/apparentlymart/go-textseg/autoversion
```

```go
import (
	"github.com/apparentlymart/go-textseg/autoversion/textseg"
)
```

**IMPORTANT:** This "autoversion" wrapper uses Go build tags to select
a `go-textseg` major version based on the current Go version. We use
this strategy to ensure that only one version of `go-textseg` will
be compiled into your program, but the downside is that `go-textseg`
must be updated for each new Go release. If you use the autoversion
wrapper in your program, you will need to fetch a new version of it each
time you switch to a new version of Go, even if that version of Go does
not introduce a new Unicode version.

## Usage

The most important function in each `textseg` package is
`ScanGraphemeClusters`, which is a split function compatible with the
`bufio.SplitFunc` signature, for use with `bufio.Scanner` in the Go
standard library. Each time the scanner's `Scan` method is called, it
will produce one full grapheme cluster.
