# rematch2

> Match Regular Expressions with a Nicer 'API'

[![Linux Build Status](https://travis-ci.org/r-lib/rematch2.svg?branch=master)](https://travis-ci.org/r-lib/rematch2)
[![Windows Build status](https://ci.appveyor.com/api/projects/status/github/r-lib/rematch2?svg=true)](https://ci.appveyor.com/project/gaborcsardi/rematch2)
[![](http://www.r-pkg.org/badges/version/rematch2)](http://www.r-pkg.org/pkg/rematch2)
[![CRAN RStudio mirror downloads](http://cranlogs.r-pkg.org/badges/rematch2)](http://www.r-pkg.org/pkg/rematch2)
[![Coverage Status](https://img.shields.io/codecov/c/github/r-lib/rematch2/master.svg)](https://codecov.io/github/r-lib/rematch2?branch=master)

A small wrapper around the regular expression matching functions `regexpr`
and `gregexpr` that returns the results in tidy data frames.

---

  - [Installation](#installation)
  - [Rematch vs rematch2](#rematch-vs-rematch2)
  - [Usage](#usage)
    - [First match](#first-match)
    - [All matches](#all-matches)
    - [Match positions](#match-positions)
  - [License](#license)

## Installation

```r
install.packages("rematch2")
```

## Rematch vs rematch2

Note that `rematch2` is not compatible with the original `rematch` package.
There are at least three major changes:
* The order of the arguments for the functions is different. In
  `rematch2` the `text` vector is first, and `pattern` is second.
* In the result, `.match` is the last column instead of the first.
* `rematch2` returns `tibble` data frames. See
  https://github.com/hadley/tibble.

## Usage

### First match

```r
library(rematch2)
```

With capture groups:

```r
dates <- c("2016-04-20", "1977-08-08", "not a date", "2016",
  "76-03-02", "2012-06-30", "2015-01-21 19:58")
isodate <- "([0-9]{4})-([0-1][0-9])-([0-3][0-9])"
re_match(text = dates, pattern = isodate)
```

```
#> # A tibble: 7 x 5
#>      ``    ``    ``            .text     .match
#>   <chr> <chr> <chr>            <chr>      <chr>
#> 1  2016    04    20       2016-04-20 2016-04-20
#> 2  1977    08    08       1977-08-08 1977-08-08
#> 3  <NA>  <NA>  <NA>       not a date       <NA>
#> 4  <NA>  <NA>  <NA>             2016       <NA>
#> 5  <NA>  <NA>  <NA>         76-03-02       <NA>
#> 6  2012    06    30       2012-06-30 2012-06-30
#> 7  2015    01    21 2015-01-21 19:58 2015-01-21
```

Named capture groups:

```r
isodaten <- "(?<year>[0-9]{4})-(?<month>[0-1][0-9])-(?<day>[0-3][0-9])"
re_match(text = dates, pattern = isodaten)
```

```
#> # A tibble: 7 x 5
#>    year month   day            .text     .match
#>   <chr> <chr> <chr>            <chr>      <chr>
#> 1  2016    04    20       2016-04-20 2016-04-20
#> 2  1977    08    08       1977-08-08 1977-08-08
#> 3  <NA>  <NA>  <NA>       not a date       <NA>
#> 4  <NA>  <NA>  <NA>             2016       <NA>
#> 5  <NA>  <NA>  <NA>         76-03-02       <NA>
#> 6  2012    06    30       2012-06-30 2012-06-30
#> 7  2015    01    21 2015-01-21 19:58 2015-01-21
```

A slightly more complex example:

```r
github_repos <- c(
  "metacran/crandb",
  "jeroenooms/curl@v0.9.3",
  "jimhester/covr#47",
  "hadley/dplyr@*release",
  "r-lib/remotes@550a3c7d3f9e1493a2ba",
  "/$&@R64&3"
)
owner_rx   <- "(?:(?<owner>[^/]+)/)?"
repo_rx    <- "(?<repo>[^/@#]+)"
subdir_rx  <- "(?:/(?<subdir>[^@#]*[^@#/]))?"
ref_rx     <- "(?:@(?<ref>[^*].*))"
pull_rx    <- "(?:#(?<pull>[0-9]+))"
release_rx <- "(?:@(?<release>[*]release))"

subtype_rx <- sprintf("(?:%s|%s|%s)?", ref_rx, pull_rx, release_rx)
github_rx  <- sprintf(
  "^(?:%s%s%s%s|(?<catchall>.*))$",
  owner_rx, repo_rx, subdir_rx, subtype_rx
)
re_match(text = github_repos, pattern = github_rx)
```

```
#> # A tibble: 6 x 9
#>        owner    repo subdir                  ref  pull  release  catchall
#>        <chr>   <chr>  <chr>                <chr> <chr>    <chr>     <chr>
#> 1   metacran  crandb
#> 2 jeroenooms    curl                      v0.9.3
#> 3  jimhester    covr                                47
#> 4     hadley   dplyr                                   *release
#> 5      r-lib remotes        550a3c7d3f9e1493a2ba
#> 6                                                               /$&@R64&3
#> # ... with 2 more variables: .text <chr>, .match <chr>
```

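For comparison, here is a rough base-R sketch of extracting the named date groups by hand with `regexpr(..., perl = TRUE)`, one of the functions this package wraps. It is only an illustration of the bookkeeping involved, not a description of how `rematch2` is implemented; the shortened `dates` vector and the `groups` matrix are made up for the example.

```r
# Base R only: first-match extraction of named groups without a wrapper.
dates <- c("2016-04-20", "not a date")
isodate <- "(?<year>[0-9]{4})-(?<month>[0-1][0-9])-(?<day>[0-3][0-9])"

m <- regexpr(isodate, dates, perl = TRUE)

# With perl = TRUE, capture-group positions are matrix attributes on the
# result, with one column per group.
starts  <- attr(m, "capture.start")
lengths <- attr(m, "capture.length")

# Elements without a match are reported as -1.
is_miss <- as.integer(m) == -1

# Slice each group out of the text and blank out the non-matches.
groups <- sapply(colnames(starts), function(g) {
  out <- substring(dates, starts[, g], starts[, g] + lengths[, g] - 1)
  out[is_miss] <- NA_character_
  out
})
groups
```
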
### All matches

Extract all names, and also first names and last names:

```r
name_rex <- paste0(
  "(?<first>[[:upper:]][[:lower:]]+) ",
  "(?<last>[[:upper:]][[:lower:]]+)"
)
notables <- c(
  "  Ben Franklin and Jefferson Davis",
  "\tMillard Fillmore"
)
not <- re_match_all(notables, name_rex)
not
```

```
#> # A tibble: 2 x 4
#>       first      last                              .text    .match
#>      <list>    <list>                              <chr>    <list>
#> 1 <chr [2]> <chr [2]>   Ben Franklin and Jefferson Davis <chr [2]>
#> 2 <chr [1]> <chr [1]>               "\tMillard Fillmore" <chr [1]>
```

```r
not$first
```

```
#> [[1]]
#> [1] "Ben"       "Jefferson"
#>
#> [[2]]
#> [1] "Millard"
```

```r
not$last
```

```
#> [[1]]
#> [1] "Franklin" "Davis"
#>
#> [[2]]
#> [1] "Fillmore"
```

```r
not$.match
```

```
#> [[1]]
#> [1] "Ben Franklin"    "Jefferson Davis"
#>
#> [[2]]
#> [1] "Millard Fillmore"
```

### Match positions

`re_exec` and `re_exec_all` are similar to `re_match` and `re_match_all`,
but they also return match positions. These functions return match
records. A match record has three components: `match`, `start`, `end`, and
each component can be a vector. It is similar to a data frame in this
respect.

```r
pos <- re_exec(notables, name_rex)
pos
```

```
#> # A tibble: 2 x 4
#>        first       last                              .text     .match
#> *     <list>     <list>                              <chr>     <list>
#> 1 <list [3]> <list [3]>   Ben Franklin and Jefferson Davis <list [3]>
#> 2 <list [3]> <list [3]>               "\tMillard Fillmore" <list [3]>
```

Unfortunately R does not allow hierarchical data frames (i.e. a column of a
data frame cannot be another data frame), but `rematch2` defines some
special classes and a `$` operator to make it easier to extract parts
of `re_exec` and `re_exec_all` matches. You simply query the `match`,
`start` or `end` part of a column:

```r
pos$first$match
```

```
#> [1] "Ben"     "Millard"
```

```r
pos$first$start
```

```
#> [1] 3 2
```

```r
pos$first$end
```

```
#> [1] 5 8
```

251arbitrary number of matches:
252
253
254```r
255allpos <- re_exec_all(notables, name_rex)
256allpos
257```
258
259```
260#> # A tibble: 2 x 4
261#>        first       last                              .text     .match
262#>       <list>     <list>                              <chr>     <list>
263#> 1 <list [3]> <list [3]>   Ben Franklin and Jefferson Davis <list [3]>
264#> 2 <list [3]> <list [3]>               "\tMillard Fillmore" <list [3]>
265```
266
267
268```r
269allpos$first$match
270```
271
272```
273#> [[1]]
274#> [1] "Ben"       "Jefferson"
275#>
276#> [[2]]
277#> [1] "Millard"
278```
279
280```r
281allpos$first$start
282```
283
284```
285#> [[1]]
286#> [1]  3 20
287#>
288#> [[2]]
289#> [1] 2
290```
291
292```r
293allpos$first$end
294```
295
296```
297#> [[1]]
298#> [1]  5 28
299#>
300#> [[2]]
301#> [1] 8
302```
303
304## License
305
306MIT © Mango Solutions, Gábor Csárdi
307