Create an object of class re or coerce a character vector to an object of class re.
Arguments
- x
Character vector of length one. The value of this character vector is assumed to be a well-formed regular expression. In the current implementation this is assumed, not checked.
- perl
Logical. If TRUE, x is assumed to use PCRE (i.e. Perl Compatible Regular Expressions) notation. If FALSE, x is assumed to use base R's default regular expression notation. Contrary to base R's regular expression functions, re() assumes that the PCRE regular expression flavor is used by default.
- ...
Additional arguments.
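The difference between the two notations selected by perl can be illustrated in base R alone. The following is a minimal sketch that does not use mclm; the pattern and the variable names are chosen for illustration only. A lookahead such as (?=b) is PCRE syntax, so it is understood when perl = TRUE, while the default engine is not expected to accept it (the call is wrapped in tryCatch() for that reason).

```r
# Base R only; mclm is not required for this sketch.
pattern <- "a(?=b)"  # "a" followed by "b", written with a PCRE lookahead

# With perl = TRUE the lookahead is interpreted as PCRE syntax.
pcre_match <- grepl(pattern, "ab", perl = TRUE)

# With perl = FALSE (base R's default) the same pattern goes to a different
# engine. The call is wrapped so that an error or warning yields NA instead
# of stopping the script.
default_match <- tryCatch(
  grepl(pattern, "ab", perl = FALSE),
  error = function(e) NA,
  warning = function(w) NA
)

print(pcre_match)
print(default_match)
```

Because re() defaults to perl = TRUE, patterns written for base R functions with their default settings may need perl = FALSE to keep their original interpretation.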
Value
An object of class re, which is a wrapper around a character vector flagging it as containing a regular expression. In essence it is a named list: the x item contains the x input and the perl item contains the value of the perl argument (TRUE by default).
It has basic methods such as print(), summary() and as.character().
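The structure described above (a named list with an x item and a perl item, tagged with a class) can be sketched in base R as follows. This is not the mclm implementation: re_sketch() and the class name "re_sketch" are hypothetical names used only to illustrate the idea, and the printed layout is a simplified approximation.

```r
# Hypothetical stand-in for re(); not the mclm source code.
re_sketch <- function(x, perl = TRUE) {
  stopifnot(is.character(x), length(x) == 1L, is.logical(perl))
  # A named list holding the pattern and the flavor flag, tagged with a class.
  structure(list(x = x, perl = perl), class = "re_sketch")
}

# as.character() returns the wrapped pattern.
as.character.re_sketch <- function(x, ...) x$x

# print() shows the flavor flag and the pattern.
print.re_sketch <- function(x, ...) {
  cat("Regular expression (perl = ", x$perl, ")\n", sep = "")
  cat(x$x, "\n", sep = "")
  invisible(x)
}

obj <- re_sketch("^.{3,}")
print(obj)
```

Keeping the perl flag next to the pattern lets any function that receives the object know which flavor to use without a separate argument.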
Details
This class exists because some functions in the mclm package require their arguments to be marked as being regular expressions. For example, keep_re() does not need its pattern argument to be a re object, but if the user wants to subset items with brackets using a regular expression, they must use a re object.
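Why the marking matters for bracket subsetting can be sketched with a toy S3 class in base R. Everything here is hypothetical (the classes "toy_tokens" and "toy_re", the helper toy_re(), and the assumption that a plain character index is matched literally); it is meant to show the dispatch idea, not how mclm implements it.

```r
# Hypothetical marker class, analogous in spirit to re().
toy_re <- function(x, perl = TRUE) {
  structure(list(x = x, perl = perl), class = "toy_re")
}

# Hypothetical token container.
toks <- structure(c("once", "a", "time", "of"), class = "toy_tokens")

# The bracket method inspects the class of the index to decide how to use it.
`[.toy_tokens` <- function(x, i) {
  values <- unclass(x)
  if (inherits(i, "toy_re")) {
    # Marked as a regular expression: keep tokens that match the pattern.
    keep <- grepl(i$x, values, perl = i$perl)
  } else if (is.character(i)) {
    # Plain string: assumed here to mean a literal token, not a pattern.
    keep <- values %in% i
  } else {
    # Anything else is passed through as an ordinary index.
    keep <- i
  }
  structure(values[keep], class = "toy_tokens")
}

as_regex <- unclass(toks[toy_re("^.{3,}")])
as_literal <- unclass(toks["^.{3,}"])
```

In this sketch the same string selects two tokens when wrapped in toy_re() and none when passed as a plain string, because without the marker the bracket method cannot tell a pattern from a literal token.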
Examples
toy_corpus <- "Once upon a time there was a tiny toy corpus.
It consisted of three sentences. And it lived happily ever after."
(tks <- tokenize(toy_corpus))
#> Token sequence of length 21
#> idx token
#> --- ---------
#> 1 once
#> 2 upon
#> 3 a
#> 4 time
#> 5 there
#> 6 was
#> 7 a
#> 8 tiny
#> 9 toy
#> 10 corpus
#> 11 it
#> 12 consisted
#> 13 of
#> 14 three
#> 15 sentences
#> 16 and
#> 17 it
#> 18 lived
#> 19 happily
#> 20 ever
#> ...
#>
# In `keep_re()`, the use of `re()` is optional
keep_re(tks, re("^.{3,}"))
#> Token sequence of length 16
#> idx token
#> --- ---------
#> 1 once
#> 2 upon
#> 3 time
#> 4 there
#> 5 was
#> 6 tiny
#> 7 toy
#> 8 corpus
#> 9 consisted
#> 10 three
#> 11 sentences
#> 12 and
#> 13 lived
#> 14 happily
#> 15 ever
#> 16 after
keep_re(tks, "^.{3,}")
#> Token sequence of length 16
#> idx token
#> --- ---------
#> 1 once
#> 2 upon
#> 3 time
#> 4 there
#> 5 was
#> 6 tiny
#> 7 toy
#> 8 corpus
#> 9 consisted
#> 10 three
#> 11 sentences
#> 12 and
#> 13 lived
#> 14 happily
#> 15 ever
#> 16 after
# When using brackets notation, `re()` is necessary
tks[re("^.{3,}")]
#> Token sequence of length 16
#> idx token
#> --- ---------
#> 1 once
#> 2 upon
#> 3 time
#> 4 there
#> 5 was
#> 6 tiny
#> 7 toy
#> 8 corpus
#> 9 consisted
#> 10 three
#> 11 sentences
#> 12 and
#> 13 lived
#> 14 happily
#> 15 ever
#> 16 after
tks["^.{3,}"]
#> Token sequence of length 0
#>
# build and print a `re` object
re("^.{3,}")
#> Regular expression (perl = TRUE)
#> ------------------
#> ^.{3,}
as_re("^.{3,}")
#> Regular expression (perl = TRUE)
#> ------------------
#> ^.{3,}
as.re("^.{3,}")
#> Regular expression (perl = TRUE)
#> ------------------
#> ^.{3,}
print(re("^.{3,}"))
#> Regular expression (perl = TRUE)
#> ------------------
#> ^.{3,}