
Create an object of class re or coerce a character vector to an object of class re.

Usage

re(x, perl = TRUE, ...)

as_re(x, perl = TRUE, ...)

as.re(x, perl = TRUE, ...)

Arguments

x

Character vector of length one containing a regular expression. The current implementation assumes that the expression is well formed and does not check it.

perl

Logical. If TRUE, x is assumed to use PCRE (Perl Compatible Regular Expressions) notation. If FALSE, x is assumed to use base R's default regular expression notation. Unlike base R's regular expression functions, which default to perl = FALSE, re() assumes the PCRE flavor by default.

...

Additional arguments.
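
The practical effect of the perl flag can be seen with base R alone. The following is a minimal sketch that uses only grepl() and no mclm functions; the pattern and the token vector are made up for illustration. A lookbehind is PCRE syntax, so it is understood with perl = TRUE, while base R's default engine is expected to reject it.

```r
# Base R only; no mclm functions are used here.
pattern <- "(?<=t)o"                    # "o" preceded by "t" (lookbehind, PCRE syntax)
tokens  <- c("toy", "once", "corpus")   # illustrative data

# With perl = TRUE the lookbehind is understood.
pcre_hits <- grepl(pattern, tokens, perl = TRUE)
print(pcre_hits)

# With perl = FALSE (base R's default) the same pattern is expected to be
# rejected as invalid, so the error is caught and replaced by NA.
default_hits <- tryCatch(
  suppressWarnings(grepl(pattern, tokens, perl = FALSE)),
  error = function(e) NA
)
print(default_hits)
```

Because re() defaults to perl = TRUE, patterns written in PCRE notation can be passed to it without setting the flag explicitly.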

Value

An object of class re, a wrapper around a character vector that flags it as containing a regular expression. In essence it is a named list: the x item contains the x input and the perl item contains the value of the perl argument (TRUE by default).
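
The structure described above can be mimicked in a few lines of base R. This is only a sketch under stated assumptions: make_re_sketch() and the class name re_sketch are hypothetical names chosen for illustration, and the code is not taken from mclm.

```r
# Hypothetical constructor mimicking the documented structure; not mclm's code.
make_re_sketch <- function(x, perl = TRUE) {
  stopifnot(is.character(x), length(x) == 1L, is.logical(perl))
  structure(list(x = x, perl = perl), class = "re_sketch")
}

# as.character() method that returns the stored pattern.
as.character.re_sketch <- function(x, ...) x$x

r <- make_re_sketch("^.{3,}")
r$x              # the pattern as supplied
r$perl           # the flavor flag
as.character(r)  # back to a plain character vector
```

If re objects follow the named-list layout described here, the same $x and $perl access should work on them as well.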

It has basic methods such as print(), summary() and as.character().

Details

This class exists because some functions in the mclm package require their arguments to be explicitly marked as regular expressions. For example, keep_re() accepts a plain character string as its pattern argument, so wrapping it in re() is optional. Subsetting with brackets is different: to select items with a regular expression, the user must supply a re object, because a plain character string inside brackets is not interpreted as a regular expression.
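
Why a marker class is useful for bracket subsetting can be illustrated with a small S3 sketch in base R. Everything in it is hypothetical: mark_as_regex(), toy_tokens() and the class names are invented for this example, and the literal matching of plain strings is an assumption, not a description of how mclm implements its methods.

```r
# Hypothetical marker class; names are illustrative only, not mclm's.
mark_as_regex <- function(x, perl = TRUE) {
  structure(list(x = x, perl = perl), class = "regex_marker")
}

# Hypothetical token class wrapping a character vector.
toy_tokens <- function(x) structure(x, class = "toy_tokens")

# Bracket method: a marked pattern is matched as a regular expression,
# a plain string is matched literally (assumed behavior for this sketch).
`[.toy_tokens` <- function(x, i) {
  values <- unclass(x)
  if (inherits(i, "regex_marker")) {
    keep <- grepl(i$x, values, perl = i$perl)
  } else if (is.character(i)) {
    keep <- values %in% i
  } else {
    return(toy_tokens(values[i]))  # numeric or logical index
  }
  toy_tokens(values[keep])
}

toks <- toy_tokens(c("once", "upon", "a", "time"))
unclass(toks[mark_as_regex("^.{3,}")])  # tokens of three or more characters
unclass(toks["^.{3,}"])                 # no token is literally "^.{3,}"
unclass(toks["a"])                      # literal match on "a"
```

Without the marker, the bracket method has no way to tell whether a string is meant as a literal value or as a pattern, which is the ambiguity a class such as re resolves.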

Examples

toy_corpus <- "Once upon a time there was a tiny toy corpus.
  It consisted of three sentences. And it lived happily ever after."

(tks <- tokenize(toy_corpus))
#> Token sequence of length 21
#> idx     token
#> --- ---------
#>   1      once
#>   2      upon
#>   3         a
#>   4      time
#>   5     there
#>   6       was
#>   7         a
#>   8      tiny
#>   9       toy
#>  10    corpus
#>  11        it
#>  12 consisted
#>  13        of
#>  14     three
#>  15 sentences
#>  16       and
#>  17        it
#>  18     lived
#>  19   happily
#>  20      ever
#> ...
#> 

# In `keep_re()`, the use of `re()` is optional
keep_re(tks, re("^.{3,}"))
#> Token sequence of length 16
#> idx     token
#> --- ---------
#>   1      once
#>   2      upon
#>   3      time
#>   4     there
#>   5       was
#>   6      tiny
#>   7       toy
#>   8    corpus
#>   9 consisted
#>  10     three
#>  11 sentences
#>  12       and
#>  13     lived
#>  14   happily
#>  15      ever
#>  16     after
keep_re(tks, "^.{3,}")
#> Token sequence of length 16
#> idx     token
#> --- ---------
#>   1      once
#>   2      upon
#>   3      time
#>   4     there
#>   5       was
#>   6      tiny
#>   7       toy
#>   8    corpus
#>   9 consisted
#>  10     three
#>  11 sentences
#>  12       and
#>  13     lived
#>  14   happily
#>  15      ever
#>  16     after

# When using bracket notation, `re()` is necessary
tks[re("^.{3,}")]
#> Token sequence of length 16
#> idx     token
#> --- ---------
#>   1      once
#>   2      upon
#>   3      time
#>   4     there
#>   5       was
#>   6      tiny
#>   7       toy
#>   8    corpus
#>   9 consisted
#>  10     three
#>  11 sentences
#>  12       and
#>  13     lived
#>  14   happily
#>  15      ever
#>  16     after
tks["^.{3,}"]
#> Token sequence of length 0
#> 

# Build and print a `re` object
re("^.{3,}")
#> Regular expression (perl = TRUE)
#> ------------------
#> ^.{3,} 
as_re("^.{3,}")
#> Regular expression (perl = TRUE)
#> ------------------
#> ^.{3,} 
as.re("^.{3,}")
#> Regular expression (perl = TRUE)
#> ------------------
#> ^.{3,} 
print(re("^.{3,}"))
#> Regular expression (perl = TRUE)
#> ------------------
#> ^.{3,}