These functions are essentially simple wrappers around base R functions such as regexpr(), gregexpr(), grepl(), grep(), sub() and gsub().
The most important differences between the functions documented here and the base R functions are the order of the arguments (x before pattern) and the fact that the argument perl is set to TRUE by default.
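To illustrate the two differences named above, here is a minimal sketch that uses base R only. The helper has_matches_sketch() is a hypothetical stand-in written for this illustration, not the package's actual code. It takes x before pattern and defaults perl to TRUE, so a PCRE-only construct such as a lookahead works without extra arguments.

```r
# Hypothetical stand-in (not the package's implementation):
# x comes before pattern, and perl defaults to TRUE.
has_matches_sketch <- function(x, pattern, perl = TRUE, ...) {
  grepl(pattern, x, perl = perl, ...)
}

x <- c("ab", "ac")

# The lookahead (?=b) is a PCRE feature, so it relies on perl = TRUE.
res_perl <- has_matches_sketch(x, "a(?=b)")

# With base R's default (perl = FALSE) the same pattern is expected to be
# rejected by the default regex engine; the error is caught here so that
# the script keeps running.
res_default <- tryCatch(
  suppressWarnings(grepl("a(?=b)", x)),
  error = function(e) conditionMessage(e)
)

print(res_perl)
print(res_default)
```

Putting x first also makes such helpers convenient in pipelines, where the data is usually passed as the first argument.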
Usage
re_retrieve_first(
x,
pattern,
ignore.case = FALSE,
perl = TRUE,
fixed = FALSE,
useBytes = FALSE,
requested_group = NULL,
drop_NA = FALSE,
...
)
re_retrieve_last(
x,
pattern,
ignore.case = FALSE,
perl = TRUE,
fixed = FALSE,
useBytes = FALSE,
requested_group = NULL,
drop_NA = FALSE,
...
)
re_retrieve_all(
x,
pattern,
ignore.case = FALSE,
perl = TRUE,
fixed = FALSE,
useBytes = FALSE,
requested_group = NULL,
unlist = TRUE,
...
)
re_has_matches(
x,
pattern,
ignore.case = FALSE,
perl = TRUE,
fixed = FALSE,
useBytes = FALSE,
...
)
re_which(
x,
pattern,
ignore.case = FALSE,
perl = TRUE,
fixed = FALSE,
useBytes = FALSE,
...
)
re_replace_first(
x,
pattern,
replacement,
ignore.case = FALSE,
perl = TRUE,
fixed = FALSE,
useBytes = FALSE,
...
)
re_replace_all(
x,
pattern,
replacement,
ignore.case = FALSE,
perl = TRUE,
fixed = FALSE,
useBytes = FALSE,
...
)
Arguments
- x
Character vector to be searched or modified.
- pattern
Regular expression specifying what is to be searched.
- ignore.case
Logical. Should the search be case insensitive?
- perl
Logical. Whether pattern is interpreted as a PCRE (Perl-compatible) regular expression. Unlike in base R functions, the default is TRUE.
- fixed
Logical. If TRUE, pattern is a string to be matched as is, i.e. wildcards and special characters are not interpreted.
- useBytes
Logical. If TRUE, the matching is done byte-by-byte rather than character-by-character. See 'Details' in grep().
- requested_group
Numeric. If NULL or 0, the output will contain matches for pattern as a whole. If another number n is provided, then the output will not contain matches for pattern but instead will only contain the matches for the nth capturing group in pattern (the first if requested_group = 1, the second if requested_group = 2, ...).
- drop_NA
Logical. If FALSE, the output always has the same length as the input x and items that do not contain a match for pattern yield NA. If TRUE, such NA values are removed and therefore the result might contain fewer items than x.
- ...
Additional arguments.
- unlist
Logical. If FALSE, the output always has the same length as the input x. More specifically, the result will be a list in which input items that do not contain a match for pattern yield an empty vector, whereas input items that do match will yield a vector of at least length one (depending on the number of matches). If TRUE, the output is a single vector the length of which may be shorter or longer than x.
- replacement
Character vector of length one specifying the replacement string. It is to be taken literally, except that the notation \\1, \\2, etc. can be used to refer to groups in pattern.
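The behaviour described for requested_group and drop_NA can be approximated in base R, as the sketch below shows. It is an illustration under stated assumptions, not the package's implementation: it relies on the "capture.start" and "capture.length" attributes that regexpr() returns when perl = TRUE, and the variable names (whole, grp, n) are chosen for this example only.

```r
x <- c("sentence", "with", "couple")
pattern <- "[oe](.)(.)"

# First match per element; non-matching elements get position -1.
m <- regexpr(pattern, x, perl = TRUE)

# Whole-pattern matches (comparable to requested_group = NULL or 0),
# keeping one slot per input element and NA where nothing matched.
whole <- rep(NA_character_, length(x))
whole[m > 0] <- regmatches(x, m)

# Matches for the n-th capturing group (comparable to requested_group = n).
n <- 1
st  <- attr(m, "capture.start")[, n]
len <- attr(m, "capture.length")[, n]
grp <- unname(ifelse(st > 0, substring(x, st, st + len - 1), NA_character_))

# Comparable to drop_NA = TRUE: remove the NA placeholders.
grp_no_na <- grp[!is.na(grp)]

print(whole)      # "ent" NA    "oup"
print(grp)        # "n"   NA    "u"
print(grp_no_na)  # "n"   "u"
```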
Value
re_retrieve_first(), re_retrieve_last() and re_retrieve_all() return either a single vector of character data or a list containing such vectors. re_replace_first() and re_replace_all() return the same type of character vector as x.
re_has_matches() returns a logical vector indicating whether a match was found in each of the elements in x; re_which() returns a numeric vector indicating the indices of the elements of x for which a match was found.
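The difference between the list-shaped and the vector-shaped return value can be seen with base R's gregexpr() and regmatches(). The sketch below is only an approximation of what the retrieval functions return, and the object names (all_list, all_vec, last_match) are illustrative.

```r
x <- c("is", "sentence", "couple")
pattern <- "[oe](.)(.)"

# One list element per input item; items without a match yield character(0).
all_list <- regmatches(x, gregexpr(pattern, x, perl = TRUE))

# Flattened form: a single vector that may be shorter or longer than x.
all_vec <- unlist(all_list)

# Last match per item, with NA where there is no match.
last_match <- vapply(
  all_list,
  function(hits) if (length(hits) > 0) hits[length(hits)] else NA_character_,
  character(1)
)

print(lengths(all_list))  # 0 2 1
print(all_vec)            # "ent" "enc" "oup"
print(last_match)         # NA    "enc" "oup"
```

The list form keeps the alignment with x, which is useful when results have to be joined back to the input. The flattened form is handier for tabulating all matches.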
Details
For some of the arguments (e.g. perl, fixed) the reader is directed to base R's regex documentation.
Functions
- re_retrieve_first(): Retrieve from each item in x the first match of pattern.
- re_retrieve_last(): Retrieve from each item in x the last match of pattern.
- re_retrieve_all(): Retrieve from each item in x all matches of pattern.
- re_has_matches(): Simple wrapper around grepl().
- re_which(): Simple wrapper around grep().
- re_replace_first(): Simple wrapper around sub().
- re_replace_all(): Simple wrapper around gsub().
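Since the last four functions are described as simple wrappers, the corresponding base R calls give a rough idea of what to expect. The sketch below calls grepl(), grep(), sub() and gsub() directly on a plain character vector, with perl = TRUE set explicitly to mirror the default documented here. It does not use the package itself.

```r
x <- c("this", "sentence", "of", "words")

# Logical vector: does each element contain a match? (cf. re_has_matches())
has <- grepl("[oe].", x, perl = TRUE)

# Indices of the matching elements (cf. re_which())
idx <- grep("[oe].", x, perl = TRUE)

# Replace only the first match in each element (cf. re_replace_first());
# \\1 refers to the first capturing group.
first <- sub("([oe].)", "{\\1}", x, perl = TRUE)

# Replace every match in each element (cf. re_replace_all())
every <- gsub("([oe].)", "{\\1}", x, perl = TRUE)

print(has)    # FALSE TRUE TRUE TRUE
print(idx)    # 2 3 4
print(first)  # "this" "s{en}tence"   "{of}" "w{or}ds"
print(every)  # "this" "s{en}t{en}ce" "{of}" "w{or}ds"
```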
Examples
x <- tokenize("This is a sentence with a couple of words in it.")
pattern <- "[oe](.)(.)"
re_retrieve_first(x, pattern)
#> [1] NA NA NA "ent" NA NA "oup" NA "ord" NA NA
re_retrieve_first(x, pattern, drop_NA = TRUE)
#> [1] "ent" "oup" "ord"
re_retrieve_first(x, pattern, requested_group = 1)
#> [1] NA NA NA "n" NA NA "u" NA "r" NA NA
re_retrieve_first(x, pattern, drop_NA = TRUE, requested_group = 1)
#> [1] "n" "u" "r"
re_retrieve_first(x, pattern, requested_group = 2)
#> [1] NA NA NA "t" NA NA "p" NA "d" NA NA
re_retrieve_last(x, pattern)
#> [1] NA NA NA "enc" NA NA "oup" NA "ord" NA NA
re_retrieve_last(x, pattern, drop_NA = TRUE)
#> [1] "enc" "oup" "ord"
re_retrieve_last(x, pattern, requested_group = 1)
#> [1] NA NA NA "n" NA NA "u" NA "r" NA NA
re_retrieve_last(x, pattern, drop_NA = TRUE, requested_group = 1)
#> [1] "n" "u" "r"
re_retrieve_last(x, pattern, requested_group = 2)
#> [1] NA NA NA "c" NA NA "p" NA "d" NA NA
re_retrieve_all(x, pattern)
#> [1] "ent" "enc" "oup" "ord"
re_retrieve_all(x, pattern, unlist = FALSE)
#> [[1]]
#> character(0)
#>
#> [[2]]
#> character(0)
#>
#> [[3]]
#> character(0)
#>
#> [[4]]
#> [1] "ent" "enc"
#>
#> [[5]]
#> character(0)
#>
#> [[6]]
#> character(0)
#>
#> [[7]]
#> [1] "oup"
#>
#> [[8]]
#> character(0)
#>
#> [[9]]
#> [1] "ord"
#>
#> [[10]]
#> character(0)
#>
#> [[11]]
#> character(0)
#>
re_retrieve_all(x, pattern, requested_group = 1)
#> [1] "n" "n" "u" "r"
re_retrieve_all(x, pattern, unlist = FALSE, requested_group = 1)
#> [[1]]
#> character(0)
#>
#> [[2]]
#> character(0)
#>
#> [[3]]
#> character(0)
#>
#> [[4]]
#> [1] "n" "n"
#>
#> [[5]]
#> character(0)
#>
#> [[6]]
#> character(0)
#>
#> [[7]]
#> [1] "u"
#>
#> [[8]]
#> character(0)
#>
#> [[9]]
#> [1] "r"
#>
#> [[10]]
#> character(0)
#>
#> [[11]]
#> character(0)
#>
re_retrieve_all(x, pattern, requested_group = 2)
#> [1] "t" "c" "p" "d"
re_replace_first(x, "([oe].)", "{\\1}")
#> Token sequence of length 11
#> idx token
#> --- ----------
#> 1 this
#> 2 is
#> 3 a
#> 4 s{en}tence
#> 5 with
#> 6 a
#> 7 c{ou}ple
#> 8 {of}
#> 9 w{or}ds
#> 10 in
#> 11 it
re_replace_all(x, "([oe].)", "{\\1}")
#> Token sequence of length 11
#> idx token
#> --- ------------
#> 1 this
#> 2 is
#> 3 a
#> 4 s{en}t{en}ce
#> 5 with
#> 6 a
#> 7 c{ou}ple
#> 8 {of}
#> 9 w{or}ds
#> 10 in
#> 11 it