These functions are essentially simple wrappers around base R functions such as
regexpr(), gregexpr(), grepl(), grep(), sub() and gsub().
The most important differences between the functions documented here and the
base R functions are the order of the arguments (x before pattern) and the
fact that the argument perl is set to TRUE by default.
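To make the effect of the perl default concrete, the following base R sketch uses a lookahead, which is a PCRE feature. It only calls grepl() from base R; the input vector is made up for illustration. In base R the pattern comes first and perl = TRUE has to be requested explicitly, whereas the functions documented here take x first and enable perl by default.

```r
# Illustrative input, not taken from the package.
x <- c("ab", "ac")

# Base R: pattern before x, and PCRE must be switched on explicitly.
# "a(?=b)" matches an "a" only when it is directly followed by "b".
has_a_before_b <- grepl("a(?=b)", x, perl = TRUE)
has_a_before_b
#> [1]  TRUE FALSE
```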
Usage
re_retrieve_first(
x,
pattern,
ignore.case = FALSE,
perl = TRUE,
fixed = FALSE,
useBytes = FALSE,
requested_group = NULL,
drop_NA = FALSE,
...
)
re_retrieve_last(
x,
pattern,
ignore.case = FALSE,
perl = TRUE,
fixed = FALSE,
useBytes = FALSE,
requested_group = NULL,
drop_NA = FALSE,
...
)
re_retrieve_all(
x,
pattern,
ignore.case = FALSE,
perl = TRUE,
fixed = FALSE,
useBytes = FALSE,
requested_group = NULL,
unlist = TRUE,
...
)
re_has_matches(
x,
pattern,
ignore.case = FALSE,
perl = TRUE,
fixed = FALSE,
useBytes = FALSE,
...
)
re_which(
x,
pattern,
ignore.case = FALSE,
perl = TRUE,
fixed = FALSE,
useBytes = FALSE,
...
)
re_replace_first(
x,
pattern,
replacement,
ignore.case = FALSE,
perl = TRUE,
fixed = FALSE,
useBytes = FALSE,
...
)
re_replace_all(
x,
pattern,
replacement,
ignore.case = FALSE,
perl = TRUE,
fixed = FALSE,
useBytes = FALSE,
...
)
Arguments
- x
Character vector to be searched or modified.
- pattern
Regular expression specifying what is to be searched.
- ignore.case
Logical. Should the search be case insensitive?
- perl
Logical. Whether the regular expressions use the PCRE flavor of regular expression. Unlike in base R functions, the default is TRUE.
- fixed
Logical. If TRUE, pattern is a string to be matched as is, i.e. wildcards and special characters are not interpreted.
- useBytes
Logical. If TRUE, the matching is done byte-by-byte rather than character-by-character. See 'Details' in grep().
- requested_group
Numeric. If NULL or 0, the output will contain matches for pattern as a whole. If another number n is provided, then the output will not contain matches for pattern but instead will only contain the matches for the nth capturing group in pattern (the first if requested_group = 1, the second if requested_group = 2, ...).
- drop_NA
Logical. If FALSE, the output always has the same length as the input x and items that do not contain a match for pattern yield NA. If TRUE, such NA values are removed and therefore the result might contain fewer items than x.
- ...
Additional arguments.
- unlist
Logical. If FALSE, the output always has the same length as the input x. More specifically, the result will be a list in which input items that do not contain a match for pattern yield an empty vector, whereas input items that do match will yield a vector of at least length one (depending on the number of matches). If TRUE, the output is a single vector the length of which may be shorter or longer than x.
- replacement
Character vector of length one specifying the replacement string. It is to be taken literally, except that the notation \\1, \\2, etc. can be used to refer to groups in pattern.
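The behavior described for requested_group can be approximated in base R. The sketch below is not the package's implementation; it assumes that extracting the first capturing group from the first match is the goal and relies on the "capture.start" and "capture.length" attributes that regexpr() attaches when perl = TRUE. The names starts, lengths, hit and group1 are chosen for illustration.

```r
x <- c("this", "sentence", "couple", "of", "words")
pattern <- "[oe](.)(.)"

# First match per element; with perl = TRUE the result carries
# per-group start positions and lengths as attributes.
m <- regexpr(pattern, x, perl = TRUE)
starts  <- attr(m, "capture.start")[, 1]   # column 1 = first capturing group
lengths <- attr(m, "capture.length")[, 1]

# Elements without a match are reported as -1 by regexpr(); keep them as NA.
hit <- as.integer(m) > 0
group1 <- rep(NA_character_, length(x))
group1[hit] <- substring(x[hit], starts[hit], starts[hit] + lengths[hit] - 1)
group1
#> [1] NA  "n" "u" NA  "r"
```

Selecting column 2 of the same attributes would give the second group instead.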
Value
re_retrieve_first(), re_retrieve_last() and re_retrieve_all() return
either a single vector of character data or a list containing such vectors.
re_replace_first() and re_replace_all() return the same type of character
vector as x.
re_has_matches() returns a logical vector indicating whether a match was
found in each of the elements in x; re_which() returns a numeric
vector indicating the indices of the elements of x for which a match was
found.
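The difference between the list-shaped and vector-shaped results, and between keeping and dropping NA values, can be sketched with base R alone. This is an illustration under assumptions, not the package's code: pick() is a hypothetical helper, and the input vector is made up.

```r
x <- c("this", "sentence", "couple", "is")
pattern <- "[oe].."

# One list element per input item; items without a match give character(0).
all_matches <- regmatches(x, gregexpr(pattern, x, perl = TRUE))
length(all_matches)
#> [1] 4

# Flattening loses the one-to-one correspondence with x.
flat <- unlist(all_matches)
flat
#> [1] "ent" "enc" "oup"

# Hypothetical helper: first (or last) match per item, NA when there is none.
pick <- function(matches, from_end = FALSE) {
  vapply(matches, function(v) {
    if (length(v) == 0) return(NA_character_)
    if (from_end) v[length(v)] else v[1]
  }, character(1))
}

first_match <- pick(all_matches)                   # same length as x
last_match  <- pick(all_matches, from_end = TRUE)  # same length as x
first_dropped <- first_match[!is.na(first_match)]  # NA values removed
```

Here first_match and last_match differ only for "sentence", which contains two matches.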
Details
For details on some of the arguments (e.g. perl, fixed), see base R's regex
documentation.
Functions
- re_retrieve_first(): Retrieve from each item in x the first match of pattern.
- re_retrieve_last(): Retrieve from each item in x the last match of pattern.
- re_retrieve_all(): Retrieve from each item in x all matches of pattern.
- re_has_matches(): Simple wrapper around grepl().
- re_which(): Simple wrapper around grep().
- re_replace_first(): Simple wrapper around sub().
- re_replace_all(): Simple wrapper around gsub().
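As a rough idea of what "simple wrapper" means here, the sketch below defines stand-in functions around grepl(), grep(), sub() and gsub() that put x first and default perl to TRUE. The my_* names are hypothetical and the bodies are an assumption about the general shape of such wrappers, not the package's actual source.

```r
# Hypothetical stand-ins; only the argument order and the perl default change.
my_has_matches <- function(x, pattern, ignore.case = FALSE, perl = TRUE,
                           fixed = FALSE, useBytes = FALSE, ...) {
  grepl(pattern, x, ignore.case = ignore.case, perl = perl,
        fixed = fixed, useBytes = useBytes, ...)
}

my_which <- function(x, pattern, ignore.case = FALSE, perl = TRUE,
                     fixed = FALSE, useBytes = FALSE, ...) {
  grep(pattern, x, ignore.case = ignore.case, perl = perl,
       fixed = fixed, useBytes = useBytes, ...)
}

my_replace_first <- function(x, pattern, replacement, ignore.case = FALSE,
                             perl = TRUE, fixed = FALSE, useBytes = FALSE, ...) {
  sub(pattern, replacement, x, ignore.case = ignore.case, perl = perl,
      fixed = fixed, useBytes = useBytes, ...)
}

my_replace_all <- function(x, pattern, replacement, ignore.case = FALSE,
                           perl = TRUE, fixed = FALSE, useBytes = FALSE, ...) {
  gsub(pattern, replacement, x, ignore.case = ignore.case, perl = perl,
       fixed = fixed, useBytes = useBytes, ...)
}

my_has_matches(c("apple", "kiwi"), "pp")
#> [1]  TRUE FALSE
my_replace_all("banana", "a", "o")
#> [1] "bonono"
```

Putting x first makes such functions convenient in pipelines, where the data being processed is passed as the first argument.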
Examples
x <- tokenize("This is a sentence with a couple of words in it.")
pattern <- "[oe](.)(.)"
re_retrieve_first(x, pattern)
#> [1] NA NA NA "ent" NA NA "oup" NA "ord" NA NA
re_retrieve_first(x, pattern, drop_NA = TRUE)
#> [1] "ent" "oup" "ord"
re_retrieve_first(x, pattern, requested_group = 1)
#> [1] NA NA NA "n" NA NA "u" NA "r" NA NA
re_retrieve_first(x, pattern, drop_NA = TRUE, requested_group = 1)
#> [1] "n" "u" "r"
re_retrieve_first(x, pattern, requested_group = 2)
#> [1] NA NA NA "t" NA NA "p" NA "d" NA NA
re_retrieve_last(x, pattern)
#> [1] NA NA NA "enc" NA NA "oup" NA "ord" NA NA
re_retrieve_last(x, pattern, drop_NA = TRUE)
#> [1] "enc" "oup" "ord"
re_retrieve_last(x, pattern, requested_group = 1)
#> [1] NA NA NA "n" NA NA "u" NA "r" NA NA
re_retrieve_last(x, pattern, drop_NA = TRUE, requested_group = 1)
#> [1] "n" "u" "r"
re_retrieve_last(x, pattern, requested_group = 2)
#> [1] NA NA NA "c" NA NA "p" NA "d" NA NA
re_retrieve_all(x, pattern)
#> [1] "ent" "enc" "oup" "ord"
re_retrieve_all(x, pattern, unlist = FALSE)
#> [[1]]
#> character(0)
#>
#> [[2]]
#> character(0)
#>
#> [[3]]
#> character(0)
#>
#> [[4]]
#> [1] "ent" "enc"
#>
#> [[5]]
#> character(0)
#>
#> [[6]]
#> character(0)
#>
#> [[7]]
#> [1] "oup"
#>
#> [[8]]
#> character(0)
#>
#> [[9]]
#> [1] "ord"
#>
#> [[10]]
#> character(0)
#>
#> [[11]]
#> character(0)
#>
re_retrieve_all(x, pattern, requested_group = 1)
#> [1] "n" "n" "u" "r"
re_retrieve_all(x, pattern, unlist = FALSE, requested_group = 1)
#> [[1]]
#> character(0)
#>
#> [[2]]
#> character(0)
#>
#> [[3]]
#> character(0)
#>
#> [[4]]
#> [1] "n" "n"
#>
#> [[5]]
#> character(0)
#>
#> [[6]]
#> character(0)
#>
#> [[7]]
#> [1] "u"
#>
#> [[8]]
#> character(0)
#>
#> [[9]]
#> [1] "r"
#>
#> [[10]]
#> character(0)
#>
#> [[11]]
#> character(0)
#>
re_retrieve_all(x, pattern, requested_group = 2)
#> [1] "t" "c" "p" "d"
re_replace_first(x, "([oe].)", "{\\1}")
#> Token sequence of length 11
#> idx token
#> --- ----------
#> 1 this
#> 2 is
#> 3 a
#> 4 s{en}tence
#> 5 with
#> 6 a
#> 7 c{ou}ple
#> 8 {of}
#> 9 w{or}ds
#> 10 in
#> 11 it
re_replace_all(x, "([oe].)", "{\\1}")
#> Token sequence of length 11
#> idx token
#> --- ------------
#> 1 this
#> 2 is
#> 3 a
#> 4 s{en}t{en}ce
#> 5 with
#> 6 a
#> 7 c{ou}ple
#> 8 {of}
#> 9 w{or}ds
#> 10 in
#> 11 it
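The examples above do not cover re_has_matches() and re_which(). Since these are described as simple wrappers around grepl() and grep(), the base R calls below give an idea of the kind of result to expect. This sketch assumes a plain character vector holding the same lowercased words as the token sequence shown above; it does not call the package functions themselves.

```r
# Plain character vector standing in for the tokenized sentence (assumption).
words <- c("this", "is", "a", "sentence", "with", "a",
           "couple", "of", "words", "in", "it")
pattern <- "[oe](.)(.)"

# One logical per word: does it contain a match?
has_match <- grepl(pattern, words, perl = TRUE)
has_match
#> [1] FALSE FALSE FALSE  TRUE FALSE FALSE  TRUE FALSE  TRUE FALSE FALSE

# Indices of the words that contain a match.
match_idx <- grep(pattern, words, perl = TRUE)
match_idx
#> [1] 4 7 9
```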
