This method takes as its argument x an object that represents a sequence of
character data, such as an object of class tokens, and truncates it at the
position where a match for the argument pattern is found. Currently it is
only implemented for tokens objects.
Usage
trunc_at(x, pattern, ...)
# S3 method for tokens
trunc_at(
x,
pattern,
keep_this = FALSE,
last_match = FALSE,
from_end = FALSE,
...
)
Arguments
- x
An object that represents a sequence of character data.
- pattern
A regular expression.
- ...
Additional arguments.
- keep_this
Logical. Whether the matching token itself should be kept. If TRUE, truncation happens right after the matching token; if FALSE, right before it.
- last_match
Logical. In case there are several matching tokens, if last_match is TRUE, the last match is used as the truncation point; otherwise, the first match is.
- from_end
Logical. If FALSE, the search for a match starts at the first token and progresses forward; if TRUE, it starts at the last token and progresses backward.
If from_end is FALSE, the part of x that is kept after truncation is the head of x. If it is TRUE instead, the part that is kept after truncation is the tail of x.
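The interplay of keep_this, last_match and from_end can be easier to follow on a plain character vector. The sketch below is a hypothetical base R helper, trunc_at_sketch(), written only to illustrate the argument descriptions above; it is not the method's actual implementation. It assumes that with from_end = TRUE the "first" and "last" match are counted in the backward scanning direction, and that the input is returned unchanged when nothing matches. Neither assumption is stated in the descriptions above.

```r
# Hypothetical helper, NOT part of the package: mimics the documented
# argument semantics on a plain character vector.
trunc_at_sketch <- function(x, pattern,
                            keep_this = FALSE,
                            last_match = FALSE,
                            from_end = FALSE) {
  # Assumption: with from_end = TRUE the sequence is scanned in reverse,
  # so "first" and "last" match are counted from the end.
  if (from_end) x <- rev(x)

  hits <- grep(pattern, x, perl = TRUE)

  # Assumption: without any match, x is left untouched.
  if (length(hits) > 0) {
    pos <- if (last_match) max(hits) else min(hits)
    # keep_this decides whether the matching element itself survives.
    n_keep <- if (keep_this) pos else pos - 1
    x <- x[seq_len(n_keep)]
  }

  # Restore the original order, so that the kept part is the tail of x.
  if (from_end) x <- rev(x)
  x
}

toks <- c("this", "is", "a", "first", "sentence", ".",
          "this", "is", "a", "second", "sentence", ".")

# Head of toks, stopping right before the first "."
trunc_at_sketch(toks, "[.]")
#> [1] "this"     "is"       "a"        "first"    "sentence"

# Same truncation point, but the matching "." is kept
trunc_at_sketch(toks, "[.]", keep_this = TRUE)

# Scanning backward: the second match from the end is element 6,
# so the tail that remains is elements 7 to 12
trunc_at_sketch(toks, "[.]", last_match = TRUE, from_end = TRUE)
#> [1] "this"     "is"       "a"        "second"   "sentence" "."
```

Under these assumptions, from_end = TRUE on its own returns an empty vector for this input, because the very last element already matches and is dropped when keep_this is FALSE.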
Examples
(toks <- tokenize('This is a first sentence . This is a second sentence .',
re_token_splitter = '\\s+'))
#> Token sequence of length 12
#> idx token
#> --- --------
#> 1 this
#> 2 is
#> 3 a
#> 4 first
#> 5 sentence
#> 6 .
#> 7 this
#> 8 is
#> 9 a
#> 10 second
#> 11 sentence
#> 12 .
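# Truncate right before the first match
trunc_at(toks, re("[.]"))
# Truncate right before the last match
trunc_at(toks, re("[.]"), last_match = TRUE)
# Search backward from the last token and keep the tail of the sequence
trunc_at(toks, re("[.]"), last_match = TRUE, from_end = TRUE)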