Truncate a sequence of character data

This generic function takes as its argument x an object that represents a sequence of character data, such as an object of class tokens, and truncates it at the position where a match for pattern is found. Currently, it is implemented only for tokens objects.
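To illustrate the default behaviour, the base R sketch below applies the same idea to a plain character vector. It assumes that truncation keeps only the elements before the first element matching the regular expression. It does not use mclm or tokens objects, and the helper name cut_before_first_match is hypothetical.

```r
# Hypothetical helper, not part of mclm: keep the elements of a character
# vector that precede the first element matching a regular expression.
cut_before_first_match <- function(x, pattern) {
  hits <- grep(pattern, x, perl = TRUE)   # indices of matching elements
  if (length(hits) == 0) return(x)        # assumption: no match -> unchanged
  x[seq_len(hits[1] - 1)]                 # drop the match and everything after
}

words <- c("this", "is", "a", "first", "sentence", ".", "this", "is")
cut_before_first_match(words, "[.]")      # "this" "is" "a" "first" "sentence"
```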

Usage

trunc_at(x, pattern, ...)

# S3 method for tokens
trunc_at(
  x,
  pattern,
  keep_this = FALSE,
  last_match = FALSE,
  from_end = FALSE,
  ...
)

Arguments

x

An object that represents a sequence of character data.

pattern

A regular expression.

...

Additional arguments.

keep_this

Logical. Whether the matching token itself should be kept. If TRUE, truncation happens right after the matching token; if FALSE, right before it.

last_match

Logical. If several tokens match, last_match = TRUE uses the last match as the truncation point; otherwise, the first match is used.

from_end

Logical. If FALSE, the search for a match starts at the first token and proceeds forward; if TRUE, it starts at the last token and proceeds backward.

If from_end is FALSE, the part of x kept after truncation is its head. If from_end is TRUE, the part kept is its tail.
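The base R sketch below shows one way the three flags could combine. It assumes trunc_at() behaves as follows: reverse the sequence when from_end = TRUE, pick the first or last match in scanning order, cut right before or right after it, then restore the original order so that the tail is returned. trunc_sketch is a hypothetical name. It works on a plain character vector rather than a tokens object and is not mclm's implementation.

```r
# Hypothetical re-implementation for illustration only; mclm's own
# trunc_at() works on tokens objects and may differ in details.
trunc_sketch <- function(x, pattern, keep_this = FALSE,
                         last_match = FALSE, from_end = FALSE) {
  if (from_end) x <- rev(x)                # scan backward by reversing
  hits <- grep(pattern, x, perl = TRUE)    # matches in scanning order
  if (length(hits) > 0) {
    pos <- if (last_match) max(hits) else min(hits)
    end <- if (keep_this) pos else pos - 1 # cut right after / right before
    x <- x[seq_len(end)]                   # keep the part scanned first
  }
  if (from_end) x <- rev(x)                # restore original order (tail)
  x
}

toks <- c("this", "is", "a", "first", "sentence", ".",
          "this", "is", "a", "second", "sentence", ".")
trunc_sketch(toks, "[.]")                                     # tokens 1-5
trunc_sketch(toks, "[.]", last_match = TRUE)                  # tokens 1-11
trunc_sketch(toks, "[.]", last_match = TRUE, from_end = TRUE) # tokens 7-12
```

Under these assumptions, the three calls correspond to the trunc_at() examples further down. Reversing and then reversing back lets a single forward-scanning routine handle both directions.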

Value

A truncated version of x.

Examples

library(mclm)

(toks <- tokenize('This is a first sentence . This is a second sentence .',
                  re_token_splitter = '\\s+'))
#> Token sequence of length 12
#> idx    token
#> --- --------
#>   1     this
#>   2       is
#>   3        a
#>   4    first
#>   5 sentence
#>   6        .
#>   7     this
#>   8       is
#>   9        a
#>  10   second
#>  11 sentence
#>  12        .

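# Default: scan forward and cut right before the first match
trunc_at(toks, re("[.]"))

# Scan forward and cut right before the last match
trunc_at(toks, re("[.]"), last_match = TRUE)

# Scan backward from the last token and keep the tail of toks
trunc_at(toks, re("[.]"), last_match = TRUE, from_end = TRUE)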