This function coerces a character vector, or any other object that can be coerced to character, into an object of class tokens.

Usage

as_tokens(x, ...)

Arguments

x

Object to coerce.

...

Additional arguments (not implemented).

Value

An object of class tokens.

Examples

toy_corpus <- "Once upon a time there was a tiny toy corpus.
It consisted of three sentences. And it lived happily ever after."

tks <- tokenize(toy_corpus)
print(tks, n = 1000)
#> Token sequence of length 21
#> idx     token
#> --- ---------
#>   1      once
#>   2      upon
#>   3         a
#>   4      time
#>   5     there
#>   6       was
#>   7         a
#>   8      tiny
#>   9       toy
#>  10    corpus
#>  11        it
#>  12 consisted
#>  13        of
#>  14     three
#>  15 sentences
#>  16       and
#>  17        it
#>  18     lived
#>  19   happily
#>  20      ever
#>  21     after

tks[3:12]
#> Token sequence of length 10
#> idx     token
#> --- ---------
#>   1         a
#>   2      time
#>   3     there
#>   4       was
#>   5         a
#>   6      tiny
#>   7       toy
#>   8    corpus
#>   9        it
#>  10 consisted
print(as_tokens(tks[3:12]), n = 1000)
#> Token sequence of length 10
#> idx     token
#> --- ---------
#>   1         a
#>   2      time
#>   3     there
#>   4       was
#>   5         a
#>   6      tiny
#>   7       toy
#>   8    corpus
#>   9        it
#>  10 consisted
as_tokens(tail(tks))
#> Token sequence of length 6
#> idx   token
#> --- -------
#>   1     and
#>   2      it
#>   3   lived
#>   4 happily
#>   5    ever
#>   6   after
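
The coercion described on this page follows a common R pattern: a generic that converts its input to character and attaches a class. The sketch below illustrates that pattern in base R only. It is not the package's implementation: the names `as_toy_tokens` and `toy_tokens` are hypothetical, and it is an assumption here that objects already carrying the class are returned unchanged.

```r
# Hypothetical sketch, NOT the package's code: an S3 generic that coerces
# its input to character and tags the result with a class.
as_toy_tokens <- function(x, ...) UseMethod("as_toy_tokens")

# Default method: anything as.character() accepts becomes a toy token vector.
as_toy_tokens.default <- function(x, ...) {
  structure(as.character(x), class = c("toy_tokens", "character"))
}

# Assumption for this sketch: objects that already have the class are
# returned unchanged.
as_toy_tokens.toy_tokens <- function(x, ...) x

words <- as_toy_tokens(c("once", "upon", "a", "time"))
nums  <- as_toy_tokens(1:3)  # integers are first coerced to character

inherits(words, "toy_tokens")           # class membership check
identical(as_toy_tokens(words), words)  # re-coercion leaves the object as is
```

In this sketch a single default method handles every coercible input, so no separate method is needed for each input type.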