This function coerces a character object, or another object that can be coerced
to a character, into an object of class tokens.
Value
An object of class tokens.
Examples
toy_corpus <- "Once upon a time there was a tiny toy corpus.
It consisted of three sentences. And it lived happily ever after."
tks <- tokenize(toy_corpus)
print(tks, n = 1000)
#> Token sequence of length 21
#> idx token
#> --- ---------
#> 1 once
#> 2 upon
#> 3 a
#> 4 time
#> 5 there
#> 6 was
#> 7 a
#> 8 tiny
#> 9 toy
#> 10 corpus
#> 11 it
#> 12 consisted
#> 13 of
#> 14 three
#> 15 sentences
#> 16 and
#> 17 it
#> 18 lived
#> 19 happily
#> 20 ever
#> 21 after
tks[3:12]
#> Token sequence of length 10
#> idx token
#> --- ---------
#> 1 a
#> 2 time
#> 3 there
#> 4 was
#> 5 a
#> 6 tiny
#> 7 toy
#> 8 corpus
#> 9 it
#> 10 consisted
print(as_tokens(tks[3:12]), n = 1000)
#> Token sequence of length 10
#> idx token
#> --- ---------
#> 1 a
#> 2 time
#> 3 there
#> 4 was
#> 5 a
#> 6 tiny
#> 7 toy
#> 8 corpus
#> 9 it
#> 10 consisted
as_tokens(tail(tks))
#> Token sequence of length 6
#> idx token
#> --- -------
#> 1 and
#> 2 it
#> 3 lived
#> 4 happily
#> 5 ever
#> 6 after
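# A further sketch, not part of the original examples: coercing a plain
# character vector directly, as the description above allows. The vector is
# made up for illustration, and library(mclm) is an assumption that the
# functions documented here come from the mclm package.
library(mclm)
chr_tks <- as_tokens(c("a", "tiny", "toy", "corpus"))
class(chr_tks)            # expected to include "tokens", per the Value section
print(chr_tks, n = 1000)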