
This function coerces an object, such as a character vector, to an object of class types.

Usage

as_types(x, remove_duplicates = TRUE, sort = TRUE, ...)

Arguments

x

Object to coerce

remove_duplicates

Logical. Should duplicates be removed from x prior to coercing it to a vector of types?

sort

Logical. Should x be sorted alphabetically prior to coercing it to a vector of types? This argument is ignored if remove_duplicates is TRUE, because the result of removing duplicates is always sorted.

...

Additional arguments (not implemented)
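The interplay between remove_duplicates and sort can be illustrated in base R. The sketch below defines a hypothetical helper, prepare_types(), which is not part of mclm; it only mirrors the behaviour documented above (removing duplicates always yields a sorted result, so sort is then ignored). The real as_types() may differ in details such as collation or NA handling.

```r
# Hypothetical helper, NOT part of mclm: mirrors the documented
# remove_duplicates / sort semantics of as_types() on a plain character vector.
prepare_types <- function(x, remove_duplicates = TRUE, sort = TRUE) {
  x <- as.character(x)
  if (remove_duplicates) {
    # Deduplication always yields a sorted result, so 'sort' is ignored here.
    base::sort(unique(x))
  } else if (sort) {
    # Keep duplicates, but sort alphabetically.
    base::sort(x)
  } else {
    # Keep the input order and any duplicates.
    x
  }
}

x <- c("once", "lived", "happily", "lived")
prepare_types(x)
prepare_types(x, remove_duplicates = FALSE)
prepare_types(x, remove_duplicates = FALSE, sort = FALSE)
```

With the defaults, c("once", "lived", "happily", "lived") becomes c("happily", "lived", "once"); with remove_duplicates = FALSE the duplicate "lived" survives but the vector is still sorted.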

Value

An object of the class types, which is based on a character vector. It has additional attributes and methods.

An object of class types can be merged with another by means of types_merge(), written to file with write_types(), and read from file with read_types().
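In practice, types_merge(), write_types() and read_types() are the functions to use. The base-R sketch below only illustrates the idea under two assumptions that do not come from the mclm documentation: that merging two type collections amounts to a set union (sorted here), and that a collection can be stored as a plain text file with one type per line. The ordering rules of types_merge() and the file format written by write_types() may differ.

```r
# Base-R analogue only, not the mclm implementation.
a <- c("happily", "lived", "once")
b <- c("once", "upon", "time")

# Assumption: "merging" = set union, sorted alphabetically.
merged <- sort(union(a, b))

# Assumption: one type per line in a plain text file.
path <- tempfile(fileext = ".txt")
writeLines(merged, path)
read_back <- readLines(path)
unlink(path)  # clean up the temporary file

identical(read_back, merged)
```

Here merged is c("happily", "lived", "once", "time", "upon"), and reading the file back returns the same vector.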

See also

Examples


library(mclm)

toy_corpus <- "Once upon a time there was a tiny toy corpus.
It consisted of three sentences. And it lived happily ever after."

flist <- freqlist(toy_corpus, re_token_splitter = "\\W+", as_text = TRUE)
print(flist, n = 1000)
#> Frequency list (types in list: 19, tokens in list: 21)
#> rank      type abs_freq nrm_freq
#> ---- --------- -------- --------
#>    1         a        2  952.381
#>    2        it        2  952.381
#>    3     after        1  476.190
#>    4       and        1  476.190
#>    5 consisted        1  476.190
#>    6    corpus        1  476.190
#>    7      ever        1  476.190
#>    8   happily        1  476.190
#>    9     lived        1  476.190
#>   10        of        1  476.190
#>   11      once        1  476.190
#>   12 sentences        1  476.190
#>   13     there        1  476.190
#>   14     three        1  476.190
#>   15      time        1  476.190
#>   16      tiny        1  476.190
#>   17       toy        1  476.190
#>   18      upon        1  476.190
#>   19       was        1  476.190
(sel_types <- as_types(c("happily", "lived", "once")))
#> Type collection of length 3
#>      type
#>   -------
#> 1 happily
#> 2   lived
#> 3    once
keep_types(flist, sel_types)
#> Frequency list (types in list: 3, tokens in list: 3)
#> <total number of tokens: 21>
#> rank orig_rank    type abs_freq nrm_freq
#> ---- --------- ------- -------- --------
#>    1         8 happily        1   476.19
#>    2         9   lived        1   476.19
#>    3        11    once        1   476.19
tks <- tokenize(toy_corpus, re_token_splitter = "\\W+")
print(tks, n = 1000)
#> Token sequence of length 21
#> idx     token
#> --- ---------
#>   1      once
#>   2      upon
#>   3         a
#>   4      time
#>   5     there
#>   6       was
#>   7         a
#>   8      tiny
#>   9       toy
#>  10    corpus
#>  11        it
#>  12 consisted
#>  13        of
#>  14     three
#>  15 sentences
#>  16       and
#>  17        it
#>  18     lived
#>  19   happily
#>  20      ever
#>  21     after
tks[3:12] # idx is relative to selection
#> Token sequence of length 10
#> idx     token
#> --- ---------
#>   1         a
#>   2      time
#>   3     there
#>   4       was
#>   5         a
#>   6      tiny
#>   7       toy
#>   8    corpus
#>   9        it
#>  10 consisted
head(tks) # idx is relative to selection
#> Token sequence of length 6
#> idx token
#> --- -----
#>   1  once
#>   2  upon
#>   3     a
#>   4  time
#>   5 there
#>   6   was
tail(tks) # idx is relative to selection
#> Token sequence of length 6
#> idx   token
#> --- -------
#>   1     and
#>   2      it
#>   3   lived
#>   4 happily
#>   5    ever
#>   6   after
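The frequency list above can be recomputed in base R, which makes the nrm_freq column and the orig_rank column reported by keep_types() easier to follow. This is an illustration, not mclm's implementation. It assumes that freqlist() lowercases tokens (as the output suggests) and that nrm_freq is expressed per 10,000 tokens, which matches the printed values: 2 / 21 * 10000 is about 952.381 and 1 / 21 * 10000 is about 476.190.

```r
# Base-R re-computation of the toy frequency list (illustration only).
toy_corpus <- "Once upon a time there was a tiny toy corpus.
It consisted of three sentences. And it lived happily ever after."

# Split on runs of non-word characters and lowercase the tokens.
tokens <- tolower(unlist(strsplit(toy_corpus, "\\W+")))
tokens <- tokens[tokens != ""]   # drop empty strings, if any
n_tokens <- length(tokens)

counts <- table(tokens)
# Rank by decreasing frequency, breaking ties alphabetically.
ord <- order(-as.vector(counts), names(counts))
flist_df <- data.frame(
  rank     = seq_along(ord),
  type     = names(counts)[ord],
  abs_freq = as.vector(counts)[ord],
  stringsAsFactors = FALSE
)
# Assumed normalisation: frequency per 10,000 tokens.
flist_df$nrm_freq <- flist_df$abs_freq / n_tokens * 10000

# Analogue of keep_types(flist, sel_types): the rank column keeps the
# rank in the full list, which mclm prints as orig_rank.
sel <- c("happily", "lived", "once")
flist_df[flist_df$type %in% sel, ]
```

The selected rows keep ranks 8, 9 and 11, the same values shown in the orig_rank column of the keep_types() output.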