This function coerces an object, such as a character vector, to an object of class types.
Arguments
- x
Object to coerce.
- remove_duplicates
Logical. Should duplicates be removed from x prior to coercing to a vector of types?
- sort
Logical. Should x be alphabetically sorted prior to coercing to a vector of types? This argument is ignored if remove_duplicates is TRUE, because the result of removing duplicates is always sorted.
- ...
Additional arguments (not implemented).
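For instance, a minimal sketch of how remove_duplicates and sort interact (the input vector is made up, the arguments are passed explicitly rather than relying on defaults, and the package is assumed to be attached):
x <- c("toy", "corpus", "toy", "a")                   # small made-up character vector
as_types(x, remove_duplicates = TRUE)                 # duplicates dropped; result is always sorted
as_types(x, remove_duplicates = FALSE, sort = TRUE)   # duplicates kept, alphabetically sorted
as_types(x, remove_duplicates = FALSE, sort = FALSE)  # duplicates kept, original order preserved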
Value
An object of the class types, which is based on a character vector.
It has additional attributes and methods such as:
- base print(), as_data_frame(), sort() and base::summary() (which returns the number of items and of unique items),
- subsetting methods such as keep_types(), keep_pos(), etc., including [] subsetting (see brackets).
An object of class types can be merged with another by means of types_merge(), written to file with write_types() and read from file with read_types().
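As a rough sketch of some of these methods (positional arguments are assumed for keep_pos() and types_merge(); printed output is omitted):
tps <- as_types(c("happily", "lived", "once"))
summary(tps)                                      # number of items and of unique items
as_data_frame(tps)                                # one row per type
tps[1:2]                                          # [] subsetting
keep_pos(tps, c(1, 3))                            # keep the types at positions 1 and 3
types_merge(tps, as_types(c("ever", "after")))    # merge two types objects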
Examples
toy_corpus <- "Once upon a time there was a tiny toy corpus.
It consisted of three sentences. And it lived happily ever after."
flist <- freqlist(toy_corpus, re_token_splitter = "\\W+", as_text = TRUE)
print(flist, n = 1000)
#> Frequency list (types in list: 19, tokens in list: 21)
#> rank type abs_freq nrm_freq
#> ---- --------- -------- --------
#> 1 a 2 952.381
#> 2 it 2 952.381
#> 3 after 1 476.190
#> 4 and 1 476.190
#> 5 consisted 1 476.190
#> 6 corpus 1 476.190
#> 7 ever 1 476.190
#> 8 happily 1 476.190
#> 9 lived 1 476.190
#> 10 of 1 476.190
#> 11 once 1 476.190
#> 12 sentences 1 476.190
#> 13 there 1 476.190
#> 14 three 1 476.190
#> 15 time 1 476.190
#> 16 tiny 1 476.190
#> 17 toy 1 476.190
#> 18 upon 1 476.190
#> 19 was 1 476.190
(sel_types <- as_types(c("happily", "lived", "once")))
#> Type collection of length 3
#> type
#> -------
#> 1 happily
#> 2 lived
#> 3 once
keep_types(flist, sel_types)
#> Frequency list (types in list: 3, tokens in list: 3)
#> <total number of tokens: 21>
#> rank orig_rank type abs_freq nrm_freq
#> ---- --------- ------- -------- --------
#> 1 8 happily 1 476.19
#> 2 9 lived 1 476.19
#> 3 11 once 1 476.19
tks <- tokenize(toy_corpus, re_token_splitter = "\\W+")
print(tks, n = 1000)
#> Token sequence of length 21
#> idx token
#> --- ---------
#> 1 once
#> 2 upon
#> 3 a
#> 4 time
#> 5 there
#> 6 was
#> 7 a
#> 8 tiny
#> 9 toy
#> 10 corpus
#> 11 it
#> 12 consisted
#> 13 of
#> 14 three
#> 15 sentences
#> 16 and
#> 17 it
#> 18 lived
#> 19 happily
#> 20 ever
#> 21 after
tks[3:12] # idx is relative to selection
#> Token sequence of length 10
#> idx token
#> --- ---------
#> 1 a
#> 2 time
#> 3 there
#> 4 was
#> 5 a
#> 6 tiny
#> 7 toy
#> 8 corpus
#> 9 it
#> 10 consisted
head(tks) # idx is relative to selection
#> Token sequence of length 6
#> idx token
#> --- -----
#> 1 once
#> 2 upon
#> 3 a
#> 4 time
#> 5 there
#> 6 was
tail(tks) # idx is relative to selection
#> Token sequence of length 6
#> idx token
#> --- -------
#> 1 and
#> 2 it
#> 3 lived
#> 4 happily
#> 5 ever
#> 6 after
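Finally, a hedged sketch of the file round trip mentioned above, assuming write_types() takes the object and a file path, and read_types() is its reading counterpart:
tps <- as_types(c("happily", "lived", "once"))
types_file <- tempfile(fileext = ".txt")   # temporary file, for illustration only
write_types(tps, types_file)               # write the types object to file
read_types(types_file)                     # read it back in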