
Association scores used in collocation analysis and keyword analysis
Source: R/assoc.R
assoc_scores and assoc_abcd take as their arguments co-occurrence
frequencies of a number of items and return a range of association scores used
in collocation analysis, collostruction analysis and keyword analysis.
Usage
assoc_scores(
x,
y = NULL,
min_freq = 3,
measures = NULL,
with_variants = FALSE,
show_dots = FALSE,
p_fisher_2 = FALSE,
haldane = TRUE,
small_pos = 1e-05
)
assoc_abcd(
a,
b,
c,
d,
types = NULL,
measures = NULL,
with_variants = FALSE,
show_dots = FALSE,
p_fisher_2 = FALSE,
haldane = TRUE,
small_pos = 1e-05
)
Arguments
- x
Either an object of class freqlist or an object of class cooc_info.
If x is a freqlist, it is interpreted as the target frequency list (i.e. the list with the frequency of items in the target context) and y must be a freqlist with the frequency of items in the reference context.
If x is an object of class cooc_info instead, it is interpreted as containing target frequency information, reference frequency information and corpus size information.
- y
An object of class freqlist with the frequencies of the reference context if x is also a freqlist. If x is an object of class cooc_info, this argument is ignored.
- min_freq
Minimum value for a[[i]] (or for the frequency of an item in the target frequency list) needed for its corresponding item to be included in the output.
- measures
Character vector containing the association measures (or related quantities) for which scores are requested. Supported measure names (and related quantities) are described in Value below.
If measures is NULL, it is interpreted as short for the default selection, i.e. c("exp_a", "DP_rows", "RR_rows", "OR", "MS", "Dice", "PMI", "chi2_signed", "G_signed", "t", "fisher").
If measures is "ALL", all supported measures are calculated (but not necessarily all the variants; see with_variants).
- with_variants
Logical. Whether, for the requested measures, all variants should be included in the output (TRUE) or only the main version (FALSE). See also p_fisher_2.
- show_dots
Logical. Whether a dot should be shown in the console each time calculations for a measure are finished.
- p_fisher_2
Logical. Only relevant if "fisher" is included in measures. If TRUE, the p-value for a two-sided test (testing for either attraction or repulsion) is also calculated. By default, only the (computationally less demanding) p-value for a one-sided test is calculated. See Value for more details.
- haldane
Logical. Should the Haldane-Anscombe correction be used? (See the Details section.)
If haldane is TRUE, and there is at least one zero frequency in a contingency table, the correction is used for all measures calculated for that table, not just for measures that need this to be done.
- small_pos
Alternative (but sometimes inferior) approach to dealing with zero frequencies, compared to haldane. The argument small_pos only applies when haldane is set to FALSE. (See the Details section.)
If haldane is FALSE, and there is at least one zero frequency in a contingency table, adding small positive values to the zero frequency cells is done systematically for all measures calculated for that table, not just for measures that need this to be done.
- a
Numeric vector expressing how many times some tested item occurs in the target context. More specifically, a[[i]], with i an integer, expresses how many times the i-th tested item occurs in the target context.
- b
Numeric vector expressing how many times other items than the tested item occur in the target context. More specifically, b[[i]], with i an integer, expresses how many times other items than the i-th tested item occur in the target context.
- c
Numeric vector expressing how many times some tested item occurs in the reference context. More specifically, c[[i]], with i an integer, expresses how many times the i-th tested item occurs in the reference context.
- d
Numeric vector expressing how many times items other than the tested item occur in the reference context. More specifically, d[[i]], with i an integer, expresses how many times other items than the i-th tested item occur in the reference context.
- types
A character vector containing the names of the linguistic items of which the association scores are to be calculated, or NULL. If NULL, assoc_abcd() creates dummy types such as "t001", "t002", etc.
Value
An object of class assoc_scores. This is a kind of data frame with
as its rows all items from either the target frequency list or the reference
frequency list with a frequency larger than min_freq in the target list,
and as its columns a range of measures that express the extent to which
the items are attracted to the target context (when compared to the reference
context).
Some columns don't contain actual measures but rather additional information
that is useful for interpreting other measures.
Possible columns
The following sections describe the (possible) columns in the output. All
of these measures are reported if measures is set to "ALL". Alternatively,
each measure can be requested by specifying its name in a character vector
given to the measures argument. Exceptions are described in the sections
below.
Observed and expected frequencies
- a, b, c, d: The frequencies in cells a, b, c and d, respectively. If one of them is 0, they will be augmented by 0.5 or small_pos (see Details). These output columns are always present.
- dir: The direction of the association: 1 in case of relative attraction between the tested item and the target context (if \(\frac{a}{m} \ge \frac{c}{n}\)) and -1 in case of relative repulsion between the tested item and the target context (if \(\frac{a}{m} < \frac{c}{n}\)).
- exp_a, exp_b, exp_c, exp_d: The expected values for cells a, b, c and d, respectively. All these columns will be included if "expected" is in measures. exp_a is also one of the default measures and is therefore included if measures is NULL. The values of these columns are computed as follows:
  - exp_a = \(\frac{m \times k}{N}\)
  - exp_b = \(\frac{m \times l}{N}\)
  - exp_c = \(\frac{n \times k}{N}\)
  - exp_d = \(\frac{n \times l}{N}\)
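The expected frequencies and the dir column can be reproduced by hand. The following is a minimal base R sketch, not the package's own code; it uses the counts of the first example on this page and the marginal frequencies defined in Details.

```r
# Cell frequencies of one two-by-two table (the "four" example on this page).
# Naming a variable `c` is safe here: R still finds the function c() when called.
a <- 10; b <- 200; c <- 100; d <- 300

# Marginal frequencies as defined in Details
m <- a + b   # total for the target context
n <- c + d   # total for the reference context
k <- a + c   # total for the tested item
l <- b + d   # total for all other items
N <- m + n   # grand total

# Expected frequencies under independence of item and context
expected <- c(
  exp_a = m * k / N,
  exp_b = m * l / N,
  exp_c = n * k / N,
  exp_d = n * l / N
)

# Direction: 1 if a/m >= c/n (attraction), -1 otherwise (repulsion)
dir <- if (a / m >= c / n) 1 else -1

print(round(expected, 3))
print(dir)
```

The four expected values always sum to N, which is a quick sanity check on the margins.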
Effect size measures
Some of these measures are based on proportions and can therefore be computed either on the rows or on the columns of the contingency table. Each measure can be requested on its own, but pairs of measures can also be requested with the first part of their name, as indicated in their corresponding descriptions.
- DP_rows and DP_cols: The difference of proportions, sometimes also called Delta-p (\(\Delta p\)), between rows and columns respectively. Both columns are present if "DP" is included in measures. DP_rows is also included if measures is NULL. They are calculated as follows:
  - DP_rows = \(\frac{a}{m} - \frac{c}{n}\)
  - DP_cols = \(\frac{a}{k} - \frac{b}{l}\)
- perc_DIFF_rows and perc_DIFF_cols: These measures can be seen as normalized versions of Delta-p, i.e. essentially the same measures divided by the reference proportion and multiplied by 100. They therefore express how large the difference of proportions is, relative to the reference proportion. The multiplication by 100 turns the resulting 'relative difference of proportion' into a percentage. Both columns are present if "perc_DIFF" is included in measures. They are calculated as follows:
  - perc_DIFF_rows = \(100 \times \frac{(a / m) - (c / n)}{c / n}\)
  - perc_DIFF_cols = \(100 \times \frac{(a / k) - (b / l)}{b / l}\)
- DC_rows and DC_cols: The difference coefficient can be seen as a normalized version of Delta-p, i.e. essentially dividing the difference of proportions by the sum of proportions. Both columns are present if "DC" is included in measures. They are calculated as follows:
  - DC_rows = \(\frac{(a / m) - (c / n)}{(a / m) + (c / n)}\)
  - DC_cols = \(\frac{(a / k) - (b / l)}{(a / k) + (b / l)}\)
- RR_rows and RR_cols: Relative risk for the rows and columns respectively. RR_rows then represents how large the proportion in the target context is, relative to the proportion in the reference context. Both columns are present if "RR" is included in measures. RR_rows is also included if measures is NULL. They are calculated as follows:
  - RR_rows = \(\frac{a / m}{c / n}\)
  - RR_cols = \(\frac{a / k}{b / l}\)
- LR_rows and LR_cols: The so-called 'log ratio' of the rows and columns, respectively. It can be seen as a transformed version of the relative risk, viz. its binary log. Both columns are present if "LR" is included in measures. They are calculated as follows:
  - LR_rows = \(\log_2\left(\frac{a / m}{c / n}\right)\)
  - LR_cols = \(\log_2\left(\frac{a / k}{b / l}\right)\)
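As an illustration of the row-based and column-based measures, the sketch below recomputes them in base R for the four items used in the Examples section. It follows the formulas as stated here and is not the package's implementation; the data frame names rows and cols are chosen for this sketch only.

```r
# Counts for the four items used in the Examples section
a <- c(10, 30, 15, 1)
b <- c(200, 1000, 5000, 300)
c <- c(100, 14, 16, 4)
d <- c(300, 5000, 10000, 6000)
m <- a + b; n <- c + d; k <- a + c; l <- b + d

# Row-based measures compare the proportions a/m and c/n
rows <- data.frame(
  DP_rows        = a / m - c / n,
  perc_DIFF_rows = 100 * (a / m - c / n) / (c / n),
  DC_rows        = (a / m - c / n) / (a / m + c / n),
  RR_rows        = (a / m) / (c / n),
  LR_rows        = log2((a / m) / (c / n))
)

# Column-based measures compare the proportions a/k and b/l
cols <- data.frame(
  DP_cols = a / k - b / l,
  RR_cols = (a / k) / (b / l),
  LR_cols = log2((a / k) / (b / l))
)

print(round(rows, 4))
print(round(cols, 4))
```

Note that perc_DIFF_rows equals 100 times (RR_rows - 1), so the two measures always rank items in the same order.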
Other measures use the contingency table in a different way and therefore
don't have a complementary row/column pair. In order to retrieve these columns,
if measures is not "ALL", their name must be in the measures vector.
Some of them are included by default, i.e. if measures is NULL.
- OR: The odds ratio, which can be calculated either as \(\frac{a/b}{c/d}\) or as \(\frac{a/c}{b/d}\). This column is present if measures is NULL.
- log_OR: The log odds ratio, which can be calculated either as \(\log\left(\frac{a/b}{c/d}\right)\) or as \(\log\left(\frac{a/c}{b/d}\right)\). In other words, it is the natural log of the odds ratio.
- MS: The minimum sensitivity, which is calculated as \(\min(\frac{a}{m}, \frac{a}{k})\). In other words, it is either \(\frac{a}{m}\) or \(\frac{a}{k}\), whichever is lowest. This column is present if measures is NULL.
- Jaccard: The Jaccard index, which is calculated as \(\frac{a}{a + b + c}\). It expresses a, which is the frequency of the tested item in the target context, relative to a + b + c, i.e. the number of cases that involve the tested item, the target context or both.
- Dice: The Dice coefficient, which is calculated as \(\frac{2a}{m + k}\). It expresses the harmonic mean of \(\frac{a}{m}\) and \(\frac{a}{k}\). This column is present if measures is NULL.
- logDice: An adapted version of the Dice coefficient. It is calculated as \(14 + \log_2\left(\frac{2a}{m + k}\right)\). In other words, it is 14 plus the binary log of the Dice coefficient.
- phi: The phi coefficient (\(\phi\)), which is calculated as \(\frac{(a \times d) - (b \times c)}{ \sqrt{m \times n \times k \times l}}\).
- Q: Yule's Q, which is calculated as \(\frac{(a \times d) - (b \times c)}{(a \times d) + (b \times c)}\).
- mu: The measure mu (\(\mu\)), which is calculated as \(\frac{a}{\mathrm{exp\_a}}\) (see exp_a).
- PMI and pos_PMI: (Positive) pointwise mutual information, which can be seen as a modification of the mu measure and is calculated as \(\log_2\left(\frac{a}{\mathrm{exp\_a}}\right)\). In pos_PMI, negative values are set to 0. The PMI column is present if measures is NULL.
- PMI2 and PMI3: Modified versions of PMI that aim to give relatively more weight to cases with relatively higher a. However, because of this modification, they are not pure effect size measures any more.
  - PMI2 = \(\log_2\left(\frac{a^2}{\mathrm{exp\_a}}\right)\)
  - PMI3 = \(\log_2\left(\frac{a^3}{\mathrm{exp\_a}}\right)\)
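The measures without a row/column pair can likewise be checked with a few lines of base R. This is a sketch under the formulas given above (using the standard definition of Yule's Q with a sum in the denominator), not the package's own code; the results can be compared with the as_data_frame(scores) output in the Examples section.

```r
# Counts for the four items used in the Examples section
a <- c(10, 30, 15, 1)
b <- c(200, 1000, 5000, 300)
c <- c(100, 14, 16, 4)
d <- c(300, 5000, 10000, 6000)
m <- a + b; n <- c + d; k <- a + c; l <- b + d; N <- m + n
exp_a <- m * k / N

OR      <- (a / b) / (c / d)
log_OR  <- log(OR)
MS      <- pmin(a / m, a / k)          # element-wise minimum
Jaccard <- a / (a + b + c)
Dice    <- 2 * a / (m + k)
logDice <- 14 + log2(Dice)
phi     <- (a * d - b * c) / sqrt(m * n * k * l)
Q       <- (a * d - b * c) / (a * d + b * c)
mu      <- a / exp_a
PMI     <- log2(mu)
pos_PMI <- pmax(PMI, 0)                # negative values are set to 0
PMI2    <- log2(a^2 / exp_a)
PMI3    <- log2(a^3 / exp_a)

print(round(data.frame(OR, MS, Dice, PMI, pos_PMI, Q), 4))
```

Two identities are useful as checks: Q equals (OR - 1) / (OR + 1), and PMI2 equals PMI plus the binary log of a.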
Strength of evidence measures
The first measures in this section tend to come in triples: a test statistic,
its p-value (preceded by p_) and its signed version (followed by _signed).
The test statistics indicate evidence of either attraction or repulsion.
Thus, in order to indicate the direction of the relationship, a negative
sign is added in the "signed" version when \(\frac{a}{k} < \frac{b}{l}\).
In each of these cases, the name of the main measure (e.g. "chi2")
and/or its signed counterpart (e.g. "chi2_signed") must be in the measures
argument, or measures must be "ALL", for the columns to be included in
the output. If the main measure is requested, the signed counterpart will
also be included, but if only the signed counterpart is requested, the non-signed
version will be excluded.
For the p-value to be retrieved, either the main measure or its signed version
must be requested and, additionally, the with_variants argument must be
set to TRUE.
- chi2, p_chi2 and chi2_signed: The chi-squared test statistic (\(\chi^2\)) as used in a chi-squared test of independence or in a chi-squared test of homogeneity for a two-by-two contingency table. Scores of this measure are high when there is strong evidence for attraction, but also when there is strong evidence for repulsion. The chi2_signed column is present if measures is NULL. chi2 is calculated as follows:
  $$ \frac{(a-\mathrm{exp\_a})^2}{\mathrm{exp\_a}} + \frac{(b-\mathrm{exp\_b})^2}{\mathrm{exp\_b}} + \frac{(c-\mathrm{exp\_c})^2}{\mathrm{exp\_c}} + \frac{(d-\mathrm{exp\_d})^2}{\mathrm{exp\_d}} $$
- chi2_Y, p_chi2_Y and chi2_Y_signed: The chi-squared test statistic (\(\chi^2\)) as used in a chi-squared test with Yates correction for a two-by-two contingency table. chi2_Y is calculated as follows:
  $$ \frac{(|a-\mathrm{exp\_a}| - 0.5)^2}{\mathrm{exp\_a}} + \frac{(|b-\mathrm{exp\_b}| - 0.5)^2}{\mathrm{exp\_b}} + \frac{(|c-\mathrm{exp\_c}| - 0.5)^2}{\mathrm{exp\_c}} + \frac{(|d-\mathrm{exp\_d}| - 0.5)^2}{\mathrm{exp\_d}} $$
- chi2_2T, p_chi2_2T and chi2_2T_signed: The chi-squared test statistic (\(\chi^2\)) as used in a chi-squared goodness-of-fit test applied to the first column of the contingency table. The "2T" in the name stands for 'two terms' (as opposed to chi2, which is sometimes called the 'four terms' version). chi2_2T is calculated as follows:
  $$ \frac{(a-\mathrm{exp\_a})^2}{\mathrm{exp\_a}} + \frac{(c-\mathrm{exp\_c})^2}{\mathrm{exp\_c}} $$
- chi2_2T_Y, p_chi2_2T_Y and chi2_2T_Y_signed: The chi-squared test statistic (\(\chi^2\)) as used in a chi-squared goodness-of-fit test with Yates correction, applied to the first column of the contingency table. chi2_2T_Y is calculated as follows:
  $$ \frac{(|a-\mathrm{exp\_a}| - 0.5)^2}{\mathrm{exp\_a}} + \frac{(|c-\mathrm{exp\_c}| - 0.5)^2}{\mathrm{exp\_c}} $$
- G, p_G and G_signed: G test statistic, which is also sometimes called log-likelihood ratio (LLR) and, somewhat confusingly, G-squared. This is the test statistic as used in a log-likelihood ratio test for independence or homogeneity in a two-by-two contingency table. Scores are high in case of strong evidence for attraction, but also in case of strong evidence of repulsion. The G_signed column is present if measures is NULL. G is calculated as follows:
  $$ 2 \left( a \times \log(\frac{a}{\mathrm{exp\_a}}) + b \times \log(\frac{b}{\mathrm{exp\_b}}) + c \times \log(\frac{c}{\mathrm{exp\_c}}) + d \times \log(\frac{d}{\mathrm{exp\_d}}) \right) $$
- G_2T, p_G_2T and G_2T_signed: The test statistic used in a log-likelihood ratio test for goodness-of-fit applied to the first column of the contingency table. The "2T" stands for 'two terms'. G_2T is calculated as follows:
  $$ 2 \left( a \times \log(\frac{a}{\mathrm{exp\_a}}) + c \times \log(\frac{c}{\mathrm{exp\_c}}) \right) $$
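To make the relation between a test statistic, its signed version and its p-value concrete, here is a minimal base R sketch for chi2 and G, following the formulas above. It is not the package's implementation. In particular, deriving the p-value from a chi-squared distribution with one degree of freedom is an assumption of this sketch.

```r
# Counts for the four items used in the Examples section
a <- c(10, 30, 15, 1)
b <- c(200, 1000, 5000, 300)
c <- c(100, 14, 16, 4)
d <- c(300, 5000, 10000, 6000)
m <- a + b; n <- c + d; k <- a + c; l <- b + d; N <- m + n

# Expected frequencies under independence
exp_a <- m * k / N; exp_b <- m * l / N
exp_c <- n * k / N; exp_d <- n * l / N

# Four-term chi-squared statistic
chi2 <- (a - exp_a)^2 / exp_a + (b - exp_b)^2 / exp_b +
  (c - exp_c)^2 / exp_c + (d - exp_d)^2 / exp_d

# Four-term G statistic (natural log); requires all cells to be non-zero
G <- 2 * (a * log(a / exp_a) + b * log(b / exp_b) +
  c * log(c / exp_c) + d * log(d / exp_d))

# Signed versions: negative in case of repulsion (a/m < c/n)
sign_dir    <- ifelse(a / m < c / n, -1, 1)
chi2_signed <- sign_dir * chi2
G_signed    <- sign_dir * G

# Assumption of this sketch: p-values from chi-squared with df = 1
p_chi2 <- pchisq(chi2, df = 1, lower.tail = FALSE)
p_G    <- pchisq(G, df = 1, lower.tail = FALSE)

print(round(data.frame(chi2_signed, G_signed, p_chi2, p_G), 4))
```

For a single table, the unsigned chi2 value should agree with stats::chisq.test() called with correct = FALSE.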
The final two groups of measures take a different shape. The
_as_chisq1 columns compute qchisq(1 - p, 1), with p being the p-value
that is transformed, i.e. the quantile with right-tail probability p in a
\(\chi^2\) distribution with one degree of freedom (see p_to_chisq1()).
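The transformation behind the _as_chisq1 columns can be tried out directly with base R. The sketch below only uses stats::qchisq() and stats::pchisq(); it does not call p_to_chisq1() itself, and the vector names are chosen for this illustration.

```r
# A few p-values to transform
p <- c(0.05, 0.01, 0.001)

# The transformation as described above
as_chisq1 <- qchisq(1 - p, df = 1)

# Equivalent formulation that asks for the upper-tail quantile directly;
# this avoids computing 1 - p, which loses precision for very small p
as_chisq1_upper <- qchisq(p, df = 1, lower.tail = FALSE)

# Transforming back recovers the original p-values
p_back <- pchisq(as_chisq1, df = 1, lower.tail = FALSE)

print(round(as_chisq1, 3))
```

Smaller p-values map to larger values, so the transformed scores can be read on the same scale as chi2 or G.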
- t, p_t_1, t_1_as_chisq1, p_t_2 and t_2_as_chisq1: The t-test statistic, used for a t-test for the proportion \(\frac{a}{N}\) in which the null hypothesis is based on \(\frac{k}{N}\times\frac{m}{N}\). Column t is present if "t" is included in measures or if measures is "ALL" or NULL. The other four columns are present if t is requested and if, additionally, with_variants is TRUE.
  - t = \( \frac{ a/N - (k/N) \times (m/N) }{ \sqrt{((a/N)\times (1-a/N))/N} } \)
  - p_t_1 is the p-value that corresponds to t when assuming a one-tailed test that only looks at attraction; t_1_as_chisq1 is its transformation.
  - p_t_2 is the p-value that corresponds to t when assuming a two-tailed test, viz. that looks at both attraction and repulsion; t_2_as_chisq1 is its transformation.
- p_fisher_1, fisher_1_as_chisq1, p_fisher_1r, fisher_1r_as_chisq1: The p-value of a one-sided Fisher exact test. The column p_fisher_1 is present if either "fisher" or "p_fisher" are in measures or if measures is "ALL" or NULL. The other columns are present if p_fisher_1 has been requested and if, additionally, with_variants is TRUE. p_fisher_1 and p_fisher_1r are the p-values of the Fisher exact test that look at attraction and repulsion respectively. fisher_1_as_chisq1 and fisher_1r_as_chisq1 are their respective transformations.
- p_fisher_2 and fisher_2_as_chisq1: p-value for a two-sided Fisher exact test, viz. looking at both attraction and repulsion. p_fisher_2 returns the p-value and fisher_2_as_chisq1 is its transformation. The p_fisher_2 column is present if either "fisher" or "p_fisher_1" are in measures or if measures is "ALL" or NULL and if, additionally, p_fisher_2 is TRUE. fisher_2_as_chisq1 is present if p_fisher_2 was requested and, additionally, with_variants is TRUE.
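The t statistic and the one-sided Fisher p-values can be approximated outside the package as follows. This base R sketch uses the "examples" item from the Examples section. It assumes that the attraction and repulsion tests correspond to stats::fisher.test() with alternative = "greater" and "less" respectively, which matches the documented p_fisher_1 value for this item but is not confirmed to be how the package computes it.

```r
# Counts for the "examples" item in the Examples section
a <- 1; b <- 300; c <- 4; d <- 6000
m <- a + b; k <- a + c; N <- a + b + c + d

# t statistic: observed proportion a/N against (k/N) * (m/N) under the null
t_stat <- (a / N - (k / N) * (m / N)) / sqrt(((a / N) * (1 - a / N)) / N)

# Two-by-two table with contexts as rows and items as columns
tab <- matrix(c(a, c, b, d), nrow = 2,
              dimnames = list(c("target", "reference"),
                              c("tested item", "other items")))

# One-sided Fisher exact tests (assumed mapping, see the lead-in)
p_attraction <- fisher.test(tab, alternative = "greater")$p.value
p_repulsion  <- fisher.test(tab, alternative = "less")$p.value

print(c(t = t_stat, p_attraction = p_attraction, p_repulsion = p_repulsion))
```

Because the observed table counts toward both tails, the two one-sided p-values sum to more than 1.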
Properties of the class
An object of class assoc_scores has:
- associated as.data.frame(), print(), sort() and tibble::as_tibble() methods,
- an interactive explore() method and useful getters, viz. n_types() and type_names().
An object of this class can be saved to file with write_assoc() and read
with read_assoc().
Details
Input and output
assoc_scores() takes as its arguments a target frequency list and a reference
frequency list (either as two freqlist objects or as a
cooc_info object) and returns a number of popular measures
expressing, for (almost) every item in either one of these lists, the extent
to which the item is attracted to the target context, when compared to the
reference context. The "almost" is added between parentheses because, with
the default settings, some items are automatically excluded from the output
(see min_freq).
assoc_abcd() takes as its arguments four vectors a, b, c, and d, of
equal length. Each tuple of values (a[i], b[i], c[i], d[i]), with i some
integer number between 1 and the length of the vectors, is assumed to represent
the four numbers a, b, c, d in a contingency table of the type:
|                   | tested item | any other item | total |
| target context    | a           | b              | m     |
| reference context | c           | d              | n     |
| total             | k           | l              | N     |
In the above table m, n, k, l and N are marginal frequencies. More specifically, m = a + b, n = c + d, k = a + c, l = b + d and N = m + n.
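The layout of the table and its margins can be mirrored in base R, which may help when preparing the a, b, c and d vectors by hand. This is a small illustrative sketch using the counts of the first example; the object names tab and full are chosen for this sketch only.

```r
# Inner cells: contexts as rows, items as columns (filled column by column)
tab <- matrix(c(10, 100, 200, 300), nrow = 2,
              dimnames = list(c("target context", "reference context"),
                              c("tested item", "any other item")))

# Add the marginal frequencies as an extra column and an extra row
full <- cbind(tab, total = rowSums(tab))
full <- rbind(full, total = colSums(full))

print(full)

# Reading the margins back from the table
m <- full["target context", "total"]
n <- full["reference context", "total"]
k <- full["total", "tested item"]
l <- full["total", "any other item"]
N <- full["total", "total"]
```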
Dealing with zeros
Several of the association measures break down when one or more of the values
a, b, c, and d are zero (for instance, because this would lead to
division by zero or taking the log of zero). This can be dealt with in different
ways, such as the Haldane-Anscombe correction.
Strictly speaking, the Haldane-Anscombe correction specifically applies to the
context of (log) odds ratios for two-by-two tables and boils down to adding
0.5 to each of the four values a, b, c, and d
in every two-by-two contingency table for which the original values
a, b, c, and d would not allow us to calculate
the (log) odds ratio, which happens when one (or more than one) of the four
cells is zero.
Using the Haldane-Anscombe correction, the (log) odds ratio is then calculated
on the basis of these 'corrected' values for a, b, c, and d.
However, because other measures that do not compute (log) odds ratios might also break down when some value is zero, all measures will be computed on the 'corrected' contingency matrix.
If the haldane argument is set to FALSE, division by zero or taking the
log of zero is avoided by systematically adding a small positive value to all
zero values for a, b, c, and d. The argument small_pos
determines which small positive value is added in such cases. Its default value is 0.00001.
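The two strategies for dealing with zeros can be summarised in a short base R sketch. The helper correct_zeros() below is hypothetical and written for illustration only. It follows the description above (add 0.5 to all four cells of any table that contains a zero, or else replace only the zero cells by small_pos), and it is not confirmed to be how the package implements the correction internally.

```r
# Hypothetical helper, for illustration only (not part of the package)
correct_zeros <- function(a, b, c, d, haldane = TRUE, small_pos = 1e-05) {
  # One row per contingency table, one column per cell
  cells <- cbind(a = a, b = b, c = c, d = d)
  if (haldane) {
    # Haldane-Anscombe: add 0.5 to every cell of each table with a zero
    has_zero <- rowSums(cells == 0) > 0
    cells[has_zero, ] <- cells[has_zero, ] + 0.5
  } else {
    # Alternative: replace only the zero cells by a small positive value
    cells[cells == 0] <- small_pos
  }
  as.data.frame(cells)
}

# Two tables; only the second one has a zero cell
print(correct_zeros(a = c(5, 0), b = c(10, 10), c = c(3, 3), d = c(20, 20)))
print(correct_zeros(a = c(5, 0), b = c(10, 10), c = c(3, 3), d = c(20, 20),
                    haldane = FALSE))
```

With haldane = TRUE the second table becomes 0.5, 10.5, 3.5, 20.5, so an odds ratio can be computed for it; tables without zeros are left untouched in both modes.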
Examples
assoc_abcd(10 , 200, 100, 300, types = "four")
#> Association scores (types in list: 1)
#> type a PMI G_signed| b c d dir exp_a DP_rows RR_rows OR
#> 1 four 10 -1.921 -45.432|200 100 300 -1 37.869 -0.202 0.19 0.15
#> <number of extra columns to the right: 5>
#>
assoc_abcd(30, 1000, 14, 5000, types = "fictitious")
#> Association scores (types in list: 1)
#> type a PMI G_signed| b c d dir exp_a DP_rows RR_rows
#> 1 fictitious 30 2 56.959|1000 14 5000 1 7.498 0.026 10.431
#> <number of extra columns to the right: 6>
#>
assoc_abcd(15, 5000, 16, 1000, types = "toy")
#> Association scores (types in list: 1)
#> type a PMI G_signed| b c d dir exp_a DP_rows RR_rows OR
#> 1 toy 15 -0.781 -19.723|5000 16 1000 -1 25.778 -0.013 0.19 0.188
#> <number of extra columns to the right: 5>
#>
assoc_abcd( 1, 300, 4, 6000, types = "examples")
#> Association scores (types in list: 1)
#> type a PMI G_signed| b c d dir exp_a DP_rows RR_rows OR
#> 1 examples 1 2.067 1.473|300 4 6000 1 0.239 0.003 4.987 5
#> <number of extra columns to the right: 5>
#>
a <- c(10, 30, 15, 1)
b <- c(200, 1000, 5000, 300)
c <- c(100, 14, 16, 4)
d <- c(300, 5000, 10000, 6000)
types <- c("four", "fictitious", "toy", "examples")
(scores <- assoc_abcd(a, b, c, d, types = types))
#> Association scores (types in list: 4)
#> type a PMI G_signed| b c d dir exp_a DP_rows
#> 1 four 10 -1.921 -45.432| 200 100 300 -1 37.869 -0.202
#> 2 fictitious 30 2.000 56.959|1000 14 5000 1 7.498 0.026
#> 3 toy 15 0.536 2.984|5000 16 10000 1 10.343 0.001
#> 4 examples 1 2.067 1.473| 300 4 6000 1 0.239 0.003
#> <number of extra columns to the right: 7>
#>
as_data_frame(scores)
#> type a b c d dir exp_a DP_rows RR_rows OR
#> 1 four 10 200 100 300 -1 37.8688525 -0.202380952 0.1904762 0.15000
#> 2 fictitious 30 1000 14 5000 1 7.4983455 0.026334032 10.4313454 10.71429
#> 3 toy 15 5000 16 10000 1 10.3429579 0.001393583 1.8723829 1.87500
#> 4 examples 1 300 4 6000 1 0.2386994 0.002656037 4.9867110 5.00000
#> MS Dice PMI chi2_signed G_signed t
#> 1 0.047619048 0.062500000 -1.9210117 -38.158009 -45.431519 -8.8860423
#> 2 0.029126214 0.055865922 2.0003183 81.993003 56.958917 4.1184552
#> 3 0.002991027 0.005945303 0.5363137 3.153303 2.983872 1.2030435
#> 4 0.003322259 0.006535948 2.0667329 2.551819 1.473313 0.7613609
#> p_fisher_1
#> 1 1.000000e+00
#> 2 6.106227e-14
#> 3 5.916695e-02
#> 4 2.170331e-01
as_tibble(scores)
#> # A tibble: 4 × 17
#> type a b c d dir exp_a DP_rows RR_rows OR MS
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 four 10 200 100 300 -1 37.9 -0.202 0.190 0.15 0.0476
#> 2 fictitious 30 1000 14 5000 1 7.50 0.0263 10.4 10.7 0.0291
#> 3 toy 15 5000 16 10000 1 10.3 0.00139 1.87 1.88 0.00299
#> 4 examples 1 300 4 6000 1 0.239 0.00266 4.99 5 0.00332
#> # … with 6 more variables: Dice <dbl>, PMI <dbl>, chi2_signed <dbl>,
#> # G_signed <dbl>, t <dbl>, p_fisher_1 <dbl>
print(scores, sort_order = "PMI")
#> Association scores (types in list: 4, sort order criterion: PMI)
#> type a PMI G_signed| b c d dir exp_a DP_rows
#> 1 examples 1 2.067 1.473| 300 4 6000 1 0.239 0.003
#> 2 fictitious 30 2.000 56.959|1000 14 5000 1 7.498 0.026
#> 3 toy 15 0.536 2.984|5000 16 10000 1 10.343 0.001
#> 4 four 10 -1.921 -45.432| 200 100 300 -1 37.869 -0.202
#> <number of extra columns to the right: 7>
#>
print(scores, sort_order = "alpha")
#> Association scores (types in list: 4, sort order criterion: alpha)
#> type a PMI G_signed| b c d dir exp_a DP_rows
#> 1 examples 1 2.067 1.473| 300 4 6000 1 0.239 0.003
#> 2 fictitious 30 2.000 56.959|1000 14 5000 1 7.498 0.026
#> 3 four 10 -1.921 -45.432| 200 100 300 -1 37.869 -0.202
#> 4 toy 15 0.536 2.984|5000 16 10000 1 10.343 0.001
#> <number of extra columns to the right: 7>
#>
print(scores, sort_order = "none")
#> Association scores (types in list: 4)
#> type a PMI G_signed| b c d dir exp_a DP_rows
#> 1 four 10 -1.921 -45.432| 200 100 300 -1 37.869 -0.202
#> 2 fictitious 30 2.000 56.959|1000 14 5000 1 7.498 0.026
#> 3 toy 15 0.536 2.984|5000 16 10000 1 10.343 0.001
#> 4 examples 1 2.067 1.473| 300 4 6000 1 0.239 0.003
#> <number of extra columns to the right: 7>
#>
print(scores, sort_order = "nonsense")
#> Association scores (types in list: 4)
#> type a PMI G_signed| b c d dir exp_a DP_rows
#> 1 four 10 -1.921 -45.432| 200 100 300 -1 37.869 -0.202
#> 2 fictitious 30 2.000 56.959|1000 14 5000 1 7.498 0.026
#> 3 toy 15 0.536 2.984|5000 16 10000 1 10.343 0.001
#> 4 examples 1 2.067 1.473| 300 4 6000 1 0.239 0.003
#> <number of extra columns to the right: 7>
#>
print(scores, sort_order = "PMI",
keep_cols = c("a", "exp_a", "PMI", "G_signed"))
#> Association scores (types in list: 4, sort order criterion: PMI)
#> type a PMI G_signed| b c d dir exp_a DP_rows
#> 1 examples 1 2.067 1.473| 300 4 6000 1 0.239 0.003
#> 2 fictitious 30 2.000 56.959|1000 14 5000 1 7.498 0.026
#> 3 toy 15 0.536 2.984|5000 16 10000 1 10.343 0.001
#> 4 four 10 -1.921 -45.432| 200 100 300 -1 37.869 -0.202
#> <number of extra columns to the right: 7>
#>
print(scores, sort_order = "PMI",
keep_cols = c("a", "b", "c", "d", "exp_a", "G_signed"))
#> Association scores (types in list: 4, sort order criterion: PMI)
#> type a G_signed| b c d dir exp_a DP_rows RR_rows
#> 1 examples 1 1.473| 300 4 6000 1 0.239 0.003 4.987
#> 2 fictitious 30 56.959|1000 14 5000 1 7.498 0.026 10.431
#> 3 toy 15 2.984|5000 16 10000 1 10.343 0.001 1.872
#> 4 four 10 -45.432| 200 100 300 -1 37.869 -0.202 0.190
#> <number of extra columns to the right: 6>
#>
print(scores, sort_order = "PMI",
drop_cols = c("a", "b", "c", "d", "exp_a", "G_signed",
"RR_rows", "chi2_signed", "t"))
#> Association scores (types in list: 4, sort order criterion: PMI)
#> type a PMI G_signed| b c d dir exp_a DP_rows
#> 1 examples 1 2.067 1.473| 300 4 6000 1 0.239 0.003
#> 2 fictitious 30 2.000 56.959|1000 14 5000 1 7.498 0.026
#> 3 toy 15 0.536 2.984|5000 16 10000 1 10.343 0.001
#> 4 four 10 -1.921 -45.432| 200 100 300 -1 37.869 -0.202
#> <number of extra columns to the right: 7>
#>