Categorical Data - Crosstabs
----------------------------
The primary technique for hypothesis testing involving
categorical data is the use of contingency tables (also known
as crosstabs). The underlying method involves use of the
binomial proportion.
          W       X
      +-------+-------+
  Y   |   a   |   b   |  a+b
      +-------+-------+
  Z   |   c   |   d   |  c+d
      +-------+-------+
         a+c     b+d     a+b+c+d
P(W|Y) = P(W and Y)/P(Y)
       = {a/(a + b + c + d)}/{(a + b)/(a + b + c + d)}
       = a/(a + b)
       = probability of W given Y
It can be shown that the best way of comparing the cells in a
table is to use the χ^{2} statistic (O - E)^{2}/E, where O and E are the
observed and expected values in a particular cell. The value
(O - E)^{2}/E is summed over all cells.
χ^{2} = Σ(O - E)^{2}/E
The larger the total obtained, the more disagreement there is
between the observed and expected values. The total value is
compared with the critical value found from the χ^{2} distribution
with (r - 1)(c - 1) degrees of freedom. If it is greater than
the critical value, H_{0} is rejected. The table is said to be
significant and an interpretation of cell frequencies is
warranted. Conversely, if the sum is below the critical value,
H_{0} is not rejected.
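The procedure above can be sketched in Python as follows. This is a minimal illustration, not a definitive implementation: the function name chi_square and the counts are hypothetical, and 3.841 is the χ^{2} critical value for 1 degree of freedom at the 5% level.

```python
def chi_square(table):
    """Return (statistic, degrees of freedom) for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under H0 (independence of rows and columns).
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


# Hypothetical 2 x 2 counts, laid out as [[a, b], [c, d]].
table = [[30, 10], [20, 40]]
stat, df = chi_square(table)

CRITICAL_05_DF1 = 3.841  # chi-square critical value, df = 1, alpha = 0.05
reject_h0 = stat > CRITICAL_05_DF1
print(f"chi-square = {stat:.3f}, df = {df}, reject H0: {reject_h0}")
```

For these counts every expected value is at least 20, so the condition on expected values discussed in this section is met.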
This procedure is used only when no expected value in the table is
less than 5. Under these circumstances the normal approximation
to the binomial distribution is valid and the Pearson Chi-Square
value is used. If any of the expected values is below 5, Fisher's
Exact Test must be used and the problem cannot be solved using the
normal approximation.
Under certain circumstances, a version of this test statistic with a
continuity correction yields more accurate p-values than does the
uncorrected version. For the continuity-corrected version, the
statistic (|O - E| - 0.5)^{2}/E rather than (O - E)^{2}/E is computed for
each cell. This test procedure is called the Yates-corrected
Pearson chi-square.
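As a rough sketch of the difference between the two versions, the Python function below computes the statistic for a 2 x 2 table with and without the continuity correction. The function name pearson_2x2, the yates flag and the counts are hypothetical; the expected counts are taken from the row and column totals in the usual way.

```python
def pearson_2x2(a, b, c, d, yates=False):
    """Pearson chi-square for a 2 x 2 table, optionally Yates-corrected."""
    n = a + b + c + d
    observed = [a, b, c, d]
    expected = [
        (a + b) * (a + c) / n,
        (a + b) * (b + d) / n,
        (c + d) * (a + c) / n,
        (c + d) * (b + d) / n,
    ]
    shift = 0.5 if yates else 0.0
    # Each cell contributes (|O - E| - shift)^2 / E.
    return sum((abs(o - e) - shift) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts a, b, c, d.
uncorrected = pearson_2x2(12, 8, 6, 14)
corrected = pearson_2x2(12, 8, 6, 14, yates=True)
print(f"uncorrected = {uncorrected:.4f}, Yates-corrected = {corrected:.4f}")
```

The corrected statistic is the smaller of the two for these counts, since 0.5 is subtracted from each |O - E| before squaring.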
The expected values, E, are computed as:
E_{11} = (a+b)(a+c)/(a+b+c+d)
E_{12} = (a+b)(b+d)/(a+b+c+d)
E_{21} = (c+d)(a+c)/(a+b+c+d)
E_{22} = (c+d)(b+d)/(a+b+c+d)
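The expected values can also be used to decide which test applies. The sketch below assumes the rule stated earlier (Pearson chi-square only when no expected value is below 5); the function names expected_counts and choose_test and the counts are hypothetical.

```python
def expected_counts(a, b, c, d):
    """Expected values E_11, E_12, E_21, E_22 for a 2 x 2 table."""
    n = a + b + c + d
    return [
        [(a + b) * (a + c) / n, (a + b) * (b + d) / n],
        [(c + d) * (a + c) / n, (c + d) * (b + d) / n],
    ]


def choose_test(a, b, c, d):
    """Pick the test according to the smallest expected value."""
    smallest = min(min(row) for row in expected_counts(a, b, c, d))
    return "Fisher's exact test" if smallest < 5 else "Pearson chi-square"


print(expected_counts(3, 1, 1, 3), choose_test(3, 1, 1, 3))
print(expected_counts(30, 10, 20, 40), choose_test(30, 10, 20, 40))
```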
In the case of a 2 x 2 table, Fisher developed a procedure for
computing the exact p value for the test. The method utilizes
the hypergeometric probability distribution. The formula is:
p = \binom{a+b}{a} \binom{c+d}{c} / \binom{n}{a+c}
  = (a + b)!(c + d)!(a + c)!(b + d)! / (a! b! c! d! n!)

where n = a + b + c + d.
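A minimal Python sketch of the hypergeometric calculation follows. The formula gives the probability of a single table with fixed margins; a two-sided p value is commonly obtained by summing this probability over every table with the same margins that is no more probable than the observed one, and that convention is an assumption here. The function names and the counts are hypothetical.

```python
from math import comb


def table_probability(a, b, c, d):
    """Hypergeometric probability of one 2 x 2 table with fixed margins."""
    n = a + b + c + d
    return comb(a + b, a) * comb(c + d, c) / comb(n, a + c)


def fisher_two_sided(a, b, c, d):
    """Sum the probabilities of all tables no more likely than the observed one."""
    row1, row2, col1 = a + b, c + d, a + c
    p_observed = table_probability(a, b, c, d)
    total = 0.0
    # x runs over every feasible value of the top-left cell.
    for x in range(max(0, col1 - row2), min(row1, col1) + 1):
        p = table_probability(x, row1 - x, col1 - x, row2 - (col1 - x))
        if p <= p_observed * (1 + 1e-9):  # small tolerance for float ties
            total += p
    return total


# Hypothetical counts a, b, c, d.
print(table_probability(3, 1, 1, 3))  # 16/70
print(fisher_two_sided(3, 1, 1, 3))   # 34/70
```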
Test statistic (independent samples): χ^{2} test
H_{0}: No effects (the 2 classifications are independent)
H_{1}: There are effects (the 2 classifications are dependent)
SPSS: Analyze>Descriptive Statistics>Crosstabs