# Redshift Academy

Last modified: January 26, 2018
```
The one-way ANOVA is a method for extending the two-sample
t-test for independent samples to three or more samples. ANOVA
is an acronym for ANalysis Of VAriance.  The name is a little
misleading, since the goal is to compare means; the test does so
by comparing between-group and within-group variances.

In a one-way ANOVA there is one dependent variable and one
independent variable.

Means Model
-----------

y_ij = μ_i + e_ij

Where,

y_ij = measured value on jth subject in ith group
μ_i = mean value for group i
e_ij = random error about μ_i

In analysis of variance (ANOVA), the sum of squares helps express
the total variation that can be attributed to various factors.

Sum of Squares
--------------

Let:

k = number of treatments (columns)
r = number of observations in each treatment (rows)
n = total number of observations (rows x columns)
y_ij = ith observation in the jth column

ȳ = grand mean = Σ_ij y_ij / n

ȳ_j = mean of column j = Σ_i y_ij / r

Total SS = Σ_ij (y_ij - ȳ)²

SST = treatment sum of squares = Σ_j r(ȳ_j - ȳ)²

SSE = error sum of squares = Σ_ij (y_ij - ȳ_j)²

Mean Total SS  = Total SS/(n - 1)

MSST = SST/(k - 1)

MSSE = SSE/(n - k)

Assumptions:

1.  Samples are randomly selected from the k treatment populations.
2.  All k treatment populations are normal.
3.  All k treatment variances are equal.

Hypotheses:

H0:  μ1 = μ2 = ... = μk
H1:  μi ≠ μj for at least one pair i ≠ j

Test statistic:  F = MSST/MSSE

If H0 is rejected, how do we determine which samples are
different?  Two common post hoc methods are LSD (least
significant difference) and Bonferroni.  Both are based on
pairwise t-tests.

SPSS:  Analyze>Compare Means>One-Way ANOVA
Post Hoc => check LSD and Bonferroni (the more conservative of the two)

Source     df     SS      MS                  F
---------  -----  ------  ------------------  ---------
Treatment  k - 1  SST     MSST = SST/(k - 1)  MSST/MSSE
Error      n - k  SSE     MSSE = SSE/(n - k)
Total      n - 1  Tot SS  ***

*** The total mean square, Total SS/(n - 1), is not the sum of the
MS column.

Why ANOVA and not t-test?

1. Comparing three groups using t-tests would require that
3 t-tests be conducted.  This increases the chances of
making a type I error: at α = 0.05 per test, three independent
tests would give a 1 - 0.95³ ≈ 0.14 chance of at least one.

2. A t-test does not use all of the information available
in the samples.  For
example, in a comparison of group 1 vs. group 2, the
information from group 3 is neglected.  An ANOVA makes
use of the entire data set.

Pairwise Comparisons
--------------------

The number of ways that the means can be compared is given by:

c = k(k - 1)/2

Thus, for example, if there are 3 treatments c = 3.  These
combinations are:

μ1 with μ2
μ1 with μ3
μ2 with μ3

A simple completely randomized design example:

SAT score means

            Boys     Girls
Sample 1    540      530
Sample 2    530      540
Sample 3    540      520
ȳ_j         536.66   530

ȳ = (540 + 530 + 540 + 530 + 540 + 520)/6 = 533.33

k = 2, r = 3, c = 1, n = 6
Total SS = Σ_ij (y_ij - ȳ)²

= (540 - 533.33)² + (530 - 533.33)²
+ (540 - 533.33)² + (530 - 533.33)²
+ (540 - 533.33)² + (520 - 533.33)²

= 333.334

SST = Σ_j r(ȳ_j - ȳ)²

= 3(536.66 - 533.33)² + 3(530 - 533.33)²

= 33.267 + 33.267

= 66.533

SSE = Σ_ij (y_ij - ȳ_j)²

= (540 - 536.66)² + (530 - 536.66)²
+ (540 - 536.66)² + (530 - 530)²
+ (540 - 530)² + (520 - 530)²

= 266.667

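Note that:

Total SS = SST + SSE

= 66.533 + 266.667

= 333.200

This differs from the 333.334 computed directly only because the
means were rounded to two decimals.  With unrounded means,
SST = 66.667 and the two sides agree.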

Mean Total SS  = Total SS/(n - 1)

= 333.334/5

= 66.667

MSST = SST/(k - 1)

= 66.533

MSSE = SSE/(n - k)

= 266.667/4

= 66.667

F = MSST/MSSE = 66.533/66.667 = 0.9980

MSST df = k - 1 = 1

MSSE df = n - k = 4

Critical value:  F(1, 4) = 7.7086 for α = 0.05

Since F = 0.9980 < 7.7086, H0 is not rejected.

Summarizing:

Source     df  SS       MS      F
---------  --  -------  ------  ------
Treatment  1   66.533   66.533  0.9980
Error      4   266.667  66.667

When 2 samples are being compared, the t and F tests are
equivalent:  F = t²

t = (x̄1 - x̄2) / √(s²(1/n1 + 1/n2))

s² = MSSE = 66.667

t = (536.66 - 530) / √(66.667(1/3 + 1/3))

= 6.66/6.67

≈ 1

so t² ≈ 1, which agrees with F = 0.9980 up to rounding of the means.

A simple randomized block design example:

With a randomized block design, the experimenter divides subjects
into subgroups called blocks, such that the variability within
blocks is less than the variability between blocks. Then,
subjects within each block are randomly assigned to treatment
conditions. Compared to a completely randomized design, this
design reduces variability within treatment conditions and
potential confounding, producing a better estimate of treatment
effects.

SPSS Output:

Source     df             SS      MS                          F
---------  -------------  ------  --------------------------  ---------
Treatment  k - 1          SST     MSST = SST/(k - 1)          MSST/MSSE
Block      b - 1          SSB     MSSB = SSB/(b - 1)          MSSB/MSSE
Error      n - b - k + 1  SSE     MSSE = SSE/(n - b - k + 1)
Total      n - 1          Tot SS  ***

*** The total mean square, Total SS/(n - 1), is not the sum of the
MS column.

SAT score means

            Boys     Girls    Block means (ȳ_i)
School 1    540      530      535
School 2    530      540      535
School 3    540      520      530
ȳ_j         536.66   530

ȳ = (540 + 530 + 540 + 530 + 540 + 520)/6 = 533.33

k = 2, r = 3, c = 1, n = 6, b = 3
Total SS = Σ_ij (y_ij - ȳ)²

= (540 - 533.33)² + (530 - 533.33)²
+ (540 - 533.33)² + (530 - 533.33)²
+ (540 - 533.33)² + (520 - 533.33)²

= 333.334

SST = Σ_j r(ȳ_j - ȳ)²

= 3(536.66 - 533.33)² + 3(530 - 533.33)²

= 33.267 + 33.267

= 66.533
SSB = Σ_i k(ȳ_i - ȳ)²

= 2(535 - 533.33)² + 2(535 - 533.33)² + 2(530 - 533.33)²

= 5.578 + 5.578 + 22.178

= 33.334

SSE = Total SS - SST - SSB

= 333.334 - 66.533 - 33.334

= 233.467

MSST = SST/(k - 1)

= 66.533

MSSE = SSE/(n - b - k + 1)

= 233.467/(6 - 3 - 2 + 1)

= 116.734

MSSB = SSB/(b - 1)

= 33.334/2

= 16.667

F = MSST/MSSE = 66.533/116.734 = 0.5700

MSST df = k - 1 = 1

MSSE df = n - b - k + 1 = 2

MSSB df = b - 1 = 2

Critical value:  F(1, 2) = 18.5128 for α = 0.05

Since F = 0.5700 < 18.5128, H0 is not rejected.

This randomized block design removes school as a potential
source of variability and as a potential confounding variable.

Summarizing:

Source     df  SS       MS       F
---------  --  -------  -------  ------
Treatment  1   66.533   66.533   0.5700
Block      2   33.334   16.667   0.1428
Error      2   233.467  116.734
```
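
The sums of squares in both worked examples can be checked with a short script. What follows is a minimal sketch in plain Python, not part of the original notes: the function names (`sums_of_squares`, `one_way_anova`, `randomized_block_anova`) and the dictionary keys are illustrative choices. It keeps the means unrounded, so its results may differ in the last digits from the hand calculations, which round the means to two decimals.

```python
import math

# SAT score means from the worked examples: each row is a sample
# (or a school, in the block design); columns are boys and girls.
scores = [
    [540, 530],
    [530, 540],
    [540, 520],
]


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def sums_of_squares(rows):
    """Return (Total SS, SST, SSB) for a rows x columns table."""
    columns = list(zip(*rows))
    grand = mean(y for row in rows for y in row)
    total_ss = sum((y - grand) ** 2 for row in rows for y in row)
    # Treatment SS: column means against the grand mean.
    sst = sum(len(rows) * (mean(col) - grand) ** 2 for col in columns)
    # Block SS: row means against the grand mean.
    ssb = sum(len(columns) * (mean(row) - grand) ** 2 for row in rows)
    return total_ss, sst, ssb


def one_way_anova(rows):
    """Completely randomized design: columns are the k treatments."""
    k = len(rows[0])
    n = len(rows) * k
    total_ss, sst, _ = sums_of_squares(rows)
    sse = total_ss - sst  # uses Total SS = SST + SSE
    msst, msse = sst / (k - 1), sse / (n - k)
    return {"SST": sst, "SSE": sse, "MSST": msst, "MSSE": msse,
            "F": msst / msse}


def randomized_block_anova(rows):
    """Randomized block design: rows are the b blocks."""
    b, k = len(rows), len(rows[0])
    n = b * k
    total_ss, sst, ssb = sums_of_squares(rows)
    sse = total_ss - sst - ssb
    msst = sst / (k - 1)
    mssb = ssb / (b - 1)
    msse = sse / (n - b - k + 1)
    return {"SST": sst, "SSB": ssb, "SSE": sse,
            "F_treatment": msst / msse, "F_block": mssb / msse}


one_way = one_way_anova(scores)
blocked = randomized_block_anova(scores)

# Two-sample t statistic with pooled variance s^2 = MSSE (equal group sizes).
boys, girls = zip(*scores)
r = len(scores)
t = (mean(boys) - mean(girls)) / math.sqrt(one_way["MSSE"] * (1 / r + 1 / r))

for name, table in (("one-way", one_way), ("blocked", blocked)):
    print(name, {key: round(value, 4) for key, value in table.items()})
print("t =", round(t, 4), " t^2 =", round(t * t, 4))
```

With unrounded means the one-way F statistic comes out equal to t², which is the F = t² equivalence stated for the two-sample case.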