Wolfram Alpha:

```
Simple Linear Regression
------------------------

The purpose of linear regression is to "predict" the value of the
dependent variable based upon the values of an independent
variable.

y = βx + α + e          α = intercept, β = slope

e is an error term, normally distributed with mean 0 and
variance σ².  The predicted value is yp = βx + α.

Sum of Squares
--------------

The total sum of squares helps express the total variation that
can be attributed to various factors.

Total SS = Model SS + Error SS

         = SSM + SSE

Σ(yi - ȳ)² = Σ(yp - ȳ)² + Σ(yi - yp)²

           = Σ(yp - ȳ)² + Σ(yi - (βxi + α))²

Where yi is the observed value of the response variable and yp
is the predicted value of the response variable.

SSM is the variation in the dependent variable explained by the
model.  SSE is the variation in the dependent variable NOT
explained by the model.  Regression analysis seeks to minimize
SSE.  Minimizing SSE yields a set of equations that can be
solved for β and α.  These are:

β = SSxy/SSxx

α = ȳ - βx̄

Where:

SSxy = Σ(xi - x̄)(yi - ȳ)

SSxx = Σ(xi - x̄)²

SSyy = Σ(yi - ȳ)²

SSE = Σ(yi - yp)² ≡ SSyy - βSSxy

The mean square for each source of variation is defined as being
the sum of squares divided by its degrees of freedom. Thus:

MSSM = SSM/k and MSSE = SSE/(n - k - 1)

Where n is the number of observations and k is the number
of explanatory variables (1 in this case).  Note that the Total
SS has n - 1 degrees of freedom and n - 1 = k + (n - k - 1).

Converting the sum of squares into mean squares allows
comparison of these ratios to determine the significance of the
relationship.

The greater the variability in the error term, e, the greater
the variability will be in β and α.  The variance of e is given
by:

s² = SSE/(n - 2)

We can expect 95.4% of the observed y values to lie within 2s of
their predicted value.

Hypotheses:

If there is a significant linear relationship between the
independent variable x and the dependent variable y, the
slope will not equal zero.

H0: β = 0
H1: β ≠ 0

Test statistic:  F test = MSSM/MSSE

The F test has k and (n - k - 1) df.

t test = β/SEβ

= β/(s/√SSxx)

The t test has n - 2 df.

The confidence interval for β is:

β +/- tα/2·SEβ

Assumptions:

1.  The mean of e is 0.
2.  e is normally distributed.
3.  Values of e are independent.

To test whether one slope parameter is 0, we can use an
F-test or a t-test.  Both yield identical p-values,
since t² = F.

A simple example:

xi   yi  (xi - x̄)²  (yi - ȳ)²  (xi - x̄)(yi - ȳ)
--  ---  ---------  ---------  ----------------
 1  1.9    2.25       0.8556        1.3875
 2  2.3    0.25       0.2756        0.2625
 3  2.8    0.25       0.0006       -0.0125
 4  4.3    2.25       2.1756        2.2125
--  ---    ----       ------        ------
Σ          5.00       3.3075        3.8500

x̄ = 2.5000
ȳ = 2.8250
SSxx = 5.0000
SSxy = 3.8500
SSyy = 3.3075
β = SSxy/SSxx = 3.8500/5.0000 = 0.7700
SSE = SSyy - βSSxy = 3.3075 - 0.7700*3.8500 = 0.3430

Alternatively,

SSE = Σ(yi - yp)² = 0.3430

α = ȳ - βx̄ = 2.8250 - 0.7700*2.5 = 0.9

Therefore, the equation for the regression line is:

yp = 0.77x + 0.9
 yi    yp  (yi - yp)²  (yp - ȳ)²
---  ----  ----------  ---------
1.9  1.67    0.0529      1.3340
2.3  2.44    0.0196      0.1482
2.8  3.21    0.1681      0.1482
4.3  3.98    0.1024      1.3340
             ------      ------
Σ            0.3430      2.9645

Total SS = SSM + SSE
Σ(yi - ȳ)² = Σ(yp - ȳ)² + Σ(yi - yp)²

3.3075 = 2.9645 + 0.3430

s² = SSE/(n - 2) = 0.3430/2 = 0.1715 ∴ s = 0.4141

t = β/(s/√SSxx) = 0.77/(0.4141/√5.0000) = 4.158

t0.05/2 for 4 - 2 = 2 df is 4.303 (2-tailed).  Since
4.158 < 4.303, do not reject H0.

F = MSSM/MSSE

= (SSM/k)/(SSE/(n - k - 1))

= SSM/(SSE/2)

= 2.9645/0.1715

= 17.286

F1,2 = 18.513.  Since 17.286 < 18.513, do not reject H0.

∴ t² = F

Confidence Interval for β (α = 0.05):

β +/- tα/2·SEβ

CI = 0.77 +/- 4.303*(0.4141/√5.0000)

CI = 0.77 +/- 0.797

= (-0.027, 1.567)

SPSS:  Analyze>Regression>Linear

Multiple Regression
-------------------

Often there is more than one independent variable, and we
would like to look at the relationship between each of the
independent variables (x1,…, xk) and the dependent variable, y,
after taking into account the remaining independent variables.

y = α + β1x1 + β2x2 + … + βkxk + e

Hypotheses:

H0: β1 = β2 = . . . = βk = 0
H1: At Least One of the βj ≠ 0

Test statistic:  F test = MSSM/MSSE

SPSS: Analyze>Regression>Linear

Multiple Regression with Interactions
-------------------------------------

y = α + β1x1 + β2x2 + β3x1x2 + e

Multiple regression can also include interaction terms.
Normally, the product of the two variables is constructed in
the data and treated as a separate independent variable.  Both
categorical and continuous data can be analyzed, but in the
former case the appropriate 'dummy' variable coding must be
used.
```
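
The simple-regression worked example in the notes can be reproduced as a
quick numerical check. This is a minimal sketch in plain Python (no
statistics package); the variable names such as `ss_xx` are my own, and
the critical value 4.303 is taken from the t-table value quoted above.

```python
# Reproduce the worked simple linear regression example above
# using only the Python standard library.
x = [1, 2, 3, 4]
y = [1.9, 2.3, 2.8, 4.3]
n = len(x)

x_bar = sum(x) / n                     # 2.5000
y_bar = sum(y) / n                     # 2.8250

ss_xx = sum((xi - x_bar) ** 2 for xi in x)                        # 5.0000
ss_yy = sum((yi - y_bar) ** 2 for yi in y)                        # 3.3075
ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))  # 3.8500

beta = ss_xy / ss_xx                   # 0.7700
alpha = y_bar - beta * x_bar           # 0.9000

sse = ss_yy - beta * ss_xy             # 0.3430
ssm = ss_yy - sse                      # 2.9645

s = (sse / (n - 2)) ** 0.5             # 0.4141
t = beta / (s / ss_xx ** 0.5)          # 4.158
f = (ssm / 1) / (sse / (n - 2))        # 17.286, and t**2 == f

t_crit = 4.303                         # tabulated t(0.025, 2 df), as above
half_width = t_crit * (s / ss_xx ** 0.5)
ci = (beta - half_width, beta + half_width)   # ≈ (-0.027, 1.567)
```

Every intermediate quantity matches the hand computation, including the
identity t² = F for the single-predictor case.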
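
For multiple regression the same least-squares idea carries over with a
design matrix. A minimal sketch, assuming NumPy is available; the data
are synthetic, built from known coefficients so the fit can be checked.

```python
import numpy as np

# Synthetic data built from known coefficients (α = 0.5, β1 = 1.5,
# β2 = 0.5) so the recovered fit can be verified.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
y = 0.5 + 1.5 * x1 + 0.5 * x2

# Design matrix: a column of ones for the intercept α, then x1, x2.
X = np.column_stack([np.ones_like(x1), x1, x2])

# Least squares minimizes SSE = Σ(yi - yp)², exactly as in the
# simple case, but now over α, β1 and β2 simultaneously.
coef, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
alpha, beta1, beta2 = coef
```

With noiseless synthetic data the fit recovers α, β1 and β2 exactly; with
real data the same call returns the SSE-minimizing estimates.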
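
The interaction model can be fit the same way by adding the product
column to the design matrix. A sketch, again assuming NumPy; it uses a
deterministic grid of synthetic points with the error term omitted so
the coefficient recovery is exact.

```python
import numpy as np

# Synthetic grid of (x1, x2) points and a response built with a
# genuine interaction effect: y = α + β1x1 + β2x2 + β3x1x2.
x1 = np.repeat([0.0, 1.0, 2.0, 3.0, 4.0], 5)
x2 = np.tile([0.0, 1.0, 2.0, 3.0, 4.0], 5)
y = 1.0 + 0.5 * x1 - 1.5 * x2 + 2.0 * x1 * x2

# The product x1*x2 is constructed in the data and treated as a
# separate independent variable, as described in the notes.
X = np.column_stack([np.ones_like(x1), x1, x2, x1 * x2])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
# coef recovers [1.0, 0.5, -1.5, 2.0]
```

For categorical predictors the x columns would instead hold the
appropriately coded dummy variables before the product is formed.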