Simple Linear Regression
------------------------
The purpose of linear regression is to "predict" the value of the
dependent variable based upon the values of an independent
variable.
y_{p} = βx + α + e
where α is the intercept and e is an error term, normally
distributed with mean 0 and variance σ^{2}.
Sum of Squares
--------------
The total sum of squares helps express the total variation that
can be attributed to various factors.
Total SS = Model SS + Error SS
= SSM + SSE
Σ(y_{i} - ȳ)^{2} = Σ(y_{p} - ȳ)^{2} + Σ(y_{i} - y_{p})^{2}
                 = Σ(y_{p} - ȳ)^{2} + Σ(y_{i} - (βx_{i} + α))^{2}
Where y_{i} is the observed value of the response variable and y_{p}
is the predicted value of the response variable.
SSM is the variation in the dependent variable explained by the
model. SSE is the variation in the dependent variable NOT
explained by the model. Regression analysis seeks to minimize
SSE. Minimization of SSE results in a set of equations which
can be solved to obtain β and α. These are:
β = SS_{xy}/SS_{xx}
α = ȳ - βx̄
Where:
SS_{xy} = Σ(x_{i} - x̄)(y_{i} - ȳ)
SS_{xx} = Σ(x_{i} - x̄)^{2}
SS_{yy} = Σ(y_{i} - ȳ)^{2}
SSE = Σ(y_{i} - y_{p})^{2} ≡ SS_{yy} - βSS_{xy}
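The estimates above can be computed directly. A minimal Python sketch, using the example data from the worked example later in these notes (x = 1..4, y = 1.9, 2.3, 2.8, 4.3):

```python
# Closed-form least-squares estimates from the sums of squares above.
x = [1, 2, 3, 4]
y = [1.9, 2.3, 2.8, 4.3]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# SS_xy, SS_xx, SS_yy as defined above
ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
ss_xx = sum((xi - x_bar) ** 2 for xi in x)
ss_yy = sum((yi - y_bar) ** 2 for yi in y)

beta = ss_xy / ss_xx           # slope, ~0.77 for this data
alpha = y_bar - beta * x_bar   # intercept, ~0.9
sse = ss_yy - beta * ss_xy     # error sum of squares, ~0.343
```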
The mean square for each source of variation is defined as being
the sum of squares divided by its degrees of freedom. Thus:
MSSM = SSM/k and MSSE = SSE/(n - k - 1)
Where n is the number of observations and k is the number
of explanatory variables (1 in this case). Note that the Total
SS has n - 1 degrees of freedom and n - 1 = k + (n - k - 1).
Converting the sum of squares into mean squares allows
comparison of these ratios to determine the significance of the
relationship.
The greater the variability in the error term, e, the greater
the variability will be in β and α. The variance of e is given
by:
s^{2} = SSE/(n - 2)
We can expect 95.4% of the observed y values to lie within 2s of
their predicted value.
Hypotheses:
If there is a significant linear relationship between the
independent variable x and the dependent variable y, the
slope will not equal zero.
H_{0}: β = 0
H_{1}: β ≠ 0
Test statistic: F test = MSSM/MSSE
The F test has k and (n - k - 1) df.
t test = β/SE_{β}
= β/(s/√SS_{xx})
The t test has n - 2 df.
The confidence interval for β is:
β +/- t_{α/2}s_{β}
Assumptions:
1. The mean of e is 0.
2. e is normally distributed.
3. Values of e are independent.
To test whether one slope parameter is 0, we can use an
F-test or a t-test. Both will yield identical p values
since t^{2} = F.
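The equivalence of the two tests can be checked numerically. A minimal Python sketch of both statistics, using the same example data as the worked example below:

```python
import math

x = [1, 2, 3, 4]
y = [1.9, 2.3, 2.8, 4.3]
n, k = len(x), 1

x_bar, y_bar = sum(x) / n, sum(y) / n
ss_xx = sum((xi - x_bar) ** 2 for xi in x)
ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
ss_yy = sum((yi - y_bar) ** 2 for yi in y)

beta = ss_xy / ss_xx
sse = ss_yy - beta * ss_xy
ssm = ss_yy - sse
s = math.sqrt(sse / (n - 2))          # residual standard error

t = beta / (s / math.sqrt(ss_xx))     # t test, n - 2 df
F = (ssm / k) / (sse / (n - k - 1))   # F test, k and n - k - 1 df

# For simple regression (k = 1), t**2 equals F
```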
A simple example:
x_{i}   y_{i}   (x_{i} - x̄)^{2}   (y_{i} - ȳ)^{2}   (x_{i} - x̄)(y_{i} - ȳ)
----   ----   ------------   ------------   ----------------
 1     1.9       2.25           0.8556          1.3875
 2     2.3       0.25           0.2756          0.2625
 3     2.8       0.25           0.0006         -0.0125
 4     4.3       2.25           2.1756          2.2125
----   ----   ------------   ------------   ----------------
Σ                5.00           3.3075          3.8500
x̄ = 2.5000
ȳ = 2.8250
SS_{xx} = 5.0000
SS_{xy} = 3.8500
SS_{yy} = 3.3075
β = SS_{xy}/SS_{xx} = 3.8500/5.0000 = 0.7700
SSE = SS_{yy} - βSS_{xy} = 3.3075 - 0.7700*3.8500 = 0.3430
Alternatively,
SSE = Σ(y_{i} - y_{p})^{2} = 0.3430
α = ȳ - βx̄ = 2.8250 - 0.7700*2.5 = 0.9
Therefore, the equation for the regression line is:
y_{p} = 0.77x + 0.9
y_{i}   y_{p}   (y_{i} - y_{p})^{2}   (y_{p} - ȳ)^{2}
----   ----   --------------   ------------
1.9    1.67       0.0529          1.3340
2.3    2.44       0.0196          0.1482
2.8    3.21       0.1681          0.1482
4.3    3.98       0.1024          1.3340
              --------------   ------------
Σ                 0.3430          2.9645
Total SS = SSM + SSE
Σ(y_{i} - ȳ)^{2} = Σ(y_{p} - ȳ)^{2} + Σ(y_{i} - y_{p})^{2}
3.3075 = 2.9645 + 0.3430
s^{2} = SSE/(n - 2) = 0.3430/2 = 0.1715 ∴ s = 0.4141
t = β/(s/√SS_{xx}) = 0.77/(0.4141/√5.0000) = 4.158
t_{0.05/2} for 4 - 2 = 2 df is 4.303 (2-tailed). Therefore,
do not reject H_{0}.
F = MSSM/MSSE
= (SSM/k)/(SSE/(n - k - 1))
= SSM/(SSE/2)
= 2.9645/0.1715
= 17.286
F_{1,2} = 18.513 therefore do not reject H_{0}.
∴ t^{2} = F
Confidence Interval for β (α = 0.05):
β +/- t_{α/2}s_{β}
CI = 0.77 +/- 4.303*(0.4141/√5.0000)
CI = 0.77 +/- 0.797
= (-0.027,1.567)
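The interval can be reproduced in Python. A minimal sketch, reusing the quantities from the worked example above (the critical value 4.303 is t_{0.025} for 2 df, as quoted in these notes):

```python
import math

# Quantities from the worked example above
beta = 0.77
s = math.sqrt(0.3430 / 2)   # residual standard error, n - 2 = 2 df
ss_xx = 5.0
t_crit = 4.303              # t_{0.025, 2}, two-tailed

se_beta = s / math.sqrt(ss_xx)
lo = beta - t_crit * se_beta
hi = beta + t_crit * se_beta
# (lo, hi) is roughly (-0.027, 1.567); the interval contains 0,
# consistent with failing to reject H_0: β = 0
```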
SPSS: Analyze>Regression>Linear
Multiple Regression
-------------------
Often there is more than one independent variable and
we would like to look at the relationship between each of the
independent variables (x_{1},…, x_{k}) and the dependent variable, y,
after taking into account the remaining independent variables.
y = α + β_{1}x_{1} + β_{2}x_{2} + e
Hypotheses:
H_{0}: β_{1} = β_{2} = . . . = β_{k} = 0
H_{1}: At Least One of the β_{j} ≠ 0
Test statistic: F test = MSSM/MSSE
SPSS: Analyze>Regression>Linear
Multiple Regression with Interactions
-------------------------------------
y = α + β_{1}x_{1} + β_{2}x_{2} + β_{3}x_{1}x_{2} + e
MR can also include interaction terms. Normally, the product
of the 2 terms is constructed in the data and treated as a
separate independent variable. Both categorical and continuous
data can be analyzed but in the case of the former the
appropriate coding of the 'dummy' variable must be used.
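A minimal sketch of fitting such a model with NumPy, constructing the product term as a separate column of the design matrix as described above. The data here are simulated with hypothetical coefficients purely for illustration; they are not from these notes:

```python
import numpy as np

# Simulated data: hypothetical true coefficients chosen for illustration
rng = np.random.default_rng(0)
n = 200
x1 = rng.uniform(0, 10, n)
x2 = rng.uniform(0, 10, n)
e = rng.normal(0, 0.1, n)
y = 1.0 + 2.0 * x1 - 0.5 * x2 + 0.3 * x1 * x2 + e

# The product x1*x2 is constructed in the data and treated as a
# separate independent variable
X = np.column_stack([np.ones(n), x1, x2, x1 * x2])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
alpha, b1, b2, b3 = coef   # estimates of α, β1, β2, β3
```

With a small error variance the least-squares estimates land close to the simulated coefficients, including β3 for the interaction term.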