Sampling Distributions
----------------------
If we draw repeated samples of size n from a given population and compute
a statistic (mean, standard deviation, proportion) for each sample, the
probability distribution of that statistic is called a sampling
distribution.
Central Limit Theorem
---------------------
For large enough sample sizes^{*} the sample MEANS approximately follow
N(μ,σ/√n), where n is the sample size and σ/√n is the standard deviation.
In other words, the mean of all the sample means is equal to the
population mean. This applies regardless of the shape of the parent
population, provided its variance is finite. As the sample size is
increased, the standard deviation, skew and excess kurtosis of the
sampling distribution decrease.
The SD of the sampling distribution of the mean is called the
STANDARD ERROR OF THE MEAN.
SE = σ/√n = σ_{sample means}
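For a finite population sampled without replacement this generalizes to:
SE = (σ/√n)√((N - n)/(N - 1))
where N is the population size.
For infinite N this reduces to:
SE = σ/√n as before
For a single observation (n = 1) it reduces to:
SE = σ
and when the sample is the whole population (n = N) it reduces to:
SE = 0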
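As a quick check of the finite-population form, the sketch below enumerates every sample of size n drawn without replacement from a tiny population and compares the SD of the sample means with the formula, using the correction factor √((N - n)/(N - 1)). The population values and the function name `standard_error` are assumptions made for illustration only.

```python
import math
from itertools import combinations
from statistics import mean, pstdev


def standard_error(sigma, n, N=None):
    """SE of the mean; applies the finite population correction when N is given."""
    se = sigma / math.sqrt(n)
    if N is not None:
        se *= math.sqrt((N - n) / (N - 1))
    return se


population = [5, 7, 12]          # illustrative values
N, n = len(population), 2
sigma = pstdev(population)       # population SD (denominator N)

# Every distinct sample of size n drawn without replacement.
sample_means = [mean(s) for s in combinations(population, n)]
observed_se = pstdev(sample_means)
formula_se = standard_error(sigma, n, N)

print(f"observed SD of sample means: {observed_se:.4f}")
print(f"corrected formula:           {formula_se:.4f}")
```

The two printed values should agree, because the enumeration covers the whole sampling distribution rather than a random subset of it.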
* A sample size ≥ 30 is generally considered to be the minimum
for the CLT to apply.
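The theorem can be illustrated with a small simulation. The sketch below assumes a strongly skewed parent population (exponential with rate 1, so μ = σ = 1), a sample size of 40 and 5000 repeated samples; all of these choices, and the seed, are arbitrary.

```python
import math
import random

random.seed(1)        # fixed seed so the run is repeatable

mu = 1.0              # mean of an exponential population with rate 1
sigma = 1.0           # its standard deviation
n = 40                # size of each sample
trials = 5000         # number of samples drawn

# Mean of each sample.
sample_means = [
    sum(random.expovariate(1.0) for _ in range(n)) / n
    for _ in range(trials)
]

mean_of_means = sum(sample_means) / trials
sd_of_means = math.sqrt(
    sum((m - mean_of_means) ** 2 for m in sample_means) / trials
)

print(f"mean of sample means: {mean_of_means:.3f} (population mean {mu})")
print(f"SD of sample means:   {sd_of_means:.3f} "
      f"(sigma/sqrt(n) = {sigma / math.sqrt(n):.3f})")
```

Even though the parent population is far from normal, the mean of the sample means should land close to μ and their SD close to σ/√n.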
Distribution of Means
---------------------
If x̄ is the mean of a random sample of size n taken from
a normal population having mean μ and variance σ^{2}, then
we can state the following:
Large Sample Size (n > 30)
--------------------------
z = (x̄ - μ)/(σ/√n)
Generally, the population standard deviation is not given.
Under these circumstances it is necessary to estimate it from
the sample using:
s^{2} = Σ_{i}(x_{i} - x̄)^{2}/(n - 1)
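A minimal sketch of this calculation follows: it computes s^{2} with the n - 1 denominator and substitutes s for σ in z = (x̄ - μ)/(σ/√n). The data set (36 made-up readings), the hypothesized mean `mu0` and the function names are assumptions for illustration.

```python
import math
from statistics import mean, variance


def sample_variance(xs):
    """s^2 = sum((x_i - xbar)^2) / (n - 1)."""
    xbar = mean(xs)
    return sum((x - xbar) ** 2 for x in xs) / (len(xs) - 1)


def z_statistic(xs, mu):
    """z = (xbar - mu) / (s / sqrt(n)), with s estimated from the sample."""
    n = len(xs)
    s = math.sqrt(sample_variance(xs))
    return (mean(xs) - mu) / (s / math.sqrt(n))


# Hypothetical data: 36 readings taking the values 50..55, six of each.
data = [50 + (i % 6) for i in range(36)]
mu0 = 52.0            # hypothesized population mean (assumed)

print(f"x-bar = {mean(data):.2f}")
print(f"s^2   = {sample_variance(data):.2f}")
print(f"z     = {z_statistic(data, mu0):.3f}")
```

The hand-written `sample_variance` matches `statistics.variance`, which also uses the n - 1 denominator.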
Why use n - 1? To answer that we need to look at biassed
versus unbiassed estimators.
Biassed versus Unbiassed Estimators
-----------------------------------
Consider a small population: 5, 7, 12
N = 3 (population size)
n = 2 (sample size; samples are drawn with replacement, giving 9 ordered samples)
μ = 8
σ^{2} = Σ(x - μ)^{2}/N = 8.67
σ = 2.94
s^{2} = Σ(x - x̄)^{2}/(n - 1) = Σ(x - x̄)^{2}/1

Sample     x̄      s^{2}    s       s^{2}/σ^{2}
------     ----   -----    ----    -----------
 5   5      5.0     0.0    0.00    0.00
 5   7      6.0     2.0    1.41    0.23
 5  12      8.5    24.5    4.95    2.83
 7   5      6.0     2.0    1.41    0.23
 7   7      7.0     0.0    0.00    0.00
 7  12      9.5    12.5    3.54    1.44
12   5      8.5    24.5    4.95    2.83
12   7      9.5    12.5    3.54    1.44
12  12     12.0     0.0    0.00    0.00
Average     8.0    8.67    2.20
Notes:
- For a normal population, (x̄ - μ)/(s/√n) follows a t distribution
(approximately a normal distribution if the sample size is large).
- s is a biassed estimator of σ
- (n - 1)s^{2}/σ^{2} follows the χ^{2} distribution.
- A denominator of n gives the biassed estimator for σ^{2}.
- A denominator n - 1 gives the unbiassed estimator for σ^{2}.
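The averages in the table can be reproduced by brute force. The sketch below enumerates all 9 ordered samples of size 2 drawn with replacement from the population 5, 7, 12 and averages each estimator; the variable names are assumptions for illustration.

```python
import math
from itertools import product
from statistics import mean, pvariance, variance

population = [5, 7, 12]
sigma2 = pvariance(population)       # population variance, denominator N

# All 9 ordered samples of size 2, drawn with replacement.
samples = list(product(population, repeat=2))

avg_xbar = mean(mean(s) for s in samples)
avg_s2_unbiased = mean(variance(s) for s in samples)    # denominator n - 1
avg_s2_biased = mean(pvariance(s) for s in samples)     # denominator n
avg_s = mean(math.sqrt(variance(s)) for s in samples)

print(f"sigma^2                  = {sigma2:.2f}")
print(f"average x-bar            = {avg_xbar:.2f}")
print(f"average s^2 (n - 1)      = {avg_s2_unbiased:.2f}")
print(f"average s^2 (n)          = {avg_s2_biased:.2f}")
print(f"average s vs sigma       = {avg_s:.2f} vs {math.sqrt(sigma2):.2f}")
```

Averaged over every possible sample, the n - 1 version recovers σ^{2} while the n version falls short by the factor (n - 1)/n. The average of s is still below σ, which is why s is a biassed estimator of σ even though s^{2} is unbiassed for σ^{2}.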
Small Sample Size (n ≤ 30)
--------------------------
If n ≤ 30 and σ has to be estimated from the sample, then we must use
the t-statistic instead:
t_{v} = (x̄ - μ)/(s/√n)
with v = n - 1 degrees of freedom.
The t-distribution has the following properties:
mean: μ = 0
variance: σ^{2} = v/(v - 2), where v is the degrees of freedom
and v > 2
The variance is always greater than 1, although it is close
to 1 when there are many degrees of freedom. With infinite
degrees of freedom, the t distribution is the same as the
standard normal distribution.
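A short sketch of the small-sample calculation follows. It computes t = (x̄ - μ)/(s/√n) with v = n - 1, and the variance v/(v - 2) of the t distribution. The eight readings, the hypothesized mean of 10.0 and the function names are made up for illustration.

```python
import math
from statistics import mean, stdev


def t_statistic(xs, mu):
    """Return (t, v): t = (xbar - mu) / (s / sqrt(n)) and v = n - 1."""
    n = len(xs)
    t = (mean(xs) - mu) / (stdev(xs) / math.sqrt(n))
    return t, n - 1


def t_variance(v):
    """Variance of the t distribution, defined only for v > 2."""
    if v <= 2:
        raise ValueError("variance requires v > 2")
    return v / (v - 2)


# Hypothetical small sample (n = 8) and hypothesized mean.
data = [9.8, 10.2, 10.4, 9.9, 10.1, 10.3, 10.0, 10.1]
t, v = t_statistic(data, 10.0)

print(f"t = {t:.3f} with v = {v} degrees of freedom")
for dof in (3, 7, 30, 1000):
    print(f"v = {dof:4d}: variance = {t_variance(dof):.4f}")
```

The loop shows the variance falling toward 1 as the degrees of freedom grow, which is the convergence to the standard normal distribution described above.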
Distribution of Proportions
---------------------------
If p is the proportion (probability) of successes in the
population, then:
σ_{p} = √(p(1 - p)/n)
The sampling distribution of the sample proportion is discrete rather
than continuous. It is approximately normally
distributed if n is fairly large and p is not close to 0 or 1.
A general rule of thumb is that the approximation is good
when:
np and n(1 - p) are both ≥ 10
z = (p̄ - μ)/(σ/√n)
where p̄ is the sample proportion, μ = p and σ = √(p(1 - p)), so that
σ/√n = σ_{p}.
As before, p and σ are generally not given for the original
population. Under these circumstances it is necessary to
estimate them from the sample. Thus,
p ≈ p_{sample} = p̄
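The proportion formulas can be sketched as three small helpers: the standard error √(p(1 - p)/n), the rule-of-thumb check that np and n(1 - p) are both at least 10, and the z score of an observed sample proportion. The function names and the example numbers (p = 0.5, n = 100, observed proportion 0.56) are assumptions for illustration.

```python
import math


def proportion_se(p, n):
    """Standard error of a sample proportion: sqrt(p(1 - p)/n)."""
    return math.sqrt(p * (1 - p) / n)


def normal_approx_ok(p, n, threshold=10):
    """Rule of thumb: np and n(1 - p) must both reach the threshold."""
    return n * p >= threshold and n * (1 - p) >= threshold


def proportion_z(p_sample, p, n):
    """z score of an observed sample proportion against population p."""
    return (p_sample - p) / proportion_se(p, n)


# Hypothetical example values.
p, n, p_sample = 0.5, 100, 0.56

print(f"SE = {proportion_se(p, n):.3f}")
print(f"normal approximation reasonable: {normal_approx_ok(p, n)}")
print(f"z  = {proportion_z(p_sample, p, n):.2f}")
```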
Distribution of Variances
-------------------------
If s^{2} is the variance of a random sample of size n taken from
a normal population having the variance σ^{2}, then
χ^{2} = (n - 1)s^{2}/σ^{2}
with v = n - 1 degrees of freedom.
The χ^{2} distribution has the following properties:
mean: μ = v
variance: σ^{2} = 2v
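These two properties can be checked empirically. The sketch below assumes a normal population with σ = 2, samples of size n = 10 and 20000 repetitions (all arbitrary choices), and computes (n - 1)s^{2}/σ^{2} for each sample.

```python
import random

random.seed(7)        # fixed seed so the run is repeatable

n = 10                # sample size
v = n - 1             # degrees of freedom
sigma = 2.0           # population SD (assumed)
trials = 20_000


def sample_variance(xs):
    """s^2 with the n - 1 denominator."""
    xbar = sum(xs) / len(xs)
    return sum((x - xbar) ** 2 for x in xs) / (len(xs) - 1)


chi2_values = []
for _ in range(trials):
    sample = [random.gauss(0.0, sigma) for _ in range(n)]
    chi2_values.append(v * sample_variance(sample) / sigma ** 2)

chi2_mean = sum(chi2_values) / trials
chi2_var = sum((c - chi2_mean) ** 2 for c in chi2_values) / trials

print(f"simulated mean:     {chi2_mean:.2f} (theory v  = {v})")
print(f"simulated variance: {chi2_var:.2f} (theory 2v = {2 * v})")
```

The simulated mean and variance should sit close to v and 2v, up to Monte Carlo noise.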
Example:
An optical firm purchases glass to be ground into lenses and
past experience has shown that the variance of the refractive
index = 1.26 x 10^{-4}. A shipment is received and a sample of
20 pieces is pulled. The measured variance of the sample is
2.10 x 10^{-4}. Should the shipment be rejected?
H_{0}: σ^{2} = 1.26 x 10^{-4}
H_{1}: σ^{2} > 1.26 x 10^{-4}
χ^{2} = (20 - 1)(2.10 x 10^{-4})/(1.26 x 10^{-4}) = 31.67
From tables, χ^{2}_{0.05} for v = n - 1 = 19 is equal to 30.144.
Since 31.67 > 30.144 the result is significant at the 0.05
level and there is sufficient reason to reject H_{0}.
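The arithmetic of this example can be reproduced in a few lines. The critical value is taken from the table lookup quoted in the text rather than computed, since the Python standard library has no χ^{2} quantile function; the variable and function names are assumptions for illustration.

```python
def chi2_statistic(n, s2, sigma2):
    """chi^2 = (n - 1) s^2 / sigma^2."""
    return (n - 1) * s2 / sigma2


n = 20                 # pieces sampled
sigma2_0 = 1.26e-4     # historical variance of the refractive index
s2 = 2.10e-4           # measured sample variance
critical = 30.144      # chi^2_{0.05} for v = 19, from tables (quoted in the text)

chi2 = chi2_statistic(n, s2, sigma2_0)
reject = chi2 > critical

print(f"chi^2 = {chi2:.2f}, critical value = {critical}")
print("reject H0" if reject else "do not reject H0")
```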
F Distribution
--------------
If s_{1}^{2} and s_{2}^{2} are the variances of independent random samples
of size n_{1} and n_{2} taken from two normal populations having the
same variance, then
F = s_{1}^{2}/s_{2}^{2}
with v_{1} = n_{1} - 1 and v_{2} = n_{2} - 1 degrees of freedom.
The F distribution has the following properties:
mean: μ = v_{2}/(v_{2} - 2) for v_{2} > 2.
variance: σ^{2} = [2v_{2}^{2}(v_{1} + v_{2} - 2)]/[v_{1}(v_{2} - 2)^{2}(v_{2} - 4)]
for v_{2} > 4.
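The mean and variance formulas translate directly into code. The sketch below evaluates them for v_{1} = 6 and v_{2} = 11; the function names are assumptions, and the guards simply enforce the stated conditions v_{2} > 2 and v_{2} > 4.

```python
def f_mean(v2):
    """Mean of the F distribution, defined for v2 > 2."""
    if v2 <= 2:
        raise ValueError("mean requires v2 > 2")
    return v2 / (v2 - 2)


def f_variance(v1, v2):
    """Variance of the F distribution, defined for v2 > 4."""
    if v2 <= 4:
        raise ValueError("variance requires v2 > 4")
    return (2 * v2 ** 2 * (v1 + v2 - 2)) / (v1 * (v2 - 2) ** 2 * (v2 - 4))


v1, v2 = 6, 11
print(f"mean     = {f_mean(v2):.4f}")
print(f"variance = {f_variance(v1, v2):.4f}")
```

Note that the mean depends only on v_{2} and approaches 1 as v_{2} grows.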
Example:
Suppose you randomly select 7 marbles from company 1's
production line and 12 marbles from company 2's production
line and measure their diameters. Assume you are given:
s_{1} = 1.0 and s_{2} = 1.1
F = s_{1}^{2}/s_{2}^{2} = 1/1.21 = 0.83
H_{0}: σ_{1} = σ_{2}
H_{1}: σ_{1} ≠ σ_{2}
From tables, F_{0.05} for v_{1} = n_{1} - 1 = 6 and v_{2} = n_{2} - 1 = 11 is
equal to 3.0946. Since 0.83 < 3.0946 the result is not
significant at the 0.05 level and there is insufficient reason to
reject H_{0}.
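The marble example can be reproduced as follows. As with the χ^{2} example, the critical value is the table figure quoted in the text, not something computed here; the variable names are assumptions for illustration.

```python
n1, n2 = 7, 12         # marbles sampled from company 1 and company 2
s1, s2 = 1.0, 1.1      # sample standard deviations of the diameters

F = s1 ** 2 / s2 ** 2  # ratio of sample variances
v1, v2 = n1 - 1, n2 - 1
critical = 3.0946      # F_{0.05} for v1 = 6, v2 = 11, from tables (quoted in the text)

reject = F > critical

print(f"F = {F:.2f} with v1 = {v1}, v2 = {v2}")
print("reject H0" if reject else "do not reject H0")
```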