How long on average would it take (i) Bill, (ii) Ben to finish the job? What is the standard deviation of the time it would take each of Bill and Ben to finish the job? Interpret the results. Who would you choose to do the job?
The following dataset contains the ages of 10 randomly selected students in a school. Complete (a) to (f) below.
9, 11, 12, 13, 13, 15, 16, 21, 28, 42
(a) Compute the mean and standard deviation.
(b) Find the median, and compare the mean and median to determine the skewness of the dataset.
(c) Calculate the coefficient of variation.
(d) Locate the first quartile and the third quartile.
(e) Locate the 80th percentile.
(f) Find the range and inter-quartile range for the dataset. Suppose the last observation (42) was incorrectly entered. If the correct age was 80, how would the correction affect the range and the inter-quartile range?
Answer:
- Average = sum of the products of each value X and its probability = ∑X·P(X)
Average for Bill = average for Ben = 4.5. Both have the same average time.
- Standard deviation = √variance
Variance = ∑X²·P(X) - [average]²
For Bill: variance = 21.3 - 4.5² = 1.05
Standard deviation = √1.05 = 1.025
For Ben: variance = 22.9 - 4.5² = 2.65
Standard deviation = √2.65 = 1.628
The lower standard deviation indicates less spread in the data and greater consistency of work. So Bill is preferred, as he is more consistent.
- We would like a worker who is more consistent, so that he works more uniformly. Since the average is the same for Bill and Ben, we can choose on the basis of standard deviation alone. We prefer Bill, as he has the lower standard deviation.
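The comparison above can be checked with a short Python sketch. It assumes only the two quantities quoted in the answer for each worker (the average 4.5 and the values 21.3 and 22.9 for ∑X²·P(X)); the full probability tables are not reproduced here, and the helper name `sd_from_moments` is introduced purely for illustration.

```python
import math

def sd_from_moments(mean, mean_of_squares):
    """Standard deviation from the mean and the sum of X^2 * P(X)."""
    variance = mean_of_squares - mean ** 2
    return math.sqrt(variance)

# Values quoted in the answer (the underlying probability tables are not shown).
sd_bill = sd_from_moments(4.5, 21.3)
sd_ben = sd_from_moments(4.5, 22.9)

# With equal means, the smaller standard deviation marks the more consistent worker.
more_consistent = "Bill" if sd_bill < sd_ben else "Ben"
print(round(sd_bill, 3), round(sd_ben, 3), more_consistent)
```

Because both workers have the same mean, the choice reduces to comparing the two standard deviations directly.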
- Mean is the sum of all observations / total number of observations. In this case it equals
(9+11+12+13+13+15+16+21+28+42)/10 = 180/10 = 18
Standard deviation is the square root of the variance. Dividing by the number of observations (10), it is found by √[∑(X - mean)²/10] =
√(914/10) = 9.56
- Median is the middle observation. We have 10 observations, so the middle position is the
(10+1)/2 = 5.5th observation. The 5th one is 13 and the 6th one is 15, so the 5.5th observation is (13+15)/2 = 14
Median = 14
The mean is 18, so mean > median. This means the data is positively skewed.
- Coefficient of variation = standard deviation / average = 9.56/18 = 0.531 (Anon., n.d.)
- The 1st quartile is the 10/4 = 2.5th observation.
The 2nd observation is 11 and the 3rd is 12, so 1st quartile = (11+12)/2 = 11.5
The 3rd quartile is the 10*(3/4) = 7.5th observation.
The 7th observation is 16 and the 8th is 21, so 3rd quartile = (16+21)/2 = 18.5
- The 80th percentile is the 10*(0.8) = 8th observation = 21
- Range = maximum - minimum = 42 - 9 = 33
Interquartile range = (3rd quartile - 1st quartile) = 18.5 - 11.5 = 7
If the last observation is changed to 80 instead of 42, the new range = 80 - 9 = 71
The interquartile range remains unchanged, as it is a positional measure. Since 80 and 42 both occupy the 10th position, the quartiles do not move and the interquartile range remains at 7.
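The summary measures for the age data can be recomputed with a minimal Python sketch. It follows the positional rules used in the answer (quartiles at the n/4 and 3n/4 positions, a half position averaged from its two neighbours) and divides by n for the standard deviation; statistical libraries often use other quartile conventions or an n - 1 divisor, so their output may differ. The helper names `value_at_position` and `summarize` are hypothetical.

```python
import math

ages = [9, 11, 12, 13, 13, 15, 16, 21, 28, 42]

def value_at_position(sorted_data, position):
    """Value at a 1-based position; a fractional position averages its two neighbours."""
    lower = math.floor(position)
    upper = math.ceil(position)
    return (sorted_data[lower - 1] + sorted_data[upper - 1]) / 2

def summarize(data):
    xs = sorted(data)
    n = len(xs)
    mean = sum(xs) / n
    # Divisor n, as in the answer (a sample standard deviation would use n - 1).
    sd = math.sqrt(sum((x - mean) ** 2 for x in xs) / n)
    q1 = value_at_position(xs, n / 4)
    q3 = value_at_position(xs, 3 * n / 4)
    return {
        "mean": mean,
        "sd": sd,
        "median": value_at_position(xs, (n + 1) / 2),
        "cv": sd / mean,
        "q1": q1,
        "q3": q3,
        "p80": value_at_position(xs, 80 * n / 100),
        "range": xs[-1] - xs[0],
        "iqr": q3 - q1,
    }

original = summarize(ages)
corrected = summarize(ages[:-1] + [80])  # last observation entered as 80 instead of 42

for name in ("range", "iqr"):
    print(name, original[name], corrected[name])
```

Comparing `original` and `corrected` shows which measures react to the extreme value and which depend only on the positions of the middle observations.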
The z distribution table gives us area to the left of the assigned value of z.
- P(0 < z < 2.21) = P(z < 2.21) - P(z < 0) = 0.9864 - 0.5 = 0.4864
We look up the two values 0 and 2.21 in the table and subtract to get the shaded area between them.
- P (z < -3) = 0.0013
- P( z > -0.5) = 1- P( z < -.5) = 1 – 0.3085 = 0.6915
- P(z < -2 or z > 2) = P(z < -2) + P(z > 2). We add because these two events are non-overlapping (mutually exclusive).
P(z < -2) = 0.0228
P(z > 2) = 1 - P(z < 2) = 1 - 0.9772 = 0.0228
So P(z < -2) + P(z > 2) = 0.0228 + 0.0228 = 0.0456
- P(0.6 < z < 2.5) = P(z < 2.5) - P(z < 0.6) = 0.9938 - 0.7257 = 0.2681
- We need a value of z, call it z1, such that P(z < z1) = 0.2
We look in the body of the table for the area closest to 0.2. The nearest entry is 0.2005, which corresponds to z = -0.84, so z1 = -0.84
- We need values of z, call them z1 and z2, such that P(z1 < z < z2) = 0.95
Due to the symmetric nature of the z distribution, P(z1 < z < 0) = 0.5*0.95 = 0.475
Since the table gives the area to the left, we look for the area 0.5 - 0.475 = 0.025. This corresponds to the z value -1.96.
So P(-1.96 < z < 0) = 0.475
By symmetry we will have P(0 < z < +1.96) = 0.475
Thus z1 = -1.96 and
z2 = +1.96
The area between z1 and z2 equals 0.95.
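The table lookups above can be cross-checked with Python's standard library. This is a sketch using `statistics.NormalDist` (available from Python 3.8), whose `cdf` method gives the area to the left of z and whose `inv_cdf` method performs the reverse lookup. The results are exact to floating-point precision, so they can differ from table-based answers in the fourth decimal place.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

# Forward lookups: area to the left of a z value, combined as in the answers.
p_0_to_2_21 = z.cdf(2.21) - z.cdf(0)
p_below_minus_3 = z.cdf(-3)
p_above_minus_half = 1 - z.cdf(-0.5)
p_two_tails = z.cdf(-2) + (1 - z.cdf(2))
p_0_6_to_2_5 = z.cdf(2.5) - z.cdf(0.6)

# Reverse lookups: z value for a given left-tail area.
z_20 = z.inv_cdf(0.20)      # P(z < z_20) = 0.20
z_lower = z.inv_cdf(0.025)  # leaves 2.5% in the left tail
z_upper = z.inv_cdf(0.975)  # leaves 2.5% in the right tail

for label, value in [
    ("P(0 < z < 2.21)", p_0_to_2_21),
    ("P(z < -3)", p_below_minus_3),
    ("P(z > -0.5)", p_above_minus_half),
    ("P(z < -2 or z > 2)", p_two_tails),
    ("P(0.6 < z < 2.5)", p_0_6_to_2_5),
    ("z for left area 0.20", z_20),
    ("lower bound for central 95%", z_lower),
    ("upper bound for central 95%", z_upper),
]:
    print(f"{label}: {value:.4f}")
```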
We first need to set up our hypotheses before we analyse the seriousness of both types of errors. As the rules require pollutants to be below a certain prescribed level, we need a one-tailed test. It will be a right-tailed test. Let the prescribed pollutant level be 'p', so that the hypotheses will be
Ho: µ ≤ p
H1: µ > p
- A Type I error (Stat.berkley.edu, n.d.) means that we REJECT the null hypothesis even though it is true. If we commit this error we will conclude that pollutant levels exceed p, and the shop's license must be revoked. This forces the shop to close down, even though it was not exceeding the prescribed limits.
- A Type II error (Ma.utexas.edu, n.d.) means that we DO NOT REJECT the null hypothesis even though it is not true. If we commit this error we will conclude that pollutant levels are within limits, and the shop will continue to operate. In reality the shop is creating pollution but is allowed to continue, harming the environment.
- As discussed above, a Type I error causes the shop to close down even though it complies with the limits. For the shop owner this is the more serious error, since a Type II error harms the environment rather than the business.
- For the environmentalists, the harm done to the environment is of greater consequence, while a few shops shutting down is not their concern. For them a Type II error is more serious than a Type I error. They would like to reduce the chance of a Type II error, even at the cost of a higher chance of a Type I error.
We need to calculate the sample size because the width of a confidence interval depends on it. The width of an interval for the mean equals 2*z*SE (or we can use a t value instead of z), where SE is the standard error, given by σ/√n, and n is the sample size. A wider interval is less precise, so we should aim for a narrow interval to get more precise interval estimates.
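The relation between sample size and interval width can be sketched in Python. This assumes a known σ and a z-based interval; the function names and the example values (σ = 10, target width 4) are hypothetical and chosen only for illustration.

```python
import math
from statistics import NormalDist

def interval_width(sigma, n, confidence=0.95):
    """Width of a z-based confidence interval for the mean: 2 * z * sigma / sqrt(n)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return 2 * z * sigma / math.sqrt(n)

def sample_size_for_width(sigma, width, confidence=0.95):
    """Smallest n whose interval is no wider than the target width."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return math.ceil((2 * z * sigma / width) ** 2)

# Hypothetical example: sigma = 10 and a desired total width of 4.
n_needed = sample_size_for_width(sigma=10, width=4)
print(n_needed, round(interval_width(10, n_needed), 3))
```

Solving width = 2*z*σ/√n for n gives n = (2*z*σ/width)², rounded up so that the interval is no wider than requested. Quadrupling the sample size halves the width.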
If we do not know σ, then we can't get SE and interval estimates can't be derived. The value of σ ideally comes from the population itself. However, in sampling such information is rarely available and we have to rely on the sample itself to estimate σ. Since the sample data is used for this estimate, we have fewer degrees of freedom left, which is why a t value is used instead of z. Since the estimate of σ is based on the specific sample, it is subject to variation depending on the sample data chosen.
A common error is to say that we ACCEPT the null hypothesis. This is an error because the correct statement is that we DO NOT REJECT the null hypothesis. The difference matters because, unlike in everyday English, not rejecting a hypothesis is not equivalent to accepting it.
In a hypothesis test we use two hypotheses: the null (Ho) and the alternative (H1 or Ha). We decide whether the sample data supports the alternative one. This is because the test is designed to see whether the data supports H1 or not. The test is not meant to show that Ho is true. By assumption, Ho is true until proven otherwise. If proven otherwise, we say that we accept H1 or that we reject Ho.
If we can't prove otherwise, we say the null is not rejected. In statistical tests of hypothesis we are aiming to prove or disprove the alternative hypothesis. The aim is not to prove anything about the null hypothesis. The statement in the null hypothesis is taken as true until proven otherwise. The hypothesis test is really about the veracity of the alternative hypothesis. If something is assumed true we don't need to prove it is true, which is why it does not make sense to ACCEPT Ho. We can reject Ho (if the data supports the alternative hypothesis statement), or we DO NOT REJECT Ho (if the alternative hypothesis statement is not supported by the data) (Martz, 2013) (Stat.berkley.edu, n.d.).
Bibliography
Anon., n.d. FAQ: What is the coefficient of variation? [Online] Available at: https://stats.idre.ucla.edu/other/mult-pkg/faq/general/faq-what-is-the-coefficient-of-variation/ [Accessed 1 June 2017].
Anon., n.d. Type 1 and 2 errors. [Online] Available at: https://www.stat.berkeley.edu/~hhuang/STAT141/Lecture-FDR.pdf
Ma.utexas.edu, n.d. Type I and II errors and significance levels. [Online] Available at: https://www.ma.utexas.edu/users/mks/statmistakes/errortypes.html [Accessed 9 June 2017].
Martz, E., 2013. Bewildering Things Statisticians Say. [Online] Available at: https://blog.minitab.com/blog/understanding-statistics/things-statisticians-say-failure-to-reject-the-null-hypothesis [Accessed 8 June 2017].
Stat.berkley.edu, n.d. Type I and Type II errors. [Online] Available at: https://www.stat.berkeley.edu/~hhuang/STAT141/Lecture-FDR.pdf [Accessed 8 June 2017].