Questions:
The data set above was also used to answer these additional research questions:
a. Was the average household size in 1960 equal to 3.67, as claimed by the website Statistica.com?
b. Was there a difference between average death rates in metropolitan areas with low and high nitrous oxide potential in 1960?
c. Was there a change in mean July (maximum) temperatures between the data collections in 1960 and 2000? Use an appropriate 95% confidence interval to answer this question.
Answers:
Introduction
This report analyzes how the mortality rate is related to air pollution, weather, and socioeconomic variables. The mortality rate is predicted from six predictor variables: average annual precipitation, average maximum temperature in January, average maximum temperature in July, average household size, percentage of White population in urbanized areas in 1960, and relative sulphur dioxide pollution potential in 1960.
Methods
A random sample of 60 metropolitan areas in the USA was collected. The sample contains information on the relevant variables from the late 1950s to early 1960. To fit a linear model predicting the mortality rate from each predictor, six separate simple regressions were run, each with mortality as the dependent variable and one of the six predictors as the independent variable. Each model was estimated separately.
Results
In the first model, mortality is predicted from average precipitation. The R square value is 26.0%, which means precipitation explains 26.0% of the variation in the mortality rate. The P value of the coefficient is 0.000, so precipitation is statistically significant. In the second model, mortality is predicted from maximum temperature in January. The R square value implies that only 0.3% of the variation in mortality is explained by this variable. The P value is 0.660, so the variable is not statistically significant. The third model uses maximum temperature in July as the predictor. Here the independent variable explains 7.8% of the variation in mortality. The P value is 0.031. As this is less than 0.05, the variable is statistically significant. The fourth predictor is average household size. Household size explains 12.8% of the variation in the dependent variable, as shown by the R square, and its P value of 0.005 indicates statistical significance. The next model uses White population as the independent variable and mortality as the dependent variable. The independent variable explains 41.4% of the variation in the mortality rate, and the P value of 0.000 shows that White population is statistically significant. The last predictor is sulphur dioxide pollution potential. The estimated R square is 18.1%, so it explains 18.1% of the variation in the mortality rate. The P value is 0.001, which is less than 0.05. Therefore, the variable is statistically significant at the 5% level of significance.
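The significance statements above rest on each slope's t ratio, which is the coefficient divided by its standard error. As a rough cross-check, the sketch below recomputes those ratios from the coefficients and standard errors printed in the Minitab output of this report. The names `slopes` and `T_CRIT_58` are illustrative, and the critical value 2.00 is an approximate table value for 58 degrees of freedom at the 5% level, not a figure the report itself states.

```python
# Slope and standard error for each simple regression, copied from the
# Minitab output in this report: (coefficient, SE of coefficient).
slopes = {
    "Precip": (3.1743, 0.7039),
    "Jan": (-0.3011, 0.6809),
    "July": (3.634, 1.646),
    "Hhsize": (164.34, 56.40),
    "White": (-4.4896, 0.7007),
    "SO2": (0.4179, 0.1166),
}

# Assumption: approximate two-sided 5% critical value of t with 58 df.
T_CRIT_58 = 2.00


def t_ratio(coef, se):
    """t statistic for H0: slope = 0."""
    return coef / se


t_values = {name: t_ratio(coef, se) for name, (coef, se) in slopes.items()}
significant = {name: abs(t) > T_CRIT_58 for name, t in t_values.items()}

for name, t in t_values.items():
    print(f"{name:7s} t = {t:6.2f}  significant at 5%: {significant[name]}")
```

Under these assumptions, only the January slope fails to exceed the critical value, which agrees with the P values quoted in the paragraph above.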
Conclusion
Among the six predictor variables, all except maximum temperature in January turn out to be statistically significant at the 5% level. The highest R square value (41.4%) is obtained for White population in urbanized areas, which makes it the strongest single predictor of the mortality rate. The weakest predictor is maximum temperature in January, with an R square of only 0.3%.
Regression Analysis: Mort versus Precip
The regression equation is
Mort = 822 + 3.17 Precip
Predictor Coef SE Coef T P
Constant 821.75 27.21 30.20 0.000
Precip 3.1743 0.7039 4.51 0.000
S = 53.9862 R-Sq = 26.0% R-Sq(adj) = 24.7%
Analysis of Variance
Source DF SS MS F P
Regression 1 59266 59266 20.33 0.000
Residual Error 58 169041 2915
Total 59 228308
Regression Analysis: Mort versus Jan
The regression equation is
Mort = 951 - 0.301 Jan
Predictor Coef SE Coef T P
Constant 950.84 25.05 37.96 0.000
Jan -0.3011 0.6809 -0.44 0.660
S = 62.6348 R-Sq = 0.3% R-Sq(adj) = 0.0%
Analysis of Variance
Source DF SS MS F P
Regression 1 767 767 0.20 0.660
Residual Error 58 227541 3923
Total 59 228308
Regression Analysis: Mort versus July
The regression equation is
Mort = 669 + 3.63 July
Predictor Coef SE Coef T P
Constant 669.3 123.0 5.44 0.000
July 3.634 1.646 2.21 0.031
S = 60.2590 R-Sq = 7.8% R-Sq(adj) = 6.2%
Analysis of Variance
Source DF SS MS F P
Regression 1 17701 17701 4.87 0.031
Residual Error 58 210607 3631
Total 59 228308
Regression Analysis: Mort versus Hhsize
The regression equation is
Mort = 404 + 164 Hhsize
Predictor Coef SE Coef T P
Constant 404.1 184.2 2.19 0.032
Hhsize 164.34 56.40 2.91 0.005
S = 58.5984 R-Sq = 12.8% R-Sq(adj) = 11.3%
Analysis of Variance
Source DF SS MS F P
Regression 1 29149 29149 8.49 0.005
Residual Error 58 199159 3434
Total 59 228308
Regression Analysis: Mort versus White
The regression equation is
Mort = 1336 - 4.49 White
Predictor Coef SE Coef T P
Constant 1336.01 62.06 21.53 0.000
White -4.4896 0.7007 -6.41 0.000
S = 48.0099 R-Sq = 41.4% R-Sq(adj) = 40.4%
Analysis of Variance
Source DF SS MS F P
Regression 1 94621 94621 41.05 0.000
Residual Error 58 133687 2305
Total 59 228308
Regression Analysis: Mort versus SO2
The regression equation is
Mort = 918 + 0.418 SO2
Predictor Coef SE Coef T P
Constant 917.887 9.644 95.18 0.000
SO2 0.4179 0.1166 3.58 0.001
S = 56.7657 R-Sq = 18.1% R-Sq(adj) = 16.7%
Analysis of Variance
Source DF SS MS F P
Regression 1 41411 41411 12.85 0.001
Residual Error 58 186896 3222
Total 59 228308
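The R-Sq figures in the six outputs above can be recomputed from the Analysis of Variance tables, since R square is the regression sum of squares divided by the total sum of squares. The following is a minimal sketch using the sums of squares printed above. The variable names are illustrative only.

```python
# Regression sum of squares for each model, copied from the ANOVA tables.
ss_regression = {
    "Precip": 59266,
    "Jan": 767,
    "July": 17701,
    "Hhsize": 29149,
    "White": 94621,
    "SO2": 41411,
}

# Total sum of squares is the same in every model (same response, Mort).
SS_TOTAL = 228308

# R square as a percentage: share of variation in Mort explained by the model.
r_sq_pct = {name: 100 * ssr / SS_TOTAL for name, ssr in ss_regression.items()}

strongest = max(r_sq_pct, key=r_sq_pct.get)
weakest = min(r_sq_pct, key=r_sq_pct.get)

for name, pct in sorted(r_sq_pct.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:7s} R-Sq = {pct:4.1f}%")
print("strongest predictor:", strongest)
print("weakest predictor:", weakest)
```

Ranking the models this way puts White first and Jan last.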
Report 2
Introduction
The website Statistica.com claims that the average household size in 1960 was equal to 3.67. This report evaluates that claim with a statistical test. It also answers two further questions: whether the average death rate differed between areas with high and low nitrous oxide potential in 1960, and whether the average maximum July temperature changed between the data collections in 1960 and 2000.
Methods
The three questions are answered with the same sample as in report 1. The first question is a test of the population mean of average household size. The population standard deviation is unknown, so a one-sample t test is used. The null hypothesis is that average household size equals 3.67, and the alternative hypothesis is that it differs from 3.67. To test whether the average death rate in 1960 differs between areas with high and low nitrous oxide potential, a two-sample t test for equality of means is used. The null hypothesis is that the two group means are equal, and the alternative hypothesis is that they differ. For the last question, on the average maximum temperature in July in 1960 and 2000, a two-sample t test is again used, together with the 95% confidence interval for the difference in means. The null and alternative hypotheses are the same as in the previous case.
Results
The one-sample t test of average household size gives a t value of -23.30 and a corresponding probability value (P value) of 0.000. The null hypothesis that average household size in 1960 was 3.67 is rejected and the alternative hypothesis is accepted. The two-sample t test for the difference in average death rate between high and low nitrous oxide potential areas gives an estimated t value of -2.07 and a P value of 0.045. The P value is less than 0.05. Therefore, the null hypothesis is rejected and the alternative hypothesis is accepted. Another two-sample t test is performed to analyze whether the mean maximum temperature in July differs between 1960 and 2000. The t statistic is 0.70 and the P value is 0.488. The 95% confidence interval for the difference, (-1.201, 2.501), contains zero. The test therefore fails to reject the null hypothesis that there is no difference in the average maximum temperature in July between 1960 and 2000.
Conclusion
The analysis shows that the average household size in 1960 is significantly different from 3.67, contrary to the claim of Statistica.com. The average mortality rate differs between areas with high and low nitrous oxide potential, with the higher mean in the high-potential areas. Finally, there is no evidence that the average maximum temperature in July changed between 1960 and 2000.
One-Sample T: Hhsize
Test of mu = 3.67 vs not = 3.67
Variable N Mean StDev SE Mean 95% CI T P
Hhsize 60 3.2632 0.1353 0.0175 (3.2282, 3.2981) -23.30 0.000
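The one-sample output above can be reproduced approximately from its own summary statistics. The sketch below recomputes the standard error, the t statistic, and the 95% confidence interval. The constant `T_CRIT_59` is an assumed table value (about 2.001 for 59 degrees of freedom), and small differences from the printed figures are expected because the inputs are rounded.

```python
import math

# Summary statistics from the One-Sample T output for Hhsize.
n, mean, sd = 60, 3.2632, 0.1353
mu0 = 3.67  # hypothesized mean under H0

# Assumption: approximate two-sided 5% critical value of t with n - 1 = 59 df.
T_CRIT_59 = 2.001

se = sd / math.sqrt(n)        # standard error of the mean
t_stat = (mean - mu0) / se    # one-sample t statistic
ci = (mean - T_CRIT_59 * se, mean + T_CRIT_59 * se)
claim_inside_ci = ci[0] <= mu0 <= ci[1]

print(f"SE Mean = {se:.4f}")
print(f"T = {t_stat:.2f}")
print(f"95% CI = ({ci[0]:.4f}, {ci[1]:.4f})")
print("3.67 inside the interval:", claim_inside_ci)
```

Because 3.67 lies well outside the interval, the interval and the t test lead to the same decision.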
Two-Sample T-Test and CI: Mort, High_NOx
Two-sample T for Mort
High_NOx N Mean StDev SE Mean
0 36 926.4 51.8 8.6
1 24 961.4 71.2 15
Difference = mu (0) - mu (1)
Estimate for difference: -35.0
95% CI for difference: (-69.2, -0.8)
T-Test of difference = 0 (vs not =): T-Value = -2.07 P-Value = 0.045 DF = 38
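The reported DF = 38 is lower than the 58 that a pooled test would use, which suggests the test was run without assuming equal variances (Welch's t test). This is an inference from the output, not something the report states. Under that assumption, the sketch below recomputes the t value and the Welch-Satterthwaite degrees of freedom from the group summaries above. The function name `welch_t` is hypothetical.

```python
import math


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch two-sample t statistic, degrees of freedom, and standard error."""
    v1 = sd1 ** 2 / n1  # squared standard error of group 1 mean
    v2 = sd2 ** 2 / n2  # squared standard error of group 2 mean
    se = math.sqrt(v1 + v2)
    t = (mean1 - mean2) / se
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df, se


# Group 0 = low NOx potential, group 1 = high NOx potential (from the output).
t_stat, df, se = welch_t(926.4, 51.8, 36, 961.4, 71.2, 24)

print(f"T-Value = {t_stat:.2f}")
print(f"Welch df = {df:.2f} (rounded down: {math.floor(df)})")
print(f"SE of difference = {se:.2f}")
```

Rounding the fractional degrees of freedom down gives the DF printed in the output.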
Two-Sample T-Test and CI: July_2000, July
Two-sample T for July_2000 vs July
           N   Mean  StDev  SE Mean
July_2000  60  75.25   5.45     0.70
July       60  74.60   4.77     0.62
Difference = mu (July_2000) - mu (July)
Estimate for difference: 0.650
95% CI for difference: (-1.201, 2.501)
T-Test of difference = 0 (vs not =): T-Value = 0.70 P-Value = 0.488 DF = 115
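Question c asks for a 95% confidence interval, so the interval for the difference in mean July temperature deserves a direct check. The sketch below rebuilds it from the summary statistics in the output above. The constant `T_CRIT_115` is an assumed table value (about 1.981 for 115 degrees of freedom), and the result matches the printed interval only up to rounding of the inputs.

```python
import math

# Summary statistics from the two-sample output: (mean, StDev, N).
july_2000 = (75.25, 5.45, 60)
july_1960 = (74.60, 4.77, 60)

# Assumption: approximate two-sided 5% critical value of t with 115 df.
T_CRIT_115 = 1.981

diff = july_2000[0] - july_1960[0]
se = math.sqrt(july_2000[1] ** 2 / july_2000[2] + july_1960[1] ** 2 / july_1960[2])
ci = (diff - T_CRIT_115 * se, diff + T_CRIT_115 * se)
zero_inside_ci = ci[0] <= 0 <= ci[1]

print(f"Estimate for difference = {diff:.3f}")
print(f"95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
print("interval contains zero:", zero_inside_ci)
```

An interval that contains zero is consistent with no change in mean July temperature between the two data collections.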