BUS5SBF Statistics for Business: Analysing Household Data
Questions:
Analysing Household Data
Have you been part of a national census? Privacy issues aside, a census provides lots of data that can inform a government's policies and actions, but to be useful, the data needs to be analysed and interpreted.
In this assignment, we will use statistical methods to analyse and interpret real world demographic data.
The goal of this assignment is to
- Test your understanding of statistical methods and approaches
- Improve your ability to use Excel for manipulation of data
- Understand the real-world applications and implications of statistics
To complete this assignment you must
- Complete a set of statistical analysis tasks on a unique data set (both tasks and data set will be provided to you)
- Submit a report in Word detailing your response to each task (the final answer and the reasoning / calculations that led to it)
- Submit an Excel document that contains your data set and the calculations you used to complete the tasks
Tasks for Analysis of Data Set
Complete the following tasks based on the unique data set you generated.
Task 1
- Draw a random sample of two hundred (200) households as per the sample selection procedure. What sampling method have you used to select your sample data? In your opinion, is this the best method of sampling, why or why not?
- Compute the descriptive statistics and draw a Box-Whisker plot of Expenditures on the following variables (all series in one graph!);
(i) Alcohol
(ii) Meals
(iii) Fuel
(iv) Phone
- Also, use an appropriate measure of variation to compare the variability in expenditures on these four variables. Explain why this is an appropriate measure.
- Present a summary of your findings about the shape and spread of the distribution of these variables using information from the boxplots and the descriptive statistics.
Task 2
- Construct a frequency distribution of the expenditures on Utilities, using the following classification;
- What is the percentage of households that spend on Utilities
- at the most $1200 per annum
- between $1200 and $2400 per annum, and
- more than $2400 per annum.
- Draw the histogram of the expenditures on Utilities by households in your sample. Do you think the utility expenditures are normally distributed? Provide the “statistical reason” for your answer.
Task 3
- What is the top 10% value and the bottom 10% value of households’ annual after tax income (AtaxInc)? What do these two values imply?
- What does the mean (average) of variable OwnHouse imply?
- What is the probability that a randomly selected household will have a family size (FS= Adults + Children) equal to 5?
- Draw a scatter plot of natural log of total expenditures against natural log of after tax income, that is, ln(texp) against ln(ataxinc) and compute the coefficient of correlation. Express your finding about the relationship between the two variables.
Task 4
- Construct a contingency table between the gender and the level of education. Using information in this table, can we say that male and female heads of the households differ in their higher level of qualification?
- What is the probability that the head of household is a female and her higher level of education is Intermediate?
- What is the probability that the head of household is a male and has the Bachelor degree?
- What is the proportion of having the Secondary as the highest degree from among females?
- Do you think that the events "gender of household head is male" and "having the Master Degree" are independent?
Answers:
Task 1
- The given dataset contains information about 2000 households. A random sample of 200 households has been selected from the dataset using simple random sampling. This technique was chosen because it is the most appropriate sampling technique for this situation. Its advantages are as follows:
- In this technique, the sample units are selected randomly.
- Each unit that is selected randomly for the purpose of the study has an equal probability of being selected in the sample.
- This type of sampling technique can be used with the availability of very little information about the population.
- This type of sampling technique is free from classification errors.
- A sample selected using this technique tends to be representative of the whole population.
- Because the units are chosen at random, the sample is free from selection bias on the part of the researcher.
- The analysis of data becomes much easier after sampling using this technique.
- The sampling technique can be used very easily as the technique is extremely simple.
- The errors in sampling become much easier to assess with this method.
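The selection step described above can be sketched in Python. This is only an illustration of simple random sampling without replacement, not the Excel procedure prescribed by the assignment. The household identifiers 1 to 2000, the function name `draw_simple_random_sample` and the seed value are assumptions made for the example.

```python
import random

def draw_simple_random_sample(population_ids, sample_size, seed=None):
    """Draw a simple random sample without replacement.

    Every unit in the population has the same chance of being selected.
    """
    rng = random.Random(seed)  # local generator so the draw is reproducible
    return rng.sample(list(population_ids), sample_size)

# Hypothetical household identifiers 1..2000 (the real IDs are in the data set).
household_ids = range(1, 2001)
sample_ids = draw_simple_random_sample(household_ids, 200, seed=42)

# Number of households drawn and number of distinct households drawn.
print(len(sample_ids), len(set(sample_ids)))
```

Fixing the seed makes the same 200 households come out on every run, which helps when the sample has to be documented alongside the calculations.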
- Descriptive statistics for the annual expenditure of alcohol, meal, fuel and phone along with the comparison of these measures using a box and whisker plot are given in the following tables 1, 2, 3 and 4 and also in figure 1 respectively.
Table 1: Descriptive summary statistics for annual family expenditure of Alcohol
Statistic | Value
Mean | 1242.895
Standard Error | 148.0106
Median | 782
Mode | 0
Standard Deviation | 2093.185
Sample Variance | 4381425
Kurtosis | 78.86298
Skewness | 7.336376
Range | 24680
Minimum | 0
Maximum | 24680
Sum | 248579
Count | 200
Table 2: Descriptive summary statistics for annual family expenditure of Meals
Statistic | Value
Mean | 1489.61
Standard Error | 111.569
Median | 1200
Mode | 1200
Standard Deviation | 1577.824
Sample Variance | 2489530
Kurtosis | 6.851153
Skewness | 2.28696
Range | 9600
Minimum | 0
Maximum | 9600
Sum | 297922
Count | 200
Table 3: Descriptive summary statistics for annual family expenditure of Fuel
Statistic | Value
Mean | 1557.85
Standard Error | 117.7495
Median | 1110
Mode | 0
Standard Deviation | 1665.23
Sample Variance | 2772991
Kurtosis | 10.89265
Skewness | 2.563535
Range | 12000
Minimum | 0
Maximum | 12000
Sum | 311570
Count | 200
Table 4: Descriptive summary statistics for annual family expenditure of Phone
Statistic | Value
Mean | 1460.89
Standard Error | 124.1273
Median | 1020
Mode | 1200
Standard Deviation | 1755.425
Sample Variance | 3081517
Kurtosis | 42.93626
Skewness | 5.391694
Range | 18000
Minimum | 0
Maximum | 18000
Sum | 292178
Count | 200
Figure 1: Box and whisker plot for annual family expenditure of Alcohol, meals, fuel and phone
Standard deviation is one of the most widely used measures of variation. It measures how far the values of a variable deviate from the mean of that variable. Comparing the standard deviations in tables 1, 2, 3 and 4, the highest standard deviation is for annual expenditure on alcohol, at 2093.185. The standard deviation of annual expenditure on meals is 1577.824, on fuel 1665.23 and on phone 1755.425. Thus, expenditure on alcohol varies the most: the amounts families spend on alcohol are not clustered near the mean expenditure but lie far from it, either below or above. Among these four items, meals show the least variability, so the expenses on meals lie closer to the mean expense on meals.
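Because the four variables have different means, their spread can also be compared relative to the mean using the coefficient of variation (standard deviation divided by mean). The sketch below is a supplementary check, not the measure used in the answer above. It takes the means and standard deviations reported in Tables 1 to 4, and the function name `coefficient_of_variation` is an assumption made for the example.

```python
# Mean and standard deviation as reported in Tables 1 to 4.
summary = {
    "Alcohol": (1242.895, 2093.185),
    "Meals": (1489.61, 1577.824),
    "Fuel": (1557.85, 1665.23),
    "Phone": (1460.89, 1755.425),
}

def coefficient_of_variation(mean, std_dev):
    """Return the standard deviation expressed as a fraction of the mean."""
    return std_dev / mean

cv = {name: coefficient_of_variation(m, s) for name, (m, s) in summary.items()}

# Variables ordered from most to least variable relative to their mean.
ranking = sorted(cv, key=cv.get, reverse=True)

for name in ranking:
    print(f"{name}: CV = {cv[name]:.3f}")
```

With these figures the ordering by coefficient of variation is the same as the ordering by standard deviation: alcohol is the most variable and meals the least.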
From the box and whisker plot in figure 1, it can be seen that alcohol has the widest spread. The lowest expense on alcohol is zero, so there are families who do not spend on alcohol at all. There are also values far above the mean, so some families spend a great deal on alcohol. The expenses on meals are the least variable as this is a necessity in the life of people. It can be expected that higher income groups will spend more on meals and lower income groups will spend less, but there is a minimum below which a person cannot cut spending on meals. Again, a person cannot consume unlimited food, so there is also an upper bound above which the expenses are unlikely to rise. This explains the low variability. For all four variables the mean is higher than the median and the skewness is positive, so the distributions are skewed to the right. Thus, more than half of the families spend less than the mean, while a small number of families with very high expenses pull the mean upwards.
Task 2
- The annual expenditure on utilities has been summarized using the following frequency table:
Table 5: Frequency Table for Annual expenditure on Utilities
Expenditure (in $) | Frequency | Cumulative %
0-400 | 29 | 14.50%
400-800 | 61 | 45.00%
800-1200 | 57 | 73.50%
1200-1600 | 30 | 88.50%
1600-2000 | 11 | 94.00%
2000-2400 | 6 | 97.00%
2400-2800 | 2 | 98.00%
2800-3200 | 2 | 99.00%
More than 3200 | 2 | 100.00%
- The following percentages have been obtained from the frequency table given in table 5:
- The percentage of households spending a maximum of $1200 per annum is 73.50%
- The percentage of households having expenditure on utilities between $1200 and $2400 per annum is (97 – 73.50) % = 23.50%.
- The percentage of households spending more than $2400 per annum is (100 – 97) % = 3%.
- The distribution of the expenses of household on utilities is shown with the help of a histogram in figure 2.
Figure 2: Annual expenditure on Utilities by the households
- From the histogram, it can be said very clearly that the expenses on utilities are not normally distributed. The distribution is skewed to the right (positively skewed): most households fall in the lower expenditure classes and a long tail extends towards the higher classes. This indicates that most households have an annual expenditure on utilities that is less than the mean expenditure on utilities.
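The three percentages read from Table 5 can be reproduced from the class frequencies. The sketch below is a minimal check in Python rather than the Excel workflow used for the assignment. The helper name `percent_up_to` is an assumption made for the example, and each class is identified by its upper limit.

```python
# Class frequencies from Table 5 as (upper class limit in $, frequency).
classes = [
    (400, 29), (800, 61), (1200, 57), (1600, 30), (2000, 11),
    (2400, 6), (2800, 2), (3200, 2), (float("inf"), 2),
]
total = sum(freq for _, freq in classes)

def percent_up_to(limit):
    """Percentage of households in classes whose upper limit is <= limit."""
    return 100 * sum(freq for upper, freq in classes if upper <= limit) / total

at_most_1200 = percent_up_to(1200)
between_1200_and_2400 = percent_up_to(2400) - at_most_1200
more_than_2400 = 100 - percent_up_to(2400)

print(at_most_1200, between_1200_and_2400, more_than_2400)
```

The three results add up to 100, which is a quick way to confirm that no class has been counted twice or left out.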
Task 3
The variable OwnHouse indicates whether a household owns or rents its house. The value 1 indicates that the household owns a house and 0 indicates that it does not. Because the variable takes only the values 0 and 1, its mean is the proportion of households that own a house. The mean has been found to be 0.665, so 66.5% of the households in the sample, that is, more than half, own their house.
In the sample of 200 households, the number of households having a family size of 5 is 17. Thus, the probability that a randomly selected household will have a family size equal to 5 is given by (17/200) = 0.085.
Figure 3 shows the relation between annual after tax income and annual total expenditure with the help of a scatterplot.
Figure 3: Scatterplot of Annual after tax income and total expenditure
Table 6: Correlation Table
 | ln(ATaxInc) | ln(Texp)
ln(ATaxInc) | 1 |
ln(Texp) | 0.69 | 1
From figure 3 it is clear that the natural log of annual after tax income and the natural log of total expenditure move together: a higher income is associated with a higher total expenditure, and a lower income with a lower total expenditure. The correlation between the two variables is 0.69, which indicates a moderate positive linear relationship between them.
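The calculation behind a correlation of logs can be sketched as follows. The function names `pearson_correlation` and `log_log_correlation` are assumptions made for the example, and the demonstration values are made up purely to exercise the code; they are not taken from the assignment data set and do not reproduce the 0.69 reported above.

```python
import math

def pearson_correlation(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def log_log_correlation(income, expenditure):
    """Correlation between ln(income) and ln(expenditure); inputs must be > 0."""
    return pearson_correlation(
        [math.log(v) for v in income],
        [math.log(v) for v in expenditure],
    )

# Made-up illustrative values, NOT taken from the assignment data set.
# Expenditure is an exact power function of income here, so the two logs
# are perfectly linearly related.
demo_income = [20000, 35000, 50000, 80000, 120000]
demo_expenditure = [3 * v ** 0.8 for v in demo_income]
r = log_log_correlation(demo_income, demo_expenditure)
print(round(r, 6))
```

Households with zero income or zero expenditure would have to be excluded or handled separately before taking logs, since the natural log is only defined for positive values.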
Task 4
The highest level of education of males and females is shown with the help of a frequency table given in table 7.
Table 7: Frequency table of Gender and Highest degree of Education
Level of Education | F | M | Grand Total
B | 12 | 21 | 33
I | 26 | 19 | 45
M | 26 | 15 | 41
P | 23 | 16 | 39
S | 23 | 19 | 42
Grand Total | 110 | 90 | 200
Table 7 shows that 110 of the household heads are female and 90 are male. These two totals are fairly close in value. Thus, it can be said that there is not much difference in the level of qualification between male and female heads of household.
- Table 8 shows the percentages of gender and Education level
Table 8: Percentages of Gender and Education level
Level of Education | F | M | Grand Total
B | 6.00% | 10.50% | 16.50%
I | 13.00% | 9.50% | 22.50%
M | 13.00% | 7.50% | 20.50%
P | 11.50% | 8.00% | 19.50%
S | 11.50% | 9.50% | 21.00%
Grand Total | 55.00% | 45.00% | 100.00%
- The probability that the head of the household is a female and her highest level of education is intermediate is (26/200) = 0.13.
- The probability that the head of the household is a male and has a bachelor’s degree is (21/200) = 0.105.
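- The proportion having secondary as the highest degree from among females is a conditional proportion, so it is taken out of the 110 female heads rather than out of all 200 households: (23/110) = 0.209.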
- Two events are said to be independent if the product of the probabilities of the two events taken separately is equal to the probability of the two events occurring together. Thus,
The probability of gender of a household head being male is (90/200) = 0.45.
The probability of having a master’s degree is (41/200) = 0.205.
The probability of a household head being male and having a master’s degree is (15/200) = 0.075.
Now, (0.45*0.205) = 0.092 which is not equal to 0.075.
Thus, the two events are dependent.
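The independence check above can be reproduced directly from the counts in Table 7. The sketch below is a minimal illustration in Python, and the helper names `p_gender`, `p_education` and `p_joint` are assumptions made for the example.

```python
# Counts from Table 7: rows are education levels, columns are gender of head.
counts = {
    "B": {"F": 12, "M": 21},
    "I": {"F": 26, "M": 19},
    "M": {"F": 26, "M": 15},
    "P": {"F": 23, "M": 16},
    "S": {"F": 23, "M": 19},
}
total = sum(sum(row.values()) for row in counts.values())

def p_gender(gender):
    """Marginal probability that the household head has the given gender."""
    return sum(row[gender] for row in counts.values()) / total

def p_education(level):
    """Marginal probability of the given highest level of education."""
    return sum(counts[level].values()) / total

def p_joint(level, gender):
    """Joint probability of a given education level and gender."""
    return counts[level][gender] / total

p_male = p_gender("M")
p_master = p_education("M")
p_male_and_master = p_joint("M", "M")
product = p_male * p_master

# Independence would require the joint probability to equal the product.
print(round(p_male_and_master, 4), round(product, 4))
```

The same `p_joint` helper gives the joint probabilities for a female head with intermediate education and for a male head with a bachelor's degree.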