MATH 1324 Finite Mathematics: Predicting Body Fat Percentage
Predicting Body Fat Percentage
Overview
The Body(2).csv dataset is posted on the Assignment 3 website. It contains the percentage of body fat, age, weight, height, and ten body circumference measurements (e.g., abdomen) for a sample of 252 men and women. Body fat, as measured using the Brozek method, is estimated through a time consuming underwater weighing technique to measure a person's density. Investigators want to determine if a general, easy to determine, body circumference measurement could be used as a general indicator for body fat percentage. If so, the investigators want to establish a formula that can convert a body circumference measurement to a predicted body fat percentage and also understand how well this prediction will hold. (original source: JSE-DA). The variables are as follows:
Variables
- Case: Case Number
- BFP_Brozek: Percent body fat using Brozek's equation, 457/Density - 414.2
- BFP_Siri: Percent body fat using Siri's equation, 495/Density - 450
- Density: Density (gm/cm3)
- Age: Age (yrs)
- Weight: Weight (lbs)
- Height: Height (inches)
- Adiposity_index: Adiposity index = Weight/Height2(kg/m2)
- Fat_free: Fat Free Weight = (1 - fraction of body fat) * Weight, using Brozek's formula (lbs)
- Neck: Neck circumference (cm)
- Chest: Chest circumference (cm)
- Abdomen: Abdomen circumference (cm) "at the umbilicus and level with the iliac crest"
- Hip: Hip circumference (cm)
- Thigh: Thigh circumference (cm)
- Knee: Knee circumference (cm)
- Ankle: Ankle circumference (cm)
- Biceps: Extended biceps circumference (cm)
- Forearm: Forearm circumference (cm)
- Wrist: Wrist circumference (cm) "distal to the styloid processes"
- Sex: “1” for male and “0” for female
The Assignment
- Test whether the mean body fat percentage for males and females are the same. Write down your null and alternative hypothesis clearly, express your conclusion in words and provide your reason(s) for your final conclusion.
- Estimate the 99% confidence interval for the mean body fat percentage in the population. Is there any assumption for the body fat percentage distribution that we need to investigate? Explain your reason (s).
- Researchers believe that average body fat percentage is less than 12.5. Test this claim. Write down your null and alternative hypothesis clearly, express your conclusion in words and provide reason(s) for your final conclusion.
- Find the single best predictor of body fat percentage (Brozek method) using the body circumference data. Write a report that explains your method for identifying the single best predictor. Use the best predictor to determine a model that can convert a person’s body circumference measurement to an estimated body fat percentage. Ensure you test the model parameters and any assumptions. Critique the predictive ability of the model and draw an overall conclusion to help the investigators
Answer:
Overview
The assignment is an essay on the analysis and consequent findings of the “Body (2).csv”, dataset. The file constitutes of data on body measurements of 252 men and women. The key variable of interest is body fat, measured by means of the Brozek method. The calculation for the task was done using RStudio.
Task 1
To determine whether the mean body fat percentage for males is same as that of females or whether they are different. To test the conjecture that the body fat, as measured using Brozek method, is different for males than females or not, the contesting statistical hypotheses can be written as follows:
H0: The average body fat of men equals the average body fat of females (Null Hypothesis)
H1: The average body fat of men does not equal the average body fat of females (Alternative hypothesis)
Then taking the average of the variable “BFP_Brozek” as the variable of interest, the mean body fat for men and women are computed separately. Then applying t-test for independent samples as the body fat for men is not dependent on body fat of women, with alternative being selected as two-sided, the p-value was found to be 0.46. The p-value was thus found to be greater than the assumed level of significance, that is, 0.05. Thus the test failed to reject the null hypothesis at 5% level of significance. Thus it is inferred that there is no difference between the Brozek body fat for men and women.
Task 2
The 99% confidence interval estimate of the average body fat percentage is to be computed for the population. The variable of interest is thus “BFP_Brozek”. The primary assumption, in the computation of the confidence interval is that the variable of interest follows as normal probability distribution. To test the validity of the normality condition, the Shapiro Wilk test of normality was employed and the p-value was found to be greater than 0.05 and hence the test failed to reject the null hypothesis of the test, that is, the variable follows normal distribution. Hence the variable, body fat percentage, satisfies the condition of normality.
Subsequently, the 99% confidence interval was computed. The confidence interval foe body fat percentage computed using the form: (mean + error margin, mean – error margin) , where error margin is the product of standard error of body fat and the 99th quantile of the t-distribution with 251 degrees of freedom. The 99% confidence interval was hence found to be (17.79534, 20.08165).
Task 3
It is to be verified whether the average body fat percentage of the men and women taken together is lesser than 12.5 or not. The statistic of interest is then the average body fat percentage computed using Brozek’s equation which is represented by the variable “BFP_Brozek”. Then in order to test for validity of the conjecture that the body fat percentage, as measured using Brozek method, less than 12.5, the contesting statistical hypotheses can be written as follows:
H0: The average body fat percentage is equal to 12.5 (Null Hypothesis)
H1: The average body fat percentage is less than 12.5 (Alternative hypothesis)
Then applying one sample t-test for body fat percentage the p-value was found to be greater than the assumed level of significance 0.05. Thus the test failed to reject the null hypothesis at 5% level of significance. Thus it is inferred that the percentage body fat is not less than 12.5.
Task 4
The best prediction model to model the body fat percentage based on the body circumference parameters is required. A subset of the dataset was created, containing the response variable “BFP_Brozek” and body circumference parameters or tentative predictor variables, “Neck”, “Chest”, “Abdomen”, “Hip”, “Thigh”, “Knee”, “Ankle”, “Biceps”, “Forearm”, “Wrist”. Since, body fat was determined to be following normal distribution, a linear model was fit to the data. Not all the predictors were found to be significant. Hence by comparing AIC values the best model was determined. Thus using stepwise regression method, with both forward and backward selection, the final model with least AIC. The predictors that were finalised were Neck, Abdomen, Hip, Forearm, and Wrist or rather neck circumference and abdomen circumference, hip circumference and forearm circumference. The model fitted using these predictors was found to be significant at 0.05 level of significance since p-value was less than 0.05. None of the coefficients for the predictors was found to be insignificant. The model is specified as follows:
Body fat = 3.4768 – 0.55623 Neck circumference + 0.90155 circumference – 0.307 Hip circumference + 0.38761 Forearm circumference – 1.496 Wrist circumference
The adjusted R square for the model was found to be 0.7258 which means that the predictors in the model explain explains about 72% of the variation in the response variable which is body fat as computed using Brozek’s equation.