Enn543 Data Analysis And Optimisation Assessment Answers

A set of data were collected on multiple samples of red and white wine. These data include both objective measurements on chemical and physical properties of the wines, and subjective measurements of quality based on expert judgements. The data are included in the files:
winequality-red.csv
winequality-white.csv
Using these data:

(a) Fit GLMs to the quality as a function of the other variables for both types of wine. Assume that quality follows a Poisson distribution.

(b) Compare which variables are significant in each case. What are the differences? What are the similarities?

Answer:

Given that,

Random variable X follows a normal distribution with mean µ = 5 and standard deviation σ = 10.

Hence, Prob(X > 10) = 1 - Prob(X<=10) = 1 - Prob(Z<=(10-5)/10) = 1 - Prob(Z<=0.5) = 1 - 0.6915 = 0.3085. (Probabilities for Z values are obtained from the standard normal table.)

Prob(−20 < X < 15) = Prob(X<=15) – Prob(X<=-20) = P(Z<=(15-5)/10) – P(Z<=(-20-5)/10) = P(Z<=1) – P(Z<=-2.5) = 0.8413 – 0.0062 = 0.8351. (Probability values are obtained from the standard normal table.)

Now, P(X > x) = 0.95 => P(X<=x) = 1-0.95 = 0.05 (as the total probability is 1).

Now, from the standard normal table, for Z = -1.65 the area under the normal curve to the left of Z is approximately 0.05.

Hence, (x-5)/10 = -1.65 => x = -16.5 + 5 = -11.5

Hence, for x = -11.5, the area under the normal curve to the right of x is 0.95.
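The table lookups above can be cross-checked numerically. The following is a minimal sketch, assuming the Statistics and Machine Learning Toolbox is available (it provides normcdf and norminv); the variable names p1, p2 and x95 are chosen here for illustration only.

```matlab
% Cross-check of the normal-table results for X ~ N(mu = 5, sigma = 10).
% Assumes the Statistics and Machine Learning Toolbox (normcdf, norminv).
mu = 5;
sigma = 10;

p1  = 1 - normcdf(10, mu, sigma);                         % P(X > 10)
p2  = normcdf(15, mu, sigma) - normcdf(-20, mu, sigma);   % P(-20 < X < 15)
x95 = norminv(0.05, mu, sigma);                           % x such that P(X > x) = 0.95

fprintf('P(X > 10)       = %.4f\n', p1);
fprintf('P(-20 < X < 15) = %.4f\n', p2);
fprintf('x with P(X > x) = 0.95: %.2f\n', x95);
```

norminv works with the unrounded quantile z ≈ -1.645 rather than the table value -1.65, so it returns x ≈ -11.45 instead of -11.5; the two probabilities agree with the table values to four decimal places.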

Given that,

Random variable N follows a Poisson distribution with mean µ = 10000.

By the central limit theorem, a Poisson distribution with a large mean can be approximated by a normal distribution whose mean and variance are both equal to the Poisson mean.

In this case the approximation is N ~ normal(mean = 10000, standard deviation = sqrt(10000) = 100).

Hence, under the approximation, P(N > 10,200) = 1 – P(N<=10200)

= 1 - P(Z<=(10200-10000)/100) = 1 - P(Z<=2) = 1 – 0.9772 = 0.0228.

Now, evaluating the exact Poisson probability P(N > 10,200) = 1 – P(N<=10200) in MATLAB gives 0.0227.

MATLAB code:

P = 1 - cdf('Poisson',10200,10000); % exact upper-tail probability P(N > 10200)

disp(P)

Output:

0.0227

Hence, the error in approximating the Poisson distribution by the normal is |0.0228 – 0.0227| = 0.0001, which is very small.
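The comparison between the exact and approximate tail probabilities can also be done in a single script. This is a sketch only, assuming the Statistics and Machine Learning Toolbox (poisscdf and normcdf); the names lambda, k, pExact and pNormal are introduced here for illustration.

```matlab
% Exact Poisson tail probability versus its normal approximation.
% Assumes the Statistics and Machine Learning Toolbox (poisscdf, normcdf).
lambda = 10000;   % Poisson mean (also its variance)
k      = 10200;   % threshold

pExact  = 1 - poisscdf(k, lambda);               % exact P(N > k)
pNormal = 1 - normcdf(k, lambda, sqrt(lambda));  % normal approximation, sd = sqrt(lambda)
absErr  = abs(pNormal - pExact);                 % absolute approximation error

fprintf('Exact Poisson:        %.4f\n', pExact);
fprintf('Normal approximation: %.4f\n', pNormal);
fprintf('Absolute error:       %.1e\n', absErr);
```

Here poisscdf(k, lambda) plays the same role as cdf('Poisson', k, lambda) in the code above.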

Question 6:

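The MLE (maximum likelihood estimate) of the parameter, denoted θ here, is the value of θ that maximizes the likelihood function L(θ) = f(x1,x2,x3,…,xn|θ). Here, f = probability density function.

So, for independent observations, L(θ) = f(x1|θ) · f(x2|θ) · f(x3|θ) · … and so on up to f(xn|θ).

Now, taking the log of the above equation,

log L(θ) = log f(x1|θ) + log f(x2|θ) + … + log f(xn|θ).

Now, maximizing with respect to θ: at max L(θ),

d/dθ log L(θ) = 0

Hence, solving this equation for the given probability density function gives the maximum likelihood estimate.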

It is stated that the sample of data x= x1,x2,….xn follows a Poisson distribution with mean λ and that λ follows exponential distribution with parameter θ.

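So, the likelihood is P(X|λ) = e^(-λ) λ^(x1)/x1! · e^(-λ) λ^(x2)/x2! · … · e^(-λ) λ^(xn)/xn! = e^(-nλ) λ^(x1+x2+…+xn) / (x1! x2! … xn!)

and the prior is P(λ) = θ e^(-θλ), λ > 0.

Hence, the posterior probability is proportional to (likelihood) * (prior probability):

P(λ|X) ∝ e^(-nλ) λ^(x1+x2+…+xn) · θ e^(-θλ) ∝ λ^(x1+x2+…+xn) e^(-(θ+n)λ)

Now, the above is the kernel of a Gamma distribution with parameters

β = θ + n (rate), α = x1+x2+…+xn + 1 (shape). (Proved)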

Question 8:

The variables of the yacht.dat file are the following, in order. The names match those used in the MATLAB model below.

V1 Longitudinal position of the center of buoyancy, dimensionless

V2 Prismatic coefficient, dimensionless

V3 Length-displacement ratio, dimensionless

V4 Beam-draught ratio, dimensionless

V5 Length-beam ratio, dimensionless

V6 Froude number, dimensionless

X7 Residuary resistance per unit weight of displacement, dimensionless

Now, using the fitlm command in MATLAB, the dependent variable X7 is fitted against the independent variables V1 to V6.

MATLAB command:

% the yacht.dat is loaded by selecting it from folder

lrm = fitlm(yacht,'X7~V1+V2+V3+V4+V5+V6');

disp(lrm)

Linear regression model:
    X7 ~ 1 + V1 + V2 + V3 + V4 + V5 + V6

Estimated Coefficients:
                   Estimate      SE        tStat        pValue
                   ________    _______    ________    __________
    (Intercept)      154.51     32.359       4.775    2.8055e-06
    V1             0.018076    0.44595    0.040534       0.96769
    V2              -301.54     52.185     -5.7783    1.8779e-08
    V3              -9.8484     18.656    -0.52791       0.59795
    V4               7.0168     7.2464     0.96832       0.33366
    V5               7.6548     18.712     0.40908       0.68277
    V6               73.168     5.1483      14.212    1.8803e-35

Number of observations: 309, Error degrees of freedom: 302
Root Mean Squared Error: 11.8
R-squared: 0.402, Adjusted R-Squared 0.39
F-statistic vs. constant model: 33.9, p-value = 3.44e-31

Hence, the linear regression model is,

X7 = 154.51 + 0.018V1 - 301.54V2 - 9.848V3 + 7.017V4 + 7.655V5 + 73.168V6.

This linear regression model can be used as a function of the independent variables: for given values of V1 to V6, the estimate of X7 can be evaluated using the ‘feval’ function in MATLAB. The accuracy of the regression equation can be checked by dividing the dataset in two, namely a training dataset (80% of the data) and a validation dataset (20% of the data). The fitlm command is run on the training dataset, and the regression equation obtained is then evaluated on the validation dataset with feval.
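The 80/20 hold-out check described above could be sketched as follows. This is an illustration under stated assumptions, not the original solution's code: it assumes the Statistics and Machine Learning Toolbox (cvpartition, fitlm, predict), it uses predict in place of feval to evaluate the fitted model on a table, and it builds a synthetic stand-in table so that the script runs on its own. The synthetic values are hypothetical and should be replaced by the table loaded from yacht.dat.

```matlab
% Sketch of an 80/20 hold-out validation of the linear model.
% ASSUMPTION: 'yacht' is a table with predictors V1..V6 and response X7.
% The synthetic table below is a hypothetical stand-in, NOT the real data.
rng(1);                                   % reproducible data and split
n = 300;
yacht = array2table(rand(n, 6), ...
    'VariableNames', {'V1','V2','V3','V4','V5','V6'});
yacht.X7 = 2*yacht.V6 + 0.1*randn(n, 1);  % hypothetical response

cv = cvpartition(n, 'HoldOut', 0.2);      % hold out 20% of the rows
trainTbl = yacht(training(cv), :);        % training rows (about 80%)
validTbl = yacht(test(cv), :);            % validation rows (about 20%)

% Fit on the training rows only, then predict the held-out rows.
lrmTrain  = fitlm(trainTbl, 'X7~V1+V2+V3+V4+V5+V6');
yhat      = predict(lrmTrain, validTbl);
rmseValid = sqrt(mean((validTbl.X7 - yhat).^2));   % validation RMSE

fprintf('Training RMSE:   %.4f\n', lrmTrain.RMSE);
fprintf('Validation RMSE: %.4f\n', rmseValid);
```

A validation RMSE close to the training RMSE suggests the regression equation generalizes to unseen rows; a much larger validation RMSE suggests it does not.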

Question 9:

In this question a generalized linear regression model is fitted for both red wine ‘quality’ variable and white wine ‘quality’ variable assuming Poisson distribution.

Model fitting for red wine and white wine model:

MATLAB code with output:

% winequalitywhite.csv and winequalityred.csv are manually loaded from folder

model = 'quality~fixedacidity +volatileacidity + citricacid + residualsugar + chlorides + freesulfurdioxide + totalsulfurdioxide + density + pH + sulphates + alcohol';

lrm1 = fitglm(winequalityred,model,'Distribution','poisson');

disp(lrm1)

lrm2 = fitglm(winequalitywhite,model,'Distribution','poisson');

disp(lrm2)

Output:

lrm1 =

Generalized linear regression model:
    quality ~ [Linear formula with 12 terms in 11 predictors]
    Distribution = Poisson

Estimated Coefficients:
                           Estimate          SE          tStat       pValue
                          ___________    __________    ________    _________
    (Intercept)                3.6538         13.67     0.26728      0.78925
    fixedacidity            0.0036583      0.016633     0.21994      0.82592
    volatileacidity           -0.1977       0.08039     -2.4593     0.013921
    citricacid              -0.035923      0.096141    -0.37365      0.70866
    residualsugar           0.0026177      0.009736     0.26887      0.78803
    chlorides                -0.33176       0.27688     -1.1982      0.23084
    freesulfurdioxide      0.00082523     0.0014126     0.58418       0.5591
    totalsulfurdioxide    -0.00061063    0.00047979     -1.2727      0.20312
    density                   -2.1729        13.953    -0.15573      0.87624
    pH                      -0.074826       0.12406    -0.60317       0.5464
    sulphates                 0.15912      0.072618      2.1912     0.028434
    alcohol                   0.04815      0.016999      2.8325    0.0046188

1599 observations, 1587 error degrees of freedom
Dispersion: 1
Chi^2-statistic vs. constant model: 66.1, p-value = 6.81e-10

lrm2 =

Generalized linear regression model:
    quality ~ [Linear formula with 12 terms in 11 predictors]
    Distribution = Poisson

Estimated Coefficients:
                           Estimate          SE          tStat        pValue
                          ___________    __________    ________    __________
    (Intercept)                28.094        11.144      2.5211      0.011698
    fixedacidity             0.012809      0.011881      1.0781         0.281
    volatileacidity          -0.33456      0.064234     -5.2085    1.9041e-07
    citricacid              0.0025292      0.053278    0.047471       0.96214
    residualsugar            0.014557     0.0043653      3.3347    0.00085393
    chlorides               -0.062667       0.31275    -0.20037       0.84119
    freesulfurdioxide      0.00062244    0.00046312       1.344       0.17894
    totalsulfurdioxide    -3.6945e-05    0.00021042    -0.17558       0.86063
    density                   -27.359        11.298     -2.4215      0.015457
    pH                         0.1235      0.059026      2.0922      0.036417
    sulphates                 0.10875      0.054501      1.9953      0.046011
    alcohol                   0.03036      0.014207       2.137      0.032594

4898 observations, 4886 error degrees of freedom
Dispersion: 1
Chi^2-statistic vs. constant model: 185, p-value = 1.04e-33

The overall p-value of the white wine model (1.04e-33) is less than the chosen significance level of 0.05, so the model fits significantly better than a constant model. The independent variables which are significant are volatileacidity, residualsugar, density, pH, sulphates and alcohol, as their p-values are all less than 0.05 (sulphates only marginally, at p = 0.046).

Similarly, the red wine model is significant overall, as its p-value of 6.81e-10 is less than the significance level of 0.05.

In this model the independent variables which are significant are volatileacidity, sulphates and alcohol, as their p-values are less than 0.05.

So, the white wine model has more significant predictor variables than the red wine model: residualsugar, density and pH are significant for white wine only. The similarities between the two models are:

a) both models are significant overall
b) volatileacidity, sulphates and alcohol are significant in both, with the same signs (negative for volatileacidity, positive for sulphates and alcohol).
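The comparison of significant variables can be made mechanical. The sketch below uses only base MATLAB and copies the p-values from the two coefficient tables printed above; the variable names (names, pRed, pWhite, alphaLevel and so on) are introduced here for illustration.

```matlab
% Compare which predictors are significant at the 5% level in each model.
% The p-values are copied from the red and white wine coefficient tables.
names = {'fixedacidity','volatileacidity','citricacid','residualsugar', ...
         'chlorides','freesulfurdioxide','totalsulfurdioxide','density', ...
         'pH','sulphates','alcohol'};
pRed   = [0.82592 0.013921 0.70866 0.78803 0.23084 0.5591 ...
          0.20312 0.87624 0.5464 0.028434 0.0046188];
pWhite = [0.281 1.9041e-07 0.96214 0.00085393 0.84119 0.17894 ...
          0.86063 0.015457 0.036417 0.046011 0.032594];
alphaLevel = 0.05;                       % significance level

sigRed   = names(pRed   < alphaLevel);   % significant in the red wine model
sigWhite = names(pWhite < alphaLevel);   % significant in the white wine model

both      = intersect(sigRed, sigWhite, 'stable');  % significant in both models
onlyWhite = setdiff(sigWhite, sigRed, 'stable');    % significant for white only
onlyRed   = setdiff(sigRed, sigWhite, 'stable');    % significant for red only

fprintf('Both:       %s\n', strjoin(both, ', '));
fprintf('White only: %s\n', strjoin(onlyWhite, ', '));
fprintf('Red only:   %s\n', strjoin(onlyRed, ', '));
```

Note that sulphates in the white wine model has p = 0.046011, just under 0.05, so it falls in the set significant in both models. With the fitted models in the workspace, the same p-values could instead be read from lrm1.Coefficients.pValue and lrm2.Coefficients.pValue (dropping the intercept row).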
