Data Analysis Coursework Assignment
Weighting of assessment: 100% of total marks
Word Limit: 6000 words
Aim(s)
The aim of this module is to help students acquire the skills required for the job roles of Data Scientist, Data Modeller and Data Analyst, and to enable them to understand and implement various statistical and computational techniques for analysing datasets using industry-standard software and programming languages.
Learning Outcomes
After completing this module the student should be able to:
- Critically analyse and evaluate various statistical and computational techniques for analysing datasets and determine the most appropriate technique for a business problem;
- Critically evaluate, develop and implement solutions for processing datasets and solving complex problems in various environments using relevant programming paradigms;
- Appraise and apply key steps and issues involved in data preparation, cleaning, exploring, creating, optimizing and evaluating models;
- Contrast and apply aspects of data science applications and their use.
Supermarket Sales Data Challenge
Overview
This dataset contains supermarket transactions over a period of two years across four categories, Type 1 to Type 4. The supermarket has a number of branches across the two main provinces of the country.
As a data scientist, your task is to clean, normalise and transform these data into R-compatible formats and undertake extensive data mining using Machine Learning. The main objective of this data challenge is to develop Machine Learning models that identify transaction patterns and forecast sales from the four (4) data sets described below. These data sets contain two years of transaction details. Report on any interesting patterns you reveal from the data analysis, such as buying patterns and market-basket associations, with visualisations where possible. In your discussion, provide a critical synopsis of the challenges of data analysis, integration and visualisation you faced during this exercise. State any assumptions you made during this exercise, with valid justifications.
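As an illustration of the market-basket idea mentioned above, the following is a minimal base-R sketch that computes pairwise support and confidence, the two basic measures behind association rules. The data frame and its values are hypothetical stand-ins for the Basket and Code fields of Sales.csv, and this is only one possible starting point; a dedicated package such as arules would normally be used for full rule mining.

```r
# Hypothetical basket/item pairs; in the assignment these would come from
# the Basket and Code fields of Sales.csv (names and meanings are assumed).
sales <- data.frame(
  Basket = c(1, 1, 2, 2, 2, 3, 3, 4),
  Code   = c("A", "B", "A", "B", "C", "A", "C", "B")
)

# Basket-by-item incidence matrix: 1 if the item appears in the basket.
incidence <- 1 * (unclass(table(sales$Basket, sales$Code)) > 0)

# Item-by-item co-occurrence counts: entry [i, j] is the number of baskets
# containing both i and j; the diagonal holds each item's basket count.
co_occ <- crossprod(incidence)

n_baskets <- nrow(incidence)

# Support of the pair {i, j}: share of all baskets containing both items.
support <- co_occ / n_baskets

# Confidence of the rule i -> j: each row i is divided by the number of
# baskets containing i, so entry [i, j] estimates P(j in basket | i in basket).
confidence <- co_occ / diag(co_occ)

print(round(support, 2))
print(round(confidence, 2))
```

Reading the matrices row by row gives candidate rules; thresholds on support and confidence are a modelling choice that should be justified in the report.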
Datasets
Four (4) data sets have been provided: Item, Sales, Promotion and Supermarkets.
Item.csv
This dataset contains information about the items for sale, in the following fields.
- Code
- Description
- Type
- Brand
- Size
Sales.csv
This dataset contains two years of sales transactions, in the following fields.
- Code
- Amount
- Units
- Time of transactions
- Province
- CustomerID
- Supermarket No
- Basket
- Day
- Voucher
Promotion.csv
This dataset contains the sales promotions run on various items in different supermarkets, in the following fields.
- Code
- Supermarket No
- Week
- Feature
- Display
- Province
Supermarkets.csv
This dataset contains supermarket store location details, in the following fields.
- Supermarket No
- Post-code
Please note that NO other information is provided on the data definitions or the meaning of the fields. You may have to explore the data to identify the meaning of each field and its relationships with the other datasets.
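Since the field meanings have to be discovered, a first pass usually loads each file, inspects its structure, and tests a candidate join key. The sketch below assumes that Code links Sales to Item; the inline CSV text, column spellings (for example SupermarketNo) and values are hypothetical miniatures, not the real files.

```r
# Hypothetical miniature versions of Item.csv and Sales.csv; the real column
# names and values are not documented and must be checked after loading.
item_csv <- "Code,Description,Type,Brand,Size
101,Pasta A,Type 1,BrandX,500 G
102,Sauce B,Type 2,BrandY,1 L"

sales_csv <- "Code,Amount,Units,Province,CustomerID,SupermarketNo,Basket,Day,Voucher
101,1.50,1,1,9001,12,1,1,0
102,2.20,2,1,9001,12,1,1,0
101,1.50,1,2,9002,15,2,3,1
999,4.00,1,2,9003,15,3,3,0"

# With the real files, replace text = ... with the file path,
# e.g. read.csv("Item.csv", stringsAsFactors = FALSE).
item  <- read.csv(text = item_csv,  stringsAsFactors = FALSE)
sales <- read.csv(text = sales_csv, stringsAsFactors = FALSE)

# Inspect column types and a few values before assuming any meaning.
str(item)
str(sales)

# Left join: keep every sale and attach item attributes where Code matches.
sales_item <- merge(sales, item, by = "Code", all.x = TRUE)

# Sales rows whose Code has no match in Item appear with NA item fields;
# these are candidates for the "exceptional values" to investigate.
unmatched <- sales_item[is.na(sales_item$Description), ]
print(nrow(sales_item))
print(unmatched$Code)
```

Using `all.x = TRUE` keeps unmatched sales visible instead of silently dropping them, which makes it easier to judge whether the assumed key is correct.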
Assignment tasks and marking criteria
Task | Description | Marks |
Data description | Provide a detailed description of each dataset, its properties and its relationships | 5% |
Collecting data | Read data from the CSV files into the R environment for processing | 5% |
Data cleaning, exploring and preparing the data | Clean any outliers and exceptional values from the datasets; normalization and scaling; merge the datasets; create training and test datasets | 35% |
Apply Machine Learning and model building | Train a model on the data; apply different Machine Learning approaches and discuss them | 20% |
Evaluating model performance | Accuracy of each of the different models | 10% |
Improving model performance | Alternative ways of normalization and model building, and their performance | 10% |
Comparative analysis | Patterns identified and their visualizations; a detailed comparative analysis between the scaling and Machine Learning approaches (strengths, limitations, uniqueness). Comparative analysis should be in relation to | 10% |
Discussion | Provide a brief discussion about the knowledge gained | 5% |
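The cleaning, scaling and splitting steps named in the marking criteria can be sketched in base R as follows. The data frame is a hypothetical stand-in for the merged sales data, and the 1.5 × IQR outlier rule, the two scaling methods and the 70/30 split ratio are illustrative assumptions rather than required choices.

```r
set.seed(42)  # make the random split reproducible

# Hypothetical stand-in for the merged sales data (values are invented).
sales <- data.frame(
  Amount = c(1.5, 2.2, 3.1, 2.8, 1.9, 250.0, 2.4, 3.3, 2.0, 2.6),
  Units  = c(1, 2, 1, 3, 1, 1, 2, 2, 1, 3)
)

# 1. Drop outliers in Amount using the 1.5 * IQR rule (one common choice).
q     <- unname(quantile(sales$Amount, c(0.25, 0.75)))
iqr   <- q[2] - q[1]
keep  <- sales$Amount >= q[1] - 1.5 * iqr & sales$Amount <= q[2] + 1.5 * iqr
clean <- sales[keep, ]

# 2a. Min-max normalisation: rescale values to the range [0, 1].
min_max <- function(x) (x - min(x)) / (max(x) - min(x))
clean$Amount_mm <- min_max(clean$Amount)

# 2b. Z-score standardisation (mean 0, sd 1) as an alternative scaling.
clean$Amount_z <- as.numeric(scale(clean$Amount))

# 3. 70/30 train/test split by random row sampling.
n         <- nrow(clean)
train_idx <- sample(seq_len(n), size = floor(0.7 * n))
train     <- clean[train_idx, ]
test      <- clean[-train_idx, ]

print(nrow(clean))
print(c(train = nrow(train), test = nrow(test)))
```

Keeping both scaled columns side by side makes it straightforward to refit the same model on each and compare performance, which is what the "Improving model performance" task asks for.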
What to submit
Submit a detailed report covering each of the above tasks, with the relevant R statements and comments. Explain each R statement in detail before showing it. Include visualisations where necessary for storytelling. Attach a CD containing ALL your workings, datasets and merged datasets, if any.
Referencing Requirements
All referencing should utilize the Harvard Style.
REPORT STRUCTURE
Paper Size | A4 |
Word Count | 6000 words |
Printing Margins | LHS; RHS: 1 Inch |
Binding Margin | ½ Inch |
Header and Footer | 1 Inch |
Printing | Single Sided |
Basic Font Size | 12 |
Font Style | Arial/Times New Roman |
Presentation | Bound Document |