Data Mining Homework COSC2110/COSC2111 Data Mining
RMIT University
School of Computer Science and Information Technology
Assignment 2
This assignment counts for 25% of the total marks in this course.
Submit through Canvas.
You can work on this assignment individually or in a group of 2. If you are working in a group, please establish a group in Assignment 2 Group on Canvas.
In this assignment you are asked to explore the use of neural networks for classification and numeric prediction. You are also asked to carry out a data mining investigation on a real-world data file. You are required to write a report on your findings. You will be assessed on methodology, analysis of results and conclusions.
PART 1: CLASSIFICATION WITH NEURAL NETWORKS (15 marks)
This part involves the file heart-v1.arff in the directory:
/KDrive/SEH/SCSIT/Students/Courses/COSC2111/DataMining/data/arff/UCI/
For the neural network training runs, build a table with the following headings:
Run No | Architecture | Parameters | Train MSE | Train Error | Epochs | Test MSE | Test Error |
1      | 23-10-5      | lr=.2      | 0.5       | 30%         | 500    | 0.6      | 40%        |
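As a rough illustration of how the MSE and Error columns of the table could be filled in from network outputs, here is a minimal Python sketch. It assumes a winner-takes-all reading of the outputs (the largest output unit names the predicted class) and an MSE averaged over all patterns and output units. Both are assumptions: the tool you use may normalise the error differently, and your chosen "analyze" strategy may classify differently. The function names and the toy numbers are hypothetical.

```python
def mse(targets, outputs):
    """Mean squared error averaged over every pattern and output unit."""
    total = sum((t - o) ** 2
                for tv, ov in zip(targets, outputs)
                for t, o in zip(tv, ov))
    count = sum(len(tv) for tv in targets)
    return total / count


def error_rate(targets, outputs):
    """Fraction of patterns misclassified under winner-takes-all (assumed rule)."""
    wrong = sum(1 for tv, ov in zip(targets, outputs)
                if ov.index(max(ov)) != tv.index(max(tv)))
    return wrong / len(targets)


# Hypothetical toy data: 4 patterns, 2 output units (not from heart-v1.arff).
targets = [[1, 0], [0, 1], [1, 0], [0, 1]]
outputs = [[0.9, 0.1], [0.2, 0.8], [0.4, 0.6], [0.3, 0.7]]

run_mse = mse(targets, outputs)
run_error = error_rate(targets, outputs)
print(f"MSE = {run_mse:.3f}, Error = {run_error:.0%}")
```

Computing both figures with the same functions on the training and test sets keeps the two halves of each table row directly comparable.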
1. Describe the data encoding that is required for this task. How many outputs and how many inputs will there be?
2. Develop a script to generate the necessary training, validation and test files. You might want to normalize the numeric attributes with Weka beforehand. Include your data preparation script as an appendix (not part of the page count).
3. Determine the “analyze” strategy that you will use.
4. Using JavaNNS, carry out 5 train and test runs for a network with 10 hidden nodes. Comment on the variation in the training runs and the degree of overfitting.
5. Experiment with different numbers of hidden nodes. What seems to be the right number of hidden nodes for this problem?
6. For 10 hidden nodes, explore different values of the learning rate. What do you conclude?
7. [Optional] Change the learning function to backprop-momentum. Explore different combinations of learning rate and momentum. What do you conclude?
8. Perform a run with 10 hidden nodes and no validation data. Stop training when the MSE is no longer changing. Get the classification error on the training and test data. Comment on the degree of overfitting.
9. Compare the classification accuracy of the neural classifiers with the classification accuracy of Weka J48 and MultilayerPerceptron.
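The encoding and file-generation steps (items 1 and 2) can be sketched as follows. This is only one possible approach, assuming 1-of-N encoding for nominal attributes and the class, min-max scaling for numeric attributes, and a 60/20/20 shuffle split. The attribute names and records below are hypothetical and are not taken from heart-v1.arff. Writing the resulting patterns out in the format your neural network tool expects is left out.

```python
import random


def one_hot(value, categories):
    """Return a 1-of-N list with 1.0 at the position of `value`."""
    return [1.0 if value == c else 0.0 for c in categories]


def min_max(values):
    """Scale numbers to [0, 1]; a constant column maps to 0.0."""
    lo, hi = min(values), max(values)
    span = hi - lo
    return [(v - lo) / span if span else 0.0 for v in values]


def encode(rows, numeric_cols, nominal_cols, class_col):
    """Turn dict records into (inputs, targets) pairs of float lists."""
    scaled = {c: min_max([r[c] for r in rows]) for c in numeric_cols}
    cats = {c: sorted({r[c] for r in rows})
            for c in nominal_cols + [class_col]}
    patterns = []
    for i, r in enumerate(rows):
        inputs = [scaled[c][i] for c in numeric_cols]
        for c in nominal_cols:
            inputs += one_hot(r[c], cats[c])
        patterns.append((inputs, one_hot(r[class_col], cats[class_col])))
    return patterns


def split(patterns, train=0.6, valid=0.2, seed=0):
    """Shuffle reproducibly, then cut into train / validation / test."""
    shuffled = patterns[:]
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train)
    n_valid = int(len(shuffled) * valid)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_valid],
            shuffled[n_train + n_valid:])


# Hypothetical toy records: (age, sex, pain, cls). Not real heart-v1 data.
raw = [
    (29, "male", "typical", "yes"), (45, "female", "atypical", "no"),
    (61, "male", "none", "yes"), (52, "female", "typical", "no"),
    (38, "male", "atypical", "no"), (70, "female", "none", "yes"),
    (47, "male", "typical", "yes"), (55, "female", "atypical", "no"),
    (33, "male", "none", "no"), (66, "female", "typical", "yes"),
]
rows = [dict(zip(("age", "sex", "pain", "cls"), t)) for t in raw]

patterns = encode(rows, ["age"], ["sex", "pain"], "cls")
train_set, valid_set, test_set = split(patterns)
print(len(patterns[0][0]), "inputs,", len(patterns[0][1]), "outputs")
```

In this toy case one numeric attribute, one two-valued attribute and one three-valued attribute give 1 + 2 + 3 input units, which is the kind of count item 1 asks you to work out for the real data. Fixing the random seed keeps the three files identical across repeated runs of the script.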
Report length: up to two pages.