Data Mining Homework solutions
Answer:
Part 1: Classification with neural networks
1) This task requires a binary encoding scheme. Many attributes in the dataset could be used to classify the outputs, but we prefer one that separates the data cleanly, so we choose the attribute sex from the given dataset heart-v1.arff. This attribute takes two values, male and female, which binary encoding converts to 1 and 0: male is represented as 1 and female as 0. Rather than encoding by hand, we use the nominal-to-binary filter of the neural network function MultilayerPerceptron, which classifies the input into classes. Because we have chosen sex, the data is classified into two classes, male and female. The input data is 121, while the output is two classes.
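The mapping described above can be illustrated with a minimal sketch in plain Java. The class name SexEncoder and the sample column are hypothetical and exist only for illustration; in Weka the nominal-to-binary filter performs this conversion, so this is not how the classifier itself is implemented.

```java
import java.util.List;

/** Minimal sketch: binary-encode a two-valued nominal attribute (hypothetical helper, not part of Weka). */
class SexEncoder {
    /** Maps "male" to 1 and "female" to 0; any other value is rejected. */
    static int encode(String sex) {
        switch (sex.trim().toLowerCase()) {
            case "male":
                return 1;
            case "female":
                return 0;
            default:
                throw new IllegalArgumentException("Unexpected value: " + sex);
        }
    }

    public static void main(String[] args) {
        // Hypothetical sample column, not taken from heart-v1.arff.
        List<String> column = List.of("male", "female", "female", "male");
        for (String value : column) {
            System.out.println(value + " -> " + encode(value));
        }
    }
}
```

Rejecting unknown values, rather than silently mapping them to 0, keeps a misspelled or missing entry from being counted as female.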
2) The training and testing scripts for the neural network classifier are as follows:
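import java.io.FileReader;
import weka.classifiers.functions.MultilayerPerceptron;
import weka.core.Instances;

public void WekaTrain(String filepath) {
    try {
        // Load the ARFF file and treat the last attribute as the class.
        FileReader trainreader = new FileReader(filepath);
        Instances train = new Instances(trainreader);
        train.setClassIndex(train.numAttributes() - 1);
        // Configure and train the multilayer perceptron.
        MultilayerPerceptron mlp = new MultilayerPerceptron();
        mlp.setLearningRate(0.1);
        mlp.setMomentum(0.2);
        mlp.setTrainingTime(2000); // number of training epochs
        mlp.setHiddenLayers("3"); // one hidden layer with 3 nodes
        mlp.buildClassifier(train);
    } catch (Exception ex) {
        ex.printStackTrace();
    }
}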
For training:
Evaluation eval = new Evaluation(train);
eval.evaluateModel(mlp, train);
System.out.println(eval.errorRate());
System.out.println(eval.toSummaryString());
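For validation:
// Use a fresh Evaluation so the cross-validation statistics are not mixed
// with the training-set statistics; kfolds is the number of folds.
Evaluation cvEval = new Evaluation(train);
cvEval.crossValidateModel(mlp, train, kfolds, new Random(1));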
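To make the idea behind k-fold cross-validation concrete, here is a rough sketch in plain Java of how a dataset can be partitioned into folds. It is not Weka's implementation (Weka does its own randomization and may also stratify the folds by class); the class name KFoldSketch, the instance count 20 and the choice of 5 folds are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Rough sketch of k-fold partitioning (hypothetical helper, not Weka's algorithm). */
class KFoldSketch {
    /** Returns k disjoint folds of the shuffled indices 0..n-1; fold sizes differ by at most one. */
    static List<List<Integer>> folds(int n, int k, long seed) {
        if (k < 2 || k > n) {
            throw new IllegalArgumentException("need 2 <= k <= n");
        }
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            indices.add(i);
        }
        Collections.shuffle(indices, new Random(seed));
        List<List<Integer>> result = new ArrayList<>();
        for (int f = 0; f < k; f++) {
            result.add(new ArrayList<>());
        }
        // Deal the shuffled indices round-robin into the k folds.
        for (int i = 0; i < n; i++) {
            result.get(i % k).add(indices.get(i));
        }
        return result;
    }

    public static void main(String[] args) {
        // Each fold serves once as the test set; the other k-1 folds form the training set.
        List<List<Integer>> folds = folds(20, 5, 1L);
        for (int f = 0; f < folds.size(); f++) {
            System.out.println("fold " + f + " test indices: " + folds.get(f));
        }
    }
}
```

Every instance is used for testing exactly once, which is why cross-validation gives a less optimistic error estimate than evaluating on the training data.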
For evaluation:
// Load the data to be labelled; the last attribute is the class.
Instances datapredict = new Instances(
    new BufferedReader(
        new FileReader(<Predictdatapath>)));
datapredict.setClassIndex(datapredict.numAttributes() - 1);
// Work on a copy so the original instances stay unchanged.
Instances predicteddata = new Instances(datapredict);
for (int i = 0; i < datapredict.numInstances(); i++) {
    // classifyInstance returns the index of the predicted class value.
    double clsLabel = mlp.classifyInstance(datapredict.instance(i));
    predicteddata.instance(i).setClassValue(clsLabel);
}
// Write the labelled instances out in ARFF format.
BufferedWriter writer = new BufferedWriter(
    new FileWriter(<Output File Path>));
writer.write(predicteddata.toString());
writer.newLine();
writer.flush();
writer.close();
3) Our strategy for analysing the dataset is as follows. We use a neural network model: we load the data into the Weka interface and, under the functions group, select MultilayerPerceptron as the neural network classifier, then start the run. Classification assigns each instance of the dataset to a class. Here the class is sex, so the model predicts whether each record in the heart-v1.arff file belongs to a male or a female. We used 66 percent of the data for training and the remaining 34 percent for testing. With this strategy, 87.37% of instances were classified correctly and 12.62% were classified incorrectly.
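The arithmetic behind a 66/34 percentage split and the reported percentages can be sketched as follows. This is only an illustration: the instance count 303 and the counts 90 and 13 are assumptions, chosen because they reproduce the reported 87.37% and 12.62% to within rounding; the source does not state them. The rounding rule for the split size and the class name PercentageSplitSketch are likewise assumptions.

```java
/** Sketch of percentage-split arithmetic (hypothetical helper; all numbers in main are assumptions). */
class PercentageSplitSketch {
    /** Number of training instances for a given split percentage, rounded to the nearest integer. */
    static int trainSize(int numInstances, double trainPercent) {
        return (int) Math.round(numInstances * trainPercent / 100.0);
    }

    /** Share of count in total, expressed as a percentage. */
    static double percent(int count, int total) {
        return 100.0 * count / total;
    }

    public static void main(String[] args) {
        int total = 303;                  // assumed instance count, not stated in the source
        int train = trainSize(total, 66); // 66 percent for training
        int test = total - train;         // remaining instances for testing
        int correct = 90;                 // assumed count of correctly classified test instances
        int incorrect = test - correct;   // the rest are misclassified
        System.out.println("train=" + train + ", test=" + test);
        System.out.println(String.format("correct=%.2f%%, incorrect=%.2f%%",
                percent(correct, test), percent(incorrect, test)));
    }
}
```

The two percentages always sum to 100 because every test instance is either classified correctly or incorrectly.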
4) The observations for the network with 10 hidden nodes are listed below. The lowest overfitting value, 0.051, occurred in run 4, which was trained for 300 epochs. We consider this run the best because it overfits the least.
Run No. | Architecture | Parameters | Train MSE | Train Error (%) | Epochs | Test MSE | Test Error (%) | Overfitting
1 | 23-10-5 | lr=0.3 | 0.030 | 4.08 | 500 | 0.075 | 13.55 | 0.056
2 | 23-10-5 | lr=0.3 | 0.020 | 2.04 | 500 | 0.055 | 10.70 | 0.065
3 | 23-10-5 | lr=0.3 | 0.004 | 3.08 | 400 | 0.062 | 11.93 | 0.071
4 | 23-10-5 | lr=0.3 | 0.006 | 2.03 | 300 | 0.066 | 14.32 | 0.051
5 | 23-10-5 | lr=0.3 | 0.003 | 1.05 | 500 | 0.072 | 12.87 | 0.081
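The selection of the best run by lowest overfitting value can be reproduced with a short sketch in plain Java. The values are copied from the table above; the class and method names (RunSelection, Run, leastOverfitting) are hypothetical, and the overfitting figures are taken as reported because the source does not say how they were computed.

```java
import java.util.Arrays;
import java.util.List;

/** Sketch: pick the run with the smallest reported overfitting value (hypothetical helper). */
class RunSelection {
    /** One row of the results table; only the columns used here are kept. */
    static final class Run {
        final int number;
        final int epochs;
        final double testErrorPct;
        final double overfitting;

        Run(int number, int epochs, double testErrorPct, double overfitting) {
            this.number = number;
            this.epochs = epochs;
            this.testErrorPct = testErrorPct;
            this.overfitting = overfitting;
        }
    }

    // Values copied from the results table.
    static final List<Run> RUNS = Arrays.asList(
            new Run(1, 500, 13.55, 0.056),
            new Run(2, 500, 10.70, 0.065),
            new Run(3, 400, 11.93, 0.071),
            new Run(4, 300, 14.32, 0.051),
            new Run(5, 500, 12.87, 0.081));

    /** Returns the run whose reported overfitting value is lowest. */
    static Run leastOverfitting(List<Run> runs) {
        Run best = runs.get(0);
        for (Run run : runs) {
            if (run.overfitting < best.overfitting) {
                best = run;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Run best = leastOverfitting(RUNS);
        System.out.println("Run " + best.number + ": overfitting=" + best.overfitting
                + ", epochs=" + best.epochs + ", test error=" + best.testErrorPct + "%");
    }
}
```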