Task 1: Data Processing

Write a program that shows the following main menu:

    Welcome to Data Pre-Processing Program
    1. Generate data file and name file
    2. Normalized data file
    3. Exit program

When the user selects 1, the program takes a text file (rawdata.txt) as input and produces two output files, data.txt and name.txt. The format of rawdata.txt is shown below:

    A1 A2 A3 A4 A5 Class
    n n c c c c
    100 10 A31 10 A51 C1
    50 15 A32 10 A51 C2
    60 30 ? 10 A51 C2
    200 8 A31 10 A51 C1
    10 20 A32 10 A51 C1
    40 5 A33 10 A51 C2

The first line of rawdata.txt gives the names of the columns (i.e. attributes) and the second line gives the types of the attributes. Here n indicates that the attribute is numerical, c indicates that the attribute is categorical, and ? indicates a missing value. The next six rows are the records of rawdata.txt.

For the above input file, the content of data.txt is shown below:

    100 10 A31 C1
    50 15 A32 C2
    200 8 A31 C1
    10 20 A32 C1
    40 5 A33 C2

Note that any record that has a missing value is removed. In this example, the value of A3 is missing (?) in the 3rd record, so that record is removed and the remaining records are saved in data.txt. Moreover, if an attribute has only one domain value, that attribute must also be removed from data.txt. The columns for A4 and A5 are removed from data.txt because each has the same domain value (A4 = 10 and A5 = A51) for all records.

For the above input file, the content of name.txt is shown below:

    n,A1,10.0,200.0
    n,A2,5.0,20.0
    c,A3,3,A31, A32, A33
    c,Class,2,C1,C2

The first value of the first line of name.txt is n (i.e. numerical attribute), the next value is the name of the attribute, and the third and fourth values are the minimum and maximum values of the attribute.
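The two removal rules for data.txt described above (drop records containing ?, then drop attributes with a single domain value) could be sketched in Java roughly as follows. This is only an illustration under stated assumptions: the class and method names (RawDataCleaner, dropIncomplete, informativeColumns, clean) are hypothetical, values are assumed to be separated by whitespace, and the single-value check is assumed to run after the incomplete records have been dropped.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper; names and whitespace-separated input are assumptions.
class RawDataCleaner {

    /** Keeps only the records that contain no missing-value marker "?". */
    static List<String[]> dropIncomplete(List<String[]> records) {
        List<String[]> kept = new ArrayList<>();
        for (String[] record : records) {
            boolean complete = true;
            for (String value : record) {
                if (value.equals("?")) {
                    complete = false;
                    break;
                }
            }
            if (complete) {
                kept.add(record);
            }
        }
        return kept;
    }

    /** Returns the indices of columns holding more than one distinct value. */
    static List<Integer> informativeColumns(List<String[]> records, int columnCount) {
        List<Integer> keep = new ArrayList<>();
        for (int col = 0; col < columnCount; col++) {
            Set<String> domain = new HashSet<>();
            for (String[] record : records) {
                domain.add(record[col]); // values are compared as strings
            }
            if (domain.size() > 1) {
                keep.add(col);
            }
        }
        return keep;
    }

    /** Builds the lines of data.txt from the record lines of rawdata.txt. */
    static List<String> clean(List<String> recordLines, int columnCount) {
        List<String[]> records = new ArrayList<>();
        for (String line : recordLines) {
            records.add(line.trim().split("\\s+"));
        }
        // Assumption: constant columns are detected after incomplete records are removed.
        List<String[]> complete = dropIncomplete(records);
        List<Integer> keep = informativeColumns(complete, columnCount);
        List<String> out = new ArrayList<>();
        for (String[] record : complete) {
            StringBuilder sb = new StringBuilder();
            for (int col : keep) {
                if (sb.length() > 0) {
                    sb.append(' ');
                }
                sb.append(record[col]);
            }
            out.add(sb.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> records = Arrays.asList(
                "100 10 A31 10 A51 C1",
                "50 15 A32 10 A51 C2",
                "60 30 ? 10 A51 C2",
                "200 8 A31 10 A51 C1",
                "10 20 A32 10 A51 C1",
                "40 5 A33 10 A51 C2");
        for (String line : clean(records, 6)) {
            System.out.println(line);
        }
    }
}
```

Run on the six sample records, the sketch drops the third record and the A4 and A5 columns. Because values are compared as strings, 10 and 10.0 would count as two different domain values; a real solution may want to parse numerical attributes first.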
The second line of name.txt has the same format as the first line, as both describe numerical attributes. The third line, however, starts with c (i.e. categorical attribute); its second value is the name of the attribute, its third value is the number of domain values of the categorical attribute, and the remaining values are the domain values (i.e. the unique values of the categorical attribute), separated by commas. The same format is followed for the last line (i.e. the class attribute). If an attribute has only one domain value, that attribute must be removed from name.txt. The attributes A4 and A5 are removed from name.txt because each has the same domain value (A4 = 10 and A5 = A51) for all records.

When the user selects 2, the program takes data.txt and name.txt as input, performs normalization on data.txt, and produces normalizeddata.txt. For the above data.txt input file, the content of normalizeddata.txt is shown below.

For example, the maximum value of attribute A1 is 200 and the minimum value of A1 is 10. The first value of A1 is 100, so its normalized value is

    (100 - 10) / (200 - 10) = 90 / 190 ≈ 0.474

If the maximum value of a numerical attribute is less than or equal to 1, do not normalize that attribute. Moreover, if the maximum and minimum values of a numerical attribute are the same in name.txt, do not normalize that attribute; remove it from normalizeddata.txt instead. Also create a newname.txt file after removing any attribute that has the same domain value for all records.

Note that your program will be tested with different input files in the same format described above, but the number of columns (i.e. attributes) and records (i.e. rows) will differ from the sample input shown above.

Marks distribution:
Functionality: Generate data file and name file
Functionality: Normalization
Presentation (2 marks): A report with enough screenshots is submitted. The discussion of each screenshot is easy to read and understand.

Task 2: Fill up the Circles

Write a GUI program that draws two circles with a radius of 10 pixels, centered at random locations, with a line connecting the two circles. If the distance between the two circles is greater than or equal to 100, then the line and the displayed distance value are drawn in red, as shown below. However, if the distance between the two circles is less than 100, then the line and the displayed distance value are drawn in green, as shown below.

Marks distribution:
Functionality
Presentation: A report with enough screenshots is submitted. The discussion of each screenshot is easy to read and understand.

Task 3: Generate ARFF File for Weka

Write a Java program that takes rawarffdata.txt as an input file and produces SampleData.ARFF as an output file.

    SampleData
    A1 A2 A3 A4 A5 Class
    n n c c c c
    100 10 A31 10 A51 C1
    50 15 A32 10 A51 C2
    60 30 ? 10 A51 C2
    200 8 A31 10 A51 C1
    10 20 A32 10 A51 C1
    40 5 A33 10 A51 C2

The first line of rawarffdata.txt gives the name of the dataset, the second line gives the names of the columns (i.e. attributes) and the third line gives the types of the attributes. Here n indicates that the attribute is numerical, c indicates that the attribute is categorical, and ? indicates a missing value. The next six rows are the records of rawarffdata.txt.
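As a starting point for Task 3, the three header lines and the records of rawarffdata.txt described above could be read into a small holder class along these lines. This is a sketch only: the class name RawArffInput and its field names are hypothetical, and whitespace-separated values are assumed.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical holder for the parsed input; not part of the assignment text.
class RawArffInput {
    final String datasetName;        // line 1: name of the dataset
    final String[] attributeNames;   // line 2: column names
    final String[] attributeTypes;   // line 3: "n" (numerical) or "c" (categorical)
    final List<String[]> records = new ArrayList<>();

    RawArffInput(List<String> lines) {
        datasetName = lines.get(0).trim();
        attributeNames = lines.get(1).trim().split("\\s+");
        attributeTypes = lines.get(2).trim().split("\\s+");
        if (attributeNames.length != attributeTypes.length) {
            throw new IllegalArgumentException("names and types differ in length");
        }
        for (int i = 3; i < lines.size(); i++) {
            String line = lines.get(i).trim();
            if (line.isEmpty()) {
                continue; // tolerate blank lines at the end of the file
            }
            String[] values = line.split("\\s+");
            if (values.length != attributeNames.length) {
                throw new IllegalArgumentException("wrong value count on line " + (i + 1));
            }
            records.add(values); // "?" is kept as-is; Task 3 does not drop records
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the input file sits in the working directory unless a path is given.
        String path = args.length > 0 ? args[0] : "rawarffdata.txt";
        RawArffInput input = new RawArffInput(Files.readAllLines(Paths.get(path)));
        System.out.println(input.datasetName + ": " + input.attributeNames.length
                + " attributes, " + input.records.size() + " records");
    }
}
```

Validating the value count per record up front makes it easier to report a malformed test file than failing later while writing the output.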
The format of the output SampleData.ARFF file is shown below:

    A3 {'A32', 'A33', 'A31', '?'}
    data
    100,10,A31,10,A51,C1,
    50,15,A32,10,A51,C2,
    60,30,?,10,A51,C2,
    200,8,A31,10,A51,C1,
    10,20,A32,10,A51,C1,
    40,5,A33,10,A51,C2,

Note that your program will be tested with different input files in the same format described above, but the number of columns (i.e. attributes) and records (i.e. rows) will differ from the sample input shown above.

Marks distribution:
Functionality (4 marks)
Presentation (1 mark): A report with enough screenshots is submitted. The discussion of each screenshot is easy to read and understand.

Requirements:

For each task, the following items are to be submitted in the Turnitin report:

The report (in .doc or .pdf format) should explain how to run your program and any settings needed to run it. The document should include enough screenshots. If the assignment marker is unable to run your program for any reason, these screenshots will show how the program worked on your machine/environment.

Source code and other files: All source code, the exe file and other relevant files must be zipped in a folder such that unzipping preserves the file/folder structure.
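For the Task 3 conversion, one possible way to build the ARFF text in Java is sketched below. Several things here are assumptions rather than part of the brief: the class and method names (ArffSketch, toArff) are hypothetical, and the @relation, @attribute and @data header lines follow the standard Weka ARFF layout, since the sample output above preserves only the A3 line and the data rows. Following that surviving A3 line, the sketch lists ? among the nominal values. It lists values in first-seen order, and it does not write the trailing comma that the sample rows show.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical converter; header keywords follow standard ARFF, which is an assumption here.
class ArffSketch {

    /** Builds ARFF text from a dataset name, attribute names, types ("n"/"c") and records. */
    static String toArff(String relation, String[] names, String[] types,
                         List<String[]> records) {
        StringBuilder sb = new StringBuilder();
        sb.append("@relation ").append(relation).append("\n\n");
        for (int col = 0; col < names.length; col++) {
            sb.append("@attribute ").append(names[col]).append(' ');
            if (types[col].equals("n")) {
                sb.append("numeric");
            } else {
                // Collect the distinct values of a categorical column in first-seen order.
                Set<String> domain = new LinkedHashSet<>();
                for (String[] record : records) {
                    domain.add(record[col]);
                }
                StringBuilder set = new StringBuilder();
                for (String value : domain) {
                    if (set.length() > 0) {
                        set.append(", ");
                    }
                    set.append('\'').append(value).append('\'');
                }
                sb.append('{').append(set).append('}');
            }
            sb.append('\n');
        }
        sb.append("\n@data\n");
        for (String[] record : records) {
            sb.append(String.join(",", record)).append('\n'); // no trailing comma (assumption)
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String[]> records = new ArrayList<>();
        records.add("100 10 A31 10 A51 C1".split(" "));
        records.add("60 30 ? 10 A51 C2".split(" "));
        String[] names = {"A1", "A2", "A3", "A4", "A5", "Class"};
        String[] types = {"n", "n", "c", "c", "c", "c"};
        System.out.print(toArff("SampleData", names, types, records));
    }
}
```

If the marker's expected file must match the sample character for character (value order inside the braces, trailing commas), those details would need to be adjusted; loading the result in Weka is a quick way to check that the file is at least well-formed.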