CIS8008 Business Intelligence
Conduct an exploratory data analysis (EDA) of the diabetes.csv data set using RapidMiner, summarise the key findings of the EDA in a table and discuss them in relation to the diabetes.csv data set.
Build a Decision Tree model for predicting diabetes using the diabetes.csv data set and RapidMiner; provide the final Decision Tree model process, Decision Tree diagram and Decision Tree rules, explain the final model process and discuss the final diagram and rules.
Using a table, discuss and compare the performance metrics of the final Decision Tree model and the final Logistic Regression model for predicting diabetes with the diabetes.csv data set and RapidMiner, based on the required model performance metrics (Accuracy, Misclassification Rate, True Positive Rate, False Positive Rate, Area under the ROC Curve (AUC), Precision, Recall and Lift).
Create a Tableau view of the impact of wildlife strikes on aircraft over time for a specific origin state. Provide a screen capture of the view, describe it, comment on the different types of impact on aircraft from wildlife strikes over time, and state whether this differs much between origin states.
Create a Tableau view of flight phase by time of day that shows when wildlife strikes with aircraft occur. Provide a screen capture of the view, describe it, and comment on the flight phase and time of day in which wildlife strikes with aircraft are most likely to occur.
Create a Tableau GeoMap view of flights by origin state that displays the number of wildlife strikes and the total monetary cost for each origin state over different periods of time. Provide a screen capture of the view, describe it, and comment on it in relation to the number of wildlife strikes and total cost by origin state over time. Some origin states cannot be plotted on the geomap because they are outside the USA; comment on how this issue can be handled.
Answer:
Introduction:
For this assignment, the selected Australian organization is the Commonwealth Bank of Australia (CBA). It is listed on the ASX (Australian Securities Exchange) and operates in the banking sector (Commbank.com.au. 2018). It is one of the most prominent multinational banks, with businesses in Australia, New Zealand, Asia, the United Kingdom and the United States. The bank provides a range of financial services, mainly business banking, retail banking, superannuation, investments, funds management and broking services. Commonwealth Bank of Australia is one of Australia's "Big Four" banks. It was founded by the Australian Government in 1911 and became fully privatised in 1996. Its headquarters are in Darling Harbour, Sydney, Australia. It has more than 1,100 branches worldwide, annual net revenue of about 26.005 billion Australian dollars and net income of approximately 9.881 billion Australian dollars (Commbank.com.au. 2018). The bank employs about 52,000 people. Its central banking role expanded substantially after 1920. The service it is most valued for is easy access to loans. Owing to its effective organizational strategies, the Commonwealth Bank has become highly important to the people of Australia and New Zealand. Its major objective is to provide excellent service to its customers and to maintain strong customer relationships (Commbank.com.au. 2018).
CBA was selected mainly because it is actively engaged with information services, making extensive use of online platforms and other technologies.
This section of the report analyses the security and privacy policy statements available on Commonwealth Bank of Australia's website (Tricker and Tricker 2015). The organization's privacy governance and data security are assessed against the nine core principles of the Data Governance Australia (DGA) Code of Practice. The Code is a principles-based, self-regulatory regime that sets industry-leading standards intended to ensure that organizations handle data ethically and legally (Datagovernanceaus.com.au. 2018). The nine principles, as they apply to the data privacy and security of Commonwealth Bank of Australia, are as follows:
- i) No Harm Rule: Under this principle, the reporting organization must use its best endeavours to ensure that the collection, use or disclosure of personal information does not cause harm to the individual concerned (Capaldi, Zu and Gupta 2013). The organization must also act with integrity so that its use of data cannot be regarded as unethical: personal information must not be disclosed to third parties, and collected data must not be exploited. Commonwealth Bank of Australia has taken several significant steps to meet its privacy and confidentiality obligations, so its confidential information is secured and protected. It has also adopted various security measures and controls access to maintain the authenticity of its information (Kraakman and Hansmann 2017), which keeps its data secure.
- ii) Honesty and Transparency: The second principle requires the organization to act honestly and transparently when collecting, using and disclosing data (Davies 2016). The organization should ensure that all data it collects is gathered in line with this rule and is transparent enough to be reviewed by its employees and clients. Commonwealth Bank of Australia collects information in line with current practices and adheres strictly to the principle of honesty and transparency, and is therefore regarded as a highly rule-abiding organization (Cheffins 2013). However, client information can easily be accessed for periodic, unanticipated uses that are not disclosed in any privacy notice. Several security features help both employees and customers protect their data. The most significant is a unique username and password for each customer. If a customer or staff member loses their username or password, it can be recovered easily and promptly, which also helps mitigate password security issues (Aras 2016). As an organization bound by the Privacy Act 1988, Commonwealth Bank of Australia must protect its customers' personal information while aiming to provide the best customer experience and keep customers satisfied.
- iii) Fairness: The third principle of the DGA Code of Practice is fairness. Whether it is fair to use and disclose collected personal information is judged by considering factors such as community expectations about how personal information is used and the likelihood of harm (Obradovich and Gill 2013). These factors also determine when personal information may appropriately be collected. Personal information should be collected only for a specific business purpose and cannot be collected without one. Commonwealth Bank of Australia's approach to accessing personal information differs somewhat from what privacy law requires. The bank is at times unfair to its customers and uses their personal information in other unfair ways (Todorovic 2013). Commonwealth Bank of Australia therefore needs to be completely fair to its customers in the way it manages their information.
- iv) Choice: The next principle of the DGA Code of Practice is choice. Under this principle, individuals should have a genuine choice about whether their personal information is collected and used. Commonwealth Bank of Australia does not properly follow this principle for personal information, so it should review the choices it offers over the use of externally sourced information (Bottomley 2016). This would help preserve the data properly and ensure that the individual's consent is obtained.
- v) Accuracy and Access: Under this principle, data must be kept accurate, and the organization must ensure that the data it shares is not inaccurate. Appropriate access to confidential information is the next requirement of this principle (Beekes, Brown and Zhang 2015): the reporting entity must take steps to detect and stop inappropriate access. Commonwealth Bank of Australia is reported to provide proper, accurate and genuine data to its customers, and its data accuracy is widely accepted by its users. It has also restricted access to personal data effectively, so the chance of data loss or data theft is very low (Commbank.com.au. 2018). Its choices about using and collecting personal information are easy for everyone to understand.
- vi) Safety, Security and De-identification: Under this principle, CBA must design its security in line with recognised industry standards, which would help prevent data breaches and limit the resulting harm. It should appoint a person responsible for overseeing all aspects of data security, which would greatly reduce the chance of a breach (Young and Thyil 2014) and help ensure that data sets are collected and stored properly. The bank must also ensure that its de-identification procedure remains robust through regular testing and upgrading of tools and techniques.
- vii) Accountability: Accountability is one of the most significant principles of the DGA Code of Practice. It requires the organization to define categories of personal information and to explain why that information should not be disclosed to third parties, and it checks how personal information is collected, used and disclosed (Commbank.com.au. 2018). Transactions within the reporting entity should be accurate and genuine. Because Commonwealth Bank of Australia is a bank, this principle is the most important of all: its transactions must be handled properly, with no opportunity for fraud. Customer information should be maintained in accordance with the law, and compliance with a code of practice must be ensured throughout the process.
- viii) Stewardship: A review of Commonwealth Bank of Australia's data privacy and security policy shows that it says nothing about the stewardship principle. It is therefore important for the bank to incorporate this principle into its business (Datagovernanceaus.com.au. 2018). Doing so would require a designated officer responsible for setting rules and ensuring compliance with the Code. Commonwealth Bank of Australia's employees would also need proper training to handle data practices correctly and without complications.
- ix) Enforcement: The final principle of the DGA Code of Practice is enforcement. It concerns compliance, with the organization enforcing the Code periodically (Aras 2016). Commonwealth Bank of Australia should enforce the Code properly in order to qualify as a member of Data Governance Australia.
Conducting an exploratory data analysis (EDA) of the diabetes data in RapidMiner first requires an understanding of how data is loaded into the software and how a process is built. The EDA began by importing the data into RapidMiner and saving it to the local repository. Once the data is stored, the data set is dragged into the process view and connected to the output port. When the Run button is pressed, RapidMiner returns the data set and its basic statistics in summarised form, as shown below:
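The summary statistics RapidMiner reports at this step (minimum, maximum, average and standard deviation per attribute) can be reproduced outside the tool as a sanity check. The sketch below is a minimal stand-in written with the Python standard library, not RapidMiner's own code; the column names in the usage note ("Glucose", "BMI", "Outcome") are hypothetical and may not match the real diabetes.csv headers.

```python
import csv
import io
import statistics


def summarise(csv_text):
    """Return min, max, mean and standard deviation for each numeric column.

    Columns whose values cannot all be parsed as numbers (for example a
    true/false class label) are skipped.
    """
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    summary = {}
    for column in rows[0]:
        try:
            values = [float(row[column]) for row in rows]
        except ValueError:
            continue  # non-numeric column, such as the label
        summary[column] = {
            "min": min(values),
            "max": max(values),
            "mean": statistics.mean(values),
            "std": statistics.stdev(values) if len(values) > 1 else 0.0,
        }
    return summary
```

For example, `summarise("Glucose,BMI,Outcome\n148,33.6,true\n85,26.6,false\n183,23.3,true\n")` returns statistics for Glucose and BMI and skips the true/false Outcome column.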
EDA is carried out as part of cleaning the raw data set provided, which in this case is the diabetes data set. It comprises one dependent variable and eight independent variables. Not every variable is vital for predicting the dependent variable (Lin et al. 2014); this analysis shows that a set of five or six variables is enough to predict it.
The analyst performed a three-step analysis to identify the top five variables that are sufficient for predicting whether a person has diabetes. Previous research reports that blood pressure, BMI, age, insulin level and glucose level are the major features for determining whether a patient is likely to develop diabetes. All variables in the data set were initially selected. The first step was plotting scatter diagrams to identify the relevant variables. The scatter plots that follow shed little light on the relationships; after reviewing every plot, the factors listed above were assumed to be the major predictors of diabetes.
Because the scatter diagrams did not give a clear indication, the analyst then performed a correlation study. It showed that every independent variable is positively associated with the outcome variable. Of the eight independent variables, however, blood pressure and skin thickness have a negligible effect on the dependent variable (Wang et al. 2013). These two variables were therefore omitted at this step, and the remaining six variables were used for further study.
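The correlation step can be illustrated with the Pearson coefficient between each independent variable and the outcome coded as 1 (diabetes) or 0 (no diabetes). This is a minimal sketch under that coding assumption; it is not the exact computation RapidMiner's Correlation Matrix operator performs, and the feature names and values in the usage note are invented.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def rank_by_correlation(features, label):
    """Return (name, r) pairs sorted by |r|, strongest association first."""
    scores = {name: pearson(values, label) for name, values in features.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)
```

Variables near the bottom of the ranking (|r| close to 0) are the candidates for removal, which is how blood pressure and skin thickness were dropped here. For example, `rank_by_correlation({"a": [1, 2, 3, 4], "b": [1, 3, 2, 4]}, [0, 0, 1, 1])` ranks `a` (r ≈ 0.89) ahead of `b` (r ≈ 0.45).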
The correlation analysis therefore left six independent variables for further analysis. To confirm this selection, the analyst performed a chi-square test as the final step of the EDA. The figure below shows the weight of every variable by chi-square statistic. The analysis confirms the exclusion of skin thickness and blood pressure and also excludes pregnancy, leaving five variables that are useful for predicting diabetes.
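A chi-square weight for a numeric attribute can be computed by binning the attribute, cross-tabulating the bins against the class label, and summing (observed − expected)² / expected over the table. The sketch below follows that idea. RapidMiner's Weight by Chi Squared Statistic operator also discretises numeric attributes before computing the statistic, but the equal-width binning and default bin count used here are assumptions, so the weights will not match RapidMiner's exactly.

```python
def chi_square_weight(values, labels, bins=5):
    """Chi-squared statistic between a binned numeric attribute and a label.

    Larger values mean the attribute's distribution differs more between
    classes, i.e. the attribute is more useful for prediction.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # avoid zero width for a constant column
    classes = sorted(set(labels))
    table = [[0] * len(classes) for _ in range(bins)]
    for v, y in zip(values, labels):
        b = min(int((v - lo) / width), bins - 1)  # the maximum goes in the last bin
        table[b][classes.index(y)] += 1
    n = len(values)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[r][c] for r in range(bins)) for c in range(len(classes))]
    chi2 = 0.0
    for r in range(bins):
        for c in range(len(classes)):
            expected = row_totals[r] * col_totals[c] / n
            if expected > 0:  # empty bins contribute nothing
                chi2 += (table[r][c] - expected) ** 2 / expected
    return chi2
```

An attribute that separates the classes perfectly scores high (for example `chi_square_weight([1, 2, 3, 10, 11, 12], [0, 0, 0, 1, 1, 1], bins=2)` gives 6.0), while an attribute spread evenly across both classes scores 0.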
Once the key variables had been identified, the next step was to understand how they predict the chance of diabetes. The analyst built a decision tree, which required modifying the existing data set with operators such as "Set Role" and "Numerical to Nominal". After applying the Decision Tree operator, the analyst can build the decision tree.
The tree shows that glucose level is the first split used to decide whether a person has diabetes (Ajala, English and Pinkney 2013). When the glucose level is greater than or equal to 166.5, the person is likely to have diabetes. If it is below 166.5, the remaining variables must be examined, and if it is below 154.5, the tree predicts that the person does not have diabetes. The decision tree is therefore useful for this prediction task.
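Cut points such as 166.5 and 154.5 typically arise because a tree learner tests midpoints between consecutive distinct values of a numeric attribute and keeps the one that best separates the classes. The sketch below scores candidate thresholds by information gain. RapidMiner's Decision Tree operator offers several split criteria, so treat this as an illustration of the general technique rather than RapidMiner's exact algorithm; the glucose values in the usage note are invented.

```python
import math


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def best_threshold(values, labels):
    """Return (threshold, gain) for the split 'value >= threshold'.

    Candidate thresholds are midpoints between consecutive distinct values.
    Returns (None, 0.0) if no split improves on the unsplit data.
    """
    pairs = sorted(zip(values, labels))
    base = entropy(labels)
    n = len(pairs)
    best_t, best_gain = None, 0.0
    for i in range(1, n):
        lower, upper = pairs[i - 1][0], pairs[i][0]
        if lower == upper:
            continue  # no boundary between identical values
        t = (lower + upper) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        remainder = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
        gain = base - remainder
        if gain > best_gain:
            best_t, best_gain = t, gain
    return best_t, best_gain
```

With invented glucose readings `[150, 160, 166, 167, 170, 180]` labelled `[False, False, False, True, True, True]`, the best split is `glucose >= 166.5` with an information gain of 1.0 bit, because it separates the two classes perfectly.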
The analyst also performed logistic regression, using the Weka extension within RapidMiner; the operators included in the process are shown below. The resulting output shows the coefficients and odds ratios. An odds ratio compares the odds of the outcome occurring in the presence of a particular exposure with the odds of it occurring in the absence of that exposure (Lin et al. 2014). A value greater than one indicates a direct (positive) association, and a value less than one indicates an inverse association. The output therefore suggests that only insulin level is directly linked with the chance of diabetes: an increase in insulin level is associated with a higher chance of the disease.
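In a logistic regression model, the odds ratio for a one-unit increase in a predictor is the exponential of its coefficient, OR = exp(β). An OR above one (β > 0) is the direct association described above, and an OR below one (β < 0) is an inverse association. The sketch shows this conversion. The coefficients in the usage note are invented for illustration and are not the values Weka reported for this data set.

```python
import math


def odds_ratio(coefficient):
    """Odds ratio for a one-unit increase in a predictor: exp(beta)."""
    return math.exp(coefficient)


def association(coefficient):
    """Classify the association the way the text does, from the odds ratio."""
    ratio = odds_ratio(coefficient)
    if ratio > 1:
        return "direct"
    if ratio < 1:
        return "inverse"
    return "none"
```

For example, a hypothetical coefficient of 0.5 gives an odds ratio of about 1.65 ("direct"), and a coefficient of -0.2 gives about 0.82 ("inverse"). A coefficient of ln 2 ≈ 0.693 means each one-unit increase doubles the odds of diabetes.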
The two procedures above provide useful insight into predicting diabetes from the five selected independent variables. In this section, the analyst compares the performance of the two models. The logistic regression and decision tree models were revised by adding the Apply Model, Performance (Binominal Classification) and Cross Validation operators to the data mining process (Ajala, English and Pinkney 2013). Several performance metrics were considered: area under the ROC curve (AUC), accuracy, false positive rate, true positive rate, recall, lift, misclassification rate, F measure, precision and sensitivity.
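The Cross Validation operator splits the data into k folds, trains on k − 1 of them and tests on the remaining one, repeating this until every fold has served as the test set. The per-fold averages in the comparison table (for example 2.300 false positives per fold against a pooled 23.000) are consistent with 10 folds. The sketch below shows a stratified 10-fold split, which keeps the proportion of diabetic records similar in every fold. The round-robin assignment and the seed are arbitrary choices for this example and not RapidMiner's internal sampling.

```python
import random


def stratified_folds(labels, k=10, seed=0):
    """Assign each row index to one of k folds, keeping class ratios similar."""
    rng = random.Random(seed)
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for j, i in enumerate(indices):
            folds[j % k].append(i)  # deal each class out round-robin
    return folds


def cross_validation_splits(labels, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs, one per fold."""
    folds = stratified_folds(labels, k, seed)
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, test
```

Each record appears in exactly one test fold, so counts such as true and false positives can be summed over the folds to give the pooled ("mikro") figures reported by RapidMiner.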
The figures below show the ROC curves (with the area under the curve, AUC) and the confusion matrices for both models. The table that follows compares the two models and indicates which performs better on each metric.
| Performance metric | Decision Tree | Logistic Regression | Better model |
|---|---|---|---|
| Accuracy | 72.39% +/- 4.53% (mikro: 72.40%) | 77.47% +/- 5.34% (mikro: 77.47%) | Logistic Regression |
| Classification error | 27.61% +/- 4.53% (mikro: 27.60%) | 22.53% +/- 5.34% (mikro: 22.53%) | Logistic Regression |
| Precision | 76.64% +/- 15.32% (mikro: 77.45%) | 73.03% +/- 9.68% (mikro: 72.51%) | Decision Tree |
| Recall | 29.37% +/- 13.52% (mikro: 29.48%) | 57.02% +/- 12.46% (mikro: 57.09%) | Logistic Regression |
| Lift | 219.45% +/- 43.06% (mikro: 221.95%) | 209.32% +/- 27.86% (mikro: 207.80%) | Decision Tree |
| F measure | 41.07% +/- 14.97% (mikro: 42.70%) | 63.35% +/- 9.88% (mikro: 63.88%) | Logistic Regression |
| False positives (count) | 2.300 +/- 1.418 (mikro: 23.000) | 5.800 +/- 2.713 (mikro: 58.000) | Decision Tree |
| True positives (count) | 7.900 +/- 3.673 (mikro: 79.000) | 15.300 +/- 3.437 (mikro: 153.000) | Logistic Regression |
| Sensitivity | 29.37% +/- 13.52% (mikro: 29.48%) | 57.02% +/- 12.46% (mikro: 57.09%) | Logistic Regression |

Values are the mean +/- standard deviation across the cross-validation folds; the mikro figure is RapidMiner's micro-average, pooled over all folds. The positive class is "true" for every metric except accuracy and classification error.
This table compares the two models on the selected performance metrics. Overall, logistic regression performs better than the decision tree. It has higher accuracy, a lower classification error, and much higher recall (sensitivity), F measure and true positive count (153 against 79 diabetic cases correctly identified). The decision tree scores better on precision and lift and produces fewer false positives (23 against 58), but it misses most diabetic cases, so logistic regression is the better model for predicting diabetes.
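All of the metrics in the table can be derived from a model's pooled confusion matrix. The confusion-matrix counts used below are not reported directly. They are reconstructed from the table: a recall of 29.48% with 79 true positives implies 268 positive records, and the decision tree's precision and lift imply 768 records in total. Treat them as a reconstruction. AUC is not computed here because it needs the models' confidence scores, not just the counts.

```python
def metrics(tp, fp, fn, tn):
    """Binary-classification metrics from confusion-matrix counts."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)        # also the true positive rate / sensitivity
    base_rate = (tp + fn) / total  # share of positive records in the data
    return {
        "accuracy": (tp + tn) / total,
        "misclassification_rate": (fp + fn) / total,
        "precision": precision,
        "recall": recall,
        "false_positive_rate": fp / (fp + tn),
        "f_measure": 2 * precision * recall / (precision + recall),
        "lift": precision / base_rate,
    }


# Pooled counts reconstructed from the micro-averaged figures in the table
# (768 records, 268 of them positive).
decision_tree = metrics(tp=79, fp=23, fn=189, tn=477)
logistic_regression = metrics(tp=153, fp=58, fn=115, tn=442)
```

The reconstruction reproduces the table's pooled figures (for example an accuracy of about 72.40% against 77.47%). It also gives the false positive rates that the table omits: about 4.6% for the decision tree and 11.6% for logistic regression.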
The graph below shows the impact of wildlife strikes on aircraft in Delaware over the given period. In most cases there was no impact, meaning the aircraft continued to operate without any problem, although the graph also shows a few precautionary landings caused by wildlife strikes. The analyst noticed similar trends in other states.
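Behind a view like this is a simple aggregation: filter the strikes to one origin state, then count them by year and impact type. Comparing the share of each impact type across states is one way to check whether the pattern differs between origin states. The sketch assumes each strike is a dictionary with the hypothetical keys `origin_state`, `year` and `impact`; these are not the actual column names in the wildlife strike data set.

```python
from collections import Counter


def impact_by_year(records, state):
    """Count strikes per (year, impact) for one origin state."""
    counts = Counter(
        (r["year"], r["impact"]) for r in records if r["origin_state"] == state
    )
    return dict(sorted(counts.items()))


def impact_shares(records, state):
    """Fraction of a state's strikes that fall into each impact category."""
    counts = Counter(r["impact"] for r in records if r["origin_state"] == state)
    total = sum(counts.values())
    return {impact: n / total for impact, n in counts.items()}
```

If "None" dominates `impact_shares` for Delaware and for other states alike, that supports the observation that the trend is similar across states.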
The Tableau view below shows flight phase by time of day for wildlife strikes with aircraft. It suggests that strikes during the approach phase occur far more often during the day than at night.
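A phase-by-time-of-day view is a cross-tabulation of strike counts, and the most likely combination is the cell with the largest count. The sketch below builds that table. The record keys `phase` and `time_of_day`, and the category values in the test data, are hypothetical.

```python
from collections import defaultdict


def phase_by_time_of_day(records):
    """Build a nested {phase: {time_of_day: count}} table of strike counts."""
    table = defaultdict(lambda: defaultdict(int))
    for r in records:
        table[r["phase"]][r["time_of_day"]] += 1
    return {phase: dict(times) for phase, times in table.items()}


def busiest_cell(table):
    """Return (phase, time_of_day, count) for the combination with most strikes."""
    cells = ((p, t, n) for p, times in table.items() for t, n in times.items())
    return max(cells, key=lambda cell: cell[2])
```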
The Tableau view below compares wildlife species by the frequency of aircraft strikes and the chance of damage. According to this view, medium-sized birds are the most dangerous. The estimated total cost of such damage is $6,954,217.
The Tableau GeoMap view below shows flights by origin state, displaying the total number of wildlife strikes and the total monetary cost for each state over different time periods. California recorded the highest number of aircraft strikes over the period. The map shows the states of the United States.
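The GeoMap's measures, number of strikes and total cost per origin state for each time period, come from a grouped count and sum. The sketch below assumes hypothetical record keys `origin_state`, `year` and `cost`. It takes the period as a function, so the same code can group by year, decade or any other period the view uses; grouping by decade is only an example.

```python
from collections import defaultdict


def state_totals(records, period_of):
    """Aggregate strike count and total cost per (origin_state, period)."""
    totals = defaultdict(lambda: {"strikes": 0, "cost": 0.0})
    for r in records:
        key = (r["origin_state"], period_of(r["year"]))
        totals[key]["strikes"] += 1
        totals[key]["cost"] += r["cost"]
    return dict(totals)


def decade(year):
    """Example period function: group years by decade (2005 -> 2000)."""
    return year // 10 * 10
```

Sorting the result by `strikes` within each period is a quick way to confirm which state, such as California here, leads in a given period.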
The data must be understood and visualised before a dashboard is designed, and the variables required are shown in the graphs as the dashboard is built. A dashboard that includes only the geographical view does not present the data adequately. When the AWS dashboard was designed, the analyst therefore included every graph view so that every important aspect of the data is covered. For example, the third graph compares the frequency of wildlife strikes with the chance of damage. The heat map displays all of this information clearly and with a sensible orientation.
References
Ajala, O., English, P. and Pinkney, J., 2013. Systematic review and meta-analysis of different dietary approaches to the management of type 2 diabetes. The American journal of clinical nutrition, 97(3), pp.505-516.
Aras, G., 2016. A handbook of corporate governance and social responsibility. CRC Press.
Beekes, W., Brown, P. and Zhang, Q., 2015. Corporate governance and the informativeness of disclosures in Australia: A re-examination. Accounting & Finance, 55(4), pp.931-963.
Bottomley, S., 2016. The constitutional corporation: Rethinking corporate governance. Routledge.
Capaldi, N., Zu, L. and Gupta, A.D. eds., 2013. Encyclopedia of corporate social responsibility (Vol. 21). New York: Springer.
Cheffins, B.R., 2013. The history of corporate governance (p. 47). Oxford: Oxford University Press.
Commbank.com.au. (2018). Learn about our Privacy Policy - CommBank. [online] Available at: https://www.commbank.com.au/security-privacy/general-security/privacy.html [Accessed 21 May 2018].
Datagovernanceaus.com.au. (2018). [online] Available at: https://datagovernanceaus.com.au/wp-content/uploads/2016/07/DGA_Code_of_Practice_2017_15.11.17.pdf [Accessed 21 May 2018].
Davies, A., 2016. Best practice in corporate governance: Building reputation and sustainable success. Routledge.
Kraakman, R. and Hansmann, H., 2017. The end of history for corporate law. In Corporate Governance (pp. 49-78). Gower.
Lin, T., Zhong, L., Guo, L., Fu, F. and Chen, G., 2014. Seeing diabetes: visual detection of glucose based on the intrinsic peroxidase-like activity of MoS2 nanosheets. Nanoscale, 6(20), pp.11856-11862.
Obradovich, J. and Gill, A., 2013. The impact of corporate governance and financial leverage on the value of American firms.
Todorovic, I., 2013. Impact of corporate governance on performance of companies. Montenegrin Journal of Economics, 9(2), p.47.
Tricker, R.B. and Tricker, R.I., 2015. Corporate governance: Principles, policies, and practices. Oxford University Press, USA.
Wang, X., Bao, W., Liu, J., OuYang, Y.Y., Wang, D., Rong, S., Xiao, X., Shan, Z.L., Zhang, Y., Yao, P. and Liu, L.G., 2013. Inflammatory markers and risk of type 2 diabetes: a systematic review and meta-analysis. Diabetes care, 36(1), pp.166-175.
Young, S. and Thyil, V., 2014. Corporate social responsibility and corporate governance: Role of context in international settings. Journal of Business Ethics, 122(1), pp.1-24.