BA03 Introduction to Big Data
WARNING - Clicking on the "SUBMIT ASSIGNMENT" button will submit the assignment. Be sure that you have reviewed your answers before clicking it. Attempt all the questions; all questions are compulsory. Each question carries 4 marks. There is no negative marking for wrong answers.
Please note: There are 25 questions, of which Q.No. 21-25 are based on the Case Study.
Subject Code: BA03
Subject Name: INTRODUCTION TO BIG DATA
Component name: TERM END
Question 1:- What are the four V’s of Big Data?
a) Volume
b) Velocity
c) Variety
d) All of the mentioned
Question 2:- Input to the _______ is the sorted output of the mappers.
a) Reducer
b) Mapper
c) Shuffle
d) All of the mentioned
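For reference, a minimal word-count style reducer sketch (standard Hadoop Java API; the class name SumReducer and the counting logic are illustrative) showing that the framework sorts and groups mapper output by key before each reduce() call:

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// The framework shuffles and sorts mapper output by key, so each reduce()
// call receives one key together with all of the values emitted for it.
public class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable value : values) {
            sum += value.get();                   // add up the counts emitted by the mappers
        }
        context.write(key, new IntWritable(sum)); // one output record per key
    }
}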
Question 3:- According to analysts, for what can traditional IT systems provide a foundation when they’re integrated with big data technologies like Hadoop?
a) Data warehousing and business intelligence
b) Big data management and data mining
c) Management of Hadoop clusters
d) Collecting and storing unstructured data
Question 4:- __________ is a generalization of the facility provided by the MapReduce framework to collect data output by the Mapper or the Reducer.
a) Partitioner
b) OutputCollector
c) Reporter
d) All of the mentioned
Question 5:- ________ is the slave/worker node and holds the user data in the form of Data Blocks. a) DataNode
b) NameNode
c) Data block
d) Replication
Question 6:- What are the different features of Big Data Analytics?
a) Data Recovery
b) Scalability
c) Open-Source
d) All of the mentioned
Question 7:- ___________ is the world’s most complete, tested, and popular distribution of Apache Hadoop and related projects.
a) CDH
b) MDH
c) ADH
d) BDH
Question 8:- Cloudera ___________ includes CDH and an annual subscription license (per node) to Cloudera Manager and technical support.
a) Enterprise
b) Express
c) Standard
d) All of the mentioned
Question 9:- __________ is an online NoSQL database developed by Cloudera.
a) HCatalog
b) HBase
c) Impala
d) Oozie
Question 10:- CDH processes and controls sensitive data and facilitates _____________ .
a) flexibility
b) scalability
c) multi-tenancy
d) reusability
Question 11:- Which of the following is a distributed graph processing framework on top of Spark?
a) Spark Streaming
b) MLlib
c) GraphX
d) All of the mentioned
Question 12:- The Spark optimizer is based on the constructs of which functional programming language?
a) Python
b) R
c) Java
d) Scala
Question 13:- You can delete a column family from a table using the _________ method of the HBaseAdmin class.
a) delColumn()
b) removeColumn()
c) deleteColumn()
d) None of the mentioned
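For reference, a sketch using the older HBaseAdmin API that the question refers to (the table name "users" and the column family "activity" are made-up examples; newer HBase releases expose this operation as Admin.deleteColumnFamily() instead):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class DropColumnFamily {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);
        admin.disableTable("users");              // schema changes need the table offline
        admin.deleteColumn("users", "activity");  // drops the "activity" column family
        admin.enableTable("users");
        admin.close();
    }
}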
Question 14:- What is the default size of distributed cache?
a) 8 GB
b) 10 GB
c) 16 GB
d) 20 GB
Question 15:- Which of the following is a data processing engine for clustered computing?
a) Drill
b) Oozie
c) Spark
d) All of the mentioned
Question 16:- Hadoop is a framework that works with a variety of related tools. Common cohorts include ____________
a) MapReduce, Hive and HBase
b) MapReduce, MySQL and Google Apps
c) MapReduce, Hummer and Iguana
d) MapReduce, Heron and Trumpet
Question 17:- In NameNode HA, when active node fails, which node takes the responsibility of active node?
a) Secondary NameNode
b) Backup node
c) Standby node
d) Checkpoint node
Question 18:- As companies move past the experimental phase with Hadoop, many cite the need for additional capabilities, including
a) Improved data storage and information retrieval
b) Improved extract, transform and load features for data integration
c) Improved data warehousing functionality
d) Improved security, workload management and SQL support
Question 19:- Which of the following is the reason Spark executes faster than MapReduce?
a) It supports different programming languages like Scala, Python, R, and Java.
b) RDDs
c) DAG execution engine and in-memory computation (RAM based)
d) All of the mentioned
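For reference, a minimal Spark sketch (Java API; the input path and filter predicates are illustrative) of the in-memory reuse that a DAG-scheduled, RAM-based engine allows: cache() keeps the parsed data in memory, so the second action does not re-read the input the way chained MapReduce jobs would.

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class CacheExample {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("CacheExample").setMaster("local[*]");
        JavaSparkContext sc = new JavaSparkContext(conf);

        JavaRDD<String> lines = sc.textFile("hdfs:///data/events.log")      // example path
                                  .filter(line -> !line.isEmpty())
                                  .cache();                                 // keep in memory

        long total = lines.count();                                         // first action materialises the RDD
        long errors = lines.filter(line -> line.contains("ERROR")).count(); // reuses the cached data

        System.out.println(total + " lines, " + errors + " errors");
        sc.stop();
    }
}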
Question 20:- The ________ class provides the getValue() method to read the values from its instance.
a) Get
b) Result
c) Put
d) Value
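For reference, a minimal sketch with the standard HBase client API (the table, row key, family, and qualifier names are made-up examples) of reading a cell through the getValue() method of a Result instance:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class ReadCell {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Table table = connection.getTable(TableName.valueOf("users"))) {  // example table
            Result result = table.get(new Get(Bytes.toBytes("user123")));      // example row key
            byte[] value = result.getValue(Bytes.toBytes("info"), Bytes.toBytes("city"));
            System.out.println(Bytes.toString(value));                         // prints the stored cell
        }
    }
}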
Case Study
Employees are both a business's greatest asset and its greatest expense. So hitting on the right formula for selecting them, and keeping them in place, is absolutely essential. One company offering unique solutions to help others tackle this challenge is Cornerstone. Cornerstone is a software tool which helps assess and understand employees and candidates by crunching half a billion data points on everything from gas prices and unemployment rates to social media use. Clients such as Xerox use it to predict, for example, how long an employee is likely to stay in his or her job, and remarkable insights gleaned include the fact that in some careers, such as call centre work, employees with criminal records perform better than those without. Its prowess has made Cornerstone into a huge success, with sales growing by 150% from 2012 to 2013 and the software being put to use by 20 of the Fortune 100 companies.
The "data points" are measurements taken from employees working across 18 industries in 13 different countries, providing information on everything from how long they take to travel to work to how often they speak to their managers. Data collection methods include the controversial "smart badges" that monitor employee movements and track which employees interact with each other. Cornerstone has certainly caused positive change in companies using it – Bank of America reportedly improved performance metrics by 23% and decreased stress levels (measured by analysing workers' voices) by 19%, simply by allowing more staff to take their breaks together. And Xerox reduced call centre turnover by 20% by applying analytics to prospective candidates – finding, among other things, that creative people were more likely than inquisitive people to remain with the company for the 6 months necessary to recoup the $6,000 cost of their training.
So far, data gathering and analysis has focused mainly on customer-facing members of staff, who in larger organizations will tend to be those with less responsibility and decision-making power. Could even greater benefits be gained by applying the same principles to the movers and shakers in the boardroom, who hold the keys to wider-reaching business change? Certainly some companies are starting to think that way. The director of research and strategy at one firm that uses the software – David Lathrop of Steelcase – told the Financial Times this year that improving the performance of top executives has a "disproportionate effect on the company". Although he did not disclose precise details of methods or results, much research is being carried out in the name of finding exactly what it is that makes high-fliers tick. This will inevitably find its way into analytical projects at big companies which spend millions hiring executives.
Crunching employee data at this level plainly has the opportunity to bring huge benefits, but it could also prove disastrous if a company gets it wrong. Failing to take proper consideration of individuals' rights to privacy in some jurisdictions (e.g. Europe) can lead to severe legal penalties. In my opinion, any company thinking about carrying out data-gathering and analysis for these purposes needs to take great care. In workplaces where morale is low or relationships between workers and managers are not good, it could very easily be seen as a case of taking snooping too far.
Interestingly, Cornerstone's privacy policy makes it clear that information on applicants is provided to them by their clients, including names, work history and contact details. How many people know that simply by applying for a job with one of these clients, their personal data will be made available for analysis? It appears that Cornerstone absolves itself of responsibility here by declaring itself a "mere data processor" – putting the onus on the client businesses to gain permission to distribute their applicants' and employees' data.
It is vitally important that staff are made aware of precisely what data is being gathered from them, and what it is being used for. Everyone (and certainly those running the operation) needs to be aware that the purpose is to increase overall company efficiency, rather than to assess or monitor individual members of staff. With more than half of human resources departments reporting an increase in data analytics since 2010, according to a report by the Economist Intelligence Unit, it's obvious that, like it or not, it's here to stay. Companies that use it well, with respect for their employees' privacy and an understanding of the vital principle mentioned above, are likely to prosper.
Question 21:- A website stores information on logged-in users, and a user may have multiple fields, but the number of fields per user may vary based on the user's actions. In that case, which Hadoop component would you use to store the data?
a) Pig
b) MSSQL
c) HBase
d) Oracle 8i
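For reference, a sketch (standard HBase client API; the table, family, and qualifier names are made-up examples) of why a column-oriented store suits rows with a varying number of fields: each Put can carry a different set of columns in the same table.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class StoreUsers {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Table table = connection.getTable(TableName.valueOf("site_users"))) {  // example table
            byte[] actions = Bytes.toBytes("actions");                               // example column family

            Put userA = new Put(Bytes.toBytes("userA"));                             // this user has two fields
            userA.addColumn(actions, Bytes.toBytes("login"), Bytes.toBytes("2024-01-01"));
            userA.addColumn(actions, Bytes.toBytes("purchase"), Bytes.toBytes("book"));

            Put userB = new Put(Bytes.toBytes("userB"));                             // this user has only one
            userB.addColumn(actions, Bytes.toBytes("login"), Bytes.toBytes("2024-01-02"));

            table.put(userA);
            table.put(userB);
        }
    }
}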
Question 22:- Assume data from external sources is being populated into HDFS in CSV format on a daily basis. How would you handle it efficiently so that it can be processed by other applications while also reducing data storage?
a) HBase
b) Use ORC or Parquet format in Hive, delete the old HDFS data, and create the business date (partdate) as a partition in Hive.
c) Pig
d) Oracle 8i
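The option above targets Hive, but as one illustration, here is a Spark sketch (the paths, the partdate column, and the SparkSession setup are assumptions) of converting a daily CSV drop into partitioned Parquet so downstream tools read a compact columnar format instead of the raw CSV:

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;
import org.apache.spark.sql.SparkSession;

public class CsvToParquet {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder().appName("CsvToParquet").getOrCreate();

        Dataset<Row> daily = spark.read()
                .option("header", "true")
                .csv("hdfs:///landing/events/2024-01-01/");    // example daily CSV drop

        daily.write()
                .mode(SaveMode.Append)
                .partitionBy("partdate")                        // business date column as the partition
                .parquet("hdfs:///warehouse/events_parquet/");  // compact columnar copy for Hive/Spark

        spark.stop();
    }
}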
Question 23:- Which tool is helpful for establishing relationships and tracing the employment life cycle in the above situation?
a) Neo4j
b) Pig
c) Oracle 8i
d) Hive
Question 24:- For the management and analytics of the data in the above situation, what is a block in HDFS and what is its default size in Hadoop 1 and Hadoop 2?
a) 32 MB and 64 MB
b) 16 MB and 32 MB
c) 128 MB and 256 MB
d) 64 MB and 128 MB
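For reference, a small sketch (standard Hadoop FileSystem API; the path "/" is only used to query the default) of checking the block size a cluster is actually configured with, since the default moved from 64 MB in Hadoop 1 to 128 MB in Hadoop 2:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ShowBlockSize {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();                 // picks up hdfs-site.xml (dfs.blocksize)
        FileSystem fs = FileSystem.get(conf);
        long blockSize = fs.getDefaultBlockSize(new Path("/"));   // reported in bytes
        System.out.println("Default block size: " + (blockSize / (1024 * 1024)) + " MB");
        fs.close();
    }
}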
Question 25:- Which machine learning algorithm is most suitable for analytics and forecasting?
a) Decision Tree
b) Regression
c) Classification
d) Random Forest