Chapter 6: Big Data and Analytics

Business Intelligence: A Managerial Perspective on Analytics (3rd Edition)


Copyright © 2014 Pearson Education, Inc.  

Slide 6 - 2  

Learning Objectives 

  • Learn what Big Data is and how it is changing the world of analytics
  • Understand the motivation for and business drivers of Big Data analytics
  • Become familiar with the wide range of enabling technologies for Big Data analytics
  • Learn about Hadoop, MapReduce, and NoSQL as they relate to Big Data analytics
  • Understand the role of, and the capabilities/skills required of, the data scientist as a new analytics profession
 

(Continued…)


Slide 6 - 3  

Learning Objectives 

  • Compare and contrast the complementary uses of data warehousing and Big Data
  • Become familiar with the vendors of Big Data tools and services
  • Understand the need for and appreciate the capabilities of stream analytics
  • Learn about the applications of stream analytics

Slide 6 - 4  

Opening Vignette…

Big Data Meets Big Science at CERN 

  • Situation
  • Problem
  • Solution
  • Results
  • Answer & discuss the case questions.

Slide 6 - 5  

Questions for the Opening Vignette 

  • What is CERN, and why is it important to the world of science?
  • How does the Large Hadron Collider work? What does it produce?
  • What is the essence of the data challenge at CERN? How significant is it?
  • What was the solution? How were the Big Data challenges addressed with this solution?
  • What were the results? Do you think the current solution is sufficient?

Slide 6 - 6  

Big Data -  Definition and Concepts 

  • Big Data means different things to people with different backgrounds and interests
  • Traditionally, “Big Data” = massive volumes of data
    • E.g., volume of data at CERN, NASA, Google, …
  • Where does the Big Data come from?
    • Everywhere! Web logs, RFID, GPS systems, sensor networks, social networks, Internet-based text documents, Internet search indexes, call detail records, astronomy, atmospheric science, biology, genomics, nuclear physics, biochemical experiments, medical records, scientific research, military surveillance, multimedia archives, …

Slide 6 - 7  

Technology Insights 6.1  
The Data Size Is Getting Big, Bigger, …

  • Large Hadron Collider - 1 PB/sec
  • Boeing jet - 20 TB/hr
  • Facebook - 500 TB/day
  • YouTube - 1 TB/4 min
  • The proposed Square Kilometer Array telescope (which would be the world’s biggest telescope) - 1 EB/day
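
The rates above are quoted over different time periods, which makes them hard to compare directly. The sketch below puts them on a common TB/day scale. It assumes decimal units (1 PB = 1,000 TB, 1 EB = 1,000,000 TB) and that each source produces data continuously at its quoted rate. Neither assumption comes from the slide, and the function and variable names are illustrative only.

```python
# Assumption: decimal (SI) units, i.e. 1 PB = 1,000 TB and 1 EB = 1,000,000 TB.
TB_PER_UNIT = {"TB": 1, "PB": 1_000, "EB": 1_000_000}
SECONDS_PER_DAY = 24 * 60 * 60

# (source, amount, unit, seconds in the quoted period), as listed on the slide.
RATES = [
    ("Large Hadron Collider", 1, "PB", 1),
    ("Boeing jet", 20, "TB", 60 * 60),
    ("Facebook", 500, "TB", SECONDS_PER_DAY),
    ("YouTube", 1, "TB", 4 * 60),
    ("Square Kilometer Array", 1, "EB", SECONDS_PER_DAY),
]


def tb_per_day(amount, unit, period_seconds):
    """Convert 'amount unit per period' into terabytes per day."""
    return amount * TB_PER_UNIT[unit] * SECONDS_PER_DAY / period_seconds


for source, amount, unit, period in RATES:
    print(f"{source:<24} {tb_per_day(amount, unit, period):>14,.0f} TB/day")
```

On this scale the sources span more than five orders of magnitude, which is one reason no single storage or processing approach fits all of them.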

Slide 6 - 8  

Big Data -  Definition and Concepts 

  • Big Data is a misnomer!
  • Big Data is more than just “big”
  • The Vs that define Big Data
    • Volume
    • Variety
    • Velocity
    • Veracity
    • Variability
    • Value
    • …

Slide 6 - 9  

Big Data -  Definition and Concepts 

  • Big Data is not new!
  • Traditionally, “Big Data” = massive volumes of data
    • Volume of data at CERN, NASA, Google, …
  • Where does the Big Data come from?
    • Everywhere! Web logs, RFID, GPS systems, sensor networks, social networks, Internet-based text documents, Internet search indexes, call detail records, astronomy, atmospheric science, biology, genomics, nuclear physics, biochemical experiments, medical records, scientific research, military surveillance, multimedia archives, …

Slide 6 - 10  

A High-Level Conceptual Architecture for Big Data Solutions

(by AsterData / Teradata)


Slide 6 - 11  

Application Case 6.1 

Big Data Analytics Helps Luxottica Improve its Marketing Effectiveness 

Questions for Discussion

  1. What does “big data” mean to Luxottica?
  2. What were their main challenges?
  3. What were the proposed solution and the obtained results?

Slide 6 - 12  

Fundamentals of Big Data Analytics 

  • Big Data by itself, regardless of the size, type, or speed, is worthless
  • Big Data + “big” analytics = value
  • With the value proposition, Big Data also brought about big challenges
    • Effectively and efficiently capturing, storing, and analyzing Big Data
    • New breed of technologies needed (developed or purchased or hired or outsourced …)

Slide 6 - 13  

Big Data Considerations 

  • You can’t process the amount of data that you want to because of the limitations of your current platform.
  • You can’t include new/contemporary data sources (e.g., social media, RFID, sensor, Web, GPS, textual data) because they do not comply with the data storage schema.
  • You need to (or want to) integrate data as quickly as possible to be current on your analysis.
  • You want to work with a schema-on-demand data storage paradigm because of the variety of data types involved.
  • The data is arriving so fast at your organization’s doorstep that your traditional analytics platform cannot handle it.
  • …
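
As a rough illustration of the schema-on-demand idea mentioned above, the sketch below stores heterogeneous records as raw JSON lines and applies a "schema" only when the data is read. The record contents, field names, and the `read_with_schema` helper are all hypothetical and chosen just for this example. Real Big Data platforms implement the idea in their own ways.

```python
import json

# Raw records from different hypothetical sources, stored as-is (no upfront schema).
RAW_LINES = [
    '{"source": "web", "user": "u1", "url": "/home"}',
    '{"source": "rfid", "tag": "A17", "gate": 3}',
    '{"source": "gps", "user": "u1", "lat": 53.35, "lon": -6.26}',
]


def read_with_schema(lines, fields):
    """Project each raw record onto the fields the current analysis needs.

    Missing fields become None instead of causing the record to be rejected.
    """
    for line in lines:
        record = json.loads(line)
        yield {field: record.get(field) for field in fields}


# The "schema" is chosen at read time and can differ from one analysis to the next.
user_activity = list(read_with_schema(RAW_LINES, ["source", "user"]))
print(user_activity)
```

Because nothing is rejected at load time, a new data source can be added without redesigning a storage schema first. The cost is that each analysis must handle missing or unexpected fields itself.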

Slide 6 - 14  

Critical Success Factors for  
Big Data Analytics 

  • A clear business need (alignment with the vision and the strategy)
  • Strong, committed sponsorship (executive champion)
  • Alignment between the business and IT strategy
  • A fact-based decision-making culture
  • A strong data infrastructure
  • The right analytics tools
  • The right people with the right skills

Slide 6 - 15  

Critical Success Factors for  
Big Data Analytics


Slide 6 - 16  

Enablers of Big Data Analytics 

  • In-memory analytics
    • Storing and processing the complete data set in RAM
  • In-database analytics
    • Placing analytic procedures close to where data is stored
  • Grid computing & MPP
    • Use of many machines and processors in parallel (MPP - massively parallel processing)
  • Appliances
    • Combining hardware, software, and storage in a single unit for performance and scalability

Slide 6 - 17  

Challenges of Big Data Analytics 

  • Data volume
    • The ability to capture, store, and process the huge volume of data in a timely manner
  • Data integration
    • The ability to combine data quickly and at reasonable cost
  • Processing capabilities
    • The ability to process the data quickly, as it is captured (i.e., stream analytics)
  • Data governance (security, privacy, access)
  • Skill availability (data scientist)
  • Solution cost (ROI)

Slide 6 - 18  

Business Problems Addressed by  
Big Data Analytics 

  • Process efficiency and cost reduction
  • Brand management
  • Revenue maximization, cross-selling/up-selling
  • Enhanced customer experience
  • Churn identification, customer recruiting
  • Improved customer service
  • Identifying new products and market opportunities
  • Risk management
  • Regulatory compliance
  • Enhanced security capabilities
  • …

Slide 6 - 19  

Application Case 6.2 

Top 5 Investment Bank Achieves Single Source of the Truth 

Questions for Discussion

  1. How can Big Data benefit large-scale trading banks?
  2. How did MarkLogic’s infrastructure help ease the leveraging of Big Data?
  3. What were the challenges, the proposed solution, and the obtained results?

Slide 6 - 20  

Application Case 6.2 

Moving from many old systems to a unified new system


Slide 6 - 21  

Big Data Technologies 

  • MapReduce …
  • Hadoop …
  • Hive
  • Pig
  • HBase
  • Flume
  • Oozie
  • Ambari
  • Avro
  • Mahout, Sqoop, HCatalog, …

Slide 6 - 22  

Big Data Technologies 
MapReduce 

  • MapReduce distributes the processing of very large multi-structured data files across a large cluster of ordinary machines/processors
  • Goal - achieving high performance with “simple” computers
  • Developed and popularized by Google
  • Good at processing and analyzing large volumes of multi-structured data in a timely manner
  • Example tasks: indexing the Web for search, graph analysis, text analysis, machine learning, …
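
To make the programming model concrete, here is a minimal single-machine sketch of the map, shuffle, and reduce steps, using word counting as the task. It only imitates the data flow. In a real MapReduce system the map and reduce calls run in parallel on many machines, and the function names below are illustrative rather than part of any framework's API.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map over every document, then shuffle and reduce the emitted pairs."""
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


counts = word_count(["big data big analytics", "data in motion"])
print(counts)
```

Each `map_phase` call depends only on its own document, and each `reduce_phase` call only on its own key. That independence is what allows the work to be spread across a large cluster of ordinary machines.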

Slide 6 - 23  

Big Data Technologies MapReduce  

How does MapReduce work?


Slide 6 - 24  

Big Data Technologies 
Hadoop 

  • Hadoop is an open source framework for storing and analyzing massive amounts of distributed, unstructured data
  • Originally created by Doug Cutting at Yahoo!
  • Hadoop clusters run on inexpensive commodity hardware so projects can scale-out inexpensively
  • Hadoop is now part of Apache Software Foundation
  • Open source - hundreds of contributors continuously improve the core technology
  • MapReduce + Hadoop = Big Data core technology

Slide 6 - 25  

Big Data Technologies 
Hadoop 

  • How Does Hadoop Work?
    • Access unstructured and semi-structured data (e.g., log files, social media feeds, other data sources)
    • Break the data up into “parts,” which are then loaded into a file system made up of multiple nodes running on commodity hardware using HDFS
    • Each “part” is replicated multiple times across the file system for redundancy and failsafe processing
    • A node acts as the Facilitator and another as Job Tracker
    • Jobs are distributed to the clients, and once completed, the results are collected and aggregated using MapReduce
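
The splitting and replication steps above can be sketched with a toy simulation. It assumes fixed-size parts, a replication factor of 3, and simple round-robin placement across four hypothetical nodes. The round-robin rule is an illustration only and is not how HDFS actually chooses nodes, and none of the function names belong to Hadoop.

```python
def split_into_parts(data, part_size):
    """Break the input into fixed-size parts (the last one may be shorter)."""
    return [data[i:i + part_size] for i in range(0, len(data), part_size)]


def place_replicas(num_parts, nodes, replication=3):
    """Assign each part to `replication` distinct nodes, round-robin (toy rule)."""
    if replication > len(nodes):
        raise ValueError("replication factor exceeds the number of nodes")
    placement = {}
    for part_id in range(num_parts):
        placement[part_id] = [
            nodes[(part_id + offset) % len(nodes)] for offset in range(replication)
        ]
    return placement


def readable_parts(placement, failed_nodes):
    """Return the parts that still have at least one replica on a live node."""
    return [
        part_id
        for part_id, holders in placement.items()
        if set(holders) - set(failed_nodes)
    ]


parts = split_into_parts("x" * 10, part_size=4)  # 3 parts: 4 + 4 + 2 characters
placement = place_replicas(len(parts), ["n1", "n2", "n3", "n4"])
print(placement)
print(readable_parts(placement, failed_nodes={"n1", "n2"}))
```

With three replicas per part, every part in this example remains readable even when two of the four nodes fail. This is the "failsafe" property the slide refers to.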

Slide 6 - 26  

Big Data Technologies 
Hadoop 

  • Hadoop Technical Components
    • Hadoop Distributed File System (HDFS)
    • Name Node (primary facilitator)
    • Secondary Node (backup to Name Node)
    • Job Tracker
    • Slave Nodes (the grunts of any Hadoop cluster)
    • Additionally, the Hadoop ecosystem is made up of a number of complementary sub-projects: NoSQL (Cassandra, HBase), DW (Hive), …
      • NoSQL = not only SQL

Slide 6 - 27  

Big Data Technologies 
Hadoop - Demystifying Facts  

  • Hadoop consists of multiple products
  • Hadoop is open source but available from vendors too
  • Hadoop is an ecosystem, not a single product
  • HDFS is a file system, not a DBMS
  • Hive resembles SQL but is not standard SQL
  • Hadoop and MapReduce are related but not the same
  • MapReduce provides control for analytics, not analytics
  • Hadoop is about data diversity, not just data volume
  • Hadoop complements a DW; it’s rarely a replacement
  • Hadoop enables many types of analytics, not just Web analytics

Slide 6 - 28  

Application Case 6.3 

eBay’s Big Data Solution

Questions for Discussion

  1. Why did eBay need a Big Data solution?
  2. What were the challenges, the proposed solution, and the obtained results?
 

eBay’s Multi Data-Center Deployment


Slide 6 - 29  

Data Scientist 

“The Sexiest Job of the 21st Century”
Thomas H. Davenport and D. J. Patil
Harvard Business Review, October 2012

  • Data Scientist = Big Data guru
    • One with skills to investigate Big Data
  • Very high salaries, very high expectations
  • Where do Data Scientists come from?
    • M.S./Ph.D. in MIS, CS, IE, … and/or Analytics
    • There is not a specific degree program for DS!
    • PE, PML, … DSP (Data Science Professional)

Slide 6 - 30  

Skills That Define a Data Scientist


Slide 6 - 31  

A Typical Job Post for Data Scientist


Slide 6 - 32  

Application Case 6.4 

Big Data and Analytics in Politics 

Questions for Discussion

  1. What is the role of analytics and Big Data in modern-day politics?
  2. Do you think Big Data analytics could change the outcome of an election?
  3. What do you think are the challenges, the potential solution, and the probable results of the use of Big Data analytics in politics?

Slide 6 - 33  

Application Case 6.4 
Big Data and Analytics in Politics


Slide 6 - 34  

Big Data And Data Warehousing 

  • What is the impact of Big Data on DW?
    • Big Data and RDBMS do not go nicely together
    • Will Hadoop replace data warehousing/RDBMS?
  • Use Cases for Hadoop
    • Hadoop as the repository and refinery
    • Hadoop as the active archive
  • Use Cases for Data Warehousing
    • Data warehouse performance
    • Integrating data that provides business value
    • Interactive BI tools

Slide 6 - 35  

Hadoop versus Data Warehouse 
When to Use Which Platform


Slide 6 - 36  

Coexistence of Hadoop and DW 

  1. Use Hadoop for storing and archiving multi-structured data
  2. Use Hadoop for filtering, transforming, and/or consolidating multi-structured data
  3. Use Hadoop to analyze large volumes of multi-structured data and publish the analytical results
  4. Use a relational DBMS that provides MapReduce capabilities as an investigative computing platform
  5. Use a front-end query tool to access and analyze data

 


Slide 6 - 37  

Coexistence of Hadoop and DW 

Source: Teradata


Slide 6 - 38  

Big Data Vendors 

  • The Big Data vendor landscape is developing very rapidly
  • A representative list would include
    • Cloudera - cloudera.com
    • MapR - mapr.com
    • Hortonworks - hortonworks.com
    • Also, IBM (Netezza, InfoSphere), Oracle (Exadata, Exalogic), Microsoft, Amazon, Google, …
 

Software, Hardware, Service, …


Slide 6 - 39  

Top 10 Big Data Vendors  
with Primary Focus on Hadoop


Slide 6 - 40  

Application Case 6.5 

Dublin City Council Is Leveraging Big Data to Reduce Traffic Congestion 

Questions for Discussion

  1. Is there a strong case to make for large cities to use Big Data Analytics and related information technologies? Identify and discuss examples of what can be done with analytics beyond what is portrayed in this application case.
  2. How can big data analytics help ease the traffic problem in large cities?
  3. What were the challenges Dublin City was facing; what were the proposed solution, initial results, and future plans?

 


Slide 6 - 41  

Technology Insights 6.4  
How to Succeed with Big Data 

  1. Simplify
  2. Coexist
  3. Visualize
  4. Empower
  5. Integrate
  6. Govern
  7. Evangelize

Slide 6 - 42  

Application Case 6.6 

Creditreform Boosts Credit Rating Quality with Big Data Visual Analytics 

Questions for Discussion

  1. How did Creditreform boost credit rating quality with Big Data and visual analytics?
  2. What were the challenges, proposed solution, and initial results?

Slide 6 - 43  

Big Data And Stream Analytics 

  • Data-in-motion analytics and real-time data analytics
  • One of the Vs in Big Data = Velocity
  • Analytic process of extracting actionable information from continuously flowing/streaming data
  • Why Stream Analytics?
    • It may not be feasible to store the data, or the data may lose its value if it is not analyzed as it arrives
  • Stream Analytics Versus Perpetual Analytics
  • Critical Event Processing?
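
A minimal sketch of data-in-motion analytics follows: each reading is analyzed as it arrives, only a small sliding window is kept in memory, and a critical event is flagged when the moving average crosses a threshold. The window size, threshold, and sensor values are hypothetical, and the example is not tied to any particular stream-processing product.

```python
from collections import deque


def stream_alerts(readings, window_size=3, threshold=100.0):
    """Yield (reading, moving_average, is_critical) as each reading arrives.

    Only the last `window_size` readings are kept in memory; the full
    stream is never stored.
    """
    window = deque(maxlen=window_size)
    for reading in readings:
        window.append(reading)
        average = sum(window) / len(window)
        yield reading, average, average > threshold


# Hypothetical sensor values; any iterator (socket, queue, file tail) would do.
sensor_feed = iter([90, 95, 100, 120, 130])
results = list(stream_alerts(sensor_feed))
for reading, average, critical in results:
    print(reading, round(average, 1), "CRITICAL" if critical else "ok")
```

Because the generator consumes one reading at a time, the same logic works whether the source is a finite list or an unbounded feed.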
 

 


Slide 6 - 44  

Stream Analytics 
A Use Case in the Energy Industry


Slide 6 - 45  

Stream Analytics Applications 

  • e-Commerce
  • Telecommunication
  • Law Enforcement and Cyber Security
  • Power Industry
  • Financial Services
  • Health Services
  • Government
 

 


Slide 6 - 46  

Application Case 6.7 

Turning Machine-Generated Streaming Data into Valuable Business Insights 

Questions for Discussion

  1. Why is stream analytics becoming more popular?
  2. How did the telecommunication company in this case use stream analytics for better business outcomes? What additional benefits can you foresee?
  3. What were the challenges, proposed solution, and initial results?

Slide 6 - 47  

End of the Chapter  
 
 
 

  • Questions, comments

Slide 6 - 48  

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the prior written permission of the publisher. Printed in the United States of America. 
 

