Big Data For Dummies
A Wiley Brand

by Judith Hurwitz, Alan Nugent, Dr. Fern Halper, and Marcia Kaufman

Big Data For Dummies
Published by
John Wiley & Sons, Inc.
111 River Street
Hoboken, NJ 07030-5774
www.wiley.com

Copyright © 2013 by John Wiley & Sons, Inc., Hoboken, New Jersey

Published simultaneously in Canada

No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning or otherwise, except as permitted under Sections 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 646-8600. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008.

Wiley, the Wiley logo, For Dummies, the Dummies Man logo, A Reference for the Rest of Us, The Dummies Way, Dummies Daily, The Fun and Easy Way, Dummies.com, Making Everything Easier, and related trade dress are trademarks or registered trademarks of John Wiley & Sons, Inc. and/or its affiliates in the United States and other countries, and may not be used without written permission. All other trademarks are the property of their respective owners. John Wiley & Sons, Inc. is not associated with any product or vendor mentioned in this book.

LIMIT OF LIABILITY/DISCLAIMER OF WARRANTY: THE PUBLISHER AND THE AUTHOR MAKE NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE ACCURACY OR COMPLETENESS OF THE CONTENTS OF THIS WORK AND SPECIFICALLY DISCLAIM ALL WARRANTIES, INCLUDING WITHOUT LIMITATION WARRANTIES OF FITNESS FOR A PARTICULAR PURPOSE. NO WARRANTY MAY BE CREATED OR EXTENDED BY SALES OR PROMOTIONAL MATERIALS.
THE ADVICE AND STRATEGIES CONTAINED HEREIN MAY NOT BE SUITABLE FOR EVERY SITUATION. THIS WORK IS SOLD WITH THE UNDERSTANDING THAT THE PUBLISHER IS NOT ENGAGED IN RENDERING LEGAL, ACCOUNTING, OR OTHER PROFESSIONAL SERVICES. IF PROFESSIONAL ASSISTANCE IS REQUIRED, THE SERVICES OF A COMPETENT PROFESSIONAL PERSON SHOULD BE SOUGHT. NEITHER THE PUBLISHER NOR THE AUTHOR SHALL BE LIABLE FOR DAMAGES ARISING HEREFROM. THE FACT THAT AN ORGANIZATION OR WEBSITE IS REFERRED TO IN THIS WORK AS A CITATION AND/OR A POTENTIAL SOURCE OF FURTHER INFORMATION DOES NOT MEAN THAT THE AUTHOR OR THE PUBLISHER ENDORSES THE INFORMATION THE ORGANIZATION OR WEBSITE MAY PROVIDE OR RECOMMENDATIONS IT MAY MAKE. FURTHER, READERS SHOULD BE AWARE THAT INTERNET WEBSITES LISTED IN THIS WORK MAY HAVE CHANGED OR DISAPPEARED BETWEEN WHEN THIS WORK WAS WRITTEN AND WHEN IT IS READ.

For general information on our other products and services, please contact our Customer Care Department within the U.S. at 877-762-2974 or outside the U.S. at 317-572-3993.

Wiley publishes in a variety of print and electronic formats and by print-on-demand. Some material included with standard print versions of this book may not be included in e-books or in print-on-demand. If this book refers to media such as a CD or DVD that is not included in the version you purchased, you may download this material at http://booksupport.wiley.com. For more information about Wiley products, visit www.wiley.com.

Library of Congress Control Number: 2013933950

ISBN 978-1-118-50422-2 (pbk); ISBN 978-1-118-64417-1 (ebk); ISBN 978-1-118-64396-9 (ebk); ISBN 978-1-118-64010 (ebk)

Manufactured in the United States of America

10 9 8 7 6 5 4 3 2 1

About the Authors

Judith S. Hurwitz is President and CEO of Hurwitz & Associates, a research and consulting firm focused on emerging technology, including cloud computing, big data, analytics, software development, service management, and security and governance.
She is a technology strategist, thought leader, and author. A pioneer in anticipating technology innovation and adoption, she has served as a trusted advisor to many industry leaders over the years. Judith has helped these companies make the transition to a new business model focused on the business value of emerging platforms. She was the founder of Hurwitz Group. She has worked in various corporations, including Apollo Computer and John Hancock. She has written extensively about all aspects of distributed software. In 2011 she authored Smart or Lucky? How Technology Leaders Turn Chance into Success (Jossey-Bass, 2011). Judith is a co-author on five retail For Dummies titles including Hybrid Cloud For Dummies (John Wiley & Sons, Inc., 2012), Cloud Computing For Dummies (John Wiley & Sons, Inc., 2010), Service Management For Dummies, and Service Oriented Architecture For Dummies, 2nd Edition (both John Wiley & Sons, Inc., 2009). She is also a co-author on many custom published For Dummies titles including Platform as a Service For Dummies, CloudBees Special Edition (John Wiley & Sons, Inc., 2012), Cloud For Dummies, IBM Midsize Company Limited Edition (John Wiley & Sons, Inc., 2011), Private Cloud For Dummies, IBM Limited Edition (2011), and Information on Demand For Dummies, IBM Limited Edition (2008) (both John Wiley & Sons, Inc.).

Judith holds BS and MS degrees from Boston University, serves on several advisory boards of emerging companies, and was named a distinguished alumnus of Boston University's College of Arts & Sciences in 2005. She serves on Boston University's Alumni Council. She is also a recipient of the 2005 Massachusetts Technology Leadership Council award.

Alan F. Nugent is a Principal Consultant with Hurwitz & Associates. Al is an experienced technology leader and industry veteran of more than three decades.
Most recently, he was the Chief Executive and Chief Technology Officer at Mzinga, Inc., a leader in the development and delivery of cloud-based solutions for big data, real-time analytics, social intelligence, and community management. Prior to Mzinga, he was executive vice president and Chief Technology Officer at CA, Inc., where he was responsible for setting the strategic technology direction for the company. He joined CA as senior vice president and general manager of CA's Enterprise Systems Management (ESM) business unit and managed the product portfolio for infrastructure and data management. Prior to joining CA in April of 2005, Al was senior vice president and CTO of Novell, where he was the innovator behind the company's moves into open source and identity-driven solutions. As consulting CTO for BellSouth, he led the corporate initiative to consolidate and transform all of BellSouth's disparate customer and operational data into a single data instance.

Al is the independent member of the Board of Directors of Adaptive Computing in Provo, UT, chairman of the advisory board of SpaceCurve in Seattle, WA, and a member of the advisory board of N-of-One in Waltham, MA. He is a frequent writer on business and technology topics and has shared his thoughts and expertise at many industry events throughout the years.

He is an instrument rated private pilot and has played professional poker for the past three decades. In his sparse spare time he enjoys rebuilding older American muscle cars and motorcycles, collecting antiquarian books, and epicurean cooking, and he has a passion for cellaring American and Italian wines.

Fern Halper, PhD, is a Fellow with Hurwitz & Associates and Director of TDWI Research for Advanced Analytics. She has more than 20 years of experience in data analysis, business analysis, and strategy development. Fern has published numerous articles on data analysis and advanced analytics.
She has done extensive research, writing, and speaking on the topic of predictive analytics and text analytics. Fern publishes a regular technology blog. She has held key positions at AT&T Bell Laboratories and Lucent Technologies, where she was responsible for developing innovative data analysis systems as well as developing strategy and product-line plans for Internet businesses. Fern has taught courses in information technology at several universities. She received her BA from Colgate University and her PhD from Texas A&M University.

Fern is a co-author on four retail For Dummies titles including Hybrid Cloud For Dummies (John Wiley & Sons, Inc., 2012), Cloud Computing For Dummies (John Wiley & Sons, Inc., 2010), Service Oriented Architecture For Dummies, 2nd Edition, and Service Management For Dummies (both John Wiley & Sons, Inc., 2009). She is also a co-author on many custom published For Dummies titles including Cloud For Dummies, IBM Midsize Company Limited Edition (John Wiley & Sons, Inc., 2011), Platform as a Service For Dummies, CloudBees Special Edition (John Wiley & Sons, Inc., 2012), and Information on Demand For Dummies, IBM Limited Edition (John Wiley & Sons, Inc., 2008).

Marcia A. Kaufman is a founding partner and COO of Hurwitz & Associates, a research and consulting firm focused on emerging technology, including cloud computing, big data, analytics, software development, service management, and security and governance. She has written extensively on the business value of virtualization and cloud computing, with an emphasis on evolving cloud infrastructure and business models, data-encryption and end-point security, and online transaction processing in cloud environments. Marcia has more than 20 years of experience in business strategy, industry research, distributed software, software quality, information management, and analytics. Marcia has worked within the financial services, manufacturing, and services industries. During her tenure at Data Resources, Inc.
(DRI), she developed sophisticated industry models and forecasts. She holds an AB from Connecticut College in mathematics and economics and an MBA from Boston University.

Marcia is a co-author on five retail For Dummies titles including Hybrid Cloud For Dummies (John Wiley & Sons, Inc., 2012), Cloud Computing For Dummies (John Wiley & Sons, Inc., 2010), Service Oriented Architecture For Dummies, 2nd Edition, and Service Management For Dummies (both John Wiley & Sons, Inc., 2009). She is also a co-author on many custom published For Dummies titles including Platform as a Service For Dummies, CloudBees Special Edition (John Wiley & Sons, Inc., 2012), Cloud For Dummies, IBM Midsize Company Limited Edition (John Wiley & Sons, Inc., 2011), Private Cloud For Dummies, IBM Limited Edition (2011), and Information on Demand For Dummies (2008) (both John Wiley & Sons, Inc.).

Dedication

Judith dedicates this book to her husband, Warren, her children, Sara and David, and her mother, Elaine. She also dedicates this book in memory of her father, David.

Alan dedicates this book to his wife Jane for all her love and support; his three children Chris, Jeff, and Greg; and the memory of his parents who started him on this journey.

Fern dedicates this book to her husband, Clay, daughters, Katie and Lindsay, and her sister Adrienne.

Marcia dedicates this book to her husband, Matthew, her children, Sara and Emily, and her parents, Gloria and Larry.