Apache Spark Graph Processing PDF

This book is intended to present the GraphX library for Apache Spark and to teach the fundamental techniques and recipes to process graph data at scale. It is intended to be a self-study, step-by-step guide for anyone new to Spark with an interest in or need for large-scale graph processing.

Apache Spark Graph Processing

Copyright © 2015 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing and its dealers and distributors, will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

First published: September 2015

Production reference: 1040915

Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham B3 2PB, UK.

ISBN 978-1-78439-180-5

www.packtpub.com

Credits

Author
Rindra Ramamonjison

Reviewers
Thomas W. Dinsmore
Ryan McCune
Francoise Provencher

Commissioning Editor
Amit Ghodke

Acquisition Editor
Larissa Pinto

Content Development Editor
Dharmesh Parmar

Technical Editor
Prajakta Mhatre

Copy Editor
Yesha Gangani

Project Coordinator
Nikhil Nair

Proofreader
Safis Editing

Indexer
Tejal Soni

Production Coordinator
Aparna Bhagat

Cover Work

Foreword

Apache Spark is one of the most compelling technologies in the big data space, and for good reason.
It allows data scientists and data engineers alike to work in their language of choice (Java, Scala, Python, SQL, and R as of this writing) to make sense of their data. As Reynold Xin noted, Apache Spark is the Swiss Army knife of big data analytics tools. It allows you to use one tool to do many things, from real-time streaming to advanced analytics. And in no small part, the versatility and power of GraphX has helped Spark propel forward.

Apache Spark Graph Processing follows Rindra's journey into solving complex analytics problems. As a PhD graduate in electrical engineering from the University of British Columbia, he focused on applying learning and optimization algorithms to achieve energy-efficient wireless networks. As he dove further into these problems, he realized the ease with which he could solve graph-processing problems by using Apache Spark GraphX. With a tutorial style and hands-on projects with interesting datasets, this book is a reflection of his path from getting started with Apache Spark GraphX to iterative graph-parallel processing to learning graph structures.

This book is a great jump-start into GraphX, a practical guide for large-scale graph processing, and a testament to the author's enthusiasm for the Spark community (and the community as a whole).

Denny Lee
Technology Evangelist, Databricks
Advisor, Wearhacks

About the Author

Rindra Ramamonjison is a fourth-year PhD student of electrical engineering at the University of British Columbia, Vancouver. He received his master's degree from Tokyo Institute of Technology. He has played various roles in many engineering companies within the telecom and finance industries. His primary research interests are machine learning, optimization, graph processing, and statistical signal processing. Rindra is also the co-organizer of the Vancouver Spark Meetup.

About the Reviewer

Thomas W. Dinsmore is a consultant and author with more than 30 years of service to enterprises around the world.
He is an expert in business analytics and has working experience with the leading analytic tools, languages, and databases. In his practice, Thomas helps organizations streamline analytics for improved performance and time to value.

Previously, Thomas served with The Boston Consulting Group, IBM, PricewaterhouseCoopers, and SAS, as well as several startups.

Thomas coauthored Modern Analytics Methodologies and Advanced Analytics Methodologies, published in 2014 by FT Press. He is currently under contract to publish a book on disruptive technologies in business analytics, scheduled for publication in Q2 2016.

I would like to thank the entire editorial and production team at Packt Publishing, who work tirelessly to bring quality books to the public.

www.PacktPub.com

Support files, eBooks, discount offers, and more

For support files and downloads related to your book, please visit www.PacktPub.com.

Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com, and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at service@packtpub.com for more details.

At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters, and receive exclusive discounts and offers on Packt books and eBooks.

PacktLib
https://www2.packtpub.com/books/subscription/packtlib

Do you need instant solutions to your IT questions? PacktLib is Packt's online digital book library.
Here, you can search, access, and read Packt's entire library of books.

Why subscribe?
- Fully searchable across every book published by Packt
- Copy and paste, print, and bookmark content
- On demand and accessible via a web browser

Free access for Packt account holders

If you have an account with Packt at www.PacktPub.com, you can use this to access PacktLib today and view 9 entirely free books. Simply use your login credentials for immediate access.

Table of Contents

Preface
Chapter 1: Getting Started with Spark and GraphX
  Downloading and installing Spark 1.4.1
  Experimenting with the Spark shell
  Getting started with GraphX
  Building a tiny social network
  Loading the data
  The property graph
  Transforming RDDs to VertexRDD and EdgeRDD
  Introducing graph operations
  Building and submitting a standalone application
  Writing and configuring a Spark program
  Building the program with the Scala Build Tool
  Deploying and running with spark-submit
  Summary
Chapter 2: Building and Exploring Graphs
  Network datasets
  The communication network
  Flavor networks
  Social ego networks
  Graph builders
  The graph factory method
  edgeListFile
  fromEdges
  fromEdgeTuples
  Building graphs
  Building directed graphs
  Building a bipartite graph
  Building a weighted social ego network
