Apache Hadoop YARN: Complete Edition (English)
YARN is the resource management system for Hadoop clusters. Hadoop 2.0 thoroughly redesigned the MapReduce framework; MapReduce in Hadoop 2.0 is referred to as MRv2 or YARN.

The Addison-Wesley Data and Analytics Series

Visit informit.com/awdataseries for a complete list of available publications.

The Addison-Wesley Data and Analytics Series provides readers with practical knowledge for solving problems and answering questions with data. Titles in this series primarily focus on three areas:

1. Infrastructure: how to store, move, and manage data
2. Algorithms: how to mine intelligence or make predictions based on data
3. Visualizations: how to represent data and insights in a meaningful and compelling way

The series aims to tie all three of these areas together to help the reader build end-to-end systems for fighting spam; making recommendations; building personalization; detecting trends, patterns, or problems; and gaining insight from the data exhaust of systems and user interactions.

Make sure to connect with us: informit.com/socialconnect

Apache Hadoop YARN
Moving beyond MapReduce and Batch Processing with Apache Hadoop

Arun C. Murthy
Vinod Kumar Vavilapalli
Doug Eadline
Joseph Niemiec
Jeff Markham

Addison-Wesley
Upper Saddle River, NJ · Boston · Indianapolis · San Francisco · New York · Toronto · Montreal · London · Munich · Paris · Madrid · Capetown · Sydney · Tokyo · Singapore · Mexico City

Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and the publisher was aware of a trademark claim, the designations have been printed with initial capital letters or in all capitals.

The authors and publisher have taken care in the preparation of this book, but make no expressed or implied warranty of any kind and assume no responsibility for errors or omissions. No liability is assumed for incidental or consequential damages in connection with or arising out of the use of the information or programs contained herein.

For information about buying this title in bulk quantities, or for special sales opportunities (which may include electronic versions; custom cover designs; and content particular to your business, training goals, marketing focus, or branding interests), please contact our corporate sales department at corpsales@pearsoned.com or (800) 382-3419.

For government sales inquiries, please contact governmentsales@pearsoned.com.

For questions about sales outside the United States, please contact international@pearsoned.com.

Visit us on the Web: informit.com/aw

Library of Congress Cataloging-in-Publication Data
Murthy, Arun C.
Apache Hadoop YARN: moving beyond MapReduce and batch processing with Apache Hadoop 2 / Arun C. Murthy, Vinod Kumar Vavilapalli, Doug Eadline, Joseph Niemiec, Jeff Markham.
pages cm
Includes index.
ISBN 978-0-321-93450-5 (pbk. : alk. paper)
1. Apache Hadoop. 2. Electronic data processing--Distributed processing. I. Title.
QA76.9.D5M97 2014
004.36-dc23
2014003391

Copyright 2014 Hortonworks Inc.

Apache, Apache Hadoop, Hadoop, and the Hadoop elephant logo are trademarks of The Apache Software Foundation. Used with permission. No endorsement by The Apache Software Foundation is implied by the use of these marks.

Hortonworks is a trademark of Hortonworks, Inc., registered in the U.S. and other countries.

All rights reserved. Printed in the United States of America.
This publication is protected by copyright, and permission must be obtained from the publisher prior to any prohibited reproduction, storage in a retrieval system, or transmission in any form or by any means, electronic, mechanical, photocopying, recording, or likewise. To obtain permission to use material from this work, please submit a written request to Pearson Education, Inc., Permissions Department, One Lake Street, Upper Saddle River, New Jersey 07458, or you may fax your request to (201) 236-3290.

ISBN-13: 978-0-321-93450-5
ISBN-10: 0-321-93450-4

Text printed in the United States on recycled paper at RR Donnelley in Crawfordsville, Indiana.
First printing, March 2014

Contents

Foreword by Raymie Stata
Foreword by Paul Dix
Preface
Acknowledgments
About the Authors

1 Apache Hadoop YARN: A Brief History and Rationale
  Introduction
  Apache Hadoop
  Phase 0: The Era of Ad Hoc Clusters
  Phase 1: Hadoop on Demand
  HDFS in the HOD World
  Features and Advantages of HOD
  Shortcomings of Hadoop on Demand
  Phase 2: Dawn of the Shared Compute Clusters
  Evolution of Shared Clusters
  Issues with Shared MapReduce Clusters
  Phase 3: Emergence of YARN
  Conclusion

2 Apache Hadoop YARN Install Quick Start
  Getting Started
  Steps to Configure a Single-Node YARN Cluster
  Step 1: Download Apache Hadoop
  Step 2: Set JAVA_HOME
  Step 3: Create Users and Groups
  Step 4: Make Data and Log Directories
  Step 5: Configure core-site.xml
  Step 6: Configure hdfs-site.xml
  Step 7: Configure mapred-site.xml
  Step 8: Configure yarn-site.xml
  Step 9: Modify Java Heap Size
  Step 10: Format HDFS
  Step 11: Start the HDFS Services
  Step 12: Start YARN Services
  Step 13: Verify the Running Services Using the Web Interface
  Run Sample MapReduce Examples
  Wrap-up

3 Apache Hadoop YARN Core Concepts
  Beyond MapReduce
  The MapReduce Paradigm
  Apache Hadoop MapReduce
  The Need for Non-MapReduce Workloads
  Addressing Scalability
  Improved Utilization
  User Agility
  Apache Hadoop YARN
  YARN Components
  ResourceManager
  ApplicationMaster
  Resource Model
  ResourceRequests and Containers
  Container Specification
  Wrap-up

4 Functional Overview of YARN Components
  Architecture Overview
  ResourceManager
  YARN Scheduling Components
  FIFO Scheduler
  Capacity Scheduler
  Fair Scheduler
  Containers
  NodeManager
  ApplicationMaster
  YARN Resource Model
  Client Resource Request
  ApplicationMaster Container Allocation
  ApplicationMaster-Container Manager Communication
  Managing Application Dependencies
  LocalResources Definitions
  LocalResource Timestamps
  LocalResource Types
  LocalResource Visibilities
  Lifetime of LocalResources
  Wrap-up

5 Installing Apache Hadoop YARN
  The Basics
  System Preparation
  Step 1: Install EPEL and pdsh
  Step 2: Generate and Distribute ssh Keys
  Script-based Installation of Hadoop 2
  JDK Options
  Step 1: Download and Extract the Scripts
  Step 2: Set the Script Variables
  Step 3: Provide Node Names
  Step 4: Run the Script
  Step 5: Verify the Installation
  Script-based Uninstall
  Configuration File Processing
  Configuration File Settings
  core-site.xml
  hdfs-site.xml
  mapred-site.xml
  yarn-site.xml
  Start-up Scripts
  Installing Hadoop with Apache Ambari
  Performing an Ambari-based Hadoop Installation
  Step 1: Check Requirements
  Step 2: Install the Ambari Server
  Step 3: Install and Start Ambari Agents
  Step 4: Start the Ambari Server
  Step 5: Install an HDP2.X Cluster

6 Apache Hadoop YARN Administration
  Script-based Configuration
  Monitoring Cluster Health: Nagios
  Monitoring Basic Hadoop Services
  Monitoring the JVM
  Real-time Monitoring: Ganglia
  Administration with Ambari
  JVM Analysis
  Basic YARN Administration
  YARN Administrative Tools
  Adding and Decommissioning YARN Nodes
  Capacity Scheduler Configuration
  YARN WebProxy
  Using the JobHistory Server
  Refreshing User-to-Groups Mappings
  Refreshing Superuser Proxy Groups Mappings
  Refreshing ACLs for Administration of ResourceManager
  Reloading the Service-level Authorization Policy File
  Managing YARN Jobs
  Setting Container Memory
  Setting Container Cores
  Setting MapReduce Properties
  User Log Management
  Wrap-up

7 Apache Hadoop YARN Architecture Guide
  ResourceManager
  Overview of the ResourceManager Components
  Client Interaction with the ResourceManager
  Application Interaction with the ResourceManager
  Interaction of Nodes with the ResourceManager
  Core ResourceManager Components
  Security-related Components in the ResourceManager
  NodeManager
  Overview of the NodeManager Components
  NodeManager Components
  NodeManager Security Components
  Important NodeManager Functions
  ApplicationMaster
  Overview
  Liveliness
  Resource Requirements
  Scheduling Protocol and Locality
  Launching Containers
  Completed Containers
  ApplicationMaster Failures and Recovery
  Coordination and Output Commit
  Information for Clients
  Security
  Cleanup on ApplicationMaster Exit
  YARN Containers
  Container Environment
  Communication with the ApplicationMaster
  Summary for Application-writers
  Wrap-up

8 Capacity Scheduler in YARN
  Introduction to the Capacity Scheduler
  Elasticity with Multitenancy
  Resource Awareness
  Granular Scheduling
  Locality
  Scheduling Policies
  Capacity Scheduler Configuration