Apache Hadoop YARN, Complete Edition (English)
YARN is the resource management system for Hadoop clusters. Hadoop 2.0 thoroughly redesigned the MapReduce framework; MapReduce in Hadoop 2.0 is referred to as MRv2 or YARN.

To obtain permission to use material from this work, please submit a written request to Pearson Education, Inc., Permissions Department, One Lake Street, Upper Saddle River, New Jersey 07458, or you may fax your request to (201) 236-3290.

ISBN-13: 978-0-321-93450-5
ISBN-10: 0-321-93450-4

Text printed in the United States on recycled paper at RR Donnelley in Crawfordsville, Indiana.
First printing, March 2014

Contents

Foreword by Raymie Stata
Foreword by Paul Dix
Preface
Acknowledgments
About the Authors

1 Apache Hadoop YARN: A Brief History and Rationale
    Introduction
    Apache Hadoop
    Phase 0: The Era of Ad Hoc Clusters
    Phase 1: Hadoop on Demand
        HDFS in the HOD World
        Features and Advantages of HOD
        Shortcomings of Hadoop on Demand
    Phase 2: Dawn of the Shared Compute Clusters
        Evolution of Shared Clusters
        Issues with Shared MapReduce Clusters
    Phase 3: Emergence of YARN
    Conclusion

2 Apache Hadoop YARN Install Quick Start
    Getting Started
    Steps to Configure a Single-Node YARN Cluster
        Step 1: Download Apache Hadoop
        Step 2: Set JAVA_HOME
        Step 3: Create Users and Groups
        Step 4: Make Data and Log Directories
        Step 5: Configure core-site.xml
        Step 6: Configure hdfs-site.xml
        Step 7: Configure mapred-site.xml
        Step 8: Configure yarn-site.xml
        Step 9: Modify Java Heap Size
        Step 10: Format HDFS
        Step 11: Start the HDFS Services
        Step 12: Start YARN Services
        Step 13: Verify the Running Services Using the Web Interface
    Run Sample MapReduce Examples
    Wrap-up

3 Apache Hadoop YARN Core Concepts
    Beyond MapReduce
        The MapReduce Paradigm
    Apache Hadoop MapReduce
        The Need for Non-MapReduce Workloads
        Addressing Scalability
        Improved Utilization
        User Agility
    Apache Hadoop YARN
    YARN Components
        ResourceManager
        ApplicationMaster
        Resource Model
        ResourceRequests and Containers
        Container Specification
    Wrap-up

4 Functional Overview of YARN Components
    Architecture Overview
    ResourceManager
    YARN Scheduling Components
        FIFO Scheduler
        Capacity Scheduler
        Fair Scheduler
    Containers
    NodeManager
    ApplicationMaster
    YARN Resource Model
        Client Resource Request
        ApplicationMaster Container Allocation
        ApplicationMaster-Container Manager Communication
    Managing Application Dependencies
        LocalResources Definitions
        LocalResource Timestamps
        LocalResource Types
        LocalResource Visibilities
        Lifetime of LocalResources
    Wrap-up

5 Installing Apache Hadoop YARN
    The Basics
    System Preparation
        Step 1: Install EPEL and pdsh
        Step 2: Generate and Distribute ssh Keys
    Script-based Installation of Hadoop 2
        JDK Options
        Step 1: Download and Extract the Scripts
        Step 2: Set the Script Variables
        Step 3: Provide Node Names
        Step 4: Run the Script
        Step 5: Verify the Installation
    Script-based Uninstall
    Configuration File Processing
    Configuration File Settings
        core-site.xml
        hdfs-site.xml
        mapred-site.xml
        yarn-site.xml
    Start-up Scripts
    Installing Hadoop with Apache Ambari
    Performing an Ambari-based Hadoop Installation
        Step 1: Check Requirements
        Step 2: Install the Ambari Server
        Step 3: Install and Start Ambari Agents
        Step 4: Start the Ambari Server
        Step 5: Install an HDP2.X Cluster

6 Apache Hadoop YARN Administration
    Script-based Configuration
    Monitoring Cluster Health: Nagios
        Monitoring Basic Hadoop Services
        Monitoring the JVM
    Real-time Monitoring: Ganglia
    Administration with Ambari
    JVM Analysis
    Basic YARN Administration
        YARN Administrative Tools
        Adding and Decommissioning YARN Nodes
        Capacity Scheduler Configuration
        YARN WebProxy
        Using the JobHistory Server
        Refreshing User-to-Groups Mappings
        Refreshing Superuser Proxy Groups Mappings
        Refreshing ACLs for Administration of ResourceManager
        Reloading the Service-level Authorization Policy File
    Managing YARN Jobs
        Setting Container Memory
        Setting Container Cores
        Setting MapReduce Properties
    User Log Management
    Wrap-up

7 Apache Hadoop YARN Architecture Guide
    ResourceManager
        Overview of the ResourceManager Components
        Client Interaction with the ResourceManager
        Application Interaction with the ResourceManager
        Interaction of Nodes with the ResourceManager
        Core ResourceManager Components
        Security-related Components in the ResourceManager
    NodeManager
        Overview of the NodeManager Components
        NodeManager Components
        NodeManager Security Components
        Important NodeManager Functions
    ApplicationMaster
        Overview
        Liveliness
        Resource Requirements
        Scheduling Protocol and Locality
        Launching Containers
        Completed Containers
        ApplicationMaster Failures and Recovery
        Coordination and Output Commit
        Information for Clients
        Security
        Cleanup on ApplicationMaster Exit
    YARN Containers
        Container Environment
        Communication with the ApplicationMaster
    Summary for Application-writers
    Wrap-up

8 Capacity Scheduler in YARN
    Introduction to the Capacity Scheduler
        Elasticity with Multitenancy
        Resource Awareness
        Granular Scheduling
        Locality
        Scheduling Policies
    Capacity Scheduler Configuration