This guide is ideal if you're a professional, manager, or student who wants practical knowledge of analyzing data, without having to get a PhD in statistics. It's also good for people who have a PhD in statistics, but may not know how to write programs that apply statistical methods to real data.

Hands-On Programming with R
Garrett Grolemund

Beijing · Cambridge · Farnham · Köln · Sebastopol · Tokyo
O'REILLY

Hands-On Programming with R
by Garrett Grolemund

Copyright © 2014 Garrett Grolemund. All rights reserved.
Printed in the United States of America.
Published by O'Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.

O'Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://my.safaribooksonline.com). For more information, contact our corporate/institutional sales department: 800-998-9938 or corporate@oreilly.com.

Editors: Julie Steele and Courtney Nash
Production Editor: Matthew Hacker
Copyeditor: Eliahu Sussman
Proofreader: Amanda Kersey
Indexer: Judith McConville
Cover Designer: Randy Comer
Interior Designer: David Futato
Illustrator: Rebecca Demarest

July 2014: First Edition

Revision History for the First Edition:
2014-07-08: First release

See http://oreilly.com/catalog/errata.csp?isbn=9781449359010 for release details.

Nutshell Handbook, the Nutshell Handbook logo, and the O'Reilly logo are registered trademarks of O'Reilly Media, Inc. Hands-On Programming with R, the picture of an orange-winged Amazon parrot, and related trade dress are trademarks of O'Reilly Media, Inc.

Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks.
Where those designations appear in this book, and O'Reilly Media, Inc. was aware of a trademark claim, the designations have been printed in caps or initial caps.

While every precaution has been taken in the preparation of this book, the publisher and authors assume no responsibility for errors or omissions, or for damages resulting from the use of the information contained herein.

ISBN: 978-1-449-35901-0

Table of Contents

Foreword
Preface

Part I. Project 1: Weighted Dice

1. The Very Basics
    The R User Interface
    Objects
    Functions
    Sample with Replacement
    Writing Your Own Functions
    The Function Constructor
    Arguments
    Scripts
    Summary

2. Packages and Help Pages
    Packages
        install.packages
        library
    Getting Help with Help Pages
        Parts of a Help Page
        Getting More Help
    Summary
    Project 1 Wrap-up

Part II. Project 2: Playing Cards

3. R Objects
    Atomic Vectors
        Doubles
        Integers
        Characters
        Logicals
        Complex and Raw
    Attributes
        Names
        Dim
    Arrays
    Class
        Dates and Times
        Factors
    Coercion
    Lists
    Data Frames
    Loading Data
    Saving Data
    Summary

4. R Notation
    Selecting Values
        Positive Integers
        Negative Integers
        Blank Spaces
        Logical Values
        Names
    Deal a Card
    Shuffle the Deck
    Dollar Signs and Double Brackets
    Summary

5. Modifying Values
    Changing Values in Place
    Logical Subsetting
        Logical Tests
        Boolean Operators
    Missing Information
        na.rm
        is.na
    Summary

6. Environments
    Environments
    Working with Environments
        The Active Environment
    Scoping Rules
    Assignment
    Evaluation
    Closures
    Summary
    Project 2 Wrap-up

Part III. Project 3: Slot Machine

7. Programs
    Sequential Steps
    Parallel Cases
    if Statements
    else Statements
    Lookup Tables
    Code Comments
    Summary

8. S3
    The S3 System
    Attributes
    Generic Functions
    Methods
        Method Dispatch
    Classes
    S3 and Debugging
    S4 and R5
    Summary

9. Loops
    Expected Values
    expand.grid
    for Loops
    while Loops
    repeat Loops
    Summary

10. Speed
    Vectorized Code
    How to Write Vectorized Code
    How to Write Fast for Loops in R
    Vectorized Code in Practice
        Loops Versus Vectorized Code
    Summary
    Project 3 Wrap-up

A. Installing R and RStudio
B. Installing and Loading R Packages
C. Updating R and Its Packages
D. Loading and Saving Data in R
E. Debugging R Code

Index

Foreword

Learning to program is important if you're serious about understanding data. There's no argument that data science must be performed on a computer, but you have a choice between learning a graphical user interface (GUI) or a programming language. Both Garrett and I strongly believe that programming is a vital skill for everyone who works intensely with data. While convenient, a GUI is ultimately limiting, because it hampers three properties essential for good data analysis:

Reproducibility
    The ability to re-create a past analysis, which is crucial for good science.

Automation
    The ability to rapidly re-create an analysis when data changes (as it always does).

Communication
    Code is just text, so it is easy to communicate. When learning, this makes it easy to get help, whether it's with email, Google, Stack Overflow, or elsewhere.

Don't be afraid of programming! Anyone can learn to program with the right motivation, and this book is organized to keep you motivated. This is not a reference book; instead, it's structured around three hands-on challenges. Mastering these challenges will lead you through the basics of R programming and even into some intermediate topics, such as vectorized code, scoping, and S3 methods. Real challenges are a great way to learn, because you're not memorizing functions void of context; instead, you're learning functions as you need them to solve a real problem. You'll learn by doing, not by reading.

As you learn to program, you are going to get frustrated. You are learning a new language, and it will take time to become fluent.
But frustration is not just natural, it's actually a positive sign that you should watch for. Frustration is your brain's way of being lazy; it's trying to get you to quit and go do something easy or fun. If you want to get physically fitter, you need to push your body even though it complains. If you want to get better at programming, you'll need to push your brain. Recognize when you get frustrated and see it as a good thing: you're now stretching yourself. Push yourself a little further every day, and you'll soon be a confident programmer.

Hands-On Programming with R is friendly, conversational, and active. It's the next-best thing to learning R programming from me or Garrett in person. I hope you enjoy reading it as much as I have.

Hadley Wickham
Chief Scientist, RStudio

P.S. Garrett is too modest to mention it, but his lubridate package makes working with dates or times in R much less painful. Check it out!