Python Data Science Handbook
This is a book about doing data science with Python, which immediately begs the question: what is data science? It’s a surprisingly hard definition to nail down, especially given how ubiquitous the term has become. Vocal critics have variously dismissed the term as a superfluous label (after all …

Python Data Science Handbook
Essential Tools for Working with Data
Jake VanderPlas
Beijing · Boston · Farnham · Sebastopol · Tokyo
O’REILLY

Python Data Science Handbook
by Jake VanderPlas
Copyright © 2017 Jake VanderPlas. All rights reserved.
Printed in the United States of America.
Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.
O’Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://oreilly.com/safari). For more information, contact our corporate/institutional sales department: 800-998-9938 or corporate@oreilly.com.

Editor: Dawn Schanafelt
Indexer: WordCo Indexing Services, Inc.
Production Editor: Kristen Brown
Interior Designer: David Futato
Copyeditor: Jasmine Kwityn
Cover Designer: Karen Montgomery
Proofreader: Rachel Monaghan
Illustrator: Rebecca Demarest

December 2016: First Edition

Revision History for the First Edition
2016-11-17: First Release
See http://oreilly.com/catalog/errata.csp?isbn=9781491912058 for release details.

The O’Reilly logo is a registered trademark of O’Reilly Media, Inc. Python Data Science Handbook, the cover image, and related trade dress are trademarks of O’Reilly Media, Inc.

While the publisher and the author have used good faith efforts to ensure that the information and instructions contained in this work are accurate, the publisher and the author disclaim all responsibility for errors or omissions, including without limitation responsibility for damages resulting from the use of or reliance on this work. Use of the information and instructions contained in this work is at your own risk.
If any code samples or other technology this work contains or describes is subject to open source licenses or the intellectual property rights of others, it is your responsibility to ensure that your use thereof complies with such licenses and/or rights.

978-1-491-91205-8

Table of Contents

Preface

1. IPython: Beyond Normal Python
  Shell or Notebook?
    Launching the IPython Shell
    Launching the Jupyter Notebook
  Help and Documentation in IPython
    Accessing Documentation with ?
    Accessing Source Code with ??
    Exploring Modules with Tab Completion
  Keyboard Shortcuts in the IPython Shell
    Navigation Shortcuts
    Text Entry Shortcuts
    Command History Shortcuts
    Miscellaneous Shortcuts
  IPython Magic Commands
    Pasting Code Blocks: %paste and %cpaste
    Running External Code: %run
    Timing Code Execution: %timeit
    Help on Magic Functions: ?, %magic, and %lsmagic
  Input and Output History
    IPython’s In and Out Objects
    Underscore Shortcuts and Previous Outputs
    Suppressing Output
    Related Magic Commands
  IPython and Shell Commands
    Quick Introduction to the Shell
    Shell Commands in IPython
    Passing Values to and from the Shell
    Shell-Related Magic Commands
  Errors and Debugging
    Controlling Exceptions: %xmode
    Debugging: When Reading Tracebacks Is Not Enough
  Profiling and Timing Code
    Timing Code Snippets: %timeit and %time
    Profiling Full Scripts: %prun
    Line-by-Line Profiling with %lprun
    Profiling Memory Use: %memit and %mprun
  More IPython Resources
    Web Resources
    Books

2. Introduction to NumPy
  Understanding Data Types in Python
    A Python Integer Is More Than Just an Integer
    A Python List Is More Than Just a List
    Fixed-Type Arrays in Python
    Creating Arrays from Python Lists
    Creating Arrays from Scratch
    NumPy Standard Data Types
  The Basics of NumPy Arrays
    NumPy Array Attributes
    Array Indexing: Accessing Single Elements
    Array Slicing: Accessing Subarrays
    Reshaping of Arrays
    Array Concatenation and Splitting
  Computation on NumPy Arrays: Universal Functions
    The Slowness of Loops
    Introducing UFuncs
    Exploring NumPy’s UFuncs
    Advanced Ufunc Features
    Ufuncs: Learning More
  Aggregations: Min, Max, and Everything in Between
    Summing the Values in an Array
    Minimum and Maximum
    Example: What Is the Average Height of US Presidents?
  Computation on Arrays: Broadcasting
    Introducing Broadcasting
    Rules of Broadcasting
    Broadcasting in Practice
  Comparisons, Masks, and Boolean Logic
    Example: Counting Rainy Days
    Comparison Operators as ufuncs
    Working with Boolean Arrays
    Boolean Arrays as Masks
  Fancy Indexing
    Exploring Fancy Indexing
    Combined Indexing
    Example: Selecting Random Points
    Modifying Values with Fancy Indexing
    Example: Binning Data
  Sorting Arrays
    Fast Sorting in NumPy: np.sort and np.argsort
    Partial Sorts: Partitioning
    Example: k-Nearest Neighbors
  Structured Data: NumPy’s Structured Arrays
    Creating Structured Arrays
    More Advanced Compound Types
    Record Arrays: Structured Arrays with a Twist
  On to Pandas

3. Data Manipulation with Pandas
  Installing and Using Pandas
  Introducing Pandas Objects
    The Pandas Series Object
    The Pandas DataFrame Object
    The Pandas Index Object
  Data Indexing and Selection
    Data Selection in Series
    Data Selection in DataFrame
  Operating on Data in Pandas
    Ufuncs: Index Preservation
    UFuncs: Index Alignment
    Ufuncs: Operations Between DataFrame and Series
  Handling Missing Data
    Trade-Offs in Missing Data Conventions
    Missing Data in Pandas
    Operating on Null Values
  Hierarchical Indexing
    A Multiply Indexed Series
    Methods of MultiIndex Creation
    Indexing and Slicing a MultiIndex
    Rearranging Multi-Indices
    Data Aggregations on Multi-Indices
  Combining Datasets: Concat and Append
    Recall: Concatenation of NumPy Arrays
    Simple Concatenation with pd.concat
  Combining Datasets: Merge and Join
    Relational Algebra
    Categories of Joins
    Specification of the Merge Key
    Specifying Set Arithmetic for Joins
    Overlapping Column Names: The suffixes Keyword
    Example: US States Data
  Aggregation and Grouping
    Planets Data
    Simple Aggregation in Pandas
    GroupBy: Split, Apply, Combine
  Pivot Tables
    Motivating Pivot Tables
    Pivot Tables by Hand
    Pivot Table Syntax
    Example: Birthrate Data
  Vectorized String Operations
    Introducing Pandas String Operations
    Tables of Pandas String Methods
    Example: Recipe Database
  Working with Time Series
    Dates and Times in Python
    Pandas Time Series: Indexing by Time
    Pandas Time Series Data Structures
    Frequencies and Offsets
    Resampling, Shifting, and Windowing
    Where to Learn More
    Example: Visualizing Seattle Bicycle Counts
  High-Performance Pandas: eval() and query()
    Motivating query() and eval(): Compound Expressions
    pandas.eval() for Efficient Operations
    DataFrame.eval() for Column-Wise Operations
    DataFrame.query() Method
    Performance: When to Use These Functions
  Further Resources

4. Visualization with Matplotlib
  General Matplotlib Tips
    Importing matplotlib
    Setting Styles
    show() or No show()? How to Display Your Plots
    Saving Figures to File
  Two Interfaces for the Price of One
  Simple Line Plots
    Adjusting the Plot: Line Colors and Styles
    Adjusting the Plot: Axes Limits
    Labeling Plots
  Simple Scatter Plots
    Scatter Plots with plt.plot
    Scatter Plots with plt.scatter
    plot Versus scatter: A Note on Efficiency
  Visualizing Errors
    Basic Errorbars
    Continuous Errors
  Density and Contour Plots
    Visualizing a Three-Dimensional Function
  Histograms, Binnings, and Density
    Two-Dimensional Histograms and Binnings
  Customizing Plot Legends
    Choosing Elements for the Legend
    Legend for Size of Points
    Multiple Legends
  Customizing Colorbars
    Customizing Colorbars
    Example: Handwritten Digits
  Multiple Subplots
    plt.axes: Subplots by Hand
    plt.subplot: Simple Grids of Subplots
    plt.subplots: The Whole Grid in One Go
    plt.GridSpec: More Complicated Arrangements
  Text and Annotation
    Example: Effect of Holidays on US Births
    Transforms and Text Position
    Arrows and Annotation
  Customizing Ticks
    Major and Minor Ticks
    Hiding Ticks or Labels
    Reducing or Increasing the Number of Ticks
    Fancy Tick Formats
    Summary of Formatters and Locators
  Customizing Matplotlib: Configurations and Stylesheets
    Plot Customization by Hand
    Changing the Defaults: rcParams
    Stylesheets
  Three-Dimensional Plotting in Matplotlib
    Three-Dimensional Points and Lines
    Three-Dimensional Contour Plots
    Wireframes and Surface Plots
    Surface Triangulations
  Geographic Data with Basemap
    Map Projections
    Drawing a Map Background
    Plotting Data on Maps
    Example: California Cities
    Example: Surface Temperature Data
  Visualization with Seaborn
    Seaborn Versus Matplotlib
    Exploring Seaborn Plots
    Example: Exploring Marathon Finishing Times
  Further Resources
    Matplotlib Resources
    Other Python Graphics Libraries

5. Machine Learning
  What Is Machine Learning?
    Categories of Machine Learning
    Qualitative Examples of Machine Learning Applications
    Summary
  Introducing Scikit-Learn
    Data Representation in Scikit-Learn
    Scikit-Learn’s Estimator API
    Application: Exploring Handwritten Digits
    Summary
  Hyperparameters and Model Validation
    Thinking About Model Validation
    Selecting the Best Model
    Learning Curves
    Validation in Practice: Grid Search
    Summary
  Feature Engineering