High Performance Python.pdfHigh Performance PythonMicha gorelick and lan ozsvaldBeijing· Cambridge.mham·Kh· Sebastopol. TokyoOREILLYHigh Performance Pythonby Micha gorelick and lan OzsvaldCopyright@ 2014 Micha Gorelick and Ian Ozsvald. All rights reservedPrinted in the United States of americaPublished by o reilly media, InC., 1005 Gravenstein Highway North, Sebastopol, CA 95472OReilly books may be purchased for educational, business, or sales promotional use. Online editions arealsoavailableformosttitles(http://safaribooksonline.com/).Formoreinformationcontactourcorporateinstitutionalsalesdepartment800-998-9938orcorporate@oreilly.comEditors: Meghan Blanchette and Rachel Roumeliotis Indexer: Wendy CatalanoProduction editor: matthew hackerCover Designer: Karen MontgomeryCopyeditor: Rachel HeadInterior Designer: David FutatoProofreader: Rachel Monaghanstrator: Rebecca demarestSeptember 2014: First EditionRevision History for the First Edition:2014-08-21: First releaseSeehttp://oreilly.com/catalog/errata.csp?isbn=9781449361594forreleasedetailsNutshell Handbook, the Nutshell Handbook logo, and the O Reilly logo are registered trademarks ofO ReillyMedia, Inc. High Performance Python, the image of a fer-de-lance, and related trade dress are trademarksof O Reilly Media, IncMany of the designations used by manufacturers and sellers to distinguish their products are claimed astrademarks. Where those designations appear in this book, and O Reilly Media, Inc was aware ofa trademarkclaim, the designations have been printed in caps or initial capsWhile every precaution has been taken in the preparation of this book, the publisher and authors assumeno responsibility forerrors or omissions, or for damages resulting from the use of the information containedhereinISBN:978-1-449-361594Table of contentsPreface1. Understanding Performant Python.The Fundamental Computer SystemComputing unitsMemory Unitsx11257Communications layersPutting the Fundamental Elements TogetherIdealized Computing Versus the Python Virtual Machine10So why use Python?132. Profiling to Find BottlenecksProfiling Efficiently18Introducing the Julia Set19Calculating the Full Julia Set23Simple approaches to Timing-print and a Decorator26Simple Timing Using the Unix time Command29USing the pRofile Moovule31Using runsnakerun to Visualize cProfile output36USing line_profiler for Line-by-Line Measurements37Using memory_profiler to Diagnose Memory Usage42Inspecting Objects on the Heap with heapy48USing dowser for Live Graphing of Instantiated Variables50Using the dis Module to Examine CPython Bytecode52Different Approaches, Different Complexity54Unit Testing During Optimization to Maintain Correctness56No-op @profile Decorator5Strategies to Profile Your Code Successfully5790p-up3. Lists and Tuples............A More EfficientLists Versus tuplesLists as dynamic arrays67Tuples as static arraysp-Up4. Dictionaries and setsHow Do Dictionaries and sets Work?Inserting and RetrievingDeletion80Resizing81Hash Functions and Entropy81Dictionaries and Namespacesp-up885. Iterators and generatorsIterators for infinite series92Lazy generator Evaluation94Wrap-UPpP986. Matrix and Vector Computation99Introduction to the problem100Arent Python Lists Good Enough105Problems with Allocating Too Much106Memory Fragmentation109Understanding perrfMaking Decisions with perf's Output113Enter numpy114Applying numpy to the Diffusion ProblemMemory Allocations and In-Place Operations120Selective Optimizations: Finding what Needs to be Fixed124numexpr: Making In-Place Operations Faster and Easier127A Cautionary Tale: Verify Optimizations(scipy)129Wrap-Up7.〔 compiling to〔135What Sort of Speed Gains Are Possible?136JIT VersuS AOT Compilers138Why Does Type Information Help the Code Run Faster?138Using a C Compiler139Reviewing the Julia Set Example140Cython140Table of contentsCompiling a pure-Python version using cython141Cython annotations to analyze a block of code143Adding Some Type Annotations145Shed skin150Building an Extension Module151The Cost of the Memory Copies153Cython and numpy154Parallelizing the Solution with OpenMP on One Machine155Numba157Pythran159Garbage Collection Differences161Running pypy and installing modules162When to Use Each Technology163Other Upcoming Projects165A Note on graphics Processing Units(GPUs165A Wish for a Future Compiler project166Foreign Function Interfaces166ctypes167ffi170fryCPython module175Wrap-Up1798.〔 concurrency..181Introduction to Asynchronous Programming182Serial crawler185gevent187tornado192asyncH196Database Example198Wrap-Up2019. The multiprocessing module203An Overview of the Multiprocessing Module206Estimating Pi USing the Monte Carlo M208Estimating Pi Using Processes and Threads209Using Python Objects210Random Numbers in Parallel Systems217Using numpy218Finding Prime Numbers221Queues of Work227Verifying Primes Using Interprocess Communication232Table of contentSerial solution236Naive pool solutionA Less naive pool solution238USing Manager. Value as a Flag239Using Redis as a Flag241Using raw value as a Flag243Using mmap as a flas244USing mmap as a Flag Redux245Sharing numpy Data with multiprocessing248Synchronizing file and Variable access254File locking254Locking a value258Wrap-Up26110. Clusters and Job Queues.263Benefits of Clustering264Drawbacks of Clustering265$462 Million Wall Street Loss Through Poor Cluster Upgrade Strategy266Skype s 24-Hour Global Outage267Common Cluster Designs268How to Start a Clustered Solution268Ways to Avoid Pain When Using Clusters269Three Clustering Solutions270Using the Parallel Python Module for Simple local clusters271Using IPython Parallel to Support Research272NSQ for Robust Production Clustering277Queues277Pub/sub278Distributed Prime calculation280Other Clustering Tools to Look at284Wrap-Up28411. Using Less ram287Objects for Primitives Are Expensive288The Array Module Stores Many Primitive Objects Cheaply289Understanding the ram used in a collection292Bytes versus Unicode294Efficiently Storing Lots of Text in RAM295Trying These Approaches on 8 Million Tokens296Tips for Using Less ram304Probabilistic Data structures305Very approximate Counting with a 1-byte Morris Counter306K-Minimum values308vi Table of ContentsBloom filters312LogLog Counter317Real-World example32112. Lessons from the field......4.4.4...4... 325Adaptive Labs Social Media AnalytiCs(SOMA)325Python at Adaptive lab326SOMAS Design326Our Development Methodology327Maintaining SoMA327Advice for Fellow Engineers328MakingDeepLearningFlywithRadimrehurek.com328The Sweet Spot328Lessons in Optimizing330Wrap-UP332Large-scaleProductionizedMachineLearningatLyst.com333Pythons place at lyst333Cluster Design333Code evolution in a Fast-Moving Start-Up333Building the recommendation Engine334Reporting and Monitoring334Some advice335Large-Scale Social Media Analysis at Smesh335Pythons role at Smesh335The platform336High Performance Real-Time String Matching336Reporting, Monitoring, Debugging, and Deployment338PyPy for Successful Web and Data Processing Systems339Prerequisites339The database340The Web Application340OCR and Translation341Task Distribution and workers341Conclusion341TaskqUEuesatLanyrd.com342Pythons role at lanyrd342Making the Task Queue Performant343Reporting, Monitoring, Debugging, and Deployment343Advice to a Fellow Developer343Index345Table of Contents