Mastering Data Mining with Python (Packt, 2016)

linzhicheng 37 0 PDF 2019-09-14 08:09:00

Data mining is an integral part of the data science pipeline. It is the foundation of any successful data-driven strategy; without it, you'll never be able to uncover truly transformative insights. Since data is vital to just about every modern organization, it is worth taking the ...

Mastering Data Mining with Python
Find patterns hidden in your data

Copyright © 2016 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author(s), nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

First published: August 2016
Production reference: 1240816
Published by Packt Publishing Ltd., Livery Place, 35 Livery Street, Birmingham B3 2PB, UK
ISBN 978-1-78588-995-0
www.packtpub.com

Credits

    Author: Megan Squire
    Reviewers: Sanjeev Jaiswal, Ron Mitsugo Zacharski
    Commissioning Editor: Veena Pagare
    Acquisition Editor: Lester Frias
    Content Development Editor: Mamata Walkar
    Technical Editor: Naveenkumar Jain
    Copy Editors: Safis Editing
    Project Coordinator: Shweta H Birwatkar
    Proofreader: Safis Editing
    Pratik Shirodkar
    Graphics: Kirk D'Penha
    Production Coordinator: Shantanu N. Zagade
    Cover Work: Shantanu N. Zagade

About the author

Megan Squire is a professor of computing sciences at Elon University. Her primary research interest is in collecting, cleaning, and analyzing data about how free and open source software is made. She is one of the leaders of the FLOSSmole.org, FLOSSdata.org, and FLOSSpapers.org projects.

About the reviewers

Sanjeev Jaiswal is a computer graduate with 7 years of industrial experience. His work involves Perl, Python, and GNU/Linux. He is currently working on projects involving penetration testing, source code review, and security design and implementations. He is very much interested in web and cloud security. He is also learning NodeJS and cloud security.

Sanjeev loves teaching engineering students and IT professionals. He has been teaching for the last 8 years in his free time. In 2010 he founded Alien Coders (http://www.aliencoders.org), based on the learning-through-sharing principle, for computer science students and IT professionals; it became a huge hit in India among engineering students. You can follow him on Facebook at http://www.facebook.com/aliencoder, on Twitter at @aliencoders, and on GitHub at https://github.com/jassics.

Sanjeev wrote Instant Page Speed Optimization and co-authored Learning Django Web Development for Packt Publishing. He has reviewed more than 5 books for Packt and looks forward to more such opportunities.

Ron Mitsugo Zacharski is a computational linguist working in the areas of [...]traction and [...]ing (zacharski.org). He has a BFA in music from the University of Wisconsin at Milwaukee and a PhD in computer science from the University of Minnesota, and he completed a postdoctorate in linguistics at the University of Edinburgh. He authored the free online book A Programmer's Guide to Data Mining: The Ancient Art of the Numerati (www.guidetodatamining.com) and co-edited The Grammar-Pragmatics Interface: Essays in Honor of Jeanette K. Gundel, published by John Benjamins. For the majority of his academic life, he has focused on multilingual natural language processing, particularly with lesser-studied languages. Dr. Zacharski is a Zen monk in the Soto school lineage of Soyu Matsuoka. He lives in New Mexico.

www.PacktPub.com

eBooks, discount offers, and more

Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.packtpub.com, and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at customercare@packtpub.com for more details.

At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters, and receive exclusive discounts and offers on Packt books and eBooks.

https://www2.packtpub.com/books/subscription/packtlib

Do you need instant solutions to your IT questions? PacktLib is Packt's online digital book library. Here, you can search, access, and read Packt's entire library of books.

Why subscribe?

    Fully searchable across every book published by Packt
    Copy and paste, print, and bookmark content
    On demand and accessible via a web browser

Table of Contents

Preface

Chapter 1: Expanding Your Data Mining Toolbox
    What is data mining?
    How do we do data mining?
    The Fayyad et al. KDD process
    The Han et al. KDD process
    The CRISP-DM process
    The Six Steps process
    Which data mining methodology is the best?
    What are the techniques used in data mining?
    What techniques are we going to use in this book?
    How do we set up our data mining work environment?
    Summary

Chapter 2: Association Rule Mining
    What are frequent itemsets?
    The diapers and beer urban legend
    Frequent itemset mining basics
    Towards association rules
    Support
    Confidence
    Association rules
    An example with data
    Added value - fixing a flaw in the plan
    Methods for finding frequent itemsets
    A project - discovering association rules in software project tags
    Summary

Chapter 3: Entity Matching
    What is entity matching?
    Merging data
    Merging datasets vertically
    Merging datasets horizontally
    Techniques for matching
    Attribute-based similarity matching
    Be careful of pairwise comparisons
    Leverage rare values
    Methods for matching attributes
    Range-based or distance from target
    String edit distance
    Hamming distance
    Levenshtein distance
    Soundex
    Leveraging disjoint sets
    Context-based similarity matching
    Machine learning-based entity matching
    Evaluation of entity matching techniques
    Efficiency - how long does it take to do the matching?
    Effectiveness - how accurate are the matches that we generate?
    Usefulness - how practical is the matching procedure to use?
    Entity matching project
    Difficulties with matching software projects
    Two examples
    Matching on project names
    Matching on people names
    Matching on URLs
    Matching on topics and description keywords
    The dataset
    The code
    The results
    How many entity matches did we find?
    How good are the pairs we found?
    Summary

Chapter 4: Network Analysis
    What is a network?
    Measuring a network
    Degree of a network
    Diameter of a network
    Walks, paths, and trails in a network
    Components of a network
    Centrality of a network
    Closeness centrality
    Degree centrality
    Betweenness centrality
    Other measures of centrality
    Representing graph data
    Adjacency matrix
    Edge lists and adjacency lists
    Differences between graph data structures
    Importing data into a graph structure
    Adjacency list format
    Edge list format
    GEXF and GraphML
    GDF
    Python pickle
    JSON
    JSON node and link series
    JSON trees
    Pajek format
    A real project
    Exploring the data
    Generating the network files
    Understanding our data as a network
    Generating simple network metrics
    Playing with the parameters of a network
    Analyzing subgraphs
    Analyzing cliques and centrality in the subgraphs
    Looking for change over time
    Summary

Chapter 5: Sentiment Analysis in Text
    What is sentiment analysis?
    The basics of sentiment analysis
    The structure of an opinion
    Document-level and sentence-level analysis
    Important features of opinions
    Sentiment analysis algorithms
    General-purpose data collections
    Hu and Liu's sentiment analysis lexicon
    SentiWordNet
    Vader sentiment
    Sentiment mining application
    Motivating the project
    Data preparation

User comments
Anonymous 卡了网 user 2019-09-14 08:09:00

Thanks for sharing. It also covers text mining.