Today's Web-enabled deluge of electronic data calls for automated methods of data analysis. Machine learning provides these, developing methods that can automatically detect patterns in data and then use the uncovered patterns to predict future data.

Machine Learning
A Probabilistic Perspective
Kevin P. Murphy

The MIT Press
Cambridge, Massachusetts
London, England

(c) 2012 Massachusetts Institute of Technology

All rights reserved. No part of this book may be reproduced in any form by any electronic or mechanical means (including photocopying, recording, or information storage and retrieval) without permission in writing from the publisher.

For information about special quantity discounts, please email special_sales@mitpress.mit.edu

This book was set in LaTeX by the author. Printed and bound in the United States of America.

Library of Congress Cataloging-in-Publication Information

Murphy, Kevin P.
Machine learning : a probabilistic perspective / Kevin P. Murphy.
p. cm. - (Adaptive computation and machine learning series)
Includes bibliographical references and index.
ISBN 978-0-262-01802-9 (hardcover : alk. paper)
1. Machine learning. 2. Probabilities. I. Title.
Q325.5.M87 2012
006.31-dc23
2012004558

This book is dedicated to Alessandro, Michael and Stefano, and to the memory of Gerard Joseph Murphy.

Contents

Preface

1 Introduction
  1.1 Machine learning: what and why?
    1.1.1 Types of machine learning
  1.2 Supervised learning
    1.2.1 Classification
    1.2.2 Regression
  1.3 Unsupervised learning
    1.3.1 Discovering clusters
    1.3.2 Discovering latent factors
    1.3.3 Discovering graph structure
    1.3.4 Matrix completion
  1.4 Some basic concepts in machine learning
    1.4.1 Parametric vs non-parametric models
    1.4.2 A simple non-parametric classifier: K-nearest neighbors
    1.4.3 The curse of dimensionality
    1.4.4 Parametric models for classification and regression
    1.4.5 Linear regression
    1.4.6 Logistic regression
    1.4.7 Overfitting
    1.4.8 Model selection
    1.4.9 No free lunch theorem

2 Probability
  2.1 Introduction
  2.2 A brief review of probability theory
    2.2.1 Discrete random variables
    2.2.2 Fundamental rules
    2.2.3 Bayes rule
    2.2.4 Independence and conditional independence
    2.2.5 Continuous random variables
    2.2.6 Quantiles
    2.2.7 Mean and variance
  2.3 Some common discrete distributions
    2.3.1 The binomial and Bernoulli distributions
    2.3.2 The multinomial and multinoulli distributions
    2.3.3 The Poisson distribution
    2.3.4 The empirical distribution
  2.4 Some common continuous distributions
    2.4.1 Gaussian (normal) distribution
    2.4.2 Degenerate pdf
    2.4.3 The Laplace distribution
    2.4.4 The gamma distribution
    2.4.5 The beta distribution
    2.4.6 Pareto distribution
  2.5 Joint probability distributions
    2.5.1 Covariance and correlation
    2.5.2 The multivariate Gaussian
    2.5.3 Multivariate Student t distribution
    2.5.4 Dirichlet distribution
  2.6 Transformations of random variables
    2.6.1 Linear transformations
    2.6.2 General transformations
    2.6.3 Central limit theorem
  2.7 Monte Carlo approximation
    2.7.1 Example: change of variables, the MC way
    2.7.2 Example: estimating π by Monte Carlo integration
    2.7.3 Accuracy of Monte Carlo approximation
  2.8 Information theory
    2.8.1 Entropy
    2.8.2 KL divergence
    2.8.3 Mutual information

3 Generative models for discrete data
  3.1 Introduction
  3.2 Bayesian concept learning
    3.2.1 Likelihood
    3.2.2 Prior
    3.2.3 Posterior
    3.2.4 Posterior predictive distribution
    3.2.5 A more complex prior
  3.3 The beta-binomial model
    3.3.1 Likelihood
    3.3.2 Prior
    3.3.3 Posterior
    3.3.4 Posterior predictive distribution
  3.4 The Dirichlet-multinomial model
    3.4.1 Likelihood
    3.4.2 Prior
    3.4.3 Posterior
    3.4.4 Posterior predictive
  3.5 Naive Bayes classifiers
    3.5.1 Model fitting
    3.5.2 Using the model for prediction
    3.5.3 The log-sum-exp trick
    3.5.4 Feature selection using mutual information
    3.5.5 Classifying documents using bag of words

4 Gaussian models
  4.1 Introduction
    4.1.1 Notation
    4.1.2 Basics
    4.1.3 MLE for an MVN
    4.1.4 Maximum entropy derivation of the Gaussian
  4.2 Gaussian discriminant analysis
    4.2.1 Quadratic discriminant analysis (QDA)
    4.2.2 Linear discriminant analysis (LDA)
    4.2.3 Two-class LDA
    4.2.4 MLE for discriminant analysis
    4.2.5 Strategies for preventing overfitting
    4.2.6 Regularized LDA*
    4.2.7 Diagonal LDA
    4.2.8 Nearest shrunken centroids classifier
  4.3 Inference in jointly Gaussian distributions
    4.3.1 Statement of the result
    4.3.2 Examples
    4.3.3 Information form
    4.3.4 Proof of the result
  4.4 Linear Gaussian systems
    4.4.1 Statement of the result
    4.4.2 Examples
    4.4.3 Proof of the result
  4.5 Digression: The Wishart distribution
    4.5.1 Inverse Wishart distribution
    4.5.2 Visualizing the Wishart distribution*
  4.6 Inferring the parameters of an MVN
    4.6.1 Posterior distribution of μ
    4.6.2 Posterior distribution of Σ
    4.6.3 Posterior distribution of μ and Σ*
    4.6.4 Sensor fusion with unknown precisions*