Introduction to Machine Learning Second edition Ethemalpaydin Themitpress Cambridge,Massachusetts London,england o2010MassachusettsInstituteofTechnology Allrightsreserved.Nopartofthisbookmaybereproducedinanyformbyany electronicormechanicalmeans(includingphotocopyingrecording,orinforma tionstorageandretrieval)withoutpermissioninwritingfromthepublisher Forinformationaboutspecialquantitydiscounts,pleaseemail special_sales@mitpress.mit.edu Typesetin10/13LucidabrightbytheauthorusingETEX28 Printedandboundintheunitedstatesofamerica LibraryofCongressCataloging-in-PublicationInformation Alpaydin,ethe IntroductiontomachinelearningEthemalpaydin.-2nded cm Includesbibliographicalreferencesandindex isbn978-0-262-01243-0(hardcover:alk 1.Machinelearning.I.Title Q325.5.A462010 006.31-dc22 2009013169 CIP 10987654321 Briefcontents 1Introduction1 2SupervisedLearning21 3BayesianDecisionTheory47 4Parametricmethods61 5Multivariatemethods87 6Dimensionalityreduction109 7Clustering 143 8NonparametricMethods163 9Decisiontrees185 10Lineardiscrimination209 11Multilayerperceptrons233 12Localmodels279 13Kernelmachines309 14BayesianEstimation341 15HiddenMarkovModels363 16GraphicalModels387 17Combiningmultiplelearners419 18ReinforcementLearning 447 19DesignandAnalysisofmachineLearningExperiments475 aProbability517 Contents seriesforewordxvii Figures XIX Tables Preface Acknowledgments NotesfortheSecondedition Notations 1Introduction 1.1WhatIsMachinelearning 1.2ExamplesofMachineLearningApplications4 1.2.1LearningAssociations4 1.2.2Classification5 1.2.3Regression9 1.2.4UnsupervisedLearning11 1.2.5Reinforcementlearning 13 1.3Notes14 1.4Relevantresources16 1.5Exercises18 1.6References19 2SupervisedLearning21 2.1LearningaclassfromExamples21 Contents 2.2Vapnik-Chervonenkis(VC)Dimension27 2.3Probablyapproximatelycorrect(Pac)learning29 2.4Noise30 2.5LearningMultipleclasses32 2.6Regression34 2.7Modelselectionandgeneralization37 2.8DimensionsofaSupervisedmachinelearningalgorithm41 2.9Notes42 2.10Exercises43 2.11References44 3BayesianDecisionTheory47 3.1Introduction47 3.2Classification49 3.3Lossesandrisks5 3.4DiscriminantFunctions53 3.5UtilityTheory54 3.6Associationrules55 3.7Notes58 3.8Exercises58 3.9References59 4Parametricmethods61 4.1Introduction61 4.2MaximumLikelihoodestimation62 4.2.1Bernoullidensit 63 4.2.2Multinomialdensity64 4.2.3Gaussian(Normal)Density64 4.3EvaluatinganEstimator:Biasandvariance65 4.4ThebayesEstimator66 4.5ParametricClassification69 4.6Regression73 4.7Tuningmodelcomplexity:Bias/VarianceDilemma76 4.8Modelselectionprocedures80 4.9Notes84 4.10Exercises84 4.11References85 5Multivariatemethods87 5.1Multivariatedata87 Contents 5.2Parameterestimation88 5.3EstimationofMissingvalues89 5.4Multivariatenormaldistribution90 5.5Multivariateclassific n 94 5.6TuningComplexity99 5.7DiscreteFeatures102 5.8Multivariateregression103 tes105 5.10Exercises106 5.11References107 6Dimensionalityreduction 109 6.1Introduction109 6.2Subsetselection110 6.3Principalcomponentsanalysis113 6.4Factoranalysis120 6.5Multidimensionalscaling125 6.6LineardiscriminantAnalysis128 6.7Isomap133 6.8Locallylinearembedding 135 6.9Notes138 6.10Exercises139 6.11References140 7Clustering 143 7.1Introduction143 7.2Mixturedensities144 7.3k-Meansclustering 145 7.4Expectation-MaximizationAlgorithm149 7.5Mixturesoflatentvariablemodels154 7.6SupervisedlearningafterClustering155 7Hierarchicalclustering157 7.8Choosingthenumberofclusters158 7.9Notes160 7.10Exercises160 7.11Refer 161 8NonparametricMethods163 8.1Introduction163 8.2Nonparametricdensityestimation165 X Contents 8.2.1Histo Estimator165 8.2.2Kernelestimator167 8.2.3k-NearestNeighborEstimator168 8.3Generalizationtomultivariatedata170 8.4NonparametricClassificatin l71 8.5CondensedNearestNeighbor172 8.6NonparametricRegression:SmoothingModels174 8.6.1RunningMeanSmoother175 8.6.2KernelSmoother176 8.6.3RunningLineSmoother177 8.7HowtoChoosetheSmoothingParameter178 8.8Notes180 8.9Exercises181 8.10References182 9Decisiontrees185 9.1Introduction185 9.2Univariatetrees187 9.2.1ClassificationTrees188 9.2.2RegressionTrees192 9.3Pruning194 9.4Ruleextractionfromtrees197 9.5LearningRulesfromData198 9.6MultivariateTrees202 9.7Notes204 9.8Exercises207 9.9References207 10LinearDiscrimination 209 10.1Introduction209 10.2Generalizingthelinearmodel211 10.3Geometryofthelineardiscriminant212 10.3.1TwoClasses212 10.3.2Multipleclasses214 10.4PairwiseSeparation216 10.5ParametricDiscriminationrevisited217 10.6GradientDescent218 10.7LogisticDiscrimination220 10.7.1TwoClasses220