Outlier Analysis

shihuishui 49 0 PDF 2019-09-14 21:09:23

This book provides comprehensive coverage of the field of outlier analysis from a computer science point of view. It integrates methods from data mining, machine learning, and statistics within the computational framework and therefore appeals to multiple communities. The chapters of this book can b...

Charu C. Aggarwal
Outlier Analysis
Springer

Charu C. Aggarwal
IBM T. J. Watson Research Center
Yorktown Heights
New York
USA

ISBN 978-1-4614-6395-5
ISBN 978-1-4614-6396-2 (eBook)
DOI 10.1007/978-1-4614-6396-2
Springer New York Heidelberg Dordrecht London
Library of Congress Control Number: 2012956186
© Springer Science+Business Media New York 2013

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. Exempted from this legal reservation are brief excerpts in connection with reviews or scholarly analysis or material supplied specifically for the purpose of being entered and executed on a computer system, for exclusive use by the purchaser of the work. Duplication of this publication or parts thereof is permitted only under the provisions of the Copyright Law of the Publisher's location, in its current version, and permission for use must always be obtained from Springer. Permissions for use may be obtained through RightsLink at the Copyright Clearance Center. Violations are liable to prosecution under the respective Copyright Law.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

While the advice and information in this book are believed to be true and accurate at the date of publication, neither the authors nor the editors nor the publisher can accept any legal responsibility for any errors or omissions that may be made. The publisher makes no warranty, express or implied, with respect to the material contained herein.

Printed on acid-free paper

Springer is part of Springer Science+Business Media (www.springer.com)

To my wife, Lata and my daughter, Sayani

Contents

Preface
Acknowledgments

1 An Introduction to Outlier Analysis
  1. Introduction
  2. The Data Model is Everything
  3. The Basic Outlier Models
    3.1 Extreme Value Analysis
    3.2 Probabilistic and Statistical Models
    3.3 Linear Models
    3.4 Proximity-based Models
    3.5 Information Theoretic Models
    3.6 High-Dimensional Outlier Detection
  4. Meta-Algorithms for Outlier Analysis
    4.1 Sequential Ensembles
    4.2 Independent Ensembles
  5. The Basic Data Types for Analysis
    5.1 Categorical, Text and Mixed Attributes
    5.2 When the Data Values have Dependencies
  6. Supervised Outlier Detection
  7. Outlier Evaluation Techniques
  8. Conclusions and Summary
  9. Bibliographic Survey
  10. Exercises

2 Probabilistic and Statistical Models for Outlier Detection
  1. Introduction
  2. Statistical Methods for Extreme Value Analysis
    2.1 Probabilistic Tail Inequalities
    2.2 Statistical Tail Confidence Tests
  3. Extreme Value Analysis in Multivariate Data
    3.1 Depth-based Methods
    3.2 Deviation-based Methods
    3.3 Angle-based Outlier Detection
    3.4 Distance Distribution-based Methods
  4. Probabilistic Mixture Modeling for Outlier Analysis
  5. Limitations of Probabilistic Modeling
  6. Conclusions and Summary
  7. Bibliographic Survey
  8. Exercises

3 Linear Models for Outlier Detection
  1. Introduction
  2. Linear Regression Models
    2.1 Modeling with Dependent Variables
    2.2 Regression Modeling for Mean Square Projection Error
  3. Principal Component Analysis
    3.1 Normalization Issues
    3.2 Applications to Noise Correction
    3.3 How Many Eigenvectors?
  4. Limitations of Regression Analysis
  5. Conclusions and Summary
  6. Bibliographic Survey
  7. Exercises

4 Proximity-based Outlier Detection
  1. Introduction
  2. Clusters and Outliers: The Complementary Relationship
  3. Distance-based Outlier Analysis
    3.1 Cell-based Methods
    3.2 Index-based Methods
    3.3 Reverse Nearest Neighbor Approach
    3.4 Intensional Knowledge of Distance-based Outliers
    3.5 Discussion of Distance-based Methods
  4. Density-based Outliers
    4.1 LOF: Local Outlier Factor
    4.2 LOCI: Local Correlation Integral
    4.3 Histogram-based Techniques
    4.4 Kernel Density Estimation
  5. Limitations of Proximity-based Detection
  6. Conclusions and Summary
  7. Bibliographic Survey
  8. Exercises

5 High-Dimensional Outlier Detection: The Subspace Method
  1. Introduction
  2. Projected Outliers with Grids
    2.1 Defining Abnormal Lower Dimensional Projections
    2.2 Evolutionary Algorithms for Outlier Detection
  3. Distance-based Subspace Outlier Detection
    3.1 Subspace Outlier Degree
    3.2 Finding Distance-based Outlying Subspaces
  4. Combining Outliers from Multiple Subspaces
    4.1 Random Subspace Sampling
    4.2 Selecting High Contrast Subspaces
    4.3 Local Selection of Subspace Projections
  5. Generalized Subspaces
  6. Discussion of Subspace Analysis
  7. Conclusions and Summary
  8. Bibliographic Survey
  9. Exercises

6 Supervised Outlier Detection
  1. Introduction
  2. The Fully Supervised Scenario: Rare Class Detection
    2.1 Cost Sensitive Learning
    2.2 Adaptive Re-sampling
    2.3 Boosting Methods
  3. The Semi-Supervised Scenario: Positive and Unlabeled Data
    3.1 Difficult Cases and One-Class Learning
  4. The Semi-Supervised Scenario: Novel Class Detection
    4.1 One Class Novelty Detection
    4.2 Combining Novel Class Detection with Rare Class Detection
    4.3 Online Novelty Detection
  5. Human Supervision
    5.1 Active Learning
    5.2 Outlier by Example
  6. Conclusions and Summary
  7. Bibliographic Survey
  8. Exercises

7 Outlier Detection in Categorical, Text and Mixed Attribute Data
  1. Introduction
  2. Extending Probabilistic Models to Categorical Data
    2.1 Modeling Mixed Data
  3. Extending Linear Models to Categorical and Mixed Data
  4. Extending Proximity Models to Categorical Data
    4.1 Aggregate Statistical Similarity
    4.2 Contextual Similarity
    4.3 Issues with Mixed Data
    4.4 Density-based Methods
    4.5 Clustering Methods
  5. Outlier Detection in Binary and Transaction Data
    5.1 Subspace Methods
    5.2 Novelties in Temporal Transactions
  6. Outlier Detection in Text Data
    6.1 Latent Semantic Indexing
    6.2 First Story Detection
  7. Conclusions and Summary
  8. Bibliographic Survey
  9. Exercises

8 Time Series and Multidimensional Streaming Outlier Detection
  1. Introduction
  2. Prediction-based Outlier Detection of Streaming Time Series
    2.1 Autoregressive Models
    2.2 Multiple Time Series Regression Models
    2.3 Supervised Outlier Detection in Time Series
  3. Time-Series of Unusual Shapes
    3.1 Transformation to Other Representations
    3.2 Distance-based Methods
    3.3 Single Series versus Multiple Series
    3.4 Finding Unusual Shapes from Multivariate Series
    3.5 Supervised Methods for Finding Unusual Time-Series Shapes
  4. Outlier Detection in Multidimensional Data Streams
    4.1 Individual Data Points as Outliers
    4.2 Aggregate Change Points as Outliers
    4.3 Rare and Novel Class Detection in Multidimensional Data Streams
  5. Conclusions and Summary
  6. Bibliographic Survey
  7. Exercises

9 Outlier Detection in Discrete Sequences
  1. Introduction
  2. Position Outliers
    2.1 Rule-based Models
    2.2 Markovian Models
    2.3 Efficiency Issues: Probabilistic Suffix Trees
  3. Combination Outliers
    3.1 A Primitive Model for Combination Outlier Detection
    3.2 Distance-based Models
    3.3 Frequency-based Models
    3.4 Hidden Markov Models
  4. Complex Sequences and Scenarios
    4.1 Multivariate Sequences
    4.2 Set-based Sequences
    4.3 Online Applications: Early Anomaly Detection
  5. Supervised Outliers in Sequences
  6. Conclusions and Summary
  7. Bibliographic Survey
  8. Exercises

10 Spatial Outlier Detection
  1. Introduction
  2. Neighborhood-based Algorithms
    2.1 Multidimensional Methods
    2.2 Graph-based Methods
    2.3 Handling Multiple Behavioral Attributes
  3. Autoregressive Models
  4. Visualization with Variogram Clouds
