Data Mining.Multimedia,Soft Computing and BioinformaticsThis page intentionally left blankData MiningMultimedia, Soft Computing,and bioinformaticsSUSHMITA MITRAAssociate ProfessorMachine Intelligence UnitIndian statistical InstituteKolkata. indiaTINKU ACHARYASenior executive vice PresidentChief science officerAvisere I1Tucson. arizonaAdjunct professorDepartment of Electrical EngineeringArizona state UniversitTempe, arizona(WILEYNTERSCIENCEa JoHn WileY sons inc publicationCopyright o 2003 by John Wiley Sons, Inc. All rights reservedPublished by John wiley Sons, Inc, Hoboken, New JerseyPublished simultaneously in CanadaNo part of this publication may be reproduced, stored in a retrieval system or transmitted in any form orby any means, electronic, mechanical, photocopying, recording, scanning or otherwise, except aspermitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the priorwritten permission of the Publisher, or authorization through payment of the appropriate per-copy fee tothe Copyright Clearance Center, Inc, 222 Rosewood Drive, Danvers, MA 01923, (978)750-8400,fax(978)750-4744,oronthewebatwww.copyright.comRequeststothePublisherforpermissionshouldbe addressed to the Permissions Department, John Wiley Sons, Inc., 111 River Street, Hoboken, NJ07030,(201)748-6011,fax(201)7486008,e-mail: permed@ wiley.coLimit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts inpreparing this book, they make no representation or warranties with respect to the accuracy orcompleteness of the contents of this book and specifically disclaim any implied warranties ofmerchantability or fitness for a particular purpose. No warranty may be created or extended by salesrepresentatives or written sales materials. The advice and strategies contained herein may not besuitable for your situation. You should consult with a professional where appropriate. Neither thepublisher nor author shall be liable for any loss of profit or any other commercial damages, includingbut not limited to special, incidental, consequential, or other damagesFor general information on our other products and services please contact our Customer CareDepartment within the U.S. at 877-762-2974, outside the U.S. at 317-572-3993 or fax 317-572-4002Wiley also publishes its books in a variety of electronic formats. Some content that appears in print,however, may not be available in electronic format.Library of congress Cataloging-in-Publication Data:Mitra, SushmitaData mining: multimedia, soft computing, and bioinformaticsSushmita Mitra and Tinku AcharyaIncludes bibliographical references and indexISBN0-471-460540( cloth)Data miningharya, Tinku. II.TitleQA769.D343M820030063-dc212003011394Printed in the united states of america0987654321To Ma, who made me what amand to Somosmita, who let me feel like a supermomsushmitaTo Lisa, Arita, and araniTinku acharyaThis page intentionally left blankContentsPreface1 Introduction to Data Mining1.1 Introduction1.2 Knowledge Discovery and Data Mining51.3 Data Compression101.4 Information retrieval1.5 Text mining1.6 Web Mining151.7 Image Mining161.8 Classification18lustering191.10 Rule Mining1.11 String Matching1.12 Bioinformatics1.13 Data Warehousing1.14 Applications and Challenges251.15 Conclusions and discussionReferences30CONTENTS2 Soft Computing352.1 Introduction2.2 What is Soft Computing?2.2.1 Relevance372.2.2 Fuzzy sets2.2.3 Neural networks442.2.4 Neuro-fuzzy computing2.2.5 Genetic algorithms2.6 Rough sets02.27Waⅴ lets612.3 Role of Fuzzy Sets in Data Mining622.3.1 Clustering632.3.2 Granular computing632.3.3 Association rules642.3.4 Functional dependencies2.3.5 Data summarization652.3.6 Image mining2.4 Role of Neural Networks in Data Mining672.4.1 Rule extraction672.4.2 Rule evaluation672.4.3 Clustering and self-organization692.4.4 Regression2.4.5 Information retrieval2.5 Role of Genetic Algorithms in Data Mining7025.1R712.5.2 Association rules712.6 Role of Rough Sets in Data mining2.7 Role of Wavelets in Data Mining2.8 Role of Hybridizations in Data Mining742.9 Conclusions and Discussion77References783 Multimedia Data Compression3.1 Introduction3.2 Information Theory Concepts913.2.1 Discrete memoryless model and entropy913.2.2 Noiseless Source Coding Theorem923.3 Classification of Compression algorithms94CONTENTS3.4 A Data Compression Model3.5 Measures of Compression Performance3.5.1 Compression ratio and bits per sample3.5.2 Quality metric973.5.3 Coding complexity3.6 Source Coding Algorithms993.6.1 Run-length coding3.6.2 Huffman coding1003.7 Principal Component Analysis for Data Compression3.8 Principles of Still Image Compression1053.8.1 Predictive coding1053.8.2 Transform coding1073.8.3 Wavelet coding103.9 Image Compression Standard: JPEG1123.10 The JPEG Lossless Coding Algorithm1133. 11 Baseline JPEG Compression1163.11.1 Color space conversion1163.11.2 Source image data arrangement1183.11.3 The baseline compression algorithm1193.11.4 Decompression process in baseline JPEG1263. 11.5 JPEG2000: Next generation still picture codingstandar1293. 12 Text Compression1313.12.1 The LZ77 algorithm1323.12.2 The Lz78 algorithm1333. 12.3 The Lzw algorithm1363. 12.4 Other applications of Lempel-Ziv coding1393.13 Conclusions and discussion140References1404 String Matching4.1 Introduction1434. 1. 1 Some definitions and preliminaries1444.1.2 String matching problem1464. 1. 3 Brute force string matching1484.2 Linear-Order String Matching algorithms4.2.1 String matching with finite automata1504.2.2 Knuth-Morris-Pratt algorithm1524.2.3 Boyer-Moore algorithm158