The Art of Computer Programming, Volumes 3TEX is a trademark of the American Mathematical SocietyMETAFONT is a trademark of Addison-WesleyLibrary of Congress Cataloging-in-Publication DataKnuth, Donald Ervin, 1938-The art of computer programming Donald Ervin Knuth2nd edxiv, 780 P. 24 cm.Includes bibliographical references and indexContents: v. 1. Fundamental algorithms. --v. 2. Seminumericalalgorithms.--V. 3. Sorting and searchingISBN0-201-03809-9(IsBN0-201-03822-6(v.2)ISBN0-201-89685-0(,31. Electronic digital computers--Programming. I. TitleQA76.6.K641997001.624273-1830CIPInternetpagehttp://www-cs-faculty.stanfordedu/knuth/taocp.htmlcontainscurrent information about this book and related booksCopyright C 1998 by Addison Wesley LongmanAll rights reserved. No part of this publication may be reproduced, stored in a retrievalsystem, or transmitted, in any form, or by any means, electronic, mechanical, photo-copying, recording, or otherwise, without the prior consent of the publisher. Printedin the United States of America. Published simultaneously in CanadaISBN0-201-89685-0Text printed on acid-free paper123456789MA0201009998First printing, March 1998PREFACECookery is becoma noble sciencecooks are gentlemenTT∪sLⅣvUS, Ab Urbe Condita xx×1×vi(Robert Burton, Anatomy of Melancholy 1.2.2.2)THiS BOOK forms a natural sequel to the material on information structures inChapter 2 of Volume l, because it adds the concept of linearly ordered data tothe other basic structural ideasThe title "Sorting and Searching "may sound as if this book is only for thosesystems programmers who are concerned with the preparation of general-purposesorting routines or applications to information retrievaL. But in fact the area ofsorting and searching provides an ideal framework for discussing a wide varietyof important general issueHow are good algorithms discovered?How can given algorithms and programs be improved?How can the efficiency of algorithms be analyzed mathematically?ow can a person choose rationally between different algorithms for thesame task?In what senses can algorithms be proved"best possible?How does the theory of computing interact with practical considerationsHow can external memories like tapes, drums, or disks be used efficientlywith large databasesIndeed, i believe that virtually every important aspect of programming arisessomewhere in the context of sorting or searchingThis volume comprises Chapters 5 and 6 of the complete series. Chapter 5is concerned with sorting into order; this is a large subject that has been dividedchiefly into two parts, internal sorting and external sorting. There also aresupplementary sections, which develop auxiliary theories about permutations(Section 5. 1)and about optimum techniques for sorting( Section 5. 3). Chapter 6deals with the problem of searching for specified items in tables or files; this issubdivided into methods that search sequentially, or by comparison of keys, orby digital propcrtics, or by hashing, and then the more difficult problem ofsecondary key retrieval is considered. There is a surprising amount of interplayV1 PREFACEbetween both chapters, with strong analogies tying the topics together. Twoimportant varieties of information structures are also discussed in addition tothose considered in Chapter 2, namely priority queues(Section 5.2.3 )and linearlists represented as balanced trees(Section 6.2.3Like volumes 1 and 2. this book includes a lot of material that does notappear in other publications. Many people have kindly written to me abouttheir ideas, or spoken to me about them, and i hope that i have not distortedthe matcrial too badly when i have presented it in my owL wordsI have not had time to search the patent literature systematically; indeedi decry the current tendency to seek patents on algorithms(see Section 5.4.5If somebody sends me a copy of a relevant patent not presently cited in thisbook, i will dutifully refer to it in future editions. However, I want to encouragepeople to continue the centuries-old mathematical tradition of putting newlydiscovered algorithms into the public domain. There are better ways to earn aliving than to prevent other people from making use of one's contributions tocomputer scienceBefore I retired from teaching, I used this book as a text for a student'ssecond course in data structures, at the junior-torgraduate level, omitting mostof the mathcmatical material. I also used the mathematical portions of this bookas the basis for graduate-level courses in thc analysis of algorithmS, emphasizingespecially Sections 5.1, 5.2.2, 6.3, and 6.4. A graduate-level course on concretecomputational complexity could also be based on Sections 5.3, and 5.4.4, togetherwith Sections 4.3.3, 4.6.3, and 4.6.4 of volume 2For the most part this book is self-contained, except for occasional discussions relating to the MIX computer explained in volume l. Appendix B contains asummary of the mathematical notations used, some of which are a little differentfrom those found in traditional mathematics booksPreface to the second editionThis new edition matches the third editions of volumes 1 and 2 in which I havebeen able to celebrate the completion of leX and metafont by applying thosesystems to the publications they were designed forThe conversion to electronic format has given me the opportunity to goover every word of the text and every punctuation mark. I've tried to retainthe youthful exuberance of my original sentences while perhaps adding somemore mature judgment. Dozens of new exercises have been added; dozens ofold exercises have been given new and improved answers. Changes appeareverywhere but most significantly in Scctions 5.1.4(about permutations andtableaux 53(about optimum sorting), 5.4.9(about disk sorting), 6.2.2(aboutentropy), 6.4(about universal hashing), and 6.5(about multidimensional treesand tries)PREFACEThe Art of computer Programming is, however, still a work in progressResearch on sorting and searching continues to grow at a phenomenal rateTherefore some parts of this book are headed by an iunder construction?" iconto apologize for the fact that the material is not up-to-date. For example, if Iwere teaching an undergraduate class on data structures today, I would sureldiscuss randomized structures such as treaps at some length; but at present, iam only ablc to cite the principal papers oIl the subject, and to announce plansfor a future Section 6.2.5 (see page 478 ). My files are bursting with importantmaterial that i plan to include in the final, glorious, third edition of volume 3,perhaps 17 years from Ilow. But I must finish Volumes 4 and 5 first, and i donot want to delay their publication any more than absolutely necessaryI am enormously grateful to the many hundreds of people who have helpedme to gather and refine this material during the past 35 years. Most of thehard work of preparing the new edition was accomplished by Phyllis winkler(who put the text of the first edition into TEX form), by Silvio Levy(whoedited it extensively and helped to prepare several dozen illustrations ), and byJeffrey Oldham(who converted more than 250 of the original illustrations toMETAPOST format). The production staff at Addison-Wesley has also beenextremely helpful, as usualI have corrected every error that alert readers detected in the first edition-as well as some mistakes that, alas, nobody noticed-and i have tried to avoidintroducing new errors in the new material. However, I suppose some defects stillremain, and i want to fix them as soon as possible. Therefore i will cheerfullypay $2.56 to the first finder of cach technical, typographical, or historical crrorThe webpage cited on page iv contains a current listing of all corrections thathave been reported to meStanford, CaliforniaD.E. KFebruary 1998There are certain common Privileges of a writerthe Benefit whereof, i hope, there will be no Reason to doubtParticularly, that where am not understood, it shall be concludedthat something very useful and profound is couch underneathJONATHAN SWIFT, Tale of a Tub, Preface(1704)NOTES ON THE EXERCISESTHE EXERCISES in this set of books have been designed for self-study as well asclassroom study. It is difficult, if not impossible, for anyone to learn a subjectpurely by reading about it, without applying the information to specific problemsand thereby being encouraged to think about what has been read. Furthermorewe all learn best the things that we have discovered for ourselves. Therefore theexercises form a major part of this work; a dcfinitc attempt has bccn made tokeep them as informative as possible and to select problems that are enjoyableas well as instructiveIn many books, easy exercises are found mixed randomly among extremelydifficult ones This is sometimes unfortunate because readers like to know inadvance how long a problem ought to take--otherwise they may just skip overall the problems. A classic example of such a situation is the book dynamicProgramming by Richard Bellman: this is an important, pioneering work inwhich a group of problems is collected together at the end of some chaptersunder the heading "Exercises and Research problems, with extremely trivialquestions appearing in the midst of deep, unsolved problems. It is rumored thatsomeone once asked Dr. Bellman how to tell the exercises apart from the researchproblems, and he replied "If you can solve it, it is an exercise, otherwise it's aresearch problemGood arguments can be made for including both research problems andvery easy exercises in a book of this kind thcrcfore, to savc the reader fromthe possible dilemma of determining which are which, rating mumbers have beenprovided to indicate the level of difficulty. These numbers have the followinggcneral significanccRating Inteerpretation00 An extremely easy exercise that can be answered immediately if thematerial of the text has been understood: such an exercise can almostalways be worked“ in your head.”10 A simple problem that makes you think over the material just read, butis by no means difficult. You should be able to do this in onc minute atmost; pencil and paper may be useful in obtaining the solution20 An average problem that tests basic understanding of the text material, but you may need about fifteen or twenty minutes to answer itcompletelyNOTFS ON THE EXERCISESg0 A problem of moderate difficulty and or complexity, this one mayinvolve more than two hours'work to solve satisfactorily, or even morcif the tv is on40 Quite a difficult or lengthy problem that would be suitable for a termproject in classroom situations. A student should be able to solve theproblem in a reasonable amount of time, but the solution is not trivial50 A research problem that has not yet been solved satisfactorily, as faras the author knew at the time of writing, although many people havetried. If you have found an answer to such a problem, you ought towrite it up for publication; furthermore, the author of this book wouldappreciate hearing about the solution as soon as possible(provided thatit is correct)By interpolation in this "logarithmic"scale, the significance of other ratingnumbers becomes clear. For example, a rating of 17 would indicate an exercisethat is a bit simpler than average. Problems with a rating of 50 that aresubsequently solved by some reader may appear with a 45 rating in later editionsof the book, and in the crrat a posted on the Internet(see page iv)The remainder of the rating number divided by 5 indicates the amount ofdetailed work required. Thus, an exercise rated 24 may take longer to solve thanan exercise that is rated 25, but the latter will require more creativityThe author has tried earnestly to assign accurate rating numbers, but it isdifficult for the person who makes up a problem to know just how formidable itwill be for someone else to find a solution; and everyone has more aptitude forcertain types of problems than for others. It is hoped that the rating numbcrsrepresent a good guess at the level of difficulty, but they should be taken asgeneral guidelines, not as a bsolute indicatorsThis book has becn written for readers with varying degrees of mathematicaltraining and sophistication; as a result, some of the exercises are intended only forthe use of more mathematically inclined readers. The rating is preceded by an Mif the exercise involves mathematical concepts or motivation to a greater extentthan necessary for someone who is primarily interested only in programmingthe algorithms themselves. An exercise is marked with the letters "H" if itssolution necessarily involves a knowledge of calculus or other higher mathematicsnot developed in this book. An"HM"designation does not neccssarily implydifficultySome exercises are preceded by an arrowhead, "ri this designates problems that are especially instructive and especially recommended. Of course, noreader student is expected to work all of the exercises, so those that seem to bethe most valuable have been singled out.(This is not meant to detract from theother exercises! Each reader should at least make an attempt to solve all of theproblems whose rating is 10 or less; and the arrows may help to indicate whichof the problems with a higher rating should be given priorityNOTES ON THE EXERCISESsolve the problem by yourself, or unless you absolutely do not have time to workthis particular problem. After getting your own solution or giving the problem adecent try, you may find the answer instructive and helpful. The solution givenwill often be quite short, and it will sketch the details under the assumptionthat you have earnestly tried to solve it by your own means first. Sometimes thesolution gives less information than was asked; often it gives more. It is quitepossible that you may have a better answer than the one published here, or youmay have found an error in the published solution; in such a case, the authorwill be pleased to know the details. Later editions of this book will give theimproved solutions together with the solver's name where appropriateWhen working an exercise you may generally use the answers to previousexercises, unless specifically forbidden from doing So. The rating numbers havebeen assigned with this in mind; thus it is possible for exercise m+ 1 to have alower rating than exercise n, even though it includes the result of exercise n asa special caseSummary of codes:o Immediate10 Simple (one minute20 Medium(quarter hourRecommended30 Moderately hardM Mathematically oriented40 Term projectHM Requiring"higher math50 Research problemEXERCISES1.[00」 What does the rating“Mg0mean?2.10 Of what value can the exercises in a textbook be to the reader3. HM45 Prove that when n is an integer, n2. the equation a"t yasno solution in positive integers , y,2Two hours daily exercise.. will be enoughto keep a hack fit for his work-M. H. MAHON, The Handy Horse Book (1865)CONTENTSChapter 5--Sorting5.1. Combinatorial Properties of Permutations5.1.1. Inversions*5. 1.2. Permutations of a multiset*5.13.Runs35*5. 1. 4. Tableaux and involutions45.2. Internal sorting5.2.1. Sorting by Insertion5.2.2. Sorting by Exchanging1055. 2.3. Sorting by Selection1385.2.4. Sorting by Merging1585.2,5. Sorting by Distribution1685. 3. Optimum Sorting1805.3. 1. Minimum- Comparison Sorting180*5.3.2. Minimum-Comparison Merging197*5. 3.3. Minimum-Comparison Selection207for Sorting21954.Eⅹ ternal sorting5.4.1. Multiway Merging and Replacement Selectin25#.4,2. The Polyphase Merge267w5.4.3. The Cascade Merge288*5. 4.4. Reading Tape Backwards2995. 4.5. The Oscillating Sort311*5.4.6. Practical Considerations for Tape merging3174.7. External Radix Sorting5. 4. 8. Two-Tape Sorting3485. 4.9. Disks and drums3565.5. Summary, History, and Bibliography380Chapter 6--Searching3926. 1. Sequential Searching3966. 2. Searching by Comparison of Keys4096.2.1. Searching an Ordered Table6. 2.2. Binary Tree Searching4266.2.3. Balanced Trees4586.2.4. Multiway Tree481X11