Preface xi

Part I Asynchronous circuit design – A tutorial

Author: Jens Sparsø

1

Introduction 3

1.1 Why consider asynchronous circuits? 3

1.2 Aims and background 4

1.3 Clocking versus handshaking 5

1.4 Outline of Part I 8

2

Fundamentals 9

2.1 Handshake protocols 9

2.1.1 Bundled-data protocols 9

2.1.2 The 4-phase dual-rail protocol 11

2.1.3 The 2-phase dual-rail protocol 13

2.1.4 Other protocols 13

2.2 The Muller C-element and the indication principle 14

2.3 The Muller pipeline 16

2.4 Circuit implementation styles 17

2.4.1 4-phase bundled-data 18

2.4.2 2-phase bundled data (Micropipelines) 19

2.4.3 4-phase dual-rail 20

2.5 Theory 23

2.5.1 The basics of speed-independence 23

2.5.2 Classification of asynchronous circuits 25

2.5.3 Isochronic forks 26

2.5.4 Relation to circuits 26

2.6 Test 27

2.7 Summary 28

3

Static data-flow structures 29

3.1 Introduction 29

3.2 Pipelines and rings 30

vi PRINCIPLES OF ASYNCHRONOUS CIRCUIT DESIGN

3.3 Building blocks 31

3.4 A simple example 33

3.5 Simple applications of rings 35

3.5.1 Sequential circuits 35

3.5.2 Iterative computations 35

3.6 FOR, IF, and WHILE constructs 36

3.7 A more complex example: GCD 38

3.8 Pointers to additional examples 39

3.8.1 A low-power filter bank 39

3.8.2 An asynchronous microprocessor 39

3.8.3 A fine-grain pipelined vector multiplier 40

3.9 Summary 40

4

Performance 41

4.1 Introduction 41

4.2 A qualitative view of performance 42

4.2.1 Example 1: A FIFO used as a shift register 42

4.2.2 Example 2: A shift register with parallel load 44

4.3 Quantifying performance 47

4.3.1 Latency, throughput and wavelength 47

4.3.2 Cycle time of a ring 49

4.3.3 Example 3: Performance of a 3-stage ring 51

4.3.4 Final remarks 52

4.4 Dependency graph analysis 52

4.4.1 Example 4: Dependency graph for a pipeline 52

4.4.2 Example 5: Dependency graph for a 3-stage ring 54

4.5 Summary 56

5

Handshake circuit implementations 57

5.1 The latch 57

5.2 Fork, join, and merge 58

5.3 Function blocks – The basics 60

5.3.1 Introduction 60

5.3.2 Transparency to handshaking 61

5.3.3 Review of ripple-carry addition 64

5.4 Bundled-data function blocks 65

5.4.1 Using matched delays 65

5.4.2 Delay selection 66

5.5 Dual-rail function blocks 67

5.5.1 Delay insensitive minterm synthesis (DIMS) 67

5.5.2 Null Convention Logic 69

5.5.3 Transistor-level CMOS implementations 70

5.5.4 Martin’s adder 71

5.6 Hybrid function blocks 73

5.7 MUX and DEMUX 75

5.8 Mutual exclusion, arbitration and metastability 77

5.8.1 Mutual exclusion 77

5.8.2 Arbitration 79

5.8.3 Probability of metastability 79

Contents vii

5.9 Summary 80

6

Speed-independent control circuits 81

6.1 Introduction 81

6.1.1 Asynchronous sequential circuits 81

6.1.2 Hazards 82

6.1.3 Delay models 83

6.1.4 Fundamental mode and input-output mode 83

6.1.5 Synthesis of fundamental mode circuits 84

6.2 Signal transition graphs 86

6.2.1 Petri nets and STGs 86

6.2.2 Some frequently used STG fragments 88

6.3 The basic synthesis procedure 91

6.3.1 Example 1: a C-element 92

6.3.2 Example 2: a circuit with choice 92

6.3.3 Example 2: Hazards in the simple gate implementation 94

6.4 Implementations using state-holding gates 96

6.4.1 Introduction 96

6.4.2 Excitation regions and quiescent regions 97

6.4.3 Example 2: Using state-holding elements 98

6.4.4 The monotonic cover constraint 98

6.4.5 Circuit topologies using state-holding elements 99

6.5 Initialization 101

6.6 Summary of the synthesis process 101

6.7 Petrify: A tool for synthesizing SI circuits from STGs 102

6.8 Design examples using Petrify 104

6.8.1 Example 2 revisited 104

6.8.2 Control circuit for a 4-phase bundled-data latch 106

6.8.3 Control circuit for a 4-phase bundled-data MUX 109

6.9 Summary 113

7

Advanced 4-phase bundled-data protocols and circuits 115

7.1 Channels and protocols 115

7.1.1 Channel types 115

7.1.2 Data-validity schemes 116

7.1.3 Discussion 116

7.2 Static type checking 118

7.3 More advanced latch control circuits 119

7.4 Summary 121

8

High-level languages and tools 123

8.1 Introduction 123

8.2 Concurrency and message passing in CSP 124

8.3 Tangram: program examples 126

8.3.1 A 2-place shift register 126

8.3.2 A 2-place (ripple) FIFO 126

8.3.3 GCD using while and if statements 127

8.3.4 GCD using guarded commands 128

8.4 Tangram: syntax-directed compilation 128

8.4.1 The 2-place shift register 129

8.4.2 The 2-place FIFO 130

8.4.3 GCD using guarded repetition 131

8.5 Martin’s translation process 133

8.6 Using VHDL for asynchronous design 134

8.6.1 Introduction 134

8.6.2 VHDL versus CSP-type languages 135

8.6.3 Channel communication and design flow 136

8.6.4 The abstract channel package 138

8.6.5 The real channel package 142

8.6.6 Partitioning into control and data 144

8.7 Summary 146

Appendix: The VHDL channel packages 148

A.1 The abstract channel package 148

A.2 The real channel package 150

Part II Balsa - An Asynchronous Hardware Synthesis System

Authors: Doug Edwards, Andrew Bardsley

9

An introduction to Balsa 155

9.1 Overview 155

9.2 Basic concepts 156

9.3 Tool set and design flow 159

9.4 Getting started 159

9.4.1 A single-place buffer 161

9.4.2 Two-place buffers 163

9.4.3 Parallel composition and module reuse 164

9.4.4 Placing multiple structures 165

9.5 Ancillary Balsa tools 166

9.5.1 Makefile generation 166

9.5.2 Estimating area cost 167

9.5.3 Viewing the handshake circuit graph 168

9.5.4 Simulation 168

10

The Balsa language 173

10.1 Data types 173

10.2 Data typing issues 176

10.3 Control flow and commands 178

10.4 Binary/unary operators 181

10.5 Program structure 181

10.6 Example circuits 183

10.7 Selecting channels 190

11

Building library components 193

11.1 Parameterised descriptions 193

11.1.1 A variable width buffer definition 193

11.1.2 Pipelines of variable width and depth 194

11.2 Recursive definitions 195

11.2.1 An n-way multiplexer 195

11.2.2 A population counter 197

11.2.3 A Balsa shifter 200

11.2.4 An arbiter tree 202

12

A simple DMA controller 205

12.1 Global registers 205

12.2 Channel registers 206

12.3 DMA controller structure 207

12.4 The Balsa description 211

12.4.1 Arbiter tree 211

12.4.2 Transfer engine 212

12.4.3 Control unit 213

Part III Large-Scale Asynchronous Designs

13

Descale 221

Joep Kessels & Ad Peeters, Torsten Kramer and Volker Timm

13.1 Introduction 222

13.2 VLSI programming of asynchronous circuits 223

13.2.1 The Tangram toolset 223

13.2.2 Handshake technology 225

13.2.3 GCD algorithm 226

13.3 Opportunities for asynchronous circuits 231

13.4 Contactless smartcards 232

13.5 The digital circuit 235

13.5.1 The 80C51 microcontroller 236

13.5.2 The prefetch unit 239

13.5.3 The DES coprocessor 241

13.6 Results 243

13.7 Test 245

13.8 The power supply unit 246

13.9 Conclusions 247

14

An Asynchronous Viterbi Decoder 249

Linda E. M. Brackenbury

14.1 Introduction 249

14.2 The Viterbi decoder 250

14.2.1 Convolution encoding 250

14.2.2 Decoder principle 251

14.3 System parameters 253

14.4 System overview 254

14.5 The Path Metric Unit (PMU) 256

14.5.1 Node pair design in the PMU 256

14.5.2 Branch metrics 259

14.5.3 Slot timing 261

14.5.4 Global winner identification 262

14.6 The History Unit (HU) 264

14.6.1 Principle of operation 264

14.6.2 History Unit backtrace 264

14.6.3 History Unit implementation 267

14.7 Results and design evaluation 269

14.8 Conclusions 271

14.8.1 Acknowledgement 272

14.8.2 Further reading 272

15

Processors 273

Jim D. Garside

15.1 An introduction to the Amulet processors 274

15.1.1 Amulet1 (1994) 274

15.1.2 Amulet2e (1996) 275

15.1.3 Amulet3i (2000) 275

15.2 Some other asynchronous microprocessors 276

15.3 Processors as design examples 278

15.4 Processor implementation techniques 279

15.4.1 Pipelining processors 279

15.4.2 Asynchronous pipeline architectures 281

15.4.3 Determinism and non-determinism 282

15.4.4 Dependencies 288

15.4.5 Exceptions 297

15.5 Memory – a case study 302

15.5.1 Sequential accesses 302

15.5.2 The Amulet3i RAM 303

15.5.3 Cache 307

15.6 Larger asynchronous systems 310

15.6.1 System-on-Chip (DRACO) 310

15.6.2 Interconnection 310

15.6.3 Balsa and the DMA controller 312

15.6.4 Calibrated time delays 313

15.6.5 Production test 314

15.7 Summary 315

Epilogue 317

References 319

Index 333