ARM GCC Inline Assembler Cookbook.pdf
An introduction to using inline assembly in ARM programming.

%[result] refers to the output operand, the C variable y, and
%[value] refers to the input operand, the C variable x.

Symbolic operand names use a separate name space. That means there is no relation to any other symbol table. To put it simply: you can choose a name without taking care whether the same name already exists in your C code. However, unique symbols must be used within each asm statement.

If you have already looked at some working inline assembler statements written by other authors, you may have noticed a significant difference. In fact, the GCC compiler has supported symbolic names since version 3.1. For earlier releases the rotating bits example must be written as

    asm("mov %0, %1, ror #1" : "=r" (result) : "r" (value));

Operands are referenced by a percent sign followed by a single digit, where %0 refers to the first operand, %1 to the second and so forth. This format is still supported by the latest GCC releases, but it is quite error-prone and difficult to maintain. Imagine that you have written a large number of assembler instructions, where operands have to be renumbered manually after inserting a new output operand.

If all this stuff still looks a little odd, don't worry. Besides the mysterious clobber list, you have the strong feeling that something else is missing, right? Indeed, we didn't talk about the constraint strings in the operand lists. I'd like to ask for your patience. There's something more important to highlight in the next chapter.

C code optimization

There are two possible reasons why you want to use assembly language. The first is that C is limited when we are getting closer to the hardware. E.g. there's no C statement for directly modifying the processor status register. The second reason is to create highly optimized code.
No doubt, the GNU C code optimizer does a good job, but the results are far away from handcrafted assembler code.

The subject of this chapter is often overlooked: when adding assembly language code by using inline assembler statements, this code is also processed by the C compiler's code optimizer. Let's examine the part of a compiler listing which may have been generated from our rotating bits example:

    00309DE5    ldr   r3, [sp, #0]    @ x, x
    E330A0E1    mov   r3, r3, ror #1  @ tmp, x
    04308DE5    str   r3, [sp, #4]    @ tmp, y

The compiler selected register r3 for bit rotation. It could have selected any other register or two registers, one for each C variable. It may not explicitly load the value or store the result. Here is another listing, generated by a different compiler version with different compile options:

    E420A0E1    mov   r2, r4, ror #1

The compiler selected a unique register for each operand, using the value already cached in r4 and passing the result to the following code in r2. Did you get the picture?

Often it becomes worse. The compiler may even decide not to include your assembler code at all. These decisions are part of the compiler's optimization strategy and depend on the context in which your assembler instructions are used. For example, if you never use any of the output operands in the remaining part of the C program, the optimizer will most likely remove your inline assembler statement. The NOP example we presented initially may be such a candidate as well, because to the compiler this is useless overhead, slowing down program execution.

The solution is to add the volatile attribute to the asm statement to instruct the compiler to exclude your assembler code from code optimization. Remember that you were warned against using the initial example. Here is the revised version:

    /* NOP example, revised */
    asm volatile("mov r0, r0");

But there is more trouble waiting for us. A sophisticated optimizer will re-arrange the code.
The following C snippet had been left over after several last minute changes:

    i++;
    if (j == 1)
        x += 3;
    i++;

The optimizer will recognize that the two increments do not have any impact on the conditional statement. Furthermore it knows that incrementing a value by 2 will cost one ARM instruction only. Thus, it will re-arrange the code to

    if (j == 1)
        x += 3;
    i += 2;

and save one ARM instruction. As a result, there is no guarantee that the compiled code will retain the sequence of statements given in the source code.

This may have a great impact on your code, as we will demonstrate now. The following code intends to multiply c with b, of which one or both may be modified by an interrupt routine. Disabling interrupts before accessing the variables and re-enabling them afterwards looks like a good idea.

    asm volatile("mrs r12, cpsr\n\t"
        "orr r12, r12, #0xC0\n\t"
        "msr cpsr_c, r12\n\t" ::: "r12", "cc");
    c *= b; /* This may fail. */
    asm volatile("mrs r12, cpsr\n"
        "bic r12, r12, #0xC0\n"
        "msr cpsr_c, r12" ::: "r12", "cc");

Unfortunately the optimizer may decide to do the multiplication first and then execute both inline assembler statements, or vice versa. This will make our assembly code useless.

We can solve this with the help of the clobber list, which will be explained now. The clobber list from the example above

    "r12", "cc"

informs the compiler that the assembly code modifies register r12 and updates the condition code flags. By the way, using a hard-coded register will typically prevent the best optimization results. In general you should pass a variable and let the compiler choose an adequate register. Besides register names and cc for the condition register, memory is a valid keyword too. It tells the compiler that the assembler instruction may change memory locations. This forces the compiler to store all cached values before and reload them after executing the assembler instructions. And it must retain the sequence, because the contents of all variables are unpredictable after executing an asm statement with a memory clobber.
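The ordering effect of the memory clobber can be seen in isolation with an empty asm statement, an idiom often called a compiler barrier. The following is a minimal sketch and not taken from the example above: the names barrier, shared_b, shared_c and multiply_guarded are made up for illustration, and the two barrier() calls merely stand in for the places where interrupts would be disabled and re-enabled.

```c
#include <stdint.h>

/* Hypothetical helper: an empty asm statement with a "memory" clobber.
 * It emits no instructions, but the compiler has to assume that memory
 * was changed, so cached values are written back before it and
 * reloaded after it. */
static inline void barrier(void)
{
    __asm__ __volatile__("" ::: "memory");
}

/* Hypothetical shared variables, e.g. also touched by an interrupt routine. */
uint32_t shared_b = 3;
uint32_t shared_c = 5;

uint32_t multiply_guarded(void)
{
    barrier();              /* stands in for "disable interrupts" */
    shared_c *= shared_b;   /* memory access stays between the barriers */
    barrier();              /* stands in for "enable interrupts" */
    return shared_c;
}
```

Because the asm body is empty, this should compile for any target supported by GCC. Note that it only constrains the compiler; it does not disable interrupts by itself.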
With the memory clobber added, the interrupt example becomes:

    asm volatile("mrs r12, cpsr\n\t"
        "orr r12, r12, #0xC0\n\t"
        "msr cpsr_c, r12\n\t" ::: "r12", "cc", "memory");
    c *= b; /* This is safe. */
    asm volatile("mrs r12, cpsr\n"
        "bic r12, r12, #0xC0\n"
        "msr cpsr_c, r12" ::: "r12", "cc", "memory");

Invalidating all cached values may be suboptimal. Alternatively you can add a dummy operand to create an artificial dependency:

    asm volatile("mrs r12, cpsr\n\t"
        "orr r12, r12, #0xC0\n\t"
        "msr cpsr_c, r12\n\t" : "=X" (b) :: "r12", "cc");
    c *= b; /* This is safe. */
    asm volatile("mrs r12, cpsr\n"
        "bic r12, r12, #0xC0\n"
        "msr cpsr_c, r12" :: "X" (c) : "r12", "cc");

This code pretends to modify variable b in the first asm statement and to use the contents of variable c in the second. This will preserve the sequence of our three statements without invalidating other cached variables.

It is essential to understand how the optimizer affects inline assembler statements. If something remains nebulous, better re-read this part before moving on to the next topic.

Input and output operands

We learned that each input and output operand is described by a symbolic name enclosed in square brackets, followed by a constraint string, which in turn is followed by a C expression in parentheses.

What are these constraints and why do we need them? You probably know that every assembly instruction accepts specific operand types only. For example, the branch instruction expects a target address to jump to. However, not every memory address is valid, because the final opcode accepts a 24-bit offset only. In contrast, the branch and exchange instruction expects a register that contains a 32-bit target address. In both cases the operand passed from C to the inline assembler may be the same C function pointer. Thus, when passing constants, pointers or variables to inline assembly statements, the inline assembler must know how they should be represented in the assembly code.

For ARM processors, GCC 4 provides the following constraints.

    Constraint  Usage in ARM state                      Usage in Thumb state
    f           Floating point registers f0 .. f7       Not available
    h           Not available                           Registers r8 .. r15
    G           Immediate floating point constant       Not available
    H           Same as G, but negated                  Not available
    I           Immediate value in data processing      Constant in the range 0 .. 255
                instructions                            e.g. SWI operand
                e.g. ORR R0, R0, #operand
    J           Indexing constants -4095 .. 4095        Constant in the range -255 .. -1
                e.g. LDR R1, [PC, #operand]             e.g. SUB R0, R0, #operand
    K           Same as I, but inverted                 Same as I, but shifted
    L           Same as I, but negated                  Constant in the range -7 .. 7
                                                        e.g. SUB R0, R1, #operand
    l           Same as r                               Registers r0 .. r7
                                                        e.g. PUSH operand
    M           Constant in the range of 0 .. 32        Constant that is a multiple of 4
                or a power of 2                         in the range of 0 .. 1020
                e.g. MOV R2, R1, ROR #operand           e.g. ADD R0, SP, #operand
    m           Any valid memory address                Any valid memory address
    N           Not available                           Constant in the range of 0 .. 31
                                                        e.g. LSL R0, R1, #operand
    O           Not available                           Constant that is a multiple of 4
                                                        in the range of -508 .. 508
                                                        e.g. ADD SP, #operand
    r           General register r0 .. r15              Not available
                e.g. SUB operand1, operand2, operand3
    w           Vector floating point registers         Not available
                s0 .. s31
    X           Any operand                             Any operand

Constraint characters may be prepended by a single constraint modifier. Constraints without a modifier specify read-only operands. Modifiers are:

    Modifier    Specifies
    =           Write-only operand, usually used for all output operands
    +           Read-write operand, must be listed as an output operand
    &           A register that should be used for output only

Output operands must be write-only and the C expression result must be an lvalue, which means that the operands must be valid on the left side of assignments. The C compiler is able to check this.

Input operands are, you guessed it, read-only. Note that the C compiler will not be able to check whether the operands are of a reasonable type for the kind of operation used in the assembler instructions. Most problems will be detected during the late assembly stage, which is well known for its weird error messages.
Even if it claims to have found an internal compiler problem that should be immediately reported to the authors, you had better check your inline assembler code first.

A strict rule is: never ever write to an input operand. But what if you need the same operand for input and output? The constraint modifier + does the trick, as shown in the next example:

    asm("mov %[value], %[value], ror #1" : [value] "+r" (y));

This is similar to our rotating bits example presented above. It rotates the contents of the variable y, referenced by the symbolic name value, to the right by one bit. In contrast to the previous example, the result is not stored in another variable. Instead the original contents of the input variable will be modified.

The modifier + may not be supported by earlier releases of the compiler. Luckily they offer another solution, which still works with the latest compiler version. For input operands it is possible to use a single digit in the constraint string. Using digit n tells the compiler to use the same register as for the n-th operand, starting with zero. Here is an example:

    asm("mov %0, %0, ror #1" : "=r" (value) : "0" (value));

Constraint "0" tells the compiler to use the same input register that is used for the first output operand.

Note however that this doesn't automatically imply the reverse case. The compiler may choose the same registers for input and output, even if not told to do so. You may remember the first assembly listing of the rotating bits example with two variables, where the compiler used the same register r3 for both variables. The asm statement

    asm("mov %[result], %[value], ror #1" : [result] "=r" (y) : [value] "r" (x));

generated this code:

    00309DE5    ldr   r3, [sp, #0]    @ x, x
    E330A0E1    mov   r3, r3, ror #1  @ tmp, x
    04308DE5    str   r3, [sp, #4]    @ tmp, y

This is not a problem in most cases, but may be fatal if the output operand is modified by the assembler code before the input operand is used.
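The two ways of using one operand for both input and output, the + modifier and the matching digit, can be compared side by side. This is a minimal sketch under a few assumptions: the function names are made up, the inline assembly is only compiled when GCC targets ARM state (checked with the predefined macros __arm__ and __thumb__), and a plain C rotation is used as a fallback elsewhere so that the sketch can be tried on any host.

```c
#include <stdint.h>

/* Rotate right by one bit, in place, using a read-write operand ("+r"). */
uint32_t ror1_readwrite(uint32_t value)
{
#if defined(__arm__) && !defined(__thumb__)
    __asm__("mov %[v], %[v], ror #1" : [v] "+r" (value));
#else
    value = (value >> 1) | (value << 31);   /* portable fallback */
#endif
    return value;
}

/* The same operation using the matching-digit constraint "0",
 * which ties the input to the register of output operand 0. */
uint32_t ror1_matching(uint32_t value)
{
#if defined(__arm__) && !defined(__thumb__)
    __asm__("mov %0, %0, ror #1" : "=r" (value) : "0" (value));
#else
    value = (value >> 1) | (value << 31);   /* portable fallback */
#endif
    return value;
}
```

Both functions are meant to return the same result; for example, rotating 1 to the right by one bit gives 0x80000000 with either variant.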
In situations where your code depends on different registers used for input and output operands, you must add the & constraint modifier to your output operand. The following code demonstrates this problem.

    asm volatile("ldr %0, [%1]"     "\n\t"
                 "str %2, [%1, #4]" "\n\t"
                 : "=&r" (rdv)
                 : "r" (&table), "r" (wdv)
                 : "memory");

A value is read from a table and then another value is written to another location in this table. If the compiler had chosen the same register for input and output, then the output value would have been destroyed by the first assembler instruction. Fortunately, the & modifier instructs the compiler not to select any register for the output value that is used for any of the input operands.

More recipes

Inline assembler as preprocessor macro

In order to reuse your assembler language parts, it is useful to define them as macros and put them into include files. Using such include files may produce compiler warnings if they are used in modules which are compiled in strict ANSI mode. To avoid that, you can write __asm__ instead of asm and __volatile__ instead of volatile. These are equivalent aliases. Here is a macro which will convert a long value from little endian to big endian or vice versa:

    #define BYTESWAP(val) \
        __asm__ __volatile__ ( \
            "eor     r3, %1, %1, ror #16\n\t" \
            "bic     r3, r3, #0x00FF0000\n\t" \
            "mov     %0, %1, ror #8\n\t" \
            "eor     %0, %0, r3, lsr #8" \
            : "=r" (val) \
            : "0" (val) \
            : "r3", "cc" \
        )

C stub functions

Macro definitions will include the same assembler code whenever they are referenced. This may not be acceptable for larger routines.
In this case you may define a C stub function. Here is the byte swap procedure again, this time implemented as a C function.

    unsigned long ByteSwap(unsigned long val)
    {
        asm volatile (
            "eor     r3, %1, %1, ror #16\n\t"
            "bic     r3, r3, #0x00FF0000\n\t"
            "mov     %0, %1, ror #8\n\t"
            "eor     %0, %0, r3, lsr #8"
            : "=r" (val)
            : "0" (val)
            : "r3"
        );
        return val;
    }

Replacing symbolic names of C variables

By default GCC uses the same symbolic names of functions or variables in C and assembler code. You can specify a different name for the assembler code by using a special form of the asm statement:

    unsigned long value asm("clock") = 3686400;

This statement instructs the compiler to use the symbolic name clock rather than value. This makes sense only for global variables. Local variables (aka auto variables) do not have symbolic names in assembler code.

Replacing symbolic names of C functions

In order to change the name of a function, you need a prototype declaration, because the compiler will not accept the asm keyword in the function definition:

    extern long Calc(void) asm ("CALCULATE");

Calling the function Calc() will create assembler instructions to call the function CALCULATE.

Forcing usage of specific registers

A local variable may be held in a register. You can instruct the inline assembler to use a specific register for it.

    void Count(void) {
        register unsigned char counter asm("r3");

        ... some code ...
        asm volatile("eor r3, r3, r3" : "=l" (counter));
        ... more code ...
    }

The assembler instruction eor r3, r3, r3 will clear the variable counter. Be warned that this sample is bad in most situations, because it interferes with the compiler's optimizer. Furthermore, GCC will not completely reserve the specified register. If the optimizer recognizes that the variable will not be referenced any longer, the register may be re-used. But the compiler is not able to check whether this register usage conflicts with any predefined register.
If you reserve too many registers in this way, the compiler may even run out of registers during code generation.

Using registers temporarily

If you are using registers which have not been passed as operands, you need to inform the compiler about this. The following code will adjust a value to a multiple of four. It uses r3 as a scratch register and lets the compiler know about this by specifying r3 in the clobber list. Furthermore the CPU status flags are modified by the ands instruction and thus cc has been added to the clobbers.

    asm volatile(
        "ands    r3, %1, #3"  "\n\t"
        "eor     %0, %0, r3"  "\n\t"
        "addne   %0, #4"
        : "=r" (len)
        : "0" (len)
        : "cc", "r3"
    );

Again, hard coding register usage is always bad coding style. Better implement a C stub function and use a local variable for temporary values.

Using constants

You can use the mov instruction to load an immediate constant value into a register. Basically, this is limited to values ranging from 0 to 255.

    asm("mov r0, %[flag]" : : [flag] "I" (0x80));

But larger values can also be used when rotating the given range by an even number of bits. In other words, any result of

    n * 2^x

where n is in the mentioned range of 0 to 255 and x is an even number in the range of 0 to 24. Because of rotation, x may also be set to 26, 28 or 30, in which case bits 37 to 32 are folded to bits 5 to 0, respectively. Last but not least, the binary complement of these values may be given when using mvn instead of mov.

Sometimes you need to jump to a fixed memory address, which may be defined by a preprocessor macro. You can use the following assembly code:

    ldr  r3, =JMPADDR
    bx   r3

This will work with any legal address value. If the constant fits (for example 0x20000000), then the smart assembler will convert this to

    mov  r3, #0x20000000
    bx   r3

If it doesn't fit (for example 0x00F000F0), then the assembler will load the value from the literal pool.

    ldr  r3, .L1
    bx   r3
    ...
    .L1: .word 0x00F000F0

With inline assembly it works in the same way.
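Whether a constant "fits" in the sense used above follows from the rotation rule for immediate values. The following is a minimal sketch of that check in plain C, assuming ARM state (Thumb encodings differ); the function names are made up for illustration and are not part of GCC or the assembler.

```c
#include <stdbool.h>
#include <stdint.h>

/* Rotate a 32-bit value left by n bits. */
static uint32_t rotl32(uint32_t v, unsigned n)
{
    n &= 31u;
    return n ? (v << n) | (v >> (32u - n)) : v;
}

/* Hypothetical helper: true if v is an 8-bit value rotated right
 * by an even number of bits, i.e. usable as an immediate for mov. */
bool fits_arm_immediate(uint32_t v)
{
    for (unsigned rot = 0; rot < 32; rot += 2) {
        /* Undo a rotate-right by rot; the result must fit in 8 bits. */
        if (rotl32(v, rot) <= 0xFFu)
            return true;
    }
    return false;
}

/* True if a single mov or a single mvn can load v. */
bool fits_mov_or_mvn(uint32_t v)
{
    return fits_arm_immediate(v) || fits_arm_immediate(~v);
}
```

With the values used in this section, 0x20000000 passes the first check, 0x00F000F0 fails both checks and therefore has to come from the literal pool, and 0xFFFFFF00 is only reachable through mvn.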
Instead of using ldr, however, you can simply provide the constant as a register operand:

    asm volatile("bx %0" : : "r" (JMPADDR));

Depending on the actual value of the constant, either mov, ldr or any of its variants is used. If JMPADDR is defined as 0xFFFFFF00, then the resulting code will be similar to

    mvn  r3, #0xFF
    bx   r3

The real world is more complicated. It may happen that we need to load a specific register with a constant. Let's assume that we want to call a subroutine, but we want to return to an address other than the one that follows our branch. This can be useful when embedded firmware returns from main. In this case we need to load the link register. Here is the assembly code:

    ldr  lr, =JMPADDR
    ldr  r3, main
    bx   r3

Any idea how to implement this in inline assembly? Here is a solution:

    asm volatile(
        "mov lr, %1\n\t"
        "bx %0\n\t"
        : : "r" (main), "I" (JMPADDR));

But there is still a problem. We use mov here and this will work as long as the value of JMPADDR fits. The resulting code will be the same as what we get in pure assembly code. If it doesn't fit, then we need ldr instead. But unfortunately there is no way to express

    ldr  lr, =JMPADDR

in inline assembly. Instead, we must write

    asm volatile(
        "mov lr, %1\n\t"
        "bx %0\n\t"
        : : "r" (main), "r" (JMPADDR));

Compared to the pure assembly code, we end up with an additional statement, using an additional register.

    ldr  r3, .L1
    ldr  r2, .L2
    mov  lr, r2
    bx   r3

Register Usage

It is always a good idea to analyze the assembly listing output of the C compiler and study the generated code. The following table of the compiler's typical register usage will probably be helpful for understanding the code.

    Register  Alt. Name  Usage
    r0        a1         First function argument
                         Integer function result
                         Scratch register
    r1        a2         Second function argument
                         Scratch register
    r2        a3         Third function argument