NAME

arm-asm - ARM assembler for ISOS


SYNOPSIS

arm-asm [options] <sourcefile>


ARGUMENTS

Options may be one or more of the following (note that not all builds of the assembler necessarily support all the object formats listed here):

-acorn
Produce Acorn Object Module (AOF)

-armv4
Allow ARM architecture version 4 instructions.

-armv5
Allow ARM architecture version 5 instructions. Note that this includes everything in version 4 so there is no need to specify -armv4 and -armv5 in the same command.

-aout
Produce a.out format Object Module

-binary
Produce a pure binary output file.

-bigendian
Big-endian Object Module data

-D<symbol>
Define <symbol> (for ifdef, #ifdef, etc.)

-elf
Produce Executable and Linking Format (ELF) Object Module

-hardfloat
Do not set the EF_SOFT_FLOAT flag in the object header. This is only meaningful if the ELF output format is selected.

-zero-reloc-placeholders
Put zeros into the placeholders for relocated values. In particular, this affects branch instructions to externally defined symbols. The option is not normally necessary unless required by third-party tools.

-I<directory>
Define <directory> as an include directory (for include or #include)

-intel
Produce Intel Object Module (Hex)

-l <file>
Output listing file

-littleendian
Little-endian Object Module data (the default)

-motorola
Produce Motorola Object Module (S-Record)

-nocpp
Bypass cpp processing

-o <file>
Object module file

-quiet
Do not produce verbose output

-s <file>
Symbol table file

-stabs
Insert stabs debug symbols into the output. This is needed for source-level debugging of hand-written assembler.

-strict-using
Make it an error to apply a using directive (see below) to a register to which one already applies. The default behaviour is to silently drop the previous using.

-U<symbol>
Undefine <symbol> (for ifdef, #ifdef, etc.)

-unix
Produce Unix Object Module (a.out). Deprecated synonym for -aout

-verbose
Produce verbose output

-xref
Produce a cross reference listing


DESCRIPTION

arm-asm is the ARM assembler.

Preprocessing

Unless the -nocpp option has been used, the assembler uses cpp to preprocess the source. In versions 8.02 and later, it predefines the two preprocessor macros ASM_MAJOR_VER and ASM_MINOR_VER when invoking cpp, allowing the source to test the assembler version. Earlier versions do not define these macros.

Input Lines

The assembler takes a set of input lines, each of which has the format:

 [label] [opcode] [operand(s)] [comment]

Comments are introduced by a semicolon (;), and cause subsequent characters until end of line to be ignored. Apart from the label field (which must start in the first column of the line) input is free format, and white space is ignored. Blank lines are also ignored.

A special case is the 'typedef' directive. In order to maximize compatibility with C header files, the 'typedef' directive is allowed to occur in the label field of a line. It is therefore not possible to use the string 'typedef' as a label.

As of arm-asm version 6.76, C++ style end-of-line comments (//) will also work.

Numbers

Decimal
A string of decimal digits, e.g. 123456

Hexadecimal
& (or 0x, in UNIX mode) followed by a string of hexadecimal digits, e.g. &ff00, 0xff00

Octal
@ (or 0, in UNIX mode) followed by a string of octal digits, e.g. @377, 0377

Binary
% followed by a string of binary digits, e.g. %110101

Characters
Up to 4 characters contained within single (') or double (") quotes.

Quote characters
These can be escaped by doubling them, e.g. 'a', "8", '****', "It's", ''''
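
The prefix rules above can be modelled in a few lines. The following Python sketch is an illustration only and is not part of arm-asm; the function name parse_number and the unix_mode flag are hypothetical, and character constants are left out because this page does not say how they are packed into a value.

```python
def parse_number(text, unix_mode=False):
    """Return the integer value of a numeric literal written as described above."""
    if text.startswith("&"):
        return int(text[1:], 16)      # hexadecimal, e.g. &ff00
    if text.startswith("@"):
        return int(text[1:], 8)       # octal, e.g. @377
    if text.startswith("%"):
        return int(text[1:], 2)       # binary, e.g. %110101
    if unix_mode and text.startswith("0x"):
        return int(text[2:], 16)      # UNIX-mode hexadecimal, e.g. 0xff00
    if unix_mode and len(text) > 1 and text.startswith("0"):
        return int(text[1:], 8)       # UNIX-mode octal, e.g. 0377
    return int(text, 10)              # plain decimal, e.g. 123456


# Print the value of each example literal from the list above.
for literal in ("123456", "&ff00", "@377", "%110101"):
    print(literal, "=", parse_number(literal))
```

The UNIX-mode prefixes are checked only when unix_mode is set, mirroring the "in UNIX mode" qualification in the list above.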

Tags

Tags are up to 64 characters in length, the first of which is alphabetic (a-z), a dot (.) or an underline (_); subsequent characters can be alphabetic, numeric (0-9), dot or underline.

e.g. a, FRED, _main, R12, .1.2.3

For compatibility with other assemblers, tags may be enclosed in tag quotes, vertical bar (|), and may then contain any characters except space and the tag quote character itself.

e.g. |x$stackoverflow|, |:fred:|

Literals

The ARM has restrictions on the size of constants which can be included in an instruction. In order to circumvent this problem, it is possible to use the ldr instruction in conjunction with a literal value. The syntax of a literal is an equals sign (=) followed by an expression denoting the value of the literal. An anonymous 32-bit storage location is allocated to contain the value, and the address of the location is incorporated into the instruction. A new location is created for each new literal, which is why it is not meaningful to use literals with str. The storage allocated to hold the literal values can be placed within the code using the ltorg directive. If no ltorg is present, literals are placed at the end of the program.

 e.g. ldr r1, =&ffff0000
 ldr r2, =symbol

Current Location

At any time during the assembly of a program, the special symbol star (*) holds the value of the current location. This is defined as the value of the assembler program counter before the current instruction or directive is handled. When assembling for the UNIX object module type, the special symbol dot (.) also holds the value of the current location.

Monadic Operators

 +              Plus
 -              Minus
 ~ or \\        Not

Dyadic Operators

 +              Plus
 -              Minus
 *              Multiply
 /              Divide
 &              And
 |              Or
 ^              Exclusive Or
 <<             Shift Left
 >>             Shift Right

Priority of Operators

The default priority of operators, ranked highest to lowest, is:

 Monadic +, -, ~ and \\
 *, /
 +, -
 <<, >>
 &, |, ^

Expressions can always be enclosed within parentheses to override these priorities.
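
To make the priority list concrete, here is a small precedence-climbing evaluator in Python. It is a sketch and not the assembler's own parser: it handles decimal operands only, omits the \\ form of Not, groups operators of equal priority left to right, and uses floor division for /. The last two points are assumptions, since this page does not state them.

```python
import operator
import re

# Binding strength of each dyadic operator (a larger number binds tighter),
# taken from the priority list above.
PRIORITY = {"*": 4, "/": 4, "+": 3, "-": 3, "<<": 2, ">>": 2,
            "&": 1, "|": 1, "^": 1}

OPERATIONS = {"*": operator.mul, "/": operator.floordiv,   # floor division assumed
              "+": operator.add, "-": operator.sub,
              "<<": operator.lshift, ">>": operator.rshift,
              "&": operator.and_, "|": operator.or_, "^": operator.xor}

TOKEN = re.compile(r"\s*(<<|>>|\d+|[-+*/&|^~()])")


def tokenize(text):
    """Split an expression into operator, parenthesis and decimal tokens."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character at column {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def evaluate(text):
    """Evaluate an expression using the operator priorities listed above."""
    tokens = tokenize(text)
    pos = 0

    def primary():
        # A number, a parenthesised expression, or a monadic operator.
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of expression")
        token = tokens[pos]
        pos += 1
        if token == "(":
            value = expression(1)
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("missing closing parenthesis")
            pos += 1
            return value
        if token == "+":
            return primary()
        if token == "-":
            return -primary()
        if token == "~":
            return ~primary()
        if token.isdigit():
            return int(token)
        raise ValueError(f"unexpected token {token!r}")

    def expression(min_priority):
        # Consume dyadic operators whose priority is at least min_priority.
        nonlocal pos
        left = primary()
        while (pos < len(tokens) and tokens[pos] in PRIORITY
               and PRIORITY[tokens[pos]] >= min_priority):
            op = tokens[pos]
            pos += 1
            right = expression(PRIORITY[op] + 1)
            left = OPERATIONS[op](left, right)
        return left

    value = expression(1)
    if pos != len(tokens):
        raise ValueError(f"unexpected token {tokens[pos]!r}")
    return value
```

Under these assumptions, evaluate("1 | 2 ^ 3") groups as (1 | 2) ^ 3 because &, | and ^ share one priority level. C would group the same text as 1 | (2 ^ 3), so parentheses are worth adding to any expression shared with C headers.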

General Directives

title <title>

Set the output listing page title.

 [label] align
 [label] align <boundary>
 [label] align <offset>, <boundary>

Align the assembler program counter. With no parameters, the program counter is aligned to a word (4 byte) boundary. If <boundary> is given, then it specifies the boundary for alignment. If <offset> is given, then it specifies an offset from that boundary. If a label is given, the value assigned is the program counter after the alignment.

 e.g. align
 align 32
 align 2, 4
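
The arithmetic behind align can be sketched in Python as follows. This is a model and not the assembler's code: the function name is hypothetical, and it assumes that <offset> means "this many bytes past a multiple of <boundary>" and that a program counter which already satisfies the request is left unchanged.

```python
def align(pc, boundary=4, offset=0):
    """Return the smallest address >= pc that lies `offset` bytes past a
    multiple of `boundary` (assumed reading of the align directive)."""
    return pc + (offset - pc) % boundary


# Model the three example forms shown above, starting from address 5.
print(align(5))          # align
print(align(5, 32))      # align 32
print(align(5, 4, 2))    # align 2, 4
```

Python's % always returns a non-negative result for a positive boundary, which is what lets the single expression cover both the "already aligned" and "needs padding" cases.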

include <file>

Includes a source file. <file> is a filename of up to 64 characters within single or double quotes. As with other strings, quotes can be escaped by doubling. Includes can be nested arbitrarily, but there is a current implementation limit of 52 files in total.

 e.g. include "registers"

The cpp directive #include may also be used to include a source file.

using <expression>, <register>

Defines the 'using' register. This tells the assembler that, at run time, the register <register> will contain the value of the expression <expression>. This allows the assembler to choose the correct base register when working out how to address a piece of memory. Usually, the expression is the name of a dummy section (see dorg).

 e.g. using 0, r0
 using structure, r4

drop <register> [, <register>]*

Drop one or more 'using' register(s). Instructs the assembler to forget its knowledge of the contents of one or more registers.

 e.g. drop r0
 drop r4, r5, r6

entry

Defines the program entry point. If the chosen object module format supports it, set the entry point of the program to the current location.

[label] ltorg

Place literal storage. If there are any stacked literal values, allocate storage for them and place them at the current location. If a label is given, the value assigned is the address of the start of the literal area.

[label] org <expression>

Sets the absolute program origin. Start assembling absolute code starting at the origin given by the expression <expression>. If a label is given, the value assigned is the new absolute origin address.

 e.g. org 0
 org *+100

[label] rorg <expression>

Sets the relocatable program origin. Start assembling relocatable code starting at the origin given by the expression <expression>. If a label is given, the value assigned is the new relocatable origin address.

 e.g. rorg 0
 rorg *+100

[label] dorg <expression>

Sets the dummy program origin. Start assembling dummy code starting at the origin given by the expression <expression>. Each time a dorg directive is encountered, a new dummy data type is created and subsequent symbols defined relative to the dummy origin are given this data type. Combined with ds, dsb and using directives, this allows the user to generate arbitrary structure definitions and have the assembler work out the base register and offset values automatically. No code is ever generated within a dummy section. If a label is given, the value assigned is the new dummy origin address.

 e.g. dorg 0
 dorg &03200000

label equ <expression>

Defines the assembly time equate. Assigns the value of the expression <expression> to the symbol in the label field. The expression can be any of:

 (a) absolute value
 (b) relocatable value
 (c) dummy value
 (d) register value
 (e) register mask

Both the data type and the value of the expression are assigned to the label symbol.

 e.g. zero equ 0
 fred equ *
 base equ r4
 mask equ {r0-r4}

label rn <expression>

Assigns register number. Takes the value of the absolute expression <expression> and assigns it to the symbol in the label field, converting it to be a register number. This directive is included for compatibility with other assemblers, and is superseded by the equ directive (see above).

 e.g. base rn 4

[label] dc <expression> [, <expression>]*

[label] dch <expression> [, <expression>]*

[label] dcb <expression> [, <expression>]*

Define constant(s). Allocate one or more units of storage (32 bits for dc, 16 bits for dch, 8 bits for dcb), initialising them to the <expression> values given. With dcb, strings of characters can be declared by enclosing them within single or double quotes. As with other strings, quotes can be escaped by doubling them. If a label is given, the value assigned is the address of the start of the constant area. The dch directive is only available in versions 7 and later.

 e.g. dc 0
 dc fred, bert
 dcb 'a'
 dcb "This is a string", 13, 10

[label] ds <expression> [, <expression>]

[label] dsh <expression> [, <expression>]

[label] dsb <expression> [, <expression>]

Defines storage. Allocate one or more units of storage (32 bits for ds, 16 bits for dsh, 8 bits for dsb), initialising them to zero. The <expression> indicates how many units to allocate. N.B. ds allocates words, dsb allocates bytes. If a label is given, the value assigned is the address of the start of the storage area.

 e.g. ds 4
 ds 0
 dsb 10
 dsb &1000

In versions 8.09 and later an optional second argument can be used to specify a non-zero fill value, e.g. ds 100, 0xdeadbeef will declare an array of 100 32-bit values initialised to 0xdeadbeef. The width of the fill value matches the width of the declaration unit (8, 16 or 32 bits).


import <symbol> [, <symbol>]*

Import one or more external symbols. Tell the assembler to assume that the symbols given are defined outside the current module. Only some of the object module formats support imported symbols.

 e.g. import |x$stackoverflow|
 import _printf, _exit

export <symbol> [, <symbol>]*

Export one or more internal symbols. Tell the assembler to make certain symbols, defined within the current module, available to other modules. Only some of the object module formats support exported symbols.

 e.g. export zero
 export _main, _exit

[label] end

Stop reading from the current file. Tell the assembler to stop reading from the current file and return to the one which included it. In the case of the main file, this means stop the assembly.

UNIX Specific Directives

 [label] .align
 [label] .align <boundary>
 [label] .align <offset>, <boundary>

Align the assembler program counter. See 'align' for details.

.ASCII, .BYTE, .SHORT, .WORD

 [label] .ascii <expression> [, <expression>]*
 [label] .byte  <expression> [, <expression>]*
 [label] .short <expression> [, <expression>]*
 [label] .word  <expression> [, <expression>]*

Define constant(s) of various sizes. See dc and dcb for details.

 [label] .bss
 [label] .data
 [label] .text

Start assembling in the named segment.

.global <symbol> [, <symbol>]*

Declare one or more global symbols. See 'export' for details.

label .req <register>

Assign a name to a register. See 'equ' for details.

[label] .space <expression>

Allocate space for given number of bytes. See dsb for details.

Conditional Assembly

The following section describes the conditional assembly facilities provided by the assembler. It is also possible to use the cpp conditional directives #ifdef, #else, #endif, etc.

 ifdef <symbol>
 ifndef <symbol>

Assemble only if symbol is (or is not) defined. If the definition state of the symbol <symbol> matches the criterion of the directive, continue assembling input lines. If not, then skip input lines until the matching else or endif directive is reached. Symbols can be defined by the -D option from the command line or by using define; symbols can be undefined by using undefine.

 e.g. ifdef arm3
 ifndef memc1a

 ifeq <v1>, <v2>
 ifne <v1>, <v2>
 ifgr <v1>, <v2>
 ifls <v1>, <v2>
 ifge <v1>, <v2>
 ifle <v1>, <v2>

Assemble only if comparison succeeds. If the comparison of the two values <v1> and <v2> matches the criterion of the directive, continue assembling input lines. If not, then skip input lines until the matching else or endif directive is reached. These directives can be used to compare values of any data type, providing the types match. If the values are strings (enclosed within single or double quotes) then a lexicographic comparison is done. This is particularly useful within macros (see below).

 e.g. ifeq bytesperword, 4
 ifgr $nestlevel, 20
 ifne "$dummy", "yes"

else

Start 'else' part of if... directive.

endif

End of if... directive.

define <symbol> [, <symbol>]*

Set one or more symbols to be defined. Tell the assembler to make certain symbols 'defined' so that they can be tested by ifdef and ifndef.

 e.g. define version3
 define byte, word

undefine <symbol> [, <symbol>]*

Set one or more symbols to be undefined. Tell the assembler to make certain symbols 'undefined' so that they can be tested by ifdef and ifndef.

 e.g. undefine old
 undefine himem, eprom

Anonymous Labels

When assembling a program it is sometimes necessary and often desirable not to have to use fixed names for labels. There are 10 anonymous labels (called $0 to $9) which, each time they are declared, are guaranteed to be unique. This allows the programmer to avoid inventing a multiplicity of label names within normal code. To declare an anonymous label, use one of the names in the label field of an instruction or directive; in order to refer to the anonymous label, use the name in the operand field.

For example:

 $0 add r0, r1
 subs r2, #1
 bne $0

is a simple loop which uses an anonymous label to mark its beginning. There are, obviously, ambiguities with anonymous labels, particularly in the area of forward references. For example:

 $0 add r0, r1
 subs r2, #1
 bne $0
 $0 add r3, r4
 ...

is potentially ambiguous, since it is not clear which declaration of $0 to use. The solution to this problem is to use qualifiers when referring to an anonymous label: $0b means a backward reference to $0, $0f means a forward reference to $0. An unqualified reference (i.e. $0) is always taken to be a backward reference. For example:

 $0 adds r0, r1
 bvs $1f
 subs r2, #1
 bne $0b
 $1 add r3, r4
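
The qualifier rules can be modelled as a lookup over the lines on which each anonymous label is declared. The Python sketch below is illustrative only: the function and its arguments are hypothetical, and it assumes that a reference binds to the nearest declaration in the chosen direction and that a declaration on the referencing line itself counts as backward. Neither point is stated on this page.

```python
def resolve(reference, ref_line, definitions):
    """Return the line number of the declaration that `reference` names.

    reference   -- operand text such as "$0", "$0b" or "$0f"
    ref_line    -- line number of the instruction making the reference
    definitions -- list of (line_number, label) pairs, e.g. (1, "$0")
    """
    if reference.endswith("f"):
        label, forward = reference[:-1], True
    elif reference.endswith("b"):
        label, forward = reference[:-1], False
    else:
        label, forward = reference, False     # unqualified means backward

    lines = [number for number, name in definitions if name == label]
    if forward:
        candidates = [number for number in lines if number > ref_line]
        if not candidates:
            raise LookupError(f"no forward declaration of {label}")
        return min(candidates)                # nearest declaration after
    candidates = [number for number in lines if number <= ref_line]
    if not candidates:
        raise LookupError(f"no backward declaration of {label}")
    return max(candidates)                    # nearest declaration before


# The five-line example above: $0 is declared on line 1, $1 on line 5.
declared = [(1, "$0"), (5, "$1")]
print(resolve("$1f", 2, declared), resolve("$0b", 4, declared))
```

With two declarations of $0, on lines 1 and 4, a reference from line 3 resolves to line 1 when written $0 or $0b and to line 4 when written $0f, which matches the ambiguous case discussed above.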

Macros

The assembler includes a powerful macro processor. It is only possible here to give brief details about the syntax of the macro language and the macro directives.

Macro Substitution

Within a macro, special symbols introduced by a dollar ($) are local to the macro and are assigned values whenever the macro is expanded. In order to obtain the value of a macro symbol, normally it is just necessary to give its name. For example: dc $xxx, $yyy

which would perform textual substitution of the symbols $xxx and $yyy in the expanded line. In this case, the syntax is unambiguous, since the characters following the macro symbols could not themselves be part of the symbol. In cases where there is ambiguity, it is possible to enclose the macro symbol within parentheses. For example: dc $(xxx)1, $(xxx)2

Normally, the macro symbol expands to exactly the number of characters of its textual value. When expanding a line, the assembler attempts to maintain the relative positioning of items on the line for readability. If it is essential that a macro symbol always expands into an exact number of characters, it is possible to specify this; longer values are truncated, shorter values are padded to the right with spaces. For example: dcb 7, "$(xxx,7)"

which always expands into 8 bytes, with the macro substitution being forced to exactly 7 bytes, no matter what its original length. If it is necessary to obtain a dollar character within a macro expansion, then (as with string quotes) the dollar must be doubled. For example: dcb "To you, my friend, the cost is $$20"

The only other type of special macro symbol is the anonymous label ($0 to $9), which is described in more detail above. All anonymous labels used within a macro expansion are guaranteed to be unique to that expansion.
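
The substitution forms described in this section ($name, $(name), $(name,width) and $$) can be modelled with one regular expression. This Python sketch is not the assembler's macro processor: the expand function is hypothetical, symbol names are assumed to consist of letters, digits and underscores, and anonymous labels ($0 to $9) are deliberately left untouched.

```python
import re

# Alternatives, in order: a doubled dollar, a parenthesised symbol with an
# optional width, and a bare symbol name.
PATTERN = re.compile(r"\$\$|\$\((\w+)(?:,(\d+))?\)|\$([A-Za-z_]\w*)")


def expand(line, symbols):
    """Substitute macro symbols in `line` using the `symbols` mapping."""
    def replace(match):
        if match.group(0) == "$$":
            return "$"                                # doubled dollar
        name = match.group(1) or match.group(3)
        value = symbols[name]
        if match.group(2) is not None:
            width = int(match.group(2))
            value = value[:width].ljust(width)        # truncate or pad right
        return value
    return PATTERN.sub(replace, line)


print(expand('dcb 7, "$(xxx,7)"', {"xxx": "abc"}))
```

Slicing to the width and then calling ljust gives both behaviours from the text in one step: longer values are truncated and shorter values are padded on the right with spaces.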

Macro Definition

Macros have the following structure:

         macro
 [label] macroname [parameters]
         ...
         ...
         mend

in other words, the macro directive is used to start a macro definition, and mend is used to end it. The macro definition line itself contains the name of the macro and the names and types of any parameters which can be passed to it.

There are two types of macro parameters: positional and keyed. Positional parameters (including the label parameter) can be specified by position only. For example: $label munge $r1, $r2

has three positional parameters: an (optional) label field called $label, and two other (non optional) fields called $r1 and $r2. When the macro is called, both positional parameters must be given, and in the correct order. If no label field is given, the $label parameter will be passed as the null string. Keyed parameters are more general: not only do they allow parameters to be specified in any order, but defaults can be specified when the parameters are declared, thus allowing parameters to be omitted when the macro is called. For example: $label grunge $from=r0, $to=r1

declares two keyed parameters: $from and $to, which by default are assigned the values r0 and r1 respectively. When the macro is called, the parameters (unless omitted altogether) must be specified by keyword. For example: fred grunge from=r5, to=r6

Note that, when a macro is called, positional values and keyed values only ever match their own type of parameter. Thus: fred grunge r5, r6

is illegal.
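
The matching rule for positional and keyed parameters can be sketched in Python as below. The bind function and its arguments are hypothetical, and the sketch simplifies by treating any argument containing an equals sign as keyed, so a quoted argument that contains = would be misclassified.

```python
def bind(positional_names, keyed_defaults, call_args):
    """Bind the arguments of a macro call to the declared parameters.

    positional_names -- names of positional parameters, in order
    keyed_defaults   -- dict mapping keyed parameter names to defaults
    call_args        -- argument strings from the call, already comma-split
    """
    values = dict(keyed_defaults)          # start from the declared defaults
    positional_values = []
    for arg in call_args:
        if "=" in arg:
            key, value = arg.split("=", 1)
            key = key.strip()
            if key not in keyed_defaults:
                raise ValueError(f"unknown keyed parameter {key!r}")
            values[key] = value.strip()
        else:
            positional_values.append(arg.strip())
    # Positional values only ever match positional parameters.
    if len(positional_values) != len(positional_names):
        raise ValueError("positional arguments do not match the declaration")
    values.update(zip(positional_names, positional_values))
    return values


# grunge declares no positional parameters and two keyed ones.
print(bind([], {"from": "r0", "to": "r1"}, ["from=r5", "to=r6"]))
```

In this model the call "grunge r5, r6" fails because two positional values arrive for a macro that declares none, which corresponds to the illegal call shown above.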

Macro Calling

When calling a macro, parameters are separated by commas, and leading and trailing spaces are removed. This gives the correct result in most cases, but sometimes it is necessary to pass the comma character itself, or guarantee leading or trailing spaces. In order to do this, it is necessary to enclose the parameter within quotes.

A macro parameter enclosed within single or double quotes is passed as a single entity, with no interpretation being done on the data within the quotes. As always, quotes can be escaped by being doubled. When the parameter is passed, the enclosing quotes are passed as well. For example, with the macro declaration: msg $string

the macro call: msg "This is a message"

would cause the symbol $string to have the value "This is a message". This is fine if the parameter is to be passed to a directive such as dcb, but if it is necessary to pass a quoted string without the enclosing quotes, then backquote (`) must be used instead. The macro definition:

         macro
         do      $op, $parameters
         $op     $parameters
         mend

followed by the macro call: do add, `r1, r2, r3`

would result in the expanded output line: add r1, r2, r3

Note that, in this case, the second parameter to the macro is passed as a single entity, but by the time the expanded value is used, the quotes have been stripped.

Other Macro Directives

There are two other directives associated with macros: mexit and mnote. mexit causes a macro expansion to be terminated, and is usually used with one of the conditional assembly directives (see above). mnote enables a warning message to be generated by a macro expansion if, say, inconsistent parameters have been given. For example:

         macro
 $label  compare $r1, $r2
         ifeq    "$r1", "$r2"
         mnote   "Compare $r1 with itself? Come off it!"
         mexit
         endif
 $label  cmp     $r1, $r2
         mend

is a rather contrived example, but it shows the possibilities.

Common Macro Uses

Macros can be used to provide assembly-time checking, e.g.:

   ; ############################################################################
   ; Macro used to validate equality of two values.  This is mainly
   ; used to validate field ordering in symbolic data structures
   ; prior to using optimised "ldm/stm" type accesses.
   ; ############################################################################
           macro
   $label  checkeq $x, $y
   $label  ifne    $x, $y
           mnote   "Check failed: ($x) != ($y)"
           endif
           mend

can be used as checkeq foo.field2 - foo.field1, 4

They are also often used to declare data structures. For example, with the macro

                                macro
        $name                   struct_unicast
        $(name)
        $(name).next            ds      1       ; Next on ring
        $(name).prev            ds      1       ; Previous on ring
        $(name).branches        ds      1       ; Linked list of multicast branches
        ;;; Don't care about the rest of the structure here
                                mend

the line ucast struct_unicast

will declare 3 words of storage, with the label 'ucast' pointing to the start, and the labels 'ucast.next', 'ucast.prev' and 'ucast.branches' can be used to refer to the 3 member words. This allows use of a syntax similar to C structs.

Use of C-style typedefs for data structure macros

As of version 8 of the assembler, an alternative C-like syntax is available for defining data structures. This allows code such as

    typedef struct s_unicast_struct
    {
            struct s_unicast_struct *next;
            struct s_unicast_struct *prev;
            struct s_branch         *branches;
    } unicast_struct;

followed by constructs such as

    using unicast_struct, r0
    ldr r1, unicast_struct.branches

The main use of this is where data structures are to be shared between C and assembler, since it is possible to produce headers that can be included in both sets of source code, ensuring that the definitions are consistent. Where this is not needed, it is better to use the native assembler style (see limitations below).

The C-style system can cope with both structs and unions, as well as the basic types, and it is possible to nest one typedef'd struct inside another.

Limitations with C-style typedefs

Naming:
To avoid excessive namespace clashes, all such definitions must start with 'typedef'.

Validation:
The assembler does not attempt to perform full validation: header files should only define structures in this way if they are also going to be run through a C compiler to provide proper checking.

Packing:
The structures defined are always packed. Since the C compiler may or may not generate packed structures, the only safe approach is to lay out the structure so all members are naturally aligned.
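
The effect of packing can be checked with a short offset calculation. This Python sketch is a model only: the member names are hypothetical, and the unpacked case assumes a C compiler that aligns each scalar member to its own size and pads the structure to its largest member, which is common but not guaranteed.

```python
def layout(members, packed):
    """Return ({name: offset}, total_size) for (name, size_in_bytes) members."""
    offsets, position, largest = {}, 0, 1
    for name, size in members:
        if not packed:
            position += -position % size    # pad up to natural alignment
        offsets[name] = position
        position += size
        largest = max(largest, size)
    if not packed:
        position += -position % largest     # trailing padding
    return offsets, position


# A badly ordered structure: the packed and aligned layouts disagree.
risky = [("flag", 1), ("count", 4), ("tag", 2)]
print(layout(risky, packed=True), layout(risky, packed=False))

# Reordered, with an explicit pad byte: both layouts are identical.
safe = [("count", 4), ("tag", 2), ("flag", 1), ("pad", 1)]
print(layout(safe, packed=True), layout(safe, packed=False))
```

Ordering members from largest to smallest and padding the tail explicitly is one way to obtain a layout on which the packed assembler view and an aligning compiler agree.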


Note that if a listing file is generated, it will show explicitly the layout that has been created for the structure. For example, a file that includes the header shown above will have the following in the listing, which makes clear what has been generated.

                        1 ;;; This is a fake include file made up to contain the definition for unicast_struct
 00000000"              2                         dorg 0
 00000000"              3 unicast_struct
 00000000"  00000004    4 unicast_struct.next   ds 1
 00000004"  00000004    5 unicast_struct.prev   ds 1
 00000008"  00000004    6 unicast_struct.branches   ds 1
=0000000C               7 unicast_struct_size            equ     *-unicast_struct; Size of structure