
       +---------------------------------------------------------------+
       |        Using Assembler with the GNU Pascal Compiler on        |
       |                     the 80x86 CPU family.                     |
       |                    v1.09999 [June 2 1997]                     |
       |                                                               |
       |       Based on doc written by Brennan "Bas" Underwood.        |
       |  Rewritten and updated for GPC by Dario "-PrEdAtOrZ-" Anzani. |
       |   Translated into English by Johnny "BitMaster" Piacentini.   |
       +---------------------------------------------------------------+

##############################################################################
##############################################################################

 >>> Introduction <<<

   Why should one use GNU Pascal & Assembler together? The answer is rather
simple: this is the EASIEST method to write time-critical routines in your
Pascal programs. Now, someone could object that GPC optimizations are already
incredibly powerful and that there isn't any point in using asm.
Well, that's true, GPC _IS_ a great compiler, but there are some things
(interrupt handling, hardware programming, time-critical loops etc...) which
are almost impossible to code efficiently with a compiler alone.
Assembler functions let your code run _FAST_ and make it possible for you
to use all those nasty low-level tricks that real programmers love. ;)
This doc assumes that you already know the Intel syntax for assembler code,
the one used in MASM, TASM, NASM etc...
GNU Pascal uses the AT&T syntax, which will be explained here by comparison
with the Intel one: if you don't know the Intel syntax, stop reading this and
get a book or tutorial about it first.

Throughout the whole text, code pieces will be organized like this:

<Code_in_Intel_Syntax> <===> <Code_in_AT&T_Syntax>

##############################################################################

 >>> AT&T Syntax <<<
   
 - 1. The basics -

   AT&T syntax is not that different from the Intel one, but a few small
details can make it look a little confusing at first.

The first difference you'll notice in AT&T syntax is that the source
operand and the destination operand are swapped:

INTEL syntax is : <Opcode> <Dest>, <Source>
AT&T syntax is:   <Opcode> <Source>, <Dest>

Secondly, hex constants always follow the C/UNIX style:

0x12345678   YES!
  12345678h  NO!
 $12345678   NO!

Register names are prefixed by the "%" symbol. Constant and immediate values
are prefixed by the "$" symbol. The following example loads 0x1234 into EAX:

Mov EAX, 1234h                  <===>   MovL $0x1234, %EAX

As you may have noticed, the "Mov" instruction in the AT&T style has a
trailing "L": in this syntax every instruction may carry a SIZE DESCRIPTOR, a
letter appended to the mnemonic which tells the assembler the size of the
operands. It plays the same role as the BYTE/WORD/DWORD PTR operators found
in MASM and TASM.

"B" indicates  8 bit operands
"W" indicates 16 bit operands
"L" indicates 32 bit operands
"Q" indicates 64 bit operands

Here are some examples:

Mov   AL, BL                    <===>   MovB   %BL, %AL
Mov   AX, BX                    <===>   MovW   %BX, %AX
Mov   EAX, DWORD PTR [edi]      <===>   MovL   (%EDI), %EAX
FILd  QWORD PTR [edi]           <===>   FILdQ  (%EDI)

In fact, in many cases the descriptor can be omitted: for most opcodes the
assembler is able to work out the operand size from the registers involved.
But remember: when no register appears (for example with a memory operand
and an immediate value) the assembler has to guess and may guess wrong, thus
producing incorrect code. Try to be as clear as possible and NEVER be
afraid to use size descriptors.
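
To see the rules above at work inside GNU Pascal, here is a minimal sketch.
It borrows the Asm(...) wrapper, the doubled "%%" and the "=a" output line
from the Extended Inline Assembler chapter further on; the program and
variable names are made up for the illustration, and an 80x86 target with a
32 bit Integer is assumed.

```pascal
Program SyntaxBasics;

Var Value : Integer;

Begin
  (* Intel:  Mov EAX, 1234h
             Add EAX, 10h
     AT&T:   source first, destination last, "$" on constants,
             "%" on registers, "L" suffix for 32 bit operands *)
  Asm("movl $0x1234, %%eax
       addl $0x10, %%eax"
      : "=a" (Value));
  WriteLn(Value);
End.
```

If the code is assembled as described, Value ends up holding 0x1234 + 0x10.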

##############################################################################

 - 2. Protected mode -

   If you use GPC with inline assembler, you MUST remember that GNU Pascal
(on 80x86 machines) is a PROTECTED MODE programming environment.
This implies some important consequences:

a) Forget real mode memory addressing! With 32 bit addressing every general
   purpose register can be used as a base or as an index, except for ESP
   which can only be used as a base. It's important to point out that this
   is _NOT TRUE_ if you use 16 bit registers in the address. For example,
   the instruction

   Mov   EAX, DWord PTR DS:[ECX]  <===>   MovL   %DS:(%ECX), %EAX

   is absolutely correct, but this one

   Mov   EAX, DWord PTR DS:[CX]   <===>   MovL   %DS:(%CX), %EAX

   IS NOT!

   Why? Simple... 8D With 16 bit addressing (the kind used in real mode),
   memory can only be addressed through BX, BP, SI and DI: you _CANNOT_ use
   any other reg!

   _Any reference to a 16 bit register, that is AX, BX, CX, DX, SI, DI, BP
   and SP, will cause the instruction to be assembled as 16 bit_

   Also, it is important that you keep in mind that CS, DS, ES, SS, FS and GS
   don't hold SEGMENTS any more: once you enter protected mode, you must
   play your game with SELECTORS!
   A SELECTOR is an index into a descriptor table; the DESCRIPTOR it points
   to defines not only a memory area (base address and LIMIT), but also its
   attributes (read only, read/write, executable) and its PRIVILEGE LEVEL,
   among other things. You should get a good PMODE reference if you plan to
   dance with them a lot.

b) The common interrupt services may be called directly only if the
   operating system (or the DOS extender) provides 32 bit versions of them.
   If you _MUST_ get to those old 16 bit real mode services, you have to go
   through DPMI (Int 31h; function 0300h simulates a real mode interrupt).
   Get a ref about DPMI for more info...

c) In version 2 of DJGPP the first megabyte of physical memory is not
   directly accessible through DS. You have to use a special selector,
   stored by the extender at __go32_info_block+26, and keep in mind that
   this memory belongs to DOS and must be preserved, so be careful!
   Also, remember to save the old selector before loading the new one, and
   restore it after you have finished:

   Push  ES                     <===>   PushW  %ES
   [...]                        <===>   [...]
   Mov   ES, Word PTR [__go32_info_block+26]
                                <===>   MovW   __go32_info_block+26, %ES
   [...]                        <===>   [...]
   Pop   ES                     <===>   PopW   %ES
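
The save / load / restore sequence above could be wrapped in a small GPC
function like the following sketch. It is only an illustration under several
assumptions: a DJGPP v2 target (the only place where __go32_info_block
exists), FS borrowed instead of ES, and the Asm(...) form explained in the
Extended Inline Assembler chapter further on. The names PeekDemo and
PeekLowByte are hypothetical.

```pascal
Program PeekDemo;

(* Reads one byte at linear address Offset, below 1 MB, through the
   selector stored at __go32_info_block+26. DJGPP v2 only. *)
Function PeekLowByte(Offset : Integer) : Integer;
Var Value : Integer;
Begin
  Asm("pushw %%fs
       movw __go32_info_block+26, %%fs
       movzbl %%fs:(%1), %0
       popw %%fs"
      : "=r" (Value)
      : "r" (Offset));
  PeekLowByte := Value;
End;

Begin
  (* 0x449 is the BIOS data area byte holding the current video mode *)
  WriteLn(PeekLowByte($449));
End.
```

FS is saved and restored around the access, so the rest of the program never
notices that the selector was changed.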

##############################################################################

 - 3. Memory addressing -

Intel: SELECTOR : [BASE + INDEX*SCALE + DISPLACEMENT]

AT&T:  SELECTOR : DISPLACEMENT (BASE, INDEX, SCALE)

   If you are able to decipher the scheme above, you are perfectly able to
master memory addressing. If you couldn't grasp its meaning, read on for a
short explanation:

-The SELECTOR, for both formats, must be in the first position. The SELECTOR
 is expressed by means of a segment register (CS, DS, ES, SS, GS or FS) and
 CANNOT be an immediate value. If the selector used by the instruction is
 the default one (ES for StoS, SS for stack opcodes and for addresses based
 on EBP or ESP, DS/ES for MovS and DS for all the others), you can omit both
 the register AND the colon.

-The BASE value is, for example, the address of the first element of an
 array. In the Intel format it is placed as the first element inside the
 square brackets. It can be any one of the general purpose registers.

-The INDEX value represents, to continue the previous example, the index
 of an array. BASE + INDEX*SCALE gives the address in memory of the INDEXth
 element of the array starting at BASE. INDEX can be any one of the general
 purpose registers except ESP, and can be modified by a scaling factor...

-The SCALE is an immediate value, chosen from 1, 2, 4 or 8 (1 is the default
 when it is omitted). When computing the address, INDEX is multiplied by
 SCALE, so that if our beloved array has elements made up of 2, 4 or 8 bytes,
 we can address them directly.

-The DISPLACEMENT, at last, is a constant value to be added to the final
 address. The most common addressing schemes involve BASE & DISPLACEMENT or
 BASE & INDEX: it's rare to find all of them in one opcode.

Mov EAX, DWord PTR FS:[EBX]     <===>   MovL %FS:(%EBX),    %EAX
Mov EDI, DWord PTR [ECX+EBX*2]  <===>   MovL (%ECX,%EBX,2), %EDI
Mov EAX, DWord PTR [12345678h]  <===>   MovL 0x12345678,    %EAX
Mov EBX, DWord PTR GS:[EDX*2]   <===>   MovL %GS:(,%EDX,2), %EBX
Mov EAX, DWord PTR [ECX+12345h] <===>   MovL 0x12345(%ECX), %EAX
Mov EDI, DWord PTR ES:[12345h]  <===>   MovL %ES:0x12345,   %EDI
Mov BX,  Word PTR [_MyVar]      <===>   MovW (_MyVar),      %BX

Now for some highlights:

Have a look at the third example: did you notice that in the AT&T syntax
there is no dollar sign ("$") preceding the value 0x12345678? That's perfectly
right... if you look at the Intel syntax, you'll notice that it is a memory
reference, not an immediate value:

_In AT&T syntax, __EVERY__ constant WITHOUT a dollar sign preceding it
 is considered a memory reference_

Now, what about the fourth example? Is that comma in the right place?
The answer is YES, and it is very important to understand why. In this opcode
the BASE is missing. If we omitted the comma, the assembler would have taken
EDX for the BASE and would have produced this:

Mov EBX, DWord PTR GS:[EDX]   <===>   MovL %GS:(%EDX,2), %EBX

The "2", a scale with no index to apply to, would have been discarded and EDX
would have been the only register to be considered... a BAD error indeed!
Whenever the BASE is missing, __ALWAYS__ keep the comma that marks its empty
place between the parentheses!
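
The BASE + INDEX*SCALE scheme can be tried out from GPC with a sketch like
this one. The assumptions are stated loudly: Integer is taken to be 4 bytes
wide, the Asm(...) form and the "r" letter come from the Extended Inline
Assembler chapter further on, and "memory" is a standard GCC clobber name,
not covered in this text, which tells the compiler that the asm code
accesses memory behind its back. All identifiers are invented for the
example.

```pascal
Program AddressingDemo;

Var Table   : Array[0..3] Of Integer;
    Index   : Integer;
    Element : Integer;

Begin
  Table[0] := 10; Table[1] := 20; Table[2] := 30; Table[3] := 40;
  Index := 2;
  (* Intel:  Mov Element, DWord PTR [Base + Index*4]
     BASE  = address of Table, operand %1
     INDEX = Index, operand %2
     SCALE = 4, the size in bytes of one Integer element *)
  Asm("movl (%1,%2,4), %0"
      : "=r" (Element)
      : "r" (@Table), "r" (Index)
      : "memory");
  WriteLn(Element);
End.
```

With Index set to 2 the instruction fetches the same element that the Pascal
expression Table[2] would.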

##############################################################################

 - 4. Various instructions examples -

  Here are some examples of common instructions translated from Intel's
to AT&T's style. I want to hear the mumbling of your brain trying to
understand them all: it's very important that you get acquainted with them
to avoid MANY hours of boring debugging!!!

Intel style:                             AT&T style:

Call NEAR DWord PTR [EDX]................Call *(%EDX)
Push 10h.................................PushL $0x10
Mov EDX, 3C8h............................MovL $0x3C8, %EDX
Add ESP, 4...............................AddL $4, %ESP
CWD......................................CWTD
CDQ......................................CLTD
CWDE.....................................CWTL
Sub CX, SI...............................SubW %SI, %CX
Mov EAX, DWord PTR ES:[ESI]..............MovL %ES:(%ESI), %EAX
LEA ESI, DWord PTR [EAX+EDX*4-33]........LEAL -33(%EAX,%EDX,4), %ESI
Out DX, AX...............................OutW %AX,%DX
MovZX EAX, CL............................MovZBL %CL, %EAX
Out DX, AL...............................OutB %AL,%DX
FLd TByte PTR DS:[ESI]...................FLdT %DS:(%ESI)

Got them? It's not that hard after all... 8D
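
As a small exercise on the translations above, here is a sketch of a signed
division written with CDQ and IDiv in their AT&T spelling. It assumes a
32 bit Integer and uses the Asm(...) form and the register letters explained
in the Extended Inline Assembler chapter further on; the identifiers are
made up for the illustration.

```pascal
Program DivideDemo;

Var Dividend, Divisor, Quotient, Remainder : Integer;

Begin
  Dividend := -17;
  Divisor  := 5;
  (* Intel:  CDQ          sign-extends EAX into EDX:EAX
             IDiv ECX     leaves the quotient in EAX, the remainder in EDX *)
  Asm("cltd
       idivl %%ecx"
      : "=a" (Quotient), "=d" (Remainder)
      : "0" (Dividend), "c" (Divisor));
  WriteLn(Quotient, ' ', Remainder);
End.
```

The divisor is pinned to ECX on purpose: EDX is overwritten by cltd before
idivl runs, so it must not be the register that carries the divisor.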

##############################################################################
##############################################################################

 >>> Assembly with GPC <<<


 - 1. External Assembler -

To use some bits of assembly code in your prog, one of the most
straightforward ways is to create an ASCII file containing your procs.
This file should be in the GNU Assembler format; if you use another
assembler, just remember to select the object file format your linker
expects.
GNU Assembler (GAS, for short) eats ".s" files. They are built like this:

***************************************

.data                   
  __myvar: .word 0

.globl __myvar

.text
.globl __MyProc         

  __MyProc:
    ...
    ...
    ...
    Ret

***************************************

To call __MyProc from your Pascal program just include this piece of code
(the compiler adds one more leading underscore to the name by itself):

        Procedure MyProc; AsmName '_MyProc';

Parameters to external procedures will be passed on the stack, so given a
declaration like this:

        Procedure MyProc(MyVar: Integer); AsmName '_MyProc';

on entry to the procedure, when 0(%esp) still holds the return address,
MyVar will be addressable this way:

        movl 4(%esp), %eax

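It's _VERY IMPORTANT_ that you save and restore the registers that your
procedure is going to trash. GCC, and thus GPC, is not able to tell what
registers hold what after the execution of your procedure, so it assumes
that you are going to save them by yourself (EAX, ECX and EDX are the usual
exceptions: the calling convention lets a procedure overwrite them, while
EBX, ESI, EDI, EBP and ESP must come back unchanged).
_DON'T FORGET THIS FACTOR_!!!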

If you are going to work under DOS or Linux, get RHIDE at:

        http://www.tu-chemnitz.de/~rho/rhide.html

it's an excellent freeware IDE with plenty of features which are very
useful when dealing with GNU compilers: it supports GNU Pascal, GNU C and
GNU Fortran, provides automatic compiling, linking and execution, features
an integrated debugger, syntax highlighting, multi-language support, etc.
Have a look at it, it's great!!!

Also, if you plan to use only external objects, get NASM: it is an assembler
with the most advanced capabilities ever seen! It uses an Intel-like syntax
with some modifications to make it clearer and simpler, it supports very
complex macros, MMX instructions, every processor up to the PPro with any
undocumented opcode, multiple object formats... it's so far ahead of the
others that there's no point in waiting to get it! Ah, it's freeware... 

Get the latest version at:

http://www.dcs.warwick.ac.uk/~jules/nasm1.html

##############################################################################

 - 2. Extended Inline Assembler -

       The methods covered so far are, in my opinion, a very good way to
start programming in Asm with GPC; and, as I have already said, NASM with its
Intel syntax is a help for those who cannot, or don't want to, learn the AT&T
syntax. But the best way is surely the Extended Inline Assembler, because it
can cooperate with the optimizations of GPC. What does this mean?
Well, simple: you can tell the compiler which registers or variables you
modify instead of saving them on the stack yourself, you can have the input
loaded directly into the registers you want to work with and... much more.
Look at this:

Procedure MyStupidDelay(time : integer);
Begin
  Asm("0:   pushl $0xfffff
       1:   decl (%%esp)
            jnz 1b
            addl $0x4, %%esp
            decl %%eax
            jnz 0b"
     :                                      (* Output line *)
     : "a" (time)                           (* Input line *)
     : "eax");                              (* Clobbered regs *)
end;

Let's go on step by step. The Input line tells the compiler to hand us the
variable 'time' in the EAX register; "a" means EAX, "b" EBX, "c" ECX, "d" EDX,
"S" ESI and "D" EDI. Therefore, if we had written:

     : "c" (time)   

we would have worked with the variable 'time' placed in ECX instead of EAX.
The next line declares that our assembler code clobbers EAX, which lets the
compiler know that EAX no longer holds the value of 'time'; if we omitted
this line, the compiler could keep using EAX as if it still contained 'time',
with all the mistakes that follow. (Newer GCC based compilers refuse a
clobber naming a register that is also an input or output operand; with
them, list the register on the Output line instead.) It is possible, of
course, to declare more than one register at a time:

     : "a" (time), "b" (time), "c" (time)
     : "eax", "ebx", "ecx");

loads EAX, EBX and ECX all with the value of 'time'; the second line tells
the compiler that EAX, EBX and ECX are changed by our code. You __MUST__
remember that the constraint letters always name a whole register: there are
no letters for the 8 bit registers (AH, AL, BH, BL, etc.) or the 16 bit
ones (AX, BX, CX, DX, etc.). Nevertheless, these names can be declared as
clobbered; therefore, the declaration:

     : "a" (time), "b" (time), "c" (time)
     : "ax", "bx", "cx");

is accepted by the compiler without problems. If 'time' had been a string,
or any other data we want to reach through a pointer, we would have had to
write:

     : "a" (@time)

in order to have EAX loaded with the POINTER to our data. (Thankz Peter!!!!)
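
A minimal sketch of the "@" trick follows. It is an illustration, not taken
from the original text: the names are hypothetical, the "r" letter (any
register) is the one described later in this chapter, and "memory" is a
standard GCC clobber name, not covered here, which warns the compiler that
the asm code changes memory behind its back.

```pascal
Program PointerDemo;

Var Counter : Integer;

Begin
  Counter := 41;
  (* The input line hands the ADDRESS of Counter to the asm code in a
     register; incl then increments the variable through that pointer *)
  Asm("incl (%0)"
      :
      : "r" (@Counter)
      : "memory");
  WriteLn(Counter);
End.
```
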
Obviously, a procedure may also have output data, which are managed in a
similar way:

  Asm("0:   pushl $0xfffff
       1:   decl (%%esp)
            jnz 1b
            addl $0x4, %%esp
            decl %%eax
            jnz 0b"
     : "=a" (time)                          (* Output line *)
     : "a" (time)                           (* Input line *)
     : "eax");                              (* Clobbered regs *)

As you can easily realize, the line:

     : "=a" (time)          

makes the value of the EAX register be copied into the var 'time' at the end
of our assembler session. The naming of the registers with "a", "b", "c",
"d", "D", "S" works in the same way as for the input.
The example is also interesting because it uses LOCAL labels, which are
declared with a single decimal digit from 0 to 9 and are referenced by adding
"b", for a backward jump, or "f" for a forward jump. Why do we use local
labels? Well, a local label can be defined many times in the same assembler
file, so the code keeps working even when the compiler inlines the procedure
and pastes our asm text into several places.... an ordinary label would then
be defined twice, and the assembler would complain.
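
Here is one more sketch that puts an output line and a local label together.
The function name SumDown and its variables are invented for the example; a
32 bit Integer is assumed, and the "1" on the input line (meaning "the same
register as operand number 1") is the numbering trick described at the end
of this chapter.

```pascal
Program LocalLabels;

(* Adds up Count + (Count-1) + ... + 1 with a backward jump to label 1.
   Count must be greater than zero. *)
Function SumDown(Count : Integer) : Integer;
Var Sum : Integer;
Begin
  Asm("     xorl %%eax, %%eax
       1:   addl %%ecx, %%eax
            decl %%ecx
            jnz 1b"
      : "=a" (Sum), "=c" (Count)
      : "1" (Count));
  SumDown := Sum;
End;

Begin
  WriteLn(SumDown(4));
End.
```

ECX is listed on the output line because the loop counts it down to zero:
this is how the compiler learns that the register no longer holds the
original value of Count.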

You may have noticed that registers are written with two "%" instead of one;
this happens because in EXTENDED INLINE ASSEMBLER a single "%" introduces an
operand such as "%0", so the compiler turns every "%%" into the one "%" that
the assembler finally sees. This substitution is performed ONLY when you use
EXTENDED INLINE ASSEMBLER (that is, when at least one colon follows the
code). 

  Asm("pushl %eax
       xorl %eax, %eax
       mov %eax, 0xa0000
       popl %eax");

does not go through that substitution, therefore it needs just one "%".
       But the possibilities of the Extended ASM do not end here. It is
also possible to let the compiler allocate the registers dynamically:

Procedure MyStupidDelay(time : integer);
Begin
  Asm("0:   pushl $0xfffff
       1:   decl (%%esp)
            jnz 1b
            addl $0x4, %%esp
            decl %0
            jnz 0b"
     :                                      (* Output line *)
     : "r" (time)                           (* Input line *)
     : "0");                                (* Clobbered regs *)
end;

       in this way, using the "r" letter, we tell the compiler to choose
any register it likes and to replace every "%0" with it; besides, with the
"0" in the (* Clobbered regs *) line we inform it that the very register it
chose has been clobbered. The "q" letter makes GNU Pascal choose among the
four general purpose registers "EAX", "EBX", "ECX" and "EDX", while the "r"
letter widens the choice to the other registers too ("ESI" and "EDI"); the
"g" letter, finally, also allows the use of memory locations.
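
A short sketch of the "r" letter at work, with hypothetical names and a
32 bit Integer assumed. Instead of the clobber line it ties the input to the
output with "0" on the input line, the numbering described at the end of
this chapter, so that the compiler knows the chosen register is modified.

```pascal
Program AddDemo;

Function AddInts(A, B : Integer) : Integer;
Var Sum : Integer;
Begin
  (* "%0" is the register chosen for Sum, which starts out holding A;
     "%2" is the register chosen for B *)
  Asm("addl %2, %0"
      : "=r" (Sum)
      : "0" (A), "r" (B));
  AddInts := Sum;
End;

Begin
  WriteLn(AddInts(2, 3));
End.
```

Operands are numbered in the order they are written, outputs first: Sum is
"%0", A is "%1" (sharing the register of "%0") and B is "%2".
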
The procedure:

Procedure MyStupidDelay(time : integer);
Begin
  Asm("decl %0"
     : "=g" (time)                          (* Output line *)
     : "0" (time)                           (* Input line *)
     :);                                    (* Clobbered regs *)
end;

could produce a line of code like this one ('time' being a parameter that
lives on the stack):

...
decl 8(%ebp)
...

while using "r" or "q" the decrement would happen on a register, and not on a
memory area. Obviously, it's possible to use more than one dynamically
allocated register:

Procedure MyStupidDelay(time : integer);
Var Stupid, Vars : integer; 
Begin
  Asm("decl %0
       decl %1
       decl %2"
     : "=r" (time), "=r" (stupid), "=r" (vars)    (* Output line *)
     : "0" (time), "1" (stupid), "2" (vars)       (* Input line *)
     :);                                          (* Clobbered regs *)
end;

the register allocated with "=r" (time) will be referred to in the code as
"%0", the one allocated with "=r" (stupid) as "%1", and finally the one
allocated with "=r" (vars) as "%2". Then, in the input line we do not
allocate the same registers again with "r", because they are already
allocated; we use, instead, the same numbers which we used in the code:
"0", "1" and "2".

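The numbering scheme can be tried with a sketch as small as this one, which
swaps two variables. The identifiers are invented for the example and a
32 bit Integer is assumed.

```pascal
Program SwapDemo;

Var First, Second : Integer;

Begin
  First  := 1;
  Second := 2;
  (* "%0" and "%1" are the registers picked for First and Second;
     the input line loads them with the current values *)
  Asm("xchgl %0, %1"
      : "=r" (First), "=r" (Second)
      : "0" (First), "1" (Second));
  WriteLn(First, ' ', Second);
End.
```

After the exchange the compiler copies "%0" back into First and "%1" back
into Second, so the two variables come out swapped.
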
##############################################################################
                                    * * *
##############################################################################


   Note for GNU C users.

      The "__volatile__" or "volatile" directives don't work with the last
official release of GNU Pascal (2.00). However, Peter Gerwinski, one of the
programmers of the GNU Pascal team, assured me that he will soon implement a
local compiler switch to solve this and other problems....



                           -*\  T h e   E n d  /*-

##############################################################################
                                    * * *
##############################################################################


>>> Last note: <<<

If you have a problem, try to post your question to comp.os.msdos.djgpp or
subscribe to the Gnu Pascal mailing list. To subscribe, write to
<gpc-request@hut.fi>; the list itself is <gpc@hut.fi>.

Also, you can mail me at:

predatorzeta@geocities.com

                                    * * *

                                  Have phun!!

                                                         ///#PrEdAtOrZ1997
