80x86 Assembler, Part 2

Atrevida Game Programming Tutorial #13
Copyright 1997, Kevin Matz, All Rights Reserved.

Prerequisites:

  • Chapter 12: Introduction to 80x86 Assembler

       
In this chapter, we'll learn all about using variables. We'll discover the different addressing modes, and we'll find out how to use the stack.

Declaring variables

If we wish to use variables in an assembly-language program, we need to set up a data segment. We saw in the previous chapter how the starting point of the code segment was declared: we used a "CODESEG" line. Inside the code segment, we put our assembly-language instructions. We do the same thing for a data segment: we use a "DATASEG" line, and in the data segment area, we list the variables that we wish to use. The DATASEG portion of an assembler program should normally precede the CODESEG.

The most commonly used sizes of variables are bytes (8 bits) and words (16 bits). To indicate that a variable is to occupy a byte of memory, we use the "DB" (define byte) directive. To indicate a word-sized variable, we use the "DW" (define word) directive. When we declare a variable, we have the option of initializing it to a value of our choice. Or, we can choose not to initialize a variable, in which case the initial value is undefined.

Here's how we use the DB and DW directives. Under the DATASEG header, we normally set up three columns. In the first column, we indicate the names of the variables we want to create. Variable names may contain upper- and lower-case letters, underscores, and numbers (but note that the first character of a variable name cannot be a number). In the second column, we indicate a directive such as DB or DW. If we want to give a variable an initial value, we put that value in the third column. The value should match the size of the variable, so values such as 0, 25, -100, 0F3h, and 'A' would all be acceptable for a "DB" variable, and 0, 25, -100, 0F3h, 'A', 0ABCDh, 20000, and -30000 would all be acceptable for a "DW" variable. (Byte-sized values are "expanded" to word size.) If we don't care about the initial value of the variable, we can put a single question mark in the third column instead. Here are some examples:

    DATASEG

Counter                           DB   0
NumberOfStudents                  DW   30
MyGrade                           DB   'A'
VideoMemorySegment                DW   0A000h
VideoMemoryOffset                 DW   ?
StatusCode                        DB   ?

In the first example (after the DATASEG line), one byte is allocated for a variable called Counter. Its initial value is zero. In the second line, a variable called NumberOfStudents is allocated a word (two bytes), and is initialized to 30 dec. MyGrade, a byte, is assigned the initial value of 'A', whose ASCII value is 65 dec. VideoMemorySegment, a word, is assigned 0A000 hex. VideoMemoryOffset is allocated a word of space, but it is not assigned any specific value. The assembler will give it an unpredictable initial value. The same thing will happen with variable StatusCode, except that this variable will occupy a byte of memory in the data segment.
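The size rule for initial values can be sketched as a pair of range checks. This is a C sketch that rests on one assumption: a DB value may be anything representable as a signed or unsigned byte (-128 to 255), and a DW value anything representable as a signed or unsigned word (-32768 to 65535). The function names are hypothetical, and your assembler's exact rules may differ.

```c
#include <stdbool.h>

/* Hypothetical check: does a value fit in a DB variable?
   Assumes a byte may hold a signed OR unsigned 8-bit value. */
bool fits_db(long value)
{
    return value >= -128 && value <= 255;      /* e.g. -100, 0F3h, 'A' */
}

/* Hypothetical check: does a value fit in a DW variable?
   Assumes a word may hold a signed OR unsigned 16-bit value. */
bool fits_dw(long value)
{
    return value >= -32768 && value <= 65535;  /* e.g. 0ABCDh, -30000 */
}
```

Under this rule 0F3h and -100 both fit in a byte, while 20000 needs a word.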

Variable names should be reasonably descriptive, of course, and they should be less than 32 characters in length. Your assembler probably supports longer variable names, but if you want to interface your assembler code with C or C++, keep them shorter than 32 characters.

If you want to define a list of values all at once, you can put them all in the third column, separated by commas:

ListOfSmallNumbers                DB   10, 20, 30, 40, 50
ListOfBigNumbers                  DW   1000, 2000, 3000, 4000, 5000

Why would anyone want to do this? Well, in the above examples, we've managed to set up two arrays and pre-initialize them! We'll see how to access the elements later. (Think of pointer arithmetic in C. We'll be using the same technique in assembler.)

Okay, what if we needed a bigger list of numbers? Well, let me be honest now: variable names are not mandatory in the first column. This is perfectly legal:

BiggerListOfSmallNumbers          DB   10, 20, 30, 40, 50, 60, 70, 80
                                  DB   90, 100, 110, 120, 130, 140, 150
                                  DB   160, 170, 180, 190, 200

These numbers will all be stored in order, in consecutive bytes (so they can still be accessed with pointer arithmetic). In fact, all variables are stored in the data segment in the order of their declaration. Each variable is located in memory immediately after the variable before it; there are no breaks or gaps. (Example: if you declared three byte-sized variables, say A, B, and C, and gave them initial values A1 hex, A2 hex and A3 hex, respectively, you could look at the executable program with a hex editor and find the bytes 0A1h, 0A2h, and 0A3h all together, one after the other.)
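The "no gaps" rule can be made concrete with a small C model of a data segment. It is only an illustration, not how an assembler is actually written. The type ToyDataSeg and the function declare_bytes are hypothetical names. Each declaration is appended directly after the previous one, and the returned offset plays the role of the variable name.

```c
#include <stddef.h>
#include <stdint.h>

/* A toy data segment: each declaration is placed immediately after the
   previous one, with no gaps. The 64-byte capacity is arbitrary. */
typedef struct {
    uint8_t bytes[64];
    size_t  next;          /* offset where the next declaration goes */
} ToyDataSeg;

/* Like "Name DB v1, v2, ...": stores n bytes in order and returns the
   offset of the first one, which is what the variable name stands for.
   No capacity check in this sketch. */
size_t declare_bytes(ToyDataSeg *ds, const uint8_t *values, size_t n)
{
    size_t start = ds->next;
    for (size_t i = 0; i < n; i++)
        ds->bytes[ds->next++] = values[i];
    return start;
}
```

Declaring A, B and C (0A1h, 0A2h, 0A3h) in turn places them at offsets 0, 1 and 2. Element i of a list declared this way sits at its offset + i, which is the same pointer arithmetic the array examples rely on.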

So, perhaps we want to declare a string. Because a string is simply an array of characters, we could do this:

Greeting                          DB   'H', 'e', 'l', 'l', 'o'

But that's tedious. With the DB directive, you are permitted to put one or more strings in the third column:

Greeting                          DB   "Hello"
ProgramAuthor                     DB   "John W. Doe III"
ProgramVersion                    DB   "Version ", "1.0"

Most assemblers let you use either single or double quotes for both single characters and strings. Of course, it doesn't hurt to be consistent with C's style. Here's how you use embedded quote marks, using the ever-so-helpful guide for deciding when to use the word "its" or "it's":

"If it's ""its"", it's ""its"".  If it's ""it is"", it's ""it's""."
'If it''s "its", it''s "its".  If it''s "it is", it''s "it's".'

Both of these would produce the string:

If it's "its", it's "its".  If it's "it is", it's "it's".
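The doubled-quote rule can be sketched as a small decoder in C. The function unquote is hypothetical. It assumes the literal starts with either kind of quote, treats two of that quote in a row as one literal quote character, and stops at the first single occurrence. It does not try to reproduce every detail of a real assembler's parser.

```c
/* Hypothetical decoder for one assembler-style quoted literal.
   src must begin with ' or ". Inside the literal, two copies of the
   opening quote stand for one quote character. Writes the decoded,
   null-terminated text to dst and returns its length, or -1 if src
   does not start with a quote or the closing quote is missing. */
int unquote(const char *src, char *dst)
{
    char q = src[0];
    int n = 0;

    if (q != '\'' && q != '"')
        return -1;

    for (const char *p = src + 1; *p != '\0'; p++) {
        if (*p == q) {
            if (p[1] == q) {       /* doubled quote: one literal quote */
                dst[n++] = q;
                p++;               /* skip the second quote */
            } else {               /* single quote: end of the literal */
                dst[n] = '\0';
                return n;
            }
        } else {
            dst[n++] = *p;         /* ordinary character, copy as-is */
        }
    }
    return -1;                     /* ran off the end: unterminated */
}
```

Fed either of the two literals above, unquote produces the same 57-character string.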

Long messages can be split up onto multiple lines, like this:

ExpiredCopyrightMessage           DB   "This program is Copyright 1497, "
                                  DB   "John W. Doe III, All Rights "
                                  DB   "Reserved."

A null-terminated string, used by C, is a string of characters in which the last character is an ASCII 0. We can declare a null-terminated string by adding an ASCII 0 character to the end of the string, like this:

Message                           DB   "I'm a null-terminated string.", 0

While we're on the topic of strings, let's consider a handy DOS interrupt service we saw in Chapter 5: INT 21h, Service 9h, the "$-Terminated String Print" service. It prints the string at DS:DX to the screen, up to (but not including) the first "$" character. Let's also consider strings that might not fit on one line. If you have an ASCII chart handy, you'll notice that ASCII 13 dec is the carriage return character; this moves the cursor to the first column of the screen. And ASCII 10 dec is the line-feed character, which moves the cursor down one line, scrolling the display if necessary. (Just a warning: the DOS output services interpret these characters this way, but not every BIOS service does. INT 10h, Service 0Ah, for example, simply displays them as ordinary symbols.) So if we want to create a single string to hold a message that will occupy two lines on the screen, such as the following pathetic joke...

Did you hear about the thieves who broke into the music store and
got away with the lute?

...we can use these "special" characters in the declaration:

BadJoke                           DB   "Did you hear about the thieves "
                                  DB   "who broke into the music store "
                                  DB   "and", 13, 10, "got away with "
                                  DB   "the lute?$"

Then we could use this string with INT 21h, Service 9h.
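Here is a rough C model of what Service 09h does with such a string. The helper dollar_to_cstring is hypothetical. It copies characters up to the first "$" into a C-style null-terminated string. The BadJoke text is rebuilt with C's "\r\n" standing in for 13, 10, and adjacent string literals play the role of the separate DB lines.

```c
#include <stddef.h>

/* Hypothetical helper: copies a "$"-terminated string (the format that
   INT 21h, Service 09h prints) into dst as a null-terminated C string.
   Like the DOS service, it keeps going until it finds a '$'.
   Returns the number of characters copied. */
size_t dollar_to_cstring(char *dst, const char *src)
{
    size_t n = 0;
    while (src[n] != '$') {
        dst[n] = src[n];
        n++;
    }
    dst[n] = '\0';          /* the '$' itself is never printed */
    return n;
}

/* The BadJoke declaration, with 13, 10 written as "\r\n". */
static const char bad_joke[] =
    "Did you hear about the thieves "
    "who broke into the music store "
    "and\r\ngot away with "
    "the lute?$";
```

Like the DOS service, the loop has no length limit. If the "$" were missing, it would keep reading whatever follows the string in memory.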

DB and DW are the most commonly used directives, but there are others. DD (define doubleword) lets you declare 32-bit (four byte) values. DP (define pointer) and DF (define far pointer), according to the TASM manual, are equivalent. They reserve six bytes and let you declare pointers (but don't ask me how; there are no examples, and I've never actually seen these used). DQ (define quadword) reserves eight bytes, and DT (define ten bytes) reserves ten bytes.

Finally, let's consider arrays again. If we need to declare a big array, perhaps thousands of elements in length, using lines upon lines of declarations becomes unreasonable. There is an operator called DUP that can be used in the third column. Some examples are best:

ArrayOfBytes                      DB   20 DUP (0)

In the parentheses, you specify the value to be repeated (in this case, 0). Before the DUP operator, you specify how many times you want that value to be repeated (in this case, 20). So the above example declares twenty bytes, all initialized to zero.

To set up an array of 50 words, where each word is initialized to 01234h:

ArrayOfWords                      DW   50 DUP (01234h)

You can use the question mark if you don't care what values are used. To set up an array of 2000 bytes of undetermined values (perhaps for use as a buffer):

Buffer                            DB   2000 DUP (?)

If you want to repeat a certain pattern, you can put the pattern in the parentheses. The following example will repeat the pattern 1, 2, 3 three times, reserving nine bytes in total:

AnotherArray                      DB   3 DUP (1, 2, 3)

You can duplicate strings too. You can even nest DUP operators.

Bizarre                           DB   2 DUP (2 DUP (1, 2), 3)

This produces the pattern: 1, 2, 1, 2, 3, 1, 2, 1, 2, 3. Is there really a use for this kind of thing?
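DUP expansion, including nesting, amounts to repeated copying of a pattern. The C sketch below uses hypothetical function names. It builds the Bizarre pattern from the inside out: first the inner 2 DUP (1, 2), then the trailing 3, then two copies of the whole thing.

```c
#include <stddef.h>
#include <stdint.h>

/* Models "count DUP (pattern)": appends count copies of pattern
   (plen bytes long) to dst starting at dst[pos]. Returns the new
   position. No bounds checking in this sketch. */
size_t dup_append(uint8_t *dst, size_t pos, size_t count,
                  const uint8_t *pattern, size_t plen)
{
    for (size_t i = 0; i < count; i++)
        for (size_t j = 0; j < plen; j++)
            dst[pos++] = pattern[j];
    return pos;
}

/* Builds the bytes of:  Bizarre  DB  2 DUP (2 DUP (1, 2), 3)
   Returns the number of bytes written (dst needs room for 10). */
size_t build_bizarre(uint8_t *dst)
{
    static const uint8_t one_two[] = {1, 2};
    uint8_t inner[5];
    size_t n = dup_append(inner, 0, 2, one_two, 2);  /* 1,2,1,2      */
    inner[n++] = 3;                                  /* 1,2,1,2,3    */
    return dup_append(dst, 0, 2, inner, n);          /* two copies   */
}
```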

Using the small memory model, you are permitted a maximum of 64K of data in the data segment. (A segment is, after all, 64K long.) Most assembler programs are so small and simple that this isn't a problem.

Using variables

Now that we can declare all the variables we want, let's find out how to use them. Before we can use any variables, we must set the DS register to point to the data segment we have created. This is how we do it:

    MOV AX, @data                      ; AX = segment address of DATASEG
    MOV DS, AX                         ; DS = AX

@data is a special variable created by the assembler. It is initialized with the segment part of the segment:offset address that points to the start of the data segment. Actually, the data segment (and other segments) can only start on paragraph boundaries (multiples of 16 bytes -- see Chapter 4), so the offset part of that address would always be zero.

So the data segment's segment address is copied to AX, and then AX is copied to register DS. Why don't we just copy the address directly to the DS register? Well, the MOV instruction has some quirks: you can't copy a literal value directly into a segment register, and @data is really a constant value that the assembler fills in. (The 8088's segment registers are CS, DS, ES and SS.) You can't even directly copy one segment register into another segment register. A segment register can be loaded from a 16-bit general-purpose register or from a word-sized variable, but never from a literal. So, to move a constant value into a segment register, always remember to copy it to one of the 16-bit general-purpose registers first, and then copy that general-purpose register to the segment register.

Anywhere after these DS-setting instructions, we can use our variables declared in the DATASEG section.

If you're using a non-TASM assembler, or you're using TASM but not TASM's Ideal mode (comment out the "IDEAL" line at the top of your program), then you can do this:

    MOV AX, MyVariable

This would put the value of MyVariable (perhaps something like 5000 dec) into AX.

In TASM's Ideal mode, which we'll be using, the above line will actually generate an error message upon assembly. Ideal mode forces you to use a more consistent syntax. To retrieve the value of a variable in Ideal mode, we need to put square brackets around the variable name, like this:

    MOV AX, [MyVariable]

The value of MyVariable would be copied into the AX register. If MyVariable was a (word-sized) variable with the value 5000 dec, then after this instruction executed, AX would equal 5000 dec.

I frequently make the mistake of forgetting the square brackets.

There is another restriction on the use of the MOV instruction. You can't copy the contents of one variable directly to another. This is illegal:

    MOV [MyVariable1], [MyVariable2]

You have to use a register as an intermediate:

    MOV AX, [MyVariable2]
    MOV [MyVariable1], AX

Sometimes it is necessary to determine the address of a variable. In C you can use the "&" operator, or FP_SEG and FP_OFF; in assembler, we can use operators called SEG and OFFSET.

To get the segment part of the address of a variable, you can use the SEG operator. (If a variable is in the data segment, its segment should be equal to the @data variable, which is what DS should normally be equal to). Here's how you would load the segment address of variable MyVariable into ES, the extra segment register:

    MOV AX, SEG MyVariable
    MOV ES, AX

There is also an OFFSET operator, which finds the corresponding offset part of the address of a variable:

    MOV DI, OFFSET MyVariable

Here's a short program that uses variables. It does two things: it displays a single character on the screen (nothing new, but now that character is specified in a variable), and then it writes out a string to the screen. (I've used DOS interrupt services here: INT 21h, Services 02h and 09h. Check your interrupt listings to see how they work. The reason I've used INT 21h, Service 02h to write a single character instead of the BIOS service (INT 10h, Service 0Ah) I used in previous tutorials is this: the BIOS function writes the character at the current position, but it does not advance the cursor. This is okay for one character, but any later output would start at the cursor, and the first character would be overwritten.)

------- TEST2.ASM begins -------

%TITLE "Assembler Test Program #2 -- writes a string to the screen"

    IDEAL
  
    MODEL small
    STACK 256

    DATASEG

MyCharacter                       DB   '*'
BadJoke                           DB   "Did you hear about the thieves "
                                  DB   "who broke into the music store "
                                  DB   "and", 13, 10, "got away with "
                                  DB   "the lute?$"


    CODESEG

Start:
    ; Let DS point to the start of the DATASEG data segment:
    MOV AX, @data
    MOV DS, AX

    ; Output a character using INT 21h, Service 02h (DOS' "Character
    ; Output on Screen" function):
    MOV AH, 02h
    MOV DL, [MyCharacter]
    INT 21h

    ; Set up for INT 21h, Service 09h ("$-Terminated String Print"):
    MOV DX, OFFSET BadJoke             ; Let DX = offset address of BadJoke
    
    ; MOV AX, SEG BadJoke              ; Let DS = segment address of BadJoke
    ; MOV DS, AX                       ; (These instructions are not
                                       ; necessary, because the segment
                                       ; address of BadJoke is equal to DS.)

    ; Call INT 21h, Service 09h to write the string to the screen:
    MOV AH, 09h
    INT 21h

    ; Terminate the program:
    MOV AX, 04C00h
    INT 21h
END Start

------- TEST2.ASM ends --------

Data addressing modes

Data addressing modes are different "methods" of referring to data. You can refer to a register, or you can explicitly provide an arbitrary value, or you can provide certain types of expressions that evaluate to a particular address in memory. The addressing modes are called register, immediate, direct, indirect, base + displacement, and base + index + displacement. (There are alternate names, which I'll describe below.) Here's an overview:

1. Register

If the source operand of an instruction is simply a register, then register addressing is being used.

    MOV AX, BX

The assembler generates certain machine language codes that tell the processor to read the value from the specified register (in this case, BX).

2. Immediate

If the source operand of an instruction is a literal (an arbitrary constant), then immediate addressing is in use:

    MOV AX, 3

In this case, the assembler stores the 3 directly in the code segment, as an operand to the MOV instruction's opcode.

3. Direct

Direct addressing involves the use of a variable. The DS segment is assumed:

    MOV AX, [SomeVariable]

However, if you need to refer to a variable defined in another segment (we'll see later that it is possible to define variables in places like the code segment), you can use a segment override by explicitly stating the segment like this:

    MOV AX, [CS:CodeSegmentVariable]

Variable names are really just labels, and labels are just aliases for offset addresses within a segment. When you use direct addressing, you're really just specifying a segment address (which may be a default) and an offset address, so that the processor can construct a physical address and retrieve the data residing at that physical address.
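On the 8088, the segment:offset to physical address step works like this: the segment is multiplied by 16 (shifted left four bits), and then the offset is added. Below is a minimal C sketch. It assumes the 8088's 20 address lines, so the result wraps around at 1 MB. The function name is hypothetical.

```c
#include <stdint.h>

/* 8088 physical address from a segment:offset pair:
   segment * 16 + offset, kept to 20 bits (the 8088 has 20 address
   lines, so addresses past FFFFFh wrap around to 00000h). */
uint32_t physical_address(uint16_t segment, uint16_t offset)
{
    return (((uint32_t)segment << 4) + offset) & 0xFFFFFu;
}
```

Because the segment is multiplied by 16, 1235h:0000h and 1234h:0010h name the same physical byte, 12350h. This is also why a segment can only begin on a paragraph (16-byte) boundary.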

4. Indirect

If the offset address of a variable (or really, any location in memory) is in the BX, SI, or DI register, you can place square brackets around the register name like this...

    MOV AX, [BX]

...to retrieve the value of the byte or word residing at that address. The DS segment is assumed, but you can override it as usual:

    MOV AX, [CS:BX]

Here's an example of indirect addressing:

    MOV [MyByte], 123                  ; Let variable MyByte equal 123 dec

    MOV BX, OFFSET MyByte              ; Let BX (actually, DS:BX) point
                                       ;  to MyByte
    MOV AL, [BX]                       ; Get the value residing at DS:BX
                                       ;  (DS is the default)

AL would then contain 123 dec. (Of course, "MOV AL, [MyByte]" would be simpler.)

Let's now consider this example:

    INC [ES:DI]

This would increment the value at the memory location pointed to by ES:DI. But just what is that value? Do we want the processor to increment the byte at ES:DI, or do we want it to increment the word at ES:DI? The assembler would give an error if it encountered the above example, because it wouldn't know which of the two meanings, byte or word size, we really intended. This wasn't a problem when we used "MOV AL, [BX]", because the destination, AL, was byte-sized. The assembler knew that a byte-sized value was to be copied. And with "MOV AX, [BX]", a word would be copied, because the destination is word-sized. But when there is an ambiguity, we can use one of two operators, WORD and BYTE, like this:

    INC [WORD ES:DI]                   ; Increment the word at ES:DI
    INC [BYTE ES:DI]                   ; Increment the byte at ES:DI

Here's another ambiguous situation:

    MOV [DS:SI], 5

Do we want to move the 5 into the byte at DS:SI, or do we want to move the 5 into the word at that location? (Yes, there is a difference. If we move the 5 into a word, the first byte will contain 5, and the byte after it will contain 0 -- recall the dreaded little-endian storage.) Here's how we could specify either choice:

    MOV [WORD DS:SI], 5                ; Copy 5 dec into the word at DS:SI
    MOV [BYTE DS:SI], 5                ; Copy 5 dec into the byte at DS:SI

(Notice also that the explicit usage of "DS:" is not really necessary, because it is the default.)
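The byte-versus-word difference can be modeled in C by writing into a byte buffer that stands in for the segment. The helpers store_byte and store_word are hypothetical. Like the 8088, store_word writes the low byte first.

```c
#include <stddef.h>
#include <stdint.h>

/* Like MOV [BYTE DS:SI], value: only the one byte at offset changes. */
void store_byte(uint8_t *seg, size_t offset, uint8_t value)
{
    seg[offset] = value;
}

/* Like MOV [WORD DS:SI], value: two bytes change, with the low byte at
   the lower address (little-endian). Assumes offset + 1 is still
   inside the buffer. */
void store_word(uint8_t *seg, size_t offset, uint16_t value)
{
    seg[offset]     = (uint8_t)(value & 0xFFu);   /* low byte  */
    seg[offset + 1] = (uint8_t)(value >> 8);      /* high byte */
}
```

Storing the word 5 leaves 05h followed by 00h in memory. Storing the byte 5 leaves the following byte untouched.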

You'll probably find that it's easy to forget the WORD or BYTE operators. Fortunately the assembler points out any omissions.

Sometimes the indirect addressing mode is called register-indirect addressing.

5. Base + Displacement

The Base + Displacement (or just Base) addressing mode is very similar to the Indirect addressing mode. However, you are only permitted to use the BX and BP registers. You can then add a displacement (an arbitrary constant) to the offset in the base register before the physical address is formed. If you want to retrieve the byte or word located, say, 5 bytes after the address specified by DS:BX, then you would use a displacement of 5. You can also use negative displacements.

    MOV AH, [DS:BX + 5]                ; The offset BX + 5 is calculated
                                       ;  and combined with DS to form a
                                       ;  physical address.  The byte at
                                       ;  that location is copied to AH.

An important note: in this addressing mode, the default segment for BP is SS. This SS:BP address is often used for accessing data on the stack (but don't worry about it now).

    MOV [WORD BP + 2], 0CAFEh          ; Copies CAFE hex to the word at
                                       ;  SS:BP, plus 2 bytes.

DS is still the default segment when BX is used. For both BX and BP, you can specify override segments, as usual.

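Displacements can be used to address array elements. A displacement of 0 would refer to the first element of a byte-sized array, a displacement of 1 would refer to the second element, etc. (For a word-sized array, the displacements would be 0, 2, 4, and so on, because each element occupies two bytes.)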

Displacements can also be specified in hex (e.g. 0C3h), binary (e.g. 1010b), and so on.

6. Base + Index + Displacement

Base + Index + Displacement addressing is just like Base + Displacement addressing, except that you can also add in the contents of an index register, either SI or DI.

    MOV AX, [BX + SI + 3]              ; The offset BX + SI + 3 is
                                       ;  calculated and combined with
                                       ;  DS (the default segment when
                                       ;  BX is the base) to form a
                                       ;  physical address.  The word at
                                       ;  this address is copied to AX.

Again, the default segment for BP is SS, but you can override it:

    SUB [WORD ES:BP + DI - 10h], 5     ; Subtract 5 from the word located
                                       ;  at ES:BP + DI - 10h.

If you don't need to add the constant displacement value, you can either use a displacement of 0, or simply leave out that displacement entirely:

    INC [BYTE CS:BX + DI]

You can even leave out the BX or BP:

    MOV [BYTE SI + 3], 'A'             ; Copy an 'A' to DS:SI + 3
                                       ;  (DS is still the default
                                       ;  segment for both SI and DI)
    INC [WORD ES:DI]                   ; Increment the word at ES:DI

Using just the index part (just the SI or DI, plus an optional displacement) is sometimes called the Indexed addressing mode.

The Base + Displacement and the Base + Index + Displacement addressing modes are useful for working with arrays. The base can point to the first element of the array, and the displacement or index or both can point to positions within that array. (The Base + Index + Displacement mode is useful for arrays where you have "records" composed of more than one byte or word. The base part points to the beginning of the array, the index part can refer to a specific element or record in the array, and the arbitrary constant displacement can refer to a field (offset) in that record.)
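The record idea can be expressed in C terms. The record layout (4 bytes: a word score, a byte level, a byte flags) and all names below are hypothetical, chosen only for illustration. The 8088 cannot scale an index register, so the index value has to be the record number already multiplied by the record size.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record layout (4 bytes per record):
     offset 0: word  score
     offset 2: byte  level
     offset 3: byte  flags                                          */
enum { RECORD_SIZE = 4, FIELD_SCORE = 0, FIELD_LEVEL = 2, FIELD_FLAGS = 3 };

/* The offset used by an operand like [BX + SI + 2]:
   base  = start of the array (BX),
   index = record number * RECORD_SIZE (SI),
   disp  = constant field offset.                                   */
size_t effective_offset(size_t base, size_t index, size_t disp)
{
    return base + index + disp;
}

/* Reads the byte field "level" of record number `record`. */
uint8_t read_level(const uint8_t *seg, size_t base, size_t record)
{
    return seg[effective_offset(base, record * RECORD_SIZE, FIELD_LEVEL)];
}

/* Reads the word field "score" (little-endian, low byte first). */
uint16_t read_score(const uint8_t *seg, size_t base, size_t record)
{
    size_t off = effective_offset(base, record * RECORD_SIZE, FIELD_SCORE);
    return (uint16_t)(seg[off] | (seg[off + 1] << 8));
}
```

With the array starting at offset 10h, the level field of record 2 is at 10h + 8 + 2 = 1Ah.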

Addressing modes can be tricky to deal with at first. We'll get more practice with these addressing modes as we come across their usage in later example programs.

Using the stack

You might already know how a stack data structure works.

A stack works just like a stack of papers. Let's assume I've got a pile of top-secret documents on my desk. I'm going to place more documents onto this stack. First, I'll put the FBI's file on Elvis Presley's whereabouts on the top. Then, on top of that document, I'll add to the stack the evidence of Bill Clinton's drug habit. Finally, I'll put on top the proof that the moon landings were faked.

Later, a government agent clandestinely breaks in and takes the documents off the stack, one at a time. He or she takes the moon landings documents first, and then the Clinton documents next, and the Elvis file last. The documents come off the stack in the reverse order in which they were added to the stack. A stack is therefore called a Last-In, First-Out (LIFO) structure -- the last item to be placed on a stack becomes the first item to be removed.

Stacks on the 8088 can handle 16-bit (word-sized) data. These stacks work the same way as the stack of papers described above: if I were to put the words 1000 hex, 2000 hex, and 3000 hex on the stack, in that order, I would be able to remove them in this order: 3000 hex, 2000 hex, 1000 hex.

To add a word to the stack, we use the PUSH instruction. To remove the most-recently added word from the stack, we use an instruction called POP.

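PUSH requires one operand: a value to push onto the stack. It can be a literal (an arbitrary constant) or a word-sized register. Alternatively, you can use one of the other addressing modes to refer to a word in memory. (One caution about literals: the original 8086/8088 has no PUSH-immediate instruction. It was added with the 80186. Depending on your assembler and its processor setting, PUSH with a literal may be rejected, or it may be assembled into an equivalent sequence of 8086 instructions.)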

    PUSH 01234h                        ; Push 1234 hex onto the stack
    PUSH AX                            ; Push the contents of AX onto the
                                       ;  stack
    PUSH DS                            ; Push the contents of DS onto the
                                       ;  stack
    PUSH [WORD ES:DI]                  ; Find the word-sized value at
                                       ;  address ES:DI, and push it onto
                                       ;  the stack

If you supply a literal that could be interpreted as a byte-sized value, such as 50, -100, 0ABh, or 'A', that value is "converted" to word size.

    PUSH 123                           ; 123 dec is pushed onto the stack
                                       ;  as a word-sized value
    PUSH 'A'                           ; 0041 hex (65 dec) is pushed onto
                                       ;  the stack

However, a value that is explicitly a byte-sized value will not work:

    PUSH [BYTE DS:SI]                  ; Illegal -- byte sized operand
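What "converted to word size" means for a pushed literal can be shown in C. The literal's value is stored as a 16-bit number, using two's complement for negative values. The function name is hypothetical. This models only the resulting word, not whichever instruction encoding the assembler picks.

```c
#include <stdint.h>

/* Hypothetical helper: the 16-bit word a literal becomes when it is
   used as a word-sized operand. Conversion to uint16_t wraps modulo
   65536, which gives two's complement for negative values. */
uint16_t literal_as_word(long value)
{
    return (uint16_t)value;
}
```

So 123 becomes 007Bh, 'A' becomes 0041h, 0ABh becomes 00ABh, and -100 becomes 0FF9Ch.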

To pop the latest value from the stack, you use POP, supplying an operand that specifies where you want the value to go:

    POP AX                             ; Pop a value off the stack, and
                                       ;  place it in AX
    POP [MyWord]                       ; Let variable MyWord equal the
                                       ;  value popped off the stack
    POP [WORD ES:DI]                   ; Let the word at ES:DI equal the
                                       ;  value popped off the stack

You can't pop a value into a byte-sized destination:

    POP DL                             ; A word won't fit here!
    POP [MyByte]                       ; If variable MyByte is byte-
                                       ;  sized, this won't work
    POP [BYTE DS:BP]                   ; A word won't fit here!

And it doesn't make sense to do this:

    POP 50                             ; What?

Or this:

    POP                                ; Where does the value go?

Or this:

    POP AX, BX                         ; One at a time, please!

What good is a stack? Well, the stack is most commonly used for saving temporary backup copies of variables. Let's say we have some important values in the AX, CX, and DX registers. But we need to call some interrupt service, and that interrupt service requires particular input values in those registers! And after the interrupt call, we want those important values back in AX, CX and DX.

Using the stack offers a simple solution. PUSH the values of the registers onto the stack, modify the registers as necessary and do whatever else is needed, and then POP the saved values back into the original registers. For example:

    ; Save affected registers:
    PUSH AX
    PUSH CX
    PUSH DX

    ; Use INT 21h, Service 2Dh to set the time to 3:30 PM:
    MOV AH, 02Dh                       ; INT 21h, Service 2Dh is "Set Time"
    MOV CH, 15                         ; Let hours = 15 (3:00 PM)
    MOV CL, 30                         ; Let minutes = 30
    MOV DH, 0                          ; Let seconds = 0
    MOV DL, 0                          ; Let 100th's of seconds = 0
    INT 21h                            ; Change the time

    ; Restore affected registers:
    POP DX
    POP CX
    POP AX

Notice the order of the registers. The registers are popped off the stack in the reverse of the order used to push them on the stack. DX was the last value put on the stack, so it is the first one retrieved. CX was the second-last value pushed on the stack, so it is the second one retrieved. AX was pushed on the stack first, so it is the last one retrieved.

I suppose you could swap the values of two registers like this:

    ; Swap AX and BX:
    PUSH AX
    PUSH BX
    POP AX
    POP BX

(If you don't see how this works, draw a diagram of a stack and try it out.)
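If drawing the diagram isn't convincing, the same sequence can be traced with a tiny C stack. The WordStack type and the push/pop helpers are hypothetical. They model only the last-in, first-out order, not the 8088's SP register.

```c
#include <stdint.h>

/* Tiny LIFO word stack (fixed capacity, no bounds checks). */
typedef struct {
    uint16_t items[16];
    int top;                 /* number of items currently on the stack */
} WordStack;

void push(WordStack *s, uint16_t v) { s->items[s->top++] = v; }
uint16_t pop(WordStack *s)          { return s->items[--s->top]; }

/* Mirrors:  PUSH AX / PUSH BX / POP AX / POP BX */
void swap_via_stack(WordStack *s, uint16_t *ax, uint16_t *bx)
{
    push(s, *ax);
    push(s, *bx);
    *ax = pop(s);            /* gets the old BX: last in, first out */
    *bx = pop(s);            /* gets the old AX */
}
```

The same stack returns 1000h, 2000h, 3000h in the order 3000h, 2000h, 1000h, as described earlier.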

Just a sidenote: the above method works fine for swapping two word-sized registers or memory locations, but you can use another instruction, XCHG, which is slightly faster. XCHG can work with byte-sized data as well:

    XCHG AX, BX                        ; Swap AX and BX
    XCHG CL, DH                        ; Swap CL and DH
    XCHG [BYTE DS:SI], BH              ; Swap BH and the byte at DS:SI
    XCHG AX, [MyWord]                  ; Swap AX and variable MyWord

But, like MOV, XCHG can't handle two memory references as operands. At least one of the operands must be a register.

Back to stacks: stacks can also be used to provide space for temporary variables. Let's save this for later. Stacks are also used when subroutines (or functions in C or functions and procedures in Pascal) are called. Again, let's worry about this only when we need to.

It's important not to push so many elements onto the stack that you occupy all of the space allocated for the stack segment. Recall that at the top of the programs we have seen, there is a "STACK 256" line; this means the stack can store 256 bytes, which means a maximum of 128 words can be on the stack at once. Feel free to increase this number. For most of the small assembler programs we'll be writing, 256 bytes is more than enough.

Filling up the stack segment and exceeding the boundary is called a stack overflow. If you cause a stack overflow, you'll overwrite adjoining parts of memory, and this often eventually crashes the computer (or at least causes odd results, if data is overwritten).

It's equally important not to pop off more elements than you have pushed on. If you were to do so, you would be guilty of creating a stack underflow. Make sure your PUSH instructions are matched with corresponding POP instructions!

How 8088 stacks work

If we were to draw a diagram of a stack that had the elements 1234 hex, 5678 hex, and 9ABC hex pushed onto it (in that order), we might draw something like this:

         Top  +-----------+
              |   9ABCh   |
              +-----------+
              |   5678h   |
              +-----------+
              |   1234h   |
      Bottom  +-----------+

We would expect that, in memory, a stack would start at a low memory address, and as more elements were added, the top of the stack would grow to higher memory addresses.

But... We're using an Intel processor, so something sensible like that is obviously out of the question.

With the 8088, the stack grows downward in memory. The "top" of the stack (that is, the location where the next element to be pushed on the stack would be stored) is located nearer to the bottom of memory. The "bottom" of the stack (that is, where the very first element pushed on the stack would reside) is nearer to the top. Thank you Intel.

SS is the stack segment. SS:0000, the bottom of the stack segment, would be the "top" of the stack if the stack were totally full. SS:SP (SP is the stack pointer) points to the current "top" of the stack (it points to the element that was most recently added; it does not point to the position where the next element will be added). When your program begins, SS and SP are automatically set so that SS:SP points to SS:xxxx, where xxxx is the hexadecimal equivalent of the number of bytes reserved with the "STACK" line at the top of your program. (If you used "STACK 256", then the xxxx would be 0100 hex, like this: SS:0100.) This address is at the "bottom" of the stack, and because the stack is empty, this address would initially point to the "top" of the stack as well.

Here's a diagram of a stack. This stack can hold six words, so it must have been created with a "STACK 12" line. I'll draw the diagram in a sideways fashion:

          One word     One byte
         |<------->|    |<-->|
                                                                   SS:SP
      SS:0000                                                    (Initial)
         |                                                           |
        \|/                                                         \|/
         *----+----*----+----*----+----*----+----*----+----*----+----*...
         |    |    |    |    |    |    |    |    |    |    |    |    |
         *----+----*----+----*----+----*----+----*----+----*----+----*...
Offsets:  0000 0001 0002 0003 0004 0005 0006 0007 0008 0009 000A 000B 000C
        /|\                                                         /|\
         |          -------------------------------->                |
         |                Increasing addresses                       |
         |                                                           |
         |          <--------------------------------             "Bottom"
         |           Stack grows downward (this way)               (and
         |                                                        initial
    "Potential" top; if the stack became                            top)
    completely filled, this would be the top.

So, in this case, with the 12-byte stack, SP is 000C hex. Also note that the contents of all the memory locations making up the stack are unknown. They could be all zeroes, but they're probably not.

Now, when a word is pushed onto the stack, what happens? Well, first, SP is decremented by two. Then, the value in question is copied to the word at SS:SP.

Here's the diagram after pushing 1234 hex onto our stack:

          One word     One byte
         |<------->|    |<-->|
                                                        Current    (Old
      SS:0000                                            SS:SP     SS:SP)
         |                                                 |         .
        \|/                                               \|/        .
         *----+----*----+----*----+----*----+----*----+----*----+----*...
         |    |    |    |    |    |    |    |    |    |    | 34 | 12 |
         *----+----*----+----*----+----*----+----*----+----*----+----*...
Offsets:  0000 0001 0002 0003 0004 0005 0006 0007 0008 0009 000A 000B 000C
                                                                    /|\
                    -------------------------------->                |
                          Increasing addresses                       |
                                                                     |
                    <--------------------------------             "Bottom"
                     Stack grows downward (this way)

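And yes, little-endian storage is still in effect: the low byte, 34 hex, sits at the lower address (000A hex), and the high byte, 12 hex, sits at 000B hex.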

Every time a value is added to the stack, the same two steps occur: two is subtracted from SP, and the value is copied to the new SS:SP address.
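
Those two steps are easy to model. The following Python sketch applies them to the six-word stack from the diagrams, storing each word low byte first. The `push` function and the 12-byte `bytearray` standing in for the stack segment are illustrative assumptions, not part of TASM or DOS. Pushing 1234h, 5678h, and 9ABCh reproduces the byte layouts drawn in this section.

```python
def push(mem, sp, value):
    """Model of an 8088 word PUSH: returns the new SP.
    mem is a bytearray standing in for the stack segment (hypothetical)."""
    sp -= 2                              # step 1: subtract two from SP
    mem[sp] = value & 0xFF               # step 2: store the low byte at SS:SP...
    mem[sp + 1] = (value >> 8) & 0xFF    # ...and the high byte at SS:SP+1
    return sp


stack = bytearray(12)   # like "STACK 12"; real memory would hold unknown junk
sp = 0x000C             # initial SP for a 12-byte stack

sp = push(stack, sp, 0x1234)
print(f"SP = {sp:04X}", stack[sp:].hex(" "))   # SP = 000A 34 12

for value in (0x5678, 0x9ABC):
    sp = push(stack, sp, value)
print(f"SP = {sp:04X}", stack[sp:].hex(" "))   # SP = 0006 bc 9a 78 56 34 12
```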

If we were to push 5678 hex and 9ABC hex onto our stack, we'd get:

          One word     One byte
         |<------->|    |<-->|
                                    Current
      SS:0000                        SS:SP
         |                             |
        \|/                           \|/
         *----+----*----+----*----+----*----+----*----+----*----+----*...
         |    |    |    |    |    |    | BC | 9A | 78 | 56 | 34 | 12 |
         *----+----*----+----*----+----*----+----*----+----*----+----*...
Offsets:  0000 0001 0002 0003 0004 0005 0006 0007 0008 0009 000A 000B 000C
                                                                    /|\
                    -------------------------------->                |
                          Increasing addresses                       |
                                                                     |
                    <--------------------------------             "Bottom"
                     Stack grows downward (this way)

To pop a value off, the word-sized value at the current SS:SP address is copied to whatever location is specified by the POP instruction's operand. Then SP is incremented by two (SS does not change).

If we were to pop off the latest value from our stack, it would look like this:

          One word     One byte
         |<------->|    |<-->|
                                              Current
      SS:0000                                  SS:SP
         |                                       |
        \|/                                     \|/
         *----+----*----+----*----+----*----+----*----+----*----+----*...
         |    |    |    |    |    |    | BC | 9A | 78 | 56 | 34 | 12 |
         *----+----*----+----*----+----*----+----*----+----*----+----*...
Offsets:  0000 0001 0002 0003 0004 0005 0006 0007 0008 0009 000A 000B 000C
                                                                    /|\
                    -------------------------------->                |
                          Increasing addresses                       |
                                                                     |
                    <--------------------------------             "Bottom"
                     Stack grows downward (this way)

The value that was at the top of the stack, 9ABC hex, would have been copied to whatever destination was specified by the POP instruction. Notice that the old value, the 9ABC hex (in the stack), is not erased or overwritten. Eventually it will be, the next time something gets pushed onto the stack.
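
Here is a matching Python sketch of POP under the same modelling assumptions: a `bytearray` stands in for the stack segment, and `pop` is a hypothetical helper. It reads the word at SS:SP low byte first, then adds two to SP. The bytes themselves are left untouched, which is why 9ABCh is still sitting in memory after it has been popped.

```python
def pop(mem, sp):
    """Model of an 8088 word POP: returns (value, new SP). Hypothetical helper."""
    value = mem[sp] | (mem[sp + 1] << 8)   # read the word at SS:SP (little-endian)
    return value, sp + 2                   # then add two to SP; memory is not cleared


# The stack from the last diagram: 9ABCh, 5678h, 1234h pushed, SP = 0006h.
# Offsets 0000-0005 would hold unknown junk in reality; zeros stand in here.
stack = bytearray([0, 0, 0, 0, 0, 0, 0xBC, 0x9A, 0x78, 0x56, 0x34, 0x12])
sp = 0x0006

value, sp = pop(stack, sp)
print(f"popped {value:04X}h, SP = {sp:04X}")          # popped 9ABCh, SP = 0008
print("old bytes still there:", stack[6:8].hex(" "))  # old bytes still there: bc 9a
```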

So, because SS:SP points to the element that is on the top of the stack, if you wanted to take a peek at the value that was at the top of the stack, and you didn't want to pop it off, you could just do this (to copy the value to AX):

    ; Peek at value on the top of the stack
    MOV BX, SP
    MOV AX, [SS:BX]

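Sadly, SP cannot be used as a base or index register in 16-bit addressing, so [SS:SP] is not permitted (check the addressing modes above), but [SS:BX] is. So we make a copy of SP in BX and use that copy, along with a segment override, to address SS:SP. (The override matters: without it, [BX] would refer to the DS segment, not SS.) Other registers work too. SI or DI need the same SS override, while addressing with BP uses SS by default, so MOV BP, SP followed by MOV AX, [BP] needs no override at all.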

And -- why not -- we could add a displacement as well. That way we can look at deeper stack elements (assuming there were more elements on the stack):

    MOV BX, SP
    MOV AX, [SS:BX + 2]                ; Peek at the next value on the
                                       ;  stack

(No, I can't think of a good reason for doing this.)
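
In the same style of Python model, peeking is just a read at SP plus a displacement, with SP left alone. The `peek` function and its `displacement` parameter are assumptions made for illustration. A displacement of 0 mirrors MOV AX, [SS:BX], and a displacement of 2 mirrors MOV AX, [SS:BX + 2].

```python
def peek(mem, sp, displacement=0):
    """Read the word at SS:(SP + displacement) without changing SP.
    Hypothetical helper modelling MOV BX, SP / MOV AX, [SS:BX + disp]."""
    addr = sp + displacement
    return mem[addr] | (mem[addr + 1] << 8)   # little-endian word


# Stack holding 9ABCh (top), 5678h, 1234h, with SP = 0006h.
stack = bytearray([0, 0, 0, 0, 0, 0, 0xBC, 0x9A, 0x78, 0x56, 0x34, 0x12])
sp = 0x0006

print(f"{peek(stack, sp):04X}")      # 9ABC  (top of stack)
print(f"{peek(stack, sp, 2):04X}")   # 5678  (next element down)
print(f"SP is still {sp:04X}")       # SP is still 0006
```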

A quick stack example program

Here's a short program that briefly demonstrates saving and restoring important values using the stack:

------- TEST3.ASM begins -------

%TITLE "Assembler Test Program #3 -- using the stack"
; This program doesn't do anything noticeable (eg. no screen output).

    IDEAL

    MODEL small
    STACK 256

    DATASEG

TestWord                          DW   05678h

    CODESEG

Start:
    ; Let DS point to the DATASEG data segment:
    MOV AX, @data
    MOV DS, AX

    ; Create some "important data":
    MOV AX, 01234h
    MOV DX, 0ABCDh

    ; Save our important data on the stack:
    PUSH AX
    PUSH DX
    PUSH [TestWord]

    ; Trash the important data:
    MOV AX, 09876h
    XOR DX, 0F0F0h
    SUB [TestWord], 300

    ; Get our important data back from the stack:
    POP [TestWord]
    POP DX
    POP AX

    ; Terminate the program:
    MOV AX, 04C00h
    INT 21h
END Start

------- TEST3.ASM ends -------

And no, this program doesn't do anything to indicate whether it worked correctly or not. If you have a debugger such as Turbo Debugger, it would be a good exercise to step through the program and watch the stack as pushing and popping occurs. If you don't have a debugger, you can trace through the steps on paper.
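
If you don't have a debugger, a short Python trace can stand in for the paper exercise. This sketch models only what TEST3.ASM touches: AX, DX, TestWord, SP, and a 256-byte stack. The `Stack` class and the variable names are assumptions made for this sketch, and nothing here is produced by TASM. The trace checks that popping in reverse order restores every value and returns SP to 0100h.

```python
class Stack:
    """Minimal model of the 8088 word stack (hypothetical helper)."""

    def __init__(self, size):
        self.mem = bytearray(size)
        self.sp = size                     # "STACK 256" -> SP = 0100h

    def push(self, value):
        self.sp -= 2                       # SP -= 2, then store the word
        self.mem[self.sp] = value & 0xFF
        self.mem[self.sp + 1] = value >> 8

    def pop(self):
        value = self.mem[self.sp] | (self.mem[self.sp + 1] << 8)
        self.sp += 2                       # read the word, then SP += 2
        return value


stack = Stack(256)

ax, dx = 0x1234, 0xABCD        # MOV AX, 01234h / MOV DX, 0ABCDh
test_word = 0x5678             # TestWord DW 05678h

stack.push(ax)                 # PUSH AX
stack.push(dx)                 # PUSH DX
stack.push(test_word)          # PUSH [TestWord]

ax = 0x9876                    # MOV AX, 09876h
dx ^= 0xF0F0                   # XOR DX, 0F0F0h      (DX becomes 5B3Dh)
test_word = (test_word - 300) & 0xFFFF   # SUB [TestWord], 300 (becomes 554Ch)

test_word = stack.pop()        # POP [TestWord]  (last in, first out)
dx = stack.pop()               # POP DX
ax = stack.pop()               # POP AX

print(f"AX={ax:04X} DX={dx:04X} TestWord={test_word:04X} SP={stack.sp:04X}")
# AX=1234 DX=ABCD TestWord=5678 SP=0100
```

Popping in any other order would put the saved values back into the wrong places, which is the point of pairing each PUSH with a POP in reverse order.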

Summary

In this much too lengthy chapter, we've learned all sorts of semi-interesting information about declaring and using variables. We've learned about the different addressing modes, and we've discovered how the stack works. The new instructions we saw were PUSH and POP (and XCHG).

In the next chapter, we'll find out how to make comparisons, how to make decisions ("if...then", in high-level languages), and how to construct loops.

  

Copyright 1997, Kevin Matz, All Rights Reserved. Last revision date: Tue, Jul. 22, 1997.
