Multiplication with the Z80

Multiplication

The Z80 CPU does not have a multiply command, so we must write a function to do the task. Here is a method that works for all number-base systems:

Multiply the number in the top row by the last digit of the number in the bottom row to generate a partial result.
Multiply the number in the top row by the number base. (same as adding a 0 digit to the rightside of the number)
Divide the number in the bottom row by the number base, and ignore the remainder. (same as removing the rightmost digit from the number)
Repeat steps 1,2 and 3 until all digits of the number in the bottom row have been used.
Finally add all partial results to generate the final answer

An example of multiplying 32x48:

Stage 1: Multiply number on the top using the last digit of the number in the bottom row:

    32
x    8
------
   256

Giving 256 as the first partial result.

Stage 2 and 3: Multiply the number in the top row by the number base (base 10), and divide the number in the bottom row by the number base:

   320  <- (32*10)
x    4  <- (48/10)
------

Repeat stage 1:

   320
x    4 
------
  1280

Giving 1280 as the second partial result.

We find now that dividing 4 by 10, results in 0, and we have therefore used all digits in the number of the bottom row.

Stage 5: Add all partial results

   256
+ 1280
------
  1536

giving the final result as 1536.

Multiplying two numbers in binary uses exactly the same method. But the following observations apply:

In a binary number a digit can only be 1 or 0. We know that multiplying a number by 0 gives 0, and multiplying a number by 1 gives itself (e.g. 3*1 = 3), therefore if the last digit is 1 the partial result will be the number in the top-row, otherwise it will be 0.
Multiplying a number in binary by the number-base (2) can be done with a "shift-left" operation or adding a number to itself (3+3=6).
Dividing a number in binary by the number-base (2), can be done with a "shift-right".
A digit in binary is a "bit".

These observations lead us to the following algorithm:

;; multiply DE and BC
;; DE is equivalent to the number in the top row in our algorithm
;; and BC is equivalent to the number in the bottom row in our algorithm

    ld a,16     ; this is the number of bits of the number to process
    ld hl,0     ; HL is updated with the partial result, and at the end it will hold 
                ; the final result.
.mul_loop
    srl b
    rr c        ;; divide BC by 2 and shifting the state of bit 0 into the carry
                ;; if carry = 0, then state of bit 0 was 0, (the rightmost digit was 0) 
                ;; if carry = 1, then state of bit 1 was 1. (the rightmost digit was 1)
                ;; if rightmost digit was 0, then the result would be 0, and we do the add.
                ;; if rightmost digit was 1, then the result is DE and we do the add.
    jr nc,no_add	

    ;; will get to here if carry = 1        
    add hl,de   

.no_add
    ;; at this point BC has already been divided by 2

    ex de,hl    ;; swap DE and HL
    add hl,hl   ;; multiply DE by 2
    ex de,hl    ;; swap DE and HL

    ;; at this point DE has been multiplied by 2
    
    dec a
    jr nz,mul_loop  ;; process more bits

We can see from this that multiplication is a long process and uses many CPU cycles. If the the two numbers are not known and are not fixed, this is the only method to multiply a number.

But there are ways we can improve this if the number to multiply by is known:

If we know the maximum and minimum ranges of the numbers we can process less bits to give an accurate answer. e.g. if all numbers were known to be 8 bits, then we would only need to repeat the loop 8 times.
If we know in advance the number we are multiplying against, we can perform the multiplication much faster

As we know, multiplying by 2, is the same as "shifting" a number to the left or adding itself to itself.

So this leads us to the following optimisations:

Multiplying number in HL by 2:

add hl,hl

Multiplying number in HL by 4:

add hl,hl
add hl,hl

Multiplying number in HL by 16:

add hl,hl    ;x2
add hl,hl    ;x4
add hl,hl    ;x8
add hl,hl    ;x16

Therefore, by repeatidly adding a number to itself we are doubling it's value each time, and we can multiply by any "power of 2" e.g. 2,4,8,16,32,64,128,256....

If we use the partial result at one or more of the stages, we can multiply by other values:

Multiplying number in HL by 6:

add hl,hl   ;x2
ld c,l
ld b,h      ; BC = partial result of number in HL multiplied by 2
add hl,hl   ;x4
add hl,bc   ;x6

Multiplying number in HL by 3:

ld c,l
ld b,h      ; BC = partial result of number in HL
add hl,hl   ;x2
add hl,bc   ;x3

However, there comes a point when multiplying using this method can be too slow. An alternative, is to store a table of premultiplied values and based on one number pick the result from the table.

The first stage is to generate the table, and this can be done once at initialisation time.

ld ix,table
ld hl,0     ; 
ld de,320   ; this is the number to multiply by
ld b,256    ; number of entries in the table. The range of input numbers is assumed to be in the
            ; range 0 to 255.
.tb1
ld (ix+0),l ; store value in table
ld (ix+1),h
inc ix
inc ix
add hl,de
djnz tb1

;; at this point we will have a table of 16-bit numbers corresponding to the multiples of 320
;; e.g. 0,320,640,....

Now to do the multiplication we "look-up" the result from the table, using one of the numbers as the index into the table.

;; a is number which will be multiplied by 320

ld l,a      ; use a as the index into the table
ld h,0
add hl,hl   ; generate offset into table to get result   
ld de,table
add hl,de   ; add base of table
            ; hl = memory address of result in table
ld a,(hl)
inc hl
ld h,(hl)   ; hl = result from table

There are other ways to optimise the code so that multiplication can be faster, but this document has shown how multiplication can be done on a Z80 processor. The task of optimisation is left for the programmer.