The Z80 CPU does not have a multiply command, so we must write a function to do the task. Here is a method that works for all number-base systems:
An example of multiplying 32x48:
Stage 1: Multiply number on the top using the last digit of the number in the bottom row:
32 x 8 ------ 256
Giving 256 as the first partial result.
Stage 2 and 3: Multiply the number in the top row by the number base (base 10), and divide the number in the bottom row by the number base:
320 <- (32*10) x 4 <- (48/10) ------
Repeat stage 1:
320 x 4 ------ 1280
Giving 1280 as the second partial result.
We find now that dividing 4 by 10, results in 0, and we have therefore used all digits in the number of the bottom row.
Stage 5: Add all partial results
256 + 1280 ------ 1536giving the final result as 1536.
Multiplying two numbers in binary uses exactly the same method. But the following observations apply:
;; multiply DE and BC ;; DE is equivalent to the number in the top row in our algorithm ;; and BC is equivalent to the number in the bottom row in our algorithm ld a,16 ; this is the number of bits of the number to process ld hl,0 ; HL is updated with the partial result, and at the end it will hold ; the final result. .mul_loop srl b rr c ;; divide BC by 2 and shifting the state of bit 0 into the carry ;; if carry = 0, then state of bit 0 was 0, (the rightmost digit was 0) ;; if carry = 1, then state of bit 1 was 1. (the rightmost digit was 1) ;; if rightmost digit was 0, then the result would be 0, and we do the add. ;; if rightmost digit was 1, then the result is DE and we do the add. jr nc,no_add ;; will get to here if carry = 1 add hl,de .no_add ;; at this point BC has already been divided by 2 ex de,hl ;; swap DE and HL add hl,hl ;; multiply DE by 2 ex de,hl ;; swap DE and HL ;; at this point DE has been multiplied by 2 dec a jr nz,mul_loop ;; process more bits
We can see from this that multiplication is a long process and uses many CPU cycles. If the the two numbers are not known and are not fixed, this is the only method to multiply a number.
But there are ways we can improve this if the number to multiply by is known:
As we know, multiplying by 2, is the same as "shifting" a number to the left or adding itself to itself.
So this leads us to the following optimisations:
Multiplying number in HL by 2:
add hl,hl
Multiplying number in HL by 4:
add hl,hl add hl,hl
Multiplying number in HL by 16:
add hl,hl ;x2 add hl,hl ;x4 add hl,hl ;x8 add hl,hl ;x16
Therefore, by repeatidly adding a number to itself we are doubling it's value each time, and we can multiply by any "power of 2" e.g. 2,4,8,16,32,64,128,256....
If we use the partial result at one or more of the stages, we can multiply by other values:
Multiplying number in HL by 6:
add hl,hl ;x2 ld c,l ld b,h ; BC = partial result of number in HL multiplied by 2 add hl,hl ;x4 add hl,bc ;x6
Multiplying number in HL by 3:
ld c,l ld b,h ; BC = partial result of number in HL add hl,hl ;x2 add hl,bc ;x3
However, there comes a point when multiplying using this method can be too slow. An alternative, is to store a table of premultiplied values and based on one number pick the result from the table.
The first stage is to generate the table, and this can be done once at initialisation time.
ld ix,table ld hl,0 ; ld de,320 ; this is the number to multiply by ld b,256 ; number of entries in the table. The range of input numbers is assumed to be in the ; range 0 to 255. .tb1 ld (ix+0),l ; store value in table ld (ix+1),h inc ix inc ix add hl,de djnz tb1 ;; at this point we will have a table of 16-bit numbers corresponding to the multiples of 320 ;; e.g. 0,320,640,....
Now to do the multiplication we "look-up" the result from the table, using one of the numbers as the index into the table.
;; a is number which will be multiplied by 320 ld l,a ; use a as the index into the table ld h,0 add hl,hl ; generate offset into table to get result ld de,table add hl,de ; add base of table ; hl = memory address of result in table ld a,(hl) inc hl ld h,(hl) ; hl = result from table
There are other ways to optimise the code so that multiplication can be faster, but this document has shown how multiplication can be done on a Z80 processor. The task of optimisation is left for the programmer.