Fixed point numbers and fixed point arithmetic
Introduction
A fixed point number is a value with an integer part and a fractional part.
Internally, these numbers are stored in binary, with a fixed number of binary digits (bits) assigned to the integer part
and a fixed number assigned to the fractional part.
The binary representation (bits to the left of the binary point form the integer part, bits to the right form the fractional part):
| Place value   | ... | 2^5 | 2^4 | 2^3 | 2^2 | 2^1 | 2^0 | . | 2^-1 | 2^-2 | 2^-3 | 2^-4 | 2^-5 | ... |
| Decimal value | ... | 32  | 16  | 8   | 4   | 2   | 1   | . | 1/2  | 1/4  | 1/8  | 1/16 | 1/32 | ... |
The number of binary digits assigned to the integer part may differ from the number of digits
assigned to the fractional part.
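As a worked equation (the symbols I, F, b_i and N are introduced here for illustration), a binary fixed point number with I integer bits and F fractional bits has the value below, where b_i is the bit at place value 2^i and N is the same bit pattern read as an ordinary integer:

```latex
\text{value} = \sum_{i=-F}^{I-1} b_i \, 2^{i} = \frac{N}{2^{F}}
```

For example, 101.11 (base 2) has I = 3 and F = 2, giving 4 + 1 + 1/2 + 1/4 = 5.75. Read as an integer, the bits 10111 (base 2) are N = 23, and 23 / 2^2 = 5.75. This is why a fixed point value can be stored and manipulated as a plain integer.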
Many different fixed point representations are in use, so the following general notation
is used to identify them:
<number of bits assigned to integer part> : <number of bits assigned to fractional part>
For example, a fixed point number using a "12:4" notation has 12 bits used by the integer part
and 4 bits used by the fractional part.
This may be represented in a 16-bit value as follows:
| Bit index | Meaning                                 |
| 15 to 4   | Integer part (12 bits)                  |
| (none)    | Binary point, between bit 4 and bit 3   |
| 3 to 0    | Fractional part (4 bits)                |
- The position of the binary point is implied by the notation, and no bits are used to represent it.
- This representation is unsigned: it can only hold zero and positive values.
- The minimum value is 0. The smallest step between two values is 0000000000000001 (base 2), which is 1/16 = 0.0625 (base 10).
- The maximum value that can be represented is 1111111111111111 (base 2), which is 4095.9375 (base 10), i.e. 4095 + 15/16.
Negative values can also be represented as fixed point numbers. When one representation must hold both positive and negative values, one bit is used to hold the sign of the number.
An example representation for a signed (positive and negative) fixed point number:
| Bit index | Meaning                                 |
| 15        | Sign bit                                |
| 14 to 4   | Integer part (11 bits)                  |
| (none)    | Binary point, between bit 4 and bit 3   |
| 3 to 0    | Fractional part (4 bits)                |
- In this example:
  - When the sign bit is "1" the number is negative.
  - When the sign bit is "0" the number is positive (or zero).
  - The minimum value that can be represented is 1000000000000000 (base 2), which is -2048 (base 10).
  - The maximum value that can be represented is 0111111111111111 (base 2), which is +2047.9375 (base 10).
- This representation is the same as a 16-bit 2's complement integer scaled by 1/16. Therefore the CPU's existing
  integer opcodes can be used to test the sign of a value, and to add and subtract signed values.
Advantages:
- Arithmetic and logical operations may be performed on fixed point numbers using integer
  arithmetic. The Amstrad CPC's Z80 CPU has no floating point hardware, so floating point arithmetic must be
  done in software; this means that fixed point numbers are much faster than floating point numbers.
- A fixed point number representation can use less memory to store values (for example, it is possible to have an
  8-bit fixed point number). On an Amstrad CPC, floating point numbers use 5 bytes per value.
Disadvantages:
- It is easy for an arithmetic operation to produce an "overflow" or "underflow".
- A fixed point number has a limited integer range. It is not possible to represent both very large and very small
  numbers with the same representation.
- A fixed point number has limited accuracy. You must choose the representation which will best suit your needs.
- overflow
  - An "overflow" occurs when the result of an arithmetic operation is too large in magnitude to fit
    into the fixed number of bits of the fixed point representation.
- underflow
  - An "underflow" occurs when the result of an arithmetic operation is too small to fit into the
    fixed point representation: it is closer to zero than the smallest fractional step, so precision
    is lost and the result may become zero.