Fixed point numbers and fixed point arithmetic

Introduction

A fixed point number is a value with an integer part and a fractional part.

Internally, these numbers are stored in binary, with the integer part and the fractional part each occupying a fixed number of binary digits.

The binary representation:

                    integer part         binary point     fractional part
Power of two:   ... 2⁵  2⁴  2³  2²  2¹  2⁰     .     2⁻¹  2⁻²  2⁻³  2⁻⁴   2⁻⁵  ...
Decimal value:  ... 32  16  8   4   2   1      .     ¹/₂  ¹/₄  ¹/₈  ¹/₁₆  ¹/₃₂ ...

The number of binary digits assigned to the integer part may be different to the number of digits assigned to the fractional part.
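To make the place values concrete, here is a minimal Python sketch that evaluates a binary string containing a binary point. The function name `binary_to_value` is hypothetical and chosen only for illustration; Python is used for readability and is not the language you would use on the target machine.

```python
def binary_to_value(digits: str) -> float:
    """Evaluate a binary string such as '101.101' using its place values."""
    integer_digits, _, fraction_digits = digits.partition(".")
    value = 0.0
    # Integer digits carry the weights 1, 2, 4, ... counted from the right.
    for position, digit in enumerate(reversed(integer_digits)):
        value += int(digit) * 2 ** position
    # Fractional digits carry the weights 1/2, 1/4, 1/8, ... counted from the left.
    for position, digit in enumerate(fraction_digits, start=1):
        value += int(digit) * 2 ** -position
    return value


print(binary_to_value("101.101"))  # 4 + 1 + 1/2 + 1/8 = 5.625
```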

There are many different fixed point representations used, so to identify the representation this general notation is used:

<number of bits assigned to integer part> : <number of bits assigned to fractional part>

For example, a fixed point number using a "12:4" notation has 12 bits used by the integer part and 4 bits used by the fractional part.

This may be represented in a 16-bit value as follows:

Bit index:  15 14 13 12 11 10 9 8 7 6 5 4   .   3 2 1 0
Field:      integer part (bits 15 to 4)     .   fractional part (bits 3 to 0)

The "." marks the position of the binary point, between bit 4 and bit 3.

The position of the binary point is implied by the notation, and no bits are used to represent it.
This notation describes an unsigned (non-negative) number range.
- The minimum value that can be represented is 0. The smallest non-zero value is 1 (base 2), which is ¹/₁₆ (base 10).
- The maximum value that can be represented is 1111111111111111 (base 2), which is 4095.9375 (base 10).
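The 12:4 layout can be sketched in Python as follows. Storing a value amounts to multiplying it by 16 (2 to the power of the 4 fractional bits) and keeping the result as a 16-bit integer. The names `to_fixed`, `from_fixed`, `FRACTION_BITS`, `SCALE` and `WORD_MASK` are hypothetical, and rounding to the nearest step is an assumption of this sketch; truncation is an equally valid choice.

```python
FRACTION_BITS = 4            # the "4" in 12:4
SCALE = 1 << FRACTION_BITS   # 16 steps per whole unit
WORD_MASK = 0xFFFF           # the value is stored in 16 bits


def to_fixed(value: float) -> int:
    """Encode a non-negative number as unsigned 12:4, rounding to the nearest step."""
    raw = round(value * SCALE)
    if not 0 <= raw <= WORD_MASK:
        raise OverflowError(f"{value} does not fit in unsigned 12:4")
    return raw


def from_fixed(raw: int) -> float:
    """Decode an unsigned 12:4 word back into an ordinary number."""
    return raw / SCALE


print(hex(to_fixed(2.5)))   # 0x28, i.e. 000000000010.1000
print(from_fixed(0x0001))   # 0.0625, the smallest non-zero value
print(from_fixed(0xFFFF))   # 4095.9375, the largest value
```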

Positive and negative values can also be represented as fixed point numbers. Where positive and negative numbers are represented by the same notation, one bit is used to hold the sign of the number.

An example representation for a positive & negative fixed point number:

Bit index:  15     14 13 12 11 10 9 8 7 6 5 4   .   3 2 1 0
Field:      sign   integer part (bits 14 to 4)  .   fractional part (bits 3 to 0)

The "." marks the position of the binary point, between bit 4 and bit 3.

In this example:
- When the sign bit is "1" the number is negative.
- When the sign bit is "0" the number is positive.
- The minimum value that can be represented is 1000000000000000 (base 2), which is -2048 (base 10).
- The maximum value that can be represented is 0111111111111111 (base 2), which is +2047.9375 (base 10).
- The value 1111111111111111 (base 2) is -¹/₁₆ (base 10), the negative value closest to zero.
This representation is the same as 2's complement, so existing CPU opcodes can be used to test the sign of a number.
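Assuming the 2's complement interpretation stated above, a signed 16-bit word with 4 fractional bits could be encoded and decoded as in this Python sketch. The helper names (`is_negative`, `to_signed_fixed`, `from_signed_fixed`) are hypothetical. Testing bit 15 plays the role that a sign-flag test would play in machine code.

```python
FRACTION_BITS = 4
SCALE = 1 << FRACTION_BITS   # 16 steps per whole unit


def is_negative(raw: int) -> bool:
    """Test bit 15, the sign bit of the 16-bit word."""
    return (raw & 0x8000) != 0


def to_signed_fixed(value: float) -> int:
    """Encode a number as a 16-bit 2's complement word with 4 fractional bits."""
    raw = round(value * SCALE)
    if not -0x8000 <= raw <= 0x7FFF:
        raise OverflowError(f"{value} does not fit in the signed format")
    return raw & 0xFFFF


def from_signed_fixed(raw: int) -> float:
    """Decode a 16-bit 2's complement word with 4 fractional bits."""
    raw &= 0xFFFF
    if is_negative(raw):
        raw -= 0x10000       # undo the 2's complement wrap-around
    return raw / SCALE


print(from_signed_fixed(0x7FFF))    # 2047.9375
print(from_signed_fixed(0x8000))    # -2048.0
print(from_signed_fixed(0xFFFF))    # -0.0625
print(hex(to_signed_fixed(-1.5)))   # 0xffe8
```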

Advantages:

- Arithmetic and logical operations may be performed on fixed point numbers using integer arithmetic. On an Amstrad CPC, this makes fixed point arithmetic much faster than floating point arithmetic.
- A fixed point representation can use less memory to store values. For example, it is possible to have an 8-bit fixed point number, whereas floating point numbers on an Amstrad CPC use 5 bytes per value.
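As a rough illustration of the first point, the sketch below performs unsigned 12:4 arithmetic using only integer operations. Addition and subtraction work directly on the stored words. Multiplication produces 8 fractional bits, so the product is shifted right by 4 to return to 12:4; division pre-shifts the dividend left by 4 for the same reason. The function names are hypothetical, and results are masked to 16 bits to imitate a 16-bit register.

```python
FRACTION_BITS = 4
WORD_MASK = 0xFFFF   # keep results within 16 bits


def fixed_add(a: int, b: int) -> int:
    return (a + b) & WORD_MASK


def fixed_sub(a: int, b: int) -> int:
    return (a - b) & WORD_MASK


def fixed_mul(a: int, b: int) -> int:
    # The raw product has 8 fractional bits; discard the lowest 4.
    return ((a * b) >> FRACTION_BITS) & WORD_MASK


def fixed_div(a: int, b: int) -> int:
    # Pre-shift the dividend so the quotient keeps 4 fractional bits.
    return ((a << FRACTION_BITS) // b) & WORD_MASK


a = 40   # 2.5  stored as 2.5  * 16
b = 20   # 1.25 stored as 1.25 * 16
print(fixed_add(a, b) / 16)   # 3.75
print(fixed_sub(a, b) / 16)   # 1.25
print(fixed_mul(a, b) / 16)   # 3.125
print(fixed_div(a, b) / 16)   # 2.0
```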

Disadvantages:

- It is easy for an arithmetic operation to produce an "overflow" or an "underflow".
- A fixed point number has a limited integer range. It is not possible to represent very large and very small numbers with the same representation.
- A fixed point number has limited accuracy. You must choose the representation that best suits your needs.

overflow: An "overflow" occurs when the result of an arithmetic operation is too large to fit into the fixed representation of a fixed point number.
underflow: An "underflow" occurs when the result of an arithmetic operation is too small to fit into the fixed representation of a fixed point number.
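Both effects can be reproduced with a short Python sketch using unsigned 12:4 values. It assumes that results are simply truncated to 16 bits, as a 16-bit register would do, and it reads "too small" as a result smaller than the smallest non-zero step of ¹/₁₆. The function names are hypothetical.

```python
FRACTION_BITS = 4
WORD_MASK = 0xFFFF


def add_wrapping(a: int, b: int) -> int:
    """Add two 12:4 words, silently discarding any carry out of bit 15."""
    return (a + b) & WORD_MASK


def add_checked(a: int, b: int) -> int:
    """Add two 12:4 words, reporting an overflow instead of wrapping."""
    total = a + b
    if total > WORD_MASK:
        raise OverflowError("sum does not fit in 16 bits")
    return total


def mul_truncating(a: int, b: int) -> int:
    """Multiply two 12:4 words, discarding the extra fractional bits."""
    return ((a * b) >> FRACTION_BITS) & WORD_MASK


# Overflow: 4000.0 + 200.0 = 4200.0 exceeds the maximum of 4095.9375.
big = 4000 << FRACTION_BITS
extra = 200 << FRACTION_BITS
print(add_wrapping(big, extra) / 16)   # 104.0, not 4200.0

# Underflow: 1/16 * 1/4 = 1/64 is smaller than the smallest step of 1/16.
tiny = 1      # 0.0625
quarter = 4   # 0.25
print(mul_truncating(tiny, quarter))   # 0
```

A checked routine such as `add_checked` trades a little speed for the ability to detect the problem; whether that trade is worthwhile depends on the program.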