Number Representation
In computers, numbers are represented with a finite amount of memory. The smallest unit of memory is a bit, which can take one of two values: 0 or 1. Because memory is made of bits, it is easier for computers to represent numbers in the binary system.
Most computers use the binary system for representing numbers. Contrary to this, most handheld calculators use the decimal or base-10 system to represent numbers.
Computers represent integers and real numbers in different ways. In this article, we will discuss how real numbers are represented in computers.
There are two ways in which real numbers are represented. One is fixed point representation and the other is floating point representation.
Fixed Point Representation
In fixed point representation, the number of digits after the decimal point is fixed. This is achieved by allocating a fixed amount of memory for the decimal part. For example, if we allocate 3 digits for the decimal part, then we can represent numbers such as 1.002, 7.888 and 0.001.
Fixed point representation minimizes the absolute error of representation, but it cannot represent a large range of numbers. It is used where values always have a fixed number of decimal places, such as amounts of money. Fixed point numbers are generally not available at the hardware level, but many libraries support them in software. In Python, for example, the decimal module (strictly a decimal floating point implementation) can round values to a fixed number of decimal places.
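As a rough illustration of the constant absolute error, the sketch below uses Python's standard decimal module to snap values onto a grid with 3 digits after the decimal point. The step `STEP`, the helper `to_fixed` and the sample inputs are assumptions made for this example, not part of any particular fixed-point library.

```python
from decimal import Decimal, ROUND_HALF_EVEN

STEP = Decimal("0.001")  # assumed resolution: 3 digits after the decimal point


def to_fixed(text: str) -> Decimal:
    """Round a decimal string onto the assumed 3-digit fixed-point grid."""
    return Decimal(text).quantize(STEP, rounding=ROUND_HALF_EVEN)


# Hypothetical inputs of very different magnitude.
for text in ("200.0014", "7.8884", "0.0014"):
    stored = to_fixed(text)
    abs_err = abs(Decimal(text) - stored)
    print(text, "->", stored, "| absolute error:", abs_err)
```

All three inputs lose the same amount (0.0004), regardless of how large the number is.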
Importance of significant digits
One of the major flaws of fixed point representation is that it does not take the importance of significant digits into account. For example, the decimal part of 200.001 matters very little: if we omit 0.001 from 200.001, the representation is still nearly exact. For the number 0.0001, however, the decimal digits carry all of the information, and a small absolute change causes a large relative error.
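The difference can be made concrete by computing relative errors exactly. This is a minimal sketch using Python's fractions module; the helper name `relative_error` and the choice of storing 0.0001 as 0 (what a 3-decimal-digit fixed point format would do by truncation) are assumptions for illustration.

```python
from fractions import Fraction


def relative_error(true_value: str, stored_value: str) -> Fraction:
    """Exact relative error |true - stored| / |true| for decimal strings."""
    true = Fraction(true_value)
    stored = Fraction(stored_value)
    return abs(true - stored) / abs(true)


# Dropping 0.001 from 200.001 barely matters.
large = relative_error("200.001", "200")

# Truncating 0.0001 to three decimal places leaves nothing at all.
small = relative_error("0.0001", "0")

print("200.001 stored as 200:", float(large))
print("0.0001 stored as 0:   ", float(small))
```

The first relative error is about five parts in a million, while the second is 100%.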
Floating Point Representation
In floating point representation, numbers are written in scientific notation with a fixed number of significant digits and an exponent. This properly accounts for the importance of significant digits and can represent very large or very small numbers with small relative errors. A number represented in floating point notation is called a float. A real number $x$ is represented in floating point notation as:
$$x = m \times b^{e}$$
where $m$ is the fractional part, also called the mantissa; $b$ is the radix, which is $2$ for the binary system and $10$ for the decimal system; and $e$ is the exponent. Both the mantissa and the exponent can take positive as well as negative values. Some numbers in floating point representation are:
The floating point representation can only represent a discrete subset of real numbers. The numbers will be dense in the region -1 to 1 and sparse at the extreme ends. For example, if we take mantissa to be 1 digit long and also exponent to be 1 digit long then the numbers that can be represented are:
we can see that it is impossible to represent as it will cause overflow. Numbers like can't be represented either due to the 1-digit mantissa.
Similarly, the absolute error for representation is variable while the relative error is always less than 100%(not considering overflow). For example, when we round off as , the absolute error is and the relative error is about 11%. But when we round off as the absolute error is only while the relative error is about 24%.
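To see how the two kinds of error behave, the sketch below rounds values to a single significant decimal digit using a decimal context with precision 1. The inputs are hypothetical and are not the ones used in the text above; the exponent range is left unrestricted, so overflow is not modelled.

```python
from decimal import Decimal, Context, ROUND_HALF_UP

# Assumed toy format: one significant decimal digit, round half up.
ONE_DIGIT = Context(prec=1, rounding=ROUND_HALF_UP)


def round_to_one_digit(text: str) -> Decimal:
    """Round a decimal string to one significant digit."""
    return ONE_DIGIT.create_decimal(text)


def errors(text: str) -> tuple[Decimal, Decimal]:
    """Return (absolute error, relative error) of the one-digit rounding."""
    exact = Decimal(text)
    abs_err = abs(exact - round_to_one_digit(text))
    return abs_err, abs_err / abs(exact)


for text in ("8400", "0.0014"):
    abs_err, rel_err = errors(text)
    print(f"{text}: stored as {round_to_one_digit(text)}, "
          f"absolute error {abs_err}, relative error {float(rel_err):.1%}")
```

With these inputs the absolute errors differ by about six orders of magnitude, while both relative errors stay well below 100%.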
Characteristics of Floating Point Representation
1. Non Unique
In floating point representation, a single number may be represented in many different ways. For example, the number can be represented as or or . All of these are valid representations. To avoid confusion, floating point numbers are represented in normalized form like . One way of normalization is to represent the mantissa in the form $0.d_1 d_2 d_3 \ldots$, where $d_1$ is non-zero, and then adjust the exponent so that the value is unchanged.
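Normalization can be sketched as a short loop that moves the decimal point and compensates in the exponent. The function below is only an illustration: it assumes radix 10, a mantissa kept as an exact fraction with no digit limit, and a convention of exponent 0 for zero. The value 12.3 is a hypothetical example.

```python
from fractions import Fraction


def normalize(mantissa: Fraction, exponent: int) -> tuple[Fraction, int]:
    """Return an equal-valued pair with 0.1 <= |mantissa| < 1 (radix 10)."""
    if mantissa == 0:
        return mantissa, 0  # assumed convention: zero has no leading non-zero digit
    while abs(mantissa) >= 1:            # too large: move the point left
        mantissa /= 10
        exponent += 1
    while abs(mantissa) < Fraction(1, 10):  # too small: move the point right
        mantissa *= 10
        exponent -= 1
    return mantissa, exponent


# Three different spellings of the same value, 12.3.
spellings = [
    (Fraction("12.3"), 0),
    (Fraction("1.23"), 1),
    (Fraction("0.00123"), 4),
]
for m, e in spellings:
    print(f"{float(m)} x 10^{e} ->", normalize(m, e))
```

All three spellings reduce to the same pair, a mantissa of 0.123 with exponent 2.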
2. Asymmetric Memory Usage
In floating point representation, about half of the memory is used to represent numbers from $-1$ to $1$, and the other half is used to represent the rest of the range.
To understand this, assume that numbers are always normalized. In normalized form, positive exponents represent numbers with a magnitude of at least $1$, and negative exponents represent numbers with a magnitude smaller than $1$. About half of all available exponents are negative, so roughly half of the representable numbers lie between $-1$ and $1$.
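The claim can be checked by brute force on a toy format. The sketch below assumes a normalized 1-digit decimal mantissa (0.1 to 0.9), a sign, and exponents from -9 to 9; these parameters are chosen for illustration only, and zero is left out.

```python
from fractions import Fraction
from itertools import product

# Assumed toy format: sign, one non-zero mantissa digit, exponent in -9..9.
mantissas = [Fraction(d, 10) for d in range(1, 10)]
exponents = range(-9, 10)

values = [
    sign * m * Fraction(10) ** e
    for sign, m, e in product((1, -1), mantissas, exponents)
]

inside = sum(1 for v in values if abs(v) < 1)   # strictly between -1 and 1
outside = len(values) - inside                  # magnitude of 1 or more

print("total:", len(values), "inside (-1, 1):", inside, "outside:", outside)
```

Under these assumptions, 180 of the 342 nonzero values have a magnitude below 1. The split is slightly more than half because exponent 0 also yields magnitudes below 1.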
3. Minimizes Relative Error
Floating point representation minimizes the relative error. This is because of fixed-sized mantissa and the use of exponent. The exponent helps maintain the magnitude of the number, while the fixed-sized mantissa truncates the number of significant digits.
Floating Point Arithmetic
Floating point numbers need to be stored in normalized exponent form. After any arithmetic operation, the result should also be in normalized form. We take two numbers $x = m_x \times 10^{e_x}$ and $y = m_y \times 10^{e_y}$ to demonstrate the algorithms.
Addition Algorithm
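Our goal is to find $z = x + y$, where $z = m_z \times 10^{e_z}$ and $m_x$, $e_x$, $m_y$, $e_y$ are the mantissas and exponents of $x$ and $y$.
- Find exponent: $e_z = \max(e_x, e_y)$
- Shift: shift $m_x$ to the right by $e_z - e_x$ digits and $m_y$ to the right by $e_z - e_y$ digits.
- Add Mantissa: Set $m_z = m_x + m_y$.
- Normalize: if $|m_z| \ge 1$, then right-shift $m_z$ by 1 digit and increase $e_z$ by 1.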
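The steps above can be sketched in code. This is a minimal illustration under stated assumptions, not a reference implementation: a pair `(m, e)` stands for the value 0.m x 10^e, the mantissa is stored as a 3-digit integer (so 0.964 is stored as 964), both operands are positive and normalized, digits shifted off the right are simply dropped, and the exponent range is not checked. The name `fp_add` and the operands are hypothetical.

```python
P = 3  # assumed mantissa length in decimal digits


def fp_add(mx: int, ex: int, my: int, ey: int) -> tuple[int, int]:
    """Add two positive toy decimal floats given as (mantissa, exponent)."""
    ez = max(ex, ey)            # find exponent
    mx //= 10 ** (ez - ex)      # shift right; dropped digits are lost
    my //= 10 ** (ez - ey)
    mz = mx + my                # add mantissas
    if mz >= 10 ** P:           # normalize: mantissa reached 1.0
        mz //= 10
        ez += 1
    return mz, ez


# 0.964 x 10^2 + 0.586 x 10^1  (96.4 + 5.86 = 102.26 exactly)
print(fp_add(964, 2, 586, 1))   # -> (102, 3), i.e. 0.102 x 10^3

# 0.123 x 10^2 + 0.456 x 10^1  (12.3 + 4.56 = 16.86 exactly)
print(fp_add(123, 2, 456, 1))   # -> (168, 2), i.e. 0.168 x 10^2
```

Both results differ slightly from the exact sums because only three mantissa digits are kept.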
Given and find , provided number of digits in mantissa=3 and number of digits in exponent=1.
Solution
Given: and
Find exponent:
Shift:
Add Mantissa:
Normalize: Since, , and .
Therefore,
Subtraction Algorithm
Subtraction is similar to addition but the normalization will be different. Instead of the magnitude of mantissa getting bigger than 1, it can be smaller than 0.1 and thus normalization will be the opposite.
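Our goal is to find $z = x - y$, where $z = m_z \times 10^{e_z}$ and $m_x$, $e_x$, $m_y$, $e_y$ are the mantissas and exponents of $x$ and $y$.
- Find exponent: $e_z = \max(e_x, e_y)$
- Shift: shift $m_x$ to the right by $e_z - e_x$ digits and $m_y$ to the right by $e_z - e_y$ digits.
- Subtract mantissa: Set $m_z = m_x - m_y$.
- Normalize: if $|m_z| < 0.1$, then left-shift $m_z$ by 1 digit and decrease $e_z$ by 1, repeating while the condition still holds and $m_z$ is non-zero.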
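A sketch of these steps follows, with the same caveats as any toy model. A pair `(m, e)` stands for 0.m x 10^e with the mantissa stored as a 3-digit integer, operands are assumed positive and normalized, and shifted-out digits are dropped. One deliberate assumption goes beyond a single left shift: the normalization is written as a loop, because nearly equal operands can cancel more than one leading digit. The name `fp_sub` and the operands are hypothetical.

```python
P = 3  # assumed mantissa length in decimal digits


def fp_sub(mx: int, ex: int, my: int, ey: int) -> tuple[int, int]:
    """Subtract two positive toy decimal floats given as (mantissa, exponent)."""
    ez = max(ex, ey)            # find exponent
    mx //= 10 ** (ez - ex)      # shift right; dropped digits are lost
    my //= 10 ** (ez - ey)
    mz = mx - my                # subtract mantissas
    # normalize: left-shift while the leading digit is zero
    while mz != 0 and abs(mz) < 10 ** (P - 1):
        mz *= 10
        ez -= 1
    return mz, ez


# 0.123 x 10^2 - 0.984 x 10^1  (12.3 - 9.84 = 2.46 exactly)
print(fp_sub(123, 2, 984, 1))   # -> (250, 1), i.e. 0.250 x 10^1

# 0.123 x 10^1 - 0.122 x 10^1  (1.23 - 1.22 = 0.01 exactly)
print(fp_sub(123, 1, 122, 1))   # -> (100, -1), i.e. 0.100 x 10^-1
```

The first result is 2.5 rather than 2.46 because a digit of the smaller operand was dropped during the shift. The second needs two left shifts.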
Given and find , provided number of digits in mantissa=3 and number of digits in exponent=1.
Solution
Given: and
Find exponent:
Shift:
Subtract mantissa:
Normalize: Since, , and .
Therefore,
Multiplication Algorithm
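Our goal is to find $z = x \times y$, where $z = m_z \times 10^{e_z}$ and $m_x$, $e_x$, $m_y$, $e_y$ are the mantissas and exponents of $x$ and $y$.
- Find exponent: $e_z = e_x + e_y$
- Multiply mantissa: $m_z = m_x \times m_y$
- Normalize: if $|m_z| < 0.1$, then left-shift $m_z$ by 1 digit and decrease $e_z$ by 1.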
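The multiplication steps can be sketched as follows. As an assumption for this example, a pair `(m, e)` stands for 0.m x 10^e with the mantissa stored as a 3-digit integer, and both operands are positive and normalized, so the product of the mantissas is at least 0.01 and one left shift is enough. The full 6-digit product is normalized first and truncated to 3 digits afterwards. The name `fp_mul` and the operands are hypothetical.

```python
P = 3  # assumed mantissa length in decimal digits


def fp_mul(mx: int, ex: int, my: int, ey: int) -> tuple[int, int]:
    """Multiply two positive toy decimal floats given as (mantissa, exponent)."""
    ez = ex + ey                        # find exponent
    product = mx * my                   # 2P-digit mantissa product
    if product < 10 ** (2 * P - 1):     # normalize: mantissa below 0.1
        product *= 10
        ez -= 1
    mz = product // 10 ** P             # keep the leading P digits
    return mz, ez


# 0.200 x 10^2 * 0.300 x 10^1  (20 * 3 = 60)
print(fp_mul(200, 2, 300, 1))   # -> (600, 2), i.e. 0.600 x 10^2

# 0.500 x 10^1 * 0.400 x 10^1  (5 * 4 = 20)
print(fp_mul(500, 1, 400, 1))   # -> (200, 2), i.e. 0.200 x 10^2
```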
Given and find , provided number of digits in mantissa=3 and number of digits in exponent=1.
Solution
Given: and
Find exponent:
Multiply mantissa:
Normalize: Since, , and .
Therefore,
Division Algorithm
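Our goal is to find $z = x / y$, where $z = m_z \times 10^{e_z}$ and $m_x$, $e_x$, $m_y$, $e_y$ are the mantissas and exponents of $x$ and $y$.
- Find exponent: $e_z = e_x - e_y$
- Divide mantissa: $m_z = m_x / m_y$
- Normalize: if $|m_z| \ge 1$, then right-shift $m_z$ by 1 digit and increase $e_z$ by 1.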
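Division follows the same pattern. In this sketch a pair `(m, e)` stands for 0.m x 10^e with the mantissa stored as a 3-digit integer; operands are assumed positive and normalized, so the quotient of the mantissas lies between 0.1 and 10 and at most one right shift is needed. Extra quotient digits are truncated, and a zero divisor is not handled. The name `fp_div` and the operands are hypothetical.

```python
P = 3  # assumed mantissa length in decimal digits


def fp_div(mx: int, ex: int, my: int, ey: int) -> tuple[int, int]:
    """Divide two positive toy decimal floats given as (mantissa, exponent)."""
    ez = ex - ey                    # find exponent
    mz = (mx * 10 ** P) // my       # divide mantissas, scaled to P digits
    if mz >= 10 ** P:               # normalize: mantissa reached 1.0
        mz //= 10
        ez += 1
    return mz, ez


# 0.600 x 10^2 / 0.200 x 10^1  (60 / 2 = 30)
print(fp_div(600, 2, 200, 1))   # -> (300, 2), i.e. 0.300 x 10^2

# 0.200 x 10^2 / 0.500 x 10^1  (20 / 5 = 4)
print(fp_div(200, 2, 500, 1))   # -> (400, 1), i.e. 0.400 x 10^1
```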
Given and find , provided number of digits in mantissa=3 and number of digits in exponent=1.
Solution
Given: and
Find exponent:
Divide mantissa:
Normalize: Since, , and .
Therefore,
Fixed Point vs Floating Point
The following table summarizes the differences between fixed-point and floating-point representations:
Fixed Point | Floating Point |
---|---|
1. Fixed number of digits is used to represent the decimal part of a real number. | 1. Fixed number of significant digits plus an exponent is used to represent a real number. |
2. It minimizes the absolute error of representation. | 2. It minimizes the relative error of representation. |
3. The absolute error of representation is constant while the relative error of representation is variable and can exceed 100% | 3. Both absolute and relative errors of representations are variable but relative error is always less than 100%. |
4. It can't represent very large or very small numbers. | 4. It can represent both very large and very small numbers. |
5. Used for financial and accounting purposes. | 5. Used for scientific calculations. |
6. Generally not available in hardware implementation. | 6. Generally available in hardware implementation with standards such as IEEE 754. |
7. Example: 1.002, 7.888, 0.001 etc | 7. Example: , etc. |