Floating Point
Register Sizes
The following image shows registers for fixed point, floating point, and double precision. The floating point formats follow the IEEE 754 specification.
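As a rough illustration of the IEEE 754 layouts, the sketch below splits a value into its sign, exponent, and fraction fields. The field widths (1 + 8 + 23 bits for single precision, 1 + 11 + 52 bits for double precision) come from the standard; the helper name `float_fields` is made up for this example.

```python
import struct

def float_fields(x, double=False):
    """Split x into (sign, exponent, fraction) bit fields per IEEE 754.

    float_fields is a hypothetical helper written for this example.
    """
    if double:
        fmt, int_fmt, exp_bits, frac_bits = ">d", ">Q", 11, 52
    else:
        fmt, int_fmt, exp_bits, frac_bits = ">f", ">I", 8, 23
    # Reinterpret the float's bytes as an unsigned integer of the same width.
    (bits,) = struct.unpack(int_fmt, struct.pack(fmt, x))
    sign = bits >> (exp_bits + frac_bits)
    exponent = (bits >> frac_bits) & ((1 << exp_bits) - 1)  # still biased
    fraction = bits & ((1 << frac_bits) - 1)
    return sign, exponent, fraction

print(float_fields(0.75))               # single precision fields
print(float_fields(0.75, double=True))  # double precision fields
```

The exponent is returned in its stored (biased) form, so subtract 127 for single precision or 1023 for double precision to get the actual power of two.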
Conversion
The following shows an example of how to convert between binary and float.
In the first example, there are four fractional bits, which gives you a value of 0.1875. In the second example, six fractional bits are used, which gives a value of 0.203125. As you can see, adding only a few fractional bits gives much higher accuracy.
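The effect of extra fractional bits can be sketched by rounding a value to the nearest multiple of 2^-n. The target value of 0.2 is an assumption: the text does not state it, but rounding 0.2 to four and six fractional bits reproduces both values quoted above.

```python
def quantize(value, frac_bits):
    """Round value to the nearest multiple of 2**-frac_bits."""
    scale = 1 << frac_bits
    return round(value * scale) / scale

target = 0.2  # assumed target value; not stated in the text
for bits in (4, 6):
    approx = quantize(target, bits)
    # Print the bit count, the approximation, and the absolute error.
    print(bits, approx, abs(target - approx))
```

Under that assumption, the error drops from 0.0125 with four bits to about 0.003 with six.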
An example of binary to decimal/float conversion is shown below:
| Binary | Decimal | Conversion |
|---|---|---|
| 11.0 | 3.0 | 2^1 + 2^0 |
| 10.0 | 2.0 | 2^1 |
| 1.0 | 1.0 | 2^0 |
| 0.1 | 0.5 | 2^-1 |
| 0.11 | 0.75 | 2^-1 + 2^-2 |
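The conversion in the table can be sketched as a sum of bit weights: bits to the left of the point weigh 2^0, 2^1, and so on, and bits to the right weigh 2^-1, 2^-2, and so on. The function name `binary_to_decimal` is only illustrative, and the sketch handles unsigned inputs only.

```python
def binary_to_decimal(text):
    """Convert an unsigned binary string such as '10.11' to a float."""
    whole, _, frac = text.partition(".")
    value = 0.0
    # Bits left of the point have weights 2^0, 2^1, ... (right to left).
    for power, bit in enumerate(reversed(whole)):
        value += int(bit) * 2 ** power
    # Bits right of the point have weights 2^-1, 2^-2, ... (left to right).
    for power, bit in enumerate(frac, start=1):
        value += int(bit) * 2 ** -power
    return value

for text in ("11.0", "10.0", "1.0", "0.1", "0.11"):
    print(text, "->", binary_to_decimal(text))
```

The same function gives 0.1875 for `0.0011` and 0.203125 for `0.001101`, the four-bit and six-bit values mentioned earlier.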
Adding Floating Point Instructions to Pipeline
Floating point operations require different execution units from integer operations. These units have longer latency, so floating point operations take more cycles to complete.
For current CPUs (latency in cycles):
| | Intel | AMD |
|---|---|---|
| add | 3 | 3 |
| multiply | 5 | 3 |
| divide | 16 | 13 |
Latency and Initiation Interval
Latency Definition
How long an operation takes: the number of cycles between an instruction that produces a result and an instruction that consumes it
The following example has a latency of 3:
Initiation Interval Definition
How long you must wait before issuing another instruction of the same type
An initiation interval of 1 means you can issue an instruction of that type every clock cycle
For the following examples, assume:
| | Cycle Latency | Initiation Interval |
|---|---|---|
| add | 3 | 1 |
| multiply | 6 | 1 |
| divide | 24 | 25 |
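To make the initiation interval concrete, the following sketch computes the cycles in which several independent back-to-back operations of one type could be issued, using the intervals from the table. The names `INITIATION_INTERVAL` and `issue_cycles` are invented for this example, and the sketch ignores every other kind of hazard.

```python
# Initiation intervals in cycles, taken from the table above.
INITIATION_INTERVAL = {"add": 1, "multiply": 1, "divide": 25}

def issue_cycles(op, count, first_cycle=1):
    """Cycles in which `count` independent operations of type `op`
    can be issued back to back, ignoring all other hazards."""
    interval = INITIATION_INTERVAL[op]
    return [first_cycle + i * interval for i in range(count)]

print(issue_cycles("add", 3))     # a new add can start every cycle
print(issue_cycles("divide", 3))  # each divide waits for the previous one
```

Three adds can start in cycles 1, 2, and 3, while three divides can start no earlier than cycles 1, 26, and 51.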
What the pipeline now looks like:
Example 1
How many clock cycles does it take to run this code?
| | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| L.D | IF | ID | EX | MEM | WB | | | | | | |
| ADD.D | | IF | ID | ID | A1 | A2 | A3 | A4 | MEM | WB | |
| S.D | | | IF | IF | ID | EX | EX | EX | EX | MEM | WB |
Some things to note:
- At cycle four for `ADD.D`, there is an extra `ID`. This is because the add cannot happen until the load reaches and completes the `MEM` stage
- At cycle nine, `S.D` cannot enter the `MEM` stage because `ADD.D` is using it, therefore it is stuck in `EX` for an extra cycle
Example 2
How many clock cycles does it take to run this code? Assume that all operations are independent.
| | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| MUL.D | IF | ID | M1 | M2 | M3 | M4 | M5 | M6 | M7 | MEM | WB |
| ADD.D | | IF | ID | A1 | A2 | A3 | A4 | MEM | WB | | |
| L.D | | | IF | ID | EX | MEM | WB | | | | |
| S.D | | | | IF | ID | EX | MEM | WB | | | |
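The timing in this example can be reproduced with a small sketch. It assumes one instruction is fetched per cycle starting at cycle 1, that nothing stalls (the operations are independent), and that each instruction passes through IF, ID, its execute stages, MEM, and WB. The stage counts are read off the table; `EX_STAGES` and `wb_cycle` are names invented for this example.

```python
# Number of execute stages per instruction, read off the table above.
EX_STAGES = {"MUL.D": 7, "ADD.D": 4, "L.D": 1, "S.D": 1}

def wb_cycle(fetch_cycle, ex_stages):
    """Cycle in which an unstalled instruction reaches WB:
    IF, ID, the execute stages, MEM, then WB."""
    return fetch_cycle + 1 + ex_stages + 1 + 1

program = ["MUL.D", "ADD.D", "L.D", "S.D"]
# One instruction is fetched per cycle, starting at cycle 1.
wb = {op: wb_cycle(i, EX_STAGES[op]) for i, op in enumerate(program, start=1)}

print(wb)
print("total cycles:", max(wb.values()))
print("completion order:", sorted(wb, key=wb.get))
```

Under these assumptions the code takes 11 cycles, and the instructions reach WB in the order `L.D`, `S.D`, `ADD.D`, `MUL.D`, which is not program order.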
Problems with this pipeline
- Instructions could write registers in an order that is different from program order (this is shown in the above example)
- Structural Hazards
  - Instructions may want to use the register file at the same time
  - If you have many divides, they compete for the divide unit, which cannot start a new divide every cycle
You need to check for hazards. As before, hazard checking is done in the ID
stage, but there are now some new hazards to check for. At that point you know
what the instruction is and how many clock cycles it is going to take. From that,
you can figure out how many cycles you need to hold the instruction so that the
order is correct.
During the decode stage, you check for:
- structural hazards
- data hazards
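One of the structural checks can be sketched as follows: since the decode stage knows how long each instruction will take, it can predict which cycle each one reaches WB and flag instructions that would need the register file in the same cycle. This is a minimal sketch that assumes a single register-file write port and no stalls. The helper names and the `(name, fetch_cycle, ex_stages)` tuple format are hypothetical.

```python
def writeback_cycle(fetch_cycle, ex_stages):
    """WB cycle of an unstalled instruction: IF, ID, execute stages, MEM, WB."""
    return fetch_cycle + ex_stages + 3

def find_wb_conflicts(instructions):
    """instructions: list of (name, fetch_cycle, ex_stages) tuples.
    Returns {cycle: [names]} for every WB cycle claimed more than once."""
    by_cycle = {}
    for name, fetch, ex in instructions:
        by_cycle.setdefault(writeback_cycle(fetch, ex), []).append(name)
    return {cycle: names for cycle, names in by_cycle.items() if len(names) > 1}

# A multiply fetched in cycle 1 and an add fetched in cycle 4 both reach WB
# in cycle 11, so one of them would have to be held in ID.
print(find_wb_conflicts([("MUL.D", 1, 7), ("ADD.D", 4, 4)]))
```

A real pipeline would also check the other shared stages (such as MEM) and the data hazards in the same way before letting an instruction leave ID.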
A few more terms:
Forwarding/Bypassing Definition
Taking a value from a pipeline stage and sending it to another stage