Compiler Optimizations
Assume the following latencies, given as the number of cycles required between a producing instruction and a dependent consuming instruction:
- FP ALU op to store double (2)
- Load double to FP ALU op (1)
- Load double to store double (0)
- FP ALU op to another FP ALU op (3)
- ALU to ALU (1)
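To make the table concrete, here is a minimal sketch that counts the cycles for a chain of dependent instructions. It assumes each latency is the number of stall cycles inserted when the consumer immediately follows its producer, and that pairs not listed in the table need no stalls. The names `InstrKind`, `stall_cycles`, and `chain_cycles` are hypothetical and exist only for this illustration.

```c
#include <stddef.h>

/* Instruction classes that appear in the latency table. */
typedef enum { LOAD_D, STORE_D, FP_ALU, INT_ALU } InstrKind;

/* Stall cycles between a producer and a consumer that depends on it. */
int stall_cycles(InstrKind producer, InstrKind consumer) {
    if (producer == FP_ALU  && consumer == STORE_D) return 2;
    if (producer == LOAD_D  && consumer == FP_ALU)  return 1;
    if (producer == LOAD_D  && consumer == STORE_D) return 0;
    if (producer == FP_ALU  && consumer == FP_ALU)  return 3;
    if (producer == INT_ALU && consumer == INT_ALU) return 1;
    return 0; /* assumption: pairs not in the table do not stall */
}

/* Cycles to issue a chain in which every instruction depends on the
   one immediately before it: one issue cycle each, plus the stalls. */
int chain_cycles(const InstrKind *chain, size_t n) {
    int cycles = 0;
    for (size_t i = 0; i < n; i++) {
        cycles += 1;
        if (i > 0) {
            cycles += stall_cycles(chain[i - 1], chain[i]);
        }
    }
    return cycles;
}
```

Under these assumptions, a Load, FP ALU op, Store chain takes 3 issue cycles plus 1 + 2 stall cycles, for 6 cycles in total. The optimizations below aim to remove those stall cycles.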
Example 1
Loop Unrolling
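Loop unrolling is the optimization in which the compiler generates code containing multiple copies of the loop body, so each pass through the loop does the work of several original iterations. This reduces the loop overhead (index update and branch) per element and exposes more independent instructions for scheduling.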
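As a source-level illustration, the sketch below unrolls a simple loop by a factor of 4. The loop itself (adding a scalar to every element of an array) and the function names are assumptions chosen for the example, and a compiler would normally perform this transformation on the generated instructions rather than on the C source.

```c
#include <stddef.h>

/* Original loop: one element per iteration. */
void add_scalar(double *x, size_t n, double s) {
    for (size_t i = 0; i < n; i++) {
        x[i] += s;
    }
}

/* Unrolled by 4: four copies of the body per iteration, so the index
   update and branch are paid once per four elements. */
void add_scalar_unrolled4(double *x, size_t n, double s) {
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        x[i]     += s;
        x[i + 1] += s;
        x[i + 2] += s;
        x[i + 3] += s;
    }
    /* Cleanup loop for the elements left over when n is not a multiple of 4. */
    for (; i < n; i++) {
        x[i] += s;
    }
}
```

Both functions produce the same result. The cleanup loop is needed because the trip count is not known to be a multiple of the unroll factor.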
Code Scheduling
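Code scheduling reorders instructions to eliminate stalls, for example by placing independent instructions between a producer and the instruction that depends on it. A limitation is the number of available registers, since every value kept in flight across the reordered instructions needs its own register.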
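The ordering a scheduler aims for can be sketched in C on a hypothetical 4-way unrolled loop that adds a scalar to each array element. The function name and the temporaries `t0` to `t3` are assumptions made for illustration. A real C compiler is free to reorder these statements itself, so the source only shows the intended instruction order and does not enforce it.

```c
#include <stddef.h>

/* 4-way unrolled loop with the body grouped as loads, then adds, then
   stores, so that no instruction directly follows its producer. */
void add_scalar_scheduled4(double *x, size_t n, double s) {
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        /* All four loads first; each needs its own temporary (register). */
        double t0 = x[i];
        double t1 = x[i + 1];
        double t2 = x[i + 2];
        double t3 = x[i + 3];
        /* The adds are independent of each other. */
        t0 += s;
        t1 += s;
        t2 += s;
        t3 += s;
        /* Stores last; each add result has had time to complete. */
        x[i]     = t0;
        x[i + 1] = t1;
        x[i + 2] = t2;
        x[i + 3] = t3;
    }
    /* Cleanup loop for leftover elements. */
    for (; i < n; i++) {
        x[i] += s;
    }
}
```

With the latencies listed earlier, each load is separated from its add by three other instructions, and each add from its store by three other instructions, so neither the 1-cycle load to FP ALU latency nor the 2-cycle FP ALU to store latency should cause a stall. The four temporaries show the register cost: a larger unroll factor would need more of them live at once.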
Example 2: Software Pipelining
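Software pipelining runs instructions from different iterations of the original loop within one iteration of the optimized loop. The original loop body is the dependent sequence:
Load - Add - Store

Each iteration of the optimized loop instead runs a Store, an Add, and a Load (SAL), each belonging to a different original iteration, so no instruction depends on a result produced earlier in the same iteration. Before the loop starts, the first Load and Add must be done separately so that the first Store has a value to write.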
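The Store, Add, Load pattern can be sketched in C as follows, again for a hypothetical loop that adds a scalar to each array element. The function name and the variables `loaded` and `summed` are assumptions for illustration. In this version the startup code also issues the load for the second iteration, and a short epilogue finishes the last two iterations once the loop has drained.

```c
#include <stddef.h>

/* Software-pipelined x[i] += s: each steady-state iteration stores the
   result for element i, adds for element i+1, and loads element i+2. */
void add_scalar_swp(double *x, size_t n, double s) {
    if (n < 2) {                 /* too short to pipeline */
        for (size_t i = 0; i < n; i++) {
            x[i] += s;
        }
        return;
    }

    /* Startup: first load and add, plus the load for element 1. */
    double loaded = x[0];
    double summed = loaded + s;
    loaded = x[1];

    size_t i = 0;
    for (; i + 2 < n; i++) {
        x[i] = summed;           /* Store for element i     */
        summed = loaded + s;     /* Add   for element i + 1 */
        loaded = x[i + 2];       /* Load  for element i + 2 */
    }

    /* Epilogue: finish the two elements still in flight. */
    x[i] = summed;
    x[i + 1] = loaded + s;
}
```

Inside the loop, the Store uses a sum computed in the previous iteration and the Add uses a value loaded in the previous iteration, so the dependent Load, Add, Store chain for any one element is spread across three loop iterations.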