Page 109 - Computer_Science_F5

Table 2.3: Pipeline without ILP

                     Cycles
Instruction          1    2    3    4    5    6    7    8    9    10   11   12   13   14
result += a + b      IF   ID   EX   MEM  WB
result += a + b           IF   ID   EX   MEM  WB
result += a + b                IF   ID   EX   MEM  WB
result += a + b                     IF   ID   EX   MEM  WB
result += a + b                          IF   ID   EX   MEM  WB
result += a + b                               IF   ID   EX   MEM  WB
result += a + b                                    IF   ID   EX   MEM  WB
result += a + b                                         IF   ID   EX   MEM  WB
result += a + b                                              IF   ID   EX   MEM  WB
result += a + b                                                   IF   ID   EX   MEM  WB

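
The 14-cycle total in Table 2.3 can be checked with a short calculation. The sketch below assumes an ideal five-stage pipeline (IF, ID, EX, MEM, WB) with no stalls, where one new instruction enters per cycle. The function name `pipeline_cycles` and its parameters are illustrative and do not come from the text.

```python
def pipeline_cycles(num_instructions, num_stages=5):
    """Cycles needed by an ideal pipeline that starts one instruction per cycle.

    Assumption: no stalls or hazards, as in Table 2.3.
    """
    if num_instructions == 0:
        return 0
    # The first instruction needs num_stages cycles; every later
    # instruction finishes one cycle after the one before it.
    return num_stages + (num_instructions - 1)


print(pipeline_cycles(10))  # prints 14 for the ten additions
```

The first addition occupies cycles 1 to 5, and each of the remaining nine additions adds one more cycle, giving 5 + 9 = 14.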
           Loop unrolling duplicates the loop body so that the processor can perform multiple
           additions concurrently. In our previous example, we unrolled the loop by a factor of 2,
           creating two independent additions per iteration. The ten additions of the original loop
           are therefore grouped into 5 blocks, each with two additions, as shown in Table 2.4.
           Table 2.4: Loop unrolling, a factor of 2
            result += a + b;                        Block 1
            result += a + b;
            result += a + b;                        Block 2
            result += a + b;
            result += a + b;                        Block 3
            result += a + b;
            result += a + b;                        Block 4
            result += a + b;
            result += a + b;                        Block 5
            result += a + b;
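
As a rough illustration of the transformation in Table 2.4, the sketch below writes the loop both ways and shows that the result is unchanged. The original loop is not shown on this page, so the 10-iteration form and the sample values of `a` and `b` are assumptions. In practice a compiler performs the unrolling on compiled code; the Python here only shows the shape of the change, not the hardware parallelism.

```python
a, b = 3, 4  # hypothetical sample values

# Assumed original loop: one addition per iteration, 10 iterations.
result = 0
for _ in range(10):
    result += a + b

# Unrolled by a factor of 2: two additions per iteration, 5 iterations.
# Each iteration corresponds to one block in Table 2.4.
unrolled = 0
for _ in range(5):
    unrolled += a + b
    unrolled += a + b

print(result, unrolled)  # prints 70 70
```

Both loops perform the same ten additions. The unrolled version halves the number of iterations, which also reduces the loop-control work.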

           Modern processors can fetch, decode, and execute 2 of the additions concurrently
           on different execution units, assuming no data dependencies exist between them.
           This overlapping of instructions can reduce the total execution time compared to
           the pipeline without ILP.

           Since we unrolled by a factor of 2, two instructions will be issued to the pipeline
           in each clock cycle, as seen in Table 2.5. The total execution time with loop
           unrolling becomes 9 cycles (assuming no data dependencies and no hazards). This
           means that ILP, through the compilation technique of loop unrolling, improves our
           simple program by a factor of 14/9 ≈ 1.56.
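
The 9-cycle figure and the resulting speedup can be reproduced with a small model. This is a sketch under the same idealised assumptions as the text (five stages, no data dependencies, no hazards). The function name `superscalar_cycles` and the parameter `issue_width` are hypothetical names chosen for illustration.

```python
import math


def superscalar_cycles(num_instructions, issue_width, num_stages=5):
    """Cycles for an ideal pipeline that issues issue_width instructions per cycle.

    Assumption: no data dependencies and no hazards.
    """
    if num_instructions == 0:
        return 0
    # Instructions enter the pipeline in groups of issue_width.
    groups = math.ceil(num_instructions / issue_width)
    # The first group needs num_stages cycles; each later group adds one.
    return num_stages + (groups - 1)


without_ilp = superscalar_cycles(10, issue_width=1)
with_ilp = superscalar_cycles(10, issue_width=2)
speedup = without_ilp / with_ilp

print(without_ilp, with_ilp, round(speedup, 2))  # prints 14 9 1.56
```

With two additions issued per cycle, the ten additions form 5 groups, so the total is 5 + 4 = 9 cycles, compared with 14 cycles when one addition is issued per cycle.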
