Skip to content

11. Pipelining

A Lesson on Laundry

Luis, Artur, Stephen, and Ray need to do laundry, washer takes 60 minutes, dryer takes 60 minutes, and folding takes 60 minutes. They can only do laundry on Saturday from 9am to 6pm.

Let's do it sequentially.

alt text

Oh no, we can only do 1/3 of a load an hour! We could buy more machines! But, thats expensive!

What if we pipelined the laundry:

alt text

Now we can do 2/3 of a load an hour with only one set of machines!

Upgrading the Multi-Cycle CPU

We can use this concept to a multi-cycle CPU!

Pipelining doesn't help with latency of a single instruction, but it does help with the throughput of the entire workload.

Different takes operating at different resources can execute simultaneously.

More stages, more potential speedup, but too many stages is not good.

With this we can get the CPI down closer to 1.

The average CPI is generally, $CPI=\frac{n+3}{n}$ and if we go to infinity the CPI would be one. We have to wait a couple of cycles to get everything up and running with full overlapping.

Issues

Structual Hazards

Attempting to use the same resource two different ways simultaneously. E.g.: You get home soaking wet and need to dry your clothes while someone is using the dryer.

In a CPU with a single memory we can't fetch a new instruction while reading a word from memory.

alt text

Structural hazards arise from lack of resources. We could eliminate the hazard by adding more resources. Or we can stall until the resource is available.

Data Hazards

Attempting to use an item before it is ready. E.g.: Only one sock of a pair is found during folding and it's in the dryer! Folding has to wait!

In a CPU, an instruction depends on result of prior instruction still in the pipeline.

Solution 1: wait.

alt text

Solution 2: modify the register a little bit.

alt text

Or the best solution 3: forward (passing the result from a later stage to an earlier one) the ALU output.

alt text

Control Hazards

Attempting to make a decision before condition is evaluated. E.g.: If the dirty clothes are not clean after washing! Then I must wash them again.

In CPU, branches.

alt text

Solution 1: Wait.

alt text

Solution 2: Predict what that the branch is never taken.

alt text

Its a gamble, but if we are right, there is no stall.

And if we are wrong, just fix it since the instructions haven't changed anything.

alt text

If we are correct at predicting the value 10% of the time, then take the complement and we are correct 90% of the time. The worst case is when we are correct 50% of the time.

Law of Diminishing Returns

Let's say we have this program and it takes $102010$ns to run.

alt text

What if we improve the multiplication to include what we need, it combines some registers, make it do the 3 step simultaneously, and ALU only needs certain things. So now, each multiplicatino takes $160$ns. Meaning our speedup is 6 times better.

alt text

However, our program now takes $22010$ns, so our program speed up is:

alt text

What happens as we decrease the proportion of execution time?

alt text

So, the proportion of time spent multiplying was not enough to have gains.

Optimization is a balancing act. As you solve bottlenecks, new ones appear. Improve things to a point, then there are diminishing returns.

If we look at pipelining, we see the same thing.

alt text

This is a regular multi-cycle CPU's CPI.

alt text

It's CPU time is $2660$ns.

Let's use pipelining with literally incorrect branching.

alt text

Now the CPU time is $871$ns which is a speed up of $3.05$.