9. Multicycle Design

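The single-cycle CPU is rather inefficient if we think about it. The clock period has to be long enough for the slowest instruction (one that waits for memory to fully complete), so every instruction takes that long, even the ones that could finish sooner.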

Why not let different instructions take different amounts of time (and by that, a different number of clock cycles)?


Remember the five phases of execution? Well, all instructions have IF, ID, and EX, but only some access memory or write to registers.


For now, assume lw is 5 cycles.

For a multi-cycle CPU, we set the clock period to accommodate a single phase rather than a whole instruction. We can now chop instructions into phases and make the clock faster, which means faster instructions waste less time (their latency goes down).


However, in a multi-cycle design, the slowest stage limits the clock rate, so balanced stages are desired. For example, we can split the memory operation across multiple clock cycles.
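To make the clock-period trade-off concrete, here is a minimal sketch. The phase delays are made-up numbers for illustration (they are not from these notes), and the variable names are hypothetical. It shows that a single-cycle clock has to cover all phases back to back, while a multi-cycle clock only has to cover the slowest phase, and that faster phases sit idle for the difference.

```python
# Hypothetical phase delays in nanoseconds (illustrative values only).
phase_delay_ns = {"IF": 2.0, "ID": 1.0, "EX": 1.0, "MEM": 2.0, "WB": 1.0}

# Single-cycle: one clock period must fit every phase of the slowest instruction.
single_cycle_period_ns = sum(phase_delay_ns.values())

# Multi-cycle: one clock period must fit only the slowest single phase.
multicycle_period_ns = max(phase_delay_ns.values())

# Time each phase spends idle waiting for the next clock edge.
# The more unbalanced the phases are, the more time is wasted here.
idle_ns = {name: multicycle_period_ns - delay
           for name, delay in phase_delay_ns.items()}

print("single-cycle period:", single_cycle_period_ns, "ns")
print("multi-cycle period: ", multicycle_period_ns, "ns")
print("idle time per phase:", idle_ns)
```

With these assumed delays, the short phases waste half of every cycle, which is why balancing the stages matters.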

Each phase of execution has its own functional unit. Between phases, we need registers to hold onto the data for the next phase.


To measure performance, we use CPI (cycles per instruction) and IPC (instructions per cycle). CPI measures the average number of cycles it takes to complete one instruction, while IPC is just the reciprocal of CPI. The CPI of a single-cycle CPU is 1.

Every program is different, and every program has a different instruction mix. An example is below.

alt text

As you can see, the CPI is just the weighted average of each instruction type's cycle count, weighted by how often that type appears in the mix.
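The weighted average can be sketched in a few lines. The instruction mix and cycle counts below are assumptions chosen for illustration (only the 5 cycles for lw comes from these notes), and `average_cpi` is a hypothetical helper name.

```python
# Hypothetical instruction mix: (share of the program in percent, cycles taken).
# These numbers are assumptions for illustration, not the mix from the figure.
instruction_mix = {
    "lw":     (20, 5),
    "sw":     (10, 4),
    "R-type": (50, 4),
    "beq":    (15, 3),
    "j":      (5, 3),
}

def average_cpi(mix):
    """Weighted average of cycle counts, weighted by each instruction's share."""
    total_percent = sum(percent for percent, _ in mix.values())
    weighted_cycles = sum(percent * cycles for percent, cycles in mix.values())
    return weighted_cycles / total_percent

cpi = average_cpi(instruction_mix)
ipc = 1 / cpi  # IPC is the reciprocal of CPI
print("CPI:", cpi, "IPC:", ipc)
```

A program with more loads would push the CPI up towards 5, and one dominated by branches and jumps would pull it down towards 3.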

If we have $n$ instructions and a cycle time of $t$ seconds, $\text{Total Time} = n \cdot \text{CPI} \cdot t$.

Let's take 500 mega instructions (500 million). For a single-cycle CPU with a cycle time of 5 ns, the total time would be 2.5 seconds. For a multi-cycle CPU with a CPI of 3.95 and a cycle time of 1 ns, the total time would be 1.975 seconds.
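The arithmetic for this comparison can be checked with a short sketch. The function name `total_time_seconds` is a hypothetical helper; the inputs are the numbers from the example above.

```python
def total_time_seconds(instruction_count, cpi, cycle_time_seconds):
    """Total Time = n * CPI * t, with t given in seconds."""
    return instruction_count * cpi * cycle_time_seconds

n = 500_000_000  # 500 mega instructions

# Single-cycle: CPI of 1, cycle time of 5 ns.
single_cycle = total_time_seconds(n, cpi=1, cycle_time_seconds=5e-9)

# Multi-cycle: CPI of 3.95, cycle time of 1 ns.
multicycle = total_time_seconds(n, cpi=3.95, cycle_time_seconds=1e-9)

print(f"single-cycle: {single_cycle:.3f} s")
print(f"multi-cycle:  {multicycle:.3f} s")
print(f"speedup:      {single_cycle / multicycle:.2f}x")
```

Even though the multi-cycle CPU needs almost four cycles per instruction on average, its much shorter cycle more than makes up for it in this example.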

The CPU

Here's the CPU.

alt text

And all of the signals with it.

alt text

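How it works: the fetch phase reads the instruction from memory and increments the PC. The decode phase lets the controller work out what the instruction is, reads the source registers into A and B, and makes the ALU add the PC to a potential branch offset (just in case the instruction turns out to be a branch).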

R-Type

For an R-type such as add s0, s1, s2, the execute phase makes the ALU add registers A and B. The result is then written back into the register file.

I-Type

The execute phase for lw s0, 4(s1), for example, makes the ALU add register A and Imm to calculate the effective address. The ALU result is then used as the address, and memory is read. Finally, the data read from memory is written into the register file.
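The five cycles of lw s0, 4(s1) can be traced one register transfer at a time. This is only a sketch: the register and memory contents are made up, the dictionary-based machine state is a hypothetical stand-in for the real datapath, and incrementing the PC by 4 assumes 4-byte instructions.

```python
# Tiny hypothetical machine state (all values are made up for illustration).
regs = {"s0": 0, "s1": 100}   # register file
data_mem = {104: 42}          # data memory: address -> word
pc = 0

# Cycle 1 (fetch): read the instruction and increment the PC.
# The dict stands in for the fetched instruction word; +4 assumes 4-byte instructions.
instr = {"op": "lw", "rt": "s0", "rs": "s1", "imm": 4}
pc += 4

# Cycle 2 (decode): read the base register into A and extract the immediate.
a = regs[instr["rs"]]
imm = instr["imm"]

# Cycle 3 (execute): the ALU adds A and Imm to form the effective address.
alu_out = a + imm

# Cycle 4 (memory): the ALU result is used as the address and memory is read.
data = data_mem[alu_out]

# Cycle 5 (writeback): the loaded data is written into the register file.
regs[instr["rt"]] = data

print("effective address:", alu_out)
print("s0 after lw:", regs["s0"])
```

Each intermediate variable (`a`, `alu_out`, `data`) plays the role of one of the registers that hold data between phases.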

The execute phase for beq s0, s1, label makes the ALU subtract registers A and B. If the result is zero, the branch is taken, using the target address that was computed during decode.

J-Type

The execute phase for j target basically just does the jump: the PC is set to the target address.

Wait, how are control signals generated on each cycle? In a single-cycle CPU, the control signals don't change during an instruction, so the controller is a combinational circuit. In a multi-cycle CPU, the signals change during each instruction. Different signals on each clock cycle means a sequential circuit (one that needs to remember what it did before). To describe this behaviour, we need state machines.
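As a rough sketch of what such a state machine looks like, the controller can be modelled as a table from the current state (plus the instruction type) to the next state. The state names and the `states_for` helper are hypothetical; the transitions simply follow the phases described in this section, and only the four example instructions are covered.

```python
# Hypothetical state names; each entry maps the current state to the next one.
# Only DECODE looks at the instruction type to pick where to go.
NEXT_STATE = {
    "FETCH":         lambda op: "DECODE",
    "DECODE":        lambda op: {"add": "EXECUTE", "lw": "MEM_ADDR",
                                 "beq": "BRANCH", "j": "JUMP"}[op],
    "EXECUTE":       lambda op: "ALU_WRITEBACK",
    "ALU_WRITEBACK": lambda op: "FETCH",
    "MEM_ADDR":      lambda op: "MEM_READ",
    "MEM_READ":      lambda op: "MEM_WRITEBACK",
    "MEM_WRITEBACK": lambda op: "FETCH",
    "BRANCH":        lambda op: "FETCH",
    "JUMP":          lambda op: "FETCH",
}

def states_for(op):
    """Return the sequence of controller states one instruction passes through."""
    state, trace = "FETCH", []
    while True:
        trace.append(state)               # one state per clock cycle
        state = NEXT_STATE[state](op)
        if state == "FETCH":              # back at fetch: the instruction is done
            return trace

for op in ("add", "lw", "beq", "j"):
    trace = states_for(op)
    print(op, "takes", len(trace), "cycles:", " -> ".join(trace))
```

In this sketch the number of states visited is the number of cycles the instruction takes, so lw comes out at 5 cycles while beq and j finish in 3. In real hardware, each state would also drive its own set of control signals.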