Introduction
Computer programs are ultimately executed as sequences of low-level machine instructions inside the CPU. Each instruction passes through multiple internal execution stages such as instruction fetch, decoding, arithmetic processing, memory access, and result write-back. This complete execution flow is known as the Instruction Cycle.
Modern processors improve execution throughput using pipelining, where multiple instructions are processed simultaneously across different execution stages.
Understanding the instruction cycle is fundamental for learning how processors interact internally with registers, ALUs, memory systems, control units, and system buses. This knowledge forms the foundation for understanding processor performance, pipelining, interrupts, and embedded firmware execution behavior.
What is an Instruction?
An instruction is a binary command that dictates a specific hardware operation for the execution unit to perform.
Examples include:
- Arithmetic addition
- Data movement
- Logical operations
- Jump/Branch operations
- Memory read/write transactions
High-level C programs are compiled down to a continuous stream of such machine instructions.
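To make "binary command" concrete, here is a minimal sketch that packs an operation and its register operands into a single 16-bit word. The field layout (a 4-bit opcode followed by three 4-bit register indices) and the opcode values are hypothetical, chosen for readability and not taken from any real instruction set.

```python
# Hypothetical 16-bit format: [15:12] opcode | [11:8] rd | [7:4] rs1 | [3:0] rs2
OPCODES = {"ADD": 0x1, "SUB": 0x2, "AND": 0x3, "MOV": 0x4}  # assumed values

def encode(op, rd, rs1, rs2):
    """Pack an operation name and three register indices into one 16-bit word."""
    for reg in (rd, rs1, rs2):
        if not 0 <= reg <= 15:
            raise ValueError("register index must fit in 4 bits")
    return (OPCODES[op] << 12) | (rd << 8) | (rs1 << 4) | rs2

word = encode("ADD", 2, 0, 1)   # ADD r2, r0, r1 in this made-up encoding
print(f"{word:016b}")           # prints 0001001000000001
```

Real instruction sets use wider words and several formats, but the idea is the same: each group of bits tells the hardware what to do and which operands to use.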
Basic Instruction Execution Flow
A CPU executes a program as a continuous stream of machine instructions processed in a defined sequence. This flow generally moves through five distinct hardware stages:
Fetch → Decode → Execute → Memory Access → Write Back
These stages are hardwired directly into the processor datapath circuitry.
Fetch Stage
During the Fetch stage, the control unit retrieves the next machine instruction from memory.
The Program Counter (PC) holds the memory address of the next instruction in the stream.
Program Counter → Memory Address → Instruction Fetch → Instruction Register
After fetching, the PC automatically increments to point to the next sequential instruction.
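The fetch step can be sketched in a few lines. This is a simplified model that assumes a word-addressed instruction memory, so the PC advances by 1; on a real byte-addressed machine the PC advances by the instruction size in bytes. The instruction words are made up.

```python
def fetch(memory, pc):
    """Return (instruction_register, next_pc) for a word-addressed memory."""
    ir = memory[pc]      # the PC supplies the address; memory returns the instruction
    return ir, pc + 1    # the PC now points to the next sequential instruction

program = [0x1201, 0x2312, 0x0000]   # made-up instruction words
pc = 0
ir, pc = fetch(program, pc)          # first instruction lands in the IR
ir2, pc = fetch(program, pc)         # second fetch uses the incremented PC
```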
Decode Stage
During the Decode stage, the Control Unit interprets the opcode of the fetched instruction and translates it into internal control signals that coordinate register access, ALU behavior, memory operations, and data movement across the processor datapath.
It identifies:
- Operation type
- Source operands
- Destination registers
- Hardware units needed
Instruction Register → Control Unit → Operation Decode
This stage prepares the hardware circuitry for the upcoming execution phase.
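A rough sketch of decoding follows, assuming a hypothetical 16-bit format (4-bit opcode, then three 4-bit register indices) and an invented control table. Real control units produce many more signals, but the principle of "opcode in, control signals out" is the same.

```python
# Hypothetical control table: opcode -> signals driven by the control unit
CONTROL = {
    0x1: {"alu_op": "add", "reg_write": True, "mem_read": False},  # ADD
    0x2: {"alu_op": "sub", "reg_write": True, "mem_read": False},  # SUB
}

def decode(ir):
    """Split an instruction word into operand fields plus control signals."""
    opcode = (ir >> 12) & 0xF
    if opcode not in CONTROL:
        raise ValueError(f"undefined opcode {opcode:#x}")
    return {
        "rd": (ir >> 8) & 0xF,    # destination register
        "rs1": (ir >> 4) & 0xF,   # first source operand
        "rs2": ir & 0xF,          # second source operand
        **CONTROL[opcode],        # operation type and hardware units needed
    }

decoded = decode(0x1201)          # ADD r2, r0, r1 in this made-up encoding
```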
Execute Stage
During execution, the Arithmetic Logic Unit (ALU) performs arithmetic, logical, comparison, or bitwise operations on operand data.
Operations include:
- Mathematical calculations
- Register manipulation
- Data comparisons
- Bit shifting
- Logical boolean operations
Operands → ALU → Operation Result
This is the primary computational stage inside the execution engine.
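The ALU can be approximated as a function that selects one operation and applies it to two operands. The sketch below assumes 32-bit registers, so every result is wrapped to 32 bits; the operation names are illustrative only.

```python
MASK32 = 0xFFFFFFFF  # results wrap to 32 bits (assumed register width)

ALU_OPS = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "and": lambda a, b: a & b,
    "or":  lambda a, b: a | b,
    "xor": lambda a, b: a ^ b,
    "lsl": lambda a, b: a << (b & 31),              # logical shift left
    "lsr": lambda a, b: (a & MASK32) >> (b & 31),   # logical shift right
    "eq":  lambda a, b: int((a & MASK32) == (b & MASK32)),  # comparison
}

def alu(op, a, b):
    """Apply the selected operation and truncate the result to 32 bits."""
    return ALU_OPS[op](a, b) & MASK32

wrapped = alu("add", 0xFFFFFFFF, 1)   # overflows past 32 bits and wraps to 0
borrowed = alu("sub", 3, 5)           # two's-complement form of -2
```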
Memory Access Stage
Instructions that operate on external data require memory transactions through the system bus hierarchy. These operations may involve RAM access, cache interaction, or peripheral communication.
Examples include:
- Loading data from SRAM
- Storing values into hardware peripheral registers
Instruction → Memory Read / Write → Data Returned
Memory access latency is generally the slowest step in the execution cycle and significantly affects performance.
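The following sketch models loads and stores routed to different regions with different costs. The address ranges and cycle counts are assumptions made up for illustration; actual values depend on the specific device and bus design.

```python
SRAM_BASE, SRAM_SIZE = 0x2000_0000, 0x1000      # assumed address map
PERIPH_BASE, PERIPH_SIZE = 0x4000_0000, 0x100   # assumed address map

class Bus:
    """Toy system bus that routes accesses and counts the cycles they cost."""

    def __init__(self):
        self.sram = {}
        self.periph = {}
        self.cycles = 0

    def _region(self, addr):
        if SRAM_BASE <= addr < SRAM_BASE + SRAM_SIZE:
            return self.sram, 1      # assumed: 1 cycle per SRAM access
        if PERIPH_BASE <= addr < PERIPH_BASE + PERIPH_SIZE:
            return self.periph, 3    # assumed: 3 cycles per peripheral access
        raise ValueError(f"bus fault at {addr:#x}")

    def load(self, addr):
        region, latency = self._region(addr)
        self.cycles += latency
        return region.get(addr, 0)   # unwritten locations read as 0 here

    def store(self, addr, value):
        region, latency = self._region(addr)
        self.cycles += latency
        region[addr] = value & 0xFFFFFFFF

bus = Bus()
bus.store(0x2000_0010, 42)       # store to SRAM
value = bus.load(0x2000_0010)    # load it back
bus.store(0x4000_0000, 1)        # write a peripheral register
```

Even in this toy model, the single peripheral write costs more cycles than the two SRAM accesses combined, which is why memory-heavy code paths tend to dominate execution time.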
Write Back Stage
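The final computed result is written back into its destination, typically a core register. In the classic five-stage model, stores to a memory address or a memory-mapped peripheral are carried out in the preceding Memory Access stage, although some descriptions group them with write-back.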
Operation Result → Register / Memory Update
This commits the state change and completes the execution of the machine instruction.
Complete Instruction Pipeline
Modern CPU architectures overlap multiple instructions inside the hardware pipeline. Pipelining improves instruction throughput by allowing different execution stages to operate concurrently on separate instructions during each clock cycle.
| Instruction Stream | Cycle 1 | Cycle 2 | Cycle 3 | Cycle 4 | Cycle 5 |
|---|---|---|---|---|---|
| Instruction 1 | Fetch | Decode | Execute | Memory | Write Back |
| Instruction 2 | - | Fetch | Decode | Execute | Memory |
| Instruction 3 | - | - | Fetch | Decode | Execute |
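With a continuous instruction stream, the pipeline is fully occupied from Cycle 5 onward: Instructions 4 and 5 (not shown) would be in Decode and Fetch while Instruction 1 is in Write Back. In the ideal case one instruction then completes every clock cycle, so throughput rises without a higher clock frequency, although the latency of each individual instruction is not reduced.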
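The timing table can be generated programmatically. This sketch assumes an ideal pipeline with no stalls or flushes, in which n instructions through k stages take k + n - 1 cycles instead of k × n.

```python
STAGES = ["Fetch", "Decode", "Execute", "Memory", "Write Back"]

def pipeline_schedule(n_instructions, stages=STAGES):
    """Return one row per instruction listing its stage in each clock cycle."""
    total_cycles = len(stages) + n_instructions - 1
    rows = []
    for i in range(n_instructions):
        row = ["-"] * total_cycles
        for s, name in enumerate(stages):
            row[i + s] = name        # instruction i enters stage s in cycle i + s
        rows.append(row)
    return rows

rows = pipeline_schedule(3)
pipelined_cycles = len(rows[0])          # 5 + 3 - 1 = 7
sequential_cycles = len(STAGES) * 3      # 15 if nothing overlapped
for i, row in enumerate(rows, start=1):
    print(f"Instruction {i}: " + " | ".join(row))
```

The first five columns of each row reproduce the table above; the remaining columns show the pipeline draining.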
Branch Instructions
Branch instructions modify the standard sequential control flow of a program.
Examples include:
- Function calls
- Loops
- If-else conditions
- Jump routines
Branch Taken → Flush Pipeline
Branch Not Taken → Continue Pipeline
Branches can disrupt pipeline flow and reduce overall efficiency because pre-fetched instructions may need to be flushed if a branch is taken.
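The cost of flushing can be estimated with a simple model. The sketch assumes a five-stage pipeline without branch prediction in which each taken branch discards a fixed number of pre-fetched instructions; the penalty of 2 cycles used below is an assumption that corresponds to resolving the branch in the Execute stage.

```python
def cycles_with_branches(n_instructions, n_taken_branches, flush_penalty, depth=5):
    """Estimate total cycles when each taken branch flushes the pipeline."""
    ideal = depth + n_instructions - 1                # stall-free pipeline
    return ideal + n_taken_branches * flush_penalty   # add wasted cycles

ideal_cycles = cycles_with_branches(100, 0, 2)    # no taken branches
actual_cycles = cycles_with_branches(100, 10, 2)  # 10 taken branches
slowdown = actual_cycles / ideal_cycles
```

Branch prediction hardware exists to reduce how often this penalty is paid.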
Processor Flags
During the execution phase, the ALU updates internal condition flags based on computation results.
These status flags help the control unit make branching decisions during conditional operations.
| Condition Flag | Trigger Condition |
|---|---|
| Zero Flag (Z) | Result equals exactly zero |
| Carry Flag (C) | Unsigned addition carry or subtraction borrow occurred |
| Overflow Flag (V) | Signed result exceeded the representable range |
| Negative Flag (N) | Result is negative (most significant bit is set) |
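The flag conditions in the table can be checked with a small sketch of an addition. It uses an 8-bit width by default to keep the numbers readable; the function name and the dictionary layout are illustrative.

```python
def add_with_flags(a, b, bits=8):
    """Add two unsigned values of the given width and derive the Z, C, V, N flags."""
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    a &= mask
    b &= mask
    full = a + b                 # unbounded sum, used to detect carry-out
    result = full & mask         # what actually fits in the register
    flags = {
        "Z": result == 0,
        "C": full > mask,        # carry out of the top bit
        # Signed overflow: operands share a sign that the result does not have
        "V": bool(~(a ^ b) & (a ^ result) & sign),
        "N": bool(result & sign),
    }
    return result, flags

r1, f1 = add_with_flags(0x7F, 0x01)   # 127 + 1: signed overflow, result looks negative
r2, f2 = add_with_flags(0xFF, 0x01)   # 255 + 1: wraps to zero with a carry
```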
Instruction Timing
Execution timing depends heavily on instruction complexity and memory latency.
Simple logical instructions may execute in a single clock cycle, while memory transactions or complex arithmetic take multiple cycles.
Clock Cycles → Instruction Progress → Execution Complete
This timing directly impacts overall execution throughput and real-time responsiveness.
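As a rough illustration of how cycle counts translate into time, the sketch below totals the cycles for a mix of instructions and converts them using the clock frequency. The per-instruction cycle counts, the instruction mix, and the 48 MHz clock are all assumed values; real figures come from the processor's documentation.

```python
def execution_time_us(cycle_counts, instruction_mix, clock_hz):
    """Return (total_cycles, time in microseconds) for a given instruction mix."""
    total_cycles = sum(cycle_counts[kind] * count
                       for kind, count in instruction_mix.items())
    return total_cycles, total_cycles / clock_hz * 1e6

cycle_counts = {"alu": 1, "load": 2, "mul": 3}   # assumed cycles per instruction
mix = {"alu": 60, "load": 30, "mul": 10}         # assumed instruction counts
total_cycles, time_us = execution_time_us(cycle_counts, mix, 48_000_000)
# 60*1 + 30*2 + 10*3 = 150 cycles, roughly 3.125 microseconds at 48 MHz
```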
Example Execution Sequence
Example: sum = a + b
The processor datapath steps through the following sequence:
- Fetch the ADD machine instruction from memory
- Decode the addition opcode and generate internal control signals
- Load operands 'a' and 'b' from the register file
- Execute the addition inside the ALU
- Write back the 'sum' result into the target destination register
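The steps above can be traced in a minimal simulation. It assumes a hypothetical 16-bit encoding (4-bit opcode, then destination and two source register indices), that 'a' and 'b' already sit in registers r0 and r1, and that 'sum' is assigned to r2; none of these choices come from a real instruction set or compiler.

```python
ADD = 0x1   # hypothetical opcode value

def run_add(a, b):
    """Step one ADD instruction through fetch, decode, execute, and write back."""
    regs = [a, b, 0, 0]          # r0 = a, r1 = b, r2 will receive sum
    memory = [0x1201]            # ADD r2, r0, r1 in the made-up encoding
    pc = 0

    ir = memory[pc]              # Fetch the instruction word
    pc += 1

    opcode = (ir >> 12) & 0xF    # Decode the opcode and register fields
    rd, rs1, rs2 = (ir >> 8) & 0xF, (ir >> 4) & 0xF, ir & 0xF
    if opcode != ADD:
        raise ValueError("this sketch only handles ADD")

    op_a, op_b = regs[rs1], regs[rs2]      # Load operands from the register file
    result = (op_a + op_b) & 0xFFFFFFFF    # Execute the addition in the "ALU"
    regs[rd] = result                      # Write back into the destination register
    return regs, pc

regs, pc = run_add(7, 5)
print(regs[2])   # prints 12
```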
💡 Important Points
- Programs execute as ordered streams of machine instructions
- Instruction execution is divided into multiple hardware-controlled stages
- The Control Unit orchestrates instruction sequencing and datapath control
- The ALU performs arithmetic, logical, and comparison operations
- Memory access latency significantly affects execution performance
- Pipelining improves throughput by overlapping instruction stages
- Branch instructions may disrupt pipeline flow and reduce efficiency
🚀 Embedded Systems Perspective
Instruction execution behavior directly affects interrupt latency, real-time determinism, execution timing, and firmware responsiveness in embedded systems. Pipeline stalls, branch behavior, cache latency, and memory access timing all influence real-time performance characteristics.
Understanding low-level execution flow helps embedded developers optimize firmware, analyze timing bottlenecks, improve interrupt handling, and design predictable real-time systems for microcontrollers, DSPs, and high-performance embedded processors.
🔹 Next Chapter
In the next chapter, we will explore memory hierarchy, organization, and layout.
- CPU Registers
- Cache Memory
- RAM vs ROM
- EEPROM Memory
- Memory Layout
- Embedded Memory Architecture