Chapter 3: Instruction Cycle and Processor Execution

Introduction

Computer programs are ultimately executed as sequences of low-level machine instructions inside the CPU. Each instruction passes through multiple internal execution stages such as instruction fetch, decoding, arithmetic processing, memory access, and result write-back. This complete execution flow is known as the Instruction Cycle.

Modern processors improve execution throughput using pipelining, where multiple instructions are processed simultaneously across different execution stages.

Understanding the instruction cycle is fundamental for learning how processors interact internally with registers, ALUs, memory systems, control units, and system buses. This knowledge forms the foundation for understanding processor performance, pipelining, interrupts, and embedded firmware execution behavior.

CPU Internal Architecture Flow

Control Unit (CU)
↓
Registers ↔ ALU ↔ Memory Interface
↓
SYSTEM BUS

What is an Instruction?

An instruction is a binary command that dictates a specific hardware operation for the execution unit to perform.

Examples include:

•Arithmetic addition
•Data movement
•Logical operations
•Jump/Branch operations
•Memory read/write transactions

Programs written in high-level languages such as C are compiled into sequences of these machine instructions.

Basic Instruction Execution Flow

A CPU executes a program as a stream of machine instructions processed in a defined order. The classic textbook model of this flow, the five-stage RISC pipeline, divides the work on each instruction into five hardware stages:

5-Stage Execution Flow

Fetch → Decode → Execute → Memory Access → Write Back

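In this model, each stage maps onto a section of the processor's datapath and control logic. Real processors may merge or split stages; the ARM Cortex-M3 and Cortex-M4, for example, use a three-stage pipeline (Fetch, Decode, Execute).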

Fetch Stage

During the Fetch stage, the control unit retrieves the next machine instruction from memory.

The Program Counter (PC) holds the memory address of the next instruction to be fetched.

Program Counter → Memory Address → Instruction Fetch → Instruction Register

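After the fetch, the PC is incremented by the instruction size (for example, 4 bytes for a fixed 32-bit instruction) so that it points to the next sequential instruction. Branch instructions can later overwrite it with a different address.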

Decode Stage

During decoding, the Control Unit interprets the opcode and operand fields held in the Instruction Register and translates them into internal control signals that coordinate register access, ALU behavior, memory operations, and data movement across the processor datapath.

It identifies:

•Operation type
•Source operands
•Destination registers
•Hardware units needed

Instruction Register → Control Unit → Operation Decode

This stage prepares the hardware circuitry for the upcoming execution phase.

Execute Stage

During execution, the Arithmetic Logic Unit (ALU) performs arithmetic, logical, comparison, or bitwise operations on operand data.

Operations include:

•Mathematical calculations
•Register manipulation
•Data comparisons
•Bit shifting
•Logical boolean operations

Operands → ALU → Operation Result

This is the primary computational stage inside the execution engine.

Memory Access Stage

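Only instructions that read or write data outside the register file, such as loads and stores, perform a transaction in this stage; register-to-register arithmetic passes through it without touching memory. These transactions travel over the system bus hierarchy and may involve RAM access, cache interaction, or peripheral communication.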

Examples include:

•Loading data from SRAM
•Storing values into hardware peripheral registers

Instruction → Memory Read / Write → Data Returned

Memory access is often the slowest step in the instruction cycle, particularly on a cache miss or when reading slow Flash or peripheral registers, so its latency strongly affects performance.

Write Back Stage

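The final result, either computed by the ALU or loaded during the Memory Access stage, is written into the destination register in the register file. Store instructions have already updated a memory address or a memory-mapped peripheral during Memory Access, so they have nothing left to write back.

Operation Result → Destination Register Update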

This commits the state change and completes the execution of the machine instruction.

Complete Instruction Pipeline

Modern CPU architectures overlap multiple instructions inside the hardware pipeline. Pipelining improves instruction throughput by allowing different execution stages to operate concurrently on separate instructions during each clock cycle.

Instruction Pipeline Overlap

Instruction Stream	Cycle 1	Cycle 2	Cycle 3	Cycle 4	Cycle 5
Instruction 1	Fetch	Decode	Execute	Memory	Write Back
Instruction 2	-	Fetch	Decode	Execute	Memory
Instruction 3	-	-	Fetch	Decode	Execute
Instruction 4	-	-	-	Fetch	Decode
Instruction 5	-	-	-	-	Fetch

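From Cycle 5 onward, every one of the five stages holds a different instruction, so the pipeline is full. An ideal pipeline then completes one instruction per clock cycle, which raises throughput without a higher clock frequency. The time each individual instruction spends in the pipeline (its latency) does not shrink.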

Branch Instructions

Branch instructions modify the standard sequential control flow of a program.

Examples include:

•Function calls
•Loops
•If-else conditions
•Jump routines

Branch Control Flow

Condition Check
  TRUE → Jump to Target Address (Flush Pipeline)
  FALSE → Next Sequential Instruction (Continue Pipeline)

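Branches can disrupt pipeline flow and reduce overall efficiency: a processor that keeps fetching sequentially must flush the instructions it has already fetched when a branch turns out to be taken. Branch prediction reduces this penalty but cannot eliminate it entirely.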

Processor Flags

During the execution phase, the ALU updates internal condition flags based on computation results.

These status flags help the control unit make branching decisions during conditional operations.

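Condition Flag	Trigger Condition
Zero Flag (Z)	Result is exactly zero
Carry Flag (C)	Unsigned carry out of the most significant bit on addition; on subtraction the meaning is architecture-specific (x86 sets it on a borrow, ARM sets it when no borrow occurs)
Overflow Flag (V)	Signed (two's-complement) result does not fit in the register width
Negative Flag (N)	Most significant (sign) bit of the result is 1, i.e. the result is negative when read as a signed number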

Instruction Timing

Execution timing depends heavily on instruction complexity and memory latency.

Simple logical instructions may complete in a single clock cycle, while memory transactions or complex arithmetic such as division may take several cycles.

Clock Cycles → Instruction Progress → Execution Complete

This timing directly impacts overall execution throughput and real-time responsiveness.

Example Execution Sequence

Example: sum = a + b

The processor datapath steps through the following sequence:

Fetch the ADD machine instruction from memory
Decode the addition opcode and generate internal control signals
Read operands 'a' and 'b' from the register file (assuming they have already been loaded into registers)
Execute the addition inside the ALU
Write back the 'sum' result into the target destination register

Datapath Flow

RAM (Source) → Registers (Load) → ALU (Execute) → Registers (Result) → RAM (Store)

💡 Important Points

→Programs execute as ordered streams of machine instructions
→Instruction execution is divided into multiple hardware-controlled stages
→The Control Unit orchestrates instruction sequencing and datapath control
→The ALU performs arithmetic, logical, and comparison operations
→Memory access latency significantly affects execution performance
→Pipelining improves throughput by overlapping instruction stages
→Branch instructions may disrupt pipeline flow and reduce efficiency

🚀 Embedded Systems Perspective

Instruction execution behavior directly affects interrupt latency, real-time determinism, execution timing, and firmware responsiveness in embedded systems. Pipeline stalls, branch behavior, cache latency, and memory access timing all influence real-time performance characteristics.

Understanding low-level execution flow helps embedded developers optimize firmware, analyze timing bottlenecks, improve interrupt handling, and design predictable real-time systems for microcontrollers, DSPs, and high-performance embedded processors.

🔹 Next Chapter

In the next chapter, we will explore memory hierarchy, organization, and layout.

•CPU Registers
•Cache Memory
•RAM vs ROM
•EEPROM Memory
•Memory Layout
•Embedded Memory Architecture
