inline: The Reality
Eliminating the CALL opcode.
Merging logic directly into the instruction stream.
1. Generated Assembly
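In C++, when a function is inlined, the compiler removes the call instruction and inserts the
function's body directly at the location where it is used. This changes the structure of the
generated assembly and eliminates several instructions normally required for a function call.
Whether this happens is the optimizer's decision: the inline keyword is only a hint, so compilers
may inline functions that lack it and may still emit real calls to functions that have it.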
A normal function call typically involves:
- Saving the return address
- Passing parameters
- Jumping to the function
- Returning control back to the caller
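To make these steps concrete, here is a minimal sketch assuming GCC or Clang on x86-64. The GCC/Clang attribute [[gnu::noinline]] keeps the illustrative function add out of line, so every use of it still goes through the full call sequence. The function names are hypothetical, and the register names in the comments assume the System V calling convention used on Linux and macOS.

```cpp
// [[gnu::noinline]] (GCC/Clang) asks the compiler to keep this as a real,
// out-of-line function, so callers must perform an actual call.
[[gnu::noinline]] int add(int a, int b) {
    return a + b;  // result is returned in a register (eax on x86-64 System V)
}

int product_of_sums(int x, int y, int z) {
    // Each call below typically compiles to:
    //   - passing the arguments in registers (edi, esi)
    //   - CALL, which pushes the return address and jumps to add
    //   - RET inside add, which pops that address and resumes here
    int first = add(x, y);
    int second = add(y, z);
    // The multiply uses both results after the calls, so neither call
    // can be turned into a tail jump.
    return first * second;
}
```

Calling product_of_sums(1, 2, 3) returns 15, but the point is the shape of the compiled code: inspecting it with a disassembler or a compiler explorer at -O2 should show two call instructions targeting add.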
2. Call Elimination
When a function is inlined, these instructions are eliminated. Instead of performing a call, the
compiler places the equivalent instructions of the function body directly into the caller's
instruction stream.
This reduces branching and can allow the compiler to perform further optimizations, such as constant propagation, register reuse, and instruction reordering.
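The merge into the caller's instruction stream can be approximated at the source level. In this sketch, sum_of_squares_inlined is a hand-written illustration of what the optimizer effectively produces; real compilers perform this substitution on their intermediate representation, not on source text, and the function names are hypothetical.

```cpp
// Small helper. The inline keyword is only a hint; at -O2, GCC and Clang
// will usually expand a function this small whether or not it is marked.
inline int square(int v) { return v * v; }

// As written: two calls to square.
int sum_of_squares(int a, int b) {
    return square(a) + square(b);
}

// Hand-written approximation after inlining: square's body now sits
// directly in the caller, with no call or return, and the optimizer is
// free to schedule both multiplications together in registers.
int sum_of_squares_inlined(int a, int b) {
    int sa = a * a;  // body of square(a)
    int sb = b * b;  // body of square(b)
    return sa + sb;
}
```

Both functions compute the same result; with optimization enabled, they typically compile to the same short instruction sequence.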
3. Optimization Opportunities
Inlining often enables additional compiler optimizations because the compiler can see the entire
context of the code at the call site. This allows:
- Better register allocation
- Constant folding when parameters are known at compile time
- Removal of redundant calculations
- Improved instruction scheduling
As a result, inlining can sometimes produce faster machine code than a traditional function call.
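Constant folding after inlining is easiest to see with a helper that branches on one of its parameters. This is a minimal sketch with hypothetical names; whether the branch actually disappears depends on the compiler deciding to inline apply, which is likely at -O2 for a function this small but not guaranteed.

```cpp
// Hypothetical helper with a branch on its second parameter.
inline int apply(int x, bool doubled) {
    if (doubled) {
        return x * 2;
    }
    return x + 1;
}

// The argument true is a compile-time constant. Once apply is inlined
// here, the optimizer can fold the if (true) test, drop the x + 1 path,
// and reduce this function to a single multiply or shift.
int double_it(int x) {
    return apply(x, true);
}

// With every argument known, the whole expression can be folded to 20
// at compile time, leaving no computation at run time.
int twenty() {
    return apply(10, true);
}
```

Without inlining, each of these functions would have to pass the flag at run time and let apply test it on every call.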
4. Trade-offs
Although inlining can improve performance, it also increases the amount of generated machine code.
If a function is inlined at many locations, the binary may grow significantly.
Larger binaries can hurt instruction cache performance, which may offset the gains from eliminating function calls. For this reason, modern compilers use heuristics, weighing factors such as the size of the callee, the estimated execution frequency of the call site, and the optimization level, to decide when inlining is beneficial.
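When the heuristics need guidance, GCC and Clang accept attributes that override them. The sketch below assumes GCC or Clang ([[gnu::always_inline]], [[gnu::noinline]], and [[gnu::cold]] are compiler extensions; MSVC uses __forceinline and __declspec(noinline) instead). The function names are hypothetical: a tiny hot helper is forced inline, while a rarely executed path is kept out of line so its code is not copied into every caller.

```cpp
#include <cstdio>

// Tiny, hot helper: force expansion at every call site.
// GCC expects always_inline functions to also be declared inline.
[[gnu::always_inline]] inline int clamp_index(int i, int size) {
    return i < 0 ? 0 : (i >= size ? size - 1 : i);
}

// Rarely executed path: keep one out-of-line copy so it does not add
// code to every caller and compete for instruction cache space.
// cold also hints that calls to it are unlikely.
[[gnu::noinline, gnu::cold]] void report_empty() {
    std::fprintf(stderr, "warning: empty buffer\n");
}

int read_clamped(const int* data, int size, int i) {
    if (size <= 0) {
        report_empty();  // remains a real call
        return 0;
    }
    return data[clamp_index(i, size)];  // expanded in place
}
```

The design choice is to spend code size only where the call is frequent and the body is small; forcing large or cold functions inline is exactly the kind of growth the heuristics try to avoid.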
5. Practical Insight
In performance-critical domains such as embedded systems, game engines, and high-performance
libraries, developers typically rely on the compiler's inliner, enabled by optimization flags such
as -O2 or -O3 in GCC and Clang, to expand small functions where beneficial while avoiding
unnecessary code growth.
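One practical consequence: without link-time optimization, a compiler can only inline a function whose definition it can see in the current translation unit. A common pattern is therefore to define small functions in headers, where the inline keyword is what allows the same definition to appear in several .cpp files without violating the one-definition rule. The sketch below compresses a hypothetical header (vec2.h) and source file (physics.cpp) into one listing; all names are illustrative.

```cpp
// --- vec2.h (hypothetical header) ---
// Defining these small functions in the header makes their bodies visible
// in every translation unit that includes it, so the optimizer can expand
// them at each call site.
struct Vec2 {
    float x;
    float y;
};

inline Vec2 add(Vec2 a, Vec2 b) { return {a.x + b.x, a.y + b.y}; }
inline float dot(Vec2 a, Vec2 b) { return a.x * b.x + a.y * b.y; }

// --- physics.cpp (hypothetical) ---
// Built with, for example, g++ -O2, both calls below are strong inlining
// candidates. At -O0, GCC and Clang normally emit real calls instead.
float projected_speed(Vec2 velocity, Vec2 wind, Vec2 axis) {
    return dot(add(velocity, wind), axis);
}
```

If add and dot were only declared in the header and defined in a separate .cpp file, each call from physics.cpp would remain a real call unless link-time optimization (such as -flto) is enabled.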