Read it here: blog.codingconfessions.com/p/what-makes...
The article has all the details correctly explained.
- loop unrolling
- function inlining
- SIMD vectorization
- branch elimination
- cache optimization
If you are someone who thinks about performance, I think you will find it worth your time.
Does something like this exist? Maybe there is: the Iron Law of performance.
The law defines CPU performance in terms of three factors:
- number of instructions executed (dynamic instruction count)
- instructions executed per cycle (IPC)
- CPU clock cycle time
Performance = 1 / (instruction count x cycles per instruction x cycle time), where cycles per instruction = 1 / IPC
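As a rough worked example of the Iron Law, the sketch below plugs numbers into the formula. The function name `execution_time_seconds` and the workload figures (3 billion instructions, a 3 GHz clock) are made up for illustration and do not come from the article.

```python
def execution_time_seconds(instruction_count, ipc, clock_hz):
    """Iron Law: time = instruction count x cycles per instruction x cycle time."""
    cycles = instruction_count / ipc   # total cycles = instructions x (1 / IPC)
    return cycles / clock_hz           # cycle time = 1 / clock frequency


# Hypothetical workload: 3 billion instructions on a 3 GHz core.
baseline = execution_time_seconds(3_000_000_000, ipc=1.0, clock_hz=3_000_000_000)

# Same instruction count, but the code sustains 2 instructions per cycle.
higher_ipc = execution_time_seconds(3_000_000_000, ipc=2.0, clock_hz=3_000_000_000)

# Half the instructions at the original IPC.
fewer_instructions = execution_time_seconds(1_500_000_000, ipc=1.0, clock_hz=3_000_000_000)

print(baseline, higher_ipc, fewer_instructions)
```

Under these assumed numbers, doubling IPC and halving the instruction count both halve the run time, which is why an optimization can be judged by which of the three factors it moves.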
blog.codingconfessions.com/p/python-per...
blog.codingconfessions.com/p/hardware-a...
In the article you will find a couple of examples of how to do that.
Real-world code is non-linear in nature: it consists of branches such as if/else, switch cases, function calls, and loops. The execution flow depends on the branch result, which is not known until the processor executes the branch.
To avoid wasting time and resources waiting for that result, the CPU employs branch predictors to predict the branch direction and speculatively executes the predicted instructions.
If the prediction is wrong, the speculative work is wasted, and the CPU has to start from scratch to fetch and execute the right set of instructions.
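One common way to sidestep a hard-to-predict branch is to replace it with arithmetic. The sketch below shows only the shape of that transformation; the function names are hypothetical, and because the Python interpreter adds many branches of its own, the speedup would only be expected in compiled code, not here.

```python
def sum_at_least_branchy(values, threshold):
    """Sum the values >= threshold using a data-dependent branch."""
    total = 0
    for v in values:
        if v >= threshold:  # outcome depends on the data, so it can be hard to predict
            total += v
    return total


def sum_at_least_branchless(values, threshold):
    """Same result, but the comparison is used as a 0/1 multiplier."""
    total = 0
    for v in values:
        total += v * (v >= threshold)  # True/False behave as 1/0 in arithmetic
    return total


data = [5, 200, 17, 128, 90, 255]
print(sum_at_least_branchy(data, 128), sum_at_least_branchless(data, 128))
```

The branchless version does a little extra work on every element, so it tends to pay off only when the branch outcome is close to random.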
In the article I explain how caches work and how you can write code that takes advantage of them to avoid the penalty of slow main memory access.
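A typical cache-friendly habit is to visit memory in the order it is laid out. The sketch below assumes a 2D grid stored row by row in one flat array (a hypothetical layout, chosen for illustration) and prints the index order of two traversals. Python lists hold references rather than raw values, so this shows the access pattern only, not the timing difference you would measure in a language like C.

```python
def row_major_indices(rows, cols):
    """Visit element (r, c) at flat index r * cols + c, one row at a time."""
    return [r * cols + c for r in range(rows) for c in range(cols)]


def column_major_indices(rows, cols):
    """Visit the same elements one column at a time."""
    return [r * cols + c for c in range(cols) for r in range(rows)]


# Row by row: consecutive indices, so neighbours share cache lines.
print(row_major_indices(3, 4))

# Column by column: each step jumps `cols` elements ahead.
print(column_major_indices(3, 4))
```

With the row-by-row order each step touches the adjacent element, while the column-by-column order jumps a full row ahead every time, which on large arrays means far more trips to slower levels of the memory hierarchy.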
In the article I explain how instruction-level parallelism (ILP) works, with code examples that demonstrate how you can take advantage of it.
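One widely used way to expose ILP is to split a single dependency chain into independent ones, for example by summing with two accumulators. The sketch below is an illustration with hypothetical function names, not the article's own example. It shows the transformation only: the Python interpreter hides any hardware-level gain, which would be expected in compiled code.

```python
def sum_single_chain(values):
    """Every addition depends on the previous one: a single dependency chain."""
    total = 0
    for v in values:
        total += v
    return total


def sum_two_chains(values):
    """Two accumulators form independent chains a CPU could overlap."""
    even_total = 0
    odd_total = 0
    n = len(values)
    for i in range(0, n - 1, 2):
        even_total += values[i]      # chain 1
        odd_total += values[i + 1]   # chain 2, independent of chain 1
    if n % 2:                        # leftover element when the length is odd
        even_total += values[-1]
    return even_total + odd_total


numbers = list(range(1, 8))
print(sum_single_chain(numbers), sum_two_chains(numbers))
```

The example uses integers on purpose: with floating-point values, changing the order of additions can change the rounded result.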