It's not so much that it's harder to fill the pipelines, it's that it's harder to keep them full when a jump/branch is encountered. The CPU takes a guess at the most likely outcome of the jump instruction (jumps = choice/function call/change of thread) and keeps loading the pipeline with instructions from that path. However, if the choice goes the other way, the pipeline has to be cleared: the instructions already in flight are thrown away along with their results, and the pipeline is refilled from the correct path.
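To make the "takes a guess" part concrete, here is a toy simulation of one classic guessing scheme, a 2-bit saturating counter. This is only a sketch: real CPUs use far more elaborate predictors, and the function name and the two test patterns are made up for illustration.

```python
def simulate_two_bit_predictor(outcomes):
    """Count wrong guesses of a 2-bit saturating counter over a list of branch outcomes."""
    counter = 2  # 0-1 means "guess not taken", 2-3 means "guess taken"
    mispredictions = 0
    for taken in outcomes:
        predicted_taken = counter >= 2
        if predicted_taken != taken:
            mispredictions += 1  # on real hardware this is where the pipeline gets flushed
        # Nudge the counter toward what actually happened, clamped to 0..3.
        if taken:
            counter = min(counter + 1, 3)
        else:
            counter = max(counter - 1, 0)
    return mispredictions


# A loop branch: taken 9 times, then falls through once, repeated 100 times.
loop_branch = ([True] * 9 + [False]) * 100
# A branch that flips every time, which this scheme handles badly.
alternating = [True, False] * 500

print(simulate_two_bit_predictor(loop_branch), "wrong guesses out of", len(loop_branch))
print(simulate_two_bit_predictor(alternating), "wrong guesses out of", len(alternating))
```

Predictable branches like loops cost almost nothing, while branches with no stable pattern get guessed wrong about half the time, and every wrong guess means a flush.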
I think the P4 has a pipeline around 20 stages deep, while Athlons are closer to 10. Hence the penalty of an incorrect guess on an Athlon is roughly half the penalty on a P4, assuming the cost of a flush scales with pipeline depth.
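The arithmetic behind that comparison can be sketched as below. It assumes each wrong guess wastes about one pipeline's worth of cycles, which is a simplification, and the function names and the misprediction count are hypothetical. Plug in whatever stage counts you trust.

```python
from fractions import Fraction


def wasted_cycles(mispredictions, pipeline_depth):
    # Assumption: one wrong guess costs roughly pipeline_depth cycles of discarded work.
    return mispredictions * pipeline_depth


def penalty_ratio(shallow_depth, deep_depth):
    # Per-misprediction cost of the shallow pipeline relative to the deep one.
    return Fraction(shallow_depth, deep_depth)


mispredictions = 50_000  # made-up workload, same on both CPUs

print(wasted_cycles(mispredictions, 20))  # deeper pipeline
print(wasted_cycles(mispredictions, 10))  # shallower pipeline
print(penalty_ratio(10, 20))              # 10 vs 20 stages
print(penalty_ratio(8, 12))               # 8 vs 12 stages
```

With 10 versus 20 stages the shallower pipeline pays half as much per wrong guess; with 8 versus 12 it pays two thirds.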
Hyperthreading doesn't clear the pipeline any faster, but it does help hide the problem: while one thread is waiting for the pipeline to refill after a bad guess, instructions from the second thread can keep the execution units busy.
It doesn’t seem like jumps happen a lot, but how many proggies do you have running on your PC? Now how many drivers? How many background processes? What about the other proggies running behind Svchost.exe? (10 svchost entries showing in Task Manager.)
That’s a lot of jumps.
Even something as simple as playing an MP3 takes about 5 threads on its own:
- Winamp Main Proggie
- File IO input
- Buffer thread
- Sound IO output.
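
A rough sketch of how threads like these hand work to each other is below. This is not how Winamp actually does it: the names are made up, and upper-casing a string stands in for the real decoding. The main thread plays the part of the main proggie and waits on three workers.

```python
import queue
import threading


def run_player(chunks):
    """Push chunks through file input -> buffer -> sound output, one thread per stage."""
    raw = queue.Queue(maxsize=4)    # file input -> buffer thread
    ready = queue.Queue(maxsize=4)  # buffer thread -> sound output
    played = []

    def file_input():
        for chunk in chunks:
            raw.put(chunk)
        raw.put(None)  # sentinel: nothing left to read

    def buffer_stage():
        while (chunk := raw.get()) is not None:
            ready.put(chunk.upper())  # placeholder for decoding the audio
        ready.put(None)

    def sound_output():
        while (chunk := ready.get()) is not None:
            played.append(chunk)  # placeholder for writing to the sound card

    workers = [threading.Thread(target=stage)
               for stage in (file_input, buffer_stage, sound_output)]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()  # the main thread just waits for the stages to finish
    return played


print(run_player(["frame1", "frame2", "frame3"]))
```

Every time the CPU switches from one of these threads to another, it changes to a different instruction stream, which is the "change of thread" kind of jump mentioned earlier.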