The move from a non-OS (MS-DOS…was pretty much a binary loader) to a multi-process, preemptive multitasking OS is only part of the picture. DOS didn’t have threads, so there you go (although you could actually get them with one of the various DOS extenders).
The growing speed gap between the CPU and the FSB/Northbridge & Southbridge components is a biggie. One of the big early reasons for graphics cards to start adding computation functionality was to deal with pushing ever-increasing amounts of data over a slow communication channel…the fact that you could do increasingly cool stuff was a bonus.
The bottom line is that modernish CPUs stall all the time: cache-miss? branch misprediction? data dependency? And these are just examples of short pauses.
So, given three machines that are equal in everything except some hypothetical Intel CPU:
A) 1 core @ 8.8GHz without HyperThreading
B) 4 cores @ 2.2GHz without HyperThreading
C) 4 cores @ 2.0GHz with HyperThreading
No question: I’d take C before B and B before A.
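To put rough numbers on that choice, here's a back-of-the-envelope sketch in Python. It only adds up clock speed and hardware thread counts for the three hypothetical machines; the two-threads-per-core figure for HyperThreading and the `summarize` helper are my own assumptions for illustration, and aggregate clock is a crude proxy that says nothing about real throughput.

```python
# Hypothetical machines from the list above:
# (cores, clock in GHz, hardware threads per core)
# Assumption: HyperThreading exposes 2 hardware threads per core.
machines = {
    "A": (1, 8.8, 1),
    "B": (4, 2.2, 1),
    "C": (4, 2.0, 2),
}

def summarize(cores, ghz, threads_per_core):
    """Return the naive aggregate clock and the number of hardware threads."""
    return {
        "aggregate_ghz": round(cores * ghz, 1),
        "hw_threads": cores * threads_per_core,
    }

summary = {name: summarize(*spec) for name, spec in machines.items()}

for name, info in summary.items():
    print(name, info)
```

On paper A and B tie on aggregate clock and C comes out lowest, but C has the most hardware threads available to keep the cores busy while some other thread is stalled, which is why raw clock isn't the whole story.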
Threads, pico-threads/fibers/green-threads & interrupt handlers are all different, but fulfill a similar need…I didn’t intend to imply otherwise.