Author: Matt Taylor
Date: 21:53:04 12/13/02
On December 13, 2002 at 23:03:22, Robert Hyatt wrote:

>On December 13, 2002 at 21:36:57, Matt Taylor wrote:
>
>>On December 13, 2002 at 11:33:17, Robert Hyatt wrote:
>>
>><snip>
>>>Regardless of hand-waving nay-sayers. It is a logical development of removing
>>>one more time-critical piece of code from the Operating System into the
>>>microprocessor, namely the task scheduler.
>>
>>I wouldn't say that. I opened task manager for a second on my NT box and
>>discovered that I have 324 threads running.
>
>You don't have 324 threads running. You have 324 threads in the system,
>with probably 320 of them blocked long-term waiting on I/O thru a socket or
>whatever. It is the other _four_ that are interesting... The threads that
>are really computing. Not the threads that are doing I/O or just sleeping
>waiting on some event to happen.
>
>>For HT to replace the NT task
>>scheduler, it would need the capability to handle at least the 324 concurrent
>>processes I have running now (technically it would need to handle *many* more).
>>I quote the numbers for NT because Unix variants are usually not as prolific
>>with scheduling units, usually because Unix threads/processes aren't very
>>lightweight...
>
>See above, that is simply not true. First, unix supports both types of threads,
>lightweight and heavyweight. But none of that matters. The only interesting
>threads from the CPU's perspective are the threads that are not blocked waiting
>on I/O and other stuff. Just the threads that are ready to compute...
>
>And hyper-threading handles those perfectly...

Yes, some variants handle lightweight threads. The traditional unit of
scheduling in Unix has always been the process. That has affected the
architecture somewhat, just as NT's lightweight threads mean developers can
throw a thread at a given problem. Perhaps that's why the subsystems of NT
inexcusably create over 200 threads at boot time. Anyway, that's all I meant --
NT has a very large scheduling queue.
>>HT will not scale to large numbers of tasks. The IA-32 register set has 8 32-bit
>>general registers, 8 80-bit FPU registers, 2 16-bit FPU control registers, and 8
>>128-bit SSE registers. This means each logical CPU requires 244 bytes of
>>application register file alone. For simplicity, I did not include the 3 groups
>>of system registers, the MTRRs, or the MSRs. There are additional caches which
>>would not allow HT to scale unless they were duplicated. Intel is not about to
>>put MBs of fast-access register file on an IA-32 processor. It would make your
>>128-cpu HT Pentium 5 cost more than a cluster of Itaniums with negligible
>>performance gain over a dual- or quad-Xeon system.
>
>Want to bet? 10 years ago they could not put cache on-chip due to space
>limitations. That is now gone. With up to 2mb of L2 cache on chip today,
>they really can do whatever they want in the future. And there are not
>really duplicate sets of registers. Just duplicate rename tables which are
>much smaller since they only store small pointers to real registers, rather
>than 32 bit values.

If HT does not have 2 physical sets of registers, what do the remap tables
point to? Intel docs actually state that the logical CPU has a duplicated set
of registers.

Also, it is important to differentiate between registers and L2 cache, which
can take from 6-300 clocks to access. Even the 8 KB L1 cache that the P4 has
takes 2 cycles to access. If it were possible to build a chip with such vast
amounts of fast cache memory, there would be no such thing as an on-chip cache
hierarchy; there would be an L1 cache on-chip and -maybe- an L2 cache on the
motherboard.

The L1 caches on Athlon and P4 are a good example of how size limits speed. The
L1 cache on Athlon requires 3 cycles to access. On P4 it takes 2 cycles. Why?
P4 has 8 KB of L1 data. Athlon has 64 KB of L1 data.

The other argument worth making is that HT will hit diminishing returns very
quickly. It -may- not even be worth going to quad-HT.
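
For anyone who wants to check the 244-byte figure quoted above, here is a quick sketch in Python. It counts only the application registers listed in that paragraph (no system registers, MTRRs, MSRs, or caches), and the names `REGISTERS`, `bytes_per_logical_cpu`, and `register_file_bytes` are just labels made up for this illustration. The context counts plugged in at the bottom (2, 128, 324) are the ones that come up in this thread, not anything Intel has announced.

```python
# Application-visible IA-32 registers as listed in the quoted post:
# (name, number of registers, width in bits).
REGISTERS = [
    ("general-purpose", 8, 32),
    ("x87 FPU data", 8, 80),
    ("x87 FPU control", 2, 16),
    ("SSE (XMM)", 8, 128),
]


def bytes_per_logical_cpu(registers=REGISTERS):
    """Sum the listed register widths and convert bits to bytes."""
    total_bits = sum(count * bits for _name, count, bits in registers)
    return total_bits // 8


def register_file_bytes(contexts):
    """Bytes needed if every hardware context gets its own full copy.

    Assumption: one complete architectural register set per context,
    with nothing shared and no system registers counted.
    """
    return contexts * bytes_per_logical_cpu()


if __name__ == "__main__":
    print("per logical CPU:", bytes_per_logical_cpu(), "bytes")
    for contexts in (2, 128, 324):
        print(contexts, "contexts:", register_file_bytes(contexts), "bytes")
```

By this count alone, 128 contexts come to 31,232 bytes (30.5 KB) and 324 contexts to 79,056 bytes (about 77 KB). That leaves out the system registers, MTRRs, MSRs, and any per-context caches mentioned above, so treat it as a floor rather than an estimate of real hardware cost.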
The main reason HT gets any performance gain is that two threads don't
fully utilize the CPU's execution capacity. It is convenient when a cache miss
occurs because one thread can utilize the full capacity of the processor, but
across -most- applications that is rare. Additionally, the processor has the
ability to speculatively fetch data and code.

Cache misses are rare. One of my machines has a BIOS option to disable the
on-chip caches. When I disable them, my 1.2 GHz Thunderbird runs extremely
slowly. Every memory access effectively becomes a cache miss. If you have a
machine with this option, you can try it and see. If cache misses happened
often enough to have a meaningful impact on HT, you wouldn't see a big
difference.

>>HT is merely a way to make the existing hardware more efficient. If it were
>>anything more, it would add -additional- hardware registers so the OS could
>>control the scheduling algorithm and specify the location of the ready queue. It
>>would also add instructions that would allow the processor to switch tasks.
>
>The processor _already_ is doing this. But for processes that are ready to
>run rather than for processes that are long-term-blocked for I/O, etc.

Yes, but the scheduler's job is to pick who runs, when they run, and how long
they run. HT only affects the first by allowing the scheduler to pick two tasks
to run instead of just one. HT isn't replacing the scheduler; it only
complicates it.

FYI, HyperThreading looks like a regular CPU to the operating system. There may
be some means of communicating that it's an HT CPU, but Intel made HT
backward-compatible.

-Matt
Last modified: Thu, 15 Apr 21 08:11:13 -0700