Computer Chess Club Archives

Subject: Re: final note I presume

Author: Matt Taylor

Date: 21:53:04 12/13/02

On December 13, 2002 at 23:03:22, Robert Hyatt wrote:

>On December 13, 2002 at 21:36:57, Matt Taylor wrote:
>
>>On December 13, 2002 at 11:33:17, Robert Hyatt wrote:
>>
>><snip>
>>>Regardless of hand-waving nay-sayers.  It is a logical development of removing
>>>one more
>>>time-critical piece of code from the Operating System into the microprocessor,
>>>namely the
>>>task scheduler.
>>
>>I wouldn't say that. I opened task manager for a second on my NT box and
>>discovered that I have 324 threads running.
>
>You don't have 324 threads running.  You have 324 threads in the system,
>with probably 320 of them blocked long-term waiting on I/O thru a socket or
>whatever.  It is the other _four_ that are interesting...  The threads that
>are really computing.  Not the threads that are doing I/O or just sleeping
>waiting on some event to happen.
>
>
>> For HT to replace the NT task
>>scheduler, it would need the capability to handle at least the 324 concurrent
>>processes I have running now (technically it would need to handle *many* more).
>>I quote the numbers for NT because Unix variants are usually not as prolific
>>with scheduling units, usually because Unix threads/processes aren't very
>>lightweight...
>>
>>
>
>
>See above, that is simply not true.  First, unix supports both types of threads,
>lightweight and heavyweight.  But none of that matters.  The only interesting
>threads from the CPU's perspective are the threads that are not blocked waiting
>on I/O and other stuff.  Just the threads that are ready to compute...
>
>And hyper-threading handles those perfectly...

Yes, some variants handle lightweight threads, but the traditional unit of
scheduling in Unix has always been the process. That has affected the
architecture somewhat, just as NT's lightweight threads mean developers can
throw a thread at any given problem. Perhaps that's why the NT subsystems
inexcusably create over 200 threads at boot time. Anyway, that's all I meant:
NT has a very large scheduling queue.

>>HT will not scale to large numbers of tasks. The IA-32 register set has 8 32-bit
>>general registers, 8 80-bit FPU registers, 2 16-bit FPU control registers, and 8
>>128-bit SSE registers. This means each logical CPU requires 244 bytes of
>>application register file alone. For simplicity, I did not include the 3 groups
>>of system registers, the MTRRs, or the MSRs. There are additional caches which
>>would not allow HT to scale unless they were duplicated. Intel is not about to
>>put MBs of fast-access register file on an IA-32 processor. It would make your
>>128-cpu HT Pentium 5 cost more than a cluster of Itaniums with negligible
>>performance gain over a dual- or quad-Xeon system.
>
>Want to bet?  10 years ago they could not put cache on-chip due to space
>limitations.  That is now gone.  With up to 2mb of L2 cache on chip today,
>they really can do whatever they want in the future.  And there are not
>really duplicate sets of registers.  Just duplicate rename tables which are
>much smaller since they only store small pointers to real registers, rather
>than 32 bit values.

If HT does not have 2 physical sets of registers, what do the rename tables
point to? Intel's docs actually state that each logical CPU has its own
duplicated set of registers.
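
For anyone who wants to check the 244-byte figure from my earlier post, here is
a minimal Python sketch of the arithmetic. The register counts and widths are
the ones I listed above; the function names are just mine for illustration, and
it only counts application-visible registers (no system registers, MTRRs, MSRs,
or physical rename registers).

```python
# Application-visible IA-32 register state per logical CPU, using the
# counts and widths listed in the quoted paragraph above.
REGISTERS = [
    # (name, count, width in bits)
    ("general-purpose", 8, 32),
    ("x87 FPU data", 8, 80),
    ("FPU control", 2, 16),
    ("SSE (XMM)", 8, 128),
]


def bytes_per_logical_cpu(registers=REGISTERS):
    """Sum the size in bytes of every register group."""
    return sum(count * bits // 8 for _, count, bits in registers)


def register_file_bytes(logical_cpus):
    """Application register state needed for a given number of logical CPUs."""
    return logical_cpus * bytes_per_logical_cpu()


if __name__ == "__main__":
    print(bytes_per_logical_cpu())    # 32 + 80 + 4 + 128 = 244
    print(register_file_bytes(2))     # today's two-way HT
    print(register_file_bytes(324))   # one logical CPU per thread on my NT box
```

Keep in mind this is a lower bound on the duplicated state, since everything I
left out of the list would have to be replicated per logical CPU as well.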

Also, it is important to differentiate between registers and L2 cache, which
can take anywhere from 6 to 300 clocks to access. Even the 8 KB L1 cache on the
P4 takes 2 cycles to access. If it were possible to build a chip with such vast
amounts of fast cache memory, there would be no such thing as an on-chip cache
hierarchy; there would be an L1 cache on-chip and -maybe- an L2 cache on the
motherboard.

The L1 caches on Athlon and P4 are a good example of how size limits speed. The
L1 cache on Athlon requires 3 cycles to access. On P4 it takes 2 cycles. Why? P4
has 8 KB of L1 data. Athlon has 64 KB of L1 data.

The other argument worth making is that HT will hit diminishing returns very
quickly. It -may- not even be worth going to quad-HT. The main reason HT gains
any performance at all is that a single thread doesn't fully utilize the CPU's
execution capacity. It is also convenient when a cache miss occurs, because the
other thread can then use the full capacity of the processor, but across -most-
applications that is rare. Additionally, the processor can speculatively fetch
data and code, so cache misses are rare.

One of my machines has a BIOS option to disable the on-chip caches. When I
disable them, my 1.2 GHz Thunderbird runs extremely slowly, because every memory
access effectively becomes a cache miss. If you have a machine with this option,
you can try it and see. If cache misses already happened often enough in normal
operation to have a significant impact on HT, disabling the caches wouldn't make
such a big difference.
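
To put rough numbers on that, here is a small Python sketch using the usual
average-access-time model (hit time plus miss rate times miss penalty). The
3-cycle hit time is the Athlon L1 figure from above. The 300-cycle miss penalty
and the 2% miss rate are assumptions I picked purely for illustration, not
measurements from my machine.

```python
def average_access_cycles(hit_cycles, miss_rate, miss_penalty_cycles):
    """Average memory access time: hit time plus the expected miss cost."""
    if not 0.0 <= miss_rate <= 1.0:
        raise ValueError("miss_rate must be between 0 and 1")
    return hit_cycles + miss_rate * miss_penalty_cycles


HIT_CYCLES = 3            # Athlon L1 latency mentioned above
MISS_PENALTY = 300        # ASSUMPTION: cost of going past the caches
ASSUMED_MISS_RATE = 0.02  # ASSUMPTION: hypothetical miss rate with caches on

# Caches enabled: most accesses hit, a few pay the penalty.
cached = average_access_cycles(HIT_CYCLES, ASSUMED_MISS_RATE, MISS_PENALTY)

# Caches disabled: no hit path at all, every access pays the penalty.
uncached = average_access_cycles(0, 1.0, MISS_PENALTY)

if __name__ == "__main__":
    print("cycles per access, caches on :", cached)
    print("cycles per access, caches off:", uncached)
    print("slowdown factor              :", uncached / cached)
```

The exact ratio depends entirely on the assumed numbers, but the shape of the
result is the point: the slowdown is only large when the normal miss rate is
small.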

>>HT is merely a way to make the existing hardware more efficient. If it were
>>anything more, it would add -additional- hardware registers so the OS could
>>control the scheduling algorithm and specify the location of the ready queue. It
>>would also add instructions that would allow the processor to switch tasks.
>
>The processor _already_ is doing this.  But for processes that are ready to
>run rather than for processes that are long-term-blocked for I/O, etc.

Yes, but the scheduler's job is to pick who runs, when they run, and how long
they run. HT only affects the first by allowing the scheduler to pick two tasks
to run instead of just one. HT isn't replacing the scheduler; it only
complicates it.
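
Here is a toy Python sketch of what I mean. It is not how the NT scheduler
works; it is a plain round-robin over a ready queue, with made-up function
names, using the 324-thread and 4-runnable numbers from earlier in the thread.
The only thing HT changes in this model is the `logical_cpus` argument. Who
runs, when, and for how long is still decided by the software loop.

```python
from collections import deque


def make_threads(total=324, runnable=4):
    """Split thread ids into a ready queue and a set of blocked threads."""
    ready = deque(range(runnable))
    blocked = set(range(runnable, total))
    return ready, blocked


def schedule(ready, logical_cpus, time_slices):
    """Round-robin: each slice, dispatch up to `logical_cpus` ready threads."""
    timeline = []
    for _ in range(time_slices):
        n = min(logical_cpus, len(ready))
        running = [ready.popleft() for _ in range(n)]
        timeline.append(running)
        ready.extend(running)  # preempted threads rejoin the tail of the queue
    return timeline


if __name__ == "__main__":
    ready, blocked = make_threads()
    print(len(blocked), "threads blocked,", len(ready), "ready")
    # Without HT the scheduler picks one thread per slice; with HT, two.
    print(schedule(deque(ready), logical_cpus=1, time_slices=4))
    print(schedule(deque(ready), logical_cpus=2, time_slices=4))
```

Blocked threads never enter the ready queue here, which matches the point that
only the runnable ones matter to the CPU, but the queue itself still lives in
the OS.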

FYI, a HyperThreading CPU looks like a regular CPU to the operating system.
There may be some means of communicating that it's an HT CPU, but Intel made HT
backward-compatible.

-Matt

Re: final note I presume Robert Hyatt 11:10:14 12/14/02
- Re: final note I presume Matt Taylor 02:53:37 12/15/02
  - Re: final note I presume Robert Hyatt 19:01:10 12/15/02
    - Re: final note I presume Matt Taylor 14:47:49 12/16/02
      - Re: final note I presume Robert Hyatt 20:22:32 12/16/02
        
        Re: final note I presume Matt Taylor 02:30:32 12/17/02
        
        Re: final note I presume Robert Hyatt 07:24:55 12/17/02


Last modified: Thu, 15 Apr 21 08:11:13 -0700
