Computer Chess Club Archives



Subject: Re: SURPRISING RESULTS P4 Xeon dual 2.8Ghz

Author: Robert Hyatt

Date: 09:26:24 12/19/02



On December 18, 2002 at 13:59:00, Matt Taylor wrote:

>On December 17, 2002 at 22:43:42, Vincent Diepeveen wrote:
>
>>On December 17, 2002 at 12:50:48, Matt Taylor wrote:
>>
>>>On December 17, 2002 at 12:08:41, Vincent Diepeveen wrote:
>>>
>>>>On December 17, 2002 at 11:50:20, Matt Taylor wrote:
>>>>
>>>>>On December 17, 2002 at 11:25:10, Vincent Diepeveen wrote:
>>>>>
>>>>>>On December 17, 2002 at 10:58:51, Bob Durrett wrote:
>>>>>>
>>>>>>
>>>>>>Indeed you are correctly seeing that DIEP, which also runs well on
>>>>>>cc-NUMA machines, is a very good program from Intel's
>>>>>>perspective, because even a 'second' logical processor on each physical
>>>>>>processor, though it runs slower, will still give it a speed boost,
>>>>>>where other programs simply slow down a lot when you try such toying.
>>>>>>
>>>>>>So for the many programs which will be way slower when running
>>>>>>4 processes/threads on a 2-processor Xeon, the software is the
>>>>>>weak link.
>>>>>>
>>>>>>In the case of DIEP the bottleneck is clearly the hardware. Even
>>>>>>something working great on cc-NUMA doesn't profit much from
>>>>>>the SMT/HT junk from Intel.
>>>>>
>>>>>Clearly? It seems to me that memory is your bottleneck, and logical CPUs
>>>>>obviously don't help you there.
>>>>
>>>>for the SMT/HT the memory isn't my bottleneck at all. the fact that
>>>>it's not 2 real processors but something that has to wait for the
>>>>other each time is the problem.
>>>
>>>Actually it doesn't work like that. The CPU has an existing bandwidth of 3
>>>micro-ops/cycle. It is rare that x86 code utilizes this full bandwidth. Anyway,
>>>HT allows the CPU to run 2 threads literally at the same time. Literally.
>>>
>>>Thread 1 schedules 2 micro-ops but can't fit in the third due to the fact that
>>>its result is dependent on other things currently being computed. Thread 2 says,
>>>"Yay, it's my birthday!" and schedules its next micro-op. When the three
>>>micro-ops retire, the same thing happens again. If thread 1 enters a wait state
>>>(hlt, pause, memory wait, etc.), it's not scheduling any micro-ops. Thread 2 now
>>>has 3 micro-ops/cycle to utilize. Without HT, thread 2 executes a total of zero
>>>micro-ops. With HT, thread 2 executes a total of more than zero micro-ops.
>>>
>>>Now, I am no parallel researcher, but even my parallel code doesn't suffer
>>>overheads so large that it can't gain from HT.
>>
>>bob's optimistic claim about crafty (let's not go into
>>details too much here it's clear what my opinion is and what his is)
>>is that he gets 1.0 + 0.7 speedup for each cpu.
>>
>>So he assumes that each cpu loses 30%. We haven't talked about
>>the loss in nodes per second yet... ...there are different
>>claims about that, depending upon tester and hardware (likewise
>>for the speedup: there is also a 2.8 speedup of crafty measured at
>>4 processors over 30 test sets, and far worse than 1.5 at a dual
>>when the asymmetric code was turned off).
>>
>>So when we talk about a single program, in this case crafty, we already
>>see many different speedups. But even if we use the most optimistic value
>>for crafty's speedup, SMT will still not help it speed up.
>>
>>Get the problem?
>
>No, I don't. The 30% loss is probably due to the fact that you can't keep all
>threads busy all the time. That isn't overhead. If you have a raw gain of 20% in
>HT and figures like Dr. Hyatt's, your HT-enabled program isn't going to go 10%
>slower.


The 30% loss is from "search overhead plus everything else".  IE a two-cpu
search will search a larger tree than a normal serial alpha/beta search on
one processor.  This larger tree means unnecessary work is being done.  The
1.7 number is machine-specific.  In my case, my quad xeon 700
was used to produce it.  Each processor (not counting the first) loses 30%
of its performance due to search overhead, processors sitting idle when they
should be searching, spinlocks, spinwaits, processor/memory interference, you-
name-it.

IE the 1.7X number I quote is the speed difference between running one thread
on a single 700mhz xeon, and running two threads on that same machine, which
can run both in parallel.

Doing this for SMT is much more difficult.  I'd _like_ to be able to test
one cpu (two logical processors) vs two cpus (four logical processors) and
report a speedup there.  But I can't.  I would need to physically remove one
processor and I don't have a cpu terminator block to stick in its place.  If
I try to boot a linux kernel on one processor, it won't see the second logical
processor either.

So, at the moment, I really can't give meaningful 1 vs 2 numbers on my dual
2.8 _yet_.  But I might fix the boot-up code so that it only sees one physical
cpu and try that when I have time.

Comparing two HT cpus to one non-HT cpu makes little sense although it does
make Crafty's overall speedup go over 2.0.  But it is a bogus number.


>
>>I am confronted with 11.4% that i can win. So that would mean
>>that i need 0.9 for each cpu, something which is impossible to get
>>without forward pruning for me (and i'm not using forward pruning
>>currently).
>>
>>I do not claim a linear speedup for DIEP at all.
>
>Now you're not even making sense. Because you gain 11% with Diep that means you
>have a 10% overhead?
>

"making sense" is _not_ a prerequisite for his statements....



>>when i get a great speedup at 2 processors (which DIEP does)
>>that doesn't mean that the same speedup can be achieved times 2
>>at 4 processors. No you lose simply.
>>
>>You keep losing always. A formula like 0.9(n-1)+1 for n=500 or
>>whatever is of course idiotic. It's already idiotic for 4 processors
>>in fact.


As I said before, .5 * (N-1) would be a _wonderful_ result for _any_
sort of algorithm, if it can really produce that speedup.  It _is_ a
linear speedup.  (I gave you the definition of linear earlier, so it
should be clear that it means "straight line" and not "straight line
with a slope of 45 degrees.")



>>
>>In my experiments till 30 processors so far i already concluded
>>that one has to split differently for > 4 processors than for
>>2 and 4 processors.
>>
>>>>>>Though it is a great sales argument, the hard facts (11.4%
>>>>>>speedboost) are not lying.
>>>>>
>>>>>11.4% doesn't lie for chess, or at least for Diep. Intel didn't advertise, "Wow!
>>>>>HT will make your chess programs run faster!" Intel said HT will get an average
>>>>>of 30-40% speed gain across applications on -average-.
>>>>
>>>>That is a typical marketing thing. they compare HT versus HT. So
>>>>2 processes HT versus 4 processes HT instead of  2 processes NON HT versus
>>>>4 processes HT.
>>>>
>>>>If you look to diep's speeds you'll see that
>>>>  181538 4 processes HT is a lot faster than 2 processes HT: 135924 nps.
>>>>
>>>>That's 33.6% speedup.
>>>>
>>>>However it is not a fair comparison. The fair comparison shows an 11.6% speedup.
>>>>
>>>>What was posted from crafty here was the unfair comparison. No fair
>>>>comparison has been posted so far.
>>>>
>>>>Who is testing objectively here?
>>>
>>>You never said what "2 processes" was. Is it one physical CPU with HT or two
>>>physical CPUs without HT?
>>
>>the important test to measure speedup is
>>2 processes at a dual Xeon 2.8Ghz without HT
>>versus 4 processes at a dual Xeon 2.8Ghz with HT turned on.
>>
>>The vast majority of those 30+% speedups are comparisons of
>>2 processes with HT versus 4 processes with HT.

I haven't done that because I can't.  But it _is_ the right comparison.
One physical cpu or two physical cpus.


>
>Ok, but it's a % speed-up, so what's the difference? It means each CPU sped up
>about 30% and the overall system sped up 30%.
>
>He did that test anyway. He got around 20% as I recall.
>

On the bigger (worst-case parallel search positions) test, 20% was about
the number.  On the first test I ran, on 6 test positions, it was 30%.  I
have not yet tried to determine what produced the difference in the two
runs, but different positions definitely produce different results all the
time.



>>>Whether or not it's objective, nobody is going to listen if you don't do a good
>>>job of clearly organizing and reporting your data. You didn't list clock speeds,
>>
>>Look who's talking here.
>>
>>I bet you didn't hear about this yet. You can turn HT OFF and ON
>>in the bios of course, like you can also turn the L1 or L2 cache
>>off and on with some processors. That's what's done here: turning
>>HT on and off. You use of course the *same* system
>>for those tests to compare.
>>
>>Does the bell slowly ring?
>
>I already knew that, and no bells are going off. A Xeon 2.53 GHz acts a lot
>different from a Xeon 2.8 GHz, and they're even more different if you can change
>bus speed. It has always been important with numbers to post exact
>configurations. Otherwise people laugh at you because nobody can reproduce your
>numbers.
>
>>>bus speeds, memory types, chip types, configurations (HT vs. non-HT), or any of
>>>the other important information which is needed for anyone aside from you to
>>>make any sort of decision based on that data.
>>
>>In that case if i were you i would take into account with Bob that data he
>>posts he has invented himself sometimes. See ICCA start of 97. He has
>>publicly admitted here that he has written down the search time data
>>from his own head, in order to obtain the 'right' speedup.
>
>Ad hominem means what here?
>
>>You aren't guessing that i would do such a thing do you?
>>
>>Even the cpu of my big sponsor i'm not going to call 3 times faster than
>>a K7, because it ISN'T!!!!



That is absolutely hysterical.  You were talking about making _false_
statements to impress them when you first broached the subject about not liking
my DTS speedup numbers.  You wanted to claim null-move was the reason CB was
much better than your program.  That was false and was proven false.  So I
don't think "honesty" is in your list of traits, IMHO.



>>
>>I just present the truth and nothing than the truth!
>
>Whatever you present, it doesn't have any credibility because I can't reproduce
>it. That's what science is all about. I'll run your tests on my AthlonMP 1600
>system because I really have -no- idea what system you used. I'll get different
>numbers. Then I'll say that your numbers are invalid because I can't reproduce
>them.
>
>>Data = data. Simple as that. If something is not faster for me, then
>>i cry it out loud. People disbelieve you, except for those 5 emails i got at
>>home from others who also tried SMT/HT on some P4 cpu's which were supposed
>>to have it, saying that it didn't produce a speedup for them either.
>>
>>I do not wonder why they don't post it here then, they get so many
>>reactions from complete idiots that they are amazed.
>
>Maybe they're smart enough to say nothing.

Or maybe they thought SMT/HT would speed up _any_ program, which it most
certainly will not, if the program is not threaded.

>
>Now, Dr. Hyatt has posted his system configuration along with the place you can
>buy it. Anyone with $4,600 can buy the same system, run the same tests, and
>duplicate his data.
>
>You even got a speed-up of 11.4%. You have said this many times. Yet HT doesn't
>work?
>
>>What i wonder most about is that my testing is pretty accurate.
>>
>>Forward pruning in DIEP or no forward pruning in diep?
>
>That's pretty irrelevant.

It is _completely_ irrelevant, in fact...


>
>>>>>>So they need to press 2 cpu's which results in a cpu price
>>>>>>2 times higher *at least* than an AMD cpu, the result
>>>>>>is that you win 11.4% in speed.
>>>>>
>>>>>Intel has always charged astronomical prices for their latest CPUs. HT isn't
>>>>>driving the price up. Intel doesn't like losing profits.
>>>>
>>>>>In 6 months, the Pentium 4 3.06 GHz will be in the $200-$300 range just like the
>>>>>Pentium 4 2.53 GHz is now. A year from now, it will cost $100-$200. Five years
>>>>>from now, it will be on keychains.
>>>>
>>>>>>Though i am not a hardware engineer, i can imagine the problems
>>>>>>they had getting this to work.
>>>>
>>>>>Yes, they had to build a mux and duplicate some components. The infrastructure
>>>>>has been there for the past 5 years.
>>>>
>>>>>>Instead of a P4-Xeon cpu clocked at 2.8Ghz which can split itself
>>>>>>into 2 physical processors, i would have preferred a P3-Xeon cpu
>>>>>>which splitted itself into 2 real processors (so each having its
>>>>>>own L1 and L2 caches) clocked at 2.0Ghz.
>>>>>
>>>>>They had trouble clocking the Pentium 3 above 1 GHz. It's been run at
>>>>>frequencies from 150 MHz (the slowest Pentium Pro that I recall ever seeing, but
>>>>>perhaps not the slowest) all the way up to 1.4 GHz. A design only scales so far.
>>>>>Wouldn't it be nice if you could buy 3 GHz Athlons? Athlon just won't run at 3
>>>>>GHz. Pentium 4 does because it's designed to. Pentium 3 wasn't even designed to
>>>>>hit 1.4 GHz; it wouldn't go much further anyway.
>>>>
>>>>Athlon only recently is converted to 0.13
>>>
>>>Yes, last February or so.
>>>
>>>>the reason why the P4 clocks so high is because they use such a small
>>>>L1 cache and a small trace cache (though compared to the data cache it's
>>>>huge).
>>>
>>>No. By that logic, the 386 which has -no- cache should clock higher than all of
>>>them.
>>
>>>The P4 clocks high because it has a deep pipeline. Circuits have latency. It is
>>>small, but it is there. You can only run a circuit so fast because the signals
>>>need time to propagate from one end of the gate to the other. Shrinking the
>>>circuit shrinks the traces, allowing the signals to get there faster. The end
>>>result is a CPU that can clock higher. There are two ways to do this: shrinking
>>>the process and lengthening the pipeline. Doing less work per cycle seems
>>>counter-intuitive, but they get around that by having more gates do less work
>>>per gate.
>>
>>Not being an expert on hardware: How deep is the pipeline of the McKinley
>>actually? It's very interesting for me to know what the potential
>>win could be from profile info needed for reordering!
>>
>>>>What i dislike a lot is the huge branch misprediction penalty. I'm not
>>>>lying when i claim that diep could be sped up 2 times at the P4 if the
>>>>p4 did not have such a very bad branch misprediction penalty.
>>>
>>>Branch mispredictions almost never occur in well-written code. In poorly-written
>>>code, they're easy to get. The Intel C compiler estimates the probability that a
>>>branch will be taken and schedules it so the CPU will guess correctly as often
>>>as possible.
>>
>>This is a completely idiotic viewpoint on software. I get the impression you
>>never wrote a byte of complicated software. Branches happen
>>everywhere in searching software.
>>
>>In fact searching is only taking branches. All other instructions are just
>>overhead to take the right branch!
>>
>>Branch prediction is the keyword to faster cpu's in the future, the
>>pipelines are so deep and like a real layman i'll guess that in future
>>they might even get deeper, which means branch prediction is everything.
>>
>>If you search for the holy grail you'll have to try many doors before
>>you have it!
>
>Haha... I've written a number of benchmarks, some distributed programs, an
>operating system, etc. etc. I am currently writing a binary-level optimizer and
>self-optimizing programs. I've never written complicated software?
>
>Please read what I wrote. I said well-written software rarely MISPREDICTS. This
>is a claim that I'm fairly confident that I can make. It is rare that a day goes
>by when I am not writing assembly, and often times I find myself reading raw
>machine code.


For anyone that reads computing literature, your statement wouldn't raise any
eyebrows at all.  For someone that knows very little about program design,
compiler design, and software engineering principles, of course they won't get
your point.  Anybody can write bad code.  The opposite is also true if they
work at it.



>
>There is one exception to my statement -- emulators mispredict often, and there
>is nothing they can do about it.
>
>>>Branch mispredicts affect the Athlon, Pentium 3, K6, and Pentium processors as
>>>well. Believe it or not, an original Pentium can branch mispredict. It hurts
>>>with deeper pipelines because you have to flush the entire pipeline and do all
>>>that work over again. This is why Intel and AMD have both put extensive effort
>>>into making adaptive branch prediction counters. If there is any cycle to be
>>>found in a branch, their algorithms will find it. I looked at the algorithms
>>>in-depth a year ago, and I was quite impressed with the one on the P6 core.
>>
>>I am not. I find it a very pathetic form of predicting algorithms.
>>
>>People were very excited at the time by the 21264 branch prediction.
>>a 2 table concept. But in computerchess we already know it for 20 years
>>or so for predicting move ordering. Killertables we call it.
>
>Check your facts. The P6 uses a two-table approach, too.

Sorry, but he _never_ has any "facts".  Every fact is spelled "opinion".

>
>>Most pathetic is that AMD doesn't have its own compiler team (not that
>>i know at least).
>>
>>>>also 1 decoder for new instructions i do not understand at all.
>>
>>>Because the trace cache caches the decoded output. I don't understand how they
>>>get any performance out of that, but they obviously do somehow.
>>
>>small programs fit easily in 12k trace cache which gets executed
>>at 3 instructions at a time from trace cache. DIEP doesn't fit in
>>trace cache though. the win will be from loops i guess. I have many
>>general coded loops in DIEP.
>>
>>Where others code white and black, i have in for example my move
>>generator a loop that works both for black and white. that happens
>>a lot too in evaluation of course though at a different order.
>>
>>IN such a way i can imagine things are in trace cache. trace
>>cache is of course great for loops.
>>
>>Patterns you can evaluate cheaply because you can evaluate them
>>logarithmically (meaning that you only need to evaluate a few patterns
>>out of the total, without losing any clocks to the others, as a general
>>pattern cuts them away).
>>
>>diep uses (wastes according to Frans Morsch though i get impression
>>nowadays he's evaluating mobility too) a lot of system time
>>doing things like mobility. that code you keep looping over the
>>same code a lot of course.
>>
>>I guess here is where the P4 wins back terrain, of course directly
>>losing it there when branch mispredictions occur.
>>
>>How much clocks penalty does P4 give when a misprediction occurs
>>within code that still is in the trace cache?
>
>It's still large, but I don't know the figures.
>
>>>>Basically the P4 is a cpu where inefficient coding is getting rewarded.
>>>>
>>>>If you code very bad and need a lot of extra variables and instructions
>>>>to get something done then the number of branches get kept relatively
>>>>lower than a very efficient program which is doing a few instructions
>>>>but can't prevent a branch there because other code needs execution.
>>>
>>>...what? I've been doing x86 optimization for several years now, and I've never
>>>come across anything that claims the number of instructions means anything
>>>besides code size.
>>
>>you do not seem to realize how big the code is for a chessprogram like
>>DIEP.
>>
>> if( general pattern ) {
>>   .. skip all the patterns and code here ..
>> }
>>
>>get the idea slowly?
>
>OK, so you're trying to claim that more efficient checks reward the P4. Yes. It
>rewards all CPUs, so what's your point? It's good coding in general.
>
>>>It is the -clever- code that avoids branches. Branches hurt on any x86 CPU, not
>>>just Pentium 4. It takes heavily tweaked code to make the Pentium 4 run
>>>efficiently. I don't see how inefficient coding is getting rewarded by the
>>
>>all chessprogrammers know this already for like 8 years at least.
>>
>>You really think they know little from software?
>>
>>the best programmers in the world are amongst the chessprogrammers!
>>
>>Of course everyone focussed upon removing the number of branches!
>>
>>But you can't avoid them. chess is branches simply. it's the same thing.
>>
>>You want to chose between move A or B. So that's already a branch
>>in itself!


But those are a _different_ kind of branch.  The alpha was the first cpu to
eliminate many branches with a conditional move.  Others eliminated the flags
register, as that is a pipeline bottleneck.  You can ask questions _and_ answer
them without a single branch.

a = (x == y) ? b : c;

is an example...





>
>You want to select the minimum between X and Y. A branch? No. There are many
>algorithms that make choices without branching.

and many special-purpose instructions that were designed so that the compiler
could optimize-away the branches.


>
>>>Pentium 4 since the Athlon blows away the Pentium 4 in unoptimized code.
>>
>>Guess why chess software, without exception, runs faster on the K7 than on the P4.
>>
>>Note that the K7 isn't holy at all. It has a few weak spots which
>>show clearly in the code produced by compilers nowadays. It's hard to
>>find modern compilers that do very well for the K7.
>>
>>the free gcc 3.2.1 is producing fastest code for diep at the k7.
>>that's in itself a bloody shame that a free compiler is producing
>>the best code!
>
>Not really; as an open-source compiler, K7 gurus can contribute optimizations to
>it. Most other compilers are closed-source, and people can't contribute. When
>Intel is still standard, why would someone market a compiler for AMD?
>
>>>>Replacing branches by extra instructions is simply not possible anymore,
>>>>because already when the pentiumpro came out, i already started slowly
>>>>avoiding branches whenever i could. I had that thing around end of 1996
>>>>if memory serves me well.
>>>
>>>?
>>>
>>>I was writing MMX code just yesterday to simulate long arithmetic including a
>>>64-bit x 64-bit multiply. MMX does not support conditional branching. You have
>>>to play cute little games such as, "When x = 0, y = -1, so I'm going to subtract
>>>y from x."
>>
>>MMX is not so fast on the P4 compared to the K7 or P3. Why would they
>>have chosen that? I do not understand it at all!
>
>This is news to me...
>The only slower units on a P4 that I found were the FPU and floating-point SSE
>(marginally slower).
>
>>>I've seen non-branching implementations for all sorts of basic functions such as
>>>popcount, min, max, and abs. In fact, there was a lengthy thread on this forum a
>>>couple weeks ago about optimal popcount/bitscan functions. The branching one
>>>came in dead last.
>>
>>to give a random pattern (non-existent, but illustrating the
>>branching problem):
>>
>>if( board[sq_a2] == bishop && OwnsQueen[side] && !OwnsQueen[xside]
>> && KingSafetyTop5[white] <= 5 && (row(sq) >= 3 || (sq&1)) ) {
>>  score += KINGSAFETY_BISHOPBAD_A2;
>>  VERBOSEEVAL(side,"King safety from %s is bad because of bla bla\n",
>>   KINGSAFETY_BISHOPBAD_A2,sq);
>>}
>>
>>How to *ever* remove all these branches?
>
>score += KINGSAFETY_BISHOPBAD_A2 * (board[sq_a2] == bishop && OwnsQueen[side] &&
>!OwnsQueen[xside] && KingSafetyTop5[white] <= 5 && (row(sq) >= 3 || (sq & 1)));
>
>If conditional evaluates to false, in integer context it is 0. Otherwise you
>make it 1. Multiply by the safety constant and add. Is that so hard?

For some it seems nearly impossible.  :)


>
>>Note that the first condition is very unlikely. A bishop on a2 doesn't
>>happen a lot. So that's why i would put this first in this pattern.
>>
>>This would be in my top10 of the easiest patterns in my chessprogram.
>>
>>Of course the bishop on d3 being on 1 forever as the pattern that
>>with branch scores bigtime.
>>
>>I hope you also realize another aspect of game programming:
>>readability is very important. If i cannot read my own patterns
>>then i can stop working on DIEP.
>>
>>I know a few patterns in diep which with some very artificial
>>toying would get unreadable most likely but remove 1 or 2 useful branches.
>>
>>Like in the above example the bishop on a2 is very unlikely, optimizing
>>many variables to a single one is simply not very clever from
>>readability viewpoint.
>>
>>>>>>That would have kicked anything of course from speed viewpoint as
>>>>>>it scales 1 : 1.2 to a K7 (k7 20% faster for each Ghz than the P3).
>>>>>>
>>>>>>Now we end up with a very expensive cpu which is 1 : 1.4 and a bad
>>>>>>working form of HT/SMT.
>>>>>>
>>>>>>So it's not DIEP having a problem here. But the hardware very clearly.
>>>>>>Intel optimistically claims 20% speed boost here and there. Others
>>>>>>claim 11% for database applications.
>>>>>>
>>>>>>I see 11.4% for DIEP. So that's a market conform viewpoint.
>>>>>>
>>>>>>The not so amazing thing of this all is that a 2.8Ghz Xeon being not
>>>>>>deliverable yet here is very expensive (even a 3.06Ghz P4 is already 885
>>>>>>euro in the shops here also not yet deliverable) and the MP2200 which
>>>>>>DOES get offered for sales here is 290 euro. the fastest Xeon i see
>>>>>>getting offered socket 603 is a 2.0Ghz Xeon for 829 euro at alternate.nl
>>>>>>
>>>>>>a dual motherboard for the P4 i see here is several:
>>>>>>  789 euro for a dual xeon motherboard called: 860d pro (msi)
>>>>>>  549 euro for a tyan S2720GN is by far the cheapest i see
>>>>>>
>>>>>>then you gotta buy ecc registered DDR ram for it.
>>>>>>
>>>>>>a dual motherboard for K7 i see at the same alternate.nl is:
>>>>>>  259 euro for A7M266-D/U
>>>>>>  299 euro chaintech 7KDD (dual; U-DMA/133 RAID en sound)    AMD-762MPX
>>>>>>  289 euro tiger MPX S2466N-4M
>>>>>>
>>>>>>The last mainboard (tiger) for sure needs registered DDR ram. but lucky
>>>>>>not ECC ram.
>>>>>
>>>>>AMD is always cheaper than Intel for the same level of performance.
>>>>
>>>>if you look how huge that P4 chip is compared to the AMD chip it is not
>>>>a miracle either.
>>>>
>>>>knowing AMD has just 1 0.13 factory versus intel a lot it is not a miracle
>>>>either that in the future this will remain the same.
>>>
>>>The P4 is more expensive to produce because Intel needs wider wafers and gets
>>>lower yields. However, it does not cost them $700 per chip.
>>
>>costs of a processor are not only production costs!!
>>
>>>>>Also, I own a TigerMPX S2466N-2M (only difference being that they don't mind
>>>>>telling me to eat a PCI slot for USB). At one point I only had 1 256 MB
>>>>>unregistered/non-ECC DIMM because my other 512 MB unregistered/non-ECC DIMM had
>>>>>failed. I finally replaced both with a single 1 GB Registered/ECC DIMM.
>>>>>
>>>>>If anyone wants to send me a digital camera, I'll take pretty pictures of the
>>>>>BIOS screens, my unregistered DIMM, and a working TigerMPX system on
>>>>>unregistered ram.
>>>>
>>>>Not all unregistered DIMMs work in a system requiring registered
>>>>DIMMs. I can give you the names of 3 persons with problems with a Tiger
>>>>(not sure they had MPX chipset though but the older tiger MP760 chipset
>>>>i guess) who after a few days had severe stability problems with it and
>>>>weird crashes each week or so.
>>>
>>>Weird crashes = user problem or bad memory. I've had 2 out of my 4 DDR SDRAM
>>>DIMMs go bad. I've yet to see bad RDRAM (I have also seen very little of it),
>>>and I had -1- bad SDRAM DIMM once.
>>
>>in all cases registered ddr ram worked.
>
>Ok, and that eliminates the possibility of bad ram?
>TigerMP manual says nothing about unregistered DIMMs, so it's possible that it
>won't accept them.
>
>>>>>>It is amazing how many professors and others still throw away money
>>>>>>to get that dual 2.8Ghz P4 which is over 2 times more expensive than
>>>>>>AMD dual at the moment is.
>>>>>
>>>>>Money grows on trees for some people. It is amazing how my coworkers convinced
>>>>>management to purchase machines with Radeon 9700 Pro graphics cards for "work."
>>>>>These cards were 20% of the cost of the whole machine at around $350 USD per
>>>>>card.
>>>>
>>>>right ;)
>>>
>>>I double-checked my estimation and $350 / $2000 = 17.5%, so I was pretty close.
>>>Again, I'd take pretty pictures, but I don't have a digital camera.
>>>
>>>I would also offer an explanation for the ATI Radeon 9700 Pro in my work
>>>machine, but I can't fathom the logic myself. "Let's put the best video card
>>>money can buy in their machines and ask them not to use it to play games."
>>>Right.
>>
>>the best equipment for my personnel is what i basically would use there.
>
>And a high-end graphics card accomplishes what in software reverse-engineering
>research? Is it worth paying $350 instead of $30 for a graphics card that
>contributes nothing to our research?
>
>-Matt




Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.