Computer Chess Club Archives


Search

Terms

Messages

Subject: Re: Population of disjoint Attacksets

Author: Gerd Isenberg

Date: 03:27:47 05/29/04

Go up one level in this thread


On May 28, 2004 at 17:13:44, Dieter Buerssner wrote:

>>Nice, is it faster?
>
>I did not test. I guess, your method will be a bit faster. I had hoped, you will
>try it out. I really do not have an application, that could use a population
>count for 10 64-bit words. You seem to have one ...

Ok, i postpone my saturday morning business a bit and i did a dumb loop test on
my Athlon32 box, msc6 speed optimization.

The first time reported includes determineAttackCounts but only one popCount
instead of four:

only popCount inlined:

disjointAtta[0] 0x00261E2752F60985
disjointAtta[1] 0x22972E1520AD7E1D
disjointAtta[2] 0x28D2779416DD6DC4
disjointAtta[3] 0x0476011950393E31
disjointAtta[4] 0x22F166AD0BB53958
disjointAtta[5] 0x51F07C9354976532
disjointAtta[6] 0x4819052B70D108C0
disjointAtta[7] 0x25FD7E16098E024E
disjointAtta[8] 0x0348489B420B52F5
disjointAtta[9] 0x5C3B314930A80363
==================================
attackCounts[0] 0x4F835E6EBC07F91E
attackCounts[1] 0x5F6024302D5B024A
attackCounts[2] 0x20EF7F9E523D7FF5
attackCounts[3] 0x0010000100800000
time determineAttackCounts = 73.345 sec/10**9 runs
time 8,4,2,1-popCount      = 99.924 sec/10**9 runs
time Dieter's pc_10bb      = 85.292 sec/10**9 runs
pc1=-1298309357 pcg=-2126923520 pcd=-2126923520


determineAttackCounts, popCount and pc_10bb inlined:

disjointAtta[0] 0x00261E2752F60985
disjointAtta[1] 0x22972E1520AD7E1D
disjointAtta[2] 0x28D2779416DD6DC4
disjointAtta[3] 0x0476011950393E31
disjointAtta[4] 0x22F166AD0BB53958
disjointAtta[5] 0x51F07C9354976532
disjointAtta[6] 0x4819052B70D108C0
disjointAtta[7] 0x25FD7E16098E024E
disjointAtta[8] 0x0348489B420B52F5
disjointAtta[9] 0x5C3B314930A80363
==================================
attackCounts[0] 0x4F835E6EBC07F91E
attackCounts[1] 0x5F6024302D5B024A
attackCounts[2] 0x20EF7F9E523D7FF5
attackCounts[3] 0x0010000100800000
time determineAttackCounts = 28.941 sec/10**9 runs
time 8,4,2,1-popCount      = 54.028 sec/10**9 runs
time Dieter's pc_10bb      = 85.122 sec/10**9 runs
pc1=-1298309357 pcg=-2126923520 pcd=-2126923520

Hmm, those dump loop tests are really not reliable!
Should i try performance counter and cpuid?

Anyway determineAttackCounts(10) takes 34 (6*5 + 2*2) bitwise 64-bit
instructions that are 68 32-bit instructions! And on x86-32 there are a lot of
store/loads with local vars. That code cries for 64-bit and a few more registers
as well.

One 64-bit popcount takes about 21 32-bit instructions.
But the O(log(n)) effect of using determineAttackCounts is probably smaller, due
to the number of odd,maj pairs (10,4 32-bit instructions) increases
exponentially as well, eg. for saving 6 popcounts it takes eight odd/maj pairs.

My primary intention was not aggregated bitcounting per se, but to get those
2**0,2**1,2**2 and 2**3 digit bitboards, where each bit on the same bitposition
represents a 4-digit number from 0..10 in this case.

Cheers,
Gerd

Here the main routine to produce the above results.

#define MAX_ITERATIONS 1000000000 // 10**9

int main(int argc, char* argv[])
{
	BitBoard disjointAtta[10]; // up to 15
	BitBoard attackCounts[4];
	clock_t start0, stop0, start1, stop1, start2, stop2;
	int i, pc1 = 0, pcg = 0, pcd = 0;

	initDisjointAttacks(disjointAtta);
	for (i = 0; i < 10; ++i)
	{
		pc1 += popCount(disjointAtta[i]);
		printf("disjointAtta[%d] 0x%08X%08X\n", i,
		high(disjointAtta[i]), low(disjointAtta[i]));
	}

	start0 = clock();
	for (i = 0 ; i < MAX_ITERATIONS; i++)
	{
		determineAttackCounts(attackCounts, disjointAtta);
		pc1 += popCount(attackCounts[3]);
		disjointAtta[0]++;
	}
	stop0 = clock();

	initDisjointAttacks(disjointAtta);

	start1 = clock();
	for (i = 0 ; i < MAX_ITERATIONS; i++)
	{
		determineAttackCounts(attackCounts, disjointAtta);
		pcg += popCount(attackCounts[0]);
		pcg += popCount(attackCounts[1])*2;
		pcg += popCount(attackCounts[2])*4;
		pcg += popCount(attackCounts[3])*8;
		disjointAtta[0]++;
	}
	stop1 = clock();

	initDisjointAttacks(disjointAtta);

	start2 = clock();
	for (i = 0 ; i < MAX_ITERATIONS; i++)
	{
		pcd += pc_10bb(disjointAtta);
		disjointAtta[0]++;
	}
	stop2 = clock();


	printf("==================================\n");
	for (i = 0; i <  4; ++i)
	{
		printf("attackCounts[%d] 0x%08X%08X\n", i,
		high(attackCounts[i]), low(attackCounts[i]));
	}

	printf("time determineAttackCounts = %.3f sec/10**9 runs\n", (float)(stop0 -
start0) / CLOCKS_PER_SEC);
	printf("time 8,4,2,1-popCount      = %.3f sec/10**9 runs\n", (float)(stop1 -
start1) / CLOCKS_PER_SEC);
	printf("time Dieter's pc_10bb      = %.3f sec/10**9 runs\n", (float)(stop2 -
start2) / CLOCKS_PER_SEC);
	printf("pc1=%d pcg=%d pcd=%d\n", pc1, pcg, pcd);

	getchar();
	return 0;
}




This page took 0 seconds to execute

Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.