2007-09-08

Magic popcount (popcnt) command

From Frank de Groot blog:
Every serious hacker sooner or later needs the popcount instruction.
This "population count" instruction counts the set bits in a register, and is so useful that the NSA demands that all computers they purchase implement it in hardware.


But this command is not present at x86 architecture. AMD64 documentation (p. 188) supports popcount extension, but I never heard of any processor that implements this feature.

You may wonder why would one need popcount instruction. I needed it for a very simple task. I wanted to change data structure from bitfield to list of bit positions. Basically I needed something like this:

0x1001 -> [0, 12]
0xF000 -> [12,13,14,15]
The simplest solution is to iterate over every bit and check if it's set. More complicated algorithm could count number of set bits and index possible results using this number. I wonder if that approach is faster. It's not so simple to say, because low level performance issues could have great role in this problem.

I wonder what is the fastest possible method for this problem. I bet that popcount instruction is the key to efficient implementation.

I found some interesting popcount implementations in software.
GCC approach, sum bits for every byte:
const UQItype __popcount_tab[] =
{
0,1,1,2,1,2,2,3,1,2,2,3,2,3,3,4,1,2,2,3,2,3,3,4,2,3,3,4,3,4,4,5,
1,2,2,3,2,3,3,4,2,3,3,4,3,4,4,5,2,3,3,4,3,4,4,5,3,4,4,5,4,5,5,6,
1,2,2,3,2,3,3,4,2,3,3,4,3,4,4,5,2,3,3,4,3,4,4,5,3,4,4,5,4,5,5,6,
2,3,3,4,3,4,4,5,3,4,4,5,4,5,5,6,3,4,4,5,4,5,5,6,4,5,5,6,5,6,6,7,
1,2,2,3,2,3,3,4,2,3,3,4,3,4,4,5,2,3,3,4,3,4,4,5,3,4,4,5,4,5,5,6,
2,3,3,4,3,4,4,5,3,4,4,5,4,5,5,6,3,4,4,5,4,5,5,6,4,5,5,6,5,6,6,7,
2,3,3,4,3,4,4,5,3,4,4,5,4,5,5,6,3,4,4,5,4,5,5,6,4,5,5,6,5,6,6,7,
3,4,4,5,4,5,5,6,4,5,5,6,5,6,6,7,4,5,5,6,5,6,6,7,5,6,6,7,6,7,7,8,
};

int __popcountSI2 (UWtype x)
{
UWtype i, ret = 0;
for (i = 0; i < W_TYPE_SIZE; i += 8)
ret += __popcount_tab[(x >> i) & 0xff];
return ret;
}
Method based on subtraction, from GnuLib:

int popcount(unsigned int x){
int pop;
for (pop = 0; x != 0; pop++)
x &= x - 1;
return pop;
}
AMD64 Optimization Guide (p. 137): (they have also mmx version)
unsigned int popcount(unsigned int v) 
{
unsigned int retVal;
__asm {
MOV EAX, [v] ;v
MOV EDX, EAX ;v
SHR EAX, 1 ;v >> 1
AND EAX, 055555555h ;(v >> 1) & 0x55555555
SUB EDX, EAX ;w = v - ((v >> 1) & 0x55555555)
MOV EAX, EDX ;w
SHR EDX, 2 ;w >> 2
AND EAX, 033333333h ;w & 0x33333333
AND EDX, 033333333h ;(w >> 2) & 0x33333333
ADD EAX, EDX ;x = (w & 0x33333333) + ((w >> 2) & 0x33333333)
MOV EDX, EAX ;x
SHR EAX, 4 ;x >> 4
ADD EAX, EDX ;x + (x >> 4)
AND EAX, 00F0F0F0Fh ;y = (x + (x >> 4) & 0x0F0F0F0F)
IMUL EAX, 001010101h ;y * 0x01010101
SHR EAX, 24 ;population count = (y * 0x01010101) >> 24
MOV retVal, EAX ;store result
}
return (retVal);
}
There is also interesting topic aboutpopcount at programming.reddit.com. Yet another source, John-Mark Gurney tips about popcount implementaion.

0 comments: