[libvirt] [PATCH 3/3] tests: deterministichash: Make hash tables arch-independent

Michal Privoznik mprivozn at redhat.com
Tue Aug 15 13:03:07 UTC 2017


On 08/15/2017 02:01 PM, Peter Krempa wrote:
> On Mon, Aug 14, 2017 at 20:46:10 -0400, John Ferlan wrote:
>> On 08/03/2017 04:50 AM, Peter Krempa wrote:
> 
> [ trimmed off-topic part ]
> 
>> NB: I didn't dig into the qemumonitorjsontest algorithm, but...
>>
>> While going through the common object changes, I ended up looking at and
>> thinking about the hash table algorithms, initially it was just the
>> "grow" algorithm as I think it is a bit "aggressive" (perhaps wasteful)
>> since as soon as 1 bucket chain exceeds 8 elements, the table is resized
>> by 8 unless the new size is 8 * 2048 - at which time no matter how full
>> a bucket is we don't resize the table
> 
> Resizing the table is very expensive, so it makes sense to scale it
> aggressively. At some point, though, it would take too much memory.
> 
> If some place in the code expects a massive hash table, it can set the
> table size to > 2048 manually.
> 
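For reference, the growth policy being discussed can be sketched roughly as follows. This is a minimal illustration only, assuming the factor-of-8 growth and the 8 * 2048 cap quoted above; the function and constant names are hypothetical, not libvirt's actual virhash code:

```c
/* Hedged sketch of the growth policy described in the quoted text:
 * grow the table by a factor of 8 whenever one bucket chain exceeds
 * 8 entries, but never beyond 8 * 2048 buckets. Illustrative names,
 * not the real libvirt implementation. */
#include <stddef.h>

#define BUCKET_CHAIN_LIMIT 8
#define GROW_FACTOR 8
#define MAX_TABLE_SIZE (8 * 2048)

static size_t
hash_next_size(size_t cur_size, size_t longest_chain)
{
    if (longest_chain <= BUCKET_CHAIN_LIMIT)
        return cur_size;               /* no resize needed yet */
    if (cur_size >= MAX_TABLE_SIZE)
        return cur_size;               /* growth is capped */
    size_t next = cur_size * GROW_FACTOR;
    return next > MAX_TABLE_SIZE ? MAX_TABLE_SIZE : next;
}
```

Under this policy a table jumps from, say, 32 to 256 buckets the moment any chain reaches 9 entries, which is why resizing so rarely happens more than a handful of times.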
>> The next thing I considered was using a prime number as the table bucket
>> size, and it seems just doing that will cause qemumonitorjsontest to
> 
> What would be the benefit of such change? I don't really see how prime
> numbers would improve anything performance-wise.

Because if your hash function is stupid it can cluster all the keys
together. A prime table size, however, shares no common factor with any
stride in the keys, so clustering is much harder to achieve. Of course,
you can construct a hash function that is utterly degenerate (e.g.
{return 4;}) and then all of our optimizations are thrown out of the
window. But using a prime number for the table size is advised in the
literature too; the Wikipedia article mentions it as well:
https://en.wikipedia.org/wiki/Hash_table#Choosing_a_hash_function

Michal



