|
8f8bc0dcf3bc253ae49159d52db049767f476ced "Move string hash function into String
class" had introduced a new getHash64 that, besides returning sal_uInt64 instead
of just sal_Int32, didn't do sampling of only a handful of characters, but
always computed the hash over all characters (as the usage in SfxItemSet and
SdPage appears to require for either performance or approximated correctness).
However, it would be advantageous to keep the stable URE interface as small as
possible. Now, O(1) sampling was apparently considered state of the art when
the rtl string classes were first created, closely copying java.lang.String,
which at that time demanded sampling for hashCode(), too---but never sampling
more than 15 characters, with the obvious (in hindsight, at least) performance
catastrophes, so they changed it to O(n) somewhere along the way.
Based on that, this commit changes the existing hash functions to not do
sampling any more, and removes the newly introduced -64 variants again. (Where
the extended value range of sal_uInt64 compared to sal_Int32 was hopefully not
vital to the existing uses.)
The old implementation used sampling only for strings of length >= 256, so I did
a "make check" build with an instrumented hash function that flagged all uses
with inputs of length >= 256, and grepped workdir/{Cppunit,Junit,Python}Test for
hits. Of the 2849 hits encountered, 2845 where in the range from 256 to 295
characters, and only the remaining four where of 2472 characters. Those four
were from CppunitTest_sc_subsequent_filters_test, importing long text into a
cell, causing ScDocumentImport::setStringCell to call
svl::SharedStringPool::intern, which internally uses an unordered_set. These
results appear to justify the change.
Change-Id: I78fcc3b0f07389bdf36a21701b95a1ff0a0d970f
|