Optimize RedisValue / Equals#3116
Conversation
|
One word: Marvin You may already know, but "Marvin" is the randomising voodoo inside most recent string hashing / equality implementations that prevents dictionary attacks using precalculated predictable known values to force collisions (hash-flooding). Marvin introduces entropy per-process, preventing that. It is hugely desirable to retain Marvin semantics by default. Longer version: if you're using BLOBs of any non-trivial value, you're unlikely to be using them as keys on dictionaries, or otherwise performing equality checks, so honestly: IMO: don't worry about it, it doesn't adversely affect you. You're probably just slinging BLOBs around, (de)serializing them, etc. Perhaps what we really want here is a couple of pre-rolled Note: it would actually not be a bad idea to try to retain some semblance of Marvin in a BLOB-based implementation, but that's a stretch goal. |
|
I suggest separating the tasks. Hash calculation depends on the algorithm, and UTF-8 hash differs from Unicode hash. However, this doesn't affect the Equals and StartsWith methods. |
|
Another question is, why don't you use Marvin in RedisKey? |
|
Fair question. I suspect there's a lot of validity in the suggestion here. We need to be careful, but: string -> byte is at least well defined, where-as byte -> string isn't (in the general case). I might see if I can drag the private Marvin core over - I did the same when over at aspnet for netfx, so I'll see if I can find it. |
|
The method |
|
The GetHashCode question turned out to be complicated, so I suggest not considering it in this PR |
|
Happy to keep looking at this, but I think it needs some planning and thought, and when we're talking about "optimize", proving that might we worthwhile; I've (separately) done some benchmark comparing System.Text.Encoding.UTF8 with System.Text.Unicode.Utf8 - summary: they use the same core, and it isn't worth getting excited about the delta (which would also need an OOB package for down-level), so: don't touch that. The core idea is sound. |
|
Ok. I understand. It's not a matter of principle. Am I closing the merge? |
|
What about the bug in the method // BUG if Not ASCII
if (s.Length < value.Length) return false; // not enough characters to matchTest failed: var str = "моя строка";
RedisValue orig = str;
Assert.That(orig.StartsWith(str), Is.EqualTo(orig.StartsWith(Encoding.UTF8.GetBytes(str)))); |
|
Don't close it - and yes, the bug needs fixing; I'm just setting expectations that this is going to need some full attention. |
I suggest converting the string to utf8 bytes and comparing the blobs. The tests work.
My position is that if a person uses strings, let their productivity suffer, not the one who chose the blobs.
I understand that the hash of a string and the hash of a string inside a RedisValue are different, but I don't think this is necessary.
I also found a bug in the
StartsWith(ReadOnlySpan<byte> value)method for non-ASCII characters.The test below is failing.