Jay Taylor's notesback to listing index
PHP :: Bug #70644 :: trivial hash complexity DoS attack[web search]
Original source (bugs.php.net)
Clipped on: 2016-08-09
|php.net | support | documentation | report a bug | advanced search | search howto | statistics | random bug | login|
go to bug id or search bugs for
[2015-10-05 15:34 UTC] email@example.com
Description: ------------ As reported here: https://bugs.debian.org/800564 by brian m. carlson: Applies to all PHP versions. PHP uses the DJB "times 33" hash to hash strings in its hash tables, without the use of any secret key. Hash values are therefore the same between multiple invocations. As a result, it's trivial to precompute a set of values that all hash to the same bucket and cause positively abysmal performance. If a script accepts untrusted hash keys, such as from JSON input, it is subject to a DoS attack. PHP implemented the max_input_vars option, but this is not effective in the general case, especially in the era of JSON-laden POST requests. Perl, Python, and Ruby have all addressed their CVEs properly, but PHP has not and as a result is still vulnerable. Cloning my example repository and running "php scripts/exploited.php < example/1048576.json" demonstrates the problem very quickly. The similar Perl and Python scripts are not vulnerable to this attack. A JSON file containing only 65536 entries takes PHP 5.6 22 seconds to process. A new CVE should probably be allocated and the bug should be fixed correctly this time, probably by seeding a key from /dev/urandom and using SipHash-2-4 or the like. Python had CVE-2012-1150 and CVE-2013-7040. Ruby had CVE-2011-4815. I can't find a CVE for Perl's 2003 fix, if one exists. The fix, which went into 5.8, was incomplete and was addressed by CVE-2013-1667.  https://github.com/bk2204/php-hash-dos Test script: --------------- https://github.com/bk2204/php-hash-dos
[2015-10-06 07:26 UTC] phpmpan at mpan dot pl
Note: there already exists a PHP extension for SipHash . Unfortunely of unknown licence, but it is based on public domain version , which also may be used as a quick fix. Both of them linked from Aumasson's website.  <https://github.com/jedisct1/siphash-php>  <https://github.com/floodyberry/siphash>
[2015-10-06 07:39 UTC] firstname.lastname@example.org
The main problem with switching to SipHash is that hashtables in PHP very often use integer keys. While we could probably start using SipHash for string keys (we need to test the performance impact here), I don't think it is realistic for integer keys. I don't know if there's some alternative, very cheap cryptographic hash for that case. The simplest "solution" to this problem I can think of is to keep track of collisions and bail out if some function of the number of collisions and hashtable elements becomes too large. It would likely be enough to count collisions only during rehashes and check them at the end of the operation, so there should be no performance concerns here.
[2015-10-06 11:26 UTC] email@example.com
Looks like they has a same problem in Rust language and the solution was to use FNV hashing: https://github.com/rust-lang/rust/issues/10586 https://en.wikipedia.org/wiki/Fowler%E2%80%93Noll%E2%80%93Vo_hash_function Perhaps that would a way to go for PHP as well?
[2015-10-06 11:32 UTC] firstname.lastname@example.org
Further investigation shows that at least in Python it was shown that FNV hashing is not enough to protect against collision attacks and Python has converted everything to SipHash-2-4: https://www.python.org/dev/peps/pep-0456/
[2015-10-07 16:24 UTC] email@example.com
Here's a branch for testing siphash (doesn't use a key, just perf experiment): https://github.com/php/php-src/compare/master...nikic:siphash From a few quick tests, SipHash doesn't seem to have significant impact on our standard benchmarks (though I didn't do instruction counts). On a synthetic benchmark (inserting English dict in a hashtable) I got something like 10% slowdown. So for string hashes, this is an option. The integer hash problem stays though...
[2015-10-07 16:39 UTC] firstname.lastname@example.org
Also, the idea about counting collisions during rehash won't work either. It won't be able to distinguish between few collisions on many elements and many collisions on few elements. So for size=8 both keys 0 1 2 3 8 9 10 11 and 0 1 2 3 8 16 24 would result in 4 collisions, but only the latter kind would be problematic (for larger sizes of course). We'd have to do per-hash-slot counts, which would no longer be cheap.
[2015-10-08 09:45 UTC] email@example.com
-Package: *Encryption and hash functions +Package: Scripting Engine problem
[2015-10-08 09:45 UTC] firstname.lastname@example.org
Here is an implementation for collision counting: https://github.com/php/php-src/compare/master...nikic:collisionCount Collisions are counted during insert operations, if they exceed a certain level, you get a fatal error. From some initial testing, this doesn't seem to have significant perf impact. One problem is that this approach doesn't work with add_new, so I had to rewrite some code to move away from it.
[2015-10-08 09:56 UTC] email@example.com
One way to potentially make the integer hashing feasible, is to introduce an intermediary step between packed hashes and unpacked ones: Packed hashes only work if the numeric array is both dense and sorted ascending by key. We could add another level that works for dense numeric arrays in general, but does not require a certain key order. In this case the data array would contain the elements in arbitrary order and the hash array would map key to order. With that extra level, we'd only start hashing keys for sparse numeric arrays. Allowing integer hashes at all will require us to switch to a key union. This may negatively affect array iteration.
[2015-10-08 22:52 UTC] firstname.lastname@example.org
Just looked at the HHVM implementation. Looks like they are also doing collision counting during find-phase of insert operations: http://lxr.php.net/xref/OTHER_IMPLEMENT/hiphop-2.0/hphp/runtime/base/array/hphp_array.cpp#555 It took a while, but here's an implementation of using siphash for everything (including integers): https://github.com/php/php-src/compare/master...nikic:integerHash
[2015-10-10 16:14 UTC] email@example.com
Just got around to doing some microbenchmarks. For switching everything to siphash (incl integer hashes): SipHash Baseline Packed int insert: 16.9 mus +/- 0.3 mus 17.1 mus +/- 0.1 mus -1.0% Reverse int insert: 49.0 mus +/- 0.4 mus 17.3 mus +/- 0.2 mus +182.7% Reverse int insert x16 collisions: 54.0 mus +/- 1.1 mus 43.3 mus +/- 0.2 mus +24.6% Incrementing string insert: 62.6 mus +/- 0.7 mus 44.8 mus +/- 0.3 mus +39.6% This means unpacked integer hashtable is about 3x slower. The break-even point with the previous implementation is somewhere between 16x and 32x collisions per bucket. For very short strings w/o hash cache we have 40% slowdown. It's unlikely that this will be deemed acceptable even with more optimization work (like the semi-packed variant mentioned above). So we should probably go with the collision counting during insert.
Copyright © 2001-2016 The PHP Group
All rights reserved.
|Last updated: Mon Aug 08 19:01:39 2016 UTC|