Looks like someone managed to extract a copy of LinkedIn’s password hash database today.
It didn’t take me long to get my hands on a copy of the file. Here’s what I found:
- The file contains password hashes, but not user IDs. I guess it’s intended to demonstrate that the hashes got exfiltrated and that users pick weak passwords, rather than to compromise actual users.
- The hashes are SHA-1 without a salt. Forgetting to salt password hashes is a novice mistake. I’m surprised LinkedIn did that, and in their shoes I would remediate it immediately. A simple way to do so would be to rehash each user’s password at the next successful login (the one moment the server actually sees the plaintext), using a 64-bit random salt.
- There are 6.5 million hashes in the file…
- It looks like the file has passed through “sort|uniq” — i.e., each hash appears exactly once. Nothing to learn here about password popularity, alas.
- Since the hashes were not salted, I was able to write a script to SHA-1 hash every password in a common dictionary and search the file for matches. That took all of about 10 minutes and yielded almost 20,000 passwords. No big surprise there (out of 6.5M different passwords, 20,000 are common words).
- I’m sure that if I spent some time with it, I could do what Crack used to do: add digits at the front and back, apply various capitalization rules, and generally permute my dictionary to get many more hits. Probably not worth my while though, since the bit that would really make it interesting is missing (the frequency of occurrence of each unsalted hash).
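For the curious, the dictionary search described above can be sketched in a few lines of Python. This is not the script I actually ran, just a minimal illustration. It assumes the leaked file holds one hex-encoded SHA-1 digest per line, and the `mangle` rules are a small made-up sample of the kind of permutations Crack applied, not Crack's real rule set. All function names are hypothetical.

```python
import hashlib


def load_hashes(lines):
    """Collect hex digests (one per line) into a set for fast membership tests."""
    return {line.strip().lower() for line in lines if line.strip()}


def sha1_hex(word):
    """Unsalted SHA-1 of a candidate password, as a lowercase hex string."""
    return hashlib.sha1(word.encode("utf-8")).hexdigest()


def mangle(word):
    """Yield the word plus a few simple variants (illustrative rules only)."""
    for base in (word, word.capitalize(), word.upper()):
        yield base
        for digit in "0123456789":
            yield base + digit  # digit appended at the back
            yield digit + base  # digit prepended at the front


def crack(words, leaked):
    """Return {digest: password} for every candidate whose hash is in the leak."""
    found = {}
    for word in words:
        word = word.strip()
        if not word:
            continue  # skip blank dictionary lines
        for candidate in mangle(word):
            digest = sha1_hex(candidate)
            if digest in leaked:
                found[digest] = candidate
    return found
```

To try it, pass an open file of hashes to `load_hashes` and an open word list to `crack`. Because there is no salt, each candidate needs to be hashed only once and can then be checked against all 6.5 million entries with a single set lookup, which is why the whole run takes minutes.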
So lessons learned from all this:
- Vendors: use best practices. Protect your password DB and salt password hashes.
- Users: assume that your logins on at least some public web sites will eventually be compromised to some degree. Some people are screaming “change your password!” or “choose a strong password!” but really, you should just keep in mind how much damage a compromise of any given account would do. Personally, I wouldn’t care all that much if my LinkedIn account got compromised, but others might be more concerned than me.
- It doesn’t look like anyone really malicious was involved here (they didn’t publish the user IDs, after all), but you have to wonder what would have happened if the attacker really had been malicious.
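As for the vendor-side fix, the rehash-at-login idea might look roughly like the sketch below. It assumes a user record is a plain dict with hypothetical `salt` and `hash` fields, where `salt` is `None` for accounts that still carry the old unsalted SHA-1. It sticks to salted SHA-1 with a 64-bit salt as described above; a deliberately slow function such as `hashlib.pbkdf2_hmac` could be swapped in at the same spot.

```python
import hashlib
import hmac
import os

SALT_BYTES = 8  # 64-bit random salt


def salted_hash(salt, password):
    """SHA-1 over salt followed by password, as a hex string."""
    return hashlib.sha1(salt + password.encode("utf-8")).hexdigest()


def login(record, password):
    """Check a password and upgrade legacy unsalted hashes on success.

    `record` is a hypothetical user row: {"salt": bytes or None, "hash": hex str}.
    """
    if record.get("salt") is not None:
        # Already migrated: verify against the salted hash.
        actual = salted_hash(record["salt"], password)
        return hmac.compare_digest(actual, record["hash"])

    # Legacy account: verify against the old unsalted SHA-1.
    legacy = hashlib.sha1(password.encode("utf-8")).hexdigest()
    if not hmac.compare_digest(legacy, record["hash"]):
        return False

    # Successful login is the one time we hold the plaintext, so rehash now.
    record["salt"] = os.urandom(SALT_BYTES)
    record["hash"] = salted_hash(record["salt"], password)
    return True
```

With a per-user salt, the same password no longer produces the same hash for two users, so an attacker has to run the dictionary once per account instead of once for the whole file.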