One of my pet peeves when talking with other programmers is when they use the wrong terminology. The mix-up I run into most often is "encryption": most of the time people are not encrypting at all, but hashing. And yes, there is a distinction.
What Hashing Is
Hashing is a one-way process for figuring out whether two things are the same. You take an object (file, string of text, ISO) and convert it to a fixed-length string. You can then compare that hash against the hash of something else to see if the two are the same thing.
The most common example of this is with large downloads. All the major Linux distributions will give an MD5 hash for their downloads so that you can verify that the file was not corrupted during transmission. You can run the ISO file through an MD5 generator, which will give you back a 32-character string. If that string matches the MD5 the distributor published (Canonical, in the case of Ubuntu), you have a match and your ISO is good.
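A minimal sketch of that check in PHP, assuming the file has already been downloaded. The temporary file and its contents are stand-ins for a real ISO, and the "published" hash is computed in the script only so the example is self-contained; in practice it would be copied from the distributor's download page.

```php
<?php
// Stand-in for a downloaded ISO (hypothetical contents, for illustration only).
$contents = 'pretend this is a very large ISO image';
$path = tempnam(sys_get_temp_dir(), 'iso');
file_put_contents($path, $contents);

// In practice this value is copied from the distributor's download page.
$publishedMd5 = md5($contents);

// md5_file() hashes the file's contents and returns a 32-character hex string.
$actualMd5 = md5_file($path);

if ($actualMd5 === $publishedMd5) {
    echo "Hashes match, the download is intact\n";
} else {
    echo "Hashes differ, the download is corrupted\n";
}

// Clean up the stand-in file.
unlink($path);
```

Note that this only tells you the two files are the same; it never turns the hash back into the file.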
In programming, one of the most common uses for hashing is passwords. In this case it is done for two reasons:
You never, ever store passwords in plaintext in a database
You should never care what the user's password is
Since the second reason means you never need to turn the hash back into the password, hashing is a light-weight alternative to encryption while still providing security.
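// Hash the submitted password so it can be compared to the stored hash
$password = md5($_POST['password']);
// Escape the username before it goes into the query to prevent SQL injection
$username = mysql_real_escape_string($_POST['username']);
$result = mysql_query("SELECT username, password FROM user_accounts WHERE password = '$password' AND username = '$username'");
if (mysql_num_rows($result) == 1) {
    echo 'Found a proper account!';
} else {
    echo 'Invalid username or password';
}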
Salts
No, not Bacon Salt. A salt is an additional bit of text you add to something before hashing it, which keeps someone from cracking your hashes by looking them up in a precomputed table of common passwords.
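// A long, secret string that is mixed into every password before hashing
$salt = 'ThisIsAReallyLongAndSecureString';
$hash = md5($_POST['password'] . $salt);
// The unsalted and salted hashes of the same password are different
echo md5($_POST['password']) . ' != ' . $hash;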
Salts are only useful if you are trying to use Hashing for security reasons, like storing passwords. If you are looking for just verification then a salt is useless. If you are using hashing for security purposes, ALWAYS USE A SALT.
Proper Hashing
Prior to PHP 5.1.2, the two most common ways to hash something were the built-in functions for MD5 and SHA1. Because MD5 and SHA1 are both considered insecure, I don't recommend using them for security, even with a salt. They are very easy to use, though, and should work on every single platform:
$sha1 = sha1('This is a string');
$md5 = md5('This is a string');
As of PHP 5.1.2, PHP ships with a proper hashing library that lets you take advantage of more powerful algorithms. It is turned on by default, but it can be disabled if PHP is compiled by hand.
You can find out what algorithms are installed on a system by running the hash_algos() function. This returns an array of different algorithms that are installed on the system and that are available for use. These are also the names that can be used with the companion function hash(). So, if you wanted to store a password in the database the proper way using a hash, you would do something like the following:
// Our Salt
$salt = 'SuperSecretSaltNoOneWillEverFindOutAbout';
// The user-supplied, unhashed password and username
$usPassword = $_POST['password'];
$sUsername = mysql_real_escape_string($_POST['username']);
// Generate a SHA-384 hash using our salt
$sPassword = hash('sha384', $usPassword . $salt);
// Save it to the DB
mysql_query("UPDATE user_accounts SET password = '$sPassword' WHERE username = '$sUsername'");
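Checking a login later works the same way: hash the submitted password with the same salt and compare the result to the stored value. Here is a rough sketch, assuming the stored hash has already been read from user_accounts. The verifyPassword() helper and the sample passwords are hypothetical, and hash_equals() requires PHP 5.6 or later.

```php
<?php
// Hypothetical helper: re-hash the attempt with the same salt and compare.
function verifyPassword($attempt, $salt, $storedHash)
{
    // Make sure the algorithm is available on this build of PHP.
    if (!in_array('sha384', hash_algos(), true)) {
        throw new RuntimeException('sha384 is not available');
    }
    $attemptHash = hash('sha384', $attempt . $salt);
    // hash_equals() compares the two strings in constant time (PHP 5.6+).
    return hash_equals($storedHash, $attemptHash);
}

$salt = 'SuperSecretSaltNoOneWillEverFindOutAbout';
// Stand-in for the value saved in the password column.
$storedHash = hash('sha384', 'correct horse' . $salt);

var_dump(verifyPassword('correct horse', $salt, $storedHash)); // bool(true)
var_dump(verifyPassword('wrong guess', $salt, $storedHash));   // bool(false)
```

At no point does the code need to know what the original password was, which is the whole point of hashing it.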
Fine, what is Encryption then?
If Hashing is a one-way street, Encryption is two-way. You would use encryption whenever you need to safely store information but need to retrieve it later. (Again, you almost never need to know what a user's password is, so why use encryption?) Encryption is also a heavier action than hashing.
Say we had a table, user_accounts, which had the following fields:
id
username
password
name
social_security_number
address_street
address_city
address_state
address_zip
cc_number
cc_type
cc_cvv
cc_exp_date
We would obviously want to hash the password. ID, username, and name we probably don't need to do anything to. Everything else, though, we would want to encrypt: all of that information can be used in a very destructive manner if the database is stolen, but we still need to use that data. Encryption would allow us to store that information in the database and sleep at night, yet still pull it back up when we needed it.
Keys
Keys, unlike salts in hashes, are required. The same key is used for both encryption and decryption, so if you use the wrong key you will not get back what you stored. Like a salt, you'll want to store the key somewhere safe, such as a file with limited read access.
For security reasons you will also want a key rotation plan. Hashing is not designed to turn its output back into something useful, but encryption is designed for exactly that, so if someone gets hold of the encryption key it is trivial to decrypt the data.
Every set number of days you will want to decrypt all the data using the old key and re-encrypt it with a new key. Yes, it is time consuming, but that is the tradeoff for using encryption and staying secure.
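A rotation pass might look roughly like the following. This is only a sketch under several assumptions: it uses PHP's OpenSSL functions with AES-256-CBC rather than Mcrypt, the encryptWith(), decryptWith() and rotateKey() helpers are hypothetical, the keys are derived from passphrases purely for brevity, and a plain array stands in for the rows you would actually read from and write back to the database.

```php
<?php
// Hypothetical helpers for this sketch; AES-256-CBC via OpenSSL is an assumption.
function encryptWith($plaintext, $key)
{
    $iv = openssl_random_pseudo_bytes(openssl_cipher_iv_length('aes-256-cbc'));
    $raw = openssl_encrypt($plaintext, 'aes-256-cbc', $key, OPENSSL_RAW_DATA, $iv);
    // Keep the IV with the ciphertext so the value can be decrypted later.
    return base64_encode($iv . $raw);
}

function decryptWith($encoded, $key)
{
    $data = base64_decode($encoded);
    $ivLength = openssl_cipher_iv_length('aes-256-cbc');
    $iv = substr($data, 0, $ivLength);
    $raw = substr($data, $ivLength);
    return openssl_decrypt($raw, 'aes-256-cbc', $key, OPENSSL_RAW_DATA, $iv);
}

// Decrypt every value with the old key and re-encrypt it with the new one.
function rotateKey(array $rows, $oldKey, $newKey)
{
    $rotated = array();
    foreach ($rows as $id => $encrypted) {
        $plaintext = decryptWith($encrypted, $oldKey);
        if ($plaintext === false) {
            throw new RuntimeException("Could not decrypt row $id with the old key");
        }
        $rotated[$id] = encryptWith($plaintext, $newKey);
    }
    return $rotated;
}

// 32-byte keys derived from passphrases (a simplification for the example).
$oldKey = hash('sha256', 'OldSecretKey', true);
$newKey = hash('sha256', 'NewSecretKey', true);

// Stand-in for the cc_number column of user_accounts, keyed by id.
$rows = array(
    1 => encryptWith('4111111111111111', $oldKey),
    2 => encryptWith('5500005555555559', $oldKey),
);

$rows = rotateKey($rows, $oldKey, $newKey);
echo decryptWith($rows[1], $newKey), "\n";
```

In a real application the loop would run inside a transaction, so that a failure halfway through does not leave some rows on the old key and some on the new one.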
Encryption Made Easy
PHP can do encryption as well as hashing. I personally prefer using the Mcrypt library that is available on most Linux systems, but PHP can also use OpenSSL. Encryption is a fairly heavyweight process and requires a bit of setup to get going. I've created a simple class for setting up and using Mcrypt at http://code.google.com/p/tws-code/ .
$key = 'SuperSecretKey';
$twsMcrypt = new Tws_Mcrypt($key);
$encryptedString = $twsMcrypt->encrypt('This is a secret message');
$unencryptedString = $twsMcrypt->decrypt($encryptedString);
Tws_Mcrypt can also take a second constructor argument to specify the type of encryption to use; otherwise it defaults to RIJNDAEL_128. It also encodes the encrypted string in Base64 to make storage easier, so if you encrypt something using Tws_Mcrypt and try to decrypt it directly with Mcrypt, make sure you Base64 decode it first.
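Mcrypt was deprecated in PHP 7.1 and removed from core in PHP 7.2, so on a current install the same round trip can be approximated with OpenSSL. The sketch below is not the Tws_Mcrypt implementation. The encryptValue() and decryptValue() helpers, the choice of AES-256-CBC, and deriving the key by hashing a passphrase are all assumptions made to keep the example short.

```php
<?php
// Hypothetical helpers; AES-256-CBC via OpenSSL is an assumption of this sketch.
function encryptValue($plaintext, $key)
{
    $ivLength = openssl_cipher_iv_length('aes-256-cbc');
    $iv = openssl_random_pseudo_bytes($ivLength);
    $raw = openssl_encrypt($plaintext, 'aes-256-cbc', $key, OPENSSL_RAW_DATA, $iv);
    // Store the IV alongside the ciphertext; Base64 makes it easy to store.
    return base64_encode($iv . $raw);
}

function decryptValue($encoded, $key)
{
    $ivLength = openssl_cipher_iv_length('aes-256-cbc');
    $data = base64_decode($encoded);
    $iv = substr($data, 0, $ivLength);
    $raw = substr($data, $ivLength);
    // openssl_decrypt() returns false when decryption fails.
    return openssl_decrypt($raw, 'aes-256-cbc', $key, OPENSSL_RAW_DATA, $iv);
}

// 32-byte keys derived from passphrases (a simplification for the example).
$key = hash('sha256', 'SuperSecretKey', true);
$wrongKey = hash('sha256', 'NotTheKey', true);

$encryptedString = encryptValue('This is a secret message', $key);

// The right key gives the message back; the wrong key does not.
var_dump(decryptValue($encryptedString, $key) === 'This is a secret message');
var_dump(decryptValue($encryptedString, $wrongKey) === 'This is a secret message');
```

Because a fresh random IV is used on every call, encrypting the same message twice produces two different Base64 strings, and both decrypt to the same text with the right key.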
"Unless you are a cryptanalyst, don't do your own crypto"
One final comment on hashing and encryption. The algorithms used in either case are tried and tested, hard to crack and, in some cases, extremely sophisticated. People much, much smarter than us have taken the time to develop them, and neither you, nor I, nor anyone else who is not a cryptanalyst will do better.
Don't think that the nifty encryption algorithm you came up with last night while you were in the zone will be any better than the ones that ship in PHP or in libraries like Mcrypt or OpenSSL. It won't be and will only put you at risk.
So which to use?
Use Hashing when:
You don't care what the actual data is
You just need to do verification
You need something extremely light-weight
Use Encryption when:
Data needs to be secured, but pulled back up later
Keep all this in mind next time you are talking to someone about how your application works. Don't tell them you are encrypting passwords in the database when what you are really doing is hashing them. Don't think that using straight MD5 is a viable means for data security.
Hopefully this will also help you choose which type of security to use based on what you are doing.