by Richard Fant
The Rise
MD5 (message digest version 5) was developed in 1991 and is still very popular today, with a wide range of commercial and government applications. MD5 is used to generate hash values of passwords stored on a system as opposed to storing the passwords in plain text. This password protection method was used by many popular commercial websites such as LinkedIn, eHarmony, and LastFM. In addition, many government agencies originally adopted MD5 for official use.
How it Works
If you take a large set of numbers and apply mathematical operations to it to reduce the large set to a much smaller value, those operations are collectively called a hashing function. More precisely, in computer science, a hash function is any function that can be used to map data of arbitrary size to fixed-size values. The values returned by a hash function are called hash values, hash codes, digests, or simply hashes.
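As a minimal sketch of "arbitrary size in, fixed size out", the snippet below uses Python's standard hashlib module to hash a short input and a much longer one. The helper name md5_hex and the sample inputs are illustrative assumptions, not part of the original article.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Return the MD5 digest of data as a hexadecimal string."""
    return hashlib.md5(data).hexdigest()


short_input = b"hello"
long_input = b"x" * 1_000_000  # one million bytes

for label, data in (("short", short_input), ("long", long_input)):
    digest = md5_hex(data)
    # An MD5 digest is always 128 bits, i.e. 32 hexadecimal characters,
    # regardless of how large the input is.
    print(label, len(data), "bytes ->", digest, f"({len(digest) * 4} bits)")
```

Both lines of output show a 32-character digest, even though the inputs differ in size by a factor of 200,000.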
A typical use of hashing functions is to verify the integrity of files after a file transfer. For example, a person wishing to transfer a document called File A over the internet would first hash the contents of File A into a value representing File A. At the destination, the newly arrived file, call it File A’, is similarly hashed into a value representing File A’. The two hash values are compared. If both values are the same, then File A’ is the same as File A which means the transfer was successful and no damage occurred.
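The file-transfer check described above could be sketched roughly as follows. The function names file_md5 and transfer_ok, the chunk size, and the temporary files standing in for File A and File A’ are assumptions made for illustration.

```python
import hashlib
import os
import tempfile


def file_md5(path: str, chunk_size: int = 65536) -> str:
    """Hash a file's contents in chunks so large files need not fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def transfer_ok(source_path: str, received_path: str) -> bool:
    """Return True when both files hash to the same MD5 value."""
    return file_md5(source_path) == file_md5(received_path)


# Demo: two temporary files stand in for File A and the received File A'.
with tempfile.TemporaryDirectory() as tmp:
    file_a = os.path.join(tmp, "file_a.txt")
    file_a_prime = os.path.join(tmp, "file_a_prime.txt")
    for path in (file_a, file_a_prime):
        with open(path, "wb") as f:
            f.write(b"contents of File A")
    print(transfer_ok(file_a, file_a_prime))  # identical contents

    # Simulate damage in transit by appending one byte to the received copy.
    with open(file_a_prime, "ab") as f:
        f.write(b"!")
    print(transfer_ok(file_a, file_a_prime))  # contents now differ
```

In practice the sender publishes the hash value alongside the file, and the receiver computes the hash of the downloaded copy and compares the two.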
As a cryptographic hash function, MD5 is designed to be a one-way function: it should be extremely difficult to reverse engineer the output to determine the input. One of the most common ways to attack a one-way function is to run a brute-force search over possible inputs until one of them generates the same specific output. Strictly speaking, matching a given output is called a preimage attack, while a hash collision is any pair of different inputs that hash to the same value; the term “hash collision” is used loosely here to cover both. The security strength of a hash function is measured by how difficult it is to find a hash collision.
How is it Used
MD5 is frequently used as a hashing function for passwords. For example, a user’s LinkedIn password such as “MyPasswordIsGood!” could be put into a hash function which would generate a 128-bit hash value starting with something like “7A07C” (the actual hash value would be longer, but is shortened here for convenience). This hashed password could be stored on the LinkedIn website. Whenever the user logged into the website with their plain text password, it would be hashed and then compared with the stored hash. If they matched, the user was granted access. This process of hashing the password means that simply stealing hashed passwords from the website is insufficient to gain access. It also means that the user’s plain text password is never stored on the website itself, which increases overall security. However, there is a weakness in the process: the previously mentioned hash collision.
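The store-then-compare login flow described above might look something like the sketch below. The stored_hashes dictionary is a hypothetical stand-in for a website's user database, and register and login are illustrative names; this mirrors the scheme the article describes rather than any particular site's actual code.

```python
import hashlib
import hmac

# Hypothetical stand-in for the website's user database:
# it maps a username to the MD5 hash of that user's password.
stored_hashes = {}


def md5_hex(password: str) -> str:
    """Return the MD5 digest of a password as a hexadecimal string."""
    return hashlib.md5(password.encode("utf-8")).hexdigest()


def register(username: str, password: str) -> None:
    """Store only the hash; the plain text password is never saved."""
    stored_hashes[username] = md5_hex(password)


def login(username: str, password: str) -> bool:
    """Hash the submitted password and compare it with the stored hash."""
    expected = stored_hashes.get(username)
    if expected is None:
        return False
    # compare_digest compares the two hex strings in constant time.
    return hmac.compare_digest(expected, md5_hex(password))


register("alice", "MyPasswordIsGood!")
print(login("alice", "MyPasswordIsGood!"))  # correct password
print(login("alice", "WrongPassword"))      # incorrect password
```

Note that the database only ever sees the output of md5_hex, which is why any other input producing the same hash would be accepted by login as well.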
A hash collision occurs when two different input values generate the same output value. In the above example, suppose that “MyPasswordIsGood!” generated “7A07C” as output. A collision would mean that another input such as “TqBfjO7#DB” actually hashes to the same value “7A07C”. This means an attacker would not have to know the original plain text password to gain access to a website. Instead, using brute force, an attacker could run billions or trillions of random input values through the MD5 hash function until they saw the expected output “7A07C”. The attacker could then access the website using the second input value “TqBfjO7#DB”.
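To make the brute-force idea concrete, the sketch below searches for a second input whose MD5 hash begins with the same characters as the hash of the original password. It is a toy under stated assumptions: only the first 4 hexadecimal characters (16 bits) are compared so the search finishes quickly, whereas a full MD5 value has 128 bits and vastly more possible outputs. The candidate format "guess-N" and the function names are hypothetical.

```python
import hashlib


def md5_prefix(text: str, length: int) -> str:
    """Return the first `length` hex characters of the MD5 hash of text."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()[:length]


def find_matching_input(target_prefix: str, original: str,
                        max_tries: int = 5_000_000):
    """Try candidate inputs until one hashes to the target prefix.

    Returns the matching candidate, or None if max_tries is exhausted.
    """
    length = len(target_prefix)
    for n in range(max_tries):
        candidate = f"guess-{n}"  # hypothetical candidate format
        if candidate != original and md5_prefix(candidate, length) == target_prefix:
            return candidate
    return None


original = "MyPasswordIsGood!"
target = md5_prefix(original, 4)  # shortened hash, as in the article's example
match = find_matching_input(target, original)

print("target prefix:", target)
print("second input with the same prefix:", match)
```

Each extra hexadecimal character compared multiplies the expected number of tries by 16, which is why the length of the hash value matters so much for this kind of search.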
With only 128 bits for the size of its hash value, the probability of having two MD5 hash values accidentally colliding is approximately 1.47 × 10^-29. Given today’s computing power, an MD5 collision can be generated in a matter of seconds. This was the downfall of MD5.
The Fall
MD5 runs fairly quickly and has a simple algorithm which makes it easy to implement. The main weakness with MD5 is that it is relatively easy to generate hash collisions using today’s computer technologies.
In 2005, security researchers announced that MD5 should no longer be considered secure, after an experiment showed that a collision-generating algorithm running on a standard notebook PC could produce an MD5 hash collision in 8 hours. However, MD5 was so deeply embedded in applications and websites that many considered it too expensive to discontinue its use, since doing so would necessitate rewriting code for thousands of applications.
That attitude began to change when several major corporations began reporting security breaches in systems where MD5 was used. For example, in June 2012, LinkedIn announced that 6.4 million hashed passwords had been leaked to a Russian website and that many of those MD5-hashed passwords had been reverse-engineered using brute force to find their matching input strings. In the same month, Microsoft reported that a new piece of malware, called Flame, was taking advantage of the hash collision security flaw in MD5 to generate a counterfeit digital certificate. This forged certificate convinced Windows operating systems that the Flame malware was a legitimate Microsoft product and should be allowed through the firewall. This allowed the malware to bypass many anti-virus programs and install itself on Windows-based PCs.
As recently as 2019, nearly 15 years after the publication of the flaws of MD5, one quarter of content management systems used in websites still used MD5 for password hashing.
Wrapping Up
According to Moore’s Law, the computational power of a personal computer is predicted to double approximately every two years. This means the computer used in the 2005 attack on MD5 was 2^7 (128) times as powerful as one built in 1991, when MD5 was released. A computer in 2020 is 2^14 (16,384) times as powerful as a 1991 model. When MD5 was released in 1991, its users did not take this exponential increase in computing power into account, which led to an overabundance of confidence in the security of MD5.
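The figures above come from counting two-year doublings since 1991: 7 doublings by 2005 and 14 whole doublings by 2020. A small sketch of that arithmetic follows, assuming whole doubling periods only; the function name growth_factor is illustrative.

```python
def growth_factor(start_year: int, end_year: int,
                  doubling_period_years: int = 2) -> int:
    """Return how many times more powerful a computer is predicted to be,
    counting only whole doubling periods between the two years."""
    doublings = (end_year - start_year) // doubling_period_years
    return 2 ** doublings


print(growth_factor(1991, 2005))  # 7 doublings
print(growth_factor(1991, 2020))  # 14 whole doublings
```

This treats Moore’s Law as a simple rule of thumb for computing power, as the article does, rather than as a precise measurement.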
Final Thoughts
Using MD5 to verify a file hasn’t been corrupted or damaged is a reasonable use of this hash function. Using MD5 to generate the hash value of passwords is a security breach waiting to happen.