Hash Functions Explained: MD5, SHA-256, and When to Use Each

Understand cryptographic hash functions, their differences, and appropriate use cases. Learn why MD5 is obsolete for security and when SHA-256 is overkill.

Loopaloo Team | November 10, 2025 | 14 min read

Every time you log into a website, verify a downloaded file, or use a blockchain, hash functions are doing invisible work. They're one of the most fundamental primitives in software engineering — a deceptively simple concept (turn input into a fixed-size fingerprint) with profound implications for security, data integrity, and system design.

What Hash Functions Actually Do

A hash function takes an input of arbitrary size — a single character, a novel, a 4GB video file — and produces a fixed-size output called a hash, digest, or fingerprint. The same input always produces the same output, but the output reveals nothing about the input. And crucially, even the smallest change to the input produces a completely different output.

"Hello, World!"  → dffd6021bb2bd5b0af676290809ec3a53191dd81c7f70a4b28688a362182986f
"Hello, World"   → 03ba204e50d126e4674c005e04d82e84c21366780af1f43bd54a37816b6ab340
"hello, world!"  → 68e656b251e67e8358bef8483ab0d51c6619f3e7a1a9f0e75838d41ff368f728

Notice that a tiny edit to the input (removing the exclamation mark, or lowercasing the two capital letters) produces an entirely different SHA-256 hash. This is called the avalanche effect, and it's a deliberate design property. It prevents attackers from deducing information about the input by studying the hash.

The Properties That Make Hashes Useful

A cryptographic hash function provides five guarantees, each enabling different applications.

Determinism means that the same input always produces the same output, regardless of when or where you compute it. This is what makes checksums work — if you hash a file and get the same result as the published hash, you know the file hasn't been altered.

Speed means that computing the hash is efficient for any input size. A single modern CPU core can hash hundreds of megabytes per second with SHA-256, and more than a gigabyte per second when hardware SHA instructions are available. This makes it feasible to verify large files, hash millions of database records, or compute checksums in real time.

Pre-image resistance means that given a hash output, it's computationally infeasible to find the input that produced it. If someone gives you the hash dffd6021..., you can't work backward to discover it came from "Hello, World!". You'd have to try every possible input until you found a match — a process so expensive for a good hash function that it's effectively impossible.

Collision resistance means that it's infeasible to find two different inputs that produce the same hash. Since a hash function maps an infinite input space to a finite output space, collisions must exist mathematically, but a good hash function is designed so that actually finding one is computationally infeasible.

Avalanche effect means that a one-bit change in the input changes, on average, half the bits in the output. This ensures that similar inputs don't produce similar hashes, which would leak information about the relationship between inputs.
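The avalanche effect is easy to observe directly. The following is a minimal sketch using Python's standard hashlib module; the helper names flip_lowest_bit and differing_bits are made up for this illustration. It flips a single input bit and counts how many of the 256 output bits change.

```python
import hashlib


def flip_lowest_bit(data: bytes) -> bytes:
    """Return a copy of data with the lowest bit of its last byte flipped."""
    return data[:-1] + bytes([data[-1] ^ 0x01])


def differing_bits(a: bytes, b: bytes) -> int:
    """Count the bit positions at which the SHA-256 digests of a and b differ."""
    digest_a = hashlib.sha256(a).digest()
    digest_b = hashlib.sha256(b).digest()
    return sum(bin(x ^ y).count("1") for x, y in zip(digest_a, digest_b))


if __name__ == "__main__":
    original = b"Hello, World!"
    modified = flip_lowest_bit(original)  # differs from original by exactly one bit
    print(f"{differing_bits(original, modified)} of 256 output bits changed")
```

For a well-behaved hash the count should land somewhere near 128, although the exact figure varies from input to input.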

A History of Hash Algorithms

MD5: The Cautionary Tale

MD5, designed by Ron Rivest in 1991, was the dominant hash function of the 1990s and early 2000s. It produces a 128-bit (32 hex character) hash and was used for everything from password storage to digital signatures to file integrity verification.

MD5("Hello, World!") = 65a8e27d8879283831b664bd8b7f0ad4

In 2004, researchers demonstrated practical collision attacks against MD5, and in 2008 another team used an MD5 collision to create a rogue certificate authority certificate, which would have let them issue trusted certificates impersonating any website. The attack exploited the fact that if you can create two documents with the same MD5 hash, a digital signature on one is equally valid for the other.

Today MD5 is cryptographically broken — collisions can be generated on a laptop in seconds. However, it remains useful for non-security purposes. As a checksum for file deduplication (where the files aren't created by an adversary trying to forge collisions), MD5 is fast and effective. As a cache key, where the only requirement is that different inputs produce different outputs with overwhelming probability, MD5 is fine. The distinction matters: MD5 is broken for security applications where an attacker is actively trying to create collisions, but it's perfectly adequate for applications where collisions would occur only by chance.

SHA-1: The Long Deprecation

SHA-1, published by NIST in 1995, produces a 160-bit (40 hex character) hash. It replaced MD5 as the standard for digital signatures and was the backbone of SSL/TLS certificates for over a decade.

SHA-1("Hello, World!") = 0a0a9f2a6772942557ab5355d76af442f8f65e01

Theoretical weaknesses were identified beginning in 2005, but the algorithm wasn't practically broken until 2017, when researchers from CWI Amsterdam and Google announced SHAttered, the first real-world SHA-1 collision. The attack required the equivalent of 6,500 years of single-CPU computation and 110 years of single-GPU computation: enormous resources, but within reach of well-funded attackers and nation-states.

The industry responded with a coordinated migration away from SHA-1. Browsers now reject SHA-1 certificates. Git, which uses SHA-1 for content addressing, is gradually migrating to SHA-256. Certificate authorities stopped issuing SHA-1 certificates in 2016.

The SHA-1 story illustrates something important about cryptographic algorithms: the time between "theoretically weakened" and "practically broken" can span years or decades, but the migration away from a weakened algorithm should begin immediately, not when the first practical attack is demonstrated.

SHA-256: The Current Standard

SHA-256 is part of the SHA-2 family, designed by the NSA and published by NIST in 2001. It produces a 256-bit (64 hex character) hash and has withstood over two decades of cryptanalysis with no practical weaknesses discovered.

SHA-256("Hello, World!") = dffd6021bb2bd5b0af676290809ec3a53191dd81c7f70a4b28688a362182986f
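The digests quoted so far can be reproduced with Python's standard hashlib module. This is a small sketch rather than anything specific to this article's tooling; it prints each algorithm's output length in bits next to the hex digest of the same message.

```python
import hashlib

message = b"Hello, World!"

# hashlib.new() looks an algorithm up by name; each hex character encodes 4 bits.
for name in ("md5", "sha1", "sha256"):
    digest = hashlib.new(name, message).hexdigest()
    print(f"{name:>6}  {len(digest) * 4:>3} bits  {digest}")
```

Note that the input is a byte string: the same text in a different encoding, or with a trailing newline, hashes to a completely different value.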

SHA-256 is the workhorse hash function of modern infrastructure. It secures TLS/SSL connections, signs software packages, verifies file downloads, underpins Bitcoin's proof-of-work system, and provides integrity verification for virtually every major protocol designed after 2005.

Its 256-bit output provides a security margin that's almost incomprehensibly large. There are approximately 1.16 × 10^77 possible SHA-256 hashes, a number within a few orders of magnitude of the estimated number of atoms in the observable universe (approximately 10^80). Finding a collision by brute force would require computing on the order of 2^128 hashes (about 3.4 × 10^38), which is far beyond the reach of any computing hardware that exists today.
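The 2^128 figure comes from the birthday bound: for an n-bit hash, a collision is expected after roughly 2^(n/2) attempts. The full search is out of reach, but the effect can be sketched on a deliberately weakened hash. The example below assumes a toy 24-bit hash made by truncating SHA-256 to 3 bytes (truncated_sha256 and find_collision are names invented for this illustration), so a collision should turn up after a few thousand hashes, in the neighborhood of 2^12.

```python
import hashlib
from itertools import count


def truncated_sha256(data: bytes, n_bytes: int) -> bytes:
    """Return only the first n_bytes of the SHA-256 digest (a deliberately weak toy hash)."""
    return hashlib.sha256(data).digest()[:n_bytes]


def find_collision(n_bytes: int = 3) -> tuple[bytes, bytes, int]:
    """Hash b"0", b"1", b"2", ... until two truncated digests match.

    Returns the two colliding inputs and the number of hashes computed.
    """
    seen: dict[bytes, bytes] = {}
    for i in count():
        candidate = str(i).encode()
        digest = truncated_sha256(candidate, n_bytes)
        if digest in seen:
            return seen[digest], candidate, i + 1
        seen[digest] = candidate
    raise AssertionError("unreachable: count() never ends")


if __name__ == "__main__":
    print(f"possible SHA-256 outputs:   {2**256:.3e}")
    print(f"birthday bound at 256 bits: {2**128:.3e} hashes")
    first, second, tries = find_collision(3)
    print(f"24-bit collision between {first!r} and {second!r} after {tries} hashes")
```

Each extra output byte multiplies the expected work by 16, which is why the same search against the full 32-byte digest is hopeless.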

SHA-512: When More Is Better

SHA-512 is SHA-256's larger sibling, producing a 512-bit (128 hex character) hash. Despite being nominally "more secure," SHA-512's primary practical advantage is performance on 64-bit systems, not security. Its operations use 64-bit arithmetic natively, while SHA-256 uses 32-bit arithmetic, so SHA-512 can actually be faster than SHA-256 on 64-bit processors, particularly those without dedicated SHA-256 instructions.

SHA-512 is the right choice for applications where you're already operating on 64-bit hardware and want maximum performance, or where maximum hash length is desirable for long-term archival integrity. For most applications, SHA-256 is sufficient.
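Which of the two is faster depends on the CPU and on how the local hashlib was built, so it is worth measuring rather than assuming. The following is a rough benchmarking sketch (throughput_mb_per_s is a hypothetical helper, and the 16 MiB zero-filled payload is an arbitrary choice); treat the numbers it prints as indicative only.

```python
import hashlib
import time


def throughput_mb_per_s(algorithm: str, payload: bytes, rounds: int = 5) -> float:
    """Return the best observed hashing throughput in MB/s over several rounds."""
    best = float("inf")
    for _ in range(rounds):
        start = time.perf_counter()
        hashlib.new(algorithm, payload).digest()
        best = min(best, time.perf_counter() - start)
    return len(payload) / best / 1_000_000


if __name__ == "__main__":
    payload = bytes(16 * 1024 * 1024)  # 16 MiB of zero bytes; content does not affect speed
    for algorithm in ("sha256", "sha512"):
        print(f"{algorithm}: {throughput_mb_per_s(algorithm, payload):.0f} MB/s")
```

On machines with hardware SHA-256 instructions the ordering may be the reverse of what the 64-bit arithmetic argument suggests.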

SHA-3: The Insurance Policy

SHA-3, standardized in 2015, is based on a fundamentally different mathematical construction (the Keccak sponge function) than SHA-2. Its purpose is not to replace SHA-2 — which remains secure — but to provide an alternative that doesn't share SHA-2's mathematical lineage. If a flaw in SHA-2's Merkle-Damgård construction were ever discovered, SHA-3 would be unaffected.

SHA-3 also has a practical advantage: it's resistant to length extension attacks, a vulnerability that affects SHA-256 and SHA-512. In a length extension attack, an attacker who knows hash(secret + message) and the length of the hashed input can compute hash(secret + message + padding + additional_data) without knowing the secret. This sounds obscure but has led to real-world vulnerabilities in systems that used raw SHA-256 for authentication instead of HMAC.
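As a sketch of the difference, here is the raw-hash pattern next to an HMAC built with Python's standard hmac module. The function names and the sample key and messages are illustrative assumptions, not taken from any particular system.

```python
import hashlib
import hmac


def naive_tag(secret: bytes, message: bytes) -> str:
    """Pattern to avoid: raw SHA-256 over secret + message is open to length extension."""
    return hashlib.sha256(secret + message).hexdigest()


def hmac_tag(secret: bytes, message: bytes) -> str:
    """HMAC-SHA256 mixes the key in twice, which defeats length extension."""
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def verify(secret: bytes, message: bytes, tag: str) -> bool:
    """Check a tag using a constant-time comparison to avoid timing leaks."""
    return hmac.compare_digest(hmac_tag(secret, message), tag)


if __name__ == "__main__":
    secret = b"example-key"  # hypothetical key, for demonstration only
    tag = hmac_tag(secret, b"amount=100")
    print(verify(secret, b"amount=100", tag))
    print(verify(secret, b"amount=100&admin=true", tag))
```

The first check succeeds and the second fails, because an attacker who only sees the tag cannot produce a valid tag for an extended message.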

For new systems where future-proofing is a priority, SHA-3 is an excellent choice. For systems that need to interoperate with existing infrastructure, SHA-256 remains the practical default.

Choosing the Right Algorithm

The right hash function depends entirely on what you're hashing and why.

File Integrity Verification

When you download a file and want to confirm it wasn't corrupted during transfer, you compare its hash against the publisher's checksum:

sha256sum downloaded-file.iso
# Compare output with published hash

SHA-256 is the standard for this use case. If you're verifying integrity against accidental corruption (not deliberate tampering), even MD5 is adequate — accidental collisions are astronomically unlikely. But for software downloads, firmware updates, or anything where a man-in-the-middle attacker might substitute a malicious file, SHA-256 provides the necessary security margin.
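The same check can be scripted. Below is a minimal sketch that hashes a file in fixed-size chunks, so even a multi-gigabyte download never has to fit in memory, and compares the result with a published hex digest. The function names and the 1 MiB chunk size are assumptions made for this example.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Stream the file through SHA-256 one chunk at a time and return the hex digest."""
    hasher = hashlib.sha256()
    with path.open("rb") as handle:
        while chunk := handle.read(chunk_size):
            hasher.update(chunk)
    return hasher.hexdigest()


def matches_published_hash(path: Path, published_hex: str) -> bool:
    """Compare against a published checksum, ignoring case and surrounding whitespace."""
    return sha256_of_file(path) == published_hex.strip().lower()
```

A call such as matches_published_hash(Path("downloaded-file.iso"), published) returns True only when every byte matches. The check is only as trustworthy as the channel the published hash came from.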

Password Storage

Plain hash functions — even SHA-256 — are a poor choice for password hashing because they're too fast. A modern GPU can compute billions of SHA-256 hashes per second, which means an attacker who obtains your database of SHA-256-hashed passwords can try every common password in minutes.

Password hashing requires algorithms specifically designed to be slow:

bcrypt was designed in 1999 and remains widely used. It incorporates a configurable work factor that can be increased as hardware gets faster, and it includes built-in salting (random data mixed with the password before hashing) that prevents pre-computation attacks.

Argon2 won the Password Hashing Competition in 2015 and is the current recommended algorithm. It's "memory-hard," meaning it requires a configurable amount of memory to compute, which makes GPU-based attacks less effective because GPUs have limited per-core memory.

PBKDF2 is the oldest of the three and the most widely supported (it's implemented in virtually every programming language and cryptographic library). It's acceptable when bcrypt or Argon2 aren't available, but it lacks memory hardness, which makes it more vulnerable to GPU-accelerated attacks.

The principle across all three: hash(password + salt) should take 100-500 milliseconds per computation. This is imperceptible for a legitimate login but makes brute-force attacks across millions of passwords impractical.
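As a sketch of that principle, the example below uses PBKDF2, chosen only because it ships in Python's standard library; bcrypt and Argon2 need third-party packages. The iteration count of 600,000 is an assumption to be tuned against the 100-500 millisecond target on your own hardware, and the function names are hypothetical.

```python
import hashlib
import hmac
import secrets

ITERATIONS = 600_000  # assumption: tune until one hash takes roughly 100-500 ms


def hash_password(password: str, iterations: int = ITERATIONS) -> tuple[bytes, bytes, int]:
    """Derive a slow, salted hash. Store all three returned values with the user record."""
    salt = secrets.token_bytes(16)  # fresh random salt for every password
    derived = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return salt, derived, iterations


def verify_password(password: str, salt: bytes, expected: bytes, iterations: int) -> bool:
    """Recompute the hash with the stored salt and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return hmac.compare_digest(candidate, expected)


if __name__ == "__main__":
    salt, derived, iterations = hash_password("correct horse battery staple")
    print(verify_password("correct horse battery staple", salt, derived, iterations))
    print(verify_password("wrong guess", salt, derived, iterations))
```

Storing the iteration count next to the hash lets you raise the work factor later without invalidating existing records.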

Cache Keys and Data Deduplication

For non-security applications where speed matters and an adversary isn't actively trying to forge collisions, MD5 is the pragmatic choice. It's fast, produces reasonably short hashes, and is essentially guaranteed to be unique across naturally occurring data. Using SHA-256 for a cache key is technically more correct but practically overkill — the additional CPU cost provides no benefit when the threat model doesn't include adversarial collision generation.
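A content-addressed cache key or deduplication pass might look like the following sketch. The names cache_key and group_duplicates are invented here, and the usedforsecurity=False flag (available from Python 3.9) simply documents that MD5 is not being relied on for security.

```python
import hashlib
from collections import defaultdict


def cache_key(payload: bytes) -> str:
    """Derive a short, deterministic key from content (not suitable for security uses)."""
    return hashlib.md5(payload, usedforsecurity=False).hexdigest()


def group_duplicates(blobs: dict[str, bytes]) -> list[list[str]]:
    """Group names whose contents hash to the same digest, keeping only real duplicates."""
    by_digest: dict[str, list[str]] = defaultdict(list)
    for name, content in blobs.items():
        by_digest[cache_key(content)].append(name)
    return [names for names in by_digest.values() if len(names) > 1]


if __name__ == "__main__":
    files = {"a.txt": b"same bytes", "b.txt": b"same bytes", "c.txt": b"different"}
    print(group_duplicates(files))
```

If the inputs could ever come from an untrusted party, swapping hashlib.md5 for hashlib.sha256 is a one-line change.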

Digital Signatures

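Digital signatures work by hashing the document, then signing that hash with a private key. The recipient recomputes the hash and uses the signer's public key to check the signature against it, verifying both the document's integrity and the signer's identity. (With RSA, this is often described as encrypting the hash with the private key and decrypting it with the public key.)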

This use case demands collision resistance: if an attacker could create two documents with the same hash, a signature on one would validate the other. SHA-256 is the minimum acceptable algorithm here, with SHA-512 as a longer alternative. SHA-3 provides additional margin against future mathematical breakthroughs.

Security Deep Dive

Rainbow Tables and Salting

Rainbow tables are precomputed data structures that map hashes back to the inputs that produced them, trading storage space for cracking time. An attacker who obtains a database of unsalted password hashes can look up each hash to recover the original password. Large rainbow tables exist for MD5 and SHA-1 covering billions of common passwords, dictionary words, and variations.

Salting defeats rainbow tables by making each hash unique. A salt is a random string concatenated with the password before hashing:

hash("password")                    → 5e884898da... (in every rainbow table)
hash("a8f3k2x9" + "password")      → 9b3c08fa2... (unique to this salt)
hash("p2m7v4n1" + "password")      → 41ef6291b... (different salt, different hash)

Even if two users choose the same password, different salts produce different hashes, preventing an attacker from recognizing the duplication. The salt doesn't need to be secret — it's typically stored alongside the hash — because its purpose is to prevent pre-computation, not to add a secret.
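The effect of a salt can be sketched with a plain precomputed lookup table, which is simpler than a real rainbow table but fails against salts for the same reason. The four-entry password list and the helper names are assumptions for illustration, and fast SHA-256 is used only to keep the demo short, not as a recommendation for password storage.

```python
import hashlib
import secrets
from typing import Optional

COMMON_PASSWORDS = ["123456", "password", "qwerty", "letmein"]  # tiny illustrative list


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


# The attacker computes this table once and reuses it against every leaked database.
LOOKUP = {sha256_hex(p.encode()): p for p in COMMON_PASSWORDS}


def crack(stored_hash: str) -> Optional[str]:
    """Return the password if its unsalted hash is in the precomputed table."""
    return LOOKUP.get(stored_hash)


if __name__ == "__main__":
    unsalted = sha256_hex(b"password")
    salt = secrets.token_hex(8)  # random per-user salt, stored alongside the hash
    salted = sha256_hex(salt.encode() + b"password")
    print("unsalted lookup:", crack(unsalted))
    print("salted lookup:  ", crack(salted))
```

The unsalted hash is found immediately. The salted one is not, which forces the attacker to redo the work separately for every salt.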

Collision Attacks in Practice

The practical impact of a collision attack depends on the application. For file checksums, a collision means an attacker could create a malicious file that appears to match a legitimate file's hash. For digital signatures, a collision means forging documents. For password hashing, collisions don't matter because the attacker's goal is to find the original password (a pre-image attack), not to find a different password with the same hash.

This is why MD5 can still be used for checksums of non-adversarial data but not for digital signatures: the threat models are different.

Comparison

Algorithm	Output	Speed	Status	Best For
MD5	128-bit	Very fast	Broken	Checksums, cache keys
SHA-1	160-bit	Fast	Weak	Legacy compatibility only
SHA-256	256-bit	Moderate	Secure	Signatures, integrity, general purpose
SHA-512	512-bit	Moderate*	Secure	64-bit systems, archival
SHA-3	Variable	Moderate	Secure	Future-proofing, HMAC alternative
bcrypt	184-bit	Intentionally slow	Secure	Password hashing
Argon2	Variable	Intentionally slow	Secure	Password hashing (recommended)

*SHA-512 can be faster than SHA-256 on 64-bit processors.

Conclusion

Hash functions are ubiquitous because they solve a fundamental problem: efficiently creating a compact, deterministic fingerprint of arbitrary data. The key to using them well is matching the algorithm to the threat model. Use MD5 where speed matters and security doesn't. Use SHA-256 as the default for any integrity or security application. Use bcrypt or Argon2 — never raw hash functions — for passwords. And when in doubt, SHA-256 is almost never the wrong choice.

Use our Hash Generator to compute MD5, SHA-1, SHA-256, and SHA-512 hashes for text or files — all processed locally, with your data never leaving your browser.
