A cryptographic hash function is a mathematical algorithm that takes an input of arbitrary size and produces a fixed-size output — the hash digest, or simply the hash. The process is deterministic: the same input always produces the same output. And it is one-way: given only the hash, recovering the original input is computationally infeasible.
This sounds simple. In practice, these two properties — determinism and one-wayness — underpin most of what we consider "secure" in modern computing: passwords, file integrity, digital signatures, TLS certificates, blockchain, API authentication and more.
Core Definition
A hash function H maps an input m of any length to an output h of fixed length, such that h = H(m). The function must be fast to compute, but practically impossible to invert.
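Both properties can be checked in Python with the standard hashlib module. Hashing the same input twice always gives the same value, and the SHA-256 digest is 256 bits long whatever the input size. This is a minimal sketch; the sample inputs are arbitrary.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return H(m) for H = SHA-256, as a 64-character hex string."""
    return hashlib.sha256(data).hexdigest()


inputs = [b"", b"a", b"The quick brown fox jumps over the lazy dog" * 1000]
for m in inputs:
    h = sha256_hex(m)
    assert h == sha256_hex(m)  # deterministic: same input, same output
    assert len(h) * 4 == 256   # fixed size: 64 hex characters = 256 bits
    print(f"{len(m):6} bytes -> {h[:16]}...")
```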
The four security properties that matter
Not all hash functions are created equal. A cryptographic hash must satisfy four properties — and the absence of any one of them creates a different, exploitable vulnerability:
- Pre-image resistance. Given a hash h, it must be infeasible to find any input m such that H(m) = h. This protects stored password hashes.
- Second pre-image resistance. Given an input m1, it must be infeasible to find a different input m2 (m2 ≠ m1) such that H(m1) = H(m2). This protects document integrity.
- Collision resistance. It must be infeasible to find any two distinct inputs m1 and m2 such that H(m1) = H(m2). Unlike second pre-image resistance, the attacker is free to choose both inputs. This is the property that MD5 and SHA-1 have completely lost.
- Avalanche effect. A single-bit change in the input must produce a radically different hash output: ideally, about half the bits in the digest flip.
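Collision resistance is harder to achieve than it looks because of the birthday bound. For an n-bit output, a generic search that simply remembers every hash it has seen finds a collision after roughly 2^(n/2) attempts. The sketch below makes this visible by truncating SHA-256 to a toy 24-bit hash. The truncation is purely illustrative, and it means a collision typically turns up after a few thousand tries. The same generic attack on MD5's 128 bits would need about 2^64 work. The practical MD5 and SHA-1 attacks are worrying precisely because they beat that bound.

```python
import hashlib
from itertools import count


def truncated_hash(data: bytes, bits: int) -> int:
    """First `bits` bits of SHA-256(data): a deliberately weakened toy hash."""
    digest = int.from_bytes(hashlib.sha256(data).digest(), "big")
    return digest >> (256 - bits)


def find_collision(bits: int):
    """Generic birthday search: store every output until one repeats."""
    seen = {}
    for i in count():
        m = str(i).encode()  # every candidate input is distinct
        h = truncated_hash(m, bits)
        if h in seen:
            return seen[h], m, i + 1  # two distinct inputs, same output, attempts used
        seen[h] = m


m1, m2, tries = find_collision(24)
print(m1, m2, tries)  # typically a few thousand attempts, on the order of 2**12
```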
Figure: Avalanche effect (a tiny input change produces a completely different hash)
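The avalanche effect is easy to measure. Flip one input bit, hash both versions, and count how many of the 256 output bits differ. This sketch repeats that for every bit position of a short, arbitrary message using SHA-256. The average should land close to 128, which is half the digest, though any single flip will vary around that value.

```python
import hashlib


def bit_difference(a: bytes, b: bytes) -> int:
    """Number of bit positions in which two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def flip_bit(data: bytes, i: int) -> bytes:
    """Return a copy of data with bit i flipped."""
    out = bytearray(data)
    out[i // 8] ^= 1 << (i % 8)
    return bytes(out)


message = b"hash functions"
base = hashlib.sha256(message).digest()
diffs = [
    bit_difference(base, hashlib.sha256(flip_bit(message, i)).digest())
    for i in range(len(message) * 8)
]
average = sum(diffs) / len(diffs)
print(f"{len(diffs)} single-bit flips, {average:.1f} of 256 output bits changed on average")
```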
MD5 (Message Digest Algorithm 5) was designed by Ronald Rivest in 1991. It produces a 128-bit (16-byte) hash, typically expressed as a 32-character hexadecimal string. It is now completely broken for any security purpose.
How MD5 works internally
MD5 processes input in 512-bit blocks. Four 32-bit state variables — A, B, C, D — are transformed across four rounds of 16 operations each (64 total), using non-linear functions and constants derived from the sine function.
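Here is a compact, educational MD5 in pure Python that makes those moving parts concrete. It is written from the RFC 1321 description and covers padding, the four 16-step rounds, the per-step rotations, and the sine-derived constants T[i] = floor(|sin(i + 1)| × 2^32). It is a teaching sketch only and is checked against hashlib.md5. Never use MD5, hand-written or otherwise, for security.

```python
import hashlib
import math
import struct

MASK = 0xFFFFFFFF
# Left-rotation amount for each of the 64 steps (four rounds of 16).
S = [7, 12, 17, 22] * 4 + [5, 9, 14, 20] * 4 + [4, 11, 16, 23] * 4 + [6, 10, 15, 21] * 4
# Sine-derived constants: T[i] = floor(|sin(i + 1)| * 2**32), with i + 1 in radians.
T = [int(abs(math.sin(i + 1)) * 2**32) for i in range(64)]


def rotl(x: int, n: int) -> int:
    return ((x << n) | (x >> (32 - n))) & MASK


def md5_hex(message: bytes) -> str:
    # Initial values of the four 32-bit state variables A, B, C, D.
    a0, b0, c0, d0 = 0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476
    # Pad: a 0x80 byte, zeros up to 56 mod 64, then the bit length as 64-bit little-endian.
    bit_len = (8 * len(message)) & 0xFFFFFFFFFFFFFFFF
    message += b"\x80" + b"\x00" * ((55 - len(message)) % 64) + struct.pack("<Q", bit_len)

    for offset in range(0, len(message), 64):  # one 512-bit block at a time
        M = struct.unpack("<16I", message[offset:offset + 64])
        A, B, C, D = a0, b0, c0, d0
        for i in range(64):
            if i < 16:    # round 1 (F)
                f, g = (B & C) | (~B & D), i
            elif i < 32:  # round 2 (G)
                f, g = (D & B) | (~D & C), (5 * i + 1) % 16
            elif i < 48:  # round 3 (H)
                f, g = B ^ C ^ D, (3 * i + 5) % 16
            else:         # round 4 (I)
                f, g = C ^ (B | ~D), (7 * i) % 16
            f = (f + A + T[i] + M[g]) & MASK  # masking also fixes Python's negative ~x
            A, D, C = D, C, B
            B = (B + rotl(f, S[i])) & MASK
        # Add this block's result back into the running state.
        a0, b0, c0, d0 = (a0 + A) & MASK, (b0 + B) & MASK, (c0 + C) & MASK, (d0 + D) & MASK

    return struct.pack("<4I", a0, b0, c0, d0).hex()


assert md5_hex(b"abc") == hashlib.md5(b"abc").hexdigest()
print(md5_hex(b"abc"))  # 900150983cd24fb0d6963f7d28e17f72
```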
⚠ Security Status
MD5 collision resistance was broken in 2004. Collisions can now be generated in seconds on commodity hardware. Pre-built rainbow tables are freely available. MD5 provides zero security for any authentication or integrity purpose.
The Wang attack — why MD5 collisions are trivial
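Wang's 2004 differential cryptanalysis exploits weaknesses in MD5's compression function. In 2008, researchers extended this line of work into a chosen-prefix MD5 collision. They used it to forge a rogue certificate authority (CA) certificate that all major browsers would have trusted, and they computed the collision on a cluster of about 200 PlayStation 3 consoles.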
Where MD5 still shows up
- Legacy authentication systems storing md5(password) without salting
- File deduplication pipelines where security is erroneously assumed
- Old PHP applications using md5() for password hashing
- Some IoT firmware integrity checks that haven't been reviewed
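The first item is the easiest to demonstrate. Without a salt, md5(password) is identical for every user who picks the same password. An attacker can therefore hash a wordlist once and look up every leaked digest directly. The sketch below uses a plain precomputed lookup table, a simpler relative of a rainbow table. The usernames, passwords, and tiny wordlist are all hypothetical.

```python
import hashlib

# Hypothetical leaked table: username -> unsalted md5(password).
leaked = {
    "alice": hashlib.md5(b"password123").hexdigest(),
    "bob": hashlib.md5(b"letmein").hexdigest(),
    "carol": hashlib.md5(b"correct horse battery staple").hexdigest(),
}

# A tiny stand-in for a real wordlist.
wordlist = ["123456", "password", "password123", "letmein", "qwerty"]

# Computed once, reusable against every user and every breach: no salt means no per-user work.
lookup = {hashlib.md5(w.encode()).hexdigest(): w for w in wordlist}

cracked = {user: lookup[h] for user, h in leaked.items() if h in lookup}
print(cracked)  # {'alice': 'password123', 'bob': 'letmein'}
```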
SHA-1 was published by NIST in 1995. It produces a 160-bit digest with 80 rounds of processing. It was the dominant algorithm for over a decade — TLS certificates, code signing, Git, PGP. Collision resistance is broken.
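Git shows how deeply SHA-1 became embedded. In a classic SHA-1 repository, a file's object ID is the SHA-1 of a short header followed by the raw content. The header consists of "blob", the content length, and a NUL byte, and the resulting ID is the value git hash-object reports. A minimal sketch:

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Object ID of a blob in a SHA-1 Git repository: SHA-1 over header + content."""
    header = b"blob " + str(len(content)).encode() + b"\x00"
    return hashlib.sha1(header + content).hexdigest()


print(git_blob_id(b""))                # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 (Git's empty blob)
print(git_blob_id(b"test content\n"))  # d670460b4b4aece5915caf5c68d12f560a9fe3e4
```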
SHA-1 internals
SHA-1 operates on 512-bit blocks maintaining a 160-bit state. The critical weakness lies in its message schedule — the linear expansion creates detectable algebraic relationships between round inputs, exactly what differential cryptanalysis exploits.
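The linearity can be checked directly. SHA-1 expands 16 message words to 80 using only XOR and a 1-bit rotation, so the schedule is linear over GF(2). Flipping bits in the message changes the expanded words in a way that depends only on which bits were flipped, never on the message itself. An attacker can therefore work out in advance how a chosen difference spreads through all 80 words. This sketch implements only the message schedule, not the full SHA-1 compression function.

```python
import random

MASK = 0xFFFFFFFF


def rotl(x: int, n: int) -> int:
    return ((x << n) | (x >> (32 - n))) & MASK


def sha1_schedule(words: list[int]) -> list[int]:
    """Expand 16 message words to 80 with SHA-1's rule: XOR plus a 1-bit rotation."""
    w = list(words)
    for t in range(16, 80):
        w.append(rotl(w[t - 3] ^ w[t - 8] ^ w[t - 14] ^ w[t - 16], 1))
    return w


def xor_words(a: list[int], b: list[int]) -> list[int]:
    return [x ^ y for x, y in zip(a, b)]


rng = random.Random(1)
message = [rng.getrandbits(32) for _ in range(16)]
delta = [1] + [0] * 15  # flip the lowest bit of the first message word

# Linearity over GF(2): the expanded difference equals the expansion of the difference,
# whatever the message is.
expanded = xor_words(sha1_schedule(xor_words(message, delta)), sha1_schedule(message))
assert expanded == sha1_schedule(delta)
print(sum(bin(d).count("1") for d in expanded), "bits differ across the 80 expanded words")
```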
SHAttered — the practical break
In 2017, Google and CWI Amsterdam demonstrated the first practical SHA-1 collision. The total estimated cost was around $75,000–$110,000 in cloud compute at the time, and later research has made SHA-1 collisions considerably cheaper.
SHA-256 is part of the SHA-2 family, standardised by NIST in 2001. It produces a 256-bit digest. Its more complex design, notably a non-linear message schedule, makes it far more resistant to the differential cryptanalysis that broke MD5 and SHA-1.
How the compression function works
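SHA-256 runs 64 rounds of compression per 512-bit block. It uses a non-linear expansion of the message into W[0..63], and its round constants K[0..63] are taken from the fractional parts of the cube roots of the first 64 primes. After all 64 rounds, the working variables are added back to the chaining value the block started with. This Davies–Meyer style feed-forward makes the compression function hard to invert. It does not, however, make SHA-256 resistant to length-extension attacks. Like MD5 and SHA-1, SHA-256 is a Merkle–Damgård construction whose digest is its final internal state. Anyone who knows H(m) and the length of m can therefore compute the hash of m, its padding, and an extra suffix without knowing m. This is why keyed uses call for HMAC rather than H(key || message).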
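The sketch below is an educational pure-Python SHA-256 written from the FIPS 180-4 description and checked against hashlib.sha256. It derives the round constants K from the cube roots of the first 64 primes, and the initial state from the square roots of the first 8, using exact integer arithmetic. It then runs the 64-round compression per block and ends each block with the feed-forward addition. The returned digest is simply the final chaining state, which is what lets an attacker resume hashing from a known digest in a length-extension attack. This code is far too slow for real use; use hashlib in practice.

```python
import hashlib
import math
import struct

MASK = 0xFFFFFFFF


def first_primes(n: int) -> list[int]:
    primes, candidate = [], 2
    while len(primes) < n:
        if all(candidate % p for p in primes if p * p <= candidate):
            primes.append(candidate)
        candidate += 1
    return primes


def icbrt(n: int) -> int:
    """Largest integer r with r**3 <= n (exact, no floating point)."""
    lo, hi = 0, 1 << (n.bit_length() // 3 + 1)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** 3 <= n:
            lo = mid
        else:
            hi = mid - 1
    return lo


PRIMES = first_primes(64)
# First 32 fractional bits of the cube roots of the first 64 primes.
K = [icbrt(p << 96) & MASK for p in PRIMES]
# First 32 fractional bits of the square roots of the first 8 primes.
H0 = [math.isqrt(p << 64) & MASK for p in PRIMES[:8]]


def rotr(x: int, n: int) -> int:
    return ((x >> n) | (x << (32 - n))) & MASK


def compress(state: list[int], block: bytes) -> list[int]:
    # Message schedule: expand 16 words to W[0..63].
    w = list(struct.unpack(">16I", block))
    for t in range(16, 64):
        s0 = rotr(w[t - 15], 7) ^ rotr(w[t - 15], 18) ^ (w[t - 15] >> 3)
        s1 = rotr(w[t - 2], 17) ^ rotr(w[t - 2], 19) ^ (w[t - 2] >> 10)
        w.append((s1 + w[t - 7] + s0 + w[t - 16]) & MASK)
    a, b, c, d, e, f, g, h = state
    for t in range(64):
        t1 = (h + (rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25)) + ((e & f) ^ (~e & g)) + K[t] + w[t]) & MASK
        t2 = ((rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22)) + ((a & b) ^ (a & c) ^ (b & c))) & MASK
        h, g, f, e, d, c, b, a = g, f, e, (d + t1) & MASK, c, b, a, (t1 + t2) & MASK
    # Feed-forward: add the working variables back into the incoming chaining value.
    return [(x + y) & MASK for x, y in zip(state, (a, b, c, d, e, f, g, h))]


def sha256_hex(message: bytes) -> str:
    bit_len = 8 * len(message)
    message += b"\x80" + b"\x00" * ((55 - len(message)) % 64) + struct.pack(">Q", bit_len)
    state = H0
    for i in range(0, len(message), 64):
        state = compress(state, message[i:i + 64])
    # The digest is the final chaining state itself; nothing is hidden or finalized.
    return struct.pack(">8I", *state).hex()


assert sha256_hex(b"abc") == hashlib.sha256(b"abc").hexdigest()
```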
Figure: SHA-256 compression pipeline
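The non-linear message schedule is one of the key design decisions that separate SHA-2 from its predecessors. It mixes rotations and shifts with addition modulo 2^32, so a chosen message difference no longer spreads through the schedule in a fixed, predictable way. That makes it much harder to build the controlled differential paths that broke MD5 and SHA-1.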
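A linear schedule such as SHA-1's maps a given input difference to the same expanded difference for every message. This sketch checks how SHA-256's schedule behaves by applying one fixed single-bit difference to several random messages and comparing the resulting expanded differences. Carries from the modular additions make the outcome depend on the message, so different messages typically yield different expanded differences. Only the message schedule is modelled here, not the full compression function.

```python
import random

MASK = 0xFFFFFFFF


def rotr(x: int, n: int) -> int:
    return ((x >> n) | (x << (32 - n))) & MASK


def sha256_schedule(words: list[int]) -> list[int]:
    """Expand 16 message words to 64 using SHA-256's message schedule."""
    w = list(words)
    for t in range(16, 64):
        s0 = rotr(w[t - 15], 7) ^ rotr(w[t - 15], 18) ^ (w[t - 15] >> 3)
        s1 = rotr(w[t - 2], 17) ^ rotr(w[t - 2], 19) ^ (w[t - 2] >> 10)
        # Rotations, shifts and XORs are linear over GF(2); the additions mod 2**32 are not.
        w.append((s1 + w[t - 7] + s0 + w[t - 16]) & MASK)
    return w


def expanded_difference(m: list[int], delta: list[int]) -> tuple[int, ...]:
    """XOR difference between the schedules of m and m ^ delta."""
    flipped = [x ^ d for x, d in zip(m, delta)]
    return tuple(a ^ b for a, b in zip(sha256_schedule(flipped), sha256_schedule(m)))


rng = random.Random(2024)
delta = [1] + [0] * 15  # flip the lowest bit of the first message word
outcomes = {
    expanded_difference([rng.getrandbits(32) for _ in range(16)], delta)
    for _ in range(8)
}
print(len(outcomes), "distinct expanded differences from 8 random messages")
```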
A structured comparison across the properties that matter for security decision-making.
| Property | MD5 | SHA-1 | SHA-256 |
|---|---|---|---|
| Output size | 128 bit | 160 bit | 256 bit |
| Block size | 512 bit | 512 bit | 512 bit |
| Rounds (steps) | 64 | 80 | 64 |
| Word size | 32 bit | 32 bit | 32 bit |
| Collision resistance | Broken | Broken | Strong |
| Pre-image resistance | Weakened (theoretical attacks only) | Adequate | Strong |
| Speed (relative) | Fastest | Fast | Moderate |
| Collision attack cost | ~$0 (seconds on commodity hardware) | ~$75,000–$110,000 (SHAttered, 2017) | No known practical attack |
| Security use | ❌ Never | ❌ Never | ✓ Yes |
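The fixed-size rows of the table can be checked against Python's hashlib, which exposes digest_size and block_size (in bytes) on every hash object. The sketch also times each algorithm on 1 MiB of data. Treat those numbers as indicative only, because throughput depends heavily on the CPU. Hardware SHA extensions, for instance, can change the ordering considerably.

```python
import hashlib
import time


def profile(name: str, payload: bytes, repeats: int = 20) -> dict:
    """Digest and block sizes in bits plus a rough throughput figure for one algorithm."""
    h = hashlib.new(name)
    start = time.perf_counter()
    for _ in range(repeats):
        hashlib.new(name, payload).digest()
    elapsed = time.perf_counter() - start
    return {
        "digest_bits": h.digest_size * 8,
        "block_bits": h.block_size * 8,
        "mb_per_s": repeats * len(payload) / elapsed / 1e6,
    }


payload = bytes(1 << 20)  # 1 MiB of zero bytes
results = {name: profile(name, payload) for name in ("md5", "sha1", "sha256")}
for name, r in results.items():
    print(f"{name:7} digest {r['digest_bits']} bit, block {r['block_bits']} bit, ~{r['mb_per_s']:.0f} MB/s")
```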
Figure: Estimated cost of a full collision attack. MD5: pre-built rainbow tables freely available. SHA-1: Google SHAttered attack (2017), practical. SHA-256: no known practical attack.
"MD5 is fine for non-security uses"
Sometimes true, but "non-security" scope creep is how MD5 ends up in authentication flows. If you are touching user data, authentication tokens or anything that needs to detect tampering, do not use MD5. Use SHA-256 (or HMAC-SHA256 when a secret key is involved), and use a dedicated password hashing function for passwords.
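For tamper detection, a typical pattern is to record a file's SHA-256 digest while the file is trusted and recompute it later. The sketch reads the file in chunks so large files never need to fit in memory; the temporary file and its content are throwaway examples. A bare digest only detects tampering if the recorded value is itself stored where an attacker cannot change it. Otherwise a keyed HMAC or a digital signature is needed.

```python
import hashlib
import os
import tempfile


def file_sha256(path: str, chunk_size: int = 1 << 16) -> str:
    """SHA-256 of a file, read in fixed-size chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


# Example: record a digest, modify the file, then re-check it.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"contract v1")
recorded = file_sha256(tmp.name)

with open(tmp.name, "ab") as f:
    f.write(b" (edited)")
still_intact = file_sha256(tmp.name) == recorded
os.remove(tmp.name)
print(still_intact)  # False: any change to the content changes the digest
```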
"SHA-256 is enough for passwords"
It isn't. SHA-256 is fast — a serious liability for password hashing. A modern GPU can compute billions of SHA-256 hashes per second. Password storage requires a key derivation function with a cost factor: bcrypt, Argon2 or PBKDF2.
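Python's standard library includes PBKDF2 (hashlib.pbkdf2_hmac), which is enough to sketch the idea. Each password gets a random salt and a large iteration count, so every guess costs the attacker many SHA-256 computations instead of one. The 600,000-iteration figure is an assumption based on current OWASP guidance for PBKDF2-HMAC-SHA256 and should be tuned to your hardware. bcrypt and Argon2 need third-party packages and are not shown here.

```python
import hashlib
import hmac
import secrets

ITERATIONS = 600_000  # assumption: tune so one hash takes a noticeable fraction of a second


def hash_password(password: str, iterations: int = ITERATIONS) -> tuple[bytes, int, bytes]:
    """Return (salt, iterations, derived key) for storage."""
    salt = secrets.token_bytes(16)  # unique random salt for every password
    key = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)
    return salt, iterations, key


def verify_password(password: str, salt: bytes, iterations: int, expected: bytes) -> bool:
    key = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)
    return hmac.compare_digest(key, expected)  # constant-time comparison


record = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", *record))  # True
print(verify_password("Tr0ub4dor&3", *record))                   # False
```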
Critical Distinction
SHA-256 is a hash function. bcrypt, Argon2 and PBKDF2 are password hashing functions. Use SHA-256 for file integrity, HMAC, and digital signatures. Use Argon2 for passwords.
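For HMAC, Python's hmac module wraps SHA-256 in the standard HMAC construction. Below is a sketch of signing and verifying an API token; the randomly generated key and the token format are hypothetical. HMAC is used rather than sha256(key + message) because plain SHA-256 over a secret prefix is open to length-extension attacks.

```python
import hashlib
import hmac
import secrets

SECRET_KEY = secrets.token_bytes(32)  # hypothetical server-side key; load from a secret store in practice


def sign(message: bytes, key: bytes = SECRET_KEY) -> str:
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def verify(message: bytes, tag: str, key: bytes = SECRET_KEY) -> bool:
    return hmac.compare_digest(sign(message, key), tag)  # constant-time comparison


token = b"user_id=42;role=reader"
tag = sign(token)
print(verify(token, tag))                     # True
print(verify(b"user_id=42;role=admin", tag))  # False: any change invalidates the tag
```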
"Salting fixes MD5"
Salting eliminates rainbow table attacks but does not fix collision resistance, and salted MD5 is still fast enough to brute-force weak passwords at GPU speed. The algorithm itself is broken; a salt doesn't change that.
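A short sketch shows why. The salt is stored next to the hash, so an attacker simply includes it in every guess. This defeats precomputed tables, but each guess still costs only one fast MD5 call. Here a hypothetical 4-digit PIN falls to an exhaustive search of all 10,000 candidates, typically in milliseconds.

```python
import hashlib
import secrets
import time

# Hypothetical stored record: a random per-user salt and md5(salt + PIN).
salt = secrets.token_bytes(16)
stored = hashlib.md5(salt + b"4821").hexdigest()

start = time.perf_counter()
# The attacker reads the salt from the same row and includes it in every guess.
recovered = next(
    pin
    for pin in (f"{i:04d}" for i in range(10_000))
    if hashlib.md5(salt + pin.encode()).hexdigest() == stored
)
elapsed_ms = (time.perf_counter() - start) * 1000
print(recovered, f"recovered in {elapsed_ms:.1f} ms")
```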
The right algorithm depends entirely on the threat model.
Migration Principle
When migrating from MD5/SHA-1, never truncate existing hashes. Use a transparent re-hash-on-login pattern for password systems. Store algorithm identifiers alongside hashes to enable future migration without data loss.
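Below is a minimal sketch of the re-hash-on-login pattern. It assumes a hypothetical "algorithm$params$..." record format in which each row begins with its algorithm identifier. PBKDF2 from the standard library stands in for bcrypt or Argon2 so the example stays dependency-free. A legacy MD5 row is verified one last time and then replaced with a modern record on the next successful login, so no user is forced to reset.

```python
import hashlib
import hmac
import secrets

ITERATIONS = 600_000  # assumption: tune to your hardware


def modern_record(password: str) -> str:
    """Hypothetical format: pbkdf2_sha256$<iterations>$<salt hex>$<key hex>."""
    salt = secrets.token_bytes(16)
    key = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return f"pbkdf2_sha256${ITERATIONS}${salt.hex()}${key.hex()}"


def check(password: str, record: str) -> bool:
    algorithm, _, rest = record.partition("$")  # algorithm identifier comes first
    if algorithm == "md5":  # legacy rows: md5$<hex digest>
        return hmac.compare_digest(hashlib.md5(password.encode()).hexdigest(), rest)
    if algorithm == "pbkdf2_sha256":
        iterations, salt_hex, key_hex = rest.split("$")
        key = hashlib.pbkdf2_hmac("sha256", password.encode(), bytes.fromhex(salt_hex), int(iterations))
        return hmac.compare_digest(key.hex(), key_hex)
    raise ValueError(f"unknown algorithm identifier: {algorithm!r}")


def login(password: str, record: str) -> tuple[bool, str]:
    """Return (authenticated, record to store). Legacy rows are upgraded transparently."""
    if not check(password, record):
        return False, record
    if not record.startswith("pbkdf2_sha256$"):
        record = modern_record(password)  # re-hash while the plaintext is available
    return True, record


legacy = "md5$" + hashlib.md5(b"hunter2").hexdigest()
ok, upgraded = login("hunter2", legacy)
print(ok, upgraded.split("$")[0])  # True pbkdf2_sha256
```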
This research directly informed the security hardening of three production systems — a document management API, a user authentication service and a file integrity verification pipeline.
- MD5 fully removed from all authentication flows — replaced with bcrypt (cost factor 12) for password hashing and HMAC-SHA256 for API token signing.
- Document integrity pipeline migrated to SHA-256 — tamper detection now covers 100% of documents.
- Zero regressions during migration — re-hash-on-login strategy avoided forced password resets.
- All new systems default to SHA-256 for integrity and Argon2 for passwords — enforced at code review level via static analysis rules.
Key Takeaway
The biggest risk isn't developers choosing MD5 intentionally — it's inherited codebases and copy-pasted examples carrying legacy algorithm choices forward unchallenged.