Cybersecurity · Research · 2024

Hashing algorithms: MD5, SHA-1 and SHA-256 in production security.

From the mathematical foundations to real-world attacks — a complete guide to understanding why some hashing algorithms are broken, and how to make the right choices in your systems.

Domain: Security Research
Scope: 3 production systems
Outcome: MD5 fully removed
Year: 2024

3 systems hardened (MD5 fully removed)
100% of documents covered by SHA-256 tamper detection
0 broken algorithms in production post-migration
What is a Cryptographic Hash Function?

A cryptographic hash function is a mathematical algorithm that takes an input of arbitrary size and produces a fixed-size output — the hash digest, or simply the hash. The process is deterministic: the same input always produces the same output. And it is one-way: given only the hash, recovering the original input is computationally infeasible.

This sounds simple. In practice, these two properties — determinism and one-wayness — underpin most of what we consider "secure" in modern computing: passwords, file integrity, digital signatures, TLS certificates, blockchain, API authentication and more.

Core Definition

A hash function H maps an input m of any length to an output h of fixed length, such that h = H(m). The function must be fast to compute, but practically impossible to invert.
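Both properties can be seen with Python's standard `hashlib` module. This is a minimal sketch: `sha256_hex` is a hypothetical helper name used only for illustration. Inputs of any length map to a 32-byte (256-bit) digest, and hashing the same input twice always gives the same result.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """Hypothetical helper: SHA-256 digest of data as a 64-character hex string."""
    return hashlib.sha256(data).hexdigest()

# Fixed-size output: inputs from 0 bytes to 1 MB all produce a 32-byte digest
for message in (b"", b"a", b"a" * 1_000_000):
    digest = hashlib.sha256(message).digest()
    print(len(message), len(digest))  # prints: 0 32, then 1 32, then 1000000 32

# Determinism: the same input always yields the same digest
assert sha256_hex(b"Hello, World!") == sha256_hex(b"Hello, World!")
```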

The four security properties that matter

Not all hash functions are created equal. A cryptographic hash must satisfy four properties — and the absence of any one of them creates a different, exploitable vulnerability:

  • Pre-image resistance. Given a hash h, it must be infeasible to find any input m such that H(m) = h. This protects stored password hashes.
  • Second pre-image resistance. Given an input m1, it must be infeasible to find a different input m2 such that H(m1) = H(m2). This protects document integrity.
  • Collision resistance. It must be infeasible to find any two inputs m1 and m2 such that H(m1) = H(m2). This is the property that MD5 and SHA-1 have completely lost.
  • Avalanche effect. A single-bit change in the input must produce a radically different hash output — ideally, approximately half the bits in the digest flip.
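You can check the "about half the bits flip" claim on your own inputs. XOR two SHA-256 digests and count the bits that differ. This is a sketch; `bits_changed` is a hypothetical helper name. The exact count varies from one input pair to the next, so no fixed value is shown. For a well-behaved 256-bit hash, expect a number near 128.

```python
import hashlib

def bits_changed(a: bytes, b: bytes) -> int:
    """Hypothetical helper: count how many of the 256 SHA-256 output bits differ."""
    da = int.from_bytes(hashlib.sha256(a).digest(), "big")
    db = int.from_bytes(hashlib.sha256(b).digest(), "big")
    # XOR leaves a 1 wherever the two digests disagree; count those 1 bits
    return bin(da ^ db).count("1")

# A one-character case change in the input
print(bits_changed(b"secure", b"Secure"))
```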

Avalanche effect: tiny input change, completely different hash

MD5: one character appended
"password"      → 5f4dcc3b5aa765d61d8327deb882cf99
"password1"     → 7c6a180b36896a0a8c02787eeafb0e4c
SHA-1: case difference
"hello world"   → 2aae6c35c94fcfb415dbe95f408b9ce91ee846ed
"Hello World"   → 0a4d55a8d778e5022fab701977c5d840bbc486d0
SHA-256: punctuation removed
"Hello, World!" → dffd6021bb2bd5b0af676290809ec3a53191dd81c7f70a4b28688a362182986f
"Hello World"   → a591a6d40bf420404a011733cfb7b190d62c65bf0bcda32b57b277d9ad9f146e
MD5 — Fast, Ubiquitous, and Broken

MD5 (Message Digest Algorithm 5) was designed by Ronald Rivest in 1991. It produces a 128-bit (16-byte) hash, typically expressed as a 32-character hexadecimal string. It is now completely broken for any security purpose.

How MD5 works internally

MD5 processes input in 512-bit blocks. Four 32-bit state variables — A, B, C, D — are transformed across four rounds of 16 operations each (64 total), using non-linear functions and constants derived from the sine function.
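The structure described above is small enough to write out in full. Below is an educational sketch of MD5 following the standard description in RFC 1321 and the widely published pseudocode. It shows the 512-bit blocks, the four 32-bit state words, the 64 operations in four rounds of 16, and the sine-derived constants. `md5_educational` is a hypothetical name. The code is meant for reading, not for use, and the demo checks it against `hashlib`.

```python
import hashlib
import math
import struct

MASK = 0xFFFFFFFF  # keep arithmetic within 32-bit words

# Sine-derived constants: K[i] = floor(2**32 * |sin(i + 1)|) for i in 0..63
K = [int(abs(math.sin(i + 1)) * 2**32) & MASK for i in range(64)]

# Left-rotation amounts: a pattern of four per round, each repeated 4 times
S = ([7, 12, 17, 22] * 4 + [5, 9, 14, 20] * 4
     + [4, 11, 16, 23] * 4 + [6, 10, 15, 21] * 4)

def _rotl(x: int, c: int) -> int:
    """Rotate a 32-bit word left by c bits."""
    x &= MASK
    return ((x << c) | (x >> (32 - c))) & MASK

def md5_educational(message: bytes) -> str:
    """Hypothetical educational MD5; never use it for security."""
    # Initial values of the four 32-bit state words A, B, C, D
    a0, b0, c0, d0 = 0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476

    # Padding: 0x80, zeros until length is 56 mod 64, then the bit length (64-bit LE)
    bit_len = (8 * len(message)) & 0xFFFFFFFFFFFFFFFF
    padded = message + b"\x80"
    padded += b"\x00" * ((56 - len(padded) % 64) % 64)
    padded += struct.pack("<Q", bit_len)

    # Process each 512-bit (64-byte) block as sixteen little-endian 32-bit words
    for offset in range(0, len(padded), 64):
        M = struct.unpack("<16I", padded[offset:offset + 64])
        A, B, C, D = a0, b0, c0, d0
        for i in range(64):
            if i < 16:    # round 1
                F, g = (B & C) | (~B & D), i
            elif i < 32:  # round 2
                F, g = (D & B) | (~D & C), (5 * i + 1) % 16
            elif i < 48:  # round 3
                F, g = B ^ C ^ D, (3 * i + 5) % 16
            else:         # round 4
                F, g = C ^ (B | ~D), (7 * i) % 16
            F = (F + A + K[i] + M[g]) & MASK  # masking also fixes negatives from ~
            A, D, C = D, C, B
            B = (B + _rotl(F, S[i])) & MASK
        # Add this block's result back into the running state
        a0, b0 = (a0 + A) & MASK, (b0 + B) & MASK
        c0, d0 = (c0 + C) & MASK, (d0 + D) & MASK

    # Digest: A, B, C, D concatenated in little-endian byte order
    return struct.pack("<4I", a0, b0, c0, d0).hex()

if __name__ == "__main__":
    msg = b"Hello, World!"
    print(md5_educational(msg) == hashlib.md5(msg).hexdigest())  # True
```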

⚠ Security Status

MD5 collision resistance was broken in 2004. Collisions can now be generated in seconds on commodity hardware. Pre-built rainbow tables for unsalted MD5 password hashes are also freely available. MD5 should not be relied on for any authentication or integrity purpose.

The Wang attack — why MD5 collisions are trivial

Wang's 2004 differential cryptanalysis exploits weaknesses in MD5's compression function. Building on this line of work, researchers in 2008 used MD5 chosen-prefix collisions to forge a rogue certificate-authority certificate that all major browsers of the time would have trusted. The reported cost was under $700 in cloud compute.

Python: generating MD5

```python
import hashlib

# Never use MD5 for security. This is illustrative only.
data = "Hello, World!".encode()
digest = hashlib.md5(data).hexdigest()
print(digest)  # 65a8e27d8879283831b664bd8b7f0ad4
```
Where MD5 still turns up in production:

  • Legacy authentication systems storing md5(password) without salting
  • File deduplication pipelines where security is erroneously assumed
  • Old PHP applications using md5() for password hashing
  • Some IoT firmware integrity checks that haven't been reviewed
SHA-1 — Better Architecture, Same Fate

SHA-1 was published by NIST in 1995. It produces a 160-bit digest with 80 rounds of processing. It was the dominant algorithm for over a decade — TLS certificates, code signing, Git, PGP. Collision resistance is broken.

SHA-1 internals

SHA-1 operates on 512-bit blocks maintaining a 160-bit state. The critical weakness lies in its message schedule — the linear expansion creates detectable algebraic relationships between round inputs, exactly what differential cryptanalysis exploits.
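Here is what "linear expansion" means in concrete terms. FIPS 180-4 builds SHA-1's W[16..79] using only XORs and a 1-bit rotation, so the whole schedule is linear over GF(2). The sketch below checks this directly. The helper names are hypothetical. Linearity of the schedule is not an attack by itself, but it means an input difference spreads into every round's message word in a fully predictable way.

```python
import os
import struct

def _rotl32(x: int, n: int) -> int:
    """Rotate a 32-bit word left by n bits."""
    return ((x << n) | (x >> (32 - n))) & 0xFFFFFFFF

def sha1_schedule(block: bytes) -> list:
    """Hypothetical helper: expand one 64-byte block into SHA-1's W[0..79]."""
    w = list(struct.unpack(">16I", block))  # SHA-1 reads words big-endian
    for t in range(16, 80):
        # Only XOR and a fixed rotation: every step is linear over GF(2)
        w.append(_rotl32(w[t - 3] ^ w[t - 8] ^ w[t - 14] ^ w[t - 16], 1))
    return w

# Linearity check: schedule(a XOR b) == schedule(a) XOR schedule(b), word by word
a, b = os.urandom(64), os.urandom(64)
a_xor_b = bytes(x ^ y for x, y in zip(a, b))
combined = [x ^ y for x, y in zip(sha1_schedule(a), sha1_schedule(b))]
print(sha1_schedule(a_xor_b) == combined)  # True for any a, b
```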

SHAttered — the practical break

In 2017, Google and CWI Amsterdam demonstrated the first practical SHA-1 collision (SHAttered). The estimated cost at the time was around $75,000–$110,000 in cloud compute, and such attacks have become significantly cheaper since.

Python: SHA-1 (deprecated)

```python
import hashlib

# SHA-1 should not be used in new systems
data = "Hello, World!".encode()
digest = hashlib.sha1(data).hexdigest()
print(digest)  # a 40-character hex string (160 bits)
```
SHA-256 — The Current Standard

SHA-256 is part of the SHA-2 family, published by NIST in 2001. It produces a 256-bit digest. Its more conservative design, notably a non-linear message schedule, has so far resisted the differential cryptanalysis that broke MD5 and SHA-1.

How the compression function works

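SHA-256 runs 64 rounds of compression over eight 32-bit state words. Each round uses a message schedule W[0..63], expanded non-linearly from the block, and round constants K[0..63], taken from the fractional parts of the cube roots of the first 64 primes. After all rounds, the compressed state is added (word-wise, modulo 2^32) back to the chaining value it started from. This feed-forward, the Davies–Meyer construction, stops an attacker from inverting the compression function by running the rounds backwards. It does not protect against length-extension attacks. Because the final digest is the entire internal state, anyone who knows SHA-256(m) and the length of m can compute the hash of m + padding + extra data of their choosing. That is why keyed uses need HMAC rather than SHA-256(key || message).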
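These "nothing-up-my-sleeve" constants can be reproduced exactly. FIPS 180-4 defines two sets:

  • K, the first 32 bits of the fractional parts of the cube roots of the first 64 primes.
  • The initial hash value H(0), taken the same way from the square roots of the first 8 primes.

The sketch below uses exact integer roots to avoid floating-point rounding. The helper names are hypothetical.

```python
import math

def first_primes(n: int) -> list:
    """Hypothetical helper: first n primes by trial division (fine for n = 64)."""
    primes = []
    candidate = 2
    while len(primes) < n:
        if all(candidate % p for p in primes if p * p <= candidate):
            primes.append(candidate)
        candidate += 1
    return primes

def icbrt(n: int) -> int:
    """Hypothetical helper: exact integer cube root, the largest r with r**3 <= n."""
    lo, hi = 0, 1 << ((n.bit_length() + 2) // 3 + 1)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** 3 <= n:
            lo = mid
        else:
            hi = mid - 1
    return lo

PRIMES = first_primes(64)

# floor(cbrt(p) * 2**32) == icbrt(p * 2**96); the mask keeps the 32 fraction bits
K = [icbrt(p << 96) & 0xFFFFFFFF for p in PRIMES]

# floor(sqrt(p) * 2**32) == isqrt(p * 2**64); first 8 primes only
H0 = [math.isqrt(p << 64) & 0xFFFFFFFF for p in PRIMES[:8]]

print(hex(K[0]), hex(K[63]), hex(H0[0]))  # 0x428a2f98 0xc67178f2 0x6a09e667
```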

SHA-256 Compression Pipeline

A. Pre-processing: padding + length encoding
B. Block split: 512-bit message blocks
C. Schedule W[0..63]: message schedule expansion
D. Compression: 64 rounds × 8 state variables
E. Add to state: word-wise addition mod 2^32 with the previous chaining value
F. Final hash: 256-bit digest output

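The non-linear message schedule was a key design decision separating SHA-2 from its predecessors. Expanded words are combined with modular addition rather than XOR alone, so an input difference no longer propagates predictably through the schedule. That makes the differential paths used against MD5 and SHA-1 far harder to construct and control.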
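SHA-256's expansion follows FIPS 180-4. It passes words through the σ0/σ1 rotate-and-shift mixers and combines the results with addition modulo 2^32. The σ functions on their own are still XOR-linear; the modular addition, with its carries, is what destroys exact linearity. The sketch below tests whether schedule(a XOR b) equals schedule(a) XOR schedule(b). The helper names and the fixed seed are illustrative. The first 16 words always match, because they are the input itself. For essentially any pair of blocks, the expanded words will not.

```python
import random
import struct

MASK = 0xFFFFFFFF

def _rotr(x: int, n: int) -> int:
    """Rotate a 32-bit word right by n bits."""
    return ((x >> n) | (x << (32 - n))) & MASK

def sha256_schedule(block: bytes) -> list:
    """Hypothetical helper: expand one 64-byte block into SHA-256's W[0..63]."""
    w = list(struct.unpack(">16I", block))
    for t in range(16, 64):
        s0 = _rotr(w[t - 15], 7) ^ _rotr(w[t - 15], 18) ^ (w[t - 15] >> 3)
        s1 = _rotr(w[t - 2], 17) ^ _rotr(w[t - 2], 19) ^ (w[t - 2] >> 10)
        # Addition mod 2**32 (not XOR): carries break linearity over GF(2)
        w.append((s1 + w[t - 7] + s0 + w[t - 16]) & MASK)
    return w

rng = random.Random(2024)  # fixed seed so the demo is repeatable
a = bytes(rng.getrandbits(8) for _ in range(64))
b = bytes(rng.getrandbits(8) for _ in range(64))
a_xor_b = bytes(x ^ y for x, y in zip(a, b))
combined = [x ^ y for x, y in zip(sha256_schedule(a), sha256_schedule(b))]
print(sha256_schedule(a_xor_b) == combined)
```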

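Python: SHA-256 correct usage

```python
import hashlib
import hmac

import bcrypt  # third-party: pip install bcrypt

# File integrity verification: stream the file in 64 KiB chunks
def file_sha256(path: str) -> str:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_file(path: str, expected_hex: str) -> bool:
    # Constant-time comparison against a known-good digest
    return hmac.compare_digest(file_sha256(path), expected_hex.lower())

# For passwords: never use raw SHA-256. Use bcrypt or Argon2.
password = "correct horse battery staple"  # example value only
password_hash = bcrypt.hashpw(password.encode(), bcrypt.gensalt(12))
assert bcrypt.checkpw(password.encode(), password_hash)
```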
Side-by-Side Comparison

A structured comparison across the properties that matter for security decision-making.

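Property | MD5 | SHA-1 | SHA-256
Output size | 128 bit | 160 bit | 256 bit
Block size | 512 bit | 512 bit | 512 bit
Rounds (steps) | 64 | 80 | 64
Word size | 32 bit | 32 bit | 32 bit
Collision resistance | Broken | Broken | Strong
Pre-image resistance | Weakened (theoretical) | Adequate | Strong
Speed (relative) | Fastest | Fast | Moderate
Collision attack cost | < $1 | ~$75k–$110k (2017) | No known practical attack
Security use in production | ❌ Never | ❌ Never | ✓ Yes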

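Estimated Cost of a Full Collision Attack

MD5: < $1, seconds

Collisions generated in seconds on commodity hardware; rainbow tables for unsalted password hashes are also freely available

SHA-1: ~$75k–$110k, about 6,500 CPU-years plus 110 GPU-years of computation

Google/CWI SHAttered attack (2017): practical

SHA-256: infeasible, about 2^128 operations (generic birthday bound)

No known practical attack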

Common Misconceptions Worth Addressing

"MD5 is fine for non-security uses"

Sometimes true — but "non-security" scope creep is how MD5 ends up in authentication flows. If you are touching user data, authentication tokens or anything that needs to detect tampering, the answer is always SHA-256.

"SHA-256 is enough for passwords"

It isn't. SHA-256 is fast — a serious liability for password hashing. A modern GPU can compute billions of SHA-256 hashes per second. Password storage requires a key derivation function with a cost factor: bcrypt, Argon2 or PBKDF2.
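Of the three options, PBKDF2 is the one in Python's standard library (`hashlib.pbkdf2_hmac`), so it gives a dependency-free sketch of what "a KDF with a cost factor" means: a random per-password salt plus a tunable iteration count. The function names and the default iteration count are illustrative assumptions, so tune the cost to your hardware. For new systems, Argon2 or bcrypt remain the recommendation.

```python
import hashlib
import hmac
import os

ITERATIONS = 600_000  # illustrative cost factor (assumption); raise it as hardware gets faster

def hash_password(password: str, iterations: int = ITERATIONS) -> tuple:
    """Hypothetical helper: return (salt, iterations, key) via PBKDF2-HMAC-SHA256."""
    salt = os.urandom(16)  # unique random salt per password defeats rainbow tables
    key = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)
    return salt, iterations, key

def verify_password(password: str, salt: bytes, iterations: int, key: bytes) -> bool:
    """Hypothetical helper: recompute with the stored salt and cost, compare safely."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)
    return hmac.compare_digest(candidate, key)  # constant-time comparison
```

Each guess now costs an attacker `iterations` HMAC-SHA256 computations instead of one SHA-256, which is exactly the slowdown plain hashing lacks.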

Critical Distinction

SHA-256 is a hash function. bcrypt, Argon2 and PBKDF2 are password hashing functions. Use SHA-256 for file integrity, HMAC, and digital signatures. Use Argon2 for passwords.

"Salting fixes MD5"

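Salting eliminates rainbow table attacks but does not fix collision resistance. The algorithm itself is broken; a salt doesn't change that. Nor does a salt slow anything down: salted MD5 can still be brute-forced at billions of guesses per second on a single GPU.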

Algorithm Selection — Practical Guide

The right algorithm depends entirely on the threat model.

Use case | Algorithm | Notes
Password storage | bcrypt / Argon2 | Hashing alone is never enough; use a password KDF with salt and cost factor
File integrity check | SHA-256 | Fast, collision-resistant, universally supported
Digital signatures | SHA-256 / SHA-3 | Standard pairing for RSA-PSS, ECDSA and modern certificate standards
Checksum / dedup | SHA-256 or BLAKE3 | MD5/SHA-1 still appear in non-security dedup, but avoid them in new systems
HMAC / API auth | HMAC-SHA256 | Keyed construction provides authentication, not just integrity
TLS certificates | SHA-256 or stronger | SHA-1 certificates rejected by all major browsers since 2017
Git commit IDs | SHA-1 (legacy) | Git moved to collision-detecting SHA-1 after SHAttered and supports SHA-256 repositories; collisions matter once commits or tags are signed
Blockchain / PoW | SHA-256 / SHA-3 | Double SHA-256 is used in Bitcoin; one-wayness is the property that matters
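The HMAC row is worth illustrating. It is HMAC-SHA256, not SHA-256(secret + message), that gives you authentication. A plain hash with a secret prefix is open to the length-extension attacks SHA-256 is subject to. Below is a minimal sketch using Python's `hmac` module. `sign`, `verify`, the secret and the payload are all hypothetical.

```python
import hashlib
import hmac

def sign(secret: bytes, message: bytes) -> str:
    """Hypothetical helper: hex-encoded HMAC-SHA256 tag over message."""
    return hmac.new(secret, message, hashlib.sha256).hexdigest()

def verify(secret: bytes, message: bytes, tag: str) -> bool:
    """Hypothetical helper: recompute the tag and compare in constant time."""
    # compare_digest avoids leaking how many leading characters matched
    return hmac.compare_digest(sign(secret, message), tag)

secret = b"example-shared-secret"  # hypothetical; load real keys from a secrets store
body = b'{"user_id": 42, "action": "delete"}'
tag = sign(secret, body)
print(verify(secret, body, tag))                        # True
print(verify(secret, body.replace(b"42", b"43"), tag))  # False: body was tampered with
```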

Migration Principle

When migrating from MD5/SHA-1, never truncate existing hashes. Use a transparent re-hash-on-login pattern for password systems. Store algorithm identifiers alongside hashes to enable future migration without data loss.
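Here is a minimal sketch of both ideas, built on several assumptions. Legacy records are assumed to look like `md5$<hex>`. PBKDF2 from the standard library stands in for bcrypt/Argon2 so the example has no dependencies. The record format, function names and iteration count are all hypothetical. After a successful login against a legacy record, the caller saves the returned upgraded record, so no forced password reset is needed.

```python
import hashlib
import hmac
import os

PBKDF2_ITERATIONS = 600_000  # illustrative cost factor (assumption)

def _pbkdf2_record(password: str) -> str:
    """Hypothetical format: algorithm$iterations$salt_hex$key_hex."""
    salt = os.urandom(16)
    key = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, PBKDF2_ITERATIONS)
    # The algorithm identifier travels with the hash, enabling future migrations
    return f"pbkdf2_sha256${PBKDF2_ITERATIONS}${salt.hex()}${key.hex()}"

def verify_and_upgrade(password: str, stored: str):
    """Hypothetical helper: return (ok, record_to_store); legacy md5 is re-hashed on success."""
    algorithm, _, rest = stored.partition("$")
    if algorithm == "md5":  # legacy: unsalted md5(password) as hex
        ok = hmac.compare_digest(hashlib.md5(password.encode()).hexdigest(), rest)
        return ok, (_pbkdf2_record(password) if ok else stored)
    if algorithm == "pbkdf2_sha256":
        iterations, salt_hex, key_hex = rest.split("$")
        key = hashlib.pbkdf2_hmac(
            "sha256", password.encode(), bytes.fromhex(salt_hex), int(iterations)
        )
        return hmac.compare_digest(key.hex(), key_hex), stored
    raise ValueError(f"unknown algorithm identifier: {algorithm}")
```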

What Changed After the Migration

This research directly informed the security hardening of three production systems — a document management API, a user authentication service and a file integrity verification pipeline.

  • MD5 fully removed from all authentication flows — replaced with bcrypt (cost factor 12) for password hashing and HMAC-SHA256 for API token signing.
  • Document integrity pipeline migrated to SHA-256 — tamper detection now covers 100% of documents.
  • Zero regressions during migration — re-hash-on-login strategy avoided forced password resets.
  • All new systems default to SHA-256 for integrity and Argon2 for passwords — enforced at code review level via static analysis rules.

Key Takeaway

The biggest risk isn't developers choosing MD5 intentionally — it's inherited codebases and copy-pasted examples carrying legacy algorithm choices forward unchallenged.
