Hashing Algorithms: MD5, SHA-1 and SHA-256

◈ What is a Cryptographic Hash Function?

A cryptographic hash function is a mathematical algorithm that takes an input of arbitrary size and produces a fixed-size output — the hash digest, or simply the hash. The process is deterministic: the same input always produces the same output. And it is one-way: given only the hash, recovering the original input is computationally infeasible.

This sounds simple. In practice, these two properties — determinism and one-wayness — underpin most of what we consider "secure" in modern computing: passwords, file integrity, digital signatures, TLS certificates, blockchain, API authentication and more.

Core Definition

A hash function H maps an input m of any length to an output h of fixed length, such that h = H(m). The function must be fast to compute, but practically impossible to invert.
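
As a small illustration of this definition, the sketch below uses Python's standard `hashlib` module as a stand-in for H and checks two things: the digest length depends only on the algorithm, and the same input always gives the same output. The helper name `digest_bits` is introduced here for illustration only.

```python
import hashlib

def digest_bits(algorithm: str, message: bytes) -> int:
    """Return the digest length in bits for one algorithm and one message."""
    return hashlib.new(algorithm, message).digest_size * 8

small = b"a"
large = b"a" * 1_000_000  # one million bytes

for name in ("md5", "sha1", "sha256"):
    # Output size depends only on the algorithm, never on the input length.
    assert digest_bits(name, small) == digest_bits(name, large)
    print(name, digest_bits(name, small))

# Deterministic: hashing the same input twice gives the same digest.
assert hashlib.sha256(large).digest() == hashlib.sha256(large).digest()
```

Running it prints 128, 160 and 256 bits for MD5, SHA-1 and SHA-256 respectively.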

The four security properties that matter

Not all hash functions are created equal. A cryptographic hash must satisfy four properties — and the absence of any one of them creates a different, exploitable vulnerability:

Pre-image resistance. Given a hash h, it must be infeasible to find any input m such that H(m) = h. This protects stored password hashes.
Second pre-image resistance. Given an input m1, it must be infeasible to find a different input m2 such that H(m1) = H(m2). This protects document integrity.
Collision resistance. It must be infeasible to find any two inputs m1 and m2 such that H(m1) = H(m2). This is the property that MD5 and SHA-1 have completely lost.
Avalanche effect. A single-bit change in the input must produce a radically different hash output — ideally, approximately half the bits in the digest flip.

◈ Avalanche Effect — tiny input change, completely different hash

MD5: one character added

"password" → 5f4dcc3b5aa765d61d8327deb882cf99
"password1" → 7c6a180b36896a0a8c02787eeafb0e4c

SHA-1: period vs exclamation mark

"Hello, World!" → 0a4d55a8d778e5022fab701977c5d840b7470f2b
"Hello, World." → 69342c5c39e5ae5f0077aecc32c0f81811fb8193

SHA-256: trailing period added

"The quick brown fox jumps over the lazy dog" → d7a8fbb307d7809469ca9abcb0082e4f8d5651e46d3cdb762d02d0bf37c9e592
"The quick brown fox jumps over the lazy dog." → ef537f25c895bfa782526529a9b63d97aa631564d5d789c2b765448c8635fb6c

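The avalanche effect can also be measured rather than eyeballed. The sketch below, which assumes SHA-256 via `hashlib` and uses a hypothetical helper named `differing_bits`, counts how many of the 256 output bits change when one character is appended to the input.

```python
import hashlib

def differing_bits(a: bytes, b: bytes) -> int:
    """Count the bit positions at which the SHA-256 digests of a and b differ."""
    da = int.from_bytes(hashlib.sha256(a).digest(), "big")
    db = int.from_bytes(hashlib.sha256(b).digest(), "big")
    return bin(da ^ db).count("1")

changed = differing_bits(b"password", b"password1")
print(f"{changed} of 256 bits differ")  # a well-behaved hash lands near 128
```

For a hash with good diffusion the count should sit close to half the digest length, whatever pair of nearby inputs is chosen.
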
◆ MD5 — Fast, Ubiquitous, and Broken

MD5 (Message Digest Algorithm 5) was designed by Ronald Rivest in 1991. It produces a 128-bit (16-byte) hash, typically expressed as a 32-character hexadecimal string. It is now completely broken for any security purpose.

How MD5 works internally

MD5 processes input in 512-bit blocks. Four 32-bit state variables — A, B, C, D — are transformed across four rounds of 16 operations each (64 total), using non-linear functions and constants derived from the sine function.
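
To make the ingredients concrete, here is a minimal sketch of the sine-derived constants and the four round functions as specified in RFC 1321. It is not a full MD5 implementation: padding, per-step rotation amounts and the message word ordering are omitted, and the names `K` and `MASK32` are chosen here for illustration.

```python
import math

MASK32 = 0xFFFFFFFF

# Per-operation constants: floor(2^32 * |sin(i + 1)|), with i + 1 in radians.
K = [int(abs(math.sin(i + 1)) * 2**32) & MASK32 for i in range(64)]

# The four non-linear functions, one per round of 16 operations.
def F(b, c, d):
    return (b & c) | (~b & d & MASK32)

def G(b, c, d):
    return (b & d) | (c & ~d & MASK32)

def H(b, c, d):
    return b ^ c ^ d

def I(b, c, d):
    return (c ^ (b | (~d & MASK32))) & MASK32

print(hex(K[0]))   # 0xd76aa478
print(hex(K[63]))  # 0xeb86d391
```

The printed values match the first and last entries of the constant table in RFC 1321.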

⚠ Security Status

MD5 collision resistance was broken in 2004. Collisions can now be generated in seconds on commodity hardware, and pre-built rainbow tables for unsalted MD5 password hashes are freely available. MD5 provides zero security for any authentication or integrity purpose.

The Wang attack — why MD5 collisions are trivial

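Wang's 2004 differential cryptanalysis exploits weaknesses in MD5's compression function. In 2008, researchers used an MD5 chosen-prefix collision to forge a rogue CA certificate trusted by every major browser. The collision search ran on a cluster of about 200 PlayStation 3 consoles, and the certificates purchased for the attack cost under $700.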

Python: generating MD5

    import hashlib

    # Never use MD5 for security. This is illustrative only.
    data = "Hello, World!".encode()
    digest = hashlib.md5(data).hexdigest()
    print(digest)  # 65a8e27d8879283831b664bd8b7f0ad4

MD5 still turns up in places such as:

Legacy authentication systems storing md5(password) without salting
File deduplication pipelines where security is erroneously assumed
Old PHP applications using md5() for password hashing
Some IoT firmware integrity checks that haven't been reviewed

◉ SHA-1 — Better Architecture, Same Fate

SHA-1 was published by NIST in 1995. It produces a 160-bit digest with 80 rounds of processing. It was the dominant algorithm for over a decade — TLS certificates, code signing, Git, PGP. Collision resistance is broken.

SHA-1 internals

SHA-1 operates on 512-bit blocks maintaining a 160-bit state. The critical weakness lies in its message schedule — the linear expansion creates detectable algebraic relationships between round inputs, exactly what differential cryptanalysis exploits.
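
The linearity of the message schedule can be checked directly. The sketch below implements only the SHA-1 expansion step, W[t] = ROTL1(W[t-3] XOR W[t-8] XOR W[t-14] XOR W[t-16]), and confirms that expanding the XOR of two blocks equals the XOR of their expansions. The function names are illustrative, and this is the schedule alone, not a SHA-1 implementation.

```python
import os

def rotl32(x: int, n: int) -> int:
    """Rotate a 32-bit word left by n bits."""
    return ((x << n) | (x >> (32 - n))) & 0xFFFFFFFF

def sha1_schedule(block: bytes) -> list[int]:
    """Expand one 64-byte block into the 80 words W[0..79] used by SHA-1."""
    assert len(block) == 64
    w = [int.from_bytes(block[i:i + 4], "big") for i in range(0, 64, 4)]
    for t in range(16, 80):
        w.append(rotl32(w[t - 3] ^ w[t - 8] ^ w[t - 14] ^ w[t - 16], 1))
    return w

a, b = os.urandom(64), os.urandom(64)
a_xor_b = bytes(x ^ y for x, y in zip(a, b))

# Linearity over GF(2): schedule(a XOR b) == schedule(a) XOR schedule(b).
lhs = sha1_schedule(a_xor_b)
rhs = [x ^ y for x, y in zip(sha1_schedule(a), sha1_schedule(b))]
print(lhs == rhs)  # True for every pair of blocks
```

Because XOR and rotation are both linear, a chosen difference in the message words propagates through all 80 schedule words in a fully predictable way.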

SHAttered — the practical break

In 2017, Google and CWI Amsterdam demonstrated the first practical SHA-1 collision. The estimated cost was around $75,000–$110,000 in cloud compute at the time, and it is significantly cheaper today.

Python: SHA-1 (deprecated)

    import hashlib

    # SHA-1 should not be used in new systems
    data = "Hello, World!".encode()
    digest = hashlib.sha1(data).hexdigest()
    print(digest)  # 0a4d55a8d778e5022fab701977c5d840b7470f2b

◇ SHA-256 — The Current Standard

SHA-256 is part of the SHA-2 family, standardised by NIST in 2001. It produces a 256-bit digest and was designed specifically to address the structural weaknesses that made SHA-1 and MD5 vulnerable to differential cryptanalysis.

How the compression function works

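SHA-256 runs 64 rounds of compression per block, using a non-linear expansion of the 16 message words into W[0..63] and round constants K[0..63] taken from the fractional parts of the cube roots of the first 64 primes. After all rounds, the working variables are added back to the incoming chaining state (a feed-forward step), so the compression function cannot simply be run backwards. This does not protect against length-extension attacks: like MD5 and SHA-1, SHA-256 is a Merkle-Damgård construction, so a digest of the form H(secret || message) can be extended without knowing the secret. Use HMAC-SHA256 whenever a key is involved.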
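
The round constants can be rederived rather than copied from a table. The sketch below computes them with exact integer arithmetic, which avoids relying on floating-point cube roots. The helpers `first_primes` and `icbrt` are written for this example and are not part of any library.

```python
def first_primes(n: int) -> list[int]:
    """Return the first n primes by trial division (fine for small n)."""
    primes: list[int] = []
    candidate = 2
    while len(primes) < n:
        if all(candidate % p for p in primes):
            primes.append(candidate)
        candidate += 1
    return primes

def icbrt(n: int) -> int:
    """Integer cube root: the largest r with r**3 <= n, found by binary search."""
    lo, hi = 0, 1 << (n.bit_length() // 3 + 1)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** 3 <= n:
            lo = mid
        else:
            hi = mid - 1
    return lo

# floor(cbrt(p) * 2^32) equals icbrt(p * 2^96); the low 32 bits of that value
# are the first 32 bits of the fractional part of the cube root.
K = [icbrt(p << 96) & 0xFFFFFFFF for p in first_primes(64)]

print(hex(K[0]))   # 0x428a2f98
print(hex(K[63]))  # 0xc67178f2
```

Deriving constants from a public, mechanical rule like this is a common way to show that no hidden structure was planted in them.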

◉ SHA-256 Compression Pipeline

A. Pre-processing: padding + length encoding
B. Block split: 512-bit message blocks
C. Schedule W[0..63]: message schedule expansion
D. Compression: 64 rounds × 8 state variables
E. Add → state: word-wise addition modulo 2^32 with the previous state
F. Final hash: 256-bit digest output
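
The pre-processing and block-split stages of this pipeline are small enough to sketch in full. The example below follows the padding rule in FIPS 180-4 (a single 0x80 byte, zero bytes, then the message length in bits as a 64-bit big-endian integer). The function names `sha256_pad` and `split_blocks` are illustrative.

```python
def sha256_pad(message: bytes) -> bytes:
    """Append 0x80, zero bytes, and the 64-bit big-endian bit length."""
    bit_length = len(message) * 8
    padded = message + b"\x80"
    # Zero-fill so that 8 bytes remain free at the end of the last block.
    padded += b"\x00" * ((56 - len(padded)) % 64)
    return padded + bit_length.to_bytes(8, "big")

def split_blocks(padded: bytes) -> list[bytes]:
    """Split padded input into 64-byte (512-bit) blocks."""
    return [padded[i:i + 64] for i in range(0, len(padded), 64)]

print(len(split_blocks(sha256_pad(b"abc"))))       # 1
print(len(split_blocks(sha256_pad(b"a" * 56))))    # 2
```

A 56-byte message spills into a second block because the marker byte and the 8-byte length field no longer fit in the first one.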

The non-linear message schedule was the key design decision that separated SHA-2 from its predecessors. It's what makes the algebra of differential cryptanalysis collapse before it reaches the output.
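
A quick experiment shows what "non-linear" means here. The sketch below implements only the SHA-256 message schedule from FIPS 180-4 (the σ0 and σ1 functions plus addition modulo 2^32) and tests whether expansion commutes with XOR. Helper names and the random seed are arbitrary choices for this example.

```python
import random

MASK32 = 0xFFFFFFFF

def rotr32(x: int, n: int) -> int:
    """Rotate a 32-bit word right by n bits."""
    return ((x >> n) | (x << (32 - n))) & MASK32

def small_sigma0(x: int) -> int:
    return rotr32(x, 7) ^ rotr32(x, 18) ^ (x >> 3)

def small_sigma1(x: int) -> int:
    return rotr32(x, 17) ^ rotr32(x, 19) ^ (x >> 10)

def sha256_schedule(words: list[int]) -> list[int]:
    """Expand 16 message words into the 64 schedule words W[0..63]."""
    w = list(words)
    for t in range(16, 64):
        total = small_sigma1(w[t - 2]) + w[t - 7] + small_sigma0(w[t - 15]) + w[t - 16]
        w.append(total & MASK32)
    return w

rng = random.Random(2024)
a = [rng.getrandbits(32) for _ in range(16)]
b = [rng.getrandbits(32) for _ in range(16)]
a_xor_b = [x ^ y for x, y in zip(a, b)]

lhs = sha256_schedule(a_xor_b)
rhs = [x ^ y for x, y in zip(sha256_schedule(a), sha256_schedule(b))]

print(lhs[:16] == rhs[:16])  # True: the first 16 words are the message itself
print(lhs == rhs)            # expected False: carries from modular addition break XOR-linearity
```

The carries produced by modular addition are what stop a chosen XOR difference from propagating predictably through the expanded words.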

Python: SHA-256 correct usage

    import hashlib

    import bcrypt  # third-party package: pip install bcrypt

    # File integrity verification: stream the file in 64 KiB chunks
    def file_sha256(path: str) -> str:
        h = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(65536), b""):
                h.update(chunk)
        return h.hexdigest()

    # For passwords: never use raw SHA-256; use bcrypt or Argon2
    def hash_password(password: str) -> bytes:
        return bcrypt.hashpw(password.encode(), bcrypt.gensalt(12))

▣ Side-by-Side Comparison

A structured comparison across the properties that matter for security decision-making.

Property	MD5	SHA-1	SHA-256
Output size	128 bits	160 bits	256 bits
Block size	512 bits	512 bits	512 bits
Rounds	64	80	64
Word size	32 bits	32 bits	32 bits
Collision resistance	Broken	Broken	Strong
Pre-image resistance	Weak	Adequate	Strong
Speed (relative)	Fastest	Fast	Moderate
Collision attack cost	Under $1	~$75k–$110k (2017 estimate)	Infeasible
New security use	❌ Never	❌ Never	✓ Yes

◆ Estimated Cost of a Full Collision Attack

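MD5: under $1, seconds
Collisions can be generated on commodity hardware

SHA-1: roughly $75k–$110k in 2017 cloud compute, about 6,500 CPU-years plus 110 GPU-years of work
Google SHAttered attack (2017), practical

SHA-256: infeasible, heat death of the universe
No known practical attack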

◈ Common Misconceptions Worth Addressing

"MD5 is fine for non-security uses"

Sometimes true, but "non-security" scope creep is how MD5 ends up in authentication flows. If you are touching user data, authentication tokens or anything that needs to detect tampering, default to SHA-256, with HMAC-SHA256 wherever a secret key is involved.

"SHA-256 is enough for passwords"

It isn't. SHA-256 is fast — a serious liability for password hashing. A modern GPU can compute billions of SHA-256 hashes per second. Password storage requires a key derivation function with a cost factor: bcrypt, Argon2 or PBKDF2.

Critical Distinction

SHA-256 is a hash function. bcrypt, Argon2 and PBKDF2 are password hashing functions. Use SHA-256 for file integrity, HMAC, and digital signatures. Use Argon2 for passwords.
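
As a sketch of what "salt and cost factor" look like in code, the example below uses PBKDF2-HMAC-SHA256 because it ships in Python's standard library. Argon2 or bcrypt would need a third-party package. The iteration count is an assumed value to be tuned for your hardware, and the function names are illustrative.

```python
import hashlib
import hmac
import os

ITERATIONS = 600_000  # assumed cost factor; tune to your hardware and latency budget

def hash_password(password: str, iterations: int = ITERATIONS) -> tuple[bytes, bytes]:
    """Return (salt, derived_key) using PBKDF2-HMAC-SHA256 with a random salt."""
    salt = os.urandom(16)
    key = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)
    return salt, key

def verify_password(password: str, salt: bytes, expected: bytes,
                    iterations: int = ITERATIONS) -> bool:
    """Re-derive the key from the candidate password and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)
    return hmac.compare_digest(candidate, expected)

salt, key = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", salt, key))  # True
print(verify_password("wrong guess", salt, key))                   # False
```

Each guess now costs an attacker hundreds of thousands of hash computations instead of one, and the per-user salt means identical passwords do not share a stored value.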

"Salting fixes MD5"

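Salting defeats precomputed rainbow tables, but it fixes neither MD5's broken collision resistance nor its speed: a salted MD5 hash can still be brute-forced at billions of guesses per second on a GPU. The algorithm itself is broken, and a salt doesn't change that.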

◆ Algorithm Selection — Practical Guide

The right algorithm depends entirely on the threat model.

Use case	Recommended	Notes
Password storage	bcrypt / Argon2	Hashing alone is never enough; use a password KDF with salt and cost factor
File integrity check	SHA-256	Fast, collision-resistant, universally supported
Digital signatures	SHA-256 / SHA-3	Used with RSA-PSS, ECDSA and modern certificate standards
Checksum / dedup	SHA-256 or BLAKE3	MD5/SHA-1 are still used for non-security dedup, but avoid them in new systems
HMAC / API auth	HMAC-SHA256	Keyed variant provides authentication, not just integrity
TLS certificates	SHA-256 only	SHA-1 certs rejected by all major browsers since 2017
Git commit IDs	SHA-1 (legacy)	Primarily a content identifier rather than a security boundary; Git now uses a collision-detecting SHA-1 implementation and supports SHA-256 repositories
Blockchain / PoW	SHA-256 / SHA-3	Double-SHA256 used in Bitcoin; one-wayness is the property that matters
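
For the HMAC / API auth entry, a minimal signing and verification sketch using Python's standard `hmac` module might look like the following. The key and request body are made-up placeholders, and the `sign` and `verify` helpers are illustrative rather than any particular service's API.

```python
import hashlib
import hmac

def sign(key: bytes, message: bytes) -> str:
    """Return the HMAC-SHA256 tag of message under key, as a hex string."""
    return hmac.new(key, message, hashlib.sha256).hexdigest()

def verify(key: bytes, message: bytes, signature: str) -> bool:
    """Recompute the tag and compare in constant time."""
    return hmac.compare_digest(sign(key, message), signature)

key = b"demo-shared-secret"                      # hypothetical key
request = b'POST /v1/documents\n{"id": 42}'      # hypothetical request body

tag = sign(key, request)
print(verify(key, request, tag))           # True
print(verify(key, request + b" ", tag))    # False: any change invalidates the tag
```

Unlike a bare SHA-256 digest, the tag cannot be recomputed by someone who can modify the message but does not hold the key.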

Migration Principle

When migrating from MD5/SHA-1, never truncate existing hashes. Use a transparent re-hash-on-login pattern for password systems. Store algorithm identifiers alongside hashes to enable future migration without data loss.

◉ What Changed After the Migration

This research directly informed the security hardening of three production systems — a document management API, a user authentication service and a file integrity verification pipeline.

MD5 fully removed from all authentication flows — replaced with bcrypt (cost factor 12) for password hashing and HMAC-SHA256 for API token signing.
Document integrity pipeline migrated to SHA-256 — tamper detection now covers 100% of documents.
Zero regressions during migration — re-hash-on-login strategy avoided forced password resets.
All new systems default to SHA-256 for integrity and Argon2 for passwords — enforced at code review level via static analysis rules.

Key Takeaway

The biggest risk isn't developers choosing MD5 intentionally — it's inherited codebases and copy-pasted examples carrying legacy algorithm choices forward unchallenged.
