Download a Linux ISO and you'll find a hash next to it: sha256: 3b4c...a91f. Download the file, compute its hash, compare. If they match, your download is intact. If they don't, something changed in transit — a corrupted packet, a tampered file, or just an incomplete download.
That's hashing in its simplest practical form: verifying that data is exactly what you expect. But hashing does a lot more than verify downloads.
What Is a Hash Function?
A hash function takes input of any size and produces a fixed-size output (the "hash" or "digest"). The same input always produces the same output, but even a tiny change in the input creates a completely different hash.
Input: "Hello, World!"
MD5: 65a8e27d8879283831b664bd8b7f0ad4
Input: "Hello, World" (removed exclamation mark)
MD5: 82bb413746aee42f89dea2b59614f9ef
One character difference, completely different hash. This property is called the "avalanche effect" — a small change in input cascades into a large change in output.
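To see the avalanche effect for yourself, here is a minimal Python sketch using the standard `hashlib` module. The helper names `md5_hex` and `differing_bits` are made up for this example. It hashes the two strings above and counts how many of the 128 output bits differ.

```python
import hashlib

def md5_hex(text: str) -> str:
    """Return the MD5 digest of a UTF-8 string as 32 hex characters."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()

def differing_bits(hex_a: str, hex_b: str) -> int:
    """Count the bit positions at which two equal-length hex digests differ."""
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")

a = md5_hex("Hello, World!")
b = md5_hex("Hello, World")   # same text without the exclamation mark

print(a)
print(b)
print(f"{differing_bits(a, b)} of 128 bits differ")
```

For a well-behaved hash function, you'd expect roughly half of the bits to flip on average, no matter how small the change to the input.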
Key properties of cryptographic hash functions:
- Deterministic: same input always gives same output
- Fast to compute: generating a hash is cheap
- One-way: you can't reconstruct the input from the hash
- Collision-resistant: it's infeasible to find two different inputs that produce the same hash
Generate hash digests instantly with the MD5 Hash tool.
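Two of the properties listed above, determinism and fixed-size output, are easy to check in code. This is a small sketch with Python's `hashlib`; `sha256_hex` is a hypothetical helper name, and the one-million-byte input is just an arbitrary "large" example.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of some bytes as a hex string."""
    return hashlib.sha256(data).hexdigest()

tiny = sha256_hex(b"a")               # 1 byte of input
huge = sha256_hex(b"a" * 1_000_000)   # 1,000,000 bytes of input

# Deterministic: hashing the same input again gives the same digest.
print(sha256_hex(b"a") == tiny)

# Fixed size: both digests are 64 hex characters, whatever the input size.
print(len(tiny), len(huge))
```

One-wayness and collision resistance can't be demonstrated in a few lines; they are claims about what is computationally infeasible, not something a quick script can prove.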
Common Hash Algorithms
MD5
Produces a 128-bit (32 hex character) digest. It's fast and widely supported, but no longer considered secure for cryptographic purposes: researchers have demonstrated practical collision attacks.
"textoolz" → MD5 → 7f1b2e3d...32 hex characters total
MD5 is still perfectly fine for:
- Checksums (verifying data integrity against accidental corruption)
- Cache keys and deduplication
- Non-security hash tables
It's NOT fine for:
- Password storage (use bcrypt, scrypt, or Argon2)
- Digital signatures
- Any context where a determined attacker might forge collisions
SHA-256
Part of the SHA-2 family, SHA-256 produces a 256-bit (64 hex character) digest. It's the current standard for most security applications.
"textoolz" → SHA-256 → a1b2c3d4...64 hex characters total
SHA-256 is used in TLS/SSL certificates, Bitcoin proof-of-work, package integrity checks (pip's hash-checking mode, for example), and pretty much any modern system that needs strong integrity guarantees. Git is the notable holdout: its commit hashes are still SHA-1 by default.
SHA-1
Produces a 160-bit (40 hex character) digest. Like MD5, it's been broken for collision resistance and is being phased out. Git still uses SHA-1 for commit identifiers by default (with collision detection mitigations), and has been adding support for SHA-256 as a replacement.
Quick Comparison
| Algorithm | Output size | Security | Speed | Use today? |
|-----------|-----------|----------|-------|------------|
| MD5 | 128 bits | Broken | Very fast | Checksums only |
| SHA-1 | 160 bits | Broken | Fast | Legacy systems |
| SHA-256 | 256 bits | Secure | Fast | General purpose |
| SHA-512 | 512 bits | Secure | Fast (on 64-bit) | High-security needs |
| Blake3 | 256 bits | Secure | Very fast | Performance-critical |
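The output sizes in the table can be checked directly with Python's `hashlib`. This sketch covers only the four algorithms that ship in the standard library; Blake3 is left out because it needs a third-party package. The helper name `digest_bits` is invented for this example.

```python
import hashlib

ALGORITHMS = ["md5", "sha1", "sha256", "sha512"]

def digest_bits(name: str) -> int:
    """Return the output size of a hashlib algorithm in bits."""
    return hashlib.new(name).digest_size * 8

for name in ALGORITHMS:
    digest = hashlib.new(name, b"textoolz").hexdigest()
    # Each hex character encodes 4 bits, so hex length = bits / 4.
    print(f"{name:7} {digest_bits(name):4} bits  {len(digest):3} hex chars")
```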
Practical Uses of Hashing
File Integrity Verification
The original use case. When you download software, the distributor publishes the expected hash. You compute the hash of your downloaded file and compare:
Expected: e3b0c44298fc1c149afbf4c8996fb924
Computed: e3b0c44298fc1c149afbf4c8996fb924
Result: ✓ Match — file is intact
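This protects against corrupted downloads. It also protects against man-in-the-middle attacks and mirror tampering, but only if the hash itself comes from a trusted channel: an attacker who can replace the file on a mirror can usually replace a hash published right next to it.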
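Here is one way the compute-and-compare step might look in Python. It's a sketch, not a replacement for your platform's `sha256sum`-style tool: the function names are hypothetical, the 1 MiB chunk size is an arbitrary choice, and the demo hashes an empty in-memory stream so that it runs without a real download.

```python
import hashlib
import hmac
import io

def sha256_of_stream(stream, chunk_size: int = 1 << 20) -> str:
    """Hash a binary stream in chunks so large files never sit fully in memory."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()

def sha256_of_file(path: str) -> str:
    """Convenience wrapper: open a file in binary mode and hash its contents."""
    with open(path, "rb") as f:
        return sha256_of_stream(f)

def matches(expected_hex: str, actual_hex: str) -> bool:
    """Compare two hex digests, ignoring case and surrounding whitespace."""
    return hmac.compare_digest(expected_hex.strip().lower(), actual_hex.lower())

# SHA-256 of zero bytes of input, used here as a known-good "published" hash.
EMPTY_SHA256 = "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"

print(matches(EMPTY_SHA256, sha256_of_stream(io.BytesIO(b""))))
```

For a real download you would call `sha256_of_file("path/to/download.iso")` and pass the distributor's published hash as `expected_hex`.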
Deduplication
Cloud storage services, backup tools, and version control systems use hashing to detect duplicate content. If two files have the same hash, they're (almost certainly) identical — store one copy and point both references to it.
Git uses this extensively. Every file, directory tree, and commit is identified by its SHA-1 hash. This is how Git detects changes, deduplicates content, and ensures repository integrity.
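The store-one-copy idea can be sketched as a tiny in-memory content-addressed store. This is a toy illustration, not how Git or any particular backup tool is implemented: the `ContentStore` class is made up for this example, and it uses SHA-256 as the key.

```python
import hashlib

class ContentStore:
    """Toy content-addressed store: identical content is kept only once."""

    def __init__(self):
        self.blobs = {}   # digest -> content bytes
        self.names = {}   # file name -> digest

    def put(self, name: str, data: bytes) -> str:
        """Store data under name and return the digest that identifies it."""
        key = hashlib.sha256(data).hexdigest()
        self.blobs.setdefault(key, data)   # keep the existing copy if present
        self.names[name] = key
        return key

    def get(self, name: str) -> bytes:
        """Look up content by name via its digest."""
        return self.blobs[self.names[name]]

store = ContentStore()
store.put("report.txt", b"quarterly numbers")
store.put("report-copy.txt", b"quarterly numbers")   # duplicate content
store.put("notes.txt", b"something else")

print(len(store.names), "names,", len(store.blobs), "stored blobs")
```

Three names end up pointing at two stored blobs, because the two report files hash to the same key.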
Cache Invalidation
Web applications hash static assets (JavaScript, CSS) and include the hash in the filename:
app.js → app.3a7b2c.js
styles.css → styles.f8e1d2.css
When the content changes, the hash changes, the filename changes, and the browser fetches the new version instead of serving a stale cache.
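Build tools normally handle this renaming for you, but the core step is small. The sketch below assumes a hypothetical `fingerprinted_name` helper that inserts the first few hex characters of a SHA-256 digest before the file extension; the choice of six characters mirrors the example above and is not a standard.

```python
import hashlib
from pathlib import PurePosixPath

def fingerprinted_name(filename: str, content: bytes, length: int = 6) -> str:
    """Insert a short content hash before the extension: app.js -> app.<hash>.js"""
    digest = hashlib.sha256(content).hexdigest()[:length]
    path = PurePosixPath(filename)
    return str(path.with_name(f"{path.stem}.{digest}{path.suffix}"))

print(fingerprinted_name("app.js", b"console.log('v1');"))
print(fingerprinted_name("app.js", b"console.log('v2');"))
print(fingerprinted_name("css/styles.css", b"body { margin: 0; }"))
```

A short prefix keeps filenames readable. The trade-off is a small chance that two versions share the same prefix, which is why some tools use longer fingerprints.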
Password Storage (Done Right)
You never store passwords directly. Instead, you hash them and store the hash. When a user logs in, you hash their input and compare it to the stored hash.
But plain hashing isn't enough for passwords. You need:
- Salting: adding random data to each password before hashing, so identical passwords produce different hashes
- Key stretching: deliberately slow algorithms (bcrypt, Argon2) that make brute-force attacks impractical
Regular hash functions like SHA-256 are too fast for password storage — an attacker can compute billions of hashes per second. Purpose-built password hashing functions are designed to be slow.
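As a rough illustration of salting plus a deliberately slow function, here is a sketch using `hashlib.scrypt` from Python's standard library (available when Python is built against OpenSSL 1.1 or newer). The function names and the cost parameters are assumptions chosen for this example, not a vetted recommendation; in production you would typically reach for a maintained password-hashing library.

```python
import hashlib
import hmac
import os

# Assumed cost parameters for illustration only; tune them for your hardware.
SCRYPT_PARAMS = {"n": 2**14, "r": 8, "p": 1, "dklen": 32}

def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, derived_key). Store both; never store the password."""
    salt = os.urandom(16)   # fresh random salt for every password
    key = hashlib.scrypt(password.encode("utf-8"), salt=salt, **SCRYPT_PARAMS)
    return salt, key

def verify_password(password: str, salt: bytes, expected_key: bytes) -> bool:
    """Re-derive the key with the stored salt and compare in constant time."""
    key = hashlib.scrypt(password.encode("utf-8"), salt=salt, **SCRYPT_PARAMS)
    return hmac.compare_digest(key, expected_key)

salt_a, key_a = hash_password("correct horse")
salt_b, key_b = hash_password("correct horse")

print(key_a != key_b)                                   # same password, different salts
print(verify_password("correct horse", salt_a, key_a))
print(verify_password("wrong guess", salt_a, key_a))
```

Because each password gets its own salt, two users with the same password end up with different stored values, and an attacker can't precompute one table of hashes to use against everyone.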
Data Integrity in APIs
API responses sometimes include a hash of the response body. The client recomputes the hash and compares, which catches modification in transit (beyond what TLS already provides). A bare hash only helps if an attacker can't replace it along with the body, so webhook signatures use an HMAC (hash-based message authentication code) instead: the sender computes it with a shared secret, and the receiver recomputes it to verify that the payload is authentic.
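A webhook signature check might look roughly like this in Python, using the standard `hmac` module. The secret, the payload, and the function names are all invented for the example; real providers each define their own header name and signature format, so check their documentation.

```python
import hashlib
import hmac

def sign(secret: bytes, body: bytes) -> str:
    """Sender side: compute an HMAC-SHA256 of the payload with the shared secret."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()

def verify(secret: bytes, body: bytes, signature_hex: str) -> bool:
    """Receiver side: recompute the HMAC and compare in constant time."""
    return hmac.compare_digest(sign(secret, body), signature_hex)

secret = b"example-shared-secret"            # hypothetical; agreed out of band
body = b'{"event": "payment.completed"}'     # hypothetical payload
signature = sign(secret, body)

print(verify(secret, body, signature))                        # untouched payload
print(verify(secret, b'{"event": "refund"}', signature))      # modified payload
print(verify(b"wrong-secret", body, signature))               # wrong key
```

Unlike a plain hash, the signature can't be recomputed by someone who changes the payload but doesn't know the secret.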
Hashing vs. Encoding vs. Encryption
These three concepts are frequently confused. They're fundamentally different operations:
Hashing is one-way. You can't get the original data back from a hash. Its purpose is verification and identification.
Encoding is reversible and not secret. Base64 encoding converts binary data to text for transport — anyone can decode it. It provides format conversion, not security.
Encryption is reversible with a key. Only someone with the correct key can decrypt the data. It provides confidentiality.
| Operation | Reversible? | Needs a key? | Purpose |
|-----------|------------|-------------|---------|
| Hashing | No | No | Verify integrity, identify data |
| Encoding | Yes | No | Format conversion |
| Encryption | Yes | Yes | Confidentiality |
A common mistake: using Base64 encoding to "secure" data. Base64 is not encryption — it's trivially reversible. If you need confidentiality, use proper encryption (AES, ChaCha20).
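The difference between encoding and hashing shows up in a few lines of Python. Encryption is left out of this sketch because the standard library has no AES or ChaCha20 implementation; you would need a third-party library for that.

```python
import base64
import hashlib

secret = b"secret"

# Encoding: anyone can reverse it, no key required.
encoded = base64.b64encode(secret)
print(encoded)                      # b'c2VjcmV0'
print(base64.b64decode(encoded))    # b'secret'

# Hashing: fixed-size output, and no decode function exists.
digest = hashlib.sha256(secret).hexdigest()
print(len(digest), "hex characters")
```

If you can get the original back without a key, as with the Base64 round trip here, the operation is not providing confidentiality.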
Checksums in Practice
A checksum is a simplified form of hashing optimized for detecting accidental changes rather than resisting deliberate attacks. CRC32, Adler-32, and simple XOR checksums are faster than cryptographic hashes but provide weaker guarantees.
You encounter checksums in:
- Network protocols (TCP, Ethernet frame check sequences)
- File formats (PNG, ZIP, gzip)
- Data transmission (serial protocols, Modbus)
For verifying file downloads or detecting tampering, use a cryptographic hash (SHA-256). For detecting transmission errors in performance-critical scenarios, a checksum might be appropriate.
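Python's `zlib` module exposes CRC32 and Adler-32, which makes it easy to put a checksum next to a cryptographic hash. The input `123456789` is the conventional test string for CRC implementations; the rest of this sketch is just for comparison.

```python
import hashlib
import zlib

data = b"123456789"

crc = zlib.crc32(data)      # 32-bit checksum
adler = zlib.adler32(data)  # 32-bit checksum, cheaper still
sha = hashlib.sha256(data).hexdigest()

print(f"CRC32:    {crc:08x}")    # cbf43926, the standard CRC-32 check value
print(f"Adler-32: {adler:08x}")
print(f"SHA-256:  {sha}")
```

With only 32 bits of output, matching checksums for different data are easy to construct on purpose, which is why checksums are suited to catching accidents rather than attacks.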
Viewing Hashes as Hex
Hash outputs are typically displayed as hexadecimal strings. The reason is readability — a 256-bit hash is 32 bytes, which is 64 hex characters. Much more readable than 256 ones and zeros.
Binary: 11100011 10110000 11000100 01000010 ...
Hex: e3b0c442...
The Hex-Text converter helps you work with hex representations when you need to inspect or manipulate hash values at the byte level.
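If you'd rather do the conversion in code, Python's `bytes` type handles it directly. This sketch hashes empty input with SHA-256, which produces the digest that begins with the `e3b0c442` bytes shown above.

```python
import hashlib

digest = hashlib.sha256(b"").digest()   # raw bytes, not hex

hex_string = digest.hex()
binary = " ".join(f"{byte:08b}" for byte in digest[:4])

print(len(digest), "bytes ->", len(hex_string), "hex characters")
print(hex_string[:8])   # e3b0c442
print(binary)           # 11100011 10110000 11000100 01000010

# Hex is only a display format: converting back recovers the same bytes.
print(bytes.fromhex(hex_string) == digest)
```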
Try It Yourself
Explore how hashing works hands-on:
- MD5 Hash Generator — compute MD5 digests of any text
- Base64 Encoder — encode data for safe transport (not security!)
- Hex-Text Converter — inspect text at the byte level
All processing happens in your browser. Your data never touches a server.