iqtoolkit journal
Understanding PostgreSQL TOAST: The 8KB Block Constraint and Oversized Data Storage
Understanding PostgreSQL TOAST: The 8KB Block Constraint and Oversized Data Storage
If you've worked with PostgreSQL for any length of time, you've probably encountered scenarios where you need to store large text fields, JSON documents, or binary data. Behind the scenes, PostgreSQL uses a sophisticated mechanism called TOAST (The Oversized-Attribute Storage Technique) to handle values that don't fit within its fundamental storage unit: the 8KB block.
Understanding TOAST isn't just academic—it directly impacts query performance, storage efficiency, and how you should design your schema for large data types.
The 8KB Block: PostgreSQL's Fundamental Constraint
PostgreSQL organizes all table data into fixed-size pages (also called blocks), which are 8KB (8192 bytes) by default. This is a compile-time constant that defines how PostgreSQL reads and writes data to disk.
Why 8KB?
The 8KB block size is a carefully chosen compromise:
- Small enough to minimize I/O overhead when accessing individual rows
- Large enough to store multiple rows per block for better disk utilization
- Aligned with OS page sizes for efficient memory management
- Matches common filesystem block sizes for optimized disk operations
The Problem: What Happens When Data Exceeds 8KB?
A single PostgreSQL row must fit within a single 8KB page. But what happens when you try to insert:
- A large TEXT column with a 50KB document?
- A BYTEA field storing a 5MB image?
- A JSONB column containing a 100KB configuration object?
Without TOAST, PostgreSQL would either:
- Fail the insert (unacceptable)
- Require variable-sized pages (complex, slow)
- Chain pages together (fragmentation nightmare)
Enter TOAST.
What is TOAST?
TOAST (The Oversized-Attribute Storage Technique) is PostgreSQL's solution for storing values that exceed the block size. It's an automatic, transparent mechanism that:
- Compresses large values to try fitting them in the main table
- Moves oversized values to a separate TOAST table if compression isn't enough
- Chunks extremely large values into multiple TOAST rows
- Maintains references in the main table pointing to TOAST storage
TOAST Architecture
Every table with potentially large columns gets a companion TOAST table automatically created:
-- Main table
CREATE TABLE documents (
id SERIAL PRIMARY KEY,
title TEXT,
content TEXT, -- Can be large, eligible for TOAST
metadata JSONB -- Can be large, eligible for TOAST
);
-- PostgreSQL automatically creates:
-- pg_toast.pg_toast_<oid> (the TOAST table)
-- pg_toast.pg_toast_<oid>_index (index for TOAST chunks)
The TOAST table is hidden from normal queries but stores oversized attribute values in chunks.
When Does TOAST Activate?
TOAST doesn't activate for every value—only when necessary. Here's the decision process:
1. Compression First (TOAST Strategy: EXTENDED)
PostgreSQL attempts to compress the value using LZ compression. If the compressed value fits within the page, no TOAST storage is needed.
-- Example: Large text that compresses well
INSERT INTO documents (content)
VALUES (repeat('PostgreSQL is awesome! ', 5000));
-- This might compress enough to stay in the main table
2. Out-of-line Storage (When Compression Fails)
If compression doesn't reduce the value below ~2KB (the threshold), PostgreSQL moves the value to the TOAST table.
3. The 2KB Threshold
PostgreSQL uses a 2KB (2048 bytes) heuristic:
- Values < 2KB: Stay inline in the main table
- Values ≥ 2KB: Eligible for compression or TOASTing
- After compression, if still > ~2KB → moved to TOAST table
4. Maximum Row Size
Even with TOAST, PostgreSQL enforces limits:
- Main table row header: ~24 bytes
- Inline data (before TOAST): ~2KB per column
- Total uncompressed row: Must fit calculation before TOAST activation
TOAST Storage Strategies
PostgreSQL supports four TOAST strategies per column:
1. PLAIN (No TOAST)
ALTER TABLE documents
ALTER COLUMN id SET STORAGE PLAIN;
- No compression, no out-of-line storage
- Only for fixed-size types (INTEGER, BIGINT, etc.)
- Cannot be changed for variable-length types
2. EXTENDED (Default for Most Types)
ALTER TABLE documents
ALTER COLUMN content SET STORAGE EXTENDED;
- Compression first, then out-of-line storage if needed
- Default for TEXT, BYTEA, JSONB
- Best balance of performance and storage
3. EXTERNAL
ALTER TABLE documents
ALTER COLUMN content SET STORAGE EXTERNAL;
- Out-of-line storage without compression
- Useful when data is already compressed (JPEG, GZIP files)
- Faster access since no decompression needed
4. MAIN
ALTER TABLE documents
ALTER COLUMN content SET STORAGE MAIN;
- Compression allowed, but prefers inline storage
- Avoids TOAST table unless absolutely necessary
- Can hurt performance if large values stay inline
How TOAST Works: Step-by-Step
Let's trace what happens when you insert a large value:
Example: Inserting a 50KB Document
CREATE TABLE articles (
id SERIAL PRIMARY KEY,
title VARCHAR(200),
body TEXT -- EXTENDED storage by default
);
INSERT INTO articles (title, body)
VALUES ('Large Article', repeat('Lorem ipsum...', 10000));
-- Generates ~50KB of text
Step 1: Initial Size Check
- PostgreSQL sees
bodycolumn is 50KB - Exceeds the 2KB inline threshold
- TOAST activation triggered
Step 2: Compression Attempt
- Applies LZ compression to the 50KB value
- Assume compression reduces it to 15KB (still too large)
Step 3: Move to TOAST Table
- Splits the 15KB compressed value into 2KB chunks
- Stores chunks in
pg_toast.pg_toast_<oid>:
Chunk 1: [2KB of compressed data]
Chunk 2: [2KB of compressed data]
Chunk 3: [2KB of compressed data]
...
Chunk 8: [Remaining compressed data]
Step 4: Store Reference in Main Table
- Main table
articlesstores only a TOAST pointer (~18 bytes) - Pointer contains: TOAST table OID, chunk ID, length
Reading TOAST Data
When you query the row:
SELECT body FROM articles WHERE id = 1;
PostgreSQL:
- Reads the main table row (~200 bytes)
- Detects TOAST pointer in
bodycolumn - Follows pointer to TOAST table
- Reads all 8 chunks sequentially
- Reassembles and decompresses
- Returns the original 50KB value
Performance implication: TOAST reads require additional I/O operations.
Performance Implications
1. Sequential Scans Benefit
SELECT id, title FROM articles; -- Fast
-- Only reads main table, no TOAST access needed
When you don't access TOASTed columns, PostgreSQL skips TOAST reads entirely.
2. Selective Column Access
SELECT id, title, body FROM articles WHERE id = 1; -- Slower
-- Must fetch and decompress TOAST data
Fetching TOASTed columns adds overhead.
3. Index-Only Scans Break
CREATE INDEX idx_title ON articles(title);
SELECT title FROM articles WHERE title LIKE 'Large%';
-- If TOAST columns exist, may not be index-only scan
TOAST columns in the table can prevent index-only scans.
4. UPDATE Performance
UPDATE articles SET title = 'Updated' WHERE id = 1;
-- Even if not touching `body`, PostgreSQL may rewrite TOAST data
Updates can trigger TOAST rewrites if the row is moved to a new page.
Best Practices for Working with TOAST
1. Separate Large Columns into Different Tables
-- Instead of:
CREATE TABLE articles (
id SERIAL PRIMARY KEY,
title TEXT,
body TEXT, -- Large
summary TEXT
);
-- Consider:
CREATE TABLE articles (
id SERIAL PRIMARY KEY,
title TEXT,
summary TEXT
);
CREATE TABLE article_bodies (
article_id INTEGER PRIMARY KEY REFERENCES articles(id),
body TEXT -- TOAST isolated
);
Benefit: Queries on articles avoid TOAST overhead unless explicitly joined.
2. Use EXTERNAL for Pre-Compressed Data
CREATE TABLE images (
id SERIAL PRIMARY KEY,
filename TEXT,
data BYTEA -- Stores JPEG/PNG
);
ALTER TABLE images ALTER COLUMN data SET STORAGE EXTERNAL;
-- Avoids double compression
3. Monitor TOAST Table Growth
-- Check TOAST table size
SELECT
t.schemaname,
t.tablename,
pg_size_pretty(pg_total_relation_size(format('%I.%I', t.schemaname, t.tablename)::regclass)) AS total_size,
pg_size_pretty(pg_relation_size(format('%I.%I', t.schemaname, t.tablename)::regclass)) AS table_size,
pg_size_pretty(pg_total_relation_size(c.reltoastrelid)) AS toast_size
FROM pg_tables t
JOIN pg_class c
ON c.relname = t.tablename
JOIN pg_namespace n
ON n.oid = c.relnamespace
AND n.nspname = t.schemaname
WHERE t.schemaname = 'public'
AND c.reltoastrelid <> 0
ORDER BY pg_total_relation_size(c.reltoastrelid) DESC;
4. Limit Column Size When Possible
-- Instead of TEXT (unlimited):
ALTER TABLE articles ADD COLUMN summary VARCHAR(500);
-- Guarantees inline storage if < 2KB
5. Consider JSONB Compression
CREATE TABLE configs (
id SERIAL PRIMARY KEY,
settings JSONB -- EXTENDED by default, compresses well
);
-- JSONB benefits from TOAST compression for large documents
Common Pitfalls
1. Assuming Small TOAST Values Are Cached
TOAST values are not cached separately—they're part of the buffer pool but require full chunk retrieval.
2. Forgetting TOAST on Partitioned Tables
Each partition gets its own TOAST table. Monitor all partitions.
3. Not Accounting for TOAST in Backup Sizes
TOAST tables can be larger than the main table. Always check total relation size.
Conclusion
PostgreSQL's TOAST mechanism elegantly solves the 8KB block size limitation by:
- Compressing large values first
- Chunking and storing oversized data out-of-line
- Maintaining transparent access through pointer indirection
Understanding TOAST helps you:
- Design better schemas (separate large columns)
- Write faster queries (avoid TOASTed columns when not needed)
- Monitor storage growth (track TOAST table sizes)
- Optimize updates (minimize TOAST rewrites)
The next time you insert a large value into PostgreSQL, you'll know exactly what's happening under the hood—and how to design around it for optimal performance.
Want to dive deeper into PostgreSQL internals? Check out PostgreSQL Internals Mastery Volume I for comprehensive coverage of storage architecture, memory management, and advanced optimization techniques.
Have questions about TOAST or PostgreSQL performance? Reach out on Twitter or Mastodon.