Database Indexing Explained: Speed Up Queries by 100x

Why Your Queries Are Slow

Imagine a library with 10 million books but no catalog, no Dewey Decimal system, and no organization whatsoever. Finding a specific book means walking through every aisle, checking every shelf, and examining every spine. That is exactly what your database does when you run a query on a table without proper indexes — it performs a full table scan, examining every single row to find the ones that match your criteria.

Now imagine that same library with a well-organized catalog. You look up the author or title, get a shelf number, and walk directly to the book. That is what an index does for your database. It creates a structured shortcut that allows the query engine to locate data without scanning every row.

4,200ms

Without index (full table scan on 5M rows)

3ms

With proper index (B-tree lookup)

That is not a typo. A well-designed index can make queries over 1,000 times faster. The difference between a 4-second page load and a 3-millisecond one is the difference between a frustrated user hitting the back button and a smooth, responsive application.

How B-Tree Indexes Work

The most common index type in both MySQL and PostgreSQL is the B-tree (balanced tree). Understanding its structure helps you design better indexes and predict query performance.

A B-tree is a self-balancing tree structure where each node can contain multiple keys and pointers. Data is sorted, and the tree stays balanced — meaning every leaf node is the same distance from the root. This guarantees logarithmic lookup time: in a table with 10 million rows, a B-tree index can find any row in approximately 3-4 disk reads.

Root Node
[50, 100]

→

Branch Nodes
[25, 37] [75, 88] [125, 150]

→

Leaf Nodes
Actual row pointers

When you search for WHERE id = 75, the database starts at the root node. It sees that 75 is between 50 and 100, so it follows the middle pointer to the branch node. In the branch node, it finds 75 directly and follows its pointer to the leaf node containing the actual row data. Three steps instead of 10 million scans.

B-tree vs B+tree: Most databases (MySQL InnoDB, PostgreSQL) actually use B+trees, where all data pointers are in leaf nodes and leaf nodes are linked together. This makes range queries (WHERE age BETWEEN 20 AND 30) extremely efficient because the database can follow the linked list of leaf nodes sequentially.

Creating Indexes

Basic Index Creation

-- MySQL CREATE INDEX idx_users_email ON users(email); CREATE INDEX idx_orders_user_id ON orders(user_id); CREATE INDEX idx_orders_created ON orders(created_at); -- PostgreSQL (same syntax) CREATE INDEX idx_users_email ON users(email); CREATE INDEX CONCURRENTLY idx_orders_status ON orders(status);

PostgreSQL Tip: Use CREATE INDEX CONCURRENTLY in production to avoid locking the table during index creation. Without CONCURRENTLY, the table is locked against writes for the entire duration of index building, which can take minutes or hours on large tables.

Unique Indexes

-- Enforce uniqueness + create index CREATE UNIQUE INDEX idx_users_email_unique ON users(email); -- Equivalent to adding a UNIQUE constraint ALTER TABLE users ADD CONSTRAINT uq_email UNIQUE (email);

Unique indexes serve dual purposes: they speed up queries (like a regular index) and enforce data integrity (preventing duplicate values). Every table should have a primary key, which automatically creates a unique clustered index.

Understanding EXPLAIN: Your Query Microscope

EXPLAIN is the single most important tool for understanding query performance. It shows you exactly how the database plans to execute your query — which indexes it will use, how many rows it expects to scan, and what type of join or scan it will perform.

MySQL EXPLAIN

-- Before index: full table scan EXPLAIN SELECT * FROM orders WHERE customer_email = '[email protected]'; +----+------+------+---------+------+----------+-------------+ | id | type | key | key_len | rows | filtered | Extra | +----+------+------+---------+------+----------+-------------+ | 1 | ALL | NULL | NULL | 5.2M | 10.00 | Using where | +----+------+------+---------+------+----------+-------------+ -- type=ALL means full table scan. 5.2 million rows scanned! CREATE INDEX idx_orders_email ON orders(customer_email); -- After index: direct lookup EXPLAIN SELECT * FROM orders WHERE customer_email = '[email protected]'; +----+------+------------------+---------+------+----------+-------+ | id | type | key | key_len | rows | filtered | Extra | +----+------+------------------+---------+------+----------+-------+ | 1 | ref | idx_orders_email | 767 | 12 | 100.00 | | +----+------+------------------+---------+------+----------+-------+ -- type=ref using the index. Only 12 rows scanned!

Key EXPLAIN Types (Best to Worst)

Type	Meaning	Performance
`system/const`	Table has at most 1 row (exact match on primary key)	Best
`eq_ref`	One row read from this table for each combination from previous tables	Excellent
`ref`	All matching rows read using a non-unique index	Good
`range`	Index used for a range of values (BETWEEN, <, >, IN)	Good
`index`	Full index scan (reads entire index, but not the table)	Acceptable
`ALL`	Full table scan — every row examined	Worst

PostgreSQL EXPLAIN ANALYZE

PostgreSQL's EXPLAIN ANALYZE actually executes the query and shows real execution times, not just estimates:

EXPLAIN ANALYZE SELECT * FROM orders WHERE status = 'pending'; Seq Scan on orders (cost=0.00..185432.00 rows=5200000 width=120) (actual time=0.023..4215.432 rows=52341 loops=1) Filter: (status = 'pending') Rows Removed by Filter: 4947659 Planning Time: 0.124 ms Execution Time: 4241.883 ms CREATE INDEX idx_orders_status ON orders(status); EXPLAIN ANALYZE SELECT * FROM orders WHERE status = 'pending'; Index Scan using idx_orders_status on orders (cost=0.43..1234.56 rows=52341 width=120) (actual time=0.032..42.156 rows=52341 loops=1) Planning Time: 0.198 ms Execution Time: 48.234 ms

From 4,241ms to 48ms — an 88x improvement from a single index.

Composite (Multi-Column) Indexes

Single-column indexes are the basics. Composite indexes — indexes on multiple columns — are where real query optimization happens. They are essential for queries that filter or sort by multiple columns.

-- Query that filters on two columns SELECT * FROM orders WHERE user_id = 42 AND status = 'shipped' ORDER BY created_at DESC; -- Composite index that covers this query perfectly CREATE INDEX idx_orders_user_status_created ON orders(user_id, status, created_at DESC);

Column Order Matters: A composite index on (user_id, status, created_at) can be used for queries filtering on user_id alone, or user_id + status, or all three columns. But it CANNOT be used for queries filtering only on status or only on created_at. Think of it like a phone book sorted by last name, then first name — you can look up by last name, or last + first, but not by first name alone.

This principle is called the leftmost prefix rule. Given an index on (A, B, C):

Query Filter	Uses Index?	Why
`WHERE A = 1`	Yes	Leftmost prefix
`WHERE A = 1 AND B = 2`	Yes	First two columns
`WHERE A = 1 AND B = 2 AND C = 3`	Yes	Full index
`WHERE B = 2`	No	Missing leftmost column
`WHERE C = 3`	No	Missing leftmost columns
`WHERE A = 1 AND C = 3`	Partial	Uses A only, C requires filter

Covering Indexes

A covering index contains all the columns needed to satisfy a query, so the database never needs to read the actual table rows. This eliminates the "table lookup" step and can double or triple performance for frequently run queries.

-- This query only needs id, email, and name SELECT id, email, name FROM users WHERE email = '[email protected]'; -- Covering index: includes all selected columns CREATE INDEX idx_users_email_covering ON users(email, name, id); -- MySQL EXPLAIN shows "Using index" — no table access needed EXPLAIN SELECT id, email, name FROM users WHERE email = '[email protected]'; Extra: Using index ← This means covering index is used!

Without Covering Index

→

Search Index
Find row pointer

→

Read Table Row
Extra disk I/O

With Covering Index

→

Search Index
All data here!

→

Done
No table access

Partial Indexes (PostgreSQL)

PostgreSQL supports partial indexes — indexes that only include rows matching a specific condition. This is incredibly useful for queries that always filter by the same condition, like active users or pending orders.

-- Full index on status: indexes ALL rows CREATE INDEX idx_orders_status ON orders(status); -- Size: ~200MB for 5M rows -- Partial index: only indexes pending orders (1% of rows) CREATE INDEX idx_orders_pending ON orders(created_at) WHERE status = 'pending'; -- Size: ~2MB! 100x smaller, 100x faster to maintain -- Query that benefits from the partial index SELECT * FROM orders WHERE status = 'pending' ORDER BY created_at;

Partial Index Power: If you have a 10-million-row orders table where only 1% are "pending," a partial index on pending orders is 100x smaller than a full index. It is faster to query, faster to maintain during inserts, and takes less storage. Use partial indexes whenever your queries consistently filter on a condition that selects a small subset of rows.

Full-Text Indexes

Regular B-tree indexes are useless for text search queries like WHERE content LIKE '%database optimization%'. Full-text indexes use inverted indexes (mapping words to documents) to make text search efficient.

-- MySQL Full-Text Index ALTER TABLE articles ADD FULLTEXT INDEX ft_content (title, body); SELECT * FROM articles WHERE MATCH(title, body) AGAINST('database indexing' IN NATURAL LANGUAGE MODE); -- PostgreSQL Full-Text Search CREATE INDEX idx_articles_fts ON articles USING GIN(to_tsvector('english', title || ' ' || body)); SELECT * FROM articles WHERE to_tsvector('english', title || ' ' || body) @@ to_tsquery('english', 'database & indexing');

Index Selectivity: When Indexes Help (and When They Hurt)

Index selectivity is the ratio of distinct values to total rows. High selectivity means the column has many unique values; low selectivity means few unique values. Indexes are most effective on columns with high selectivity.

Column	Distinct Values	Total Rows	Selectivity	Index Useful?
user_id (UUID)	1,000,000	1,000,000	1.0 (100%)	Excellent
email	980,000	1,000,000	0.98 (98%)	Excellent
city	15,000	1,000,000	0.015 (1.5%)	Depends
status	5	1,000,000	0.000005	Usually not
is_active	2	1,000,000	0.000002	Rarely

The Boolean Column Trap: An index on a boolean column like is_active is almost always wasteful. With only two possible values, the database will scan 50% of the table regardless. The exception is when the distribution is extremely skewed — if only 0.1% of rows have is_active = false, a partial index on that condition (PostgreSQL) or a composite index with other selective columns can be useful.

When NOT to Index: The Cost of Over-Indexing

Every index comes with costs. Blindly adding indexes to every column is a common mistake that can actually degrade overall performance.

Write Performance

Every INSERT, UPDATE, and DELETE must update not just the table but every index on that table. A table with 10 indexes means each insert triggers 10 index updates. Write-heavy applications suffer dramatically from over-indexing.

Storage Overhead

Indexes consume disk space. A large composite index can be as big as the table itself. On a 50GB table with 5 large indexes, your total storage might be 200GB+ — 4x the raw data size.

Memory Pressure

The database caches frequently used indexes in memory for performance. More indexes means more memory needed. If indexes exceed available RAM, performance drops as the database reads from disk.

Maintenance Overhead

Indexes can become fragmented over time, requiring periodic REINDEX or OPTIMIZE operations. More indexes means more maintenance. PostgreSQL's autovacuum must process more index pages.

MySQL vs PostgreSQL Index Types

Index Type	MySQL	PostgreSQL	Best For
B-tree	Default	Default	Equality, range, sorting
Hash	Memory engine	Supported	Exact equality only
Full-text	FULLTEXT	GIN + tsvector	Text search
Spatial	R-tree (SPATIAL)	GiST	Geometric/geographic data
GIN	N/A	Supported	Arrays, JSONB, full-text
BRIN	N/A	Supported	Large tables with natural order (timestamps)
Partial	N/A	Supported	Subset of rows matching a condition
Expression	Generated columns	Native	Computed values (LOWER, DATE_TRUNC)

Detecting and Removing Unused Indexes

Over time, tables accumulate indexes that were added for queries that no longer run, or for optimization attempts that did not pan out. These unused indexes consume resources for zero benefit.

-- PostgreSQL: Find unused indexes SELECT schemaname, tablename, indexname, idx_scan, pg_size_pretty(pg_relation_size(indexrelid)) as index_size FROM pg_stat_user_indexes WHERE idx_scan = 0 ORDER BY pg_relation_size(indexrelid) DESC; tablename | indexname | idx_scan | index_size -----------+------------------------+----------+----------- orders | idx_orders_old_status | 0 | 245 MB users | idx_users_created_2023 | 0 | 128 MB logs | idx_logs_level | 0 | 89 MB -- MySQL: Check index usage stats SELECT object_schema, object_name, index_name, count_star FROM performance_schema.table_io_waits_summary_by_index_usage WHERE index_name IS NOT NULL AND count_star = 0 ORDER BY object_schema, object_name;

Wait Before Dropping: An index with idx_scan = 0 might be used by a monthly report, quarterly backup, or seasonal process. Check at least 30-60 days of usage data before removing an index. Consider disabling it temporarily (if your database supports it) before permanently dropping it.

Real-World Optimization Example

Let us walk through a complete optimization of a slow query, from diagnosis to solution:

Identify the slow query from your slow query log or application monitoring.

-- The slow query (3.8 seconds) SELECT o.id, o.total, u.name, u.email FROM orders o JOIN users u ON u.id = o.user_id WHERE o.status = 'pending' AND o.created_at > '2026-01-01' ORDER BY o.created_at DESC LIMIT 50;

Run EXPLAIN to understand the current execution plan.

EXPLAIN ANALYZE SELECT ... Seq Scan on orders (actual time=0.02..3812.45 rows=52341) Filter: status='pending' AND created_at > '2026-01-01' Rows Removed by Filter: 4,947,659 ^^^ Scanning 5 million rows to find 52 thousand!

Create a targeted composite index matching the query's WHERE and ORDER BY.

CREATE INDEX idx_orders_status_created ON orders(status, created_at DESC);

Verify the improvement with EXPLAIN ANALYZE again.

EXPLAIN ANALYZE SELECT ... Index Scan using idx_orders_status_created on orders (actual time=0.03..12.45 rows=50) Index Cond: status='pending' AND created_at > '2026-01-01' ^^^ 50 rows from index, no table scan. 3812ms → 12ms!

Index Best Practices Checklist

Always index foreign key columns (user_id, order_id, etc.)
Index columns used in WHERE clauses of frequent queries
Index columns used in ORDER BY and GROUP BY
Use composite indexes for queries filtering on multiple columns
Put the most selective column first in composite indexes
Use covering indexes for read-heavy, frequently run queries
Use partial indexes (PostgreSQL) for queries with constant filters
Run EXPLAIN on every query that touches large tables
Monitor and remove unused indexes quarterly
Avoid indexing columns with very low selectivity (boolean, enum with few values)
Consider the write-to-read ratio before adding indexes
Keep total indexes per table reasonable (5-8 is typical)

Key Takeaways

Database indexing is the single most impactful performance optimization you can make to a data-driven application. A well-designed index turns a 4-second full table scan into a 3-millisecond direct lookup. The key is understanding how B-trees work, using EXPLAIN to diagnose slow queries, designing composite indexes that match your query patterns, and avoiding over-indexing that degrades write performance.

Start with the slow query log. Identify the queries that consume the most time. Run EXPLAIN on each one. Create targeted indexes that match the WHERE, ORDER BY, and JOIN conditions. Verify the improvement. Then monitor ongoing to remove indexes that are no longer used. This iterative cycle of measure, optimize, verify is the path to a consistently fast database.