Why Your Queries Are Slow
Imagine a library with 10 million books but no catalog, no Dewey Decimal system, and no organization whatsoever. Finding a specific book means walking through every aisle, checking every shelf, and examining every spine. That is exactly what your database does when you run a query on a table without proper indexes — it performs a full table scan, examining every single row to find the ones that match your criteria.
Now imagine that same library with a well-organized catalog. You look up the author or title, get a shelf number, and walk directly to the book. That is what an index does for your database. It creates a structured shortcut that allows the query engine to locate data without scanning every row.
That is not a typo. A well-designed index can make queries over 1,000 times faster. The difference between a 4-second page load and a 3-millisecond one is the difference between a frustrated user hitting the back button and a smooth, responsive application.
How B-Tree Indexes Work
The most common index type in both MySQL and PostgreSQL is the B-tree (balanced tree). Understanding its structure helps you design better indexes and predict query performance.
A B-tree is a self-balancing tree structure where each node can contain multiple keys and pointers. Data is sorted, and the tree stays balanced — meaning every leaf node is the same distance from the root. This guarantees logarithmic lookup time: in a table with 10 million rows, a B-tree index can find any row in approximately 3-4 disk reads.
[50, 100]
[25, 37] [75, 88] [125, 150]
Actual row pointers
When you search for WHERE id = 75, the database starts at the root node. It sees that 75 is between 50 and 100, so it follows the middle pointer to the branch node. In the branch node, it finds 75 directly and follows its pointer to the leaf node containing the actual row data. Three steps instead of 10 million scans.
WHERE age BETWEEN 20 AND 30) extremely efficient because the database can follow the linked list of leaf nodes sequentially.
Creating Indexes
Basic Index Creation
CREATE INDEX CONCURRENTLY in production to avoid locking the table during index creation. Without CONCURRENTLY, the table is locked against writes for the entire duration of index building, which can take minutes or hours on large tables.
Unique Indexes
Unique indexes serve dual purposes: they speed up queries (like a regular index) and enforce data integrity (preventing duplicate values). Every table should have a primary key, which automatically creates a unique clustered index.
Understanding EXPLAIN: Your Query Microscope
EXPLAIN is the single most important tool for understanding query performance. It shows you exactly how the database plans to execute your query — which indexes it will use, how many rows it expects to scan, and what type of join or scan it will perform.
MySQL EXPLAIN
Key EXPLAIN Types (Best to Worst)
| Type | Meaning | Performance |
|---|---|---|
system/const | Table has at most 1 row (exact match on primary key) | Best |
eq_ref | One row read from this table for each combination from previous tables | Excellent |
ref | All matching rows read using a non-unique index | Good |
range | Index used for a range of values (BETWEEN, <, >, IN) | Good |
index | Full index scan (reads entire index, but not the table) | Acceptable |
ALL | Full table scan — every row examined | Worst |
PostgreSQL EXPLAIN ANALYZE
PostgreSQL's EXPLAIN ANALYZE actually executes the query and shows real execution times, not just estimates:
From 4,241ms to 48ms — an 88x improvement from a single index.
Composite (Multi-Column) Indexes
Single-column indexes are the basics. Composite indexes — indexes on multiple columns — are where real query optimization happens. They are essential for queries that filter or sort by multiple columns.
(user_id, status, created_at) can be used for queries filtering on user_id alone, or user_id + status, or all three columns. But it CANNOT be used for queries filtering only on status or only on created_at. Think of it like a phone book sorted by last name, then first name — you can look up by last name, or last + first, but not by first name alone.
This principle is called the leftmost prefix rule. Given an index on (A, B, C):
| Query Filter | Uses Index? | Why |
|---|---|---|
WHERE A = 1 | Yes | Leftmost prefix |
WHERE A = 1 AND B = 2 | Yes | First two columns |
WHERE A = 1 AND B = 2 AND C = 3 | Yes | Full index |
WHERE B = 2 | No | Missing leftmost column |
WHERE C = 3 | No | Missing leftmost columns |
WHERE A = 1 AND C = 3 | Partial | Uses A only, C requires filter |
Covering Indexes
A covering index contains all the columns needed to satisfy a query, so the database never needs to read the actual table rows. This eliminates the "table lookup" step and can double or triple performance for frequently run queries.
Find row pointer
Extra disk I/O
All data here!
No table access
Partial Indexes (PostgreSQL)
PostgreSQL supports partial indexes — indexes that only include rows matching a specific condition. This is incredibly useful for queries that always filter by the same condition, like active users or pending orders.
Full-Text Indexes
Regular B-tree indexes are useless for text search queries like WHERE content LIKE '%database optimization%'. Full-text indexes use inverted indexes (mapping words to documents) to make text search efficient.
Index Selectivity: When Indexes Help (and When They Hurt)
Index selectivity is the ratio of distinct values to total rows. High selectivity means the column has many unique values; low selectivity means few unique values. Indexes are most effective on columns with high selectivity.
| Column | Distinct Values | Total Rows | Selectivity | Index Useful? |
|---|---|---|---|---|
| user_id (UUID) | 1,000,000 | 1,000,000 | 1.0 (100%) | Excellent |
| 980,000 | 1,000,000 | 0.98 (98%) | Excellent | |
| city | 15,000 | 1,000,000 | 0.015 (1.5%) | Depends |
| status | 5 | 1,000,000 | 0.000005 | Usually not |
| is_active | 2 | 1,000,000 | 0.000002 | Rarely |
is_active is almost always wasteful. With only two possible values, the database will scan 50% of the table regardless. The exception is when the distribution is extremely skewed — if only 0.1% of rows have is_active = false, a partial index on that condition (PostgreSQL) or a composite index with other selective columns can be useful.
When NOT to Index: The Cost of Over-Indexing
Every index comes with costs. Blindly adding indexes to every column is a common mistake that can actually degrade overall performance.
Write Performance
Every INSERT, UPDATE, and DELETE must update not just the table but every index on that table. A table with 10 indexes means each insert triggers 10 index updates. Write-heavy applications suffer dramatically from over-indexing.
Storage Overhead
Indexes consume disk space. A large composite index can be as big as the table itself. On a 50GB table with 5 large indexes, your total storage might be 200GB+ — 4x the raw data size.
Memory Pressure
The database caches frequently used indexes in memory for performance. More indexes means more memory needed. If indexes exceed available RAM, performance drops as the database reads from disk.
Maintenance Overhead
Indexes can become fragmented over time, requiring periodic REINDEX or OPTIMIZE operations. More indexes means more maintenance. PostgreSQL's autovacuum must process more index pages.
MySQL vs PostgreSQL Index Types
| Index Type | MySQL | PostgreSQL | Best For |
|---|---|---|---|
| B-tree | Default | Default | Equality, range, sorting |
| Hash | Memory engine | Supported | Exact equality only |
| Full-text | FULLTEXT | GIN + tsvector | Text search |
| Spatial | R-tree (SPATIAL) | GiST | Geometric/geographic data |
| GIN | N/A | Supported | Arrays, JSONB, full-text |
| BRIN | N/A | Supported | Large tables with natural order (timestamps) |
| Partial | N/A | Supported | Subset of rows matching a condition |
| Expression | Generated columns | Native | Computed values (LOWER, DATE_TRUNC) |
Detecting and Removing Unused Indexes
Over time, tables accumulate indexes that were added for queries that no longer run, or for optimization attempts that did not pan out. These unused indexes consume resources for zero benefit.
idx_scan = 0 might be used by a monthly report, quarterly backup, or seasonal process. Check at least 30-60 days of usage data before removing an index. Consider disabling it temporarily (if your database supports it) before permanently dropping it.
Real-World Optimization Example
Let us walk through a complete optimization of a slow query, from diagnosis to solution:
Index Best Practices Checklist
- Always index foreign key columns (user_id, order_id, etc.)
- Index columns used in WHERE clauses of frequent queries
- Index columns used in ORDER BY and GROUP BY
- Use composite indexes for queries filtering on multiple columns
- Put the most selective column first in composite indexes
- Use covering indexes for read-heavy, frequently run queries
- Use partial indexes (PostgreSQL) for queries with constant filters
- Run EXPLAIN on every query that touches large tables
- Monitor and remove unused indexes quarterly
- Avoid indexing columns with very low selectivity (boolean, enum with few values)
- Consider the write-to-read ratio before adding indexes
- Keep total indexes per table reasonable (5-8 is typical)
Key Takeaways
Database indexing is the single most impactful performance optimization you can make to a data-driven application. A well-designed index turns a 4-second full table scan into a 3-millisecond direct lookup. The key is understanding how B-trees work, using EXPLAIN to diagnose slow queries, designing composite indexes that match your query patterns, and avoiding over-indexing that degrades write performance.
Start with the slow query log. Identify the queries that consume the most time. Run EXPLAIN on each one. Create targeted indexes that match the WHERE, ORDER BY, and JOIN conditions. Verify the improvement. Then monitor ongoing to remove indexes that are no longer used. This iterative cycle of measure, optimize, verify is the path to a consistently fast database.