Mysql optimization
From wikinotes
Documentation
MySQL-8.0: Explain Output Docs https://dev.mysql.com/doc/refman/8.0/en/explain-output.html
Mental Model of a Query
Basics
An index
Queries use one index per table/join.
AdditionalWHERE
statements (even on indexes) involve full table scans of that selection result.
You'll see this represented in thefiltered
percentage ofEXPLAIN
.
While these reduce the number of results, they might make the query more expensive than a broader one.
In very large tables, you may benefit from:
multi-column-indexes
/* multi-column indexes */ CREATE INDEX projects_on_team_and_status ON projects (team_id, status);a broad, cursor-based outer-query, and a specific inner-query
# outer-loop -- broadly iterates over batches of projects # uses 'id' index here to quickly fetch results def iterate_over_projects_in_batches(): iteration = 0 while True: outer_query = ''' SELECT id FROM projects WHERE id > ? ORDER BY id LIMIT 1000 ''' yield cursor.execute(outer_query, iteration) # inner-loop -- uses full-result scan of small number of items # doing this query from the top would be slow. def select_specific_projects_from_batch(project_ids): inner_query = ''' SELECT * FROM projects WHERE id IN (?) AND status = "active" AND created_by "foo@user.com" AND created_at > "1970-01-01" '''' for project in cursor.execute(inner_query, project_ids): do_thing(project) # general flow for project_batch in iterate_over_projects_in_batches(): select_specific_projects_from_batch(project_batch)
Explain
Typing the word
EXPLAIN
before a query, you can get a breakdown of the cost of a query.
rows
(smaller is better)
Probably the main indicator of a bad query.
The number of rows that will need to be searched to return a result.
Querying on an index can reduce this.filtered
(smaller is better)
The percentage ofrows
we expect the query narrows the search down to.(filtered/100) * rows
gives you an estimate of the number of rows that will be searched.
I think this might be impacted byWHERE
conditions, but uncertain.possible_keys
Columns that are queried on. You may want these to be indexes. extra
Extra information about the query. Using temporary
indicates the use of a temporary table, which is slow.ref
?? key_len
not totally certain, related to the bitwidth of the index I think.
Timing Queries
Test adding Index
Rather than a JOIN
or IN/NOT IN
,
substitute values for the primary-key values and time it.
Limit Records
The more records included, the slower the query. If you can limit by a date-range that helps a lot.