MySQL optimization

Documentation

MySQL 8.0 EXPLAIN output documentation: https://dev.mysql.com/doc/refman/8.0/en/explain-output.html

Mental Model of a Query

Basics

An index

Queries use one index per table/join.

Additional WHERE conditions (even on indexed columns) are applied by scanning every row of that selection result.
You'll see this represented in the filtered percentage of EXPLAIN.
While these conditions reduce the number of results, they can make the query more expensive than a broader one.
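
For example, a minimal sketch of the shape this takes (assuming a hypothetical projects table where only team_id is indexed):

/* hypothetical: team_id is indexed, status is not.
   MySQL finds rows via the team_id index, then checks status
   against every row that lookup returns -- that check shows up
   in the filtered percentage of EXPLAIN. */
SELECT *
FROM projects
WHERE team_id = 42
  AND status = 'active';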

In very large tables, you may benefit from:

multi-column indexes

/* multi-column indexes */
CREATE INDEX projects_on_team_and_status
ON projects (team_id, status);
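
As a sketch, the hypothetical query from above can then be answered from this composite index directly, instead of checking status row by row:

/* hypothetical: now served by projects_on_team_and_status */
SELECT *
FROM projects
WHERE team_id = 42
  AND status = 'active';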

a broad, cursor-based outer-query, and a specific inner-query

# outer loop -- broadly iterates over batches of project ids.
#               uses the 'id' primary-key index to fetch each batch quickly.
#               ('cursor' is assumed to be an open DB-API cursor; the %s
#                placeholder style matches the common MySQL drivers.)
def iterate_over_projects_in_batches():
    last_id = 0
    outer_query = '''
        SELECT id
        FROM projects
        WHERE id > %s
        ORDER BY id
        LIMIT 1000
        '''
    while True:
        cursor.execute(outer_query, (last_id,))
        batch = [row[0] for row in cursor.fetchall()]
        if not batch:
            break
        last_id = batch[-1]
        yield batch


# inner loop -- applies the specific conditions to a small batch of rows,
#               so the remaining WHERE clauses only scan those rows.
#               running this query over the whole table would be slow.
def select_specific_projects_from_batch(project_ids):
    placeholders = ', '.join(['%s'] * len(project_ids))
    inner_query = '''
        SELECT *
        FROM projects
        WHERE id IN ({})
        AND status = 'active'
        AND created_by = 'foo@user.com'
        AND created_at > '1970-01-01'
        '''.format(placeholders)
    cursor.execute(inner_query, project_ids)
    for project in cursor.fetchall():
        do_thing(project)


# general flow
for project_batch in iterate_over_projects_in_batches():
    select_specific_projects_from_batch(project_batch)




Explain

By prefixing a query with the word EXPLAIN, you get a breakdown of how MySQL plans to execute it and what it estimates the cost will be.

rows (smaller is better)
Probably the main indicator of a bad query.
The estimated number of rows that will need to be examined to return a result.
Querying on an index can reduce this.
filtered (closer to 100 is better)
The estimated percentage of examined rows that remain after the WHERE conditions not handled by the index are applied.
(filtered/100) * rows gives you an estimate of the number of rows left over after filtering.
A low percentage means many rows are read and then discarded, which usually points at a missing or unused index.
possible_keys The indexes MySQL could choose from for this table. If this is NULL, no relevant index exists and you may want to add one on the queried columns.
extra Extra information about how the query runs. Using temporary indicates the use of a temporary table, which is slow.
ref The columns or constants that are compared against the chosen index to select rows.
key_len The length, in bytes, of the index key MySQL decided to use; for a multi-column index, it shows how much of the index is actually being used.
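
A minimal sketch of reading a plan (reusing the hypothetical projects query from above):

/* hypothetical: inspect the plan for the earlier query */
EXPLAIN
SELECT *
FROM projects
WHERE team_id = 42
  AND status = 'active';

/* things to look at in the output:
   key      -- which index MySQL actually chose
   rows     -- estimated number of rows it must examine
   filtered -- how much of that estimate survives the remaining WHERE conditions
   Extra    -- watch for "Using temporary" */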

Timing Queries

Test adding Index

To estimate whether adding an index would help, rather than timing the full JOIN or IN/NOT IN,
substitute literal primary-key values into the query and time that.
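
A sketch of the idea, with hypothetical table and column names:

/* hypothetical: the original query, driven by a subquery */
SELECT *
FROM projects
WHERE team_id IN (SELECT id FROM teams WHERE region = 'emea');

/* substitute literal primary-key values and compare the time
   the client reports for each version */
SELECT *
FROM projects
WHERE team_id IN (3, 17, 42);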

Limit Records

The more records a query touches, the slower it is. If you can limit the result set by a date range, that helps a lot.
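
For example (assuming the hypothetical projects table has a created_at column):

/* hypothetical: constrain the scan to a recent window
   rather than the whole table */
SELECT *
FROM projects
WHERE status = 'active'
  AND created_at > '2022-01-01';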