Mysql indexes: Difference between revisions

From wikinotes
Line 45: Line 45:
where the values of multiple columns are concatenated together to form one string.
where the values of multiple columns are concatenated together to form one string.


'''order'''<br>
=== Index Column Order ===
<blockquote>
The order of the columns is significant,<br>
The order of the columns is significant,<br>
there are debates over the optimal ordering, generally least-specific to most-specific is correct.<br>
there are debates over the optimal ordering, generally least-specific to most-specific is correct.<br>
Line 62: Line 63:
# avoiding the write-cost of adding a row-change to 2x indexes.
# avoiding the write-cost of adding a row-change to 2x indexes.
</syntaxhighlight>
</syntaxhighlight>
</blockquote><!-- Index Column Order -->


'''in single-column queries'''<br>
=== Single-Column Queries ===
<blockquote>
You can take advantage of a multi-column index for single-column queries,<br>
You can take advantage of a multi-column index for single-column queries,<br>
if your query includes items in the leftmost-to-rightmost order.
if your query includes items in the leftmost-to-rightmost order.
Line 76: Line 79:
SELECT ... WHERE proince = "ON";                    # CANNOT use index, queries 2nd column without 1st
SELECT ... WHERE proince = "ON";                    # CANNOT use index, queries 2nd column without 1st
</syntaxhighlight>
</syntaxhighlight>
</blockquote><!-- Single-Column Queries -->


'''beware of range queries'''<br>
=== Beware of Range Queries ===
<blockquote>
As soon as you hit a fork in the index, you can no longer use it for later columns.
As soon as you hit a fork in the index, you can no longer use it for later columns.


Line 88: Line 93:
SELECT .. WHERE country="CA" AND province IN ("NB", "NS") AND city="Foo"  # uses index, but only (1st, 2nd column)
SELECT .. WHERE country="CA" AND province IN ("NB", "NS") AND city="Foo"  # uses index, but only (1st, 2nd column)
</syntaxhighlight>
</syntaxhighlight>
</blockquote><!-- Beware of Range Queries -->
</blockquote><!-- Multi-Column Index -->
</blockquote><!-- Multi-Column Index -->
</blockquote><!-- Index Types -->
</blockquote><!-- Index Types -->

Revision as of 16:11, 24 September 2022

Indexes use various datastructures to store data to prevent full scans of all rows in the table.

Documentation

CREATE INDEX docs https://dev.mysql.com/doc/refman/5.7/en/create-index.html

Usage

# create index
CREATE INDEX index_users_on_created_at ON (users);

# show indexes
SHOW CREATE TABLE users
SHOW INDEX FROM users

# force FROM to use index
SELECT ..
FROM users FORCE INDEX (index_users_on_created_at)
...;

# force JOIN to use index
SELECT ..
FROM ..
JOIN projects FORCE INDEX (index_projects_on_user_id)
ON ...

Index Types

Index

A regular index, on a single column. Defaults to BTREE.

Multi-Column Index

A multi-column index is a regular index,
where the values of multiple columns are concatenated together to form one string.

Index Column Order

The order of the columns is significant,
there are debates over the optimal ordering, generally least-specific to most-specific is correct.
see https://dba.stackexchange.com/questions/160649/composite-indexes-most-selective-column-first/210122#210122

- (unspecific -> specific)   (ex. id, province, country)
  - arguably most scalable. you can reuse the same index when breaking up query into batches.

- (specific -> unspecfic)
  - creates larger decision-tree.
  - more work on initial query, making it difficult to decompose into multiple queries.
  - less likely to select on most-specific rows alone


# but perhaps it is best to consider which columns will be used without the others (likely least specific)
# avoiding the write-cost of adding a row-change to 2x indexes.

Single-Column Queries

You can take advantage of a multi-column index for single-column queries,
if your query includes items in the leftmost-to-rightmost order.

For example

# we have a multi-column index on (country, province, city)

SELECT ... WHERE country = "CA";                     # CAN use index, query on leftmost column
SELECT ... WHERE country = "CA" AND province = "ON"  # CAN use index, query on each of the two leftmost columns

SELECT ... WHERE proince = "ON";                     # CANNOT use index, queries 2nd column without 1st

Beware of Range Queries

As soon as you hit a fork in the index, you can no longer use it for later columns.

For example

INDEX (country, province, city)

SELECT .. WHERE country IN ("CA", "US");                                  # uses index (1st-column)
SELECT .. WHERE country="CA" AND province IN ("NB", "NS")                 # uses index (1st, 2nd column)
SELECT .. WHERE country="CA" AND province IN ("NB", "NS") AND city="Foo"  # uses index, but only (1st, 2nd column)

Index Datastructure Types

BTree

A binary tree. Default for persisted storage engines.

Optimize Queries that are

- sorted
- match on field prefixes (but not suffixes)
- range bound (less than 10, between C and F)

But not queries

- match on field suffixes

HASH

A hash table. Default for the memory engine.
hash-collisions are accounted for, but each key with the same hash will need to be checked, making the query more expensive.
Indexes have a small memory footprint.

Optimize queries that are

- based on the full value (ex. 'IN, NOT IN, =')

But not queries that

- match on only a single key from a multi-key index
  (since both keys are hashed together)
- are range bound. (ex. less than 10, betweeen C and F)
  (hash-tables are un-ordered, so range queries are not optimized)

FULLTEXT

Fulltext indexes are optimized for finding keywords within blocks of text.
They are used when the MATCH AGAINST ".." operator is used.
It may still be beneficial to have a second index with a different type on the same column.

Index Strategies

Covering Index

A covering index is a multi-column-index that indexes all fields used by a query.
It makes that one specific query more efficient.