-
Can MySQL use primary key values from a secondary index?
In the article about the role of a primary key, I mentioned that a secondary index in an InnoDB table consists not only of the values of its member columns, but also values of the table’s primary key are concatenated to the index. I.e. the primary key contents is part of every other index.
Assuming the following table structure:
CREATE TABLE `bets` (
`id` int(10) unsigned NOT NULL,
`user_id` int(10) unsigned NOT NULL,
`game_id` int(10) unsigned NOT NULL,
...
PRIMARY KEY (`id`),
KEY `user_id` (`user_id`)
) ENGINE=InnoDB
Here is the visualization:
If MySQL could use in queries these implicitly added values, it would maybe allow to save some space on listing the primary key columns at the end of an index explicitly. Let’s check various cases.
Row filtering
mysql> EXPLAIN
-> SELECT *
-> FROM bets
-> JOIN games
-> ON games.id = bets.id
-> WHERE bets.user_id = 111
-> AND bets.id > 3476\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: bets
type: ref
possible_keys: user_id
key: user_id
key_len: 4
ref: const
rows: 22
Extra: Using where
*************************** 2. row ***************************
id: 1
select_type: SIMPLE
table: games
type: eq_ref
possible_keys: PRIMARY
key: PRIMARY
key_len: 4
ref: game.bets.id
rows: 1
Extra:
Both key_len and ref fields indicate that only one four bytes long column is used from the user_id index. MySQL cannot use the primary key values in a secondary index for filtering in WHERE clause.
Sorting with ORDER BY
mysql> EXPLAIN
-> SELECT *
-> FROM bets
-> JOIN games
-> ON games.id = bets.game_id
-> WHERE bets.user_id = 111
-> ORDER BY bets.id DESC\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: bets
type: ref
possible_keys: user_id
key: user_id
key_len: 4
ref: const
rows: 22
Extra: Using where
*************************** 2. row ***************************
id: 1
select_type: SIMPLE
table: games
type: eq_ref
possible_keys: PRIMARY
key: PRIMARY
key_len: 4
ref: game.bets.game_id
rows: 1
Extra:
Extra only returns Using where, but there is no Using filesort. It means ORDER BY will be optimized using the hidden primary key data from the secondary index.
Aggregating with GROUP BY
mysql> EXPLAIN
-> SELECT *
-> FROM bets
-> JOIN games
-> ON games.id = bets.game_id
-> WHERE bets.user_id = 111
-> GROUP BY bets.id\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: bets
type: ref
possible_keys: user_id
key: user_id
key_len: 4
ref: const
rows: 22
Extra: Using where
*************************** 2. row ***************************
id: 1
select_type: SIMPLE
table: games
type: eq_ref
possible_keys: PRIMARY
key: PRIMARY
key_len: 4
ref: game.bets.game_id
rows: 1
Extra:
Also in this case Extra neither shows Using filesort nor Using temporary, which would indicate no index is used for grouping. Therefore MySQL can optimize GROUP BY on the concatenated primary key values.
Covering index
mysql> EXPLAIN
-> SELECT bets.id
-> FROM bets
-> WHERE bets.user_id = 111\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: bets
type: ref
possible_keys: user_id
key: user_id
key_len: 4
ref: const
rows: 22
Extra: Using index
The query execution plan confirms through Using index that it will only need index contents to return result. MySQL can read and return the hidden primary key values to avoid the additional data lookup.
Summary
In InnoDB tables each entry of a secondary index always contains the copy of the corresponding primary key row. These values may in some cases be used to the benefit of query execution plan:
for ORDER BY on the primary key column(s)
for GROUP BY on the primary key column(s)
when returning the primary key column(s) values in the SELECT list
MySQL cannot use them, however, to optimize filtering in WHERE.
-
Joins: inner, outer, left, right
In (My)SQL, join is a means for combining records from two tables into a single set which can be either returned as is or used in another join. In order to perform the operation a join has to define the relationship between records in either table, as well as the way it will evaluate the relationship. The relationship itself is created through a set of conditions that are part of the join and usually are put inside ON clause. The rest is determined through a join type, which can either be an inner join or an outer join.
The SQL clauses that set the respective join type in a query are [INNER] JOIN and {LEFT | RIGHT} [OUTER] JOIN. As you can see the actual keywords INNER and OUTER are optional and can be omitted, however outer joins require specifying the direction – either left or right.
Examples of queries:
SELECT * FROM users JOIN files ON files.owner_id = users.id WHERE users.id = 1;
SELECT * FROM users LEFT JOIN files ON files.owner_id = users.id WHERE users.id = 1;
Inner join, outer join
The primary difference between the two basic types (each has several subtypes) is in making the decision whether joining of two rows was successful or not, which essentially determines whether the combined row can be returned or not.
Inner joins require that a row from the first table has a match in the second table based on the join conditions. In means that the first query from the example above will only return any rows if files table contains at least one record where owner_id is 1 (has to be equal to users.id by the join conditions and users.id is filtered in WHERE to accept only that one value). Otherwise it will return no rows at all, even if users contains a valid user record. Assuming there are two users, but only one has any files:
mysql> SELECT * FROM users JOIN files ON files.owner_id = users.id WHERE users.id = 1;
+----+--------------+---------+----+----------+------------------+
| id | name | enabled | id | owner_id | filename |
+----+--------------+---------+----+----------+------------------+
| 1 | Albin Kolano | 1 | 1 | 1 | audit report.doc |
+----+--------------+---------+----+----------+------------------+
1 row in set (0.00 sec)
mysql> SELECT * FROM users JOIN files ON files.owner_id = users.id WHERE users.id = 2;
Empty set (0.01 sec)
Outer joins, on the other hand, consider a join successful even if no records from the second table meet the join conditions (i.e. whether there are any matches or not). In such case outer join sets all values in the missing columns to NULL. The second query from the example will return rows whenever there are matches in users and regardless of the contents of files table.
mysql> SELECT * FROM users LEFT JOIN files ON files.owner_id = users.id WHERE users.id = 1;
+----+--------------+---------+------+----------+------------------+
| id | name | enabled | id | owner_id | filename |
+----+--------------+---------+------+----------+------------------+
| 1 | Albin Kolano | 1 | 1 | 1 | audit report.doc |
+----+--------------+---------+------+----------+------------------+
1 row in set (0.00 sec)
mysql> SELECT * FROM users LEFT JOIN files ON files.owner_id = users.id WHERE users.id = 2;
+----+-------------------+---------+------+----------+----------+
| id | name | enabled | id | owner_id | filename |
+----+-------------------+---------+------+----------+----------+
| 2 | Nadzieja Surowiec | 1 | NULL | NULL | NULL |
+----+-------------------+---------+------+----------+----------+
1 row in set (0.00 sec)
Left join, right join
Unlike inner joins, outer joins require that the join direction is specified. Inner join is a symmetrical and bi-directional relationship, which means A JOIN B produces the same result as B JOIN A. That is not true for outer joins, because they accept when for a record in A there is no matching record in B and in such case the reverse operation is impossible (it would have to start with the non-existing record in B). This is the reason why setting the direction is necessary. A LEFT JOIN B finds matches for rows from table A in table B, while A RIGHT JOIN B finds matches for records from B in A.
In practice there is very little or even no real purpose for using RIGHT JOIN and in majority of cases everyone just sticks to using LEFT JOIN whenever they need outer join.
When does the join type matter?
Choosing the appropriate type depends on the logic you are trying to implement.
You have to use inner join when mandatory pieces of information are located in both tables and partial information is considered incomplete or even useless. The case of this could be listing user’s files based on the earlier example:
mysql> SELECT *
-> FROM users
-> JOIN files
-> ON files.owner_id = users.id
-> WHERE users.name = 'Nadzieja Surowiec'
-> AND users.enabled = 1;
Empty set (0.00 sec)
The query finds the user’s record by in users table and verifies that they are allowed to use the service through the value of users.enabled column and then searches for their files in files table. If there are no matches in either table, the query does not return any result, which is the correct behavior. If outer join was used in this case, a useless partial result could be returned or even incorrect result:
mysql> SELECT *
-> FROM users
-> LEFT JOIN files
-> ON files.owner_id = users.id
-> WHERE users.name = 'Nadzieja Surowiec'
-> AND users.enabled = 1;
+----+-------------------+---------+------+----------+----------+
| id | name | enabled | id | owner_id | filename |
+----+-------------------+---------+------+----------+----------+
| 2 | Nadzieja Surowiec | 1 | NULL | NULL | NULL |
+----+-------------------+---------+------+----------+----------+
1 row in set (0.00 sec)
mysql> SELECT COUNT(1)
-> FROM users
-> LEFT JOIN files
-> ON files.owner_id = users.id
-> WHERE users.name = 'Nadzieja Surowiec'
-> AND users.enabled = 1;
+----------+
| COUNT(1) |
+----------+
| 1 |
+----------+
1 row in set (0.00 sec)
The application relying on such queries not only would not be able to make anything out of such file information where file data is all set to NULL values, but also it would have to include additional and in fact redundant logic to filter out such results. The row count in this case is correct, as the query returned a single row, but it does not represent how many files the user has, so it is not a valid information that the application could use.
Outer join must be used to perform a join with a table, which holds information that is only optional for the result.
In our example we are working with a query that lists user’s files and we already established that the join between users and files has to be inner join. But let’s give our users the opportunity to choose a custom icon for any file if they want to. The information could be kept in a separate table called file_icon.
Now, for each listed file we also want to see if user has set a custom icon for that file and return the icon name if they have set it. The icon information is entirely optional, so we want the query to return rows regardless of whether there is an entry for the given file in file_icon or not. Therefore we have to use outer join for this particular task.
mysql> SELECT *
-> FROM users
-> JOIN files
-> ON files.owner_id = users.id
-> LEFT JOIN file_icon
-> ON file_icon.file_id = files.id
-> WHERE users.name = 'Albin Kolano'
-> AND users.enabled = 1;
+----+--------------+---------+----+----------+------------------+---------+------------------+
| id | name | enabled | id | owner_id | filename | file_id | icon_image |
+----+--------------+---------+----+----------+------------------+---------+------------------+
| 1 | Albin Kolano | 1 | 1 | 1 | audit report.doc | 1 | MS-Word-Icon.png |
| 1 | Albin Kolano | 1 | 2 | 1 | stats-201104.xls | NULL | NULL |
+----+--------------+---------+----+----------+------------------+---------+------------------+
The outer join allowed us to grab the complete list of user’s files and along with their icons if any were set. If we used inner join instead, the result would be missing the record of stats-201104.xls file.
To Be Continued
More on dealing with practical problems when designing join queries will be covered in a separate post.
-
Training in London next week
I’m going to deliver MySQL Training next week (May 21-24) in London.
This is a rare opportunity as I do not personally deliver a lot of Training, especially outside of US. There are still some places left if you want to sign up.
You will also get a signed copy of High Performance MySQL 3rd edition as an attendee.
-
MySQL Workbench 5.2.40 GA Released
The MySQL Developer Tools team is announcing the next maintenance release of it’s flagship product, MySQL Workbench, version 5.2.40.
This version contains more than 28 bug fixes applied over version 5.2.39.
MySQL Workbench 5.2 GA
• Data Modeling
• Query
• Administration
Please get your copy from our Download site.
Sources and binary packages are available for several platforms, including Windows, Mac OS X and Linux.
http://dev.mysql.com/downloads/workbench/
Workbench Documentation can be found here.
http://dev.mysql.com/doc/workbench/en/index.html
Utilities Documentation can be found here.http://dev.mysql.com/doc/workbench/en/mysql-utilities.html
In addition to the new Query/SQL Development and Administration
modules, version 5.2 features improved stability and performance –
especially in Windows, where OpenGL support has been enhanced and the UI
was optimized to offer better responsiveness.
This release also includes improvements to the scripting capabilities of the SQL Editor. You can read more about it in
http://wb.mysql.com/workbench/doc/
For a detailed list of resolved issues, see the change log.
http://dev.mysql.com/doc/workbench/en/wb-change-history.html
If you need any additional info or help please get in touch with us.
Post in our forums or leave comments on our blog pages.
- The MySQL Workbench Team
-
MySQL Backup & Recovery Essentials
Download PDF Presentation
A hardware, software or human failure can occur at any time. Are you prepared?
Many organizations take a risk of serious data loss and system downtime with inadequate procedures in place to support a disaster recovery. This presentation covers the essentials of MySQL backup and recovery options, identifying the necessary tools for an effective strategy to support data resilience and business continuity for your organization. MySQL has no one single unbreakable backup solution, so it is important to understand the impact of MySQL replication, storage engines, configuration options for durability, hardware configuration and the impact on locking and uptime for the various hot/warm/cold options available.
Short Url: http://j.mp/EM-BandR
Presenter: Ronald Bradford
Schedule: RMOUG QEW – May 2012- Denver, Colorado
|