How to perform indexing in MySQL?

Removing indexes from tables


Once-useful indexes may become obsolete as requirements change with the evolving
database. In this chapter, we will show you how to get rid of the IDX_author index created in
the Adding indexes to tables recipe.


Getting ready


Connect to the database server with your administrative account.


How to do it…



  1. Change the default database to library:


  2. USE library;

  3. Drop the IDX_author index using the following command:


  4. ALTER TABLE books DROP INDEX IDX_author;


How it works…


Using the ALTER TABLE statement, we tell the database that we want to remove (DROP) the
index named IDX_author from the books table.


There’s more…


As with the creation of new indexes, you can drop multiple indexes at once using the ALTER
TABLE statement by simply adding more DROP INDEX clauses, separated by comma. If you
were to delete both indexes defined in Adding indexes to tables, you could use this statement:



ALTER TABLE books DROP INDEX IDX_author, DROP INDEX IDX_title;


See also



  • Adding indexes to tables


Estimating InnoDB index space requirements


While indexes might very well be the single most important key in database performance
tuning, they come at the price of redundancy.


There are two main disadvantages tightly connected to redundant data storage:



  • The danger of inconsistencies between the redundant copies of data that should be
    at all times identical.

  • Increased storage and memory consumption because the same data is
    physically duplicated.


Fortunately, the former is a non-issue with indexes. As the database server takes care of
keeping data and indexes consistent without human intervention, you cannot get into a
situation where two columns that should contain equal data at all times are out of sync due to
programming errors or the like. This is usually a problem when violating normalization rules.


In contrast to that, there is no way to prevent the latter disadvantage. We need to store
multiple copies if we want different sort orders for quick lookups. What we can do, however,
is to attempt to minimize the negative effect by trying to limit the amount of duplicated
information as far as possible.



The employees database is an open source test database available for
free. It contains examples for many MySQL features including large tables,
foreign key constraints, views, and more. It can be found along with some
documentation at http://dev.mysql.com/doc/employee/en/
employee.html.


In the example below, we assume the existence of the employees test database with an
employees table defined as follows:



CREATE TABLE employees (
emp_no int(11) NOT NULL,
birth_date date NOT NULL,
first_name varchar(14) NOT NULL,
last_name varchar(16) NOT NULL,
gender enum(‘M’,’F’) NOT NULL,
hire_date date NOT NULL,
PRIMARY KEY (emp_no)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;


We will add an index each to the last_name and the first_name columns and try to
predict the necessary space.



Please note that the results will never be exact. The storage requirements—
especially of text-value table columns (VARCHAR, TEXT, CHAR, and so
on)—can be difficult to determine because there are multiple factors that
infl uence the calculation. Apart from differences between storage engines,
an important aspect is the character set used. For details refer to the
online manual for your server version: http://dev.mysql.com/doc/
refman/5.1/en/storage-requirements.html.
Moreover, it is not possible to find out the exact size even for existing indexes
because MySQL’s SHOW TABLE STATUS command only gives approximate
results for InnoDB tables.


Getting ready…


Connect to the database server with your administrative account.


How to do it…



  1. Find out the size of one primary key entry.

  2. To do so, look at the primary key definition in the table structure. Add the sizes of all
    primary key columns as documented in the MySQL Online Manual. In the example,
    the INT column takes 4 bytes. Write this number down.


  3. Determine the size of each column to be included in the new indexes and add them
    up per index. In the example, both first_name and last_name are VARCHAR
    columns—this means their lengths are not fixed as with the INT type. For simplicity,
    we will assume completely filled columns, meaning 14 bytes for first_name and 16
    bytes for the last_name column.

  4. For each index, add the lengths of all relevant columns and the size of the primary
    key. In our example, this gives the following results:


  5. The rightmost column contains the pure data size of a single index record including
    the implicit primary key.


  6. Multiply the size per index record with the number of rows in the table:


  7. The rightmost column contains the estimated size of the index, based on the current
    number of records, and the overhead taken by InnoDB to internally organize and store
    the data.



How it works


In the previous steps, we simply added up the sizes of all columns that will form a secondary
index entry. This includes all columns of the secondary index itself and also those of the
primary key because, as explained in the chapter introduction, InnoDB implicitly adds those to
every index.


Internally, the server of course needs a little more than just the raw column contents—all sorts
of management overhead (such as column widths, information on which columns can be null,
as well as some constant overhead) add to the required space. Calculating these in detail
is complicated and error-prone because they rely on many parameters and implementation
details can change between MySQL versions. This is not required, however, because our aim
is a ballpark number. As table contents often change quickly, exact numbers would not be
valid for long.


You can see this in our example—the values are too low. In reality, you will need to experiment
with these values. You are usually going to be on the safe side when you multiply your results
with a factor of 1.5 to 2.5.


You will find that depending on the lengths of the columns indexed and those that make up
the primary key, the accuracy of the estimates can vary.


There’s more…


Predicting space requirements is not an exact science. The following items are intended to
give some more hints on what you might want to think about.


Considering actual data lengths in your estimate


When adding an index to an existing column, you can try to use the average length of the
column values:



SELECT AVG(LENGTH(first_name)) AS avg_first, AVG(LENGTH(last_name)) AS
avg_last FROM employees;


For the sample data the results are:



Round this up to the next integer (7/8). Note that especially for short columns like this, the
estimates can be much less reliable because relative to internal database overhead data size
is less significant. This is why in the recipe earlier we went with declared maximum length of
the VARCHAR columns instead.


Minding character sets


For columns storing text information—such as CHAR and VARCHAR, VARBINARY, and
TEXT—the storage requirements depend on the character set used for the text inside.
For most English-speaking countries, this is something like the Latin-1 character set, which
uses a single byte per character of text. However, in international environments, this encoding
is hardly sufficient. To accommodate German text, for example, you need some special
characters—not to mention Chinese, Japanese, or other non-Latin languages.


MySQL supports different character sets on a per column basis. However, often you will define
a default character set for a database including all its tables and their columns.


When estimating index (and data) sizes for Unicode-aware columns (MySQL supports
UTF-8 and UCS2 character sets for this purpose), you need to take into account that those
may require more than a single byte per character. The very popular UTF-8 encoding uses
between 1 and 4 (even though 4 are only used in very special cases) bytes per character.
UCS2 has a constant size of 2 bytes per character. For details on how UTF-8 works,
see http://en.wikipedia.org/wiki/UTF-8.


Using prefix primary keys


In this example we will add indexes to two tables that are almost identical. The only difference
will be the definition of their primary keys. You will see the difference in space consumption
for secondary indexes between a regular full column primary key and a prefix primary key.
The sample table structure and data are designed to demonstrate the effect very evidently.
In real-world scenarios the effect will most certainly be less severe.


Getting ready…


Connect to the database server with your administrative account.


How to do it…



  1. Download the sample script for this chapter from the book’s website and save
    it to your local disk. In the example below, we will assume it is stored in /tmp/
    idxsizeestimate_sample.sql.

  2. Create a new database and make it the default database:


  3. CREATE DATABASE pktests;
    USE pktests;

  4. Import the sample data from the downloaded file. When done, you will be presented
    with some statistics about the two tables loaded. Note that both tables have an Index
    Length of 0.


  5. SOURCE /tmp/idxsizeestimate_sample.sql;

  6. Now with the sample tables present, add an index to each of them:


  7. ALTER TABLE LongCharKey ADD INDEX IDX_PAY_10(Payload(10));
    ALTER TABLE LongCharKey10 ADD INDEX IDX_PAY_10(Payload(10));

  8. Display the data and index sizes of the tables now:


  9. SHOW TABLE STATUS LIKE ‘LongCharKey%';

  10. Add another index to each table to make the difference even more evident:


  11. ALTER TABLE LongCharKey ADD INDEX IDX2_PAY_10(Payload(10));
    ALTER TABLE LongCharKey10 ADD INDEX IDX2_PAY_10(Payload(10));

  12. Display the data and index sizes of the tables again and compare with the previous
    values:


  13. SHOW TABLE STATUS LIKE ‘LongCharKey%';


    With the second index added, the difference in index length becomes even clearer.



How it works…


Executing the downloaded script will set up two tables with the following structures:



CREATE TABLE `LongCharKey` (
`LongChar` char(255) NOT NULL,
`Payload` char(255) DEFAULT NULL,
PRIMARY KEY (`LongChar`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
CREATE TABLE `LongCharKey10` (
`LongChar` char(255) NOT NULL,
`Payload` char(255) DEFAULT NULL,
PRIMARY KEY (`LongChar`(10))
) ENGINE=InnoDB DEFAULT CHARSET=latin1;


The two tables are almost identical, except for the primary key definition. They are pre-filled
with 50,000 records of sample data.


The tables are populated with exactly the same 50,000 records of pseudo-random data. The
Payload column is filled with sequences of 255 random letters each. The LongChar column
is filled with a sequential number in the first 10 characters and then filled up to use all
remaining 245 character with the same sort of random data.



SELECT LEFT(LongChar,20), LEFT(Payload, 20) from LongCharKey LIMIT 5;



While the LongKeyChar table simply marks the whole LongChar column as a primary key with
its entire 255 characters length, the LongCharKey10 table limits the primary key to the first
10 characters of that column. This is perfectly fine for this table, because the test data was
crafted to be unique in this range.


Neither one of the two tables has any secondary indexes defined. Looking at some relevant
table data shows they are equally big (some columns left out for brevity):



SHOW TABLE STATUS LIKE ‘LongCharKey%';



With each index added, the Index Length for the first table will increase significantly, while for
the second one its growth is much slower.


In case of the LongCharKey table, each secondary index record will carry around with it a
complete copy of the LongChar column because it is the primary key without limitation.
Assuming a single byte character encoding, this means every secondary index record is blown
up in size by 255 bytes on top of the 10 bytes needed for the actual index entry. This means a
whole kilobyte is spent just for the primary key reference for every 4 records!


In contrast to that, the primary key definition of the LongCharKey10 table only includes the
leading 10 characters of the LongChar column, making the secondary index entry 245 bytes
shorter and thereby explaining the much slower growth upon adding further indexes.

Comments

comments

Pages: 1 2 3 4

About Krishna Srinivasan

He is Founder and Chief Editor of JavaBeat. He has more than 8+ years of experience on developing Web applications. He writes about Spring, DOJO, JSF, Hibernate and many other emerging technologies in this blog.

Speak Your Mind

*