How to perform indexing in MySQL?

«»

Adding indexes to tables


Over time requirements for a software product usually change and affect the underlying
database as well. Often the need for new types of queries arises, which makes it necessary to
add one or more new indexes to perform these new queries fast enough.


In this recipe, we will add two new indexes to an existing table called books in the library
schema. One will cover the author column, the other the title column. The schema and
table can be created like this:



mysql> CREATE DATABASE library;
mysql> USE library;
mysql> CREATE TABLE books (
isbn char(13) NOT NULL,
author varchar(64) default NULL,
title varchar(64) NOT NULL,
PRIMARY KEY (isbn)
) ENGINE=InnoDB;


Getting ready


Connect to the database server with your administrative account.


How to do it…



  1. Change the default database to library:


  2. USE library;

  3. Create both indexes in one go using the following command:


  4. ALTER TABLE books ADD INDEX IDX_author(author), ADD INDEX IDX_
    title(title);


How it works…


The ALTER table statement shown above is almost self-explanatory. The books table is
altered to be indexed with individual indexes on the author and the title columns. Each
index is given an easily recognizable name: IDX_author and IDX_title for the author
and title columns respectively.


Index names are helpful when you later decide to remove an index from a table. Instead of
listing all the columns again, you can just refer to the index name.



Index names
It is very common to name indexes with some sort of prefix like IDX_ and
then append the column name(s) the index spans.
This is not strictly necessary and you might want to establish a different
naming scheme. Whatever you choose, make sure you follow your scheme
and assign names consistent with it for all your indexes.


There’s more…


There are some more details worth knowing about when creating indexes on any given table.


Using MySQL Query Browser to generate the SQL statements


Setting up indexes can be done either through a command line as shown earlier or using an
arguably more comfortable graphical tool like MySQL Query Browser. Especially when dealing
with more complex table setups, the graphical presentation can provide additional clarity. Before
applying any changes to your database, the product will display and allow you to copy or save the
full SQL statement(s) that are equivalent to the changes you made in the graphical editor.


This is very convenient because this way you can be sure not to make any mistakes
concerning statement syntax, table, or column names. We usually make changes using
MySQL Query Browser on a development or testing machine just to grab the SQL statements
and put them into SQL update script files for later execution, for example, as a part of our
software update routine. The following figure shows what the changes made in this example
look like. Note that the generated statements contain all table and column names in backtick
quotes. This is generally not required as long as those identifiers do not collide with MySQL
keywords—something you should avoid anyway. Also, the statements will be fully qualified,
which means the database name is put before the table name. This is also not strictly
required if you set the default database to the right schema beforehand.



Prefix indexes


In the example above, we created an index with default settings. This will create an index that
is usually “just right”. You may, however, have special requirements or possess knowledge
about the table data that cannot be derived from the schema definition alone, making a
custom index a better choice than the default one.

The detail most often changed in an index definition is the length of the index fields. MySQL
provides support for so-called prefix indexes. As the database does not know about the nature
of the contents that are going to be stored in any particular column apart from the data type,
it has no choice but to take the safe route and consider the full length of the column in its
sorted index copy.


For long columns in large tables, it can be a waste of space to copy the complete column
values to the index, which in turn can have negative impact on performance just because
there’s more data involved.


You can aid the database to work more efficiently with your domain knowledge. In the books
example table the title can be up to 64 characters long. However, it is very unlikely that there
will be a lot of books whose titles start alike and only differ in the last few characters. So,
having the index cover the maximum length is probably not necessary for quick lookups. By
changing the index creation statement to include a prefix length (say 20 characters) for the
column to be indexed, you can tell MySQL to only copy the first 20 characters of the title to
the index:



ALTER TABLE books ADD INDEX IDX_title(title(20));


As a result, the index will use less space—in terms of both disk usage and memory when used
for queries. As long as the book title differs within the first 20 characters, this index will be
more efficient than one covering the full column.


Even when there is a certain number of titles that are identical within this 20 character prefix,
the index will still be useful. This is because as long as MySQL can rule out all but a few
records, having to look at the actual table data for the final decision as to which rows fulfill the
query conditions is still significantly faster than having to scan the whole table with all books.


Unfortunately, there is no easy-to-use formula to determine the ideal prefix length because it
heavily depends on the actual data content. This is why by default the whole column is indexed.


Prefix primary keys


Most documentation on indexing in some way or another covers the topic of prefix indexes
for text type columns, using only a portion at the beginning of column contents instead of the
whole values for the actual index.


However, often this topic is presented in a way that might suggest this only works for
secondary keys; but that is not true. You can also use a prefix primary key, as long as the most
important requirement of a primary key is not violated: the uniqueness of each key value must
be guaranteed.


See also



  • Estimating InnoDB index space requirements

  • Removing indexes from tables


Adding a fulltext index


Indexes are an important means of making sure a database performs well and responds
quickly when queried. However, they can only live up to their full potential when applied to
well-structured data. Unfortunately, not all information we would like to query can be made
to fit into regular relational database tables and columns.


A prime example of this is free text. It does not follow any well-defined structure and does not
lend itself to the principle by which regular indexes work. For example, a fulltext index allows
querying for search terms no matter where in the indexed column they occur and not only at
the beginning of the column as would be the case with normal indexes.


Fulltext indexes require you to use a special syntax to express your queries. Querying with
the LIKE keyword will not be accelerated by a fulltext index. In this recipe you will learn how
to create a fulltext index on an existing database column. For the purpose of this example,
we assume a fictional forum database with a posts table that in turn has a content column
storing the actual text of a forum entry.



InnoDB tables do not support fulltext indexing. This feature is only
available for tables using the MyISAM storage engine.


Getting ready


Connect to the database using your administrative account.


How to do it…



  1. Change the default database to forum:


  2. USE forum;

  3. Create the fulltext index using the following command:


  4. ALTER TABLE posts ADD FULLTEXT INDEX IDX_content(content);


How it works…


While regular indexes create ordered copies of the relevant columns to enable quick lookups,
fulltext indexes are a more complicated matter.



Dropping and recreating fulltext indexes for bulk data imports
When (first) inserting bulk data into a table, it is faster to first drop an existing
fulltext index and then later recreate it. This will speed up the data insertion
significantly because keeping the fulltext index up to date during data insert is
an expensive operation.


There’s more…


Here are some details that are important to know when including fulltext indexing in
your applications.



Please be aware that changes to any of the parameters that follow require a
rebuild of any fulltext index that was created before the change!
See the MySQL online manual at http://dev.mysql.com/doc/
refman/5.1/en/fulltext-fine-tuning.html for more details.


Case sensitivity


Fulltext index queries are usually run in a case-insensitive way. If you need case-sensitive
fulltext search, you will have to change the collation of the affected underlying columns to
a binary variant.


Word length


When a fulltext index is created, only words within a configurable range of lengths are
considered. This helps to prevent acronyms and abbreviations being included in the index. You
can configure the acceptable length range using the ft_min_word_len and ft_max_word_
len variables. The default value for the minimum length is 4 characters.


Stopwords


In every language, there are many words that are usually not wanted in fulltext search
matching. These so called stopwords might be “is, a, be, there, because, done” among others.
They appear so frequently in most texts that searching for them is hardly useful. To conserve
resources, these stopwords are ignored when building a fulltext index. MySQL uses a default
stopword list that defines what is to be ignored, which contains a list of about 550 English
stopwords. You can change this list of stopwords with the ft_stopword_file variable. It
takes a filename with a plain text file containing the stopwords you would like to use. Disabling
stopwords can be achieved by setting this variable to an empty string.


Ignoring frequent words


Frequent words will be ignored: if a search term is present in more than half of the rows
searched, it will be considered a stopword and effectively ignored. This is useful especially
in large tables; otherwise you would get half of the table as query hits, which can hardly be
considered useful.



When experimenting with fulltext search, make sure you have a reasonably
large dataset to play with. Otherwise you will easily hit the 50 percent mark
described above and not get any query results. This can be confusing and will
make you think you did something wrong, while in fact everything is perfectly
in order.


Query modes


Apart from the default human query mode you can use a boolean query mode, which enables
special search-engine-like operators to be used—for example, the plus and minus signs to
include or exclude words in the search.


This would allow you to use query terms such as ‘+apple -macintosh’ to find all records
containing the word apple, but not the word macintosh.


For all the possible operators, have a look at the MySQL online manual at

http://dev.mysql.com/doc/refman/5.1/en/fulltext-boolean.html

Sphinx


MySQL’s built-in fulltext search is only available for MyISAM tables. In particular, InnoDB is
not supported. If you cannot or do not want to use MyISAM, have a look at Sphinx—an open
source, free fulltext search engine that was designed to integrate nicely with MySQL. See
http://sphinxsearch.com/ for more information.


See also



  • Removing indexes from tables


Creating a normalized text search column


Usually, regular and fulltext indexing as supported by MySQL are sufficient for most use cases.
There are, however, situations where they are not perfectly usable:



  • InnoDB tables cannot use fulltext indexes. At the time of writing there were no signs
    of this changing in the foreseeable future.

  • There are different ways to spell the search terms


Especially in non-English speaking countries, a problem often arises that does not surface as
often in American or British environments. Words in the English language consist of the letters
from A to Z without diacritics. From a software development perspective this is a welcome
simplification because it allows for simpler implementations.


One problem you are often faced with German, for example, is different ways to spell the
same word, making it complicated to formulate suitable search terms.


Consider the German words “Dübel” (dowel) and “Mörtel” (mortar). In a merchandise
management database you might find several variants of similar products, but each could be
spelled in different ways:



As an end user of the corresponding application searching for those becomes cumbersome
because to find exactly what you are looking for you might have to attempt several searches.


In this recipe, we will present an idea that needs some support on the application level but
will allow you to use simple regular indexes to quickly search and find relevant records in
situations like the above.



To implement the ideas presented in this recipe, modifications to the
software accessing the database as well as the table definition will be
necessary. We advise that this is a process that usually entails a higher
complexity and increased testing efforts than simply adding an index.


Getting ready


To implement the ideas presented here, you will need to connect to the database server with
your administrative account for the schema modifications. Because apart from the database
modifications application program code changes will be necessary as well, you should contact
an application developer.


In the example, we are going to assume a table definition as follows:



CREATE TABLE products (
productID int(11) NOT NULL,
name char(30) default NULL,
stock int(11) default NULL,
PRIMARY KEY (productID)
) ENGINE=InnoDB


How to do it…



  1. Connect to the database server using your administrative account and make test
    the default schema:


  2. use test;

  3. Add a new column norm_name to the products table:


  4. mysql> ALTER TABLE products ADD COLUMN norm_name CHAR(90) AFTER
    name;

    The column needs to be at least as long as your original column. Depending on the
    character mapping rules you are going to implement, the projected values might take
    up more space.


  5. Define an index on the new column. Make sure it is not set to UNIQUE:


  6. mysql> ALTER TABLE products ADD INDEX IDX_norm_name (norm_name);

  7. Optionally, consider dropping an existing index on the original column. Also, consider
    modifying other indexes currently containing the original column to include the new
    one instead.

  8. Implement the replacement algorithm depending on your language. For German
    language substitutions, the following substitutions could be used. This is just an
    excerpt from the Transformers.java class you can find on the book’s website.



    private static String[] replacements = {
    “ä”, “ae”, “null”, “0″, “:”, “”,
    “ö”, “oe”, “eins”, “1″, “:”, “”,
    “ü”, “ue”, “zwei”, “2″, “.”, “”,
    “ß”, “ss”, /* … */ “-”, “”,
    ” “, “”, “neun”, “9″, “,”, “”,
    // … further replacements…
    };

  9. Modify your application code to use the new mapping function and issue queries
    against the new norm_name column where previously the original name column was
    used. Depending on how you decide to expose the search features to your end users,
    you might want to make searching the new or the old column an option.

  10. Modify your application code to update the new column parallel to the original
    one. Depending on the application scenario, you might decide to only update the
    normalized search column periodically instead.

  11. Before handing out a new version of your software containing the new code, make
    sure the normalized search column gets populated with the correct values.

  12. Optionally, declare the new column NOT NULL, after it has been initially filled.


How it works…


By implementing the mapping algorithm, we make the application think about the different
ways to spell things, not the end user. Instead of creating all possible variants, which could
become a large set of permutations depending on the length and content of the original
input, we project the search terms to a normalized form for both the original data and later
for queries issued against it. As both use the same mapping functions, only a single—index
supported—query against MySQL is needed. The application of course usually never reveals
these internals. The person in front of the computer will just be pleased to find the desired
records easily.


The mapping rules from input to search terms depend on the language and applicationspecifi
c needs. For German words, they are rather short—only the umlaut characters need to
be transformed to a normalized form. Other languages might require more complex rules.


In the example code earlier, we also transform the input to lowercase and remove several
special characters like dashes and colons, and also the whitespace. For the sample data set
of products we used, this is the result of the transformation:



Now instead of querying the original data column, we ask the database to search for the
transformed representation of the search terms in the additional norm_name (normalized)
column. For this it can use regular indexes and provide results quickly and efficiently.


Note that the Transformer.java code available from the book’s website is nowhere near
production quality but only serves for demonstration purposes. It does not, for example, contain
any error checking or exception handling and the mapping algorithm is very simple, too.


There is more…


If you do not care about international specialties but still want to improve user experience by
allowing for less strict searches, you might want to have a look at the SOUNDEX() function.
It is designed to work for English language words only and allows you to query for results that
sound like the search terms.


However, note that the results of using it may not be what you expect—opinions on the
Internet range from extreme enthusiasm to complete disappointment. You can find its
documentation at http://dev.mysql.com/doc/refman/5.1/en/string-functions.
html#function_soundex.

«»

Comments

comments

Pages: 1 2 3 4

About Krishna Srinivasan

He is Founder and Chief Editor of JavaBeat. He has more than 8+ years of experience on developing Web applications. He writes about Spring, DOJO, JSF, Hibernate and many other emerging technologies in this blog.

Speak Your Mind

*

Close
Please support the site
By clicking any of these buttons you help our site to get better