Issue #10 August 2005

Getting data out of MySQL

The simplest way to retrieve data from MySQL is to use the SELECT statement. Since the SELECT statement is an essential SQL statement, it has many options available with it. It's not necessary to know or use them all—you could execute very basic SELECT statements if that satisfies your needs. However, as you use MySQL more, you may need more powerful SELECT statements. This article goes through the basics of SELECT and progresses to more involved SELECT statements, moving from the beginner level to the more intermediate. Hopefully you will find some benefit from this article regardless of your skill level. For absolute beginners who are just starting with MySQL, you may want to read over the previous article on Getting started with MySQL.

Basic elements

The basic, minimal elements of the SELECT statement call for the keyword SELECT, the columns to select or retrieve from, and the table from which to retrieve rows of data. Actually, for the columns to select, we can use the asterisk as a wildcard to select all columns in a particular table. Using a database from a fictitious bookstore, we might enter the following SQL statement to get a list of all columns and rows in a table containing information on books:

SELECT * FROM books;

This will retrieve all of the data contained in the books table. If we want to retrieve only certain columns, we would list them in place of the asterisk in a comma-separated list like so:

SELECT isbn, title, author_id 
    FROM books;

This narrows the width of the results set by retrieving only three columns, but it still retrieves all of the rows in the table. If the table contains thousands of rows of data, this may be more data than we want. If we want to limit the results to just a few books, say five, we would include what is known as a LIMIT clause:

SELECT isbn, title, author_id
    FROM books
    LIMIT 5;

This will give us the first five rows found in the table. If we want to get the next 10 found, we would add a starting point parameter just before the number of rows to display, separated by a comma:

SELECT isbn, title, author_id
    FROM books
    LIMIT 5, 10;

Selectivity and order

The previous statements have narrowed the number of columns and rows retrieved, but they haven't been very selective. Suppose that we want only books written by a certain author, say Dostoevsky. Looking in the authors table we find that his author identification number is 4729. Using a WHERE clause, we can retrieve a list of books from the database for this particular author like so:

SELECT isbn, title
    FROM books
    WHERE author_id = 4729
    LIMIT 5;

I removed the author_id from the list of columns to select, but left the basic LIMIT clause in because I want to point out that the syntax is fairly strict on ordering of clauses and flags. You can't enter them in any order. You'll get an error in return.

The SQL statements we've looked at thus far will display the titles of books in the order in which they're found in the database. If we want to put the results in alphanumeric order based on the values of the title column, for instance, we would add an ORDER BY clause like this:

SELECT isbn, title
    FROM books
    WHERE author_id = 4729
    ORDER BY title ASC
    LIMIT 5;

Notice that the ORDER BY clause goes after the WHERE clause and before the LIMIT clause. Not only will this statement display the rows in order by book title, but it will retrieve only the first five based on the ordering. That is to say, MySQL will first retrieve all of the rows based on the WHERE clause, order the data based on the ORDER BY clause, and then display a limited number of rows based on the LIMIT clause. Hence the reason for the order of clauses. You may have noticed that I slipped in the ASC flag. It tells MySQL to order the rows in ascending order for the column name it follows. It's not necessary, though, since ascending order is the default. However, if we want to display data in descending order, we would replace the flag with DESC. To order by more than one column, additional columns may be given in the ORDER BY clause in a comma separated list, each with the ASC or DESC flags if preferred.

Friendlier and more complicated

So far we've been working with one table of data containing information on books for a fictitious bookstore. A database will usually have more than one table, of course. In this particular database, there's also one called authors in which the name and other information on authors is contained. To be able to select data from two tables in one SELECT statement, we will have to tell MySQL that we want to join the tables and will need to provide a join point. This can be done with a JOIN clause as shown in the following SQL statement

SELECT isbn, title,
    CONCAT(name_first, ' ', name_last) AS author
    FROM books
    JOIN authors USING (author_id)
    WHERE author_last = 'Dostoevsky'
    ORDER BY title ASC
    LIMIT 5;

Producing the following results:

+-------------+------------------------+-------------------+
| isbn        | title                  | author            |
+-------------+------------------------+-------------------+
|  0553212168 | Brothers Karamozov     | Fyodor Dostoevsky |
|  0679420290 | Crime & Punishment     | Fyodor Dostoevsky |
|  0553211757 | Crime & Punishment     | Fyodor Dostoevsky |
|  0192834118 | Idiot                  | Fyodor Dostoevsky |
|  067973452X | Notes from Underground | Fyodor Dostoevsky |
+-------------+------------------------+-------------------+
5 rows in set (0.00 sec)

Our SELECT statement is getting hefty, but it's the same one to which we've been adding. Don't let the clutter fluster you. Looking for the new elements, let's focus on the JOIN clause first. There are a few possible ways to construct a join. This method works if you're using a newer version of MySQL and if both tables contain a column of the same name and value. Otherwise you'll have to redo the JOIN clause to look something like this:

...
JOIN authors ON author_id = row_id
...

This excerpt is based on the assumption that the key field in the authors table is not called author_id, but row_id instead. There's much more that can be said about joins, but that would make for a much longer article. If you want to learn more on joins, look at the MySQL documentation page on JOIN syntax.

Looking again at the last full SQL statement above, you must have spotted the CONCAT() function added to the on-going example statement. This string function takes the values of the columns and strings given and pastes them together to give one neat field in the results. I also employed the AS parameter to change the heading of the results set for the field to author. This is much tidier. Since we joined the books and the authors tables together, we were able to search for books based on the author's last name rather than having to look up the author ID first. This is a much friendlier method, albeit more complicated. Incidentally, we can have MySQL check columns from both tables to narrow our search. We would just add column = value pairs separated by commas in the WHERE clause. Notice that the string containing the author's name is wrapped in quotes—otherwise, the string would be considered a column name and we'd get an error.

The name Dostoevsky is sometimes spelled Dostoevskii, as well as a few other ways. If we're not sure how it's spelled in the authors table, we could use the LIKE operator instead of the equal-sign, along with a wildcard. If we think the author's name is probably spelled either of the two ways mentioned, we could enter something like this:

SELECT isbn, title,
    CONCAT(name_first, ' ', name_last) AS author
    FROM books
    JOIN authors USING (author_id)
    WHERE author_last LIKE 'Dostoevsk%'
    ORDER BY title ASC
    LIMIT 5;

This will match any author last name starting with Dostoevsk. Notice that the wildcard here is not an asterisk, but a percent-sign.

Some flags

There are many flags or parameters that can be used in a SELECT statement. To list and explain all of them with examples would make this a very lengthy article. The reality is that most people never use some of them anyway. So, let's take a look at a few that you may find useful as you get more involved with MySQL or if you work with large tables on very active servers.

The first flag that may be given, the ALL flag, goes immediately after the SELECT keyword. By default, all rows that meet the requirements of the various clauses given are selected, so this isn't necessary. If instead we would only want the first occurrence of a particular criteria to be displayed, we could add the DISTINCT flag. For instance, for authors like Dostoevsky there will be several printings of a particular title. In the results shown earlier you may have noticed that there were two copies of Crime and Punishment listed; however, they have different ISBN numbers and different publishers. Suppose that for our search we only want one row displayed for each title. We could do that like so:

SELECT DISTINCT isbn, title
    FROM books
    JOIN authors USING (author_id)
    WHERE author_last = 'Dostoevsky'
    ORDER BY title;

I've thinned out the ongoing SQL statement a bit for clarity. This statement will result in only one row displayed for Crime and Punishment, and it will be the first one found.

If we're retrieving data from an extremely busy database, by default any other SQL statements entered simultaneously which are changing or updating data will be executed before a SELECT statement. SELECT statements are considered to be of lower priority. However, if we would like a particular SELECT statement to be given a higher priority, we can add the keyword HIGH_PRIORITY. Modifying the previous SQL statement for this factor, we would enter it like this:

SELECT DISTINCT HIGH_PRIORITY isbn, title
    FROM books
    JOIN authors USING (author_id)
    WHERE author_last = 'Dostoevsky'
    ORDER BY title;

You may have noticed in the one example earlier in which the results are shown, that there's a status line displayed that specifies the number of rows in the results set. This is less than the number of rows that were found in the database that met the statement's criteria. It's less because we used a LIMIT clause. If we add the SQL_CALC_FOUND_ROWS flag just before the column list, MySQL will calculate the number of columns found even if there is a LIMIT clause.

SELECT SQL_CALC_FOUND_ROWS isbn, title
    FROM books
    JOIN authors USING (author_id)
    WHERE author_last = 'Dostoevsky'
    LIMIT 5;

To retrieve this information, though, we will have to use the FOUND_ROW() function like so:

SELECT FOUND_ROW();

which results in:

+--------------+
| FOUND_ROWS() |
+--------------+
|           26 |
+--------------+

This value is temporary and will be lost if the connection is terminated. It cannot be retrieved by any other client session. It relates only to the current session and the value for the variable when it was last calculated.

Conclusion

There are several more parameters and possibilities for the SELECT statement that are not covered in this article to keep it at a reasonable length. A popular one that I left out is the GROUP BY clause for calculating aggregate data for columns (such as an average). There are several flags for caching results and a clause for exporting a results set to a text file. If you would like to learn more about SELECT and all of the options available, look at the online documentation for SELECT Statements on the MySQL website.

About the author

Russell Dyer has written on MySQL and other open sources topics for several magazines over the last three years. He is the author of MySQL in a Nutshell (O'Reilly 2005). He lives and works in New Orleans and can be reached at russell@dyerhouse.com.