MySQL: Working with Multiple Tables

As mentioned earlier, I’m re-reading the Visual QuickStart Guide to SQL and working out some examples. I’m using phpMyAdmin, so the examples here have backticks. I’m using two tables. The first is a table of words and definitions. The second is examples of word usage from the internet.

In this query, I’m looking for the words that have the most usage citations. It turns out that the top three are egregious with 21 citations, interlocutor with 15, and trepidation with 15.


SELECT `words`.`word`, COUNT(*)
FROM `words` 
INNER JOIN `usage`
ON `words`.`word` = `usage`.`word`
GROUP BY `words`.`word`
ORDER BY COUNT(*) DESC
LIMIT 20

Find all the words that are not in the definition portion – mostly alternate spellings and plurals. This table has the number of rows that are in the usage table.


SELECT `words`.`word`, `usage`.`word` 
FROM `words` 
RIGHT OUTER JOIN `usage`
ON `words`.`word` = `usage`.`word`
ORDER BY `words`.`word`  ASC

After I cleaned up the tables so that every word has at least one usage example and every example is associated with a word I ran the counts again. Just for fun I ran a RIGHT OUTER JOIN or a LEFT OUTER JOIN instead of an INNER JOIN. As you can probably guess, I got the same result with all three methods.

You can also use WHERE syntax and in this case it yields the same result as well.


SELECT `words`.`word`, `usage`.`word` 
FROM `words`,`usage`
WHERE `words`.`word` = `usage`.`word`
GROUP BY `words`.`word`

Simulate a FULL OUTER JOIN. In this case, since we cleaned up the database in the previous step, it has the same number of rows as words.


SELECT `words`.`word`, `usage`.`word`
FROM `words`
LEFT OUTER JOIN `usage`
ON `words`.`word` = `usage`.`word`

UNION

SELECT `words`.`word`, `usage`.`word`
FROM `words`
RIGHT OUTER JOIN `usage`
ON `words`.`word` = `usage`.`word`

BIG difference between UNION and UNION ALL. UNION ALL doesn’t remove duplicates.

While I was cleaning up tables, I looked at my products and product details tables to see if there were any orphaned product details. This query finds the rows in product_details that are not in product:


SELECT `product`.`name`, `product_details`.`product_id`
FROM `product`
RIGHT OUTER JOIN `product_details`
ON `product`.`id` = `product_details`.`product_id`

I already know that there are no products without details, otherwise the website would be broken. But this query checks for that.

SELECT `product`.`name`, `product_details`.`product_id`
FROM `product`
LEFT OUTER JOIN `product_details`
ON `product`.`id` = `product_details`.`product_id`

Miscellaneous MySQL

Case sensitive selection

These three methods are equivalent.


SELECT name FROM `customers` WHERE CAST(name AS BINARY) LIKE '%Net%'

SELECT name FROM `customers` WHERE name COLLATE utf8_bin LIKE '%Net%'

SELECT name FROM `customers` WHERE name LIKE BINARY '%Net%'

It also works for non-latin characters. This search will find only names with an “ñ” in the name. If you leave out the BINARY you will get all the names with an “n” in the name.


SELECT name FROM `customers` WHERE name LIKE BINARY '%Muñoz%'

BETWEEN

You can search fields containing numbers or characters using BETWEEN.


SELECT * FROM `customers` WHERE id BETWEEN 900 AND 999

You can also search a character string of numbers with BETWEEN.


SELECT * FROM `customers` WHERE zip BETWEEN 90000 AND 99999
SELECT * FROM `customers` WHERE zip*2 BETWEEN 90000 AND 99999

These also work:

SELECT * FROM `customers` WHERE name BETWEEN 'Ja' AND 'Jz'
SELECT * FROM `customers` WHERE name BETWEEN 'Jam' AND 'Jaz'
SELECT * FROM `customers` WHERE city BETWEEN 'A' AND 'M'

You might expect this to work, but it doesn’t. The first set of characters must be alphabetically before the second set of characters.

SELECT * FROM `customers` WHERE name BETWEEN 'Ja' AND 'Hz'

IN

This could come in handy if you want to restrict searches to just a few of the possible values in a field.


SELECT * FROM `customers` WHERE city IN ('Tulsa', 'Orlando')

Derived Columns: CONCAT

mySQL uses the CONCAT function not the concat operator, ||, for concatenation.


SELECT CONCAT(city, ', ', state) AS City_State FROM `customers`

This displays the column heading for the result as City_State, but it is not available for use in a WHERE clause. This fails with an unknown column error:

SELECT CONCAT(city, ', ', state) AS City_State 
FROM `customers` 
WHERE City_State = 'Anchorage, AK'

However, you can wrap the query as follows,

SELECT *
FROM (SELECT CONCAT(city, ', ', state) AS City_State 
FROM `customers`) 
AS firstQuery
WHERE City_State = 'Anchorage, AK'

This doesn’t work in phpMyAdmin or mySQL but works in other DBMSs:


SELECT * FROM `customers`
WHERE city || ', ' || state = 'Anchorage' || ', ' || 'AK'

Substring gotcha

If you want to find the cities that start with ‘Los’, you might be tempted to write the SUBSTRING as SUBSTRING(city, 0, 2) like any normal string access e.g. PHP. But MySQL starts with 1 and then has the number of characters you want to extract. So you’d do it like this:


SELECT city FROM `customers` WHERE SUBSTRING(city FROM 1 FOR 3) LIKE 'Los'

UPPER and LOWER

You can transform fields or literals in SELECT, WHERE, and ORDER BY clauses.

Since all of the state abbreviations are in upper case, this works:


SELECT state FROM `customers` WHERE state COLLATE utf8_bin LIKE UPPER('%ca%')

as does this:

SELECT state FROM `customers` WHERE LOWER(state) COLLATE utf8_bin LIKE'%ca%'

Combining operators

You might want to uppercase the address for mailing labels, this is one way to do it.


SELECT UPPER(CONCAT(city, ', ', state, ' ', zip)) AS City_State_Zip FROM `customers`

The result is

ST. ALBERT, AB T8N 2N9
ANCHORAGE, AK 99503

Interesting Query

Try this query:


SELECT CURRENT_DATE AS  'Date', CURRENT_TIME AS  'Time',  'Well Golly' AS 'Site'

It yields one row:

Date        Time      Site
2014-12-05  13:19:38  Well Golly

Then try this one:

SELECT CURRENT_DATE AS 'Date', CURRENT_TIME AS 'Time', 'Well Golly' AS 'Site', name FROM `customers`

It yields one row for each row in the database:

Date        Time      Site       name
2014-12-05  13:31:59  Well Golly Carmen Edsel
2014-12-05  13:31:59  Well Golly Jitka Dragonola
2014-12-05  13:31:59  Well Golly Deryk Bealer

GROUP BY and COUNT(*)

When testing this one out, I found some values that shouldn’t be in the state field.


SELECT state, COUNT(*) FROM `customers` GROUP BY state 

state   COUNT(*)
D       1
da      1

If you only want selected states, you can use the IN selection criterion:

SELECT state, COUNT(*) 
FROM `customers` 
WHERE state IN ('CA', 'IN') 
GROUP BY state 

Or if you want to find the number of customers in the top 20 cities, ordered from most to least.

SELECT city, state, COUNT(*) as count 
FROM `customers` 
GROUP BY city, state 
ORDER BY count DESC 
LIMIT 20

Note that the SELECT and GROUP BY must have the same fields or you will get incorrect results. In this case, leaving out the state in the group by yields the most customers in Brooklyn, FL

HAVING

Use the HAVING search condition to filter the results.


SELECT city, state, COUNT(*) as count 
FROM `customers` 
GROUP BY city, state 
HAVING COUNT(*) > 50 
ORDER BY count DESC

HAVING is applied after GROUP BY. Its main use is to filter on aggregates—SUM, COUNT, AVG, etc.

In general, you can’t reference aliases in WHERE or GROUP BY clauses, but MySQL allows referencing SELECT level aliases in GROUP BY, ORDER BY and HAVING.

mySQL: INNER JOIN, LEFT OUTER JOIN, RIGHT OUTER JOIN

As mentioned earlier, I’ve been brushing up on my JOIN terminology using data in my mySQL databases. The results here probably apply to other SQL DBMSs as well.

Differences

These three joins result in almost the same rows with minor—crucial—differences. The INNER JOIN returns all of the rows in the first table that have a match in the second table. The order in which the tables is specified doesn’t matter. If there is a row in either table that doesn’t have a match (specified by the join condition) it doesn’t appear in the join.

The LEFT OUTER JOIN returns all of the rows in the first table regardless of whether there is a matching the second table. If there is no match in the second table, the values for those fields are NULL. The RIGHT OUTER JOIN returns all of the rows in the second table regardless of whether there is a matching the first table. If there is no match in the first table, the values for those fields are NULL. As you can see by the specification, the order in which the tables are specified for the outer joins matters.

Examples

In my database the `apps` table has 30 rows. The `category` table has 10 rows. Two of the apps do not have a category assigned to them. Two of the categories have no apps associated with them.

In my database, this code returns 28 rows—the two apps with no assigned category do not appear.


SELECT apps.name, app_category.name 
FROM apps
INNER JOIN app_category 
ON category_id = app_category.id

This code returns 30 rows—the two apps with no assigned category appear in the result and the result and the app_category.name is NULL.


SELECT apps.name, app_category.name 
FROM apps
LEFT OUTER JOIN app_category 
ON category_id = app_category.id

This is a bit more complicated. This code returns 30 rows—the two apps with no assigned category do not appear in the result. The two categories that have no matching apps have an app.name of NULL.


SELECT apps.name, app_category.name 
FROM apps
RIGHT OUTER JOIN app_category 
ON category_id = app_category.id

Inner Joins

I’m re-reading the Visual QuickStart Guide to SQL and working out some examples. I don’t use a whole lot of joins on my websites and I’ve never given the topic much thought. In fact, if you asked me, I would have said that I never use joins, because none of my code uses the word.

It turns out that a lot of the queries on my websites use INNER JOINs, but without using the INNER JOIN keyword. In this example I have a table of products and a table of categories. Each product belongs to one category and the category key is in the product table. In this case the category table has unique values for for `name`.

I frequently use something like this:


SELECT product.name, product_category.name 
FROM product, product_category
WHERE product.category_id = product_category.id

Which is equivalent to this using JOIN syntax.

SELECT product.name, category.name 
FROM product
INNER JOIN product_category AS category
ON product.category_id = category.id

Here’s an example where I combine three tables. The product table has information about the product. It has a unique id for each product. The details table has selling information about the product. This table has a one-to-one mapping to the product table using `product_id`. The category table is the same table as in the example above. It is a one-to-many mapping. Note that the order in which the joins are made doesn’t matter.


SELECT product.name, details.itemNum, details.price, category.name, details.taxable, details.shipping_charge
FROM product, product_details AS details, product_category AS category
WHERE product.id = details.product_id
AND product.category_id = category.id
AND  details.isForSale = 'true'
ORDER BY product.name

Which is equivalent to this using JOIN syntax.

SELECT product.name, details.itemNum, details.price, category.name, details.taxable, details.shipping_charge
FROM product
INNER JOIN product_details AS details
ON product.id = product_id
INNER JOIN product_category AS category
ON product.category_id = category.id
WHERE details.isForSale = 'true'
ORDER BY product.name

Backticks in mySQL

When using phpMyAdmin to work on mySQL databases, you may have noticed that database and table names, as well as fields, are surrounded by backticks. If you do a query, the result may look something like this:


SELECT *  FROM `product` WHERE `name` LIKE 'Articulation'

But you can take them out and it still works.


SELECT *  FROM product WHERE name LIKE 'Articulation'

There is a discussion of this at StackOverflow and the consensus is that it is a non-standard practice that could cause problems if you migrate to another DBMS, but it has advantages if you are sticking with mySQL.

If you use backticks in your code, you can use reserved keywords for table names. You can also use table names with spaces.

“It’s a lot easier to search your code-base for something in backticks. Say you have a table named event. grep -r “event” * might return hundreds of results. grep -r “\`event\`” * will return anything probably referencing your database.” (EllisGL)

If you use them consistently, it does make the code a bit easier to read, but harder to write. Note that if you need more than the table name, you put backticks around the database as well as the table. e.g. `myDatabase`.`myTable` NOT `myDatabase.myTable`

If you are doing a bunch of things in a query it can get a bit messy. Here’s an example with and without.


SELECT `product`.`name`, `category`.`name` 
FROM `product`
INNER JOIN product_category AS category
ON `product`.`category_id` = `category`.`id`

But you can take them out and it still works.


SELECT product.name, category.name 
FROM product
INNER JOIN product_category AS category
ON product.category_id = category.id

You can even take out just some of them and it works fine, however it is probably not good coding practice.


SELECT product.name, `category`.name
FROM `product`
INNER JOIN product_category AS category
ON `product`.`category_id` = `category`.`id`

Note that you still need to use quotes for search terms (like ‘Articulation’ above) and when inserting into non-numeric fields.

I haven’t given this topic any thought before now, but looking at my PHP code, I noticed that I never use backticks in my queries.