mySQL: UNION

I wanted to find all of the emails in my customer database that were duplicates. This could be because several people at the same business share an address or because someone is in the database twice with slightly different spellings of their name e.g. Kathy and Kathleen, or with and without a middle initial, etc.

Before I started I set all of the email addresses that were empty strings to NULL, since joins don’t match on NULL but do on empty strings.

Then on the first try I used this:


SELECT t1.name, t1.email, t2.name, t2.email
FROM customers AS t1
INNER JOIN customers AS t2 
ON t1.email = t2.email
WHERE t1.name <> t2.name

Or in WHERE notation.

SELECT t1.name, t1.email, t2.name, t2.email
FROM customers AS t1, customers AS t2 
WHERE t1.email = t2.email
AND t1.name <> t2.name

I got 384 records, but I got a lot of duplicates where t1.name and t2.name are reversed. i.e.


Helen Jaffa     hjaffa@ino.net   Helen A Jaffa   hjaffa@ino.net
Helen A Jaffa   hjaffa@ino.net   Helen Jaffa     hjaffa@ino.net

To eliminate duplicates I used a UNION and two OUTER JOINS


SELECT t1.email, t1.name
FROM customers AS t1
LEFT OUTER JOIN customers AS t2 ON t1.email = t2.email
WHERE t1.name <> t2.name
UNION
SELECT t2.email, t2.name
FROM customers AS t2
LEFT OUTER JOIN customers AS t1 ON t1.email = t2.email
WHERE t1.name <> t2.name

This yielded 343 records and no duplicates.

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.