REGEX and MySQL

I’m working on my words database using phpMyAdmin and thought others might benefit from some of the search expressions.

Find words that start with ‘Wr’ or ‘wr’ and the CVC category is not null.


SELECT *  FROM `words` WHERE `word` REGEXP '^[Ww]r[a-z]*' 
AND `CVC` IS NOT NULL

This gives me words like, Rhoda, Rhone, rhyme because I’m asking to start at the beginning of the word ‘^’ then look for either a capital or lowercase w ‘[Ww] followed by an ‘r’ and then any letter ‘[a-z]’ that occurs zero or more times ‘*’. I’m further restricting the search to words that have an entry in the CVC category. In phpMyAdmin you can paste this into the SQL editing field or just use the REGEX in the word field and set it to REGEXP.

Here’s one looking for all the words that end in the letters mb and the CVC category is not null.


SELECT *  FROM `words` WHERE `word` REGEXP 'mb$' 
AND `CVC` IS NOT NULL

The ‘$’ means the end of the word.

This search yields words like climb, plumb, thumb.

This search looks for words like gnome and gnat. If you want you can leave out the [a-z]* and you’ll get the same result.


SELECT *  FROM `words_for_slps` WHERE `word` REGEXP '^[Gg]n[a-z]*' 
AND `CVC` IS NOT NULL

This one’s a little more complicated, I’m looking for words like tongue and dengue but don’t know if there are words with gue in the middle—like tongues.


SELECT *  FROM `words_for_slps` WHERE `word` REGEXP '^[a-zA-z]*ngue[a-z]*' 
AND `CVC` IS NOT NULL

You can’t do OR queries in phpMyAdmin interface everything is an AND. But if you do a query, then edit the SQL it is easy to make the query into an OR.

This is the original query to find words where the grade and level are NULL.


SELECT *  FROM `words_for_slps` WHERE `level` IS NULL AND `grade` IS NULL 
AND `CVC` IS NOT NULL

And here is the same code where all I changed is the AND to OR and just to be sure I’m getting the right results, I grouped the OR with parentheses. (I can never the default rules for the different languages so I always use parentheses so I’m sure I get what I want.)


SELECT *  FROM `words_for_slps` WHERE (`level` IS NULL OR `grade` IS NULL) 
AND `CVC` IS NOT NULL

I like this result but want to look deal with the shorter words first and then work my way up to the harder ones. I also want to skip the words that do not follow the rules (DNFR). Use <> for not equal.


SELECT *  FROM `words_for_slps` WHERE (`level` IS NULL OR `grade` IS NULL) 
AND `CVC` IS NOT NULL AND `CVC` <> 'DNFR' ORDER BY LENGTH(word)

Here’s the code I use to find all of the conosonants


([bcdfghjklmnpqrstvwxz]|ch|sh|zh|th|wh|ng|qu|wh|tch|ph)

And this finds all the vowels.


(a|e|i|o|u|aw|ow|ie|ae|oe|oy|oo|uu|y)

And this finds all of the CVCCV words. Note the {2} that finds exactly two occurrences of the previous search string.


SELECT *  FROM `words_for_slps` WHERE `word`
^([bcdfghjklmnpqrstvwxz]|ch|sh|zh|th|wh|ng|qu|wh|tch|ph)
(a|e|i|o|u|aw|ow|ie|ae|oe|oy|oo|uu|y)
([bcdfghjklmnpqrstvwxz]|ch|sh|zh|th|wh|ng|qu|wh|tch|ph){2}
(a|e|i|o|u|aw|ow|ie|ae|oe|oy|oo|uu|y)$