Concatenating Fields in MySql

I’m working on an iPhone/iPad product that needs to select a word based on the phonemes in it. I have about 40 phonemes that are stored in a MySql table as booleans. I didn’t feel like adding 40 attributes to my datamodel and then I’d have to write 40 conditionals to figure out whether a word contained the phoneme. Fortunately, MySql has a CONCAT function String Operator that I could use to concatenate all the fields into one. Now I can use REGEX expressions to query one field to determine if it is on or off for a specific word.

My first attempt to understand the operator was to just copy the example.


UPDATE `ArticIV_Phonemes` SET `Concatenated_phonemes` = CONCAT(`R`,`VocR`)

This isn’t what I wanted since it treats the stuff between the `s as literals. The result is RVocR. Not the values of the fields. Removing the `s works.


UPDATE `ArticIV_Phonemes` SET `Concatenated_phonemes` = CONCAT(R,VocR)

Now I get, 01, 10, 00, etc. which is what I want.

One minor complication. One of the phonemes is \or\ so naturally I used or as it’s name. My treats this name as a conditional and the CONCAT operation fails. I also used a hyphen in some of the names. My doesn’t like that either. It probably thinks it’s an arithmetic operator.

This is what I ended up using.


UPDATE `ArticIV_Phonemes` SET `Concatenated_phonemes` = CONCAT(R_Words,Vocalic_R_Words,S_Words,Z_Words,L_Words,Voiced_Th_Words,Voiceless_Th_Words,
Initial_R, Initial_R_Clusters, Medial_R, Medial_R_Clusters, 
VocR_aer, VocR_ar, VocR_eer, VocR_er, VocR_ier, VocR_or, VocR_our, VocR_yer, 
Initial_S, Initial_S_Clusters, Medial_S, Medial_S_Clusters, Final_S, Final_S_Clusters, 
Initial_Z, Initial_Z_Clusters, Medial_Z, Medial_Z_Clusters, Final_Z, Final_Z_Clusters, 
Initial_L, Initial_L_Clusters, Medial_L, Medial_L_Clusters, Final_L, 
Final_L_Clusters, Pre_vocalic_L, Post_vocalic_L, Pre_and_Post_V_L, 
Initial_Th, Medial_Th, 
Initial_Voiceless_Th, Initial_Voiceless_Th_Clusters,
Medial_Voiceless_Th, Final_Voiceless_Th, Final_Voiceless_Th_Clusters)

And here are the first few lines.


00001000000000000000000000000000010000010000000
00000000100000000000000000000000000000000000000
00000000000000000100000100000000000000000000000

When I want to find all of the words that have say, VocR_ier, in them I use this REGEX


SELECT *  FROM `ArticIV_Phonemes` WHERE `Concatenated_phonemes` REGEXP '[01]{9}1[01]{34}'

Basically this says, look for 9 occurrences of either a 0 or a 1. Then a 1. Then the rest of the string can be either 0 or 1. Note: you must put the final part in. The first part of the REGEX looks for a string of 9 0’s or 1’s followed by a 1. It doesn’t care where they are in the string. They could be the last 10 digits or the first.

One way to get around that is to tell the REGEX to start at the beginning of the string and look for the pattern.
Note the ^ in this example.


SELECT *  FROM `ArticIV_Phonemes` WHERE `Concatenated_phonemes` REGEXP '^[01]{9}1'

Both examples give the same result,


00000000110000000000000000000000000000000000000
00000000110000000000000000000000100000000000000
00000000100000000000000000000000010000010000000

Here’s what I used in a different database.
UPDATE `OldArtic_Categories` SET `Concatenated_Category` = CONCAT(A, B, M, D, C, E, F, G, H, I, K, L, O, P, Q, R, S, T, U, V, W, Y, Z, N)