Extract the first, longest pattern that matches a regular expression.
You could add a parameter or write a similar function to get the shortest match instead (by starting with e = s and then incrementing e, instead of starting with e = LENGTH(string) and decrementing).
CREATE FUNCTION REGEXP_EXTRACT(string TEXT, exp TEXT) -- Extract the first longest string that matches the regular expression -- If the string is 'ABCD', check all strings and see what matches: 'ABCD', 'ABC', 'AB', 'A', 'BCD', 'BC', 'B', 'CD', 'C', 'D' -- It's not smart enough to handle things like (A)|(BCD) correctly in that it will return the whole string, not just the matching token. RETURNS TEXT DETERMINISTIC BEGIN DECLARE s INT DEFAULT 1; DECLARE e INT; DECLARE adjustStart TINYINT DEFAULT 1; DECLARE adjustEnd TINYINT DEFAULT 1; -- Because REGEXP matches anywhere in the string, and we only want the part that matches, adjust the expression to add '^' and '$' -- Of course, if those are already there, don't add them, but change the method of extraction accordingly. IF LEFT(exp, 1) = '^' THEN SET adjustStart = 0; ELSE SET exp = CONCAT('^', exp); END IF; IF RIGHT(exp, 1) = '$' THEN SET adjustEnd = 0; ELSE SET exp = CONCAT(exp, '$'); END IF; -- Loop through the string, moving the end pointer back towards the start pointer, then advance the start pointer and repeat -- Bail out of the loops early if the original expression started with '^' or ended with '$', since that means the pointers can't move WHILE (s <= LENGTH(string)) DO SET e = LENGTH(string); WHILE (e >= 1) DO IF SUBSTRING(string, s, e) REGEXP exp THEN RETURN SUBSTRING(string, s, e); END IF; IF adjustEnd THEN SET e = e - 1; ELSE SET e = s - 1; -- ugh, such a hack to end it early END IF; END WHILE; IF adjustStart THEN SET s = s + 1; ELSE SET s = LENGTH(string) + 1; -- ugh, such a hack to end it early END IF; END WHILE; RETURN NULL; END
This is awesome. Period.
Comment by Eric Lewis — January 10, 2012 @ 4:57 pm
Agree with Eric. Bookmarkin’ it.
Comment by Matt — January 10, 2012 @ 5:01 pm
How advanced can the expressions passed be?
Comment by Mark — January 29, 2016 @ 12:24 pm
Oh now I see the REGEXP inside the created function, so I’m assuming anything that REGEXP supports
Comment by Mark — January 29, 2016 @ 1:37 pm