Tag Archives: REGEXP_SUBSTR

REGEXP_LIKE, REGEXP_REPLACE & REGEXP_SUBSTR Regular Expressions function in SQL

Regular Expressions function in SQL

Regular Expression belongs to search pattern, used in SQL query for helping in search of a specific pattern in column data. Oracle SQL supports regular expressions with the pattern-matching condition and functions

Example:
Search pattern may be difficult. It helps to find.
This regular expression matches any string that begins with either s or ss, followed by qq, optionally followed by s, followed by the colon (:).
(s|ss)qqs?:
Explain:
pipe symbol(|) indicates a choice between the data from either side of it. ( s | aa) choose either s or aa.
question mark (?) indicates that the preceding element, s, is optional

Following are the function present in Oracle SQL:

REGEXP_LIKE Condition that can appear in the WHERE clause of a query, causing the query to return rows that match the given pattern.
Example:
This WHERE clause identifies employees with the first name of Steven or Stephen:
WHERE REGEXP_LIKE((hr.employees.first_name, '^Ste(v|ph)en$')

REGEXP_COUNT Function that returns the number of times the given pattern appears in the given string.
Example:
Function returns the number of times that e (but not E) appears in the string ‘Albert Einstein’, starting at character position 7. last c column specify case sensitive
REGEXP_COUNT('Albert Einstein', 'e', 7, 'c')

REGEXP_INSTR Function that returns an integer that indicates the starting position of the given pattern in the given string.
Example:
This function invocation returns the starting position of the first valid email address in the column hr.employees.email
REGEXP_INSTR(hr.employees.email, '\w+@\w+(\.\w+)+')
If the returned value is greater than zero, then the column contains a valid email address.

REGEXP_REPLACE Function that returns the string that results from replacing occurrences of the given pattern in the given string with a replacement string.
Example:
This function invocation puts a space after each character in the column hr.countries.country_name:
REGEXP_REPLACE(hr.countries.country_name, '(.)', '\1 ')

REGEXP_SUBSTR Function returns the matching substring itself.
Example:
This function invocation returns ‘Oracle’ because the x option ignores the spaces in the pattern:
REGEXP_SUBSTR('Oracle 2010', 'O r a c l e', 1, 1, 'x')

Pattern Matching Option

i Case insensitive Matching
c case sensitive matching
n allow the dot(.) operator for matching new line character. E.g. REGEXP_SUBSTR(‘a’||CHR(10)||’d’, ‘a.d’, 1, 1, ‘n’)
Note: CHR(10) is for newline char
m allow multiline mode, string contain multiple lines. E.g. REGEXP_SUBSTR(‘ab’||CHR(10)||’ac’, ‘^a.’, 1, 2,’m’)
x Ignores whitespace characters in the string. REGEXP_SUBSTR(‘Oracle’, ‘O r a c l e’, 1, 1, ‘x’)

Example: you can use both option as show below:
Note: i for case insenstive and x for white space removal.

SQL> select REGEXP_SUBSTR('Oracle', 'o r a cle', 1, 1, 'ix') from dual;
REGEXP
------
Oracle
-- Remove the case insensitive i then it return blank.
SQL> select REGEXP_SUBSTR('The Oracle', 'o ra cle', 1, 1, 'x') from dual;
R
-

Operator meaning
. – means matching any character in database including newline like a.b find abc, adb, afb etc but not bcb or bdb.
+ – means matching one or more occurrence of preceding expression. like a+ find a, aa, aaaa etc but not ba or ab.
* – means matching zero or more occurrence of preceding expression. like ab*c find ac,abc,abbbc etc but not abd or abk.
? – means matching zero or one occurrence of preceding expression. like ab?c find ac and abc but not abbc or adc.
{m} – means matching exact occurrence of preceding expression. like a{3} find aaa but not aa.
{m,} – means matching at least m occurrence of preceding expression. like a{3,} find aaa and aaaa but not aa.
{m,n} – means matching m occurrence but not more than n. like a{3,4} find aaa and aaaa but not aaaaa or aa.
[char…] – means matching any single character in string like [abc] first character find all, bill, cold but not doll.
[^char…] – means not matching any single character in string for selected position. like [^ab] first character should but not be a or b.
\ – means treat operator character as a literal. like abc\+def with slash(\) you can use + sign as data.
^ – means matches begin of line with the string. like ^def matches the starting string defghi but not abcdef.
$ – means matches one of line with string. like def$ matches the ending string abcdef but not defghi.
\d – means matching the digit character like ^\(\d{3}\) is (550) but not 550.
\D – means match with non digit character like \d\D is 4b or 3_ but not 22.
\w – means matches with a world character like \w+@\w+(\.\w+)+ is sunny@gmail.com but not sunny@gmail.
\W – means matches a non world character like \w+\W\s\w+ matches the string to: bill but does not match to bill
\s – means matches a white space like \(\w\s\w\s\) is ( a b ) but not match (ab) or (a,b.)
\S – means matches a non white space character like \(\w\S\w\S\) is (abde) and (a,b.) but not match (a b d e)

Example of using the Regular Expression operator:

Check the phone number in particular format:

(REGEXP_LIKE (p_number, '^\(\d{3}\) \d{3}-\d{4}$'))