Friday, 7 August 2020

Expressway Regex

 Credit for this article goes to https://www.collabarchitects.co/

 Test your Regex at https://regex101.com/

 

Example 1: Match a specific URI

 

.*@example.webex.com

 

We often use a regex like the one above to route calls to a specific URI.  In this case, we're matching anything with the domain example.webex.com.  Here's how we did it:

 

. matches any single character

 

* matches the character preceding it zero or more times

 

Thus .* matches any sequence of characters, including an empty one

 

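Add the domain @example.webex.com as a qualifier to only match the characters preceding that specific domain.  Strictly speaking, the unescaped dots in the domain also match any single character; write example\.webex\.com if you want them to match literal dots only.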

 

Make sense?  Let's use an example:

 

jonathan@example.webex.com is a MATCH

 

jonathan123@example.webex.com is a MATCH

 

jonathan@acme.webex.com is NOT A MATCH, because we are only matching URIs with the specific domain example.webex.com

 

Pretty simple, right?
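
If you want to try this outside regex101, here is a minimal Python sketch of the same check.  It assumes the pattern must match the whole URI (re.fullmatch stands in for that), and the helper name is_match is ours, not anything from Expressway.

```python
import re

# Pattern from Example 1, used unchanged.
PATTERN = r".*@example.webex.com"


def is_match(alias: str) -> bool:
    """Return True when the whole alias matches PATTERN."""
    # Assumption: the pattern has to cover the entire alias, so fullmatch is used.
    return re.fullmatch(PATTERN, alias) is not None


for alias in (
    "jonathan@example.webex.com",
    "jonathan123@example.webex.com",
    "jonathan@acme.webex.com",
):
    print(alias, "is a MATCH" if is_match(alias) else "is NOT A MATCH")
```

Because the dots in the pattern are not escaped, is_match("jonathan@exampleXwebex.com") is also True in this sketch.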

 

Example 2: Match all URIs EXCEPT a specific URI

 

(?!.*@example\.com*$).*

 

This regex is often used to match all domains that are not the local domain (i.e. external domains).  In the above example, we're matching everything except example.com.  Here's how we did it:

 

( ) nests characters for grouping

 

.* matches all characters (remember our previous example?)

 

Thus, the regex reads: check the expression in the ( ) and otherwise match everything .* 

 

Do you understand thus far?  If not, go back to our first example. Now comes the interesting part:

 

?  matches zero or one occurrence of a pattern; thus ba?b matches bb and bab, but not baab

 

(?! ) is an advanced regex construct called a lookaround; more specifically, it's a negative lookahead.  A negative lookahead requires that the pattern inside it NOT match the text to the right of the current position.  In this case, we require that the URI does not end in @example.com for a match.

 

\ is an escape for a special character.  In our example, we want the . in .com to be matched as a literal dot and thus need to put a \ in front of it

 

* matches the character preceding it zero or more times.  Here it applies to the m only, so com* matches co, com and comm

 

$ anchors the match to the end of an input string; thus 123$ matches 0123, but not 1234
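
The building blocks above can be checked one at a time.  This is a small Python sketch, not Expressway itself; the helpers full and found are hypothetical names, and the a\.com pattern is made up purely to show the escape.

```python
import re


def full(pattern: str, text: str) -> bool:
    """True when the pattern matches the entire text."""
    return re.fullmatch(pattern, text) is not None


def found(pattern: str, text: str) -> bool:
    """True when the pattern matches somewhere in the text."""
    return re.search(pattern, text) is not None


# ? makes the preceding character optional.
print([s for s in ("bb", "bab", "baab") if full(r"ba?b", s)])

# $ ties the match to the end of the input.
print([s for s in ("0123", "1234") if found(r"123$", s)])

# \. matches a literal dot only, while a bare . matches any character.
print(full(r"a\.com", "a.com"), full(r"a\.com", "aXcom"), full(r"a.com", "aXcom"))
```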

 

Thus, the regex ?!.*@example\.com*$ is read: exclude any expression matching .*@example.com

 

Let's put it all together: (?!.*@example\.com*$).* should be read: exclude any URI with the domain example.com but match all other URIs

 

Make sense?  Let's use an example:

 

jonathan@example.com is NOT A MATCH

 

jonathan123@example.com is NOT A MATCH

 

jonathan@acme.com is a MATCH
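
Here is the same negative lookahead as a short Python sketch.  As before, re.fullmatch is our assumption for whole-URI matching, and is_external is a hypothetical helper name.

```python
import re

# Pattern from Example 2, used unchanged.
PATTERN = r"(?!.*@example\.com*$).*"


def is_external(alias: str) -> bool:
    """Return True when the alias passes the lookahead, i.e. is not @example.com."""
    # The lookahead is tested at the start of the alias; if it succeeds,
    # the trailing .* consumes the whole string.
    return re.fullmatch(PATTERN, alias) is not None


for alias in (
    "jonathan@example.com",
    "jonathan123@example.com",
    "jonathan@acme.com",
    "jonathan@example.co",  # also excluded, since m* allows zero m characters
):
    print(alias, "is a MATCH" if is_external(alias) else "is NOT A MATCH")
```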

 

That one was a bit tougher; lookarounds are not for the faint of heart.  Grab a cup of coffee, and let's start looking at regex replacement strings.

Example 3: Use Replace

 

The Replace function in Expressway transforms is exceedingly useful when you need to modify an inbound URI or set of digits.  For example, we often want Expressway registered endpoints to dial 5-digit internal numbers and route to CUCM.  To properly route, we need to take the 5-digit string and convert it to a URI.  Here's how we did it:

 

Match Pattern String: (\d{5})

Behavior: Replace

Replacement String: \1@example.com

 

\d matches any single digit.  The {5} repeats it so that exactly 5 digits are required.  Because the pattern has to match the whole alias, 12345 matches but 123456 does not.

 

( ) captures the matched digits as group 1.  Without the parentheses, \1 would have nothing to refer to.

 

\1 in the replacement string inserts the text captured by group 1.  In our case, that is the same 5 digits that the match pattern matched.

 

Thus, the regex (\d{5}) with the replacement \1@example.com should be read: match any five digits and append @example.com to turn them into a URI.

 

Make sense?  Let's use an example:

 

55555 is a MATCH which outputs 55555@example.com

 

666666 is NOT A MATCH

 

jonathan@example.com is NOT A MATCH
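
To see the replacement in action, here is a minimal Python sketch of the 5-digit transform.  The digits are wrapped in a capture group so that \1 has something to refer to, whole-alias matching is assumed via re.fullmatch, and transform_five_digit is a hypothetical name of ours.

```python
import re
from typing import Optional

MATCH_PATTERN = r"(\d{5})"       # exactly five digits, captured as group 1
REPLACEMENT = r"\1@example.com"  # group 1 followed by the domain


def transform_five_digit(alias: str) -> Optional[str]:
    """Return the rewritten alias, or None when the pattern does not match."""
    match = re.fullmatch(MATCH_PATTERN, alias)
    if match is None:
        return None
    # expand() substitutes \1 with the captured digits.
    return match.expand(REPLACEMENT)


for alias in ("55555", "666666", "jonathan@example.com"):
    print(alias, "->", transform_five_digit(alias))
```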

 

 

How about another example?  Here we want to match any 10-digit number dialed, excluding numbers starting with 0, and add a domain to convert the digit string into a URI.  This would allow a video endpoint to dial a 10-digit PSTN number.

 

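Match Pattern String: ([1-9]\d{9})

Behavior: Replace

Replacement String: \1@example.com

 

Let's start with the matching pattern string: ([1-9]\d{9})

 

( ) nests characters for grouping and captures whatever they match as group 1

 

[ ] matches any one character from a set, or from a range written with a hyphen.  Thus, [1-9] matches 1, 2, 3 but not 0

 

^ means two different things.  Outside brackets, it matches the beginning of an input string; thus, ^123 matches 1234 but not 01234.  As the first character inside brackets, it negates the set; thus, [^0] matches any single character other than 0, letters included, which makes a pattern like ([^0]*) too loose for this job.

 

\d{9} matches exactly 9 digits, just like \d{5} matched exactly 5 in the previous example

 

Thus, the regex ([1-9]\d{9}) is read: match a first digit from 1 to 9 followed by 9 more digits, that is, any 10-digit number that does not start with 0.

 

Now, time for the replacement string: \1@example.com

 

\1, when used in a replacement string, inserts the text captured by group 1.  Note that \10 is not the same as \1: the extra 0 is either appended to the output or read as part of the group number.

 

Let's put it all together: ([1-9]\d{9}) with the replacement \1@example.com should be read: match any 10-digit number not starting with 0 and create a URI from those 10 digits with example.com as the domain.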

 

Make sense?  Let's use an example:

 

8162223333 is a MATCH which outputs 8162223333@example.com

 

081622333 is NOT A MATCH

 

jonathan@example.com is NOT A MATCH
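
Here is a Python sketch of the 10-digit intent described above.  It uses ([1-9]\d{9}) with the replacement \1@example.com as one way to express "10 digits, not starting with 0"; that pattern, the re.fullmatch whole-alias assumption, and the name transform_ten_digit are ours, so test against your own Expressway before relying on it.

```python
import re
from typing import Optional

MATCH_PATTERN = r"([1-9]\d{9})"  # first digit 1-9, then nine more digits
REPLACEMENT = r"\1@example.com"  # group 1 followed by the domain


def transform_ten_digit(alias: str) -> Optional[str]:
    """Return the rewritten alias, or None when the pattern does not match."""
    match = re.fullmatch(MATCH_PATTERN, alias)
    if match is None:
        return None
    # expand() substitutes \1 with the captured 10 digits.
    return match.expand(REPLACEMENT)


for alias in ("8162223333", "081622333", "jonathan@example.com"):
    print(alias, "->", transform_ten_digit(alias))
```

In Python, a replacement of \10@example.com would be read as a reference to group 10 rather than group 1 followed by a 0, which is one more reason to keep the group number and the domain apart.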