Regular Expressions Part 1/2

Regular expressions, those oddities that live between two forward slashes, are very powerful and quite mysterious. Staring at something like /([abcdef0123456789]+)/i all day can give you a headache. With a little luck and a bit of hard work, you’ll soon know exactly what that expression means.

All examples in this post use Perl.

Text Search
A regular expression, or regex, in its simplest form is a text search. Here’s an example:


my $var = "Hello World";
if ( $var =~ m/Hello/ ) {
    print "Match\n";
}

In Perl, the =~ operator runs a regex against a variable. The expression m/Hello/ matches if the variable contains “Hello” anywhere.

To make the match case-insensitive, simply add an i after the last forward slash. So change the regex to m/Hello/i to match “Hello”, “HeLlO” and “hello”.
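
As a quick sketch of the i modifier, the snippet below runs the same case-insensitive regex against a few spellings. The @inputs and @matched names and the sample strings are made up for this illustration.

```perl
use strict;
use warnings;

# Hypothetical sample inputs; the last one does not contain "hello" in any casing.
my @inputs = ("Hello World", "HeLlO World", "hello world", "Goodbye World");

# The trailing i makes the match ignore case.
my @matched = grep { m/Hello/i } @inputs;

print "Match: $_\n" for @matched;
```

Without the i, only the first string would be kept.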

Carets and Dollar Signs
A caret (^) at the beginning of a regex represents the beginning of a string. Here’s an example:


my $var = "Hello World";
if ( $var =~ m/^Hello/ ) {
    print "Match\n";
}

A dollar sign ($) at the end of a regex represents the end of a string. Another example:


my $var = "Hello World";
if ( $var =~ m/World$/ ) {
    print "Match\n";
}
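
The two anchors can also be combined so that the regex has to span the string from start to end. A minimal sketch, where is_exact_hello is a hypothetical helper name used only for this illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: true if the regex covers the string from start to end.
sub is_exact_hello {
    my ($text) = @_;
    return $text =~ m/^Hello World$/ ? 1 : 0;
}

for my $candidate ("Hello World", "Oh, Hello World", "Hello World again") {
    printf "%-18s %s\n", $candidate, is_exact_hello($candidate) ? "Match" : "No match";
}
```

Only the first string matches. One caveat: in Perl, $ also allows a single trailing newline, so "Hello World\n" would still match.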

If you want to match one of these special characters literally, put a backslash before it.


# Single quotes stop Perl from treating $World as a variable.
my $var = 'Hello^ $World';
if ( $var =~ m/o\^ \$W/ ) {
    print "Match\n";
}

Brackets
Putting a list of characters inside square brackets “[]” matches any one of those characters.


my $var = "Hello World";
if ( $var =~ m/[aeiou]/ ) {
    print "There is a vowel.\n";
}

You can even test whether a string is a hexadecimal value. This example uses the special character +, which means that the previous item (a single character or a [] set) must appear one or more times.


my $var = "0x157afde";
if ( $var =~ m/^0x[0123456789abcdef]+$/ ) {
    print "It is hexadecimal\n";
}

Within the brackets, instead of listing every possible character, you can specify a range. For instance, 0-9 matches any digit 0 through 9. Here’s a slightly shorter version of the previous example:


my $var = "0x157afde";
if ( $var =~ m/^0x[0-9a-f]+$/ ) {
    print "It is hexadecimal\n";
}
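
The range version above only accepts lowercase letters. One way to accept uppercase hex digits as well is the i modifier from earlier (adding A-F to the set would work too). This is only a sketch, and is_hex is a hypothetical helper name:

```perl
use strict;
use warnings;

# Hypothetical helper: true if the string is "0x" followed by one or more hex digits.
sub is_hex {
    my ($text) = @_;
    return $text =~ m/^0x[0-9a-f]+$/i ? 1 : 0;
}

for my $candidate ("0x157afde", "0x157AFDE", "0x", "0x12g4") {
    printf "%-10s %s\n", $candidate, is_hex($candidate) ? "hexadecimal" : "not hexadecimal";
}
```

The first two strings pass. "0x" fails because + needs at least one digit, and "0x12g4" fails because g is outside the range. Note that the i applies to the whole regex, so a "0X" prefix would be accepted too.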

The caret (^) is also special within brackets, but with a different meaning. Placed right after the opening bracket, it makes the set match any character except those listed.


my $var = "0x157afgz";
# Matches if any character is outside 0-9, a-f, and x.
if ( $var =~ m/[^0-9a-fx]/ ) {
    print "It is not hexadecimal\n";
}
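
Where the anchors go changes what a negated set tests. The sketch below contrasts "contains at least one non-digit" with "consists only of non-digits". Both helper names are hypothetical and exist only for this illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: true if at least one character is not a digit.
sub has_non_digit {
    my ($text) = @_;
    return $text =~ m/[^0-9]/ ? 1 : 0;
}

# Hypothetical helper: true if every character is a non-digit (and there is at least one).
sub all_non_digits {
    my ($text) = @_;
    return $text =~ m/^[^0-9]+$/ ? 1 : 0;
}

for my $candidate ("12345", "123a5", "abcde") {
    printf "%-6s has_non_digit=%d all_non_digits=%d\n",
        $candidate, has_non_digit($candidate), all_non_digits($candidate);
}
```

"123a5" is the interesting case: it contains a non-digit, so the unanchored regex matches, but the anchored one does not because the digits get in the way.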

Periods and Asterisks
A period (.) matches any single character except a newline.


my $var = "Hello World";
if ( $var =~ m/^H.llo W.+$/ ) {
    print "Match\n";
}

An asterisk (*) is similar to a plus sign (+), but an asterisk matches zero or more of the previous character.


my $var = "Hello World";
if ( $var =~ m/^Hello .*$/ ) {
    print "Saying hello\n";
}
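
The difference between * and + only shows up when there is nothing left to match. A small sketch, with variable names chosen just for this illustration:

```perl
use strict;
use warnings;

my $var = "Hello ";   # nothing follows the space

my $star_matches = $var =~ m/^Hello .*$/ ? 1 : 0;  # .* accepts zero characters
my $plus_matches = $var =~ m/^Hello .+$/ ? 1 : 0;  # .+ needs at least one character

print "star: $star_matches, plus: $plus_matches\n";  # prints "star: 1, plus: 0"
```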

Tomorrow
Tomorrow: more special characters, including whitespace characters, non-whitespace characters, and matching parentheses.