Regular Expressions Part 2

In the previous post, simple regular expressions were explained. Today, regex becomes useful. If you didn’t read the previous post, you should at least skim it.

All examples in this post use Perl.

Getting a Match
Parentheses capture the part of a match that you want to extract from a string. Let’s say you want to know what is inside the “head” HTML tags. Here’s the code:


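if ( $html =~ m/<head>(.*)<\/head>/is ) {
    print "HTML Header:\n$1\n";
}

The match is given to the code as the variable $1. Note that this example has an “i” and an “s” after the closing forward slash. The i makes the match case-insensitive, so “<HEAD>” works too. The s lets the dot match newline characters, which in effect treats the string as a single line. Without it, you probably wouldn’t get a match, because most head sections span several lines. Also, this simple regex will not capture the head correctly in all cases. It won’t match a head tag that has attributes, and because “.*” is greedy, the capture runs to the last “</head>” in the string, so a stray “</head>” further down the page would pull in too much.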
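To see what the s modifier changes, here is a minimal sketch you can run as is. The sample page in $html is made up for illustration, and the pattern assumes a plain opening head tag with no attributes.

```perl
use strict;
use warnings;

# Hypothetical sample page; the head section spans several lines.
my $html = "<html>\n<head>\n<title>Demo</title>\n</head>\n<body></body>\n</html>\n";

# With /s the dot also matches newlines, so the capture can span lines.
# In list context the match returns the captured group directly.
my ($head) = $html =~ m/<head>(.*)<\/head>/is;

# Without /s the dot stops at every newline, so the same pattern fails here.
my $without_s = ( $html =~ m/<head>(.*)<\/head>/i ) ? 1 : 0;

print "with /s:    ", ( defined $head ? "match" : "no match" ), "\n";
print "without /s: ", ( $without_s ? "match" : "no match" ), "\n";
```

The first line prints “match” and the second prints “no match”, since the only thing between the two tags that the dot refuses to cross is the newline.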

Here’s another example:


# Flag " a " followed by a word that starts with a vowel letter.
if ( $text =~ m/ a ([aeiou][a-z]+)/i ) {
    print "Grammar error: use \"an\" when the following word starts with a vowel.\n i.e. an $1\n";
}

Yup, it’s a grammar rule check. Now you know where that green squiggly underline comes from.
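If you want to try the check on a few lines, a small sketch like this will do. The sentences are invented for the example, and @flagged is just a helper list to collect what was caught.

```perl
use strict;
use warnings;

# Hypothetical sample sentences.
my @sentences = (
    "I ate a apple today.",
    "She bought a banana.",
    "It was a Elephant.",
);

my @flagged;    # words that should have been preceded by "an"
for my $text (@sentences) {
    if ( $text =~ m/ a ([aeiou][a-z]+)/i ) {
        push @flagged, $1;
        print "Suggest \"an $1\" in: $text\n";
    }
}
```

This flags “apple” and “Elephant” and leaves the banana sentence alone. Keep in mind the pattern looks at letters, not sounds, so it would also complain about “a university”, which is correct English.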

Whitespace and Non-whitespace matching
Whitespace refers to spaces, tabs, newlines, and carriage returns. “\s” matches a single whitespace character and “\S” matches a single non-whitespace character.
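Here is a quick sketch of both in action. The input string is made up, and the g modifier (not covered above) simply repeats the match to collect every hit.

```perl
use strict;
use warnings;

# Hypothetical input mixing spaces, a tab, and a newline.
my $line = "  alpha\tbeta \n";

# \S+ grabs each run of non-whitespace characters; /g collects them all.
my @words = $line =~ m/(\S+)/g;

# \s matches a single whitespace character anywhere in the string.
my $has_space = ( "ab cd" =~ m/\s/ ) ? 1 : 0;
my $no_space  = ( "abcd"  =~ m/\s/ ) ? 1 : 0;

print "words: @words\n";
print "has whitespace: $has_space, $no_space\n";
```

This prints “words: alpha beta” followed by “has whitespace: 1, 0”.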

Those are the basics of regular expressions. These magical expressions work in almost every language, including Perl, PHP, JavaScript, and Python.

Happy pattern matching.