Sunday, August 4, 2013

Perl Regular Expressions


Introduction to Regular Expressions

Regular expressions are used to match strings against a pattern, and to change or translate the parts that match. In a traditional programming language, you would need to write a routine that scans the string and then performs whatever action you need. This approach is acceptable if the pattern is fairly simple. However, if the pattern is complicated, you will have to write a complicated routine to do the matching, comparing and translating.
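Perl has one operator for each of the three jobs mentioned above: m// to match, s/// to change, and tr/// to translate. (Strictly speaking, tr/// maps lists of characters one-for-one and does not use regular expressions.) A minimal sketch, using the same sample string as the example below:

```perl
use strict;
use warnings;

my $str = "Concept Solutions Corporation";

# Match: m// returns true if the pattern occurs anywhere in the string.
my $found = ($str =~ m/on/) ? 1 : 0;

# Change: copy $str into $changed, then let s/// replace every "on" (/g = global).
(my $changed = $str) =~ s/on/ON/g;

# Translate: tr/// maps each lowercase letter to its uppercase counterpart.
(my $upper = $str) =~ tr/a-z/A-Z/;

print "found: $found\n";      # found: 1
print "changed: $changed\n";  # changed: CONcept SolutiONs CorporatiON
print "upper: $upper\n";      # upper: CONCEPT SOLUTIONS CORPORATION
```

The `(my $copy = $str) =~ ...` idiom leaves the original `$str` untouched, so all three operations start from the same input.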

Simple Example

Matching one string against another is fairly easy.

Suppose we want to check whether the string "Concept Solutions Corporation" contains the string "on".

Unless the language has a pattern matching function, you have to write one yourself. You get the length of the string, step through it one character at a time, and compare the two characters starting at each position with "on". If you want the last match rather than the first, the routine also has to 'remember' each match it finds along the way. To keep things simple, we'll only check whether the string contains "on" at all.

You can write a routine to do this:
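use strict;
use warnings;

my $str = "Concept Solutions Corporation";
# Compare the two characters starting at every offset, beginning at 0.
foreach my $i (0 .. length($str) - 2) {
    if (substr($str, $i, 2) eq "on") {
        print "String found\n";
        last;    # stop at the first match
    }
}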

Note that this is only a simple comparison. If you want to return the part of the string before the pattern, you will need to add more code. In Perl, the whole routine can be written with a regular expression:

my $str = "Concept Solutions Corporation";
# =~ applies the match operator m// to $str; it is true if "on" occurs anywhere.
print "String found\n" if $str =~ m/on/;
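The harder cases mentioned earlier, returning the text before the pattern or finding the last match instead of the first, also need very little extra code with a regular expression. A minimal sketch; the variable names are illustrative only:

```perl
use strict;
use warnings;

my $str = "Concept Solutions Corporation";

# Non-greedy .*? stops at the FIRST "on"; the capture holds the text before it.
my ($before_first) = $str =~ /^(.*?)on/;

# Greedy .* runs as far as it can and backtracks to the LAST "on",
# so no separate "remember the last match" logic is needed.
my ($before_last) = $str =~ /^(.*)on/;

# m//g in list context returns every match; assigning that list to ()
# and then to a scalar gives the number of matches.
my $count = () = $str =~ /on/g;

print "before first: '$before_first'\n";  # before first: 'C'
print "before last: '$before_last'\n";    # before last: 'Concept Solutions Corporati'
print "count: $count\n";                  # count: 3
```

The first "on" is the one inside "Concept", at offset 1, which is why the text before the first match is just "C".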

Yes, that's right: the whole search loop shrinks to a single match operator, m/on/. Talk about concise code, and that is just the start.

Details, Details

Now that I've got you interested in learning about regular expressions, let's look at the details. A regular expression matches a string against a pattern. In our example, the string is "Concept Solutions Corporation" and the pattern is "on". You build more powerful patterns from funny-looking characters that each have a special meaning. These characters are called metacharacters, and the metacharacters that control repetition are called quantifiers.

Metacharacters give a pattern its structure, for example by anchoring it to the start or end of the string, or by standing in for any character.
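A minimal sketch comparing a pattern of plain characters with patterns that use the standard Perl metacharacters ^, $ and . (the strings "125" in the last two tests are made-up inputs for illustration):

```perl
use strict;
use warnings;

my $str = "Concept Solutions Corporation";

# Plain characters such as o and n simply match themselves.
my $literal  = $str =~ /on/  ? "yes" : "no";

# ^ anchors the pattern to the start of the string.
my $at_start = $str =~ /^on/ ? "yes" : "no";

# $ anchors the pattern to the end of the string.
my $at_end   = $str =~ /on$/ ? "yes" : "no";

# . matches any single character except a newline ("Con" in "Concept").
my $dot      = $str =~ /C.n/ ? "yes" : "no";

# A backslash turns a metacharacter back into a plain character.
my $escaped   = "125" =~ /1\.5/ ? "yes" : "no";  # needs a real "."
my $unescaped = "125" =~ /1.5/  ? "yes" : "no";  # "." matches the "2"

print "on: $literal, ^on: $at_start, on\$: $at_end, C.n: $dot\n";
print "1\\.5: $escaped, 1.5: $unescaped\n";
```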

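Quantifiers specify how many times the preceding character or group must match, for example "one or more" or "exactly two". Our example pattern "on" actually contains neither metacharacters nor quantifiers: o and n are ordinary characters that simply match themselves, which is why it is fairly straightforward.
In our next article, we will discuss using metacharacters to create more complicated patterns.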
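A minimal sketch of three standard Perl quantifiers, +, ? and {n}; the strings "color", "colour", "colouur" and "Room 42" are made-up sample inputs:

```perl
use strict;
use warnings;

my $str = "Concept Solutions Corporation";

# + means "one or more" of the preceding item: \w+ matches a whole word.
# With /g in list context (and no capture groups), every match is returned.
my @words = $str =~ /\w+/g;

# ? means "zero or one": the u in colou?r is optional.
my @spellings = grep { /^colou?r$/ } ("color", "colour", "colouur");

# {n} means "exactly n times": \d{2} needs two digits in a row.
my $has_two_digits = "Room 42" =~ /\d{2}/ ? 1 : 0;

print "words: @words\n";                 # words: Concept Solutions Corporation
print "spellings: @spellings\n";         # spellings: color colour
print "two digits: $has_two_digits\n";   # two digits: 1
```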