Introduction to Regular Expressions
In our introductory article on Perl Regular Expressions, we showed how regular expressions can cut down the code that you have to write. This article discusses the format of regular expressions.
The regular expression format is like this:
action/pattern/modifier
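To see the format in action, the sketch below applies a few of the actions described next to a sample string. It is only an illustration. The variable names ($text, $found and so on) are hypothetical. Each expression is bound to $text with the =~ operator; without =~, the same expressions work on $_ instead.

```perl
use strict;
use warnings;

# Each expression below follows the action/pattern/modifier format.
# They are bound to $text with =~ ; without =~ they would act on $_.
my $text = 'Concept*Solutions*Corporation';

# m (match) with the i modifier: case-insensitive search; 1 if found
my $found = ($text =~ m/solutions/i) ? 1 : 0;

# s (substitute) with the g modifier: replace every upper case C in a copy
(my $replaced = $text) =~ s/C/K/g;

# y (translate): change c to K, o to U and n to D in a copy
(my $translated = $text) =~ y/con/KUD/;

# qr: compile a pattern once so it can be reused later
my $starts_with_concept = qr/^concept/i;
my $starts = ($text =~ $starts_with_concept) ? 1 : 0;

print "$found\n$replaced\n$translated\n$starts\n";
```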
The action can be any of the following:
Action | Description | Format |
(none) | Search for the pattern; same as m | /pattern/modifier |
s | Substitute | s/oldpattern/newpattern/modifier |
m | Match | m/pattern/modifier |
y | Translate (transliterate) characters one for one; same as tr | y/searchcharacters/replacementcharacters/ |
q | Single quote; similar to 'string' | q/string/ |
qq | Double quote; similar to "string" | qq/string/ |
qr | Compile the regular expression | qr/regexp/ |
qw | Quote words; return a list of the whitespace-separated words | qw/item1 item2 item3 item4 item5 .../ |

The pattern is made up of characters that Perl recognizes. Perl uses metacharacters to give parts of the pattern a special meaning, such as repetition or a position in the string. Listed below are the more common metacharacters.
Metacharacter | Description | Example | Explanation |
\ | Generally, treat the next character literally | \*\+ | Matches the characters *+ literally |
* | Previous character 0 or more times | A* | The letter A 0 or more times |
+ | Previous character 1 or more times | A+ | The letter A 1 or more times |
? | Previous character 0 or 1 time | N? | The letter N 0 or 1 time |
^ | The string being matched must start with the pattern following the ^ | ^Concept | The string must begin with Concept |
. | Any single character except a newline | ... | Any 3 characters |
$ | The string being matched must end with the pattern preceding the $ | Corporation$ | The string must end with Corporation |
[ ] | Any one of the characters enclosed between the brackets | [1358] | The character must be 1, 3, 5 or 8 |
{x,y} | Previous character at least x and at most y times; {x} means exactly x times, {x,} means at least x times | 4{1,3} g{3} L{4,} | 4 occurs 1 to 3 times; g occurs exactly 3 times; L occurs at least 4 times |
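The metacharacters above can be checked with a small table-driven script. This is a sketch only. The test strings are made up for illustration, and most patterns are anchored with ^ and $ so that the whole string must fit the pattern. The braces are written without spaces ({1,3}) because older Perl versions may not treat {1, 3} as a quantifier.

```perl
use strict;
use warnings;

# Each case: [pattern, string to test, expected result (1 = match, 0 = no match)]
my @cases = (
    [ '\*\+',         'a*+b',                  1 ],  # \     : * and + taken literally
    [ '^BA*$',        'B',                     1 ],  # *     : A zero or more times
    [ '^BA+$',        'B',                     0 ],  # +     : A one or more times
    [ '^ON?E$',       'ONE',                   1 ],  # ?     : N zero or one time
    [ '^Concept',     'Concept*Solutions',     1 ],  # ^     : must start with Concept
    [ '^...$',        'abcd',                  0 ],  # .     : exactly three characters
    [ 'Corporation$', 'Solutions*Corporation', 1 ],  # $     : must end with Corporation
    [ '^[1358]$',     '7',                     0 ],  # [ ]   : one of 1, 3, 5 or 8
    [ '^4{1,3}$',     '4444',                  0 ],  # {x,y} : 1 to 3 fours only
    [ '^g{3}$',       'ggg',                   1 ],  # {x}   : exactly 3
    [ '^L{4,}$',      'LLLLL',                 1 ],  # {x,}  : at least 4
);

my $mismatches = 0;
for my $case (@cases) {
    my ($pattern, $string, $expected) = @$case;
    # The string in $pattern is used as the regular expression
    my $got = $string =~ /$pattern/ ? 1 : 0;
    $mismatches++ if $got != $expected;
    printf "%-15s %-25s %s\n", $pattern, $string, $got ? 'match' : 'no match';
}
print "unexpected results: $mismatches\n";
```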
Modifiers
Modifiers define how a pattern is to be used. Below are the more common modifiers.
Modifier | Used in | Explanation |
i | Match and substitute | Case insensitive |
g | Match and substitute | Find all occurrences of the pattern, not just the first |
e | Substitute | Evaluate the replacement part as a Perl expression |
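The sketch below shows each modifier at work on the string used in the examples that follow. The variable names are hypothetical. The /e case replaces every word with its length, which shows that the replacement part is run as Perl code rather than inserted as text.

```perl
use strict;
use warnings;

my $text = 'Concept*Solutions*Corporation';

# i: case-insensitive match (1 if found, 0 if not)
my $ci = ($text =~ /solutions/i) ? 1 : 0;

# g in list context: return every match, not just the first one
my @all_on = $text =~ /on/g;

# g with s///: replace every lowercase o in a copy; the return value
# is the number of substitutions made
my $n = (my $copy = $text) =~ s/o/0/g;

# e: the replacement part is evaluated as Perl code, so each word
# is replaced by its length
(my $lengths = $text) =~ s/(\w+)/length($1)/ge;

print "$ci\n@all_on\n$n $copy\n$lengths\n";
```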
In the examples below, each expression is applied on its own to $_, starting from this value:
$_ = 'Concept*Solutions*Corporation';
Regular Expression | Explanation | Result |
/*/ | Invalid regular expression because * should be preceded by a character | Script error |
/\*/ | Search for an asterisk (*) | 1 |
/Solutions/ | Search for the string 'Solutions' | 1 |
/solutions/ | Search for the string 'solutions'; not found because the search is case sensitive | False (empty string) |
/solutions/i | Search for the string 'solutions'; found because the i modifier makes the search case insensitive | 1 |
s/C/K/ | Substitute the first upper case C with a K | Result is 1; $_ will have: Koncept*Solutions*Corporation |
s/C/K/g | Substitute every upper case C with a K | Result is 2; $_ will have: Koncept*Solutions*Korporation |
s/C/K/gi | Substitute every C with a K, regardless of case | Result is 3; $_ will have: KonKept*Solutions*Korporation |
y/con/KUD/ | Change c to K, o to U and n to D (lowercase only) | Result is 10, the number of characters changed; $_ will have: CUDKept*SUlutiUDs*CUrpUratiUD |
m/^[con]/gi | The first character of the string must be c, o or n, in either case | 1 |
s/.n/\+\@/gi | Replace every character followed by an n (in either case), together with the n, by the two characters +@ | Result is 3; $_ will have: C+@cept*Soluti+@s*Corporati+@ |
These are just some of the basic regular expressions. The best way to learn them is to try out combinations yourself, for example with a simple Perl script that applies each expression to $_ and prints the result.
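One possible sketch of such a script follows; it is an illustration, not taken from the original article. It uses Perl's string eval to compile and run each expression on a fresh copy of $_. This means an invalid expression such as /*/ is reported instead of stopping the script. The @expressions list and the %results hash are hypothetical names; add your own combinations to the list.

```perl
use strict;
use warnings;

# Expressions to try; each one is applied to a fresh copy of the test string.
my @expressions = (
    '/*/',              # invalid: * has nothing to repeat
    '/\*/',
    '/solutions/',
    '/solutions/i',
    's/C/K/g',
    'y/con/KUD/',
    'm/^[con]/gi',
    's/.n/\+\@/gi',
);

my %results;    # expression => "result|new value of $_" (kept for checking)

for my $expr (@expressions) {
    local $_ = 'Concept*Solutions*Corporation';
    my $result = eval $expr;    # string eval compiles the expression at run time
    if ($@) {
        print "$expr => error: $@";
        $results{$expr} = 'error';
        next;
    }
    $result = '' unless defined $result;
    print "$expr => result: $result; \$_ is now: $_\n";
    $results{$expr} = "$result|$_";
}
```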
Have fun!