perl Methods

A method is a function associated with a class. Just as functions belong to namespaces, methods belong to classes.

methods.pl
use v5.14;   # package BLOCK syntax and say()

# When you call a method, you do so with an invocant.
# When you call a method on an object, that object is the invocant:
my $choco = Cat->new;
$choco->sleep_on_keyboard;

# A method's first argument is its invocant ($self). Suppose a Cat can meow():
package Cat {
  use Moose;

  sub meow {
    my $self = shift;
    say 'Meow!';
  }

  # stub so the sleep_on_keyboard() call above actually runs
  sub sleep_on_keyboard {
    my $self = shift;
    say 'Zzz';
  }
}
# The cat always meows three times at 6 am:
my $fuzzy_alarm = Cat->new;
$fuzzy_alarm->meow for 1..3;
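The following minimal sketch makes the invocant visible. It uses plain `bless` instead of Moose so it runs without CPAN modules, and the `Counter` class and its methods are hypothetical. It shows that `$obj->method(@args)` passes `$obj` as the first argument, which is why calling the sub directly with the object first has the same effect.

```perl
use v5.14;   # enables strict, say, and package BLOCK syntax
use warnings;

# Hypothetical class built with plain bless (no Moose needed).
package Counter {
    sub new {
        my ($class, %args) = @_;    # class method: the invocant is the string 'Counter'
        return bless { count => $args{start} // 0 }, $class;
    }

    sub increment {
        my ($self, $by) = @_;       # instance method: the invocant is the object
        $self->{count} += $by // 1;
        return $self->{count};
    }
}

my $counter = Counter->new(start => 5);
$counter->increment;             # Perl passes $counter as the first argument
Counter::increment($counter, 2); # the same call spelled as a plain function
say $counter->{count};           # prints 8
```

Calling `Counter::increment($counter, 2)` directly skips method resolution (inheritance), so the arrow form is the normal choice. The direct call appears here only to expose the invocant.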

perl Smart Matching

This uses the smart match operator (~~) to compare two operands, returning a true value if they match.

smart_matching.pl
# Smartmatch is an experimental feature (deprecated as of Perl 5.38); used carelessly, it can produce confusing results.
# Stick to simple comparisons between two operands.

use v5.10;   # enables say() and ~~
use experimental 'smartmatch';
say 'They match (somehow)' if $l_operand ~~ $r_operand;

# The type of comparison usually depends first on the type of the right operand, then on the left:
# Scalar with numeric component --> numeric equality
# Regex --> a grep or a pattern match
# Array --> a grep or a recursive smart match
# Hash --> checks existence of one or more keys
# ... more in perldoc perlop (Smartmatch Operator)
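Here are the rules above applied to concrete operands. This is a sketch that assumes a Perl where smartmatch is still available, since it is experimental and deprecated in recent releases. The variable names are illustrative.

```perl
use v5.10;                       # say and ~~
use strict;
use warnings;
use experimental 'smartmatch';   # silences the experimental warnings

my @pets = qw(cat dog ferret);
my %ages = (cat => 3, dog => 5);

my $numeric = 42 ~~ '42.0';        # Num ~~ numeric-looking string: ==
my $in_list = 'dog' ~~ @pets;      # Any ~~ Array: does any element match?
my $pattern = 'ferret' ~~ qr/err/; # Any ~~ Regexp: pattern match
my $has_key = 'cat' ~~ %ages;      # Any ~~ Hash: exists $ages{cat}

say 'all four matched' if $numeric && $in_list && $pattern && $has_key;
```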

perl Regex Modifier Basics

These change the behavior of the regex operators. They appear at the end of the match (m//), substitution (s///), and qr// operators.

regex_modifiers.pl
use Test::More;   # provides like()

# Simple example with case sensitivity:
my $pet = 'ELLie';

like $pet, qr/Ellie/,  'Nice Puppy!';       # fails: the case differs
like $pet, qr/Ellie/i, 'shift key brOken';  # passes: the /i modifier ignores case
done_testing;

# Embed regex modifiers within a pattern:
my $find_a_cat = qr/(?<feline>(?i)cat)/; # enables case-insensitive matching only within its enclosing group, here the named capture
# To disable specific modifiers, precede them with the minus character:
my $find_a_rational = qr/(?<number>(?-i)Rat)/;

# The multiline modifier (/m) allows the ^ and $ anchors to match at any newline embedded within the string.
# The single-line modifier (/s) treats the source string as a single line, so that the '.' metacharacter also matches the newline character.
# In short, /m modifies the behaviour of multiple regex metacharacters, while /s modifies that of a single one.
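This short sketch shows the /m and /s difference on a two-line string. The sample text is made up for illustration.

```perl
use strict;
use warnings;

my $text = "first line\nsecond line";

# Without /m, ^ anchors only at the start of the whole string.
my @starts_default = $text =~ /^(\w+)/g;     # ('first')
# With /m, ^ also matches just after each embedded newline.
my @starts_multi   = $text =~ /^(\w+)/mg;    # ('first', 'second')

# Without /s, . refuses to match "\n", so this match fails.
my ($across_default) = $text =~ /(line.second)/;   # undef
# With /s, . matches "\n" too.
my ($across_single)  = $text =~ /(line.second)/s;  # "line\nsecond"
```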

perl Other Escape Sequences

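Normally you escape special characters with a backslash. The \Q and \E sequences make this simpler and clearer: they disable metacharacter interpretation for everything between them.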

meta_disable.pl
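sub contains_literal {
  my ($text, $literal_text) = @_;
  # \Q...\E quotes any metacharacters in $literal_text, so it matches literally
  return $text =~ /\Q$literal_text\E/;
}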
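This sketch shows what \Q buys you when the interpolated text contains metacharacters. `quotemeta` is the built-in function behind \Q. The receipt string is a made-up sample.

```perl
use strict;
use warnings;

my $receipt = 'Total: $4.99 (tax incl.)';
my $needle  = '4.99 (tax';        # contains the metacharacters . and (

# \Q...\E quotes every metacharacter in the interpolated text.
my $found = $receipt =~ /\Q$needle\E/ ? 1 : 0;      # 1

# quotemeta() returns the escaped string that \Q would produce.
my $escaped = quotemeta $needle;                    # '4\.99\ \(tax'
my $found_again = $receipt =~ /$escaped/ ? 1 : 0;   # 1

# Unquoted, the lone "(" opens a group that never closes, so the
# pattern fails to compile at run time; eval catches the error.
my $unquoted_ok = eval { $receipt =~ /$needle/; 1 } ? 1 : 0;  # 0
```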

perl Grouping and Alternation

Alternation is useful when you need to match one of several possible words. Grouping makes such alternations more efficient to write.

alternation.pl
# To match apple or orange:
/apple|orange/

# Perl reports the match that starts at the leftmost possible position,
# whatever the order of the alternatives:
"apple and orange" =~ /apple|orange|mango/; # matches apple
"apple and orange" =~ /orange|apple|mango/; # also matches apple, because it starts earlier in the string

# Grouping
# The problem with plain alternation is that you sometimes have to repeat words:
/housemate|housemaid|houseman/
# Grouping with parentheses solves this:
/house(mate|maid|man)/

# Full example - grouping metacharacters in a regular expression:
#!/usr/bin/perl
use warnings;
use strict;

my @words = ('housemaid',
             'housemate',
             'household',
             'houseman',
             'house');

for (@words) {
  print("$_\n") if (/house(maid|mate|man)/);   # prints housemaid, housemate, houseman
}
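The sketch below pins down how Perl chooses among alternatives. The leftmost starting position always wins. Only when several alternatives can match at the same position does their order decide. It also uses a non-capturing group `(?:...)` so that grouping does not create an extra capture.

```perl
use strict;
use warnings;

# The leftmost starting position wins, whatever the alternative order:
my ($first) = 'apple and orange' =~ /(orange|apple)/;      # 'apple'

# At the same position, the first alternative that succeeds wins:
my ($short) = 'housemate' =~ /(house|housemate)/;          # 'house'
my ($long)  = 'housemate' =~ /(housemate|house)/;          # 'housemate'

# (?:...) groups without creating a capture, so $1 is the whole word:
my ($word)  = 'the houseman arrived' =~ /(house(?:maid|mate|man))/;  # 'houseman'
```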

perl Captures

Regular expressions let you group and capture parts of a match for later use.

capturing.pl
# Example - extract an American telephone number of the form (202)456-1111:
my $area_code    = qr/\(\d{3}\)/;   # escaped parentheses
my $local_number = qr/\d{3}-?\d{4}/;
my $phone_number = qr/$area_code\s?$local_number/;

# Named captures - capture portions of matches from applying a regex and access them later.
# Example - extracting a phone number from contact information:
if ($contact_info =~ /(?<phone>$phone_number)/) {
  say "Found a number $+{phone}";
}

# Named capture syntax:
# (?<name> ...)
# the ?<name> construct immediately follows the opening parenthesis and provides a name for this particular capture
# the rest of the capture is the regular expression

# Numbered captures
if ($contact_info =~ /($phone_number)/) {
  say "Found a number $1";
}
# Perl stores the captured substring in a series of magic variables. 
# The first matching capture goes into $1, the second into $2, and so on.
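Here is the phone-number example assembled into a runnable sketch. The contact string is a made-up sample. The last line shows that a successful match in list context returns all numbered captures at once.

```perl
use v5.10;     # named captures and %+ need Perl 5.10+
use strict;
use warnings;

my $area_code    = qr/\(\d{3}\)/;
my $local_number = qr/\d{3}-?\d{4}/;
my $phone_number = qr/$area_code\s?$local_number/;

my $contact_info = 'Ellie, call (202)456-1111 after 6 am';  # made-up sample

my ($named, $numbered);
if ($contact_info =~ /(?<phone>$phone_number)/) {
    $named = $+{phone};      # named captures live in the %+ hash
}
if ($contact_info =~ /($phone_number)/) {
    $numbered = $1;          # first capture group
}

# In list context, a successful match returns all captures at once:
my ($area, $prefix, $line) = '(202)456-1111' =~ /\((\d{3})\)\s?(\d{3})-?(\d{4})/;
```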


perl Character Classes

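A brief overview: you can group several characters into a character class by enclosing them in square brackets. This lets you treat a set of alternatives as a single atom.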

char_classes.pl
# Basic character class
my $ascii_vowels = qr/[aeiou]/;
my $maybe_cat = qr/c${ascii_vowels}t/;

# The hyphen allows you to include a continuous range of characters in a class:
my $ascii_letters_only = qr/[a-zA-Z]/;
# Alternatively, make the hyphen a member of the class by placing it at the start or the end of the class:
my $interesting_punctuation = qr/[-!?]/;
# or escape it:
my $line_characters = qr/[|=\-_]/;

# Use the caret(^) at the start of the class to mean 'anything except these characters'
my $not_an_ascii_vowel = qr/[^aeiou]/;
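The sketch below runs the classes above against a few sample words. The \A and \z anchors are added here so that each test must match the whole word. Note that a negated class also matches digits and spaces, not just consonants.

```perl
use strict;
use warnings;

my $ascii_vowels = qr/[aeiou]/;
my $maybe_cat    = qr/c${ascii_vowels}t/;

# \A and \z anchor the test to the whole word.
my @cats = grep { /\A$maybe_cat\z/ } qw(cat cot cut cyt coat);   # cat cot cut

# A hyphen at the start of a class is literal; between two characters it marks a range.
my @marks = grep { /[-!?]/ } ('well-known', 'why?', 'a-z', 'plain');  # well-known why? a-z

# A negated class matches any character not listed, digits and spaces included.
my $not_an_ascii_vowel = qr/[^aeiou]/;
my @non_vowels = grep { /\A$not_an_ascii_vowel\z/ } ('b', 'e', '7', ' ');  # 'b' '7' ' '
```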

perl Regex Anchors

These force the regex engine to start or end a match at a fixed position.

regex_anchors.pl
my $letters_only = qr/[a-zA-Z]/;

# The start-of-string anchor (\A) dictates that a match must start at the beginning of the string:
# also matches "lammed", "lawmaker" and "layman"
my $seven_down = qr/\Al${letters_only}{2}m/; # letter 'l', any two letters, then 'm'

# The end-of-string anchor (\z) requires that a match end at the end of the string:
# also matches "loom", still an obvious improvement
$seven_down = qr/\Al${letters_only}{2}m\z/;

# There are also the ^ and $ assertions, which match the start and end of a string.
# ^ does mean the start of the string, but with the /m modifier it can also match the invisible point just after a newline within the string.

# Example:
# You want to find strings that have 'barney' at the absolute beginning of the string or anywhere after a newline.
/^barney/m

# Similarly, $ does mean the end of the string,
# but with /m it can also match the invisible point just before a newline in the middle of the string.

# Example:
# You want to find strings that have 'fred' at the end of any line, not just at the end of the entire string.
/fred$/m

# Without /m, ^ and $ act just like \A and \Z. It's better to use \A and \Z (or \z, which also rejects a trailing newline) unless you specifically want multiline matching.

# Word anchors

# The word-boundary anchor, \b, matches at either end of a word.
# For example, /\bfred\b/ will match the word 'fred', but not words like 'frederick', 'alfred', or 'manfred'.

# This is useful to ensure you don't find 'cat' in 'delicatessen', or 'fish' in 'selfishness'.
# Sometimes you need only one word-boundary anchor, as in /\bhunt/ to match words like 'hunt' or 'hunting', but not 'shunt',
# or /stone\b/ to match words like 'sandstone' or 'flintstone', but not 'capstones'.

# The nonword-boundary anchor, \B, matches at any point where \b wouldn't match.
# For example, the pattern /\bsearch\B/ will match 'searches' or 'searching', but not 'search' or 'researching'.
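A runnable sketch that checks the anchor examples above. The word lists and the `$roster` string are made-up samples.

```perl
use strict;
use warnings;

my $letters_only = qr/[a-zA-Z]/;
my $starts_l_m   = qr/\Al${letters_only}{2}m/;      # 'l', two letters, 'm' at the start
my $exactly_l_m  = qr/\Al${letters_only}{2}m\z/;    # ... and nothing after

my @words = qw(loom lammed layman slam);
my @loose = grep { $_ =~ $starts_l_m } @words;   # loom lammed layman
my @tight = grep { $_ =~ $exactly_l_m } @words;  # loom

# /m lets ^ match after any embedded newline; without it, only at the start.
my $roster    = "wilma\nbarney\nfred";
my $with_m    = $roster =~ /^barney/m ? 1 : 0;   # 1
my $without_m = $roster =~ /^barney/  ? 1 : 0;   # 0

# \b stops 'fred' from matching inside longer words.
my @freds = grep { /\bfred\b/ } ('fred', 'frederick', 'alfred', 'hi fred!');  # 'fred', 'hi fred!'
```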

perl Quantifier Greediness

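The + and * quantifiers are greedy: they try to match as much of the input string as possible. This can make a regex match far more text than you intended.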

quantifier_greediness.pl
use v5.10;   # enables say()

# A poor regex
my $hot_meal = qr/hot.*meal/;
say 'Found a hot meal!' if 'I have a hot meal' =~ $hot_meal;
say 'Found a hot meal!' if 'one-shot, piecemeal work!' =~ $hot_meal;  # also matches!
# ^ Greedy quantifiers start by matching as much as possible, then backtrack only as far as needed.

# To make a quantifier match as little as possible, follow it with ?, which turns it non-greedy:
my $minimal_greedy = qr/hot.*?meal/;
# With this modification, the regex engine prefers the shortest possible match.
# If that match fails, it extends the match one character at a time.
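The sketch below compares greedy and non-greedy captures on one made-up string. Laziness changes where a match ends, not where it starts, so the 'piecemeal' false positive above is still matched. Word boundaries (\b) are what rule it out.

```perl
use strict;
use warnings;

my $menu = 'hot soup, hot meal, cold meal';

my ($greedy) = $menu =~ /(hot.*meal)/;    # 'hot soup, hot meal, cold meal'
my ($lazy)   = $menu =~ /(hot.*?meal)/;   # 'hot soup, hot meal'

# Non-greedy still finds 'hot' inside 'shot'; word boundaries do not.
my $piecemeal_lazy    = 'one-shot, piecemeal work!' =~ /hot.*?meal/ ? 1 : 0;          # 1
my $piecemeal_bounded = 'one-shot, piecemeal work!' =~ /\bhot\b.*?\bmeal\b/ ? 1 : 0;  # 0
```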

perl Regex Quantifiers

Quantifiers are special characters that add flexibility by changing how many times the preceding character (or group) may match.

quantifiers.pl
# There are three main quantifiers:
# ?  matches 0 or 1 occurrence
# +  matches 1 or more occurrences
# *  matches 0 or more occurrences (i.e. any amount)

# In addition, curly braces allow even more flexible quantifiers.

# Example: color vs colour - optional characters
# the long way
if ($str =~ /color/ or $str =~ /colour/) {
  ...
}
# faster way with a quantifier
if ($str =~ /colou?r/) {
  ...
}
# Here, the ? says the letter 'u' can appear 0 or 1 times.
# Therefore, either 'color' or 'colour' will work.

# Curly braces:
# These can express many different amounts.
# Normally, they express a range:
# x{2,4} means 2, 3 or 4 x-es
# removing the upper limit:
# x{2,} means 2 or more x-es
# removing the comma:
# x{2} means exactly 2 x-es

# Quantifiers on character classes
# As well as individual characters, quantifiers can apply to special characters or character classes:
# /[0-9]+/      means 1 or more digits
# /[0-9]{2,4}/  means 2 to 4 digits
# /-[abc]+-/    means 1 or more occurrences of a, b or c between two dashes
# For comparison, the character class /-[abc]-/ (without the quantifier) will match:
#   -a-
#   -b-
# but not:
#   -aa-
#   -ab-
#   --
#   -x-
# With the quantifier, it will match:
#   -a-
#   -b-
#   -aa-
#   -ab-
# but not:
#   --
#   -x-
# With * in place of +, it will also match the empty field -- but still not one containing a different character, like -x-
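A runnable check of the quantifier examples above. In the first two groups, \A and \z are added so that the whole string must match. The sample lists are made up.

```perl
use strict;
use warnings;

# \A and \z anchor each test so the whole string must match.
my @spellings = grep { /\Acolou?r\z/ } qw(color colour colouur colr);   # color colour

my @runs        = map { 'x' x $_ } 1 .. 6;           # 'x' up to 'xxxxxx'
my @two_to_four = grep { /\Ax{2,4}\z/ } @runs;       # xx xxx xxxx
my @two_plus    = grep { /\Ax{2,}\z/ }  @runs;       # xx through xxxxxx
my @exactly_two = grep { /\Ax{2}\z/ }   @runs;       # xx

my @fields = qw(-a- -b- -aa- -ab- -- -x-);
my @plus = grep { /-[abc]+-/ } @fields;   # -a- -b- -aa- -ab-
my @star = grep { /-[abc]*-/ } @fields;   # the same, plus --
```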