What characters are allowed in Perl identifiers?
Question
I'm working on regular expressions homework where one question is:
Using language reference manuals online determine the regular expressions for integer numeric constants and identifiers for Java, Python, Perl, and C.
I don't need help on the regular expression, I just have no idea what identifiers look like in Perl. I found pages describing valid identifiers for C, Python and Java, but I can't find anything about Perl.
To clarify, finding the documentation was meant to be easy (like doing a Google search for Python identifiers). I'm not taking a class in "doing Google searches".
Recommended answer
Perl Integer Constants
Integer constants in Perl can be
- in base 16 if they match `^0x`
- in base 2 if they match `^0b`
- in base 8 if they start with `0`
- otherwise they are in base 10.
Following that leader is any number of valid digits in that base and also optional underscores.
Note that digit does not mean `\p{POSIX_Digit}`; it means `\p{Decimal_Number}`, which is really quite different, you know.
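As a rough check of the rules above, here is a minimal Python sketch (Python being one of the other languages in the assignment). It assumes ASCII digits in the literal: Python's `\d` in `str` patterns, like Perl's regex `\d`, matches any `\p{Decimal_Number}` character, so the digit classes are spelled out explicitly. The names `PERL_INT` and `perl_int_base` are illustrative only.

```python
import re

# Perl integer constants per the rules above (sketch; ASCII digits assumed).
# A leading minus sign is not part of the constant, so it is not matched.
PERL_INT = re.compile(
    r"""
      (?P<hex> 0x [0-9A-Fa-f_]+ )   # base 16
    | (?P<bin> 0b [01_]+ )          # base 2
    | (?P<oct> 0 [0-7_]* )          # base 8 (a lone 0 also lands here)
    | (?P<dec> [1-9] [0-9_]* )      # base 10
    """,
    re.VERBOSE,
)

BASE_OF_GROUP = {"hex": 16, "bin": 2, "oct": 8, "dec": 10}


def perl_int_base(text):
    """Return the base of a Perl integer constant, or None if it is not one."""
    match = PERL_INT.fullmatch(text)
    return BASE_OF_GROUP[match.lastgroup] if match else None


for literal in ["0x1f", "0b1010", "0755", "1_000_000", "08", "-3"]:
    print(literal, perl_int_base(literal))
```

Both `08` and `-3` come back as `None`: the first is not a valid octal constant, and the second is a negation applied to the constant `3`.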
Please note that any leading minus sign is not part of the integer constant, which is easily proven by:
```
$ perl -MO=Concise,-exec -le '$x = -3**$y'
1 <0> enter
2 <;> nextstate(main 1 -e:1) v:{
3 <$> const(IV 3) s
4 <$> gvsv(*y) s
5 <2> pow[t1] sK/2
6 <1> negate[t2] sK/1
7 <$> gvsv(*x) s
8 <2> sassign vKS/2
9 <@> leave[1 ref] vKP/REFC
-e syntax OK
```
See the `3` const, and much later on the `negate` op-code? That tells you a bunch, including a curiosity of precedence: `**` binds more tightly than unary minus, so the expression is really `-(3**$y)`.
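The same thing can be seen in Python, another language from the assignment, with the standard `ast` module: the minus shows up as a `UnaryOp` wrapped around the power expression rather than as part of the literal `3`. This is a minimal sketch; it only parses the expression and never evaluates `y`.

```python
import ast

# Parse, without evaluating, the same expression used in the Perl demo above.
tree = ast.parse("x = -3 ** y")
rhs = tree.body[0].value

# The outermost node is the unary minus; the literal 3 sits inside the power.
print(type(rhs).__name__, type(rhs.operand).__name__)  # → UnaryOp BinOp
print(rhs.operand.left.value)                          # → 3

# With constants, the grouping is visible in the result: -(3 ** 2).
print(-3 ** 2)                                         # → -9
```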
Identifiers specified via symbolic dereferencing have absolutely no restriction whatsoever on their names.
- For example, `100->(200)` calls the function named `100` with the argument list `(200)`.
- For another, `${"What’s up, doc?"}` refers to the scalar package variable by that name in the current package.
- On the other hand, `${"What's up, doc?"}` refers to the scalar package variable `${"s up, doc?"}`, which is not in the current package but rather in the `What` package. Well, unless the current package is the `What` package, of course. Similarly, `$Who's` is the `$s` variable in the `Who` package.
One can also have identifiers of the form `${^identifier}`; these are not considered symbolic dereferences into the symbol table.
Identifiers consisting of a single character can be a punctuation character, including `$$` or `%!`.
Identifiers can also be of the form `$^C`, where the part after the sigil is either a single literal control character or a circumflex followed by a non-control character.
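The special forms above can be sketched as one Python pattern applied to the part of the name after the sigil. Everything here is an assumption layered on the text: the caret characters `[][A-Z^_?\]` follow the list given in perlvar, the literal control characters are that same set in caret notation (`^X` is `chr(ord("X") ^ 0x40)`), `${^NAME}` is modelled loosely as `\w+`, and the punctuation branch accepts every ASCII punctuation character, which probably over-accepts. `special_name_kind` is a hypothetical helper.

```python
import re

# Hypothetical classifier for the special name forms; pass it the part after
# the sigil, e.g. "$" for $$ or "^W" for $^W. See the assumptions above.
SPECIAL_NAME = re.compile(
    r"""
      (?P<caret_brace> \{ \^ \w+ \} )   # ${^NAME}: the braces are required
    | (?P<caret> \^ [\]\[A-Z^_?\\] )    # $^W written with a circumflex
    | (?P<control> [\x01-\x1f\x7f] )    # the same names as literal controls
    | (?P<punct> [!-/:-@\[-`{-~] )      # one ASCII punctuation char: $$, %!
    """,
    re.VERBOSE,
)


def special_name_kind(name):
    """Return which special form `name` matches, or None."""
    match = SPECIAL_NAME.fullmatch(name)
    return match.lastgroup if match else None


for name in ["$", "!", "^W", "\x17", "{^WARNING_BITS}", "^a"]:
    print(repr(name), special_name_kind(name))
```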
If none of those things is true, a (non–fully qualified) identifier follows the Unicode rules related to characters with the property ID_Start followed by those with the property ID_Continue. However, it overrules this in allowing all-digit identifiers and identifiers that start with (and perhaps have nothing else beyond) an underscore. You can generally pretend (but it’s really only pretending) that that is like saying `\w+`, where `\w` is as described in Annex C of UTS#18. That is, anything that has any of these:
- the Alphabetic property, which includes far more than just Letters; it also contains various combining characters and the Letter_Number code points, plus the circled letters
- the Decimal_Number property, which is rather more than merely `[0-9]`
- any and all characters with the Mark property, not just those marks that are deemed Other_Alphabetic
- any characters with the Connector_Punctuation property, of which underscore is just one.
So `^\d+$`, or otherwise `^[\p{Alphabetic}\p{Decimal_Number}\p{Mark}\p{Connector_Punctuation}]+$`, ought to do it for the really simple ones if you don’t care to explore the intricacies of the Unicode ID_Start and ID_Continue properties. That’s how it’s really done, but I bet your instructor doesn’t know that. Perhaps one shan’t tell him, eh?
But you should cover the nonsimple ones I described earlier.
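Python's `re` module has no `\p{...}` classes, so a Python version of the simple case has to test each character's general category with `unicodedata`. This is a sketch under a loud assumption: `unicodedata` does not expose the Alphabetic property, so Letters (L*) plus Letter_Number (Nl) stand in for it, which misses the non-mark Other_Alphabetic characters such as the circled letters. The helper name is hypothetical. For contrast, the loop also prints Python's own `str.isidentifier()`, which rejects the all-digit names that Perl accepts.

```python
import unicodedata

# Rough stand-in for \w as described above (UTS #18 Annex C).
# Assumption: unicodedata has no Alphabetic property, so Letters (L*) plus
# Letter_Number (Nl) approximate it; non-mark Other_Alphabetic characters,
# such as the circled letters, are therefore missed.
WORD_CATEGORIES = {
    "Lu", "Ll", "Lt", "Lm", "Lo", "Nl",  # approximate Alphabetic
    "Nd",                                # Decimal_Number
    "Mn", "Mc", "Me",                    # Mark
    "Pc",                                # Connector_Punctuation
}


def looks_like_simple_perl_identifier(name):
    """True if name is non-empty and every character is a word character."""
    return bool(name) and all(
        unicodedata.category(ch) in WORD_CATEGORIES for ch in name
    )


# Compare with Python's own identifier rule, which rejects all-digit names.
for name in ["foo", "_", "123", "cafe\u0301", "foo-bar"]:
    print(repr(name), looks_like_simple_perl_identifier(name), name.isidentifier())
```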
And we haven’t talked about packages yet.
Beyond those simple rules, you must also consider that identifiers may be qualified with a package name, and package names themselves follow the rules of identifiers.
The package separator is either `::` or `'`, at your whim.
The package name may be left empty when it is the first component of a fully qualified identifier, in which case it means the package `main`. That means things like `$::foo` and `$'foo` are equivalent to `$main::foo`, and `isn't_it()` is equivalent to `isn::t_it()`.
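The qualification rules above fit in a few lines of Python. The helper `split_qualified` is hypothetical: it treats `'` exactly like `::`, maps an empty leading package to `main`, and ignores both the trailing-`::` symbol-table case and the special variables. It also reproduces the `What's up, doc?` surprise from the symbolic-dereference examples, while the curly-apostrophe spelling stays unqualified.

```python
import re

# Hypothetical helper: split a Perl name into (package, name), treating "'"
# exactly like "::" and an empty leading package as main. It deliberately
# ignores the trailing-"::" symbol-table case and the special names.
SEPARATOR = re.compile(r"::|'")


def split_qualified(name):
    """Return (package, name); package is None if `name` is unqualified."""
    parts = SEPARATOR.split(name)
    if len(parts) == 1:
        return None, name            # resolved in the current package
    if parts[0] == "":
        parts[0] = "main"            # "::foo" and "'foo" mean "main::foo"
    return "::".join(parts[:-1]), parts[-1]


for name in ["::foo", "'foo", "isn't_it", "What's up, doc?", "What\u2019s up, doc?"]:
    print(repr(name), split_qualified(name))
```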
Finally, as a special case, a trailing double colon (but not a single quote) is permitted at the end of a hash name, and it then refers to the symbol table of that name.
Thus `%main::` is the `main` symbol table, and because you can omit `main`, so too is `%::`.
Meanwhile `%foo::` is the `foo` symbol table, as is `%main::foo::` and also `%::foo::`, just for perversity’s sake.
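The symbol-table spellings above can likewise be normalized with a small hypothetical Python helper. It assumes, as described, that only a trailing `::` on a `%` name selects a symbol table, and that leading `main::` or `::` prefixes are redundant.

```python
def stash_package(token):
    """Return the package whose symbol table `token` names, or None."""
    if not token.startswith("%") or not token.endswith("::"):
        return None                  # only a hash with a trailing :: qualifies
    name = token[1:-2]               # drop the % sigil and the trailing ::
    # A leading "::" or "main::" is redundant: main:: is the default package.
    while True:
        if name.startswith("::"):
            name = name[2:]
        elif name.startswith("main::"):
            name = name[len("main::"):]
        else:
            break
    return name or "main"


for token in ["%main::", "%::", "%foo::", "%main::foo::", "%::foo::", "%foo'"]:
    print(token, stash_package(token))
```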
It’s nice to see instructors giving people non-trivial assignments. The question is whether the instructor realized it was non-trivial. Probably not.
And it’s hardly just Perl, either. Regarding the Java identifiers, did you figure out yet that the textbooks lie? Here’s the demo:
```
$ perl -le 'print qq(public class escape { public static void main(String argv[]) { String var_\033 = "i am escape: ^\033"; System.out.println(var_\033); }})' > escape.java
$ javac escape.java
$ java escape | cat -v
i am escape: ^[
```
Yes, it’s true. It is also true for many other code points, especially if you use `-encoding UTF-8` on the compile line. Your job is to find the pattern that describes these startlingly unforbidden Java identifiers. Hint: make sure to include code point U+0000.
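One place to start on that exercise is the Javadoc for `java.lang.Character.isIdentifierIgnorable`. According to that documentation, `isJavaIdentifierPart` accepts every character that method accepts: the ISO control characters that are not whitespace (U+0000 through U+0008, U+000E through U+001B, U+007F through U+009F) plus every character in general category Cf. The Python function below is a hypothetical mirror of that documented rule, not the JDK's code, and Python's Unicode tables may differ from a given JDK's at the margins.

```python
import unicodedata


def java_identifier_ignorable(ch):
    """Sketch of the rule documented for Character.isIdentifierIgnorable."""
    cp = ord(ch)
    return (
        0x00 <= cp <= 0x08              # ISO controls that are not whitespace...
        or 0x0E <= cp <= 0x1B
        or 0x7F <= cp <= 0x9F
        or unicodedata.category(ch) == "Cf"   # ...plus all format characters
    )


for ch in ["\x00", "\x1b", "\t", "\u200d", "a"]:
    print(f"U+{ord(ch):04X}", java_identifier_ignorable(ch))
```

ESC (U+001B) falls inside the second range, which is consistent with the `var_\033` demo compiling.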
There, aren’t you glad you asked? Hope this helps. Or something. ☺