What characters are allowed in Perl identifiers?

Published: 2020/11/26 3:09:16. Tags: perl, unicode, identifier

Question

I'm working on regular expressions homework where one question is:

Using language reference manuals online determine the regular expressions for integer numeric constants and identifiers for Java, Python, Perl, and C.

I don't need help on the regular expression, I just have no idea what identifiers look like in Perl. I found pages describing valid identifiers for C, Python and Java, but I can't find anything about Perl.

To clarify, finding the documentation was meant to be easy (like doing a Google search for python identifiers). I'm not taking a class in "doing Google searches".

Recommended answer

Perl Integer Constants

Integer constants in Perl can be

in base 16 if they start with ^0x
in base 2 if they start with ^0b
in base 8 if they start with 0
otherwise they are in base 10.

Following that leader is any number of valid digits in that base and also optional underscores.

Note that digit does not mean \p{POSIX_Digit}; it means \p{Decimal_Number}, which is really quite different, you know.
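
As a rough illustration of the leader-plus-digits rule, here is a minimal Python sketch (Python being one of the languages in the assignment). The names PERL_INT and classify are made up for this example. The pattern accepts ASCII digits only, which is a simplifying assumption and narrower than the \p{Decimal_Number} reading above, and it deliberately does not accept a leading minus sign.

```python
import re

# Hypothetical sketch: ASCII digits only (an assumption of this example).
PERL_INT = re.compile(
    r"""
    0[xX][0-9a-fA-F][0-9a-fA-F_]*   # base 16: 0x leader, then hex digits/underscores
  | 0[bB][01][01_]*                 # base 2: 0b leader, then binary digits/underscores
  | 0[0-7_]*                        # base 8: leading 0 (a lone 0 also lands here)
  | [1-9][0-9_]*                    # base 10: anything else
    """,
    re.VERBOSE,
)


def classify(literal):
    """Return the base of a Perl-style integer literal, or None if it is not one."""
    if not PERL_INT.fullmatch(literal):
        return None
    lower = literal.lower()
    if lower.startswith("0x"):
        return 16
    if lower.startswith("0b"):
        return 2
    if lower.startswith("0"):
        return 8
    return 10


if __name__ == "__main__":
    for text in ["0x1F", "0b101", "0755", "1_000_000", "089", "-3"]:
        print(text, classify(text))
```

Strings such as 089 and -3 come back as None: 8 and 9 are not octal digits, and the minus sign is an operator rather than part of the constant.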

Please note that any leading minus sign is not part of the integer constant, which is easily proven by:

$ perl -MO=Concise,-exec -le '$x = -3**$y'
1  <0> enter 
2  <;> nextstate(main 1 -e:1) v:{
3  <$> const(IV 3) s
4  <$> gvsv(*y) s
5  <2> pow[t1] sK/2
6  <1> negate[t2] sK/1
7  <$> gvsv(*x) s
8  <2> sassign vKS/2
9  <@> leave[1 ref] vKP/REFC
-e syntax OK

See the 3 const, and much later on the negate op-code? That tells you a bunch, including a curiosity of precedence.

Identifiers specified via symbolic dereferencing have absolutely no restriction whatsoever on their names.

For example, 100->(200) calls the function named 100 with the arguments (100, 200).
For another, ${"What’s up, doc?"} refers to the scalar package variable by that name in the current package.
On the other hand, ${"What's up, doc?"} refers to the scalar package variable whose name is ${"s up, doc?"} and which is not in the current package, but rather in the What package. Well, unless the current package is the What package, of course. Similarly $Who's is the $s variable in the Who package.

One can also have identifiers of the form ${^identifier}; these are not considered symbolic dereferences into the symbol table.

Identifiers with a single character alone can be a punctuation character, including $$ or %!.

Identifiers can also be of the form $^C, which is either a control character or a circumflex followed by a non-control character.

If none of those things is true, a (non–fully qualified) identifier follows the Unicode rules related to characters with the properties ID_Start followed by those with the property ID_Continue. However, it overrules this in allowing all-digit identifiers and identifiers that start with (and perhaps have nothing else beyond) an underscore. You can generally pretend (but it’s really only pretending) that that is like saying \w+, where \w is as described in Annex C of UTS#18. That is, anything that has any of these:

the Alphabetic property, which includes far more than just Letters; it also contains various combining characters and the Letter_Number code points, plus the circled letters
the Decimal_Number property, which is rather more than merely [0-9]
Any and all characters with the Mark property, not just those marks that are deemed Other_Alphabetic
Any characters with the Connector_Punctuation property, of which underscore is just one such.

So either ^\d+$ or else

^[\p{Alphabetic}\p{Decimal_Number}\p{Mark}\p{Connector_Punctuation}]+$

ought to do it for the really simple ones if you don’t care to explore the intricacies of the Unicode ID_Start and ID_Continue properties. That’s how it’s really done, but I bet your instructor doesn’t know that. Perhaps one shan’t tell him, eh?

But you should cover the nonsimple ones I described earlier.
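
For the really simple case, a minimal Python sketch of that character class might look like the following. Python's re module has no \p{...} classes, so the sketch approximates the four properties with Unicode general categories from unicodedata. That approximation is an assumption, not an exact match for Perl's \w: Alphabetic also covers some code points outside these categories, such as the circled letters. The function name is hypothetical, and none of the nonsimple forms (punctuation variables, ${^identifier}, $^C, symbolic dereferences) are handled.

```python
import unicodedata


def is_simple_perl_identifier(name):
    """Approximate the 'pretend it is \\w+' rule using general categories.

    Accepted categories (an approximation, see the lead-in):
      L*  letters             (stand-in for most of Alphabetic)
      Nl  letter numbers      (also part of Alphabetic)
      M*  marks               (Mark)
      Nd  decimal digits      (Decimal_Number)
      Pc  connector punct.    (Connector_Punctuation, includes underscore)
    """
    if not name:
        return False
    for ch in name:
        category = unicodedata.category(ch)
        if not (category[0] in "LM" or category in ("Nl", "Nd", "Pc")):
            return False
    return True


if __name__ == "__main__":
    for candidate in ["foo_bar", "_", "123", "isn't_it", "a-b"]:
        print(repr(candidate), is_simple_perl_identifier(candidate))
```

Note that 123 and a lone underscore are accepted, matching the two places where the answer says Perl overrules the plain ID_Start/ID_Continue rule.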

And we haven’t talked about packages yet.

Beyond those simple rules, you must also consider that identifiers may be qualified with a package name, and package names themselves follow the rules of identifiers.

The package separator is either :: or ' at your whim.

You do not have to specify a package if it is the first component in a fully qualified identifier, in which case it means the package main. That means things like $::foo and $'foo are equivalent to $main::foo, and isn't_it() is equivalent to isn::t_it().

Finally, as a special case, a trailing double-colon (but not a single-quote) at the end of a hash is permitted, and this then refers to the symbol table of that name.

Thus %main:: is the main symbol table, and because you can omit main, so too is %::.

Meanwhile %foo:: is the foo symbol table, as is %main::foo:: and also %::foo:: just for perversity’s sake.
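
The qualification rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name is hypothetical, the sigil is assumed to have been stripped already, and an unqualified name is left alone because resolving it would require knowing the current package.

```python
def normalize_qualified(name):
    """Rewrite a Perl-style qualified name into a canonical '::' form.

    Two rules from the text are modeled:
      1. "'" is an alternative spelling of the "::" package separator.
      2. An empty leading package component means the package main.
    """
    canonical = name.replace("'", "::")
    if canonical.startswith("::"):
        canonical = "main" + canonical
    return canonical


if __name__ == "__main__":
    for raw in ["::foo", "'foo", "isn't_it", "::", "::foo::", "foo"]:
        print(repr(raw), "->", normalize_qualified(raw))
```

The sketch does not model the fact that %main::foo:: and %foo:: name the same symbol table; it only makes the separator and the omitted main explicit.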

It’s nice to see instructors giving people non-trivial assignments. The question is whether the instructor realized it was non-trivial. Probably not.

And it’s hardly just Perl, either. Regarding the Java identifiers, did you figure out yet that the textbooks lie? Here’s the demo:

$ perl -le 'print qq(public class escape { public static void main(String argv[]) { String var_\033 = "i am escape: ^\033"; System.out.println(var_\033); }})' > escape.java
$ javac escape.java
$ java escape | cat -v
i am escape: ^[

Yes, it’s true. It is also true for many other code points, especially if you use -encoding UTF-8 on the compile line. Your job is to find the pattern that describes these startlingly unforbidden Java identifiers. Hint: make sure to include code point U+0000.

There, aren’t you glad you asked? Hope this helps. Or something. ☺
