我如何捕获一个“Unicode非字符”警告？ [英] How could I catch an "Unicode non-character"-warning?

查看：178 发布时间：2016/11/18 16:11:24 perl unicode warnings try-catch character

本文介绍了我如何捕获一个“Unicode非字符”警告？的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

我如何捕获Unicode非字符0xffff是非法的交换？警告？

 ＃！/ usr / bin / env perl 
使用警告; 
 use 5.012; 
 use Try :: Tiny; 
 
使用警告FATAL => qw（all）; 
 
 my $ character; 
 
 try {
 $ character =\x {ffff}; 
} catch {
 die---------- caught error ---------- \\\
; 
}; 
 
说something;

输出：

 code>＃Unicode非字符0xffff在./perl1.pl行11上的交换是非法的。

解决方案

A Perl 5.10.0⋯⋯5.13.8 Bug

我想假设你想抓住这个警告，而是生存或忽略它。如果你真的想捕捉它，好吧，可能有更简单的方法来做到这一点。

但是首先要知道的是，没有这样的事情，代码点，只有代码点不能有效交换。

您只需要使用无警告utf8 Unicode范围（或更多）。 不需要为此使用 eval 。所需的只是一个范围限制的警告抑制。

而不是这样：

  $ char = chr（0xFFFE）;

写（在旧perls上）：

  $ char = do {no warningsutf8; chr（0xFFFE）};

这也是涉及这样一个字符的模式匹配的情况：

  $ did_match = do {no warningsutf8; $ char =〜$ char）;

会导致警告或致命，取决于perl的年龄，

您可以禁用utf8相关的警告，只有在这种方式重要的版本：

 如果$ ^ V < 5.13.9，qw< warnings utf8> ;;

'在下一个版本中修复

真正有趣的是，他们（阅读：Perl5 Porters，特别是卡尔·威廉姆森）修正了需要没有警告utf8与任何代码点。它只是输出，你可能要小心。观看：

 ％perl5.10.0 -Mwarnings = FATAL，all -E'my $ char = chr（0xFFFE）;说Ok'
 Unicode字符0xfffe在-e行1是非法的。
 
％perl5.11.3 -Mwarnings = FATAL，all -E'my $ char = chr（0xFFFE）;说Ok'
 Unicode非字符0xfffe在-e行1的交换是非法的。
 
％perl5.12.0 -Mwarnings = FATAL，all -E'my $ char = chr （0xFFFE）;说Ok'
 Unicode非字符0xfffe在-e行1的交换是非法的。
 
％perl5.12.3 -Mwarnings = FATAL，all -E'my $ char = chr （0xFFFE）;说Ok'
 Unicode非字符0xfffe在-e行1的交换是非法的。
 
％perl5.13.0 -Mwarnings = FATAL，all -E'my $ char = chr （0xFFFE）;说Ok'
 Unicode非字符0xfffe在-e行1的交换是非法的。
 
％perl5.13.8 -Mwarnings = FATAL，all -E'my $ char = chr （0xFFFE）;说Ok'
 Unicode非字符0xfffe在-e行1的交换是非法的。
 
％perl5.13.9 -Mwarnings = FATAL，all -E'my $ char = chr （0xFFFE）;说Ok'
 Ok 
 
％perl5.13.10 -Mwarnings = FATAL，all -E'my $ char = chr（0xFFFE）;说Ok'
 Ok

最安全的事情是把没有警告utf8在你需要的地方。但是不需要 eval ！

从5.13.10开始，因此在5.14中，如下所述的UTF-16， nonchar 和的utf8警告的三个子类： surrogate

 
 
 全Perl交换是安全的
 
 
 你可能不想抑制输出上的非法交换警告，因为这是真的。嗯，除非你使用Perl的utf8编码，这与它的UTF-8编码，奇怪。 utf8编码比正式标准更容易，因为它允许我们做比我们原本更有趣的事情。
 
 
  但，当且仅当您有100％的pure-perl数据路径时，您仍然可以使用任何所需的代码点，包括高达ᴍᴀxincluding的非Unicode代码点。这是32位机器上的0x7FFF_FFFF，在64位机器上无法言喻的巨大数据：0xFFFF_FFFF_FFFF_FFFF！这不只是一个超级;这是一个超大的！
 ％perl -Mwarnings -CS -E'my $ a = chr（0xFFFF_FFFF）;说$ a'| 
 perl -Mwarnings -CS -nlE'说got ord，ord'
代码点0xFFFFFFFF不是Unicode，可能不能在-e行1移植。
获取ord 4294967295 
 
％perl -Mwarnings -CS -E'no warningsutf8; my $ a = chr（0xFFFF_FFFF）;说$ a'| 
 perl -Mwarnings -CS -nlE'说获得ord，ord'
获取ord 4294967295 
 
％perl -Mwarnings -CS -E'无警告utf8 my $ a = chr（0xFFFF_FFFF_FFFF_FFFF）;说$ a'| 
 perl -Mwarnings -CS -nlE'说got ord，ord'
十六进制数> 0xffffffff non-portable at -e line 1. 
 got ord 18446744073709551615 
 
％perl -Mwarnings -CS -E'没有警告qw [utf8 portable]; my $ a = chr（0xFFFF_FFFF_FFFF_FFFF）;说$ a'| 
 perl -Mwarnings -CS -nlE'saygot ord，ord'
 got ord 18446744073709551615 
  
 
 b $ b 
注意，在32位机器上，最后一个产生这个：
 十六进制整数溢出at -e line 1. 
 get ord 4294967295 
  
 
 
 
非交换品种非法交换< h2> 
 
 有几个 - 实际上是不同类别的代码点，不能合法的互换。
 
 
  
  任何代码点使得（ord（ᴄᴏᴅᴇᴘᴏɪɴᴛ）& 0xFFFE）== 0xFFFE 为真。这覆盖了所有可能平面中的最后两个代码点。因为它跨越17个平面，因此Unicode定义了34个这样的代码点。这些不是字符，虽然它们是Unicode码点。让我们称之为 Penults 。 
 
 
   32个代码点开始在U + FDD0。这些保证是非字符，虽然当然它们仍然是Unicode代码点。与前一个阴影集类似，这些阴影也落在5.13.10或更好的 nonchar 警告类别下。
  
  p> 1024个高代理和1024个低代理，它们被划分为多个，使得UTF-16可以用于所有使用UCS-2而不是UTF-8或UTF-32的哑系统。这削弱了有效的Unicode代码点的范围，限制它们只有前21位值。  SURROGATES还有代码点。他们只是不是有效的交换，因为他们不能总是正确的代表由脑死灵巧的UTF-16。在5.13.10或更好的情况下，这些由代理警告子类控制。
 
 
  我们现在高于Unicode范围。我会将这些 Supers 调用。在32位机器上，您仍然有（10或11）它们超过Unicode给您的标准21位。 Perl可以使用这些只是罚款。这给出2 ** 32个总代码点，你可以在你的Perl程序中使用（好吧，或者至少2 ** 31，由于签名溢出）。你获得了一百万个Unicode代码点，但是你会得到几十亿超级代码点，超出了你可以在Perl中使用。如果您运行的是5.13.10或更高版本，您可以通过 non_unicode 警告子类控制对这些访问。
 
 
   Perl仍然遵守关于Penults的规则，甚至超越范围。在32位机器上有480个这样的超级结果，而更多的是64位机器。
 
 
  如果你真的想要不可读地播放它，那么如果你有本地的64位int，你有另一个32或33位以上的超级者给你。你现在有18百万446千兆744万亿73亿709万551万和616个字符。您拥有完整的exabyte 代码点！这远远超出了我将称之为他们 Hypermegas 。好吧，所以这些都不是很便携，因为他们需要一个真正的64位平台。他们有点外国人，所以也许我们应该写这个Ὑπέρμεγας来吓唬人们。 :)请注意，防止青春期的规则仍然适用于hypermegas。
 
 
 
 
 
 
 
 
 
测试程序
 
 
 我写了一个小程序，证明这些代码点很酷。 
 测试Penults传递所有34个代码点
测试Super_penults传递所有480个代码点
测试非字符传递所有32个代码点
测试Low_surrogates通过所有1024个代码点
测试High_surrogates通过所有1024个代码点
测试Supers通过了所有8个代码点
测试Ὑπέρμεγας通过了所有10个代码点
  
 注意：上面的最后一行显示了SO的地狱高亮代码中的另一个愚蠢的错误。注意那里的最后一个WɪᴋɪW，， \p {Greek} 一个，离开了着色方案？这意味着他们只需要查找大小写 ASCII 标识符。 Trèspassé！如果您不准确地使用 \p {Uppercase} ，为什么还要接受ᴜɴɪᴄᴏᴅᴇ？正如你在我的程序中看到的，我有一个 @ὑπέρμεγας数组，我们ᴍᴏᴅᴇʀɴᴘʀᴏɢʀᴀᴍᴍɪɴɢʟᴀɴɢᴜᴀɢᴇs处理这个完美的。 ☺
 
 
 我显然没有运行所有的supers或hypers。而在32位机器上，你只能得到4个测试的hypers。我也没有测试任何hyperpenults。
 
 
 这里是测试程序，从5.10及以上的所有版本上运行干净。
 ＃！/ usr / bin / env perl 
＃
＃hypertest  - 显示如何安全地使用代码点
＃
＃Tom Christiansen 
＃tchrist@perl.com 
＃Sat Feb 26 16:38:44 MST 2011 
 
使用utf8; 
 use 5.10.0; 
 use strict; 
 use if $]> 5.010，autodie; 
使用警告FATAL => 所有; 
 
使用Carp; 
 
 binmode（STDOUT，：utf8）; 
 END {close STDOUT} 
 
 $ \ =\\\
; 
 
 sub ghex（_）; 
 
 my @penults = map {
（0x01_0000 * $ _）+ 0xfffE，
（0x01_0000 * $ _）+ 0xfffF，
} 0x00 .. 0x10; 
 
 my @super_penults = map {
（0x01_0000 * $ _）+ 0xfffE，
（0x01_0000 * $ _）+ 0xfffF，
} 0x10 .. 0xFF; 
 
 my @low_surrogates = map {0xDC00 + $ _} 0x000 .. 0x3FF; 
 my @high_surrogates = map {0xD800 + $ _} 0x000 .. 0x3FF; 
 
 my @noncharacters = map {0xFDD0 + $ _} 0x00 .. 0x1F; 
 
 my @supers =（
 0x0011_0000，0x0100_0000，0x1000_0000，0x1F00_0000，
 0x1FFF_FFFF，0x3FFF_FFFF，0x7FFF_FFFF，0x7FFF_FFFF，
）; 
 
＃这些应该总是在任何地方工作
 my @ὑπέρμεγας=（
 0x8000_0000，0xF000_0000，
 0x3FFF_FFFF，0xFFFF_FFFF，
）; 
 
 #### 
＃现在我们去钓鱼了64位ὑπέρμεγας
 #### 
 
 eval q {
 use warnings FATAL => 溢出; 
没有警告portable; 
 push @ὑπέρμεγας=> （
 0x01_0000_0000，
 0x01_FFFF_FF00，
）; 
}; 
 eval q {
使用警告FATAL => 溢出; 
没有警告portable; 
 push @ὑπέρμεγας=> （
 0x0001_0000_0000_0000，
 0x001F_0000_0000_0000，
 0x7FFF_FFFF_FFFF_FFFF，
 0xFFFF_FFFF_FFFF_FFFF，
）; 
}; 
 
＃超过64？ 
 eval q {
使用警告FATAL => 溢出; 
没有警告portable; 
 push @ὑπέρμεγας=> （
 0x01_0001_0000_0000_0000，
 0x01_7FFF_FFFF_FFFF_FFFF，
 0x01_FFFF_FFFF_FFFF_FFFF，
）; 
 1; 
}; 
 
 
 my @testpairs =（
 penults => \ @ penults，
 super_penults => \ @ super_penults，
 noncharacters => ; \ @ noncharacters，
 low_surrogates => \ @ low_surrogates，
 high_surrogates => \ @ high_surrogates，
 supers => \ @ supers，
ὑπέρμεγας => \ @ὑπέρμεγας，
）; 
 
 while（my（$ name，$ aref）= splice（@testpairs，0，2））{
 printftesting％-20s，ucfirst $ name; 
 
 my（@passed，@failed）; 
 
 for my $ codeepoint（@ $ aref）{
 
使用警告FATAL => 所有; 
 
 my $ char = do {
＃5.13.9以下不需要下一行：HURRAY！ 
无警告utf8; 
 chr（0xFFFF）&& chr（$ codepoint）; 
}; 
 
 my $ regex_ok = do {
＃5.13.9以下不需要下一行：HURRAY！ 
无警告utf8; 
 $ char =〜$ char; 
 1; 
}; 
 
 my $ status = defined（$ char）&& $ regex_ok; 
 
 push @ {$ status？ \ @ passed：\ @ failed}，$ codepoint; 
} 
 
 my $ total = @ $ aref; 
 my $ passed = @passed; 
 my $ failed = @failed; 
 
给定（$ total）{
 when（$ passed）{printpassed all $ total codepoints} 
 when（$ failed）{printfailed all $ total codepoints } 
默认{
 printof $ total codepoints，failed $ failed and passed $ passed; 
 my $ flist = join（，，map {ghex} @failed）; 
 my $ plist = join（，，map {ghex} @passed）; 
 print\tpassed：$ plist; 
 print\tfailed：$ flist; 
} 
} 
 
} 
 
 sub ghex（_）{
 my $ num = shift（）; 
 my $ hex = sprintf（％X，$ num）; 
 return $ hex如果length（$ hex）< 5; 
 my $ flip = reverse $ hex; 
 $ flip =〜s< 
（\p {ahex} \p {ahex} \p {ahex} \p {ahex}）
（？= \p {ahex}）
 ！\p {ahex} * \。）
>< $ {1} _> gx; 
返回0x。 reverse（$ flip）; 
} 
  
 
 
 
How could I catch the "Unicode non-character 0xffff is illegal for interchange"-warning?
#!/usr/bin/env perl
use warnings;
use 5.012;
use Try::Tiny;

use warnings FATAL => qw(all);

my $character;

try {
    $character = "\x{ffff}";
} catch {
    die "---------- caught error ----------\n";
};

say "something";
Output:
# Unicode non-character 0xffff is illegal for interchange at ./perl1.pl line 11.

 解决方案 
A     Perl 5.10.0 ⋯ 5.13.8     Bug

I’m going to assume that you don’t actually want to "catch" this warning, but rather to survive or ignore it. If you really want to catch it, well, there may be easier ways to do that.

But the first thing to know is that there is no such thing as an illegal code point, only code points not valid for interchange. 

You just have to use a no warnings "utf8" for the scope of where you need to use the full Unicode range (or more).  There is no need to use an eval for this. All it takes is a scoped warning suppression. Even that it is unnecessary on newer perls.

So instead of this:
$char = chr(0xFFFE);
write (on older perls):
$char = do { no warnings "utf8"; chr(0xFFFE) };
This is also the situation with pattern matches involving such a character:
 $did_match = do { no warnings "utf8" ; $char =~ $char);
will cause a warning or a fatal, depending on how old your perl, or nothing at all, depending on how new your perl is.

You can disable utf8-related warnings only on releases where it matters this way:
no if $^V < 5.13.9, qw<warnings utf8>;


‘Fixed in the Next Release’

The really interesting thing is that they (read: Perl5 Porters, and in particular, Karl Williamson) have fixed the bug that requires a no warnings "utf8" guard  just to work with any code point at all.  It is only the output where you may have to be careful.  Watch:
% perl5.10.0 -Mwarnings=FATAL,all -E 'my $char = chr(0xFFFE); say "Ok"'
Unicode character 0xfffe is illegal at -e line 1.

% perl5.11.3 -Mwarnings=FATAL,all -E 'my $char = chr(0xFFFE); say "Ok"'
Unicode non-character 0xfffe is illegal for interchange at -e line 1.

% perl5.12.0 -Mwarnings=FATAL,all -E 'my $char = chr(0xFFFE); say "Ok"'
Unicode non-character 0xfffe is illegal for interchange at -e line 1.

% perl5.12.3 -Mwarnings=FATAL,all -E 'my $char = chr(0xFFFE); say "Ok"'
Unicode non-character 0xfffe is illegal for interchange at -e line 1.

% perl5.13.0 -Mwarnings=FATAL,all -E 'my $char = chr(0xFFFE); say "Ok"'
Unicode non-character 0xfffe is illegal for interchange at -e line 1.

% perl5.13.8 -Mwarnings=FATAL,all -E 'my $char = chr(0xFFFE); say "Ok"'
Unicode non-character 0xfffe is illegal for interchange at -e line 1.

% perl5.13.9 -Mwarnings=FATAL,all -E 'my $char = chr(0xFFFE); say "Ok"'
Ok

% perl5.13.10 -Mwarnings=FATAL,all -E 'my $char = chr(0xFFFE); say "Ok"'
Ok
The safest thing to do is put no warnings "utf8" in just the places you need it. But there is no need of an eval!

As of 5.13.10, and hence in 5.14, there are three subcategories of utf8 warnings: surrogate for UTF‑16, nonchar as described below, and non_unicode for supers, also defined below.

An All‐Perl Interchange is Safe

You probably don’t want to suppress the "illegal for interchange" warnings on output, though, because this is true.  Well, unless you’re using Perl’s "utf8" encoding, which isn’t the same as its "UTF‑8" encoding, oddly enough.  The "utf8" encoding is laxer than the formal standard, because it allows us to do more interesting things than we otherwise could.

However, if and only if you have a 100% pure-perl datapath, you can still use any code point you want, including non-unicode code points up to ᴍᴀxɪɴᴛ.  That’s 0x7FFF_FFFF on 32‑bit machines, and something unspeakably huge on 64‑bit machines: 0xFFFF_FFFF_FFFF_FFFF! That’s not just a super; it’s a hypermega!
% perl -Mwarnings -CS -E 'my $a = chr(0xFFFF_FFFF); say $a ' | 
  perl -Mwarnings -CS -nlE 'say "got ord ", ord'
Code point 0xFFFFFFFF is not Unicode, may not be portable at -e line 1.
got ord 4294967295

% perl -Mwarnings -CS -E 'no warnings "utf8"; my $a = chr(0xFFFF_FFFF); say $a' |
 perl -Mwarnings -CS -nlE 'say "got ord ", ord'
got ord 4294967295

% perl -Mwarnings -CS -E 'no warnings "utf8"; my $a = chr(0xFFFF_FFFF_FFFF_FFFF); say $a' |
  perl -Mwarnings -CS -nlE 'say "got ord ", ord'
Hexadecimal number > 0xffffffff non-portable at -e line 1.
got ord 18446744073709551615

% perl -Mwarnings -CS -E 'no warnings qw[ utf8 portable ]; my $a = chr(0xFFFF_FFFF_FFFF_FFFF);  say $a ' |
  perl -Mwarnings -CS -nlE 'say "got ord ", ord'
got ord 18446744073709551615
Note that on a 32‑bit machine, that last one produces this:
Integer overflow in hexadecimal number at -e line 1.
got ord 4294967295


Varieties of Noncharacters Illegal for Interchange

There are several — quite a few, actually — different classes of code points that are not legal for interchange.


Any code point such that  (ord(ᴄᴏᴅᴇᴘᴏɪɴᴛ) & 0xFFFE) == 0xFFFE is true.  This covers the last two code points in all possible planes. As it spans 17 planes, Unicode defines therefore 34 such code points. Those are not characters, although they are Unicode code points.  Let’s call these the Penults. They fall under the nonchar warning class on 5.13.10 or better.
The 32 code points starting at U+FDD0.  These are guaranteed to be Noncharacters, although of course they are still Unicode code points. Like the previous penult set, these too fall under the nonchar warning class on 5.13.10 or better.
The 1024 high surrogates and the 1024 low surrogates, which were carved out as slop to make UTF‑16 possible for all those dumb systems that tried UCS‑2 instead of UTF‑8 or UTF‑32.  This cripples the range of valid Unicode code points, restricting them to only the first 21 bits worth. SURROGATES ARE STILL CODE POINTS.  They just are not valid for interchange, because they cannot always be correctly represented by brain-dead-clever UTF‑16. Under 5.13.10 or better, these are controlled by the surrogate warning subclass.
Beyond that, we’re now above the Unicode range.  I’ll call these Supers.  On a 32‑bit machine, you still have (10 or) 11 bits of them beyond the standard 21 bits that Unicode gives you.  Perl can use these just fine.  That gives 2**32 total code points you can use in your Perl program (well, or 2**31 at least, due to signed overflow).  You get a million Unicode code points, but then you get a couple of billion Super code points beyond those that you can use in Perl.  If you are running 5.13.10 or better, you can control access to these via the non_unicode warnings subclass.
Perl still follows the rules about Penults even up in the Super range. There are 480 such Superpenults on a 32‑bit machine, and rather more of them on a 64‑bit one.
If you really want to play it nonportably, then if you have native 64‑bit ints, you have another 32 or 33 bits above what the supers give you. You now have 18 quintillion 446 quadrillion 744 trillion 73 billion 709 million 551 thousand and 616 characters. You have a whole exabyte of distinct code points! That’s far beyond super that I’m going to call them Hypermegas.  Ok, so these aren’t very portable, since they require a truly 64‑bit platform.  They’re a bit foreign, so maybe we should write that Ὑπέρμεγας   to scare people away. :) Note that the rules against penults still apply to hypermegas.




The Test Program

I wrote a little program that proves that these code points are cool. 
testing Penults             passed all 34 codepoints
testing Super_penults       passed all 480 codepoints
testing Noncharacters       passed all 32 codepoints
testing Low_surrogates      passed all 1024 codepoints
testing High_surrogates     passed all 1024 codepoints
testing Supers              passed all 8 codepoints
testing Ὑπέρμεγας            passed all 10 codepoints
NOTE: That last line above shows a Yet Another Stupid Bug in SO’s infernal highlighting code. Notice the last WɪᴋɪWᴏʀᴅ up there, the \p{Greek} one, got left out of the colorization scheme? That means they are only looking for capitalized ASCII identifiers.  Très passé!  Why bother accepting ᴜɴɪᴄᴏᴅᴇ if you aren’t going to use things like \p{Uppercase} correctly? As you’ll see in my program where I have a @ὑπέρμεγας array, us ᴍᴏᴅᴇʀɴ ᴘʀᴏɢʀᴀᴍᴍɪɴɢ ʟᴀɴɢᴜᴀɢᴇs handle this perfectly fine. ☺

I obviously didn’t run all the supers or the hypers.  And on 32‑bit machine, you’ll only get 4 of the tested hypers.  I also didn’t test any of the hyperpenults.

Here’s the testing program, which runs cleanly on all version from 5.10 and up.
#!/usr/bin/env perl
#
# hypertest - show how to safely use code points not legal for interchange in Perl
# 
# Tom Christiansen
# tchrist@perl.com
# Sat Feb 26 16:38:44 MST 2011

use utf8;
use 5.10.0;
use strict;
use if $] > 5.010, "autodie";
use warnings FATAL => "all";

use Carp;

binmode(STDOUT, ":utf8");
END { close STDOUT }

$\ = "\n";

sub ghex(_);

my @penults = map { 
    (0x01_0000 * $_) + 0xfffE, 
    (0x01_0000 * $_) + 0xfffF, 
} 0x00 .. 0x10;

my @super_penults = map { 
    (0x01_0000 * $_) + 0xfffE, 
    (0x01_0000 * $_) + 0xfffF, 
} 0x10 .. 0xFF;

my @low_surrogates  = map { 0xDC00 + $_ } 0x000 .. 0x3FF;
my @high_surrogates = map { 0xD800 + $_ } 0x000 .. 0x3FF;

my @noncharacters = map { 0xFDD0 + $_ } 0x00 .. 0x1F;

my @supers = ( 
    0x0011_0000,  0x0100_0000,  0x1000_0000,  0x1F00_0000,  
    0x1FFF_FFFF,  0x3FFF_FFFF,  0x7FFF_FFFF,  0x7FFF_FFFF,  
);

# these should always work anywhere 
my @ὑπέρμεγας = ( 
    0x8000_0000,   0xF000_0000,   
    0x3FFF_FFFF,   0xFFFF_FFFF,  
);

####
# now we go fishing for 64-bit ὑπέρμεγας
####

eval q{
    use warnings FATAL => "overflow";
    no  warnings "portable";
    push @ὑπέρμεγας => ( 
        0x01_0000_0000, 
        0x01_FFFF_FF00,
    );
};
eval q{
    use warnings FATAL => "overflow";
    no  warnings "portable";
    push @ὑπέρμεγας => (
        0x0001_0000_0000_0000,
        0x001F_0000_0000_0000,
        0x7FFF_FFFF_FFFF_FFFF,
        0xFFFF_FFFF_FFFF_FFFF,
    );
};

# more than 64??
eval q{
    use warnings FATAL => "overflow";
    no  warnings "portable";
    push @ὑπέρμεγας => (
        0x01_0001_0000_0000_0000,
        0x01_7FFF_FFFF_FFFF_FFFF,
        0x01_FFFF_FFFF_FFFF_FFFF,
    );
    1;
};


my @testpairs = (
    penults         => \@penults,
    super_penults   => \@super_penults,
    noncharacters   => \@noncharacters ,
    low_surrogates  => \@low_surrogates,
    high_surrogates => \@high_surrogates,
    supers          => \@supers,
    ὑπέρμεγας       => \@ὑπέρμεγας,   
);

while (my($name, $aref) = splice(@testpairs, 0, 2)) {
    printf "testing %-20s", ucfirst $name;

    my(@passed, @failed);

    for my $codepoint (@$aref) {

        use warnings FATAL => "all";

        my $char = do {
            # next line not needed under 5.13.9 or better: HURRAY!
            no warnings "utf8";
            chr(0xFFFF) && chr($codepoint);
        };

        my $regex_ok = do {
            # next line not needed under 5.13.9 or better: HURRAY!
            no warnings "utf8";
            $char =~ $char;
            1;
        };

        my $status = defined($char) && $regex_ok;

        push @{ $status ? \@passed : \@failed }, $codepoint;
    }

    my $total  = @$aref;
    my $passed = @passed;
    my $failed = @failed;

    given($total) {
        when ($passed)  { print "passed all $total codepoints" }
        when ($failed)  { print "failed all $total codepoints" }
        default         {
            print "of $total codepoints, failed $failed and passed $passed";
            my $flist = join(", ", map { ghex } @failed);
            my $plist = join(", ", map { ghex } @passed);
            print "\tpassed: $plist";
            print "\tfailed: $flist";
        }
    }

}

sub ghex(_) {
    my $num = shift();
    my $hex = sprintf("%X", $num);
    return $hex if length($hex) < 5;
    my $flip = reverse $hex;
    $flip =~ s<
        ( \p{ahex} \p{ahex} \p{ahex} \p{ahex} )
        (?= \p{ahex} )
        (?! \p{ahex}* \. )
    ><${1}_>gx;
    return "0x" . reverse($flip);
}




                        
这篇关于我如何捕获一个“Unicode非字符”警告？的文章就介绍到这了，希望我们推荐的答案对大家有所帮助，也希望大家多多支持IT屋！


                    
                        查看全文

我如何捕获一个“Unicode非字符”警告？ [英] How could I catch an "Unicode non-character"-warning?

问题描述

A Perl 5.10.0⋯⋯5.13.8 Bug

'在下一个版本中修复

全Perl交换是安全的

测试程序

A Perl 5.10.0 ⋯ 5.13.8 Bug

‘Fixed in the Next Release’

An All‐Perl Interchange is Safe

Varieties of Noncharacters Illegal for Interchange

The Test Program

相关文章

其他开发最新文章

热门教程

热门工具

登录关闭

我如何捕获一个“Unicode非字符”警告？ [英] How could I catch an &quot;Unicode non-character&quot;-warning?

问题描述

A Perl 5.10.0⋯⋯5.13.8 Bug

'在下一个版本中修复

全Perl交换是安全的

测试程序

A Perl 5.10.0 ⋯ 5.13.8 Bug

‘Fixed in the Next Release’

An All‐Perl Interchange is Safe

Varieties of Noncharacters Illegal for Interchange

The Test Program

相关文章

其他开发最新文章

热门教程

热门工具

登录 关闭

我如何捕获一个“Unicode非字符”警告？ [英] How could I catch an "Unicode non-character"-warning?

登录关闭