Why doesn't Perl v5.22 find all the sentence boundaries?

Question

This is fixed in Perl 5.22.1. I write about it in "Perl v5.22 adds fancy Unicode word boundaries".

Perl v5.22 added the Unicode assertions from TR #29. I've been playing with the sentence boundary assertion, but it only seems to find the start and end of text:

use v5.22;

$_ = "See Spot. (Spot is a dog.) See Spot run. Run Spot, run!\x{2029}New paragraph.";

while( m/\b{sb}/g ) {
    say "Sentence boundary at ", pos;
    }

The output notes sentence boundaries at the start and end of text, but not after the full stops, the sentence terminators, or the parens:

Sentence boundary at 0
Sentence boundary at 70

The Unicode breaks tester shows them mostly where I expect them based on TR #29.

I couldn't find any non-trivial tests in the perl source for this feature. I'm digesting the technical report to create appropriate test cases, but so far this looks like another untested and broken feature.

Answer

Calle Dybedahl's comment gets it right (and when they turn it into an answer I'll accept that). This was a broken feature in v5.22.0 and, as far as I can tell, untested. I had an issue compiling the latest perls last night and ended the day with this question.

The perl5.22.1 perldelta does not mention the particular changes (and "mention" might be too strong, since it merely alludes to things that were wrong without enumerating them). It mentions an incompatible change with 5.20.0 (a cut-and-paste error?), a "single" exception, and then more than one issue. The reference to "sane" made me think that all of the changes were related to the panic issue in the next subsection. The mention of "several bugs" with only one rt.perl.org reference made me think those bugs were related to the panic issue too.

=head1 Incompatible Changes

There are no changes intentionally incompatible with 5.20.0 other than the following single exception, which we deemed to be a sensible change to make in order to get the new C<\b{wb}> and (in particular) C<\b{sb}> features sane before people decided they're worthless because of bugs in their Perl 5.22.0 implementation and avoided them in the future. If any others exist, they are bugs, and we request that you submit a report. See L below.

=head2 Bounds Checking Constructs

Several bugs, including a segmentation fault, have been fixed with the bounds checking constructs (introduced in Perl 5.22) C<\b{gcb}>, C<\b{sb}>, C<\b{wb}>, C<\B{gcb}>, C<\B{sb}>, and C<\B{wb}>. All the C<\B{}> ones now match an empty string; none of the C<\b{}> ones do. L<[perl #126319]|https://rt.perl.org/Ticket/Display.html?id=126319>
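
The last sentence of that perldelta entry is easy to check directly. Below is a small sketch (not from the perldelta or the original post) that asks each new boundary type whether it matches an empty string. It assumes perl 5.22.1 or later; the `use v5.22.1` line makes it refuse to run on 5.22.0, and the variable name `$empty` is only illustrative.

```perl
use v5.22.1;    # require the point release that fixed \b{...} and \B{...}
use strict;
use warnings;

my $empty = "";

# Per the 5.22.1 perldelta, every \B{...} type should match an empty
# string and no \b{...} type should.
for my $re ( qr/\B{gcb}/, qr/\B{sb}/, qr/\B{wb}/,
             qr/\b{gcb}/, qr/\b{sb}/, qr/\b{wb}/ ) {
    my $result = $empty =~ $re ? "matches" : "does not match";
    say "$re $result the empty string";
}
```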

Additionally, perlrebackslash, where the new boundaries are documented, doesn't mention that they don't work in v5.22.0.

I disregarded a possible fix because of incongruities in the perldelta and the prior experience I've had that new features aren't adequately (or even at all) tested in the perl source. I prematurely cut off that line of investigation and could have saved myself a couple of hours. It's certainly my fault for not getting the code running on the latest binaries, but I had become fixated on the idea that I was doing something wrong and that my code was the problem. Despite my numerous past experiences to the contrary, I wasn't entertaining thoughts (other than an update to the UCD) that perl was wrong.

Now that I'm at a different machine and have a working perl-5.22.1, I see that my program works as expected in the point release. The perldelta could have been much better here.
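
A minimal sketch of the same check as the question's program, assuming perl 5.22.1 or later. It reuses the question's string, collects every offset where `\b{sb}` matches into an array (the name `@boundaries` is just for illustration), and uses `use v5.22.1` so it fails at compile time on the broken 5.22.0 instead of silently reporting only the ends of the text.

```perl
use v5.22.1;    # compile-time failure on 5.22.0, where \b{sb} only matched the ends
use strict;
use warnings;

my $text = "See Spot. (Spot is a dog.) See Spot run. Run Spot, run!\x{2029}New paragraph.";

# Collect every offset where the sentence-boundary assertion matches.
# Perl will not repeat a zero-length /g match at the same position,
# so the loop moves forward and terminates.
my @boundaries;
push @boundaries, pos($text) while $text =~ m/\b{sb}/g;

say "Sentence boundaries at: @boundaries";
```

On 5.22.0 the question's loop reported only 0 and 70; on 5.22.1 this should also report boundaries after the sentence terminators and the paragraph separator.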