Bug in Mathematica: regular expression applied to very long string


Question

In the following code, if the string s is extended to something like 10 or 20 thousand characters, the Mathematica kernel segfaults.

s = "This is the first line.
MAGIC_STRING
Everything after this line should get removed.
12345678901234567890123456789012345678901234567890123456789012345678901234567890
12345678901234567890123456789012345678901234567890123456789012345678901234567890
12345678901234567890123456789012345678901234567890123456789012345678901234567890
12345678901234567890123456789012345678901234567890123456789012345678901234567890
12345678901234567890123456789012345678901234567890123456789012345678901234567890
...";

s = StringReplace[s, RegularExpression@"(^|\\n)[^\\n]*MAGIC_STRING(.|\\n)*"->""]

I think this is primarily Mathematica's fault. I've submitted a bug report and will follow up here if I get a response. But I'm also wondering whether I'm doing this in a stupid or inefficient way. Even if not, ideas for working around Mathematica's bug would be appreciated.

Recommended answer

Mathematica uses PCRE syntax, so it does have the /s modifier (also known as DOTALL or Singleline): you prepend (?s) to the part of the expression where you want it to apply.

See the RegularExpression documentation here (expand the section labeled "More Information"):
http://reference.wolfram.com/mathematica/ref/RegularExpression.html

The following set options for all regular expression elements that follow them:
(?i) treat uppercase and lowercase as equivalent (ignore case)
(?m) make ^ and $ match start and end of lines (multiline mode)
(?s) allow . to match newline
(?-c) unset options
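
As a rough illustration of the options listed above, here is a minimal sketch. The test string t is a hypothetical example and not part of the original post; each call is only meant to show where an inline option takes effect.

```mathematica
(* Hypothetical two-line test string, not from the original post *)
t = "Hello\nworld";

(* (?i): case is ignored, so the uppercase pattern can match "Hello" *)
StringCases[t, RegularExpression["(?i)HELLO"]]

(* (?m): ^ may match at the start of each line, not only at the start of the string *)
StringCases[t, RegularExpression["(?m)^\\w+"]]

(* Without (?s), . stops at the newline; with it, . can cross the newline *)
StringCases[t, RegularExpression["Hello.world"]]
StringCases[t, RegularExpression["(?s)Hello.world"]]

(* The option applies only to what follows it: the first .* stays on one line *)
StringCases[t, RegularExpression[".*o(?s).*"]]
```

The last call mirrors the shape of the pattern used for MAGIC_STRING: the leading .* is confined to a single line, and only the part after (?s) is allowed to run across newlines.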

This modified input doesn't crash Mathematica 7.0.1 for me (the original did) with a string that is 15,000 characters long, and it produces the same output as your expression:

s = StringReplace[s,RegularExpression@".*MAGIC_STRING(?s).*"->""]

It should also be a bit faster, for the reasons @AlanMoore explained.
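
To check the two patterns against each other on your own system, a sketch along these lines may help. The strings short and long, and the names original, modified, r1 and r2, are assumptions made up for this example. The original pattern is run only on the short string, since it is the one reported to crash on long input.

```mathematica
(* Hypothetical short test string; kept small so the original pattern is safe to run *)
short = "This is the first line.\nMAGIC_STRING\nEverything after this line should get removed.\n1234567890";

original = RegularExpression["(^|\\n)[^\\n]*MAGIC_STRING(.|\\n)*"];
modified = RegularExpression[".*MAGIC_STRING(?s).*"];

r1 = StringReplace[short, original -> ""];
r2 = StringReplace[short, modified -> ""];

(* InputForm makes any trailing newline visible; the last element compares the results *)
{InputForm[r1], InputForm[r2], r1 === r2}

(* Hypothetical long string of roughly 16,000 characters, used only with the modified pattern *)
filler = StringJoin[ConstantArray["1234567890", 8]];
long = StringJoin[short, "\n", Riffle[ConstantArray[filler, 200], "\n"]];
StringLength[long]

(* Time the modified pattern on the long string *)
AbsoluteTiming[StringReplace[long, modified -> ""];]
```

Comparing with InputForm is deliberate: the original pattern can consume the newline just before the MAGIC_STRING line through its (^|\\n) group, while the modified one starts matching at the beginning of that line, so it is worth confirming whether the results differ by a trailing newline in your case.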
