Is there any way to put malicious code into a regular expression?

Question

I want to add regular expression search capability to my public web page. Other than HTML encoding the output, do I need to do anything to guard against malicious user input?

Google searches are swamped by people solving the converse problem (using regular expressions to detect malicious input), which I’m not interested in. In my scenario, the user input is a regular expression.

I'll be using the Regex library in .NET (C#).

Recommended Answer

Denial‐of‐Service Concerns

The most common concern with regexes is a denial‐of‐service attack through pathological patterns that go exponential (or even super‐exponential!) and so appear to take forever to solve. These may only show up on particular input data, but since your users supply the pattern itself, an attacker can generally craft one for which the input hardly matters.
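
A quick way to see the blow‐up for yourself is the classic pattern `(a+)+$`, which can split a run of a's between the inner and outer `+` in exponentially many ways before it admits failure. The sketch below uses Python's backtracking `re` module purely for illustration (the .NET `Regex` class also backtracks by default); `EVIL` and `time_evil_match` are hypothetical names, and the timings you see will depend on your machine.

```python
import re
import time

# Nested quantifiers: "aaaa" can be split as (aaaa), (aaa)(a), (aa)(aa), ...
EVIL = re.compile(r"(a+)+$")


def time_evil_match(n: int):
    """Time one failing match of (a+)+$ against n 'a's followed by '!'."""
    text = "a" * n + "!"          # the trailing '!' guarantees failure at '$'
    start = time.perf_counter()
    result = EVIL.match(text)     # backtracks through every possible split
    return result, time.perf_counter() - start


if __name__ == "__main__":
    for n in range(12, 21, 2):
        result, seconds = time_evil_match(n)
        print(f"n={n:2d} matched={result is not None} time={seconds:.4f}s")
```

On a backtracking engine the time should roughly double with each extra `a`, so each step of 2 in the loop costs about four times as much as the one before it.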

Which ones these are will depend somewhat on how smart the regex compiler you’re using happens to be, because some of these can be detected during compilation time. Regex compilers that implement recursion usually have a built‐in recursion‐depth counter for checking non‐progression.
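
Nothing stops you from doing a little of that detection yourself before handing the pattern to the engine. Here is a deliberately crude Python sketch, `has_nested_quantifier` (a hypothetical helper, not part of any library), that rejects patterns in which a quantified group itself contains a quantifier, the shape behind `(a+)+`. It is a heuristic only: it flags some harmless patterns such as `(ab+c)+`, and it misses other dangerous ones such as `(a|aa)*b`, whose alternatives overlap, so it belongs alongside a time limit rather than in place of one.

```python
def has_nested_quantifier(pattern: str) -> bool:
    """Crude check: True if some quantified group itself contains a quantifier."""
    stack = []            # one flag per open group: does it contain a quantifier?
    closed_flag = None    # flag of a group that closed just before this char
    i, n = 0, len(pattern)
    while i < n:
        ch = pattern[i]
        if ch == "\\":                       # escaped char: skip both characters
            i += 2
            closed_flag = None
            continue
        if ch == "[":                        # skip a whole character class
            j = i + 1
            if j < n and pattern[j] == "^":
                j += 1
            if j < n and pattern[j] == "]":  # a leading ']' is a literal
                j += 1
            while j < n and pattern[j] != "]":
                j += 2 if pattern[j] == "\\" else 1
            i = j + 1
            closed_flag = None
            continue
        if ch == "(":
            stack.append(False)
            # in '(?:', '(?=', '(?P<name>' ... the '?' is not a quantifier
            i += 2 if pattern.startswith("?", i + 1) else 1
            closed_flag = None
            continue
        if ch == ")":
            flag = stack.pop() if stack else False
            if flag and stack:
                stack[-1] = True             # an inner quantifier taints the parent
            closed_flag = flag
            i += 1
            continue
        if ch in "*+?{":
            if closed_flag:                  # e.g. the final '+' in '(a+)+'
                return True
            if stack:
                stack[-1] = True
        closed_flag = None
        i += 1
    return False
```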

Russ Cox’s excellent 2007 paper, “Regular Expression Matching Can Be Simple And Fast (but is slow in Java, Perl, PHP, Python, Ruby, ...)”, describes how most modern backtracking NFA engines, which all seem to derive from Henry Spencer’s code, suffer severe performance degradation on patterns where a Thompson‐style NFA has no such problems.
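
The core idea of a Thompson‐style matcher is to track the set of all NFA states that are live after each input character, instead of trying one path and backtracking when it fails. The Python sketch below applies that idea to a toy syntax (literal characters, `.` for any character, and postfix `?`, `*`, `+`, with no groups or alternation); `thompson_fullmatch` and its helpers are hypothetical names for illustration. Because the live set never holds more than one entry per pattern position, the work is bounded by roughly len(pattern) × len(text).

```python
def parse(pattern: str):
    """Split a toy pattern into (char, op) atoms, where op is '', '?', '*' or '+'."""
    atoms, i = [], 0
    while i < len(pattern):
        op = ""
        if i + 1 < len(pattern) and pattern[i + 1] in "?*+":
            op = pattern[i + 1]
        atoms.append((pattern[i], op))
        i += 2 if op else 1
    return atoms


def closure(atoms, state: int, out: set) -> set:
    """Add `state` and every later state reachable by skipping optional atoms."""
    while state not in out:
        out.add(state)
        if state < len(atoms) and atoms[state][1] in ("?", "*"):
            state += 1                    # '?' and '*' may match zero times
        else:
            break
    return out


def step(atoms, states, ch: str) -> set:
    """Advance every live state over one input character."""
    nxt = set()
    for s in states:
        if s == len(atoms):
            continue                      # the accepting state consumes nothing
        c, op = atoms[s]
        if c != "." and c != ch:
            continue                      # this path dies
        if op in ("*", "+"):
            nxt.add(s)                    # the atom may repeat
        closure(atoms, s + 1, nxt)        # or we move past it
    return nxt


def thompson_fullmatch(pattern: str, text: str) -> bool:
    """True if the whole of `text` matches the toy `pattern`."""
    atoms = parse(pattern)
    states = closure(atoms, 0, set())
    for ch in text:
        states = step(atoms, states, ch)
        if not states:
            return False
    return len(atoms) in states
```

Cox’s own benchmark, `a?` repeated n times followed by n `a`s and matched against n `a`s, is the kind of input that makes backtracking engines crawl; with the state‐set approach, n = 30 is trivial.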

If you only admit patterns that can be solved by DFAs, you can compile them into DFAs, and they will run faster, possibly much faster. However, building the DFA takes time. The Cox paper mentions this approach and its attendant issues. It all comes down to a classic time–space trade‐off.

With a DFA, you spend more time building it (and allocating more states), whereas with an NFA you spend more time executing it, since it can be in multiple states at the same time, and backtracking can eat your lunch (and your CPU).
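
The DFA side of that trade‐off can be seen by building the automaton ahead of time with the textbook subset construction: each DFA state is a set of NFA states, so matching afterwards is one table lookup per character, but the table itself can grow exponentially. The Python sketch below handles only a toy syntax (literal characters, `.` for any character, postfix `?`, `*`, `+`; no groups or alternation); `build_dfa`, `dfa_fullmatch`, and the `OTHER` placeholder for characters that never appear in the pattern are hypothetical names for illustration.

```python
OTHER = "<other>"  # stands for any input character that never appears in the pattern


def parse(pattern):
    """Split a toy pattern into (char, op) atoms, where op is '', '?', '*' or '+'."""
    atoms, i = [], 0
    while i < len(pattern):
        op = pattern[i + 1] if i + 1 < len(pattern) and pattern[i + 1] in "?*+" else ""
        atoms.append((pattern[i], op))
        i += 2 if op else 1
    return atoms


def closure(atoms, state, out):
    """Add `state` and every later state reachable by skipping optional atoms."""
    while state not in out:
        out.add(state)
        if state < len(atoms) and atoms[state][1] in ("?", "*"):
            state += 1
        else:
            break
    return out


def step(atoms, states, sym):
    """NFA transition on one input symbol ('.' also matches OTHER)."""
    nxt = set()
    for s in states:
        if s == len(atoms):
            continue
        c, op = atoms[s]
        if c != "." and c != sym:
            continue
        if op in ("*", "+"):
            nxt.add(s)
        closure(atoms, s + 1, nxt)
    return nxt


def build_dfa(pattern):
    """Subset construction: explore every reachable set of NFA states up front."""
    atoms = parse(pattern)
    alphabet = sorted({c for c, _ in atoms if c != "."}) + [OTHER]
    start = frozenset(closure(atoms, 0, set()))
    table = {}                       # DFA state -> {input symbol: DFA state}
    work = [start]
    while work:
        cur = work.pop()
        if cur in table:
            continue
        table[cur] = {}
        for sym in alphabet:
            nxt = frozenset(step(atoms, cur, sym))
            if nxt:                  # the empty set is the dead state; leave it out
                table[cur][sym] = nxt
                work.append(nxt)
    accepting = {s for s in table if len(atoms) in s}
    return start, table, accepting, set(alphabet)


def dfa_fullmatch(dfa, text):
    """Run the prebuilt DFA: one dictionary lookup per input character."""
    state, table, accepting, alphabet = dfa
    for ch in text:
        state = table[state].get(ch if ch in alphabet else OTHER)
        if state is None:
            return False
    return state in accepting
```

The pattern `.*a` followed by n-1 dots (“the n-th character from the end is an a”) needs 2^n DFA states, so the cost lands on construction, whereas a set‐based NFA simulation of the same pattern tracks at most n + 2 states per character.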

Probably the most reasonable way to deal with patterns that are on the losing end of a race with the heat‐death of the universe is to wrap them in a timer that caps how long they are allowed to run. Usually this limit will be much, much shorter than the default timeout that most HTTP servers provide.

There are various ways to implement this, ranging from a simple alarm(N) at the C level, to some sort of try {} block that catches alarm‐type exceptions, all the way to spawning off a new thread that’s specially created with a timing constraint built right into it.
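
In Python a running thread cannot be forcibly stopped, so one portable way to get the “worker with a timing constraint” effect is to run the match in a separate interpreter process and kill it when the limit expires. The sketch below does that with `subprocess.run(..., timeout=...)`, which kills and reaps the child when the timeout expires; `search_with_timeout` and `RegexTimeout` are hypothetical names, and the limit also covers the child’s start‐up time, so a very small value will time out even on benign patterns. In the asker’s .NET code none of this is needed: the `Regex` constructor and the static `Regex.IsMatch`/`Regex.Match` methods accept a `TimeSpan` match timeout and throw `RegexMatchTimeoutException` when it is exceeded.

```python
import json
import subprocess
import sys

# Program run in the child interpreter: read [pattern, text] from stdin,
# run re.search, and print the matched substring (or null) as JSON.
_CHILD = (
    "import json, re, sys\n"
    "pattern, text = json.load(sys.stdin)\n"
    "m = re.search(pattern, text)\n"
    "print(json.dumps(None if m is None else m.group(0)))\n"
)


class RegexTimeout(Exception):
    """Raised when the child process does not finish within the time limit."""


def search_with_timeout(pattern: str, text: str, seconds: float = 2.0):
    """Return the first match of `pattern` in `text`, or None if there is none.

    Raises RegexTimeout if the search runs longer than `seconds`, and
    subprocess.CalledProcessError if the child fails (e.g. an invalid pattern).
    """
    try:
        proc = subprocess.run(
            [sys.executable, "-c", _CHILD],
            input=json.dumps([pattern, text]),
            capture_output=True,
            text=True,
            timeout=seconds,   # the child is killed if this expires
            check=True,
        )
    except subprocess.TimeoutExpired as exc:
        raise RegexTimeout(f"regex search exceeded {seconds} s") from exc
    return json.loads(proc.stdout)
```

A web handler would catch `RegexTimeout` and tell the user the pattern is too expensive, rather than letting the request hang.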

In regex languages that admit code callouts, the engine should provide some mechanism for allowing or disallowing them in the string you’re going to compile. Even if code callouts can only run code in the language you are using, you should restrict them; they need not be able to call external code to be dangerous, although if they can, you’ve got much bigger problems.

For example, in Perl one cannot use code callouts in regexes created from string interpolation (as yours would be, since they’re compiled at run time) unless the special lexically‐scoped pragma use re "eval"; is active in the current scope.

That way nobody can sneak in a code callout to run system programs like rm -rf *, for example. Because code callouts are so security‐sensitive, Perl disables them by default on all interpolated strings, and you have to go out of your way to re‐enable them.

One more security‐sensitive issue concerns Unicode‐style properties, such as \pM, \p{Pd}, \p{Pattern_Syntax}, or \p{Script=Greek}, in regex compilers that support that notation.

The issue is that in some of these compilers the set of possible properties is user‐extensible. That means you can have custom properties that are actually code callouts to named functions in some particular namespace, like \p{GoodChars} or \p{Class::Good_Characters}. How your language handles those might be worth looking at.

In Perl, a sandboxed compartment via the Safe module would give control over namespace visibility. Other languages offer similar sandboxing technologies. If such devices are available, you might want to look into them, because they are specifically designed for limited execution of untrusted code.
