Work around a StackOverflowException


Problem description

I'm using HtmlAgilityPack to parse roughly 200,000 HTML documents.

I cannot predict the contents of these documents, and one of them causes my application to fail with a StackOverflowException. The document contains this HTML:

<ol>
    <li><li><li><li><li><li>...
</ol>

There are roughly 10,000 <li> elements nested like that. Because of the way HtmlAgilityPack parses HTML, this nesting causes a StackOverflowException.

Unfortunately, a StackOverflowException cannot be caught with a try-catch block in .NET 2.0 and later; by default it terminates the process.

I did wonder about setting a larger size for the thread's stack, but that is a hack: it would cause my program to use a lot more memory (my program starts about 50 threads for processing HTML, so all of these threads would have the increased stack size), and it would need manual adjustment if it ever came across a similar situation again.

Are there any other workarounds I could employ?

Solution

Ideally, the long-term solution is to patch HtmlAgilityPack to use a heap-allocated stack instead of the call stack, but that would be too big an undertaking for me. I've temporarily lost my CodePlex account details, but when I get them back I'll submit an issue report on the problem. I also note that this issue could expose any site that uses HtmlAgilityPack to sanitize user-submitted HTML to a denial-of-service attack: a crafted, overly nested HTML document would cause the w3wp.exe process to die.
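
To illustrate what "a heap-allocated stack instead of the call stack" means, here is a minimal sketch. It is not HtmlAgilityPack code: `Node`, `IterativeWalk`, `MaxDepth` and `BuildChain` are hypothetical names invented for this example, and the tree is a plain stand-in for a parsed document. The walk keeps its pending work in a `Stack<T>` on the heap, so nesting depth is limited by available memory rather than by the thread's stack size.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical minimal tree node; NOT an HtmlAgilityPack type.
public sealed class Node
{
    public string Name;
    public List<Node> Children = new List<Node>();
    public Node(string name) { Name = name; }
}

public static class IterativeWalk
{
    // Returns the deepest nesting level in the tree (root = 1) without
    // recursion: pending nodes live in a heap-allocated Stack<T>.
    public static int MaxDepth(Node root)
    {
        int max = 0;
        var pending = new Stack<KeyValuePair<Node, int>>();
        pending.Push(new KeyValuePair<Node, int>(root, 1));

        while (pending.Count > 0)
        {
            KeyValuePair<Node, int> item = pending.Pop();
            if (item.Value > max) max = item.Value;

            foreach (Node child in item.Key.Children)
                pending.Push(new KeyValuePair<Node, int>(child, item.Value + 1));
        }
        return max;
    }

    // Builds <ol><li><li><li>... as a chain of `depth` nested <li> nodes,
    // using a loop so that building the test data does not recurse either.
    public static Node BuildChain(int depth)
    {
        var root = new Node("ol");
        Node current = root;
        for (int i = 0; i < depth; i++)
        {
            var li = new Node("li");
            current.Children.Add(li);
            current = li;
        }
        return root;
    }

    public static void Main()
    {
        Node root = BuildChain(10000);
        Console.WriteLine(MaxDepth(root)); // the <ol> plus 10,000 nested <li>
    }
}
```

The same transformation applies to any recursive descent: each recursive call becomes a push, and each return becomes a pop.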

In the meantime, I figured the best way forward is to manually override the maximum thread stack size. I was wrong in my earlier statement that a bigger stack size means that all threads automatically consume that memory (it seems memory pages are allocated for a thread's stack as it grows, not all at once).

I made a copy of the <ol><li> page and ran some experiments. I found that my program failed when the stack size was less than 2^21 bytes (2 MB), but succeeded with a maximum size of 2^22 bytes (4 MB), which in my book passes as an "acceptable" hack... for now.
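
A minimal sketch of overriding the maximum stack size follows, using the `Thread(ThreadStart, int maxStackSize)` constructor from `System.Threading`. The `BigStack` class, its `Run` helper and the `Recurse` stand-in are names invented for this example; `Recurse` only simulates a deeply recursive parser and is not HtmlAgilityPack code.

```csharp
using System;
using System.Threading;

public static class BigStack
{
    // 2^22 bytes = 4 MB, the size that succeeded in the experiments above.
    public const int StackSizeBytes = 1 << 22;

    // Runs `work` on a dedicated thread whose maximum stack size is
    // `maxStackSize` bytes, waits for it to finish, and returns its result.
    public static T Run<T>(Func<T> work, int maxStackSize = StackSizeBytes)
    {
        T result = default(T);
        Exception error = null;

        var thread = new Thread(() =>
        {
            try { result = work(); }
            // Ordinary exceptions only: a StackOverflowException still
            // cannot be caught here and would terminate the process.
            catch (Exception ex) { error = ex; }
        }, maxStackSize);

        thread.Start();
        thread.Join();

        if (error != null)
            throw new InvalidOperationException("Work failed on the large-stack thread.", error);
        return result;
    }

    // Stand-in for a recursive parser: uses one stack frame per level.
    public static int Recurse(int depth)
    {
        return depth == 0 ? 0 : 1 + Recurse(depth - 1);
    }

    public static void Main()
    {
        int reached = Run(() => Recurse(10000));
        Console.WriteLine(reached); // recursion depth reached
    }
}
```

In the real program, the call that loads a document into HtmlAgilityPack would go inside the delegate passed to `Run`. Since the program already starts its own worker threads, the simpler change is probably to pass the larger `maxStackSize` when those threads are created.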
