HTML traversal is very slow

Question

I found that simply iterating through MSHTML elements in C# is horribly slow. Here is a small example that iterates through the document.all collection three times. We have a blank WPF application with a WebBrowser control named Browser:

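using System.Diagnostics;          // Stopwatch, Debug
using System.Windows.Navigation;   // NavigationEventArgs
using mshtml;                      // HTMLDocument, IHTMLElementCollection

public partial class MainWindow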
{
    public MainWindow()
    {
        InitializeComponent();

        Browser.LoadCompleted += DocumentLoaded;
        Browser.Navigate("http://google.com");
    }

    private IHTMLElementCollection _items;

    private void DocumentLoaded(object sender, NavigationEventArgs e)
    {
        var dc = (HTMLDocument)Browser.Document;
        _items = dc.all;

        Test();
        Test();
        Test();
    }

    private void Test()
    {
        var sw = new Stopwatch();
        sw.Start();

        int i;
        for (i = 0; i < _items.length; i++)
        {
            _items.item(i);
        }

        sw.Stop();

        Debug.WriteLine("Items: {0}, Time: {1}", i, sw.Elapsed);
    }
}



The output is:

Items: 274, Time: 00:00:01.0573245
Items: 274, Time: 00:00:00.0011637
Items: 274, Time: 00:00:00.0006619

The performance difference between the first and second runs is horrible. I rewrote the same code in unmanaged C++ with COM and saw no performance issues at all; the unmanaged code runs 1200 times faster. Unfortunately, going unmanaged is not an option, because the real project is more complex than simple iteration.

I understand that on the first pass the runtime creates an RCW (runtime callable wrapper) for each referenced HTML element, since each one is a COM object. But can it be THAT slow? That is about 300 items per second with 100% load on one core of a 3.2 GHz CPU.

Performance analysis of the code above: [profiler screenshot not preserved]

Recommended answer

The source of the poor performance is that the collection items are declared as dynamic objects in the MSHTML interop assembly:

public interface IHTMLElementCollection : IEnumerable
{
    ...
    [DispId(0)]
    dynamic item(object name = Type.Missing, object index = Type.Missing);
    ...
}

If we rewrite that interface so that it returns IDispatch objects, the lag disappears:

public interface IHTMLElementCollection : IEnumerable
{
    ...
    [DispId(0)]
    [return: MarshalAs(UnmanagedType.IDispatch)]
    object item(object name = Type.Missing, object index = Type.Missing);
    ...
}

New output:

Items: 246, Time: 00:00:00.0034520
Items: 246, Time: 00:00:00.0029398
Items: 246, Time: 00:00:00.0029968
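
The one-time cost of working with a dynamic result can be observed without MSHTML or COM. The following is only a minimal sketch under stated assumptions, not the interop code itself: DynamicFirstCallDemo, ItemAsDynamic, ItemAsObject and the string items are hypothetical stand-ins, and the program assumes the C# runtime binder (Microsoft.CSharp) is available. It times two passes that read a member through dynamic and two passes that use a static cast. Whether this effect accounts for the whole delay measured with MSHTML is not established here.

```csharp
using System;
using System.Diagnostics;

public static class DynamicFirstCallDemo
{
    // Hypothetical stand-ins for a collection accessor; NOT the MSHTML API.
    public static dynamic ItemAsDynamic(object[] items, int index) => items[index];
    public static object ItemAsObject(object[] items, int index) => items[index];

    // One pass over the items, reading Length through a dynamic value.
    public static int SumLengthsDynamic(object[] items)
    {
        int total = 0;
        for (int i = 0; i < items.Length; i++)
        {
            // Member access on a dynamic value is bound at run time.
            int length = ItemAsDynamic(items, i).Length;
            total += length;
        }
        return total;
    }

    // The same pass with a statically typed cast; no run-time binder involved.
    public static int SumLengthsObject(object[] items)
    {
        int total = 0;
        for (int i = 0; i < items.Length; i++)
        {
            total += ((string)ItemAsObject(items, i)).Length;
        }
        return total;
    }

    // Runs one pass and prints its result and elapsed time.
    private static void Time(string label, Func<int> pass)
    {
        var sw = Stopwatch.StartNew();
        int total = pass();
        sw.Stop();
        Console.WriteLine("{0}: total={1}, time={2}", label, total, sw.Elapsed);
    }

    public static void Main()
    {
        // 274 items, matching the count in the question's first output.
        var items = new object[274];
        for (int i = 0; i < items.Length; i++)
        {
            items[i] = "element" + i;
        }

        Time("dynamic, pass 1", () => SumLengthsDynamic(items));
        Time("dynamic, pass 2", () => SumLengthsDynamic(items));
        Time("object,  pass 1", () => SumLengthsObject(items));
        Time("object,  pass 2", () => SumLengthsObject(items));
    }
}
```

Comparing the first and second dynamic passes with the two object passes shows how much of the first-pass time comes from run-time binding rather than from the loop itself. The absolute numbers depend on the machine and runtime.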
