Using XPath and WebBrowser Control to select multiple nodes


Question

In a C# WinForms sample application, I use the WebBrowser control and JavaScript-XPath to select a single node and change that node's innerHTML with the following code:

    private void MainForm_Load(object sender, EventArgs e)
    {
        webBrowser1.DocumentText = @"
            <html>
            <head>
                <script src=""http://svn.coderepos.org/share/lang/javascript/javascript-xpath/trunk/release/javascript-xpath-latest-cmp.js""></script>
            </head>
            <body>
            <img alt=""0764547763 Product Details"" 
                src=""http://ecx.images-amazon.com/images/I/51AK1MRIi7L._AA160_.jpg"">
            <hr/>
            <h2>Product Details</h2>
            <ul>
            <li><b>Paperback:</b> 648 pages</li>
            <li><b>Publisher:</b> Wiley; Unlimited Edition edition (October 15, 2001)</li>
            <li><b>Language:</b> English</li>
            <li><b>ISBN-10:</b> 0764547763</li>
            </ul>
            </body>
            </html>
        ";
    }

    private void cmdTest_Click(object sender, EventArgs e)
    {
        string xPath = "//li";
        string code = string.Format("document.evaluate('{0}', document, null, XPathResult.FIRST_ORDERED_NODE_TYPE, null).singleNodeValue;", xPath);
        var li = webBrowser1.Document.InvokeScript("eval", new object[] { code }) as mshtml.IHTMLElement;

        li.innerHTML = string.Format("<span style='text-transform: uppercase;font-family:verdana;color:green;'>{0}</span>", li.innerText);

    }

Running this code re-renders the first <li> item as upper-case green Verdana text.

Now I'd like to use the same technique to select multiple <li> nodes under the <ul> node, so I'm writing:

        xPath = "//ul//*";
        code = string.Format("document.evaluate('{0}', document, null, XPathResult.ORDERED_NODE_ITERATOR_TYPE, null);", xPath);
        var allLI = webBrowser1.Document.InvokeScript("eval", new object[] { code }) as mshtml.IHTMLElementCollection;

but the returned value of the allLI variable is null.

If I write

        xPath = "//ul//*";
        code = string.Format("document.evaluate('{0}', document, null, XPathResult.ORDERED_NODE_ITERATOR_TYPE, null);", xPath);
        var allLI = webBrowser1.Document.InvokeScript("eval", new object[] { code }); 

then the returned allLI variable isn't null and its type is a COM object, but it is unclear to me what more specific type this COM object can be cast to.

Is there a way to select multiple nodes using the technique shown here?
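A note on why the cast probably comes back null: with ORDERED_NODE_ITERATOR_TYPE, document.evaluate returns an XPathResult iterator rather than an element collection, so the nodes have to be pulled out one at a time with iterateNext(). The sketch below drains such a result into a plain array. The helper name collectNodes and the stub result used to exercise it are assumptions for illustration only; in the page, the argument would be the value returned by document.evaluate.

```javascript
// Drain an XPathResult-style iterator into an array.
// `result` is anything exposing iterateNext(), which returns the next node
// or null once the iteration is finished.
function collectNodes(result) {
  var nodes = [];
  var node = result.iterateNext();
  while (node) {
    nodes.push(node);
    node = result.iterateNext();
  }
  return nodes;
}

// Hypothetical stand-in for an XPathResult, so the sketch runs outside a browser.
function makeStubResult(items) {
  var i = 0;
  return {
    iterateNext: function () {
      return i < items.length ? items[i++] : null;
    }
  };
}

var collected = collectNodes(makeStubResult(["li-1", "li-2", "li-3"]));
console.log(collected.length);
```

This only addresses how the nodes are enumerated in script. As far as I know, an array handed back through InvokeScript still arrives in C# as a COM object, so the typing question on the C# side remains.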

Note: the XPath expression above was changed from

    xPath = "ul//*";

to

    xPath = "//ul//*";

[Addition]

I have added two JavaScript functions to my sample HTML:

<script type=""text/javascript"">
    function GetElementsText (XPath) {
            var xPathRes = document.evaluate ( XPath, document, null, XPathResult.ORDERED_NODE_ITERATOR_TYPE, null);              
            var nextElement = xPathRes.iterateNext ();
            var text = """";
            while (nextElement) {
               text += nextElement.innerText;
               nextElement = xPathRes.iterateNext ();
            }
        return text;
        };

    function GetElements (XPath) {
            var xPathRes = document.evaluate ( XPath, document, null, XPathResult.ORDERED_NODE_ITERATOR_TYPE, null);              
            var nextElement = xPathRes.iterateNext ();
            var elements = new Object();
            var elementIndex = 1;
            while (nextElement) {
               elements[elementIndex++] = nextElement;
               nextElement = xPathRes.iterateNext ();
            }
        return elements;
        };
</script>

Now, when I run the following line of C# within my cmdTest_Click method:

var text = webBrowser1.Document.InvokeScript("eval", new object[] { "GetElementsText('//ul')" });

I get the text of all the li elements:

"Paperback: 648 pages \r\nPublisher: Wiley; Unlimited Edition edition (October 15, 2001) \r\nLanguage: English \r\nISBN-10: 0764547763 "

And when I run the following line of C# within my cmdTest_Click method:

var elements = webBrowser1.Document.InvokeScript("eval", new object[] { "GetElements('//ul')" });

I get a COM object, which I cannot cast to IEnumerable<mshtml.IHTMLElement>.

Is there any way, within C# code, to process the JavaScript collection of HTML nodes returned by the following call?

var elements = webBrowser1.Document.InvokeScript("eval", new object[] { "GetElements('//ul')" });

Recommended answer

I have found a solution; here is the code:

using System;
using System.Collections.Generic;
using System.Reflection;
using System.Windows.Forms;

namespace myTest.WinFormsApp
{
public partial class MainForm : Form
{
    public MainForm()
    {
        InitializeComponent();
    }

    private void MainForm_Load(object sender, EventArgs e)
    {
        webBrowser1.DocumentText = @"
            <html>
            <body>
            <img alt=""0764547763 Product Details"" 
                src=""http://ecx.images-amazon.com/images/I/51AK1MRIi7L._AA160_.jpg"">
            <hr/>
            <h2>Product Details</h2>
            <ul>
            <li><b>Paperback:</b> 648 pages</li>
            <li><b>Publisher:</b> Wiley; Unlimited Edition edition (October 15, 2001)</li>
            <li><b>Language:</b> English</li>
            <li><b>ISBN-10:</b> 0764547763</li>
            </ul>
            </body>
            </html>
        ";
    }

    private void cmdTest_Click(object sender, EventArgs e)
    {
        var processor = new WebBrowserControlXPathQueriesProcessor(webBrowser1);

        // change attributes of the first element of the list
        {
            var li = processor.GetHtmlElement("//li");
            li.innerHTML = string.Format("<span style='text-transform: uppercase;font-family:verdana;color:green;'>{0}</span>", li.innerText);
        }

        // change attributes of the second and subsequent elements of the list
        var list = processor.GetHtmlElements("//ul//li");
        int index = 1;
        foreach (var li in list)
        {
            if (index++ == 1) continue;
            li.innerHTML = string.Format("<span style='text-transform: uppercase;font-family:verdana;color:blue;'>{0}</span>", li.innerText);
        }

    }

    /// <summary>
    /// Enables IE WebBrowser control to evaluate XPath queries 
    /// by injecting http://svn.coderepos.org/share/lang/javascript/javascript-xpath/trunk/release/javascript-xpath-latest-cmp.js
    /// and to return XPath queries results to the calling C# code as strongly typed
    /// mshtml.IHTMLElement and IEnumerable<mshtml.IHTMLElement>
    /// </summary>
    public class WebBrowserControlXPathQueriesProcessor
    {
        private System.Windows.Forms.WebBrowser _webBrowser;
        public WebBrowserControlXPathQueriesProcessor(System.Windows.Forms.WebBrowser webBrowser)
        {
            _webBrowser = webBrowser;
            injectScripts();
        }

        private void injectScripts()
        {
            // Thanks to: http://stackoverflow.com/questions/7998996/how-to-inject-javascript-in-webbrowser-control

            HtmlElement head = _webBrowser.Document.GetElementsByTagName("head")[0];
            HtmlElement scriptEl = _webBrowser.Document.CreateElement("script");
            mshtml.IHTMLScriptElement element = (mshtml.IHTMLScriptElement)scriptEl.DomElement;
            element.src = "http://svn.coderepos.org/share/lang/javascript/javascript-xpath/trunk/release/javascript-xpath-latest-cmp.js";
            head.AppendChild(scriptEl);

            string javaScriptText = @"
                    function GetElements (XPath) {
                            var xPathRes = document.evaluate ( XPath, document, null, XPathResult.ORDERED_NODE_ITERATOR_TYPE, null);              
                            var nextElement = xPathRes.iterateNext ();
                            var elements = new Object();
                            var elementIndex = 1;
                            while (nextElement) {
                            elements[elementIndex++] = nextElement;
                            nextElement = xPathRes.iterateNext ();
                            }
                        elements.length = elementIndex -1;
                        return elements;
                        };
                   ";
            scriptEl = _webBrowser.Document.CreateElement("script");
            element = (mshtml.IHTMLScriptElement)scriptEl.DomElement;
            element.text = javaScriptText;
            head.AppendChild(scriptEl);
        }

        /// <summary>
        /// Gets Html element's mshtml.IHTMLElement object instance using XPath query
        /// </summary>
        public mshtml.IHTMLElement GetHtmlElement(string xPathQuery)
        {
            string code = string.Format("document.evaluate('{0}', document, null, XPathResult.FIRST_ORDERED_NODE_TYPE, null).singleNodeValue;", xPathQuery);
            return _webBrowser.Document.InvokeScript("eval", new object[] { code }) as mshtml.IHTMLElement;
        }

        /// <summary>
        /// Gets Html elements' IEnumerable<mshtml.IHTMLElement> object instance using XPath query
        /// </summary>
        public IEnumerable<mshtml.IHTMLElement> GetHtmlElements(string xPathQuery)
        {
            // Thanks to: http://stackoverflow.com/questions/5278275/accessing-properties-of-javascript-objects-using-type-dynamic-in-c-sharp-4
            var comObject = _webBrowser.Document.InvokeScript("eval", new object[] { string.Format("GetElements('{0}')", xPathQuery) });
            Type type = comObject.GetType();
            int length = (int)type.InvokeMember("length", BindingFlags.GetProperty, null, comObject, null);

            for (int i = 1; i <= length; i++)
            {
                yield return type.InvokeMember(i.ToString(), BindingFlags.GetProperty, null, comObject, null) as mshtml.IHTMLElement;
            }
        }
    }

}
}

Running this code re-renders the first list item in green and the remaining items in blue, all as upper-case Verdana text.
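As a possible variation on GetElements, the DOM XPath API also defines ORDERED_NODE_SNAPSHOT_TYPE, whose result exposes snapshotLength and snapshotItem(i). The count is then known up front, and a snapshot does not become invalid if the document is modified afterwards, which matters here because the C# code rewrites innerHTML. The sketch below builds the same 1-based object with a length property from a snapshot. It assumes the injected javascript-xpath library implements snapshot results, which I have not verified; the function name GetElementsSnapshot, its doc parameter, and the stub document used to run it outside a browser are all hypothetical.

```javascript
// Value of XPathResult.ORDERED_NODE_SNAPSHOT_TYPE in the DOM XPath specification.
var ORDERED_NODE_SNAPSHOT_TYPE = 7;

// Evaluate `xpath` against `doc` and return a 1-based, array-like object
// with a `length` property, mirroring the shape GetElements returns.
function GetElementsSnapshot(doc, xpath) {
  var res = doc.evaluate(xpath, doc, null, ORDERED_NODE_SNAPSHOT_TYPE, null);
  var elements = { length: res.snapshotLength };
  for (var i = 0; i < res.snapshotLength; i++) {
    elements[i + 1] = res.snapshotItem(i);
  }
  return elements;
}

// Hypothetical stub document so the sketch can run without a browser.
var stubDoc = {
  evaluate: function (xpath, contextNode, resolver, type, result) {
    var items = ["li-1", "li-2"];
    return {
      snapshotLength: items.length,
      snapshotItem: function (i) { return items[i]; }
    };
  }
};

var found = GetElementsSnapshot(stubDoc, "//ul//li");
console.log(found.length, found[1], found[2]);
```

In the page itself the call would be GetElementsSnapshot(document, '//ul//li'), and the C# side could read length and the numbered properties exactly as GetHtmlElements already does.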

I have credited my sources inline in the code. If you find I have missed any, please point them out in the comments and I will add them.

If you know a better solution (shorter or more efficient code), please comment and/or post your answer.
