How do I get the InnerText of any class by giving the XPath using C#


Problem description

<div class="vote">
    <input type="hidden" name="_id_" value="1998690">
    <a class="vote-up-off" title="This answer is useful">up vote</a>
    <span itemprop="upvoteCount" class="vote-count-post ">50</span>
    <a class="vote-down-off" title="This answer is not useful">down vote</a>
    <span class="vote-accepted-on load-accepted-answer-date" title="loading when this answer was accepted...">accepted</span>
</div>
            
<td class="answercell">
    <div class="post-text" itemprop="text">
<p>Have you tried this?</p>
<pre><code>//myparent/mychild[text() = 'foo']
</code>

Alternatively, you can use the shortcut for the self axis:

<code>//myparent/mychild[. = 'foo']</code>







Here I need to get the text "//myparent/mychild[text() = 'foo']"

What I have tried:

string htmlCode = "";
using (WebClient client = new WebClient())
{
    client.Headers.Add(HttpRequestHeader.UserAgent, "AvoidError");
    //htmlCode = client.DownloadString("http://www.w3schools.com/html/html_blocks.asp");
    htmlCode = client.DownloadString("http://stackoverflow.com/questions/1998681/xpath-selection-by-innertext");
}
            
HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(htmlCode);

// SelectSingleNode returns null when the XPath matches nothing, so the next line throws in that case.
string val = doc.DocumentNode.SelectSingleNode(textBox2.Text).InnerText;
MessageBox.Show(val);





I paste the XPath value into TextBox2, which should in turn give me the inner text corresponding to that XPath.

Xpath = //*[@id="answer-1998690"]/table/tbody/tr[1]/td[2]/div/pre[1]/code/span



The web site from which I tried to get the inner text is:
xml - XPath selection by innertext - Stack Overflow

I am a newbie to XPath, so I do not yet know how to use it efficiently.

Recommended answer

First of all, inner text is a property of an HTML element (an instance), not of a class. However, you can group elements by their CSS classes, which is routinely done in JavaScript.
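
To illustrate the point about CSS classes, here is a minimal sketch that reads the inner text of the first element carrying a given class, using markup like the vote block from the question. The helper name `InnerTextByClass` is made up for this example, and it assumes the class name contains no quote characters.

```csharp
using System;
using HtmlAgilityPack; // NuGet package "HtmlAgilityPack"

public static class ClassTextDemo
{
    // Hypothetical helper: returns the trimmed inner text of the first element
    // whose class attribute contains the given class token, or null if none does.
    public static string InnerTextByClass(string html, string cssClass)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Pad the class list with spaces so that "vote" does not match "vote-count-post".
        // Assumption: cssClass contains no quote characters.
        string xpath = "//*[contains(concat(' ', normalize-space(@class), ' '), ' "
                       + cssClass + " ')]";

        HtmlNode node = doc.DocumentNode.SelectSingleNode(xpath);
        return node == null ? null : node.InnerText.Trim();
    }

    public static void Main()
    {
        string html =
            "<div class=\"vote\">" +
            "<span itemprop=\"upvoteCount\" class=\"vote-count-post \">50</span>" +
            "</div>";

        Console.WriteLine(InnerTextByClass(html, "vote-count-post") ?? "No element has that class.");
    }
}
```

The class is only a way to find the element; the text itself still comes from the `InnerText` property of the matched node.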

But you need to do this in C#, which you already use to download the HTML. Of course you can; you just need to parse the downloaded HTML. Perhaps the most suitable tool is the open-source HTML Agility Pack, which supports exactly what you want: XPath. Please see:
HTML Agility Pack — Home.
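
As a rough sketch of what that looks like, the following parses an HTML string and returns the inner text for an XPath, or null when nothing matches. The helper name `InnerTextAt` and the trimmed-down sample markup are assumptions for illustration; the real page is much larger and its structure may differ from what a browser's developer tools show.

```csharp
using System;
using HtmlAgilityPack; // NuGet package "HtmlAgilityPack"

public static class XPathTextDemo
{
    // Hypothetical helper: inner text of the first node matching the XPath, or null.
    public static string InnerTextAt(string html, string xpath)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectSingleNode returns null when the expression matches nothing.
        HtmlNode node = doc.DocumentNode.SelectSingleNode(xpath);
        if (node == null)
        {
            return null;
        }

        // InnerText keeps entities such as &amp; as written; DeEntitize decodes them.
        return HtmlEntity.DeEntitize(node.InnerText);
    }

    public static void Main()
    {
        // Simplified stand-in for the answer markup on the target page.
        string html =
            "<div id=\"answer-1998690\"><div class=\"post-text\">" +
            "<p>Have you tried this?</p>" +
            "<pre><code>//myparent/mychild[text() = 'foo']</code></pre>" +
            "</div></div>";

        string text = InnerTextAt(html, "//*[@id='answer-1998690']//pre[1]/code");
        Console.WriteLine(text ?? "No node matched the XPath.");
    }
}
```

Checking for null before reading `InnerText` avoids an exception when the XPath pasted into the text box does not match the downloaded markup.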

See also:
Web scraping — Wikipedia, the free encyclopedia,
Comparison of HTML parsers — Wikipedia, the free encyclopedia.

See also ScrapySharp, a Web scraping tool which contains a Web client used to simulate a browser and an extension of HTML Agility Pack: https://www.nuget.org/packages/ScrapySharp.

Note that both HTML Agility Pack and ScrapySharp can download resources from the Web directly, so you don't really need the WebClient class. It is still good to know that WebClient is a fairly rudimentary tool; a far more comprehensive facility for retrieving resources from the Web (for Web scraping and similar tasks) is the class System.Net.HttpWebRequest:
HttpWebRequest Class (System.Net).
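
For the direct-download route, a minimal sketch with HTML Agility Pack's `HtmlWeb` class might look like this. The helper name `InnerTextOrNull` and the `//title` query are chosen only for illustration; the URL and User-Agent value are the ones from the question, and the request needs network access.

```csharp
using System;
using HtmlAgilityPack; // NuGet package "HtmlAgilityPack"

public static class DownloadDemo
{
    // Hypothetical helper: trimmed inner text of the first match, or null.
    public static string InnerTextOrNull(HtmlDocument doc, string xpath)
    {
        HtmlNode node = doc.DocumentNode.SelectSingleNode(xpath);
        return node == null ? null : node.InnerText.Trim();
    }

    public static void Main()
    {
        // HtmlWeb downloads and parses in one step, so no WebClient is needed.
        var web = new HtmlWeb();
        web.UserAgent = "AvoidError"; // same header value the question sends

        HtmlDocument doc = web.Load(
            "http://stackoverflow.com/questions/1998681/xpath-selection-by-innertext");

        Console.WriteLine(InnerTextOrNull(doc, "//title") ?? "No node matched the XPath.");
    }
}
```

The same `InnerTextOrNull` call works with an XPath taken from a text box, since it only needs the loaded document and the expression.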

—SA

