How do I get the InnerText of any class by giving the XPath using C#?
Problem description
<div class="vote">
<input type="hidden" name="_id_" value="1998690">
<a class="vote-up-off" title="This answer is useful">up vote</a>
<span itemprop="upvoteCount" class="vote-count-post ">50</span>
<a class="vote-down-off" title="This answer is not useful">down vote</a>
<span class="vote-accepted-on load-accepted-answer-date" title="loading when this answer was accepted...">accepted</span>
</div>
<td class="answercell">
<div class="post-text" itemprop="text">
<p>Have you tried this?</p>
<pre><code>//myparent/mychild[text() = 'foo']
</code></pre>
Alternatively, you can use the shortcut for the self axis:
<code>//myparent/mychild[. = 'foo']</code>
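The two expressions quoted in that pasted answer are not always equivalent. The sketch below uses only the .NET `System.Xml` classes and a made-up sample document (an assumption for illustration, not the Stack Overflow page). It shows that `text() = 'foo'` tests the element's direct child text nodes, while `. = 'foo'` (the self axis) tests the element's whole string value.

```csharp
using System;
using System.Xml;

public static class XPathTextMatch
{
    // Hypothetical sample: the second <mychild> splits "foo" across a child <b>.
    public const string Sample =
        "<myparent><mychild>foo</mychild><mychild>f<b>oo</b></mychild></myparent>";

    // Counts the nodes selected by an XPath 1.0 expression in an XML string.
    public static int CountMatches(string xml, string xpath)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);
        XmlNodeList? nodes = doc.SelectNodes(xpath);
        return nodes?.Count ?? 0;
    }

    public static void Main()
    {
        // text() = 'foo' is true if any direct child text node equals "foo".
        // Only the first <mychild> qualifies: the second one has just the
        // text node "f", because "oo" belongs to <b>.
        Console.WriteLine(CountMatches(Sample, "//myparent/mychild[text() = 'foo']")); // 1

        // . = 'foo' compares the element's full string value (all descendant
        // text concatenated), so both <mychild> elements match.
        Console.WriteLine(CountMatches(Sample, "//myparent/mychild[. = 'foo']")); // 2
    }
}
```

For plain text content such as the `<code>` element in the question, both forms select the same node; they diverge only when the text is split by child elements.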
Here I need to get the text "//myparent/mychild[text() = 'foo']".
What I have tried:
// Assumes a WinForms form with textBox2 and these directives:
// using System.Net; using System.Windows.Forms; using HtmlAgilityPack;
string htmlCode = "";
using (WebClient client = new WebClient())
{
    client.Headers.Add(HttpRequestHeader.UserAgent, "AvoidError");
    //htmlCode = client.DownloadString("http://www.w3schools.com/html/html_blocks.asp");
    htmlCode = client.DownloadString("http://stackoverflow.com/questions/1998681/xpath-selection-by-innertext");
}
// Fully qualified to avoid a clash with System.Windows.Forms.HtmlDocument
HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(htmlCode);
// SelectSingleNode returns null when the XPath matches nothing, so reading
// .InnerText directly from the result would throw a NullReferenceException.
HtmlNode node = doc.DocumentNode.SelectSingleNode(textBox2.Text);
MessageBox.Show(node != null ? node.InnerText : "No node matches this XPath.");
I paste the XPath value into textBox2, which should in turn give me the inner text corresponding to that XPath.
XPath: //*[@id="answer-1998690"]/table/tbody/tr[1]/td[2]/div/pre[1]/code/span
The page whose inner text I am trying to get is:
xml - XPath selection by innertext - Stack Overflow
I am new to XPath, so I don't yet know how to use it efficiently.
Recommended answer
First of all, inner text is a property of an HTML element (an instance), not of a class. However, you can classify elements by CSS classes, which is routinely done in JavaScript.
But you need to do it in C#, which you use to download the HTML. Of course you can; you just need to parse the downloaded HTML. Perhaps the most suitable tool is the open-source HTML Agility Pack, which supports exactly what you want: XPath queries. Please see:
HTML Agility Pack - Home.
See also:
Web scraping - Wikipedia, the free encyclopedia,
Comparison of HTML parsers - Wikipedia, the free encyclopedia.
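A minimal sketch of the HTML Agility Pack approach, assuming the HtmlAgilityPack NuGet package is referenced. The HTML string is a hypothetical, simplified stand-in for the downloaded page. It also illustrates a likely reason the XPath in the question finds nothing: an XPath copied from a browser's developer tools describes the browser's DOM, where the parser inserts a `<tbody>` around table rows that lack one and client-side syntax highlighting may wrap code in `<span>` elements. HTML Agility Pack sees only the raw downloaded markup, so `SelectSingleNode` returns null.

```csharp
using System;
using HtmlAgilityPack;

public static class HapInnerText
{
    // Hypothetical raw markup: no <tbody>, no highlighter <span>s.
    public const string SampleHtml =
        "<div id=\"answer-1998690\"><table><tr><td class=\"votecell\"></td>" +
        "<td class=\"answercell\"><div class=\"post-text\">" +
        "<pre><code>//myparent/mychild[text() = 'foo']\n</code></pre>" +
        "</div></td></tr></table></div>";

    // Returns the decoded, trimmed inner text of the first node matching
    // the XPath, or null when nothing matches.
    public static string? InnerTextByXPath(string html, string xpath)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        HtmlNode? node = doc.DocumentNode.SelectSingleNode(xpath);
        return node == null ? null : HtmlEntity.DeEntitize(node.InnerText).Trim();
    }

    public static void Main()
    {
        // The XPath from the question expects tbody and span: no match here.
        string? fromDevTools = InnerTextByXPath(SampleHtml,
            "//*[@id='answer-1998690']/table/tbody/tr[1]/td[2]/div/pre[1]/code/span");
        Console.WriteLine(fromDevTools ?? "(no match)");

        // Same path without tbody and span: matches the <code> element.
        Console.WriteLine(InnerTextByXPath(SampleHtml,
            "//*[@id='answer-1998690']/table/tr[1]/td[2]/div/pre[1]/code"));
    }
}
```

The first call prints "(no match)" and the second prints //myparent/mychild[text() = 'foo']. Checking for null before reading `InnerText` turns a crash into a clear "no match" result while you adjust the XPath.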
See also ScrapySharp, a Web scraping tool that contains a Web client for simulating a browser, as well as an extension of HTML Agility Pack: https://www.nuget.org/packages/ScrapySharp.
Note that you can use HTML Agility Pack or ScrapySharp to download resources from the Web directly, so you don't really need the class WebClient. However, it's good to know that WebClient is a rather rudimentary tool; a really comprehensive facility for retrieving resources from the Web (Web scraping and the like) is the class System.Net.HttpWebRequest:
HttpWebRequest Class (System.Net).
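A sketch of letting HTML Agility Pack do the download itself, assuming the HtmlAgilityPack package: its `HtmlWeb` class fetches and parses a page in one step, replacing the `WebClient` + `LoadHtml` pair from the question. The user-agent string is the one from the question. The default XPath is an assumption based on the page structure at the time; the live page's markup may have changed, so pass your own URL and XPath as arguments.

```csharp
using System;
using HtmlAgilityPack;

public static class HapDownload
{
    // HtmlWeb downloads and parses in one step; UserAgent sets the request header.
    public static HtmlWeb CreateWeb(string userAgent)
    {
        return new HtmlWeb { UserAgent = userAgent };
    }

    public static void Main(string[] args)
    {
        string url = args.Length > 0
            ? args[0]
            : "http://stackoverflow.com/questions/1998681/xpath-selection-by-innertext";
        // Assumed XPath: avoids tbody and highlighter spans by stopping at <code>.
        string xpath = args.Length > 1
            ? args[1]
            : "//*[@id='answer-1998690']//pre[1]/code";

        HtmlDocument doc = CreateWeb("AvoidError").Load(url);
        HtmlNode? node = doc.DocumentNode.SelectSingleNode(xpath);
        Console.WriteLine(node == null
            ? "No node matches " + xpath
            : HtmlEntity.DeEntitize(node.InnerText).Trim());
    }
}
```

On current .NET (6 and later), `WebClient` and `HttpWebRequest` are marked obsolete in favor of `HttpClient`, which is another reason to let the parsing library handle simple downloads.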
-SA