XPath query, HtmlAgilityPack and Extracting Text


Question

I have been trying to extract links from anchor elements with the class "tim_new", and I have been given a solution as well.

The solution, the snippet and the other necessary information are given here.

The XPath query in question was "//a[@class='tim_new']". My question is: how did this query differentiate between the first line of the snippet (given in the link above) and the second line of the snippet?

More specifically, what is the literal translation (in English) of this XPath query?

Furthermore, I want to write a few lines of code to extract the text written against NSE:

<div class="FL gL_12 PL10 PT15">BSE: 523395 &nbsp;&nbsp;|&nbsp;&nbsp; NSE: 3MINDIA &nbsp;&nbsp;|&nbsp;&nbsp; ISIN: INE470A01017</div>

I would appreciate help in forming the necessary selection query.

My code is written as:

IEnumerable<string> NSECODE = doc.DocumentNode.SelectSingleNode("//div[@NSE:]");



But this doesn't look right. I would appreciate some help.

Recommended answer

The XPath in the first selection reads "select all a (anchor) elements anywhere in the document that have an attribute named class with a value of exactly tim_new". The part in square brackets is not what you're returning; it's the criterion you're applying to the search. That is how the query tells the two lines of the snippet apart: only elements that satisfy the bracketed condition are returned.
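As a minimal sketch of that reading, the code below runs the same query against a small made-up HTML fragment. The markup, the href values, the class name "other" and the helper name SelectHrefs are all assumptions for illustration, not the snippet from the question. It also assumes the HtmlAgilityPack NuGet package is referenced.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

static class PredicateDemo
{
    // Hypothetical helper: returns the href of every <a> whose class is exactly "tim_new".
    public static List<string> SelectHrefs(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // "//a" = every <a> element anywhere in the document;
        // "[@class='tim_new']" = keep only those whose class attribute equals tim_new.
        var links = doc.DocumentNode.SelectNodes("//a[@class='tim_new']");

        // SelectNodes returns null (not an empty collection) when nothing matches.
        if (links == null) return new List<string>();

        // The predicate only filters; what comes back is the whole <a> node.
        return links.Select(a => a.GetAttributeValue("href", "")).ToList();
    }

    static void Main()
    {
        // Made-up markup: only the first and third anchors carry class="tim_new".
        const string html =
            "<a class='tim_new' href='/one'>First</a>" +
            "<a class='other' href='/two'>Second</a>" +
            "<a class='tim_new' href='/three'>Third</a>";

        foreach (var href in SelectHrefs(html))
            Console.WriteLine(href);
    }
}
```

With this fragment the second anchor is skipped, because its class attribute does not equal tim_new. Note that the comparison is on the whole attribute value, so an element with class="tim_new big" would not match this query either.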

I don't have the HTML Agility Pack to hand, but if you are trying to query the divs that have "NSE:" in their text, your XPath for the second query should just be "//div", and then you'll want to filter using LINQ.

Something like:

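// Requires "using System.Linq;". SelectNodes returns null when nothing matches, hence the fallback.
var divs = doc.DocumentNode.SelectNodes("//div[text()]");
var nodes = (divs ?? Enumerable.Empty<HtmlNode>()).Where(a => a.InnerText.Contains("NSE:"));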



So in English: "Return all the div elements that immediately contain text to LINQ, then check that the inner text value contains NSE:". Again, I'm not sure the syntax is perfect, but that's the idea.
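To go from the matching div to the value written against NSE:, one option is to decode the entities and pull the token out with a regular expression. The sketch below assumes the div looks like the one in the question, that the code is a single token ending at whitespace or "|", and that HtmlAgilityPack is referenced. The class and method names (NseExtractor, ExtractNseCode) are made up for illustration.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;
using HtmlAgilityPack;

static class NseExtractor
{
    // Returns the token following "NSE:" in the first matching div, or null if none is found.
    public static string ExtractNseCode(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // All divs that directly contain a text node; null when there are none.
        var divs = doc.DocumentNode.SelectNodes("//div[text()]");
        if (divs == null) return null;

        foreach (var div in divs.Where(d => d.InnerText.Contains("NSE:")))
        {
            // Decode entities such as &nbsp; before pattern matching.
            string text = HtmlEntity.DeEntitize(div.InnerText);

            // Assumed format: "NSE:", optional whitespace, then the code
            // running up to the next whitespace or "|" separator.
            Match m = Regex.Match(text, @"NSE:\s*([^\s|]+)");
            if (m.Success) return m.Groups[1].Value;
        }
        return null;
    }

    static void Main()
    {
        // The div from the question.
        const string html =
            "<div class=\"FL gL_12 PL10 PT15\">BSE: 523395 &nbsp;&nbsp;|&nbsp;&nbsp; " +
            "NSE: 3MINDIA &nbsp;&nbsp;|&nbsp;&nbsp; ISIN: INE470A01017</div>";

        Console.WriteLine(ExtractNseCode(html));
    }
}
```

For the div in the question this should print 3MINDIA. If the page layout differs from the assumed "NSE: CODE |" pattern, the regular expression will need adjusting.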

The XPath "//div[@NSE:]" asks for all divs that have an attribute named NSE:, which is not valid XPath anyway: in a name, ":" only separates a namespace prefix from a local name, so "NSE:" on its own is not a legal attribute name. You're looking for the text of the element, not one of its attributes.

Hope that helps.

Note: If you have nested divs that both contain text, as in <div>NSE: some text<div>NSE: more text</div></div>, you're going to get duplicate results, because the outer div's inner text includes the inner div's text.
