Querying HTML using Yahoo YQL

Problem description

While trying to parse HTML using the Yahoo Query Language (YQL) and the XPath functionality it provides, I ran into a problem: I could not extract "text()" or attribute values.
For example,

select * from html where url="http://stackoverflow.com" 
and xpath='//div/h3/a'

returns a list of anchors as XML:

<results>
    <a class="question-hyperlink" href="/questions/661184/filling-the-text-area-with-the-text-when-a-button-is-clicked" title="In ASP.net, I need the code to fill the text area (in the form) when a button is clicked. Can you help me through by showing a simple .aspx code containing the script tag? ">Filling the text area with the text when a button is clicked</a>...
</results> 
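The node-selection part of this can be sketched locally with Python's standard-library `xml.etree.ElementTree`, which supports a limited XPath subset. This does not call YQL: the `PAGE` markup and the `select_nodes` helper are hypothetical, simplified stand-ins for the real Stack Overflow page, and ElementTree needs the relative form `.//div/h3/a` rather than `//div/h3/a`.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup standing in for the real front page
PAGE = """
<html><body>
  <div><h3><a class="question-hyperlink" href="/questions/1/first">First question</a></h3></div>
  <div><h3><a class="question-hyperlink" href="/questions/2/second">Second question</a></h3></div>
</body></html>
"""


def select_nodes(markup, path):
    """Return the element nodes matched by an ElementTree path expression."""
    root = ET.fromstring(markup.strip())
    # ElementTree only accepts relative paths on an element, hence './/' instead of '//'
    return root.findall(path)


anchors = select_nodes(PAGE, ".//div/h3/a")
for a in anchors:
    # Each match is a whole <a> element, with its attributes and children intact
    print(a.tag, a.get("href"), a.text)
```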

Now, when I try to extract the node values using

select * from html where url="http://stackoverflow.com" 
and xpath='//div/h3/a/text()'

I get the results concatenated into one string rather than a node list, e.g.

<results>Xcode: attaching to a remote process for debuggingWhy is b
... </results>
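A minimal sketch of the difference between the two result shapes, using hypothetical markup and `xml.etree.ElementTree` (which has no `text()` step, so the text is read from each matched element instead). It only illustrates why a single flattened string loses the node boundaries; it is not how YQL works internally.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup standing in for the real front page
PAGE = """
<html><body>
  <div><h3><a href="/questions/1/first">First question</a></h3></div>
  <div><h3><a href="/questions/2/second">Second question</a></h3></div>
</body></html>
"""

root = ET.fromstring(PAGE.strip())
anchors = root.findall(".//div/h3/a")

# One string for all matches: node boundaries are lost, like the concatenated output
flattened = "".join(a.text or "" for a in anchors)

# One string per matched node: the list structure survives
per_node = [a.text or "" for a in anchors]

print(flattened)  # First questionSecond question
print(per_node)   # ['First question', 'Second question']
```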

How do I separate the results into a node list, and how do I select attribute values?

A query like this:

select * from html where url="http://stackoverflow.com"
and xpath='//div/h3/a[@href]'

gave me the same results as querying div/h3/a.

Solution

YQL requires the XPath expression to evaluate to an itemPath rather than to node text. But once you have an itemPath, you can project various values from the tree.

In other words, an itemPath should point to nodes in the resulting HTML rather than to text content or attributes. When you select * from the data, YQL returns all matching nodes and their children.
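The same point can be sketched with Python's `xml.etree.ElementTree`: an XPath predicate such as `[@href]` only filters which elements match, so the result is still a list of `<a>` nodes, and attribute values are read from each node afterwards. The markup is hypothetical and deliberately includes one anchor without an `href` so the filter has a visible effect; presumably every matched anchor on the real page had an `href`, which would explain why `a[@href]` returned the same results as `a`.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup; the third anchor has no href so the predicate has an effect
PAGE = """
<html><body>
  <div><h3><a href="/questions/1/first">First question</a></h3></div>
  <div><h3><a href="/questions/2/second">Second question</a></h3></div>
  <div><h3><a name="no-link">Not a link</a></h3></div>
</body></html>
"""

root = ET.fromstring(PAGE.strip())

all_anchors = root.findall(".//div/h3/a")
with_href = root.findall(".//div/h3/a[@href]")

# The predicate filters elements; the result is still a list of <a> nodes
print(len(all_anchors), len(with_href))  # 3 2
print([a.tag for a in with_href])        # ['a', 'a']

# Attribute values are read from each matched node afterwards
hrefs = [a.get("href") for a in with_href]
print(hrefs)  # ['/questions/1/first', '/questions/2/second']
```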

Example:

select * from html where url="http://stackoverflow.com" and xpath='//div/h3/a'

This returns all the a elements matching the XPath. To project the text content, use:

select content from html where url="http://stackoverflow.com" and xpath='//div/h3/a'

"content" returns the text content held within the node.
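A rough local analogue of projecting `content`, assuming it corresponds to the full text inside each matched node, including text in nested child elements. The `project_content` helper and the `PAGE` markup are hypothetical; the point is that you get one string per node rather than one concatenated string.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup; the second title has a nested element inside the link
PAGE = """
<html><body>
  <div><h3><a href="/questions/1/first">First question</a></h3></div>
  <div><h3><a href="/questions/2/second">Second <em>important</em> question</a></h3></div>
</body></html>
"""


def project_content(nodes):
    """Hypothetical stand-in for `select content`: one text string per node."""
    # itertext() walks the node and its descendants, so nested text is included
    return ["".join(node.itertext()) for node in nodes]


anchors = ET.fromstring(PAGE.strip()).findall(".//div/h3/a")
print(project_content(anchors))  # ['First question', 'Second important question']
```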

To project attributes, specify them relative to the XPath expression. In this case you need href, which is an attribute of the a elements matched by the XPath:

select href from html where url="http://stackoverflow.com" and xpath='//div/h3/a'

This returns:

<results>
    <a href="/questions/663973/putting-a-background-pictures-with-leds"/>
    <a href="/questions/663013/advantages-and-disadvantages-of-popular-high-level-languages"/>
    ....
</results>
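Attribute projection can be approximated the same way: select the nodes with the item path, then read the attribute relative to each node. This sketch rebuilds a `<results>` element shaped like the output above; `project_attribute` and `PAGE` are hypothetical names, and it does not reproduce YQL's exact serialization.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup standing in for the real front page
PAGE = """
<html><body>
  <div><h3><a class="question-hyperlink" href="/questions/1/first">First question</a></h3></div>
  <div><h3><a class="question-hyperlink" href="/questions/2/second">Second question</a></h3></div>
</body></html>
"""


def project_attribute(nodes, name):
    """Hypothetical helper: keep only attribute `name` of each matched node."""
    results = ET.Element("results")
    for node in nodes:
        value = node.get(name)
        # Emit an empty element carrying just the projected attribute (if present)
        ET.SubElement(results, node.tag, {} if value is None else {name: value})
    return results


anchors = ET.fromstring(PAGE.strip()).findall(".//div/h3/a")
results = project_attribute(anchors, "href")
print(ET.tostring(results, encoding="unicode"))
```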

If you need both the href attribute and the text content, you can execute the following YQL query:

select href, content from html where url="http://stackoverflow.com" and xpath='//div/h3/a'

This returns:

<results> <a href="/questions/663950/double-pointer-const-issue-issue">double pointer const issue issue</a>... </results>
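Combining both projections, a hypothetical `project` helper might treat `content` as the node's text and any other field name as an attribute of the matched node, returning one record per node. This mirrors the idea of `select href, content`, but it is a local sketch with `xml.etree.ElementTree` on made-up markup, not YQL's implementation.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup standing in for the real front page
PAGE = """
<html><body>
  <div><h3><a href="/questions/1/first">First question</a></h3></div>
  <div><h3><a href="/questions/2/second">Second question</a></h3></div>
</body></html>
"""


def project(nodes, fields):
    """Hypothetical helper: 'content' -> text content, other names -> attributes."""
    rows = []
    for node in nodes:
        row = {}
        for field in fields:
            if field == "content":
                row[field] = "".join(node.itertext())
            else:
                row[field] = node.get(field)  # None if the attribute is absent
        rows.append(row)
    return rows


anchors = ET.fromstring(PAGE.strip()).findall(".//div/h3/a")
for row in project(anchors, ["href", "content"]):
    print(row)
```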

Hope that helps. Let me know if you have more questions about YQL.
