Extract HTML content using YQL?
Question
Let's say I want to extract data from a web page with the following markup:
<table>
  <tr>
    <td><a href="Link 1">Column 1 Text</a></td>
    <td>Column 2 Text</td>
    <td>Column 3 Text</td>
  </tr>
  <tr>
    <td><a href="Link 2">Column 1 Text</a></td>
    <td>Column 2 Text</td>
    <td>Column 3 Text</td>
  </tr>
  ...
</table>
and convert it into JSON like this:
[
  {
    link: 'Link 1',
    text: 'Column 1 Text',
    data: 'Column 3 Text'
  },
  {
    link: 'Link 2',
    text: 'Column 1 Text',
    data: 'Column 3 Text'
  }
]
Can this be done with YQL? If so, please give me an example query.
Any help would be appreciated!
Recommended answer
Here's a query that's a good starting point, using YQL's html table together with an XPath expression (see Extracting HTML Content With XPath for more details on this technique):
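A query of this kind might look like the sketch below. It is an illustration under stated assumptions, not the answer's own query: it assumes the YQL html table accepts `url` and `xpath` keys, that the public REST endpoint is `https://query.yahooapis.com/v1/public/yql`, and that `//table/tr` selects the rows of interest. The page URL is a placeholder and `build_yql_url` is a hypothetical helper name. The code only builds the request URL and does not send it.

```python
from urllib.parse import urlencode

# Assumed public YQL REST endpoint (not stated in the answer).
YQL_ENDPOINT = "https://query.yahooapis.com/v1/public/yql"


def build_yql_url(page_url, xpath="//table/tr"):
    """Return a request URL for a YQL query against the html table."""
    # Hypothetical query: select every table row from the target page.
    query = f'select * from html where url="{page_url}" and xpath="{xpath}"'
    # urlencode percent-encodes the query text; format=json asks for JSON output.
    return YQL_ENDPOINT + "?" + urlencode({"q": query, "format": "json"})


# "http://example.com/page.html" is a placeholder, not the asker's page.
url = build_yql_url("http://example.com/page.html")
print(url)
```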
Which produces JSON results like this:
{
  "query": {
    "count": 2,
    "created": "2012-01-06T20:16:46Z",
    "lang": "en-US",
    "results": {
      "tr": [
        {
          "td": [
            {
              "a": {
                "href": "Link%201",
                "content": "Column 1 Text"
              }
            },
            {
              "p": "Column 2 Text"
            },
            {
              "p": "Column 3 Text"
            }
          ]
        },
        {
          "td": [
            {
              "a": {
                "href": "Link%202",
                "content": "Column 1 Text"
              }
            },
            {
              "p": "Column 2 Text"
            },
            {
              "p": "Column 3 Text"
            }
          ]
        }
      ]
    }
  }
}
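The response above is nested per row and per cell, while the question asks for a flat list of link, text and data. That last step can be done on the client. The following is a minimal Python sketch, assuming the response has already been parsed into a dict with the shape shown above. `flatten_rows` is a hypothetical helper name, the `href` values are percent-decoded to recover `Link 1`, and the single-row branch is a defensive assumption about how a one-row result might be returned.

```python
from urllib.parse import unquote


def flatten_rows(response):
    """Turn the nested tr/td result into a list of link/text/data dicts."""
    rows = response["query"]["results"]["tr"]
    # Assumption: a single matching row may arrive as one object, not a list.
    if isinstance(rows, dict):
        rows = [rows]
    flat = []
    for row in rows:
        cells = row["td"]
        anchor = cells[0]["a"]  # first column holds the link
        flat.append({
            "link": unquote(anchor["href"]),  # "Link%201" -> "Link 1"
            "text": anchor["content"],
            "data": cells[2]["p"],  # third column holds the data text
        })
    return flat


# Trimmed copy of the response shown above, used only to exercise the helper.
sample = {"query": {"results": {"tr": [
    {"td": [{"a": {"href": "Link%201", "content": "Column 1 Text"}},
            {"p": "Column 2 Text"}, {"p": "Column 3 Text"}]},
    {"td": [{"a": {"href": "Link%202", "content": "Column 1 Text"}},
            {"p": "Column 2 Text"}, {"p": "Column 3 Text"}]},
]}}}

result = flatten_rows(sample)
print(result)
```

The column positions (0 for the link, 2 for the data) are hard-coded to match the markup in the question; a page with a different layout would need different indexes.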