Using YQL multi-query & XPath to parse HTML, how to escape nested quotes?


Problem description

The title is more complicated than it has to be; here's the problem query.

SELECT * 
FROM query.multi 
WHERE queries="
    SELECT * 
        FROM html 
        WHERE url='http://www.stumbleupon.com/url/http://www.guildwars2.com' 
        AND xpath='//li[@class=\"listLi\"]/div[@class=\"views\"]/a/span';
    SELECT * 
        FROM xml 
        WHERE url='http://services.digg.com/1.0/endpoint?method=story.getAll&link=http://www.guildwars2.com';
    SELECT * 
        FROM json 
        WHERE url='http://api.tweetmeme.com/url_info.json?url=http://www.guildwars2.com';
    SELECT * 
        FROM xml 
        WHERE url='http://api.facebook.com/restserver.php?method=links.getStats&urls=http://www.guildwars2.com';
    SELECT * 
        FROM json 
        WHERE url='http://www.reddit.com/button_info.json?url=http://www.guildwars2.com'"

Specifically this line,

xpath='//li[@class=\"listLi\"]/div[@class=\"views\"]/a/span'

It's problematic because of the quoting, I have to nest them three levels deep and I've run out of quote characters to use. I've tried the following variations without success:

//no attribute quoting
xpath='//li[@class=listLi]/div[@class=views]/a/span' 

//try to quote attribute w/ backslash & single quote
xpath='//li[@class=\'listLi\']/div[@class=\'views\']/a/span'

//try to quote attribute w/ backslash & double quote
xpath='//li[@class=\"listLi\"]/div[@class=\"views\"]/a/span'

//try to quote attribute with double single quotes, like SQL
xpath='//li[@class=''listLi'']/div[@class=''views'']/a/span'

//try to quote attribute with double double quotes, like SQL
xpath='//li[@class=""listLi""]/div[@class=""views""]/a/span'

//try to quote attribute with quote entities
xpath='//li[@class=&quot;listLi&quot;]/div[@class=&quot;views&quot;]/a/span'

//try to surround XPath with backslash & double quote
xpath=\"//li[@class='listLi']/div[@class='views']/a/span\"

//try to surround XPath with double double quote
xpath=""//li[@class='listLi']/div[@class='views']/a/span""

All without success.

I don't see much out there about escaping XPath strings, and everything I've found seems to be a variation on using concat (which won't help because neither ' nor " are available) or HTML entities. Not using quotes for the attributes doesn't throw an error, but it fails because it's not the actual XPath string I need.

I don't see anything in the YQL docs about how to handle escaping. I'm aware of how edge-casey this is but was hoping they'd have some sort of escaping guide.

Solution

You need to escape whatever character delimits your XPath query with a double backslash. In other words, keep the single quotes around the attribute values and put two backslashes in front of each one:

SELECT * FROM query.multi 
WHERE queries="
    SELECT * 
        FROM html 
        WHERE url='http://www.stumbleupon.com/url/http://www.guildwars2.com' 
        AND xpath='//li[@class=\\'listLi\\']/div[@class=\\'views\\']/a/span';
    SELECT * 
        FROM xml 
        WHERE url='http://services.digg.com/1.0/endpoint?method=story.getAll&link=http://www.guildwars2.com';
    SELECT * 
        FROM json 
        WHERE url='http://api.tweetmeme.com/url_info.json?url=http://www.guildwars2.com';
    SELECT * 
        FROM xml 
        WHERE url='http://api.facebook.com/restserver.php?method=links.getStats&urls=http://www.guildwars2.com';
    SELECT * 
        FROM json 
        WHERE url='http://www.reddit.com/button_info.json?url=http://www.guildwars2.com'"

(try this in the YQL console)
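
If the query string is assembled in code rather than typed by hand, the same escaping rule can be applied programmatically. The following is only a minimal sketch in Python: the helper names `escape_for_multi` and `build_multi_query` are hypothetical (they are not part of YQL or any library), and the code only builds the query text; it does not send anything to a service.

```python
def escape_for_multi(xpath, delimiter="'"):
    """Put two backslashes in front of every delimiter character in the XPath.

    Hypothetical helper: it mirrors the double-backslash rule from the answer
    above, where the XPath is delimited by single quotes inside query.multi.
    """
    return xpath.replace(delimiter, "\\\\" + delimiter)


def build_multi_query(subqueries):
    """Join sub-queries with ';' and wrap them in the outer double quotes."""
    return 'SELECT * FROM query.multi WHERE queries="' + "; ".join(subqueries) + '"'


# The XPath as it would normally be written, with single-quoted attribute values.
xpath = "//li[@class='listLi']/div[@class='views']/a/span"
escaped = escape_for_multi(xpath)

subqueries = [
    "SELECT * FROM html "
    "WHERE url='http://www.stumbleupon.com/url/http://www.guildwars2.com' "
    "AND xpath='" + escaped + "'",
    "SELECT * FROM json "
    "WHERE url='http://www.reddit.com/button_info.json?url=http://www.guildwars2.com'",
]

query = build_multi_query(subqueries)
print(query)
```

Only two of the five sub-queries are included to keep the sketch short; the remaining ones contain no nested quotes, so they can be appended to the list unchanged.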
