Webharvest If and null test

Question

I'm trying to make my program check the result of an XPath expression and, if it is null, try a different expression. How do I do this? I have tried all the examples on the website, but the empty single quotes will not compile.

<var-def name="googleResults">
    <xpath expression="//div[@id='center_col']//div[@id='search']//div[@id='ires']//ol/li/div//b/div/text()">
        <html-to-xml>
            <http url="http://google.com/shopping?q=asus laptops&amp;hl=en"/>
        </html-to-xml>
    </xpath>
</var-def>

<var-def name="productTruth">
    <case>
        <if condition="${googleResults != null}">
            <var name="googleResults"/>
        </if>
        <else>
            <xpath expression="//div[@id='center_col']//div[@id='search']//div[@id='ires']//ol/li/div//b/text()">
                <html-to-xml>
                    <http url="http://google.com/shopping?q=asus laptops&amp;hl=en"/>
                </html-to-xml>
            </xpath>
        </else>
    </case>
</var-def>

Also, is there any way to manipulate a defined variable to exclude certain parts of a string, such as symbols and numbers?

Recommended answer

I ran into the same problem as you: the example from the official Web-Harvest (WH) user manual does not work because of the pair of empty single quotes ('').

As a workaround, I use: variable.toString().length() > 0

Here is your code:

<var-def name="googleResults">
    <xpath expression="//div[@id='center_col']//div[@id='search']//div[@id='ires']//ol/li/div//b/div/text()">
        <html-to-xml>
            <http url="http://google.com/shopping?q=asus laptops&amp;hl=en"/>
        </html-to-xml>
    </xpath>
</var-def>

<var-def name="productTruth">
    <case>
        <if condition="${googleResults.toString().length() > 0}">
            <var name="googleResults"/>
        </if>
        <else>
            <xpath expression="//div[@id='center_col']//div[@id='search']//div[@id='ires']//ol/li/div//b/text()">
                <html-to-xml>
                    <http url="http://google.com/shopping?q=asus laptops&amp;hl=en"/>
                </html-to-xml>
            </xpath>
        </else>
    </case>
</var-def>

Also, a few notes on your code in general:

1) Downloading the page is actually the most time- and memory-consuming part of Web-Harvest. If the first XPath does not collect the information you want, you end up downloading the page again (re-running the http request). Save the result of the http request in a variable, and you can then query it again without repeating the download. This also limits the number of times you hit the source server, which becomes an issue if you have multiple pages to scrape.

    <var-def name="pagetext">
        <html-to-xml>
            <http url="http://google.com/shopping?q=asus laptops&amp;hl=en"/>
        </html-to-xml>
    </var-def>

    <var-def name="googleResults">
        <xpath expression="//div[@id='center_col']//div[@id='search']//div[@id='ires']//ol/li/div//b/div/text()">
            <var name="pagetext"/>
        </xpath>
    </var-def>

    <var-def name="productTruth">
        <case>
            <if condition="${googleResults.toString().length() > 0}">
                <var name="googleResults"/>
            </if>
            <else>
                <xpath expression="//div[@id='center_col']//div[@id='search']//div[@id='ires']//ol/li/div//b/text()">
                    <var name="pagetext"/>
                </xpath>
            </else>
        </case>
    </var-def>

2) You can avoid the whole conditional by changing the XPath:

//div[@id='center_col']//div[@id='search']//div[@id='ires']//ol/li/div//b/descendant-or-self::text()

    <var-def name="pagetext">
        <html-to-xml>
            <http url="http://google.com/shopping?q=asus laptops&amp;hl=en"/>
        </html-to-xml>
    </var-def>

    <var-def name="googleResults">
        <xpath expression="//div[@id='center_col']//div[@id='search']//div[@id='ires']//ol/li/div//b/descendant-or-self::text()">
            <var name="pagetext"/>
        </xpath>
    </var-def>
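
To see why the single expression can stand in for both of the original ones, here is a small sketch. The markup below is hypothetical (it is not copied from the real results page, and the item text is a placeholder); it only illustrates the two shapes that the two original expressions were targeting.

    <!-- Hypothetical markup for illustration only -->
    <ol>
        <!-- Shape 1: the text sits in a div inside b; this is what //b/div/text() selects -->
        <li><div><b><div>first product</div></b></div></li>
        <!-- Shape 2: the text sits directly inside b; this is what //b/text() selects -->
        <li><div><b>second product</b></div></li>
    </ol>

Applied to markup like this, //ol/li/div//b/descendant-or-self::text() selects every text node at any depth under each b element, so both "first product" and "second product" come back from one query. One thing to watch for: if a b element holds text both directly and inside a nested div, all of those text nodes are returned, which may be more than the original two-step check would have given you.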
