normalize-space works only with XPath, not with CSS selectors

Question

I am extracting data using Scrapy and Python.

The data sometimes includes extra whitespace. I was using normalize-space with XPath to remove it, like this:

xpath('normalize-space(.//li[2]/strong/text())').extract()

It works very well. However, now I want to use normalize-space with a CSS selector.

I tried:

car['Location'] = site.css('normalize-space(div[class=location]::text)').extract()

I got an empty result, although I get the correct result if I remove the normalize-space.

How can I use it with a CSS selector?

import re

def normalize_whitespace(text):
    text = text.strip()
    text = re.sub(r'\s+', ' ', text)
    return text

I call the function like this:

car['Location'] = normalize_whitespace(site.css('div[class=location]::text').extract())

But I got an empty result. Why?

Recommended answer

Unfortunately, XPath functions are not available with CSS selectors in Scrapy.

You could first translate your div[class=location]::text CSS selector to the equivalent XPath expression and then wrap it in normalize-space() as input to .xpath().

Anyhow, since you are only interested in the final whitespace-normalized string, you can achieve the same result by applying a Python function to the output of the CSS selector's .extract().

See, for example, http://snipplr.com/view/50410/normalize-whitespace/:

import re

def normalize_whitespace(text):
    text = text.strip()
    text = re.sub(r'\s+', ' ', text)
    return text

If you include this function somewhere in your Scrapy project, you could use it like this:

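    # Join all matched text nodes into one string, then normalize it
    car['Location'] = normalize_whitespace(
        u''.join(site.css('div[class=location]::text').extract()))

    # Or normalize only the first matched text node
    # (extract()[0] raises IndexError if nothing matched)
    car['Location'] = normalize_whitespace(
        site.css('div[class=location]::text').extract()[0])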
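
The question's last attempt passes the list returned by `.extract()` directly to `normalize_whitespace`, which expects a single string; in plain Python a list has no `.strip()` method, so that call fails with `AttributeError`. Below is a stdlib-only sketch of a variant that accepts either a string or a list of strings. The `extracted` list is a hypothetical stand-in for real Scrapy output, and this helper is an illustration, not part of Scrapy.

```python
import re


def normalize_whitespace(value):
    """Collapse runs of whitespace to a single space and strip both ends.

    Accepts a single string or a list/tuple of strings (the shape that
    Scrapy's .extract() returns); list items are joined with a space first.
    """
    if isinstance(value, (list, tuple)):
        # Joining with ' ' (not '') keeps words from adjacent text nodes apart;
        # the extra spaces are collapsed by the substitution below.
        value = ' '.join(value)
    return re.sub(r'\s+', ' ', value).strip()


# Hypothetical stand-in for site.css('div[class=location]::text').extract()
extracted = ['\n    New   York,', '\n    NY  \n']

print(normalize_whitespace(extracted))     # New York, NY
print(normalize_whitespace(extracted[0]))  # New York,
```

One difference from the XPath function: for `str` patterns, Python's `\s` also matches Unicode whitespace such as the non-breaking space (`\xa0`), whereas XPath 1.0 `normalize-space()` only treats space, tab, carriage return and line feed as whitespace, so the two can disagree on text containing `&nbsp;`.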
