normalize-space only works with XPath, not with CSS selectors
Question
I am extracting data using Scrapy and Python.
The data sometimes contains extra whitespace. I was using normalize-space with XPath to remove it, like this:
xpath('normalize-space(.//li[2]/strong/text())').extract()
This works very well. However, I now want to use normalize-space with a CSS selector.
I tried:
car['Location'] = site.css('normalize-space(div[class=location]::text)').extract()
I got an empty result, although I get the correct result if I remove normalize-space.
How can I use it with a CSS selector?
def normalize_whitespace(str):
    import re
    str = str.strip()
    str = re.sub(r'\s+', ' ', str)
    return str
I call the function like this:
car['Location'] = normalize_whitespace(site.css('div[class=location]::text').extract())
But I got an empty result. Why?
Recommended answer
Unfortunately, XPath functions are not available with CSS selectors in Scrapy.
You could first translate your div[class=location]::text CSS selector to the equivalent XPath expression and then wrap it in normalize-space() as input to .xpath().
In any case, since you are only interested in the final "whitespace-normalized" string, you could achieve the same result by applying a Python function to the output of the CSS selector's extract().
See, for example, http://snipplr.com/view/50410/normalize-whitespace/ :
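import re

def normalize_whitespace(text):
    # Trim leading and trailing whitespace first.
    text = text.strip()
    # Then collapse every internal run of whitespace into a single space.
    return re.sub(r'\s+', ' ', text)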
If you include this function somewhere in your Scrapy project, you could use it like this:
car['Location'] = normalize_whitespace(
    u''.join(site.css('div[class=location]::text').extract()))
or
car['Location'] = normalize_whitespace(
    site.css('div[class=location]::text').extract()[0])
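Both variants work because extract() returns a list of strings, whereas normalize_whitespace expects a single string. That is why the list is either joined or indexed before the call. The second variant raises an IndexError when the selector matches nothing. A minimal sketch of a wrapper that handles that case follows; normalized_text and its default parameter are hypothetical names, not part of Scrapy.

```python
import re

def normalize_whitespace(text):
    # Trim the ends, then collapse each whitespace run into one space.
    return re.sub(r'\s+', ' ', text.strip())

def normalized_text(values, default=None):
    # Hypothetical helper: 'values' is the list returned by .extract().
    # Join all text nodes, normalize them, and fall back to 'default'
    # when nothing matched or only whitespace was found.
    joined = normalize_whitespace(u''.join(values))
    return joined or default
```

It could then be called as car['Location'] = normalized_text(site.css('div[class=location]::text').extract()).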