Get xpath() to return empty values
Problem Description
I have a situation where I have a lot of <b> tags:
<b>12</b>
<b>13</b>
<b>14</b>
<b></b>
<b>121</b>
As you can see, the second-to-last tag is empty. When I call:
sel.xpath('b/text()').extract()
I get:
['12', '13', '14', '121']
I would like to get:
['12', '13', '14', '', '121']
Is there a way to get the empty value?
My current workaround is to call:
sel.xpath('b').extract()
and then parse each HTML tag myself (the empty tags are included in that output, which is what I want).
Recommended Answer
This is a case where it is fine to strip the tags manually and take the text. You can use the remove_tags() function provided by w3lib (https://github.com/scrapy/w3lib):
>>> from w3lib.html import remove_tags
>>> # In Python 3, map() returns a lazy iterator, so build the list explicitly
>>> [remove_tags(b) for b in sel.xpath('//b').extract()]
['12', '13', '14', '', '121']
Note that w3lib is a Scrapy dependency and is used by Scrapy internally, so there is no need to install it separately.
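The same per-element idea can be illustrated without Scrapy. The sketch below is a stand-in, not the answer's code: it assumes Python's standard-library xml.etree.ElementTree in place of Scrapy selectors and w3lib, and wraps the fragment in a hypothetical <root> element so it parses as one document. Selecting each <b> element, rather than its text() nodes, yields one entry per tag, so the empty tag becomes ''.

```python
import xml.etree.ElementTree as ET

# The fragment from the question. The wrapper <root> element is an
# assumption added only so the fragment parses as a single document.
fragment = "<b>12</b><b>13</b><b>14</b><b></b><b>121</b>"
root = ET.fromstring(f"<root>{fragment}</root>")

# An XPath such as b/text() only returns existing text nodes, so the empty
# <b> disappears. Iterating over the elements keeps one entry per <b>, and
# joining the text of an empty element gives ''.
texts = ["".join(b.itertext()) for b in root.findall("b")]
print(texts)  # ['12', '13', '14', '', '121']
```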
It would be even better to use Scrapy Input and Output Processors here. Keep using sel.xpath('b') and define an input processor, which runs when the item is populated through an ItemLoader. For example, you can define one for specific Fields of your Item class:
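# MapCompose lives in the itemloaders package in current Scrapy releases
# (very old releases exposed it as scrapy.contrib.loader.processor).
from itemloaders.processors import MapCompose
from scrapy.item import Item, Field
from w3lib.html import remove_tags

class MyItem(Item):
    # Each extracted '<b>...</b>' string is passed through remove_tags
    # when the field is loaded via an ItemLoader.
    my_field = Field(input_processor=MapCompose(remove_tags))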
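To show why such an input processor keeps the empty value, here is a simplified, hypothetical re-implementation of MapCompose's documented behavior: apply each function to every value, skip results that are None, and flatten list results. map_compose and strip_tags are illustrative names, not Scrapy or w3lib APIs, and the regex-based strip_tags is only adequate for simple fragments like the one in the question.

```python
import re
from typing import Callable, Iterable, List

# Hypothetical stand-in for w3lib's remove_tags: a naive regex that removes
# anything shaped like a tag. Fine for simple <b> fragments, not general HTML.
_TAG_RE = re.compile(r"<[^>]*>")


def strip_tags(html: str) -> str:
    return _TAG_RE.sub("", html)


def map_compose(*funcs: Callable) -> Callable[[Iterable], List]:
    """Simplified sketch of MapCompose semantics (hypothetical helper)."""
    def process(values: Iterable) -> List:
        current = list(values)
        for func in funcs:
            next_values = []
            for value in current:
                result = func(value)
                if result is None:
                    # None results are dropped from the chain...
                    continue
                if isinstance(result, (list, tuple)):
                    # ...and list results are flattened.
                    next_values.extend(result)
                else:
                    next_values.append(result)
            current = next_values
        return current
    return process


# Values as sel.xpath('b').extract() would return them for the question's HTML.
extracted = ["<b>12</b>", "<b>13</b>", "<b>14</b>", "<b></b>", "<b>121</b>"]
print(map_compose(strip_tags)(extracted))  # ['12', '13', '14', '', '121']
```

Because stripping the tags from '<b></b>' produces an empty string rather than None, the empty value survives the processor chain.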