Get xpath() to return empty values

Question

I have a situation where I have a lot of <b> tags:

<b>12</b>
<b>13</b>
<b>14</b>
<b></b>
<b>121</b>

As you can see, the second-to-last tag is empty. When I call:

sel.xpath('b/text()').extract()

it gives me:

['12', '13', '14', '121']

I would like to have:

['12', '13', '14', '', '121']

Is there a way to get the empty value?

My current workaround is to call:

sel.xpath('b').extract()

and then parse each HTML tag myself (the empty tags are present in that output, which is what I want).

Recommended answer

This is a case where it is fine to strip the tags manually and keep the text. You can use the remove_tags() function provided by w3lib (https://github.com/scrapy/w3lib):

>>> from w3lib.html import remove_tags
>>> [remove_tags(html) for html in sel.xpath('//b').extract()]
['12', '13', '14', '', '121']

Note that w3lib is a Scrapy dependency and is used internally. No need to install it separately.
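
The reason b/text() skips the empty tag is that it selects text nodes, and an empty <b> element has no text node to select. Iterating over the elements themselves yields one value per element instead. The sketch below illustrates that idea with the standard library's xml.etree.ElementTree rather than a Scrapy selector, purely so that it runs without extra dependencies; the <root> wrapper and the variable names are assumptions added for the example and are not part of the original answer.

```python
import xml.etree.ElementTree as ET

# Sample markup from the question, wrapped in a hypothetical <root> element
# so that it parses as a single XML document.
markup = "<root><b>12</b><b>13</b><b>14</b><b></b><b>121</b></root>"
root = ET.fromstring(markup)

# Iterate over the <b> elements rather than their text nodes. An element
# without text has .text set to None, which is mapped to "" here.
values = [b.text or "" for b in root.iter("b")]
print(values)
```

With a Scrapy selector, the analogous approach would presumably be to loop over sel.xpath('//b') and extract a string from each node, so that empty elements still contribute an entry.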

It would also be better to use Scrapy Input and Output Processors here. Keep using sel.xpath('b') and define an input processor that strips the tags. For example, you can define one for specific Fields of an Item class:

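# Recent Scrapy versions take MapCompose from the itemloaders package;
# older releases exposed it as scrapy.loader.processors or
# scrapy.contrib.loader.processor.
from itemloaders.processors import MapCompose
from scrapy.item import Item, Field
from w3lib.html import remove_tags

class MyItem(Item):
    # Strip HTML tags from every value loaded into this field
    my_field = Field(input_processor=MapCompose(remove_tags))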
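
To make the role of the input processor more concrete, here is a rough, dependency-free sketch of what such a processor does to a list of extracted values. Both strip_tags and apply_to_each are hypothetical stand-ins written for this illustration: the first is a naive regex substitute for w3lib's remove_tags, and the second is a simplification of MapCompose (the real one also flattens list results and drops None values).

```python
import re


def strip_tags(html):
    """Naive stand-in for remove_tags: drop anything that looks like a tag."""
    return re.sub(r"<[^>]+>", "", html)


def apply_to_each(*functions):
    """Hypothetical simplification of MapCompose.

    Returns a processor that runs every input value through each
    function in turn, keeping one output per input.
    """
    def processor(values):
        for func in functions:
            values = [func(value) for value in values]
        return values
    return processor


# Chain two steps: remove the tags, then trim surrounding whitespace.
clean = apply_to_each(strip_tags, str.strip)
result = clean(["<b>12</b>", "<b></b>", "<b> 121 </b>"])
print(result)
```

The empty <b></b> element survives as an empty string because each input value is transformed individually rather than filtered out.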
