Split on comma using Python and Scrapy


Problem description

I am using Scrapy to extract data from a website. One of the fields I extract returns both the city and the region. I want to split the returned value on the comma and store the first part in a city field and the second part in a region field. The code I am using to extract the data:

 loader.add_css('region','.seller-box__seller-address__label::text')

The output is a single column named region, with a value such as:

Elbląg, Warmińsko-mazurskie


The desired output is two columns: city with the value Elbląg, and region with the value Warmińsko-mazurskie.

Update:

Apparently the loader can take an additional argument for a regular expression. I was able to split the data by passing:

loader.add_css('region','.seller-box__seller-address__label::text',re='([^,]+)$')

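This keeps only the text after the comma and drops everything before it. The captured text still begins with the space that follows the comma, unless a processor strips it.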
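To see what that pattern captures, here is a small sketch using Python's standard `re` module directly rather than the Scrapy loader. The second pattern, `^([^,]+)`, is an assumption added for illustration (it is not in the original post) and picks out the part before the comma instead.

```python
import re

value = "Elbląg, Warmińsko-mazurskie"

# Pattern from the update: a run of non-comma characters anchored at the end.
region_match = re.search(r"([^,]+)$", value)

# Illustrative counterpart (assumption): a run of non-comma characters
# anchored at the start, which yields the city.
city_match = re.search(r"^([^,]+)", value)

# The raw region capture begins with the space after the comma, so strip it.
region = region_match.group(1).strip()
city = city_match.group(1).strip()

print(city)
print(region)
```

If the loader's `re` argument behaves like `re.search` here, passing the second pattern in a separate `add_css` call for the city field would presumably give the other half, but that should be checked against the Scrapy version in use.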

Recommended answer

I don't know whether the loader has a special method for splitting a value into two fields.

I would normally do it like this:

text = response.css('.seller-box__seller-address__label::text').extract_first().strip()

city, region = text.split(', ')

loader.add_value('city', city)
loader.add_value('region', region)
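The unpacking above raises an error if the text has no comma, has more than one, or if the selector matched nothing. A slightly more defensive variant might look like the sketch below; `split_city_region` is a hypothetical helper name, and falling back to an empty region is an assumption about what the spider should do in those cases.

```python
def split_city_region(text):
    """Split 'City, Region' into (city, region).

    Hypothetical helper: returns an empty region when there is no comma,
    and treats None (selector matched nothing) as an empty string.
    """
    # partition splits on the first comma only and never raises.
    city, _, region = (text or "").partition(",")
    return city.strip(), region.strip()


city, region = split_city_region("Elbląg, Warmińsko-mazurskie")
print(city, "|", region)
```

The two returned values can then be passed to `loader.add_value('city', city)` and `loader.add_value('region', region)` exactly as in the answer.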
