将scrapy输出中的元素列表分成单独的行 [英] divide list of elements in scrapy output into seperate rows
问题描述
我试图将 Scrapy 的输出分成 Excel 文件中的单独行,但我得到了这样的结果
I am trying to separate the output from Scrapy into separate lines in an Excel file but I get something like this
换句话说,变体 ID、价格和名称的每个输出都应该放在 Excel 中的单独行中.
In other words each output from variant id, price and name should be in placed in seperate lines in Excel.
我使用scrapy-xlsx 0.1.1 库将输出导出到xlsx 文件(不能在csv 中).
I am using scrapy-xlsx 0.1.1 library to export output to xlsx file (it cannot be in csv).
请告诉我问题出在哪里.
Please tell me where is the issue.
import scrapy
from ..items import ZooplusItem
import re
class ZooplusDeSpider(scrapy.Spider):
name = 'zooplus_de'
allowed_domains = ['zooplus.de']
start_urls = ['https://www.zooplus.de/shop/hunde/hundefutter_trockenfutter/diaetfutter']
def parse(self, response):
for link in response.css('.MuiGrid-root.MuiGrid-container.MuiGrid-spacing-xs-2.MuiGrid-justify-xs-flex-end'):
items = ZooplusItem()
redirect_urls = response.request.meta.get('redirect_urls')
items['url'] = link.redirect_urls[0] if redirect_urls else response.request.url
items['product_url'] = link.css('.MuiGrid-root.product-image a::attr(href)').getall()
items['title'] = link.css('h3 a::text').getall()
items['id'] = link.css('h3 a::attr(id)').getall()
items['review'] = link.css('span.sc-fzoaKM.kVcaXm::text').getall()
items['review'] = re.sub(r'\D', " ", str(items['review']))
items['review'] = items['review'].replace(" ", "")
#items['review'] = int(items['review'])
items['rate'] = len(link.css('a.v3-link i[role=full-star]'))
items['variant_id'] = [i.strip().split('/n') for i in link.css('.jss114.jss115::text').extract()]
items['variant_name'] = [i.strip().split('/n') for i in link.css('.sc-fzqARJ.cHdpSy:not(.jss114.jss115)::text').extract()]
items['variant_price'] = [i.strip().split('/n') for i in link.css('div.product__prices_col meta::attr(content)').extract()]
yield items
推荐答案
如果您想存储所有具有重复公共信息的变体,那么您需要遍历每个变体并分别产生它们.您可以复制已收集的常用信息并添加到其中.
If you want to store all the variants with common information duplicated, then you need to loop through each variant and yield that separately. You can copy the common information you've already collected and add to that.
总结替换
items['variant_id'] = [i.strip().split('/n') for i in link.css('.jss114.jss115::text').extract()]
items['variant_name'] = [i.strip().split('/n') for i in link.css('.sc-fzqARJ.cHdpSy:not(.jss114.jss115)::text').extract()]
items['variant_price'] = [i.strip().split('/n') for i in link.css('div.product__prices_col meta::attr(content)').extract()]
yield item
类似的东西
for i in link.css("[data-zta='product-variant']"):
variant = items.copy()
variant["variant_id"] = i.attrib["data-variant-id"]
variant["variant_name"] = "".join(i.css(".title > div::text").getall()).strip()
variant['variant_price'] = i.css("[itemprop='price']::attr(content)").get()
yield variant
这篇关于将scrapy输出中的元素列表分成单独的行的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!