Divide list of elements in Scrapy output into separate rows


Problem description

I am trying to separate the output from Scrapy into separate lines in an Excel file but I get something like this

In other words, each output for variant id, price and name should be placed in a separate row in Excel.

I am using the scrapy-xlsx 0.1.1 library to export the output to an xlsx file (it cannot be in csv).
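For context, scrapy-xlsx is enabled through a custom feed exporter in the project's settings.py; a minimal sketch, assuming the exporter path documented in the library's README:

```python
# settings.py -- register the xlsx feed exporter provided by scrapy-xlsx.
# The dotted path below is the one documented by the library; adjust it
# if your installed version exposes a different name.
FEED_EXPORTERS = {
    "xlsx": "scrapy_xlsx.XlsxItemExporter",
}
```

The spider output is then written with something like `scrapy crawl zooplus_de -o output.xlsx`.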

Please tell me where the issue is.

import re

import scrapy

from ..items import ZooplusItem


class ZooplusDeSpider(scrapy.Spider):
    name = 'zooplus_de'
    allowed_domains = ['zooplus.de']
    start_urls = ['https://www.zooplus.de/shop/hunde/hundefutter_trockenfutter/diaetfutter']

    def parse(self, response):
        for link in response.css('.MuiGrid-root.MuiGrid-container.MuiGrid-spacing-xs-2.MuiGrid-justify-xs-flex-end'):
            items = ZooplusItem()
            redirect_urls = response.request.meta.get('redirect_urls')
            items['url'] = redirect_urls[0] if redirect_urls else response.request.url
            items['product_url'] = link.css('.MuiGrid-root.product-image a::attr(href)').getall()
            items['title'] = link.css('h3 a::text').getall()
            items['id'] = link.css('h3 a::attr(id)').getall()

            items['review'] = link.css('span.sc-fzoaKM.kVcaXm::text').getall()
            items['review'] = re.sub(r'\D', " ", str(items['review']))
            items['review'] = items['review'].replace(" ", "")
            #items['review'] = int(items['review'])

            items['rate'] = len(link.css('a.v3-link i[role=full-star]'))
            items['variant_id'] = [i.strip().split('\n') for i in link.css('.jss114.jss115::text').extract()]
            items['variant_name'] = [i.strip().split('\n') for i in link.css('.sc-fzqARJ.cHdpSy:not(.jss114.jss115)::text').extract()]
            items['variant_price'] = [i.strip().split('\n') for i in link.css('div.product__prices_col meta::attr(content)').extract()]

            yield items

Answer

If you want to store all the variants with the common information duplicated, then you need to loop through each variant and yield each one separately. You can copy the common information you've already collected and add the variant-specific fields to it.

In summary, replace

items['variant_id'] = [i.strip().split('\n') for i in link.css('.jss114.jss115::text').extract()]
items['variant_name'] = [i.strip().split('\n') for i in link.css('.sc-fzqARJ.cHdpSy:not(.jss114.jss115)::text').extract()]
items['variant_price'] = [i.strip().split('\n') for i in link.css('div.product__prices_col meta::attr(content)').extract()]

yield items

with something like

for i in link.css("[data-zta='product-variant']"):
    variant = items.copy()
    variant["variant_id"] = i.attrib["data-variant-id"]
    variant["variant_name"] = "".join(i.css(".title > div::text").getall()).strip()
    variant['variant_price'] = i.css("[itemprop='price']::attr(content)").get()
 
    yield variant
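The copy-and-yield pattern above can be demonstrated outside Scrapy with plain dicts. A minimal sketch, using hypothetical product data (the field values are invented for illustration, not taken from the site):

```python
# Demonstrates yielding one row per variant, with the common
# product fields duplicated into every row.
def expand_variants(product):
    # Everything except the variants list is shared by all rows.
    common = {k: v for k, v in product.items() if k != "variants"}
    for variant in product["variants"]:
        row = dict(common)   # copy the shared fields
        row.update(variant)  # add the variant-specific fields
        yield row

# Hypothetical product with two variants.
product = {
    "title": "Diet dry food",
    "url": "https://www.zooplus.de/...",
    "variants": [
        {"variant_id": "1", "variant_name": "2 kg", "variant_price": "9.99"},
        {"variant_id": "2", "variant_name": "12 kg", "variant_price": "39.99"},
    ],
}

rows = list(expand_variants(product))
# Each row now holds the common fields plus one variant's fields,
# so each one exports as its own line in the xlsx file.
```

This is the same idea as the answer's loop: the exporter writes one spreadsheet row per yielded item, so yielding per variant (instead of one item holding lists) is what splits the output into separate rows.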

