Can't get rid of blank rows in csv output


Problem Description


I've written a very tiny script in Python Scrapy to parse the names, streets and phone numbers displayed across multiple pages of the yellowpages website. When I run my script I find it works smoothly. However, the only problem I encounter is the way data gets scraped into the csv output: there is always a blank line (row) between two rows of data. What I mean is: data gets printed on every other row. If it were not for Scrapy, I could have used [newline='']. But unfortunately I am totally helpless here. How can I get rid of the blank lines in the csv output? Thanks in advance for taking a look.
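
For context, this is the standard-library trick the question refers to: when you control the open() call yourself, passing newline='' keeps the csv module's '\r\n' row terminators from being doubled on Windows. A minimal sketch (file name and row values are illustrative placeholders):

import csv

rows = [
    ['name', 'street', 'phone'],
    ['Example Pizza', '123 Example St', '555-0100'],  # placeholder values
]

# newline='' disables newline translation, so csv.writer's '\r\n'
# terminators are written once instead of becoming '\r\r\n' on Windows.
with open('out.csv', 'w', newline='') as f:
    csv.writer(f).writerows(rows)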


items.py includes:

import scrapy

class YellowpageItem(scrapy.Item):
    name = scrapy.Field()
    street = scrapy.Field()
    phone = scrapy.Field()

Here is the spider:

import scrapy

class YellowpageSpider(scrapy.Spider):
    name = "YellowpageSp"
    start_urls = ["https://www.yellowpages.com/search?search_terms=Pizza&geo_location_terms=Los%20Angeles%2C%20CA&page={0}".format(page) for page in range(2,6)]

    def parse(self, response):
        for titles in response.css('div.info'):
            name = titles.css('a.business-name span[itemprop=name]::text').extract_first()
            street = titles.css('span.street-address::text').extract_first()
            phone = titles.css('div[itemprop=telephone]::text').extract_first()
            yield {'name': name, 'street': street, 'phone':phone}


Here is what the csv output looks like (screenshot omitted; every data row is followed by a blank row):


Btw, the command I'm using to get csv output is:

scrapy crawl YellowpageSp -o items.csv -t csv
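
As a side note (assuming a reasonably recent Scrapy version), the -t csv flag should be redundant when the output file name already ends in .csv, since the feed format is inferred from the extension:

scrapy crawl YellowpageSp -o items.csv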

Recommended Answer


You can fix it by creating a new FeedExporter. Change your settings.py as below:

FEED_EXPORTERS = {
    'csv': 'project.exporters.FixLineCsvItemExporter',
}
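
(Note that project in the dotted path above stands for your actual Scrapy project package, i.e. the package that contains settings.py; adjust it accordingly.)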


Create an exporters.py in your project:

exporters.py

import io
import os
import six
import csv

from scrapy.exporters import CsvItemExporter  # scrapy.contrib.exporter is the old pre-1.0 path
from scrapy.extensions.feedexport import IFeedStorage
from w3lib.url import file_uri_to_path
from zope.interface import implementer


@implementer(IFeedStorage)
class FixedFileFeedStorage(object):

    def __init__(self, uri):
        self.path = file_uri_to_path(uri)

    def open(self, spider):
        # Make sure the target directory exists, then reopen the file in
        # binary append mode so the exporter below can rewrap it.
        dirname = os.path.dirname(self.path)
        if dirname and not os.path.exists(dirname):
            os.makedirs(dirname)
        return open(self.path, 'ab')

    def store(self, file):
        file.close()


class FixLineCsvItemExporter(CsvItemExporter):

    def __init__(self, file, include_headers_line=True, join_multivalued=',', **kwargs):
        super(FixLineCsvItemExporter, self).__init__(file, include_headers_line, join_multivalued, **kwargs)
        self._configure(kwargs, dont_fail=True)
        # Close the stream Scrapy opened, then reopen the file and wrap it
        # with newline="" so the csv writer's '\r\n' terminators are not
        # translated a second time (the source of the blank rows).
        self.stream.close()
        storage = FixedFileFeedStorage(file.name)
        file = storage.open(file.name)
        self.stream = io.TextIOWrapper(
            file,
            line_buffering=False,
            write_through=True,
            encoding=self.encoding,
            newline="",
        ) if six.PY3 else file
        self.csv_writer = csv.writer(self.stream, **kwargs)
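
Once the exporter is wired in, a quick sanity check (an illustrative snippet, not part of the original answer) is to re-run the crawl and read the file back; with newline="" in place the reader should yield no empty rows:

import csv

with open('items.csv', newline='') as f:
    rows = list(csv.reader(f))

# A lingering blank row would show up as an empty list here.
assert all(row for row in rows), "found blank rows"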

I am on a Mac, so I can't test its Windows behavior. But if the above doesn't work, change the part of the code below and set newline="\n". (The blank rows come from the csv module's '\r\n' row terminators being translated again by the text wrapper, so the key is controlling the newline argument.)

        self.stream = io.TextIOWrapper(
            file,
            line_buffering=False,
            write_through=True,
            encoding=self.encoding,
            newline="\n",
        ) if six.PY3 else file
