Python: Scrapy CSV exports incorrectly?


Problem description

I am simply trying to write to a CSV file. However, I have two separate for loops, so the data from each loop is exported independently and the row order breaks. Suggestions?

def parse(self, response):
    hxs = HtmlXPathSelector(response)
    titles = hxs.select('//td[@class="title"]')
    subtext = hxs.select('//td[@class="subtext"]')
    items = []
    for title in titles:
        item = HackernewsItem()
        item["title"] = title.select("a/text()").extract()
        item["url"] = title.select("a/@href").extract()
        items.append(item)
    for score in subtext:
        item = HackernewsItem()
        item["score"] = score.select("span/text()").extract()
        items.append(item)
    return items

As the image below shows, the output of the second for loop is written below the other rows instead of "among" them, as the header is.

CSV image attached:

GitHub link to the full file: https://github.com/nchlswtsn/scrapy/blob/master/items.csv

Solution

The order of the rows in your CSV file follows directly from the order in which you export the items: first you export all the titles, then all the subtext elements.
I guess you are trying to scrape HN articles; here is my suggestion:

def parse(self, response):
    hxs = HtmlXPathSelector(response)
    titles = hxs.select('//td[@class="title"]')
    items = []
    # A single loop: each title cell produces exactly one complete item.
    for title in titles:
        item = HackernewsItem()
        item["title"] = title.select("a/text()").extract()
        item["url"] = title.select("a/@href").extract()
        # Relative XPath: go up to the title cell's parent, then look for its subtext cell.
        item["score"] = title.select('../td[@class="subtext"]/span/text()').extract()
        items.append(item)
    return items

I didn't test it, but it should give you the idea: build one complete item per title inside a single loop.
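
To see why the row order breaks, here is a minimal sketch that uses only the standard library `csv` module rather than Scrapy's own exporter. The sample titles, URLs and scores are made up, and `to_csv` is a hypothetical helper, not part of Scrapy. It assumes that every exported item becomes one CSV row and that a field missing from an item is written as an empty cell.

```python
import csv
import io

FIELDS = ["title", "url", "score"]


def to_csv(items):
    """Write one CSV row per item; fields missing from an item become empty cells."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(items)
    return buf.getvalue()


# Made-up sample data standing in for the scraped values (assumption).
titles = [
    ("First story", "http://example.com/1"),
    ("Second story", "http://example.com/2"),
]
scores = ["10 points", "25 points"]

# Two loops, as in the question: partial items with title/url only,
# followed by partial items with score only.
split_items = [{"title": t, "url": u} for t, u in titles]
split_items += [{"score": s} for s in scores]

# One loop, as in the answer: one complete item per story.
joined_items = [
    {"title": t, "url": u, "score": s} for (t, u), s in zip(titles, scores)
]

split_csv = to_csv(split_items)
joined_csv = to_csv(joined_items)

print(split_csv)   # score rows appear below the title rows, with empty cells
print(joined_csv)  # each row carries title, url and score together
```

Pairing with `zip` only works when both lists have the same length and order, which a real page may not guarantee. That is the reason the answer looks up the score relative to each title element instead of collecting two independent lists.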
