Python: Scrapy CSV exports incorrectly?
I am simply trying to write to a CSV file. However, I have two separate for-statements, so the data from each one is exported independently and the row order breaks. Suggestions?
def parse(self, response):
    hxs = HtmlXPathSelector(response)
    titles = hxs.select('//td[@class="title"]')
    subtext = hxs.select('//td[@class="subtext"]')
    items = []
    for title in titles:
        item = HackernewsItem()
        item["title"] = title.select("a/text()").extract()
        item["url"] = title.select("a/@href").extract()
        items.append(item)
    for score in subtext:
        item = HackernewsItem()
        item["score"] = score.select("span/text()").extract()
        items.append(item)
    return items
As the attached CSV image shows, the output of the second for-statement is written below the other rows instead of among them, in the same rows as the titles and URLs.
The full file is on GitHub: https://github.com/nchlswtsn/scrapy/blob/master/items.csv
The order in your CSV file follows the order in which you export the items: first you export all the titles, then all the subtext elements.
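To make that concrete, here is a minimal sketch that uses only the standard library. It is not Scrapy's exporter: `csv.DictWriter` stands in for it, plain dicts stand in for `HackernewsItem`, and the sample titles and scores are made up. Pairing by position with `zip` is a simplification for the demo, not the approach suggested in this answer.

```python
import csv
import io

FIELDS = ["title", "url", "score"]


def export_rows(items):
    """Write one CSV row per item, leaving missing fields blank."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, restval="", lineterminator="\n")
    writer.writeheader()
    for item in items:
        writer.writerow(item)
    return buffer.getvalue()


# Hypothetical sample data standing in for the scraped page.
titles = [("Story A", "http://a.example"), ("Story B", "http://b.example")]
scores = ["10 points", "20 points"]

# Two loops, as in the question: title/url items first, then score-only items.
separate_items = [{"title": t, "url": u} for t, u in titles]
separate_items += [{"score": s} for s in scores]

# One loop: every item carries all three fields.
combined_items = [
    {"title": t, "url": u, "score": s} for (t, u), s in zip(titles, scores)
]

separate_csv = export_rows(separate_items)
combined_csv = export_rows(combined_items)
print(separate_csv)
print(combined_csv)
```

In the first output the scores sit in their own rows under the titles, as in the question; in the second each row holds a title, a URL and a score.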
I guess you are trying to scrape HN articles. Here is my suggestion:
def parse(self, response):
    hxs = HtmlXPathSelector(response)
    titles = hxs.select('//td[@class="title"]')
    items = []
    for title in titles:
        item = HackernewsItem()
        item["title"] = title.select("a/text()").extract()
        item["url"] = title.select("a/@href").extract()
        item["score"] = title.select('../td[@class="subtext"]/span/text()').extract()
        items.append(item)
    return items
I didn't test it, but it will give you an idea.
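Since the code above is untested, it is worth checking where the subtext cell sits relative to the title cell. The sketch below assumes a hypothetical, simplified layout in which the subtext cell is in the row after the title row. If the real page looks like that, `../td[@class="subtext"]` would match nothing, and an expression such as `../following-sibling::tr[1]/td[@class="subtext"]/span/text()` might be needed instead. The sketch uses `xml.etree.ElementTree` rather than Scrapy selectors so it runs on its own; `SAMPLE` and `parse_rows` are illustrative names.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup: each story uses two rows,
# a title row followed by a subtext row. This layout is an assumption.
SAMPLE = """
<table>
  <tr><td class="title"><a href="http://a.example">Story A</a></td></tr>
  <tr><td class="subtext"><span>10 points</span></td></tr>
  <tr><td class="title"><a href="http://b.example">Story B</a></td></tr>
  <tr><td class="subtext"><span>20 points</span></td></tr>
</table>
"""


def parse_rows(markup):
    """Build one dict per story by pairing each title row with the row after it."""
    rows = ET.fromstring(markup.strip()).findall("tr")
    items = []
    for index, row in enumerate(rows):
        link = row.find("td[@class='title']/a")
        if link is None:
            continue  # not a title row
        item = {"title": link.text, "url": link.get("href"), "score": None}
        # Look for the score in the following row, if there is one.
        if index + 1 < len(rows):
            span = rows[index + 1].find("td[@class='subtext']/span")
            if span is not None:
                item["score"] = span.text
        items.append(item)
    return items


items = parse_rows(SAMPLE)
for item in items:
    print(item)
```

Either way, the point is the same as in the answer: fill title, URL and score into one item per story inside a single loop, so each story becomes one CSV row.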