Scrapy repeating rows

Views: 41 | Published: 2021/7/16 22:25:19 | Tags: python-3.x, xpath, web-scraping, scrapy


Question

I'm trying to scrape this site: https://www.tahko.com/fi/menovinkit/?ql=tapahtumat. In particular, I'm trying to scrape the 3 tables on the page.

I have already selected the tables with

tables = response.xpath('//*[@class="table table-stripefd"]')

Then I'd like to get each of the rows of the tables, which I did with

rows = tables.xpath('//tr')

The problem is that, after scraping and printing out some of the data, I noticed that some rows have multiple entries. For example, the event "Tahko vuorijuoksu" shows up on the website once, but in my scraped data I have 3 instances of it.

Could anyone point out why this is happening?

Recommended answer

When you use the selector like this:

rows = tables.xpath('//tr')

The leading // makes the expression absolute: it is evaluated from the document root along the descendant-or-self axis, so it matches every tr element on the page, no matter which table it is called on. It therefore returns all 207 tr elements once for each of the 3 table elements, which is why rows show up repeatedly.

To get only the tr elements inside each table, make the expression relative to the current node:

rows = tables.xpath('.//tr') # notice the .

It is often more intuitive to write it like this:

for table in tables:
    # 'tr' matches direct children only; use './/tr' if the rows sit inside <tbody> or <thead>
    rows = table.xpath('tr')

This is just a suggestion; the previous solution works fine.
