Scrapy repeating rows
Question
I'm trying to scrape this site: https://www.tahko.com/fi/menovinkit/?ql=tapahtumat. In particular, I'm trying to scrape the 3 tables on the page.
I selected the tables with:
tables = response.xpath('//*[@class="table table-stripefd"]')
Then I'd like to get each of the rows of the tables, which I did with:
rows = tables.xpath('//tr')
The problem is that, after scraping and printing some of the data, I noticed that some rows appear multiple times. For example, the event "Tahko vuorijuoksu" shows up once on the website, but my scraped data contains 3 instances of it.
Could anyone point out why this is happening?
Recommended answer
When you use the selector like this:
rows = tables.xpath('//tr')
it selects every tr element in the whole document, not only the ones inside the current table. A path that begins with // is absolute: it is evaluated from the document root, regardless of the element it is called on. So for each of the 3 table elements it returns all 207 tr elements.
To get only the tr elements that are descendants of each table, make the path relative by prefixing it with a dot:
rows = tables.xpath('.//tr') # notice the .
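Iterating over the tables one at a time is usually more intuitive:
for table in tables:
    rows = table.xpath('tr')  # tr elements that are direct children of this table
This is only a suggestion; the previous solution works fine.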