Empty list returned by response.css(...).extract() in Scrapy


Question

I'm new to Scrapy and I have to crawl a webpage for a test. I ran the commands below in a terminal, but the selector returns an empty list and I don't understand why. When I use the same command on another website, such as Amazon, with the right selector, it works. Can someone shed some light on this? Thank you so much.

scrapy shell "https://www.woolworths.com.au/shop/browse/drinks/cordials-juices-iced-teas/iced-teas"

response.css('.tileList-title').extract()

Answer

First, looking at the page source, you seem to want to scrape the title Iced Teas, which sits in an <h1> header tag. Is that right?

Second, I ran a couple of scrapy shell sessions to understand the issue. It appears to come down to the User-Agent request header. Compare the two sessions below:

Without setting the user agent

scrapy shell https://www.woolworths.com.au/shop/browse/drinks/cordials-juices-iced-teas/iced-teas
In [1]: response.css('.tileList-title').extract()                               
Out[1]: []
view(response) #open the given response in your local web browser, for inspection.

With the user agent set

scrapy shell https://www.woolworths.com.au/shop/browse/drinks/cordials-juices-iced-teas/iced-teas -s USER_AGENT='Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)'

In [1]: response.css('.tileList-title').extract()                               
Out[1]: ['<h1 class="tileList-title" ng-if="$ctrl.listTitle" tabindex="-1">Iced Teas</h1>']
#now as you can see it does not return an empty list.
view(response)

So, for future reference: you can pass any setting to a scrapy shell session with -s SETTING_NAME=value; the available setting names are listed in the Scrapy settings documentation. Also use view(response) to check that the request returned the content you expected, even when the status is 200. In my experience, the page that Scrapy receives (and sometimes even its source code) differs slightly from what a normal browser shows, so checking with this shortcut is good practice. The available shortcuts are listed in the Scrapy shell documentation and are also printed at the start of every scrapy shell session.
