Scrapy empty output
Problem description
I am trying to use Scrapy to extract data from a page, but I get an empty output. What is the problem?
spider:
import scrapy


class Ratemds(scrapy.Spider):
    name = 'ratemds'
    allowed_domains = ['ratemds.com']
    custom_settings = {
        'USER_AGENT': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36 OPR/60.0.3255.50747 OPRGX/60.0.3255.50747',
    }

    def start_requests(self):
        # Request a single doctor profile page.
        yield scrapy.Request(
            'https://www.ratemds.com/doctor-ratings/dr-aaron-morrow-md-greensboro-nc-us',
            callback=self.profile,
        )

    def profile(self, response):
        # Extract the profile image URL and the doctor's name.
        item = {
            'url': response.request.url,
            'Image': response.css('.doctor-profile-image::attr(src)').get(),
            'First_and_Last_Name': response.css('h1::text').get(),
        }
        yield item
output:
{'url': 'https://www.ratemds.com/doctor-ratings/dr-aaron-morrow-md-greensboro-nc-us', 'Image': None, 'First_and_Last_Name': None}
Solution
The problem is that this website has captcha protection. When you try to collect information from it, you are redirected to a captcha page instead of the profile page, and that page does not contain the information you are looking for. To collect information from such a website, you can try the following:
- Use scrapy-selenium or Splash to collect the information.
- Use a captcha-solving tool such as death-by-captcha, anticaptcha, or similar.
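Before switching tools, it can help to confirm that the spider really received a captcha page. The following is only a minimal sketch: the helper name looks_like_captcha and the marker strings are hypothetical and not taken from the site, so inspect response.text in your own run and adjust them.

```python
# Hypothetical markers: the real captcha page may use different wording,
# so check what response.text actually contains and adjust this tuple.
CAPTCHA_MARKERS = ("captcha", "are you a human")


def looks_like_captcha(url, body):
    """Guess whether a downloaded page is a captcha page.

    Returns True when any marker appears in the final URL or in the body.
    """
    haystack = (url + " " + body).lower()
    return any(marker in haystack for marker in CAPTCHA_MARKERS)


# Inside the spider callback this could be used as:
#     if looks_like_captcha(response.url, response.text):
#         self.logger.warning("Captcha page served at %s", response.url)
#         return
```

The check looks at the body as well as the URL because a captcha page may be served at the requested URL without an HTTP redirect. When Scrapy does follow a redirect, the earlier URLs are available in response.meta.get('redirect_urls').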