Scrapy with dynamic captcha


Problem description

I'm trying to break a captcha in a form on a website, but the captcha is dynamic: it doesn't have a URL; instead, it has something like this:

src="captcha?accion=image"

What is the best option here? I have read about using middlewares or something similar. I also know it can be done with Selenium, Splash or another browser driver (by taking a screenshot), but I want to do it with just Scrapy, if that's possible.
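The src in the question is not actually missing a URL: captcha?accion=image is a relative URL that a browser resolves against the address of the page containing the form. A minimal standard-library sketch of that resolution follows; the page URL and markup are hypothetical, and the img name "imagen" is borrowed from the answer's code. In Scrapy, Response.urljoin performs the same resolution against response.url, and with the default cookie middleware a follow-up request for that URL is sent with the cookies from the same crawl. That matters if the server ties each generated image to a session, which is an assumption here, not something the question confirms.

```python
from html.parser import HTMLParser
from typing import Optional
from urllib.parse import urljoin


class CaptchaSrcFinder(HTMLParser):
    """Records the src of the first <img name="..."> with the given name."""

    def __init__(self, img_name: str = "imagen") -> None:
        super().__init__()
        self.img_name = img_name
        self.src: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag == "img" and attr_map.get("name") == self.img_name and self.src is None:
            self.src = attr_map.get("src")


def captcha_url(page_url: str, html: str, img_name: str = "imagen") -> Optional[str]:
    """Return the absolute captcha URL, or None if the <img> is not found."""
    finder = CaptchaSrcFinder(img_name)
    finder.feed(html)
    if finder.src is None:
        return None
    # A relative src such as "captcha?accion=image" resolves against the page URL.
    return urljoin(page_url, finder.src)


# Hypothetical page URL and markup, for illustration only.
page = "https://example.com/app/login"
markup = '<form><img name="imagen" src="captcha?accion=image"></form>'
print(captcha_url(page, markup))  # https://example.com/app/captcha?accion=image
```

Because the relative path replaces the last segment of the page path, the same src yields a different absolute URL on a different page, which is why it looks like it "has no URL" when read out of context.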

Solution

Here's a complete solution to bypass this captcha using Selenium, anti-captcha and PIL.

Because the captcha is generated dynamically, we need to take a screenshot of the img element that contains it. To do that, we use save_screenshot() and PIL to crop <img name="imagen"... and save it to disk (captcha.png).
We then submit captcha.png to anti-captcha, which returns the solution:

from PIL import Image
from python_anticaptcha import AnticaptchaClient, ImageToTextTask
from selenium import webdriver
from selenium.webdriver.common.by import By

def get_captcha(driver):
    captcha_fn = "captcha.png"
    # name attribute of the <img> element containing the captcha image
    element = driver.find_element(By.NAME, "imagen")
    location = element.location  # {'x': ..., 'y': ...}
    size = element.size          # {'width': ..., 'height': ...}
    # screenshot of the visible page; the captcha is cropped out of it below
    # (on HiDPI screens the screenshot may be scaled by window.devicePixelRatio)
    driver.save_screenshot("temp.png")

    left = location['x']
    top = location['y']
    right = left + size['width']
    bottom = top + size['height']

    im = Image.open("temp.png")
    im = im.crop((int(left), int(top), int(right), int(bottom)))
    im.save(captcha_fn)

    # request the anti-captcha service to decode the captcha
    api_key = 'XXXXXXXXXXXXXXXXXXXXXXXXXX'  # api key -> https://anti-captcha.com/
    client = AnticaptchaClient(api_key)
    with open(captcha_fn, 'rb') as captcha_fp:
        task = ImageToTextTask(captcha_fp)
        job = client.createTask(task)
    job.join()  # wait until the service has solved the captcha
    return job.get_captcha_text()

start_url = "YOU KNOW THE URL"
driver = webdriver.Chrome()
driver.get(start_url)
captcha = get_captcha(driver)
print(captcha)


Output:

ifds


[Image: captcha.png]
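The crop step turns Selenium's element.location and element.size (both in CSS pixels) into the (left, top, right, bottom) box that PIL's Image.crop expects. The sketch below isolates that arithmetic in a hypothetical crop_box helper. Its scale parameter is an addition not present in the answer: on HiDPI displays the screenshot can be larger than the CSS layout by a factor of window.devicePixelRatio, and an unscaled box would then cut out the wrong region.

```python
from typing import Mapping, Tuple


def crop_box(
    location: Mapping[str, float],
    size: Mapping[str, float],
    scale: float = 1.0,
) -> Tuple[int, int, int, int]:
    """Return the (left, top, right, bottom) box for PIL's Image.crop.

    location and size have the shape of Selenium's element.location and
    element.size. scale is an assumed device-pixel-ratio factor
    (1.0 means the screenshot and the CSS layout use the same pixels).
    """
    left = location["x"] * scale
    top = location["y"] * scale
    right = (location["x"] + size["width"]) * scale
    bottom = (location["y"] + size["height"]) * scale
    # round() on a float returns an int, which is what Image.crop expects
    return (round(left), round(top), round(right), round(bottom))


print(crop_box({"x": 10, "y": 20}, {"width": 100, "height": 40}))             # (10, 20, 110, 60)
print(crop_box({"x": 10, "y": 20}, {"width": 100, "height": 40}, scale=2.0))  # (20, 40, 220, 120)
```

In the answer's script this could be used roughly as im.crop(crop_box(element.location, element.size, driver.execute_script("return window.devicePixelRatio"))). Selenium's WebElement.screenshot(filename) may also be an option, since it captures just the element and skips the manual crop.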


Notes:

  • Use it at your own risk (be smart);
  • You can improve the code by handling exceptions properly;
  • anti-captcha is a paid service ($0.50 per 1,000 images);
  • I'm not affiliated with anti-captcha.
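On the exception-handling point, one common pattern is to retry the solving step a few times before giving up. The with_retries helper below is a hypothetical, generic sketch, not part of python_anticaptcha or Selenium. It deliberately does not assume which exceptions the anti-captcha client raises; the caller passes the exception types to retry on through retry_on.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_retries(
    func: Callable[[], T],
    attempts: int = 3,
    delay: float = 1.0,
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
) -> T:
    """Call func(); on an exception listed in retry_on, wait and try again.

    After the last failed attempt the exception is re-raised unchanged.
    Exceptions not listed in retry_on propagate immediately.
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: propagate the last error
            time.sleep(delay)
    raise AssertionError("unreachable")


# Hypothetical flaky solver that fails twice, then succeeds.
calls = {"n": 0}

def flaky_solver() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("solver not ready")
    return "abcd"

print(with_retries(flaky_solver, attempts=3, delay=0))  # abcd
```

In the answer's script, the call to get_captcha would be wrapped in a zero-argument lambda and passed as func, with retry_on narrowed to the errors you actually expect (network errors, solver timeouts) so that genuine bugs still surface immediately.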
