Scrapy with dynamic captcha
I'm trying to break a captcha within a form on a website, but this captcha is dynamic: it doesn't have a URL; instead it has something like this:
src="captcha?accion=image"
What is the best option here? I have read about using middlewares or something similar. I also know it can be done with Selenium, Splash, or another browser driver (screenshot), but I want to do it with just Scrapy, if that's possible of course.
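A side note on the `src` value quoted above: `captcha?accion=image` is a relative URL, so it can be resolved against the address of the page that contains the form. The sketch below only illustrates that resolution step with the standard library; the page URL is hypothetical (the real one is not given), and whether the image fetched this way matches the one the form expects depends on how the site ties the captcha to the session.

```python
from urllib.parse import urljoin


def captcha_url(page_url: str, img_src: str) -> str:
    """Resolve a relative <img src> against the URL of the page containing it."""
    return urljoin(page_url, img_src)


# Hypothetical page URL, used only for illustration.
page_url = "https://example.com/app/form"
print(captcha_url(page_url, "captcha?accion=image"))
# https://example.com/app/captcha?accion=image
```

Inside a Scrapy callback, `response.urljoin(src)` performs the same resolution, and a `scrapy.Request` to the resulting URL passes through Scrapy's cookie middleware (enabled by default), so it is sent with the same session cookies as the form page.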
Here's a complete solution to bypass the specified captcha using anticaptcha and PIL.
Because this captcha is dynamic, we need to take a screenshot of the img element containing the captcha. For that we use Selenium's save_screenshot() and PIL to crop <img name="imagen"... and save it to disk (captcha.png).
We then submit captcha.png to anti-captcha, which returns the solution, i.e.:
from PIL import Image
from python_anticaptcha import AnticaptchaClient, ImageToTextTask
from selenium import webdriver
from selenium.webdriver.common.by import By


def get_captcha():
    captcha_fn = "captcha.png"
    # <img> element (name="imagen") that displays the captcha image;
    # find_element(By.NAME, ...) replaces find_element_by_name, which Selenium 4 removed
    element = driver.find_element(By.NAME, "imagen")
    location = element.location
    size = element.size
    driver.save_screenshot("temp.png")

    # crop box for PIL: (left, upper, right, lower)
    x = location['x']
    y = location['y']
    w = size['width']
    h = size['height']
    width = x + w
    height = y + h
    im = Image.open('temp.png')
    im = im.crop((int(x), int(y), int(width), int(height)))
    im.save(captcha_fn)

    # request anti-captcha service to decode the captcha
    api_key = 'XXXXXXXXXXXXXXXXXXXXXXXXXX'  # api key -> https://anti-captcha.com/
    client = AnticaptchaClient(api_key)
    # keep the file open until the task has been submitted, then close it
    with open(captcha_fn, 'rb') as captcha_fp:
        task = ImageToTextTask(captcha_fp)
        job = client.createTask(task)
    job.join()
    return job.get_captcha_text()


start_url = "YOU KNOW THE URL"
driver = webdriver.Chrome()
driver.get(start_url)
captcha = get_captcha()
print(captcha)
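One thing that can go wrong with the cropping step is a mismatch between the coordinates Selenium reports (CSS pixels) and the pixels in the saved screenshot, which may happen on high-DPI displays. The helper below is a minimal sketch of the crop-box arithmetic with an optional scale factor; `crop_box` and its `scale` parameter are hypothetical names that are not part of the original answer.

```python
def crop_box(location, size, scale=1.0):
    """Return (left, upper, right, lower) in screenshot pixels.

    location, size: dicts shaped like Selenium's element.location / element.size.
    scale: assumed ratio of screenshot pixels to CSS pixels (1.0 means no scaling).
    """
    left = location["x"] * scale
    upper = location["y"] * scale
    right = (location["x"] + size["width"]) * scale
    lower = (location["y"] + size["height"]) * scale
    return tuple(int(round(v)) for v in (left, upper, right, lower))


print(crop_box({"x": 10, "y": 20}, {"width": 100, "height": 40}))
# (10, 20, 110, 60)
print(crop_box({"x": 10, "y": 20}, {"width": 100, "height": 40}, scale=2))
# (20, 40, 220, 120)
```

The returned tuple can be passed straight to `Image.crop()`. In a live session the ratio could be read with `driver.execute_script("return window.devicePixelRatio")`. Selenium's `WebElement.screenshot(filename)` is another option that saves only the element and may make the manual crop unnecessary.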
Output:
ifds
captcha.png
Notes:
- Use it at your own risk (be smart);
- You can improve the code by handling exceptions properly;
- anticaptcha is a paid service (0.5$/1000 imgs);
- I'm not affiliated with anticaptcha.