Scraping in Python - Preventing IP ban
Problem description
I am using Python to scrape pages. So far I haven't had any complicated issues.
The site that I'm trying to scrape uses a lot of security checks and has some mechanism to prevent scraping.
Using Requests and lxml I was able to scrape about 100-150 pages before getting banned by IP. Sometimes I even got banned on the first request (a new IP, not used before, in a different C block). I have tried spoofing headers and randomizing the time between requests; the result is still the same.
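A minimal sketch of the Requests + lxml approach described above, with spoofed headers and a randomized delay between requests. The URL list and the XPath expression are hypothetical stand-ins, since the actual site is not named.

```python
import random
import time

import requests
from lxml import html

# Hypothetical URL list -- the actual site is not named in the question.
urls = [f"https://example.com/page/{i}" for i in range(1, 151)]

session = requests.Session()
session.headers.update({
    # Spoofed headers mimicking a common desktop browser.
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
        "AppleWebKit/537.36 (KHTML, like Gecko) "
        "Chrome/120.0.0.0 Safari/537.36"
    ),
    "Accept-Language": "en-US,en;q=0.9",
})

for url in urls:
    response = session.get(url, timeout=30)
    tree = html.fromstring(response.content)
    # Placeholder extraction; the real XPath depends on the target pages.
    print(url, tree.xpath("//title/text()"))
    # Randomized pause between requests (the exact range is an assumption).
    time.sleep(random.uniform(3, 5))
```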
I have tried Selenium and got much better results. With Selenium I was able to scrape about 600-650 pages before getting banned. Here I also randomized the delay between requests (between 3-5 seconds) and made a time.sleep(300) call on every 300th request. Despite that, I'm still getting banned.
From this I can conclude that the site has some mechanism that bans an IP if it requests more than X pages in one open browser session, or something like that.
Based on your experience, what else should I try? Would closing and reopening the browser in Selenium help (for example, closing and reopening the browser after every 100th request)? I was thinking about trying proxies, but there are about a million pages, so it would be very expensive.
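For reference, the "close and reopen the browser" idea could look like the sketch below: restart the Selenium session every 100 requests so that cookies and other per-session state are discarded. The URL list is again a hypothetical stand-in.

```python
import random
import time

from selenium import webdriver

# Hypothetical URL list -- the actual site is not named in the question.
urls = [f"https://example.com/page/{i}" for i in range(1, 651)]

driver = webdriver.Firefox()
try:
    for count, url in enumerate(urls, start=1):
        # Restart the browser every 100 requests so cookies and other
        # per-session state are discarded before the next batch.
        if count > 1 and count % 100 == 1:
            driver.quit()
            driver = webdriver.Firefox()
        driver.get(url)
        print(count, driver.title)
        time.sleep(random.uniform(3, 5))
finally:
    driver.quit()
```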
Recommended answer
If you would switch to the Scrapy web-scraping framework, you would be able to reuse a number of things that were made to prevent and tackle banning:
- AutoThrottle extension: an extension for automatically throttling the crawling speed based on the load of both the Scrapy server and the website you are crawling (see the settings sketch after this list).
- Rotating user agents with the scrapy-fake-useragent middleware: use a random User-Agent provided by fake-useragent on every request.
- Rotating IP addresses: you can also run it via a local proxy & TOR (the spider sketch after this list shows per-request proxying).
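A sketch of how these pieces might be wired together. The AutoThrottle options below are standard Scrapy settings, and the middleware path follows the scrapy-fake-useragent README; the project name is a placeholder.

```python
# settings.py -- project name is a placeholder.
BOT_NAME = "myscraper"

# Base delay plus AutoThrottle, which adapts the delay to server load.
DOWNLOAD_DELAY = 3
AUTOTHROTTLE_ENABLED = True
AUTOTHROTTLE_START_DELAY = 5
AUTOTHROTTLE_MAX_DELAY = 60
AUTOTHROTTLE_TARGET_CONCURRENCY = 1.0

DOWNLOADER_MIDDLEWARES = {
    # Disable the stock User-Agent middleware and let scrapy-fake-useragent
    # pick a random User-Agent for every request.
    "scrapy.downloadermiddlewares.useragent.UserAgentMiddleware": None,
    "scrapy_fake_useragent.middleware.RandomUserAgentMiddleware": 400,
}
```

For the local proxy & TOR route, one common setup is TOR exposing a SOCKS port with an HTTP proxy such as Privoxy in front of it (port 8118 below is Privoxy's default, assumed here to be already running); Scrapy's built-in HttpProxyMiddleware then honours a per-request "proxy" meta key:

```python
# spider.py -- a hypothetical spider routing requests through a local
# proxy (e.g. Privoxy forwarding to TOR); the address is an assumption.
import scrapy


class PageSpider(scrapy.Spider):
    name = "pages"
    start_urls = ["https://example.com/page/1"]  # placeholder URL

    def start_requests(self):
        for url in self.start_urls:
            # Scrapy's built-in HttpProxyMiddleware honours this meta key.
            yield scrapy.Request(url, meta={"proxy": "http://127.0.0.1:8118"})

    def parse(self, response):
        yield {"url": response.url, "title": response.css("title::text").get()}
```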