urllib.error.HTTPError: HTTP Error 403: Forbidden

Views: 6126 | Published: 2018/7/10 11:13:49 | Tags: python, http, urllib


Question

I get the error "urllib.error.HTTPError: HTTP Error 403: Forbidden" when scraping certain pages, and I understand that adding something like hdr = {'User-Agent': 'Mozilla/5.0'} to the request headers is the solution.

However, I can't make it work when the URLs I'm trying to scrape are in a separate source file. How and where can I add the User-Agent to the code below?

from bs4 import BeautifulSoup
import urllib.request as urllib2
import time

list_open = open("source-urls.txt")
read_list = list_open.read()
line_in_list = read_list.split("\n")

i = 0
for url in line_in_list:
    soup = BeautifulSoup(urllib2.urlopen(url).read(), 'html.parser')
    name = soup.find(attrs={'class': "name"})
    description = soup.find(attrs={'class': "description"})
    for text in description:
        print(name.get_text(), ';', description.get_text())
#        time.sleep(5)
    i += 1

Thanks :)

Recommended answer

You can achieve the same using requests:

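import requests
from bs4 import BeautifulSoup

# Browser-like User-Agent; some sites answer 403 to a client's default string.
hdrs = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36'}

# Read the URLs from the separate source file, skipping blank lines.
with open("source-urls.txt") as list_open:
    line_in_list = [line.strip() for line in list_open if line.strip()]

for url in line_in_list:
    resp = requests.get(url, headers=hdrs)
    soup = BeautifulSoup(resp.content, 'html.parser')
    name = soup.find(attrs={'class': "name"})
    description = soup.find(attrs={'class': "description"})
    # Print one line per page; skip pages where either element is missing.
    if name and description:
        print(name.get_text(), ';', description.get_text())
    # time.sleep(5)  # optional pause between requests (needs `import time`)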
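
If you would rather stay with urllib, as in the question, the same header can be attached by wrapping each URL in a urllib.request.Request before passing it to urlopen. The following is a minimal sketch, not a definitive implementation: the helper names build_request and fetch are hypothetical, and the short 'Mozilla/5.0' value is an assumed User-Agent that may need to be replaced with a fuller browser string for some sites.

```python
import urllib.request
import urllib.error

# Assumed header value; any browser-like User-Agent string could be used instead.
HDRS = {'User-Agent': 'Mozilla/5.0'}


def build_request(url, headers=HDRS):
    """Wrap a URL in a Request object that carries the custom headers."""
    return urllib.request.Request(url, headers=headers)


def fetch(url):
    """Return the page body as bytes, or None if the server answers with an HTTP error."""
    try:
        with urllib.request.urlopen(build_request(url)) as resp:
            return resp.read()
    except urllib.error.HTTPError as err:
        # A 403 can still occur if the site blocks on more than the User-Agent.
        print(url, '->', err.code, err.reason)
        return None
```

In the question's loop, urllib2.urlopen(url).read() would then become fetch(url), with a check for None before the result is handed to BeautifulSoup.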

Hope it helps!
