'list' object has no attribute 'timeout'


Problem description

I am trying to download PDFs from a page using urllib.request.urlopen, but it returns an error: 'list' object has no attribute 'timeout':

import csv
import urllib.request

from bs4 import BeautifulSoup

def get_hansard_data(page_url):
    #Read base_url into Beautiful soup Object
    html = urllib.request.urlopen(page_url).read()
    soup = BeautifulSoup(html, "html.parser")
    #grab <div class="itemContainer"> that hold links and dates to all hansard pdfs
    hansard_menu = soup.find_all("div","itemContainer")

    #Get all hansards
    #write to a tsv file
    with open("hansards.tsv","a") as f:
        fieldnames = ("date","hansard_url")
        output = csv.writer(f, delimiter="\t")

        for div in hansard_menu:
            hansard_link = [HANSARD_URL + div.a["href"]]
            hansard_date = div.find("h3", "catItemTitle").string

            #download
            
            with urllib.request.urlopen(hansard_link) as response:
                data = response.read()
                r = open("/Users/Parliament Hansards/"+hansard_date +".txt","wb")
                r.write(data)
                r.close()

            print(hansard_date)
            print(hansard_link)
            output.writerow([hansard_date,hansard_link])
        print("Done Writing File")
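As the error message suggests, urlopen is being handed a list somewhere. The failure can be reproduced without any network access, since urlopen assigns a timeout attribute to its argument before opening a connection whenever that argument is not a string (a minimal sketch; the URL is just a placeholder):

```python
import urllib.request

# Passing a list instead of a URL string reproduces the error
# immediately, before any network connection is attempted
try:
    urllib.request.urlopen(["http://example.com/hansard.pdf"])
except AttributeError as e:
    print(e)  # 'list' object has no attribute 'timeout'
```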

Solution

A bit late, but this might still be helpful to someone else (if not for the topic starter). I found the solution while solving the same problem.

The problem was that page_url (in your case) was a list rather than a string. The reason is most likely that page_url comes from argparse.parse_args() (at least it was in my case). Doing page_url[0] should work, but it is not nice to do that inside the def get_hansard_data(page_url) function. It would be better to check the type of the parameter and return an appropriate error to the function's caller if the type does not match.
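For instance, when an argument is declared with nargs, argparse collects even a single value into a list (a hypothetical sketch of how such a list can arise; the argument name and URL are placeholders):

```python
import argparse

parser = argparse.ArgumentParser()
# nargs="+" tells argparse to collect one or more values into a list,
# so args.page_url is a list even when a single URL is given
parser.add_argument("page_url", nargs="+")

args = parser.parse_args(["http://example.com/hansards"])
print(type(args.page_url))   # <class 'list'>
print(args.page_url[0])      # the plain URL string that urlopen expects
```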

The type of an argument can be checked by calling type(page_url) and comparing the result, for example: type("") == type(page_url). I am sure there is a more elegant way to do that, but it is out of scope for this particular question.
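A sketch of such a guard, using isinstance (the more idiomatic spelling of the same comparison); the body of the original function is elided here:

```python
def get_hansard_data(page_url):
    # Fail fast with a clear message instead of letting urlopen
    # raise the puzzling "'list' object has no attribute 'timeout'"
    if not isinstance(page_url, str):
        raise TypeError(
            "page_url must be a str, got %s" % type(page_url).__name__
        )
    # ... rest of the function unchanged ...
    return page_url  # placeholder so the sketch runs standalone
```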
