'list' object has no attribute 'timeout'
Problem description
I am trying to download PDFs from a page using `urllib.request.urlopen`, but it returns an error: `'list' object has no attribute 'timeout'`:
```python
import csv
import urllib.request

from bs4 import BeautifulSoup

# HANSARD_URL (the site's base URL) is defined elsewhere in the script.


def get_hansard_data(page_url):
    # Read base_url into a BeautifulSoup object
    html = urllib.request.urlopen(page_url).read()
    soup = BeautifulSoup(html, "html.parser")
    # Grab the <div class="itemContainer"> elements that hold links and dates to all hansard PDFs
    hansard_menu = soup.find_all("div", "itemContainer")
    # Get all hansards and write them to a TSV file
    with open("hansards.tsv", "a") as f:
        fieldnames = ("date", "hansard_url")
        output = csv.writer(f, delimiter=" ")
        for div in hansard_menu:
            hansard_link = [HANSARD_URL + div.a["href"]]
            hansard_date = div.find("h3", "catItemTitle").string
            # Download
            with urllib.request.urlopen(hansard_link) as response:
                data = response.read()
                r = open("/Users/Parliament Hansards/" + hansard_date + ".txt", "wb")
                r.write(data)
                r.close()
            print(hansard_date)
            print(hansard_link)
            output.writerow([hansard_date, hansard_link])
    print("Done Writing File")
```
Solution
A bit late, but this might still help someone else (if not the topic starter). I found the solution while solving the same problem myself.
The problem was that `page_url` (in your case) was a list rather than a string. The most likely reason is that `page_url` comes from `argparse.parse_args()` (at least, that was so in my case).
Doing `page_url[0]` should work, but it is not nice to do that inside the `get_hansard_data(page_url)` function. It would be better to check the type of the parameter and raise an appropriate error to the function's caller if the type does not match.
The type of an argument can be checked by calling `type(page_url)` and comparing the result, for example: `type("") == type(page_url)`. I am sure there is a more elegant way to do that, but it is outside the scope of this particular question.
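A sketch of the type check recommended above, written with `isinstance`, which is the usual Python idiom for this (it also accepts subclasses of `str`). The helper name `fetch_page` is hypothetical. It also lets `urllib.request.Request` objects through, since `urlopen` accepts those as well. Raising `TypeError` up front replaces urllib's confusing `AttributeError` with a message that names the bad argument.

```python
import urllib.request


def fetch_page(page_url):
    """Hypothetical helper: download page_url and return the response body.

    Accepts a URL string or a urllib.request.Request, as urlopen does,
    and rejects anything else (such as a list) with a clear TypeError.
    """
    if not isinstance(page_url, (str, urllib.request.Request)):
        raise TypeError(
            "page_url must be a str or urllib.request.Request, "
            f"got {type(page_url).__name__}: {page_url!r}"
        )
    with urllib.request.urlopen(page_url) as response:
        return response.read()


# A URL wrapped in a list is now rejected before urllib is ever called.
rejected_message = None
try:
    fetch_page(["http://example.com/"])
except TypeError as exc:
    rejected_message = str(exc)
print(rejected_message)
```

With the check inside the function, the caller stays responsible for unwrapping a one-element list (for example with `page_url[0]`) before calling it, which keeps the indexing out of the function as the answer suggests.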