ResultSet object has no attribute 'get'
Problem description
Hi, I'm currently trying to scrape this SEC link, https://www.sec.gov/ix?doc=/Archives/edgar/data/1090727/000109072720000003/form8-kq42019earningsr.htm, with BeautifulSoup to get the link containing "UPS":
pressting = soup3.find_all("a", string="UPS")
linkkm = pressting.get('href')
print(linkkm)
But when I do this, I get this error:
Traceback (most recent call last):
  File "C:\Users\Admin\AppData\Local\Programs\Python\Python36\SEC.py", line 55, in <module>
    print('Price: ' + str(edgar()))
  File "C:\Users\Admin\AppData\Local\Programs\Python\Python36\SEC.py", line 46, in edgar
    linkkm = pressting.get('href')
  File "C:\Users\Admin\AppData\Local\Programs\Python\Python36\lib\site-packages\bs4\element.py", line 2081, in __getattr__
    "ResultSet object has no attribute '%s'. You're probably treating a list of elements like a single element. Did you call find_all() when you meant to call find()?" % key
AttributeError: ResultSet object has no attribute 'get'. You're probably treating a list of elements like a single element. Did you call find_all() when you meant to call find()?
My expected result is to extract the href and then print it. Any help would be appreciated.
Recommended answer
Basically, the page is rendered dynamically via JavaScript once it loads, so you will not be able to parse the elements until you render the page first. The requests module does not execute JavaScript, so the HTML it returns does not contain the links you are looking for.
You can use Selenium to achieve that. Alternatively, you can use HTMLSession from the requests_html module to render the page on the fly.
from selenium import webdriver
from selenium.webdriver.firefox.options import Options
from bs4 import BeautifulSoup
import re
from time import sleep

# Run Firefox headless so that no browser window opens.
options = Options()
options.add_argument('--headless')
driver = webdriver.Firefox(options=options)

driver.get("https://www.sec.gov/ix?doc=/Archives/edgar/data/1090727/000109072720000003/form8-kq42019earningsr.htm")
sleep(1)  # crude wait to give the JavaScript time to render the page
soup = BeautifulSoup(driver.page_source, 'html.parser')

# Print the href of every link whose style attribute starts with "text".
for item in soup.find_all("a", style=re.compile("^text")):
    print(item.get("href"))

driver.quit()
Output:
https://www.sec.gov/Archives/edgar/data/1090727/000109072720000003/exhibit991-q42019earni.htm
https://www.sec.gov/Archives/edgar/data/1090727/000109072720000003/exhibit992-q42019finan.htm
However, if you want just the first URL:
url = soup.find("a", style=re.compile("^text")).get("href")
print(url)
Output:
https://www.sec.gov/Archives/edgar/data/1090727/000109072720000003/exhibit991-q42019earni.htm
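As for the AttributeError itself: find_all() returns a ResultSet, which behaves like a list of tags, so .get() has to be called on each element (or on the single tag that find() returns) rather than on the ResultSet. The sketch below illustrates this on a small static snippet. The HTML and its href values are made up for illustration and are not taken from the SEC page. It also assumes that "containing UPS" means a substring match, which needs a regular expression, because string="UPS" only matches link text that is exactly "UPS".

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the rendered page (not the real SEC HTML).
HTML = """
<p>
  <a href="/exhibit-a.htm">UPS Releases 4Q 2019 Earnings</a>
  <a href="/exhibit-b.htm">Financial statements</a>
  <a href="/exhibit-c.htm">UPS</a>
</p>
"""

soup = BeautifulSoup(HTML, "html.parser")

# find_all() returns a ResultSet (a list of tags), so it has no .get() method.
matches = soup.find_all("a", string=re.compile("UPS"))
try:
    matches.get("href")
    raised = False
except AttributeError:
    raised = True  # same error as in the question

# Option 1: loop over the ResultSet and read href from each tag.
hrefs = [a.get("href") for a in matches]

# Option 2: find() returns a single tag (or None if nothing matches).
first = soup.find("a", string=re.compile("UPS"))
first_href = first.get("href") if first is not None else None

# For comparison: string="UPS" matches only link text that is exactly "UPS".
exact = [a.get("href") for a in soup.find_all("a", string="UPS")]

print(hrefs)
print(first_href)
print(exact)
```

On the real page this only helps once the HTML has been rendered, since the links are not present in the raw response.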