How to print the href attributes using beautifulsoup while automating through selenium?


Question

The blue element's href value is what I want to access from this HTML.

I tried a few ways to print the link, but none of them worked.

My code below:

discover_page = BeautifulSoup(r.text, 'html.parser')

finding_accounts = discover_page.find_all("a", class_="author track")
print(len(finding_accounts))

finding_accounts = discover_page.find_all('a[class="author track"]')
print(len(finding_accounts))

accounts = discover_page.select('a', {'class': 'author track'})['href']
print(len(accounts))

Output:
0
0
TypeError: 'dict' object is not callable
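For reference, the three attempts fail for different reasons: the first form is valid `find_all()` syntax (it matches the exact class string), the second passes a CSS selector string to `find_all()`, which treats it as a literal tag name, and the third passes an attribute dict as `select()`'s second positional argument, which `select()` does not accept. A minimal sketch against a hypothetical snippet mirroring the page's markup:

```python
from bs4 import BeautifulSoup

# Hypothetical snippet standing in for the Society6 markup.
html = '<div><a class="author track" href="/cafelab">cafelab</a></div>'
soup = BeautifulSoup(html, 'html.parser')

# find_all() matches the multi-valued class via class_ (exact string match):
print(len(soup.find_all('a', class_='author track')))      # 1
# A CSS selector string must go to select(), not find_all():
print(len(soup.select('a.author.track')))                  # 1
# The href lives on each matched Tag, not on the result list:
print([a['href'] for a in soup.select('a.author.track')])  # ['/cafelab']
```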

The URL of the webpage is https://society6.com/discover, but the URL changes to https://society6.com/society?show=2 after logging into my account.

What am I doing wrong here?

Note: I am using the Selenium Chrome browser here. The answer given here works in my terminal, but not when I run the file.

My full code:

from selenium import webdriver
import time
import requests
from bs4 import BeautifulSoup
import lxml

driver = webdriver.Chrome()
driver.get("https://society6.com/login?done=/")
username = driver.find_element_by_id('email')
username.send_keys("exp4money@gmail.com")
password = driver.find_element_by_id('password')
password.send_keys("sultan1997")
driver.find_element_by_name('login').click()

time.sleep(5)

driver.find_element_by_link_text('My Society').click()
driver.find_element_by_link_text('Discover').click()

time.sleep(5)

r = requests.get(driver.current_url)
r.raise_for_status()

'''discover_page = BeautifulSoup(r.html.raw_html, 'html.parser')

finding_accounts = discover_page.find_all("a", class_="author track")
print(len(finding_accounts))

finding_accounts = discover_page.find_all('a[class="author track"]')
print(len(finding_accounts))


links = []
for a in discover_page.find_all('a', class_ = 'author track'): 
        links.append(a['href'])
        #links.append(a.get('href'))

print(links)'''

#discover_page.find_all('a')

links = []
for a in discover_page.find_all("a", attrs = {"class": "author track"}): 
        links.append(a['href'])
        #links.append(a.get('href'))

print(links)

#soup.find_all("a", attrs = {"class": "author track"})'''

soup = BeautifulSoup(r.content, "lxml")
a_tags = soup.find_all("a", attrs={"class": "author track"})

for a in soup.find_all('a',{'class':'author track'}):
    print('https://society6.com'+a['href'])

The code from the documentation is what I was experimenting with.
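A likely culprit in the full script is `requests.get(driver.current_url)`: requests opens a brand-new HTTP session that carries none of Selenium's login cookies, so BeautifulSoup ends up parsing a logged-out page with no `author track` links in it (hence the empty results). A hedged sketch of a fix is to feed Selenium's rendered DOM (`driver.page_source`) to BeautifulSoup instead; the HTML string below stands in for that page source:

```python
from bs4 import BeautifulSoup

# Stand-in for driver.page_source; in the real script you would pass the
# rendered, logged-in DOM from Selenium rather than re-fetching the URL
# with requests, which starts a fresh unauthenticated session.
page_source = '''
<ul>
  <li><a class="author track" href="/cafelab">cafelab</a></li>
  <li><a class="author track" href="/beeple">beeple</a></li>
</ul>
'''
soup = BeautifulSoup(page_source, 'html.parser')
links = ['https://society6.com' + a['href']
         for a in soup.select('a.author.track')]
print(links)
```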

Solution

As per your question, to print the href attributes from the desired elements you can use Selenium alone with the following solution:

  • Code Block:

    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    
    options = Options()
    options.add_argument("start-maximized")
    options.add_argument("disable-infobars")
    options.add_argument("--disable-extensions")
    options.add_argument("--disable-gpu")
    options.add_argument("--no-sandbox")
    driver = webdriver.Chrome(chrome_options=options, executable_path=r'C:\WebDrivers\ChromeDriver\chromedriver_win32\chromedriver.exe')
    driver.get("https://society6.com/login?done=/")
    WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "input#email"))).send_keys("exp4money@gmail.com")
    driver.find_element_by_css_selector("input#password").send_keys("sultan1997")
    driver.find_element_by_css_selector("button[name='login']").click()
    WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "a#nav-user-my-society>span"))).click()
    WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.LINK_TEXT, "Discover"))).click()
    hrefs_elements = WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "a.author.track")))
    for element in hrefs_elements:
        print(element.get_attribute("href"))
    

  • Console Output:

    https://society6.com/pivivikstrm
    https://society6.com/cafelab
    https://society6.com/cafelab
    https://society6.com/colorandcolor
    https://society6.com/83oranges
    https://society6.com/aftrdrk
    https://society6.com/alaskanmommabear
    https://society6.com/thindesign
    https://society6.com/colorandcolor
    https://society6.com/aftrdrk
    https://society6.com/aljahorvat
    https://society6.com/bribuckley
    https://society6.com/hennkim
    https://society6.com/franciscomffonseca
    https://society6.com/83oranges
    https://society6.com/nadja1
    https://society6.com/beeple
    https://society6.com/absentisdesigns
    https://society6.com/alexandratarasoff
    https://society6.com/artdekay880
    https://society6.com/annaki
    https://society6.com/cafelab
    https://society6.com/bribuckley
    https://society6.com/bitart
    https://society6.com/draw4you
    https://society6.com/cafelab
    https://society6.com/beeple
    https://society6.com/burcukorkmazyurek
    https://society6.com/absentisdesigns
    https://society6.com/deanng
    https://society6.com/beautifulhomes
    https://society6.com/aftrdrk
    https://society6.com/printsproject
    https://society6.com/bluelela
    https://society6.com/anipani
    https://society6.com/ecmazur
    https://society6.com/batkei
    https://society6.com/menchulica
    https://society6.com/83oranges
    https://society6.com/7115
    

