BeautifulSoup returns some weird text for the <a> tag
Question
I'm new to web scraping and I'm trying to scrape data from this auction website. However, I ran into a weird problem when trying to get the text of an anchor tag.
Here is the HTML:
<div class="mt50">
<div class="head_011">
<a id="item_event_title" href="https://www.storyltd.com/auction/auction.aspx?eid=4158">NO RESERVE AUCTION OF MODERN AND CONTEMPORARY ART (16-17 APRIL 2019)</a>
</div>
</div>
Here is my code:
auction_info = LTD_work_soup.find('a', id = 'item_event_title').text
print(auction_info)
This prints "Back To Auction Catalogue" instead of "NO RESERVE AUCTION OF MODERN AND CONTEMPORARY ART (16-17 APRIL 2019)", which is what I expect.
Here is the link to the page.
Thanks.
Recommended answer
Here is how you can extract "NO RESERVE AUCTION OF MODERN AND CONTEMPORARY ART (16-17 APRIL 2019)" from the web page:
from bs4 import BeautifulSoup
import requests

page_link = 'https://www.storyltd.com/auction/item.aspx?eid=4158&&lotno=2'
page_response = requests.get(page_link, timeout=5)
page_content = BeautifulSoup(page_response.content, "html.parser")
# The auction title is stored in the value attribute of this input tag
print(page_content.find("input", attrs={"id": "hdnAuctionTitle"}).attrs['value'])
Output:
NO RESERVE AUCTION OF MODERN AND CONTEMPORARY ART (16-17 APRIL 2019)
When you inspect page_content, you will find that this text is stored in the value attribute of an input tag.
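To try the lookup without hitting the site, here is a minimal offline sketch. SAMPLE_HTML is a hypothetical stand-in for what requests might receive: the anchor text and the type="hidden" attribute are assumptions based on the output reported in the question, not a copy of the real page. The helper name extract_auction_title is also made up for illustration. It prefers the input tag's value and falls back to the anchor text when the input is missing.

```python
from bs4 import BeautifulSoup

# Hypothetical sample: assumed shape of the HTML the server returns,
# where the anchor holds a generic label and the title sits in an input tag.
SAMPLE_HTML = """
<div class="head_011">
  <a id="item_event_title" href="https://www.storyltd.com/auction/auction.aspx?eid=4158">Back To Auction Catalogue</a>
</div>
<input type="hidden" id="hdnAuctionTitle" value="NO RESERVE AUCTION OF MODERN AND CONTEMPORARY ART (16-17 APRIL 2019)">
"""


def extract_auction_title(html):
    """Return the auction title, preferring the input tag over the anchor."""
    soup = BeautifulSoup(html, "html.parser")

    # First choice: the value attribute of the input tag.
    hidden = soup.find("input", id="hdnAuctionTitle")
    if hidden is not None and hidden.get("value"):
        return hidden["value"].strip()

    # Fallback: whatever text the anchor tag contains.
    anchor = soup.find("a", id="item_event_title")
    if anchor is not None:
        return anchor.get_text(strip=True)

    # Neither element was found.
    return None


print(extract_auction_title(SAMPLE_HTML))
```

Checking the result of find for None before reading an attribute avoids an AttributeError or KeyError if the page layout changes.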
Hope this helps!