BeautifulSoup returns some weird text for the <a> tag

Question

I'm new to web scraping and I'm trying to scrape data from this auction website. However, I run into an odd problem when trying to get the text of the anchor tag.

Here is the HTML:

<div class="mt50">
  <div class="head_011">
    <a id="item_event_title" href="https://www.storyltd.com/auction/auction.aspx?eid=4158">NO RESERVE AUCTION OF MODERN AND CONTEMPORARY ART  (16-17 APRIL 2019)</a>
  </div>
</div>

Here is my code:

auction_info = LTD_work_soup.find('a', id='item_event_title').text
print(auction_info)

This prints "Back To Auction Catalogue" instead of "NO RESERVE AUCTION OF MODERN AND CONTEMPORARY ART (16-17 APRIL 2019)", which is what I expect.

Here is the link to the page.

Thanks.

Recommended answer

Here is how you can extract "NO RESERVE AUCTION OF MODERN AND CONTEMPORARY ART (16-17 APRIL 2019)" from the webpage:

from bs4 import BeautifulSoup
import requests

# Fetch the lot page and parse the raw HTML returned by the server
page_link = 'https://www.storyltd.com/auction/item.aspx?eid=4158&amp&lotno=2'
page_response = requests.get(page_link, timeout=5)
page_content = BeautifulSoup(page_response.content, "html.parser")

# The auction title is stored in the value attribute of a hidden input
print(page_content.find("input", attrs={"id": "hdnAuctionTitle"})["value"])

Output:

NO RESERVE AUCTION OF MODERN AND CONTEMPORARY ART  (16-17 APRIL 2019)

If you inspect page_content, you will find that this title appears in an input tag (id "hdnAuctionTitle") rather than in the anchor text.
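To see the difference without hitting the network, here is a minimal offline sketch. The RAW_HTML string is a hypothetical stand-in for what requests receives, built only from the ids and texts mentioned in this thread; the real page's markup may differ. One plausible explanation, which is an assumption and not confirmed here, is that the browser fills in the anchor text with JavaScript after the page loads, so the raw HTML still carries the placeholder text. The helper get_auction_title is an illustrative name, not part of BeautifulSoup.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the HTML that requests receives before any
# JavaScript runs; the real page's markup may differ.
RAW_HTML = """
<div class="head_011">
  <a id="item_event_title" href="#">Back To Auction Catalogue</a>
</div>
<input type="hidden" id="hdnAuctionTitle"
       value="NO RESERVE AUCTION OF MODERN AND CONTEMPORARY ART  (16-17 APRIL 2019)">
"""


def get_auction_title(html):
    """Return the auction title, preferring the hidden input over the anchor."""
    soup = BeautifulSoup(html, "html.parser")
    hidden = soup.find("input", id="hdnAuctionTitle")
    if hidden is not None and hidden.get("value"):
        # Collapse runs of whitespace such as the double space in the title.
        return " ".join(hidden["value"].split())
    # Fall back to the anchor text if the hidden input is missing.
    anchor = soup.find("a", id="item_event_title")
    return anchor.get_text(strip=True) if anchor is not None else None


soup = BeautifulSoup(RAW_HTML, "html.parser")
anchor_text = soup.find("a", id="item_event_title").get_text(strip=True)
title = get_auction_title(RAW_HTML)

print(anchor_text)  # text of the anchor as served in the raw HTML
print(title)        # title read from the hidden input
```

Checking for None before reading the value avoids an AttributeError if the site changes its markup. If the title is only ever produced by JavaScript on a given page, a browser-driven tool would be needed instead of requests.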

Hope this helps!
