BeautifulSoup html missing

Views: 93 Published: 2020/9/20 7:37:27 Tags: python html beautifulsoup html-parsing

Problem description

I'm trying to get the URL of the link that downloads historical data from Yahoo Finance for an asset over a specific timeframe: January 1, 1999 to the present day.

So for example if I go here: https://finance.yahoo.com/quote/XLB/history?period1=915177600&period2=1498633200&interval=1d&filter=history&frequency=1d
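
The `period1` and `period2` values in the URL above are Unix timestamps in seconds. As a rough sketch of how such a URL could be assembled, the snippet below uses `urllib.parse.urlencode` and timezone-aware `datetime` objects. The helper name `history_url` is hypothetical, and pinning the dates to UTC is an assumption made so the output does not depend on the machine's time zone.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode


def history_url(symbol, start, end):
    """Build a Yahoo Finance history-page URL (hypothetical helper)."""
    params = {
        "period1": int(start.timestamp()),  # start of range, Unix seconds
        "period2": int(end.timestamp()),    # end of range, Unix seconds
        "interval": "1d",
        "filter": "history",
        "frequency": "1d",
    }
    return f"https://finance.yahoo.com/quote/{symbol}/history?{urlencode(params)}"


# Assumption: both dates are midnight UTC.
start = datetime(1999, 1, 1, tzinfo=timezone.utc)
end = datetime(2017, 6, 28, tzinfo=timezone.utc)
print(history_url("XLB", start, end))
# https://finance.yahoo.com/quote/XLB/history?period1=915148800&period2=1498608000&interval=1d&filter=history&frequency=1d
```

The question's `period1` of 915177600 is 28800 seconds (8 hours) later than the UTC value printed here, which is what a naive `datetime(1999, 1, 1).timestamp()` returns on a machine set to UTC-8.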

I would want to acquire this (from the "Download Data" link above the table of data):

"https://query1.finance.yahoo.com/v7/finance/download/XLB?period1=915177600&amp;period2=1498633200&amp;interval=1d&amp;events=history&amp;crumb=iX6bJ6LfGxc"
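
The link above is quoted as it appears in the page source, with each `&` written as the HTML entity `&amp;`. The following is a minimal sketch, using only the standard library, of decoding such a string and reading its query parameters. The function name `split_download_href` is hypothetical. Parsers such as BeautifulSoup normally decode entities in attribute values already, so this step mainly matters when the href is copied from raw source text.

```python
from html import unescape
from urllib.parse import parse_qs, urlsplit

# The href exactly as quoted in the question, with "&" written as "&amp;".
RAW_HREF = (
    "https://query1.finance.yahoo.com/v7/finance/download/XLB"
    "?period1=915177600&amp;period2=1498633200&amp;interval=1d"
    "&amp;events=history&amp;crumb=iX6bJ6LfGxc"
)


def split_download_href(raw_href):
    """Return (ticker, params) for an HTML-escaped download href."""
    href = unescape(raw_href)               # "&amp;" becomes "&"
    parts = urlsplit(href)
    ticker = parts.path.rsplit("/", 1)[-1]  # last path segment
    # parse_qs returns a list per key; keep the first value of each.
    params = {key: values[0] for key, values in parse_qs(parts.query).items()}
    return ticker, params


ticker, params = split_download_href(RAW_HREF)
print(ticker)
print(params)
```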

I'm using BeautifulSoup and am running into a problem: the tag that holds the href does not show up in the HTML. At first I thought BeautifulSoup was just not working properly, since find_all('a') and iterating through children/descendants turned up nothing. But when I did a text dump of the HTML, the element (along with everything else within its parent element) was not there. Can someone please explain what is going on? What I'm currently working with is listed below.

from bs4 import BeautifulSoup
import datetime as dTime
import requests

"""
asset = "Materials"
assetSignal = "XLB"
today = dTime.datetime.now()
startTime = str(int(dTime.datetime(1999, 1, 1, 0, 0, 0).timestamp()))
endTime = str(int(dTime.datetime(today.year, today.month, today.day, 0, 0, 0).timestamp()))
url = "https://finance.yahoo.com/quote/" + assetSignal + "/history?period1=" + startTime + "&period2=" + endTime + "&interval=1d&filter=history&frequency=1d"
"""

url = "https://finance.yahoo.com/quote/XLB/history?period1=915177600&period2=1498633200&interval=1d&filter=history&frequency=1d"
page = requests.get(url)
data = page.content
#soup = BeautifulSoup(data, "html.parser")
soup = BeautifulSoup(data, "lxml")
#soup = BeautifulSoup(data, "xml")
#soup = BeautifulSoup(data, "html5lib")

#Link not found
for link in soup.find_all("a"):
    print(link.get("href"))

#Span is empty?
span = soup.find(class_="Fl(end) Pos(r) T(-6px)")
print(span)
print(span.string)
print(span.contents)
for child in span.children:
    print(child)

#Other span has children.  Target span doesn't
div = soup.find(class_="C($finDarkGray) Mt(20px) Mb(15px)")
print(div)
for child in div.descendants:
    print(child)

#Is the tag even there?
with open("soup.txt", "w", encoding="utf-8") as file:
    file.write(page.text)
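
The text dump at the end of the script is the telling check: a parser can only return tags that exist in the string it is given. One common reason for a tag being visible in the browser but absent from the downloaded text is that a script adds it after the page loads. Whether that is the case for this particular page is an assumption the source does not confirm. The sketch below illustrates the effect on invented markup, using the standard library's `html.parser` as a stand-in for BeautifulSoup so that it runs without third-party packages.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects href values from <a> tags found in static markup."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                self.hrefs.append(href)


def hrefs_in(html_text):
    """Return every href that appears in an <a> tag of html_text."""
    collector = LinkCollector()
    collector.feed(html_text)
    collector.close()
    return collector.hrefs


# Invented sample page: one link is in the markup, the other would only
# exist after a browser runs the script.
SAMPLE_HTML = """
<html><body>
  <a href="/static-link">Present in the served markup</a>
  <span id="download-slot"></span>
  <script>
    var a = document.createElement("a");
    a.href = "/added-by-script";
    document.getElementById("download-slot").appendChild(a);
  </script>
</body></html>
"""

print(hrefs_in(SAMPLE_HTML))
# ['/static-link']
```

The span is found but has no children, which matches the symptom described in the question. Searching the saved `soup.txt` for the link text "Download Data" is a quick way to tell whether the anchor was ever part of the response.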
