Error message when appending data to pandas dataframe
Can someone give me a hand with this:
I created a loop to append successive intervals of historical price data from Coinbase.
My loop iterates successfully a few times then crashes.
Error message (under data_temp code line):
"ValueError: If using all scalar values, you must pass an index"
days = 10
end = datetime.now().replace(microsecond=0)
start = end - timedelta(days=days)
data_price = pd.DataFrame()
for i in range(1,50):
    print(start)
    print(end)
    data_temp = pd.DataFrame(public_client.get_product_historic_rates(product_id='BTC-USD', granularity=3600, start=start, end=end))
    data_price = data_price.append(data_temp)
    end = start
    start = end - timedelta(days=days)
Would love to understand how to fix this and why this is happening in the first place.
Thank you!
Here's the full trace:
Traceback (most recent call last):
  File "\coinbase_bot.py", line 46, in <module>
    data_temp = pd.DataFrame(public_client.get_product_historic_rates(product_id='BTC-USD', granularity=3600, start=start, end=end))
  File "D:\Program Files\Python37\lib\site-packages\pandas\core\frame.py", line 411, in __init__
    mgr = init_dict(data, index, columns, dtype=dtype)
  File "D:\Program Files\Python37\lib\site-packages\pandas\core\internals\construction.py", line 257, in init_dict
    return arrays_to_mgr(arrays, data_names, index, columns, dtype=dtype)
  File "D:\Program Files\Python37\lib\site-packages\pandas\core\internals\construction.py", line 77, in arrays_to_mgr
    index = extract_index(arrays)
  File "D:\Program Files\Python37\lib\site-packages\pandas\core\internals\construction.py", line 358, in extract_index
    raise ValueError("If using all scalar values, you must pass an index")
ValueError: If using all scalar values, you must pass an index
Here's the JSON returned by a simple URL call:
[[1454716800,370.05,384.54,384.44,375.44,6276.66473729],
 [1454630400,382.99,389.36,387.99,384.5,7443.92933224],
 [1454544000,368.74,390.63,368.87,387.99,8887.7572324],
 [1454457600,365.63,373.01,372.93,368.87,7147.95657328],
 [1454371200,371.17,374.41,371.33,372.93,6856.21815799],
 [1454284800,366.26,379,367.89,371.33,7931.22922922],
 [1454198400,365,382.5,378.46,367.95,5506.77681302]]
Very similar to this user's issue but cannot put my finger on it: When attempting to merge multiple dataframes, how to resolve "ValueError: If using all scalar values, you must pass an index"
-- Hi DashOfProgramming,
Your problem is that data_temp is being built from a single row of scalar values, and pandas requires you to provide an index in that case.
The following snippet should resolve this. I replaced your API call with a simple dictionary that resembles what I would expect the API to return and used i as index for the dataframe (this has the advantage that you can keep track as well):
import pandas as pd
from datetime import datetime, timedelta
days = 10
end = datetime.now().replace(microsecond=0)
start = end - timedelta(days=days)
data_price = pd.DataFrame()
temp_dict = {'start': '2019-09-30', 'end': '2019-10-01', 'price': '-111.0928',
             'currency': 'USD'}
for i in range(1,50):
    print(start)
    print(end)
    data_temp = pd.DataFrame(temp_dict, index=[i])
    # DataFrame.append was removed in pandas 2.0; on newer versions use
    # data_price = pd.concat([data_price, data_temp]) instead.
    data_price = data_price.append(data_temp)
    end = start
    start = end - timedelta(days=days)
print(data_price)
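To make the "all scalar values" rule concrete, here is a minimal sketch of when the constructor raises and when it does not. The error_like dict is a hypothetical payload invented for illustration; it is not taken from the question.

```python
import pandas as pd

# Hypothetical payload: a dict whose values are all scalars.
# pandas cannot infer how many rows such a dict should produce.
error_like = {"message": "some error text"}

try:
    pd.DataFrame(error_like)
    raised = False
except ValueError as exc:
    raised = True
    print(exc)

# An explicit index tells pandas to build exactly one row.
one_row = pd.DataFrame(error_like, index=[0])

# A list of lists is read as one row per inner list; no index is needed.
rows = [
    [1454716800, 370.05, 384.54, 384.44, 375.44, 6276.66473729],
    [1454630400, 382.99, 389.36, 387.99, 384.5, 7443.92933224],
]
nested = pd.DataFrame(rows)

print(one_row.shape, nested.shape)
```

So the index is only required for the dict-of-scalars case; the nested list from the question's JSON sample builds a frame without one.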
EDIT
Just saw that your API output is a nested list, with one inner list per row. pd.DataFrame() can read that directly, but the columns come out unnamed, so I suggest you store your column names in a separate variable and then do this:
cols = ['ts', 'low', 'high', 'open', 'close', 'sth_else']
v = [[...], [...], [...]] # your list of lists
data_temp = pd.DataFrame.from_records(v, columns=cols)
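One detail in the traceback is worth noting: it passes through init_dict, the branch the DataFrame constructor takes when it is handed a dict. That suggests the failing call received a dict rather than the usual list of lists; what the dict contained is not shown in the post. The sketch below guards against that case and collects the frames for a single pd.concat at the end. Here fetch_candles is a hypothetical stand-in for public_client.get_product_historic_rates(), and the error payload it returns is an assumption made up for the example.

```python
import pandas as pd

cols = ['ts', 'low', 'high', 'open', 'close', 'sth_else']


def fetch_candles(i):
    """Hypothetical stand-in for public_client.get_product_historic_rates()."""
    if i == 3:
        # Assumed shape of a non-list response: a dict of scalars.
        return {"message": "simulated error"}
    return [[1454716800 - 3600 * i, 370.05, 384.54, 384.44, 375.44, 6276.66]]


frames = []   # one DataFrame per successful request
skipped = []  # (iteration, payload) pairs that were not a list of rows

for i in range(1, 6):
    payload = fetch_candles(i)
    if not isinstance(payload, list):
        # Record the unexpected payload instead of passing it to pandas.
        skipped.append((i, payload))
        continue
    frames.append(pd.DataFrame.from_records(payload, columns=cols))

# Concatenate once at the end; this also works on pandas 2.x,
# where DataFrame.append is no longer available.
data_price = pd.concat(frames, ignore_index=True)

print(data_price)
print(skipped)
```

Printing the skipped payloads should show what the API actually sent back on the iteration that crashed, which is the quickest way to find out why the loop only fails after a few requests.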