Calculating the number of series in Python pandas
Problem description
I want to calculate the number of series present in the given data.
I need this information to calculate the time series.
I would like the user to choose the hierarchy that defines a series.
For example, a series can be Region > Product > Country (please use this selection for the code as well).
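With that hierarchy, the series are:
- Asia > A > India
- Asia > A > Thailand
- Asia > B > India
- Asia > B > Thailand
- Asia > D > Japan
- Europe > A > Italy
- Europe > A > Turkey
- Europe > B > Italy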
So I need the answer to be 8, since there are 8 series for the selected hierarchy.
I was able to do this by converting the CSV to Excel and then counting all the series, but that is very time-consuming when the data is large.
import pandas as pd
import numpy as np
df = pd.read_csv("data.csv")
state = df.unstack('Sales')
set1= list(set(state))
pivot = pd.pivot_table(df,index=["Region","Country","Product"],values="Sales",aggfunc=np.sum)
df1 = pd.DataFrame(pivot)
df1.to_excel("output.xlsx")
df2 = pd.read_excel("output.xlsx")
cols = list(df2.columns)
count_TS = 0
for i in cols:
    if i == "":
        continue
    count_TS += df2[i].count()
print("Total Timeseries = ",count_TS + 1 -(df2['Sales'].count()))
Note: The hierarchy used in the above code is Region > Country > Product.
Is it possible to do this without creating a new Excel file?
Here is the data as a numpy array:
array([['Asia', 'India', 'A', 200],
['Asia', 'Thailand', 'A', 150],
['Asia', 'India', 'B', 175],
['Asia', 'Thailand', 'B', 225],
['Asia', 'Japan', 'D', 325],
['Europe', 'Italy', 'A', 120],
['Europe', 'Turkey', 'A', 130],
['Europe', 'Italy', 'B', 160]], dtype=object)
Recommended answer
IIUC, you need GroupBy.ngroups:
df.groupby(['Region', 'Product', 'Country']).ngroups
# Output: 8
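To make the hierarchy user-selectable, as the question asks, the same idea can be wrapped in a small helper. This is only a sketch: `count_series` is a hypothetical name, and the column names (`Region`, `Country`, `Product`, `Sales`) are assumed from the question's code and sample array rather than from the actual CSV.

```python
import pandas as pd

# Sample rows copied from the question's array.
# Column names are an assumption based on the question's pivot_table call.
rows = [
    ["Asia", "India", "A", 200],
    ["Asia", "Thailand", "A", 150],
    ["Asia", "India", "B", 175],
    ["Asia", "Thailand", "B", 225],
    ["Asia", "Japan", "D", 325],
    ["Europe", "Italy", "A", 120],
    ["Europe", "Turkey", "A", 130],
    ["Europe", "Italy", "B", 160],
]
df = pd.DataFrame(rows, columns=["Region", "Country", "Product", "Sales"])


def count_series(frame, hierarchy):
    """Count distinct combinations of the hierarchy columns (hypothetical helper)."""
    return frame.groupby(list(hierarchy)).ngroups


print(count_series(df, ["Region", "Product", "Country"]))  # 8
print(count_series(df, ["Region", "Product"]))             # 5
print(count_series(df, ["Region"]))                        # 2
```

The order of the columns does not change the count, only which columns are included. Note that `groupby` skips rows with a missing value in any key column by default, so such rows would not be counted as a series.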