Split dataframes in groups and sub-groups and store the output in a CSV file
Problem description
Suppose I have a large dataframe like this:
A B C
27/6/2017 4:00:00 928.04 4.83
27/6/2017 4:20:00 927.71 4.61
27/6/2017 4:40:00 928.22 4.49
27/6/2017 5:00:00 898.74 3.81
27/6/2017 5:20:00 895.16 3.55
27/6/2017 5:40:00 895.05 3.4
27/6/2017 6:00:00 895.68 3.3
27/6/2017 16:20:00 662.45 1.52
27/6/2017 16:40:00 639.98 1.48
27/6/2017 17:40:00 732.02 1.79
27/6/2017 18:00:00 722.63 1.98
27/6/2017 18:20:00 713.26 1.79
27/6/2017 18:40:00 705.8 1.54
27/6/2017 19:00:00 652.1 1.51
27/6/2017 19:20:00 638.58 1.68
27/6/2017 19:40:00 633.14 1.66
27/6/2017 20:00:00 654.66 1.45
I want to split the dataframe based on the time difference between rows: whenever the gap between two consecutive timestamps is more than 4 hours, a new group should start. Then I want to split each of those groups into subgroups based on ranges of the values in B. I want to store all of these groups and subgroups in individual CSV files.
Desired output:
Group1:
A B C
27/6/2017 4:00:00 928.04 4.83
27/6/2017 4:20:00 927.71 4.61
27/6/2017 4:40:00 928.22 4.49
27/6/2017 5:00:00 898.74 3.81
27/6/2017 5:20:00 895.16 3.55
27/6/2017 5:40:00 895.05 3.4
27/6/2017 6:00:00 895.68 3.3
Group2:
A B C
27/6/2017 16:20:00 662.45 1.52
27/6/2017 16:40:00 639.98 1.48
27/6/2017 17:40:00 732.02 1.79
27/6/2017 18:00:00 722.63 1.98
27/6/2017 18:20:00 713.26 1.79
27/6/2017 18:40:00 705.8 1.54
27/6/2017 19:00:00 652.1 1.51
27/6/2017 19:20:00 638.58 1.68
27/6/2017 19:40:00 633.14 1.66
27/6/2017 20:00:00 654.66 1.45
Zones:
Group1 Zone1:
A B C
27/6/2017 4:00:00 928.04 4.83
27/6/2017 4:20:00 927.71 4.61
27/6/2017 4:40:00 928.22 4.49
Group1 Zone2:
A B C
27/6/2017 5:00:00 898.74 3.81
27/6/2017 5:20:00 895.16 3.55
27/6/2017 5:40:00 895.05 3.4
27/6/2017 6:00:00 895.68 3.3
Like this.
I have tried some logic to achieve this, but I wasn't able to get it working.
Code:
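from pandas import DataFrame, Series, concat
from scipy.stats import linregress

# df is assumed to already exist, with a datetime "Time" column
time_diff = df["Time"].diff()
zones = []
dfs = DataFrame(df)  # was `dfs = DataFrame`, which binds the class rather than the data
# start a new zone whenever the gap exceeds 12 sampling intervals
zone = (dfs["Time"] >= (dfs["Time"].shift() + time_diff[1]*12)).cumsum()
zone_grp = dfs.groupby(zone)
xyz = []
for k, g in zone_grp:
    # keep only zones with at least 30 rows
    if len(g) >= 30:
        zones.append(g)
    else:
        pass
for m in range(len(zones)):
    zone_df = DataFrame(zones[m])
    x = range(len(zone_df))
    y = zone_df["T401FN1VT4000"]
    # fit a line to each zone and keep its slope and intercept
    abc = Series((linregress(x, y)))
    abc = DataFrame(abc).T
    slope = abc[0].tolist()
    intercept = abc[1].tolist()
    abc = DataFrame({"Slope": slope, "Intercept": intercept})
    xyz.append(abc)
    zone_df.to_csv("Zone_%s.csv" % m, index=False)
xyz = concat(xyz).reset_index()
del xyz["index"]
xyz["Zone"] = xyz.index
xyz = xyz.set_index("Zone")
xyz.to_csv("Coefficients.csv", index=True)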
Please help me split the dataframe based on the time difference in a better way, and store the groups and subgroups in CSV files with different names.
Any help would be appreciated.
You could use diff and pd.Timedelta for the first-level groupby, and df.B // x * x (where x is the bin width, 100 in the code below) to divide B into ranged groups.
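import pandas as pd
# df.A must already be datetime (e.g. parsed with pd.to_datetime)
grps = [(df.A.diff() > pd.Timedelta(hours=4)).cumsum(), df.B // 100 * 100]
for i, g in df.groupby(grps):
    # i is the (time-gap group, B band) key, used to build the file name
    g.to_csv('{}_{}.csv'.format(*i))
    print(g)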
A B C
3 2017-06-27 05:00:00 898.74 3.81
4 2017-06-27 05:20:00 895.16 3.55
5 2017-06-27 05:40:00 895.05 3.40
6 2017-06-27 06:00:00 895.68 3.30
A B C
0 2017-06-27 04:00:00 928.04 4.83
1 2017-06-27 04:20:00 927.71 4.61
2 2017-06-27 04:40:00 928.22 4.49
A B C
7 2017-06-27 16:20:00 662.45 1.52
8 2017-06-27 16:40:00 639.98 1.48
13 2017-06-27 19:00:00 652.10 1.51
14 2017-06-27 19:20:00 638.58 1.68
15 2017-06-27 19:40:00 633.14 1.66
16 2017-06-27 20:00:00 654.66 1.45
A B C
9 2017-06-27 17:40:00 732.02 1.79
10 2017-06-27 18:00:00 722.63 1.98
11 2017-06-27 18:20:00 713.26 1.79
12 2017-06-27 18:40:00 705.80 1.54
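For anyone who wants to run the approach end to end, here is a minimal self-contained sketch using the rows from the question. The date format string, the bin width of 100, the temporary output directory, and the `Group{}_B{}.csv` naming scheme are all assumptions made for illustration and are not part of the original answer.

```python
import io
import os
import tempfile

import pandas as pd

# Sample rows copied from the question.
raw = """A,B,C
27/6/2017 4:00:00,928.04,4.83
27/6/2017 4:20:00,927.71,4.61
27/6/2017 4:40:00,928.22,4.49
27/6/2017 5:00:00,898.74,3.81
27/6/2017 5:20:00,895.16,3.55
27/6/2017 5:40:00,895.05,3.4
27/6/2017 6:00:00,895.68,3.3
27/6/2017 16:20:00,662.45,1.52
27/6/2017 16:40:00,639.98,1.48
27/6/2017 17:40:00,732.02,1.79
27/6/2017 18:00:00,722.63,1.98
27/6/2017 18:20:00,713.26,1.79
27/6/2017 18:40:00,705.8,1.54
27/6/2017 19:00:00,652.1,1.51
27/6/2017 19:20:00,638.58,1.68
27/6/2017 19:40:00,633.14,1.66
27/6/2017 20:00:00,654.66,1.45
"""

df = pd.read_csv(io.StringIO(raw))
# The timestamps are day-first, so parse them with an explicit format (assumed).
df["A"] = pd.to_datetime(df["A"], format="%d/%m/%Y %H:%M:%S")

# Key 1: a counter that goes up by one after every gap longer than 4 hours.
group = (df["A"].diff() > pd.Timedelta(hours=4)).cumsum() + 1
# Key 2: B floored to a multiple of 100 (the bin width is an assumption).
band = (df["B"] // 100 * 100).astype(int)

out_dir = tempfile.mkdtemp()  # hypothetical output location
written = []
for (g, b), part in df.groupby([group, band]):
    name = "Group{}_B{}.csv".format(g, b)  # hypothetical naming scheme
    part.to_csv(os.path.join(out_dir, name), index=False)
    written.append(name)

print(sorted(written))
```

With the question's data this should write four files, one per (time-gap group, B band) pair. Note that the 600 band of the second group contains rows from both before and after the 700 band, because the grouping is by value range and not by contiguous runs.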