How to perform operations on a Python dataframe based on specific values of another column?

Views: 197 | Posted: 2020/10/15 21:36:25 | Tags: python numpy dataframe data-analysis


Problem description

I am new to Python data analysis. The following is an example dataset:
import pandas as pd

d2 = {
    'Index': [0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1],
    'journey_time': [95.546, 132.945, 147.538, 301.307, 42.907, 129.008, 102.900, 112.620,
                     234.334, 103.321, 82.337, 154.817, 20.076, 85.717, 94.362, 45.032],
    'edge': ['s_b', 'c_d', 'b_d', 'c_e', 'd_f', 's_a', 'a_c', 'd_c',
             'c_e', 'a_c', 'd_c', 's_a', 'd_f', 's_b', 'b_d', 'c_d'],
}
df2 = pd.DataFrame(data=d2)
I want to create a new data frame with one row for each index and a set of new columns. The rules for the new columns are as follows:
se1 = s_a + a_c + c_e
se2 = s_b + b_d + d_c + c_e
sf1 = s_b + b_d + d_f
sf2 = s_a + a_c + c_d + d_f 
I also have further variations in my calculations, such as:
eq_time1 = (200/(s_a + a_c)) + c_e
eq_time2 = (200/(s_b + b_d + d_c)) + c_e 
The value of each edge in the rules is the corresponding journey time for each unique index. I am not sure how to write this with a pandas dataframe. The following is my expected output:
df3 = {
    'Index': [0, 1],
    'se1': [129.008 + 102.900 + 301.307, 154.817 + 103.321 + 234.334],
    'se2': [95.546 + 147.538 + 112.620 + 301.307, 85.717 + 94.362 + 82.337 + 234.334],
    'sf1': [95.546 + 147.538 + 42.907, 85.717 + 94.362 + 20.076],
    'sf2': [129.008 + 102.900 + 132.945 + 42.907, 154.817 + 103.321 + 45.032 + 20.076],
    'eq_time1': [(200 / (129.008 + 102.900)) + 301.307, (200 / (154.817 + 103.321)) + 234.334],
    'eq_time2': [(200 / (95.546 + 147.538 + 112.620)) + 301.307, (200 / (85.717 + 94.362 + 82.337)) + 234.334],
}
Please help!
Solution
If you have just those 4 paths in your data, you can calculate the times in pandas as follows:
paths = {
  'se1': ['s_a', 'a_c', 'c_e'],
  'se2': ['s_b', 'b_d', 'd_c', 'c_e'],
  'sf1': ['s_b', 'b_d', 'd_f'],
  'sf2': ['s_a', 'a_c', 'c_d', 'd_f']
}

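# One row per unique Index; the path columns are filled in below.
df3 = pd.DataFrame({'Index': df2['Index'].unique()}).set_index('Index')

for k, v in paths.items():
  # Total journey time over the edges of path k, per Index.
  df3[k] = df2[df2.edge.isin(v)].groupby('Index')['journey_time'].sum()
  # Journey time of the path's last edge, per Index.
  last_edge_times = df2[df2.edge == v[-1]].set_index('Index')
  # 200 / (sum of all edges except the last) + last edge.
  df3['eq_time_' + k] = 200.0 / (df3[k] - last_edge_times.journey_time) + last_edge_times.journey_time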
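For any path p, the column eq_time_p stores the eq_time given by your equations: 200 divided by the summed journey time of every edge except the last, plus the journey time of the last edge. So eq_time_se1 corresponds to your eq_time1 and eq_time_se2 to your eq_time2.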
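
To try the loop end to end, the pieces can be assembled into one script. This is only a sketch that repeats the question's data and the answer's code as-is. It assumes each edge appears at most once per Index, which holds for the sample data; otherwise the isin filter would add repeated edges more than once and the last-edge lookup would no longer have one row per Index.

```python
import pandas as pd

# Sample data from the question.
d2 = {
    'Index': [0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1],
    'journey_time': [95.546, 132.945, 147.538, 301.307, 42.907, 129.008, 102.900, 112.620,
                     234.334, 103.321, 82.337, 154.817, 20.076, 85.717, 94.362, 45.032],
    'edge': ['s_b', 'c_d', 'b_d', 'c_e', 'd_f', 's_a', 'a_c', 'd_c',
             'c_e', 'a_c', 'd_c', 's_a', 'd_f', 's_b', 'b_d', 'c_d'],
}
df2 = pd.DataFrame(data=d2)

# Each path is the ordered list of edges it traverses.
paths = {
    'se1': ['s_a', 'a_c', 'c_e'],
    'se2': ['s_b', 'b_d', 'd_c', 'c_e'],
    'sf1': ['s_b', 'b_d', 'd_f'],
    'sf2': ['s_a', 'a_c', 'c_d', 'd_f'],
}

# One row per unique Index.
df3 = pd.DataFrame({'Index': df2['Index'].unique()}).set_index('Index')

for k, v in paths.items():
    # Sum of journey times over the path's edges, per Index.
    df3[k] = df2[df2.edge.isin(v)].groupby('Index')['journey_time'].sum()
    # Journey time of the last edge of the path, per Index.
    last_edge_times = df2[df2.edge == v[-1]].set_index('Index')
    # 200 / (all edges except the last) + last edge.
    df3['eq_time_' + k] = (
        200.0 / (df3[k] - last_edge_times.journey_time) + last_edge_times.journey_time
    )

print(df3.round(3))
```

The loop also produces eq_time_sf1 and eq_time_sf2, which the question did not ask for; they can simply be ignored or dropped.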

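A different way to express the same rules, not part of the original answer, is to pivot the data so that each edge becomes its own column. The rules from the question can then be written almost literally. The names wide and out are illustrative, and the sketch assumes every (Index, edge) pair is unique, since DataFrame.pivot raises a ValueError on duplicates.

```python
import pandas as pd

# Sample data from the question.
d2 = {
    'Index': [0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1],
    'journey_time': [95.546, 132.945, 147.538, 301.307, 42.907, 129.008, 102.900, 112.620,
                     234.334, 103.321, 82.337, 154.817, 20.076, 85.717, 94.362, 45.032],
    'edge': ['s_b', 'c_d', 'b_d', 'c_e', 'd_f', 's_a', 'a_c', 'd_c',
             'c_e', 'a_c', 'd_c', 's_a', 'd_f', 's_b', 'b_d', 'c_d'],
}
df2 = pd.DataFrame(data=d2)

# Reshape to one row per Index and one column per edge.
wide = df2.pivot(index='Index', columns='edge', values='journey_time')

paths = {
    'se1': ['s_a', 'a_c', 'c_e'],
    'se2': ['s_b', 'b_d', 'd_c', 'c_e'],
    'sf1': ['s_b', 'b_d', 'd_f'],
    'sf2': ['s_a', 'a_c', 'c_d', 'd_f'],
}

# Path totals: sum the edge columns that make up each path.
out = pd.DataFrame({name: wide[edges].sum(axis=1) for name, edges in paths.items()})

# The eq_time rules, written as in the question.
out['eq_time1'] = 200 / (wide['s_a'] + wide['a_c']) + wide['c_e']
out['eq_time2'] = 200 / (wide['s_b'] + wide['b_d'] + wide['d_c']) + wide['c_e']

print(out.round(3))
```

The wide layout makes one-off formulas easy to add, at the cost of failing outright if an edge is recorded twice for the same Index.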