Resample Pandas dataframe and merge strings in column
Problem description
I want to resample a pandas dataframe and apply different functions to different columns. The problem is that I cannot properly process a column that contains strings. I would like to apply a function that merges the strings with a delimiter such as " - ". This is a data example:
import pandas as pd
import numpy as np
data = [[1, 10, "ok"], [2, 20, "merge"], [3, 30, "us"]]
# Two rows share the date 2017-02-03, so resampling has to merge them
dates = pd.DatetimeIndex(['2017-01-31', '2017-02-03', '2017-02-03'])
d = pd.DataFrame(data, index=dates, columns=list('ABC'))
            A   B      C
2017-01-31  1  10     ok
2017-02-03  2  20  merge
2017-02-03  3  30     us
Resampling the numeric columns A and B with a sum and a mean aggregator works. Column C, however, only partly works with sum: the strings are combined, but the column ends up in second position, which might mean that something is failing.
d.resample('D').agg({'A': sum, 'B': np.mean, 'C': sum})
              A         C     B
2017-01-31  1.0         a  10.0
2017-02-01  NaN         0   NaN
2017-02-02  NaN         0   NaN
2017-02-03  5.0  merge us  25.0
I would like to get this:
...
2017-02-03 5.0 merge - us 25.0
I tried using a lambda in different ways, but without success (not shown).
If I may ask a second, related question: I can do some post-processing for this, but how do I fill the missing cells in different columns with zeros or ""?
Recommended answer
The agg function for column 'C' should be a join:
d.resample('D').agg({'A': sum, 'B': np.mean, 'C': ' - '.join})
              A     B           C
2017-01-31  1.0  10.0          ok
2017-02-01  NaN   NaN
2017-02-02  NaN   NaN
2017-02-03  5.0  25.0  merge - us
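The second question (filling missing cells with zeros or "") is not covered above. One possible approach, offered as a sketch rather than as part of the original answer, is to pass a per-column dictionary to `fillna` after resampling. The helper name `join_strings` and the variable names `resampled` and `filled` are made up for this illustration, and the aggregators are given as the strings "sum" and "mean" instead of the `sum` and `np.mean` callables, which is an assumption about what recent pandas versions prefer.

```python
import pandas as pd

data = [[1, 10, "ok"], [2, 20, "merge"], [3, 30, "us"]]
dates = pd.DatetimeIndex(["2017-01-31", "2017-02-03", "2017-02-03"])
d = pd.DataFrame(data, index=dates, columns=list("ABC"))


def join_strings(values):
    """Hypothetical helper: merge the strings of one resampling bin."""
    return " - ".join(values)


# Same aggregation as in the answer, with the join wrapped in a named function
resampled = d.resample("D").agg({"A": "sum", "B": "mean", "C": join_strings})

# Fill whatever is still missing, with a different fill value per column
filled = resampled.fillna({"A": 0, "B": 0, "C": ""})
print(filled)
```

Whether the empty days already hold 0 or "" before the `fillna` call can depend on the pandas version, so the explicit fill is there to make the result the same either way.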