Find value counts within a pandas dataframe of strings

Views: 170 Published: 2020/5/23 23:48:26 python pandas pivot-table

Question

I want to get the frequency count of the strings within each column. In a sense, this is similar to collapsing a dataframe to a set of rows that reflects only the distinct strings in the columns. I was able to solve this with a loop, but I know there is a better solution.

Example df:

       2017-08-09  2017-08-10
id                                                             
0             pre         pre   
2      active_1-3    active_1   
3        active_1    active_1   
4      active_3-7  active_3-7   
5        active_1    active_1

Desired output:

       2017-08-09  2017-08-10
pre             1           1
active_1        2           3
active_1-3      1           0
active_3-7      1           1

I searched a lot of forums but couldn't find a good answer.

I'm assuming a pivot_table approach is the right one, but I couldn't find the right arguments to collapse a table when the output df has no obvious index.

I was able to get this to work by iterating over each column, calling value_counts(), and appending each value-count series to a new dataframe, but I know there is a better solution.

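output_df = pd.DataFrame()  # start empty; assumes pandas is imported as pd
for col in date_cols:
    # Name each count series after its column so the result keeps the dates as headers
    new_values = df[col].value_counts().rename(col)
    output_df = pd.concat([output_df, new_values], axis=1)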

Thanks!

Recommended answer

You can apply pd.Series.value_counts to each column (thanks to Jon for the improvement), i.e.

ndf = df.apply(pd.Series.value_counts).fillna(0)


           2017-08-09  2017-08-10
active_1             2         3.0
active_1-3           1         0.0
active_3-7           1         1.0
pre                  1         1.0
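For readers who want to reproduce the result, here is a minimal self-contained sketch. It rebuilds the example frame from the question by hand and applies the same `apply(pd.Series.value_counts)` call; the trailing `astype(int)` is an optional addition, not part of the original answer, that turns the float counts back into integers.

```python
import pandas as pd

# Rebuild the example frame from the question (ids and dates copied from it).
df = pd.DataFrame(
    {
        "2017-08-09": ["pre", "active_1-3", "active_1", "active_3-7", "active_1"],
        "2017-08-10": ["pre", "active_1", "active_1", "active_3-7", "active_1"],
    },
    index=pd.Index([0, 2, 3, 4, 5], name="id"),
)

# Count each string per column. A string that never occurs in a column
# comes back as NaN, so fill with 0 and cast the counts back to integers.
ndf = df.apply(pd.Series.value_counts).fillna(0).astype(int)
print(ndf)
```

The `fillna(0)` step is what makes `active_1-3` show up as 0 for 2017-08-10 instead of NaN.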

Timings:

k = pd.concat([df]*1000)
# @cᴏʟᴅsᴘᴇᴇᴅ's method 
%%timeit
pd.get_dummies(k.T).groupby(by=lambda x: x.split('_', 1)[1], axis=1).sum().T
1 loop, best of 3: 5.68 s per loop


%%timeit
# @cᴏʟᴅsᴘᴇᴇᴅ's method 
k.stack().str.get_dummies().sum(level=1).T
10 loops, best of 3: 84.1 ms per loop

# My method 
%%timeit
k.apply(pd.Series.value_counts).fillna(0)
100 loops, best of 3: 7.57 ms per loop

# FabienP's method 
%%timeit
k.unstack().groupby(level=0).value_counts().unstack().T.fillna(0)
100 loops, best of 3: 7.35 ms per loop

# @Wen's method (fastest for now); requires `import collections`
%%timeit
pd.concat([pd.Series(collections.Counter(k[x])) for x in df.columns],axis=1)
100 loops, best of 3: 4 ms per loop
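The timings above come from IPython and an older pandas release. As a rough sketch for re-running the comparison as a plain script, the block below wraps three of the approaches in functions and times them with the standard `timeit` module. The function names are hypothetical, and `groupby(level=1).sum()` is substituted for `.sum(level=1)`, because the `level` argument of `sum` was removed in pandas 2.0. Absolute numbers will differ from those quoted here.

```python
import collections
import timeit

import pandas as pd

df = pd.DataFrame(
    {
        "2017-08-09": ["pre", "active_1-3", "active_1", "active_3-7", "active_1"],
        "2017-08-10": ["pre", "active_1", "active_1", "active_3-7", "active_1"],
    },
    index=pd.Index([0, 2, 3, 4, 5], name="id"),
)
k = pd.concat([df] * 1000)  # same enlargement as in the timings above


def by_value_counts(frame):
    """Per-column value_counts, as in the recommended answer."""
    return frame.apply(pd.Series.value_counts).fillna(0).astype(int)


def by_counter(frame):
    """collections.Counter per column, joined side by side."""
    counts = {col: pd.Series(collections.Counter(frame[col])) for col in frame.columns}
    return pd.concat(counts, axis=1).fillna(0).astype(int)


def by_dummies(frame):
    """Stack, one-hot encode, then sum per original column.

    groupby(level=1).sum() stands in for the older .sum(level=1).
    """
    return frame.stack().str.get_dummies().groupby(level=1).sum().T


for func in (by_value_counts, by_counter, by_dummies):
    seconds = timeit.timeit(lambda: func(k), number=5) / 5
    print(f"{func.__name__}: {seconds * 1000:.2f} ms per call")
```

All three functions should return the same counts for the same input, so the choice between them is mostly about readability and speed on your own data.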
