如何计算两列中任一列的出现次数 [英] How to count the number of occurrences in either of two columns
问题描述
我有一个简单的问题.我有一个具有两列的数据框df
.对于这两个列中任一列中出现的每个字符串,我都希望计算其中任一列中都有符号的行数.
I have a simple looking problem. I have a dataframe df
with two columns. For each of the strings that occurs in either of these columns I would like to count the number of rows which has the symbol in either column.
例如
g k
a h
c i
j e
d i
i h
b b
d d
i a
d h
以下代码有效,但效率很低.
The following code works but is very inefficient.
for elem in set(df.values.flat):
print elem, len(df.loc[(df[0] == elem) | (df[1] == elem)])
a 2
c 1
b 1
e 1
d 3
g 1
i 4
h 3
k 1
j 1
但是,这效率很低,而且我的数据帧很大.效率低下是因为分别为df中的每个不同符号调用了df.loc[(df[0] == elem) | (df[1] == elem)]
.
This is however very inefficient and my dataframe is large. The inefficiency comes from calling df.loc[(df[0] == elem) | (df[1] == elem)]
separately for every distinct symbol in df.
有快速的方法吗?
推荐答案
您可以使用loc
过滤掉'col2'
中的行级别匹配项,将过滤后的'col2'
值附加到'col1'
,然后调用value_counts
:
You can use loc
to filter out row level matches from 'col2'
, append the filtered 'col2'
values to 'col1'
, and then call value_counts
:
counts = df['col1'].append(df.loc[df['col1'] != df['col2'], 'col2']).value_counts()
结果输出:
i 4
d 3
h 3
a 2
j 1
k 1
c 1
g 1
b 1
e 1
注意:如果希望输出按字母顺序显示,可以在计数代码的末尾添加.sort_index()
.
Note: You can add .sort_index()
to the end of the counting code if you want the output to appear in alphabetical order.
时间
使用以下设置来生成更大的样本数据集:
Using the following setup to produce a larger sample dataset:
from string import ascii_lowercase
n = 10**5
data = np.random.choice(list(ascii_lowercase), size=(n,2))
df = pd.DataFrame(data, columns=['col1', 'col2'])
def edchum(df):
vals = np.unique(df.values)
count = np.maximum(df['col1'].str.get_dummies().reindex_axis(vals, axis=1).fillna(0), df['col2'].str.get_dummies().reindex_axis(vals, axis=1).fillna(0)).sum()
return count
我得到以下计时:
%timeit df['col1'].append(df.loc[df['col1'] != df['col2'], 'col2']).value_counts()
10 loops, best of 3: 19.7 ms per loop
%timeit edchum(df)
1 loop, best of 3: 3.81 s per loop
这篇关于如何计算两列中任一列的出现次数的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!