Python Pandas: Counting the Occurrences of a Specific Value

Question

我试图找出某个值出现在一列中的次数.

我已经用 data = pd.DataFrame.from_csv('data/DataSet2.csv')

制作了数据框

现在我想查找某列中出现的次数.这是怎么做的?

我以为是下面的,我在教育专栏中查看并计算?出现的次数.

下面的代码显示我试图找到9th出现的次数,错误是我运行代码时得到的

代码

missing2 = df.education.value_counts()['9th']打印(缺少2)

错误

KeyError: '9th'

解决方案

您可以根据您的条件创建数据的subset,然后使用

代码:

import perfplot, stringnp.random.seed(123)定义形状(df):返回 df[df.education == 'a'].shape[0]def len_df(df):返回 len(df[df['education'] == 'a'])def query_count(df):返回 df.query('education == "a"').education.count()def sum_mask(df):返回 (df.education == 'a').sum()def sum_mask_numpy(df):返回 (df.education.values == 'a').sum()def make_df(n):L = 列表(string.ascii_letters)df = pd.DataFrame(np.random.choice(L, size=n), columns=['education'])返回 dfperfplot.show(设置=make_df,kernels=[shape, len_df, query_count, sum_mask, sum_mask_numpy],n_range=[2**k for k in range(2, 25)],logx=真,逻辑=真,平等检查=假,xlabel='len(df)')

I am trying to find the number of times a certain value appears in one column.

I have made the dataframe with data = pd.DataFrame.from_csv('data/DataSet2.csv'), and now I want to find the number of times something appears in a column. How is this done?

I thought it was the code below, where I look in the education column and count the number of times '?' occurs.

The code below tries to count the number of times '9th' appears, and the error is what I get when I run it.

Code

missing2 = df.education.value_counts()['9th']
print(missing2)

Error

KeyError: '9th'

Solution

You can create a subset of the data with your condition and then use shape or len:

import pandas as pd

df = pd.DataFrame({'col1': ['a', 'b', 'c'],
                   'education': ['9th', '9th', '8th']})

print(df)
  col1 education
0    a       9th
1    b       9th
2    c       8th

print(df.education == '9th')
0     True
1     True
2    False
Name: education, dtype: bool

print(df[df.education == '9th'])
  col1 education
0    a       9th
1    b       9th

print(df[df.education == '9th'].shape[0])
2
print(len(df[df['education'] == '9th']))
2
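Coming back to the value_counts() attempt in the question: a KeyError there means that '9th' is not one of the index labels of the Series returned by value_counts(), i.e. no cell in the column is exactly equal to that string (a difference in spelling, case, or surrounding whitespace is enough to break the match). A minimal sketch on a hypothetical three-row DataFrame like the one above, showing a direct lookup for a value that is present and Series.get with a default of 0 for one that is not:

```python
import pandas as pd

# Hypothetical sample data standing in for DataSet2.csv
df = pd.DataFrame({'col1': ['a', 'b', 'c'],
                   'education': ['9th', '9th', '8th']})

# value_counts() returns a Series indexed by the distinct values of the column
counts = df['education'].value_counts()

print(counts['9th'])          # 2: works because '9th' is an index label
print(counts.get('10th', 0))  # 0: .get returns the default instead of raising KeyError
```

The boolean-mask approach, (df['education'] == '10th').sum(), never raises for a missing value either; it simply returns 0.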

Performance is interesting: the fastest solution is to compare the underlying NumPy array and sum the resulting boolean mask:

Code:

import string

import numpy as np
import pandas as pd
import perfplot

np.random.seed(123)


def shape(df):
    return df[df.education == 'a'].shape[0]

def len_df(df):
    return len(df[df['education'] == 'a'])

def query_count(df):
    return df.query('education == "a"').education.count()

def sum_mask(df):
    return (df.education == 'a').sum()

def sum_mask_numpy(df):
    return (df.education.values == 'a').sum()

def make_df(n):
    L = list(string.ascii_letters)
    df = pd.DataFrame(np.random.choice(L, size=n), columns=['education'])
    return df

perfplot.show(
    setup=make_df,
    kernels=[shape, len_df, query_count, sum_mask, sum_mask_numpy],
    n_range=[2**k for k in range(2, 25)],
    logx=True,
    logy=True,
    equality_check=False,
    xlabel='len(df)')
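If perfplot is not installed, a rough spot check of the two mask-based kernels is possible with the standard-library timeit module. This is a minimal sketch: the sample size (1,000,000 rows), the repeat counts, and the use of to_numpy() in place of .values are illustrative choices, and absolute timings will vary by machine and pandas version.

```python
import string
import timeit

import numpy as np
import pandas as pd

np.random.seed(123)

# Same kind of data as make_df: random single letters in one column
df = pd.DataFrame(np.random.choice(list(string.ascii_letters), size=1_000_000),
                  columns=['education'])

def sum_mask(df):
    # pandas-level comparison, then count the True values
    return (df.education == 'a').sum()

def sum_mask_numpy(df):
    # compare the underlying NumPy array directly, skipping Series overhead
    return (df.education.to_numpy() == 'a').sum()

# Both kernels must agree before their timings are worth comparing
assert sum_mask(df) == sum_mask_numpy(df)

for func in (sum_mask, sum_mask_numpy):
    # best of 3 runs, each timing 10 calls; report the per-call average
    seconds = min(timeit.repeat(lambda: func(df), number=10, repeat=3)) / 10
    print(f'{func.__name__}: {seconds * 1000:.2f} ms per call')
```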
