How to merge pandas value_counts() to dataframe or use it to subset a dataframe

Problem description

I used pandas df.value_counts() to find the number of occurrences of particular brands. I want to merge those value counts with the respective brands in the initial dataframe.

 # df has many columns, including one named 'brands'
 brands = df.brands.value_counts()

 brand1   143
 brand2   21
 brand3   101
 etc.

How do I merge the value counts with the original dataframe such that each brand's corresponding count is in a new column, say "brand_count"?

Is it possible to assign headers to these columns? The names function won't work with a Series, and I was unable to convert it to a dataframe to merge the data that way. Also, value_counts outputs a Series of dtype int64 (the brand names should be strings), which means I cannot do the following:

 df2 = pd.DataFrame({'brands': list(brands_all[0]),
                     'brand_count': list(brands_all[1])})
 # (merge with df)

Ultimately, I want to end up with this:

 col1  col2  col3  brands  brand_count ... col150
                   A        30
                   C        140
                   A        30
                   B        111 

Recommended answer

Is this what you want?

import numpy as np
import pandas as pd

# generating random DataFrame
brands_list = ['brand{}'.format(i) for i in range(10)]
a = pd.DataFrame({'brands': np.random.choice(brands_list, 100)})
b = pd.DataFrame(np.random.randint(0,10,size=(100, 3)), columns=list('ABC'))
df = pd.concat([a, b], axis=1)
print(df.head())

# generate 'brands' DF
# (value_counts() gives a Series indexed by brand; reset_index() turns it into two columns)
brands = df.brands.value_counts().reset_index()
brands.columns = ['brands', 'count']
print(brands)

# merge 'df' & 'brands' on the shared 'brands' column
merged = pd.merge(df, brands, on='brands')
print(merged)

PS: the first big part is just dataframe generation.

The part that is interesting for you starts with the # generate 'brands' DF comment.
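
As a possible alternative to building a second dataframe and merging, the counts can be looked up per row with map(), and the resulting column can then be used to subset the dataframe. This is only a minimal sketch: the small df, the column name brand_count_alt, and the min_count threshold are made up for illustration and are not part of the original question or answer.

```python
import pandas as pd

# Small hypothetical frame standing in for the real `df`
df = pd.DataFrame({
    "brands": ["A", "C", "A", "B", "C", "A"],
    "col1": [1, 2, 3, 4, 5, 6],
})

# value_counts() returns a Series indexed by brand name;
# map() looks up each row's brand in that Series
counts = df["brands"].value_counts()
df["brand_count"] = df["brands"].map(counts)

# Same result without the intermediate Series:
# the size of each brand group is broadcast back to its rows
df["brand_count_alt"] = df.groupby("brands")["brands"].transform("size")

# Subsetting: keep only rows whose brand occurs at least `min_count` times
min_count = 2  # hypothetical threshold
frequent = df[df["brand_count"] >= min_count]

print(df)
print(frequent)
```

Unlike a merge, this approach adds the column in place and leaves the row order and index of df untouched.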
