Collapsing a pandas DataFrame into a single column of all items and their occurrences


Problem description

I have a data frame consisting of a mixture of NaNs and strings, e.g.:

import pandas as pd

data = {'String1':['NaN', 'tree', 'car', 'tree'],
        'String2':['cat','dog','car','tree'],
        'String3':['fish','tree','NaN','tree']}
ddf = pd.DataFrame(data)

I would like to:

1: Count the total number of items and put them in a new data frame, e.g.:

      NaN=2
      tree=5
      car=2
      fish=1
      cat=1
      dog=1

2: Count the total number of items when compared to a separate, longer list (a column of another data frame), e.g.:

df['compare'] =
      NaN
      tree
      car
      fish
      cat
      dog
      rabbit
      Pear
      Orange
      snow
      rain

Thanks, Jason

Recommended answer

First question:

from collections import Counter

import pandas as pd

data = {
    "String1": ["NaN", "tree", "car", "tree"],
    "String2": ["cat", "dog", "car", "tree"],
    "String3": ["fish", "tree", "NaN", "tree"],
}
ddf = pd.DataFrame(data)

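# Flatten every cell into one list and count each distinct item.
a = Counter(ddf.stack().tolist())

# One row per item, with its total in a 'Count' column.
df_result = pd.DataFrame(dict(a), index=['Count']).T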

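Second question:

# The longer list to compare against.
df = pd.DataFrame({'vals':['NaN', 'tree', 'car', 'fish', 'cat', 'dog', 'rabbit', 'Pear', 'Orange', 'snow', 'rain']})

# Look up each item's count; items that never appear in ddf map to NaN.
df_counts = df.vals.map(df_result.to_dict()['Count'])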

That should do it :)
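
As an alternative, the same two steps can be sketched with pandas alone, using `value_counts` and `reindex` instead of `Counter`. This is only a sketch under a few assumptions: the names `counts` and `compare_counts` are made up for illustration, the comparison column is called `compare` as in the question, and items missing from `ddf` are given a count of 0 rather than NaN. Note that the 'NaN' entries in the sample data are literal strings, so they are counted like any other value.

```python
import pandas as pd

data = {
    "String1": ["NaN", "tree", "car", "tree"],
    "String2": ["cat", "dog", "car", "tree"],
    "String3": ["fish", "tree", "NaN", "tree"],
}
ddf = pd.DataFrame(data)

# Step 1: flatten all columns into one Series and count each distinct value.
counts = ddf.stack().value_counts()

# Step 2: the longer list to compare against (a column of another data frame).
df = pd.DataFrame({"compare": ["NaN", "tree", "car", "fish", "cat", "dog",
                               "rabbit", "Pear", "Orange", "snow", "rain"]})

# Align the counts with the longer list; items never seen in ddf get 0
# (an assumption of this sketch, chosen instead of leaving them as NaN).
compare_counts = counts.reindex(df["compare"], fill_value=0)

print(counts)
print(compare_counts)
```

If a data frame is wanted rather than a Series, `counts.to_frame("Count")` should give a one-column result similar to `df_result` above.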
