How to split one column into many columns and count the frequency


Question

Here is the question I have in mind. Given a table:

   Id   type
0   1    [a,b]
1   2     [c]
2   3     [a,d]

I want to convert it into the following form:

   Id     a  b  c  d
0   1     1  1  0  0
1   2     0  0  1  0
2   3     1  0  0  1

I need a very efficient way to convert a large table. Any comments are welcome.

====================================

I have received several good answers and really appreciate your help.

Now a new problem has come up: my laptop does not have enough memory to generate the whole DataFrame with pd.get_dummies.

Is there any way to generate a sparse vector row by row and then stack them together?

Recommended answer

Try this:

>>> df
   Id    type
0   1  [a, b]
1   2     [c]
2   3  [a, d]
>>> # Turn each list into a dict such as {'a': 1, 'b': 1}; labels missing from a row become NaN, then 0
>>> df2 = pd.DataFrame(
...     [{x: 1 for x in item} for item in df['type']]
... ).fillna(0).astype(int)
>>> df2.join(df)
   a  b  c  d  Id    type
0  1  1  0  0   1  [a, b]
1  0  0  1  0   2     [c]
2  1  0  0  1   3  [a, d]

It basically converts the list of lists into a list of dicts and constructs a DataFrame from that:


[ ['a', 'b'], ['c'], ['a', 'd'] ]            # list of lists
[ {'a':1, 'b':1}, {'c':1}, {'a':1, 'd':1} ]  # list of dicts; make a DataFrame out of this
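The answer above does not cover the follow-up about running out of memory. One possible approach, which is not part of the original answer, is to skip the dense DataFrame and collect only the non-zero entries row by row in compressed sparse row (CSR) form. The sketch below uses plain Python; the function name rows_to_csr and its return layout are made up for illustration, and it stores per-row counts, so a label repeated within a row is counted more than once.

```python
def rows_to_csr(rows):
    """Build CSR arrays (data, indices, indptr) one row at a time.

    `rows` is any iterable of label lists, e.g. the values of df['type'].
    The name and return layout are illustrative assumptions.
    """
    vocab = {}    # label -> column index, assigned on first appearance
    data = []     # non-zero values (count of a label within a row)
    indices = []  # column index of each non-zero value
    indptr = [0]  # row i occupies data[indptr[i]:indptr[i + 1]]

    for labels in rows:
        counts = {}
        for label in labels:
            col = vocab.setdefault(label, len(vocab))
            counts[col] = counts.get(col, 0) + 1
        # Append this row's non-zero entries in column order
        for col in sorted(counts):
            indices.append(col)
            data.append(counts[col])
        indptr.append(len(indices))

    return data, indices, indptr, vocab


data, indices, indptr, vocab = rows_to_csr([['a', 'b'], ['c'], ['a', 'd']])
print(vocab)    # {'a': 0, 'b': 1, 'c': 2, 'd': 3}
print(data)     # [1, 1, 1, 1, 1]
print(indices)  # [0, 1, 2, 0, 3]
print(indptr)   # [0, 2, 3, 5]
```

If SciPy is available, these arrays can be passed to scipy.sparse.csr_matrix((data, indices, indptr), shape=(len(indptr) - 1, len(vocab))) to get a sparse matrix whose columns follow the order recorded in vocab. Memory use then grows with the number of non-zero entries rather than with rows times distinct labels.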
