Is there a method for creating a DataFrame of dummy variables from a dictionary of lists?


Question

I am working with pandas and have a dictionary that looks like the one below:

{'Anemones & allies': ['Carnivore'],
'Ants, bees & wasps': ['Omnivore',  'Herbivore',  'Nectar',  'Insects', 'Parasite'],
'Beetles & bugs': ['Herbivore', 'Carnivore', 'Nectar', 'Insects'],
'Birds': ['Carnivore'],
'Fishes': ['Carnivore', 'Plankton or Particles']}

I want to convert it into a DataFrame that shows what each animal type could possibly eat: one row per animal type and one column per food type.

When I tried to generate such a table, I got the feeling I was going about it the wrong way, because it took quite a few lines of code. So my question is: is there a convenient function that maps this dictionary to a DataFrame laid out like the table described above?

Recommended answer

Simplest approach

Use pd.Series.str.get_dummies

import pandas as pd

dct = {
    'Anemones & allies': ['Carnivore'],
    'Ants, bees & wasps': ['Omnivore',  'Herbivore',  'Nectar',  'Insects', 'Parasite'],
    'Beetles & bugs': ['Herbivore', 'Carnivore', 'Nectar', 'Insects'],
    'Birds': ['Carnivore'],
    'Fishes': ['Carnivore', 'Plankton or Particles']
}

# Join each list into one '|'-separated string, then split it into 0/1 indicator columns
pd.Series(dct).str.join('|').str.get_dummies()

                    Carnivore  Herbivore  Insects  Nectar  Omnivore  Parasite  Plankton or Particles
Anemones & allies           1          0        0       0         0         0                      0
Ants, bees & wasps          0          1        1       1         1         1                      0
Beetles & bugs              1          1        1       1         0         0                      0
Birds                       1          0        0       0         0         0                      0
Fishes                      1          0        0       0         0         0                      1
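One caveat with this approach: str.get_dummies splits on '|' by default, so a label that itself contains '|' would be split into two columns. The sketch below passes an explicit separator to both str.join and get_dummies(sep=...). The name SEP and the choice of ';' are assumptions for illustration; any string that never occurs inside a label should work.

```python
import pandas as pd

dct = {
    'Anemones & allies': ['Carnivore'],
    'Ants, bees & wasps': ['Omnivore', 'Herbivore', 'Nectar', 'Insects', 'Parasite'],
    'Beetles & bugs': ['Herbivore', 'Carnivore', 'Nectar', 'Insects'],
    'Birds': ['Carnivore'],
    'Fishes': ['Carnivore', 'Plankton or Particles'],
}

# Hypothetical separator: must not appear inside any of the labels.
SEP = ';'

# Use the same separator for joining and for splitting back into columns.
dummies = pd.Series(dct).str.join(SEP).str.get_dummies(sep=SEP)
print(dummies)
```

The resulting table should match the one above, since none of the labels here contain either separator.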






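More complicated, but probably recommended

Use sklearn.preprocessing.MultiLabelBinarizer, which works on the lists directly and does not depend on a separator character.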

import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

dct = {
    'Anemones & allies': ['Carnivore'],
    'Ants, bees & wasps': ['Omnivore',  'Herbivore',  'Nectar',  'Insects', 'Parasite'],
    'Beetles & bugs': ['Herbivore', 'Carnivore', 'Nectar', 'Insects'],
    'Birds': ['Carnivore'],
    'Fishes': ['Carnivore', 'Plankton or Particles']
}

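# One list of labels per animal type, indexed by animal type
s = pd.Series(dct)

mlb = MultiLabelBinarizer()

# Rows follow s.index; columns follow mlb.classes_ (the sorted unique labels)
d = mlb.fit_transform(s)
c = mlb.classes_
pd.DataFrame(d, index=s.index, columns=c)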

                    Carnivore  Herbivore  Insects  Nectar  Omnivore  Parasite  Plankton or Particles
Anemones & allies           1          0        0       0         0         0                      0
Ants, bees & wasps          0          1        1       1         1         1                      0
Beetles & bugs              1          1        1       1         0         0                      0
Birds                       1          0        0       0         0         0                      0
Fishes                      1          0        0       0         0         0                      1
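If adding scikit-learn as a dependency is undesirable, the same table can likely be built with pandas alone. The following is a sketch and not part of the original answer; it assumes pandas 0.25 or later (for Series.explode), and the names exploded and dummies are illustrative.

```python
import pandas as pd

dct = {
    'Anemones & allies': ['Carnivore'],
    'Ants, bees & wasps': ['Omnivore', 'Herbivore', 'Nectar', 'Insects', 'Parasite'],
    'Beetles & bugs': ['Herbivore', 'Carnivore', 'Nectar', 'Insects'],
    'Birds': ['Carnivore'],
    'Fishes': ['Carnivore', 'Plankton or Particles'],
}

# One row per (animal type, label) pair; the animal type repeats in the index.
exploded = pd.Series(dct).explode()

# One indicator column per label, then collapse the repeated index entries
# back to a single row per animal type (max acts as a logical OR on 0/1 values).
dummies = (
    pd.get_dummies(exploded, dtype=int)
    .groupby(level=0, sort=False)
    .max()
)
print(dummies)
```

Like MultiLabelBinarizer, this works on the lists directly, so it does not depend on a separator character.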
