pandas :同时分配多个* new *列 [英] Pandas: Assigning multiple *new* columns simultaneously

查看:72
本文介绍了 pandas :同时分配多个* new *列的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个DataFrame,其中的一列包含每行的标签(除了每行的一些相关数据).我有一本字典,其中的键等于可能的标签,而值等于与该标签有关的信息的2元组.我想在框架上添加两列新列,每列对应2个标签的每一行的标签.

I have a DataFrame with a column containing labels for each row (in addition to some relevant data for each row). I have a dictionary with keys equal to the possible labels and values equal to 2-tuples of information related to that label. I'd like to tack two new columns onto my frame, one for each part of the 2-tuple corresponding to the label for each row.

这是设置:

import pandas as pd
import numpy as np

np.random.seed(1)
n = 10

labels = list('abcdef')
colors = ['red', 'green', 'blue']
sizes = ['small', 'medium', 'large']

labeldict = {c: (np.random.choice(colors), np.random.choice(sizes)) for c in labels}

df = pd.DataFrame({'label': np.random.choice(labels, n), 
                   'somedata': np.random.randn(n)})

我可以通过运行获得想要的东西:

I can get what I want by running:

df['color'], df['size'] = zip(*df['label'].map(labeldict))
print df

  label  somedata  color    size
0     b  0.196643    red  medium
1     c -1.545214  green   small
2     a -0.088104  green   small
3     c  0.852239  green   small
4     b  0.677234    red  medium
5     c -0.106878  green   small
6     a  0.725274  green   small
7     d  0.934889    red  medium
8     a  1.118297  green   small
9     c  0.055613  green   small

但是,如果我不想手动在作业左侧键入两列,该怎么办? IE.如何动态创建多个新列.例如,如果我在labeldict中有10个元组而不是2个元组,那么按照当前的写作,这将是一个真正的痛苦.这有几项行不通:

But how can I do this if I don't want to manually type out the two columns on the left side of the assignment? I.e. how can I create multiple new columns on the fly. For example, if I had 10-tuples in labeldict instead of 2-tuples, this would be a real pain as currently written. Here are a couple things that don't work:

# set up attrlist for later use
attrlist = ['color', 'size']

# non-working idea 1)
df[attrlist] = zip(*df['label'].map(labeldict))

# non-working idea 2)
df.loc[:, attrlist] = zip(*df['label'].map(labeldict))

这确实有效,但看起来像一个hack:

This does work, but seems like a hack:

for a in attrlist:
    df[a] = 0
df[attrlist] = zip(*df['label'].map(labeldict))

更好的解决方案?

推荐答案

您可以改为使用合并:

>>> ld = pd.DataFrame(labeldict).T
>>> ld.columns = ['color', 'size']
>>> ld.index.name = 'label'
>>> df.merge(ld.reset_index(), on='label')
  label  somedata  color    size
0     b  1.462108    red  medium
1     c -2.060141  green   small
2     c  1.133769  green   small
3     c  0.042214  green   small
4     e -0.322417    red  medium
5     e -1.099891    red  medium
6     e -0.877858    red  medium
7     e  0.582815    red  medium
8     f -0.384054    red   large
9     d -0.172428    red  medium

这篇关于 pandas :同时分配多个* new *列的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆