Mapping values into a new dataframe column

Question

I have a dataset (~7000 rows) that I have imported into Pandas for some "data wrangling", but I need some pointers in the right direction for the next step. My data looks something like the below; it describes a structure with several sub-levels. B, D and again B are sub-levels of A. C is a sub-level of B, and so on...

Level, Name
0, A
1, B
2, C
1, D
2, E
3, F
3, G
1, B
2, C

But I want something like the below, with Name and Mother_name on the same row:

Level, Name, Mother_name
1, B, A
2, C, B
1, D, A
2, E, D
3, F, E
3, G, E
1, B, A
2, C, B

Recommended answer

If I understand the format correctly, the parent of a name is the name on the nearest prior row whose level is one less than the current row's level.
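The same rule can be sketched on plain Python lists. The version below is a small variation, not the answerer's code: it substitutes a stack of the current ancestors, popping any entries at the same or a deeper level before looking at the top. On well-formed input (each row at most one level deeper than the row before it) it gives the same mothers as the rule above. The function name `assign_mothers` is hypothetical.

```python
def assign_mothers(levels, names):
    """Return the mother (parent) name for each row.

    `levels` and `names` are parallel sequences listing the tree depth-first.
    Rows with no parent exactly one level up get None.
    """
    stack = []  # (level, name) pairs for the current chain of ancestors
    mothers = []
    for level, name in zip(levels, names):
        # Drop siblings and deeper entries: they cannot be this row's mother.
        while stack and stack[-1][0] >= level:
            stack.pop()
        # The top of the stack is the mother only if it sits exactly one level up.
        if stack and stack[-1][0] == level - 1:
            mothers.append(stack[-1][1])
        else:
            mothers.append(None)
        stack.append((level, name))
    return mothers


levels = [0, 1, 2, 1, 2, 3, 3, 1, 2]
names = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'B', 'C']
print(assign_mothers(levels, names))
# [None, 'A', 'B', 'A', 'D', 'E', 'E', 'A', 'B']
```

One difference from a plain "last name seen per level" lookup: if a row skips a level (say, level 3 directly after a level-1 row), the stack version returns None instead of a name left over from an earlier branch.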

Your DataFrame has a modest number of rows (~7000), so simply iterating through the rows costs little in performance. If the DataFrame were very large, you would often get better performance from column-wise vectorized Pandas operations than from row-wise iteration. In this case, however, expressing the logic with column-wise vectorized operations appears awkward and overly complicated, so I believe row-wise iteration is the best choice here.
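For comparison, here is one possible column-wise sketch (not part of the original answer). It assumes the number of distinct levels is small: it loops over levels rather than rows, and for each level uses `Series.where` plus `ffill` to carry the most recent name at that level forward, then copies that name onto rows exactly one level deeper. This implements the same "nearest prior row one level up" rule.

```python
import pandas as pd

df = pd.DataFrame({'level': [0, 1, 2, 1, 2, 3, 3, 1, 2],
                   'name': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'B', 'C']})

# Start with an all-missing "mother" column (top-level rows keep None).
mother = pd.Series([None] * len(df), index=df.index, dtype=object)

# Loop over the distinct levels (a handful), not over the rows.
for lvl in sorted(df['level'].unique()):
    # Name of the most recent row at level `lvl`, carried forward to later rows.
    latest_at_lvl = df['name'].where(df['level'] == lvl).ffill()
    # Rows one level deeper take that carried-forward name as their mother.
    is_child = df['level'] == lvl + 1
    mother.loc[is_child] = latest_at_lvl.loc[is_child]

df['mother'] = mother
print(df)
```

Whether this beats plain iteration depends on how many distinct levels there are; each pass is a vectorized operation over the whole column, so the cost grows with the number of levels rather than with Python-level work per row.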

Using df.iterrows to iterate row by row, you can simply record the current parent for every level as you go and fill in the "mother" column accordingly:

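import pandas as pd

df = pd.DataFrame({'level': [0, 1, 2, 1, 2, 3, 3, 1, 2],
                   'name': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'B', 'C']})

parent = dict()  # level -> most recent name seen at that level
mother = []
for index, row in df.iterrows():
    # Record this row as the current parent for its level...
    parent[row['level']] = row['name']
    # ...and look up the current parent one level up (None for the top level).
    mother.append(parent.get(row['level'] - 1))
df['mother'] = mother
print(df)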

yields

   level name mother
0      0    A   None
1      1    B      A
2      2    C      B
3      1    D      A
4      2    E      D
5      3    F      E
6      3    G      E
7      1    B      A
8      2    C      B
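The question's target table omits the top-level row (A has no mother) and uses the headers Level, Name, Mother_name. A minimal sketch of that last step, assuming `df` is the frame printed above (rebuilt literally here so the snippet runs on its own):

```python
import pandas as pd

# The frame produced by the loop above, rebuilt so this snippet is self-contained.
df = pd.DataFrame({'level': [0, 1, 2, 1, 2, 3, 3, 1, 2],
                   'name': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'B', 'C'],
                   'mother': [None, 'A', 'B', 'A', 'D', 'E', 'E', 'A', 'B']})

# Keep only rows that have a mother, then rename to the requested headers.
result = (df[df['mother'].notna()]
          .rename(columns={'level': 'Level', 'name': 'Name', 'mother': 'Mother_name'})
          .reset_index(drop=True))
print(result.to_csv(index=False))
```

Filtering on `notna()` rather than on `level > 0` also drops any deeper row whose mother could not be found, which may or may not be what you want for malformed input.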
