Mapping values into a new dataframe column
Question
I have a dataset (~7000 rows) that I have imported into Pandas for some "data wrangling", but I need some pointers in the right direction to take the next step. My data looks something like the below, and it is a description of a structure with several sub levels. B, D and again B are sub levels of A. C is a sub level of B, and so on...
Level, Name
0, A
1, B
2, C
1, D
2, E
3, F
3, G
1, B
2, C
But I want something like the below, with Name and Mother_name on the same row:
Level, Name, Mother_name
1, B, A
2, C, B
1, D, A
2, E, D
3, F, E
3, G, E
1, B, A
2, C, B
Recommended answer
If I understand the format correctly, the parent of a name is the name on the nearest prior row whose level is one less than the current row's level.
Your DataFrame has a modest number of rows (~7000), so there is little harm to performance in simply iterating through the rows. If the DataFrame were very large, you would often get better performance from column-wise vectorized Pandas operations than from row-wise iteration. In this case, however, column-wise vectorized operations appear awkward and overly complicated, so I believe row-wise iteration is the best choice here.
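For comparison, here is a rough sketch of what a partly vectorized version might look like. It is not from the original answer: it still loops, but over the distinct levels rather than over the rows, and it assumes the same level and name columns as the example data. The variable names last_at_k and mask are purely illustrative.

```python
import pandas as pd

df = pd.DataFrame({'level': [0, 1, 2, 1, 2, 3, 3, 1, 2],
                   'name': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'B', 'C']})

# Start with an all-missing column and fill it one level at a time.
mother = pd.Series(index=df.index, dtype=object)
for k in sorted(df['level'].unique()):
    # Keep names only on rows at level k, then carry the last one forward.
    last_at_k = df['name'].where(df['level'] == k).ffill()
    # Rows one level deeper take the most recent level-k name as their mother.
    mask = df['level'] == k + 1
    mother.loc[mask] = last_at_k[mask]

df['mother'] = mother
print(df)
```

Note that the root row ends up with a missing value (NaN) here rather than None, and that the logic is arguably harder to follow than a plain row-by-row pass.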
Using df.iterrows to perform row-wise iteration, you can simply record the current parent for every level as you go, and fill in the "mother" of each row as appropriate:
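import pandas as pd

df = pd.DataFrame({'level': [0, 1, 2, 1, 2, 3, 3, 1, 2],
                   'name': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'B', 'C']})

parent = dict()  # most recent name seen at each level
mother = []      # mother name for each row, in row order
for index, row in df.iterrows():
    parent[row['level']] = row['name']
    # the mother is the most recent name one level up (None for the root)
    mother.append(parent.get(row['level'] - 1))
df['mother'] = mother
print(df)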
yields
   level name mother
0      0    A   None
1      1    B      A
2      2    C      B
3      1    D      A
4      2    E      D
5      3    F      E
6      3    G      E
7      1    B      A
8      2    C      B
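The result above keeps the root row (level 0), whereas the layout requested in the question omits it. As a minimal sketch, assuming the same level and name columns, the same bookkeeping can be wrapped in a helper and the root row filtered out afterwards. The function name add_mother and its parameters are hypothetical, and iterating with zip over the two columns is a swapped-in alternative to df.iterrows, not part of the original answer.

```python
import pandas as pd


def add_mother(df, level_col='level', name_col='name'):
    """Return a copy of df with a 'mother' column (hypothetical helper)."""
    parent = {}  # most recent name seen at each level
    mother = []  # mother name for each row, in row order
    # zip over the two columns: same logic as the iterrows loop
    for level, name in zip(df[level_col], df[name_col]):
        parent[level] = name
        mother.append(parent.get(level - 1))
    return df.assign(mother=mother)


df = pd.DataFrame({'level': [0, 1, 2, 1, 2, 3, 3, 1, 2],
                   'name': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'B', 'C']})

result = add_mother(df)
# Drop the root row, which has no mother, to match the requested layout.
result = result[result['level'] > 0].reset_index(drop=True)
print(result)
```

Iterating over the columns directly avoids building a Series for every row, which may matter if the data grows well beyond ~7000 rows.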