Mapping values into a new dataframe column

Published: 2020/5/24 3:06:51. Tags: python, pandas

Question

I have a dataset (~7000 rows) that I have imported into Pandas for some data wrangling, but I need some pointers in the right direction for the next step. My data looks something like the sample below. It describes a structure with several sub-levels: B, D and the second B are sub-levels of A, C is a sub-level of B, and so on.

Level, Name
0, A
1, B
2, C
1, D
2, E
3, F
3, G
1, B
2, C

But I want something like the below, with Name and Mother_name on the same row:

Level, Name, Mother_name
1, B, A
2, C, B
1, D, A
2, E, D
3, F, E
3, G, E
1, B, A
2, C, B

Recommended answer

If I understand the format correctly, the parent of a name is the nearest prior row whose level is one less than the current row's level.

Your DataFrame has a modest number of rows (~7000), so there is little harm to performance in simply iterating through the rows. With a very large DataFrame, you often get better performance from column-wise vectorized Pandas operations than from row-wise iteration. In this case, however, a column-wise vectorized solution appears awkward and overly complicated, so I believe row-wise iteration is the best choice here.
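
For comparison, here is a rough sketch of what a more column-oriented version might look like. It is not part of the original answer: it loops over the distinct levels instead of the rows, and the names last_parent and mask are only illustrative. It assumes the rows are already listed in hierarchy order, as in the question.

```python
import pandas as pd

df = pd.DataFrame({'level': [0, 1, 2, 1, 2, 3, 3, 1, 2],
                   'name': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'B', 'C']})

# Start with an all-missing column and fill it one level at a time
mother = pd.Series(index=df.index, dtype=object)
for lvl in df['level'].unique():
    # Keep names only on rows one level up, then carry them forward
    last_parent = df['name'].where(df['level'] == lvl - 1).ffill()
    # Rows at this level take the most recent name from one level up
    mask = df['level'] == lvl
    mother[mask] = last_parent[mask]

df['mother'] = mother
print(df)
```

The root row has no level above it, so its mother stays missing (NaN here, where the row-wise loop gives None). Whether this is any clearer than a plain loop is debatable, which is the point the answer makes.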

Using df.iterrows to perform row-wise iteration, you can simply record the current parent for every level as you go, and fill in each row's "mother" as appropriate:

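import pandas as pd

# Sample data from the question: one row per node, listed in hierarchy order
df = pd.DataFrame({'level': [0, 1, 2, 1, 2, 3, 3, 1, 2],
                   'name': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'B', 'C']})

parent = dict()  # most recent name seen at each level
mother = []      # parent name for each row, in row order
for index, row in df.iterrows():
    # This row becomes the current parent for its own level...
    parent[row['level']] = row['name']
    # ...and its mother is the latest name one level up (None for the root)
    mother.append(parent.get(row['level']-1))
df['mother'] = mother
print(df)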

yields

   level name mother
0      0    A   None
1      1    B      A
2      2    C      B
3      1    D      A
4      2    E      D
5      3    F      E
6      3    G      E
7      1    B      A
8      2    C      B
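
The result above still contains the root row A, and its column names differ from the Level, Name, Mother_name layout requested in the question. A minimal sketch of one way to get that layout, assuming the root is the only level-0 row (the variable name result is illustrative and not from the original answer):

```python
import pandas as pd

df = pd.DataFrame({'level': [0, 1, 2, 1, 2, 3, 3, 1, 2],
                   'name': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'B', 'C']})

# Same bookkeeping as in the answer: track the latest name seen per level
parent = {}
mother = []
for level, name in zip(df['level'], df['name']):
    parent[level] = name
    mother.append(parent.get(level - 1))
df['mother'] = mother

# Drop the root row (it has no mother) and rename to the requested headers
result = (df[df['level'] > 0]
          .rename(columns={'level': 'Level', 'name': 'Name',
                           'mother': 'Mother_name'})
          .reset_index(drop=True))
print(result)
```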
