Adding a column to a DataFrame containing the symbol of the parent node


Problem description


I am using bulk data (List of CPC Valid symbols) from the CPC website. I've read the csv into a pandas df, and the first 30 rows (of over 260K) are:

    SYMBOL      level  not-allocatable  additional-only
1   A           2      True             False
2   A01         4      True             False
3   A01B        5      True             False
4   A01B 1/00   7      False            False
5   A01B 1/02   8      False            False
6   A01B 1/022  9      False            False
7   A01B 1/024  9      False            False
8   A01B 1/026  9      False            False
9   A01B 1/028  9      False            False
10  A01B 1/04   9      False            False
11  A01B 1/06   8      False            False
12  A01B 1/065  9      False            False
13  A01B 1/08   9      False            False
14  A01B 1/10   9      False            False
15  A01B 1/12   9      False            False
16  A01B 1/14   9      False            False
17  A01B 1/16   8      False            False
18  A01B 1/165  9      False            False
19  A01B 1/18   9      False            False
20  A01B 1/20   8      False            False
21  A01B 1/22   8      False            False
22  A01B 1/222  9      False            False
23  A01B 1/225  10     False            False
24  A01B 1/227  9      False            False
25  A01B 1/24   8      False            False
26  A01B 1/243  9      False            False
27  A01B 1/246  9      False            False
28  A01B 3/00   7      False            False
29  A01B 3/02   8      False            False

The level value creates a hierarchy. For example, node A01B 1/00 is level 7 and a child of A01B; A01B 1/02 is level 8 and a child of A01B 1/00; and A01B 3/00 is level 7 and a child of A01B.

What I would like is a way to create a new column called PARENT that contains the SYMBOL of the node's direct parent. For example, I edited the csv in Excel to show the desired result of the first few rows:

Note: there are no level 1, 3, or 6 symbols. There are multiple level 2 symbols. Level 2 symbols have no parent. The parent of a level 4 symbol is the first level 2 symbol above it, and likewise the parent of a level 7 symbol is the first level 5 symbol above it.

EDIT: I need to better explain how to determine a node's parent. The level value and the row position are all that is needed to determine a parent.
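
One way to read that rule, as a minimal sketch in plain Python: walk the rows in order and keep a stack of the current ancestors, so that each row's parent is the nearest earlier row with a lower level. The function name `find_parents` and the short sample lists are hypothetical, and the sketch assumes every child row appears somewhere below its parent in the file.

```python
def find_parents(symbols, levels):
    """Return the parent symbol for each row, or None for top-level rows."""
    stack = []    # (level, symbol) pairs forming the current chain of ancestors
    parents = []
    for symbol, level in zip(symbols, levels):
        # Drop entries at the same or a deeper level: they cannot be ancestors.
        while stack and stack[-1][0] >= level:
            stack.pop()
        parents.append(stack[-1][1] if stack else None)
        stack.append((level, symbol))
    return parents


# Hypothetical sample taken from a few of the rows shown above.
symbols = ["A", "A01", "A01B", "A01B 1/00", "A01B 1/02",
           "A01B 1/022", "A01B 1/06", "A01B 3/00"]
levels = [2, 4, 5, 7, 8, 9, 8, 7]
parents = find_parents(symbols, levels)
```

Because only the level and the row order are used, the missing levels 1, 3, and 6 need no special handling in this sketch.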

I would like to use pandas to do the work, but I am not sure how to get started. Any takers? Thank you

Solution

Here's another method. GetParent() returns a function that keeps track of the most recent symbol seen at each level and returns the parent of the current row. Passing it to df.apply() with axis=1 creates a column with the parent symbols.

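import pandas as pd

def GetParent():
    # hierarchy[level] is the level of that level's parent; unused levels map to 0.
    # index:     0  1  2  3  4  5  6  7  8  9  10
    hierarchy = [0, 0, 0, 0, 2, 4, 0, 5, 7, 8, 9]
    # parent[level] holds the most recent symbol seen at that level.
    parent = [' ']*11

    def func(row):
        symbol, level = row[['SYMBOL', 'level']]

        # Look up the latest symbol at the parent's level.
        parent_level = hierarchy[level]
        parent_symbol = parent[parent_level]

        # Remember this symbol as the latest one at its own level.
        parent[level] = symbol

        return pd.Series([parent_symbol], index=['parent'])

    return func

# create a column with the parents
parents = df.apply(GetParent(), axis=1)
df = pd.concat([df, parents], axis=1)

df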

Output:

    SYMBOL  level   na      ao      parent
0   A           2   True    False   
1   A01         4   True    False   A
2   A01B        5   True    False   A01
3   A01B 1/00   7   False   False   A01B
4   A01B 1/02   8   False   False   A01B 1/00
5   A01B 1/022  9   False   False   A01B 1/02
6   A01B 1/024  9   False   False   A01B 1/02
7   A01B 1/026  9   False   False   A01B 1/02
8   A01B 1/028  9   False   False   A01B 1/02
9   A01B 1/04   9   False   False   A01B 1/02
10  A01B 1/06   8   False   False   A01B 1/00
11  A01B 1/065  9   False   False   A01B 1/06
12  A01B 1/08   9   False   False   A01B 1/06
...
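
With over 260K rows, a row-wise apply can be slow. As a hedged alternative, the same "most recent symbol at the parent level" idea can be sketched column-wise with `Series.where` and `Series.ffill`. The `PARENT_LEVEL` mapping is copied from the hierarchy list above; the function name `add_parent_column` and the small sample frame are hypothetical, and rows without a parent get an empty string rather than the single space used above.

```python
import pandas as pd

# Parent level for each level that occurs in the data (from the hierarchy
# list in the answer); level 2 is the top level and has no parent.
PARENT_LEVEL = {4: 2, 5: 4, 7: 5, 8: 7, 9: 8, 10: 9}


def add_parent_column(df):
    """Return a copy of df with a 'parent' column ('' where there is none)."""
    out = df.copy()
    out["parent"] = ""
    for level, parent_level in PARENT_LEVEL.items():
        # Keep symbols only on rows at the parent level, then carry the most
        # recent one forward to every later row.
        latest = out["SYMBOL"].where(out["level"] == parent_level).ffill()
        mask = out["level"] == level
        out.loc[mask, "parent"] = latest[mask]
    return out


# Hypothetical sample built from a few of the rows shown above.
sample = pd.DataFrame({
    "SYMBOL": ["A", "A01", "A01B", "A01B 1/00", "A01B 1/02",
               "A01B 1/022", "A01B 1/06"],
    "level": [2, 4, 5, 7, 8, 9, 8],
})
result = add_parent_column(sample)
```

This makes one pass per level instead of one Python call per row; whether it is actually faster on the full file is worth measuring rather than assuming.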
