Adding a column to a Dataframe containing the Symbol of the parent node
Question
I am using bulk data (List of CPC Valid symbols) from the CPC website. I've read the CSV into a pandas DataFrame, and the first 30 rows (of over 260K) are:
    SYMBOL      level  not-allocatable  additional-only
1   A           2      True             False
2   A01         4      True             False
3   A01B        5      True             False
4   A01B 1/00   7      False            False
5   A01B 1/02   8      False            False
6   A01B 1/022  9      False            False
7   A01B 1/024  9      False            False
8   A01B 1/026  9      False            False
9   A01B 1/028  9      False            False
10  A01B 1/04   9      False            False
11  A01B 1/06   8      False            False
12  A01B 1/065  9      False            False
13  A01B 1/08   9      False            False
14  A01B 1/10   9      False            False
15  A01B 1/12   9      False            False
16  A01B 1/14   9      False            False
17  A01B 1/16   8      False            False
18  A01B 1/165  9      False            False
19  A01B 1/18   9      False            False
20  A01B 1/20   8      False            False
21  A01B 1/22   8      False            False
22  A01B 1/222  9      False            False
23  A01B 1/225  10     False            False
24  A01B 1/227  9      False            False
25  A01B 1/24   8      False            False
26  A01B 1/243  9      False            False
27  A01B 1/246  9      False            False
28  A01B 3/00   7      False            False
29  A01B 3/02   8      False            False
The level value creates a hierarchy. For example, node A01B 1/00 is level 7 and a child of A01B; A01B 1/02 is level 8 and a child of A01B 1/00; and A01B 3/00 is a child of A01B.
What I would like is a way to create a new column called PARENT that contains the SYMBOL of the node's direct parent. For example, I edited the CSV in Excel to show the desired result for the first few rows.
Note: there are no level 1, 3, or 6 symbols, and there are multiple level 2 symbols. Level 2 symbols have no parent; the parent of a level 4 symbol is the first level 2 symbol above it, and likewise the parent of a level 7 symbol is the first level 5 symbol above it.
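The rules in the note can be captured as a small lookup from a symbol's level to the level of its parent. This is a minimal Python sketch: `parent_level` and `FIXED_PARENT_LEVEL` are hypothetical names, and the "level minus one" rule for levels 8 and above is an assumption based on the sample rows and on the example of a level 8 node being a child of a level 7 node.

```python
from typing import Optional

# Parent level for each level that occurs in the data (per the note above).
# Assumption: every level from 8 upward hangs off the level directly above it
# (8 -> 7, 9 -> 8, 10 -> 9), as in the sample rows.
FIXED_PARENT_LEVEL = {2: None, 4: 2, 5: 4, 7: 5}


def parent_level(level: int) -> Optional[int]:
    """Return the level of a node's direct parent, or None for level-2 roots."""
    if level in FIXED_PARENT_LEVEL:
        return FIXED_PARENT_LEVEL[level]
    if level >= 8:
        return level - 1
    # Levels 1, 3 and 6 are not expected to occur in the data.
    raise ValueError(f"unexpected level: {level}")


for lvl in (2, 4, 5, 7, 8, 9, 10):
    print(lvl, "->", parent_level(lvl))
```

Raising on levels 1, 3 and 6 makes an unexpected row fail loudly instead of silently getting a wrong parent.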
EDIT: I need to better explain how to determine a node's parent. The level value and the row position are all that is needed to determine a parent.
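Because only the level and the row position matter, the parent of a row can also be found as the nearest preceding row with a smaller level. The sketch below does this with a stack in a single pass over the rows. It is an alternative technique, not the method used in the answer below, and it assumes the rows are in the file's original (depth-first) order and that the columns are named `SYMBOL` and `level` as shown above; `add_parent_column` is a hypothetical helper name.

```python
import pandas as pd


def add_parent_column(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with a PARENT column: the SYMBOL of the nearest
    preceding row whose level is lower than the current row's level."""
    stack = []    # (level, symbol) pairs on the path from the root to the current row
    parents = []
    for symbol, level in zip(df["SYMBOL"], df["level"]):
        # Drop siblings and finished subtrees: anything at the same or a deeper level.
        while stack and stack[-1][0] >= level:
            stack.pop()
        parents.append(stack[-1][1] if stack else None)  # level-2 roots get None
        stack.append((level, symbol))
    out = df.copy()
    out["PARENT"] = parents
    return out


# A few rows from the sample above.
sample = pd.DataFrame({
    "SYMBOL": ["A", "A01", "A01B", "A01B 1/00", "A01B 1/02", "A01B 1/022",
               "A01B 1/06", "A01B 3/00"],
    "level": [2, 4, 5, 7, 8, 9, 8, 7],
})
print(add_parent_column(sample))
```

For these rows, A01B 1/06 gets A01B 1/00 and A01B 3/00 gets A01B as its parent, matching the hierarchy described above. Unlike a fixed-size lookup list, the stack needs no upper bound on the level.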
I would like to use pandas to do the work, but I am not sure how to get started. Any takers? Thank you.
Answer
Here's another method. GetParent() returns a function that remembers the most recent symbol seen at each level and, for each row, returns the symbol last seen at that row's parent level. Passing it to DataFrame.apply() with axis=1 creates a column with the parent symbols.
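import pandas as pd

def GetParent():
    # hierarchy[n] is the parent level of a level-n symbol (0 = no parent)
    # level ->    0  1  2  3  4  5  6  7  8  9  10
    hierarchy = [0, 0, 0, 0, 2, 4, 0, 5, 7, 8, 9]
    # parent[n] holds the most recent SYMBOL seen at level n;
    # parent[0] is never overwritten, so level-2 rows get a blank parent
    parent = [' ']*11

    def func(row):
        symbol, level = row[['SYMBOL', 'level']]
        parent_level = hierarchy[level]
        parent_symbol = parent[parent_level]
        parent[level] = symbol  # this row is now the latest node at its level
        return pd.Series([parent_symbol], index=['parent'])

    return func

# create a column with the parents (df is the DataFrame read from the CSV);
# hierarchy and parent must be extended if the data has levels above 10
parents = df.apply(GetParent(), axis=1)
df = pd.concat([df, parents], axis=1)
df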
Output:
    SYMBOL      level  na     ao     parent
0   A           2      True   False
1   A01         4      True   False  A
2   A01B        5      True   False  A01
3   A01B 1/00   7      False  False  A01B
4   A01B 1/02   8      False  False  A01B 1/00
5   A01B 1/022  9      False  False  A01B 1/02
6   A01B 1/024  9      False  False  A01B 1/02
7   A01B 1/026  9      False  False  A01B 1/02
8   A01B 1/028  9      False  False  A01B 1/02
9   A01B 1/04   9      False  False  A01B 1/02
10  A01B 1/06   8      False  False  A01B 1/00
11  A01B 1/065  9      False  False  A01B 1/06
12  A01B 1/08   9      False  False  A01B 1/06
...
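Row-wise apply() calls a Python function once per row. As a column-wise alternative, the same lookup can be written with pandas operations: for each parent level p, Series.where() keeps only the symbols at level p, ffill() carries the most recent one down to later rows, and that value is copied into the rows whose parent level is p. This is a sketch, not a benchmarked replacement; `add_parent_column_vectorized` is a hypothetical name, and it assumes the same level-to-parent mapping as the hierarchy list above, with every level of 8 or more pointing at the level directly above it.

```python
import pandas as pd


def add_parent_column_vectorized(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with a 'parent' column holding each row's parent SYMBOL."""
    fixed = {4: 2, 5: 4, 7: 5}  # level-2 rows have no parent

    def parent_level(lv):
        # Assumption: levels >= 8 hang off the level directly above them.
        return fixed.get(lv, lv - 1 if lv >= 8 else None)

    parent_lvl = df["level"].map(parent_level)  # missing for level-2 roots
    parent = pd.Series(None, index=df.index, dtype=object)
    for p in parent_lvl.dropna().unique():
        # SYMBOL of the most recent row at level p, carried forward to later rows.
        last_at_p = df["SYMBOL"].where(df["level"] == p).ffill()
        rows = parent_lvl == p
        parent.loc[rows] = last_at_p[rows]
    return df.assign(parent=parent)


# A few rows from the sample in the question.
sample = pd.DataFrame({
    "SYMBOL": ["A", "A01", "A01B", "A01B 1/00", "A01B 1/02", "A01B 1/022",
               "A01B 1/06", "A01B 3/00"],
    "level": [2, 4, 5, 7, 8, 9, 8, 7],
})
print(add_parent_column_vectorized(sample))
```

The loop runs once per distinct parent level rather than once per row. Like the row-wise version, it relies on the rows being in their original order, because ffill() only looks at earlier rows.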