Creating new columns based on value from another column in pandas
Problem description
I have this pandas dataframe with a column "Code" that contains sequential hierarchical codes. My goal is to create new columns with each hierarchical level's code and its name, as follows:
Original data:
Code Name
0 A USA
1 AM Massachusetts
2 AMB Boston
3 AMS Springfield
4 D Germany
5 DB Brandenburg
6 DBB Berlin
7 DBD Dresden
My goal:
Code Name Level1 Level1Name Level2 Level2Name Level3 Level3Name
0 A USA A USA AM Massachusetts AMB Boston
1 AM Massachusetts A USA AM Massachusetts AMB Boston
2 AMB Boston A USA AM Massachusetts AMB Boston
3 AMS Springfield A USA AM Massachusetts AMS Springfield
4 D Germany D Germany DB Brandenburg DBB Berlin
5 DB Brandenburg D Germany DB Brandenburg DBB Berlin
6 DBB Berlin D Germany DB Brandenburg DBB Berlin
7 DBD Dresden D Germany DB Brandenburg DBD Dresden
My code:
import pandas as pd
df = pd.read_excel(r'/Users/BoBoMann/Desktop/Sequence.xlsx')
df['Length'] = df.Code.str.len()  ## create a column with the length of each value in Code
df['Level1'] = df.Code.str[:1]    ## create the first level using string indexing
df['Level1Name'] = df[df['Length'] == 1]['Name']
df  ## This yields:
Code Name Length Level1 Level1Name
0 A USA 1 A USA
1 AM Massachusetts 2 A NaN
2 AMB Boston 3 A NaN
3 AMS Springfield 3 A NaN
4 D Germany 1 D Germany
5 DB Brandenburg 2 D NaN
6 DBB Berlin 3 D NaN
7 DBD Dresden 3 D NaN
For my current approach, how do I turn those NaN values into USA and Germany, respectively, in the Level1Name column?
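(The NaN values appear because df[df['Length'] == 1]['Name'] only carries the index labels 0 and 4, so the assignment aligns on the index and leaves every other row empty. A minimal fix within this approach, sketched here assuming the Length and Level1 columns created above, is to build a code-to-name mapping from the length-1 rows and map Level1 through it:
level1_names = df[df['Length'] == 1].set_index('Code')['Name']  ## e.g. A -> USA, D -> Germany
df['Level1Name'] = df['Level1'].map(level1_names)
)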
More generally, is there a better approach to my goal of creating a column for each hierarchical layer and matching it with its respective name in another column?
Recommended answer
IIUC, let's use this code:
df['Codes'] = [[*i] for i in df['Code']]
df_level = df['Code'].str.extractall('(.)')[0].unstack('match').bfill().cumsum(axis=1)
s_map = df.explode('Codes').drop_duplicates('Code', keep='last').set_index('Code')['Name']
df_level.columns = [f'Level{i+1}' for i in df_level.columns]
df_level_names = pd.concat([df_level[i].map(s_map) for i in df_level.columns],
                           axis=1,
                           keys=df_level.columns+'Name')
df_out = df.join([df_level, df_level_names]).drop('Codes', axis=1)
df_out
Output:
Code Name Level1 Level2 Level3 Level1Name Level2Name Level3Name
0 A USA A AM AMB USA Massachusetts Boston
1 AM Massachusetts A AM AMB USA Massachusetts Boston
2 AMB Boston A AM AMB USA Massachusetts Boston
3 AMS Springfield A AM AMS USA Massachusetts Springfield
4 D Germany D DB DBB Germany Brandenburg Berlin
5 DB Brandenburg D DB DBB Germany Brandenburg Berlin
6 DBB Berlin D DB DBB Germany Brandenburg Berlin
7 DBD Dresden D DB DBD Germany Brandenburg Dresden
Explained:
- Unpack each string in 'Code' into a list of characters, creating the 'Codes' column.
- Create the 'LevelX' columns: use extractall with the regex '(.)' to get one character per match, unstack so each match position becomes a column, bfill the NaN from the longer codes below, and cumsum along the rows to build the cumulative prefix codes.
- Create a pd.Series to use with map: call explode on the 'Codes' column created above, drop_duplicates keeping the last value of 'Code', then set_index on 'Code' and keep the 'Name' column, creating 's_map'.
- Rename the df_level columns to get Level1 instead of Level0.
- Use pd.concat with a list comprehension to map the df_level columns through s_map, creating df_level_names; the keys parameter names the new columns by appending 'Name'.
- Use join to join df with df_level and df_level_names, then drop the 'Codes' column, creating the desired output.
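For reference, here is a self-contained sketch of the same approach (the sample data is built inline rather than read from Excel, and s_map is simplified to a direct set_index, which is equivalent here because every value in 'Code' is unique):
import pandas as pd

df = pd.DataFrame({'Code': ['A', 'AM', 'AMB', 'AMS', 'D', 'DB', 'DBB', 'DBD'],
                   'Name': ['USA', 'Massachusetts', 'Boston', 'Springfield',
                            'Germany', 'Brandenburg', 'Berlin', 'Dresden']})

## Split each code into single characters; unstack makes one column per match
## position, so row 0 ('A') becomes A, NaN, NaN.
chars = df['Code'].str.extractall('(.)')[0].unstack('match')

## bfill pulls the missing characters up from the longer codes below, then
## cumsum(axis=1) concatenates the strings left to right: A, AM, AMB.
df_level = chars.bfill().cumsum(axis=1)
df_level.columns = [f'Level{i+1}' for i in df_level.columns]

## Map every partial code to its name and join everything back onto df.
s_map = df.set_index('Code')['Name']
df_level_names = pd.concat([df_level[c].map(s_map) for c in df_level.columns],
                           axis=1, keys=df_level.columns + 'Name')
df_out = df.join([df_level, df_level_names])
print(df_out)
Note that the bfill step relies on the rows being ordered so that each shorter code is followed by the rows that extend it, as in the sample data here.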