Pandas DataFrame: use column value to slice string in another column

Problem description

I have a pandas DataFrame as follows:

     col1  col2  col3
0    1     3     ABCDEFG
1    1     5     HIJKLMNO
2    1     2     PQRSTUV

I want to add another column that is a substring of col3, running from the position indicated in col1 to the position indicated in col2. Something like col3[(col1-1):(col2-1)], which should result in:

     col1  col2  col3       new_col
0    1     3     ABCDEFG    ABC
1    1     5     HIJKLMNO   HIJK
2    1     2     PQRSTUV    PQ

I tried the following:

my_df['new_col'] = my_df.col3.str.slice(my_df['col1']-1, my_df['col2']-1)

my_df['new_col'] = data['col3'].str[(my_df['col1']-1):(my_df['col2']-1)]

Both of them result in a column of NaN, while if I insert two numerical values (e.g. data['col3'].str[1:3]) it works fine. I checked and the types are correct (int64, int64 and object). Also, outside such a context (e.g. using a for loop) I can get the job done, but I'd prefer a one-liner that exploits the DataFrame. What am I doing wrong?

Recommended answer

Use apply, because each row has to be processed separately:

# col1 is a 1-based start position; col2 is a 1-based end position (inclusive)
my_df['new_col'] = my_df.apply(lambda x: x['col3'][x['col1']-1:x['col2']], axis=1)
print(my_df)
   col1  col2      col3 new_col
0     1     3   ABCDEFG     ABC
1     1     5  HIJKLMNO   HIJKL
2     1     2   PQRSTUV      PQ
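
For reference, here is a self-contained sketch of the apply approach above, next to a plain list comprehension over zip. The list comprehension is an alternative not given in the original answer. The sample frame is rebuilt from the question's data, and the names via_apply and via_zip are only illustrative. Both variants slice each row's string with that row's own bounds, which is what the question needs. The documented start/stop parameters of Series.str.slice are single integers applied to every row, not per-row values.

```python
import pandas as pd

# Sample data rebuilt from the question
my_df = pd.DataFrame({
    "col1": [1, 1, 1],
    "col2": [3, 5, 2],
    "col3": ["ABCDEFG", "HIJKLMNO", "PQRSTUV"],
})

# Row-wise apply, as in the answer: each row supplies its own slice bounds
via_apply = my_df.apply(
    lambda row: row["col3"][row["col1"] - 1:row["col2"]], axis=1
)

# Alternative (assumption, not from the answer): iterate the three columns in lockstep
via_zip = [
    text[start - 1:stop]
    for text, start, stop in zip(my_df["col3"], my_df["col1"], my_df["col2"])
]

my_df["new_col"] = via_zip
print(my_df)
```

On larger frames the list comprehension is often faster than apply with axis=1, since it avoids building a Series object for every row. Treat that as a rule of thumb to measure rather than a guarantee.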
