Python Pandas removing substring using another column

Views: 32 Published: 2021/12/25 9:19:34 Tags: python string pandas replace series


Question

I've tried searching around and can't figure out an easy way to do this, so I'm hoping your expertise can help.

I have a pandas data frame with two columns:

import numpy as np
import pandas as pd

pd.options.display.width = 1000
testing = pd.DataFrame({
    'NAME': ['FIRST', np.nan, 'NAME2', 'NAME3',
             'NAME4', 'NAME5', 'NAME6'],
    'FULL_NAME': ['FIRST LAST', np.nan, 'FIRST LAST', 'FIRST NAME3',
                  'FIRST NAME4 LAST', 'ANOTHER NAME', 'LAST NAME']})

This gives me

          FULL_NAME   NAME
0        FIRST LAST  FIRST
1               NaN    NaN
2        FIRST LAST  NAME2
3       FIRST NAME3  NAME3
4  FIRST NAME4 LAST  NAME4
5      ANOTHER NAME  NAME5
6         LAST NAME  NAME6

What I'd like to do is take the value from the 'NAME' column and remove it from the 'FULL_NAME' column if it's there. So the function would then return

          FULL_NAME   NAME           NEW
0        FIRST LAST  FIRST          LAST
1               NaN    NaN           NaN
2        FIRST LAST  NAME2    FIRST LAST
3       FIRST NAME3  NAME3         FIRST
4  FIRST NAME4 LAST  NAME4    FIRST LAST
5      ANOTHER NAME  NAME5  ANOTHER NAME
6         LAST NAME  NAME6     LAST NAME

So far, I've defined the function below and am applying it with the apply method. This runs rather slowly on my large data set, though, and I'm hoping there's a more efficient way to do it. Thanks!

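import re

def address_remove(x):
    # Remove the NAME value from FULL_NAME and trim surrounding whitespace.
    # Note that NAME is interpreted as a regular expression here.
    try:
        return re.sub(x['NAME'], '', x['FULL_NAME']).strip()
    except (TypeError, re.error):
        # NaN values or a NAME that is not a valid pattern: keep FULL_NAME unchanged.
        return x['FULL_NAME']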

Recommended answer

Here is one solution that is quite a bit faster than your current one, though I'm not convinced there isn't something faster still.

In [13]: import numpy as np
         import pandas as pd
         n = 1000
         testing = pd.DataFrame({
             'NAME': ['FIRST', np.nan, 'NAME2', 'NAME3',
                      'NAME4', 'NAME5', 'NAME6'] * n,
             'FULL_NAME': ['FIRST LAST', np.nan, 'FIRST  LAST', 'FIRST NAME3',
                           'FIRST NAME4 LAST', 'ANOTHER NAME', 'LAST NAME'] * n})

This is kind of a long one-liner, but it should do what you need.

The fastest solution I can come up with uses replace, as mentioned in another answer:

In [37]: %timeit testing['NEW2'] = [e.replace(k, '') for e, k in zip(testing.FULL_NAME.astype('str'), testing.NAME.astype('str'))]
100 loops, best of 3: 4.67 ms per loop

Original answer:

In [14]: %timeit testing['NEW'] = [''.join(str(e).split(k)) for e, k in zip(testing.FULL_NAME.astype('str'), testing.NAME.astype('str'))]
100 loops, best of 3: 7.24 ms per loop

Compared to your current solution:

In [16]: %timeit testing['NEW1'] = testing.apply(address_remove, axis=1)
10 loops, best of 3: 166 ms per loop
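%timeit is an IPython magic, so the timings above only run inside IPython or Jupyter. As a rough sketch, a similar comparison can be made in a plain script with the standard timeit module. The wrapper names with_apply, with_replace and with_split_join are made up for this illustration, and str(...) is used in place of astype('str') so that the sketch does not depend on how a given pandas version converts NaN. Absolute numbers will differ from machine to machine.

```python
import re
import timeit

import numpy as np
import pandas as pd

n = 1000
testing = pd.DataFrame({
    'NAME': ['FIRST', np.nan, 'NAME2', 'NAME3',
             'NAME4', 'NAME5', 'NAME6'] * n,
    'FULL_NAME': ['FIRST LAST', np.nan, 'FIRST  LAST', 'FIRST NAME3',
                  'FIRST NAME4 LAST', 'ANOTHER NAME', 'LAST NAME'] * n})


def address_remove(x):
    # Row-wise version from the question (NAME is used as a regex pattern).
    try:
        return re.sub(x['NAME'], '', x['FULL_NAME']).strip()
    except TypeError:
        return x['FULL_NAME']


def with_apply():
    # Hypothetical wrapper: the original apply-based approach.
    return testing.apply(address_remove, axis=1)


def with_replace():
    # Hypothetical wrapper: list comprehension with str.replace.
    return [str(e).replace(str(k), '')
            for e, k in zip(testing['FULL_NAME'], testing['NAME'])]


def with_split_join():
    # Hypothetical wrapper: list comprehension with split and join.
    return [''.join(str(e).split(str(k)))
            for e, k in zip(testing['FULL_NAME'], testing['NAME'])]


for func in (with_apply, with_replace, with_split_join):
    # Take the best of 3 repeats, each running the function 3 times.
    seconds = min(timeit.repeat(func, number=3, repeat=3)) / 3
    print(f'{func.__name__}: {seconds * 1000:.2f} ms per call')
```

Taking the minimum over several repeats mirrors the "best of 3" figure that %timeit reports.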

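These give you the same answer as your current solution on the sample data, with two caveats: leading and trailing whitespace is not stripped, and because of astype('str') the NaN row may come back as an empty string rather than NaN.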
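The list comprehensions do not strip whitespace and convert missing values to strings before replacing. If the exact output from the question is needed (NaN preserved, no stray spaces), one possible variant is sketched below. The helper name remove_substring is hypothetical, and collapsing runs of whitespace is an assumption about the desired result, since it also normalizes spacing that was already in FULL_NAME.

```python
import numpy as np
import pandas as pd

testing = pd.DataFrame({
    'NAME': ['FIRST', np.nan, 'NAME2', 'NAME3',
             'NAME4', 'NAME5', 'NAME6'],
    'FULL_NAME': ['FIRST LAST', np.nan, 'FIRST LAST', 'FIRST NAME3',
                  'FIRST NAME4 LAST', 'ANOTHER NAME', 'LAST NAME']})


def remove_substring(full, name):
    # Hypothetical helper: leave the value untouched unless both are strings,
    # so NaN in either column passes FULL_NAME through unchanged.
    if not isinstance(full, str) or not isinstance(name, str):
        return full
    # Plain (non-regex) removal, then collapse repeated whitespace and trim.
    return ' '.join(full.replace(name, '').split())


testing['NEW'] = [remove_substring(e, k)
                  for e, k in zip(testing['FULL_NAME'], testing['NAME'])]
print(testing)
```

This keeps the speed advantage of iterating over zipped columns instead of calling apply row by row, at the cost of a small per-element type check.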
