Remove substring from column based on another column
Problem description
I am trying to use the values (as strings) from one column to determine what gets removed from another column. The remainder of that column must stay unchanged.
Example data:
import pandas as pd
dfTest = pd.DataFrame({
    'date': ['190225', '190225', '190226'],
    'foo': ['190225-file1_190225', '190225-file2_190225', '190226-file3_190226']
})
dfTest
Resulting DataFrame:
| date | foo
------------------------------------
0 | 190225 | 190225-file1_190225
1 | 190225 | 190225-file2_190225
2 | 190226 | 190226-file3_190226
I need to create a 'bar' column in which every occurrence of the row's 'date' value has been removed from 'foo'.
This is what I am looking for:
| date | foo | bar
-----------------------------------------------
0 | 190225 | 190225-file1_190225 | -file1_
1 | 190225 | 190225-file2_190225 | -file2_
2 | 190226 | 190226-file3_190226 | -file3_
The contents of the 'date' column need to be removed from 'foo' in each row, whether they appear at the beginning, in the middle, or at the end.
I have tried a few things like the code below, but it doesn't work. It just replicates the original column without replacing anything. Note that setting regex=False does not change the result.
dfTest['bar'] = dfTest['foo'].str.replace(str(dfTest['date']), '')
#or (removing .str, gives same result):
#dfTest['bar'] = dfTest['foo'].replace(str(dfTest['date']), '')
Both attempts produce the table below ('bar' is identical to 'foo'):
| date | foo | bar
-----------------------------------------------------------
0 | 190225 | 190225-file1_190225 | 190225-file1_190225
1 | 190225 | 190225-file2_190225 | 190225-file2_190225
2 | 190226 | 190226-file3_190226 | 190226-file3_190226
How can I remove the contents of the 'date' column from 'foo' but otherwise preserve the original data?
Recommended answer
I tried this and it worked well. It applies Python's str.replace row by row, so each 'foo' value is stripped of its own row's 'date':
dfTest['bar'] = dfTest.apply(lambda row: row['foo'].replace(str(row['date']), ''), axis=1)
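For comparison, here is a minimal sketch of the same row-wise idea written as a list comprehension over zip, which may read more plainly than apply with axis=1. It also shows what str(dfTest['date']) evaluates to, which is the likely reason the original attempt left 'foo' unchanged: str() on a whole Series returns its printed representation rather than a per-row value. The name whole_series_text is introduced here for illustration only and is not part of the original question or answer.

```python
import pandas as pd

dfTest = pd.DataFrame({
    'date': ['190225', '190225', '190226'],
    'foo': ['190225-file1_190225', '190225-file2_190225', '190226-file3_190226'],
})

# str() on an entire Series gives one multi-line string (index labels,
# values, and a name/dtype footer). That text never occurs inside any
# 'foo' value, so using it as the search pattern replaces nothing.
whole_series_text = str(dfTest['date'])
print(repr(whole_series_text))

# Row-wise alternative: pair each 'foo' value with the 'date' from the
# same row and remove every occurrence with Python's str.replace.
dfTest['bar'] = [
    foo.replace(str(date), '')
    for foo, date in zip(dfTest['foo'], dfTest['date'])
]
print(dfTest)
```

Both this sketch and the apply version do a literal substring replacement, so characters in 'date' that would be special in a regular expression need no escaping.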