Find String Pattern Match in Pandas Dataframe and Return Matched Strin

Views: 693 Published: 2020/5/24 2:36:23 python pandas

Problem description

I have a dataframe column containing a variable number of comma-separated values, and I am trying to extract the values that also appear in a separate list. My dataframe looks like this:

col1 | col2
-----------
 x   | a,b


import re

listformatch = ['c', 'd', 'f', 'b']
pattern = '|'.join(listformatch)

def test_for_pattern(x):
    if re.search(pattern, x):
        return pattern  # returns the whole pattern string, not the matched text
    else:
        return x

# col2.str.contains(pattern) gives the same filtering result

The filtering above works well, but when it finds a match it returns the whole pattern (such as a|b) instead of just b. I want to create another column holding the value that actually matched, such as b.
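One way to get the matched text rather than the pattern is to return the match object's group. The sketch below is an illustration only: the helper name first_match is hypothetical, and the re.escape call is an added precaution for terms that contain regex metacharacters.

```python
import re

listformatch = ['c', 'd', 'f', 'b']
# re.escape guards against terms that contain regex metacharacters
pattern = '|'.join(re.escape(term) for term in listformatch)

def first_match(text):
    """Return the first listed term found in text, or None if there is none."""
    m = re.search(pattern, text)
    return m.group() if m else None

print(first_match('a,b'))   # b
print(first_match('a,e'))   # None
```

Note that a plain alternation matches substrings, so a term such as b would also be found inside a longer token like abc.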

Here is my final function, but I still get "UserWarning: This pattern has match groups. To actually get the groups, use str.extract.", which I would like to resolve:

import re
import pandas as pd

def matching_func(fin, fin1, col1):
    file1 = pd.read_csv(fin)                      # terms to search for
    file2 = pd.read_excel(fin1, 0, skiprows=1)    # rows to search in
    pattern = '|'.join(file1[col1].tolist())
    file2['new_col'] = file2[col1].map(lambda x: re.search(pattern, x).group()
                                       if re.search(pattern, x) else None)
    return file2
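For context, pandas emits this warning from str.contains when the pattern contains a capturing group. The following sketch, using made-up sample data and a hypothetical helper count_user_warnings, suggests that switching to a non-capturing group (?:...) is one way to keep the alternation without triggering it.

```python
import warnings
import pandas as pd

s = pd.Series(['a,b', 'a,e'])  # made-up sample data

def count_user_warnings(pat):
    """Run str.contains with pat and count the UserWarnings it emits."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter('always')
        s.str.contains(pat)
    return sum(issubclass(w.category, UserWarning) for w in caught)

with_group = count_user_warnings('(c|d|f|b)')       # capturing group
without_group = count_user_warnings('(?:c|d|f|b)')  # non-capturing group
print(with_group, without_group)
```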

I think I understand how pandas extract works now, but I am probably still rusty on regex. How do I create a pattern variable to use in the example below:

df[col1].str.extract('(word1|word2)')

Instead of writing the words directly in the argument, I want to build a variable such as pattern = 'word1|word2', but that does not work because of the way the string is constructed.
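The likely sticking point is that str.extract needs at least one capture group, so the joined alternation has to be wrapped in parentheses. A minimal sketch, assuming a hypothetical list named words and made-up sample rows:

```python
import re
import pandas as pd

words = ['word1', 'word2']  # hypothetical search terms
# Wrap the alternation in parentheses so str.extract has a capture group
pattern = '({})'.format('|'.join(re.escape(w) for w in words))

df = pd.DataFrame({'col1': ['has word2 here', 'nothing', 'word1 first']})
# expand=False returns a Series for a single group (newer pandas versions)
df['match'] = df['col1'].str.extract(pattern, expand=False)
print(df)
```

Rows without any of the terms end up with a missing value in the new column.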

My final and preferred version, using the vectorized string methods in pandas 0.13:

Using the values in one column to extract from a second column:

df[col1].str.extract('({})'.format('|'.join(df[col2])))
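A self-contained version of that one-liner might look like the sketch below. The column contents are invented for illustration, the result column name found is hypothetical, and the expand=False argument assumes a pandas release newer than 0.13.

```python
import re
import pandas as pd

# Made-up data: col1 holds free text, col2 holds the terms to look for
df = pd.DataFrame({
    'col1': ['red apple', 'green pear', 'plain text'],
    'col2': ['apple', 'pear', 'kiwi'],
})

# Build one capture group from every value in col2
pattern = '({})'.format('|'.join(re.escape(t) for t in df['col2']))

# The first term found in each col1 value; missing where nothing matches
df['found'] = df['col1'].str.extract(pattern, expand=False)
print(df)
```

Because the pattern is built from the whole of col2, each row of col1 is checked against every term, not only the term in its own row.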
