Pandas: How to return rows where a column has a line break/newline (\n) with one of several case-sensitive words coming directly after?

Problem description

This is a follow-up to this Stack Overflow question, which shows how to get a word that follows a new line.

I would now like to return rows where the column contains one of several case-sensitive words directly after a new line.

Here is a minimal example:

import pandas as pd

# Nine single-column rows: only rows 0, 1, 4 and 6 have an upper-case
# keyword directly after a newline.
testdf = pd.DataFrame([
    [' generates the final summary. \nRESULTS We evaluate the performance of '],
    ['the cat and bat \n\n\nRESULTS\n teamed up to find some food'],
    ['anthropology with RESULTS pharmacology and biology'],
    [' generates the final summary. \nMethods We evaluate the performance of '],
    ['the cat and bat \n\n\nMETHODS\n teamed up to find some food'],
    ['anthropology with METHODS pharmacology and biology'],
    [' generates the final summary. \nBACKGROUND We evaluate the performance of '],
    ['the cat and bat \n\n\nBackground\n teamed up to find some food'],
    ['anthropology with BACKGROUND pharmacology and biology'],
])
testdf.columns = ['A']
testdf.head(10)

This returns:

A
0   generates the final summary. \nRESULTS We evaluate the performance of
1   the cat and bat \n\n\nRESULTS\n teamed up to find some food
2   anthropology with RESULTS pharmacology and biology
3   generates the final summary. \nMethods We evaluate the performance of
4   the cat and bat \n\n\nMETHODS\n teamed up to find some food
5   anthropology with METHODS pharmacology and biology
6   generates the final summary. \nBACKGROUND We evaluate the performance of
7   the cat and bat \n\n\nBackground\n teamed up to find some food
8   anthropology with BACKGROUND pharmacology and biology

Then:

listStrings = { '\nRESULTS',  '\nMETHODS' ,  '\nBACKGROUND' }
testdf.loc[testdf.A.apply(lambda x: len(listStrings.intersection(x.split())) >= 1)]

This returns nothing. The desired result would be the following rows:

A
0   generates the final summary. \nRESULTS We evaluate the performance of
1   the cat and bat \n\n\nRESULTS\n teamed up to find some food
4   the cat and bat \n\n\nMETHODS\n teamed up to find some food
6   generates the final summary. \nBACKGROUND We evaluate the performance of

These are the rows where the word comes directly after a '\n' and matches the case of an entry in the given set.
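
A likely reason the set-intersection attempt above returns nothing: str.split() with no arguments splits on any run of whitespace, newlines included, so none of the resulting tokens can contain '\n'. The short sketch below checks this on the first row of the example; the names text, tokens and targets are illustrative only.

```python
# str.split() with no arguments treats '\n' as ordinary whitespace,
# so the newline is discarded and never appears inside a token.
text = ' generates the final summary. \nRESULTS We evaluate the performance of '
tokens = text.split()
targets = {'\nRESULTS', '\nMETHODS', '\nBACKGROUND'}

print(tokens[:6])
print(targets.intersection(tokens))  # empty: no token carries the newline
print('RESULTS' in tokens)           # the bare word survives, its newline context does not
```

Because the newline is lost during tokenization, the match has to be made against the original string rather than against the split tokens.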

Recommended answer

Try the following code:

>>> testdf[testdf['A'].str.contains('\nRESULTS|\nMETHODS|\nBACKGROUND')]
                                                   A
0   generates the final summary. \nRESULTS We eva...
1  the cat and bat \n\n\nRESULTS\n teamed up to f...
4  the cat and bat \n\n\nMETHODS\n teamed up to f...
6   generates the final summary. \nBACKGROUND We ...
>>> 
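
When the list of words is longer or built at runtime, the same pattern can be assembled programmatically. The following is a minimal sketch rather than part of the original answer: the names words, pattern and mask are hypothetical, and the trailing \b is an added assumption that a longer word such as RESULTSET should not count as a match.

```python
import re

import pandas as pd

testdf = pd.DataFrame({'A': [
    ' generates the final summary. \nRESULTS We evaluate the performance of ',
    'the cat and bat \n\n\nRESULTS\n teamed up to find some food',
    'anthropology with RESULTS pharmacology and biology',
    ' generates the final summary. \nMethods We evaluate the performance of ',
    'the cat and bat \n\n\nMETHODS\n teamed up to find some food',
    'anthropology with METHODS pharmacology and biology',
    ' generates the final summary. \nBACKGROUND We evaluate the performance of ',
    'the cat and bat \n\n\nBackground\n teamed up to find some food',
    'anthropology with BACKGROUND pharmacology and biology',
]})

# Hypothetical list of case-sensitive words to look for.
words = ['RESULTS', 'METHODS', 'BACKGROUND']

# A newline, then one of the (escaped) words, then a word boundary.
# The non-capturing group (?:...) keeps the alternation scoped to the words.
pattern = r'\n(?:' + '|'.join(re.escape(w) for w in words) + r')\b'

# str.contains is case-sensitive by default (case=True).
mask = testdf['A'].str.contains(pattern, regex=True)
result = testdf[mask]
print(result.index.tolist())  # [0, 1, 4, 6]
```

Using re.escape guards against words that contain regex metacharacters. Dropping the \b gives the same behavior as the pattern in the answer above.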
