检查Excel表格中是否存在Pandas DataFrame系列中的值 [英] Checking if value in pandas DataFrame series exists in excel sheets

查看:523
本文介绍了检查Excel表格中是否存在Pandas DataFrame系列中的值的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我是熊猫新手.

我的数据框

df

A
Best free
best free
free
free best
best
Nokia best for free
best nokia in 2020
best streaming platform for free
free streaming platform in 2020 for nokia phone
streaming for free Canada
...

我的excel文件= filefile.Word表示我对文件和查看名为'Word'的工作表感兴趣

My excel file = file, file.Word indicates that I am interested in the file and viewing sheet named 'Word'

文件.包含最佳

meilleur
beste
mejor
best
...

文件.包含免费

gratuit
kostenlos
gratis
free
...

我想要的DataFrame

My desired DataFrame

df

A                                                   Contains Best   Contains Free
Best free                                           True            True
best free                                           True            True
free                                                False           True
free best                                           True            True
best                                                True            False
Nokia best for free                                 True            True
best nokia in 2020                                  True            False
best streaming platform for free                    True            True
free streaming platform in 2020 for nokia phone     False           True
streaming for free Canada                           False           False
...                                                 ...             ...

Excel file将充当过滤器,我可以通过添加或减除不同工作表中的单词来对其进行修改,并且它将用作确定新的DataFrame列是否包含TrueFalse值的源.

Excel file will work as an filter which I could modify by adding or subracting words from different sheets and it will act as a source determining if new DataFrame column will hold True or False values.

到目前为止我已经尝试过的:

What I've tried so far:

file = pd.read_excel('config_values.xlsx')
print(file)

它会打印出期望的第一个图纸值,一旦我添加了sheet_name='ContainsBest',它将引发此错误:

It prints out the first sheet values which are expected, once I add sheet_name='ContainsBest' it throws this error:

xlrd.biffh.XLRDError: No sheet named <'ContainsBest'>

但是它会使用以下代码打开所选的工作表:

But it does open the selected sheets with this code:

file = pd.read_excel(open('config_values.xlsx', 'rb'),
                     sheet_name='ContainsFree')
print(file)

这是否意味着pd.read_excel(open...不能用于打开具有名称的第一张纸?

Does that mean that pd.read_excel(open... cant be used to open the first sheet with a name?

所以我的问题可以分为两个部分:

So my question can be split into two:

  1. 使用带有表格的excel文件从其中的不同表格读取的正确方法
  2. 添加列并检查表中是否存在值的最佳方法是什么?使用.isin()还是有更好的选择?
  1. Correct way of using excel file with sheets to read from different sheets in it
  2. What would be the best approach to add columns and check if values in sheets exist in them? Using .isin() or there are better options?

编辑 我可以使用:

df['Contains Free'] = df.Search_Query.str.contains('free', regex=True)

哪个可以正常工作,但是如果我像这样通过file:

Which works fine, but if I pass file like so:

file = pd.read_excel(open('config_values.xlsx', 'rb'),
                     sheet_name='ContainsFree')
df['Contains Free'] = df.Search_Query.str.contains(file, regex=True)

我收到此错误:

TypeError: 'DataFrame' objects are mutable, thus they cannot be hashed

谢谢您的建议.

推荐答案

最终使用此解决方案:

containsFree = pd.read_excel('config_values.xlsx',
                             sheet_name='ContainsFree', header=None)
containsFree = '|'.join(containsFree[0])
df['ContainsFree'] = df.SearchQuery.str.contains(
    containsFree, regex=True, flags=re.IGNORECASE)

我在每个Excel工作表中都使用了相同的代码.但是,对于第一个工作表,您无需传递工作表名称,即使它是正确的名称,也只需传递不带名称的文件即可:

I've used the same code per each excel sheet. However, for the first sheet you do not need to pass a sheet name, even if its the right one, just pass the file without the name:

containsbest = pd.read_excel('config_values.xlsx', header=None)
containsbest = '|'.join(containsbest[0])
df['Containsbest'] = df.SearchQuery.str.contains(
    containsbest, regex=True, flags=re.IGNORECASE)

这篇关于检查Excel表格中是否存在Pandas DataFrame系列中的值的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆