检查Excel表格中是否存在Pandas DataFrame系列中的值 [英] Checking if value in pandas DataFrame series exists in excel sheets
问题描述
我是熊猫新手.
我的数据框
df
A
Best free
best free
free
free best
best
Nokia best for free
best nokia in 2020
best streaming platform for free
free streaming platform in 2020 for nokia phone
streaming for free Canada
...
我的excel文件= file
,file.Word
表示我对文件和查看名为'Word'的工作表感兴趣
My excel file = file
, file.Word
indicates that I am interested in the file and viewing sheet named 'Word'
文件.包含最佳
meilleur
beste
mejor
best
...
文件.包含免费
gratuit
kostenlos
gratis
free
...
我想要的DataFrame
My desired DataFrame
df
A Contains Best Contains Free
Best free True True
best free True True
free False True
free best True True
best True False
Nokia best for free True True
best nokia in 2020 True False
best streaming platform for free True True
free streaming platform in 2020 for nokia phone False True
streaming for free Canada False False
... ... ...
Excel file
将充当过滤器,我可以通过添加或减除不同工作表中的单词来对其进行修改,并且它将用作确定新的DataFrame列是否包含True
或False
值的源.
Excel file
will work as an filter which I could modify by adding or subracting words from different sheets and it will act as a source determining if new DataFrame column will hold True
or False
values.
到目前为止我已经尝试过的:
What I've tried so far:
file = pd.read_excel('config_values.xlsx')
print(file)
它会打印出期望的第一个图纸值,一旦我添加了sheet_name='ContainsBest'
,它将引发此错误:
It prints out the first sheet values which are expected, once I add sheet_name='ContainsBest'
it throws this error:
xlrd.biffh.XLRDError: No sheet named <'ContainsBest'>
但是它会使用以下代码打开所选的工作表:
But it does open the selected sheets with this code:
file = pd.read_excel(open('config_values.xlsx', 'rb'),
sheet_name='ContainsFree')
print(file)
这是否意味着pd.read_excel(open...
不能用于打开具有名称的第一张纸?
Does that mean that pd.read_excel(open...
cant be used to open the first sheet with a name?
所以我的问题可以分为两个部分:
So my question can be split into two:
- 使用带有表格的excel文件从其中的不同表格读取的正确方法
- 添加列并检查表中是否存在值的最佳方法是什么?使用
.isin()
还是有更好的选择?
- Correct way of using excel file with sheets to read from different sheets in it
- What would be the best approach to add columns and check if values in sheets exist in them? Using
.isin()
or there are better options?
编辑 我可以使用:
df['Contains Free'] = df.Search_Query.str.contains('free', regex=True)
哪个可以正常工作,但是如果我像这样通过file
:
Which works fine, but if I pass file
like so:
file = pd.read_excel(open('config_values.xlsx', 'rb'),
sheet_name='ContainsFree')
df['Contains Free'] = df.Search_Query.str.contains(file, regex=True)
我收到此错误:
TypeError: 'DataFrame' objects are mutable, thus they cannot be hashed
谢谢您的建议.
推荐答案
最终使用此解决方案:
containsFree = pd.read_excel('config_values.xlsx',
sheet_name='ContainsFree', header=None)
containsFree = '|'.join(containsFree[0])
df['ContainsFree'] = df.SearchQuery.str.contains(
containsFree, regex=True, flags=re.IGNORECASE)
我在每个Excel工作表中都使用了相同的代码.但是,对于第一个工作表,您无需传递工作表名称,即使它是正确的名称,也只需传递不带名称的文件即可:
I've used the same code per each excel sheet. However, for the first sheet you do not need to pass a sheet name, even if its the right one, just pass the file without the name:
containsbest = pd.read_excel('config_values.xlsx', header=None)
containsbest = '|'.join(containsbest[0])
df['Containsbest'] = df.SearchQuery.str.contains(
containsbest, regex=True, flags=re.IGNORECASE)
这篇关于检查Excel表格中是否存在Pandas DataFrame系列中的值的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!