Check if a certain value is contained in a DataFrame column in pandas
Question
I am trying to check whether a certain value is contained in a pandas DataFrame column. I'm using df.date.isin(['07311954']), which I'm sure is a good tool.
The problem is that I have over 350K rows, and the output won't show
all of them, so I can't see whether the value is actually there. Put simply, I just want to know (Y/N) whether or not a specific value is contained in a column. My code follows:
import numpy as np
import pandas as pd
import glob
df = (pd.read_csv('/home/jayaramdas/anaconda3/Thesis/FEC_data/itpas2_data/itpas214.txt',\
sep='|', header=None, low_memory=False, names=['1', '2', '3', '4', '5', '6', '7', \
'8', '9', '10', '11', '12', '13', 'date', '15', '16', '17', '18', '19', '20', \
'21', '22']))
df.date.isin(['07311954'])
Answer
I think you need str.contains if you want the rows where the values of column date contain the string 07311954:
print(df[df['date'].astype(str).str.contains('07311954')])
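Since the goal is a single Y/N rather than a 350K-row printout, the boolean mask returned by isin or str.contains can be reduced with .any(). A minimal sketch, using a small hypothetical DataFrame whose date column already holds strings:

```python
import pandas as pd

# Hypothetical sample data; the dates are stored as strings.
df = pd.DataFrame({'date': ['08152007', '07311954', '02252011']})

# Both methods return one boolean per row; .any() reduces that to a single Y/N.
exact = df['date'].isin(['07311954']).any()          # whole-value match
partial = df['date'].str.contains('07311954').any()  # substring match
print(exact, partial)  # True True

# Number of matching rows, without printing the full mask.
count = int(df['date'].isin(['07311954']).sum())
print(count)  # 1

# Caution: `in` on a Series checks the index labels, not the values.
print('07311954' in df['date'])         # False (the index is 0, 1, 2)
print('07311954' in df['date'].values)  # True
```

The same .any() call can be appended to any of the masks built in this answer.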
Or, if the date column is already of type string:
print(df[df['date'].str.contains('07311954')])
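In the example below the date column is int64, so 07311954 was read as 7311954 and lost its leading zero; the string '7311954' does not contain '07311954', so a search for the full 8-digit value finds nothing. Assuming the file stores 8-digit MMDDYYYY text, one way to get a string column is to pass dtype to read_csv. A sketch with a tiny hypothetical two-column input in place of the real file:

```python
import io

import pandas as pd

# Hypothetical pipe-separated sample standing in for the real data file.
raw = "C00119040|07311954\nC00140715|08152007\n"

# Default parsing infers int64 for 'date' and drops the leading zero.
default = pd.read_csv(io.StringIO(raw), sep='|', header=None,
                      names=['cmte_id', 'date'])
print(default['date'].tolist())  # [7311954, 8152007]

# dtype={'date': str} keeps the column as text, so the original value matches.
as_str = pd.read_csv(io.StringIO(raw), sep='|', header=None,
                     names=['cmte_id', 'date'], dtype={'date': str})
print(as_str['date'].tolist())                 # ['07311954', '08152007']
print(as_str['date'].isin(['07311954']).any())  # True
```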
If you want to check whether the last 4 digits of the values in column date are 1954:
print(df[df['date'].astype(str).str[-4:].str.contains('1954')])
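Testing the last four characters this way amounts to a suffix test, so Series.str.endswith can express the same check directly. A sketch on a few of the example values:

```python
import pandas as pd

# A few of the integer dates from the example.
df = pd.DataFrame({'date': [8152007, 9262007, 7311954, 2252011]})

s = df['date'].astype(str)
by_slice = s.str[-4:].str.contains('1954')  # approach used in the answer
by_suffix = s.str.endswith('1954')          # equivalent suffix check

print(by_slice.equals(by_suffix))  # True
print(by_suffix.any())             # True
print(df[by_suffix])
```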
Example:
print(df['date'])
0 8152007
1 9262007
2 7311954
3 2252011
4 2012011
5 2012011
6 2222011
7 2282011
Name: date, dtype: int64
print(df['date'].astype(str).str[-4:].str.contains('1954'))
0 False
1 False
2 True
3 False
4 False
5 False
6 False
7 False
Name: date, dtype: bool
print(df[df['date'].astype(str).str[-4:].str.contains('1954')])
cmte_id trans_typ entity_typ state employer occupation date \
2 C00119040 24K CCM MD NaN NaN 7311954
amount fec_id cand_id
2 1000 C00140715 H2MD05155
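If the values really are MMDDYYYY dates (an assumption based on values like 7311954, which looks like 07311954 with the leading zero lost), an alternative to string slicing is to pad them back to 8 digits, parse them with pd.to_datetime, and compare the year. This is a swapped-in technique, not part of the original answer; it uses the example values above:

```python
import pandas as pd

# The integer values from the example above.
df = pd.DataFrame({'date': [8152007, 9262007, 7311954, 2252011,
                            2012011, 2012011, 2222011, 2282011]})

# Assumption: values are MMDDYYYY with the leading zero dropped,
# so left-pad to 8 digits before parsing.
padded = df['date'].astype(str).str.zfill(8)
parsed = pd.to_datetime(padded, format='%m%d%Y')

# Compare the parsed year instead of matching text.
is_1954 = parsed.dt.year == 1954
print(is_1954.any())  # True
print(df[is_1954])
```

Parsing also rejects malformed values with an error instead of silently not matching them.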