Using isin() to determine what should be printed

Problem description

I have two DataFrames, data1 and data2.

I would like to print a column of string values in the DataFrame called data1, based on whether the ID exists in both data2 and data1.

What I am doing now gives me only a boolean Series (True or False, depending on whether the ID exists in both DataFrames), not the column of strings.

print(data2['id'].isin(data1.id).to_string())

This yields:

0      True
1      True
2      True
3      True
4      True
5      True

Any ideas would be appreciated.

Here is a sample of data1:

'user_id', 'id', 'rating', 'unix_timestamp'

196 242 3   881250949
186 302 3   891717742
22  377 1   878887116

And data2 contains something like this:

'id', 'title', 'release_date', 'video_release_date', 'imdb_url'

37|Nadja (1994)|01-Jan-1994||http://us.imdb.com/M/title-exact?Nadja%20(1994)|0|0|0|0|0|0|0|0|1|0|0|0|0|0|0|0|0|0|0
38|Net, The (1995)|01-Jan-1995||http://us.imdb.com/M/title-exact?Net,%20The%20(1995)|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|1|1|0|0
39|Strange Days (1995)|01-Jan-1995||http://us.imdb.com/M/title-exact?Strange%20Days%20(1995)|0|1|0|0|0|0|1|0|0|0|0|0|0|0|0|1|0|0|0

Recommended answer

If all values of id are unique:

I think you need merge with an inner join. From data2, select only the id column. The on parameter can be omitted, because merge then joins on all columns the two DataFrames have in common, which here is only id:

df = pd.merge(data1, data2[['id']])

Sample:

import numpy as np
import pandas as pd

data1 = pd.DataFrame({'id':list('abcdef'),
                      'B':[4,5,4,5,5,4],
                      'C':[7,8,9,4,2,3]})

print (data1)
   B  C id
0  4  7  a
1  5  8  b
2  4  9  c
3  5  4  d
4  5  2  e
5  4  3  f

data2 = pd.DataFrame({'id':list('frcdeg'),
                      'D':[1,3,5,7,1,0],
                      'E':[5,3,6,9,2,4],})

print (data2)
   D  E id
0  1  5  f
1  3  3  r
2  5  6  c
3  7  9  d
4  1  2  e
5  0  4  g

df = pd.merge(data1, data2[['id']])
print (df)
   B  C id
0  4  9  c
1  5  4  d
2  5  2  e
3  4  3  f
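For readers who prefer the join spelled out, here is a minimal sketch of the same merge with the key and join type written explicitly, next to the equivalent isin() mask. The names explicit, mask and filtered are only illustrative, and the column order of the result may differ from the printout above depending on the pandas version.

```python
import pandas as pd

data1 = pd.DataFrame({'id': list('abcdef'),
                      'B': [4, 5, 4, 5, 5, 4],
                      'C': [7, 8, 9, 4, 2, 3]})
data2 = pd.DataFrame({'id': list('frcdeg'),
                      'D': [1, 3, 5, 7, 1, 0],
                      'E': [5, 3, 6, 9, 2, 4]})

# Same inner join as above, with the join key and join type written out.
explicit = pd.merge(data1, data2[['id']], on='id', how='inner')

# With unique ids, a boolean mask from isin() selects the same rows of data1.
mask = data1['id'].isin(data2['id'])
filtered = data1[mask].reset_index(drop=True)

print(explicit)
print(filtered)
```

Both frames should contain the rows with id c, d, e and f. The merge version is convenient when columns from data2 are wanted as well; the mask version never adds rows.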

If id values are duplicated in one or the other DataFrame, an inner merge can return repeated rows, so filter with isin() instead (as in the other answer). Some similar solutions:

df = data1[data1['id'].isin(set(data1['id']) & set(data2['id']))]


ids = set(data1['id']) & set(data2['id'])
df = data1.query('id in @ids')


df = data1[np.in1d(data1['id'], np.intersect1d(data1['id'], data2['id']))]

Sample:

data1 = pd.DataFrame({'id':list('abcdef'),
                      'B':[4,5,4,5,5,4],
                      'C':[7,8,9,4,2,3]})

print (data1)
   B  C id
0  4  7  a
1  5  8  b
2  4  9  c
3  5  4  d
4  5  2  e
5  4  3  f

data2 = pd.DataFrame({'id':list('fecdef'),
                      'D':[1,3,5,7,1,0],
                      'E':[5,3,6,9,2,4],})

print (data2)
   D  E id
0  1  5  f
1  3  3  e
2  5  6  c
3  7  9  d
4  1  2  e
5  0  4  f

df = data1[data1['id'].isin(set(data1['id']) & set(data2['id']))]
print (df)
   B  C id
2  4  9  c
3  5  4  d
4  5  2  e
5  4  3  f
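To see why the merge approach is only suggested for unique ids, this small sketch compares the two on the data above, where data2 contains e and f twice. The variable names merged, filtered and deduped are only illustrative.

```python
import pandas as pd

data1 = pd.DataFrame({'id': list('abcdef'),
                      'B': [4, 5, 4, 5, 5, 4],
                      'C': [7, 8, 9, 4, 2, 3]})
data2 = pd.DataFrame({'id': list('fecdef'),
                      'D': [1, 3, 5, 7, 1, 0],
                      'E': [5, 3, 6, 9, 2, 4]})

# Inner merge: every row of data1 is repeated once per matching row in data2.
merged = pd.merge(data1, data2[['id']])

# isin() only builds a True/False mask, so each row of data1 appears at most once.
filtered = data1[data1['id'].isin(data2['id'])]

# Dropping duplicate ids before merging gives one row per id again.
deduped = pd.merge(data1, data2[['id']].drop_duplicates())

print(len(merged), len(filtered), len(deduped))
```

Here the merge returns six rows (e and f twice each), while the isin() filter and the de-duplicated merge return four.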

To select only the title column from data2, you can use:

# build each mask from data2['id'] so that it lines up with the rows of data2
df = data2.loc[data2['id'].isin(set(data1['id']) & set(data2['id'])), ['title']]

ids = set(data1['id']) & set(data2['id'])
df = data2.query('id in @ids')[['title']]

df = data2.loc[np.in1d(data2['id'], np.intersect1d(data1['id'], data2['id'])), ['title']]
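As a rough end-to-end sketch on frames shaped like those in the question: the titles come from the question's data2 sample, but the ids 38 and 39 in data1 are made up here so that the two frames overlap, and only a few of the original columns are kept. The mask is built from data2['id'] because a boolean mask has to align with the frame it indexes.

```python
import pandas as pd

# Ratings; the ids 38 and 39 are hypothetical, chosen so the frames share ids.
data1 = pd.DataFrame({'user_id': [196, 186, 22],
                      'id': [242, 38, 39],
                      'rating': [3, 3, 1]})

# Movies, using the three titles from the question's data2 sample.
data2 = pd.DataFrame({'id': [37, 38, 39],
                      'title': ['Nadja (1994)',
                                'Net, The (1995)',
                                'Strange Days (1995)']})

# True for each row of data2 whose id also appears in data1.
mask = data2['id'].isin(data1['id'])

# Use the mask to select the strings instead of printing the booleans.
titles = data2.loc[mask, 'title']
print(titles.to_string(index=False))
```

This prints the two titles whose ids (38 and 39) occur in both frames, rather than a column of True/False values.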
