Selecting rows from a DataFrame based on values from multiple columns in pandas


Problem description

This question is closely related to two other questions (another and this one), and I'll use the example from the very helpful accepted solution to one of them. Here is that example (credit to unutbu):

import pandas as pd
import numpy as np
df = pd.DataFrame({'A': 'foo bar foo bar foo bar foo foo'.split(),
                   'B': 'one one two three two two one three'.split(),
                   'C': np.arange(8), 'D': np.arange(8) * 2})
print(df)
#      A      B  C   D
# 0  foo    one  0   0
# 1  bar    one  1   2
# 2  foo    two  2   4
# 3  bar  three  3   6
# 4  foo    two  4   8
# 5  bar    two  5  10
# 6  foo    one  6  12
# 7  foo  three  7  14

print(df.loc[df['A'] == 'foo'])

yields

     A      B  C   D
0  foo    one  0   0
2  foo    two  2   4
4  foo    two  4   8
6  foo    one  6  12
7  foo  three  7  14

But I want to keep all the values of A and select only the rows where B is 'two'. My attempt was:

print(df.loc[df['A']) & df['B'] == 'two'])

Unfortunately, this does not work. Can anybody suggest a way to implement something like this? It would be a great help if the solution were somewhat general, so that it also works when, for example, column A does not hold a single value such as 'foo' but several different values, and you still want the whole column.

Recommended answer

I think I understand your modified question. After sub-selecting rows on a condition on B, you can select the columns you want, for example:

In [1]: df.loc[df.B == 'two'][['A', 'B']]
Out[1]: 
     A    B
2  foo  two
4  foo  two
5  bar  two
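
The same selection can also be written as a single .loc call that takes the row mask and the column list together, which avoids the chained indexing above. This is a minimal sketch on the example frame from the question; the names mask and subset are illustrative and not part of the original answer.

```python
import numpy as np
import pandas as pd

# Example frame from the question.
df = pd.DataFrame({'A': 'foo bar foo bar foo bar foo foo'.split(),
                   'B': 'one one two three two two one three'.split(),
                   'C': np.arange(8), 'D': np.arange(8) * 2})

# Boolean Series: True where column B equals 'two'.
mask = df['B'] == 'two'

# Rows picked by the mask, columns picked by the list, in one .loc call.
subset = df.loc[mask, ['A', 'B']]
print(subset)
```

Because the rows and columns are chosen in one step, the same form can also be used on the left-hand side of an assignment without going through an intermediate copy.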

For example, to concatenate all the strings in column A for which column B has the value 'two', I could do:

In [2]: df.loc[df.B =='two'].A.sum()  # <-- use .mean() for your quarterly data
Out[2]: 'foofoobar'
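
Calling .sum() on strings works here because column A holds plain Python strings, so the values are added together with +. As a sketch of two more explicit alternatives (not part of the original answer), the selected values can be joined with str.join or with the Series.str.cat accessor; the variable names below are illustrative.

```python
import numpy as np
import pandas as pd

# Example frame from the question.
df = pd.DataFrame({'A': 'foo bar foo bar foo bar foo foo'.split(),
                   'B': 'one one two three two two one three'.split(),
                   'C': np.arange(8), 'D': np.arange(8) * 2})

# Values of A in the rows where B is 'two'.
selected = df.loc[df['B'] == 'two', 'A']

# Plain Python join over the selected strings.
joined = ''.join(selected)

# Series.str.cat() with no arguments concatenates the Series into one string.
joined_cat = selected.str.cat()

print(joined, joined_cat)
```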

You could also group by the values of column B and get that concatenation for every distinct B group in a single expression:

In [3]: df.groupby('B').apply(lambda x: x.A.sum())
Out[3]: 
B
one      foobarfoo
three       barfoo
two      foofoobar
dtype: object
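
A variant of the grouped concatenation, shown here only as a sketch, selects column A before aggregating so that the aggregation function receives just the strings of each group rather than the whole sub-frame. The name per_group is illustrative.

```python
import numpy as np
import pandas as pd

# Example frame from the question.
df = pd.DataFrame({'A': 'foo bar foo bar foo bar foo foo'.split(),
                   'B': 'one one two three two two one three'.split(),
                   'C': np.arange(8), 'D': np.arange(8) * 2})

# Group on B, keep only column A, and join the strings within each group.
per_group = df.groupby('B')['A'].agg(''.join)
print(per_group)
```

The result is a Series indexed by the distinct values of B, with the same concatenated strings as the apply/lambda version.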

To filter on both A and B, use numpy.logical_and:

In [1]: df.loc[np.logical_and(df.A == 'foo', df.B == 'two')]
Out[1]: 
     A    B  C  D
2  foo  two  2  4
4  foo  two  4  8
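
The attempt in the question fails for two reasons: the parentheses are unbalanced, and & binds more tightly than == in Python, so each comparison has to be wrapped in its own parentheses. The sketch below shows the operator form of the same filter, plus two generalizations that are assumptions about what the asker wants rather than part of the original answer: Series.isin for the case where A may take several values, and DataFrame.query as a string-based alternative.

```python
import numpy as np
import pandas as pd

# Example frame from the question.
df = pd.DataFrame({'A': 'foo bar foo bar foo bar foo foo'.split(),
                   'B': 'one one two three two two one three'.split(),
                   'C': np.arange(8), 'D': np.arange(8) * 2})

# Element-wise AND; the parentheses around each comparison are required.
both = df.loc[(df['A'] == 'foo') & (df['B'] == 'two')]

# More general: A may be any of several values (the list is illustrative).
wanted_a = ['foo', 'bar']
several = df.loc[df['A'].isin(wanted_a) & (df['B'] == 'two')]

# The same filter as `both`, written as a query string.
queried = df.query("A == 'foo' and B == 'two'")

print(both)
print(several)
```

Use | instead of & for an element-wise OR, and ~ to negate a mask; the same parenthesization rule applies.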
