Filtering data with pandas
Problem description
I'm a newbie to Pandas and I'm trying to apply it to a script that I have already written. I have a csv file from which I extract the data, and use the columns 'candidate', 'final track' and 'status' for my data frame.
My problem is that I would like to filter the data, perhaps using the method shown in Wes McKinney's 10-minute tutorial ('http://nbviewer.ipython.org/urls/gist.github.com/wesm/4757075/raw/a72d3450ad4924d0e74fb57c9f62d1d895ea4574/PandasTour.ipynb'). In cell In [80], he uses aapl_bars.close_price['2009-10-15'].
I would like to use a similar method to select all the rows that have * as their status. If a row has no * in the status column, its data in the other columns should be dropped as well.
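My code:
import pandas as pd

def establish_current_tacks(filename):
    df = pd.read_csv(filename)
    # Pick columns 0, 10 and 11 (candidate, final track, status) by position
    cols = [df.iloc[:, 0], df.iloc[:, 10], df.iloc[:, 11]]
    current_tracks = pd.concat(cols, axis=1)
    return current_tracks
My DataFrame: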
>>> current_tracks
<class 'pandas.core.frame.DataFrame'>
Int64Index: 707 entries, 0 to 706
Data columns (total 3 columns):
candidate 695 non-null values
final track 670 non-null values
status 670 non-null values
dtypes: float64(1), object(2)
I would like to use current_tracks.status['*'], but it does not work.
Recommended answer
Since the data you want to filter based on is not part of the data frame's index, but instead is a regular column, you need to do something like this:
current_tracks[current_tracks.status == '*']
Full example:
import pandas as pd
current_tracks = pd.DataFrame({'candidate': ['Bob', 'Jim', 'Alice'],
'final_track': [10, 15, 13], 'status': ['*', '.', '*']})
current_tracks
Out[3]:
candidate final_track status
0 Bob 10 *
1 Jim 15 .
2 Alice 13 *
current_tracks[current_tracks.status == '*']
Out[4]:
candidate final_track status
0 Bob 10 *
2 Alice 13 *
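To see why this works, it can help to look at the intermediate boolean mask on its own. The sketch below uses made-up sample data (the extra row with a missing status is an assumption, added to mimic the question's frame, where status has fewer non-null values than there are rows). Comparing the column to '*' gives False for missing entries, so those rows are filtered out as well.

```python
import pandas as pd

# Hypothetical sample data; the None status mimics the missing values in the question
current_tracks = pd.DataFrame({
    'candidate': ['Bob', 'Jim', 'Alice', 'Eve'],
    'final_track': [10, 15, 13, 12],
    'status': ['*', '.', '*', None],
})

# Comparing a column to a scalar gives a boolean Series aligned with the rows
mask = current_tracks['status'] == '*'
print(mask.tolist())  # [True, False, True, False]

# Indexing the frame with the mask keeps only the rows where it is True
starred = current_tracks[mask]
print(starred)
```

Bracket access (current_tracks['status']) is used here rather than attribute access because it also works for column names that contain spaces, such as 'final track' in the question.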
If status were part of your data frame's index, your original syntax would have worked:
current_tracks = current_tracks.set_index('status')
current_tracks.candidate['*']
Out[8]:
status
* Bob
* Alice
Name: candidate, dtype: object
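As a related sketch, assuming the same toy frame as in the full example: once status is the index, .loc['*'] selects the matching rows together with all remaining columns, and reset_index() turns status back into an ordinary column if it is needed later.

```python
import pandas as pd

# Same toy data as in the full example above
current_tracks = pd.DataFrame({'candidate': ['Bob', 'Jim', 'Alice'],
                               'final_track': [10, 15, 13],
                               'status': ['*', '.', '*']})

indexed = current_tracks.set_index('status')

# Label-based lookup on the index returns every row labelled '*'
starred = indexed.loc['*']
print(starred)

# Move status back into a regular column
restored = starred.reset_index()
print(restored.columns.tolist())  # ['status', 'candidate', 'final_track']
```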