Showing multiple rows that meet certain conditions in pandas
Question
I am working with pandas in Python on a weather CSV file. I can pull data from it with no problem, but I cannot select the rows that meet a condition, for example the days on which the maximum temperature was above 100 degrees.
This is my code so far:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df = pd.read_csv('csv/weather.csv')
print(df[[df.MaxTemperatureF > 100 ]])
That last line is where I think the problem is. After trying the steps below, the error tracebacks I now get are:
Traceback (most recent call last):
File "weather.py", line 40, in <module>
print(df[df['MaxTemperatureF' > 100]])
TypeError: unorderable types: str() > int()
Mikes-MBP-2:dataframes mikecuddy$ python3 weather.py
Traceback (most recent call last):
File "weather.py", line 41, in <module>
print(df[[df.MaxTemperatureF > 100 ]])
File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/pandas/core/frame.py", line 1991, in __getitem__
return self._getitem_array(key)
File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/pandas/core/frame.py", line 2028, in _getitem_array
(len(key), len(self.index)))
ValueError: Item wrong length 1 instead of 360.
I have been following the tutorial at http://www.gregreda.com/2013/10/26/working-with-pandas-dataframes/. Again, any help would be great. Thank you!
df.info() output:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 360 entries, 0 to 359
Data columns (total 23 columns):
PST 360 non-null object
MaxTemperatureF 359 non-null float64
Mean TemperatureF 359 non-null float64
Min TemperatureF 359 non-null float64
Max Dew PointF 359 non-null float64
MeanDew PointF 359 non-null float64
Min DewpointF 359 non-null float64
Max Humidity 359 non-null float64
Mean Humidity 359 non-null float64
Min Humidity 359 non-null float64
Max Sea Level PressureIn 359 non-null float64
Mean Sea Level PressureIn 359 non-null float64
Min Sea Level PressureIn 359 non-null float64
Max VisibilityMiles 355 non-null float64
Mean VisibilityMiles 355 non-null float64
Min VisibilityMiles 355 non-null float64
Max Wind SpeedMPH 359 non-null float64
Mean Wind SpeedMPH 359 non-null float64
Max Gust SpeedMPH 211 non-null float64
PrecipitationIn 360 non-null float64
CloudCover 343 non-null float64
Events 18 non-null object
WindDirDegrees 360 non-null int64
dtypes: float64(20), int64(1), object(2)
memory usage: 64.8+ KB
None
Recommended answer
For the max temperature column you can specify a converter function when reading the file:
df = pd.read_csv('csv/weather.csv', converters={'MaxTemperatureF':float})
Edit: as @ptrj mentions in a comment, you can do this to substitute np.nan for string values in the MaxTemperatureF column:
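# Pseudocode only: a lambda cannot contain try/except, so this does not run as written.
# The named function my_conv further down is the working form.
df = pd.read_csv('csv/weather.csv',
                 converters={'MaxTemperatureF':
                             lambda x: try: return float(x);
                                       except ValueError: return np.nan;})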
Edit 2: @ptrj's solution, written out here because it does not fit in a comment:
def my_conv(x):
    try:
        return float(x)
    except ValueError:
        return np.nan

df = pd.read_csv('csv/weather.csv', converters={'MaxTemperatureF': my_conv})
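As a minimal sketch of how the converter and the filter fit together, the snippet below reads a small in-memory CSV and then selects rows using a single pair of brackets around the boolean mask (the doubled brackets in the question wrap the mask in a one-element list, which is what triggers the "Item wrong length" error). The sample values and the name hot_days are assumptions for illustration only; they stand in for csv/weather.csv.

```python
import io

import numpy as np
import pandas as pd


def my_conv(x):
    """Convert a CSV field to float, falling back to NaN for non-numeric text."""
    try:
        return float(x)
    except ValueError:
        return np.nan


# Hypothetical sample data standing in for csv/weather.csv.
sample_csv = io.StringIO(
    "PST,MaxTemperatureF\n"
    "2015-6-1,98\n"
    "2015-6-2,101\n"
    "2015-6-3,\n"
    "2015-6-4,n/a\n"
    "2015-6-5,104\n"
)

df = pd.read_csv(sample_csv, converters={"MaxTemperatureF": my_conv})

# One pair of brackets: the boolean Series acts as a row mask.
# Comparisons against NaN are False, so missing readings are left out.
hot_days = df[df["MaxTemperatureF"] > 100]
print(hot_days)
```

The same mask can also be written as df[df.MaxTemperatureF > 100]; attribute access works here only because the column name contains no spaces.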
Other notes:
- If the first row of the CSV file contains the headers, you don't need to pass header=0.
- Since the file already has a header, you don't need to specify cols=...
- The default sep is ',', so you don't need to specify it.
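The notes on defaults can be checked with a short sketch. It assumes a tiny made-up CSV held in a string (csv_text and both DataFrame names are hypothetical) and compares a plain read_csv call with one that spells out header=0 and sep=','.

```python
import io

import pandas as pd

# Hypothetical two-row sample with a header line.
csv_text = "PST,MaxTemperatureF\n2015-6-1,98\n2015-6-2,101\n"

# Defaults: the header is inferred from the first line and sep is ','.
df_default = pd.read_csv(io.StringIO(csv_text))

# Spelling the same options out explicitly.
df_explicit = pd.read_csv(io.StringIO(csv_text), header=0, sep=",")

# Both calls should produce the same columns and values.
print(df_default.equals(df_explicit))
print(list(df_default.columns))
```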