Pandas: TypeError: '>' not supported between instances of 'int' and 'str' when selecting on date column
Question
I have a Pandas DataFrame with a column of Timestamps. I can select date ranges from this column. But after I make changes to other columns in the DataFrame, I no longer can, and I get the error "TypeError: '>' not supported between instances of 'int' and 'str'".
The following code reproduces the problem:
- Generate a DataFrame with some random numbers
- Add a column with dates
- Select on the date column
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.random((200,3)))
df['date'] = pd.date_range('2000-1-1', periods=200, freq='D')
mask = (df['date'] > '2000-6-1') & (df['date'] <= '2000-6-10')
print(df.loc[mask])
This works fine:
0 1 2 date
153 0.280575 0.810817 0.534509 2000-06-02
154 0.490319 0.873906 0.465698 2000-06-03
155 0.070790 0.898340 0.390777 2000-06-04
156 0.896007 0.824134 0.134484 2000-06-05
157 0.539633 0.814883 0.976257 2000-06-06
158 0.772454 0.420732 0.499719 2000-06-07
159 0.498020 0.495946 0.546043 2000-06-08
160 0.562385 0.460190 0.480170 2000-06-09
161 0.924412 0.611929 0.459360 2000-06-10
However, now I set column 0 to 0 wherever it exceeds 0.7 and repeat the selection:
df[df[0] > 0.7] = 0
mask = (df['date'] > '2000-6-1') & (df['date'] <= '2000-6-10')
This gives the error:
TypeError: '>' not supported between instances of 'int' and 'str'
Why does this happen and how do I avoid it?
Recommended answer
You can compare a timestamp (Timestamp('2000-01-01 00:00:00')) to a string, because pandas converts the string to a Timestamp for you. But once a value in the date column has been set to 0, the comparison is between an int and a str, which Python does not support.
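To see the mechanism in isolation, here is a minimal sketch. The Series names `dates` and `mixed` are made up for illustration, and `mixed` is built by hand as an assumption about what the date column looks like after a 0 has been written into it: object dtype holding both an int and a Timestamp.

```python
import pandas as pd

# A real datetime64 column: pandas parses the string into a Timestamp for us.
dates = pd.Series(pd.date_range('2000-06-01', periods=3, freq='D'))
ok_mask = dates > '2000-6-1'
print(ok_mask.tolist())

# Hypothetical stand-in for a date column that has had a 0 written into it:
# object dtype, so each element is compared with plain Python semantics.
mixed = pd.Series([0, pd.Timestamp('2000-06-02')], dtype=object)
try:
    mixed > '2000-6-1'
    raised = False
except TypeError as exc:
    raised = True
    print('TypeError:', exc)
```

The exact wording of the error message may differ between pandas versions, but the comparison fails with a TypeError because the int 0 cannot be ordered against a str.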
One way around this is to change the order of your operations, building the date mask before any values are overwritten:
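# Build both boolean masks while the date column still holds Timestamps
filters = df[0] > 0.7
mask = (df['date'] > '2000-6-1') & (df['date'] <= '2000-6-10')
# This still zeroes entire rows, including the date column
df[filters] = 0
# Rows in the date range that matched the filter (now zeroed)
print(df.loc[mask & filters])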
Also, you mentioned that you want to set column 0 to 0 if it exceeds 0.7. df[df[0]>0.7] = 0 does not do exactly that: it sets the entire row to 0, including the date column. Instead:
df.loc[df[0] > 0.7, 0] = 0
Then you should not have any problem with the original mask.
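Putting the pieces together, the corrected workflow might look like the sketch below. The seeded generator `rng` and the name `selected` are assumptions added here for reproducibility and readability; they are not part of the original question.

```python
import numpy as np
import pandas as pd

# Seeded generator (an assumption, used only so the run is repeatable).
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((200, 3)))
df['date'] = pd.date_range('2000-1-1', periods=200, freq='D')

# Zero only column 0 where it exceeds 0.7; other columns are left untouched.
df.loc[df[0] > 0.7, 0] = 0

# The date column is still datetime-typed, so comparing to strings keeps working.
print(df['date'].dtype)
mask = (df['date'] > '2000-6-1') & (df['date'] <= '2000-6-10')
selected = df.loc[mask]
print(selected)
```

Because the assignment names both the row condition and the column label, the date column is never overwritten and keeps its datetime dtype.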