Why does pd.concat change the resulting datatype from int to float?


Question

I have three dataframes: timestamp (with timestamps), dataSun (with the timestamps of sunrise and sunset), and dataData (with different climate data). The dataframe timestamp has dtype "int64".

timestamp.head()
       timestamp
0  1521681600000
1  1521681900000
2  1521682200000
3  1521682500000
4  1521682800000

The dataframe dataSun also has dtype "int64".

 dataSun.head()
         sunrise         sunset
0  1521696105000  1521740761000
1  1521696105000  1521740761000
2  1521696105000  1521740761000
3  1521696105000  1521740761000
4  1521696105000  1521740761000

The dataframe with the climate data, dataData, has dtype "float64".

dataData.head()
           temperature     pressure  humidity
    0     2.490000  1018.000000      99.0
    1     2.408333  1017.833333      99.0
    2     2.326667  1017.666667      99.0
    3     2.245000  1017.500000      99.0
    4     2.163333  1017.333333      99.0
    5     2.081667  1017.166667      99.0

I want to concatenate these three dataframes into one.

dataResult = pd.concat((timestamp, dataSun, dataData), axis = 1)
dataResult.head()
       timestamp       sunrise        sunset  temperature     pressure     
0  1521681600000  1.521696e+12  1.521741e+12     2.490000  1018.000000   
1  1521681900000  1.521696e+12  1.521741e+12     2.408333  1017.833333   
2  1521682200000  1.521696e+12  1.521741e+12     2.326667  1017.666667   
3  1521682500000  1.521696e+12  1.521741e+12     2.245000  1017.500000   
4  1521682800000  1.521696e+12  1.521741e+12     2.163333  1017.333333   
5  1521683100000  1.521696e+12  1.521741e+12     2.081667  1017.166667   

weatherMeasurements.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7188 entries, 0 to 7187
Data columns (total 6 columns):
timestamp      7188 non-null int64
sunrise        7176 non-null float64
sunset         7176 non-null float64
temperature    7176 non-null float64
pressure       7176 non-null float64
humidity       7176 non-null float64
dtypes: float64(5), int64(1)

Why has pd.concat changed the datatype of the dataSun values? I have tried different ways of concatenating the dataframes. For example, I concatenated only timestamp and dataSun into one dataframe, then concatenated the resulting dataframe with dataData, but the result was the same. How can I concatenate the three dataframes and preserve the datatypes?

Recommended answer

Because of this:

timestamp      7188 non-null int64
sunrise        7176 non-null float64
...

timestamp has 7188 non-null values, while sunrise and the columns after it have 7176. That leaves 12 rows in which those columns are null, meaning they hold NaNs. The NaNs appear because pd.concat with axis=1 aligns the frames on their index using an outer join by default, so index labels that exist only in timestamp get NaN in the other columns.

Since NaN is a float value, every other value in that column is automatically upcast to float, and floats that large are usually displayed in scientific notation.
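A minimal sketch reproducing the upcast. The frames `ts` and `sun` and their values are made-up stand-ins, not the asker's data; the only assumption is that one frame has fewer rows than the other.

```python
import pandas as pd

# Hypothetical stand-ins: `ts` has one more row than `sun`.
ts = pd.DataFrame({"timestamp": [1521681600000, 1521681900000, 1521682200000]})
sun = pd.DataFrame({"sunrise": [1521696105000, 1521696105000]})

# Both columns are integer-typed before the concat.
print(ts.dtypes["timestamp"], sun.dtypes["sunrise"])

# axis=1 aligns on the index with an outer join by default,
# so index label 2 has no sunrise value and is filled with NaN.
result = pd.concat((ts, sun), axis=1)

print(result.dtypes)                   # sunrise is now float64
print(result["sunrise"].isna().sum())  # number of NaNs introduced
```

Comparing the lengths of the input frames (or their indexes) before concatenating is a quick way to spot this situation.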

That's the why, but it doesn't solve your problem. Your options at this point are:

  1. Drop the rows that contain NaNs using dropna.
  2. Fill the NaNs with some default integral value using fillna.

(After either of these, you can cast the affected columns back to int.)
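The two options above might look like the following sketch. The frames are hypothetical toy data, and the fill value 0 is an arbitrary placeholder chosen for illustration; pick whatever sentinel makes sense for your data.

```python
import pandas as pd

# Hypothetical toy frames with mismatched lengths.
ts = pd.DataFrame({"timestamp": [1521681600000, 1521681900000, 1521682200000]})
sun = pd.DataFrame({"sunrise": [1521696105000, 1521696105000]})
merged = pd.concat((ts, sun), axis=1)  # sunrise is float64 with one NaN

# Option 1: drop the rows containing NaN, then cast the column back to int64.
dropped = merged.dropna().astype({"sunrise": "int64"})

# Option 2: fill NaN with a placeholder (0 is an arbitrary choice), then cast.
filled = merged.fillna({"sunrise": 0}).astype({"sunrise": "int64"})

print(dropped.dtypes)
print(filled)
```

The cast back is lossless here because millisecond timestamps of this size are well within the range of integers that float64 represents exactly.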

  3. Alternatively, if you perform pd.concat with join='inner', NaNs are not introduced and the dtypes are preserved.

pd.concat((timestamp, dataSun, dataData), axis=1, join='inner')

       timestamp        sunrise         sunset  temperature     pressure  \    
0  1521681600000  1521696105000  1521740761000     2.490000  1018.000000   
1  1521681900000  1521696105000  1521740761000     2.408333  1017.833333   
2  1521682200000  1521696105000  1521740761000     2.326667  1017.666667   
3  1521682500000  1521696105000  1521740761000     2.245000  1017.500000   
4  1521682800000  1521696105000  1521740761000     2.163333  1017.333333   

   humidity  
0      99.0  
1      99.0  
2      99.0  
3      99.0  
4      99.0 

With option 3, an inner join is performed on the indexes of the dataframes, so only the rows whose index labels appear in all of them are kept.
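A small check of that behaviour, again on hypothetical toy frames rather than the asker's data:

```python
import pandas as pd

# Hypothetical toy frames: `ts` has index 0..2, `sun` has index 0..1.
ts = pd.DataFrame({"timestamp": [1521681600000, 1521681900000, 1521682200000]})
sun = pd.DataFrame({"sunrise": [1521696105000, 1521696105000]})

# join="inner" keeps only the index labels present in every frame.
inner = pd.concat((ts, sun), axis=1, join="inner")

print(inner.dtypes)        # no NaN was introduced, so both columns stay int64
print(list(inner.index))   # the unmatched label 2 is gone
```

The trade-off is that the unmatched rows of the longer frame are discarded, which has the same effect on row count as option 1.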
