Why does pd.concat change the resulting datatype from int to float?
Question
I have three dataframes: timestamp (with timestamps), dataSun (with timestamps of sunrise and sunset), and dataData (with different climate data). The dataframe timestamp has datatype "int64".
timestamp.head()
timestamp
0 1521681600000
1 1521681900000
2 1521682200000
3 1521682500000
4 1521682800000
The dataframe dataSun also has datatype "int64".
dataSun.head()
sunrise sunset
0 1521696105000 1521740761000
1 1521696105000 1521740761000
2 1521696105000 1521740761000
3 1521696105000 1521740761000
4 1521696105000 1521740761000
The dataframe with climate data, dataData, has datatype "float64".
dataData.head()
temperature pressure humidity
0 2.490000 1018.000000 99.0
1 2.408333 1017.833333 99.0
2 2.326667 1017.666667 99.0
3 2.245000 1017.500000 99.0
4 2.163333 1017.333333 99.0
5 2.081667 1017.166667 99.0
I want to concatenate these three dataframes into one.
dataResult = pd.concat((timestamp, dataSun, dataData), axis = 1)
dataResult.head()
timestamp sunrise sunset temperature pressure
0 1521681600000 1.521696e+12 1.521741e+12 2.490000 1018.000000
1 1521681900000 1.521696e+12 1.521741e+12 2.408333 1017.833333
2 1521682200000 1.521696e+12 1.521741e+12 2.326667 1017.666667
3 1521682500000 1.521696e+12 1.521741e+12 2.245000 1017.500000
4 1521682800000 1.521696e+12 1.521741e+12 2.163333 1017.333333
5 1521683100000 1.521696e+12 1.521741e+12 2.081667 1017.166667
dataResult.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7188 entries, 0 to 7187
Data columns (total 6 columns):
timestamp 7188 non-null int64
sunrise 7176 non-null float64
sunset 7176 non-null float64
temperature 7176 non-null float64
pressure 7176 non-null float64
humidity 7176 non-null float64
dtypes: float64(5), int64(1)
Why has pd.concat changed the datatype of the values from dataSun? I have tried different ways to concatenate the dataframes. For example, I first concatenated only timestamp and dataSun into one dataframe, then concatenated the resulting dataframe with dataData. But the result was the same.
How can I concatenate the three dataframes and preserve the datatypes?
Recommended answer
Because of this:
timestamp 7188 non-null int64
sunrise 7176 non-null float64
...
timestamp has 7188 non-null values, while sunrise and the columns after it have 7176. That leaves 12 rows in each of those columns that are null, meaning they are NaNs.
Since NaN is a float value, every other value in a column that contains one is automatically upcast to float64, and floats that large are usually displayed in scientific notation.
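The upcast can be reproduced on a small scale. The sketch below uses made-up stand-ins for the frames in the question (three timestamp rows, but only two dataSun rows), so the shapes are an assumption rather than the asker's real data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the frames in the question; dataSun is one row shorter.
timestamp = pd.DataFrame({"timestamp": [1521681600000, 1521681900000, 1521682200000]})
dataSun = pd.DataFrame({"sunrise": [1521696105000, 1521696105000],
                        "sunset": [1521740761000, 1521740761000]})

# The default join='outer' aligns on the union of the indexes, so row 2 of
# sunrise/sunset has no source value and is filled with NaN.
result = pd.concat((timestamp, dataSun), axis=1)

print(result.dtypes)                   # timestamp stays int64, the others become float64
print(result["sunrise"].isna().sum())  # number of NaNs introduced by the alignment
print(type(np.nan))                    # NaN itself is a Python float
```

Only the columns that received a NaN change dtype; timestamp is complete, so it keeps int64.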
That's the why, but it doesn't really solve your problem. Your options at this point are:
- drop the rows with NaNs using dropna
- fill the NaNs with some default integer value using fillna
(After either of these, you can cast the affected columns back to int.)
- Alternatively, if you perform pd.concat with join='inner', NaNs are not introduced and the dtypes are preserved.
pd.concat((timestamp, dataSun, dataData), axis=1, join='inner')
timestamp sunrise sunset temperature pressure \
0 1521681600000 1521696105000 1521740761000 2.490000 1018.000000
1 1521681900000 1521696105000 1521740761000 2.408333 1017.833333
2 1521682200000 1521696105000 1521740761000 2.326667 1017.666667
3 1521682500000 1521696105000 1521740761000 2.245000 1017.500000
4 1521682800000 1521696105000 1521740761000 2.163333 1017.333333
humidity
0 99.0
1 99.0
2 99.0
3 99.0
4 99.0
With option 3, an inner join is performed on the indexes of the dataframes, so only rows whose index label is present in every dataframe are kept.
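A minimal sketch of the three options side by side, again on made-up frames where dataSun is one row shorter than timestamp. The fill value -1 and the int_dtypes mapping are illustrative assumptions, not part of the original answer; pick a sentinel that cannot be mistaken for real data.

```python
import pandas as pd

# Hypothetical stand-ins for the frames in the question; dataSun is one row shorter.
timestamp = pd.DataFrame({"timestamp": [1521681600000, 1521681900000, 1521682200000]})
dataSun = pd.DataFrame({"sunrise": [1521696105000, 1521696105000],
                        "sunset": [1521740761000, 1521740761000]})

# Default outer join: sunrise/sunset pick up a NaN and become float64.
outer = pd.concat((timestamp, dataSun), axis=1)

# Cast only the columns that were integers originally (assumed mapping).
int_dtypes = {"sunrise": "int64", "sunset": "int64"}

# Option 1: drop the rows that contain NaNs, then cast back.
dropped = outer.dropna().astype(int_dtypes)

# Option 2: fill NaNs with an arbitrary integer sentinel, then cast back.
filled = outer.fillna(-1).astype(int_dtypes)

# Option 3: inner join keeps only index labels present in every frame,
# so no NaNs appear and no cast is needed.
inner = pd.concat((timestamp, dataSun), axis=1, join="inner")

print(dropped.dtypes)
print(filled.dtypes)
print(inner.dtypes)
```

Options 1 and 3 both lose the unmatched rows, while option 2 keeps them at the cost of a placeholder value. Passing a column-to-dtype mapping to astype leaves genuinely float columns, such as the climate data, untouched.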