How to infer types in a pandas dataframe
Question
I have a dataframe which I read in using pyspark with:
df1 = spark.read.csv("/user/me/data/*").toPandas()
Unfortunately, pyspark leaves all the types as object, even for numerical values. I need to merge this with another dataframe I read in with df2 = pd.read_csv("file.csv"), so I need the types in df1 to be inferred exactly as pandas would have done it.
How can you infer types of an existing pandas dataframe?
Recommended answer
If both dataframes have the same column names, you could use pd.DataFrame.astype:
df1 = df1.astype(df2.dtypes)
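As a minimal sketch of that call, assuming two small made-up frames stand in for the real data (the column names id, price and name are hypothetical, and df1 is built from strings to mimic what comes back from Spark):

```python
import pandas as pd

# Hypothetical stand-in for the Spark-derived frame: every value is a string.
df1 = pd.DataFrame({"id": ["1", "2"], "price": ["1.5", "2.5"], "name": ["a", "b"]})

# Hypothetical stand-in for the frame that pandas parsed itself with read_csv.
df2 = pd.DataFrame({"id": [3, 4], "price": [0.5, 0.75], "name": ["c", "d"]})

# df2.dtypes is a Series indexed by column name, so astype applies
# each dtype to the df1 column of the same name.
df1 = df1.astype(df2.dtypes)

print(df1.dtypes)
```

Note that astype raises an error if a value cannot be converted to the target dtype (for example, a non-numeric string in a column being cast to an integer type), so this only works when df1's contents really fit df2's types.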
Otherwise, you need to construct a dictionary where the keys are the column names in df1 and the values are dtypes. You can start with d = df2.dtypes.to_dict() to see what it should look like. Then construct a new dictionary, altering the keys where needed.
Once you have constructed the dictionary d, use:
df1 = df1.astype(d)
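One way the dictionary could be built when the column names differ is sketched below. The frames, the column names, and the rename mapping are all hypothetical; the mapping has to be written by hand for the real data.

```python
import pandas as pd

# Hypothetical frames: df1 calls the key column "user_id", df2 calls it "id".
df1 = pd.DataFrame({"user_id": ["1", "2"], "amount": ["1.5", "2.5"]})
df2 = pd.DataFrame({"id": [3, 4], "amount": [0.5, 0.75]})

# Hypothetical mapping from df2's column names to df1's column names.
# Columns missing from the mapping keep their name.
rename = {"id": "user_id"}

# Start from df2's dtypes and swap in df1's column names as keys.
d = {rename.get(col, col): dtype for col, dtype in df2.dtypes.to_dict().items()}

df1 = df1.astype(d)
print(df1.dtypes)
```

Every key in d has to be a column of df1; astype raises a KeyError for a key that df1 does not contain.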