Pandas Join on String Datatype
Question
I am trying to join two pandas DataFrames on an id field, which is a string UUID. I get a ValueError:
ValueError: You are trying to merge on object and int64 columns. If you wish to proceed you should use pd.concat
The code is below. I tried converting the fields to strings, as suggested in "Trying to merge 2 dataframes but get ValueError", but the error remains. Note that pdf comes from a Spark dataframe.toPandas(), while outputsPdf is created from a dictionary.
pdf.id = pdf.id.apply(str)
outputsPdf.id = outputsPdf.id.apply(str)
inOutPdf = pdf.join(outputsPdf, on='id', how='left', rsuffix='fs')
pdf.dtypes
id object
time float64
height float32
dtype: object
outputsPdf.dtypes
id object
labels float64
dtype: object
How can I debug this? Full traceback:
ValueError Traceback (most recent call last)
<ipython-input-13-deb429dde9ad> in <module>()
61 pdf['id'] = pdf['id'].astype(str)
62 outputsPdf['id'] = outputsPdf['id'].astype(str)
---> 63 inOutPdf = pdf.join(outputsPdf, on=['id'], how='left', rsuffix='fs')
64
65 # idSparkDf = spark.createDataFrame(idPandasDf, schema=StructType([StructField('id', StringType(), True),
~/miniconda3/lib/python3.6/site-packages/pandas/core/frame.py in join(self, other, on, how, lsuffix, rsuffix, sort)
6334 # For SparseDataFrame's benefit
6335 return self._join_compat(other, on=on, how=how, lsuffix=lsuffix,
-> 6336 rsuffix=rsuffix, sort=sort)
6337
6338 def _join_compat(self, other, on=None, how='left', lsuffix='', rsuffix='',
~/miniconda3/lib/python3.6/site-packages/pandas/core/frame.py in _join_compat(self, other, on, how, lsuffix, rsuffix, sort)
6349 return merge(self, other, left_on=on, how=how,
6350 left_index=on is None, right_index=True,
-> 6351 suffixes=(lsuffix, rsuffix), sort=sort)
6352 else:
6353 if on is not None:
~/miniconda3/lib/python3.6/site-packages/pandas/core/reshape/merge.py in merge(left, right, how, on, left_on, right_on, left_index, right_index, sort, suffixes, copy, indicator, validate)
59 right_index=right_index, sort=sort, suffixes=suffixes,
60 copy=copy, indicator=indicator,
---> 61 validate=validate)
62 return op.get_result()
63
~/miniconda3/lib/python3.6/site-packages/pandas/core/reshape/merge.py in __init__(self, left, right, how, on, left_on, right_on, axis, left_index, right_index, sort, suffixes, copy, indicator, validate)
553 # validate the merge keys dtypes. We may need to coerce
554 # to avoid incompat dtypes
--> 555 self._maybe_coerce_merge_keys()
556
557 # If argument passed to validate,
~/miniconda3/lib/python3.6/site-packages/pandas/core/reshape/merge.py in _maybe_coerce_merge_keys(self)
984 elif (not is_numeric_dtype(lk)
985 and (is_numeric_dtype(rk) and not is_bool_dtype(rk))):
--> 986 raise ValueError(msg)
987 elif is_datetimelike(lk) and not is_datetimelike(rk):
988 raise ValueError(msg)
Recommended answer
The on parameter only applies to the calling DataFrame!
on: Column or index level name(s) in the caller to join on the index in other, otherwise joins index-on-index.
Though you specify on='id', join will take the 'id' column in pdf, which has dtype object, and attempt to join it with the index of outputsPdf, which takes integer values.
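One way to confirm this while debugging is to compare the dtype of the caller's key column with the dtype of the other frame's index, since those are the two things join actually matches. The following is a minimal sketch; the sample values are hypothetical stand-ins for the frames in the question.

```python
import pandas as pd

# Hypothetical stand-ins for the two frames in the question.
pdf = pd.DataFrame({'id': ['a1', 'b2'], 'time': [0.1, 0.2]})
outputsPdf = pd.DataFrame({'id': ['a1', 'b2'], 'labels': [1.0, 0.0]})

# join(on='id') matches the caller's 'id' column against the OTHER frame's index.
left_key_dtype = pdf['id'].dtype
right_index_dtype = outputsPdf.index.dtype  # default RangeIndex is integer-typed

print('left key dtype:   ', left_key_dtype)
print('right index dtype:', right_index_dtype)

# A non-numeric key on the left and an integer index on the right is the
# combination that triggers the ValueError.
mismatch = (not pd.api.types.is_numeric_dtype(left_key_dtype)
            and pd.api.types.is_integer_dtype(right_index_dtype))
print('dtype mismatch:', mismatch)
```

Converting outputsPdf['id'] to str does not help here, because that column is never used as the join key on the right-hand side.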
If you need to join on non-index columns across two DataFrames, you can either set them as the index, or you must use merge, since the on parameter in pd.merge applies to both DataFrames.
import pandas as pd
df1 = pd.DataFrame({'id': ['1', 'True', '4'], 'vals': [10, 11, 12]})
df2 = df1.copy()
df1.join(df2, on='id', how='left', rsuffix='_fs')
ValueError: You are trying to merge on object and int64 columns. If you wish to proceed you should use pd.concat
On the other hand, these work:
df1.set_index('id').join(df2.set_index('id'), how='left', rsuffix='_fs').reset_index()
# id vals vals_fs
#0 1 10 10
#1 True 11 11
#2 4 12 12
df1.merge(df2, on='id', how='left', suffixes=['', '_fs'])
# id vals vals_fs
#0 1 10 10
#1 True 11 11
#2 4 12 12
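Applied to frames shaped like the ones in the question, the same two approaches might look like the sketch below. The id values and measurements are made up for illustration; only the column names come from the question.

```python
import pandas as pd

# Hypothetical data using the column names from the question.
pdf = pd.DataFrame({
    'id': ['u-1', 'u-2', 'u-3'],
    'time': [0.5, 1.5, 2.5],
    'height': [1.0, 2.0, 3.0],
})
outputsPdf = pd.DataFrame({'id': ['u-1', 'u-3'], 'labels': [0.0, 1.0]})

# Option 1: merge, where on='id' refers to the column in both frames.
inOutPdf = pdf.merge(outputsPdf, on='id', how='left', suffixes=('', 'fs'))

# Option 2: keep join, but move 'id' into the index of the right-hand frame
# so that the caller's 'id' column is matched against string index values.
joined = pdf.join(outputsPdf.set_index('id'), on='id', how='left', rsuffix='fs')

print(inOutPdf)
print(joined)
```

Both are left joins, so the row for 'u-2', which has no match in outputsPdf, is kept with a missing value in labels.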