Solution for AssertionError: invalid dtype determination in get_concat_dtype when concatenating a list of DataFrames

Problem description

I have a list of Dataframes that I am attempting to combine using the concatenation function.

dataframe_lists = [df1, df2, df3]

result = pd.concat(dataframe_lists, keys = ['one', 'two','three'], ignore_index=True)

The full traceback is:

---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-198-a30c57d465d0> in <module>()
----> 1 result = pd.concat(dataframe_lists, keys = ['one', 'two','three'], ignore_index=True)
      2 check(dataframe_lists)

C:\WinPython-64bit-3.4.3.5\python-3.4.3.amd64\lib\site-packages\pandas\tools\merge.py in concat(objs, axis, join, join_axes, ignore_index, keys, levels, names, verify_integrity, copy)
    753                        verify_integrity=verify_integrity,
    754                        copy=copy)
--> 755     return op.get_result()
    756 
    757 

C:\WinPython-64bit-3.4.3.5\python-3.4.3.amd64\lib\site-packages\pandas\tools\merge.py in get_result(self)
    924 
    925             new_data = concatenate_block_managers(
--> 926                 mgrs_indexers, self.new_axes, concat_axis=self.axis, copy=self.copy)
    927             if not self.copy:
    928                 new_data._consolidate_inplace()

C:\WinPython-64bit-3.4.3.5\python-3.4.3.amd64\lib\site-packages\pandas\core\internals.py in concatenate_block_managers(mgrs_indexers, axes, concat_axis, copy)
   4061                                                 copy=copy),
   4062                          placement=placement)
-> 4063               for placement, join_units in concat_plan]
   4064 
   4065     return BlockManager(blocks, axes)

C:\WinPython-64bit-3.4.3.5\python-3.4.3.amd64\lib\site-packages\pandas\core\internals.py in <listcomp>(.0)
   4061                                                 copy=copy),
   4062                          placement=placement)
-> 4063               for placement, join_units in concat_plan]
   4064 
   4065     return BlockManager(blocks, axes)

C:\WinPython-64bit-3.4.3.5\python-3.4.3.amd64\lib\site-packages\pandas\core\internals.py in concatenate_join_units(join_units, concat_axis, copy)
   4150         raise AssertionError("Concatenating join units along axis0")
   4151 
-> 4152     empty_dtype, upcasted_na = get_empty_dtype_and_na(join_units)
   4153 
   4154     to_concat = [ju.get_reindexed_values(empty_dtype=empty_dtype,

C:\WinPython-64bit-3.4.3.5\python-3.4.3.amd64\lib\site-packages\pandas\core\internals.py in get_empty_dtype_and_na(join_units)
   4139         return np.dtype('m8[ns]'), tslib.iNaT
   4140     else:  # pragma
-> 4141         raise AssertionError("invalid dtype determination in get_concat_dtype")
   4142 
   4143 

AssertionError: invalid dtype determination in get_concat_dtype

I believe that the error lies in the fact that one of the data frames is empty. I used the simple function check to verify and return just the headers of the empty dataframe:

def check(list_of_df):
    # Collect the column headers of every empty dataframe in the list.
    headers = []
    for df in list_of_df:  # iterate over the argument, not the global list
        if df.empty:
            headers.append(df.columns)
    return headers

I am wondering whether this function can be used so that, when a dataframe is empty, only its headers are returned and appended to the concatenated dataframe. The output would be a single row of headers (and, where a column name repeats, just a single instance of that header, as the concatenation function already does). I have two sample non-empty data sets, "one" and "two", and one empty dataframe.

I would like the concatenated result, which has the column headers...

 'AT', 'AccountNum', 'AcctType', 'Amount', 'City', 'Comment', 'Country', 'DuplicateAddressFlag', 'FromAccount', 'FromAccountNum', 'FromAccountT', 'PN', 'PriorCity', 'PriorCountry', 'PriorState', 'PriorStreetAddress', 'PriorStreetAddress2', 'PriorZip', 'RTID', 'State', 'Street1', 'Street2', 'Timestamp', 'ToAccount', 'ToAccountNum', 'ToAccountT', 'TransferAmount', 'TransferMade', 'TransferTimestamp', 'Ttype', 'WA', 'WC', 'Zip'

to have the empty dataframe's headers appended to this row (where they are new):

 'A', 'AT', 'AccountNum', 'AcctType', 'Amount', 'B', 'C', 'City', 'Comment', 'Country', 'D', 'DuplicateAddressFlag', 'E', 'F', 'FromAccount', 'FromAccountNum', 'FromAccountT', 'G', 'PN', 'PriorCity', 'PriorCountry', 'PriorState', 'PriorStreetAddress', 'PriorStreetAddress2', 'PriorZip', 'RTID', 'State', 'Street1', 'Street2', 'Timestamp', 'ToAccount', 'ToAccountNum', 'ToAccountT', 'TransferAmount', 'TransferMade', 'TransferTimestamp', 'Ttype', 'WA', 'WC', 'Zip'

I welcome feedback on the best method to do this.
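One possible way to get the header behaviour described above is to collect the union of all column labels first, concatenate only the non-empty frames, and then reindex the result against that union. This is a minimal sketch, not the asker's or pandas' own method: the helper name concat_keep_all_headers and the sample frames are hypothetical, and it assumes the column labels are unique within each frame and mutually sortable (plain strings here).

```python
import pandas as pd

def concat_keep_all_headers(frames):
    """Concatenate frames row-wise and keep the headers of empty frames too.

    Hypothetical helper: assumes every frame has unique, sortable labels.
    """
    # Union of every label, each listed once, in alphabetical order.
    all_columns = sorted(set().union(*(df.columns for df in frames)))
    non_empty = [df for df in frames if not df.empty]
    if non_empty:
        combined = pd.concat(non_empty, ignore_index=True)
    else:
        combined = pd.DataFrame()
    # reindex adds any label that only an empty frame supplied, filled with NaN.
    return combined.reindex(columns=all_columns)

# Illustrative data only; the real data sets are not available.
one = pd.DataFrame({"AT": [1], "Zip": ["12345"]})
two = pd.DataFrame({"AT": [2], "City": ["Oslo"]})
empty = pd.DataFrame(columns=["A", "AT", "B"])

result = concat_keep_all_headers([one, two, empty])
print(list(result.columns))
```

Columns that only the empty frame contributes ("A" and "B" in this sample) appear in the result as all-NaN columns.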

As the answer below details, this is a rather unexpected result:

Unfortunately, due to the sensitivity of this material, I cannot share the actual data. Leading up to what is presented in the gist is the following:

A = data[data['RRT'] == 'A']  # select just the rows where RRT is 'A' from the dataframe "data"
B = data[data['RRT'] == 'B']
C = data[data['RRT'] == 'C']
D = data[data['RRT'] == 'D']

For each of the new data frames I then apply this logic:

for column_name, column in A.transpose().iterrows():
    AColumns = A[['ANum', 'RTID', 'Description', 'Type', 'Status', 'AD', 'CD', 'OD', 'RCD']]  # select a subset of columns from dataframe "A"

When I reference the count method on the empty dataframe A:

AColumns.count

This is the output:

<bound method DataFrame.count of Empty DataFrame
Columns: [ANum,RTID, Description,Type,Status, AD, CD, OD, RCD]
Index: []>
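The output above is the repr of the method object itself, because count is referenced without parentheses. As a small sketch (the frame below is a hypothetical stand-in for AColumns, using three of its column names), calling the method returns the per-column counts instead:

```python
import pandas as pd

# Hypothetical stand-in for the empty AColumns frame from the question.
a_columns = pd.DataFrame(columns=["ANum", "RTID", "Description"])

counts = a_columns.count()  # note the parentheses: this calls the method
print(a_columns.empty)
print(counts)
```

For an empty frame every count is zero, which confirms the emptiness that the bound-method repr only hints at.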

Finally, I imported the CSV with the following:

data = pd.read_csv('Merged_Success2.csv', dtype=str, error_bad_lines=False, iterator=True, chunksize=1000)
data = pd.concat([chunk for chunk in data], ignore_index=True)

I am not certain what else I can provide. The concatenation method works with all other data frames that are needed to meet a requirement. I have also looked at the Pandas internals.py and the full trace. Either I have too many columns with NaN, duplicate column names or mixed dtypes (the latter being the least likely culprit).
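The three suspects named above (all-NaN columns, duplicate column names, mixed dtypes) can be checked per dataframe before calling pd.concat. The following is only a diagnostic sketch: describe_frame and the sample frames are hypothetical and not part of pandas or of the original data.

```python
import pandas as pd

def describe_frame(name, df):
    """Summarise the properties suspected of breaking the concatenation."""
    duplicated = df.columns[df.columns.duplicated()].unique().tolist()
    # DataFrame.items() walks the columns by position, so repeated labels are fine.
    # Note: every column of an empty frame is reported as all-NaN.
    all_nan = [label for label, column in df.items() if column.isna().all()]
    return {
        "name": name,
        "empty": df.empty,
        "duplicate_columns": duplicated,
        "all_nan_columns": all_nan,
        "dtypes": sorted(set(df.dtypes.astype(str))),
    }

# Illustrative frames only.
frames = {
    "one": pd.DataFrame({"RTID": ["1"], "Amount": [None]}),
    "two": pd.DataFrame([["2", "x", "y"]], columns=["RTID", "Type", "Type"]),
    "three": pd.DataFrame(columns=["RTID", "Status"]),
}

report = [describe_frame(name, df) for name, df in frames.items()]
for entry in report:
    print(entry)
```

A non-empty "duplicate_columns" entry is the case the answer below identifies as the cause.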

Thank you again for your guidance.

Solution

We hit the same error in one of our projects. After debugging, we found the cause: one of our dataframes had two columns with the same name. Renaming one of those columns solved the problem.
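A minimal sketch of that fix, assuming a numeric suffix is an acceptable way to rename the repeated labels. The helper dedupe_columns and the sample frames are hypothetical; the answer does not say how its columns were renamed.

```python
import pandas as pd

def dedupe_columns(df):
    """Return a copy of df in which repeated labels get a numeric suffix.

    Hypothetical helper: assumes the suffixed name (e.g. "Amount_1")
    is not already used by another column.
    """
    seen = {}
    new_labels = []
    for label in df.columns:
        count = seen.get(label, 0)
        new_labels.append(label if count == 0 else f"{label}_{count}")
        seen[label] = count + 1
    out = df.copy()
    out.columns = new_labels
    return out

# df1 has two columns named "Amount", the situation described in the answer.
df1 = pd.DataFrame([[1, 2, 3]], columns=["RTID", "Amount", "Amount"])
df2 = pd.DataFrame([[4, 5]], columns=["RTID", "Amount"])

frames = [dedupe_columns(df) for df in (df1, df2)]
result = pd.concat(frames, ignore_index=True)
print(list(result.columns))
```

Renaming before concatenation keeps both columns' data; dropping the extra column instead would also remove the ambiguity, at the cost of losing its values.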
