How do I troubleshoot ValueError: array is of length %s, while the length of the DataFrame is %s?

Problem description

I'm trying to follow the example in this notebook.

As suggested in this GitHub thread:

  1. I've upped the ulimit to 9999.
  2. I've already converted the CSV files to HDF5.

My code fails when trying to open a single hdf5 file into a dataframe:

df = vaex.open('data/chat_history_00.hdf5')

Here's the rest of the code:

import re
import glob
import vaex
import numpy as np

def tryint(s):
    try:
        return int(s)
    except ValueError:
        return s

def alphanum_key(s):
    """ Turn a string into a list of string and number chunks.
        "z23a" -> ["z", 23, "a"]
    """
    return [ tryint(c) for c in re.split('([0-9]+)', s) ]

hdf5_list = glob.glob('data/*.hdf5')
hdf5_list.sort(key=alphanum_key)
hdf5_list = np.array(hdf5_list)

assert len(hdf5_list) == 11, "Incorrect number of files"

# Check how the single file looks like:
df = vaex.open('data/chat_history_10.hdf5')
df
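To see what the `alphanum_key` sort buys over a plain sort, here is a minimal, self-contained sketch. The three file names are made up for illustration (only `chat_history_00` and `chat_history_10` appear in the question), and the bare `except` is narrowed to `ValueError`.

```python
import re


def tryint(s):
    # Digit runs become ints; everything else stays a string.
    try:
        return int(s)
    except ValueError:
        return s


def alphanum_key(s):
    # "z23a" -> ["z", 23, "a"], so numeric chunks compare as numbers.
    return [tryint(c) for c in re.split('([0-9]+)', s)]


# Hypothetical file names, chosen to show the difference.
names = [
    'data/chat_history_10.hdf5',
    'data/chat_history_2.hdf5',
    'data/chat_history_00.hdf5',
]

plain = sorted(names)                      # character-by-character order
natural = sorted(names, key=alphanum_key)  # numeric chunks ordered by value

print(plain)
print(natural)
```

With a plain sort, `_10` lands before `_2` because `'1' < '2'` as characters; the key function orders them by numeric value instead.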

Error generated:

ERROR:MainThread:vaex:error opening 'data/chat_history_00.hdf5'
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
      1 # Check how the single file looks like:
----> 2 df = vaex.open('data/chat_history_10.hdf5')
      3 df

/usr/local/anaconda3/lib/python3.7/site-packages/vaex/__init__.py in open(path, convert, shuffle, copy_index, *args, **kwargs)
    207     ds = from_csv(path, copy_index=copy_index, **kwargs)
    208 else:
--> 209     ds = vaex.file.open(path, *args, **kwargs)
    210 if convert and ds:
    211     ds.export_hdf5(filename_hdf5, shuffle=shuffle)

/usr/local/anaconda3/lib/python3.7/site-packages/vaex/file/__init__.py in open(path, *args, **kwargs)
     39     break
     40 if dataset_class:
---> 41     dataset = dataset_class(path, *args, **kwargs)
     42 return dataset
     43

/usr/local/anaconda3/lib/python3.7/site-packages/vaex/hdf5/dataset.py in __init__(self, filename, write)
     84 self.h5table_root_name = None
     85 self._version = 1
---> 86 self._load()
     87
     88 def write_meta(self):

/usr/local/anaconda3/lib/python3.7/site-packages/vaex/hdf5/dataset.py in _load(self)
    182 def _load(self):
    183     if "data" in self.h5file:
--> 184         self._load_columns(self.h5file["/data"])
    185         self.h5table_root_name = "/data"
    186     if "table" in self.h5file:

/usr/local/anaconda3/lib/python3.7/site-packages/vaex/hdf5/dataset.py in _load_columns(self, h5data, first)
    348         self.add_column(column_name, self._map_hdf5_array(data, column['mask']))
    349     else:
--> 350         self.add_column(column_name, self._map_hdf5_array(data))
    351 else:
    352     transposed = shape[1] < shape[0]

/usr/local/anaconda3/lib/python3.7/site-packages/vaex/dataframe.py in add_column(self, name, f_or_array, dtype)
   2929     if len(self) == len(ar):
   2930         raise ValueError("Array is of length %s, while the length of the DataFrame is %s due to the filtering, the (unfiltered) length is %s." % (len(ar), len(self), self.length_unfiltered()))
-> 2931     raise ValueError("array is of length %s, while the length of the DataFrame is %s" % (len(ar), self.length_original()))
   2932 # assert self.length_unfiltered() == len(data), "columns should be of equal length, length should be %d, while it is %d" % ( self.length_unfiltered(), len(data))
   2933 valid_name = vaex.utils.find_valid_name(name)

ValueError: array is of length 2578961, while the length of the DataFrame is 6

What does this mean and how do I troubleshoot it? All the files have 6 columns.

EDIT: Here's how I created the hdf5 file:

pd.read_csv(r'G:/path/to/file/data/chat_history-00.csv').to_hdf(r'data/chat_history_00.hdf5', key='data')

Solution

The question has been answered by Jovan of vaex on GitHub:

You should not use pandas .to_hdf if you want to read the data with vaex in a memory-mapped way. Please see this link for more details.
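A plausible reading of the numbers in the error, offered as an assumption rather than a confirmed diagnosis: pandas' default (fixed) HDF5 format stores a DataFrame under the key as internal datasets such as `axis0` (the column labels, 6 here) and `axis1` (the row index, 2578961 here). If vaex treats every dataset under `/data` as a column, the first one fixes the DataFrame length at 6 and the next one fails the length check. The sketch below only simulates that check; `TinyFrame` is a hypothetical stand-in, not vaex code, and the visiting order by name is assumed.

```python
# Hypothetical stand-in for the length check in vaex's add_column;
# this is not vaex code, only an illustration of the rule it enforces.
class TinyFrame:
    def __init__(self):
        self.columns = {}
        self.length = None  # fixed by the first column that is added

    def add_column(self, name, values):
        if self.length is None:
            self.length = len(values)
        elif len(values) != self.length:
            raise ValueError(
                "array is of length %s, while the length of the DataFrame is %s"
                % (len(values), self.length)
            )
        self.columns[name] = values


# Assumed contents of the "/data" group written by pandas .to_hdf:
# axis0 holds the 6 column labels, axis1 holds the row index.
pandas_group = {
    "axis0": range(6),
    "axis1": range(2578961),
}

frame = TinyFrame()
message = None
try:
    # Assumption: datasets are visited in name order.
    for name in sorted(pandas_group):
        frame.add_column(name, pandas_group[name])
except ValueError as exc:
    message = str(exc)

print(message)
```

Under these assumptions the 6 in the message is the number of columns, not a row count, which would explain why it matches "all the files have 6 columns".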

I used this instead:

vdf = vaex.from_pandas(df, copy_index=False)
vdf.export_hdf5('chat_history_00.hdf5')
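Since the question involves several CSV files, the two lines above could be wrapped in a small helper. This is a sketch under assumptions: `hdf5_path_for` and `convert_csv_to_vaex_hdf5` are hypothetical names, the output directory `data` is taken from the question, and the output file keeps the CSV's base name (the question renames `chat_history-00` to `chat_history_00`, which this sketch does not do). Only `vaex.from_pandas` and `export_hdf5` come from the answer.

```python
from pathlib import Path


def hdf5_path_for(csv_path, out_dir="data"):
    """Map e.g. '.../chat_history-00.csv' to 'data/chat_history-00.hdf5'."""
    return (Path(out_dir) / (Path(csv_path).stem + ".hdf5")).as_posix()


def convert_csv_to_vaex_hdf5(csv_path, out_dir="data"):
    """Read one CSV with pandas and export it through vaex instead of .to_hdf."""
    # Imported here so the path helper above works without these packages.
    import pandas as pd
    import vaex

    pdf = pd.read_csv(csv_path)
    vdf = vaex.from_pandas(pdf, copy_index=False)
    target = hdf5_path_for(csv_path, out_dir)
    vdf.export_hdf5(target)
    return target
```

Calling `convert_csv_to_vaex_hdf5` once per CSV file should produce files that `vaex.open` can read, replacing the earlier `.to_hdf` output.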
