Automatic detection/conversion of data types?
Problem description
Is there a function in numpy that determines whether strings should be integers or floating point numbers and automatically converts them? For instance, I often have a collection of records which are parsed from a text file using a combination of str.strip() and str.split(). Then I get something like
List = [['1','a','.3'],
['2','b','-.5']]
and then convert it using numpy.rec.fromrecords:
In [1227]: numpy.rec.fromrecords(List)
Out[1227]:
rec.array([('1', 'a', '.3'), ('2', 'b', '-.5')],
dtype=[('f0', '|S1'), ('f1', '|S1'), ('f2', '|S3')])
In R, there is a function called type.convert to which vectors/columns of character strings are passed, and it determines what the type for each column should be (i.e. if a column is a mix of strings and numbers, it remains a character vector). Excel does this also (based on its first 6 elements, if I recall correctly)...
Is there such a function in NumPy/Python? I know I could probably write a function to test whether each element of a column could be converted to an integer, etc., but is there anything built in? I know in all the examples the prescription is to specify the dtypes explicitly, but I would like to skip this step. Thanks.
Recommended answer
numpy.genfromtxt can guess dtypes if you set dtype=None:
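import numpy as np
import io
alist = [['1', 'a', '.3'],
         ['2', 'b', '-.5']]
# Join the records into whitespace-delimited text and wrap it in a file-like object.
# io.StringIO is used because the joined text is a str; on Python 3, io.BytesIO requires bytes.
f = io.StringIO('\n'.join(' '.join(row) for row in alist))
# dtype=None asks genfromtxt to guess each column's type from its contents.
arr = np.genfromtxt(f, dtype=None, encoding='utf-8')
print(arr)
print(arr.dtype)
# Output reported in the original answer:
# [(1, 'a', 0.3) (2, 'b', -0.5)]
# [('f0', '<i4'), ('f1', '|S1'), ('f2', '<f8')]
# With an explicit encoding on Python 3, f1 is a Unicode field ('<U1') instead of '|S1',
# and the integer width of f0 depends on the platform.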
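If the records are already in memory, a slightly shorter variant may be possible, since np.genfromtxt is documented to accept a list of strings and treat each one as a line. The following is only a sketch under that assumption; the names lines, ids, labels and values are illustrative and not part of the original answer. It also shows reading the guessed columns back by their default field names f0, f1 and f2.

```python
import numpy as np

alist = [['1', 'a', '.3'],
         ['2', 'b', '-.5']]

# Each list entry is treated as one line of whitespace-delimited text.
lines = [' '.join(row) for row in alist]
arr = np.genfromtxt(lines, dtype=None, encoding='utf-8')

# Columns are available under the default field names f0, f1, f2.
ids = arr['f0']      # guessed as an integer column
labels = arr['f1']   # left as a string column
values = arr['f2']   # guessed as a float column
```

Fields that cannot be parsed as numbers stay as strings, which is roughly the behaviour the question describes for R's type.convert.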
Note that it would be better to apply np.genfromtxt directly to your text file instead of creating the intermediate list List (or what I called alist). If you need to do some processing of the file before sending it to np.genfromtxt, you could make a file-like object wrapper around the file which can do the processing and be passed to np.genfromtxt.
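One way to sketch such a wrapper is a generator that cleans each line lazily, since np.genfromtxt is documented to accept a generator of strings as well as a file. Everything specific here is an assumption made for illustration: the helper name cleaned_lines, the semicolon-separated input, and the cleanup rules are hypothetical, and io.StringIO stands in for a real open file.

```python
import io

import numpy as np


def cleaned_lines(handle):
    """Yield preprocessed lines from an open text file (hypothetical cleanup rules)."""
    for line in handle:
        line = line.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith('#'):
            continue
        # Turn the assumed ';' separators into whitespace for genfromtxt.
        yield line.replace(';', ' ')


# Stand-in for open('records.txt'); the contents are made up for this example.
raw = io.StringIO('# id label value\n1;a;.3\n\n2;b;-.5\n')
arr = np.genfromtxt(cleaned_lines(raw), dtype=None, encoding='utf-8')
```

With a real file, the same generator could be fed from a with open(...) block, so the whole file never needs to be held in an intermediate list.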