Opening a csv file with numpy.loadtxt in Python 3
Problem description
I have a csv file and am trying to open it using numpy.loadtxt. If I open it using pandas, the file looks like this small example:
Small example:
Name Accession Class Species Annotation CF330
NaN NaN NaN NaN NaN NaN
A2M NM_000014.4 Endogenous Hs NaN 11495.0
ACVR1C NM_145259.2 Endogenous Hs NaN 28.0
ADAM12 NM_003474.5 Endogenous Hs NaN 1020.0
ADGRE1 NM_001256252.1 Endogenous Hs NaN 42.0
I am trying to open the file with numpy.loadtxt, using the following code:
with open('datafile1.csv') as f:
    for line in f:
        FH = np.loadtxt(line, delimiter=',', skiprows=1)
        print(FH)
But it returns this error:
ValueError: could not convert string to float:
Do you know how to fix the problem?
Here is the raw dataset:
Name,Accession,Class,Species,Annotation,CF330
,,,,,
A2M,NM_000014.4,Endogenous,Hs,,11495
ACVR1C,NM_145259.2,Endogenous,Hs,,28
ADAM12,NM_003474.5,Endogenous,Hs,,1020
ADGRE1,NM_001256252.1,Endogenous,Hs,,42
Recommended answer
In [19]: txt = '''Name,Accession,Class,Species,Annotation,CF330
...: ,,,,,
...: A2M,NM_000014.4,Endogenous,Hs,,11495
...: ACVR1C,NM_145259.2,Endogenous,Hs,,28
...: ADAM12,NM_003474.5,Endogenous,Hs,,1020
...: ADGRE1,NM_001256252.1,Endogenous,Hs,,42'''
With dtype=None, genfromtxt gives us a structured array:
In [23]: np.genfromtxt(txt.splitlines(), names=True, dtype=None, encoding=None,delimiter=',')
Out[23]:
array([('', '', '', '', False, -1),
('A2M', 'NM_000014.4', 'Endogenous', 'Hs', False, 11495),
('ACVR1C', 'NM_145259.2', 'Endogenous', 'Hs', False, 28),
('ADAM12', 'NM_003474.5', 'Endogenous', 'Hs', False, 1020),
('ADGRE1', 'NM_001256252.1', 'Endogenous', 'Hs', False, 42)],
dtype=[('Name', '<U6'), ('Accession', '<U14'), ('Class', '<U10'), ('Species', '<U2'), ('Annotation', '?'), ('CF330', '<i8')])
As a dataframe:
In [26]: pd.DataFrame(_23)
Out[26]:
Name Accession Class Species Annotation CF330
0 False -1
1 A2M NM_000014.4 Endogenous Hs False 11495
2 ACVR1C NM_145259.2 Endogenous Hs False 28
3 ADAM12 NM_003474.5 Endogenous Hs False 1020
4 ADGRE1 NM_001256252.1 Endogenous Hs False 42
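Once you have the structured array, columns are addressed by field name. The following is only a minimal sketch of that: dropping the empty second row before parsing is my own choice rather than part of the session above, and the names rows and data are made up for illustration.

```python
import numpy as np

txt = '''Name,Accession,Class,Species,Annotation,CF330
,,,,,
A2M,NM_000014.4,Endogenous,Hs,,11495
ACVR1C,NM_145259.2,Endogenous,Hs,,28
ADAM12,NM_003474.5,Endogenous,Hs,,1020
ADGRE1,NM_001256252.1,Endogenous,Hs,,42'''

# Assumption: the all-empty row carries no data, so drop it before parsing.
rows = [ln for ln in txt.splitlines() if ln.strip(',')]

# names=True takes the field names from the first remaining line.
data = np.genfromtxt(rows, names=True, dtype=None, encoding=None, delimiter=',')

print(data.dtype.names)   # field names taken from the header
print(data['Name'])       # one column, selected by field name
print(data['CF330'])      # the numeric column, parsed as integers
```

Each field keeps its own dtype, so data['CF330'] can be used in arithmetic directly while data['Name'] stays a string array.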
The default dtype for loadtxt and genfromtxt is float. If the file has strings that don't convert to float, loadtxt raises an error, while genfromtxt puts nan in their place. The documentation for these functions is long, but worth the read if you want to use them correctly.
np.loadtxt(
    fname,
    dtype=<class 'float'>,  # DEFAULT DTYPE
    comments='#',
    delimiter=None,
    converters=None,
    skiprows=0,
    unpack=False,
    usecols=None,
    ndmin=0,
    encoding='bytes',
    max_rows=None,
)
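To see the effect of the default float dtype side by side, here is a small sketch on a cut-down two-column version of the data. The variable names lines, arr and loadtxt_failed are only for illustration.

```python
import numpy as np

# Cut-down sample: one text column, one numeric column.
lines = [
    "Name,CF330",
    "A2M,11495",
    "ACVR1C,28",
]

# genfromtxt with the default float dtype: fields that cannot be
# converted to float come back as nan.
arr = np.genfromtxt(lines, delimiter=",", skip_header=1)
print(arr)

# loadtxt with the default float dtype: the same fields raise ValueError.
loadtxt_failed = False
try:
    np.loadtxt(lines, delimiter=",", skiprows=1)
except ValueError as exc:
    loadtxt_failed = True
    print("loadtxt failed:", exc)
```

The text column comes back as nan from genfromtxt, and the same column is what makes loadtxt raise the ValueError from the question.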
Alternative uses of loadtxt:
In [31]: np.loadtxt(txt.splitlines(), skiprows=1, dtype=str, encoding=None,delimiter=',')
Out[31]:
array([['', '', '', '', '', ''],
['A2M', 'NM_000014.4', 'Endogenous', 'Hs', '', '11495'],
['ACVR1C', 'NM_145259.2', 'Endogenous', 'Hs', '', '28'],
['ADAM12', 'NM_003474.5', 'Endogenous', 'Hs', '', '1020'],
['ADGRE1', 'NM_001256252.1', 'Endogenous', 'Hs', '', '42']],
dtype='<U14')
In [32]: np.loadtxt(txt.splitlines(), skiprows=1, dtype=object, encoding=None,delimiter=',')
Out[32]:
array([['', '', '', '', '', ''],
['A2M', 'NM_000014.4', 'Endogenous', 'Hs', '', '11495'],
['ACVR1C', 'NM_145259.2', 'Endogenous', 'Hs', '', '28'],
['ADAM12', 'NM_003474.5', 'Endogenous', 'Hs', '', '1020'],
['ADGRE1', 'NM_001256252.1', 'Endogenous', 'Hs', '', '42']],
dtype=object)
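If only the numeric column is needed as floats, usecols can restrict loadtxt to that column. Note that loadtxt takes a file name, an open file, or a list of lines, not a single line, so no loop over the file is needed. This is a sketch under a few assumptions: io.StringIO stands in for the opened datafile1.csv, skiprows=2 assumes the header is always followed by exactly one empty row, and the names values and names are made up.

```python
import io
import numpy as np

csv_text = '''Name,Accession,Class,Species,Annotation,CF330
,,,,,
A2M,NM_000014.4,Endogenous,Hs,,11495
ACVR1C,NM_145259.2,Endogenous,Hs,,28
ADAM12,NM_003474.5,Endogenous,Hs,,1020
ADGRE1,NM_001256252.1,Endogenous,Hs,,42'''

# Stand-in for: f = open('datafile1.csv')
f = io.StringIO(csv_text)

# Skip the header and the empty row, and convert only column 5 (CF330) to float.
values = np.loadtxt(f, delimiter=',', skiprows=2, usecols=5)

# Rewind and read the first column separately as strings.
f.seek(0)
names = np.loadtxt(f, delimiter=',', skiprows=2, usecols=0, dtype=str)

print(names)
print(values)
```

This gives two plain 1-D arrays that line up row by row, which is often enough when the other columns are only labels.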