How does a for loop work when indexing CSV files?
Problem description
I'm using a large CSV file with a lot of information. It was created from a biopandas file by extracting all the rows that start with "ATOM". I managed to extract only the columns I need, but the loop only seems to read the letters in the first row (the header). I need it to read the values in the following rows, column by column.
This is the code I used:
import numpy as np
import pandas as pd

p1 = pd.read_csv('1xao.csv', index_col='atom_name',
                 usecols=['atom_name', 'x_coord', 'y_coord', 'z_coord'])
print('info in csv', p1)

for row in p1:
    for column in row:
        a = row[1]
        x = row[2]
        y = row[3]
        z = row[4]
        print('after the for loop', a, x, y, z)
The output is:
info in csv            x_coord  y_coord  z_coord
atom_name
N          -20.557   15.418   15.416
CA         -21.279   14.111   15.335
C          -20.626   13.120   14.374
O          -20.124   13.507   13.318
CB         -22.723   14.347   14.907
...            ...      ...      ...
CA         -12.469   -1.643   -2.404
C          -12.890   -2.022   -0.985
O          -14.089   -2.315   -0.787
CB         -11.354   -2.564   -2.882
OXT        -12.019   -2.015   -0.089

[1782 rows x 3 columns]
after the for loop _ c o o
after the for loop _ c o o
after the for loop _ c o o
after the for loop _ c o o
after the for loop _ c o o
after the for loop _ c o o
after the for loop _ c o o
after the for loop _ c o o
after the for loop _ c o o
after the for loop _ c o o
after the for loop _ c o o
after the for loop _ c o o
after the for loop _ c o o
after the for loop _ c o o
after the for loop _ c o o
after the for loop _ c o o
after the for loop _ c o o
after the for loop _ c o o
after the for loop _ c o o
after the for loop _ c o o
after the for loop _ c o o
So the problem starts when the code enters the for loop. I want each output line to look like N, -20.557, 15.418, 15.416, and so on for all my atom names.
I'm pretty sure I'm not using the loop correctly, but I don't know how to fix it.
Recommended answer
Using for x in df:

When you iterate over a DataFrame like this, you iterate over the column labels, NOT over the rows of data. This is because __iter__ uses the info_axis, which for a DataFrame is the columns.
import pandas as pd

df = pd.DataFrame(data=1, columns=['X_1234', 'Y_ABC', 'Z_abc'],
                  index=['foo', 'bar'])

help(df.__iter__)  # IPython/Jupyter equivalent: df.__iter__?
#Signature: df.__iter__()
#Docstring: Iterate over info axis.

print(df._info_axis)  # _info_axis is a private pandas attribute
#Index(['X_1234', 'Y_ABC', 'Z_abc'], dtype='object')
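To see the same behaviour without relying on the private _info_axis attribute, you can loop over the toy frame directly and compare the result with df.columns. A minimal sketch, rebuilding the same example DataFrame so it runs on its own:

```python
import pandas as pd

# Same toy frame as above: every cell holds the value 1
df = pd.DataFrame(data=1, columns=['X_1234', 'Y_ABC', 'Z_abc'],
                  index=['foo', 'bar'])

# Plain iteration yields the column labels, not rows of data
labels = [x for x in df]
print(labels)  # ['X_1234', 'Y_ABC', 'Z_abc']

# It is the same sequence as df.columns
print(labels == list(df.columns))  # True

# The row labels live on the other axis, the index
print(list(df.index))  # ['foo', 'bar']
```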
So basically your loop is mangled: you name your variables row and column, but what you call a row is really a column label, and when you iterate over that label (a string) you get one character at a time.
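A quick illustration of that point, using one of the question's column labels as a plain string:

```python
# A column label is just a Python string
label = 'x_coord'

# Iterating over a string yields one character at a time
chars = [ch for ch in label]
print(chars)  # ['x', '_', 'c', 'o', 'o', 'r', 'd']

# Indexing starts at 0, so indices 1 to 4 are the 2nd to 5th characters
print(label[1], label[2], label[3], label[4])  # _ c o o
```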
Because your labels are 'x_coord', 'y_coord' and 'z_coord', row[1], row[2], row[3] and row[4] are '_', 'c', 'o' and 'o' for every label (Python indexes from 0, so row[1] is the second character). The inner loop runs once per character of each label, which is why the same line is printed over and over. A more appropriate naming scheme for your loop is the following:
for col_lab in df:
    for char in col_lab:    # char is never used; you only index col_lab,
        char1 = col_lab[1]  # so this inner loop just repeats the same
        char2 = col_lab[2]  # print N times, where N is the number of
        char3 = col_lab[3]  # characters in each label.
        char4 = col_lab[4]
        print('after the for loop', char1, char2, char3, char4)
#after the for loop _ 1 2 3 |
#after the for loop _ 1 2 3 |
#after the for loop _ 1 2 3 | Repeated 6 times, i.e. len('X_1234')
#after the for loop _ 1 2 3 |
#after the for loop _ 1 2 3 |
#after the for loop _ 1 2 3 |
#after the for loop _ A B C \
#after the for loop _ A B C \
#after the for loop _ A B C \ Repeated 5 times, i.e. len('Y_ABC')
#after the for loop _ A B C \
#after the for loop _ A B C \
#after the for loop _ a b c |
#after the for loop _ a b c |
#after the for loop _ a b c | Repeated 5 times, i.e. len('Z_abc')
#after the for loop _ a b c |
#after the for loop _ a b c |
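The number of repeated lines follows directly from the label lengths: the total is the sum of the lengths of all column labels. A small check using the question's three labels (and the toy labels from this answer):

```python
# Column labels from the question's DataFrame
labels = ['x_coord', 'y_coord', 'z_coord']

# One print per character of each label
prints_per_label = [len(lab) for lab in labels]
total_prints = sum(prints_per_label)

print(prints_per_label)  # [7, 7, 7]
print(total_prints)      # 21, the number of "_ c o o" lines in the question

# Same check for the toy labels used in this answer
toy = ['X_1234', 'Y_ABC', 'Z_abc']
print([len(lab) for lab in toy])  # [6, 5, 5]
```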
If you need to iterate over the rows of a DataFrame, you can use iterrows or itertuples. iterrows yields (label, Series) pairs: each row becomes a Series whose index is the original DataFrame's column labels, and the row label becomes the Series name.
for r_label, row in df.iterrows():
    print(row, '\n')
#X_1234    1
#Y_ABC     1
#Z_abc     1
#Name: foo, dtype: int64
#
#X_1234    1
#Y_ABC     1
#Z_abc     1
#Name: bar, dtype: int64
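Applied to the question, a minimal sketch that prints one line per atom could use itertuples, which yields one namedtuple per row with the index (here atom_name) available as row.Index. It assumes the CSV has the columns atom_name, x_coord, y_coord and z_coord as in the question; the inline sample rows (copied from the question's output) and io.StringIO are stand-ins for '1xao.csv' so the snippet runs on its own.

```python
import io
import pandas as pd

# Stand-in for '1xao.csv': the first rows shown in the question's output
csv_text = """atom_name,x_coord,y_coord,z_coord
N,-20.557,15.418,15.416
CA,-21.279,14.111,15.335
C,-20.626,13.120,14.374
"""

# Same read_csv call as in the question, reading from a string instead of a file
p1 = pd.read_csv(io.StringIO(csv_text), index_col='atom_name',
                 usecols=['atom_name', 'x_coord', 'y_coord', 'z_coord'])

lines = []
# itertuples yields one namedtuple per row; the atom name is row.Index
for row in p1.itertuples():
    a, x, y, z = row.Index, row.x_coord, row.y_coord, row.z_coord
    line = f'{a}, {x}, {y}, {z}'
    lines.append(line)
    print(line)
# N, -20.557, 15.418, 15.416
# CA, -21.279, 14.111, 15.335
# C, -20.626, 13.12, 14.374
```

Note that 13.120 is printed as 13.12 because it is stored as a float. With iterrows the loop would read for a, row in p1.iterrows(): and then use row['x_coord'] and so on; itertuples is generally the faster of the two.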