How does a for loop work when indexing csv files?


Question

I'm using a large csv file with a lot of information. It was created from a biopandas file by extracting all the rows that start with "ATOM". I managed to extract only the columns I need, but my loop only seems to read the letters in the first row. I need it to read the values in the following rows, column by column.

This is the code I'm using:

import numpy as np
import pandas as pd

p1 = pd.read_csv ('1xao.csv',index_col='atom_name',usecols=['atom_name','x_coord','y_coord','z_coord'])
print('info in csv',p1)

for row in p1:
    for column in row:
        a=row[1]
        x=row[2]
        y=row[3]
        z=row[4]

        print('after the for loop',a,x,y,z)

The output is:

info in csv            x_coord  y_coord  z_coord
atom_name
N          -20.557   15.418   15.416
CA         -21.279   14.111   15.335
C          -20.626   13.120   14.374
O          -20.124   13.507   13.318
CB         -22.723   14.347   14.907
...            ...      ...      ...
CA         -12.469   -1.643   -2.404
C          -12.890   -2.022   -0.985
O          -14.089   -2.315   -0.787
CB         -11.354   -2.564   -2.882
OXT        -12.019   -2.015   -0.089

[1782 rows x 3 columns]
after the for loop _ c o o
after the for loop _ c o o
after the for loop _ c o o
after the for loop _ c o o
after the for loop _ c o o
after the for loop _ c o o
after the for loop _ c o o
after the for loop _ c o o
after the for loop _ c o o
after the for loop _ c o o
after the for loop _ c o o
after the for loop _ c o o
after the for loop _ c o o
after the for loop _ c o o
after the for loop _ c o o
after the for loop _ c o o
after the for loop _ c o o
after the for loop _ c o o
after the for loop _ c o o
after the for loop _ c o o
after the for loop _ c o o
>>> 

So the problem starts when it goes into the for loop. I want each line of output to look like N,-20.557,15.418,15.416, and so on for all my atom names.

I'm pretty sure I'm not using the loop correctly, but I don't know how to fix it.

Recommended answer

for x in df:

When you iterate over a DataFrame like this, you iterate over the column labels, NOT over the rows of the DataFrame. This is because __iter__ uses the info_axis, which is the columns for a DataFrame.

import pandas as pd
df = pd.DataFrame(data=1, columns=['X_1234', 'Y_ABC', 'Z_abc'], 
                  index=['foo', 'bar'])

df.__iter__?  # IPython/Jupyter help syntax; in plain Python use help(df.__iter__)
#Signature: df.__iter__()
#Docstring: Iterate over info axis.

df._info_axis
#Index(['X_1234', 'Y_ABC', 'Z_abc'], dtype='object')
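
To connect this to the question, here is a minimal sketch that uses a small hand-built DataFrame as a stand-in for the asker's file (the two rows are copied from the printed output; the real data comes from '1xao.csv'). It shows that plain iteration yields the column labels, and that iterating one label yields its characters.

```python
import pandas as pd

# Hypothetical two-row stand-in for the asker's data, indexed by atom_name.
df = pd.DataFrame(
    {
        "x_coord": [-20.557, -21.279],
        "y_coord": [15.418, 14.111],
        "z_coord": [15.416, 15.335],
    },
    index=pd.Index(["N", "CA"], name="atom_name"),
)

# Iterating the DataFrame itself yields column labels, not rows.
labels = [label for label in df]

# Iterating a single label (a plain str) yields one character at a time.
chars = [ch for ch in labels[0]]

print(labels)
print(chars)
```

Note that the index column (atom_name) is not among the labels, which is why the original loop never sees the atom names.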


So basically your loop is mangled: the variables you named row and column do not hold rows and columns. What you call row is really a column label, and when you iterate over that label (a string) you get one character at a time.

Because your labels are 'x_coord', 'y_coord' and 'z_coord', the character at index 1 in all of them is '_', and the characters at indices 2, 3 and 4 are 'c', 'o' and 'o', respectively. A more appropriate naming scheme for your loop is the following:

for col_lab in df:
    for char in col_lab:  # This does nothing as you never use char and only
        char1=col_lab[1]  # reference label. Essentially you just repeat this
        char2=col_lab[2]  # print N times, where N is the number of characters
        char3=col_lab[3]  # in each label. 
        char4=col_lab[4]
        
        print('after the for loop', char1, char2, char3, char4)

#after the for loop _ 1 2 3   |
#after the for loop _ 1 2 3   |
#after the for loop _ 1 2 3   | Repeated 6 times, i.e len('X_1234')
#after the for loop _ 1 2 3   |
#after the for loop _ 1 2 3   |
#after the for loop _ 1 2 3   |
#after the for loop _ A B C   \
#after the for loop _ A B C   \
#after the for loop _ A B C   \ Repeated 5 times, i.e len('Y_ABC')
#after the for loop _ A B C   \
#after the for loop _ A B C   \
#after the for loop _ a b c
#after the for loop _ a b c
#after the for loop _ a b c
#after the for loop _ a b c
#after the for loop _ a b c


If you need to iterate over the rows of a DataFrame you can use iterrows or itertuples. iterrows turns each row into a Series, where the original DataFrame column labels become the index labels of that Series and the row label becomes the Series name.

for r_label, row in df.iterrows():
    print(row, '\n')

#X_1234    1
#Y_ABC     1
#Z_abc     1
#Name: foo, dtype: int64
#
#X_1234    1
#Y_ABC     1
#Z_abc     1
#Name: bar, dtype: int64 
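
The answer mentions itertuples without showing it, so here is a minimal sketch of how it could produce the lines the question asks for. The DataFrame is a hypothetical three-row stand-in for the asker's file (values copied from the printed output), and the comma-separated format is an assumption based on the example in the question.

```python
import pandas as pd

# Hypothetical stand-in for pd.read_csv('1xao.csv', index_col='atom_name', ...).
df = pd.DataFrame(
    {
        "x_coord": [-20.557, -21.279, -20.626],
        "y_coord": [15.418, 14.111, 13.120],
        "z_coord": [15.416, 15.335, 14.374],
    },
    index=pd.Index(["N", "CA", "C"], name="atom_name"),
)

lines = []
# itertuples yields one namedtuple per row; the row label is exposed as .Index
# and each column is available as an attribute named after its label.
for row in df.itertuples():
    lines.append(f"{row.Index},{row.x_coord},{row.y_coord},{row.z_coord}")

for line in lines:
    print(line)
```

Because atom_name was set as the index, it is read from row.Index rather than from a column. With iterrows, the same value would arrive as the first element of each (label, Series) pair.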
