pandas: Iterate through rows and columns and print them based on some condition


Problem description

I have an Excel file that I processed for data analysis and loaded into a pandas DataFrame.
Now I need to get the result shown below. I am trying to get it by iterating over the DataFrame's rows and columns with for loops and if conditions, but I am not getting the desired output.
I used a hyphen (-) in the Excel file so that I can apply some conditions.

Excel input file

Required Output
A -> B -> C -> E -> I
F -> G -> L
H -> J -> K
A1 -> B1
C1 -> A1
Z -> X
Note: The output should be saved to a plain text file. No graph or visualization is needed.

Code

import pandas as pd

df = pd.read_excel('Test.xlsx')
df = df.fillna('-')  # fillna returns a new DataFrame, so the result must be assigned

# The code below prints Z -> X
for index, row in df.iterrows():
    if row['Start_Name'] != '-':
        if row['End_Name'] != '-':
            print(row['Start_Name'] +' -> '+ row['End_Name'])

# The code below prints A -> B / F -> G / H -> J / C1 -> A1
for index, row in df.iterrows():
    if row['Start_Name'] != '-':
        if row['Mid_Name_1'] == '-':
            if row['Mid_Name_2'] != '-':
                print(row['Start_Name'] +' -> '+ row['Mid_Name_2'])

# The code below prints B -> C / C -> E
for index, row in df.iterrows():
    if row['Mid_Name_1'] != '-':
        if row['Mid_Name_2'] != '-':
            print(row['Mid_Name_1'] +' -> '+ row['Mid_Name_2'])

Recommended answer

Setup:

fronts is a dictionary that maps a name to the position of the sequence that starts with that name.

backs is a dictionary that maps a name to the position of the sequence that ends with that name.

sequences is a list that holds all the combined relations.

position_counter tracks the position at which the next new sequence is stored in sequences.

from collections import deque
import pandas as pd

data = pd.read_csv("Names_relations.csv")

fronts = dict()
backs = dict()

sequences = []
position_counter = 0

Extract all names: for each row, select the values that match the regex pattern.

selector = data.apply(lambda row: row.str.extractall(r"([\w\d]+)"), axis=1)
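As a cross-check on what this step is meant to produce, here is a minimal sketch that builds the same (front, back) pairs without extractall. It assumes that every row holds exactly two names and that "-" marks an empty cell. The inline CSV sample and the helper name row_to_pair are illustrative and not part of the original answer.

```python
import io

import pandas as pd

# A small inline sample in the same layout as the CSV used by the answer.
csv_text = """Start_Name,Mid_Name_1,Mid_Name_2,End_Name
A,-,B,-
-,B,C,-
Z,-,-,X
"""
data = pd.read_csv(io.StringIO(csv_text))


def row_to_pair(row):
    """Return the non-placeholder names of one row, in column order."""
    return tuple(str(value) for value in row if str(value) != "-")


# One (front, back) tuple per row.
pairs = [row_to_pair(row) for _, row in data.iterrows()]

for front, back in pairs:
    print(front, "->", back)
```

Each tuple in pairs plays the role of one relation in the loop further down.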

For each relation in selector, get the extracted elements and put them into a deque.

Check whether the front of the new relation can be attached to the end of any previous sequence.

If so:

  1. take the position of that sequence.
  2. take the sequence itself as llist2.
  3. remove the last element of llist2, which duplicates the front of the new relation.
  4. concatenate llist2 and the new relation.
  5. update sequences with the connected llist.
  6. update backs with the position of the current end of the sequence.
  7. finally, remove the exhausted ends of the previous sequence from fronts and backs.

The check for back in fronts.keys() is handled analogously (it is commented out in the code below).

If no existing sequence matches the new relation:

  1. save the relation.
  2. update fronts and backs with the position of that relation.
  3. update the position counter.

for relation in selector:
    front, back = relation[0]
    llist = deque((front, back))

    finb =  front in backs.keys()
#     binf = back in fronts.keys()

    if finb:
        position = backs[front]        # position of the sequence that ends with front
        llist2 = sequences[position]
        back_llist2 = llist2.pop()     # drop the duplicated joining element
        llist = llist2 + llist
        sequences[position] = llist
        backs[llist[-1]] = position    # the sequence now ends with back
        if front in fronts.keys():
            del fronts[front]
        if back_llist2 in backs.keys():
            del backs[back_llist2]

#     if binf:
#         position = fronts[back]
#         llist2 = sequences[position]
#         front_llist2 = llist2.popleft()
#         llist = llist + llist2
#         sequences[position] = llist
#         fronts[llist[0]] = position
#         if back in backs.keys():
#             del backs[back]
#         if front_llist2 in fronts.keys():
#             del fronts[front_llist2]

#     if not (finb or binf):
    if not finb: #(equivalent to 'else:')
        sequences.append(llist)
        fronts[front] = position_counter
        backs[back] = position_counter
        position_counter += 1

for s in sequences:
    print(' -> '.join(str(el) for el in s))

Output:

A -> B -> C -> E -> I
F -> G -> L
H -> J -> K
A1 -> B1
C1 -> A1
Z -> X

# If the binf branch is enabled:
# A -> B -> C -> E -> I
# F -> G -> L
# H -> J -> K
# C1 -> A1 -> B1
# Z -> X
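The difference between the two outputs can be reproduced with a compact sketch of the same idea. This is a simplified rewrite, not the answer's exact code: the function name chain_relations and the flag merge_backward (standing in for the commented-out binf branch) are hypothetical, and, like the original loop, it does not merge two existing sequences that a later relation would bridge.

```python
from collections import deque


def chain_relations(pairs, merge_backward=False):
    """Join (front, back) pairs into chains, in input order."""
    fronts = {}     # first name of a chain -> index in sequences
    backs = {}      # last name of a chain  -> index in sequences
    sequences = []

    for front, back in pairs:
        if front in backs:
            # The new relation continues the chain that ends with front.
            position = backs.pop(front)
            sequences[position].append(back)
            backs[back] = position
        elif merge_backward and back in fronts:
            # The new relation precedes the chain that starts with back.
            position = fronts.pop(back)
            sequences[position].appendleft(front)
            fronts[front] = position
        else:
            # No existing chain matches: start a new one.
            sequences.append(deque((front, back)))
            fronts[front] = len(sequences) - 1
            backs[back] = len(sequences) - 1

    return [list(seq) for seq in sequences]


pairs = [
    ("A", "B"), ("B", "C"), ("C", "E"), ("F", "G"), ("H", "J"), ("E", "I"),
    ("J", "K"), ("G", "L"), ("A1", "B1"), ("C1", "A1"), ("Z", "X"),
]

forward_only = chain_relations(pairs)
both_ends = chain_relations(pairs, merge_backward=True)

for chain in both_ends:
    print(" -> ".join(chain))
```

With merge_backward left off, A1 -> B1 and C1 -> A1 stay separate because C1 -> A1 arrives after A1 -> B1 and can only be attached at the front of that chain.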

Names_relations.csv

Start_Name,Mid_Name_1,Mid_Name_2,End_Name
A,-,B,-
-,B,C,-
-,C,E,-
F,-,G,-
H,-,J,-
-,E,-,I
-,J,-,K
-,G,-,L
-,A1,-,B1
C1,-,A1,-
Z,-,-,X
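The question also asks for the result to be saved as plain text rather than printed. A minimal sketch, assuming sequences is a list of chains like the one built above; the helper name save_sequences and the file name relations_output.txt are illustrative.

```python
import tempfile
from pathlib import Path


def save_sequences(sequences, path):
    """Write one 'A -> B -> C' line per sequence to a plain text file."""
    lines = [" -> ".join(str(el) for el in seq) for seq in sequences]
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return lines


# Two sample chains; in practice pass the sequences list from the answer.
sequences = [["A", "B", "C", "E", "I"], ["Z", "X"]]

# The temporary directory is used here only to keep the sketch side-effect free.
out_path = Path(tempfile.gettempdir()) / "relations_output.txt"
written = save_sequences(sequences, out_path)
```

Because the helper converts every element with str, it accepts lists and deques alike.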
