python中的SQL递归 [英] SQL recursion in python

查看:54
本文介绍了python中的SQL递归的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在使用 Pandas read_csv 从 2 个 csv 读取数据.

I'm reading data from 2 csv using pandas read_csv.

详细信息.csv

ID,VALID
P1,Yes
P2,No
P3,Yes
P4,No

Relations.csv

Relations.csv

ParentID,ChildID
P1,C1
P1,C2
C1,C1A
C2,C2A
C1A,C1AA
P2,D1
P2,D2
D2,D2A
D2A,D2AA
P3,C4
P4,C5

现在我将两者都存储在单独的数据帧中.我必须从 Relationship 中的 Details 中检查 ID,并为每个 ID 找到其所有级别的子级(直到没有其他子级).如果 ID 的 VALID 列具有 Yes,则其所有子项都应具有VALID";如果不是,那么这些是无效的".

Now i stored both in separate dataframes. I have to check the ID's from Details in Relationship and for each ID find all level of its children(until no further child). If the ID has Yes for VALID column, then all its child should have "VALID" if not then those are "NOT VALID".

预期输出

P1,VALID
C1,VALID
C2,VALID
C1A,VALID
C2A,VALID
C1AA,VALID
P2,NOT VALID
D1,NOT VALID
D2,NOT VALID
D2A,NOT VALID
D2AA,NOT VALID
P3,VALID
C4,VALID
P4,NOT VALID
C5,NOT VALID

目前我正在 SQL 中执行此操作.我不知道如何在 python 中复制它.pandas 中是否有任何可用的函数,或者我必须使用 for 循环.任何建议将不胜感激.

Currently I'm doing this in SQL. I have no idea how to replicate this in python. Are there any functions available in pandas or I have to do with for loop. Any suggestion would be appreciated.

从这个问题,我尝试过类似下面的事情,但它是不工作.

From this question, i have tried something like below but it is not working.

import pandas as pd

details = pd.read_csv('C:/Myfolder/Python/Details.csv')
relationship = pd.read_csv('C:/Myfolder/Python/Relationship.csv')

def nlevel(details.id, parent_dict=relationship.ParentID, _cache={0:0}):
if details.id in _cache:
    return _cache[details.id]

return 1+nlevel(parent_dict[details.id],parent_dict)

推荐答案

您可以使用堆栈递归地执行此操作.首先,将元素放入邻接列表以简化遍历,然后清空堆栈,添加子项对于每个节点并将有效性字符串存储在单独的结构中.保留顺序,从有序的节点有效性对构建结果数据帧并转储到 CSV.

You can do this recursively using a stack. First, put the elements into an adjacency list to simplify traversal, then empty the stack, adding children for each node and storing validity strings in a separate structure. Preserving order, build a result dataframe from the ordered node-validity pairs and dump to CSV.

import pandas as pd
from collections import defaultdict

details_df = pd.read_csv("Details.csv")
relationship_df = pd.read_csv("Relationship.csv")
order = {k: 1 for k in relationship_df.values.flatten()}
graph = defaultdict(list)
validity = {}

for parent, child in relationship_df.values:
    graph[parent].append(child)

for root, valid in details_df.values:
    stack = [root]

    while stack:
        curr = stack.pop()
        validity[curr] = "VALID" if valid == "Yes" else "NOT VALID"
        stack.extend(graph[curr])

result_df = pd.DataFrame([[x, validity[x]] for x in order])
print(result_df.to_csv(index=False, header=False))

输出:

P1,VALID
C1,VALID
C2,VALID
C1A,VALID
C2A,VALID
C1AA,VALID
P2,NOT VALID
D1,NOT VALID
D2,NOT VALID
D2A,NOT VALID
D2AA,NOT VALID
P3,VALID
C4,VALID
P4,NOT VALID
C5,NOT VALID

这篇关于python中的SQL递归的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆