How to create a nested dictionary from a csv file with N rows in Python

Problem description

I was looking for a way to read a csv file with an unknown number of columns into a nested dictionary, i.e., for input of the form

file.csv:
1,  2,  3,  4
1,  6,  7,  8
9, 10, 11, 12

I want a dictionary of the form:

{1:{2:{3:4}, 6:{7:8}}, 9:{10:{11:12}}}

This is in order to allow O(1) lookup of a value from the csv file. It is acceptable for creating the dictionary to take a relatively long time, since in my application I create it only once but search it millions of times.
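
To make the lookup pattern concrete, here is a minimal sketch of searching such a dictionary. The helper name `lookup` is hypothetical and not part of the original post. Each level costs one hash lookup, so the cost of a search depends on the number of columns rather than on the number of rows in the file.

```python
from functools import reduce

# The nested dictionary from the example above.
nested = {1: {2: {3: 4}, 6: {7: 8}}, 9: {10: {11: 12}}}


def lookup(nested_dict, keys):
    """Follow `keys` one level at a time (hypothetical helper).

    Raises KeyError if any key along the path is missing.
    """
    return reduce(lambda level, key: level[key], keys, nested_dict)


print(lookup(nested, [1, 6, 7]))  # prints 8
```

Passing a shorter key path returns the sub-dictionary at that level, for example `lookup(nested, [9, 10])` gives `{11: 12}`.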

I also wanted an option to name the relevant columns, so that I can ignore the unnecessary ones.
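
As an illustration of this requirement, the sketch below picks columns by name with `csv.DictReader`. It assumes the file has a header row, which the sample file above does not show. The column names `a`, `b`, `c`, `d` and the variable `wanted` are made up for the example.

```python
import csv
import io

# In-memory stand-in for a csv file with a (hypothetical) header row.
sample = io.StringIO("a,b,c,d\n1,2,3,4\n1,6,7,8\n9,10,11,12\n")
wanted = ["a", "c", "d"]  # columns to keep, in nesting order

reader = csv.DictReader(sample)

# Fail early if a requested column is not in the header.
missing = [name for name in wanted if name not in reader.fieldnames]
if missing:
    raise ValueError("Could not find {} in csv file".format(missing))

# Keep only the requested columns; column 'b' is ignored.
rows = [[record[name] for name in wanted] for record in reader]
print(rows)  # [['1', '3', '4'], ['1', '7', '8'], ['9', '11', '12']]
```

The values are still strings at this point; converting them to numbers is a separate step.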

Solution

Here is what I came up with. Feel free to comment and suggest improvements.

import csv
import itertools

def list_to_dict(lst):
    # Takes a list and turns it into a nested dictionary, where the first
    # element is a key whose value is the dictionary created from the rest
    # of the list. The last element in the list becomes the value of the
    # innermost dictionary. The input list is not modified.
    # INPUTS:
    #   lst - a non-empty list (e.g. of strings or floats)
    # OUTPUT:
    #   A nested dictionary (or the single element, if len(lst) == 1)
    # EXAMPLE RUN:
    #   >>> lst = [1, 2, 3, 4]
    #   >>> list_to_dict(lst)
    #   {1: {2: {3: 4}}}
    nested = lst[-1]
    # Wrap from the innermost value outwards, without mutating lst
    for key in reversed(lst[:-1]):
        nested = {key: nested}
    return nested


def dict_combine(d1, d2):
    # Combines two nested dictionaries into one.
    # INPUTS:
    #   d1, d2: Two nested dictionaries. The function might change d1 and d2, 
    #           therefore if the input dictionaries are not to be mutated, 
    #           you should pass copies of d1 and d2.
    #           Note that the function works more efficiently if d1 is the 
    #           bigger dictionary.
    # OUTPUT:
    #   The combined dictionary
    # EXAMPLE RUN:
    #   >>> d1 = {1: {2: {3: 4, 5: 6}}}
    #   >>> d2 = {1: {2: {7: 8}, 9: {10: 11}}}
    #   >>> dict_combine(d1, d2)
    #   {1: {2: {3: 4, 5: 6, 7: 8}, 9: {10: 11}}}

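    for key in d2:
        if key in d1 and isinstance(d1[key], dict) and isinstance(d2[key], dict):
            # Both sides are dictionaries: merge them one level down
            d1[key] = dict_combine(d1[key], d2[key])
        else:
            # New key, or a leaf value on either side (e.g. two rows with the
            # same key path): the value from d2 wins instead of raising
            d1[key] = d2[key]
    return d1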


def csv_to_dict(csv_file_path, params=None, n_row_max=None):
    # NAME: csv_to_dict
    #
    # DESCRIPTION: Reads a csv file and turns relevant columns into a nested 
    #              dictionary.
    #
    # INPUTS:
    #   csv_file_path: The full path to the data file
    #   params:        A list of relevant column names. The resulting dictionary
    #                  will be nested in the same order as parameters in 'params'.
    #                  Default is None (read all columns)
    #   n_row_max:     The maximum number of rows to read. Default is None
    #                  (read all rows)
    #
    # OUTPUT:
    #   A nested dictionary containing all the relevant csv data

    csv_dictionary = {}

    with open(csv_file_path, 'r') as csv_file:
        csv_data = csv.reader(csv_file, delimiter=',')
        names  = next(csv_data)          # Read title line
        if not params:
            # No names given: read every column, including the last one
            relevant_param_indices = list(range(len(names)))
        else:
            # A list of column indices to read from csv
            relevant_param_indices = []
            for name in params:
                if name not in names:
                    # Parameter name is not found in title line
                    raise ValueError('Could not find {} in csv file'.format(name))
                # Get the index of the relevant column
                relevant_param_indices.append(names.index(name))
        # The title line has already been consumed, so start at the first
        # data row and read at most n_row_max rows (all rows if None)
        for row in itertools.islice(csv_data, n_row_max):
            # Get a list containing relevant columns only
            relevant_cols = [row[i] for i in relevant_param_indices]
            # Convert the strings to numbers (optional; the keys become floats)
            float_row = [float(element) for element in relevant_cols]
            # Build nested dictionary
            csv_dictionary = dict_combine(csv_dictionary, list_to_dict(float_row))

        return csv_dictionary
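
As a possible alternative to building one small dictionary per row and merging it, each row can be inserted directly into the result with `dict.setdefault`. This is only a sketch and not the method used in the answer above. The names `insert_row` and `tree` are hypothetical, and it assumes every row has the same number of columns (at least two).

```python
def insert_row(tree, row):
    """Insert one row into the nested dictionary `tree` in place.

    All items but the last two are nesting keys; the second-to-last item
    is the innermost key and the last item is its value.
    """
    *path, last_key, value = row
    level = tree
    for key in path:
        # Reuse the sub-dictionary if it exists, otherwise create it
        level = level.setdefault(key, {})
    level[last_key] = value  # a repeated key path overwrites the old value
    return tree


tree = {}
for row in [[1, 2, 3, 4], [1, 6, 7, 8], [9, 10, 11, 12]]:
    insert_row(tree, row)

print(tree)  # {1: {2: {3: 4}, 6: {7: 8}}, 9: {10: {11: 12}}}
```

Because each row only touches the dictionaries along its own key path, the work per row is proportional to the number of columns and does not grow with the size of the dictionary built so far.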
