How to create a nested dictionary from a csv file with N rows in Python
Problem description
I was looking for a way to read a csv file with an unknown number of columns into a nested dictionary, i.e. for input of the form
file.csv:
1, 2, 3, 4
1, 6, 7, 8
9, 10, 11, 12
I want a dictionary of the form:
{1:{2:{3:4}, 6:{7:8}}, 9:{10:{11:12}}}
The goal is O(1) lookup of a value from the csv file. Building the dictionary can take a relatively long time, which is acceptable because in my application I build it only once but search it millions of times.
I also wanted an option to name the relevant columns, so that I can ignore the unnecessary ones.
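To illustrate the intended use, a lookup against the dictionary shown above could look like the following minimal sketch. The helper name `lookup` and its `default` argument are assumptions made for this illustration and are not part of any library. Each lookup costs one hash access per key column, so its cost does not depend on the number of rows in the file.

```python
# The nested dictionary from the example above.
table = {1: {2: {3: 4}, 6: {7: 8}}, 9: {10: {11: 12}}}


def lookup(nested, keys, default=None):
    """Follow `keys` one level at a time; return `default` if a key is missing.

    `lookup` is a hypothetical helper used only for this illustration.
    """
    node = nested
    for key in keys:
        if not isinstance(node, dict) or key not in node:
            return default
        node = node[key]
    return node


print(lookup(table, [1, 2, 3]))    # 4
print(lookup(table, [9, 10, 11]))  # 12
print(lookup(table, [1, 5, 3]))    # None (no key 5 under key 1)
```

When every key is known to exist, plain indexing such as `table[1][2][3]` does the same thing.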
Solution
Here is what I came up with. Feel free to comment and suggest improvements.
import csv
import itertools


def list_to_dict(lst):
    # Takes a list, and recursively turns it into a nested dictionary, where
    # the first element is a key, whose value is the dictionary created from
    # the rest of the list. The last element in the list will be the value of
    # the innermost dictionary.
    # INPUTS:
    #   lst - a list (e.g. of strings or floats)
    # OUTPUT:
    #   A nested dictionary
    # EXAMPLE RUN:
    #   >>> lst = [1, 2, 3, 4]
    #   >>> list_to_dict(lst)
    #   {1: {2: {3: 4}}}
    if len(lst) == 1:
        return lst[0]
    else:
        data_dict = {lst[-2]: lst[-1]}
        lst.pop()
        lst[-1] = data_dict
        return list_to_dict(lst)


def dict_combine(d1, d2):
    # Combines two nested dictionaries into one.
    # INPUTS:
    #   d1, d2: Two nested dictionaries. The function might change d1 and d2,
    #           therefore if the input dictionaries are not to be mutated,
    #           you should pass copies of d1 and d2.
    #           Note that the function works more efficiently if d1 is the
    #           bigger dictionary.
    # OUTPUT:
    #   The combined dictionary
    # EXAMPLE RUN:
    #   >>> d1 = {1: {2: {3: 4, 5: 6}}}
    #   >>> d2 = {1: {2: {7: 8}, 9: {10: 11}}}
    #   >>> dict_combine(d1, d2)
    #   {1: {2: {3: 4, 5: 6, 7: 8}, 9: {10: 11}}}
    for key in d2:
        if key in d1 and isinstance(d1[key], dict) and isinstance(d2[key], dict):
            # Both sides hold a sub-dictionary under this key: merge them
            d1[key] = dict_combine(d1[key], d2[key])
        else:
            # New key, or a leaf value on either side: the value from d2 wins
            # (this also covers rows that repeat the same key path)
            d1[key] = d2[key]
    return d1


def csv_to_dict(csv_file_path, params=None, n_row_max=None):
    # NAME: csv_to_dict
    #
    # DESCRIPTION: Reads a csv file and turns relevant columns into a nested
    #              dictionary. The first line of the file is treated as a
    #              title line holding the column names.
    #
    # INPUTS:
    #   csv_file_path: The full path to the data file
    #   params: A list of relevant column names. The resulting dictionary
    #           will be nested in the same order as parameters in 'params'.
    #           Default is None (read all columns)
    #   n_row_max: The maximum number of data rows to read. Default is None
    #              (read all rows)
    #
    # OUTPUT:
    #   A nested dictionary containing all the relevant csv data
    csv_dictionary = {}
    with open(csv_file_path, 'r', newline='') as csv_file:
        csv_data = csv.reader(csv_file, delimiter=',')
        # Read title line; strip spaces that may follow the delimiter
        names = [name.strip() for name in next(csv_data)]
        if not params:
            # A list of column indices to read from csv (all columns)
            relevant_param_indices = list(range(len(names)))
        else:
            # A list of column indices to read from csv
            relevant_param_indices = []
            for name in params:
                if name not in names:
                    # Parameter name is not found in title line
                    raise ValueError('Could not find {} in csv file'.format(name))
                else:
                    # Get indices of the relevant columns
                    relevant_param_indices.append(names.index(name))
        # The title line is already consumed, so start at the first data row
        for row in itertools.islice(csv_data, n_row_max):
            # Get a list containing relevant columns only
            relevant_cols = [row[i] for i in relevant_param_indices]
            # Turn the strings into numbers. Not necessary
            float_row = [float(element) for element in relevant_cols]
            # Build nested dictionary
            csv_dictionary = dict_combine(csv_dictionary, list_to_dict(float_row))
    return csv_dictionary
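
As a possible alternative to building and merging one small dictionary per row, the same structure can be filled in a single pass with `dict.setdefault`. The following is only a sketch under stated assumptions, not the code above: the names `rows_to_nested_dict` and `convert` are hypothetical, the first row is assumed to be a title line, and at least two columns must be selected (the last selected column becomes the innermost value).

```python
import csv
import io


def rows_to_nested_dict(rows, params=None, convert=float):
    """Build a nested dictionary from an iterable of csv rows.

    Hypothetical helper for illustration. The first row is treated as the
    title line; `params` optionally lists the column names to keep, in
    nesting order; `convert` is applied to every selected cell.
    """
    rows = iter(rows)
    names = [name.strip() for name in next(rows)]
    if params:
        missing = [name for name in params if name not in names]
        if missing:
            raise ValueError('Could not find {} in csv header'.format(missing))
        indices = [names.index(name) for name in params]
    else:
        indices = list(range(len(names)))
    if len(indices) < 2:
        raise ValueError('Need at least one key column and one value column')

    result = {}
    for row in rows:
        values = [convert(row[i]) for i in indices]
        node = result
        # Walk down one level per key column, creating levels on demand
        for key in values[:-2]:
            node = node.setdefault(key, {})
        # The last two selected columns form the innermost key: value pair
        node[values[-2]] = values[-1]
    return result


# In-memory sample with an assumed title line "a,b,c,d"
sample = io.StringIO("a,b,c,d\n1,2,3,4\n1,6,7,8\n9,10,11,12\n")
print(rows_to_nested_dict(csv.reader(sample), convert=int))
# {1: {2: {3: 4}, 6: {7: 8}}, 9: {10: {11: 12}}}

sample.seek(0)
print(rows_to_nested_dict(csv.reader(sample), params=['a', 'd'], convert=int))
# {1: 8, 9: 12}
```

The second call shows that when two rows share the same key path, the later row overwrites the earlier one, so the selected key columns should identify a row uniquely.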