Convert a Python string with newlines and tabs to a dictionary

Problem description

I'm a bit stuck on this particular problem. I have a working solution, but I don't think it's very Pythonic.

I have raw text output like this:

Key 1   
  Value 1 
Key 2   
  Value 2 
Key 3   
  Value 3a  
  Value 3b
  Value 3c 
Key 4   
  Value 4a  
  Value 4b

I'm trying to build a dictionary like this:

{ 'Key 1': ['Value 1'], 'Key 2': ['Value 2'], 'Key 3': ['Value 3a', 'Value 3b', 'Value 3c'], 'Key 4': ['Value 4a', 'Value 4b'] }

The raw output can be made into a string that looks something like this:

my_str = (
    "Key 1\n\tValue 1"
    "\nKey 2\n\tValue 2"
    "\nKey 3\n\tValue 3a \n\tValue 3b \n\tValue 3c"
    "\nKey 4\n\tValue 4a \n\tValue 4b "
)

So the values are separated by \n\t and the keys are separated by \n.

If I try to do something like this:

dict(item.split('\n\t') for item in my_str.split('\n'))

It doesn't parse correctly, because splitting on '\n' also splits the '\n' inside every '\n\t', so each piece is a lone key or a lone value rather than a key/value pair.
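To make the failure concrete, here is a minimal sketch, assuming the single-line form of the string from the question. The names pieces and error are only for illustration.

```python
my_str = (
    "Key 1\n\tValue 1\nKey 2\n\tValue 2"
    "\nKey 3\n\tValue 3a \n\tValue 3b \n\tValue 3c"
    "\nKey 4\n\tValue 4a \n\tValue 4b"
)

# Splitting on '\n' first also consumes the '\n' of every '\n\t',
# so keys and values end up as separate items.
pieces = my_str.split('\n')

try:
    dict(item.split('\n\t') for item in pieces)
    error = None
except ValueError as exc:
    # Each item splits into a one-element list, not a (key, value) pair,
    # which dict() rejects.
    error = exc

print(pieces[:4])
print(error)
```

The first four pieces are 'Key 1', '\tValue 1', 'Key 2' and '\tValue 2': the tab stays attached to the value, and there is no '\n\t' left for the inner split to find.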

So far I have something like this:

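#!/usr/bin/env python

# Named my_str rather than str, so the built-in str type is not shadowed.
my_str = "Key 1\n\tValue 1\nKey 2\n\tValue 2\nKey 3\n\tValue 3a \n\tValue 3b \n\tValue 3c\nKey 4\n\tValue 4a \n\tValue 4b"

# Mark value boundaries with ',' and key boundaries with ';'.
output = my_str.replace('\n\t', ',').replace('\n', ';')
result = {}
for entry in output.split(';'):
    parts = entry.split(',')
    result[parts[0]] = parts[1:]
print(result)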

Which returns:

{'Key 1': ['Value 1'], 'Key 2': ['Value 2'], 'Key 3': ['Value 3a ', 'Value 3b ', 'Value 3c'], 'Key 4': ['Value 4a ', 'Value 4b']}

However, this looks quite gross to me. I'm just wondering if there is a more Pythonic way to do this. Any help would be much appreciated!

Recommended answer

Batteries are included: defaultdict automatically creates an empty list as the value for each new key, and we use str's isspace method to check for indentation (a regular expression would also work):

from collections import defaultdict

data = """
Key 1   
  Value 1 
Key 2   
  Value 2 
Key 3   
  Value 3a  
  Value 3b
  Value 3c 
Key 4   
  Value 4a  
  Value 4b
"""

result = defaultdict(list)
current_key = None

for line in data.splitlines():
    if not line: continue  # Filter out blank lines

    # If the line is not indented then it is a key
    # Save it and move on
    if not line[0].isspace():
        current_key = line.strip()
        continue

    # Otherwise, add the value
    # (minus leading and trailing whitespace)
    # to our results
    result[current_key].append(line.strip())

# result is now a defaultdict
defaultdict(<class 'list'>,
    {'Key 1': ['Value 1'],
     'Key 2': ['Value 2'], 
     'Key 3': ['Value 3a', 'Value 3b', 'Value 3c'],
     'Key 4': ['Value 4a', 'Value 4b']})
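The answer notes that a regular expression could stand in for the isspace check. A minimal sketch of that variant follows, assuming keys start in the first column and values are indented with spaces or tabs. The names parse_indented and INDENTED are hypothetical, not part of the original answer, and a plain dict with setdefault is swapped in for defaultdict.

```python
import re

# An indented line: leading spaces or tabs, then the value text.
INDENTED = re.compile(r'^[ \t]+(\S.*)$')

def parse_indented(text):
    """Group indented value lines under the unindented key line above them."""
    result = {}
    current_key = None
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank or whitespace-only lines
        match = INDENTED.match(line)
        if match:
            if current_key is None:
                raise ValueError("value line appears before any key")
            result[current_key].append(match.group(1).strip())
        else:
            current_key = line.strip()
            # Register the key even if no values follow it.
            result.setdefault(current_key, [])
    return result

my_str = "Key 1\n\tValue 1\nKey 2\n\tValue 2\nKey 3\n\tValue 3a \n\tValue 3b \n\tValue 3c\nKey 4\n\tValue 4a \n\tValue 4b"
parsed = parse_indented(my_str)
print(parsed)
```

Returning a plain dict means a lookup of a missing key raises KeyError instead of silently creating an empty list. If the defaultdict version is preferred, dict(result) gives the same plain-dict view at the end.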
