Extracting strings from nested lists in Python

Question

Possible Duplicate:
Flatten (an irregular) list of lists in Python

I'm trying to use the nltk library in Python, and more specifically the WordNet corpus, to extract all the words in a broad semantic category like 'animal'. I've managed to write a function that goes down through all the categories and extracts the words in them, but what I end up with is a huge jumble of lists within lists. The lists aren't of any predictable length or depth; they look like this:

['pet', 'pest', 'mate', 'young', 'stunt', 'giant', ['hen', 'dam', 'filly'], ['head', 'stray', 'dog', ['puppy', 'toy', 'spitz', 'pooch', 'doggy', 'cur', 'mutt', 'pug', 'corgi', ['Peke'], ['chow'], ['feist', 'fice'], ['hound', ['Lhasa', 'cairn']], ['boxer', 'husky']], ['tabby', 'tabby', 'queen', 'Manx', 'tom', 'kitty', 'puss', 'pussy', ['gib']]]

What I want is to be able to grab each of those strings and return a single, unnested list. Any advice?

Recommended answer

In general, when you have to deal with arbitrary levels of nesting, a recursive solution is a good fit. That covers lists within lists, parsing HTML (tags within tags), working with filesystems (directories within directories), and so on.

I haven't tested this code extensively, but I believe it should do what you want:

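ll = [1, 2, 3, [4, 5, [6, 7, 8]]]

def flatten(input_list):
    """Return a flat list containing every non-list element of input_list."""
    output_list = []
    for element in input_list:
        if isinstance(element, list):
            # Nested list: flatten it recursively and splice in the result.
            output_list.extend(flatten(element))
        else:
            # Leaf value (e.g. a string or a number): keep it as-is.
            output_list.append(element)
    return output_list

print(flatten(ll))  # prints [1, 2, 3, 4, 5, 6, 7, 8]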
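
Since the question is about strings rather than numbers, here is a small sketch of the same recursive idea written as a generator. The name `iter_flatten` and the shortened `words` sample are made up for illustration and are not part of the original answer. Note that it tests specifically for `list` instead of "anything iterable": strings are iterable too, so a broader check would split each word into characters.

```python
def iter_flatten(nested):
    """Yield the non-list items of an arbitrarily nested list, left to right."""
    for element in nested:
        if isinstance(element, list):
            # Delegate to a recursive call for the nested list.
            yield from iter_flatten(element)
        else:
            # Strings are leaves here, even though they are iterable.
            yield element

# Shortened, hypothetical sample in the shape of the question's data.
words = ['pet', 'pest', ['hen', 'dam'],
         ['dog', ['puppy', ['Peke'], ['hound', ['Lhasa', 'cairn']]]]]

flat_words = list(iter_flatten(words))
print(flat_words)
# ['pet', 'pest', 'hen', 'dam', 'dog', 'puppy', 'Peke', 'hound', 'Lhasa', 'cairn']
```

A generator avoids building intermediate lists at every level; wrap it in `list(...)` when a real list is needed.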

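In general, recursion is easy to reason about and the solutions tend to be very elegant (like the one above), but for really deeply nested things (think thousands of levels deep) you can run into problems like stack overflow. In Python this shows up as a RecursionError once the interpreter's recursion limit (1000 frames by default in CPython) is exceeded.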

Generally this isn't a problem, but I believe a recursive function can always* be converted to a loop (it just doesn't look as nice).

  * Note: I'm not that hot on my compsci theory here. Feel free to add details or correct me if I'm off.
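
As a rough sketch of what such a loop-based version might look like (this is not from the original answer; `flatten_iterative` is a name invented here), the call stack can be replaced by an explicit stack of iterators, one per nesting level currently being walked:

```python
def flatten_iterative(input_list):
    """Flatten nested lists without recursion, using an explicit stack."""
    output_list = []
    stack = [iter(input_list)]  # one iterator per open nesting level
    while stack:
        for element in stack[-1]:
            if isinstance(element, list):
                # Descend: remember where we were and start on the sublist.
                stack.append(iter(element))
                break
            output_list.append(element)
        else:
            # The innermost iterator is exhausted: go back up one level.
            stack.pop()
    return output_list

# Build a list nested 5000 levels deep, beyond the default recursion limit.
deep = ['bottom']
for _ in range(5000):
    deep = ['word', deep]

result = flatten_iterative(deep)
print(len(result))  # 5001
```

Because each iterator remembers its position, resuming a parent list after finishing a sublist costs nothing extra, and the depth is limited only by available memory instead of the interpreter's recursion limit.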
