Lowercase first element of tuple in list of tuples

Problem description

I have a list of documents, each labeled with its category:

documents = [(list(corpus.words(fileid)), category)
              for category in corpus.categories()
              for fileid in corpus.fileids(category)]

This gives me the following list of tuples, where the first element of each tuple is a list of words (the tokens of a sentence). For instance:

[([u'A', u'pilot', u'investigation', u'of', u'a', u'multidisciplinary', 
u'quality', u'of', u'life', u'intervention', u'for', u'men', u'with', 
u'biochemical', u'recurrence', u'of', u'prostate', u'cancer', u'.'], 
'cancer'), 
([u'A', u'Systematic', u'Review', u'of', u'the', u'Effectiveness', 
u'of', u'Medical', u'Cannabis', u'for', u'Psychiatric', u',', 
u'Movement', u'and', u'Neurodegenerative', u'Disorders', u'.'], 'hd')]

I want to apply some text-processing techniques, but I want to keep the list-of-tuples format.

I know that if I had only a list of words, this would do:

[w.lower() for w in words]

But in this case I want to apply .lower() to the first element (a list of strings) of every tuple in the list. I have tried various options, such as:

[[x.lower() for x in element] for element in documents],
[(x.lower(), y) for x,y in documents], or
[x[0].lower() for x in documents]

I always get this error:

AttributeError: 'list' object has no attribute 'lower'

I have also tried applying what I need before creating the list, but .categories() and .fileids() are methods of corpus that return lists as well, so I get the same error.

Any help would be greatly appreciated.

Solved:

Both @Adam Smith's answer and @vasia's were right:

[([s.lower() for s in item[0]], item[1]) for item in documents]

@Adam's answer above preserves the tuple structure; @vasia's lowercases the words right when the list of tuples is created:

documents = [([word.lower() for word in corpus.words(fileid)], category)
              for category in corpus.categories()
              for fileid in corpus.fileids(category)]
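
For readers without the original corpus at hand, here is a minimal sketch of the same comprehension run against a stand-in object. The question does not show how `corpus` was built, so `FakeCorpus` and its two tiny "files" are hypothetical and only mimic the three methods used above (`categories()`, `fileids()`, `words()`).

```python
class FakeCorpus:
    """Hypothetical stand-in exposing only the three methods the question uses."""

    def __init__(self, files):
        # files maps fileid -> (category, list of words)
        self._files = files

    def categories(self):
        # Unique categories, sorted so the output order is predictable
        return sorted({category for category, _ in self._files.values()})

    def fileids(self, category):
        return [fid for fid, (cat, _) in self._files.items() if cat == category]

    def words(self, fileid):
        return self._files[fileid][1]


# Made-up sample data, not the real corpus
corpus = FakeCorpus({
    'doc1.txt': ('cancer', ['A', 'pilot', 'investigation', '.']),
    'doc2.txt': ('hd', ['A', 'Systematic', 'Review', '.']),
})

# Lowercase each word while the (words, category) tuples are being built
documents = [([word.lower() for word in corpus.words(fileid)], category)
             for category in corpus.categories()
             for fileid in corpus.fileids(category)]

print(documents)
```

Lowercasing at construction time avoids a second pass over the data, at the cost of never having the original-case tokens available afterwards.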

Thank you all :)

Recommended answer

So your data structure is [([str], str)]: a list of tuples, where each tuple is (list of strings, string). It's important to understand exactly what that means before you try to pull data out of it.

That means that for item in documents iterates over the list of tuples, where item is each tuple in turn.

That means that item[0] is the list in each tuple.
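
Because the first element of each tuple is a list, calling .lower() on it directly is what triggers the error in the question. The sketch below replays the three attempts from the question on a shortened, made-up copy of the sample data (the `documents` literal and the `attempts` dictionary are illustrative assumptions) and records the exception each one raises.

```python
# Shortened stand-in for the real data: [(list of words, category), ...]
documents = [(['A', 'pilot', 'investigation'], 'cancer'),
             (['A', 'Systematic', 'Review'], 'hd')]

attempts = {
    # element is the whole tuple, so the first x is the word list itself
    'nested': lambda: [[x.lower() for x in element] for element in documents],
    # x is unpacked as the word list, not a single word
    'unpack': lambda: [(x.lower(), y) for x, y in documents],
    # x[0] is again the word list
    'index': lambda: [x[0].lower() for x in documents],
}

errors = {}
for name, attempt in attempts.items():
    try:
        attempt()
    except AttributeError as exc:
        errors[name] = str(exc)

for name, message in errors.items():
    print(name, '->', message)
```

All three fail for the same reason: .lower() is a string method, so it has to be applied one level deeper, to each string inside the list.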

That means that for item in documents: for s in item[0]: will iterate through each string inside that list. Let's try that!

[s.lower() for item in documents for s in item[0]]

From your example data, this should give one flat list:

[u'a', u'pilot', u'investigation', u'of', u'a', u'multidisciplinary', ...]

If you're trying to keep the tuple format, you could do:

[([s.lower() for s in item[0]], item[1]) for item in documents]

# or perhaps more readably
[([s.lower() for s in lst], val) for lst, val in documents]

Both statements give:

[([u'a', u'pilot', u'investigation', u'of', u'a', u'multidisciplinary', ...], 'cancer'), ... ]
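
As a quick check, the sketch below runs both tuple-preserving comprehensions on a shortened copy of the sample data and confirms that they agree. The `documents` literal is trimmed for illustration, and the variable names `by_index` and `by_unpacking` are made up for this example.

```python
# Shortened copy of the sample data from the question
documents = [(['A', 'pilot', 'investigation', 'of', 'a'], 'cancer'),
             (['A', 'Systematic', 'Review', 'of', 'the'], 'hd')]

# Index into each tuple
by_index = [([s.lower() for s in item[0]], item[1]) for item in documents]

# Unpack each tuple into (word list, category)
by_unpacking = [([s.lower() for s in lst], val) for lst, val in documents]

print(by_index == by_unpacking)   # True
print(by_unpacking[0])            # (['a', 'pilot', 'investigation', 'of', 'a'], 'cancer')
print(documents[0][0][0])         # 'A': the original list is left unchanged
```

Both versions build new lists and new tuples, so the original documents keep their casing if you still need them.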
