Replace words in string with words from list using Python

Question

I'm working on a word cloud program in Python and I'm stuck on a word replace function. I am trying to replace a set of numbers in an HTML file (so I'm working with a string) with words from an ordered list. So 000 would be replaced with the first word in the list, 001 with the second, etc.

So below I have it selecting the word to replace w properly, but I can't get it to properly replace it with the words from the string. Any help is appreciated. Thanks!

def replace_all():  
  text = '000 001 002 003 '
  word = ['foo', 'bar', 'that', 'these']
  for a in word:    
    y = -1
    for w in text:     
      y = y + 1
      x = "00"+str(y)
      w = {x:a}      
      for i, j in w.iteritems():
        text = text.replace(i, j)
  print text      

Recommended answer

This is actually a really simple list comprehension:

>>> text = '000 001 002 003 '
>>> words = ['foo', 'bar', 'that', 'these']
>>> [words[int(item)] for item in text.split()]
['foo', 'bar', 'that', 'these']
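
Since the goal is a string rather than a list, the result can be joined back together. A minimal sketch, assuming Python 3 and that every whitespace-separated token is a valid index into words (the name replaced is only for illustration):

```python
text = '000 001 002 003 '
words = ['foo', 'bar', 'that', 'these']

# Look up each numeric token in the list, then rebuild a single string.
replaced = ' '.join(words[int(item)] for item in text.split())
print(replaced)  # foo bar that these
```

Note that split() followed by join() normalises whitespace, so the trailing space in the original text is not preserved.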

If you need other values to be left alone, this can be catered for:

def get(seq, item):
    try:
        return seq[int(item)]
    except ValueError:
        return item

Then simply use something like [get(words, item) for item in text.split()]. Naturally, more testing might be required in get() if there will be other numbers in the string that could get accidentally replaced. (End of edit)
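
As one possible form of that extra testing: the get() above raises IndexError for a number such as 999 that is past the end of the list, and treats -1 as "the last word". The sketch below is an assumption about what stricter checking could look like, not part of the original answer. It only replaces tokens that are exactly three ASCII digits and in range, and assumes Python 3.7+ for str.isascii():

```python
def get(seq, item):
    """Return seq[int(item)] for a three-digit token in range, else the token itself."""
    # Only treat exactly three ASCII digits (e.g. '007') as a placeholder.
    if len(item) == 3 and item.isascii() and item.isdigit():
        index = int(item)
        if index < len(seq):
            return seq[index]
    # Anything else (words, other numbers, out-of-range indices) is left alone.
    return item

words = ['foo', 'bar', 'that', 'these']
text = 'see 000 and 001 but not 42, 999 or word'
result = ' '.join(get(words, item) for item in text.split())
print(result)  # see foo and bar but not 42, 999 or word
```

Whether three digits is the right rule depends on how the placeholders appear in the real HTML, so the condition would need adjusting to match.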

What we do is split the text into the individual numbers, then convert them to integers and use them to index the list of words you have given.

As to why your code doesn't work, the main issue is that you are looping over the string, which gives you characters, not words. However, it's not a great way of solving the task anyway.
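
The difference between iterating over a string and iterating over its words can be seen directly. This is a small illustration using the sample text from the question (Python 3 syntax; the names chars and tokens are only for illustration):

```python
text = '000 001 002 003 '

# Iterating over a string yields one character at a time, spaces included.
chars = [w for w in text]
print(len(chars))   # 16
print(chars[:4])    # ['0', '0', '0', ' ']

# Splitting on whitespace yields the tokens the loop was meant to see.
tokens = text.split()
print(tokens)       # ['000', '001', '002', '003']
```

In the question's code the inner loop therefore runs 16 times for the first word, and the keys "000" through "003" are all replaced by that same word before the other words are ever tried.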

It's also worth a quick note that when you are looping over values and want indices to go with them, you should use the enumerate() builtin rather than a counting variable.

For example, instead of:

y = -1
for w in text:
    y = y + 1
    ...

Use:

for y, w in enumerate(text):
    ...

This is much more readable and Pythonic.
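
Applied to the word list rather than the text, enumerate() gives each word together with its position, which is what the placeholder keys are built from. A minimal sketch (Python 3; the list name pairs is only for illustration, and str.format() is used here to zero-pad the index):

```python
words = ['foo', 'bar', 'that', 'these']

# enumerate() yields (index, value) pairs, so no manual counter is needed.
pairs = []
for y, w in enumerate(words):
    # "{0:03d}" pads the index to three digits: 0 -> "000", 1 -> "001", ...
    pairs.append(("{0:03d}".format(y), w))

print(pairs)
# [('000', 'foo'), ('001', 'bar'), ('002', 'that'), ('003', 'these')]
```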

Another thing with your existing code is this:

w = {x:a}      
for i, j in w.iteritems():
    text = text.replace(i, j)

Which, if you think about it, simplifies down to:

text = text.replace(x, a)

You are setting w to be a dictionary of one item, then looping over it, but you know it will only ever contain one item.

A solution that follows your method more closely would be something like this:

# Map zero-padded keys ("000", "001", ...) to the corresponding words.
words_dict = {"{0:03d}".format(index): value for index, value in enumerate(words)}
for key, value in words_dict.items():
    text = text.replace(key, value)

We create a dictionary mapping each zero-padded number string (built with str.format()) to its word, then do a replace for each item. Note that as you are using 2.x, you'll want dict.iteritems(), and if you are pre-2.7, use the dict() builtin on a generator of tuples, as dict comprehensions don't exist there.
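
The dict() variant mentioned for older versions might look like the sketch below. It is written so that it also runs on Python 3, using the sample text and words from the question:

```python
text = '000 001 002 003 '
words = ['foo', 'bar', 'that', 'these']

# dict() accepts any iterable of (key, value) pairs, so a generator
# expression stands in for the dict comprehension.
words_dict = dict(("{0:03d}".format(index), value)
                  for index, value in enumerate(words))

# Replace every occurrence of each key with its word.
for key, value in words_dict.items():
    text = text.replace(key, value)

print(repr(text))  # 'foo bar that these '
```

Unlike the split-based version, str.replace() leaves the surrounding text and whitespace untouched, but it matches substrings anywhere, so a number such as 1000 in the HTML would become 1foo.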
