Assigning a number to each word in a string in Python

Question

Hi, I have a compression task in Python: I need to develop code where, if the input is

hello its me, hello can you hear me, hello are you listening

then the output should be

1,2,3,1,4,5,6,3,1,7,5,8

Basically, each word is assigned a numerical value, and if a word is repeated, so is its number. The code needs to be in Python. Please help, thank you.

Recommended answer

An easy way is to use a dict: when you find a new word, add a key/value pair using an incrementing variable; when you have seen the word before, just use the value already stored in the dict:

s = 'hello its me, hello can you hear me, hello are you listening'


def cyc(s):
    # set i to 1 
    i = 1
    # split into words on whitespace
    it = s.split()
    # create first key/value pair 
    seen = {it[0]: i}
    # yield 1 for first word
    yield i
    # for every word after the first
    for word in it[1:]:
        # if we have seen this word already, use its value from our dict
        if word in seen:
            yield seen[word]
        # else first time seeing it so increment count
        # and create new k/v pairing
        else:
            i += 1
            yield i
            seen[word] = i


print(list(cyc(s)))

Output:

[1, 2, 3, 1, 4, 5, 6, 3, 1, 7, 5, 8]
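
The same dict idea can also be written more compactly. The following is only a sketch, not the answer's code: the name `word_ids` is made up for illustration, and it assumes that returning a list (rather than yielding values) is acceptable. It relies on `dict.setdefault`, which returns the stored number for a known word and otherwise stores and returns the default passed in.

```python
def word_ids(s):
    # hypothetical helper, not part of the original answer
    seen = {}
    # setdefault returns the stored number if the word is already known;
    # otherwise it stores len(seen) + 1 as the word's number and returns it
    return [seen.setdefault(word, len(seen) + 1) for word in s.split()]


s = 'hello its me, hello can you hear me, hello are you listening'
print(word_ids(s))   # [1, 2, 3, 1, 4, 5, 6, 3, 1, 7, 5, 8]
print(word_ids(''))  # []
```

Because nothing is indexed up front, this version also returns an empty list for an empty string, whereas `cyc` above raises an IndexError on `it[0]` in that case.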

You can also avoid slicing by using iter and calling next to pop the first word. Also, if you want foo and foo! to count as the same word, the punctuation has to be removed from each word; for trailing punctuation this can be done with str.rstrip:

from string import punctuation


def cyc(s):
    i = 1
    it = iter(s.split())
    seen = {next(it).rstrip(punctuation): i}
    yield i
    for word in it:
        word = word.rstrip(punctuation)
        if word in seen:
            yield seen[word]
        else:
            i += 1
            yield i
            seen[word] = i
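
Note that `str.rstrip` only removes characters from the right-hand end of a word, so a token like `"bar"` would keep its leading quote. As a hedged sketch (the name `cyc_stripped` is hypothetical and not from the answer), `str.strip` can be used instead to remove punctuation from both ends:

```python
from string import punctuation


def cyc_stripped(s):
    # hypothetical variant of cyc: strips punctuation from both ends of each word
    seen = {}
    for word in s.split():
        word = word.strip(punctuation)
        # first time we see this word: give it the next number
        if word not in seen:
            seen[word] = len(seen) + 1
        yield seen[word]


print(list(cyc_stripped('foo bar foo! "bar" baz')))  # [1, 2, 1, 2, 3]
```

Punctuation inside a word (for example the apostrophe in `it's`) is left alone, and a token made up only of punctuation becomes an empty string that still receives its own number.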
