Need to remove items from both a list and a dictionary of tuple value pairs at the same time


Problem description


This is closely related to a previous question, but I realised that my objective is considerably more complicated:

I have a sentence: "Forbes Asia 200 Best Under 500 Billion 2011"

I have tokens like:

oldTokens = [u'Forbes', u'Asia', u'200', u'Best', u'Under', u'500', u'Billion', u'2011']

A previous parser has worked out the indices of the tokens that should become location or number slots:

numberTokenIDs =  {(7,): 2011.0, (2,): 200.0, (5,6): 500000000000.00}
locationTokenIDs = {(0, 1): u'Forbes Asia'}

The token IDs are the indices of the tokens that hold locations or numbers. The objective is to obtain a new set of tokens like:

newTokens = [u'Asia', u'200', u'Best', u'Under', u'500', u'2011']

together with new number and location token IDs, perhaps like this (to avoid index-out-of-bounds exceptions):

numberTokenIDs =  {(5,): 2011.0, (1,): 200.0, (4,): 500000000000.00}
locationTokenIDs = {(0,): u'Forbes Asia'}

Essentially, I would like to go through the new, reduced set of tokens and ultimately build a new sentence:

"LOCATION_SLOT NUMBER_SLOT Best Under NUMBER_SLOT NUMBER_SLOT"

by going through the new set of tokens and replacing the token at each token ID with either "LOCATION_SLOT" or "NUMBER_SLOT". If I did this with the current set of number and location token IDs, I would get:

"LOCATION_SLOT LOCATION_SLOT NUMBER_SLOT Best Under NUMBER_SLOT NUMBER_SLOT NUMBER_SLOT".
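To make the problem concrete, here is a minimal sketch of that naive replacement, written for Python 3 (so without the `u''` prefixes). The function name `naive_slot_fill` and the snake_case variable names are illustrative only and are not part of my actual code. Because every index in a span gets its own label, multi-token spans produce repeated slots:

```python
old_tokens = ['Forbes', 'Asia', '200', 'Best', 'Under', '500', 'Billion', '2011']
number_token_ids = {(7,): 2011.0, (2,): 200.0, (5, 6): 500000000000.00}
location_token_ids = {(0, 1): 'Forbes Asia'}


def naive_slot_fill(tokens, number_ids, location_ids):
    """Replace every indexed token with its slot label, one label per token."""
    filled = list(tokens)  # copy so the input list is left untouched
    for span in location_ids:
        for i in span:
            filled[i] = 'LOCATION_SLOT'
    for span in number_ids:
        for i in span:
            filled[i] = 'NUMBER_SLOT'
    return ' '.join(filled)


naive_sentence = naive_slot_fill(old_tokens, number_token_ids, location_token_ids)
print(naive_sentence)
# LOCATION_SLOT LOCATION_SLOT NUMBER_SLOT Best Under NUMBER_SLOT NUMBER_SLOT NUMBER_SLOT
```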

How would I do this?

Another example is:

Location token IDs are:  (0, 1)
Number token IDs are:  (3, 4)
Old sampleTokens [u'United', u'Kingdom', u'USD', u'1.240', u'billion']

Here I want to both delete tokens and update the location and number token IDs, so that I can replace tokens like this:

sampleTokens[numberTokenID] = "NUMBER_SLOT"
sampleTokens[locationTokenID] = "LOCATION_SLOT"

so that the resulting tokens are [u'LOCATION_SLOT', u'USD', u'NUMBER_SLOT']

Solution

Not a very elegant solution, but it works:

oldTokens = [u'Forbes', u'Asia', u'200', u'Best', u'Under', u'500', u'Billion', u'2011']

numberTokenIDs =  {(7,): 2011.0, (2,): 200.0, (5,6): 500000000000.00}
locationTokenIDs = {(0, 1): u'Forbes Asia'}

newTokens = []
newnumberTokenIDs = {}
newlocationTokenIDs = {}

new_ind = 0
skip = False

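for ind in range(len(oldTokens)):
    # The previous index opened a two-token span, so drop this token.
    if skip:
        skip = False
        continue

    # Does this index belong to a location span?
    for loc_ind in locationTokenIDs.keys():
        if ind in loc_ind:
            # Keep the last token of the span ('Asia' for (0, 1)).
            # loc_ind[-1] also stays in range for a one-token span.
            newTokens.append(oldTokens[loc_ind[-1]])
            newlocationTokenIDs[(new_ind,)] = locationTokenIDs[loc_ind]
            new_ind += 1
            if len(loc_ind) > 1: # Skip next position if there are 2 elements in a tuple
                skip = True
            break
    else:
        # No location span matched (the loop did not break): try number spans.
        for num_ind in numberTokenIDs.keys():
            if ind in num_ind:
                # Keep the first token of the span ('500' for (5, 6)).
                newTokens.append(oldTokens[ind])
                newnumberTokenIDs[(new_ind,)] = numberTokenIDs[num_ind]
                new_ind += 1
                if len(num_ind) > 1:
                    skip = True
                break
        else:
            # Plain token: copy it over unchanged.
            newTokens.append(oldTokens[ind])
            new_ind += 1

# Note: the single skip flag only handles spans of at most two adjacent tokens.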

newTokens
Out[37]: [u'Asia', u'200', u'Best', u'Under', u'500', u'2011']

newnumberTokenIDs
Out[38]: {(1,): 200.0, (4,): 500000000000.0, (5,): 2011.0}

newlocationTokenIDs
Out[39]: {(0,): u'Forbes Asia'}
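If spans can be longer than two tokens, the `skip` flag above is no longer enough. The following is one possible generalisation, offered as a sketch rather than a drop-in replacement: it assumes the spans do not overlap, is written for Python 3, and, unlike the code above, writes the slot label directly into the new token list instead of keeping one original token per span. The function name `collapse_spans` is hypothetical.

```python
def collapse_spans(tokens, number_ids, location_ids):
    """Collapse each span to a single slot label and remap the token IDs.

    Assumes the spans in number_ids and location_ids do not overlap.
    """
    # Index every span by its first position, remembering label and value.
    starts = {}
    for span, value in location_ids.items():
        starts[min(span)] = (span, 'LOCATION_SLOT', value)
    for span, value in number_ids.items():
        starts[min(span)] = (span, 'NUMBER_SLOT', value)

    # Every position that belongs to some span.
    covered = set()
    for span, _, _ in starts.values():
        covered.update(span)

    new_tokens, new_number_ids, new_location_ids = [], {}, {}
    for i, token in enumerate(tokens):
        if i in starts:
            _, label, value = starts[i]
            target = new_location_ids if label == 'LOCATION_SLOT' else new_number_ids
            target[(len(new_tokens),)] = value  # new index of this slot
            new_tokens.append(label)
        elif i not in covered:
            new_tokens.append(token)  # ordinary token, kept as-is
        # Remaining positions of a span are dropped.
    return new_tokens, new_number_ids, new_location_ids


old_tokens = ['Forbes', 'Asia', '200', 'Best', 'Under', '500', 'Billion', '2011']
number_token_ids = {(7,): 2011.0, (2,): 200.0, (5, 6): 500000000000.00}
location_token_ids = {(0, 1): 'Forbes Asia'}

slot_tokens, new_numbers, new_locations = collapse_spans(
    old_tokens, number_token_ids, location_token_ids)
print(' '.join(slot_tokens))
# LOCATION_SLOT NUMBER_SLOT Best Under NUMBER_SLOT NUMBER_SLOT
```

With these inputs the remapped dictionaries have the same keys and values as `newnumberTokenIDs` and `newlocationTokenIDs` above. For the second example in the question, the same call collapses the five tokens to `['LOCATION_SLOT', 'USD', 'NUMBER_SLOT']`, whatever values are stored against the spans `(0, 1)` and `(3, 4)`.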
