Assign a number to each unique value in a list
Question
I have a list of strings. I want to assign a unique number to each distinct string (the exact number is not important) and create a list of the same length containing these numbers, in order. Below is my best attempt, but I am not happy with it for two reasons:
- It assumes that the same values are next to each other.
- I had to start the list with a 0, otherwise the output would be incorrect.
My code:
names = ['ll', 'll', 'll', 'hl', 'hl', 'hl', 'LL', 'LL', 'LL', 'HL', 'HL', 'HL']
numbers = [0]
num = 0
for item in range(len(names)):
    if item == len(names) - 1:
        break
    elif names[item] == names[item+1]:
        numbers.append(num)
    else:
        num = num + 1
        numbers.append(num)
print(numbers)
I want to make the code more generic so that it works with any list. Any ideas?
Recommended answer
Without using an external library (see the EDIT below for a Pandas solution), you can do it as follows:
d = {ni: indi for indi, ni in enumerate(set(names))}
numbers = [d[ni] for ni in names]
Brief explanation:
In the first line, you assign a number to each unique element in your list (stored in the dictionary d; you can easily create it using a dictionary comprehension; set returns the unique elements of names).
Then, in the second line, you use a list comprehension to look up each name in d and store the resulting numbers in the list numbers.
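The two lines above can be wrapped in a small helper to check that the result has the properties the question asks for. This is only a minimal sketch: the function name assign_numbers is made up for illustration and is not part of the original answer.

```python
def assign_numbers(names):
    """Map each distinct value to an integer and translate the list.

    assign_numbers is a hypothetical helper name, not from the original answer.
    """
    # One integer per distinct value; which value gets which integer is arbitrary.
    mapping = {name: index for index, name in enumerate(set(names))}
    # Look up each item, preserving the original order and length.
    return [mapping[name] for name in names]


names = ['ll', 'll', 'hl', 'hl', 'hl', 'LL', 'LL', 'll', 'LL', 'HL', 'HL', 'HL', 'll']
numbers = assign_numbers(names)

# Same length as the input, and as many distinct numbers as distinct strings.
print(len(numbers) == len(names))
print(len(set(numbers)) == len(set(names)))
# Each string is always paired with the same number.
print(len(set(zip(names, numbers))) == len(set(names)))
```

All three checks should print True regardless of which number each string happens to receive.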
An example to illustrate that it also works for unsorted lists:
# 'll' appears all over the place
names = ['ll', 'll', 'hl', 'hl', 'hl', 'LL', 'LL', 'll', 'LL', 'HL', 'HL', 'HL', 'll']
This is the output for numbers (the exact values may differ on your machine):
[1, 1, 3, 3, 3, 2, 2, 1, 2, 0, 0, 0, 1]
As you can see, the number 1 associated with ll appears at the correct places.
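Because a set has no guaranteed iteration order, the numbers shown above can change from one run to the next. If reproducible numbers are wanted, one option (an addition here, not part of the original answer) is to sort the unique values before enumerating them, assuming the values can be compared with each other:

```python
names = ['ll', 'll', 'hl', 'hl', 'hl', 'LL', 'LL', 'll', 'LL', 'HL', 'HL', 'HL', 'll']

# sorted() fixes the order of the unique values, so the mapping is the same on every run.
d = {ni: indi for indi, ni in enumerate(sorted(set(names)))}
numbers = [d[ni] for ni in names]

print(d)        # {'HL': 0, 'LL': 1, 'hl': 2, 'll': 3}
print(numbers)  # [3, 3, 2, 2, 2, 1, 1, 3, 1, 0, 0, 0, 3]
```

Uppercase strings sort before lowercase ones here, which is why HL receives 0 and ll receives 3.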
EDIT
If you have Pandas available, you can also use pandas.factorize (which seems to be quite efficient for huge lists and also works fine for lists of tuples as explained here):
import pandas as pd
pd.factorize(names)
This returns
(array([0, 0, 1, 1, 1, 2, 2, 0, 2, 3, 3, 3, 0]),
 array(['ll', 'hl', 'LL', 'HL'], dtype=object))
Therefore,
numbers = pd.factorize(names)[0]
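In the output above, pd.factorize numbers the values in order of first appearance. If Pandas is not available, similar behaviour can be sketched in plain Python. The helper name factorize_list is hypothetical, and the sketch returns plain lists rather than arrays:

```python
def factorize_list(values):
    """Return (codes, uniques), numbering values in order of first appearance.

    factorize_list is a hypothetical name; this only imitates the basic
    behaviour of pandas.factorize for hashable values.
    """
    codes = []
    index_of = {}  # value -> number assigned at its first appearance
    for value in values:
        # setdefault stores len(index_of) only if value has not been seen yet.
        codes.append(index_of.setdefault(value, len(index_of)))
    # Dicts keep insertion order (Python 3.7+), so the keys are the uniques in order.
    return codes, list(index_of)


names = ['ll', 'll', 'hl', 'hl', 'hl', 'LL', 'LL', 'll', 'LL', 'HL', 'HL', 'HL', 'll']
numbers, uniques = factorize_list(names)
print(numbers)  # [0, 0, 1, 1, 1, 2, 2, 0, 2, 3, 3, 3, 0]
print(uniques)  # ['ll', 'hl', 'LL', 'HL']
```

Unlike the set-based version, this makes a single pass over the list and gives the same numbers on every run.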