How to remap ids to consecutive numbers quickly


Problem description

I have a large CSV file with lines that look like:

stringa,stringb
stringb,stringc
stringd,stringa

I need to convert it so that the ids are numbered consecutively from 0. In this case the following would work:

0,1
1,2
3,0

My current code looks like:

import csv

names = {}    # maps each string id to its consecutive integer id
counter = 0
with open('foo.csv', newline='') as csvfile:
    reader = csv.reader(csvfile)
    for row in reader:
        if row[0] in names:
            id1 = names[row[0]]  # reuse the number already assigned
        else:
            names[row[0]] = counter
            id1 = counter
            counter += 1
        if row[1] in names:
            id2 = names[row[1]]
        else:
            names[row[1]] = counter
            id2 = counter
            counter += 1
        print(id1, id2)  # one output line per input row

Unfortunately, Python dicts use a lot of memory, and my input is large.

What can I do when the input is too large for the dict to fit in memory?

I would also be interested to know whether there is a better or faster way to solve this problem in general.

Solution

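import pandas as pd

df = pd.DataFrame([['a', 'b'], ['b', 'c'], ['d', 'a']])

# Collect every distinct id and sort it, so ids are numbered in sorted order
v = df.stack().unique()
v.sort()
# factorize returns (codes, uniques); build a lookup Series: id string -> code
f = pd.factorize(v)
m = pd.Series(f[0], f[1])

# Apply the lookup to every cell and restore the original two-column shape
df.stack().map(m).unstack()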
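
For comparison, the plain-Python loop from the question can be written more compactly with dict.setdefault. This is only a minimal sketch: remap_ids is a hypothetical helper name, and the in-memory sample stands in for foo.csv. It numbers ids in order of first appearance (the pandas version numbers them in sorted order), and it still keeps one dict entry per distinct id, so it does not reduce memory use.

```python
import csv
import io


def remap_ids(rows):
    """Yield each row with its strings replaced by consecutive integer ids.

    Hypothetical helper for illustration; ids follow first-appearance order.
    """
    ids = {}
    for row in rows:
        # len(ids) is evaluated before insertion, so a new name gets the next
        # free number, while a known name keeps the number it already has.
        yield [ids.setdefault(name, len(ids)) for name in row]


# In-memory stand-in for foo.csv, using the sample rows from the question.
sample = io.StringIO("stringa,stringb\nstringb,stringc\nstringd,stringa\n")
result = list(remap_ids(csv.reader(sample)))

for id1, id2 in result:
    print(id1, id2)
```

Because remap_ids is a generator, rows are processed one at a time, so only the id dictionary grows with the input, not the list of rows.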
