How to remap ids to consecutive numbers quickly

Views: 129. Published: 2017/3/26 1:33:57. Tags: python, pandas, dataframe


Question

I have a large CSV file with lines that look like:
stringa,stringb
stringb,stringc
stringd,stringa
I need to convert it so that the ids are numbered consecutively from 0. In this case the following would work:
0,1
1,2
3,0
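My current code looks like:

import csv

names = {}    # maps each string id to its consecutive integer id
counter = 0   # next unused integer id
# Python 3: open in text mode with newline='' as the csv module recommends
with open('foo.csv', newline='') as csvfile:
    reader = csv.reader(csvfile)
    for row in reader:
        if row[0] in names:
            id1 = names[row[0]]  # reuse the id assigned earlier
        else:
            names[row[0]] = counter
            id1 = counter
            counter += 1
        if row[1] in names:
            id2 = names[row[1]]
        else:
            names[row[1]] = counter
            id2 = counter
            counter += 1
        print(id1, id2)  # printed once per row, inside the loop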
Unfortunately, Python dicts use a lot of memory, and my input is large. What can I do when the input is too large for the dict to fit in memory?

I would also be interested to know whether there is a better or faster way to solve this problem in general.
Solution
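import pandas as pd

df = pd.DataFrame([['a', 'b'], ['b', 'c'], ['d', 'a']])

# Collect the distinct ids from both columns and sort them in place
v = df.stack().unique()
v.sort()
# factorize returns (codes, uniques); build a lookup from id to code
f = pd.factorize(v)
m = pd.Series(f[0], f[1])

# Flatten, map each id to its code, then restore the two-column shape
df.stack().map(m).unstack()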
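
Because the answer sorts the ids first, they are numbered in sorted order, which happens to match the expected output for the sample data. If the ids should instead be numbered in order of first appearance, as in the question's loop, one possible variant is to call pd.factorize directly on the flattened values. The following is only a sketch under that assumption: the helper name remap_ids and the inline csv_text sample are illustrative and not part of the original answer, and it still holds all distinct ids in memory.

```python
import io

import pandas as pd


def remap_ids(df):
    """Replace each distinct value in df with a consecutive integer id.

    Ids are assigned in order of first appearance, scanning row by row.
    Returns the remapped frame and the array of original values, where
    uniques[i] is the value that received id i.
    """
    # Flatten row-major so ids follow the order in which values are read
    codes, uniques = pd.factorize(df.to_numpy().ravel())
    remapped = pd.DataFrame(
        codes.reshape(df.shape), index=df.index, columns=df.columns
    )
    return remapped, uniques


# Illustrative input mirroring the sample lines from the question
csv_text = "stringa,stringb\nstringb,stringc\nstringd,stringa\n"
df = pd.read_csv(io.StringIO(csv_text), header=None)

remapped, uniques = remap_ids(df)
print(remapped.to_csv(header=False, index=False), end="")
```

For a real file, pd.read_csv('foo.csv', header=None) would take the place of the io.StringIO input.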

