numpy/pandas: How to convert a series of strings of zeros and ones into a matrix

Question

I have data that arrives in this format:

[
  (1, "000010101001010101011101010101110101", "aaa", ... ),
  (0, "111101010100101010101110101010111010", "bb", ... ),
  (0, "100010110100010101001010101011101010", "ccc", ... ),
  (1, "000010101001010101011101010101110101", "ddd", ... ),
  (1, "110100010101001010101011101010111101", "eeee", ... ),
  ...
]

In tuple format, it looks like this:

(Y, X, other_info, ... )

At the end of the day, I need to train a classifier (e.g. sklearn.linear_model.logistic.LogisticRegression) using Y and X.

What's the most straightforward way to turn the string of ones and zeros into something like a np.array, so that I can run it through the classifier? Seems like there should be an easy answer here, but I haven't been able to think of/google one.

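Some notes:

  • I'm already using numpy/pandas/sklearn, so anything in those libraries is fair game.
  • For a lot of what I'm doing, it's convenient to have the other_info columns together in a DataFrame.
  • The strings are pretty long (~20,000 columns), but the total data frame is not very tall (~500 rows).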

Recommended answer

Since you asked primarily for a way to convert a string of ones and zeros into a numpy array, I'll offer my solution as follows:

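import numpy as np
d = '0101010000' * 2000  # create a 20,000 long string of 1s and 0s
# Originally np.fromstring(d, 'int8') - 48; that binary mode is deprecated in newer NumPy, so np.frombuffer is used on the ASCII bytes instead
d_array = np.frombuffer(d.encode('ascii'), dtype='int8') - 48  # 48 is ASCII '0'; ASCII '1' is 49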

This compares favourably to @DSM's solution in terms of speed:

In [21]: timeit numpy.fromstring(d, dtype='int8') - 48
10000 loops, best of 3: 35.8 us per loop

In [22]: timeit numpy.fromiter(d, dtype='int', count=20000)
100 loops, best of 3: 8.57 ms per loop
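The answer above converts a single string. To get the full matrix the question asks for, one possible approach is to join all the strings, convert them in one pass, and reshape. The following is only a sketch under stated assumptions: the `records` list is hypothetical sample data in the (Y, X, other_info) layout, `bits_to_matrix` is an illustrative helper name rather than a library function, and every bit string is assumed to have the same length.

```python
import numpy as np

# Hypothetical sample records in the (Y, X, other_info) layout from the question.
records = [
    (1, "000010101001", "aaa"),
    (0, "111101010100", "bb"),
    (0, "100010110100", "ccc"),
]

def bits_to_matrix(bit_strings):
    """Stack equal-length strings of '0'/'1' into a 2-D int8 array.

    Assumes every string has the same length and contains only '0' and '1'.
    """
    joined = "".join(bit_strings).encode("ascii")
    # View the ASCII bytes as int8 and shift so that '0' -> 0 and '1' -> 1.
    flat = np.frombuffer(joined, dtype=np.int8) - 48
    # One row per input string; the column count is inferred.
    return flat.reshape(len(bit_strings), -1)

y = np.array([rec[0] for rec in records])
X = bits_to_matrix([rec[1] for rec in records])
print(X.shape)
```

The resulting `X` and `y` should then be usable as the feature matrix and label vector for a classifier's `fit` method, while the other_info fields can stay in a separate DataFrame.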
