How do I import from a Unicode (UTF-8) CSV file into a numpy array?
Problem description
I'm not trying to do this in a smart or fast way, just trying to do it at all.
I have a file that looks like this:
$ cat all_user_token_counts.csv
@5raphaels,in,15
@5raphaels,for,15
@5raphaels,unless,11
@5raphaels,you,11
I know it's Unicode (UTF-8) encoded because I created it, like this:
debug('opening ' + ALL_USER_TOKEN_COUNTS_FILE)
file = codecs.open(ALL_USER_TOKEN_COUNTS_FILE, encoding="utf-8",mode= "w")
for (user, token) in tokenizer.get_tokens_from_all_files():
    #... count tokens ..
    file.write(unicode(username +","+ token +","+ str(count) +"\r\n"))
I want to read it into a numpy array so that it looks like this, or something similar:
array([[u'@5raphaels', u'in', 15],
[u'@5raphaels', u'for', 11],
[u'@5raphaels', u'unless', 11]],
dtype=('<U10', '<U10', int))
As I experimented while writing this question, it occurred to me that this may not even be possible. If so, I'd love to know!
Thanks in advance!
Recommended answer
Use ...
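A minimal sketch of one possible approach, assuming Python 3 and NumPy (the code in the question is Python 2, so this is not a drop-in continuation of it). Every element of an ndarray shares one dtype, so string and integer columns cannot sit side by side in a plain 2-D array unless everything is stored as objects or strings. A structured dtype gives a 1-D array of records instead, which is close to the layout asked for. The field names (user, token, count), the 32-character string width, and the helper name load_token_counts are illustrative assumptions, not something from the question.

```python
import csv
import os
import tempfile

import numpy as np


def load_token_counts(path):
    """Read user,token,count rows from a UTF-8 CSV file into a structured array."""
    # Hypothetical field names; "U32" stores up to 32 Unicode characters per string.
    dtype = [("user", "U32"), ("token", "U32"), ("count", "i8")]
    # newline="" lets the csv module handle the "\r\n" line endings itself.
    with open(path, encoding="utf-8", newline="") as handle:
        rows = [(user, token, int(count)) for user, token, count in csv.reader(handle)]
    # A list of tuples plus a structured dtype yields one record per CSV row.
    return np.array(rows, dtype=dtype)


# Demo: write the sample rows from the question to a temporary file, then load it.
sample = (
    "@5raphaels,in,15\r\n"
    "@5raphaels,for,15\r\n"
    "@5raphaels,unless,11\r\n"
    "@5raphaels,you,11\r\n"
)
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "all_user_token_counts.csv")
    with open(path, "w", encoding="utf-8", newline="") as handle:
        handle.write(sample)
    counts = load_token_counts(path)

print(counts)
print(counts["token"])
```

Columns are then accessed by field name, for example counts["count"] for the integer column. Strings longer than the chosen width are truncated, so the 32-character assumption may need adjusting for real data.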