How do I import from a Unicode (UTF-8) CSV file into a NumPy array?


Problem description

I'm not trying to do this in a smart or fast way; I'm just trying to get it working at all.

I have a file that looks like this:

$ cat all_user_token_counts.csv  
@5raphaels,in,15
@5raphaels,for,15
@5raphaels,unless,11
@5raphaels,you,11

I know it is Unicode, UTF-8 encoded, because I created it like this:

    import codecs

    debug('opening ' + ALL_USER_TOKEN_COUNTS_FILE)
    # Python 2: codecs.open encodes unicode strings as UTF-8 on write
    file = codecs.open(ALL_USER_TOKEN_COUNTS_FILE, encoding="utf-8", mode="w")
    for (user, token) in tokenizer.get_tokens_from_all_files():
        #... count tokens ..
        file.write(unicode(username + "," + token + "," + str(count) + "\r\n"))

I want to read it into a NumPy array so that it looks like this, or something similar:

   array([[u'@5raphaels', u'in', 15],
          [u'@5raphaels', u'for', 11],
          [u'@5raphaels', u'unless', 11]], 
          dtype=('<U10', '<U10', int))

While experimenting as I wrote this question, it occurred to me that this may not even be possible. If so, I'd love to know!

Thanks in advance!

Recommended answer

Use …
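
As a hedged sketch of one way to do this in Python 3 (not necessarily the approach this answer has in mind): a NumPy array has a single dtype, so mixing Unicode strings and integers in one array usually means a structured dtype with one field per column. The example below reads the file with the standard `csv` module and builds such an array. The function name `load_token_counts`, the field names `user`, `token`, `count`, and the 32-character string width are all assumptions made for illustration.

```python
import csv

import numpy as np

# Hypothetical field names and widths. "U32" holds up to 32 Unicode
# characters; longer values are silently truncated, so widen if needed.
TOKEN_COUNT_DTYPE = [("user", "U32"), ("token", "U32"), ("count", np.int64)]


def load_token_counts(path):
    """Read user,token,count rows from a UTF-8 CSV into a structured array."""
    # newline="" lets the csv module handle the "\r\n" line endings itself.
    with open(path, encoding="utf-8", newline="") as f:
        # Skip empty rows; convert the third column to int.
        rows = [(r[0], r[1], int(r[2])) for r in csv.reader(f) if r]
    # A list of tuples maps onto a structured dtype, one tuple per record.
    return np.array(rows, dtype=TOKEN_COUNT_DTYPE)
```

Calling `load_token_counts("all_user_token_counts.csv")` would give a one-dimensional array of records, where `arr["count"]` is an integer column and `arr["token"]` is a Unicode column. This is a record-per-row layout rather than the two-dimensional array shown in the question, since a plain 2-D array cannot hold two different dtypes unless it falls back to `dtype=object`.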
