Parsing the string-representation of a numpy array

Question

If I only have the string representation of a numpy.array:

>>> import numpy as np
>>> arr = np.random.randint(0, 10, (4, 4))
>>> print(arr)  # this one!
[[9 4 7 3]
 [1 6 4 2]
 [6 7 6 0]
 [0 5 6 7]]

How can I convert this back to a numpy array? Inserting the commas by hand isn't complicated, but I'm looking for a programmatic approach.

A simple regex that replaces each run of whitespace with a comma actually works for single-digit integers:

>>> import re
>>> sub = re.sub(r'\s+', ',', """[[8 6 2 4 0 2]
...  [3 5 8 4 5 6]
...  [4 6 3 3 0 3]]
... """)
>>> sub
'[[8,6,2,4,0,2],[3,5,8,4,5,6],[4,6,3,3,0,3]],'  # the trailing "," is a bit annoying

It can be converted to an almost identical array (the dtype may be lost, but that's okay):

>>> import ast
>>> np.array(ast.literal_eval(sub)[0])
array([[8, 6, 2, 4, 0, 2],
       [3, 5, 8, 4, 5, 6],
       [4, 6, 3, 3, 0, 3]])
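
The trailing comma comes from the newline at the end of the string. As a minimal sketch (an assumption about how one might tidy this up, not part of the original question), stripping the text before substituting avoids both the trailing comma and the `[0]` indexing. The variable names `text` and `rows` are illustrative only:

```python
import ast
import re

text = """[[8 6 2 4 0 2]
 [3 5 8 4 5 6]
 [4 6 3 3 0 3]]
"""

# Strip the trailing newline first so no comma is appended at the end.
sub = re.sub(r'\s+', ',', text.strip())

# literal_eval now returns the nested list directly (no one-element tuple).
rows = ast.literal_eval(sub)
print(rows)
```

Passing `rows` to `np.array` would then give the array, subject to the same single-digit limitation.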

But it fails for multi-digit integers and floats:

>>> re.sub(r'\s+', ',', """[[ 0.  1.  6.  9.  1.  4.]
...  [ 4.  8.  2.  3.  6.  1.]]
... """)
'[[,0.,1.,6.,9.,1.,4.],[,4.,8.,2.,3.,6.,1.]],'

This fails because the padding whitespace after each opening bracket is also turned into a comma.
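
One possible workaround, sketched here under the assumption that padding only ever appears directly inside the brackets, is to trim that padding before inserting commas. The helper name `commas_after_trimming` is hypothetical, and this is only one of several ways to handle the problem:

```python
import ast
import re

def commas_after_trimming(array_string):
    """Hypothetical helper: trim padding inside brackets, then insert commas."""
    trimmed = re.sub(r'\[\s+', '[', array_string.strip())  # "[ 0." -> "[0."
    trimmed = re.sub(r'\s+\]', ']', trimmed)               # "4. ]" -> "4.]"
    # Every remaining run of whitespace separates two elements or two rows.
    return ast.literal_eval(re.sub(r'\s+', ',', trimmed))

floats = """[[ 0.  1.  6.]
 [ 4.  8.  2.]]
"""
print(commas_after_trimming(floats))
```

The result is a nested Python list; wrapping it in `np.array` is the remaining step.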

A solution doesn't necessarily need to be based on regex; any other approach that works for unabridged (not shortened with ...) bool/int/float/complex arrays with 1-4 dimensions would be fine.

Recommended answer

Here's a pretty manual solution:

import re
import numpy

def parse_array_str(array_string):
    tokens = re.findall(r'''             # Find all...
                            \[         | # opening brackets,
                            \]         | # closing brackets, or
                            [^\[\]\s]+   # sequences of other non-whitespace characters''',
                        array_string,
                        flags = re.VERBOSE)
    tokens = iter(tokens)

    # Chomp first [, handle case where it's not a [
    first_token = next(tokens)
    if first_token != '[':
        # Input must represent a scalar
        if next(tokens, None) is not None:
            raise ValueError("Can't parse input.")
        return float(first_token)  # or int(first_token), but not bool(first_token) for bools

    list_form = []
    stack = [list_form]

    for token in tokens:
        if token == '[':
            # enter a new list
            stack.append([])
            stack[-2].append(stack[-1])
        elif token == ']':
            # close a list
            stack.pop()
        else:
            stack[-1].append(float(token))  # or int(token), but not bool(token) for bools

    if stack:
        raise ValueError("Can't parse input - it might be missing text at the end.")

    return numpy.array(list_form)
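
The comments in the parser warn against `bool(token)` because any non-empty string, including 'False', is truthy. As a rough sketch of how the element conversion could cover all four requested element types, the hypothetical helper `convert_token` below (not part of the original answer) could stand in for the `float(token)` calls:

```python
def convert_token(token):
    """Hypothetical helper: turn one printed array element into a Python scalar."""
    # Booleans must be matched by name, since bool('False') is True.
    if token == 'True':
        return True
    if token == 'False':
        return False
    # Try the narrowest numeric type first and widen on failure.
    for caster in (int, float, complex):
        try:
            return caster(token)
        except ValueError:
            continue
    raise ValueError(f"Can't parse element {token!r}")

print(convert_token('7'), convert_token('4.'), convert_token('1.+2.j'))
```

Trying `int` before `float` keeps integer arrays as integers. Mixed results within one array are left for `numpy.array` to reconcile.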

Or a less manual solution, based on detecting where to insert commas:

import ast
import re
import numpy

pattern = r'''# Match (mandatory) whitespace between...
              (?<=\]) # ] and
              \s+
              (?= \[) # [, or
              |
              (?<=[^\[\]\s]) 
              \s+
              (?= [^\[\]\s]) # two non-bracket non-whitespace characters
           '''

# Replace such whitespace with a comma (array_string holds the printed array text)
fixed_string = re.sub(pattern, ',', array_string, flags=re.VERBOSE)

output_array = numpy.array(ast.literal_eval(fixed_string))
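
To try the lookaround idea on a few inputs, it can be wrapped in a function. The sketch below is an illustration under assumptions: the names `COMMA_GAP` and `nested_lists_from_str` are made up here, and it returns nested lists so that it runs without numpy installed.

```python
import ast
import re

# Whitespace that should become a comma: between ] and [, or between two elements.
COMMA_GAP = re.compile(r'''
    (?<=\]) \s+ (?=\[)                  # gap between a closing and an opening bracket
    |
    (?<=[^\[\]\s]) \s+ (?=[^\[\]\s])    # gap between two non-bracket characters
''', re.VERBOSE)

def nested_lists_from_str(array_string):
    """Hypothetical wrapper: insert commas, then evaluate as a Python literal."""
    return ast.literal_eval(COMMA_GAP.sub(',', array_string))

three_d = "[[[1 2]\n  [3 4]]\n\n [[5 6]\n  [7 8]]]"
print(nested_lists_from_str(three_d))
print(nested_lists_from_str("[ True False  True]"))
```

Padding directly after an opening bracket is left alone, which is harmless because `ast.literal_eval` ignores whitespace inside a list literal.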
