Parsing the string-representation of a numpy array
Question
If I only have the string representation of a numpy.array:
>>> import numpy as np
>>> arr = np.random.randint(0, 10, (4, 4))
>>> print(arr) # this one!
[[9 4 7 3]
 [1 6 4 2]
 [6 7 6 0]
 [0 5 6 7]]
How can I convert this back to a numpy array? Inserting the commas by hand is not complicated, but I'm looking for a programmatic approach.
A simple regex that replaces each run of whitespace with a comma actually works for single-digit integers:
>>> import re
>>> sub = re.sub(r'\s+', ',', """[[8 6 2 4 0 2]
...  [3 5 8 4 5 6]
...  [4 6 3 3 0 3]]
... """)
>>> sub
'[[8,6,2,4,0,2],[3,5,8,4,5,6],[4,6,3,3,0,3]],' # the trailing "," is a bit annoying
It can be converted to an almost identical array (the dtype may be lost, but that's okay):
>>> import ast
>>> np.array(ast.literal_eval(sub)[0])
array([[8, 6, 2, 4, 0, 2],
       [3, 5, 8, 4, 5, 6],
       [4, 6, 3, 3, 0, 3]])
But it fails for multi-digit integers and floats:
>>> re.sub(r'\s+', ',', """[[ 0. 1. 6. 9. 1. 4.]
...  [ 4. 8. 2. 3. 6. 1.]]
... """)
'[[,0.,1.,6.,9.,1.,4.],[,4.,8.,2.,3.,6.,1.]],'
because numpy pads these values with whitespace after each opening bracket, so the result has an additional comma at the beginning of every row.
A solution doesn't necessarily need to be based on regex; any other approach that works for unabridged (not shortened with ...) bool/int/float/complex arrays with 1-4 dimensions would be OK.
Recommended answer
Here's a pretty manual solution:
import re
import numpy

def parse_array_str(array_string):
    tokens = re.findall(r'''            # Find all...
                        \[ |            # opening brackets,
                        \] |            # closing brackets, or
                        [^\[\]\s]+      # sequences of other non-whitespace characters''',
                        array_string,
                        flags=re.VERBOSE)
    tokens = iter(tokens)

    # Chomp first [, handle case where it's not a [
    first_token = next(tokens)
    if first_token != '[':
        # Input must represent a scalar
        if next(tokens, None) is not None:
            raise ValueError("Can't parse input.")
        return float(first_token)  # or int(token), but not bool(token) for bools

    list_form = []
    stack = [list_form]

    for token in tokens:
        if token == '[':
            # enter a new list
            stack.append([])
            stack[-2].append(stack[-1])
        elif token == ']':
            # close a list
            stack.pop()
        else:
            stack[-1].append(float(token))  # or int(token), but not bool(token) for bools

    if stack:
        raise ValueError("Can't parse input - it might be missing text at the end.")

    return numpy.array(list_form)
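The comments in the code warn against bool(token) because any non-empty string is truthy, so bool('False') evaluates to True. As a minimal sketch of one way to handle all four element types from the question, the helper below tries the narrowest type first. The name convert_token is hypothetical and not part of the answer; it assumes elements are printed the way numpy normally prints them (True/False, 12, 0., 1.+2.j).

```python
def convert_token(token):
    """Convert one printed array element to a Python scalar (hypothetical helper)."""
    # Booleans must be matched by name: bool('False') would be True.
    if token == 'True':
        return True
    if token == 'False':
        return False
    # Try the narrowest numeric type first, then widen.
    for converter in (int, float, complex):
        try:
            return converter(token)
        except ValueError:
            continue
    raise ValueError(f"Can't parse token: {token!r}")


print(bool('False'))            # the pitfall the comments warn about
print(convert_token('False'))
print(convert_token('12'))
print(convert_token('0.'))
print(convert_token('1.+2.j'))
```

Under that assumption, the two float(token) calls in parse_array_str could be swapped for convert_token(token), leaving numpy to pick the dtype from the resulting nested list.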
Or a less manual solution, based on detecting where to insert commas:
import ast
import re
import numpy

# Example input; in practice this is the printed array you want to parse
array_string = """[[ 0. 1. 6.]
 [ 4. 8. 2.]]"""

pattern = r'''# Match (mandatory) whitespace between...
              (?<=\]) # ] and
              \s+
              (?= \[) # [, or
              |
              (?<=[^\[\]\s])
              \s+
              (?= [^\[\]\s]) # two non-bracket non-whitespace characters
           '''

# Replace such whitespace with a comma
fixed_string = re.sub(pattern, ',', array_string, flags=re.VERBOSE)

output_array = numpy.array(ast.literal_eval(fixed_string))
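To check the comma-insertion approach against the cases that broke the simple regex in the question (multi-digit integers and floats), here is a small sketch that wraps the same pattern in a function. The names COMMA_GAP and array_str_to_lists are hypothetical, the sample strings are made up for illustration, and the function stops at nested Python lists so that it only needs the standard library; passing the result to numpy.array would complete the conversion.

```python
import ast
import re

# Same idea as the pattern above: match only whitespace that separates
# two rows (] ... [) or two elements, never the padding after a "[".
COMMA_GAP = re.compile(r'''
    (?<=\]) \s+ (?=\[)                  # whitespace between ] and [
    |
    (?<=[^\[\]\s]) \s+ (?=[^\[\]\s])    # whitespace between two elements
''', re.VERBOSE)


def array_str_to_lists(array_string):
    """Turn a printed array into nested Python lists (hypothetical helper)."""
    return ast.literal_eval(COMMA_GAP.sub(',', array_string))


int_text = """[[ 9 14  7]
 [ 1  6 42]]"""
float_text = """[[ 0.   1.5  6. ]
 [ 4.   8.   2. ]]"""

print(array_str_to_lists(int_text))
print(array_str_to_lists(float_text))
```

Because the padding after an opening bracket is left untouched rather than turned into a comma, no stray leading comma appears, and ast.literal_eval ignores the leftover spaces.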