Extract scientific number from string
Question
I am trying to extract scientific numbers from lines in a text file.
Example:
str = 'Name of value 1.111E-11 Next Name 444.4'
Result:
[1.111E-11, 444.4]
I've tried solutions from other posts, but they only work for integers, since str.isdigit() is False for strings such as '444.4' or '1.111E-11':
>>> [int(s) for s in str.split() if s.isdigit()]
[]
float() would work, but it raises a ValueError each time it is given a word instead of a number:
>>> float(str.split()[3])
1.111e-11
>>> float(str.split()[2])
ValueError: could not convert string to float: value
Thanks in advance for your help!!
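The float() idea from the question can also be made to work without a regex: try each whitespace-separated token and skip the ones that raise ValueError. This is a minimal sketch; the helper name extract_floats is hypothetical. It assumes numbers are separated by whitespace, so '1.111 E-11' or '444.4,' would be missed, and float() also accepts words such as 'nan' and 'inf'.

```python
def extract_floats(text):
    """Return every whitespace-separated token that float() accepts (hypothetical helper)."""
    numbers = []
    for token in text.split():
        try:
            numbers.append(float(token))
        except ValueError:
            # Not a number (e.g. 'Name', 'value'); skip it
            continue
    return numbers


line = 'Name of value 1.111E-11 Next Name 444.4'
print(extract_floats(line))  # [1.111e-11, 444.4]
```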
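Solution
This can be done with regular expressions:
import re
s = 'Name of value 1.111E-11 Next Name 444.4'
# raw string so the backslashes reach the regex engine unchanged
match_number = re.compile(r'-?\ *[0-9]+\.?[0-9]*(?:[Ee]\ *-?\ *[0-9]+)?')
# remove any spaces the pattern allows (e.g. '- 2.3'), since float() rejects them
final_list = [float(x.replace(' ', '')) for x in match_number.findall(s)]
print(final_list)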
output:
[1.111e-11, 444.4]
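Since the question is about lines in a text file, here is a minimal sketch that applies the same pattern line by line. io.StringIO stands in for a real file object returned by open(), and the helper name numbers_per_line is hypothetical. Spaces that the pattern allows are removed before calling float(), because float() rejects strings like '- 2.3'.

```python
import io
import re

# Same pattern as in the answer, written as a raw string
NUMBER = re.compile(r'-?\ *[0-9]+\.?[0-9]*(?:[Ee]\ *-?\ *[0-9]+)?')


def numbers_per_line(lines):
    """Yield a list of floats for each line (hypothetical helper)."""
    for line in lines:
        # Drop internal spaces such as '- 2.3' or '2.3E 5' so float() accepts them
        yield [float(m.replace(' ', '')) for m in NUMBER.findall(line)]


# io.StringIO stands in for a real file opened with open(path)
fake_file = io.StringIO(
    'Name of value 1.111E-11 Next Name 444.4\n'
    'Other value -2.5e3 Count 7\n'
)
for nums in numbers_per_line(fake_file):
    print(nums)
# prints [1.111e-11, 444.4] then [-2500.0, 7.0]
```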
Note that the pattern above requires at least one digit to the left of the decimal point, so in a string like '.5' it matches only '5'.
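If leading-dot numbers such as .5, or exponents with an explicit plus sign such as 1e+5, also need to be recognized, one possible variant is sketched below. This is an extension under stated assumptions, not part of the original answer: it drops the original pattern's tolerance for spaces inside a number, and the helper name extract_numbers is hypothetical.

```python
import re

# Optional sign, then either "digits[.digits]" or ".digits", then an optional exponent
# whose sign may be '+' or '-'. All groups are non-capturing, so findall returns whole matches.
NUMBER = re.compile(r'[-+]?(?:[0-9]+\.?[0-9]*|\.[0-9]+)(?:[Ee][-+]?[0-9]+)?')


def extract_numbers(text):
    """Return all numbers found in text as floats (hypothetical helper)."""
    return [float(m) for m in NUMBER.findall(text)]


print(extract_numbers('x .5 y 1e+5 z -3. w 444.4'))  # [0.5, 100000.0, -3.0, 444.4]
```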
EDIT:
Here's a tutorial and reference I found helpful for learning how to write regex patterns.
Since you asked for an explanation of the regex pattern:
'-?\ *[0-9]+\.?[0-9]*(?:[Ee]\ *-?\ *[0-9]+)?'
One piece at a time:
-?  optionally matches a negative sign (zero or one negative signs)
\ *  matches zero or more spaces (to allow for formatting variations like - 2.3 or -2.3)
[0-9]+  matches one or more digits
\.?  optionally matches a period (zero or one periods)
[0-9]*  matches any number of digits, including zero
(?: ... )  groups an expression without forming a "capturing group"; this matters because when a pattern contains capturing groups, re.findall returns only the groups instead of the whole match
[Ee]  matches either "e" or "E"
\ *  matches zero or more spaces (to allow for formats like 2.3E5 or 2.3E 5)
-?  optionally matches a negative sign
\ *  matches zero or more spaces
[0-9]+  matches one or more digits
?  makes the entire non-capturing group optional (to allow for the presence or absence of the exponent, as in 3000 or 3E3)
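The same piece-by-piece breakdown can live inside the pattern itself using re.VERBOSE, where unescaped whitespace is ignored and # starts a comment, so literal spaces must be written as an escaped space. A sketch, checked against the compact form of the answer's pattern:

```python
import re

# The answer's pattern with re.VERBOSE so each piece carries its own comment.
# In verbose mode unescaped whitespace is ignored, so literal spaces are written as "\ ".
NUMBER = re.compile(r"""
    -?          # optional negative sign
    \ *         # zero or more spaces
    [0-9]+      # one or more digits
    \.?         # optional decimal point
    [0-9]*      # any number of digits, including zero
    (?:         # non-capturing group for the exponent
        [Ee]    # 'e' or 'E'
        \ *     # zero or more spaces
        -?      # optional negative sign
        \ *     # zero or more spaces
        [0-9]+  # one or more digits
    )?          # the whole exponent is optional
""", re.VERBOSE)

COMPACT = re.compile(r'-?\ *[0-9]+\.?[0-9]*(?:[Ee]\ *-?\ *[0-9]+)?')

text = 'Name of value 1.111E-11 Next Name 444.4 and - 2.3E 5'
assert NUMBER.findall(text) == COMPACT.findall(text)
print(NUMBER.findall(text))  # [' 1.111E-11', ' 444.4', '- 2.3E 5']
```

As the output shows, a match can begin with a space, because the \ * after the optional sign also matches when no sign is present. float() ignores leading and trailing whitespace, which is why the original one-line conversion still works for the example string.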
Note: \d is roughly a shortcut for [0-9], but I'm just used to using [0-9].
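In Python 3, \d in a str pattern matches any Unicode decimal digit, not only the ASCII digits 0 to 9, unless the re.ASCII flag is set. A small sketch of the difference, using Arabic-Indic digits (U+0661 to U+0663) as the example input:

```python
import re

# Arabic-Indic digits one, two, three followed by ASCII digits
text = '\u0661\u0662\u0663 and 123'

print(re.findall(r'\d+', text))            # Unicode-aware: matches both runs of digits
print(re.findall(r'[0-9]+', text))         # ASCII only: ['123']
print(re.findall(r'\d+', text, re.ASCII))  # re.ASCII makes \d behave like [0-9]: ['123']
```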