Extracting values from text file


Question




Hi

A short newbie question. I would like to extract some values from a
given text file directly into python variables. Can this be done simply
by either standard library or other libraries? Some pointers where to
get started would be much appreciated.

An example text file:
-----------
Some text that can span some lines.

Apples 34
56 Ducks

Some more text.

0.5 g butter

-----------------
What I first thought was whether it would be possible to make a filter such as:

Apples (apples)
(ducks) Ducks
(butter) g butter

The data can be put in a hash table.

Or maybe there are better ways? I generally want something that is
flexible, so one can easily adjust the filter settings if the text file
format changes.

Thanks in advance

Recommended answer

list.txt is a file that contains the following lines:
Apples 34
Bananas 10
Oranges 56
>>> file = open("list.txt", "r")
>>> mystring = file.read()
>>> mystring
'Apples 34 \nBananas 10\nOranges 56 '
>>> mylist = mystring.split('\n')
>>> mylist
['Apples 34 ', 'Bananas 10', 'Oranges 56 ']
>>> mydict = {}
>>> for el in mylist:
...     l = el.split()
...     mydict[l[0]] = l[1]
...
>>> mydict
{'Apples': '34', 'Oranges': '56', 'Bananas': '10'}
>>> mydict["Apples"]
'34'
>>> mydict["Oranges"]
'56'



First try. There are probably better ways to do it, and it's far from
resilient: it breaks in lots of different ways (for example, more than one
number in one line, a number with text on both sides of it, etc.).
I have split the data munging into many lines so I can see what's
happening, and you can fix/modify the code quickly.

Bye,
bearophile
data1 = """
Some text that can span some lines.
More text
Apples 34
56 Ducks

Some more text.

0.5 g butter
"""

import re

# Separate lines in a list
data2 = data1.split("\n")
print(data2, "\n")

# Clear lines of trailing and leading spaces, newlines, etc.
data3 = list(map(str.strip, data2))
print(data3, "\n")

# Remove blank lines after the stripping
data4 = list(filter(None, data3))
print(data4, "\n")

# Create a list of (line, number) pairs from only the lines with a number inside
patt1 = re.compile(r"\d+\.?\d*")  # No scientific notation
data5 = [(line, n) for line in data4 for n in patt1.findall(line)]
print(data5, "\n")

# Remove the number from the lines, and strip such lines
data6 = [(line.replace(num, "").strip(), num) for line, num in data5]
print(data6, "\n")

def nconv(num):
    "Convert a number string to an int, or to a float if that is not possible"
    try:
        result = int(num)
    except ValueError:
        result = float(num)
    return result

# Convert the number strings into ints or floats
data7 = [(line, nconv(num)) for line, num in data6]
print(data7, "\n")

# Build the final dict of (line: number)
result = dict(data7)
print(result, "\n")
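
The original question asked for filters written like "Apples (apples)" or "(ducks) Ducks". One possible way to get there, as a rough sketch only: turn each template into a regular expression in which the parenthesised name marks where the number sits. The helper names `compile_filter` and `extract`, and the rule that the literal text around the placeholder must match the whole line, are assumptions made for this illustration and were not part of the thread.

```python
import re

# Same number pattern as in the code above: no scientific notation.
NUMBER = r"(\d+\.?\d*)"

def nconv(num):
    """Convert a number string to an int, or to a float if that fails."""
    try:
        return int(num)
    except ValueError:
        return float(num)

def compile_filter(template):
    """Turn a template such as '(ducks) Ducks' into (key, compiled regex).

    Hypothetical helper: the '(name)' placeholder becomes the dict key and
    is replaced by a number pattern; the remaining text must match literally.
    """
    m = re.search(r"\((\w+)\)", template)
    if m is None:
        raise ValueError("template needs a (name) placeholder: %r" % template)
    before = re.escape(template[:m.start()].strip())
    after = re.escape(template[m.end():].strip())
    pattern = r"^\s*" + before + r"\s*" + NUMBER + r"\s*" + after + r"\s*$"
    return m.group(1), re.compile(pattern)

def extract(text, templates):
    """Apply every template to every line and collect the matches in a dict."""
    filters = [compile_filter(t) for t in templates]
    found = {}
    for line in text.splitlines():
        for key, pattern in filters:
            m = pattern.match(line)
            if m:
                found[key] = nconv(m.group(1))
    return found

sample = """
Some text that can span some lines.

Apples 34
56 Ducks

Some more text.

0.5 g butter
"""

templates = ["Apples (apples)", "(ducks) Ducks", "(butter) g butter"]
values = extract(sample, templates)
print(values)  # {'apples': 34, 'ducks': 56, 'butter': 0.5}
```

If the file format changes, only the `templates` list needs editing. Lines that match no template are silently ignored, which may or may not be what you want.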


P.S.
>>> file.close()



MTD wrote:

list.txt is a file that contains the following lines:
Apples 34
Bananas 10
Oranges 56
>>> file = open("list.txt", "r")
>>> mystring = file.read()
>>> mystring
'Apples 34 \nBananas 10\nOranges 56 '
>>> mylist = mystring.split('\n')
>>> mylist
['Apples 34 ', 'Bananas 10', 'Oranges 56 ']
>>> mydict = {}
>>> for el in mylist:
...     l = el.split()
...     mydict[l[0]] = l[1]
...
>>> mydict
{'Apples': '34', 'Oranges': '56', 'Bananas': '10'}
>>> mydict["Apples"]
'34'
>>> mydict["Oranges"]
'56'
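
The P.S. closes the file by hand. A minimal sketch of the same dictionary-building loop with a `with` block, which closes the file automatically even if an error occurs, might look like this. The function name `read_counts` and the temporary copy of list.txt are assumptions for illustration only; skipping lines that do not have exactly two fields is also a choice made here, not something from the thread.

```python
import os
import tempfile

def read_counts(path):
    """Read 'Name number' lines into a dict; the file is closed on exit."""
    counts = {}
    with open(path, "r") as handle:
        for line in handle:
            parts = line.split()
            if len(parts) == 2:  # skip blank or malformed lines
                counts[parts[0]] = parts[1]
    return counts

# Demo: write a temporary list.txt with the same contents, then read it back.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "list.txt")
    with open(path, "w") as out:
        out.write("Apples 34 \nBananas 10\nOranges 56 \n")
    demo = read_counts(path)

print(demo)
```

The values stay strings, as in the session above; pass them through `int()` or `float()` if numbers are needed.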





