How to parse texts separated by line breaks
Question
How can I parse tokens separated by line breaks, such as the ones below:
Wolff PERSON
is O
in O
Argentina LOCATION

The O
US LOCATION
Envoy O
noted O
into full sentences like this, using Python?
Wolff is in Argentina
The US Envoy noted
Recommended answer
You can use itertools.groupby for this, grouping consecutive lines by whether they are blank:
>>> from io import StringIO
>>> from itertools import groupby
>>> s = '''Wolff PERSON
... is O
... in O
... Argentina LOCATION
...
... The O
... US LOCATION
... Envoy O
... noted O'''
>>> c = StringIO(s)
>>> for k, g in groupby(c, key=str.isspace):
...     if not k:
...         print(' '.join(x.split(None, 1)[0] for x in g))
...
Wolff is in Argentina
The US Envoy noted
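To see why this works, it can help to look at the groups that groupby produces before they are joined. The following is a minimal Python 3 sketch, assuming the sentences in the input are separated by an empty line; the variable names (`groups`, `is_blank`) are illustrative only and not part of the original answer.

```python
from io import StringIO
from itertools import groupby

# Assumed input: one "token TAG" pair per line, sentences separated by a blank line.
s = ("Wolff PERSON\nis O\nin O\nArgentina LOCATION\n"
     "\n"
     "The O\nUS LOCATION\nEnvoy O\nnoted O")

# groupby collects consecutive lines that share the same key. str.isspace is
# True for the blank separator line ("\n") and False for the token lines.
groups = [(is_blank, list(lines))
          for is_blank, lines in groupby(StringIO(s), key=str.isspace)]

for is_blank, lines in groups:
    print(is_blank, lines)
```

This yields three groups: the lines of the first sentence (key False), the single blank separator line (key True), and the lines of the second sentence (key False). Skipping the groups whose key is True leaves one group per sentence.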
If the input is actually coming from a string rather than a file, then:
>>> for k, g in groupby(s.splitlines(), key=lambda x: not x.strip()):
...     if not k:
...         print(' '.join(x.split(None, 1)[0] for x in g))
...
Wolff is in Argentina
The US Envoy noted
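The same idea can be wrapped in a small reusable function. This is only a sketch in Python 3, assuming that blank (or whitespace-only) lines mark sentence boundaries and that the token is always the first whitespace-separated field on a line; the function name `tagged_lines_to_sentences` is hypothetical and does not come from the original answer.

```python
from itertools import groupby


def tagged_lines_to_sentences(lines):
    """Hypothetical helper: join the first token of each line into sentences.

    `lines` is any iterable of strings, for example the result of
    str.splitlines() or an open text file. Blank or whitespace-only
    lines are treated as sentence breaks.
    """
    sentences = []
    for is_blank, group in groupby(lines, key=lambda line: not line.strip()):
        if not is_blank:
            # Keep only the token and drop the tag that follows it.
            sentences.append(' '.join(line.split(None, 1)[0] for line in group))
    return sentences


text = """Wolff PERSON
is O
in O
Argentina LOCATION

The O
US LOCATION
Envoy O
noted O"""

for sentence in tagged_lines_to_sentences(text.splitlines()):
    print(sentence)
```

Because the key function strips each line before testing it, the helper accepts lines with or without trailing newlines, so an open file object can be passed in directly instead of a list of strings.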