Print first paragraph in Python

Question

I have a book in a text file and I need to print the first paragraph of each section. I thought that if I found the text between \n\n and \n, I would have my answer. Here is my code, which didn't work. Can you tell me where I went wrong?

lines = [line.rstrip('\n') for line in open('G:\\aa.txt')]

check = -1
first = 0
last = 0

for i in range(len(lines)):
    if lines[i] == "": 
            if lines[i+1]=="":
                check = 1
                first = i +2
    if i+2< len(lines):
        if lines[i+2] == "" and check == 1:
            last = i+2
while (first < last):
    print(lines[first])
    first = first + 1

I also found some code on Stack Overflow and tried it, but it just printed an empty array.

f = open("G:\\aa.txt").readlines()
flag=False
for line in f:
        if line.startswith('\n\n'):
            flag=False
        if flag:
            print(line)
        elif line.strip().endswith('\n'):
            flag=True

I have shared a sample section of the book below.

THE LAY OF THE LAND

There is a vast field of fascinating human interest, lying only just outside our doors, which as yet has been but little explored. It is the Field of Animal Intelligence.

Of all the kinds of interest attaching to the study of the world's wild animals, there are none that surpass the study of their minds, their morals, and the acts that they perform as the results of their mental processes.

WILD ANIMAL TEMPERAMENT & INDIVIDUALITY

What I am trying to do here is, find the uppercase lines, and put them all in an array. Then, using the index method, I will find the first and last paragraphs of each section by comparing the indexes of these elements of this array I created.

The output should look like this:

There is a vast field of fascinating human interest, lying only just outside our doors, which as yet has been but little explored. It is the Field of Animal Intelligence.

What I am trying to do here is, find the uppercase lines, and put them all in an array. Then, using the index method, I will find the first and last paragraphs of each section by comparing the indexes of these elements of this array I created.

Recommended answer

If you want to group the sections, you can use itertools.groupby with empty lines as the delimiters:

from itertools import groupby
with open("in.txt") as f:
    for k, sec in groupby(f,key=lambda x: bool(x.strip())):
        if k:
            print(list(sec))
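
To see what this grouping does without touching a file, here is a small sketch on an in-memory list. The sample lines are made up for illustration. groupby starts a new group each time the key changes, so runs of non-blank lines alternate with runs of blank lines, and keeping only the groups whose key is True leaves the text blocks:

```python
from itertools import groupby

# Made-up sample lines, standing in for the lines of a file.
lines = ["TITLE\n", "\n", "\n", "first line\n", "second line\n", "\n", "next paragraph\n"]

# The key is True for lines with visible text and False for blank lines.
blocks = [
    list(grp)
    for has_text, grp in groupby(lines, key=lambda x: bool(x.strip()))
    if has_text
]

for block in blocks:
    print(block)
```

Note that a title ends up as a block of its own here, because the blank lines after it separate it from the first paragraph.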

With a bit more itertools, we can get the sections using the all-uppercase title as the delimiter:

from itertools import groupby, takewhile

with open("in.txt") as f:
    grps = groupby(f,key=lambda x: x.isupper())
    for k, sec in grps:
        # if we hit a title line
        if k: 
            # pull all paragraphs
            v = next(grps)[1]
            # skip two empty lines after title
            next(v,""), next(v,"")

            # take all lines up to next empty line/second paragraph
            print(list(takewhile(lambda x: bool(x.strip()), v)))

Which will give you:

['There is a vast field of fascinating human interest, lying only just outside our doors, which as yet has been but little explored. It is the Field of Animal Intelligence.\n']
['What I am trying to do here is, find the uppercase lines, and put them all in an array. Then, using the index method, I will find the first and last paragraphs of each section by comparing the indexes of these elements of this array I created.']

Each section starts with an all-uppercase title, so once we hit one we know that two empty lines follow, then the first paragraph, and the pattern repeats.
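
The same pattern can also be written as a generator that returns the paragraphs instead of printing them. This is only a sketch under stated assumptions: first_paragraphs is a hypothetical helper name, it assumes section titles are the only all-uppercase lines, and unlike the groupby code it skips any number of blank lines after a title rather than exactly two. The sample lines are shortened stand-ins for the book text.

```python
def first_paragraphs(lines):
    """Yield (title, first paragraph) for each all-uppercase title line.

    Hypothetical helper: assumes titles are the only all-uppercase lines.
    """
    title = None  # title whose first paragraph we are still waiting for
    para = []     # lines of that paragraph collected so far
    for line in lines:
        text = line.strip()
        if text.isupper():
            # New title: flush a paragraph that ran straight into it.
            if title is not None and para:
                yield title, " ".join(para)
            title, para = text, []
        elif title is not None:
            if text:
                para.append(text)
            elif para:
                # A blank line after some text ends the first paragraph.
                yield title, " ".join(para)
                title, para = None, []
    # Flush a paragraph that ends at end of input without a blank line.
    if title is not None and para:
        yield title, " ".join(para)


# Shortened stand-in for the book text.
sample = [
    "THE LAY OF THE LAND\n", "\n", "\n",
    "There is a vast field.\n", "\n",
    "Of all the kinds of interest.\n", "\n", "\n",
    "WILD ANIMAL TEMPERAMENT & INDIVIDUALITY\n", "\n", "\n",
    "What I am trying to do here.",
]
result = list(first_paragraphs(sample))
for title, para in result:
    print(title)
    print(para)
```

Because the function accepts any iterable of lines, an open file object can be passed to it directly.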

Breaking the groupby version down with an explicit loop:

from itertools import groupby

def parse_sec(bk):
    with open(bk) as f:
        grps = groupby(f, key=lambda x: bool(x.isupper()))
        for k, sec in grps:
            if k:
                print("First paragraph from section titled :{}".format(next(sec).rstrip()))
                v = next(grps)[1]
                next(v, ""),next(v,"")
                for line in v:
                    if not line.strip():
                        break
                    print(line)

For your text:

In [11]: cat -E in.txt

THE LAY OF THE LAND$
$
$
There is a vast field of fascinating human interest, lying only just outside our doors, which as yet has been but little explored. It is the Field of Animal Intelligence.$
$
Of all the kinds of interest attaching to the study of the world's wild animals, there are none that surpass the study of their minds, their morals, and the acts that they perform as the results of their mental processes.$
$
$
WILD ANIMAL TEMPERAMENT & INDIVIDUALITY$
$
$
What I am trying to do here is, find the uppercase lines, and put them all in an array. Then, using the index method, I will find the first and last paragraphs of each section by comparing the indexes of these elements of this array I created.

The dollar signs mark the line endings. The output is:

In [12]: parse_sec("in.txt")
First paragraph from section titled :THE LAY OF THE LAND
There is a vast field of fascinating human interest, lying only just outside our doors, which as yet has been but little explored. It is the Field of Animal Intelligence.

First paragraph from section titled :WILD ANIMAL TEMPERAMENT & INDIVIDUALITY
What I am trying to do here is, find the uppercase lines, and put them all in an array. Then, using the index method, I will find the first and last paragraphs of each section by comparing the indexes of these elements of this array I created.
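
As a side note on the second snippet in the question, it most likely prints nothing because neither string test can ever succeed. readlines() returns one line per element, so no element starts with two newlines, and strip() removes the trailing newline before endswith('\n') is checked, which means flag is never set to True. A quick check, using made-up lines in the shape readlines() returns:

```python
# Made-up lines in the shape readlines() returns: at most one '\n', at the end.
lines = ["TITLE\n", "\n", "\n", "Some paragraph text.\n"]

# Neither test from the question's second snippet is ever true.
starts_double = [line.startswith("\n\n") for line in lines]
ends_after_strip = [line.strip().endswith("\n") for line in lines]

print(starts_double)
print(ends_after_strip)
```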
