Extract a paragraph containing specific words between two similar titles

Problem description

My text file contains paragraphs like this:

summary

A result oriented and dedicated professional with three years’ experience in Software Development. A proactive individual with a logical approach to challenges, performs effectively even within a highly pressurised working environment.

summary

Oct 28th, 2010 – Till date  Cognizant Technology Solutions      


Project #1

Title           Wealth Passport – R7.3
Client                    Northern Trust
Operating System    Windows XP
Technologies        J2EE, JSP, Struts, Oracle, PL/SQL
Team Size       3
Role            Team Member
Period                    22nd Aug’ 2013 - Till Date    
Project Description
Wealth Passport R7.3 release aims at enhancements in four projects SGY, PMM, WPA and WPX. This primarily involves analysing existing issues in the four applications and enhancements to some of the functionalities.
Role and Responsibilities
Handled dockets in SGY and PMM applications.
Done root cause analysis to existing issues in a short span of time.
Designed and developed enhancements in PMM application.
Preparing Unit Test cases for the developed Java modules and executing them.


Project #2
Title           PFS Development – WP Filecabinet and R7.2
Client                    Northern Trust
Operating System    Windows XP
Technologies        J2EE, JSP, Struts, Weblogic Portal, Oracle, PL/SQL, UNIX, Hibernate, Spring, DOJO
Team Size       1
Role            Team Member – JavaEE Developer
Period                   18th June’ 2013 – 21st Aug’ 2013   
Project Description
PFS Development project is to provide the development services for PFS capital projects: Wealth Passport, Private Passport 6.0 and Private Passport 7.0
Wealth Passport Filecabinet provides functionality for users to store their files on our system. This enables users to create folders, upload files and view the uploaded files.  Batch upload/delete option is also available. Deleted files will be moved to Waste Bucket, from where users can restore should they wish. This project aims at improving the performance of Filecabinet which was mandated by increasing customer base and files handled by the system.

Now I would like to extract only the summary section that contains words like "Project" and "Team Size", without extracting the other summary section. I tried the code below, but it extracts the content of both summary sections:

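import re

with open('9.txt', encoding='latin-1') as infile, open('d.txt', 'w', encoding='latin-1') as outfile:
    copy = False
    for line in infile:
        if line.strip() == 'summary':
            # The compiled pattern is never used, so this line has no effect.
            re.compile('\r\nproject*\r\n')
            copy = True
        elif line.strip() == "summary":
            # Unreachable: same condition as the branch above, so copy never turns off.
            copy = False
        elif copy:
            outfile.write(line)

# Read the result back to check how much was written.
with open('d.txt', encoding='latin-1') as fh:
    contents = fh.read()
    print(len(contents))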

I expect a text file d.txt to be saved with the following content:

 summary

    Oct 28th, 2010 – Till date  Cognizant Technology Solutions      


    Project #1

    Title           Wealth Passport – R7.3
    Client                    Northern Trust
    Operating System    Windows XP
    Technologies        J2EE, JSP, Struts, Oracle, PL/SQL
    Team Size       3
    Role            Team Member
    Period                    22nd Aug’ 2013 - Till Date    
    Project Description
    Wealth Passport R7.3 release aims at enhancements in four projects SGY, PMM, WPA and WPX. This primarily involves analysing existing issues in the four applications and enhancements to some of the functionalities.
    Role and Responsibilities
    Handled dockets in SGY and PMM applications.
    Done root cause analysis to existing issues in a short span of time.
    Designed and developed enhancements in PMM application.
    Preparing Unit Test cases for the developed Java modules and executing them.


    Project #2
    Title           PFS Development – WP Filecabinet and R7.2
    Client                    Northern Trust
    Operating System    Windows XP
    Technologies        J2EE, JSP, Struts, Weblogic Portal, Oracle, PL/SQL, UNIX, Hibernate, Spring, DOJO
    Team Size       1
    Role            Team Member – JavaEE Developer
    Period                   18th June’ 2013 – 21st Aug’ 2013   
    Project Description
    PFS Development project is to provide the development services for PFS capital projects: Wealth Passport, Private Passport 6.0 and Private Passport 7.0
    Wealth Passport Filecabinet provides functionality for users to store their files on our system. This enables users to create folders, upload files and view the uploaded files.  Batch upload/delete option is also available. Deleted files will be moved to Waste Bucket, from where users can restore should they wish. This project aims at improving the performance of Filecabinet which was mandated by increasing customer base and files handled by the system.

Recommended answer

To extract all summary sections that contain the words you are interested in:

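# Each section starts with this header followed by a blank line.
split_on = 'summary\n\n'
# A section is kept only if it contains every one of these strings.
must_contain = ['Project', 'Team Size']

with open('9.txt') as f_input, open('d.txt', 'w') as f_output:
    # Split the whole file into sections at each header.
    for part in f_input.read().split(split_on):
        if all(text in part for text in must_contain):
            # Put the header back, since split() removed it.
            f_output.write(split_on + part)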
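
The approach above relies on every header being exactly 'summary' followed by one blank line. If the real file has trailing spaces after the header or Windows line endings, a regular-expression split may be more forgiving. The following is only a sketch under that assumption: the function name extract_sections and the sample string are made up for illustration and are not part of the original answer.

```python
import re


def extract_sections(text, must_contain, header="summary"):
    """Return the bodies of sections introduced by `header` that contain every keyword."""
    # Match a line consisting only of the header, allowing surrounding
    # spaces or tabs and either \n or \r\n as the line ending.
    pattern = re.compile(
        rf"^[ \t]*{re.escape(header)}[ \t]*\r?$\n?", re.MULTILINE
    )
    parts = pattern.split(text)
    # parts[0] is whatever precedes the first header, so it is skipped.
    return [
        part for part in parts[1:]
        if all(word in part for word in must_contain)
    ]


# Hypothetical, shortened stand-in for the contents of 9.txt.
sample = (
    "summary\n\n"
    "A result oriented professional.\n\n"
    "summary  \r\n\r\n"
    "Project #1\r\n"
    "Team Size       3\r\n"
)

sections = extract_sections(sample, ["Project", "Team Size"])
for section in sections:
    print("summary\n" + section)
```

To apply this to the files from the question, the text read from 9.txt would be passed in and each returned section written to d.txt with the header put back in front, as the loop at the end does with print.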
