What is the right way to process file contents in Python?

Views: 53 Published: 2017/9/6 1:28:40


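Problem description

Question

Hi experts,

What is a concise way to take everything in an htm file from the first <link up to the second <link and save it as a separate htm file?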

<meta http-equiv="X-UA-Compatible" content="IE=edge">

<link rel="prefetch" href="https://ajax.googleapis.com/ajax/libs/jquery/1.8.2/jquery.min.js">

<meta name="application-name" content="Python.org">
<meta name="msapplication-tooltip" content="The official home of the Python Programming Language">
<meta name="apple-mobile-web-app-title" content="Python.org">
<meta name="apple-mobile-web-app-capable" content="yes">
<meta name="apple-mobile-web-app-status-bar-style" content="black">

<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="HandheldFriendly" content="True">
<meta name="format-detection" content="telephone=no">
<meta http-equiv="cleartype" content="on">
<meta http-equiv="imagetoolbar" content="false">

<script type="text/javascript" async="" src="https://ssl.google-analytics.com/ga.js"></script><script src="./Welcome to Python.org_files/modernizr.js.下载"></script><style type="text/css" adt="123"></style>

<link href="./Welcome to Python.org_files/style.css" rel="stylesheet" type="text/css" title="default">
<link href="./Welcome to Python.org_files/mq.css" rel="stylesheet" type="text/css" media="not print, braille, embossed, speech, tty">

The extracted content should be:

<link rel="prefetch" href="https://ajax.googleapis.com/ajax/libs/jquery/1.8.2/jquery.min.js">

<meta name="application-name" content="Python.org">
<meta name="msapplication-tooltip" content="The official home of the Python Programming Language">
<meta name="apple-mobile-web-app-title" content="Python.org">
<meta name="apple-mobile-web-app-capable" content="yes">
<meta name="apple-mobile-web-app-status-bar-style" content="black">

<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="HandheldFriendly" content="True">
<meta name="format-detection" content="telephone=no">
<meta http-equiv="cleartype" content="on">
<meta http-equiv="imagetoolbar" content="false">

<script type="text/javascript" async="" src="https://ssl.google-analytics.com/ga.js"></script><script src="./Welcome to Python.org_files/modernizr.js.下载"></script><style type="text/css" adt="123"></style>

<link

Solution

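import re

# Read the whole HTML file into memory.
with open("read.html", "r") as rf:
    text = rf.read()

# Lazily match from the first "<link" up to and including the next "<link".
# [\s\S] matches any character, including newlines.
pattern = r"<link[\s\S]*?<link"
match = re.search(pattern, text)
if match:
    # Write only when the file contains at least two "<link" occurrences.
    with open("write.html", "w") as wf:
        wf.write(match.group(0))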
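
The regex approach can also be wrapped in a small function so it can be tried on a string before touching any files. This is a minimal sketch, not part of the original answer: the helper name `extract_between_links` and the case-insensitive flag are assumptions added for illustration.

```python
import re

# Hypothetical helper, not from the original answer.
# re.IGNORECASE is an added assumption so that "<LINK" is matched too.
LINK_SPAN = re.compile(r"<link[\s\S]*?<link", re.IGNORECASE)


def extract_between_links(text):
    """Return the text from the first "<link" through the next "<link".

    Returns None when the text has fewer than two "<link" occurrences.
    """
    match = LINK_SPAN.search(text)
    return match.group(0) if match else None
```

Returning None instead of writing an empty file lets the caller decide what to do when the input does not contain two link tags.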
            
================================================

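# Line-based alternative. Assumes every <link tag starts at the beginning of a line.
with open("read.html", "r") as rf:
    with open("write.html", "w") as wf:
        num = 0  # number of lines starting with "<link" seen so far
        for line in rf:
            if line.startswith("<link"):
                num += 1
                if num == 2:
                    # End with the opening of the second tag, as in the expected output.
                    wf.write("<link")
                    break
            if num == 1:
                # Copy lines only after the first <link line has been reached.
                wf.write(line)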
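
If the tags are not guaranteed to start a line (for example, minified HTML), plain string search avoids both the line assumption and the regex. The following is a sketch under that assumption. The function name `slice_between_links` and its `marker` parameter are hypothetical and do not appear in the original answers.

```python
def slice_between_links(text, marker="<link"):
    """Return text from the first marker through the second marker, or None.

    Hypothetical helper for illustration; uses only str.find and slicing.
    """
    first = text.find(marker)
    if first == -1:
        return None  # no marker at all
    # Start the second search just past the first marker.
    second = text.find(marker, first + len(marker))
    if second == -1:
        return None  # only one marker present
    # Include the second marker itself, matching the expected output.
    return text[first:second + len(marker)]
```

The result can be written out the same way as in the answers above, for example with `wf.write(result)` after checking that it is not None.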
