Extract HTML tags from a text file through iteration, append them to a list, and ignore all other characters in Python

Views: 56 Published: 2020/11/24 21:10:52 Tags: python python-3.x html-parsing

Problem description

I want to be able to read an HTML file and extract only the tags from it.

Read one character at a time from the file, ignoring everything until a "<" is reached (the "<" itself is ignored as well).
Then keep reading one character at a time, appending each one to a string, until a ">" or whitespace is reached (the ">" is ignored as well).

    <html>
    <body>
    <h1>This is test</h1>
    <h2> This is test 2</h2>
    </body>
    </html>


    with open('doc.txt', 'r') as f:
        all_lines = []
        # loop through the file line by line
        for line in f:
            new_line = []
            # loop through each character of the line
            for char in line:
                new_line.append(char)
            all_lines.append(new_line)

        print(all_lines)

Iterating over the text file this way, I get a list like the one below:

[['<', 'h', 't', 'm', 'l', '>', '\n'], ['<', 'b', 'o', 'd', 'y', '>', '\n'], ['<', '/', 'b', 'o', 'd', 'y', '>', '\n'], ['<', '/', 'h', 't', 'm', 'l', '>']]

but the expected output should be: [html, body, h1, /h1, h2, /h2, /body, /html]
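
The two steps described in the question can be sketched as a small two-state loop: outside a tag, skip characters until a "<" appears; inside a tag, collect characters until a ">" or whitespace ends the name. This is only a minimal sketch, not a definitive answer. The function name `extract_tags` and the inline sample string are made up for illustration, and the code assumes that "<" only ever starts a tag (an HTML comment or a literal "<" in the text would be picked up as a tag name).

```python
import io


def extract_tags(stream):
    """Return the tag names found in `stream`, reading one character at a time."""
    tags = []
    name = None  # None while outside a tag; a list of characters while inside one
    while True:
        ch = stream.read(1)
        if ch == "":  # read(1) returns an empty string at end of file
            break
        if name is None:
            if ch == "<":  # start of a tag; the "<" itself is not stored
                name = []
        elif ch == ">" or ch.isspace():  # end of the tag name; the delimiter is dropped
            if name:  # skip empty names such as "< " or "<>"
                tags.append("".join(name))
            name = None
        else:
            name.append(ch)
    return tags


# Hypothetical sample, standing in for the contents of the file.
sample = """<html>
<body>
<h1>This is test</h1>
<h2> This is test 2</h2>
</body>
</html>
"""
print(extract_tags(io.StringIO(sample)))
# prints ['html', 'body', 'h1', '/h1', 'h2', '/h2', '/body', '/html']
```

To run it on a real file, pass the open file object instead of the `io.StringIO` wrapper, for example `with open('doc.txt') as f: print(extract_tags(f))`. Because whitespace also ends a name, attributes such as `href="..."` are skipped along with the rest of the text.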
