What does HTML Parsing mean?

Views: 195 | Published: 2018/6/15 10:36:40 | Tags: html, parsing, html-parsing



Question

I have heard of HTML parser libraries like Simple HTML DOM and HTML Parser. I have also seen questions that involve HTML parsing. What does it mean to parse HTML?

Solution
Contrary to what Spudley said, to parse basically means to resolve (a sentence) into its component parts and describe their syntactic roles.

According to Wikipedia, parsing or syntactic analysis is the process of analysing a string of symbols, either in natural language or in computer languages, according to the rules of a formal grammar. The term parsing comes from Latin pars (orationis), meaning part (of speech).

In your case, HTML parsing is basically: taking in HTML code and extracting relevant information like the title of the page, paragraphs in the page, headings in the page, links, bold text etc.
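As a rough illustration of "taking in HTML code and extracting relevant information", here is a minimal sketch that uses Python 3's standard html.parser module. The class name PageSummaryParser, the helper summarize, and the sample markup are assumptions made up for this example; a production scraper would more likely use a dedicated library.

```python
from html.parser import HTMLParser


class PageSummaryParser(HTMLParser):
    """Collects the title, headings, and link targets of a page (illustrative)."""

    HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()
        self.title = ""
        self.headings = []
        self.links = []
        self._open_tag = None  # tag whose text is currently being collected
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)
        if tag == "title" or tag in self.HEADING_TAGS:
            self._open_tag = tag
            self._buffer = []

    def handle_data(self, data):
        if self._open_tag is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag != self._open_tag:
            return
        text = "".join(self._buffer).strip()
        if tag == "title":
            self.title = text
        else:
            self.headings.append(text)
        self._open_tag = None


def summarize(html):
    """Return (title, headings, links) for the given HTML string."""
    parser = PageSummaryParser()
    parser.feed(html)
    parser.close()
    return parser.title, parser.headings, parser.links
```

Calling summarize on a small page with a title "Test", one h1 "Parse me!", and one link yields the title string, a one-item heading list, and a one-item list of href values.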

Parsers:

A computer program that parses content is called a parser. In general, there are two kinds of parsers:

Top-down parsing: Top-down parsing can be viewed as an attempt to find left-most derivations of an input stream by searching for parse trees using a top-down expansion of the given formal grammar rules. Tokens are consumed from left to right. Inclusive choice is used to accommodate ambiguity by expanding all alternative right-hand sides of grammar rules.

Bottom-up parsing: A parser can start with the input and attempt to rewrite it to the start symbol. Intuitively, the parser attempts to locate the most basic elements, then the elements containing these, and so on. LR parsers are examples of bottom-up parsers. Another term used for this type of parser is shift-reduce parsing.
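To make the top-down idea concrete, here is a small recursive descent sketch in Python. The arithmetic grammar, the tuple-based tree shape, and the names tokenize, Parser, and parse are all assumptions chosen for illustration, not something from the original answer. Each grammar rule becomes one method, and parsing starts from the top rule (expr) and works down to the tokens.

```python
import re

# Grammar assumed for this sketch:
#   expr   := term (('+' | '-') term)*
#   term   := factor (('*' | '/') factor)*
#   factor := NUMBER | '(' expr ')'
TOKEN_RE = re.compile(r"\s*(\d+|[-+*/()])")


def tokenize(text):
    """Split the input into number and operator tokens."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        token = self.peek()
        self.pos += 1
        return token

    def expr(self):
        node = self.term()
        while self.peek() in ("+", "-"):
            op = self.take()
            node = (op, node, self.term())
        return node

    def term(self):
        node = self.factor()
        while self.peek() in ("*", "/"):
            op = self.take()
            node = (op, node, self.factor())
        return node

    def factor(self):
        token = self.take()
        if token is not None and token.isdigit():
            return int(token)
        if token == "(":
            node = self.expr()
            if self.take() != ")":
                raise SyntaxError("expected ')'")
            return node
        raise SyntaxError(f"unexpected token {token!r}")


def parse(text):
    """Return a parse tree as nested (operator, left, right) tuples."""
    parser = Parser(tokenize(text))
    tree = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return tree
```

With this grammar, parse("1 + 2 * 3") returns ('+', 1, ('*', 2, 3)), so the multiplication ends up deeper in the tree than the addition. Note that this sketch commits to one alternative per token rather than exploring every alternative, so it does not handle ambiguous grammars.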

A few example parsers:

Top-down parsers:

Recursive descent parser

LL parser (Left-to-right, Leftmost derivation)

Earley parser

Bottom-up parsers:

Precedence parser
    Operator-precedence parser
    Simple precedence parser

BC (bounded context) parsing

LR parser (Left-to-right, Rightmost derivation)
    Simple LR (SLR) parser
    LALR parser
    Canonical LR (LR(1)) parser
    GLR parser

CYK parser

Recursive ascent parser
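For contrast with the top-down approach, the following is a toy shift-reduce sketch in Python. It is not an LR parser: real LR parsers consult generated tables and lookahead to decide when to reduce, whereas this version simply reduces whenever the top of the stack matches a rule, which only works because the assumed grammar is so small. The grammar, the RULES table, and the function names are hypothetical and chosen for illustration.

```python
import re

# Grammar assumed for this sketch:
#   E -> E "+" T | T
#   T -> num | "(" E ")"
# Rules are tried in order, so the longer E rule comes before E -> T.
RULES = [
    ("E", ("E", "+", "T"), lambda e, _plus, t: ("+", e, t)),
    ("E", ("T",), lambda t: t),
    ("T", ("num",), lambda n: n),
    ("T", ("(", "E", ")"), lambda _l, e, _r: e),
]


def tokenize(text):
    """Yield (symbol, value) pairs for integers, '+', '(' and ')'."""
    for lexeme in re.findall(r"\d+|\S", text):
        if lexeme.isdigit():
            yield ("num", int(lexeme))
        elif lexeme in "+()":
            yield (lexeme, lexeme)
        else:
            raise SyntaxError(f"unexpected character {lexeme!r}")


def try_reduce(stack):
    """Apply the first rule whose right-hand side matches the stack top."""
    for lhs, rhs, build in RULES:
        n = len(rhs)
        if len(stack) >= n and tuple(sym for sym, _ in stack[-n:]) == rhs:
            values = [val for _, val in stack[-n:]]
            del stack[-n:]
            stack.append((lhs, build(*values)))
            return True
    return False


def parse(text):
    """Rewrite the token stream bottom-up until only the start symbol E remains."""
    stack = []
    for token in tokenize(text):
        stack.append(token)       # shift the next token onto the stack
        while try_reduce(stack):  # reduce while a rule matches the stack top
            pass
    if len(stack) != 1 or stack[0][0] != "E":
        raise SyntaxError("input is not a valid expression")
    return stack[0][1]
```

Here the parser starts from the input tokens, first turning each number into a T, then combining pieces into larger E nodes until a single start symbol is left, which mirrors the "most basic elements first" description of bottom-up parsing.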

Example parser:

Here's an example HTML parser in Python (shown here in Python 3 syntax):

    from html.parser import HTMLParser

    # create a subclass and override the handler methods
    class MyHTMLParser(HTMLParser):
        def handle_starttag(self, tag, attrs):
            print("Encountered a start tag:", tag)

        def handle_endtag(self, tag):
            print("Encountered an end tag :", tag)

        def handle_data(self, data):
            print("Encountered some data :", data)

    # instantiate the parser and feed it some HTML
    parser = MyHTMLParser()
    parser.feed('<html><head><title>Test</title></head>'
                '<body><h1>Parse me!</h1></body></html>')

Here's the output:

    Encountered a start tag: html
    Encountered a start tag: head
    Encountered a start tag: title
    Encountered some data : Test
    Encountered an end tag : title
    Encountered an end tag : head
    Encountered a start tag: body
    Encountered a start tag: h1
    Encountered some data : Parse me!
    Encountered an end tag : h1
    Encountered an end tag : body
    Encountered an end tag : html

References

Wikipedia

Python docs
