Lexers / parsers for (un)structured text documents

Published: 2020/5/25 1:23:58 · Tags: parsing, document, lexer

Question

There are lots of parsers and lexers for scripts (i.e. structured computer languages), but I'm looking for one that can break an (almost) unstructured text document into larger sections, e.g. chapters, paragraphs, etc.
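As a rough illustration of the section-level splitting described above, here is a minimal rule-based sketch in Python that cuts a plain-text document at lines that look like headings. The heading patterns in `HEADING_RE` (chapter/part numbers, "Table of Contents", "Acknowledgements", "Preface", "Introduction") and the function name `split_sections` are hypothetical assumptions for illustration, not an existing library; real documents will need their own rules, and simple patterns like these will produce false positives (for example, numbered entries inside a table of contents).

```python
import re
from typing import List, Tuple

# Hypothetical heading rules; adjust per document collection.
HEADING_RE = re.compile(
    r"^\s*(?:"
    r"(?:chapter|part)\s+(?:\d+|[ivxlcdm]+)\b.*"   # "Chapter 3", "Part IV: ..."
    r"|table of contents|contents"
    r"|acknowledg(?:e)?ments|preface|introduction"
    r")\s*$",
    re.IGNORECASE,
)


def split_sections(text: str) -> List[Tuple[str, str]]:
    """Split text into (heading, body) pairs at lines matching HEADING_RE.

    Any text before the first recognised heading is returned under "".
    """
    sections: List[Tuple[str, str]] = []
    heading = ""
    body: List[str] = []
    for line in text.splitlines():
        if HEADING_RE.match(line):
            # Close the previous section, skipping an empty preamble.
            if heading or any(l.strip() for l in body):
                sections.append((heading, "\n".join(body).strip()))
            heading = line.strip()
            body = []
        else:
            body.append(line)
    # Close the final section.
    if heading or any(l.strip() for l in body):
        sections.append((heading, "\n".join(body).strip()))
    return sections
```

For example, `split_sections("Title page\nChapter 2\nBody")` returns `[("", "Title page"), ("Chapter 2", "Body")]`. Keeping the rules in one regular expression makes them easy to extend, but a scoring or machine-learning classifier over line features would be needed once the heuristics start to conflict.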

It's relatively easy for a person to identify them: where the Table of Contents or the acknowledgements are, or where the main body starts. It is also possible to build rule-based systems to identify some of these (such as paragraphs).
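A rule-based paragraph splitter of the kind mentioned above can be very small. The sketch below assumes two common plain-text conventions, namely that a blank line ends a paragraph and that a line indented by a tab or two or more spaces starts a new one; the function name `split_paragraphs` and both rules are illustrative assumptions, not taken from any particular tool.

```python
from typing import List


def split_paragraphs(text: str) -> List[str]:
    """Group lines into paragraphs using two simple (assumed) rules:

    1. a blank or whitespace-only line ends the current paragraph;
    2. a line indented by a tab or 2+ spaces starts a new paragraph.

    Hard-wrapped lines inside a paragraph are joined with single spaces.
    """
    paragraphs: List[str] = []
    current: List[str] = []

    def flush() -> None:
        # Emit the accumulated lines as one paragraph, if any.
        if current:
            paragraphs.append(" ".join(current))
            current.clear()

    for line in text.splitlines():
        if not line.strip():
            flush()                      # rule 1: blank line
            continue
        if line.startswith(("\t", "  ")):
            flush()                      # rule 2: indented first line
        current.append(line.strip())
    flush()
    return paragraphs
```

For example, `split_paragraphs("First line\nwrapped here.\n\nSecond para.\n  Indented third.\nstill third.")` returns `["First line wrapped here.", "Second para.", "Indented third. still third."]`. Rules like these break on poetry, lists, or tables, which is why such systems usually only cover "some of these" block types reliably.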

I don't expect it to be perfect, but does anyone know of such a broad 'block-based' lexer/parser? Or could you point me towards literature that might help?