Where can I find a good MediaWiki Markup parser in PHP?

Question

I would try hacking MediaWiki's code a little, but I figured it would be unnecessary if I could get an independent parser.

Can anybody help me?

Thanks.

Recommended answer

Ben Hughes is right. It's very difficult to get right, especially if you want to parse real articles from big wikis like Wikipedia itself with 100% accuracy. The problem is discussed frequently on the wikitech mailing list, and no alternative parser has come up with the goods despite many attempts.

Firstly, MediaWiki's parser is not really a parser, in that it has no concept of an AST (abstract syntax tree). It's a converter that specifically converts wikitext to HTML.
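To make the converter-versus-parser distinction concrete, here is a minimal sketch in Python. It is not MediaWiki code and does not follow MediaWiki's grammar: it handles only a toy subset (`'''bold'''` and `''italic''`, no nesting, no HTML escaping), and every function and class name in it is made up for illustration. The first function rewrites markup straight into HTML, which is the style described above; the second builds a small node list (the simplest possible stand-in for a syntax tree) that can then be rendered to more than one format.

```python
import re
from dataclasses import dataclass
from typing import List, Union

# Toy subset only: '''bold''' and ''italic'', no nesting, no escaping.
TOKEN = re.compile(r"'''(.+?)'''|''(.+?)''")


def convert_to_html(text: str) -> str:
    """Converter style: rewrite markup directly into HTML, no tree in between."""
    text = re.sub(r"'''(.+?)'''", r"<b>\1</b>", text)
    return re.sub(r"''(.+?)''", r"<i>\1</i>", text)


@dataclass
class Text:
    value: str


@dataclass
class Styled:
    kind: str  # "bold" or "italic"
    value: str


Node = Union[Text, Styled]


def parse(text: str) -> List[Node]:
    """Parser style: produce nodes first, decide on the output format later."""
    nodes: List[Node] = []
    pos = 0
    for match in TOKEN.finditer(text):
        if match.start() > pos:
            nodes.append(Text(text[pos:match.start()]))
        if match.group(1) is not None:
            nodes.append(Styled("bold", match.group(1)))
        else:
            nodes.append(Styled("italic", match.group(2)))
        pos = match.end()
    if pos < len(text):
        nodes.append(Text(text[pos:]))
    return nodes


def render_html(nodes: List[Node]) -> str:
    tags = {"bold": "b", "italic": "i"}
    parts = []
    for node in nodes:
        if isinstance(node, Styled):
            tag = tags[node.kind]
            parts.append(f"<{tag}>{node.value}</{tag}>")
        else:
            parts.append(node.value)
    return "".join(parts)


def render_plain(nodes: List[Node]) -> str:
    """A second output format, possible only because the nodes exist."""
    return "".join(node.value for node in nodes)
```

With the converter style, HTML is the only thing you ever get back, so any other target format has to start from that HTML. With the node list, `render_plain(parse(text))` works without touching HTML at all.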

Secondly, don't fall into the trap of thinking of wikitext as a markup language that can be extended on rare occasions with HTML. You must think of it as an extension to HTML. It is much easier to add wikitext support to an HTML parser than to add HTML support to a wikitext parser.

What this boils down to is that if you want any other output format, you will need to convert from HTML to that format.

Basically, the stated position is that only MediaWiki can parse wikitext. But yes, the parser is tightly integrated with the rest of the code. Experienced MediaWiki hackers do not react well to questions about isolating the parser - I've tried (-:

But I've gone ahead and isolated it anyway. It's not complete or ready to share with anybody yet. But basically, you want to start with the MediaWiki source, not installed and not connected to a database or web server. Make a PHP stub program that includes the parser and calls an entry point. Check the error when it fails to run, and make a phony stub for the class, function, or global that was accessed. Repeat until you have stubbed most of the places where the parser interacts with the rest of MediaWiki.
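The run, read the error, stub, repeat loop described above can be sketched in a language-neutral way. The following is a toy illustration in Python, not the PHP workflow itself and not MediaWiki code: `PARSER_SOURCE` is a made-up stand-in for code lifted out of a larger application, and `get_database`, `make_title`, and `render` are hypothetical dependencies invented for the example. Here the loop is automated by catching `NameError`; against the real PHP source, each round would be a run of your stub program followed by a hand-written stub.

```python
import re

# Hypothetical stand-in for code extracted from a larger application.
# None of these dependency names come from MediaWiki.
PARSER_SOURCE = '''
def parse(text):
    db = get_database()             # hypothetical dependency
    title = make_title("Sandbox")   # hypothetical dependency
    return render(text, db, title)  # hypothetical dependency
'''


def phony(*args, **kwargs):
    """Phony stub: accepts anything, does nothing, returns None."""
    return None


def isolate(source, entry_point, sample_input, max_rounds=20):
    """Call the entry point repeatedly, stubbing each missing name it hits."""
    namespace = {}
    exec(source, namespace)  # functions defined here use `namespace` as globals
    stubbed = []
    for _ in range(max_rounds):
        try:
            return namespace[entry_point](sample_input), stubbed
        except NameError as exc:
            match = re.search(r"name '(\w+)' is not defined", str(exc))
            if match is None:
                raise
            missing = match.group(1)
            namespace[missing] = phony  # make a phony stub and try again
            stubbed.append(missing)
    raise RuntimeError("entry point still failing after max_rounds stubs")
```

Calling `isolate(PARSER_SOURCE, "parse", "''hi''")` returns the entry point's result together with the list of names that had to be stubbed. That list is, in effect, a map of where the extracted code touches the rest of the application. A stub that returns nothing is only enough to get the code running; stubs whose return values matter to the output need real behaviour.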

The problem then is keeping your hacked, stubbed variant in sync: the source tree changes quickly, the live wikis adopt parser changes very quickly, and your variant will have to keep up if it is to keep working in the future.

See my feature request: Bug 25984 - Isolate parser from database dependencies
