How to extract data from a raw HTML file?

Views: 154 Published: 2020/6/18 19:18:20 Tags: php html parsing html-content-extraction

Question

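Is there a way to extract the data I need from poorly written raw HTML that has no IDs or classes? I mean, suppose there is an HTML file of a saved web page (a profile), and I want to extract data such as "hobbies". Can this be done with PHP?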

Answer

Use regex! I kid, I kid. If you know the state of the page, and the format is guaranteed to remain similar enough, then you can try writing a manual parser. Alternatively, there are a lot of libraries out there that will parse HTML for you. I'm not familiar enough with PHP to recommend one, but I'm sure some Googling could take you a long way. I've had luck with John Resig's pure JavaScript HTML parser before.
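As a rough illustration of the library route for the PHP case in the question, here is a minimal sketch using PHP's built-in DOM extension (DOMDocument and DOMXPath). The sample markup, the extractField helper, and the "Hobbies:" label are all hypothetical; the sketch assumes the value sits in the table cell right after the cell holding the label, which may not match a real saved page.

```php
<?php
// Hypothetical saved profile page: no ids, no classes, missing closing tags.
$html = <<<HTML
<html><body>
<table>
<tr><td><b>Name:</b></td><td>Alice</td></tr>
<tr><td><b>Hobbies:</b></td><td>chess, hiking<br></td></tr>
</table>
HTML;

/**
 * Hypothetical helper: return the text of the cell that follows the cell
 * containing $label, or null if no such cell is found.
 * Assumes $label contains no double quote (it is embedded in the XPath string).
 */
function extractField(string $html, string $label): ?string
{
    $doc = new DOMDocument();
    // Malformed markup makes libxml emit warnings; buffer them instead of printing.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Locate the cell by its visible text, since there are no ids or classes.
    $query = sprintf(
        '//td[contains(normalize-space(.), "%s")]/following-sibling::td[1]',
        $label
    );
    $nodes = $xpath->query($query);
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    return trim($nodes->item(0)->textContent);
}

echo extractField($html, 'Hobbies:'), "\n";
```

Anchoring on visible label text rather than on attributes is what makes this workable without ids or classes, but it only holds as long as the page keeps the same label and layout.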

At the end of the day, if you need semantic information from an HTML page that isn't constructed semantically, you're probably doomed programmatically, and your best bet may be Mechanical Turk.
