Best way to programmatically save a webpage to a Static HTML File


Question




The more research I do, the more grim the outlook becomes.

I am trying to Flat Save, or Static Save a webpage with Python. This means merging all the styles to inline properties, and changing all links to absolute URLs.
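To make the second half of that requirement concrete, here is a minimal sketch of rewriting links to absolute URLs using only the Python standard library. The names `LinkAbsolutizer`, `absolutize`, and `URL_ATTRS` are hypothetical, the list of URL-bearing attributes is deliberately incomplete, and URLs inside CSS (`url(...)`) or `srcset` are not handled.

```python
from html import escape
from html.parser import HTMLParser
from urllib.parse import urljoin

# Attributes that commonly carry URLs (an illustrative, incomplete list).
URL_ATTRS = {"href", "src", "action"}


class LinkAbsolutizer(HTMLParser):
    """Re-emits the HTML it parses, resolving URL attributes against base_url."""

    def __init__(self, base_url):
        # Keep entity references as-is so text content round-trips unchanged.
        super().__init__(convert_charrefs=False)
        self.base_url = base_url
        self.out = []

    def _render(self, tag, attrs, closing=">"):
        parts = [tag]
        for name, value in attrs:
            if value is None:  # boolean attribute such as "disabled"
                parts.append(name)
                continue
            if name in URL_ATTRS:
                value = urljoin(self.base_url, value)
            # The parser unescapes attribute values, so escape them again.
            parts.append('%s="%s"' % (name, escape(value, quote=True)))
        return "<" + " ".join(parts) + closing

    def handle_starttag(self, tag, attrs):
        self.out.append(self._render(tag, attrs))

    def handle_startendtag(self, tag, attrs):
        self.out.append(self._render(tag, attrs, " />"))

    def handle_endtag(self, tag):
        self.out.append("</%s>" % tag)

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append("&%s;" % name)

    def handle_charref(self, name):
        self.out.append("&#%s;" % name)

    def handle_comment(self, data):
        self.out.append("<!--%s-->" % data)

    def handle_decl(self, decl):
        self.out.append("<!%s>" % decl)


def absolutize(html_text, base_url):
    """Return html_text with href/src/action values made absolute."""
    parser = LinkAbsolutizer(base_url)
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)
```

For example, `absolutize('<a href="/about">About</a>', "http://example.com/blog/")` resolves the link against the page URL. The standard-library parser is fairly tolerant, but badly malformed markup may still need a more forgiving parser.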

I've tried nearly every free conversion website, API, and even libraries on GitHub. None are that impressive. The best Python implementation I could find for flattening styles is https://github.com/davecranwell/inline-styler. I adapted that slightly for Flask, but the generated file isn't that great. Here's how it looks:

Obviously, it should look better. Here's what it should look like: http://cl.ly/image/1H3J1O1u3v3d

It seems like a never-ending struggle dealing with malformed HTML, unrecognized CSS properties, Unicode errors, etc. So does anyone have a suggestion for a better way to do this? I understand I can go to File -> Save in my local browser, but when I am trying to do this en masse, and extract a particular XPath, that's not really viable.

It looks like Evernote's web clipper uses iframes, but that seems more complicated than I think it should be. But at least the clippings look decent on Evernote.

I'm interested to see if anyone has some suggestions.

Solution

After walking away for a while, I managed to install a Ruby library that flattens the CSS much, much better than anything else I've used. It's the library behind the very slow web interface here: http://premailer.dialect.ca/

Thank goodness they released the source on GitHub. It's the best, hands down. https://github.com/alexdunae/premailer

It flattens styles, creates absolute URLs, works with a URL or a string, and can even create plain-text email templates. Very impressed with this library.
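For readers who want to see what "flattening styles" involves, here is a rough Python sketch of the idea. It is not how premailer is implemented, and all names in it (`inline_styles`, `StyleInliner`, and so on) are hypothetical. It assumes the CSS lives in `<style>` blocks and only understands bare `tag`, `.class`, and `#id` selectors. It does not understand specificity, combinators, pseudo-classes, or media queries, which is part of why a mature library does this so much better.

```python
import re
from html import escape
from html.parser import HTMLParser

RULE_RE = re.compile(r"([^{}]+)\{([^{}]*)\}")
STYLE_BLOCK_RE = re.compile(r"<style[^>]*>(.*?)</style>", re.I | re.S)


def parse_declarations(text):
    """Turn 'color: red; margin: 0' into a dict of property -> value."""
    decls = {}
    for chunk in text.split(";"):
        name, sep, value = chunk.partition(":")
        if sep and name.strip():
            decls[name.strip().lower()] = value.strip()
    return decls


def parse_rules(css):
    """Return (selector, declarations) pairs in source order."""
    rules = []
    for selectors, body in RULE_RE.findall(css):
        for selector in selectors.split(","):
            rules.append((selector.strip(), parse_declarations(body)))
    return rules


def matches(selector, tag, attrs):
    """Support only 'tag', '.class' and '#id'; anything else never matches."""
    if selector.startswith("."):
        return selector[1:] in (attrs.get("class") or "").split()
    if selector.startswith("#"):
        return selector[1:] == attrs.get("id")
    return selector.lower() == tag


class StyleInliner(HTMLParser):
    """Re-emits HTML with matching rules merged into style attributes."""

    def __init__(self, rules):
        super().__init__(convert_charrefs=False)
        self.rules = rules
        self.out = []
        self.skipping = False  # True while inside a <style> element

    def handle_starttag(self, tag, attrs):
        if tag == "style":
            self.skipping = True  # drop the stylesheet from the output
            return
        attr_map = dict(attrs)
        merged = {}
        for selector, decls in self.rules:  # later rules override earlier ones
            if matches(selector, tag, attr_map):
                merged.update(decls)
        # Declarations already inline on the element take precedence.
        merged.update(parse_declarations(attr_map.get("style") or ""))
        if merged:
            attr_map["style"] = "; ".join("%s: %s" % kv for kv in merged.items())
        rendered = "".join(
            " %s" % k if v is None else ' %s="%s"' % (k, escape(v, quote=True))
            for k, v in attr_map.items()
        )
        self.out.append("<%s%s>" % (tag, rendered))

    def handle_startendtag(self, tag, attrs):
        if tag != "style":
            self.handle_starttag(tag, attrs)  # '<br/>' is emitted as '<br>'

    def handle_endtag(self, tag):
        if tag == "style":
            self.skipping = False
        else:
            self.out.append("</%s>" % tag)

    def handle_data(self, data):
        if not self.skipping:
            self.out.append(data)

    def handle_entityref(self, name):
        self.out.append("&%s;" % name)

    def handle_charref(self, name):
        self.out.append("&#%s;" % name)

    def handle_decl(self, decl):
        self.out.append("<!%s>" % decl)


def inline_styles(html_text):
    """Merge rules from <style> blocks into per-element style attributes."""
    css = "\n".join(STYLE_BLOCK_RE.findall(html_text))
    inliner = StyleInliner(parse_rules(css))
    inliner.feed(html_text)
    inliner.close()
    return "".join(inliner.out)
```

Calling `inline_styles` on a page with a `<style>` block returns the markup with the stylesheet removed and the matching declarations written onto each element. Comments are dropped, and linked stylesheets would have to be fetched and passed through `parse_rules` separately.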

Update Nov 2013

I ended up writing my own bookmarklet that works purely client-side. It is compatible with WebKit and Firefox only. It recurses through each node and adds inline styles, then sends the flattened HTML to the clippy.in API to save to the user's dashboard.

Client Side Bookmarklet
