HTML to Plain Text with Ruby?
Question
Is there anything out there to convert HTML to plain text (maybe a Nokogiri script)? Something that would keep the line breaks, but that's about it.
If I write something in Google Docs, like this, and run that command, the output (after removing the CSS and JavaScript) is:
\n\n\n\n\nh1. Test h2. HELLO THEREI am some teexton the next line!!!OKAY!#*!)$!
So the formatting's all messed up. I'm sure someone has already worked out details like these somewhere out there.
Recommended answer
Actually, this is much simpler:
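require 'rubygems' # only needed on Ruby 1.8; harmless on later versions
require 'nokogiri'
# my_html is assumed to be a String containing the HTML source
puts Nokogiri::HTML(my_html).text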
You still have line break issues, though, so you're going to have to figure out how you want to handle those yourself.