How do I save HTML files as non-formatted text, not code, using a programming language?

Question

I know this question is sort of general, but I'm making an extension to my Windows 10 Command Prompt allowing you to view HTML as plain text within the program. I don't know if it would be considered lazy not to build an interpreter for something like this, but it just seems like too much work for something only I will be using. Making an interpreter for a markup language I know little about seems unnecessary and making it in Batch would be even harder.

I know how to read from files and store them as a variable, but my question would be how to store raw HTML as plain text without any formatting. For instance,

<p>Here's some text.</p>

would become:

Here's some text.

I'd like an interpreter to convert the HTML into plain text. It doesn't need to be written in Batch, but it's fine if it is. I'd prefer it be written in a more developed language, though, such as Python, which I've seen used to interpret programming languages before. It doesn't need to be written by you, so a referral would be fine.

Sorry if I took my time explaining. Even a partial solution would be fine. Thanks for helping out!

Recommended answer

In the future, please show some code to demonstrate that you've attempted to solve the problem on your own. Questions resembling, "Here are my requirements. Now write it for me or find me a tool" typically aren't well-received around here.

But partly to stave off further half answers and partly because I relished the challenge, here's a solution written as a hybrid Batch + JScript script that will write the innerText of your HTML to the console. Save it with a .bat extension. If you want the output to go to a file instead, run batscript.bat htmlfile > outfile.txt at the cmd line.

@if (@CodeSection == @Batch) @then
@echo off & setlocal

if "%~1"=="" goto usage
if not exist "%~1" goto usage

cscript /nologo /e:JScript "%~f0" < "%~1"
goto :EOF

:usage
>&2 echo Usage: %~nx0 htmlfile
goto :EOF

@end // end Batch / begin JScript

var htmlfile = WSH.CreateObject('htmlfile');

htmlfile.write('<meta http-equiv="x-ua-compatible" content="IE=9" />');
htmlfile.write(WSH.StdIn.ReadAll());

WSH.Echo(htmlfile.documentElement.innerText);
htmlfile.close();

IE9 compatibility mode is invoked to recognize more HTML element types than without, while still allowing Vista compatibility. You can change IE=9 to 10, 11, or Edge if needed.

If you'd prefer a non-hybrid script, you can also construct the htmlfile COM object using PowerShell. It's slower to execute, but it is simpler code (odd .NET-ish method names notwithstanding). Examples:

.bat script:

@echo off & setlocal

if "%~1"=="" goto usage
if not exist "%~1" goto usage

set "htmlfile=%~f1"

set "psCommand="^
    $h=new-object -COM htmlfile;^
    $h.IHTMLDocument2_write('^<meta http-equiv="x-ua-compatible" content="IE=9" /^>');^
    $h.IHTMLDocument2_write(${%htmlfile%});^
    $h.documentElement.innerText""

powershell -noprofile -noninteractive %psCommand%

goto :EOF

:usage
echo Usage: %~nx0 htmlfile
goto :EOF

.ps1 script:

param( $htmlfile = $false )

if (-not (test-path $htmlfile)) {
    [console]::Error.WriteLine("Usage: $($MyInvocation.MyCommand.Name) htmlfile")
    exit
}

$html = gc $htmlfile | out-string
$hObj = new-object -COM htmlfile
$hObj.IHTMLDocument2_write('<meta http-equiv="x-ua-compatible" content="IE=9" />')
$hObj.IHTMLDocument2_write($html)
$hObj.documentElement.innerText
$hObj.Close()

(Example usage of the .ps1 solution: powershell .\scriptname.ps1 htmlfile.html)

And because I'm doing this for the personal challenge, here's a batch + HTA hybrid variation that pastes the innerText unsaved into a new Notepad window, because I can.

<!-- : batch portion
@echo off & setlocal

if "%~1"=="" goto usage
if not exist "%~1" goto usage

mshta "%~f0" < "%~1"
goto :EOF

:usage
>&2 echo Usage: %~nx0 htmlfile
goto :EOF

end Batch / begin HTA -->

<meta http-equiv="x-ua-compatible" content="IE=9" />
<div id="out"></div>

<script>
var fso = new ActiveXObject('Scripting.FileSystemObject'),
    osh = new ActiveXObject('WScript.Shell'),
    notepad = osh.Exec('notepad');

document.getElementById('out').innerHTML = fso.GetStandardStream(0).ReadAll();
clipboardData.setData('text', document.getElementById('out').innerText);

var waitActive = setInterval(function() {
    if (osh.AppActivate(notepad.ProcessID)) {
        clearInterval(waitActive);
        close(osh.SendKeys('^v'));
    }
}, 25);

</script>

I used HTA to circumvent browser security preventing write access to the clipboard (as happens with the htmlfile COM object), and because HTA is lighter weight and less likely to end up as an invisible running process than an InternetExplorer.Application COM object.
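
Since the question mentions Python as a preferred language, here is a minimal cross-platform sketch that is not part of the scripts above. It uses only the standard library's html.parser module; the names TextExtractor and html_to_text are made up for this illustration, and the set of tags treated as line breaks is an assumption. It will not reproduce innerText exactly (there is no CSS or layout awareness), but it covers the simple case from the question.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text nodes, ignoring the contents of <script> and <style>."""

    SKIP = {"script", "style"}
    # Assumption: these tags should start a new line in the plain-text output.
    BLOCK = {"p", "div", "br", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        # convert_charrefs=True turns entities such as &amp; into characters.
        super().__init__(convert_charrefs=True)
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag in self.BLOCK:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in self.SKIP:
            if self._skip_depth:
                self._skip_depth -= 1
        elif tag in self.BLOCK:
            self.parts.append("\n")

    def handle_data(self, data):
        # Keep text only when we are outside <script>/<style>.
        if not self._skip_depth:
            self.parts.append(data)


def html_to_text(html):
    """Return the visible text of an HTML string, one non-empty line per block."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    lines = (line.strip() for line in "".join(parser.parts).splitlines())
    return "\n".join(line for line in lines if line)


print(html_to_text("<p>Here's some text.</p>"))  # prints: Here's some text.
```

To convert a file, read it into a string first, for example html_to_text(open(path, encoding="utf-8").read()), and redirect the printed output to a text file the same way as with the batch scripts.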
