Why does pandoc keep span and div tags when converting HTML to Markdown?

Question

I'm a pandoc newbie, so I must be missing something obvious. I'm trying to convert an HTML file generated by MS Word to Markdown. Here is a test HTML file:

<html xmlns="http://www.w3.org/1999/xhtml">
<head>
  <title></title>
</head>
<body>
  <div class="Section1">
    <p class="Question"><span style="FONT-SIZE: 10pt">Today</span> <span style=
    "FONT-SIZE: 10pt">is</span> <span lang="HR" style=
    "FONT-SIZE: 10pt; mso-ansi-language: HR">a</span><span style=
    "FONT-SIZE: 10pt">nice</span> <span style="FONT-SIZE: 10pt">day</span> 
    </p>
  </div>
</body>
</html>

I try to convert it with:

pandoc -f html -t markdown test.html -o test.md

I was expecting "Today is a nice day", but got:

<div class="Section1">

<span style="FONT-SIZE: 10pt">Today</span> <span
style="FONT-SIZE: 10pt">is</span> <span lang="HR"
style="FONT-SIZE: 10pt; mso-ansi-language: HR">a</span><span
style="FONT-SIZE: 10pt">nice</span> <span
style="FONT-SIZE: 10pt">day</span>

</div>

Why was the div kept? Why were the spans kept?

Recommended answer

You need to turn off some extensions, either on the HTML input side:

$ pandoc -f html-native_divs-native_spans -t markdown test.html -o test.md

or on the Markdown output side:

$ pandoc -f html -t markdown-raw_html-native_divs-native_spans-fenced_divs-bracketed_spans test.html -o test.md
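
Both commands rely on the same pandoc convention: a format name followed by `-EXTENSION` for each extension to disable (and `+EXTENSION` to enable one). As background that the answer does not spell out, and as far as I understand pandoc's behavior: with `native_divs` and `native_spans` enabled, the HTML reader keeps div and span elements in the parsed document, and the Markdown writer then writes them back out as raw HTML. The sketch below only shows how such a format spec is assembled. `pandoc_format` is a hypothetical helper name, not part of pandoc, and the block does not invoke pandoc itself.

```shell
#!/bin/sh
# Build a pandoc format spec with the given extensions disabled.
# NOTE: pandoc_format is a hypothetical helper for illustration only.
# Usage: pandoc_format BASE [EXTENSION ...]
pandoc_format() {
    spec=$1
    shift
    for ext in "$@"; do
        # "-ext" disables an extension; "+ext" would enable it
        spec="$spec-$ext"
    done
    printf '%s\n' "$spec"
}

# Reader side: stop the HTML reader from keeping <div>/<span> as elements.
pandoc_format html native_divs native_spans

# Writer side: disable the Markdown syntaxes that can carry a div or span.
pandoc_format markdown raw_html native_divs native_spans fenced_divs bracketed_spans
```

Assuming pandoc is installed and test.html is in the current directory, the helper could be used as:

    pandoc -f "$(pandoc_format html native_divs native_spans)" -t markdown test.html -o test.md

Disabling the extensions on the input side is usually the shorter option, since only the two reader extensions are involved.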
