如何以编程方式遍历Word文档中的下标,上标和方程式 [英] How to programmatically iterate through subscripts,superscripts and equations found in a Word document
问题描述
我有几个Word文档,每个文档包含数百页的科学数据,其中包括:
I have a few Word documents, each containing a few hundreds of pages of scientific data which includes:
- 化学式(带有适当下标和上标的H2SO4)
- 科学数字(使用上标格式化的指数)
- 许多数学方程式.使用Word中的数学方程式编辑器编写.
问题是,以Word形式存储此数据对我们而言效率不高.因此,我们希望将所有这些信息存储在数据库(MySQL)中.我们想将这些格式转换为LaTex.
Problem is, storing this data in the form of Word is not efficient for us. So we want to store all this information in a Database (MySQL). We want to convert these formatting to LaTex.
是否有任何方法可以遍历所有子脚本&上标使用VBA的方程式?
Is there any way to iterate through all the subcripts & superscripts & Equations using VBA?
如何遍历数学方程式?
推荐答案
Based on your comment on Michael's answer
不!我只想用_ {替换下标中的内容 subscriptcontent}以及类似的带有^ {的上标内容 上标内容}.那将是Tex的等价物.现在,我只是 将所有内容复制到文本文件中,这将删除格式,但 离开这些角色.问题解决了.但是为此,我需要访问 下标&文档的上标对象
No! I just want to replace content in the subscript with _{ subscriptcontent } and similarly superscript content with ^{ superscriptcontent }. That would be the Tex equivalent. Now, I'll just copy everything to a text file which will remove the formatting but leaves these characters. Problem solved. But for that I need to access the subscript & superscript objects of document
Sub sampler()
Selection.HomeKey wdStory
With Selection.find
.ClearFormatting
.Replacement.ClearFormatting
.Font.Superscript = True
.Replacement.Text = "^^{^&}"
.Execute Replace:=wdReplaceAll
.Font.Subscript = True
.Replacement.Text = "_{^&}"
.Execute Replace:=wdReplaceAll
End With
End Sub
编辑
或者如果您还想将OMaths
转换为TeX / LaTeX
,请执行以下操作:
Or If you also want to convert OMaths
to TeX / LaTeX
, then do something like:
- 迭代Omaths>将其分别转换为MathML> [将MathML保存到磁盘] + [在文档中添加一些描述MathML文件参考的标记来代替OMath]>将Word文件转换为文本
- 现在准备一个 MathParser 这样的转换器,然后将MathML文件转换为LateX.
- 解析文本文件>相应地搜索并替换LaTeX代码.
- Iterate over Omaths > convert each to MathML > [save MathML to disk] + [put some mark-up in doc describing MathML file's reference in place of OMath] > convert Word files as text
- Now prepare a converter like MathParser and convert MathML files to LateX.
- Parse text file > search and replace LaTeX code accordingly.
For a completely different idea visit David Carlisle's blog, that might interest you.
更新
The module
Option Explicit
'This module requires the following references:
'Microsoft Scripting Runtime
'MicroSoft XML, v6.0
Private fso As New Scripting.FileSystemObject
Private omml2mml$, mml2Tex$
Public Function ProcessFile(fpath$) As Boolean
'convPath set to my system at (may vary on your system):
omml2mml = "c:\program files\microsoft office\office14\omml2mml.xsl"
'download: http://prdownloads.sourceforge.net/xsltml/xsltml_2.0.zip
'unzip at «c:\xsltml_2.0»
mml2Tex = "c:\xsltml_2.0\mmltex.xsl"
Documents.Open fpath
'Superscript + Subscript
Selection.HomeKey wdStory
With Selection.find
.ClearFormatting
.Replacement.ClearFormatting
'to make sure no paragraph should contain any emphasis
.Text = "^p"
.Replacement.Text = "^&"
.Replacement.Font.Italic = False
.Replacement.Font.Bold = False
.Replacement.Font.Superscript = False
.Replacement.Font.Subscript = False
.Replacement.Font.SmallCaps = False
.Execute Replace:=wdReplaceAll
.Font.Italic = True
.Replacement.Text = "\textit{^&}"
.Execute Replace:=wdReplaceAll
.Font.Bold = True
.Replacement.Text = "\textbf{^&}"
.Execute Replace:=wdReplaceAll
.Font.SmallCaps = True
.Replacement.Text = "\textsc{^&}"
.Execute Replace:=wdReplaceAll
.Font.Superscript = True
.Replacement.Text = "^^{^&}"
.Execute Replace:=wdReplaceAll
.Font.Subscript = True
.Replacement.Text = "_{^&}"
.Execute Replace:=wdReplaceAll
End With
Dim dict As New Scripting.Dictionary
Dim om As OMath, t, counter&, key$
key = Replace(LCase(Dir(fpath)), " ", "_omath_")
counter = 0
For Each om In ActiveDocument.OMaths
DoEvents
counter = counter + 1
Dim tKey$, texCode$
tKey = "<" & key & "_" & counter & ">"
t = om.Range.WordOpenXML
texCode = TransformString(TransformString(CStr(t), omml2mml), mml2Tex)
om.Range.Select
Selection.Delete
Selection.Text = tKey
dict.Add tKey, texCode
Next om
Dim latexDoc$, oPath$
latexDoc = "\documentclass[10pt]{article}" & vbCrLf & _
"\usepackage[utf8]{inputenc} % set input encoding" & vbCrLf & _
"\usepackage{amsmath,amssymb}" & vbCrLf & _
"\begin{document}" & vbCrLf & _
"###" & vbCrLf & _
"\end{document}"
oPath = StrReverse(Mid(StrReverse(fpath), InStr(StrReverse(fpath), "."))) & "tex"
'ActiveDocument.SaveAs FileName:=oPath, FileFormat:=wdFormatText, Encoding:=1200
'ActiveDocument.SaveAs FileName:=oPath, FileFormat:=wdFormatText, Encoding:=65001
ActiveDocument.Close
Dim c$, i
c = fso.OpenTextFile(oPath).ReadAll()
counter = 0
For Each i In dict
counter = counter + 1
Dim findText$, replaceWith$
findText = CStr(i)
replaceWith = dict.item(i)
c = Replace(c, findText, replaceWith, 1, 1, vbTextCompare)
Next i
latexDoc = Replace(latexDoc, "###", c)
Dim ost As TextStream
Set ost = fso.CreateTextFile(oPath)
ost.Write latexDoc
ProcessFile = True
End Function
Private Function CreateDOM()
Dim dom As New DOMDocument60
With dom
.async = False
.validateOnParse = False
.resolveExternals = False
End With
Set CreateDOM = dom
End Function
Private Function TransformString(xmlString$, xslPath$) As String
Dim xml, xsl, out
Set xml = CreateDOM
xml.LoadXML xmlString
Set xsl = CreateDOM
xsl.Load xslPath
out = xml.transformNode(xsl)
TransformString = out
End Function
The calling(from immediate window):
?ProcessFile("c:\test.doc")
结果将在c:\
中创建为test.tex
.
The result would be created as test.tex
in c:\
.
该模块可能需要修复一些位置.如果是这样,请告诉我.
The module may need to fix some places. If so let me know.
这篇关于如何以编程方式遍历Word文档中的下标,上标和方程式的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!