如何以编程方式遍历Word文档中的下标,上标和方程式 [英] How to programmatically iterate through subscripts,superscripts and equations found in a Word document

查看:102
本文介绍了如何以编程方式遍历Word文档中的下标,上标和方程式的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有几个Word文档,每个文档包含数百页的科学数据,其中包括:

I have a few Word documents, each containing a few hundreds of pages of scientific data which includes:

  • 化学式(带有适当下标和上标的H2SO4)
  • 科学数字(使用上标格式化的指数)
  • 许多数学方程式.使用Word中的数学方程式编辑器编写.

问题是,以Word形式存储此数据对我们而言效率不高.因此,我们希望将所有这些信息存储在数据库(MySQL)中.我们想将这些格式转换为LaTex.

Problem is, storing this data in the form of Word is not efficient for us. So we want to store all this information in a Database (MySQL). We want to convert these formatting to LaTex.

是否有任何方法可以遍历所有子脚本&上标使用VBA的方程式?

Is there any way to iterate through all the subcripts & superscripts & Equations using VBA?

如何遍历数学方程式?

推荐答案

基于您的

Based on your comment on Michael's answer

不!我只想用_ {替换下标中的内容 subscriptcontent}以及类似的带有^ {的上标内容 上标内容}.那将是Tex的等价物.现在,我只是 将所有内容复制到文本文件中,这将删除格式,但 离开这些角色.问题解决了.但是为此,我需要访问 下标&文档的上标对象

No! I just want to replace content in the subscript with _{ subscriptcontent } and similarly superscript content with ^{ superscriptcontent }. That would be the Tex equivalent. Now, I'll just copy everything to a text file which will remove the formatting but leaves these characters. Problem solved. But for that I need to access the subscript & superscript objects of document

Sub sampler()
    Selection.HomeKey wdStory
    With Selection.find
        .ClearFormatting
        .Replacement.ClearFormatting
        .Font.Superscript = True
        .Replacement.Text = "^^{^&}"
        .Execute Replace:=wdReplaceAll
        .Font.Subscript = True
        .Replacement.Text = "_{^&}"
        .Execute Replace:=wdReplaceAll
    End With
End Sub

编辑

或者如果您还想将OMaths转换为TeX / LaTeX,请执行以下操作:

Or If you also want to convert OMaths to TeX / LaTeX, then do something like:

  • Iterate over Omaths > convert each to MathML > [save MathML to disk] + [put some mark-up in doc describing MathML file's reference in place of OMath] > convert Word files as text
  • Now prepare a converter like MathParser and convert MathML files to LateX.
  • Parse text file > search and replace LaTeX code accordingly.

有关完全不同的想法,请访问

For a completely different idea visit David Carlisle's blog, that might interest you.

更新

The module

Option Explicit

'This module requires the following references:
'Microsoft Scripting Runtime
'MicroSoft XML, v6.0

Private fso As New Scripting.FileSystemObject
Private omml2mml$, mml2Tex$

Public Function ProcessFile(fpath$) As Boolean
    'convPath set to my system at (may vary on your system):
    omml2mml = "c:\program files\microsoft office\office14\omml2mml.xsl"
    'download: http://prdownloads.sourceforge.net/xsltml/xsltml_2.0.zip
    'unzip at «c:\xsltml_2.0»
    mml2Tex = "c:\xsltml_2.0\mmltex.xsl"

    Documents.Open fpath

    'Superscript + Subscript
    Selection.HomeKey wdStory
    With Selection.find
        .ClearFormatting
        .Replacement.ClearFormatting

        'to make sure no paragraph should contain any emphasis
        .Text = "^p"
        .Replacement.Text = "^&"
        .Replacement.Font.Italic = False
        .Replacement.Font.Bold = False
        .Replacement.Font.Superscript = False
        .Replacement.Font.Subscript = False
        .Replacement.Font.SmallCaps = False
        .Execute Replace:=wdReplaceAll


        .Font.Italic = True
        .Replacement.Text = "\textit{^&}"
        .Execute Replace:=wdReplaceAll

        .Font.Bold = True
        .Replacement.Text = "\textbf{^&}"
        .Execute Replace:=wdReplaceAll

        .Font.SmallCaps = True
        .Replacement.Text = "\textsc{^&}"
        .Execute Replace:=wdReplaceAll


        .Font.Superscript = True
        .Replacement.Text = "^^{^&}"
        .Execute Replace:=wdReplaceAll


        .Font.Subscript = True
        .Replacement.Text = "_{^&}"
        .Execute Replace:=wdReplaceAll
    End With

    Dim dict As New Scripting.Dictionary
    Dim om As OMath, t, counter&, key$
    key = Replace(LCase(Dir(fpath)), " ", "_omath_")
    counter = 0

    For Each om In ActiveDocument.OMaths
        DoEvents
        counter = counter + 1
        Dim tKey$, texCode$
        tKey = "<" & key & "_" & counter & ">"
        t = om.Range.WordOpenXML

        texCode = TransformString(TransformString(CStr(t), omml2mml), mml2Tex)
        om.Range.Select
        Selection.Delete
        Selection.Text = tKey

        dict.Add tKey, texCode

    Next om

    Dim latexDoc$, oPath$
    latexDoc = "\documentclass[10pt]{article}" & vbCrLf & _
                "\usepackage[utf8]{inputenc} % set input encoding" & vbCrLf & _
                "\usepackage{amsmath,amssymb}" & vbCrLf & _
                "\begin{document}" & vbCrLf & _
                "###" & vbCrLf & _
                "\end{document}"

    oPath = StrReverse(Mid(StrReverse(fpath), InStr(StrReverse(fpath), "."))) & "tex"
    'ActiveDocument.SaveAs FileName:=oPath, FileFormat:=wdFormatText, Encoding:=1200
    'ActiveDocument.SaveAs FileName:=oPath, FileFormat:=wdFormatText, Encoding:=65001
    ActiveDocument.Close

    Dim c$, i
    c = fso.OpenTextFile(oPath).ReadAll()

    counter = 0

    For Each i In dict
        counter = counter + 1
        Dim findText$, replaceWith$
        findText = CStr(i)
        replaceWith = dict.item(i)
        c = Replace(c, findText, replaceWith, 1, 1, vbTextCompare)
    Next i

    latexDoc = Replace(latexDoc, "###", c)

    Dim ost As TextStream
    Set ost = fso.CreateTextFile(oPath)
    ost.Write latexDoc

    ProcessFile = True


End Function

Private Function CreateDOM()
    Dim dom As New DOMDocument60
    With dom
        .async = False
        .validateOnParse = False
        .resolveExternals = False
    End With
    Set CreateDOM = dom
End Function

Private Function TransformString(xmlString$, xslPath$) As String
    Dim xml, xsl, out
    Set xml = CreateDOM
    xml.LoadXML xmlString
    Set xsl = CreateDOM
    xsl.Load xslPath
    out = xml.transformNode(xsl)
    TransformString = out
End Function

The calling(from immediate window):

?ProcessFile("c:\test.doc")

结果将在c:\中创建为test.tex.

The result would be created as test.tex in c:\.

该模块可能需要修复一些位置.如果是这样,请告诉我.

The module may need to fix some places. If so let me know.

这篇关于如何以编程方式遍历Word文档中的下标,上标和方程式的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆