为什么Office OpenXML在标签之间拆分文本,以及如何防止文本拆分? [英] Why Office OpenXML splits text between tags and how to prevent it?

查看:206
本文介绍了为什么Office OpenXML在标签之间拆分文本,以及如何防止文本拆分?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我目前正在尝试使用 PHPWord 库及其模板系统来处理docx文件.我已经找到并更新了指向该库的,可以使用表的路径(请记住名称,但并不重要),该路径可以与表一起使用(复制其行,然后在每行上使用PHPWord的标准setValue()).

I'm currently trying to work with docx files using PHPWord library and its templating system. I have found and updated someones (cant remember the name, but its not important) path to this library that can work with tables (replicate its rows and then use standard setValue() from PHPWord on each of row).

如果我创建自己的文档,则xml中的数据具有正常结构,因此要替换的变量$ {variable}在其自己的标记中,如下所示:

If i create my own document, the data in xml is in normal structure, so the variable to be replaced ${variable} is in its own tag like this:

<w:tbl>
    <w:tr>
        ...
         ${variable}
    </w:tr>
</w:tbl>

我简化了代码,在实际代码中,还有许多其他标签说明了尺寸,样式等.

I simplified the code, in actual code there is number of other tags descibing sizes, styles, etc.

我的问题是我必须从其他人那里获取文件,而我被禁止做大的更改,我得到的文件在某些​​时候是一张有空白行的表.我添加$ {variable}变量并通过PHPWord运行它.问题是,它失败了.经过研究,我发现源XML看起来像这样:

My problem is i have to proccess documents from other people where i am prohibited to make big changes, I get a document where at some point they is a table with one blank row. I add the ${variable} variables and run it through PHPWord. Problem is, that it fails. After doing some research , I found out that the source XML looks like this:

    ....
        ...
         ${va

        ...
         riab

        ...
         le}
    ....

(再次简化,但您可以看到图片)

(again heavily simplified, but you get the picture)

这个结构对我来说是个问题,因为克隆行的函数使用strpos(),substr()和正则表达式来工作,并且不适用于此结构(而且我无法想象这样做的优雅方法).

This structure is a problem for me, because the function to clone rows uses strpos(), substr() and regular expressions to work and does not work with this structure (and I cant imagine elegant way to do it so).

问题是-有人知道docx为什么这样做以及如何防止他吗?我正在寻找通过单词而不是PHP的解决方案(我需要当前的函数才能进行大量编辑,而无需进行大量编辑)

So the question is - Does anybody know why docx does this and how to prevent him? I am looking for a solution via word, not PHP (I need current functions to work without much editing)

推荐答案

我为此问题做了很多工作:

I have worked with this problem a lot:

换句话说,文档可以像这样保存

In word, the document can be saved like this

  <w:t>{</w:t>...
  <w:t>variable</w:t>
  <w:t>}</w:t>

因此,我创建了一个即使变量名称被分割也可以使用的JS库: Docxtemplater (也可以在服务器端).我在开发过程中发现,如果满足以下条件,则不会拆分变量名称:

I have therefore create a JS library that works even if variable names are splitted: Docxtemplater (works server side too) . What I have found out during development is that variables names aren't splitted if:

  • 要查找的文本仅由a-zA-Z字符组成(不包含{,$或})
  • 如果文本不是一键书写,则文本可能会被分割:例如,如果您犯了拼写错误,并输入$ {varuable},然后进行编辑-> $ {variable},则文本内部xml很可能会被拆分.基本上,您必须一键编写变量名,如果要编辑它,请完全重写变量名.

我不认为有一种方法可以用Word中的一个命令来修复docx文档,但是重写变量以将其写入一次Stroke应该可以.

I don't think there's a way to fix a docx document with one command in Word, , but rewriting the variables to write them in one Stroke should work.

这篇关于为什么Office OpenXML在标签之间拆分文本,以及如何防止文本拆分?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆