Trim() doesn't work with tables

Question

I had to solve a little problem today (trimming the trailing whitespace that a PDF converter had added to each and every cell of an MS Word document), and I quickly found out that this isn't possible using the standard Word interface, so I wrote a small VBA script:

Sub TrimCellSpaces()
    Dim itable As Table
    Dim C As Cell
    For Each itable In ThisDocument.Tables
        For Each C In itable.Range.Cells
            C.Range.Text = Trim(C.Range.Text)
        Next
    Next
End Sub

I was surprised that not only did this fail to remove the trailing spaces, it even added paragraph markers at the end of each cell. So I tried a regex approach:

Sub TrimCellSpaces()
    Dim myRE As New RegExp
    Dim itable As Table
    Dim C As Cell
    myRE.Pattern = "\s+$"
    For Each itable In ThisDocument.Tables
        For Each C In itable.Range.Cells
            With myRE
                C.Range.Text = .Replace(C.Range.Text, "")
            End With
        Next
    Next
End Sub

Same result. I added a breakpoint, copied the value of C.Range.Text (before replacement) into a hex editor and found that it ended in the hex sequence 0D 0D 07 (07 is the ASCII Bell character (!)).

I changed the regex to \s+(?!.*\w), and the script worked flawlessly. After the replace operation, the value of C.Range.Text ended only in 0D 07 (one 0D fewer).

I also tried this with a newly created table, not one generated by Word's PDF importer - same results.

What's going on here? Is Word using 0D 0D 07 as an "end of cell" marker? Or is it 0D 07? Why did \s+ remove only one 0D?

Recommended answer

All cells in Word end in ANSI 13 + ANSI 07 - it's the "end of cell" marker (a little "sunshine" if you have the display of non-printing characters turned on in the UI). Word uses this for structuring the table and storing cell-related information.
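To see the marker for yourself without a hex editor, something like the following sketch can print the character codes at the end of a cell's text to the Immediate window. The procedure name ShowCellEndCodes is made up for illustration, and the sketch assumes the document containing the macro has at least one table whose first cell is not merged away.

```vba
' Hypothetical helper: prints the codes of the last (up to) three
' characters of the first cell of the first table.
Sub ShowCellEndCodes()
    Dim s As String
    Dim i As Long
    Dim code As Long

    If ThisDocument.Tables.Count = 0 Then Exit Sub

    s = ThisDocument.Tables(1).Cell(1, 1).Range.Text

    ' Start three characters from the end, or at 1 for shorter strings.
    For i = IIf(Len(s) > 3, Len(s) - 2, 1) To Len(s)
        ' AscW can return negative values for high code points; mask to 16 bits.
        code = AscW(Mid$(s, i, 1)) And &HFFFF&
        Debug.Print "position " & i & ": " & code & " (hex " & Hex$(code) & ")"
    Next i
End Sub
```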

It's not possible to remove this character combination from the table cells - Word requires it. If you could remove it, the table would break. So Word simply prevents you from deleting them.

If you need table cell content as a text string you basically need to check the character codes of the last two characters and remove them before you use the string. You need to check the two characters because Microsoft changed the way text is returned from a cell a few versions back. Sometimes it returns only one of the characters, sometimes both, depending on how you pick up the information and which version of Word is involved.
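As a rough sketch of that check applied to the original task: strip the marker characters from the string first, trim what is left, and only then write it back. The names CellTextWithoutMarker and TrimCellSpacesSketch are invented for this example. It uses RTrim$, which removes trailing spaces only (not tabs or other whitespace), and assigning to Range.Text may discard mixed character formatting inside a cell, so treat it as a starting point rather than a finished solution.

```vba
' Hypothetical helper: returns the cell text without the trailing
' end-of-cell marker, whether Word returned one or both characters.
Function CellTextWithoutMarker(ByVal s As String) As String
    If Right$(s, 2) = vbCr & Chr$(7) Then
        ' Both marker characters are present: Chr(13) followed by Chr(7).
        s = Left$(s, Len(s) - 2)
    ElseIf Right$(s, 1) = Chr$(7) Or Right$(s, 1) = vbCr Then
        ' Only one of the two characters was returned.
        s = Left$(s, Len(s) - 1)
    End If
    CellTextWithoutMarker = s
End Function

Sub TrimCellSpacesSketch()
    Dim tbl As Table
    Dim c As Cell
    Dim txt As String

    For Each tbl In ThisDocument.Tables
        For Each c In tbl.Range.Cells
            ' Remove the marker before trimming, so the spaces are
            ' really at the end of the string.
            txt = CellTextWithoutMarker(c.Range.Text)
            ' Word keeps its own end-of-cell marker when the text is replaced.
            c.Range.Text = RTrim$(txt)
        Next c
    Next tbl
End Sub
```

One caveat of this sketch: if Word returns only a single marker character and the cell's own content ends in a paragraph mark, the two cases cannot be told apart from the string alone.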
