XML Node InnerText Whitespace / Tabs


Question

Thanks to the help yesterday, I am now accessing my XML nodes. The problem is, I'm receiving too much white space, even with XML.PreserveWhitespace = false;

A sample node looks like this:

<Chapter Title="Chapter 1">
	<snippet>
		<sKey>function</sKey>
		<sVal>
			def myNewFunc():
				print("hello world")
		</sVal>
	</snippet>
</Chapter>

When I paste sVal.InnerText into my multi-line textbox, it brings all of the white space with it, including the three levels of indentation between the document root and the contents of sVal. As you can see, the value I am pasting is Python code, so indentation is important. Rather than writing a function to scrub the text I get from the XML, I'm looking for a way to keep the meaningful white space while ignoring the indentation level of sVal. My quick fix is string.Replace("\t\t\t", ""), but I'm wondering whether there is a better way to go about it.

Desired result:

def myNewFunc():
	print("hello world")

Thanks in advance!

Recommended answer

It really depends on how you're parsing the data, but whitespace is expected when you retrieve InnerText. Personally, I always call Trim on the property, which eliminates any leading and trailing whitespace. In your case, though, you said the whitespace is significant. Trim only strips whitespace at the very start and end of the whole string, so it does nothing about the indentation of the lines in between; if I understand your requirement correctly, you need to remove the same amount of whitespace from each line. What I would probably do is call Split on the InnerText property and split on newlines, which gives you the individual lines of text inside the node. Then count the whitespace characters at the beginning of the first line that contains text and remove that same amount from the remaining lines. This effectively unindents the text.

You should handle two cases.


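  1. Subsequent lines might not be indented, so don't blindly remove the same number of characters from every line; remove a character only if it is whitespace (a space or a tab). An extension method could be useful here.
  2. A line might not have that many characters, so handle the case where the line is shorter than the count.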

I'm thinking a simple Unindent extension method could handle both of these situations. Here's a first pass at an unindent function (as extension method). You'll want to test and optimize it.

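using System;
using System.Collections.Generic;

// Extension methods must be declared in a non-nested static class.
public static class StringExtensions
{
    // Removes up to `count` leading whitespace characters from a single line.
    public static string Unindent ( this string source, int count )
    {
        // Nothing to remove
        if (count <= 0)
            return source;

        // The line is no longer than the indent: strip whatever whitespace it has
        if (source.Length <= count)
            return source.Trim();

        var leadingSpaces = 0;
        foreach (var ch in source)
        {
            // Only whitespace is removed; stop at the first real character
            if (Char.IsWhiteSpace(ch))
                ++leadingSpaces;
            else
                break;

            //Stop when we get to our count (optimizes scanning the string)
            if (leadingSpaces == count)
                break;
        }

        return source.Remove(0, leadingSpaces);
    }

    // Applies the single-line Unindent to every line in a sequence.
    public static IEnumerable<string> Unindent ( this IEnumerable<string> source, int count )
    {
        foreach (var item in source)
            yield return item.Unindent(count);
    }
}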
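
A minimal sketch of how the whole approach might fit together, assuming the sample document from the question is loaded into an XmlDocument. The names UnindentDemo, CountLeadingWhitespace, and UnindentBlock are hypothetical and chosen only for illustration. It splits InnerText on newlines, drops the blank lines at the start and end, measures the indentation of the first remaining line, and removes at most that many whitespace characters from each line.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml;

public static class UnindentDemo
{
    // Hypothetical helper: number of leading whitespace characters on a line.
    public static int CountLeadingWhitespace(string line)
    {
        var n = 0;
        while (n < line.Length && char.IsWhiteSpace(line[n]))
            ++n;
        return n;
    }

    // Hypothetical helper: removes the first text line's indentation from every line.
    public static string UnindentBlock(string text)
    {
        // Normalize line endings, then split into lines.
        var lines = text.Replace("\r\n", "\n").Split('\n').ToList();

        // Drop whitespace-only lines at the start and end of the block.
        while (lines.Count > 0 && lines[0].Trim().Length == 0)
            lines.RemoveAt(0);
        while (lines.Count > 0 && lines[lines.Count - 1].Trim().Length == 0)
            lines.RemoveAt(lines.Count - 1);
        if (lines.Count == 0)
            return string.Empty;

        // The first remaining line sets the indentation to remove.
        var indent = CountLeadingWhitespace(lines[0]);

        // Remove at most `indent` characters, and only whitespace ones,
        // so short or unindented lines are left intact.
        var result = lines.Select(line =>
            line.Substring(Math.Min(indent, CountLeadingWhitespace(line))).TrimEnd());

        return string.Join(Environment.NewLine, result);
    }

    public static void Main()
    {
        // The sample node from the question, indented with tabs.
        var xml = "<Chapter Title=\"Chapter 1\">\n\t<snippet>\n\t\t<sKey>function</sKey>\n"
                + "\t\t<sVal>\n\t\t\tdef myNewFunc():\n\t\t\t\tprint(\"hello world\")\n\t\t</sVal>\n"
                + "\t</snippet>\n</Chapter>";

        var doc = new XmlDocument();
        doc.LoadXml(xml);

        var sVal = doc.SelectSingleNode("/Chapter/snippet/sVal");
        Console.WriteLine(UnindentBlock(sVal.InnerText));
    }
}
```

With the sample node, this should leave the def line flush left and the print line indented by a single tab. Measuring the indent from the first text line, rather than hard-coding three tabs, means the same call keeps working if the snippet is nested at a different depth.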

Michael Taylor

http://www.michaeltaylorp3.net

