C# - Remove spaces in HTML source in between markups?
Problem description
I am currently working on a program that lets me enter HTML source code into a RichTextBox control and removes the whitespace between tags. The only problem is that I am not sure how to tell the whitespace BETWEEN the tags apart from the whitespace INSIDE them. Obviously, removing the spaces inside the tags would be bad. Any ideas on how I can tell the difference?
Example: (before white space is removed)
<p>blahblahblah</p> <p>blahblahblah</p>
Example: (after white space is removed)
<p>blahblahblah</p><p>blahblahblah</p>
The solution in the link that Rasik posted works for you too:
string result = Regex.Replace(html, @"\s*(<[^>]+>)\s*", "$1", RegexOptions.Singleline);
The regular expression matches each tag together with any whitespace around it and replaces the whole match with just the tag, so the surrounding whitespace is dropped.
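To see what that one-liner does, here is a minimal console sketch; the class name TagWhitespaceDemo and the method name CollapseAroundTags are hypothetical. It runs the pattern on the question's example and on a second input that shows a side effect: because the pattern trims whitespace on both sides of every tag, it also removes a space that belongs to the text next to an inline tag.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TagWhitespaceDemo
{
    // Replaces "optional whitespace + tag + optional whitespace" with just the tag,
    // so whitespace on both sides of every tag is removed.
    public static string CollapseAroundTags(string html) =>
        Regex.Replace(html, @"\s*(<[^>]+>)\s*", "$1", RegexOptions.Singleline);

    public static void Main()
    {
        // The question's example: prints <p>blahblahblah</p><p>blahblahblah</p>
        Console.WriteLine(CollapseAroundTags("<p>blahblahblah</p> <p>blahblahblah</p>"));

        // Side effect: the space after </b> is part of the text, but it is removed too.
        // Prints <p><b>Hello</b>world</p>
        Console.WriteLine(CollapseAroundTags("<p><b>Hello</b> world</p>"));
    }
}
```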
Edit: A better solution that works for Micheal's example:
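// .*? is lazy, so each element is matched on its own; a greedy .* would stretch
// from the first opening tag to the last matching closing tag and leave the gaps in between.
string result = Regex.Replace(txtSource.Text,
    @"\s*(?<capture><(?<markUp>\w+)>.*?<\/\k<markUp>>)\s*", "${capture}", RegexOptions.Singleline);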
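This regular expression matches each element from its opening tag to the matching closing tag, leaves what is inside unchanged, and removes the whitespace outside it. There are other cases to look at too, such as tags without a closing tag (for example <br>) and opening tags with attributes, which the <(?<markUp>\w+)> part does not match.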
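Both patterns above trim whitespace next to tags, including whitespace that touches text. A different, simpler technique (not the one from the linked answer) is to remove whitespace only where it sits directly between a > and the next <, so any run that touches text is left alone, and tags without a closing tag or with attributes need no special handling. This is a sketch under the assumption that such whitespace is never significant in your input: it is still a regex, not an HTML parser, so it also collapses whitespace between tags inside <pre> or <textarea> content, and it removes the gap in <b>a</b> <i>b</i> even though a browser renders that as a space. The names BetweenTagsDemo and RemoveWhitespaceBetweenTags are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BetweenTagsDemo
{
    // Removes a run of whitespace only when it sits directly between '>' and '<',
    // i.e. between two tags with no text in between. Text content is never touched.
    public static string RemoveWhitespaceBetweenTags(string html) =>
        Regex.Replace(html, @">\s+<", "><");

    public static void Main()
    {
        // The question's example: prints <p>blahblahblah</p><p>blahblahblah</p>
        Console.WriteLine(RemoveWhitespaceBetweenTags("<p>blahblahblah</p> <p>blahblahblah</p>"));

        // Tags without a closing tag and tags with attributes work as well:
        // prints <p class="x">a<br><br>b</p>
        Console.WriteLine(RemoveWhitespaceBetweenTags("<p class=\"x\">a<br>\n  <br>b</p>"));

        // The space after </b> is followed by text, not by a tag, so it stays:
        // prints <p><b>Hello</b> world</p>
        Console.WriteLine(RemoveWhitespaceBetweenTags("<p><b>Hello</b> world</p>"));
    }
}
```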