How can I remove any UTF-8 BOM that exists -within- some text, not at the start of the text?
Problem description
We receive some files that have been concatenated by another party. In the middle of these files are some BOM
characters.
Is there a way we can detect these 3 bytes (0xEF 0xBB 0xBF) and remove them? I've seen plenty of examples of how to
remove the BOM from the -start- of a file ... but not from the middle.
Recommended answer
Assuming that your file is small enough to hold in memory, and that you have an Enumerable.Replace
extension method for replacing subsequences (LINQ itself does not provide one), you could use:
var bytes = File.ReadAllBytes(filePath);
var bom = new byte[] { 0xEF, 0xBB, 0xBF };
var empty = Enumerable.Empty<byte>();
bytes = bytes.Replace(bom, empty).ToArray();
File.WriteAllBytes(filePath, bytes);
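If the file is too large to hold in memory, a streaming pass is one possible alternative. The following is a minimal sketch, not part of the original answer: the class name BomStreamFilter and the method CopyWithoutBoms are hypothetical, and it assumes the input is well-formed UTF-8, where the bytes EF BB BF can only encode U+FEFF.

```csharp
using System.IO;

// Hypothetical helper, not from the original answer.
public static class BomStreamFilter
{
    private static readonly byte[] Bom = { 0xEF, 0xBB, 0xBF };

    // Copies input to output, dropping every EF BB BF sequence,
    // including one at the very start of the stream.
    public static void CopyWithoutBoms(Stream input, Stream output)
    {
        int matched = 0; // number of BOM bytes currently held back (0..2)
        int b;
        while ((b = input.ReadByte()) != -1)
        {
            if (b == Bom[matched])
            {
                matched++;
                if (matched == Bom.Length)
                    matched = 0; // complete BOM seen: drop it
                continue;
            }

            // Mismatch: the held-back bytes were not part of a BOM, so emit them.
            output.Write(Bom, 0, matched);
            matched = 0;

            if (b == Bom[0])
                matched = 1; // the current byte may start a new BOM
            else
                output.WriteByte((byte)b);
        }

        // Emit any partial match left at the end of the stream.
        output.Write(Bom, 0, matched);
    }
}
```

One way to use it is to open the source with File.OpenRead, write to a temporary file created with File.Create, and replace the original once the copy has finished. Because only two bytes are ever held back, memory use stays constant regardless of file size.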
Here is a simple (inefficient) implementation of the Replace
extension method used above:
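using System.Collections.Generic;
using System.Linq;

// Extension methods must live in a static class; the class name is arbitrary.
public static class EnumerableExtensions
{
    // Replaces every occurrence of match in source with replacement,
    // comparing elements with the default equality comparer.
    public static IEnumerable<TSource> Replace<TSource>(
        this IEnumerable<TSource> source,
        IEnumerable<TSource> match,
        IEnumerable<TSource> replacement)
    {
        return Replace(source, match, replacement, EqualityComparer<TSource>.Default);
    }

    public static IEnumerable<TSource> Replace<TSource>(
        this IEnumerable<TSource> source,
        IEnumerable<TSource> match,
        IEnumerable<TSource> replacement,
        IEqualityComparer<TSource> comparer)
    {
        int sLength = source.Count();
        int mLength = match.Count();
        if (sLength < mLength || mLength == 0)
            return source;

        // Test every possible start index by re-enumerating the source.
        // Simple, but slow for large inputs.
        int[] matchIndexes = (
            from sIndex in Enumerable.Range(0, sLength - mLength + 1)
            where source.Skip(sIndex).Take(mLength).SequenceEqual(match, comparer)
            select sIndex
        ).ToArray();

        var result = new List<TSource>();
        int sPosition = 0;
        foreach (int mPosition in matchIndexes)
        {
            // Ignore a match that overlaps one already replaced.
            if (mPosition < sPosition)
                continue;

            // Copy the elements between the previous match and this one.
            var sPart = source.Skip(sPosition).Take(mPosition - sPosition);
            result.AddRange(sPart);
            result.AddRange(replacement);
            sPosition = mPosition + mLength;
        }

        // Copy whatever follows the last match.
        var sLastPart = source.Skip(sPosition).Take(sLength - sPosition);
        result.AddRange(sLastPart);
        return result;
    }
}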
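Since the generic Replace above is described as inefficient, a byte-specific single pass may be preferable when only the UTF-8 BOM needs removing. The sketch below is an illustration rather than the original author's method: BomRemover, RemoveBoms, and the keepLeading parameter are hypothetical names, and keepLeading is an assumption about what the question wants (strip BOMs in the middle while optionally preserving one at the start).

```csharp
using System.IO;

// Hypothetical helper, not from the original answer.
public static class BomRemover
{
    // Returns a copy of data with every EF BB BF sequence removed.
    // When keepLeading is true, a BOM at offset 0 is preserved.
    public static byte[] RemoveBoms(byte[] data, bool keepLeading = false)
    {
        using (var output = new MemoryStream(data.Length))
        {
            int i = 0;
            while (i < data.Length)
            {
                bool isBom = i + 2 < data.Length
                    && data[i] == 0xEF
                    && data[i + 1] == 0xBB
                    && data[i + 2] == 0xBF;

                if (!isBom)
                {
                    // Ordinary byte: copy it through.
                    output.WriteByte(data[i]);
                    i++;
                }
                else
                {
                    // BOM found: keep it only if it is the leading one
                    // and the caller asked for that.
                    if (keepLeading && i == 0)
                        output.Write(data, 0, 3);
                    i += 3;
                }
            }
            return output.ToArray();
        }
    }
}
```

Usage mirrors the earlier snippet: read with File.ReadAllBytes, pass the array to BomRemover.RemoveBoms(bytes, keepLeading: true), and write the result back with File.WriteAllBytes. Each input byte is visited once, so the running time grows linearly with file size.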