How to get the number of consecutive occurrences of a letter in C# WinForms
Problem description
How can I count how many times a letter occurs consecutively in a string, in a C# Windows Forms application?
Example:
Case
string s = "aaaabbbcca";
Output
a-4,b-3,c-2,a-1
Note:
1) Please don't provide solutions with loops or iteration, because of performance issues.
2) I have to use GC.Collect() because the string in my real case is 10,000,000,000 characters long, so I need to free memory as soon as the data has been processed.
My code so far is:
StringBuilder Output = new StringBuilder();
int Times = 0;
char NewChar = snewbuild[0];
char Lastchar = NewChar;//Need first char
for (; snewbuild.Length > 0; )
{
    NewChar = snewbuild[0];
    if (Lastchar == NewChar)
    {
        Times++;
    }
    else
    {
        Output.Append(Lastchar + "-" + Times + ",");
        Times = 1;
        GC.Collect();
    }
    Lastchar = NewChar;
    snewbuild.Remove(0, 1);
}
Output.Append(Lastchar + "-" + Times);
Any piece of code or any new idea for this question will be appreciated. Thanks in advance.
Recommended answer
// Requires: using System.Text;
public static string TallyPhraseFrequencies(string input)
{
    // Guard against null or empty input, which input[0] below cannot handle.
    if (string.IsNullOrEmpty(input))
        return string.Empty;

    StringBuilder output = new StringBuilder();
    int frequency = 1;
    char c = input[0];
    for (int i = 1; i < input.Length; i++)
    {
        if (input[i] == c)
        {
            frequency++;
        }
        else
        {
            // The run ended: emit it and start counting the new character.
            output.Append(c).Append('-').Append(frequency).Append(',');
            c = input[i];
            frequency = 1;
        }
    }
    // Emit the final run without a trailing comma.
    output.Append(c).Append('-').Append(frequency);
    return output.ToString();
}
.NET objects cannot be larger than 2 GB. A string on a 64-bit system stores Unicode text at two bytes per character, so it would probably max out at around a billion characters.
StringBuilder has a maximum capacity equal to that of an Int32: 2,147,483,647.
While I believe you could read a 10 GB text file (5 billion two-byte Unicode characters) and process it chunk by chunk to analyze same-letter adjacent frequency, I cannot believe you can have an in-memory string of that size.
There is also simply no way to perform frequency analysis without some form of loop or iteration.
Anyhow, here's one way you could go about this:
private string parseFrequencies(string data)
{
    StringBuilder sb1 = new StringBuilder(); // working copy, consumed from the front
    StringBuilder sb2 = new StringBuilder(); // accumulated output
    sb1.Append(data);
    int count, pos;
    while (sb1.Length > 0)
    {
        char c = sb1[0];
        pos = 1;
        count = 1;
        // Count how many following characters match the first one.
        while (pos < sb1.Length)
        {
            if (sb1[pos] == c)
            {
                count++;
                pos++;
            }
            else
            {
                break;
            }
        }
        sb2.Append(string.Format("{0}-{1},", c, count));
        // Drop the run just counted so the next run starts at index 0.
        sb1.Remove(0, count);
    }
    return sb2.ToString();
}
// sample test
string freq = parseFrequencies("aaaabbbcca"); // => "a-4,b-3,c-2,a-1,"
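The chunk-by-chunk processing mentioned above could look roughly like the following. This is only a sketch, not the answerer's code: the class name RunLengthSketch, the method WriteRuns, and the bufferSize parameter are made up for illustration. It reads from any TextReader in fixed-size chunks and carries the current run across chunk boundaries, so the whole input never has to be in memory at once.

```csharp
using System;
using System.IO;

public static class RunLengthSketch
{
    // Hypothetical helper: writes "char-count" pairs for each run of
    // identical adjacent characters, reading the input in chunks.
    public static void WriteRuns(TextReader reader, TextWriter writer, int bufferSize = 64 * 1024)
    {
        char[] buffer = new char[bufferSize];
        bool haveRun = false; // false until the first character has been seen
        bool first = true;    // true until the first pair has been written
        char current = '\0';
        long count = 0;       // long, because a run in a huge file may exceed Int32.MaxValue
        int read;

        while ((read = reader.Read(buffer, 0, buffer.Length)) > 0)
        {
            for (int i = 0; i < read; i++)
            {
                char c = buffer[i];
                if (haveRun && c == current)
                {
                    count++; // same run continues, possibly across a chunk boundary
                    continue;
                }
                if (haveRun)
                {
                    WriteRun(writer, current, count, ref first);
                }
                current = c;
                count = 1;
                haveRun = true;
            }
        }

        // Flush the last run, if the input was not empty.
        if (haveRun)
        {
            WriteRun(writer, current, count, ref first);
        }
    }

    private static void WriteRun(TextWriter writer, char c, long count, ref bool first)
    {
        if (!first)
        {
            writer.Write(','); // separator between pairs, no trailing comma
        }
        writer.Write(c);
        writer.Write('-');
        writer.Write(count);
        first = false;
    }
}
```

For example, RunLengthSketch.WriteRuns(new StringReader("aaaabbbcca"), Console.Out, 4) writes a-4,b-3,c-2,a-1 even though the runs are split across 4-character chunks. For a real file, a StreamReader opened on the file would take the place of the StringReader.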
The required loops can be tidied up a bit, for example:
public static void RunLengthEncode(string s) {
    Console.WriteLine(s);
    int pos = 0;
    while (pos < s.Length) {
        char c = s[pos];
        int startPos = pos;
        for (; pos < s.Length && c == s[pos]; ++pos) ;
        int runLength = pos - startPos;
        // example output
        Console.Write("{0}-{1},", c, runLength);
    }
    Console.WriteLine();
}
Alan.
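On the question's request for a solution without loops: code can be written so that no loop is visible, but the iteration still happens inside the library. The sketch below is not taken from any of the answers; the class name RegexRunsSketch and the method Encode are made up for illustration. It uses a regular expression with a backreference to find each run, and the regex engine still scans the string character by character, so it is unlikely to be faster than the explicit loops above.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class RegexRunsSketch
{
    // Hypothetical helper: returns "char-count" pairs joined by commas.
    public static string Encode(string input)
    {
        if (string.IsNullOrEmpty(input))
        {
            return string.Empty;
        }

        // (.) captures one character; \1* matches any further repeats of it.
        // Singleline lets '.' match newline characters as well.
        var runs = Regex.Matches(input, @"(.)\1*", RegexOptions.Singleline)
                        .Cast<Match>()
                        .Select(m => m.Value[0] + "-" + m.Length);

        return string.Join(",", runs);
    }
}
```

Calling RegexRunsSketch.Encode("aaaabbbcca") returns "a-4,b-3,c-2,a-1".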