Probability of a value being of a given one or another type (C#)


Problem description

I'm making a program to generate a class with properties based on a dataset (nothing fancy, just a tool to simplify life).

Right now I'm testing with CSV data of stars, as it has a lot of columns and the data is somewhat messy.

What I'm doing right now is taking the dataset, extracting the headers, and then generating a class (about 30 lines removed for brevity):

class 
{
	public int Id { get; set; }
	public string Hip { get; set; }
	public string Hd { get; set; }
	public string Hr { get; set; }
	public decimal Ci { get; set; }
	public decimal X { get; set; }
	public decimal Y { get; set; }
	public decimal Z { get; set; }
}
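
As a rough illustration of the generation step described above, here is a minimal sketch that turns header names and type names into class source text. The names `ClassSourceSketch`, `Generate`, and `className` are hypothetical and not taken from the project; the sketch only capitalizes the first letter of each header and does not check that the result is a valid C# identifier.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class ClassSourceSketch
{
    // Hypothetical helper: builds class source from (header, type name) pairs.
    public static string Generate(string className, IEnumerable<KeyValuePair<string, string>> headersAndTypes)
    {
        var sb = new StringBuilder();
        sb.AppendLine($"class {className}");
        sb.AppendLine("{");

        foreach (var pair in headersAndTypes)
        {
            string name = pair.Key.Trim();
            if (name.Length == 0) continue; // skip empty headers

            // "id" -> "Id"; no further identifier validation is attempted here.
            string property = char.ToUpperInvariant(name[0]) + name.Substring(1);
            sb.AppendLine($"\tpublic {pair.Value} {property} {{ get; set; }}");
        }

        sb.AppendLine("}");
        return sb.ToString();
    }
}
```

Passing a `Dictionary<string, string>` such as `{ "id": "int", "ra": "decimal" }` works directly, since a dictionary enumerates as key/value pairs.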


This is done by looping through the first row of data, evaluating the cells with this:

public static string Evaluate(string input)
{
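    // Default when nothing else matches.
    string output = "string";

    // The checks are not exclusive: the last one that succeeds wins,
    // so a whole number ends up as "int" even though it also parses as a decimal.
    if (decimal.TryParse(input, out var result)) { output = "decimal"; }
    if (DateTime.TryParse(input, out var result1)) { output = "DateTime"; }
    if (int.TryParse(input, out var result2)) { output = "int"; }

    // An empty cell carries no type information.
    if (input == "") { output = "string"; }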

    return output;
}


It's a bit clunky, but this isn't a beauty contest.

What I need to do is evaluate the first 5 rows of data, then work out which datatype is the most likely. It turns out that just sampling the first line is a bad idea, so I'm looking for the type that is correct for at least three out of five lines (if no type reaches that, it's a string).
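
The "three out of five" rule could be sketched roughly as follows. This is only one possible approach, not the project's code: `ColumnTypeGuesser`, `EvaluateCell`, and `GuessColumnType` are hypothetical names, and parsing with `CultureInfo.InvariantCulture` is an assumption so that `0.00006` is read the same way regardless of the machine's locale.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

public static class ColumnTypeGuesser
{
    // Hypothetical helper: classifies one cell, trying the narrowest type first.
    public static string EvaluateCell(string input)
    {
        if (string.IsNullOrWhiteSpace(input)) return "string";
        if (int.TryParse(input, NumberStyles.Integer, CultureInfo.InvariantCulture, out _)) return "int";
        if (decimal.TryParse(input, NumberStyles.Number, CultureInfo.InvariantCulture, out _)) return "decimal";
        if (DateTime.TryParse(input, CultureInfo.InvariantCulture, DateTimeStyles.None, out _)) return "DateTime";
        return "string";
    }

    // Returns the type that at least `threshold` of the first `sampleSize` cells agree on,
    // or "string" when no type reaches the threshold.
    public static string GuessColumnType(IEnumerable<string> cells, int sampleSize = 5, int threshold = 3)
    {
        var top = cells.Take(sampleSize)
                       .Select(EvaluateCell)
                       .GroupBy(label => label)
                       .OrderByDescending(g => g.Count())
                       .FirstOrDefault();

        return top != null && top.Count() >= threshold ? top.Key : "string";
    }
}
```

With the sample data, the `ra` column (`0`, `0.00006`, `0.000283`, `0.000335`, `0.000569`) gets one "int" vote and four "decimal" votes, so the column is classified as decimal even though the first row alone would have suggested int.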

The data:

id	hip	hd	ra	dec	dist	pmra	pmdec
0			0	0	0	0	0
1	1	224700	0.00006	1.089009	219.7802	-5.2	-1.88
2	2	224690	0.000283	-19.49884	47.9616	181.21	-0.93
3	3	224699	0.000335	38.859279	442.4779	5.24	-2.91
4	4	224707	0.000569	-51.893546	134.2282	62.85	0.16


Note: The first data line is Sol, giving it 0 as the value in columns otherwise filled with decimals, so checking only the first line (or any single line) would be folly, as it wouldn't handle a whole number that isn't written as 0.0.
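
A plain majority vote can still go wrong when whole numbers and decimals are mixed more evenly than in the sample. One way to handle that, sketched here as an assumption rather than as the project's method, is to let "int" votes count toward "decimal" whenever at least one decimal was seen, since every whole number is also a valid decimal. `TypeVotes` and `Resolve` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TypeVotes
{
    // labels: per-cell results such as "int", "decimal", "DateTime" or "string".
    public static string Resolve(IEnumerable<string> labels, int threshold)
    {
        List<string> list = labels.ToList();
        int ints = list.Count(l => l == "int");
        int decimals = list.Count(l => l == "decimal");
        int dates = list.Count(l => l == "DateTime");

        // If any cell needs a decimal, whole numbers are widened to decimal as well.
        if (decimals > 0 && ints + decimals >= threshold) return "decimal";
        if (ints >= threshold) return "int";
        if (dates >= threshold) return "DateTime";
        return "string";
    }
}
```

Under this rule a column with three whole numbers and one decimal becomes decimal instead of int, which avoids losing the fractional values later.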

The GitHub Project: https://github.com/frankhaugen/class-from-dataset

What I have tried:

Yes, I've googled, but either I lack the right words or there isn't any information on this.

I've also looked into using ML.NET for the task, but it seems like overkill (though it might be fun).

Solution

Try this article: CSV/Excel File Parser - A Revisit

It creates a strongly-typed DataTable either by determining the types itself, or by programmer-specified types.
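
For context, this is roughly what a strongly-typed `DataTable` looks like once the column types are known. It is a minimal sketch and not the linked article's code: `TypedTableSketch`, `Build`, and the type-name mapping are assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

public static class TypedTableSketch
{
    // Maps the type names used in this thread to CLR types (assumed mapping).
    static readonly Dictionary<string, Type> ClrTypes = new Dictionary<string, Type>
    {
        ["int"] = typeof(int),
        ["decimal"] = typeof(decimal),
        ["DateTime"] = typeof(DateTime),
        ["string"] = typeof(string)
    };

    // Builds an empty table whose columns carry the inferred types.
    public static DataTable Build(string tableName, IEnumerable<KeyValuePair<string, string>> headersAndTypes)
    {
        var table = new DataTable(tableName);
        foreach (var pair in headersAndTypes)
        {
            // Unknown type names fall back to string.
            Type type = ClrTypes.TryGetValue(pair.Value, out var known) ? known : typeof(string);
            table.Columns.Add(new DataColumn(pair.Key, type));
        }
        return table;
    }
}
```

Once the columns are typed, adding a row with a value that cannot be converted to the column type throws, which surfaces wrong guesses early.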


I looked at the comments, and they started me down the right path.

I changed over from Dictionaries and Lists to a DataTable (it's already set up for 2-dimensional data). My code needs a lot of cleaning and commenting, but now it does what I want (mostly).

My code for creating the DataTable:

static void GenerateDataTable()
{
    dt.TableName = "Stars";
    string[] heads = inputData[0].Replace(" ", "").Split(',');

    foreach (string head in heads)
    {
        dt.Columns.Add(new DataColumn() { ColumnName = head });
    }

    for (int i = 1; i < inputData.Count; i++)
    {
        dt.Rows.Add(inputData[i].Replace(" ", "").Split(','));
    }

    Dictionary<int, string> testDict = new Dictionary<int, string>();

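    // Sample up to the first 500 rows of each column and keep the most common type.
    // Capping at dt.Rows.Count avoids an out-of-range error on smaller files.
    int sampleSize = Math.Min(500, dt.Rows.Count);

    for (int i = 0; i < dt.Columns.Count; i++)
    {
        List<string> testList = new List<string>();
        for (int j = 0; j < sampleSize; j++)
        {
            testList.Add(EvaluateVariableType.Evaluate(dt.Rows[j].ItemArray[i].ToString()));
        }

        testDict.Add(i, MostOccurences(testList));
    }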

    for (int i = 0; i < testDict.Count; i++)
    {
        HeadersAndTypes.Add(heads[i], testDict[i]);

        Console.WriteLine(testDict[i]);
    }
}


To count the occurrences I did this (stolen from Stack Exchange):

static string MostOccurences(List<string> input)
{
    string output;

    try
    {
        var groupsWithCounts = from s in input
                               group s by s into g
                               select new
                               {
                                   Item = g.Key,
                                   Count = g.Count()
                               };

        var groupsSorted = groupsWithCounts.OrderByDescending(g => g.Count);
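        string mostFrequent = groupsSorted.First().Item;

        output = mostFrequent;
    }
    catch (Exception)
    {
        // First() throws on an empty list; fall back to "string" rather than
        // returning the exception message as if it were a type name.
        output = "string";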
    }

    return output;
}

