Generate tree structure from CSV

Question

I have been scratching my head over this problem for a while now. I am trying to generate a tree hierarchy from a set of CSV data. The CSV data is not necessarily ordered. It looks something like this:

Header: Record1,Record2,Value1,Value2
Row: A,XX,22,33
Row: A,XX,777,888
Row: A,YY,33,11
Row: B,XX,12,0
Row: A,YY,13,23
Row: B,YY,44,98

I am trying to make the grouping as flexible as possible. The simplest form of grouping would be by Record1 and then Record2, with Value1 and Value2 stored under Record2, so that we get the following output:

Record1
    Record2
        Value1 Value2

This would be:

A
    XX
        22,33
        777,888
    YY
        33,11
        13,23
B
    XX
        12,0
    YY
        44,98 

At present I am storing my group settings in a List, and I don't know whether this is hindering my thinking. The list contains a hierarchy of the groups, for example:

Record1 (SchemaGroup)
    .column = Record1
    .columns = null
    .childGroups =
        Record2 (SchemaGroup)
            .column = Record2
            .columns = Value1 (CSVColumnInformation), Value2 (CSVColumnInformation)
            .childGroups = null

The code for this looks as follows:

private class SchemaGroup {
    private SchemaGroupType type = SchemaGroupType.StaticText;  // default to text
    private String text;
    private CSVColumnInformation column = null;
    private List<SchemaGroup> childGroups = new ArrayList<SchemaGroup>();
    private List<CSVColumnInformation> columns = new ArrayList<CSVColumnInformation>();
}


private enum SchemaGroupType {
    /** Allow fixed text groups to be added */
    StaticText,
    /** Related to a column with common value */
    ColumnGroup
}

I am struggling to produce an algorithm for this and trying to think of the underlying structure to use. At present I am parsing the CSV top to bottom, using my own wrapper class:

CSVParser csv = new CSVParser(content);
String[] line;
while((line = csv.readLine()) != null ) {
    ...
}

I am just trying to kick start my coding brain.

Any ideas?

Recommended answer

The basic idea isn't difficult: group by the first record, then by the second record, etc. until you get something like this:

(A,XX,22,33)
(A,XX,777,888)
-------------------------
(A,YY,33,11)
(A,YY,13,23)
=============
(B,XX,12,0)
-------------------------
(B,YY,44,98)

and then work backwards to build the trees.
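One way to arrive at the grouped layout shown above is to sort the rows by their "record" columns, left to right, so that rows sharing the same records become adjacent. The following is only a minimal sketch: the class name `SortByRecords`, the method `sortByRecords`, and the representation of each row as a `String[]` are assumptions made for illustration, not part of the question's code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class SortByRecords {

    // Returns a copy of rows sorted by the first numberOfRecords columns
    // (hypothetical helper; assumes numberOfRecords >= 1 and that every
    // row has at least that many columns).
    static List<String[]> sortByRecords(List<String[]> rows, int numberOfRecords) {
        Comparator<String[]> byRecords = Comparator.comparing((String[] r) -> r[0]);
        for (int i = 1; i < numberOfRecords; i++) {
            final int col = i;
            byRecords = byRecords.thenComparing((String[] r) -> r[col]);
        }
        List<String[]> sorted = new ArrayList<>(rows);
        // List.sort is stable, so rows with equal records keep their CSV order.
        sorted.sort(byRecords);
        return sorted;
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
            new String[] {"A", "XX", "22", "33"},
            new String[] {"A", "XX", "777", "888"},
            new String[] {"A", "YY", "33", "11"},
            new String[] {"B", "XX", "12", "0"},
            new String[] {"A", "YY", "13", "23"},
            new String[] {"B", "YY", "44", "98"});
        for (String[] row : sortByRecords(rows, 2)) {
            System.out.println(String.join(",", row));
        }
    }
}
```

Sorting is not strictly required to build the tree; it just makes the group boundaries visible, as in the illustration.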

However, there is a recursive component that makes it somewhat hard to reason about this problem, or show it step by step, so it's actually easier to write pseudocode.

I'll assume that every row in your CSV is represented as a tuple. Each tuple has "records" and "values", using the same terms you use in your question. "Records" are the things that must be put into a hierarchical structure. "Values" will be the leaves of the tree. I'll use quotation marks when I use these terms with these specific meanings.

I also assume that all "records" come before all "values".

Without further ado, the code:

// builds tree and returns a list of root nodes
// list_of_tuples: a list of tuples read from your csv
// curr_position: used to keep track of recursive calls
// number_of_records: assuming each csv row has n records and then m values, number_of_records equals n
function build_tree(list_of_tuples, curr_position, number_of_records) {
    // check if we have already reached the "values" (which shouldn't get converted into trees)
    if (curr_position == number_of_records) {
        return list of nodes, each containing a "value" (i.e. everything from position number_of_records on)
    }

    grouped = group tuples in list_of_tuples that have the same value in position curr_position, and store these groups indexed by such common value
    unique_values = get unique values in curr_position

    list_of_nodes = empty list

    // create the nodes and (recursively) their children
    for each val in unique_values {
        the_node = create tree node containing val
        the_children = build_tree(grouped[val], curr_position+1, number_of_records)
        the_node.set_children(the_children)

        list_of_nodes.append(the_node)
    }

    return list_of_nodes
}

// in your example, this returns a node with "A" and a node with "B"
// third parameter is 2 because you have 2 "records"
build_tree(list_parsed_from_csv, 0, 2)
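For readers who prefer something runnable, the pseudocode above could be translated to Java roughly as follows. This is a sketch under stated assumptions, not a definitive implementation: the `CsvTree` class, the `Node` type, the `render` helper, and the choice of `String[]` for each row are all hypothetical, and a `LinkedHashMap` is used only so that groups keep the order in which they first appear in the CSV.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CsvTree {

    /** Hypothetical tree node: a "record" for inner nodes, the joined "values" for leaves. */
    static class Node {
        final String label;
        final List<Node> children = new ArrayList<>();
        Node(String label) { this.label = label; }
    }

    // Mirrors build_tree: rows are the tuples, currPosition tracks the recursion depth,
    // numberOfRecords is the count of leading "record" columns.
    static List<Node> buildTree(List<String[]> rows, int currPosition, int numberOfRecords) {
        List<Node> nodes = new ArrayList<>();
        if (currPosition == numberOfRecords) {
            // Reached the "values": one leaf per row, holding the remaining columns.
            for (String[] row : rows) {
                String[] values = Arrays.copyOfRange(row, numberOfRecords, row.length);
                nodes.add(new Node(String.join(",", values)));
            }
            return nodes;
        }
        // Group rows by the value in the current column, keeping first-seen order.
        Map<String, List<String[]>> grouped = new LinkedHashMap<>();
        for (String[] row : rows) {
            grouped.computeIfAbsent(row[currPosition], k -> new ArrayList<>()).add(row);
        }
        // Create one node per distinct value and recurse for its children.
        for (Map.Entry<String, List<String[]>> entry : grouped.entrySet()) {
            Node node = new Node(entry.getKey());
            node.children.addAll(buildTree(entry.getValue(), currPosition + 1, numberOfRecords));
            nodes.add(node);
        }
        return nodes;
    }

    // Renders the tree with four spaces of indentation per level.
    static String render(List<Node> nodes, int depth) {
        StringBuilder sb = new StringBuilder();
        for (Node node : nodes) {
            for (int i = 0; i < depth; i++) sb.append("    ");
            sb.append(node.label).append('\n');
            sb.append(render(node.children, depth + 1));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
            new String[] {"A", "XX", "22", "33"},
            new String[] {"A", "XX", "777", "888"},
            new String[] {"A", "YY", "33", "11"},
            new String[] {"B", "XX", "12", "0"},
            new String[] {"A", "YY", "13", "23"},
            new String[] {"B", "YY", "44", "98"});
        System.out.print(render(buildTree(rows, 0, 2), 0));
    }
}
```

Because the rows are grouped with a map rather than by scanning neighbours, the input does not need to be sorted first.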

Now you'd have to think about the specific data structures to use, but hopefully this shouldn't be too difficult if you understand the algorithm (as you mention, I think deciding on a data structure early on may have been hindering your thoughts).
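As one illustration of a possible data structure, when the number of "records" is fixed at two (as in the example), nested maps may be enough and no dedicated node class is needed. The sketch below is an assumption-laden example, not a recommendation from the question or answer: the class `NestedMapGrouping` and the method `group` are hypothetical, each row is assumed to be a `String[]` with exactly two "records" followed by two "values".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class NestedMapGrouping {

    // Groups rows as Record1 -> Record2 -> list of "Value1,Value2" strings
    // (hypothetical helper; assumes each row has exactly four columns).
    static Map<String, Map<String, List<String>>> group(List<String[]> rows) {
        Map<String, Map<String, List<String>>> tree = new LinkedHashMap<>();
        for (String[] r : rows) {
            tree.computeIfAbsent(r[0], k -> new LinkedHashMap<>())
                .computeIfAbsent(r[1], k -> new ArrayList<>())
                .add(r[2] + "," + r[3]);
        }
        return tree;
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
            new String[] {"A", "XX", "22", "33"},
            new String[] {"A", "XX", "777", "888"},
            new String[] {"A", "YY", "33", "11"},
            new String[] {"B", "XX", "12", "0"},
            new String[] {"A", "YY", "13", "23"},
            new String[] {"B", "YY", "44", "98"});
        System.out.println(group(rows));
    }
}
```

The trade-off is flexibility: the nesting depth is baked into the type, so a configurable number of grouping levels would likely call for a recursive node structure instead.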
