将制表符/空格分隔的行转换为嵌套数组 [英] convert tab/space delimited lines into nested array

查看:85
本文介绍了将制表符/空格分隔的行转换为嵌套数组的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我想将下面的文本转换成一个嵌套数组,就像使用MPTT数据库结构一样.

I would like convert the below text into a nested array, something like you would get with MPTT database structure.

我正在从Shell脚本获取数据,需要将其显示在网站上.对格式:/

I am getting the data from a shell script and need to display it on a website. Don't have any control over the format :/

关于数组->列表的信息很多,反之则不多.

There is lots of information about array -> list, but not much going the other way.

感谢您的任何投入.

cat, true cat
       => domestic cat, house cat, Felis domesticus, Felis catus
           => kitty, kitty-cat, puss
           => mouser
           => alley cat
           => tom, tomcat
               => gib
           => Angora, Angora cat
           => Siamese cat, Siamese
               => blue point Siamese
       => wildcat
           => sand cat
           => European wildcat, catamountain, Felis silvestris
           => cougar, puma, catamount, mountain lion, painter, panther, Felis concolor
           => ocelot, panther cat, Felis pardalis
           => manul, Pallas's cat, Felis manul
           => lynx, catamount
               => common lynx, Lynx lynx
               => Canada lynx, Lynx canadensis

推荐答案

您已经在此处获得了排序的树列表.接下来的每一行都是前一行的子级或同级.因此,您可以处理列表,获取项目的名称,通过缩进获得项目所在的级别,并从中创建一个元素.

You have got a sorted tree list here already. Each next line is either a child of the previous line or a sibling. So you can process over the list, get the name of an item, gain the level an item is in by it's indentation and create an element out of it.

1 Line <=> 1 Element (level, name)

因此,每个元素都有一个名称和零个或多个子代.从输入中还可以说出它属于哪个级别.

So every element has a name and zero or more children. From the input it can also be said which level it belongs to.

一个元素可以表示为一个数组,其中第一个值是名称,第二个值是子元素的数组.

An element can be represented as an array, in which it's first value is the name and the second value is an array for the children.

在对列表进行排序时,我们可以使用一个简单的映射,每个级别是某个级别的子代的别名.因此,随着每个元素具有的级别,我们可以将其添加到堆栈中:

As the list is sorted, we can use a simple map, which per level is an alias to the children of a certain level. So with the level each element has, we can add it to the stack:

    $self = array($element, array());
    $stack[$level][] = &$self;
    $stack[$level + 1] = &$self[1];

如该代码示例所示,添加子级后,当前级别的堆栈/映射将变为$self:

As this code-example shows, the stack/map for the current level is getting $self as children added:

    $stack[$level][] = &$self;

较高一级的堆栈,获得对$self(索引1)的子代的引用:

The stack for the level one higher, get's the reference to the children of $self (index 1):

    $stack[$level + 1] = &$self[1];

因此,现在每一行中,我们需要查找级别.如该堆栈所示,该级别按顺序编号:0, 1, 2, ...,但在输入中只是多个空格.

So now per each line, we need to find the level. As this stack shows, the level is sequentially numbered: 0, 1, 2, ... but in the input it's just a number of spaces.

一个小的辅助对象可以完成将字符串中的字符数收集/分组为多个级别的工作,请注意-如果缩进级别还不存在,则会添加该级别,但前提是更高级别.

A little helper object can do the work to collect/group the number of characters in a string to levels, taking care that - if a level yet does not exist for an indentation - it is added, but only if higher.

这解决了在您的输入中缩进大小与其索引之间没有1:1关系的问题.至少不是显而易见的.

This solves the problem that in your input there is no 1:1 relation between the size of the indentation and it's index. At least not an obvious one.

此辅助对象示例性地命名为Levels,并实现__invoke为缩进提供级别,同时在必要时透明地添加新级别:

This helper object is exemplary named Levels and implements __invoke to provide the level for an indent while transparently adding a new level if necessary:

$levels = new Levels();
echo $levels(''); # 0
echo $levels('    '); # 1
echo $levels('    '); # 1
echo $levels('      '); # 2
echo $levels(' '); # Throws Exception, this is smaller than the highest one

因此,现在我们可以将缩进转换为水平.该级别使我们可以运行堆栈.堆栈允许构建树.好吧.

So now we can turn indentation into the level. That level allows us to run the stack. The stack allows to build the tree. Fine.

可以使用正则表达式轻松进行逐行解析.由于我很懒,我只使用preg_match_all并按行返回缩进和名称.因为我想更加舒适,所以将它包装到一个函数中,该函数始终会返回一个数组,因此可以在迭代器中使用它:

The line by line parsing can be easily done with a regular expression. As I'm lazy, I just use preg_match_all and return - per line - the indentation and the name. Because I want to have more comfort, I wrap it into a function that does always return me an array, so I can use it in an iterator:

$matches = function($string, $pattern)
{
    return preg_match_all($pattern, $string, $matches, PREG_SET_ORDER)
        ? $matches : array();
};

使用具有类似模式的输入

Using on input with a pattern like

/^(?:(\s*)=> )?(.*)$/m

每行都会给我一个数组,即:

will give me an array per each line, that is:

array(whole_line, indent, name)

您在这里看到图案了吗?接近

You see the pattern here? It's close to

1 Line <=> 1 Element (level, name)

可以使用Levels对象进行映射,因此只需调用映射函数即可:

With help of the Levels object, this can be mapped, so just a call of a mapping function:

function (array $match) use ($levels) {
    list(, $indent, $name) = $match;
    $level = $levels($indent);
    return array($level, $name);
};

array(line, indent, name)array(level, name).为了使它可访问,这是由另一个可以插入Levels的函数返回的:

From array(line, indent, name) to array(level, name). To have this accessible, this is returned by another function where the Levels can be injected:

$map = function(Levels $levels) {
    return function ...
};
$map = $map(new Levels());

因此,一切都是为了从所有行读取.但是,这需要放置到树中.记住添加到堆栈中:

So, everything is in order to read from all lines. However, this needs to be placed into the the tree. Remembering adding to the stack:

function($level, $element) use (&$stack) {
    $self = array($element, array());
    $stack[$level][] = &$self;
    $stack[$level + 1] = &$self[1];
};

(此处是$element的名称).这实际上需要堆栈,而堆栈实际上是树.因此,让我们创建另一个函数,该函数返回该函数并允许将每一行压入堆栈以构建树:

($element is the name here). This actually needs the stack and the stack is actually the tree. So let's create another function that returns this function and allow to push each line onto the stack to build the tree:

$tree = array();
$stack = function(array &$tree) {
    $stack[] = &$tree;
    return function($level, $element) use (&$stack) {
        $self = array($element, array());
        $stack[$level][] = &$self;
        $stack[$level + 1] = &$self[1];
    };
};
$push = $stack($tree);

所以最后要做的就是只处理一个元素,然后处理另一个元素:

So the last thing to do is just to process one element after the other:

foreach ($matches($input, '/^(?:(\s*)=> )?(.*)$/m') as $match) {
    list($level, $element) = $map($match);
    $push($level, $element);
}

因此,现在使用给出的$input可以创建一个数组,它的第一级上只有(根)子节点,然后有一个array每个节点有两个条目:

So now with the $input given this creates an array, with only (root) child nodes on it's first level and then having an array with two entries per each node:

array(name, children)

这里的名字是一个字符串,子元素是一个数组.因此,这在技术上已经完成了对阵列/树的列表.但这相当麻烦,因为您还希望能够输出树结构.您可以通过执行递归函数调用或实现递归迭代器来实现.

Name is a string here, children an array. So this has already done the list to array / tree here technically. But it's rather burdensome, because you want to be able to output the tree structure as well. You can do so by doing recursive function calls, or by implementing a recursive iterator.

让我举一个递归迭代器示例:

Let me give an Recursive Iterator Example:

class TreeIterator extends ArrayIterator implements RecursiveIterator
{
    private $current;

    public function __construct($node)
    {
        parent::__construct($node);
    }

    public function current()
    {
        $this->current = parent::current();
        return $this->current[0];
    }

    public function hasChildren()
    {
        return !empty($this->current[1]);
    }

    public function getChildren()
    {
        return new self($this->current[1]);
    }
}

这只是一个数组迭代器(因为所有节点都是数组,以及所有子节点),对于当前节点,它返回名称.如果要求孩子,它将检查是否有孩子,并再次提供他们作为TreeIterator.这使得使用起来很简单,例如输出为文本:

This is just an array iterator (as all nodes are an array, as well as all child nodes) and for the current node, it returns the name. If asked for children, it checks if there are some and offers them again as a TreeIterator. That makes using it simple, e.g. outputting as text:

$treeIterator = new RecursiveTreeIterator(
    new TreeIterator($tree));

foreach ($treeIterator as $val) echo $val, "\n";

输出:

\-cat, true cat
  |-domestic cat, house cat, Felis domesticus, Felis catus
  | |-kitty, kitty-cat, puss
  | |-mouser
  | |-alley cat
  | |-tom, tomcat
  | | \-gib
  | |-Angora, Angora cat
  | \-Siamese cat, Siamese
  |   \-blue point Siamese
  \-wildcat
    |-sand cat
    |-European wildcat, catamountain, Felis silvestris
    |-cougar, puma, catamount, mountain lion, painter, panther, Felis concolor
    |-ocelot, panther cat, Felis pardalis
    |-manul, Pallas's cat, Felis manul
    \-lynx, catamount
      |-common lynx, Lynx lynx
      \-Canada lynx, Lynx canadensis

如果您正在寻找与递归迭代器结合使用的更多HTML输出控件,请参见以下问题,其中包含基于<ul><li>的HTML输出示例:

If you're looking for more HTML output control in conjunction with an recursive iterator, please see the following question that has an example for <ul><li> based HTML output:

  • How can I convert a series of parent-child relationships into a hierarchical tree? (Should as well have some other insightful information for you)

那么这一切看起来如何?作为 github上的要点立即查看的代码.

So how does this look like all together? The code to review at once as a gist on github.

这篇关于将制表符/空格分隔的行转换为嵌套数组的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆