Calculating Lempel-Ziv (LZ) complexity (aka sequence complexity) of a binary string


Problem description


I need to calculate the LZ complexity of a binary string. The LZ complexity is the number of distinct substrings encountered as the stream is read from beginning to end. For example:

s = 1001111011000010

Marking the distinct substrings with slashes gives a sequence complexity of c(s) = 6:

s = 1 / 0 / 01 / 1110 / 1100 / 0010 /

Can someone guide me to a simple solution? I am sure there are some very straightforward implementations for this well-known problem, but I have difficulty finding them. Can it be done simply by constructing a suffix tree or something similar? If so, exactly how, and what should I do?

Does anyone know of any C/C++ source code that accomplishes this task?

Thanks in advance.

To clarify the construction of the tree suggested in the answers: does the tree look like this?

         o
       /   \
      o     o
     / \   / \
    o   o o   o
       /     /
      o     o

Solution

@Arash and @Sanchit Gupta: You may have confused LZ76 complexity with LZ78 complexity. The one Arash is referring to is LZ76 complexity, and the other is LZ78 complexity. See Section 3 of the paper "Estimating the Entropy Rate of Spike Trains via Lempel-Ziv Complexity".
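To make the difference concrete, here is a minimal C++17 sketch of both parsings. It is not taken from the paper or from any library. The function names `lz76_complexity` and `lz78_complexity` and the helper `occurs_earlier` are hypothetical, and both functions assume that a trailing phrase which is not new still counts as one component. The LZ76 version is a naive substring search chosen for readability, not speed.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <unordered_set>

// Hypothetical helper: true if s[i, i+k) also occurs at some start
// position j < i (the earlier copy is allowed to overlap the new one).
bool occurs_earlier(std::string_view s, std::size_t i, std::size_t k) {
    std::string_view word = s.substr(i, k);
    std::string_view history = s.substr(0, i + k - 1);
    return history.find(word) != std::string_view::npos;
}

// LZ76-style parsing: each component is the shortest substring starting
// at the current position that cannot be copied from earlier in s.
std::size_t lz76_complexity(const std::string& s) {
    const std::size_t n = s.size();
    std::size_t count = 0;
    std::size_t i = 0;
    while (i < n) {
        std::size_t k = 1;
        // Extend the candidate while it is still reproducible from history.
        while (i + k <= n && occurs_earlier(s, i, k)) {
            ++k;
        }
        ++count;  // assumption: a reproducible tail still counts as a component
        i += k;
    }
    return count;
}

// LZ78-style parsing: each phrase is the shortest prefix of the remaining
// input that is not already in the dictionary of earlier phrases.
std::size_t lz78_complexity(const std::string& s) {
    std::unordered_set<std::string> dictionary;
    std::string phrase;
    std::size_t count = 0;
    for (char c : s) {
        phrase.push_back(c);
        if (dictionary.insert(phrase).second) {  // phrase was new
            ++count;
            phrase.clear();
        }
    }
    if (!phrase.empty()) {
        ++count;  // assumption: count the unfinished final phrase as well
    }
    return count;
}
```

For the string in the question, `lz76_complexity("1001111011000010")` returns 6, matching the parsing 1 / 0 / 01 / 1110 / 1100 / 0010. `lz78_complexity` on the same string returns 8, from the phrases 1 / 0 / 01 / 11 / 10 / 110 / 00 / 010. Under the LZ78 reading, each phrase corresponds to one new node in a binary trie of phrases, which is the kind of tree the question sketches.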
