Problem with big data (?) during computation of sequence distances using TraMineR

Views: 135 | Published: 2020/7/11 18:59:40 | Tags: r, traminer


Problem description

I am trying to run an optimal matching analysis using TraMineR, but I seem to be running into a problem with the size of the dataset. I have a large dataset of European countries that contains employment spells. It has more than 57,000 sequences, each 48 units long and built from 9 distinct states. To give an idea of the analysis, here is the head of the sequence object employdat.sts:

[1] EF-EF-EF-EF-EF-EF-EF-EF-EF-EF-EF-EF-EF-EF-EF-EF-EF-EF-EF-EF-EF-EF-...  
[2] EF-EF-EF-EF-EF-EF-EF-EF-EF-EF-EF-EF-EF-EF-EF-EF-EF-EF-EF-EF-EF-EF-...  
[3] ST-ST-ST-ST-ST-ST-ST-ST-ST-ST-ST-ST-ST-ST-ST-ST-ST-ST-ST-ST-ST-ST-...  
[4] ST-ST-ST-ST-ST-ST-ST-ST-ST-ST-ST-ST-ST-ST-ST-ST-ST-ST-ST-ST-ST-ST-...  
[5] EF-EF-EF-EF-EF-EF-EF-EF-EF-EF-EF-EF-EF-EF-EF-EF-EF-EF-EF-EF-EF-EF-...  
[6] ST-ST-ST-ST-ST-ST-ST-ST-ST-ST-ST-ST-ST-ST-ST-ST-ST-ST-ST-ST-ST-ST-...

In the shorter SPS format, this reads as follows:

Sequence               
[1] "(EF,48)"              
[2] "(EF,48)"              
[3] "(ST,48)"              
[4] "(ST,36)-(MS,3)-(EF,9)"
[5] "(EF,48)"              
[6] "(ST,24)-(EF,24)"

After passing this sequence object to the seqdist() function, I get the following error message:

employdat.om <- seqdist(employdat.sts, method="OM", sm="CONSTANT", indel=4)    
[>] creating 9x9 substitution-cost matrix using 2 as constant value  
[>] 57160 sequences with 9 distinct events/states  
[>] 12626 distinct sequences  
[>] min/max sequence length: 48/48  
[>] computing distances using OM metric  
Error in .Call(TMR_cstringdistance, as.integer(dseq), as.integer(dim(dseq)),  : negative length vectors are not allowed

Is this error related to the large number of distinct, long sequences? I am using an x64 machine with 4 GB of RAM, and I also tried a machine with 8 GB of RAM, which reproduced the same error message. Does anyone know a way to tackle this error? Note that running the analysis separately for each country, using the same syntax with an index for the country, worked well and produced meaningful results.
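
As a rough check on whether size alone could explain this, the sketch below estimates how large a full pairwise distance matrix would be for the two counts that seqdist() reports in its log (57160 sequences, 12626 distinct sequences). It assumes distances are stored as 8-byte doubles and that the "negative length vectors" message comes from a cell count overflowing a 32-bit integer. Both are assumptions about TraMineR's internals, not something the log confirms, and the variable names are illustrative only.

```r
# Back-of-the-envelope size of a full distance matrix (base R only).
n_all      <- 57160   # sequences reported in the seqdist() log
n_distinct <- 12626   # distinct sequences reported in the seqdist() log

# Plain numeric literals are doubles in R, so n^2 is computed without overflow.
cells_all      <- n_all^2
cells_distinct <- n_distinct^2

# Largest value a 32-bit signed integer can hold (2147483647).
int_max <- .Machine$integer.max

# Assumption: one 8-byte double per cell.
to_gib <- function(cells) cells * 8 / 1024^3

exceeds_int_all      <- cells_all > int_max
exceeds_int_distinct <- cells_distinct > int_max

cat("All sequences:      ", cells_all, "cells,",
    round(to_gib(cells_all), 1), "GiB, exceeds int max:", exceeds_int_all, "\n")
cat("Distinct sequences: ", cells_distinct, "cells,",
    round(to_gib(cells_distinct), 1), "GiB, exceeds int max:", exceeds_int_distinct, "\n")
```

Under these assumptions, a full matrix over all 57160 sequences has more cells than a 32-bit integer can count and would need far more memory than 4 GB or 8 GB, whereas a matrix over the 12626 distinct sequences would take a little over 1 GiB. That would also be consistent with the per-country runs succeeding, since each of them involves far fewer sequences.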
