在fortran中查找重复记录 [英] finding duplicate records in fortran

查看：250 发布时间：2018/3/16 17:37:58 sorting fortran grouping

本文介绍了在fortran中查找重复记录的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

我有一个数据大约一百万条记录，每条记录有6个浮点数。我想查找具有相同六个值的记录集合，理想情况下我想在Fortran中执行此操作，因为剩下的处理过程是在Fortran中完成的。对此的推荐方法是什么？最后，我想从原始索引映射到新索引，这是这些数据集的精简版本，而不会有重复。每条记录都有其他属性，我有兴趣根据这六个属性来聚合这些组。

我试图通过将输出导出为csv来查找这些集合，并将其导入到MS Access，那么发现这些集合的查询需要10秒左右才能运行。我写了一个代码 http://rosettacode.org/wiki/Remove_duplicate_elements#Fortran this（线性搜索？），但百万记录在10分钟左右没有完成，我刚刚放弃了这种方法。

我现在想的方法是调整排名/排序例程 slatec 或订单包，我认为这比我的原始代码更好。但我想知道这样的事情是否已经完成，我可以下载，或者如果有更好的方法。

编辑： 从原始数据记录映射到这个缩减集。我想要像imap（1：n）这样的映射数组，其中imap（1），imap（4），imap（5）具有相同的值，如果这些浮点数是pt。原始记录1,4和5中的值是相同的。希望这不会太偏离我原先所说的......

解决方案

这就是我最终做的事情。。我从 ORDERPACK 取得了代码 mrgrnk ，并适合我的目的。下面的子程序 findmap 似乎正在做我想做的事。

  module fndmap 
 use m_mrgrnk，only：mrgrnk 
 implicit none 
包含
子程序findmap（stkprm，stkmap）
！给出二维真实数组stkprm，找到如下所示的映射：
！ 
！ （相同的记录分配有相同的索引）
！ stkmap（i）== stkmap（j）iff stkprm（：，i）== stkprm（：，j）
！ （订单保存）
！如果我< j和stkmap（i）/ = stkmap（j），则stkmap（i）< stkmap（j）
！ （新索引是连续的）
！ set（stkmap）== {1,2，..，maxval（stkmap）} 
！ 
 real，dimension（：，:)，intent（in）:: stkprm 
整数，维（:)，intent（out）:: stkmap 
整数，维（size（stkprm， 2））:: irngt 
 integer，dimension（size（stkprm，2））:: iwork 
 integer :: nrec，i，j 
 nrec = size（stkprm，2）
！查找每条记录的等级，重复记录保存
调用ar_mrgrnk（stkprm，irngt）
 
！构建iwork数组，其中包含原始数组的索引，其中
！记录是相同的，并且索引是youguest 
i = 1 
 do（i <= nrec）
 do j = i + 1，nrec 
 if（any（stkprm ，irngt（i））/ = stkprm（：，irngt（j））））exit 
 enddo 
 iwork（irngt（i：j-1））= minval（irngt（i：j-1 ））
i = j 
 enddo 
 
！现在构建地图，其中stkmap（i）显示新数组
的索引！如果（i == iwork（i））然后
j = j + 1 $ b $保留原始订单
j = 0 
 do i = 1，nrec 
 b stkmap（i）= j 
 else 
 stkmap（i）= stkmap（iwork（i））
 endif 
 enddo 
结束子程序
 
递归子程序ar_mrgrnk（xdont，irngt）
！就像ORDERPACK的mrgrnk一样，除了数组是2-b 
！每行按第一个字段排序，然后第二个等等
 real，dimension（:, :)，intent（in）:: xdont 
整数，维度（:)，intent（out），目标:: irngt 
 integer，dimension（size（xdont，2））:: iwork 
 
整数:: nfld，nrec 
整数:: i，j 
整数，dimension（:)，pointer :: ipt 
 
 nfld = size（xdont，1）
 nrec = size（xdont，2）
 
！排名第一场
打电话mrgrnk（xdont（1，:)，irngt）
 
！如果只有一个字段，如果（nfld == 1）返回
 
，则完成
！检查级别以查看多个记录是否具有相同的
！ （1，irngt（i））的第一场
i = 1 
的值为（i <= nrec）
 do j = i + 1，nrec 
 / = xdont（1，irngt（j）））退出
 enddo 
！如果一对一，如果（j-1> i）然后
，则不做任何
！如果多对一，
！收集那些许多，然后排列他们
调用ar_mrgrnk（xdont（2 :, irngt（i：j-1）），iwork）
！根据这些字段重新排列我的排名到右边
 ipt => irngt（i：j-1）
 ipt = ipt（iwork（1：ji））
 endif 
i = j 
 enddo 
 if（associated（ipt）） nullify（ipt）
结束子程序
结束模块

I have a data approximately a million record, each record have 6 floating point number. I want to find sets of records who share identical six values, and ideally I want to do it in Fortran since the rest of processing is done in Fortran. What would be the recommended approach for this? At the end i want to have mapping from original index to new index which is condensed version of these dataset without duplicate. Each record has other attributes and i am interested in aggregating those for groups based on the six attributes.

I tried to find those sets by exporting output as csv, import it into MS Access, then a query that finds those sets took 10 seconds or so to run. I wrote a code which does http://rosettacode.org/wiki/Remove_duplicate_elements#Fortran this ("linear search"?), but with million record it didnt complete after 10 min or so, i just abandoned this approach.

Approach I am thinking now is adapting ranking/sorting routine from slatec or orderpack which i assume do better than my crude code. But I am wondering if such things are already done and i can download, or if there is better approach for this.

EDIT:

I said "finding duplicate", but i actually need mapping from original data records to this reduced sets. I want to have mapping array like imap(1:n), where imap(1), imap(4), imap(5) has same values if those 6 float pt. values in original record 1, 4 and 5 are the same. Hope this is not too much a deviation from what I said originally...
解决方案
This is what I ended up doing... I took code mrgrnk from ORDERPACK , and adapted for my purpose. The subroutine findmap below appears to be doing what I wanted it to do.
module fndmap use m_mrgrnk, only:mrgrnk implicit none contains subroutine findmap(stkprm, stkmap ) ! given 2-d real array stkprm, find a mapping described below: ! ! (identical records are assigned with same index) ! stkmap(i) == stkmap(j) iff stkprm(:,i) == stkprm(:,j) ! (order conserved) ! if i < j and stkmap(i) /= stkmap(j), then stkmap(i) < stkmap(j) ! (new index are contiguous) ! set(stkmap) == {1,2,..,maxval(stkmap)} ! real,dimension(:,:),intent(in) :: stkprm integer,dimension(:), intent(out) :: stkmap integer, dimension(size(stkprm,2)) :: irngt integer, dimension(size(stkprm,2)) :: iwork integer :: nrec, i, j nrec = size(stkprm,2) ! find rank of each record, duplicate records kept call ar_mrgrnk(stkprm, irngt) ! construct iwork array, which has index of original array where the ! record are identical, and the index is youguest i = 1 do while(i<=nrec) do j=i+1,nrec if (any(stkprm(:,irngt(i))/=stkprm(:,irngt(j)))) exit enddo iwork(irngt(i:j-1)) = minval(irngt(i:j-1)) i = j enddo ! now construct the map, where stkmap(i) shows index of new array ! with duplicated record eliminated, original order kept j = 0 do i=1,nrec if (i==iwork(i)) then j = j+1 stkmap(i) = j else stkmap(i) = stkmap(iwork(i)) endif enddo end subroutine recursive subroutine ar_mrgrnk(xdont, irngt) ! behaves like mrgrnk of ORDERPACK, except that array is 2-d ! each row are ranked by first field, then second and so on real, dimension(:,:), intent(in) :: xdont integer, dimension(:), intent(out), target :: irngt integer, dimension(size(xdont,2)) :: iwork integer :: nfld,nrec integer :: i, j integer, dimension(:), pointer :: ipt nfld=size(xdont,1) nrec=size(xdont,2) ! rank by the first field call mrgrnk(xdont(1,:), irngt) ! if there's only one field, it's done if (nfld==1) return ! examine the rank to see if multiple record has identical ! values for the first field i = 1 do while(i<=nrec) do j=i+1,nrec if (xdont(1,irngt(i))/=xdont(1,irngt(j))) exit enddo ! if one-to-one, do nothing if (j-1>i) then ! if many-to-one, ! gather those many, and rank them call ar_mrgrnk(xdont(2:,irngt(i:j-1)),iwork) ! rearrange my rank based on those fields to the right ipt => irngt(i:j-1) ipt = ipt(iwork(1:j-i)) endif i = j enddo if(associated(ipt)) nullify(ipt) end subroutine end module

这篇关于在fortran中查找重复记录的文章就介绍到这了，希望我们推荐的答案对大家有所帮助，也希望大家多多支持IT屋！

查看全文

在fortran中查找重复记录 [英] finding duplicate records in fortran

问题描述

相关文章

其他开发最新文章

热门教程

热门工具

登录关闭

在fortran中查找重复记录 [英] finding duplicate records in fortran

问题描述

相关文章

其他开发最新文章

热门教程

热门工具

登录 关闭

登录关闭