C移至内存部分就地 [英] C move memory parts inplace

查看:95
本文介绍了C移至内存部分就地的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我采取的几个数据结构和一个原始的我想用如下:我有一个内存块A [N](它有一个可变长度的,而是我拿100对我的例子),这一块里面,有的长度K一小部分C(可以说30),我要在不使用任何额外的内存移动。

的额外困难是,使A包装,即,C可在A [80]启动,然后C的第一20个元素是元素A [80..100]和最近10个元件的元素a [0..10]。此外,目标范围也可能包裹,并与下在任何可能的方式重叠。此外,我不希望比其他内存恒定的量,以使用更多的,一切都应该出现的地方。此外,这部分既不在目标范围内,也不在源范围可能包含一些重要的东西,所以它也不能被使用。因此,一种情况是以下内容:

有一个看起来是这样的:

| 456789ABCDEF0123456789AB | ----- | 0123 |

和应转化为这样的:

| 89AB | ----- | 0123456789ABCDEF01234567 |

只是委派到图书馆或使用另一种数据结构从图书馆不是一个不错的选择,我想了解自己的问题。在第一眼看到,我认为这可能不是小事,但只要你区分少数情况下,很明显,现在我有很大的麻烦。当然也有琐碎的情况下,如果他们不重叠或不换行,但至少如果这两个发生在同一时间,它就会变得混乱。你可以从一个自由的地方移动属于那里的一部分,但你在其他地方创造另一个自由的一部分,它变得很难跟踪哪个部位,你可以STIL使用。

也许我失去了一些东西完全,但即使我如果目标范围内不换特例,有近100线(它的一半是断言和评论,虽然)我可以更新,以便它也能处理一般情况下有一些额外的指标计算,但如果有人有一个优雅的和短的解决方案,我想AP preciate一些帮助。直觉上,我认为这应该以某种方式是微不足道的,但我没有看到最好的解决办法呢。

注:有趣的案例,当然,如果C是几乎一样大A.如果| C | < N / 2,这是微不足道的。

编辑:使用一个以上的其他标志恒定量/指数算作额外的内存和我想避免这种可能的话

编辑:一些人想看看我的code。我的问题是相当抽象的,所以我没有要发布它,但也许有人认为如何改进它。这是可怕的,它仅适用于该目标开始之初的情况下(但是,可以很容易地改变),非常长,但它的工作而无需额外的内存为O(n)。

 的#include< STDDEF.H>
#包括< stdio.h中>
#包括< string.h中>
#包括< ASSERT.H>

无效move_part(INT * A,为size_t N,为size_t目标,为size_t源,为size_t大小,诠释show_steps)
{
  断言(来源+尺寸与LT = N);
  断言(目标+尺寸与LT = N);
  如果(show_steps){
    的printf(移动的大小%d在%D到%d \ñ。,尺寸,来源,目标);
  }
  memmove与(A +的目标,A +来源,尺寸*的sizeof(INT));
}

无效swap_parts(INT * A,为size_t N,为size_t first_begin,为size_t second_begin,为size_t大小,诠释show_steps)
{
  如果(show_steps){
    的printf(交换大小%D在%D和%D \ñ。,尺寸,first_begin,second_begin);
  }
  断言(first_begin +尺寸与LT = N);
  断言(second_begin +尺寸与LT = N);
  为size_t我;
  对于(i = 0; I<大小; ++ I){
    INT X = A [first_begin + 1];
    A [first_begin + I] = A [second_begin + 1];
    A [second_begin +我= X;
  }
}

无效move_to_beginning(INT * A,为size_t N,SIZE_T开始,为size_t大小,诠释show_steps)
{
  断言(开始< = N);
  断言(大小与LT = N);
  //表示我们的工作范围的开始。在增加
  //算法和变为N
  为size_t part_start = 0;
  //注意:保留大小是至关重要的,因为开始==到底能
  //表示范围为空或满。
  为size_t结束=(开始+大小)%N;
  而(part_start!= N){
    为size_t我;
    如果(show_steps){
      对于(I = 0; I&所述N; ++ⅰ){
    的printf(%D,A [1]);
      }
      的printf(\ N);
      的printf(part_start%D开始%D结束%D尺寸%D \ N,part_start,开始,结束,大小);
    }
    //循环不变
    断言(part_start n种);
    //这两个指针是在我们的范围内
    断言(part_start< =开始和放大器;&安培;开始< = N);
    断言(part_start< =结束和放大器;&安培;末< = N);
    //尺寸是有效的(包裹的情况下,非空,非满的情况下)
    断言(开始< =结束||(N  - 开始)+(完 -  part_start)==大小);
    //尺寸是有效的(非包裹的情况下,非空,非满的情况下)
    断言(开始> = ||年底结束 - 开始==大小);
    //大小是有效的(工作范围为满或空的情况下)
    断言(开始=结束||大小== 0 || part_start +大小== N!);
    如果(大小== 0 ||开始==ñ||开始== part_start){
      // ## | 1234 |# - > 1234 ### ||
      如果(show_steps){
    的printf(案例1:\ nTerminating \ N);
      }
      //#||# - > ## ||
      // 12 | ## | - > 12 ## ||
      // | 12 | ##  - > 12 ## ||
      打破;
      / *没必要了,而是将是正确的转变:
     part_start = N;
     开始= N;
     结束= N;
     大小= 0; * /
    }否则,如果(完== part_start){
      // | ## | 123  - > ## | 123 |
      如果(show_steps){
    的printf(案例2:\ N);
    的printf(设置结束到%d \ñ,N);
      }
      结束= N;
    }否则如果(开始<结束){
      // ## | 1234 |# - > 1234 ### ||
      如果(show_steps){
    的printf(案例三:\ N);
      }
      move_part(A,N,part_start,首先,大小,show_steps);
      打破;
      / *没必要了,而是将是正确的转变:
     part_start = N;
     开始= N;
     结束= N;
     大小= 0; * /
    } 其他 {
      为size_t end_size =结束 -  part_start;
      为size_t begin_size = N  - 开始;
      断言(begin_size + end_size ==大小);
      如果(end_size> = begin_size){
    // 345 |#| 12  - > 12 5 |#| 34
    如果(show_steps){
      的printf(案例4:\ N);
    }
    swap_parts(A,N,part_start,首先,begin_size,show_steps);
    断言(begin_size大于0); //必要取得进展
    part_start + = begin_size;
    大小= end_size;
    //开始,结束维持不变
      }否则,如果(开始 -  part_start< = begin_size){
    // 56 |#| 1234  - > 123 56 |#| 4
    为size_t size_moved =开始 -  part_start;
    断言(size_moved> = end_size); //否则下一步将是更有效
    如果(show_steps){
      的printf(案例5 \ N);
    }
    swap_parts(A,N,part_start,首先,end_size,show_steps);
    move_part(A,N,结束,开始+ end_size,开始 - 结束,show_steps);
    断言(end_size +(开始 - 结束)== size_moved);
    大小 -  = size_moved;
    part_start =开始;
    首先+ = size_moved;
    到底+ = size_moved;
      }否则如果(end_size< = begin_size){
    // 45 | ## | 123  - > 123#| 45 |#
    如果(show_steps){
      的printf(案例6 \ N);
    }
    swap_parts(A,N,part_start,首先,end_size,show_steps);
    move_part(A,N,结束,开始+ end_size,begin_size  -  end_size,show_steps);
    part_start + = begin_size;
    大小= end_size;
    结束=开始+ end_size;
    //开始保持不变
      } 其他 {
    //不适用的情况下,这种情况不应该发生
    断言(0);
      }
    }
  }
}


诠释的main()
{
  INT N = 20;
  INT A [20];
  为size_t大小= 17;
  为size_t开始= 15;
  为size_t我;
  对于(i = 0; I<大小; ++ I){
    A [(开始+ I)%N] =我;
  }
  move_to_beginning(A,N,首先,大小,0);
  对于(i = 0; I<大小; ++ I){
    的printf(%D,A [1]);
  }
  的printf(\ N);
  返回0;
}
 

解决方案

案例1:源与目的在一个连续的区域,这比整个阵列小重叠最多

这种情况的详细解释见由R第一个答案..我没有什么要补充。

第2种情况:要么源与目标重叠在两个相邻的地区或我们旋转整个阵列

最简单的方法是永远转动整个阵列。这也从移动目标范围的一些不必要的元素,但因为在这种情况下, K> N / 2 ,这不会使操作次数多于两倍必要

要旋转阵列,使用周期组长算法:以数组的第一个元素(A [0]),并将其复制到目标位置; $ P $的这个位置再移动到合适的位置pvious内容;继续下去,直到一些元件被移动到起始位置。

继续应用周期领袖算法下一个起始位置:A [1],A [2],...,A [GCD(N,D) - 1],其中 D 是源和目标之间的距离。

GCD(N,D)步骤,所有的元素都在自己的正确位置上。这样做是因为:

  1. 在位置0,1,...,GCD(N,D) - 1属于不同的周期 - 因为所有这些数字都是不同的(模 GCD(N,D))。
  2. 在每个周期的长度为 N / GCD(N,D) - 因为 D / GCD(N,D) N / GCD(N,D)互质。

此算法简单,它的动作每个元素一次。这可能是线程安全的(如果我们跳过写入步骤,除非目标范围之内)。其他多线程相关的好处是,每个元素可能只有两个值 - 移动和值移动后(没有临时中间值可能)

在值

不过,这并不总是具有最佳的性能。如果 element_size * GCD(N,D)媲美的高速缓存行的大小,我们可能会采取一切 GCD(N,D)起始位置和处理它们在一起。如果该值过大时,我们可以拆分起始位置成若干个连续的段,以较低的空间要求回O(1)。

现在的问题是,当 element_size * GCD(N,D)比高速缓存行的大小要小得多。在这种情况下,我们得到了很多高速缓存未命中和性能下降的。 gusbro的想法暂时交换阵列件与一些交换区域(尺寸 D )的建议更有效的算法对于这种情况。它可以被优化更多的,如​​果我们使用交换区域,适合于高速缓存中,并复制不重叠的区域与memcpy的


还有一个算法。它不会覆盖不在目标范围的元素。它是高速缓存友好。唯一的缺点是:它的动作的每个元素恰好两次

的想法是将两个指针以相反的方向和交换指出的元件。这里是因为重叠区域只是颠倒重叠区域没有问题。在此之后算法的第一阶段,我们拥有所有的源元素移动到目标区间,但在相反的顺序。因此,第二遍要扭转目标范围:

 为(D = dst_start,S = src_end  -  1;
     D = dst_end!;
     D =(D + 1)%N,S =(S + N  -  1)%N)
  掉期(S,D);

为(D = dst_start,S = dst_end  -  1;
     D = dst_end!;
     D =(D + 1)%N,S =(S + N  -  1)%N)
  掉期(S,D);
 

I am implementing several datastructures and one primitive I want to use is the following: I have a memory chunk A[N] (it has a variable length, but I take 100 for my examples) and inside this chunk, there is a smaller part C of length K (lets say 30) which I want to move without using any additional memory.

The additional difficulty is, that A "wraps", that is, C can start at A[80] and then the first 20 elements of C are the elements A[80..100] and the last 10 elements are the elements A[0..10]. Furthermore, the target range may also "wrap" and overlap with C in any possible way. Additionally, I don't want to use more than a constant amount of additional memory, everything should happen in place. Also, the part of A which is neither in the target range nor in the source range may contain something important, so it cannot be used either. So one case would be the following:

A looks like this:

|456789ABCDEF0123456789AB|-----|0123|

And should be transformed to this:

|89AB|-----|0123456789ABCDEF01234567|

Just delegating it to a library or use another datastructure from a library is not an option here, I want to understand the problem myself. On the first sight, I thought that it might not be trivial, but as soon as you distinguish a few cases, it becomes clear, but now I am having serious trouble. Of course there are the trivial cases if they don't overlap or don't wrap, but at least if both happens at the same time, it gets messy. You could start with one free place and move the part that belongs there, but then you create another free part somewhere else and it gets hard to keep track of which parts you can stil use.

Maybe I am missing something completely, but even my special case if the target range does not wrap has almost 100 lines (half of it are assertions and comments, though) and I could update it so that it also handles the general case with some additional index calculations, but if someone has an elegant and short solution, I would appreciate some help. Intuitively I think that this should somehow be trivial, but I just don't see the best solution yet.

Note: The interesting case is of course, if C is almost as big as A. If |C| < N/2, it is trivial.

edit: Using more than a constant amount of additional flags/indices counts as additional memory and I want to avoid that if possible.

edit: Some people wanted to see my code. My question is rather abstract, so I didn't want to post it, but maybe someone sees how to improve it. It is terrible, it only works for the case that the target starts at the beginning (however, that can easily be changed) and terribly long, but it does the job without additional memory in O(n).

#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <assert.h>

void move_part(int* A, size_t N, size_t target, size_t source, size_t size, int show_steps)
{
  assert(source + size <= N);
  assert(target + size <= N);
  if (show_steps) {
    printf("Moving size %d from %d to %d.\n", size, source, target);
  }
  memmove(A + target, A + source, size * sizeof(int));
}

void swap_parts(int* A, size_t N, size_t first_begin, size_t second_begin, size_t size, int show_steps)
{
  if (show_steps) {
    printf("Swapping size %d at %d and %d.\n", size, first_begin, second_begin);
  }
  assert(first_begin + size <= N);
  assert(second_begin + size <= N);
  size_t i;
  for (i = 0; i < size; ++i) {
    int x = A[first_begin + i];
    A[first_begin + i] = A[second_begin + i];
    A[second_begin + i] = x;
  }
}

void move_to_beginning(int* A, size_t N, size_t begin, size_t size, int show_steps)
{
  assert(begin <= N);
  assert(size <= N);
  // Denotes the start of our "working range". Increases during
  // the algorithm and becomes N
  size_t part_start = 0;
  // Note: Keeping the size is crucial since begin == end could
  // mean that the range is empty or full.
  size_t end = (begin + size) % N;
  while (part_start != N) {
    size_t i;
    if (show_steps) {
      for (i = 0; i < N; ++i) {
    printf("%d ", A[i]);
      }
      printf("\n");
      printf("part_start %d  begin %d  end %d  size %d\n", part_start, begin, end, size);
    }
    // loop invariants
    assert(part_start < N);
    // The two pointers are in our range
    assert(part_start <= begin && begin <= N);
    assert(part_start <= end && end <= N);
    // size is valid (wrapped case, non-empty, non-full case)
    assert(begin <= end || (N - begin) + (end - part_start) == size);
    // size is valid (non wrapped case, non-empty, non-full case)
    assert(begin >= end || end - begin == size);
    // size is valid (working range is full or empty case)
    assert(begin != end || size == 0 || part_start + size == N);
    if (size == 0 || begin == N || begin == part_start) {
      // ##|1234|# -> 1234### ||
      if (show_steps) {
    printf("Case 1:\nTerminating\n");
      }
      // #||# -> ## ||
      // 12|##| -> 12## ||
      // |12|## -> 12## ||
      break;
      /* Not necessary any more, but would be the correct transformation:
     part_start = N;
     begin = N;
     end = N;
     size = 0;*/
    } else if (end == part_start) {
      // |##|123 -> ##|123|
      if (show_steps) {
    printf("Case 2:\n");
    printf("Setting end to %d.\n", N);
      }
      end = N;
    } else if (begin < end) {
      // ##|1234|# -> 1234### ||
      if (show_steps) {
    printf("Case 3:\n");
      }
      move_part(A, N, part_start, begin, size, show_steps);
      break;
      /* Not necessary any more, but would be the correct transformation:
     part_start = N;
     begin = N;
     end = N;
     size = 0;*/
    } else {
      size_t end_size = end - part_start;
      size_t begin_size = N - begin;
      assert(begin_size + end_size == size);
      if (end_size >= begin_size) {
    // 345|#|12 -> 12 5|#|34
    if (show_steps) {
      printf("Case 4:\n");
    }
    swap_parts(A, N, part_start, begin, begin_size, show_steps);
    assert(begin_size > 0); // Necessary for progress
    part_start += begin_size;
    size = end_size;
    // begin, end remain unchanged
      } else if (begin - part_start <= begin_size) {
    // 56|#|1234 -> 123 56|#|4
    size_t size_moved = begin - part_start;
    assert(size_moved >= end_size); // else the next step would be more efficient
    if (show_steps) {
      printf("Case 5\n");
    }
    swap_parts(A, N, part_start, begin, end_size, show_steps);
    move_part(A, N, end, begin + end_size, begin - end, show_steps);
    assert(end_size + (begin - end) == size_moved);
    size -= size_moved;
    part_start = begin;
    begin += size_moved;
    end += size_moved;
      } else if (end_size <= begin_size) {
    // 45|##|123 -> 123 #|45|# 
    if (show_steps) {
      printf("Case 6\n");
    }
    swap_parts(A, N, part_start, begin, end_size, show_steps);
    move_part(A, N, end, begin + end_size, begin_size - end_size, show_steps);
    part_start += begin_size;
    size = end_size;
    end = begin + end_size;
    // begin remains unchanged
      } else {
    // No case applies, this should never happen
    assert(0);
      }
    }
  }
}


int main()
{
  int N = 20;
  int A[20];
  size_t size = 17;
  size_t begin = 15;
  size_t i;
  for (i = 0; i < size; ++i) {
    A[(begin + i) % N] = i;
  }
  move_to_beginning(A, N, begin, size, 0);
  for (i = 0; i < size; ++i) {
    printf("%d ", A[i]);
  }
  printf("\n");
  return 0;
}

解决方案

Case 1: Source overlaps with destination at most in a single contiguous region, which is smaller than whole array

Detailed explanation of this case is given in the first answer by R.. I've nothing to add here.

Case 2: Either source overlaps with destination in two contiguous regions or we rotate whole array

The easiest approach would be always rotate whole array. This also moves some unneeded elements from destination range, but since in this case K > N/2, this does not make number of operations more then twice as necessary.

To rotate the array, use cycle leader algorithm: take first element of the array (A[0]) and copy it to destination position; previous contents of this position move again to its proper position; continue until some element is moved to the starting position.

Continue applying the cycle leader algorithm for next starting positions: A[1], A[2], ..., A[GCD(N,d) - 1], where d is the distance between source and destination.

After GCD(N,d) steps, all elements are on their proper positions. This works because:

  1. Positions 0, 1, ..., GCD(N,d) - 1 belong to different cycles - because all these numbers are different (modulo GCD(N,d)).
  2. Each cycle has length N / GCD(N,d) - because d / GCD(N,d) and N / GCD(N,d) are relatively prime.


This algorithm is simple and it moves each element exactly once. It may be made thread-safe (if we skip the write step unless inside the destination range). Other multi-threading-related advantage is that each element may have only two values - value before "move" and value after "move" (no temporary in-between values possible).

But it does not always have optimal performance. If element_size * GCD(N,d) is comparable to cache line size, we might take all GCD(N,d) starting positions and process them together. If this value is too large, we can split starting positions into several contiguous segments to lower space requirements back to O(1).

The problem is when element_size * GCD(N,d) is much smaller than cache line size. In this case we get a lot of cache misses and performance degrades. gusbro's idea to temporarily swap array pieces with some "swap" region (of size d) suggests more efficient algorithm for this case. It may be optimized more if we use "swap" region, that fits in the cache, and copy non-overlapped areas with memcpy.


One more algorithm. It does not overwrite elements that are not in the destination range. And it is cache-friendly. The only disadvantage is: it moves each element exactly twice.

The idea is to move two pointers in opposite directions and swap pointed elements. There is no problem with overlapping regions because overlapping regions are just reversed. After first pass of this algorithm, we have all source elements moved to destination range, but in reversed order. So second pass should reverse destination range:

for (d = dst_start, s = src_end - 1;
     d != dst_end;
     d = (d + 1) % N, s = (s + N - 1) % N)
  swap(s, d);

for (d = dst_start, s = dst_end - 1;
     d != dst_end;
     d = (d + 1) % N, s = (s + N - 1) % N)
  swap(s, d);

这篇关于C移至内存部分就地的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆