允许通过迭代和从子集中随机选择进行更改的数据结构(C ++) [英] Data Structure(s) Allowing For Alteration Through Iteration and Random Selection From Subset (C++)

查看:79
本文介绍了允许通过迭代和从子集中随机选择进行更改的数据结构(C ++)的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

给定一个固定大小的对象数组A,假设这些对象的较小子集满足特定条件B.我想以大约相等的频率完成三个任务:

  1. 我希望能够在按索引访问A中的对象时随时更改当前不符合条件B的对象以符合条件B.
  2. 我希望能够通过索引访问A中的对象时,将当前满足条件B的对象更改为不再满足条件B.
  3. 我还希望能够仅从那些符合条件B的对象中选择一个随机对象.

所有任务应该能够在恒定时间内或尽可能接近恒定时间完成,不依赖于A中对象的数量,也不依赖于A中对象的数量.符合条件B 的对象.如果无法做到固定时间(我怀疑是这种情况),那么我要考虑到我前面提到的频率,尽快完成这两个过程.如果要重复执行这两次任务,哪种数据结构适合于这两项任务?

例如,以我下面的C ++实现为例.虽然定时部分(重复大量代码的代码部分)与A(总体)的整体大小无关,但时间复杂度线性地取决于B(bluetiles)(无论总体数量是否增加)或没有),严重降低了代码的速度.

#include <iostream>
#include <vector>
#include <chrono>
#include <cstdlib>
#include <algorithm>

using namespace std;

enum color {RED, GREEN, BLUE};
const int NUM_ATTEMPTS = 10000;
const int INITIAL_NUM_BLUE_TILES = 1000;
const int TOTAL_TILES = 1000000;

struct tile
{
  int color = RED;
};

struct room
{
  vector<tile> alltiles;
  vector<tile*> bluetiles;
  room(vector<tile> v) : alltiles(v) {}
};

int main()
{
  srand (time(NULL));

  // set up the initial room, time complexity here is irrelevant
  room myroom(vector<tile>(1*TOTAL_TILES));
  for(int i = 0; i < INITIAL_NUM_BLUE_TILES; i++)
  {
    myroom.alltiles[i].color = BLUE;
    myroom.bluetiles.push_back(&myroom.alltiles[i]);
  }

  auto begin = std::chrono::high_resolution_clock::now();
  for(int attempt_num = 0; attempt_num < NUM_ATTEMPTS; attempt_num++)
  {
    // access a BLUE tile by index from alltiles to change its color to RED
    myroom.alltiles[5].color = RED; // constant time
    myroom.bluetiles.erase(std::remove(myroom.bluetiles.begin(), myroom.bluetiles.end(), &myroom.alltiles[5]), myroom.bluetiles.end()); // linear time, oh no!

    // access a RED tile by index from alltiles to change its color to BLUE
    myroom.alltiles[5].color = BLUE; // constant time
    myroom.bluetiles.push_back(&myroom.alltiles[5]); // constant time

    // randomly choose from ONLY the blue tiles
    int rand_index = rand() % myroom.bluetiles.size(); // constant time
    myroom.bluetiles[rand_index]->color = GREEN; // constant time
    myroom.bluetiles[rand_index]->color = BLUE; // constant time
    // so now I have constant time access to a random blue tile

  }
  auto end = std::chrono::high_resolution_clock::now();
  double runtime = std::chrono::duration_cast<std::chrono::milliseconds>(end-begin).count();
  cout << runtime << " ms" << endl; 
  return 0;
}

正在计时的部分是我感兴趣的经常执行的操作;在实际程序中,选择更改哪些图块的逻辑不同.希望,更好的数据结构不需要任何概率分析,但我担心仍然需要.

我怀疑,也许通过在tile类中保留一个指针(指向bluetiles向量中的元素)来使用双重间接访问可能会使我在恒定时间内实现这一目标,但我不确定.我猜它至少可以从某种意义上加快搜索的速度,因为不再需要搜索bluetiles,但是移除bluetiles中的元素仍然是线性时间(因为我正在使用向量),所以我真的只是不知道该怎么办.

您能设计出最快的数据结构来实现此目标,并从我的示例中提供C ++实现的构建吗?还是我所能拥有的一切都会很好?

解决方案

更新:这类似于我针对SO问题

The portions that are being timed are the operations that I am interested in performing frequently; in the real program, the logic behind choosing which tiles to change differs. Hopefully, a better data structure wouldn't require any probabilistic analysis, but I fear it still might.

I suspect that, perhaps, using double indirection by keeping a pointer (pointing to elements in the bluetiles vector) in the tile class could maybe allow me to achieve this in constant time, but I'm not certain. I guess it could at least speed it up in the sense that searching bluetiles would no longer be necessary, but then removal of an element in bluetiles would still be linear time (since I'm using a vector), so I'm really just not sure what to do here.

Can you devise the fastest data structure to achieve this goal, and provide a C++ implementation building from my example? Or is what I have as good as it will ever get?

解决方案

UPDATE: This resembles the solution I propose for the SO question Random element from unordered_set in O(1)

You can implement something like the following SubsetVector<T> class which lets you to insert/remove element from the subset (i.e. mark them) in O(1). Then it lets you find the size of the subset in O(1), and access i-th item from this subset in O(1). I think this is what you wanted. Note that the subset does not guarantee any specific order, but this should be OK for your needs.

The idea is to maintain two vectors.

  1. m_entries that contains the actual data. m_entries[i] contains the element and and an index into m_subset_indices, if the element is in the subset, and -1 otherwise.
  2. m_subset_indices contains all indices of m_entries elements that are in the subset.

Here is the code (compiled but not tested):

template <class T>
class SubsetVector
{
private:
   struct Entry
   {
       T element;
       int index_in_subset = -1;
   };
public:
   explicit SubsetVector(unsigned size = 0) : m_entries(size) 
   {
       m_subset_indices.reserve(size);
   }

   void push_back(const T & element)
   {
       m_entries.push_back(Entry{element, -1});
   }
   const T & operator[](unsigned index) const { return m_entries[index].element; }
   T & operator[](unsigned index) { return m_entries[index].element; }

   void insert_in_subset(unsigned index)
   {
       if (m_entries[index].index_in_subset < 0) {
           m_entries[index].index_in_subset = m_subset_indices.size();
           m_subset_indices.push_back(index);
       }
   }
   void erase_from_subset(unsigned index)
   {
       if (m_entries[index].index_in_subset >= 0) {
           auto subset_index = m_entries[index].index_in_subset;
           auto & entry_to_fix = m_entries[m_subset_indices.back()];
           std::swap(m_subset_indices[subset_index], m_subset_indices.back());
           entry_to_fix.index_in_subset = subset_index;
           m_subset_indices.pop_back();
           m_entries[index].index_in_subset = -1;
       }
   }
   unsigned subset_size() const 
   {
       return m_subset_indices.size();
   }
   T & subset_at(unsigned subset_index)
   {
       auto index = m_subset_indices.at(subset_index);
       return m_entries.at(index).element;
   }
   const T & subset_at(unsigned subset_index) const
   {
       auto index = m_subset_indices.at(subset_index);
       return m_entries.at(index).element;
   }

private:
   std::vector<Entry> m_entries;
   std::vector<unsigned> m_subset_indices;
};

这篇关于允许通过迭代和从子集中随机选择进行更改的数据结构(C ++)的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆