Removing Duplicates in an array in C

Question

The question is a little complex. The problem is to get rid of duplicates and save the unique elements of an array into another array, keeping their original order.

For example:

If the input is b a c a d t

the result should be b a c d t, in exactly the order the input was entered.

So sorting the array and then checking for duplicates doesn't work, because I lose the original sequence. I was advised to use an array of indices, but I don't know how to do that. What would you advise?

For those who are willing to answer the question, I want to add some specific information.

char** finduni(char *words[100],int limit)
{
//
//Methods here
//
}

is my function. The array whose duplicates should be removed, with the result stored in a different array, is words[100], so the processing is done on it. I first thought about copying all the elements of words into another array and sorting that array, but that didn't work after some tests. Just a reminder for solvers :).

Recommended answer

Well, here is a version for char types. Note that it doesn't scale.

#include <stdio.h>
#include <string.h>

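void removeDuplicates(char *string)
{
   unsigned char allCharacters[256] = { 0 };  // 1 once a character has been seen
   size_t length = strlen(string);            // measure once, not on every iteration
   size_t lookAt;
   size_t writeTo = 0;
   for(lookAt = 0; lookAt < length; lookAt++)
   {
      // index the table with an unsigned value so it is never negative
      unsigned char c = (unsigned char)string[lookAt];
      if(allCharacters[c] == 0)
      {
         allCharacters[c] = 1;                // mark it seen
         string[writeTo++] = string[lookAt];  // copy it
      }
   }
   string[writeTo] = '\0';                    // terminate the shortened string
}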

int main()
{
   char word[] = "abbbcdefbbbghasdddaiouasdf";
   removeDuplicates(word);
   printf("Word is now [%s]\n", word);
   return 0;
}

Here is the output:

Word is now [abcdefghsiou]

Is that something like what you want? You can modify the method if there are spaces between the letters, but if you use int, float, double or char * as the types, this method won't scale at all, because it relies on a 256-entry lookup table indexed by the character value.
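
As a rough illustration of the "spaces between the letters" modification, here is a minimal sketch, assuming the input is single letters separated by spaces as in the question's example. The name removeDuplicateLetters and the pendingSpace flag are made up for this sketch; they are not part of the original answer.

```c
#include <stddef.h>

/* Hypothetical variant of removeDuplicates: drops repeated letters in place
   and keeps a single space between the letters that survive. */
void removeDuplicateLetters(char *string)
{
   unsigned char seen[256] = { 0 };  /* 1 once a character has been seen */
   size_t lookAt, writeTo = 0;
   int pendingSpace = 0;             /* a space was read since the last letter written */

   for (lookAt = 0; string[lookAt] != '\0'; lookAt++)
   {
      unsigned char c = (unsigned char)string[lookAt];
      if (c == ' ')
      {
         pendingSpace = 1;           /* remember the separator, write it later */
         continue;
      }
      if (seen[c])
         continue;                   /* repeated letter: skip it */
      seen[c] = 1;
      if (pendingSpace && writeTo > 0)
         string[writeTo++] = ' ';    /* put one separator back */
      pendingSpace = 0;
      string[writeTo++] = (char)c;
   }
   string[writeTo] = '\0';
}
```

Called on a writable buffer holding "b a c a d t", this leaves "b a c d t" in the buffer. Like the version above, it only works for single characters.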

Edit

I posted and then saw your clarification that it's an array of char *. I'll update the method.

I hope this isn't too much code. I adapted a QuickSort algorithm and basically added index memory to it. The algorithm is O(n log n) on average: the 3 steps below are additive, and two of them are quicksorts (whose worst case is O(n^2)).

  1. Sort the array of strings, but reflect every swap in the index array as well. After this stage, the i'th element of originalIndices holds the original index of the i'th element of the sorted array.
  2. Remove duplicate elements in the sorted array by setting them to NULL and setting their index value to elements, which is higher than any real index can be.
  3. Sort the array of original indices, and make sure every swap is reflected in the array of strings. This gives us back the original array of strings, except that the duplicates are at the end and they are all NULL.
  4. For good measure, I return the new count of elements.

Code:

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

void sortArrayAndSetCriteria(char **arr, int elements, int *originalIndices)
{
   #define  MAX_LEVELS  1000
   char *piv;
   int  beg[MAX_LEVELS], end[MAX_LEVELS], i=0, L, R;
   int idx, cidx;
   for(idx = 0; idx < elements; idx++)
      originalIndices[idx] = idx;
   beg[0] = 0;
   end[0] = elements;
   while (i>=0)
   {
      L = beg[i];
      R = end[i] - 1;
      if (L<R)
      {
         piv = arr[L];
         cidx = originalIndices[L];
         if (i==MAX_LEVELS-1)
            return;
         while (L < R)
         {
            while (strcmp(arr[R], piv) >= 0 && L < R) R--;
            if (L < R)
            {
               arr[L] = arr[R];
               originalIndices[L++] = originalIndices[R];
            }
            while (strcmp(arr[L], piv) <= 0 && L < R) L++;
            if (L < R)
            {
               arr[R] = arr[L];
               originalIndices[R--] = originalIndices[L];
            }
         }
         arr[L] = piv;
         originalIndices[L] = cidx;
         beg[i + 1] = L + 1;
         end[i + 1] = end[i];
         end[i++] = L;
      }
      else
      {
         i--;
      }
   }
}

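int removeDuplicatesFromBoth(char **arr, int elements, int *originalIndices)
{
   // now remove duplicates
   int i, keep = 0, newLimit = 1;
   if (elements <= 0)
      return 0;   // nothing to do for an empty array
   for (i = 1; i < elements; i++)
   {
      if(strcmp(arr[keep], arr[i]) == 0)
      {
         // quicksort is not stable, so equal strings can come out in any
         // order: give the survivor the earliest original index
         if (originalIndices[i] < originalIndices[keep])
            originalIndices[keep] = originalIndices[i];
         arr[i] = NULL;   // free this if it was malloc'd
         originalIndices[i] = elements;  // place it at the end
      }
      else
      {
         keep = i;   // first entry of a new run of equal strings
         newLimit++;
      }
   }
   return newLimit;
}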

void sortArrayBasedOnCriteria(char **arr, int elements, int *originalIndices)
{
   #define  MAX_LEVELS  1000
   int piv;
   int beg[MAX_LEVELS], end[MAX_LEVELS], i=0, L, R;
   int idx;
   char *cidx;
   beg[0] = 0;
   end[0] = elements;
   while (i>=0)
   {
      L = beg[i];
      R = end[i] - 1;
      if (L<R)
      {
         piv = originalIndices[L];
         cidx = arr[L];
         if (i==MAX_LEVELS-1)
            return;
         while (L < R)
         {
            while (originalIndices[R] >= piv && L < R) R--;
            if (L < R)
            {
               arr[L] = arr[R];
               originalIndices[L++] = originalIndices[R];
            }
            while (originalIndices[L] <= piv && L < R) L++;
            if (L < R)
            {
               arr[R] = arr[L];
               originalIndices[R--] = originalIndices[L];
            }
         }
         arr[L] = cidx;
         originalIndices[L] = piv;
         beg[i + 1] = L + 1;
         end[i + 1] = end[i];
         end[i++] = L;
      }
      else
      {
         i--;
      }
   }
}

int removeDuplicateStrings(char *words[], int limit)
{
   int *indices = (int *)malloc(limit * sizeof(int));
   int newLimit;
   sortArrayAndSetCriteria(words, limit, indices);
   newLimit = removeDuplicatesFromBoth(words, limit, indices);
   sortArrayBasedOnCriteria(words, limit, indices);
   free(indices);
   return newLimit;
}

int main()
{
   char *words[] = { "abc", "def", "bad", "hello", "captain", "def", "abc", "goodbye" };
   int newLimit = removeDuplicateStrings(words, 8);
   int i = 0;
   for(i = 0; i < newLimit; i++)
      printf(" Word @ %d = %s\n", i, words[i]);
   return 0;
}
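
For comparison, the same index-array idea can also be sketched with the standard library's qsort instead of a hand-written quicksort. This is only a sketch under stated assumptions, not the method above: the Entry type and the names compareEntries and copyUniqueWords are made up for illustration, and the result is copied into a separate caller-supplied array, as the question asked. Because qsort is not guaranteed to be stable, ties between equal words are broken by original index, so the first entry of each run is the earliest occurrence.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper type: a word paired with its original position. */
typedef struct {
   const char *word;
   int index;
} Entry;

/* Order by string, then by original index so equal words stay in input order. */
static int compareEntries(const void *a, const void *b)
{
   const Entry *x = a, *y = b;
   int c = strcmp(x->word, y->word);
   if (c != 0)
      return c;
   return (x->index > y->index) - (x->index < y->index);
}

/* Copies the first occurrence of each word into out[] (which must have room
   for `limit` pointers), preserving input order. Returns the number of unique
   words, or -1 if allocation fails. */
int copyUniqueWords(char *words[], int limit, char *out[])
{
   Entry *entries;
   char *keep;
   int i, count = 0;

   if (limit <= 0)
      return 0;
   entries = malloc((size_t)limit * sizeof *entries);
   keep = calloc((size_t)limit, 1);   /* keep[i] == 1: words[i] is a first occurrence */
   if (entries == NULL || keep == NULL)
   {
      free(entries);
      free(keep);
      return -1;
   }

   for (i = 0; i < limit; i++)
   {
      entries[i].word = words[i];
      entries[i].index = i;
   }
   qsort(entries, (size_t)limit, sizeof *entries, compareEntries);

   /* The first entry of each run of equal words has the smallest index. */
   for (i = 0; i < limit; i++)
      if (i == 0 || strcmp(entries[i].word, entries[i - 1].word) != 0)
         keep[entries[i].index] = 1;

   /* Walk the input in its original order and copy only the kept words. */
   for (i = 0; i < limit; i++)
      if (keep[i])
         out[count++] = words[i];

   free(entries);
   free(keep);
   return count;
}
```

With the words array from main above and an out array of 8 pointers, copyUniqueWords(words, 8, out) returns 6 and fills out with abc, def, bad, hello, captain, goodbye. The input array is left untouched, at the cost of two temporary allocations.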
