How do I manipulate a large data set into smaller sets based on the type of object within the data?


Question

In my code, the user inputs a text file. The text file contains 4 columns, and the number of rows varies with the file that is loaded, so the code must be generic. In the array generated from the text file, the first column contains a type of animal, the second column is its X location in a field, the third is its Y location and the fourth is its Z location. Here is the code that loads the data and the array it returns:

import numpy as np

#load the data (all columns are read as strings because of dtype=str)
emplaced_animals_data = np.genfromtxt('animal_data.txt', skip_header=1, dtype=str)
print(type(emplaced_animals_data))
print(emplaced_animals_data)

[['butterfly' '1' '1' '3']
 ['butterfly' '2' '2' '3']
 ['butterfly' '3' '3' '3']
 ['dragonfly' '4' '1' '1']
 ['dragonfly' '5' '2' '1']
 ['dragonfly' '6' '3' '1']
 ['cat' '4' '4' '2']
 ['cat' '5' '5' '2']
 ['cat' '6' '6' '2']
 ['cat' '7' '8' '3']
 ['elephant' '8' '9' '3']
 ['elephant' '9' '10' '4']
 ['elephant' '10' '10' '4']
 ['camel' '10' '11' '5']
 ['camel' '11' '6' '5']
 ['camel' '12' '5' '6']
 ['camel' '12' '3' '6']
 ['bear' '13' '13' '7']
 ['bear' '5' '15' '7']
 ['bear' '4' '10' '5']
 ['bear' '6' '9' '2']
 ['bear' '15' '13' '1']
 ['dog' '1' '3' '9']
 ['dog' '2' '12' '8']
 ['dog' '3' '10' '1']
 ['dog' '4' '8' '1']]
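The layout described above (a name column followed by three location columns, all read as text because of `dtype=str`) can be reproduced with a small in-memory sample. This is a minimal sketch, assuming whitespace-separated columns and a one-line header; `sample_text` is a hypothetical stand-in for `animal_data.txt`, read through `io.StringIO` instead of a real file.

```python
import io
import numpy as np

# Hypothetical stand-in for a few rows of animal_data.txt: one header line,
# then whitespace-separated columns (name, X, Y, Z)
sample_text = """animal x y z
butterfly 1 1 3
cat 4 4 2
dog 1 3 9
"""

# dtype=str keeps every column as text, matching the array printed above;
# encoding=None avoids the unicode deprecation warning on older NumPy versions
data = np.genfromtxt(io.StringIO(sample_text), skip_header=1, dtype=str, encoding=None)

names = data[:, 0]                  # first column: type of animal
coords = data[:, 1:].astype(float)  # columns 2-4: X, Y, Z converted to numbers

print(names)
print(coords)
```

Splitting the names from the numeric columns early makes it easy to filter on one and plot the other.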

After the data is loaded, there will always be two types of animals in it that we don't want to know anything about, so I remove the names of these animals from the first column. However, I am unsure how to remove the data from the whole row. How would I extend the selection from the type of animal to its location, and delete the entire row for the unwanted animals? Here is what I have done so far:

#Removes unwanted animals from list
print('Original list:', emplaced_animals_data[:,0])
unwanted_animals = {'butterfly', 'dragonfly'}

# compare whole names (set('butterfly') would only compare sets of letters);
# note this filters the names column only, not the X, Y, Z columns
all_the_animals = [animal for animal in emplaced_animals_data[:,0] if animal not in unwanted_animals]
print('Updated list:', all_the_animals)
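A minimal sketch of extending the selection from the name to the whole row, assuming the loaded array has the 2-D layout printed above: build a boolean mask from the first column with `np.isin` and index the full array with it. The sample rows and the `unwanted` list are hypothetical.

```python
import numpy as np

# Hypothetical sample in the same layout as the loaded data: name, X, Y, Z (all strings)
emplaced_animals_data = np.array([
    ['butterfly', '1', '1', '3'],
    ['dragonfly', '4', '1', '1'],
    ['cat', '4', '4', '2'],
    ['dog', '1', '3', '9'],
])

unwanted = ['butterfly', 'dragonfly']

# Boolean mask over the first column: True where the row is NOT an unwanted animal
keep = ~np.isin(emplaced_animals_data[:, 0], unwanted)

# Indexing the 2-D array with a row mask keeps or drops whole rows,
# so each animal's X, Y, Z location stays attached to its name
wanted_rows = emplaced_animals_data[keep]
print(wanted_rows)
```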

Next, I would like to take the remaining animals and sort each one, along with its location data, into its own array saved as a variable. Currently, however, I can only sort the animal types into their own arrays. How would I extend my selection to include the locations, and save each animal and its locations to its own array based on the type of animal?

#Groups all of the items with the same name together
setofanimals = set(all_the_animals)

animal_groups = {}

for one in setofanimals:
    # collects only the matching names, not their X, Y, Z locations
    ids = [name for name in emplaced_animals_data[:,0] if name == one]
    animal_groups[one] = ids

for one in animal_groups:
    print(one, ":", animal_groups[one])
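One way to group whole rows (name plus X, Y, Z) rather than just names is a dictionary comprehension over `np.unique` of the first column, using each name as a boolean row mask. A minimal sketch, assuming the unwanted animals have already been filtered out; the sample array is hypothetical.

```python
import numpy as np

# Hypothetical sample with the unwanted animals already removed
animals = np.array([
    ['cat', '4', '4', '2'],
    ['cat', '5', '5', '2'],
    ['dog', '1', '3', '9'],
    ['cat', '7', '8', '3'],
    ['bear', '13', '13', '7'],
])

# One entry per animal type; each value is a 2-D array of that animal's full rows
animal_groups = {name: animals[animals[:, 0] == name] for name in np.unique(animals[:, 0])}

for name, rows in animal_groups.items():
    print(name, ":", rows.tolist())
```

Because the values are arrays rather than lists of names, `animal_groups['cat'][:, 1]` gives every X location of the cats.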

My final goal is to be able to plot every occurrence of each animal, regardless of which text file is loaded.
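Because `dtype=str` loads the locations as text, they need converting to numbers before plotting. A minimal sketch, assuming the name/X/Y/Z layout above; `coordinates_by_animal` is a hypothetical helper, not part of the question or the answer.

```python
import numpy as np

# Hypothetical sample; in practice this is the filtered array loaded from the text file
animals = np.array([
    ['cat', '4', '4', '2'],
    ['cat', '5', '5', '2'],
    ['dog', '1', '3', '9'],
])

def coordinates_by_animal(rows):
    """Map each animal name to its X, Y, Z columns as float arrays (ready to plot)."""
    result = {}
    for name in np.unique(rows[:, 0]):
        # select this animal's rows, drop the name column, convert text to numbers
        xyz = rows[rows[:, 0] == name, 1:].astype(float)
        result[str(name)] = (xyz[:, 0], xyz[:, 1], xyz[:, 2])
    return result

coords = coordinates_by_animal(animals)
x, y, z = coords['cat']
print(x, y, z)
```

With matplotlib, each `(x, y, z)` tuple could then be passed to a 3-D scatter, for example `ax.scatter(x, y, z, label=name)` on axes created with `fig.add_subplot(projection='3d')`; this is a suggestion beyond the original answer.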

The data I am working with was copied from an Excel spreadsheet and saved as a text file.

Recommended answer

The following functions should accomplish this. Your input text file can be of any length, and both functions take a list of animals, deleting or selecting rows based on the animals in that list:

import numpy as np

# note that my delimiter is a tab, which might be different from yours
# (omitting delimiter splits on any whitespace, as in the question's code)
emplaced_animals = np.genfromtxt('animals.txt', skip_header=1, dtype=str, delimiter='\t')
listed_animals = ['cat', 'dog', 'bear', 'camel', 'elephant']

def get_specific_animals_from(list_of_all_animals, specific_animals):
    """get an array containing only the rows of the specified animals"""
    # keep each matching row whole (name, X, Y, Z), grouped in the order of specific_animals;
    # np.append onto a 1-D np.array([]) would flatten the rows into one long array
    list_of_specific_animals = [animal
                                for specific_animal in specific_animals
                                for animal in list_of_all_animals
                                if animal[0] == specific_animal]
    return np.array(list_of_specific_animals)

def delete_specific_animals_from(list_of_all_animals, bad_animals):
    """
    delete all rows of bad_animal in provided list
    takes in a list of bad animals e.g. ['dragonfly', 'butterfly']
    returns list of only desired animals
    """
    all_useful_animals = list_of_all_animals
    positions_of_bad_animals = []
    for n, animal in enumerate(list_of_all_animals):
        if animal[0] in bad_animals:
            positions_of_bad_animals.append(n)
    if len(positions_of_bad_animals):
        for position in sorted(positions_of_bad_animals, reverse=True):
            # reverse is important
            # without it, list positions change as you delete items
            all_useful_animals = np.delete(all_useful_animals, (position), 0)
    return all_useful_animals

emplaced_animals = delete_specific_animals_from(emplaced_animals, ['dragonfly', 'butterfly'])

list_of_elephants = get_specific_animals_from(emplaced_animals, ['elephant'])

list_of_needed_animals = get_specific_animals_from(emplaced_animals, listed_animals)
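The reverse-order loop in `delete_specific_animals_from` works, but `np.delete` also accepts a sequence of indices and interprets them all against the original array, so the rows can be removed in a single call. A minimal sketch with a hypothetical sample array:

```python
import numpy as np

# Hypothetical sample in the loaded layout: name, X, Y, Z
emplaced_animals = np.array([
    ['butterfly', '1', '1', '3'],
    ['cat', '4', '4', '2'],
    ['dragonfly', '4', '1', '1'],
    ['dog', '1', '3', '9'],
])

bad_animals = ['dragonfly', 'butterfly']

# Indices (into the ORIGINAL array) of every row whose name is unwanted
positions = [n for n, row in enumerate(emplaced_animals) if row[0] in bad_animals]

# np.delete removes all listed rows at once; indices refer to the original array,
# so deletion order does not matter here
useful = np.delete(emplaced_animals, positions, axis=0)
print(useful)
```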
