多输入多变量数据可视化 [英] Multiple inputs multivariate data visualisation

查看:124
本文介绍了多输入多变量数据可视化的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我试图通过从多个输入文件中读取多变量数据模型来形象化它们.我正在寻找一个简单的解决方案,以可视化从多个输入的csv文件读取的多个类别数据.没有在单个文件中,输入行的范围为1到10000s.格式与所有带有4列csv文件的输入相同.

输入1

tweetcricscore 34  51 high

输入2

tweetcricscore 23 46 low
tweetcricscore 24  12 low
tweetcricscore 456 46 low

输入3

tweetcricscore 653  1 medium 
tweetcricscore 789 178 medium

输入4

tweetcricscore 625  46 part
tweetcricscore 86  23 part
tweetcricscore 3  1 part
tweetcricscore 87 8 part
tweetcricscore 98 56 part

四个输入分别属于不同的类别,并且col[1]col[2]是某种分类的成对结果.这里的所有输入都是相同分类的输出.我想以更好的方式可视化它们,以仅在一个图中显示所有类别.寻找相同的python或pandas解决方案.散点图或任何最佳绘制方法.

我已经将此查询发布在堆栈交换的数据分析"部分中,因此没有运气,因此请尝试在这里进行操作. https://datascience.stackexchange.com/questions/11440/multi-model-数据集可视化Python

可能类似于下面的图像,其中每个类都有自己的标记和颜色,可以进行分类或以更好的方式将对值一起显示.

代码:我正在尝试使用上述输入文件来绘制散点图.

import numpy as np
import matplotlib.pyplot as plt
from pylab import*
import math
from matplotlib.ticker import LogLocator
import pandas as pd

df1 = pd.read_csv('input_1.csv', header = None)

df1.columns = ['col1','col2','col3','col4']
plt.df1(kind='scatter', x='col2', y='col3', s=120, c='b', label='Highly')

plt.legend(loc='upper right')
plt.xlabel('Freq (x)')
plt.ylabel('Freq(y)')
#plt.gca().set_xscale("log")
#plt.gca().set_yscale("log")
plt.show()

错误:

Traceback (most recent call last):
  File "00_scatter_plot.py", line 12, in <module>
    plt.scatter(x='col2', y='col3', s=120, c='b', label='High')
  File "/usr/lib/pymodules/python2.7/matplotlib/pyplot.py", line 3087, in scatter
    linewidths=linewidths, verts=verts, **kwargs)
  File "/usr/lib/pymodules/python2.7/matplotlib/axes.py", line 6337, in scatter
    self.add_collection(collection)
  File "/usr/lib/pymodules/python2.7/matplotlib/axes.py", line 1481, in add_collection
    self.update_datalim(collection.get_datalim(self.transData))
  File "/usr/lib/pymodules/python2.7/matplotlib/collections.py", line 185, in get_datalim
    offsets = np.asanyarray(offsets, np.float_)
  File "/usr/local/lib/python2.7/dist-packages/numpy/core/numeric.py", line 514, in asanyarray
    return array(a, dtype, copy=False, order=order, subok=True)
ValueError: could not convert string to float: col2

预期的输出绘制熊猫

解决方案

更新:

具有不同的颜色:

colors = dict(low='DarkBlue', high='red', part='yellow', medium='DarkGreen')

fig, ax = plt.subplots()

for grp, vals in df.groupby('col4'):
    color = colors[grp]
    vals[['col2','col3']].plot.scatter(x='col2', y='col3', ax=ax,
                                       s=120, label=grp, color=color)

PS,您必须注意所有组(col4)-在colors词典中定义

老答案:

假设您已将文件串联/合并/合并到单个DF中,我们可以执行以下操作:

fig, ax = plt.subplots()
[vals[['col2','col3']].plot.scatter(x='col2', y='col3', ax=ax, label=grp)
 for grp, vals in df.groupby('col4')]

PS作为作业-您可以玩彩色游戏;)

I am trying to visualise multivariate data model by reading them from multiple input files. I am looking for a simple solution to visualise multiple category data read from multiple input csv files. The no. Of rows in inputs range from 1 to 10000s in individual files. The format is same of all the inputs with 4 columns csv files.

Input 1

tweetcricscore 34  51 high

Input 2

tweetcricscore 23 46 low
tweetcricscore 24  12 low
tweetcricscore 456 46 low

Input 3

tweetcricscore 653  1 medium 
tweetcricscore 789 178 medium

Input 4

tweetcricscore 625  46 part
tweetcricscore 86  23 part
tweetcricscore 3  1 part
tweetcricscore 87 8 part
tweetcricscore 98 56 part

The four inputs are each of different category and col[1] and col[2] are pair results of some kind of classification. All the inputs here are the outputs of the same classification. I want to visualise them in better way to show all the categories in one plot only. Looking for a python or pandas solutions for the same. Scatter plot or any best approach to plot.

I have already posted this query in Data analysis section of stack exchange and I have no luck hence trying here. https://datascience.stackexchange.com/questions/11440/multi-model-data-set-visualization-python

May be something like below image where every class has its own marker and color and can be categorized or any better way to show the pair values together.

code: Edit 1: I am trying to plot a scatter plot with above input files.

import numpy as np
import matplotlib.pyplot as plt
from pylab import*
import math
from matplotlib.ticker import LogLocator
import pandas as pd

df1 = pd.read_csv('input_1.csv', header = None)

df1.columns = ['col1','col2','col3','col4']
plt.df1(kind='scatter', x='col2', y='col3', s=120, c='b', label='Highly')

plt.legend(loc='upper right')
plt.xlabel('Freq (x)')
plt.ylabel('Freq(y)')
#plt.gca().set_xscale("log")
#plt.gca().set_yscale("log")
plt.show()

Error:

Traceback (most recent call last):
  File "00_scatter_plot.py", line 12, in <module>
    plt.scatter(x='col2', y='col3', s=120, c='b', label='High')
  File "/usr/lib/pymodules/python2.7/matplotlib/pyplot.py", line 3087, in scatter
    linewidths=linewidths, verts=verts, **kwargs)
  File "/usr/lib/pymodules/python2.7/matplotlib/axes.py", line 6337, in scatter
    self.add_collection(collection)
  File "/usr/lib/pymodules/python2.7/matplotlib/axes.py", line 1481, in add_collection
    self.update_datalim(collection.get_datalim(self.transData))
  File "/usr/lib/pymodules/python2.7/matplotlib/collections.py", line 185, in get_datalim
    offsets = np.asanyarray(offsets, np.float_)
  File "/usr/local/lib/python2.7/dist-packages/numpy/core/numeric.py", line 514, in asanyarray
    return array(a, dtype, copy=False, order=order, subok=True)
ValueError: could not convert string to float: col2

Expected Output Plotting- Pandas

解决方案

UPDATE:

with different colors:

colors = dict(low='DarkBlue', high='red', part='yellow', medium='DarkGreen')

fig, ax = plt.subplots()

for grp, vals in df.groupby('col4'):
    color = colors[grp]
    vals[['col2','col3']].plot.scatter(x='col2', y='col3', ax=ax,
                                       s=120, label=grp, color=color)

PS you will have to care that all your groups (col4) - are defined in colors dictionary

OLD answer:

assuming that you've concatenated/merged/joined your files into single DF, we can do the following:

fig, ax = plt.subplots()
[vals[['col2','col3']].plot.scatter(x='col2', y='col3', ax=ax, label=grp)
 for grp, vals in df.groupby('col4')]

PS as a homework - you can play with colors ;)

这篇关于多输入多变量数据可视化的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆