Dynamically create string from three data frames



Dynamically create string from pandas column

I have three data frames. The first one (df1) holds the values:

d = {'10028': [0], '1058': [25], '20120': [29], '20121': [22],'20122': [0], '20123': [0], '5043': [0], '5046': [0]}

df1 = pd.DataFrame(data=d)

The second one (df2, the anomalies) mirrors df1, except that each value is 0 or 1: 1 marks an anomaly and 0 a non-anomaly.

d = {'10028': [0], '1058': [1], '20120': [1], '20121': [0],'20122': [0], '20123': [0], '5043': [0], '5046': [0]}

df2 = pd.DataFrame(data=d)

The third data frame (df3) looks like this:

d = {'10028': ['US,IN'], '1058': ['NA, JO, US'], '20120': [''], '20121': ['US,PK'],'20122': ['IN'], '20123': ['Us,LN'], '5043': ['AI,AL'], '5046': ['AA,AB']}

df3 = pd.DataFrame(data=d)

I am converting these into a specific format with the code below:

details = (
        '\n' + 'Metric Name' + '\t' + 'Count' + '\t' + 'Anomaly' + '\t' + 'Country' 
        '\n' + '10028:' + '\t'+ '\t' + str(df1.tail(1)['10028'][0]) + '\t' + str(df2['10028'][0]) + '\t'+ str(df3['10028'][0]) + 
        '\n' + '1058:' + '\t' + '\t' + str(df1.tail(1)['1058'][0]) + '\t' + str(df2['1058'][0]) + '\t'+ str(df3['1058'][0]) +
        '\n' + '20120:' + '\t' +'\t' + str(df1.tail(1)['20120'][0]) + '\t' + str(df2['20120'][0]) + '\t'+ str(df3['20120'][0]) +
        '\n' + '20121:' + '\t' + '\t' +str(round(df1.tail(1)['20121'][0], 2)) + '\t' + str(df2['20121'][0]) + '\t'+ str(df3['20121'][0]) +
        '\n' + '20122:' + '\t' + '\t' +str(round(df1.tail(1)['20122'][0], 2)) + '\t' + str(df2['20122'][0]) + '\t'+str(df3['20122'][0]) +
        '\n' + '20123:' + '\t' + '\t' +str(round(df1.tail(1)['20123'][0], 3)) + '\t' + str(df2['20123'][0]) + '\t'+str(df3['20123'][0]) +
        '\n' + '5043:' + '\t' + '\t' +str(round(df1.tail(1)['5043'][0], 3)) + '\t' + str(df2['5043'][0]) + '\t'+str(df3['5043'][0]) +
        '\n' + '5046:' + '\t' + '\t' +str(round(df1.tail(1)['5046'][0], 3)) + '\t' + str(df2['5046'][0]) + '\t'+str(df3['5046'][0]) +
        '\n\n' + 'message:' + '\t' +
        'Something wrong with the platform as there is a spike in [values where anomalies == 1].'
            )

The problem is that the column names change on every run. In this run they are '10028', '1058', '20120', '20121', '20122', '20123', '5043', '5046', but in the next run they might be '10029', '1038', '20121', '20122', '20123', '5083', '5946'.

How can I create the details string dynamically from whatever columns are present in the data frames, without hard-coding the names? In the message, I also want to include the names of the columns whose value is 1.

The column values are always either 1 or 0 for df1 and df2, and either a list or blank for df3.

Expected Output:-

For two data frames I have a working solution, shown below:

# first part of the string
s = '\n' + 'Metric Name' + '\t' + 'Count' + '\t' + 'Anomaly' 

# dynamically add the data (Series.items() replaces iteritems(), which was removed in pandas 2.0)
for idx, val in df1.iloc[-1].items():
    s += f'\n{idx}\t{val}\t{df2[idx][0]}'
# last part
s += ('\n\n' + 'message:' + '\t' +
      'Something wrong with the platform as there is a spike in [values where anomalies == 1].'
     )

If a matching value is not present in one of the data frames, it should print null.
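As a minimal sketch of how the two-frame loop above might be extended to three frames, the following iterates over the columns of df1 and substitutes the string 'null' whenever df2 or df3 lacks that column. The helper names `cell` and `build_rows` are hypothetical, the sample frames are trimmed and deliberately incomplete to exercise the missing case, and the sketch uses a single tab after the metric name and skips the rounding from the hard-coded version.

```python
import pandas as pd

# Trimmed sample frames; df2 and df3 each lack one column on purpose.
df1 = pd.DataFrame({'10028': [0], '1058': [25], '20120': [29]})
df2 = pd.DataFrame({'10028': [0], '1058': [1]})
df3 = pd.DataFrame({'10028': ['US,IN'], '20120': ['']})


def cell(frame, column, row=-1):
    """Return the value at (row, column) as a string, or 'null' if the column is absent."""
    if column not in frame.columns:
        return 'null'
    return str(frame[column].iloc[row])


def build_rows(df1, df2, df3):
    """Build the tab-separated table, one line per column of df1."""
    lines = ['Metric Name\tCount\tAnomaly\tCountry']
    for name in df1.columns:
        lines.append('\t'.join([
            f'{name}:',
            cell(df1, name),      # last row of df1, like df1.tail(1)
            cell(df2, name, 0),   # first row of df2
            cell(df3, name, 0),   # first row of df3
        ]))
    return '\n'.join(lines)


details = build_rows(df1, df2, df3)
print(details)
```

Because the loop is driven by `df1.columns`, nothing is hard-coded; columns that exist only in df2 or df3 are ignored in this version.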

Solution

To obtain the expected result, you can do the following. This assumes the input data are dictionaries like the ones shown in the question; if they are not, please provide the real input data.

import pandas as pd

final_d = []
d = {'10028': 0, '1058': 25, '20120': 29, '20121': 22,'20122': 0, '20123': 0, '5043': 0, '5046': 0}
final_d.append(d)

d = {'10028': 0, '1058': 1, '20120': 1, '20121': 0,'20122': 0, '20123': 0, '5043': 0, '5046': 0, '91111':0}
final_d.append(d)

d = {'10028': ['US','IN'], '1058': ['NA', 'JO', 'US'], '20120': [''], '20121': ['US','PK'],'20122': ['IN'], '20123': ['Us','LN'], '5043': ['AI','AL'], '5046': ['AA','AB'], '00000':['kk','dd','ee']}
final_d.append(d)

# Now, we will merge the dictionaries on key:
# each key gets one slot per source dictionary, '' where the key is missing
data = {}
for i, dt in enumerate(final_d):
    for k, v in dt.items():
        row = data.setdefault(k, [''] * len(final_d))
        # lists (the countries) are joined into a comma-separated string
        row[i] = ','.join(v) if isinstance(v, list) else v

# Creating the base dataframe
df = pd.DataFrame.from_dict(data)

# Converting the column headers (metric names) into a row in the dataframe
df = pd.concat([pd.DataFrame.from_dict({k:[v] for k,v in zip(df.columns.tolist(), df.columns.tolist())}), df], ignore_index=True)

# removing column names
df.columns = [''] * len(df.columns)

# organising the dataframe according to your required output
result = df.T.reset_index(drop=True)

# Adding the column names as required
result.columns = ['Metric Name', 'Count', 'Anomaly', 'Country']

# Voila!
print(result.to_string(index=False))

The generated dataframe:

Metric Name Count Anomaly   Country
      10028     0       0     US,IN
       1058    25       1  NA,JO,US
      20120    29       1          
      20121    22       0     US,PK
      20122     0       0        IN
      20123     0       0     Us,LN
       5043     0       0     AI,AL
       5046     0       0     AA,AB
      91111             0          
      00000                kk,dd,ee
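The table above does not include the closing message from the question. As a hedged alternative sketch (not the answer's own method), `pd.concat` with `axis=1` can align three Series on the metric name, after which the names whose Anomaly value is 1 can be read off for the message. The variable names and the trimmed sample dictionaries are assumptions for illustration, and missing entries are filled with 'null' as the question requests.

```python
import pandas as pd

# Sample inputs shaped like the answer's dictionaries (trimmed for brevity).
counts = {'10028': 0, '1058': 25, '20120': 29, '20121': 22}
anomalies = {'10028': 0, '1058': 1, '20120': 1, '20121': 0, '91111': 0}
countries = {'10028': ['US', 'IN'], '1058': ['NA', 'JO', 'US'], '20120': ['']}

# dtype=object keeps the integers from turning into floats when
# missing metrics introduce NaN during alignment.
table = pd.concat(
    [
        pd.Series(counts, name='Count', dtype=object),
        pd.Series(anomalies, name='Anomaly', dtype=object),
        pd.Series(countries, name='Country', dtype=object).map(','.join),
    ],
    axis=1,
)
# Replace every missing cell with the literal string 'null'.
table = table.where(table.notna(), 'null')

# Metric names flagged as anomalies, used to fill in the message.
flagged = table.index[table['Anomaly'] == 1].tolist()
message = ('Something wrong with the platform as there is a spike in '
           + ', '.join(flagged) + '.')

lines = ['Metric Name\tCount\tAnomaly\tCountry']
for name, row in table.iterrows():
    lines.append(f"{name}:\t{row['Count']}\t{row['Anomaly']}\t{row['Country']}")
details = '\n'.join(lines) + '\n\nmessage:\t' + message
print(details)
```

Aligning on the index means a metric present in only one dictionary still gets a row, which matches the behaviour of the merged table above.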
