How to provide a copy of your DataFrame with to_clipboard()


Question

2018-09-18_reproducible_dataframe.ipynb

This was marked as a duplicate of How to make good reproducible pandas examples. The other question and associated answers cover how to create a reproducible dataframe. They do not cover how to copy an existing dataframe with to_clipboard, while this question specifically covers .to_clipboard and is more succinct.

This may seem like an obvious question. However, many of the users asking questions about Pandas are new and inexperienced. A critical component of asking a question is How to create a Minimal, Complete, and Verifiable example, which explains "what" and "why", but not "how".

For example, as the OP, I may have the following dataframe:

  • For this example, I've created synthetic data, which is one option for creating a reproducible dataset, but it is not within the scope of this question.
    • Think of this as if you've loaded a file and only need to share a bit of it to reproduce the error.

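      import pandas as pd
      import numpy as np
      from datetime import datetime

      # seed NumPy's random generator so columns 'a' and 'b' are reproducible
      np.random.seed(365)
      # 15 business days starting today, so the dates depend on when this is run
      data = {'a': [np.random.randint(10) for _ in range(15)],
              'b': [np.random.randint(10) for _ in range(15)],
              'date': pd.bdate_range(datetime.today(), periods=15).tolist()}

      # build the example dataframe
      df = pd.DataFrame(data)
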
          a  b       date
      0   2  0 2019-11-06
      1   4  8 2019-11-07
      2   1  4 2019-11-08
      3   5  3 2019-11-11
      4   2  2 2019-11-12
      5   2  6 2019-11-13
      6   9  2 2019-11-14
      7   8  6 2019-11-15
      8   4  8 2019-11-18
      9   0  9 2019-11-19
      10  3  6 2019-11-20
      11  3  1 2019-11-21
      12  7  6 2019-11-22
      13  7  5 2019-11-25
      14  7  7 2019-11-26
      

The dataframe could be followed by some other code that produces an error or doesn't produce the desired outcome.

Things that should be provided when asking a question on Stack Overflow:

  • A well-written, coherent question
  • The code that produces the error
  • The error stack
  • Potentially, the expected outcome of some code
  • The data, in an easily usable form

Recommended answer

The quickest way to provide sample data from a pandas DataFrame

There is more than one way to answer this question. However, this answer isn't meant to provide an exhaustive solution. It provides the simplest method possible. For the curious, there are other, more verbose solutions on Stack Overflow.

1. Provide a link to a shareable dataset (maybe on GitHub or a shared file on Google). This is particularly useful if it's a large dataset and the objective is to optimize some method. The drawback is that the data may no longer be available in the future, which reduces the benefit of the post.
  • Data must be provided in the question, but can be accompanied by a link to a more extensive dataset.
  • Do not post only a link or an image of the data.

Code:

Provide the output of pandas.DataFrame.to_clipboard

      df.head(10).to_clipboard(sep=',', index=False)
      

  • If you have a multi-index DataFrame or an index other than 0...n, use index=True and provide a note in your question as to which column(s) are the index.
  • Note: when the previous line of code is executed, no output will appear. The result of the code is now in the clipboard.
  • Paste the clipboard into a code block in your Stack Overflow question:

      a,b,date
      2,0,2019-11-06
      4,8,2019-11-07
      1,4,2019-11-08
      5,3,2019-11-11
      2,2,2019-11-12
      2,6,2019-11-13
      9,2,2019-11-14
      8,6,2019-11-15
      4,8,2019-11-18
      0,9,2019-11-19
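
The index=True case can be sketched without a clipboard. This is a minimal illustration, assuming the text placed on the clipboard matches what `to_csv` returns for the same arguments; the two-level index, its level names `group` and `item`, and the `index_col=[0, 1]` argument are made up for the example and are not part of the original answer.

```python
import io

import pandas as pd

# Hypothetical two-level index; the names 'group' and 'item' are made up.
index = pd.MultiIndex.from_tuples(
    [('x', 1), ('x', 2), ('y', 1)], names=['group', 'item'])
df = pd.DataFrame({'value': [10, 20, 30]}, index=index)

# With index=True both index levels are written as leading columns.
pasted = df.to_csv(index=True)

# A note in the question such as "the first two columns are the index"
# tells the reader to pass index_col=[0, 1] when loading the text.
restored = pd.read_csv(io.StringIO(pasted), index_col=[0, 1])
```

Without the note, a reader would load `group` and `item` as ordinary columns and might not reproduce the original problem.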
          

  • The comma-separated text can then be copied to the clipboard by someone trying to answer your question, followed by:

      df = pd.read_clipboard(sep=',')
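
The same round trip can be sketched with an in-memory string, which is handy where no clipboard is available. This is a minimal sketch, assuming `to_clipboard(sep=',', index=False)` produces the same text that `to_csv(index=False)` returns; the `pasted` variable, the three-row stand-in DataFrame, and the `parse_dates` argument are illustrative additions rather than part of the original answer.

```python
import io

import pandas as pd

# Small stand-in for the question's DataFrame (first rows of the sample data).
df = pd.DataFrame({
    'a': [2, 4, 1],
    'b': [0, 8, 4],
    'date': pd.to_datetime(['2019-11-06', '2019-11-07', '2019-11-08']),
})

# Text equivalent to what would be pasted into the question.
pasted = df.to_csv(index=False)

# What the person answering would run. pd.read_clipboard forwards its
# keyword arguments to pd.read_csv, so parse_dates can be used there too.
restored = pd.read_csv(io.StringIO(pasted), sep=',', parse_dates=['date'])
```

Without `parse_dates`, the `date` column comes back as plain strings, so it is worth mentioning the intended dtype in the question.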
            

Locations of the dataframe other than .head(10)

  • Specify a section of the dataframe with the .iloc property
  • The following example selects rows 3 - 11 and all the columns

      df.iloc[3:12, :].to_clipboard(sep=',')
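
Because the slice above keeps the index, the person answering has to tell pandas which column holds it. A minimal sketch, again assuming the clipboard text matches `to_csv()` output; the stand-in DataFrame and the `index_col=0` argument are assumptions made for illustration.

```python
import io

import pandas as pd

# Stand-in DataFrame with 15 rows, like the question's example.
df = pd.DataFrame({'a': range(15), 'b': range(15, 30)})

# Rows 3 through 11: the end of an iloc slice is exclusive.
subset = df.iloc[3:12, :]

# index defaults to True, so the row labels are written as the first,
# unnamed column of the text.
pasted = subset.to_csv()

# Reading it back: index_col=0 restores the row labels as the index.
restored = pd.read_csv(io.StringIO(pasted), index_col=0)
```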
                

Google Colab Users

  • .to_clipboard() won't work
  • Do the following

      # if you have a datetime column, convert it to a str
      df['date'] = df['date'].astype('str')

      # output to a dict
      df.head(10).to_dict()

      # paste the resulting dict into a code block on SO
      # whoever rebuilds the dataframe then converts the datetime column back
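
The full dict round trip might look like the following. This is a sketch under assumptions: the three-row stand-in DataFrame is made up, and `pd.to_datetime` is one way to restore the column, which the original answer does not spell out.

```python
import pandas as pd

# Stand-in for the question's DataFrame.
df = pd.DataFrame({
    'a': [2, 4, 1],
    'date': pd.to_datetime(['2019-11-06', '2019-11-07', '2019-11-08']),
})

# Store the dates as text so the dict contains plain strings.
as_text = df.assign(date=df['date'].astype('str'))

# This dict is what would be pasted into the question.
data = as_text.head(10).to_dict()

# The person answering rebuilds the frame and restores the dtype.
restored = pd.DataFrame(data)
restored['date'] = pd.to_datetime(restored['date'])
```

Using `assign` here leaves the original `df` untouched, so the datetime column does not have to be converted back on the asker's side.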
                    
