How to set all the values of an existing Pandas DataFrame to zero?

Views: 1862 | Published: 2020/10/16 23:27:37 | Tags: python, pandas, dataframe

Question

I currently have an existing Pandas DataFrame with a date index, and columns each with a specific name.

As for the data cells, they are filled with various float values.

I would like to copy my DataFrame, but replace all these values with zero.

The objective is to reuse the structure of the DataFrame (dimensions, index, column names), but clear all the current values by replacing them with zeroes.

The way I'm currently achieving this is as follows:

df[df > 0] = 0

However, this does not replace any negative values in the DataFrame.
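To make the limitation concrete, here is a minimal sketch on a made-up three-row frame (the column names and values are illustrative assumptions, not data from the question). It applies the same mask to a copy and counts the negative cells that survive.

```python
import pandas as pd

# Hypothetical sample data with a date index, as described in the question.
df = pd.DataFrame(
    {"a": [1.5, -2.0, 0.0], "b": [-0.5, 3.0, 4.0]},
    index=pd.date_range("2020-01-01", periods=3, freq="D"),
)

masked = df.copy()
masked[masked > 0] = 0  # only strictly positive cells are overwritten

# Count the cells that are still negative after masking.
negatives_left = int((masked < 0).sum().sum())
print(masked)
print("negative cells remaining:", negatives_left)
```

The mask `masked > 0` is False for negative cells, so they are never touched by the assignment.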

Isn't there a more general approach to filling an entire existing DataFrame with a single common value?

Thanks in advance.

Recommended answer

Timing comparison

For small DataFrames, the issubdtype check is somewhat costly. However, the cost of zeroing a non-numeric column is substantial, so if you're not sure whether your DataFrame is entirely numeric, you should probably include the issubdtype check.

Setup

import pandas as pd
import numpy as np

def make_df(n, only_numeric):
    # Build an n-row test frame: all numeric, or numeric plus datetime and string columns.
    series = [
        pd.Series(range(n), name="int", dtype=int),
        pd.Series(range(n), name="float", dtype=float),
    ]
    if only_numeric:
        series.extend(
            [
                pd.Series(range(n, 2 * n), name="int2", dtype=int),
                pd.Series(range(n, 2 * n), name="float2", dtype=float),
            ]
        )
    else:
        series.extend(
            [
                # "min" is the minute frequency; the older alias "T" is deprecated in recent pandas.
                pd.date_range(start="1970-1-1", freq="min", periods=n, name="dt")
                .to_series()
                .reset_index(drop=True),
                pd.Series(
                    [chr((i % 26) + 65) for i in range(n)],
                    name="string",
                    dtype="object",
                ),
            ]
        )

    return pd.concat(series, axis=1)

>>> make_df(5, True)
   int  float  int2  float2
0    0    0.0     5     5.0
1    1    1.0     6     6.0
2    2    2.0     7     7.0
3    3    3.0     8     8.0
4    4    4.0     9     9.0

>>> make_df(5, False)
   int  float                  dt string
0    0    0.0 1970-01-01 00:00:00      A
1    1    1.0 1970-01-01 00:01:00      B
2    2    2.0 1970-01-01 00:02:00      C
3    3    3.0 1970-01-01 00:03:00      D
4    4    4.0 1970-01-01 00:04:00      E

Small DataFrame

n = 10_000

# Numeric df, no issubdtype check
%%timeit df = make_df(n, True)
for col in df.columns:
    df[col].values[:] = 0
36.1 µs ± 510 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

# Numeric df, with issubdtype check
%%timeit df = make_df(n, True)
for col in df.columns:
    if np.issubdtype(df[col].dtype, np.number):
        df[col].values[:] = 0
53 µs ± 645 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

# Non-numeric df, no issubdtype check
%%timeit df = make_df(n, False)
for col in df.columns:
    df[col].values[:] = 0
113 µs ± 391 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

# Non-numeric df, with issubdtype check
%%timeit df = make_df(n, False)
for col in df.columns:
    if np.issubdtype(df[col].dtype, np.number):
        df[col].values[:] = 0
39.4 µs ± 1.91 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

Large DataFrame

n = 10_000_000

# Numeric df, no issubdtype check
%%timeit df = make_df(n, True)
for col in df.columns:
    df[col].values[:] = 0
38.7 ms ± 151 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

# Numeric df, with issubdtype check
%%timeit df = make_df(n, True)
for col in df.columns:
    if np.issubdtype(df[col].dtype, np.number):
        df[col].values[:] = 0
39.1 ms ± 556 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

# Non-numeric df, no issubdtype check
%%timeit df = make_df(n, False)
for col in df.columns:
    df[col].values[:] = 0
99.5 ms ± 748 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

# Non-numeric df, with issubdtype check
%%timeit df = make_df(n, False)
for col in df.columns:
    if np.issubdtype(df[col].dtype, np.number):
        df[col].values[:] = 0
17.8 ms ± 228 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
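
The numeric-only loop timed above could be wrapped in a small helper. The sketch below is a variation, not the code that was benchmarked: it works on a copy and assigns a fresh zero array of the same dtype to each numeric column instead of writing through `.values`, because recent pandas versions with Copy-on-Write may return a read-only array there. The name `zero_numeric_columns` is hypothetical, and the helper assumes every column has a plain NumPy dtype.

```python
import numpy as np
import pandas as pd


def zero_numeric_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with every numeric column zeroed (hypothetical helper)."""
    out = df.copy()
    for col in out.columns:
        dtype = out[col].dtype
        # Assumes NumPy dtypes; pandas extension dtypes are not handled here.
        if np.issubdtype(dtype, np.number):
            # A zero array of the same dtype keeps int columns int and float columns float.
            out[col] = np.zeros(len(out), dtype=dtype)
    return out


# Illustrative data only: two numeric columns and one object column.
df = pd.DataFrame(
    {
        "int": pd.Series([1, -2, 3], dtype="int64"),
        "float": pd.Series([0.5, -1.5, 2.5], dtype="float64"),
        "string": pd.Series(["A", "B", "C"], dtype="object"),
    }
)
zeroed = zero_numeric_columns(df)
print(zeroed)
print(zeroed.dtypes)
```

Because it copies, this version trades some of the speed of the in-place loop for leaving the original frame untouched, which matches the question's goal of getting a zeroed copy.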

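I previously suggested the answer below, but I now consider it harmful: it is much slower than the approach above and harder to reason about. Its only advantage is that it is nicer to write.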

The cleanest way is to use a bare colon to reference the entire dataframe.

df[:] = 0

Unfortunately the dtype situation is a bit fuzzy because every column in the resulting dataframe will have the same dtype. If every column of df was originally float, the new dtypes will still be float. But if a single column was int or object, it seems that the new dtypes will all be int.
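
Since the resulting dtypes appear to depend on the column mix, and possibly on the pandas version, it may be worth checking them on your own installation. The sketch below uses made-up column names and values, compares the dtypes before and after the bare-colon assignment, and deliberately makes no claim about what the output will be.

```python
import pandas as pd

# Hypothetical frame mixing a float column and an int column.
df = pd.DataFrame(
    {"price": [1.5, -2.0, 3.25], "qty": [4, -5, 6]},
    index=pd.date_range("2020-10-01", periods=3, freq="D"),
)

before = df.dtypes.copy()
df[:] = 0  # bare-colon assignment over the whole frame
after = df.dtypes

# Side-by-side view of each column's dtype before and after the assignment.
print(pd.DataFrame({"before": before, "after": after}))
print(df)
```

Whatever dtypes are reported, the index and column names are kept and every cell ends up equal to zero.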
