Apache Airflow: initdb vs resetdb


Question

What precisely is the difference between the "airflow initdb" command and the "airflow resetdb" command?

Is it really necessary to have two different commands?

When is it appropriate to use one rather than the other?

The docs say:

airflow initdb: Initialize the metadata database

airflow resetdb: Burn down and rebuild the metadata database

This doesn't tell me much.

My best guess is:

airflow initdb is to be used only the first time that the database is created from the airflow.cfg
airflow resetdb is to be used if any changes to that configuration are required.

When I run them, neither changes the timestamp on the sqlite database, but resetdb seems to be much noisier.

airflow initdb:

(.sandbox) [airflow@localhost airflow]$ airflow initdb
[2020-01-01 21:49:21,603] {settings.py:252} INFO - settings.configure_orm(): Using pool settings. pool_size=5, max_overflow=10, pool_recycle=1800, pid=24917
DB: postgresql+psycopg2://airflow@localhost:5432/airflow_mdb
[2020-01-01 21:49:22,257] {db.py:368} INFO - Creating tables
INFO  [alembic.runtime.migration] Context impl PostgresqlImpl.
INFO  [alembic.runtime.migration] Will assume transactional DDL.
Done.

airflow resetdb:

(.sandbox) [airflow@localhost airflow]$ airflow resetdb
[2020-01-01 21:49:46,579] {settings.py:252} INFO - settings.configure_orm(): Using pool settings. pool_size=5, max_overflow=10, pool_recycle=1800, pid=25045
DB: postgresql+psycopg2://airflow@localhost:5432/airflow_mdb
This will drop existing tables if they exist. Proceed? (y/n)y
[2020-01-01 21:49:49,984] {db.py:389} INFO - Dropping tables that exist
[2020-01-01 21:49:50,062] {migration.py:154} INFO - Context impl PostgresqlImpl.
[2020-01-01 21:49:50,063] {migration.py:161} INFO - Will assume transactional DDL.
[2020-01-01 21:49:50,070] {db.py:368} INFO - Creating tables
INFO  [alembic.runtime.migration] Context impl PostgresqlImpl.
INFO  [alembic.runtime.migration] Will assume transactional DDL.
INFO  [alembic.runtime.migration] Running upgrade  -> e3a246e0dc1, current schema
INFO  [alembic.runtime.migration] Running upgrade e3a246e0dc1 -> 1507a7289a2f, create is_encrypted
INFO  [alembic.runtime.migration] Running upgrade 1507a7289a2f -> 13eb55f81627, maintain history for compatibility with earlier migrations
INFO  [alembic.runtime.migration] Running upgrade 13eb55f81627 -> 338e90f54d61, More logging into task_instance
INFO  [alembic.runtime.migration] Running upgrade 338e90f54d61 -> 52d714495f0, job_id indices
INFO  [alembic.runtime.migration] Running upgrade 52d714495f0 -> 502898887f84, Adding extra to Log
INFO  [alembic.runtime.migration] Running upgrade 502898887f84 -> 1b38cef5b76e, add dagrun
INFO  [alembic.runtime.migration] Running upgrade 1b38cef5b76e -> 2e541a1dcfed, task_duration
INFO  [alembic.runtime.migration] Running upgrade 2e541a1dcfed -> 40e67319e3a9, dagrun_config
INFO  [alembic.runtime.migration] Running upgrade 40e67319e3a9 -> 561833c1c74b, add password column to user
INFO  [alembic.runtime.migration] Running upgrade 561833c1c74b -> 4446e08588, dagrun start end
INFO  [alembic.runtime.migration] Running upgrade 4446e08588 -> bbc73705a13e, Add notification_sent column to sla_miss
INFO  [alembic.runtime.migration] Running upgrade bbc73705a13e -> bba5a7cfc896, Add a column to track the encryption state of the 'Extra' field in connection
INFO  [alembic.runtime.migration] Running upgrade bba5a7cfc896 -> 1968acfc09e3, add is_encrypted column to variable table
INFO  [alembic.runtime.migration] Running upgrade 1968acfc09e3 -> 2e82aab8ef20, rename user table
INFO  [alembic.runtime.migration] Running upgrade 2e82aab8ef20 -> 211e584da130, add TI state index
INFO  [alembic.runtime.migration] Running upgrade 211e584da130 -> 64de9cddf6c9, add task fails journal table
INFO  [alembic.runtime.migration] Running upgrade 64de9cddf6c9 -> f2ca10b85618, add dag_stats table
INFO  [alembic.runtime.migration] Running upgrade f2ca10b85618 -> 4addfa1236f1, Add fractional seconds to mysql tables
INFO  [alembic.runtime.migration] Running upgrade 4addfa1236f1 -> 8504051e801b, xcom dag task indices
INFO  [alembic.runtime.migration] Running upgrade 8504051e801b -> 5e7d17757c7a, add pid field to TaskInstance
INFO  [alembic.runtime.migration] Running upgrade 5e7d17757c7a -> 127d2bf2dfa7, Add dag_id/state index on dag_run table
INFO  [alembic.runtime.migration] Running upgrade 127d2bf2dfa7 -> cc1e65623dc7, add max tries column to task instance
INFO  [alembic.runtime.migration] Running upgrade cc1e65623dc7 -> bdaa763e6c56, Make xcom value column a large binary
INFO  [alembic.runtime.migration] Running upgrade bdaa763e6c56 -> 947454bf1dff, add ti job_id index
INFO  [alembic.runtime.migration] Running upgrade 947454bf1dff -> d2ae31099d61, Increase text size for MySQL (not relevant for other DBs' text types)
INFO  [alembic.runtime.migration] Running upgrade d2ae31099d61 -> 0e2a74e0fc9f, Add time zone awareness
INFO  [alembic.runtime.migration] Running upgrade d2ae31099d61 -> 33ae817a1ff4, kubernetes_resource_checkpointing
INFO  [alembic.runtime.migration] Running upgrade 33ae817a1ff4 -> 27c6a30d7c24, kubernetes_resource_checkpointing
INFO  [alembic.runtime.migration] Running upgrade 27c6a30d7c24 -> 86770d1215c0, add kubernetes scheduler uniqueness
INFO  [alembic.runtime.migration] Running upgrade 86770d1215c0, 0e2a74e0fc9f -> 05f30312d566, merge heads
INFO  [alembic.runtime.migration] Running upgrade 05f30312d566 -> f23433877c24, fix mysql not null constraint
INFO  [alembic.runtime.migration] Running upgrade f23433877c24 -> 856955da8476, fix sqlite foreign key
INFO  [alembic.runtime.migration] Running upgrade 856955da8476 -> 9635ae0956e7, index-faskfail
INFO  [alembic.runtime.migration] Running upgrade 9635ae0956e7 -> dd25f486b8ea, add idx_log_dag
INFO  [alembic.runtime.migration] Running upgrade dd25f486b8ea -> bf00311e1990, add index to taskinstance
INFO  [alembic.runtime.migration] Running upgrade 9635ae0956e7 -> 0a2a5b66e19d, add task_reschedule table
INFO  [alembic.runtime.migration] Running upgrade 0a2a5b66e19d, bf00311e1990 -> 03bc53e68815, merge_heads_2
INFO  [alembic.runtime.migration] Running upgrade 03bc53e68815 -> 41f5f12752f8, add superuser field
INFO  [alembic.runtime.migration] Running upgrade 41f5f12752f8 -> c8ffec048a3b, add fields to dag
INFO  [alembic.runtime.migration] Running upgrade c8ffec048a3b -> dd4ecb8fbee3, Add schedule interval to dag
INFO  [alembic.runtime.migration] Running upgrade dd4ecb8fbee3 -> 939bb1e647c8, task reschedule fk on cascade delete
INFO  [alembic.runtime.migration] Running upgrade c8ffec048a3b -> a56c9515abdc, Remove dag_stat table
INFO  [alembic.runtime.migration] Running upgrade 939bb1e647c8 -> 6e96a59344a4, Make TaskInstance.pool not nullable
INFO  [alembic.runtime.migration] Running upgrade 6e96a59344a4 -> 74effc47d867, change datetime to datetime2(6) on MSSQL tables
INFO  [alembic.runtime.migration] Running upgrade 939bb1e647c8 -> 004c1210f153, increase queue name size limit
(.sandbox) [airflow@localhost airflow]$ 

Of course, you might move the database from, say, sqlite to postgres.
It is unclear which command is appropriate for that circumstance.
It is also unclear how the webserver and scheduler know where to look for the configuration.
Perhaps they look in airflow.cfg first to find out where the database is and then look into the database? This seems redundant.

Recommended answer

db reset (airflow resetdb in the older CLI used in the question) will delete all entries from the metadata database. This includes all DAG runs, Variables and Connections.

db init only needs to be run once, when Airflow is first installed.

Generally we aren't too worried about the DAG runs, but the Variables and Connections can be annoying to recreate. They often contain secrets and other sensitive data, which may not be duplicated anywhere else as a matter of security best practice.

db init is also idempotent, so it can be run as often as you like without worrying about the database changing.
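To make the contrast concrete, here is a minimal sketch using Python's built-in sqlite3 module. It is not how Airflow implements these commands (Airflow applies Alembic migrations, as the logs in the question show); init_db, reset_db and the single variable table are hypothetical stand-ins chosen only to illustrate "create if missing" versus "drop and rebuild".

```python
import sqlite3

# Hypothetical stand-ins for `airflow initdb` / `airflow resetdb`.
# Airflow itself manages its schema through Alembic migrations, not this code.

def init_db(conn):
    """Create the table only if it is missing; existing rows are untouched."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS variable (key TEXT PRIMARY KEY, val TEXT)"
    )
    conn.commit()

def reset_db(conn):
    """Drop the table (and every row in it), then rebuild an empty schema."""
    conn.execute("DROP TABLE IF EXISTS variable")
    conn.commit()
    init_db(conn)

def count_rows(conn):
    """Return the number of rows currently stored in the table."""
    return conn.execute("SELECT COUNT(*) FROM variable").fetchone()[0]

conn = sqlite3.connect(":memory:")
init_db(conn)
conn.execute("INSERT INTO variable VALUES ('api_token', 'secret')")
conn.commit()

init_db(conn)                        # second run: nothing left to create
rows_after_init = count_rows(conn)

reset_db(conn)                       # burn down and rebuild
rows_after_reset = count_rows(conn)

print(rows_after_init, rows_after_reset)  # prints "1 0"
```

Running the init step a second time leaves the stored row in place, while the reset step leaves an empty table behind. That matches the logs in the question: initdb had nothing to do on an up-to-date schema, whereas resetdb dropped the tables and replayed every migration.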
