Replace an entire field value in a file using awk or another tool

Problem description

I have an export from a PostgreSQL table with multiple fields, including booleans (exported by PostgreSQL as the characters t and f), and I need to import it into another database (MonetDB) that doesn't understand t/f as boolean values.

(EDIT: removed the spaces to reflect what the file really looks like and to avoid angry comments; spaces were displayed previously.)

id|val_str|bool_1|bool2|bool_3|bool4|
1|help|t|t|f|t|
2|test|f|t|f|f|
...

As I cannot replace every occurrence of t/f, I need to include the field separator in my pattern. I tried to use awk to replace fields t with TRUE and f with FALSE:

awk -F'|' '{gsub(/\|t\|/, "|TRUE|"); gsub(/\|f\|/, "|FALSE|"); print;}' 

This only partly works: in consecutive fields with the same value (|t|t|), only the first occurrence is replaced (|TRUE|t|). The first match consumes the | between the two fields, and gsub() never overlaps matches, so what remains of the second field is t| rather than |t|.

id|val_str|bool_1|bool2|bool_3|bool4|
1|help|TRUE|t|FALSE|TRUE|
2|test|FALSE|TRUE|FALSE|f|
...

The table has ~450 columns, so I can't really specify the list of columns to replace, nor work in Postgres to 'transform' the boolean columns (I could, but ...).

I could run the gsub() calls twice, but I was looking for a more elegant way to match the entire field content for all fields.
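A minimal sketch of that two-pass idea, wrapped in a shell function (the name two_pass is hypothetical). Because gsub() never reuses characters it has already matched, the first pass converts only alternate fields in a run such as |t|t|t|. Each skipped field is preceded by a field that was just converted, so its leading | is free on the second pass. Like the patterns above, it assumes every line ends with | and that the first field is never a boolean.

```shell
#!/usr/bin/env bash
# Two passes of the gsub() calls from the question.
# Pass 1 converts alternate fields in runs like |t|t|t| (matches never overlap);
# pass 2 converts the fields that pass 1 skipped.
two_pass() {
  awk '{
    for (pass = 1; pass <= 2; pass++) {
      gsub(/\|t\|/, "|TRUE|")
      gsub(/\|f\|/, "|FALSE|")
    }
    print
  }' "$@"
}

# Example with the sample rows from the question, read from stdin.
printf '%s\n' 'id|val_str|bool_1|bool2|bool_3|bool4|' \
              '1|help|t|t|f|t|' \
              '2|test|f|t|f|f|' | two_pass
# id|val_str|bool_1|bool2|bool_3|bool4|
# 1|help|TRUE|TRUE|FALSE|TRUE|
# 2|test|FALSE|TRUE|FALSE|FALSE|
```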

gsub(/^t$/, ...) doesn't help either, since most of the time we are in the middle of a line.
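One awk approach that avoids matching separators altogether: make | both the input and the output field separator, then compare each whole field with t or f. This is a sketch, and the function name to_monet_bools is hypothetical. One caveat: a text column whose entire value is t or f would be converted too.

```shell
#!/usr/bin/env bash
# Replace whole fields equal to "t"/"f" with TRUE/FALSE.
# FS and OFS are both "|", so assigning to $i rebuilds the line with the
# same separator; the trailing "|" survives as an empty last field.
to_monet_bools() {
  awk 'BEGIN { FS = OFS = "|" }
  {
    for (i = 1; i <= NF; i++) {
      if ($i == "t") $i = "TRUE"
      else if ($i == "f") $i = "FALSE"
    }
    print
  }' "$@"
}

# Example with the sample rows from the question; for a real export pass
# file names instead, e.g. to_monet_bools export.txt > fixed.txt
printf '%s\n' 'id|val_str|bool_1|bool2|bool_3|bool4|' \
              '1|help|t|t|f|t|' \
              '2|test|f|t|f|f|' | to_monet_bools
# id|val_str|bool_1|bool2|bool_3|bool4|
# 1|help|TRUE|TRUE|FALSE|TRUE|
# 2|test|FALSE|TRUE|FALSE|FALSE|
```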

Solution

> Table has ~450 columns so I can't really specify the list of columns to be replaced, nor work in postgres to 'transform' boolean columns (I could but ...).

You can let Postgres do the work for you. Basic query to produce the SELECT list:

SELECT string_agg(CASE WHEN atttypid = 'bool'::regtype
                       THEN quote_ident(attname) || '::text'
                       ELSE quote_ident(attname) END, ', ' ORDER BY attnum)
FROM   pg_attribute
WHERE  attrelid = 'mytable'::regclass  -- provide table name here
AND    attnum > 0
AND    NOT attisdropped;

Produces a string of the form:

col1, "CoL 2", bool1::text, "Bool 2"::text

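All identifiers are quoted where necessary, and the columns come out in their default order (by attnum). Wrap the generated list in SELECT ... FROM mytable and execute it, using COPY to export to a file (or \copy in psql). Performance is about the same as exporting the plain table. If you don't need upper case, omit upper() from the full-statement variant below.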
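A minimal sketch of automating the copy-and-execute step from the shell, assuming psql can connect with your default credentials. The function names, arguments and output file are hypothetical. The generator query is the one above, and | is used as the delimiter to match the file in the question. The table name is interpolated as-is, so pass it already quoted if it needs quoting.

```shell
#!/usr/bin/env bash
# Sketch: generate the SELECT list in Postgres, then export it with \copy.
# Hypothetical usage: export_table mydb mytable mytable.txt

# Step 1: run the generator query; boolean columns come back as <col>::text.
# -At prints only the aggregated string; :'tbl' interpolates the -v variable
# as a quoted literal; ON_ERROR_STOP makes psql exit non-zero on SQL errors.
generate_select_list() {
  psql -X -At -v ON_ERROR_STOP=1 -d "$1" -v tbl="$2" <<'SQL'
SELECT string_agg(CASE WHEN atttypid = 'bool'::regtype
                       THEN quote_ident(attname) || '::text'
                       ELSE quote_ident(attname) END, ', ' ORDER BY attnum)
FROM   pg_attribute
WHERE  attrelid = :'tbl'::regclass
AND    attnum > 0
AND    NOT attisdropped;
SQL
}

# Step 2: build the psql \copy meta-command (pure string, no database needed).
build_copy_cmd() {
  # $1 = SELECT list, $2 = table name, $3 = output file
  printf '%s\n' "\\copy (SELECT $1 FROM $2) TO '$3' WITH (DELIMITER '|')"
}

# Step 3: put both together; \copy writes the file on the client side.
export_table() {
  local db=$1 table=$2 outfile=$3 list
  list=$(generate_select_list "$db" "$table") || return 1
  psql -X -d "$db" -c "$(build_copy_cmd "$list" "$table" "$outfile")"
}
```

Keeping build_copy_cmd free of database calls means the string-building part can be checked without a running server.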

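Why is a simple cast to text enough? Because COPY writes booleans in their output format, t / f, while the cast boolean::text yields true / false.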

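About regclass and escaping identifiers properly: 'mytable'::regclass resolves the (optionally schema-qualified) table name via the current search_path and raises an error if no such table exists, while quote_ident() and format() with %I double-quote an identifier only where that is required.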

If you need a complete statement with TRUE / FALSE / NULL in upper case, standard SQL cast notation (CAST(... AS text) instead of ::), the original column names kept as aliases, and possibly a schema-qualified table name:

SELECT 'SELECT '
     || string_agg(CASE WHEN atttypid = 'bool'::regtype
                        THEN format('upper(cast(%1$I AS text)) AS %1$I', attname)
                        ELSE quote_ident(attname) END, ', ' ORDER BY attnum)
     || ' FROM myschema.mytable;'           -- provide table name twice now
FROM   pg_attribute
WHERE  attrelid = 'myschema.mytable'::regclass
AND    attnum > 0
AND    NOT attisdropped;

Produces a complete statement of the form:

SELECT col1, "CoL 2", upper(cast(bool1 AS text)) AS bool1, upper(cast("Bool 2" AS text)) AS "Bool 2" FROM myschema.mytable;
