utf8mb4 setting for talend - not working


Problem description

I am migrating data from SQL Server to MySQL, using Talend (ETL) to do it.

The problem arises when the source (SQL Server) contains emojis: they do not get inserted into the MySQL table. So I know I must use utf8mb4 on the MySQL side.

The client character encoding has to be set for the smileys to be inserted. The database, the tables, and the server are all on utf8mb4.

But the client, i.e. Talend, is not on utf8mb4. So where do I set this?

I tried 'set names utf8mb4' in the additional parameters of tMysqlOutput, but this does not work.

I have been stuck on this for days; any help would be greatly appreciated.

Update:

The job looks like this now. But the smileys are still getting exported as '?'.

Thanks, Rathi

Solution

First, make sure that your server is properly configured to use utf8mb4. Following this tutorial, you need to add the following to your my.cnf (or my.ini if you're on Windows):

[client]
default-character-set = utf8mb4

[mysql]
default-character-set = utf8mb4

[mysqld]
character-set-client-handshake = FALSE
character-set-server = utf8mb4
collation-server = utf8mb4_unicode_ci

That tells the MySQL server to use utf8mb4 and to ignore any encoding set by the client.
After that, I didn't need to set any additional properties on the MySQL connection in Talend. I executed this query in Talend to check the encoding it sets:

SHOW VARIABLES 
WHERE Variable_name LIKE 'character\\_set\\_%' OR Variable_name LIKE 'collation%'

And it returned:

|=-----------------------+-----------------=|
|Variable_Name           |Value             |
|=-----------------------+-----------------=|
|character_set_client    |utf8mb4           |
|character_set_connection|utf8mb4           |
|character_set_database  |utf8mb4           |
|character_set_filesystem|binary            |
|character_set_results   |                  |
|character_set_server    |utf8mb4           |
|character_set_system    |utf8              |
|collation_connection    |utf8mb4_unicode_ci|
|collation_database      |utf8mb4_unicode_ci|
|collation_server        |utf8mb4_unicode_ci|
'------------------------+------------------'
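
For context on why these utf8mb4 values matter: MySQL's legacy utf8 character set stores at most three bytes per character, while emoji take four bytes in UTF-8. The small Java sketch below is not part of the Talend job, and its class and method names are made up for illustration. It checks how many bytes a string needs and whether it contains such characters:

```java
import java.nio.charset.StandardCharsets;

// Illustrative helper; the class and method names are hypothetical,
// not part of Talend or the MySQL connector.
class Utf8Width {
    // Number of bytes the string occupies when encoded as UTF-8.
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // True if any code point lies outside the Basic Multilingual Plane,
    // i.e. it needs 4 bytes in UTF-8 and therefore utf8mb4 in MySQL.
    static boolean needsUtf8mb4(String s) {
        return s.codePoints().anyMatch(Character::isSupplementaryCodePoint);
    }

    public static void main(String[] args) {
        // U+1F4A9 PILE OF POO, built from its code point to avoid
        // depending on the source file encoding.
        String poo = new String(Character.toChars(0x1F4A9));
        System.out.println(utf8Bytes(poo));
        System.out.println(needsUtf8mb4(poo));
        System.out.println(needsUtf8mb4("\u00e9")); // e with acute accent
    }
}
```

A check like this could be used to find which source rows contain characters that a non-utf8mb4 connection would turn into '?'.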

The following test to insert a pile of poop works:

Update

With the native MySQL components in Talend 6.3.1 you get mysql-connector-java-5.1.30-bin.jar, which is supposed to automatically detect the utf8mb4 used by the server, but for some reason (a bug?) it doesn't.
I switched to the JDBC components and downloaded the latest MySQL connector (mysql-connector-java-5.1.45-bin.jar). I got it working by setting these additional parameters on the tJDBCConnection component:

useUnicode=true&characterEncoding=utf-8

(Even though I'm specifying utf-8, the documentation says it will be treated as utf8mb4.)
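
Outside Talend, the same parameters would simply be appended to the JDBC URL. Here is a minimal sketch of that, assuming a placeholder host, port, and database name; the class and method names are hypothetical:

```java
// Illustrative only: builds a MySQL JDBC URL carrying the two parameters
// from the answer. Host, port, and database are placeholder assumptions.
class JdbcUrlSketch {
    static String buildUrl(String host, int port, String database) {
        return "jdbc:mysql://" + host + ":" + port + "/" + database
                + "?useUnicode=true&characterEncoding=utf-8";
    }

    public static void main(String[] args) {
        // Prints the URL that would be passed to the JDBC driver.
        System.out.println(buildUrl("localhost", 3306, "mydb"));
    }
}
```

With the connector jar on the classpath, a URL like this would be passed to java.sql.DriverManager.getConnection(url, user, password). In Talend, the part after the '?' is what goes into the additional parameters of the connection component.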

Here's what my job looks like now:
