Using a Pinyin Java library in Talend to transliterate Chinese to English

If you want to use Talend to transliterate Chinese characters into the Roman (Latin) alphabet, you may find this blog helpful.

I will show you how to build a simple Talend job that converts some Chinese characters into readable Latin-alphabet text using a third-party Java library, pinyin4j, which implements Hanyu Pinyin, the standard romanization system for Mandarin Chinese.

You will need to download pinyin4j-2.5.0.jar from: https://mvnrepository.com/artifact/ruiyun/pinyin4j/2.5.0

Create a new Talend DI job and begin by adding a tLibraryLoad component.

In its Basic settings, specify the path to pinyin4j-2.5.0.jar.

In the Advanced settings, specify the classes to import. I have imported all of them, even though some are not required for this example:

import net.sourceforge.pinyin4j.PinyinHelper;
import net.sourceforge.pinyin4j.format.HanyuPinyinCaseType;
import net.sourceforge.pinyin4j.format.HanyuPinyinOutputFormat;
import net.sourceforge.pinyin4j.format.HanyuPinyinToneType;
import net.sourceforge.pinyin4j.format.HanyuPinyinVCharType;
import net.sourceforge.pinyin4j.format.exception.BadHanyuPinyinOutputFormatCombination;

Join the tLibraryLoad to a tFixedFlowInput.

In the tFixedFlowInput, create a new column called ‘Name’.

Enter some Chinese characters to test with, e.g. “你好,世界”.

Join the tFixedFlowInput to a tJavaRow, click Sync columns and then add the following code:

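// Declared here so the tMap expression can reuse it: lowercase pinyin without tones
HanyuPinyinOutputFormat defaultPinyinFormat = new HanyuPinyinOutputFormat();
defaultPinyinFormat.setCaseType(HanyuPinyinCaseType.LOWERCASE);
defaultPinyinFormat.setToneType(HanyuPinyinToneType.WITHOUT_TONE);
// Optional: write ü as "v" instead of the default "u:"
// defaultPinyinFormat.setVCharType(HanyuPinyinVCharType.WITH_V);

// Pass the value through unchanged; the conversion itself happens in the tMap
output_row.Name = input_row.Name;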

Now join the tJavaRow to a tMap.

Create a new output with a column ‘Name’ and map the input to the output.

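In the Expression editor for the output column, use the pinyin4j method PinyinHelper.toHanyuPinyinString to convert the string (row2 is the connection coming from the tJavaRow; use your own row name if it differs):

PinyinHelper.toHanyuPinyinString(row2.Name, defaultPinyinFormat, "")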

Join the tMap output to a tLogRow and run the job.

You should now see the transliterated string in the Run console (for the test value above, something like nihao,shijie).
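If your real data mixes Chinese and Latin text, or contains empty rows, it can help to check whether a value actually contains Chinese characters before converting it. The sketch below is a hypothetical helper, not part of pinyin4j, and uses only the JDK's Unicode script lookup.

```java
// Hypothetical helper (not part of pinyin4j): does a value contain any
// Chinese (Han script) characters? Uses only the JDK.
class HanCheck {

    static boolean containsHan(String s) {
        if (s == null) {
            return false; // nothing to transliterate in a null Name
        }
        // Walk code points (not chars) so Han characters outside the BMP are detected too
        return s.codePoints()
                .anyMatch(cp -> Character.UnicodeScript.of(cp) == Character.UnicodeScript.HAN);
    }

    public static void main(String[] args) {
        System.out.println(containsHan("\u4f60\u597d,\u4e16\u754c")); // the test value: true
        System.out.println(containsHan("Hello, world"));               // false
        System.out.println(containsHan(null));                         // false
    }
}
```

In Talend you might put a public version of this method in a user routine (the name HanCheck is only an illustration) and guard the tMap expression with it, e.g. HanCheck.containsHan(row2.Name) ? PinyinHelper.toHanyuPinyinString(row2.Name, defaultPinyinFormat, "") : row2.Name. pinyin4j already leaves non-Chinese characters unchanged, so the main benefit is skipping null values, which the conversion call would most likely reject with a NullPointerException.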

Using SSL certificates in a Talend integration job – tSetKeystore

If you want to use Talend to integrate with web services whose SSL certificates Java does not trust by default (for example self-signed certificates or ones issued by an internal CA), you will need to use the tSetKeystore component. In this blog I will show you how to import your .cer / .crt files into a .jks (Java KeyStore) file and use it in your Talend job.

Start by downloading and installing KeyStore Explorer.

Launch KeyStore Explorer and select ‘Create a new KeyStore’.

In the selection box that opens, select ‘JKS’.

From the toolbar, click the red rosette icon (Import Trusted Certificate).

Navigate to the .cer / .crt file and click ‘Open’.

Save your JKS. When prompted, enter a password for the keystore (optional).
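If you would rather script these KeyStore Explorer steps (for example on a build server), the JDK can do the same thing, either with keytool -importcert or with a few lines of Java. Below is a minimal Java sketch, assuming a DER- or PEM-encoded .cer/.crt file; the class name, alias, paths and password are hypothetical placeholders.

```java
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.KeyStore;
import java.security.cert.Certificate;
import java.security.cert.CertificateFactory;

// Sketch: programmatic equivalent of "Create a new KeyStore (JKS)",
// "Import Trusted Certificate" and "Save" in KeyStore Explorer.
class JksImport {

    static void importTrustedCertificate(Path cerFile, String alias,
                                         Path jksFile, char[] password) throws Exception {
        KeyStore jks = KeyStore.getInstance("JKS");
        if (Files.exists(jksFile)) {
            // Add to an existing keystore instead of overwriting it
            try (InputStream in = Files.newInputStream(jksFile)) {
                jks.load(in, password);
            }
        } else {
            jks.load(null, null); // start with an empty keystore
        }

        // Parse the .cer/.crt file (DER or PEM-encoded X.509)
        CertificateFactory cf = CertificateFactory.getInstance("X.509");
        Certificate cert;
        try (InputStream in = Files.newInputStream(cerFile)) {
            cert = cf.generateCertificate(in);
        }

        // A trusted-certificate entry, the same kind of entry KeyStore Explorer creates
        jks.setCertificateEntry(alias, cert);

        // JKS requires a non-null password array when saving
        try (OutputStream out = Files.newOutputStream(jksFile)) {
            jks.store(out, password);
        }
    }

    public static void main(String[] args) throws Exception {
        // Usage: JksImport <server.cer> <alias> <truststore.jks> <password>
        importTrustedCertificate(Paths.get(args[0]), args[1], Paths.get(args[2]), args[3].toCharArray());
    }
}
```

Calling it once per certificate with different aliases builds up a single .jks containing every certificate your job needs to trust.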

In the tSetKeystore component settings within your Talend job, set the TrustStore type to JKS and enter the location of your .jks file and its password (if applicable).

Now any connections your job makes to web services, for example with tSOAP, tREST or tESBConsumer, will use the SSL certificates contained in the Java KeyStore file.
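This "every connection" behaviour comes from the truststore being a JVM-wide setting. My understanding, which you should treat as an assumption, is that tSetKeystore sets the standard JSSE system properties; the sketch below shows equivalent calls you could make from a tJava. The class name, path and password are placeholders.

```java
// Sketch of a JVM-wide truststore setting using the standard JSSE system properties.
// ASSUMPTION: this mirrors what tSetKeystore does; path and password are placeholders.
class TrustStoreProps {

    static void useTrustStore(String jksPath, String password) {
        System.setProperty("javax.net.ssl.trustStore", jksPath); // used instead of the JDK's default cacerts
        System.setProperty("javax.net.ssl.trustStoreType", "JKS");
        if (password != null) {
            System.setProperty("javax.net.ssl.trustStorePassword", password);
        }
    }

    public static void main(String[] args) {
        useTrustStore("C:/talend/certs/truststore.jks", "changeit"); // placeholder values
        System.out.println(System.getProperty("javax.net.ssl.trustStore"));
    }
}
```

These properties are typically read when the JVM's default SSL context is first created, so they need to be set before the first SSL connection, and from then on they apply to every SSL connection in that JVM.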

If it's all gone to plan, you should no longer receive the error:

java.lang.Exception: nulljavax.net.ssl.SSLHandshakeException: sun.security.validator.ValidatorException: PKIX path building failed
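The PKIX error means Java could not build a certificate chain from the server's certificate to any certificate it trusts. If it persists, check that the .jks file tSetKeystore points to really contains the certificate you expected (the server's certificate or the CA that issued it). keytool -list -v does this from the command line; below is a JDK-only Java sketch with hypothetical class and method names, where the path and password are supplied by you.

```java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.KeyStore;
import java.security.cert.Certificate;
import java.security.cert.X509Certificate;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Troubleshooting sketch: describe every trusted-certificate entry in a JKS.
class JksList {

    static List<String> describeTrustedEntries(Path jksFile, char[] password) throws Exception {
        KeyStore ks = KeyStore.getInstance("JKS");
        try (InputStream in = Files.newInputStream(jksFile)) {
            ks.load(in, password);
        }
        List<String> lines = new ArrayList<>();
        for (String alias : Collections.list(ks.aliases())) {
            if (!ks.isCertificateEntry(alias)) {
                continue; // skip private-key entries; a truststore only needs trusted certs
            }
            Certificate cert = ks.getCertificate(alias);
            if (cert instanceof X509Certificate) {
                X509Certificate x509 = (X509Certificate) cert;
                lines.add(alias + ": " + x509.getSubjectX500Principal().getName()
                        + " (valid until " + x509.getNotAfter() + ")");
            } else {
                lines.add(alias + ": " + cert.getType());
            }
        }
        Collections.sort(lines);
        return lines;
    }

    public static void main(String[] args) throws Exception {
        // args[0] = path to the .jks file, args[1] = its password
        for (String line : describeTrustedEntries(Paths.get(args[0]), args[1].toCharArray())) {
            System.out.println(line);
        }
    }
}
```

An empty list, or a subject that does not match the server you are calling, points to the wrong certificate having been imported.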

TIP: if other components in your job connect to other web services or to an SMTP server (e.g. tSendMail), you will need to put them in a separate child job, called with tRunJob, and select “Use an independent process to run subjob”. This is because the certificates in your keystore are used for every SSL connection made by the job's JVM, so connections to servers whose certificates are not in your .jks can fail.
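For connections you open yourself in a tJava, a different technique from tSetKeystore avoids the JVM-wide setting altogether: build an SSLContext from the .jks and attach it only to the connections that need it. This does not change how components such as tSOAP or tSendMail open their own connections, so for those the independent-process approach still applies. A minimal sketch; the class name, URL, path and password are placeholders.

```java
import java.io.InputStream;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.KeyStore;
import javax.net.ssl.HttpsURLConnection;
import javax.net.ssl.SSLContext;
import javax.net.ssl.TrustManagerFactory;

// Alternative to a JVM-wide truststore: an SSLContext that trusts only the
// certificates in one JKS, attached to individual connections.
class ScopedTrust {

    static SSLContext sslContextFromJks(Path jksFile, char[] password) throws Exception {
        KeyStore trustStore = KeyStore.getInstance("JKS");
        try (InputStream in = Files.newInputStream(jksFile)) {
            trustStore.load(in, password);
        }
        TrustManagerFactory tmf =
                TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
        tmf.init(trustStore); // trust anchors = trusted-certificate entries in the JKS
        SSLContext ctx = SSLContext.getInstance("TLS");
        ctx.init(null, tmf.getTrustManagers(), null); // no client certificate, default SecureRandom
        return ctx;
    }

    // Expects an https:// URL. Only this connection uses the custom truststore;
    // other connections in the JVM keep the default one. Nothing is sent until connect().
    static HttpsURLConnection open(String httpsUrl, SSLContext ctx) throws Exception {
        HttpsURLConnection conn = (HttpsURLConnection) URI.create(httpsUrl).toURL().openConnection();
        conn.setSSLSocketFactory(ctx.getSocketFactory());
        return conn;
    }
}
```

Because the SSLContext is scoped to individual connections, the rest of the job (including tSendMail) keeps using Java's default trusted certificates.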