If you want to transliterate Chinese characters into the Roman (Latin) alphabet using Talend, you may find this post helpful.
I will show you how to build a simple Talend job that converts Chinese characters to a Latin-alphabet representation using a third-party library, pinyin4j, which follows the Pinyin romanization standard.
You will need to download the jar pinyin4j-2.5.0.jar from: https://mvnrepository.com/artifact/ruiyun/pinyin4j/2.5.0
Create a new Talend DI job and begin by adding a tLibraryLoad component.
Configure the Basic settings (specify the path of pinyin4j-2.5.0.jar).
In the Advanced settings, specify the classes to import. I have imported all of the following even though some are not required for this example.
import net.sourceforge.pinyin4j.PinyinHelper;
import net.sourceforge.pinyin4j.format.HanyuPinyinCaseType;
import net.sourceforge.pinyin4j.format.HanyuPinyinOutputFormat;
import net.sourceforge.pinyin4j.format.HanyuPinyinToneType;
import net.sourceforge.pinyin4j.format.HanyuPinyinVCharType;
import net.sourceforge.pinyin4j.format.exception.BadHanyuPinyinOutputFormatCombination;
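If the job later fails to compile because of unresolved imports, it may be worth confirming that the downloaded jar really contains the classes listed above. The following is only a minimal sketch in plain Java, run outside Talend; the class name JarCheck and the method containsClass are hypothetical names chosen for illustration, and the default jar path is a placeholder.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.jar.JarFile;

public class JarCheck {
    // Returns true if the jar holds a .class entry for the given
    // fully qualified class name.
    static boolean containsClass(Path jar, String className) throws IOException {
        String entryName = className.replace('.', '/') + ".class";
        try (JarFile jarFile = new JarFile(jar.toFile())) {
            return jarFile.getEntry(entryName) != null;
        }
    }

    public static void main(String[] args) throws IOException {
        // Pass the jar path as the first argument; the default is only a placeholder.
        Path jar = Paths.get(args.length > 0 ? args[0] : "pinyin4j-2.5.0.jar");
        System.out.println(containsClass(jar, "net.sourceforge.pinyin4j.PinyinHelper"));
    }
}
```

If this prints false for PinyinHelper, the jar is probably not the one this example expects.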
Join the tLibraryLoad to a tFixedFlowInput.
Create a new column and call it ‘Name’.
Insert some Chinese characters to test, e.g. “你好,世界”.
Join the tFixedFlowInput to a tJavaRow, sync the columns and then configure as follows:
HanyuPinyinOutputFormat defaultPinyinFormat = new HanyuPinyinOutputFormat();
defaultPinyinFormat.setCaseType(HanyuPinyinCaseType.LOWERCASE);
defaultPinyinFormat.setToneType(HanyuPinyinToneType.WITHOUT_TONE);
output_row.Name = input_row.Name;
Now join the tJavaRow to a tMap.
Create a new output with a column ‘Name’ and map the input to the output.
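In the Expression editor, use the PinyinHelper class from the pinyin4j library to convert the string. Here row2 is the name of the connection coming into the tMap (yours may differ), defaultPinyinFormat is the format object created in the tJavaRow, and the last argument is the separator string, left empty in this example.
PinyinHelper.toHanyuPinyinString(row2.Name, defaultPinyinFormat, "")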
Join the output from the tMap to a tLogRow and run the job.
You should now see the transliterated string in the log window.
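If you would rather check the result programmatically than by eye, one option is to count how many Chinese characters are left in a string. The sketch below uses only the standard Java library, not pinyin4j; the class name HanCheck and the method countHan are hypothetical names, and the second sample string is a made-up stand-in for a converted value rather than actual job output.

```java
public class HanCheck {
    // Counts the code points whose Unicode script is HAN (Chinese characters).
    static long countHan(String s) {
        return s.codePoints()
                .filter(cp -> Character.UnicodeScript.of(cp) == Character.UnicodeScript.HAN)
                .count();
    }

    public static void main(String[] args) {
        // The sample input from the tFixedFlowInput, written as Unicode escapes
        // so the source file stays ASCII-only.
        String input = "\u4F60\u597D,\u4E16\u754C";
        // Hypothetical converted value, used here only to exercise the helper.
        String converted = "nihao,shijie";

        System.out.println(countHan(input));     // 4: the comma is not a Han character
        System.out.println(countHan(converted)); // 0: nothing left to transliterate
    }
}
```

A count of zero on the output column suggests every Chinese character was converted; a non-zero count points at characters the library could not map.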