holvasti

Data integration services

Talend · 21st February 2018

Using a Pinyin Java library in Talend to transliterate Chinese to English

If you want to transliterate Chinese characters to the Roman (Latin) alphabet using Talend, you may find this post helpful.

I will show you how to build a simple Talend job that converts some Chinese characters to a readable Latin-alphabet representation, using a third-party library that follows the Pinyin romanization standard.

You will need to download the jar pinyin4j-2.5.0.jar from: https://mvnrepository.com/artifact/ruiyun/pinyin4j/2.5.0

Create a new Talend DI job and begin by adding a tLibraryLoad component.

Configure the Basic settings (specify the path of pinyin4j-2.5.0.jar).

In the Advanced settings, specify the classes to import. I have imported all of them, even though some are not required for this example.

import net.sourceforge.pinyin4j.PinyinHelper;
import net.sourceforge.pinyin4j.format.HanyuPinyinCaseType;
import net.sourceforge.pinyin4j.format.HanyuPinyinOutputFormat;
import net.sourceforge.pinyin4j.format.HanyuPinyinToneType;
import net.sourceforge.pinyin4j.format.HanyuPinyinVCharType;
import net.sourceforge.pinyin4j.format.exception.BadHanyuPinyinOutputFormatCombination;

Join the tLibraryLoad to a tFixedFlowInput.

Create a new column and call it ‘Name’.

Insert some Chinese characters to test with, e.g. “你好,世界”.

Join the tFixedFlowInput to a tJavaRow, sync the columns, and then configure it as follows:

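// Output format used later by the tMap expression:
// lowercase Pinyin, without tones.
HanyuPinyinOutputFormat defaultPinyinFormat = new HanyuPinyinOutputFormat();
defaultPinyinFormat.setCaseType(HanyuPinyinCaseType.LOWERCASE);
defaultPinyinFormat.setToneType(HanyuPinyinToneType.WITHOUT_TONE);

// Pass the input column through unchanged.
output_row.Name = input_row.Name;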
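
In a real job the Name column may contain nulls or purely Latin text. As a rough sketch, a small helper like the one below could be placed in a Talend routine and called from the tJavaRow to check whether a value contains any Chinese (Han) characters before it is transliterated. The names HanCheck and containsHan are my own and are not part of Talend or pinyin4j; the code uses only standard Java.

```java
final class HanCheck {

    // Returns true if the value contains at least one character
    // that Unicode classifies as belonging to the Han script.
    // A null value is treated as "no Han characters".
    static boolean containsHan(String value) {
        if (value == null) {
            return false;
        }
        return value.codePoints()
                .anyMatch(cp -> Character.UnicodeScript.of(cp) == Character.UnicodeScript.HAN);
    }

    public static void main(String[] args) {
        // \u4f60\u597d is the first half of the test string used in this post.
        System.out.println(containsHan("\u4f60\u597d"));   // true
        System.out.println(containsHan("hello, world"));   // false
        System.out.println(containsHan(null));             // false
    }
}
```

Rows for which the check returns false could simply be passed through unchanged.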

Now join the tJavaRow to a tMap.

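Create a new output with a column ‘Name’ and map the input to the output.

In the Expression editor, use the toHanyuPinyinString method of the library's PinyinHelper class to convert the string. The empty string passed as the third argument is the separator placed between the converted syllables. Here row2 is the name of the tMap's input row; yours may differ.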

PinyinHelper.toHanyuPinyinString(row2.Name,defaultPinyinFormat,"")
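
To make the role of the separator argument more concrete, here is a minimal plain-Java sketch of a per-character transliteration. It does not use pinyin4j: the PinyinSketch class and its four-entry lookup table (covering only 你, 好, 世 and 界) are hypothetical stand-ins for the library's dictionary. The rule it follows (append the reading plus the separator, and copy any character without a reading through unchanged) is my understanding of how the library call behaves, not a copy of its implementation.

```java
import java.util.HashMap;
import java.util.Map;

final class PinyinSketch {

    // Hypothetical lookup table with just the four characters of the
    // test string; the real library ships a full dictionary.
    private static final Map<Character, String> READINGS = new HashMap<>();
    static {
        READINGS.put('\u4f60', "ni");
        READINGS.put('\u597d', "hao");
        READINGS.put('\u4e16', "shi");
        READINGS.put('\u754c', "jie");
    }

    // Converts each character that has a known reading and copies
    // every other character (punctuation, Latin letters) unchanged.
    static String transliterate(String text, String separator) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            String reading = READINGS.get(c);
            if (reading == null) {
                out.append(c);
            } else {
                out.append(reading);
                // No separator after the final character of the input.
                if (i < text.length() - 1) {
                    out.append(separator);
                }
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The test string from the tFixedFlowInput, written with Unicode escapes.
        String input = "\u4f60\u597d,\u4e16\u754c";
        System.out.println(transliterate(input, ""));   // nihao,shijie
        System.out.println(transliterate(input, " "));  // ni hao ,shi jie
    }
}
```

With an empty separator the syllables run together, which is why I pass "" in the job; a space would keep each syllable separate.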

Join the output from the tMap to a tLogRow and run the job.

You should now see the transliterated string in the log window.

Copyright © 2023 Holvasti Ltd.