China’s answer to ChatGPT flubs its first lines

China’s censorship regime requires Baidu and other internet companies to block certain websites and avoid politically sensitive content. The lists of banned words and phrases can be updated quickly in response to protests or during special events.

But the threat of censorship doesn’t seem to have slowed the growth of large language models in China, said Jeffrey Ding, an assistant professor at George Washington University who studies China’s technology industry. He noted that Baidu has made the Ernie language model that powers its new bot available via an API for some time, and that other companies have offered similar models.

Baidu did not provide details of Ernie Bot’s training data, but it was probably scraped from the Chinese internet. This means the material the bot learned from is largely shaped by China’s censorship laws, which aim, for example, to limit criticism of the government.

Censorship may affect Chinese chatbots in more subtle ways. A 2021 academic research project trained algorithms on the Chinese-language Wikipedia, which is banned in China, and on Baidu’s Baike, a crowdsourced encyclopedia thought to be censored by the government. It found that using censored training data significantly changed the meaning the AI software assigned to different words.

The algorithm trained on the Chinese-language Wikipedia linked the word “democracy” to positive words such as “stability.” The algorithm trained on the censored Baike material placed “democracy” closer to “anarchy,” more in line with the Chinese government’s position. But since chatbots like ChatGPT can be very flexible and remix material from their training data, Baidu will probably have to introduce additional safeguards.

Despite its mixed reception, Ernie Bot looks like a worthy competitor to ChatGPT. The bot is currently available only to a limited number of users, and some of them say they are impressed. ChatGPT can converse in Chinese, but it is not available in China.

Lei Li, a professor at UC Santa Barbara who previously worked on some of the machine learning technology used to build Ernie Bot, noted that Baidu has been working on the underlying technology for about a decade. Microsoft, by contrast, licensed the core technology behind Bing’s new chatbot and the text generation features for Office from OpenAI, in which it has invested billions of dollars to acquire exclusive rights to its creations.

Li also said he was impressed by some of the stories and business reports that Ernie Bot was able to generate. The hallucination problem, he added, is a challenge for all such language models. “This is where researchers still have work to do,” he said.

One WeChat post compared the Chinese bot’s capabilities with those of ChatGPT, finding it better at handling Chinese idioms and, in some cases, more accurate. For example, ChatGPT mistakenly says that the ancestral home of Liu Cixin, the science fiction author of The Three-Body Problem, is Hubei, whereas Ernie Bot correctly answers Henan. ChatGPT is banned in China, but many people have found ways to access it.

