Polluted data poses risk to AI safety, ministry says

By Zou Shuo | China Daily | Updated: 2025-08-06 00:00

The Ministry of State Security issued a stark warning on Tuesday about artificial intelligence security risks stemming from contaminated training data, calling it a fundamental challenge to AI safety.

In an article published on its official WeChat account, the ministry said AI data sources are often polluted by mixed-quality content containing false information, fabricated narratives and biased viewpoints. As AI is increasingly integrated into China's socioeconomic sectors, such contamination poses risks to high-quality development and national security, it said.

Data serves as the essential foundation for AI systems, providing the raw material for models to learn patterns, make decisions and generate content, the ministry said. It warned that compromised data quality directly undermines model reliability. Citing research, it noted that even a small contamination level — such as 0.01 percent of false text — can increase harmful outputs by 11.2 percent.

The ministry also highlighted the danger of "recursive pollution", in which false content generated by AI becomes part of training datasets for future models, leading to compounding errors. Real-world risks include financial market manipulation through fabricated information, public panic triggered by misinformation and life-threatening medical misjudgments from corrupted diagnostic algorithms, it said.

To counter these threats, the ministry proposed stricter source supervision under current cybersecurity and data protection laws, comprehensive risk assessments and systematic data-cleansing frameworks. It pledged to collaborate with relevant agencies to safeguard AI and data security under China's national security framework.

Zhang Xi, deputy dean and professor at the School of Cyberspace Security at the Beijing University of Posts and Telecommunications, said China faces particular vulnerability due to a shortage of high-quality Chinese-language training data. Chinese data makes up only 1.3 percent of global large-model datasets, he said.

This scarcity, along with copyright restrictions and inadequate data infrastructure, has forced domestic developers to rely on lower-quality sources such as machine-translated or synthetic content, which worsens data pollution and hinders progress in Chinese AI development, he said.

Zhang cited the GPT-3 model, which was trained on 750 gigabytes of data, and China's DeepSeek-V3 model, trained on 14.8 trillion high-quality tokens. These datasets are drawn from massive libraries of books, academic papers, online texts and code. But due to their scale, manual inspection is neither feasible nor cost-effective, making data contamination an increasingly serious bottleneck, he said.

Polluted training data also creates unpredictable risks in high-stakes fields such as medicine, autonomous driving and national defense, Zhang said. He cited a study in which the insertion of 5,000 fabricated medical records raised misdiagnosis rates by 73 percent. In another example, inserting three manipulated image frames caused autonomous vehicles to mistake pedestrians for garbage bags, leading to 92 percent collision rates in testing.

Zhang also warned of malicious data poisoning campaigns, in which adversarial actors inject content contrary to China's core socialist values. He pointed to foreign-developed models that generated separatist content related to the Xizang autonomous region as an example.

To protect data sovereignty, Zhang advocated for greater investment in domestic data collection and the establishment of national public data platforms. He also called for legal mechanisms to criminalize malicious data poisoning and assign liability for data contamination caused by negligence, with responsibilities clarified for developers, data providers and operators.

Shen Yang, a professor at Tsinghua University's School of Journalism and Communication and College of AI, defined AI data pollution as the inclusion of erroneous, incomplete, biased or deliberately manipulated content in training data.

This fundamentally weakens AI models' comprehension, judgment and output reliability, he said.

Shen compared polluted training data to "cooking with spoiled ingredients".

He said malicious actors may seek to manipulate AI on sensitive topics, mislead the public, undermine competitors or probe vulnerabilities in AI systems. While such acts are usually isolated rather than coordinated conspiracies, their cumulative impact can erode public trust in AI, he said.

For the general public, Shen said it is essential to understand that AI-generated content can shape — or distort — their perception of reality. "They need to see through the logic behind AI, just like identifying the motives behind people's words," he said.

 
