Synthetic data will become the main source of AI training data by 2028

Post by **tanjimajuha20** » Sun Jan 19, 2025 6:38 am

Sberbank PJSC reported that it has developed a preliminary draft of a national data synthesis standard together with the Big Data Association (ABD). According to the report, the standard should describe the technology for creating "confidential synthetic data for the development of artificial intelligence (AI) technologies."

"Privacy must be maintained at all stages of the synthesis process, which is based on the differential privacy method. The document provides mathematical proof that, when following the recommendations of the standard, it is possible to synthesize data without the risk of privacy violation. In fact, security is ensured by finding the optimal balance between privacy protection and the quality of the resulting datasets," the report says.
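The report does not say which algorithm the draft standard prescribes, so the following is only a textbook illustration of the trade-off it mentions, not the standard's method. It is a minimal sketch of a differentially private histogram built with the Laplace mechanism, from which synthetic records are then sampled. The function names, the privacy parameter `epsilon`, and the toy category list are all assumptions made for illustration. It also assumes each person contributes exactly one record, so that adding or removing one person changes the counts by at most 1 in total.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_histogram(values, categories, epsilon, rng):
    """Return noisy per-category counts (hypothetical helper).

    Assumes one record per person, so the L1 sensitivity of the
    histogram under add/remove-one neighbors is 1 and the Laplace
    scale is 1 / epsilon.
    """
    counts = {c: 0 for c in categories}
    for v in values:
        counts[v] += 1
    scale = 1.0 / epsilon
    # Clamping at zero is post-processing and does not weaken the guarantee.
    return {c: max(0.0, n + laplace_noise(scale, rng)) for c, n in counts.items()}


def synthesize(noisy_counts, n, rng):
    """Sample n synthetic records in proportion to the noisy counts.

    `n` is chosen by the caller and treated as public.
    """
    cats = list(noisy_counts)
    weights = [noisy_counts[c] for c in cats]
    if sum(weights) == 0:
        # Fall back to uniform sampling if noise wiped out every count.
        weights = [1.0] * len(cats)
    return rng.choices(cats, weights=weights, k=n)


if __name__ == "__main__":
    rng = random.Random(0)
    categories = ["A", "B", "C"]  # toy data, purely illustrative
    real = ["A"] * 600 + ["B"] * 300 + ["C"] * 100
    for epsilon in (0.05, 1.0):
        noisy = dp_histogram(real, categories, epsilon, rng)
        fake = synthesize(noisy, 1000, rng)
        print(epsilon, {c: fake.count(c) for c in categories})
```

A smaller `epsilon` means more noise: stronger privacy, but synthetic data that follows the real distribution less closely. A larger `epsilon` does the opposite. That is the balance between privacy protection and dataset quality the quote refers to.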

By 2028, the main sources of data for training artificial intelligence will be synthetic data (60% of the total volume) and data from IoT sensors (27%). More than a third of the data will be generated using cloud computing. Businesses and the state, not the population, will be the main data producers.

According to Kirill Menshov, Senior Vice President and Head of the Technology Block at Sberbank, the standard should create conditions for further development of the AI field in Russia. He noted that researchers lack accessible data and that this is becoming a major obstacle to the adoption of AI technologies in various sectors of the economy. In his opinion, synthetic data will play a major role in the development of artificial intelligence.

ABD President Anna Serebryannikova noted that the new national standard will ensure the transparency of the synthesis process and the reliability of the architecture, and will define the criteria for data quality. According to her, synthetic data is becoming a real alternative to anonymized data, the use of which is constrained by excessive regulatory restrictions.

"If privacy requirements are met, synthetic data does not carry risks and opens a breakthrough path to achieving the goals of data availability necessary for training artificial intelligence. We hope that with the introduction of a national data synthesis standard, we will be able to meet such requirements and introduce synthetic data into wide circulation in our country," she said.