Abstract:
The emergence of a new generation of artificial intelligence, represented by ChatGPT, has sparked intense public debate regarding the governance of AI-related risks. Current academic research focuses primarily on the risks and regulatory frameworks of large-scale text generation models, while comparatively overlooking the more distinctive and potentially higher-risk domain of speech generation AI. The core stages of speech generation are data analysis, acoustic modeling, and waveform synthesis. A closer examination of these stages reveals several risk factors, including privacy breaches, violations of personality rights, and potential misuse of the technology. To prevent such risks effectively, adjustments must be made at two levels: governance philosophy and regulatory measures. At the level of governance philosophy, the principle of purpose limitation should be upheld, emphasizing the contextual specificity of voice data processing; regulation should remain moderate to avoid excessive intervention, with the overarching goal of fostering “trustworthy” AI, that is, ensuring reliability and accountability throughout the lifecycle of speech generation systems. At the level of regulatory measures, it is essential first to clarify the legal scope of “natural voice protection”, then to delineate the responsibilities and authority of regulatory bodies, and finally to establish internal risk assessment and management mechanisms. Together, these measures protect individuals’ rights and provide a sound legal basis for the use and dissemination of AI-generated speech content.