Voice conversion (VC) converts a source speaker’s speech waveform into a speech waveform with the characteristics of a target speaker while preserving linguistic information.
A recent paper on arXiv.org studies voice conversion from humans to non-human creatures. It converts the human voice into a non-human creature-like voice while preserving linguistic information. This task could be applied in film production or video games.
The researchers proposed the “speak like a dog” task as an example of such tasks and constructed a dataset and evaluation criteria. An experiment was conducted to compare existing representative non-parallel VC methods in terms of acoustic features, network architecture, and training criteria. Existing VC methods can convert human voices into dog-like voices, but preserving linguistic information remains a challenge.
This paper proposes a new voice conversion (VC) task from human voice to dog-like voice while preserving linguistic information, as an example of a human to non-human creature voice conversion (H2NH-VC) task. Although most research on VC involves human-to-human VC, H2NH-VC aims to convert human voices into non-human creature-like voices. Non-parallel VC allowed us to develop H2NH-VC, because we could not collect a parallel dataset in which non-human creatures speak human languages. In this study, we propose to use dogs as an example of a non-human creature target domain and define the “speak like a dog” task. To clarify the possibilities and characteristics of the “speak like a dog” task, we conducted a comparative experiment using existing representative non-parallel VC methods, varying the acoustic features (Mel-cepstral coefficients and Mel-spectrograms), network architectures (five different kernel-size settings), and training criteria (variational autoencoder (VAE)-based and generative adversarial network-based). Finally, the converted voices were evaluated using mean opinion scores for dog-likeness, sound quality, and intelligibility, as well as character error rate (CER). The experiment showed that using the Mel-spectrogram improved the dog-likeness of the converted voice, while preserving linguistic information remained a challenge. The challenges and limitations of current VC methods for H2NH-VC are highlighted.
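To make the CER metric concrete, here is a minimal sketch of how a character error rate is commonly computed: the Levenshtein (edit) distance between a reference transcript and a recognized transcript, divided by the reference length. The function names are illustrative, and the paper's actual pipeline (which speech recognizer produced the transcripts, and how text was normalized) is not described in this summary, so this shows only the standard definition.

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance: minimum number of character insertions,
    deletions, and substitutions needed to turn reference into hypothesis."""
    # prev[j] holds the distance between the reference prefix processed
    # so far and the first j characters of the hypothesis.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution_cost = 0 if ref_char == hyp_char else 1
            curr.append(min(
                prev[j] + 1,                       # deletion
                curr[j - 1] + 1,                   # insertion
                prev[j - 1] + substitution_cost,   # match or substitution
            ))
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """CER = edit distance / number of characters in the reference."""
    if not reference:
        raise ValueError("reference transcript must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)
```

A lower CER means more of the linguistic content survived conversion. Because insertions count as errors, the value can exceed 1.0 when the recognized text is much longer than the reference.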
Research article: Suzuki, K., Sakamoto, S., Taniguchi, T., and Kameoka, H., “Speak Like a Dog: Human to Non-human creature Voice Conversion”, 2022. Link: https://arxiv.org/abs/2206.04780