MICROSOFT has unveiled new AI that can make creepy videos of people using just one photo, but won't release the tool over impersonation fears.
The technology can create synchronised animated clips of a person talking or singing from a single snap of their face and an audio track.
The new AI tech can animate a single image into a realistic video with synced audio. Credit: Ars Technica
Microsoft has refused to release the code over fears of impersonation. Credit: Ars Technica
A number of competitors are working on similar tech. Credit: Ars Technica
Microsoft has denied it is looking to enhance deepfake technology. Credit: Ars Technica
The computer giant's Research Asia team unveiled the VASA-1 model this week and says it could one day power virtual avatars that appear to say whatever their creator wants.
“It paves the way for real-time engagements with lifelike avatars that emulate human conversational behaviours,” says an accompanying research paper.
VASA – short for Visual Affective Skills Animator – can analyse a static image alongside audio to generate a realistic video with lip syncing, facial expressions and head movements.
It cannot, however, clone or simulate voices, as some other Microsoft research tools can.
The company – co-founded by billionaire Bill Gates – claims the model is a significant improvement on previous speech animation methods in terms of realism, expressiveness and efficiency.
In February, Alibaba's Institute for Intelligent Computing research group showed off an AI model called EMO: Emote Portrait Alive, which uses an approach similar to VASA-1's called Audio2Video.
Microsoft researchers trained their tech on the VoxCeleb2 dataset created in 2018 by a team from the University of Oxford.
The dataset is said to hold more than a million "utterances" from 6,112 celebrities, taken from videos uploaded to YouTube.
VASA-1 can reportedly generate videos at a resolution and frame rate good enough for real-time applications such as video conferencing.
A…
..