The new era of social engineering?
A 2020 report by the United Nations Interregional Crime and Justice Research Institute's AI and Robotics Centre identified trends on underground forums regarding the abuse of artificial intelligence (AI) and machine learning that could gain strong momentum in the near future. Among these trends, it highlighted human impersonation on social media platforms. Today, that future is already a reality: everything the report anticipated is happening, and in some respects it has surpassed expectations.
Both of these modalities, deepfakes and vishing, are modern versions of the impostor scam (or, as it is known in Uruguay, "el cuento del tío"): a tactic in which the swindler impersonates someone, usually a person very close to the victim, in order to trick them out of money.
In 2022, scams of this type were the second most reported category of fraud in the United States, with losses of $2.6 billion.
We could consider these scams modern applications of social engineering, which KnowBe4, a leading security awareness training and simulated phishing platform, defines as "the art of manipulating, influencing or tricking the user to take control of their system".
What role does AI play in this scheme? In recent years, cybercriminals have refined their use of AI to build more convincing and agile attacks, increasing their chances of profiting in a shorter period. It also allows them to reach new victims and develop more innovative criminal approaches, all while minimizing the likelihood of detection.
Creating false realities: deepfake and vishing
The term "deepfake" encapsulates the fusion of two concepts: "deep learning" and "fake". It refers to a sophisticated AI-driven technique that enables the creation of multimedia content that appears to be authentic, but is fictitious.
Using deep learning algorithms, deepfakes can superimpose faces onto videos, alter a person's speech in audio recordings and even generate realistic images of individuals who never existed.
The concept dates back to the 2010s. One of the first deepfake demonstrations to circulate on the internet was "Face2Face," published in 2016 by a group of researchers from Stanford University and partner institutions who sought to demonstrate the technology's ability to manipulate facial expressions.
Working from two inputs, a facial recording of a source actor (a role played by the researchers themselves) and one of a target actor (presidents such as Vladimir Putin or Donald Trump), the academics managed to reenact the target actors' faces with the source actors' expressions, in real time and with voice and lip movement kept in sync.
Another highly significant piece of deepfake content was a video of former President Obama in which he appears to say: "We are entering an era in which our enemies can make anyone say anything at any time." In fact, it was not Obama who uttered those words, but his deepfake.
"Vishing," short for "voice phishing," is an intriguing and dangerous variant of classic phishing. Instead of sending deceptive emails, fraudsters call their victims by phone. Using AI-based voice generation software, criminals can mimic the pitch, timbre and resonance of a voice from an audio sample as short as 30 seconds, something they can easily obtain from social networks.
Two cases that set off alarm bells
Since those early examples, deepfake technology has advanced rapidly and spread widely, to the point of attracting the attention of the FBI.
In early 2023, the U.S. agency issued an alert after noticing an increase in reports of fake explicit videos "starring" victims, created from images and videos that criminals obtained from their social media accounts.
Deepfake live via video call
In this context, over the past year the Chinese authorities have stepped up vigilance and enforcement following the revelation of an AI-enabled fraud.
The incident took place on April 20, 2023 in Baotou City, Inner Mongolia region. A man surnamed Guo, an executive of a technology company in Fuzhou, Fujian province, received a video call via WeChat, a popular messaging service in China, from a friend asking for help.
The perpetrator used AI-powered face-swapping technology to impersonate the victim's friend. Guo's "friend" said he was taking part in a bidding process in another city and needed to use the company's account to submit a bid of 4.3 million yuan (approximately USD 622,000). During the video call, he promised to make the payment back immediately and provided a bank account number for Guo to make the transfer.
Unsuspecting, Guo transferred the full amount and then called his real friend to confirm that the transfers had been successful. That's when he got an unpleasant surprise: his real friend denied having had a video call with him, let alone having asked him for money.
Fake voices with AI
As for vishing, the first media report of AI-driven voice fraud came in 2019, when The Wall Street Journal broke the news of a scam in which the British CEO of an energy company was tricked out of 220,000 euros.
The fraudsters managed to create a voice so similar to that of the head of the German parent company that none of the CEO's colleagues in the UK could detect the fraud. As the company's insurance firm explained, the caller claimed the request was urgent and instructed the CEO to make the payment within the hour. The CEO, hearing the familiar subtle German accent and voice patterns of his boss, suspected nothing.
It is presumed that the attackers used commercial voice generation software to carry out the attack. The British executive followed the instructions and transferred the money to a Hungarian account, from which the criminals swiftly moved the funds on to other locations.
How can we improve our online security?
The evolution of technology in recent years challenges the authenticity of images, audio and video. It is therefore essential to take extra care in the ways we communicate at a distance, through any channel.
As we have seen, social engineering targets people above all. For this reason, the first line of defense against attacks of this type should be the user's own behavior.
When faced with calls or messages containing requests that seem strange, even if they come from people close to us and present a credible story (whether or not they arrive through a usual contact channel), we should question and be suspicious. A good practice is to ask, on the spot, personal questions whose answers only that person would know.
But users are not the only ones who can be fooled by vishing or deepfakes: facial and voice authentication systems can also be breached. For some years now, the ISO 30107 standard has established principles and methods for evaluating presentation attack detection (PAD) mechanisms, which defend against attempts to falsify biometric data (such as a voice or a face).
Daniel Alano, Isbel's information security management specialist, pointed out that one way to improve our online security is to use ISO 30107-certified applications. Alano explained that "this is the standard used to measure whether you are vulnerable to phishing attacks", although he warned that it is not infallible.
If you want to go deeper into cybercrime stories, we invite you to listen to Malicious, our podcast about the cyberattacks that paralyzed the world.