
Detection of Deepfakes in Human Faces

Advances in deep learning have made it significantly easier to create convincing face-swap videos, known as ‘Deepfakes’, that leave few traces of facial manipulation. As fake media propagates, these deepfakes not only threaten an individual’s reputation and privacy but also weaken the public’s trust in news outlets and have the potential to disrupt political and social institutions. To address this growing threat, I led a team that built three different models to classify such videos as real or fake.

We first needed to understand the various ways in which deepfakes can be generated. We identified that these face manipulations fall into four main categories: (1) entire face synthesis, (2) face identity swap, (3) facial attribute manipulation, and (4) facial expression manipulation. Each of these types is produced by powerful Generative Adversarial Networks (GANs) or other generative models such as autoencoders. We observed that most deepfake generation pipelines introduce some amount of warping, and this warping leaves distinct inconsistencies in the fake video.
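One intuition behind the warping cue can be illustrated with a toy experiment: warping a region typically involves resampling, which smooths away high-frequency detail. This is a minimal numpy sketch, not our detection code; the random texture standing in for facial detail and the block-averaging "warp" are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def high_freq_energy(img):
    """Mean squared difference between neighbouring pixels (a crude sharpness proxy)."""
    return np.mean(np.diff(img, axis=0) ** 2) + np.mean(np.diff(img, axis=1) ** 2)

# Toy "frame": random texture stands in for fine facial detail
frame = rng.standard_normal((64, 64))

# Simulate warping: downscale 2x by block averaging, then upscale back
small = frame.reshape(32, 2, 32, 2).mean(axis=(1, 3))
warped = np.repeat(np.repeat(small, 2, axis=0), 2, axis=1)

# The warped region retains less high-frequency detail than the original
print(high_freq_energy(frame) > high_freq_energy(warped))  # True
```

A real detector learns far subtler statistics than this sharpness proxy, but the same resampling footprint is part of what a CNN can pick up on.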

Our approach leveraged convolutional neural networks (CNNs) to identify spatial inconsistencies and recurrent neural networks (RNNs) to identify temporal inconsistencies in the video. The CNN extracted frame-level features, which were then used to train an RNN. Two of the three model architectures we proposed used transfer learning with InceptionResNetV2 and XceptionNet base architectures, while the third used a simple 8-layer CNN. We evaluated the models on two datasets, FaceForensics++ and the Deepfake Detection Challenge (DFDC) dataset, to assess their effectiveness. We achieved an accuracy of 81% for the smaller model and up to 96% for the larger networks.
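The CNN-then-RNN pipeline can be sketched in a few lines of numpy. This is an illustrative toy, not our trained models: the per-frame features here are random stand-ins for a CNN backbone's output (e.g. XceptionNet embeddings), the dimensions are invented, and the weights are untrained.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for per-frame CNN features; in practice these would come from
# a backbone such as XceptionNet (feat_dim is an illustrative assumption)
num_frames, feat_dim, hidden_dim = 20, 2048, 64
features = rng.standard_normal((num_frames, feat_dim))

# A vanilla RNN cell unrolled over the frame sequence to capture
# temporal inconsistencies across frames
W_xh = rng.standard_normal((feat_dim, hidden_dim)) * 0.01
W_hh = rng.standard_normal((hidden_dim, hidden_dim)) * 0.01
b_h = np.zeros(hidden_dim)

h = np.zeros(hidden_dim)
for x in features:
    h = np.tanh(x @ W_xh + h @ W_hh + b_h)

# Final hidden state -> real/fake probability via a sigmoid readout
w_out = rng.standard_normal(hidden_dim) * 0.1
prob_fake = 1.0 / (1.0 + np.exp(-(h @ w_out)))
```

In the actual models the recurrent layer would be an LSTM or GRU trained end-to-end with the classifier head, but the data flow (frames → per-frame features → sequence model → binary prediction) is the same.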


Generation of Frames from Videos
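Before feature extraction, each video has to be turned into a fixed-length sequence of frames. A common approach, sketched below under the assumption of uniform temporal sampling (the helper name and sample counts are illustrative, not our exact code), is to pick evenly spaced frame indices and decode only those frames.

```python
import numpy as np

def sample_frame_indices(total_frames, num_samples):
    """Return uniformly spaced frame indices covering the whole video."""
    return np.linspace(0, total_frames - 1, num_samples).astype(int)

# e.g. 5 frames from a 300-frame clip
indices = sample_frame_indices(300, 5)

# With OpenCV, each selected frame could then be decoded via
# cap.set(cv2.CAP_PROP_POS_FRAMES, idx); ok, frame = cap.read()
```

Uniform sampling keeps the sequence length fixed across videos of different durations, which the RNN requires, while still covering the full clip.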


Deepfake detection architecture

