Startup CEOs lead busy lives. Finding our CEO across the three floors of our office, for example, can be harder than finding Waldo. But as developers, our minds always follow the problem-solution approach.

We wanted to know whether the face recognition module we work with daily could identify specific people in IP camera streams.

Face recognition technology has taken the world by storm recently, with its adoption increasing rapidly in various fields, including security, retail, B2B, and entertainment. 

Advances in deep learning have enabled the widespread use of face recognition technology. One example is employee management: face recognition tools can save time that would otherwise be spent on tasks such as attendance tracking and payroll management.

This advanced guide will help you understand the deep learning models and libraries used for face recognition and how face recognition is implemented in practice.

Let’s get started. 

A person's face being scanned by a face recognition technology

How does facial recognition work?

A face recognition system works by identifying or verifying a person’s face in an image. It involves several steps, which can be organized into a pipeline, as shown in the example image.

face recognition technology pipeline
  • Face Detection: automatically identifying human faces within digital images or video frames.
  • Feature Extraction: extracting the essential features from an image of the face.
  • Face Classification: categorizing a detected face into one or more predefined categories based on the extracted features.

There are various approaches to feature extraction and classification. First, we will discuss MTCNN (Multi-Task Cascaded Convolutional Neural Network), which is used for face detection.


The Multi-Task Cascaded Convolutional Neural Network (MTCNN) is a state-of-the-art tool for detecting faces in images and videos. It uses a three-stage neural network detector to locate faces accurately. You can learn more about MTCNN in the linked research paper.

A deep learning model used for face recognition

How does MTCNN work?

To detect faces of various sizes, the image is first resized multiple times. The P-network then scans each resized image and performs the initial detection. Its detection threshold is deliberately low, so it produces many false positives even after Non-Maximum Suppression (NMS). The candidate regions it identifies are then passed to the second network, the R-network.
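The repeated resizing in the first step can be sketched as an image pyramid. This is a minimal sketch, not MTCNN's actual code: the function name is ours, and the defaults (a 12-pixel P-network window, a minimum face size of 20 pixels, and a shrink factor of 0.709) are values commonly seen in MTCNN implementations that we assume here.

```python
def pyramid_scales(height, width, min_face_size=20, factor=0.709, cell=12):
    """Return the scale factors at which the image would be resized.

    Assumed defaults: the detector window is `cell` pixels wide, faces smaller
    than `min_face_size` pixels are ignored, and each pyramid level shrinks
    the previous one by `factor`.
    """
    scales = []
    scale = cell / min_face_size          # first level maps min_face_size -> cell
    min_side = min(height, width) * scale
    while min_side >= cell:               # stop once the image is smaller than the window
        scales.append(scale)
        scale *= factor
        min_side *= factor
    return scales


if __name__ == "__main__":
    for s in pyramid_scales(480, 640):
        print(f"scale {s:.3f} -> {int(480 * s)} x {int(640 * s)} pixels")
```

Each scale corresponds to one pass of the P-network, which is why smaller minimum face sizes cost more compute.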

As its name suggests, the R-network refines the detections, again using NMS, to obtain relatively precise bounding boxes. The O-network in MTCNN refines the bounding boxes further and can optionally detect facial landmarks such as the eyes, nose, and mouth corners at low cost. These facial landmarks can be useful for face alignment.
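Non-Maximum Suppression itself is simple enough to sketch in a few lines. The version below is a plain-Python illustration of the general technique, not the code used inside MTCNN; the box format `(x1, y1, x2, y2)`, the function names, and the 0.5 overlap threshold are assumptions made for the example.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Return indices of the boxes to keep, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with a box already kept.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


if __name__ == "__main__":
    boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
    scores = [0.9, 0.8, 0.7]
    print(nms(boxes, scores))
```

In the example, the second box overlaps heavily with the first, higher-scoring one and is dropped, while the distant third box survives.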


Google’s FaceNet is a computer program that can identify and verify faces on a large scale. It is based on a deep convolutional neural network, a type of artificial intelligence trained to recognize patterns in data. 

FaceNet uses a unique training method called a triplet loss function to help distinguish between different faces. This means that when the program is shown two images of the same person, it will try to make the “vectors” (mathematical representations of the images) for those two images as similar as possible. 

On the other hand, when two images of different people are shown, it will try to make the vectors for those images as dissimilar as possible. FaceNet is the foundation for several open-source face recognition systems, such as FaceNet with TensorFlow, Keras FaceNet, DeepFace, and OpenFace.

How does FaceNet work?

FaceNet is a machine learning model that takes an image of a person’s face as input and outputs a vector of 128 numbers. This vector, called an embedding, represents the most significant features of the face and contains all the essential information from the image. When using FaceNet, the goal is for the embeddings of similar faces to be similar as well.

One of the significant aspects of FaceNet is its loss function. It uses the triplet loss function. We need three images to calculate the triplet loss: anchor, positive and negative.

A deep learning model used for face recognition

We want the distance between the embedding of the anchor image and the embedding of the positive image to be smaller than the distance between the embedding of the anchor image and the embedding of the negative image.

The triplet loss function can be formally defined as follows:

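$$L = \sum_{i=1}^{N} \left[ \lVert f(x_i^a) - f(x_i^p) \rVert_2^2 - \lVert f(x_i^a) - f(x_i^n) \rVert_2^2 + \alpha \right]_+$$

  • f(x) takes an image x as input and returns a 128-dimensional vector.
  • i denotes the i-th triplet, and N is the number of triplets.
  • The superscript a indicates an Anchor image, p indicates a Positive image, and n indicates a Negative image.
  • α is the margin enforced between positive and negative pairs, and [z]₊ means max(z, 0).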
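For a single triplet, the loss is the squared distance between anchor and positive, minus the squared distance between anchor and negative, plus a margin, clamped at zero. The sketch below computes that value in plain Python. The function names are ours, short toy vectors stand in for 128-dimensional embeddings, and the margin of 0.2 is the value reported in the FaceNet paper, used here as an assumed default.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two embeddings."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Triplet loss for one (anchor, positive, negative) triple of embeddings."""
    d_pos = squared_distance(anchor, positive)
    d_neg = squared_distance(anchor, negative)
    # The loss is zero once the negative is at least `margin` farther away than the positive.
    return max(0.0, d_pos - d_neg + margin)


if __name__ == "__main__":
    anchor = [0.0, 0.0]
    positive = [0.1, 0.0]
    easy_negative = [1.0, 0.0]   # far from the anchor: contributes no loss
    hard_negative = [0.1, 0.1]   # almost as close as the positive: contributes loss
    print(triplet_loss(anchor, positive, easy_negative))
    print(triplet_loss(anchor, positive, hard_negative))
```

Only triplets that violate the margin produce a non-zero loss, so easy negatives do not move the network.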

FaceNet learns in the following way:

  1. Randomly select an anchor image.
  2. Randomly select another image of the same person as the anchor (positive example).
  3. Randomly select an image of a different person (negative example).
  4. Adjust the parameters of the FaceNet network so that the positive example ends up closer to the anchor than the negative one.
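The first three steps above amount to sampling a triplet from a labeled set of images. The snippet below is a hedged sketch of that sampling only. The `images_by_person` mapping and the `sample_triplet` name are hypothetical, and the parameter update in the last step is left to whichever training framework is used.

```python
import random


def sample_triplet(images_by_person, rng=random):
    """Pick (anchor, positive, negative) from a dict of person -> list of images.

    Assumes at least one person has two or more images and that there are
    at least two people in the mapping.
    """
    # The anchor person needs two distinct images: the anchor and the positive.
    eligible = [p for p, imgs in images_by_person.items() if len(imgs) >= 2]
    anchor_person = rng.choice(eligible)
    anchor, positive = rng.sample(images_by_person[anchor_person], 2)

    # The negative comes from any other person.
    others = [p for p in images_by_person if p != anchor_person]
    negative = rng.choice(images_by_person[rng.choice(others)])
    return anchor, positive, negative


if __name__ == "__main__":
    data = {
        "alice": ["alice_1.jpg", "alice_2.jpg", "alice_3.jpg"],
        "bob": ["bob_1.jpg", "bob_2.jpg"],
    }
    print(sample_triplet(data, random.Random(0)))
```

Passing a seeded `random.Random` instance keeps the sampling reproducible while experimenting.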


To classify a new face, we calculate the distance between its embedding and the embeddings of known faces. Then, we use a classifier called Softmax to determine which known face the new face belongs to. 

Softmax was a natural choice for us since the entire system is based on neural networks, but you could also use other classifiers such as SVM or Random Forest. As long as the face embeddings are high quality, any classifier should work well at this step.
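To make the idea concrete, the sketch below turns distances to known embeddings into probabilities with a softmax. It is a simplification: it applies softmax directly to negative squared distances instead of training a softmax layer on top of the embeddings, and the names `classify` and `known` are hypothetical. Toy 2-dimensional vectors stand in for 128-dimensional embeddings.

```python
import math


def squared_distance(u, v):
    """Squared Euclidean distance between two embeddings."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def softmax(logits):
    """Numerically stable softmax over a list of numbers."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(embedding, known):
    """Return (name, probability) of the closest known identity.

    `known` maps a person's name to one reference embedding.
    """
    names = list(known)
    # Closer embeddings get larger logits, hence the minus sign.
    logits = [-squared_distance(embedding, known[name]) for name in names]
    probs = softmax(logits)
    best = max(range(len(names)), key=probs.__getitem__)
    return names[best], probs[best]


if __name__ == "__main__":
    known = {"alice": [0.0, 0.0], "bob": [1.0, 1.0]}
    print(classify([0.1, 0.0], known))
```

A softmax always assigns the face to one of the known identities, so a practical system would likely also reject matches whose distance or probability falls outside a chosen threshold.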

Deep Face Library

DeepFace is a deep-learning facial recognition system developed by Facebook’s AI research team in 2014. It is a neural network-based approach that uses a 3D model to align facial features and a deep neural network to encode facial images into a high-dimensional feature vector. The DeepFace Python library, an open-source package that shares the name, wraps several face recognition models: OpenFace, Google FaceNet, VGG-Face, Facebook DeepFace, ArcFace, DeepID, Dlib, and SFace.

Four functions, verify, find, analyze, and stream, cover the main functionality of the library.

Verify function

The verify function determines whether a pair of face images belongs to the same person. It expects exact image paths as inputs and returns a dictionary. Check the value of the verified key: it is True if the faces match and False otherwise.

verify function face recognition technology
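A call to verify might look like the following minimal sketch. It assumes the `deepface` package is installed; the helper names `faces_match` and `verify_pair`, the image paths, and the choice of the "Facenet" model are ours for illustration.

```python
def faces_match(result):
    """Read the `verified` flag from the dictionary returned by DeepFace.verify."""
    return bool(result["verified"])


def verify_pair(img1_path, img2_path, model_name="Facenet"):
    """Return True if DeepFace judges both images to show the same person."""
    # Imported lazily: deepface loads a deep learning backend and may
    # download model weights the first time a model is used.
    from deepface import DeepFace

    result = DeepFace.verify(
        img1_path=img1_path,
        img2_path=img2_path,
        model_name=model_name,
    )
    return faces_match(result)


if __name__ == "__main__":
    # Hypothetical image paths; replace with real files.
    print(verify_pair("ceo_reference.jpg", "camera_frame.jpg"))
```

The returned dictionary carries more than the verified flag, for example the measured distance, which is worth logging when tuning a system.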

Find Function

The DeepFace find function searches the database path for images of the same person as the one in the input image.

find function face recognition technology
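A hedged sketch of a find call follows. It assumes the `deepface` package is installed and that the result contains an `identity` column holding the matching image paths. The return type has differed between library versions (a single pandas DataFrame or a list of them), so the code accepts both. The helper names and paths are hypothetical.

```python
def matched_identities(frames):
    """Collect the `identity` values (matching image paths) from find() results."""
    identities = []
    for frame in frames:
        identities.extend(str(path) for path in frame["identity"])
    return identities


def find_person(img_path, db_path, model_name="Facenet"):
    """Return the database image paths that DeepFace matches to `img_path`."""
    # Imported lazily so the helper above can be used without the library.
    from deepface import DeepFace

    result = DeepFace.find(img_path=img_path, db_path=db_path, model_name=model_name)
    # Depending on the version, `result` is one DataFrame or a list of them.
    frames = result if isinstance(result, list) else [result]
    return matched_identities(frames)


if __name__ == "__main__":
    # Hypothetical paths: a query image and a folder of known faces.
    print(find_person("camera_frame.jpg", "known_faces/"))
```

Organizing the database folder with one sub-folder per person makes the returned paths easy to map back to names.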

Analyze Function

DeepFace also provides facial attribute analysis: age, gender, facial expression (such as fear, anger, happiness, and sadness), and race (Asian, White, Middle Eastern, Indian, Latino, and Black).

Analyze function in face recognition technology
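The sketch below shows one way to call analyze and condense its output. It assumes the `deepface` package is installed. The result keys used here (`age`, `dominant_emotion`, `dominant_race`, and `dominant_gender` with a fallback to `gender`) reflect versions we are aware of and may differ in yours; the helper names are hypothetical.

```python
def summarize_face(face):
    """Reduce one analyze() result to its headline attributes."""
    return {
        "age": face.get("age"),
        # Newer versions expose `dominant_gender`; older ones used `gender`.
        "gender": face.get("dominant_gender", face.get("gender")),
        "emotion": face.get("dominant_emotion"),
        "race": face.get("dominant_race"),
    }


def analyze_image(img_path):
    """Run facial attribute analysis and return one summary per detected face."""
    # Imported lazily so the helper above can be used without the library.
    from deepface import DeepFace

    result = DeepFace.analyze(
        img_path=img_path,
        actions=["age", "gender", "emotion", "race"],
    )
    # Depending on the version, `result` is one dict or a list of dicts.
    faces = result if isinstance(result, list) else [result]
    return [summarize_face(face) for face in faces]


if __name__ == "__main__":
    # Hypothetical image path; replace with a real file.
    print(analyze_image("camera_frame.jpg"))
```

Restricting `actions` to the attributes you need avoids loading models you will not use.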

Stream Function

The stream function runs on a live webcam feed and applies both face recognition and facial attribute analysis to it.

stream function in face recognition technology
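Starting the live stream takes a single call, sketched below. It assumes the `deepface` package is installed and a camera is available; the wrapper name `run_stream` and the folder path are hypothetical. In the versions we are aware of, `source` is handed to OpenCV's video capture, so a video file path or a camera stream URL may work in place of the webcam index, but treat that as an assumption to verify for your setup.

```python
def run_stream(db_path, model_name="Facenet", source=0):
    """Open a live video window that recognizes faces found in `db_path`.

    `source=0` selects the default webcam.
    """
    # Imported lazily: deepface loads a deep learning backend on import.
    from deepface import DeepFace

    DeepFace.stream(db_path=db_path, model_name=model_name, source=source)


if __name__ == "__main__":
    # Hypothetical folder of known faces.
    run_stream("known_faces/")
```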

Comparison of Face Recognition models in Real-time

We tested the FaceNet model in TensorFlow and in PyTorch, as well as the DeepFace library. Below are the results and the conclusions we drew from testing these models. In the table, FP stands for false positive. We used the following criteria to test our models.

  • Different angles of the face
  • Different lighting conditions
  • Head movement
  • Frame rate achieved
  • Detection among a group of people

| Model | Different angles of face | Different lighting conditions | Head moving | Detection among a group | FPS achieved |
|---|---|---|---|---|---|
| FaceNet (TensorFlow) | Some FPs | Depends on the dataset provided | FPs occur | FPs occur | 6-8 FPS |
| FaceNet (PyTorch) | Minimal FPs | Depends on the dataset provided | Minimal FPs | Results up to 80% accurate | 7-9 FPS |
| DeepFace | FPs occur | Not getting results | FP issues | Not detecting among a group | 2-3 FPS |
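One simple way to measure a frames-per-second figure like the ones in the table is to time a per-frame function over a batch of frames. The sketch below is illustrative only: `measure_fps` and `process_frame` are hypothetical names, and this is not necessarily how the numbers above were obtained.

```python
import time


def measure_fps(process_frame, frames):
    """Run `process_frame` on every frame and return frames processed per second."""
    start = time.perf_counter()
    count = 0
    for frame in frames:
        process_frame(frame)
        count += 1
    elapsed = time.perf_counter() - start
    return count / elapsed if elapsed > 0 else float("inf")


if __name__ == "__main__":
    # Stand-in for a recognition step that takes about 100 ms per frame.
    def fake_recognition(frame):
        time.sleep(0.1)

    print(f"{measure_fps(fake_recognition, range(10)):.1f} FPS")
```

Measuring over many frames, and including capture and drawing time, gives a figure closer to what a viewer of the stream experiences.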


Face recognition technology has the potential to revolutionize a wide range of industries and applications. Whether used for security purposes, to improve the customer experience in retail settings, to manage your employees, or for entertainment, this technology can make our lives easier and more convenient.

While there will always be concerns about issues such as privacy and accuracy, the benefits of face recognition technology far outweigh the potential downsides, and we should embrace it as a powerful tool for the betterment of humanity.

If you would like to leverage the possibilities of face recognition, get in touch with our experts today for a free consultation.
