Facial Emotion Recognition with AI

A deep learning project exploring facial emotion recognition through CNN architecture comparison, data augmentation, and real-time webcam inference.


Computer Vision
CNN Architecture Study
PyTorch
Data Augmentation
Model Evaluation

Framing the problem

This project treated facial emotion recognition as a computer vision task, with the objective of classifying faces into seven categories: the six basic emotions (happiness, sadness, fear, anger, disgust, and surprise) plus neutral.

Beyond simple image classification, the challenge was to build a system capable of extracting meaningful emotional cues from low-resolution facial data, despite class imbalance, visual ambiguity, and overlapping expressions.

Data foundation

The project was built on FER2013, a public dataset of 35,887 grayscale face images at 48×48 pixels, each labeled with one of seven emotion categories.

Preparing this data correctly was essential: the images were normalized, wrapped in PyTorch-compatible data pipelines, and organized into separate sets for training and evaluation.
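As a rough illustration of such a pipeline (not the project's actual code), the sketch below wraps 48×48 grayscale images in a PyTorch `Dataset` and batches them with a `DataLoader`. The class name `FerDataset`, the scaling to roughly [-1, 1], and the random stand-in tensors are all assumptions made for the example.

```python
import torch
from torch.utils.data import DataLoader, Dataset


class FerDataset(Dataset):
    """Hypothetical wrapper around 48x48 grayscale faces and their labels."""

    def __init__(self, images: torch.Tensor, labels: torch.Tensor):
        # images: uint8 tensor of shape (N, 48, 48); labels: int64 tensor in [0, 6]
        self.images = images
        self.labels = labels

    def __len__(self) -> int:
        return self.images.size(0)

    def __getitem__(self, idx: int):
        # Scale pixels to [0, 1], then shift to [-1, 1] (assumed normalization).
        img = self.images[idx].float() / 255.0
        img = (img - 0.5) / 0.5
        # Add the channel dimension expected by Conv2d: (1, 48, 48).
        return img.unsqueeze(0), self.labels[idx]


# Random stand-in data so the sketch runs without the real dataset.
torch.manual_seed(0)
images = torch.randint(0, 256, (32, 48, 48), dtype=torch.uint8)
labels = torch.randint(0, 7, (32,))

loader = DataLoader(FerDataset(images, labels), batch_size=8, shuffle=True)
batch_images, batch_labels = next(iter(loader))
print(batch_images.shape, batch_labels.shape)
```

In a real setup, separate `FerDataset` instances would typically back the training and evaluation loaders, with shuffling enabled only for training.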

Architecture study

A key part of the project was comparing several convolutional architectures, including custom CNNs, VGG-style models, and ResNet-based approaches.

This comparison helped evaluate the trade-offs between model complexity, convergence stability, and generalization on compact grayscale facial inputs. ResNet ultimately emerged as the most reliable solution, offering the best balance between performance and robustness across the different emotion classes.
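To make the ResNet idea concrete, here is a small residual network sized for single-channel 48×48 inputs. The write-up does not state which ResNet variant was used, so the names `ResidualBlock` and `TinyResNet`, the channel widths, and the depth are assumptions chosen to keep the sketch short.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ResidualBlock(nn.Module):
    """Two 3x3 convolutions whose output is added to a skip connection."""

    def __init__(self, in_ch: int, out_ch: int, stride: int = 1):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_ch)
        self.conv2 = nn.Conv2d(out_ch, out_ch, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_ch)
        # When the shape changes, project the input with a 1x1 convolution
        # so that the element-wise addition in forward() is valid.
        self.shortcut = nn.Identity()
        if stride != 1 or in_ch != out_ch:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 1, stride=stride, bias=False),
                nn.BatchNorm2d(out_ch),
            )

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return F.relu(out + self.shortcut(x))


class TinyResNet(nn.Module):
    """Illustrative residual classifier for 1x48x48 inputs (sizes are assumed)."""

    def __init__(self, num_classes: int = 7):
        super().__init__()
        self.stem = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1, bias=False),
            nn.BatchNorm2d(32),
            nn.ReLU(),
        )
        self.blocks = nn.Sequential(
            ResidualBlock(32, 32),
            ResidualBlock(32, 64, stride=2),   # 48x48 -> 24x24
            ResidualBlock(64, 128, stride=2),  # 24x24 -> 12x12
        )
        self.head = nn.Linear(128, num_classes)

    def forward(self, x):
        x = self.blocks(self.stem(x))
        # Global average pooling collapses each feature map to one value.
        x = F.adaptive_avg_pool2d(x, 1).flatten(1)
        return self.head(x)


model = TinyResNet()
logits = model(torch.randn(4, 1, 48, 48))
print(logits.shape)  # one score per emotion class for each of the 4 images
```

The skip connection lets gradients bypass the convolutions, which is one common explanation for why residual networks tend to converge more stably than plain stacks of the same depth.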

Training strategy

The training pipeline was developed in PyTorch using Adam, batched loading, and repeated evaluation across epochs.
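A loop of this kind might look like the sketch below, which trains with Adam over batches and evaluates after every epoch. The placeholder model, the learning rate of 1e-3, the three epochs, and the random stand-in data are assumptions for illustration; the project's real settings are not given here.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Random stand-in data shaped like FER2013 batches (real loaders would replace these).
train_ds = TensorDataset(torch.randn(64, 1, 48, 48), torch.randint(0, 7, (64,)))
val_ds = TensorDataset(torch.randn(32, 1, 48, 48), torch.randint(0, 7, (32,)))
train_loader = DataLoader(train_ds, batch_size=16, shuffle=True)
val_loader = DataLoader(val_ds, batch_size=16)

# Placeholder classifier; any of the compared CNNs could take its place.
model = nn.Sequential(nn.Flatten(), nn.Linear(48 * 48, 64), nn.ReLU(), nn.Linear(64, 7))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()


def evaluate(model: nn.Module, loader: DataLoader) -> float:
    """Return accuracy over a loader without tracking gradients."""
    model.eval()
    correct = total = 0
    with torch.no_grad():
        for images, labels in loader:
            preds = model(images).argmax(dim=1)
            correct += (preds == labels).sum().item()
            total += labels.size(0)
    return correct / total


history = []
for epoch in range(3):
    model.train()
    for images, labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
    # Evaluate after every epoch so validation curves can be plotted later.
    val_acc = evaluate(model, val_loader)
    history.append(val_acc)
    print(f"epoch {epoch + 1}: val_acc={val_acc:.3f}")
```

Keeping the per-epoch accuracies in `history` is what makes it possible to inspect validation curves afterwards rather than only the final number.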

To improve generalization, we applied data augmentation suited to facial imagery, mainly rotations and flips, which enriches the dataset without changing the expression each image shows.

Data augmentation examples on facial images.

Hyperparameter optimization

To move beyond manual trial and error, we used Optuna to explore stronger model configurations more efficiently.

The optimization process covered learning rate, batch size, dropout, hidden dimensions, number of epochs, and architecture choice.

Reading model behavior

Rather than focusing only on final accuracy, we also analyzed how the models behaved during training and evaluation. Confusion matrices, validation curves, and failed runs revealed recurring weaknesses, especially between visually similar emotions such as sadness and neutral, or fear and anger.
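The kind of confusion-matrix reading described above can be sketched in a few lines. The labels below are hand-made for illustration and are not project results, and the class order simply follows the list in the problem statement, which is an assumption about how the labels were indexed.

```python
import torch

# Assumed label order, taken from the order used in the problem statement.
CLASSES = ["neutral", "happiness", "sadness", "fear", "anger", "disgust", "surprise"]


def confusion_matrix(targets: torch.Tensor, preds: torch.Tensor, num_classes: int) -> torch.Tensor:
    """Count predictions; rows are true classes, columns are predicted classes."""
    flat = targets * num_classes + preds
    return torch.bincount(flat, minlength=num_classes ** 2).reshape(num_classes, num_classes)


def most_confused(cm: torch.Tensor):
    """Return (true_idx, pred_idx, count) for the largest off-diagonal cell."""
    off_diag = cm.clone()
    off_diag.fill_diagonal_(0)
    flat_idx = int(off_diag.argmax())
    n = cm.size(0)
    return flat_idx // n, flat_idx % n, int(off_diag.max())


# Hand-made labels for illustration only.
targets = torch.tensor([0, 0, 2, 2, 2, 3, 4, 1])
preds = torch.tensor([0, 2, 0, 0, 2, 4, 4, 1])

cm = confusion_matrix(targets, preds, len(CLASSES))
true_idx, pred_idx, count = most_confused(cm)
print(f"{CLASSES[true_idx]} -> {CLASSES[pred_idx]}: {count}")  # sadness -> neutral: 2
```

Looking at the largest off-diagonal cells, rather than overall accuracy alone, is what surfaces systematic confusions between specific pairs of emotions.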

The strongest ResNet-based configuration reached about 60% accuracy. That result also made clear how much facial emotion recognition depends on both architecture design and the inherent ambiguity of the data.

Project outcome

This project resulted in a full facial emotion recognition pipeline combining computer vision research, deep learning model design, hyperparameter optimization, and real-time inference.

More importantly, it gave us practical experience with PyTorch training workflows, architecture benchmarking, data augmentation, confusion-matrix analysis, and the challenges of translating model performance into usable interactive systems.