A hybrid phishing detection system that uses OCR, computer vision, and threat intelligence to analyse phishing content embedded in images and emails, content that traditional text-based filters cannot read.
Phishing attacks increasingly embed malicious content inside image files to bypass traditional email security filters. A fake Geek Squad invoice, a PayPal billing alert, or a DocuSign impersonation email rendered as a graphic is completely invisible to text-based scanners but convincing to any human who reads it.
ImageAware+ was built to close this gap. It combines multi-pass OCR with OpenCV image preprocessing, QR code detection, HTML href URL extraction, email header analysis, and a 29-indicator rule-based scoring engine to produce an explainable forensic risk assessment for any submitted image or email file.
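Of the components listed above, email header analysis is the easiest to illustrate with the standard library alone. The following is a minimal sketch, not ImageAware+'s own code. It assumes two hypothetical indicators: a From/Reply-To domain mismatch, and an SPF, DKIM, or DMARC failure recorded in the `Authentication-Results` header. The helper names are illustrative.

```python
from email import message_from_string
from email.message import Message
from email.utils import parseaddr
from typing import List, Optional


def sender_domain(header_value: Optional[str]) -> str:
    """Return the lower-cased domain of an address header, or '' if absent."""
    _, address = parseaddr(header_value or "")
    return address.rpartition("@")[2].lower()


def header_findings(msg: Message) -> List[str]:
    """Flag a few illustrative header anomalies (indicator names are hypothetical)."""
    findings = []
    from_domain = sender_domain(msg.get("From"))
    reply_domain = sender_domain(msg.get("Reply-To"))
    # Replies routed to a different domain than the claimed sender.
    if reply_domain and reply_domain != from_domain:
        findings.append(f"reply_to_mismatch: From={from_domain} Reply-To={reply_domain}")
    # Authentication verdicts recorded by the receiving server, if present.
    auth = (msg.get("Authentication-Results") or "").lower()
    for mechanism in ("spf", "dkim", "dmarc"):
        if f"{mechanism}=fail" in auth:
            findings.append(f"{mechanism}_fail")
    return findings


raw = (
    "From: PayPal <service@paypal.com>\n"
    "Reply-To: billing@paypa1-support.example\n"
    "Authentication-Results: mx.example.org; spf=fail smtp.mailfrom=paypal.com; dkim=none\n"
    "Subject: Billing alert\n"
    "\n"
    "Your account has been limited.\n"
)
print(header_findings(message_from_string(raw)))
# ['reply_to_mismatch: From=paypal.com Reply-To=paypa1-support.example', 'spf_fail']
```

The substring check on `Authentication-Results` is intentionally simple. A production parser would read each method's result according to the header's defined syntax instead of searching for `spf=fail`.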
Every point in the final score is traced back to a named indicator with supporting evidence, making the system suitable for forensic documentation rather than just binary classification. The system is deployed as a live educational platform covering phishing awareness, attack types, and real-time sample analysis.
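One way to keep a score explainable is to store each fired indicator with its points and evidence, then derive the total from that list. The sketch below assumes that design. The `Finding` and `Assessment` classes, the indicator names, categories, and point values, and the clamp to the 0 to 100 range are all hypothetical and are not taken from the project's 29-indicator rule set.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Finding:
    indicator: str  # named indicator (names used below are hypothetical)
    category: str   # attack category the indicator belongs to
    points: int     # contribution to the overall risk score
    evidence: str   # what was observed, kept for the forensic report


@dataclass
class Assessment:
    findings: List[Finding] = field(default_factory=list)

    def add(self, indicator: str, category: str, points: int, evidence: str) -> None:
        self.findings.append(Finding(indicator, category, points, evidence))

    @property
    def score(self) -> int:
        # Sum of indicator points, clamped to the 0-100 scale (clamping is an assumption).
        return max(0, min(100, sum(f.points for f in self.findings)))

    def report(self) -> List[str]:
        # One line per fired indicator, so every point in the score is accounted for.
        return [f"{f.points:+d} {f.indicator} [{f.category}]: {f.evidence}" for f in self.findings]


assessment = Assessment()
assessment.add("reply_to_mismatch", "sender_spoofing", 20,
               "From=paypal.com, Reply-To=paypa1-support.example")
assessment.add("urgency_language", "social_engineering", 15,
               "OCR text contains 'account suspended'")
print(assessment.score)  # 35
for line in assessment.report():
    print(line)
```

Because the total is computed from the stored findings rather than kept as a separate counter, the report and the score cannot drift apart.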
Submit a phishing image (PNG, JPG) or email file (.eml) through the web interface.
Multi-pass OCR extracts text from images, HTML href attributes are parsed to recover hidden URLs, and QR codes are decoded.
Extracted URLs are checked against VirusTotal, URLScan.io, and PhishTank APIs.
29 indicators across 8 attack categories produce an explainable risk score from 0 to 100.
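The href recovery in the second step can be illustrated with the Python standard library alone. This hypothetical sketch parses an .eml file, collects each `<a>` element's `href` alongside its visible text, and flags anchors whose text shows one hostname while the link points to another. OCR and QR decoding are omitted because they require third-party libraries, and the helper names are illustrative rather than the project's own.

```python
from email import message_from_bytes, policy
from html.parser import HTMLParser
from typing import List, Tuple
from urllib.parse import urlparse


class AnchorCollector(HTMLParser):
    """Collect [href, visible text] for every <a> element fed to the parser."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[List[str]] = []
        self._current = None  # the link being filled while inside an <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._current = [dict(attrs).get("href") or "", ""]
            self.links.append(self._current)

    def handle_data(self, data):
        if self._current is not None:
            self._current[1] += data

    def handle_endtag(self, tag):
        if tag == "a":
            self._current = None


def extract_links(raw_eml: bytes) -> List[Tuple[str, str]]:
    """Return (href, visible text) pairs from every text/html part of an .eml file."""
    msg = message_from_bytes(raw_eml, policy=policy.default)
    collector = AnchorCollector()
    for part in msg.walk():
        if part.get_content_type() == "text/html":
            collector.feed(part.get_content())
    collector.close()
    return [(href, text.strip()) for href, text in collector.links]


def deceptive_links(links: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Flag anchors whose visible text looks like a URL on a different host than the href."""
    flagged = []
    for href, text in links:
        if "." not in text or " " in text:
            continue  # visible text does not look like a URL or hostname
        shown = urlparse(text if "://" in text else "http://" + text).hostname
        actual = urlparse(href).hostname
        if shown and actual and shown != actual:
            flagged.append((text, href))
    return flagged


raw = (
    b"From: alerts@billing.example\r\n"
    b"Subject: Invoice\r\n"
    b"MIME-Version: 1.0\r\n"
    b"Content-Type: text/html; charset=utf-8\r\n"
    b"\r\n"
    b'<p>Review your <a href="http://paypa1-login.example/verify">www.paypal.com</a> bill '
    b'or visit the <a href="https://www.paypal.com/help">Help Center</a>.</p>\r\n'
)
links = extract_links(raw)
print(deceptive_links(links))
# [('www.paypal.com', 'http://paypa1-login.example/verify')]
```

Comparing full hostnames is deliberately naive: `paypal.com` and `www.paypal.com` would be treated as different hosts. A sturdier check would compare registered domains, for example using the Public Suffix List.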
Formally evaluated on 300 labelled samples: 150 phishing emails from the Nazario 2025 corpus and 150 legitimate emails from the TREC 2007 ham corpus. The image pipeline was evaluated on 22 labelled samples.
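The page does not state which metrics or decision threshold the evaluation used. The following hedged sketch assumes a binary verdict taken from the 0 to 100 score at a hypothetical cut-off of 50. It shows how standard confusion-matrix metrics could be computed over a labelled set such as the 300-email corpus. The input values are toy numbers for illustration, not the project's results.

```python
from typing import Dict, Sequence


def evaluate(scores: Sequence[float], labels: Sequence[bool], threshold: float = 50) -> Dict[str, float]:
    """Confusion-matrix metrics for 0-100 risk scores against ground-truth labels.

    labels: True for phishing, False for legitimate. Scores at or above the
    (hypothetical) threshold are treated as a phishing verdict.
    """
    if len(scores) != len(labels) or not labels:
        raise ValueError("scores and labels must be non-empty and the same length")
    tp = fp = tn = fn = 0
    for score, is_phishing in zip(scores, labels):
        flagged = score >= threshold
        if flagged and is_phishing:
            tp += 1
        elif flagged:
            fp += 1
        elif is_phishing:
            fn += 1
        else:
            tn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"tp": tp, "fp": fp, "tn": tn, "fn": fn,
            "accuracy": (tp + tn) / len(labels),
            "precision": precision, "recall": recall, "f1": f1}


# Toy values for illustration only, not the project's results.
print(evaluate([80, 65, 30, 55, 10, 5], [True, True, True, False, False, False]))
```

Reporting the raw counts alongside the derived ratios makes it easy to check the figures against a balanced 150/150 split.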
Built with Python and Flask and deployed as a Docker container on Render.com, with automatic deployment from GitHub via CI/CD.
All project documentation produced throughout the year, plus the live platform and source code.
Initial project proposal and technical specification document.
Literature review and research findings supporting the project.
Full final year project dissertation and technical report.
Full source code, evaluation scripts, and deployment configuration.