Final Year Project · Cybersecurity · Updated 2026-04-19

Acoustic Side Channel Attacks: Keystroke Classification from Microphone Recordings

Proof-of-concept system that infers typed input from keyboard audio using classical DSP segmentation, MFCC features, and a lightweight SVM classifier — extended with word and sentence decoding plus a Streamlit GUI.

Author: Domantas Prialgauskas · Student No: C00285607 · Labels: A–Z + SPACE

Quick facts

• Core stack: RMS → MFCC → SVM, plus word/sentence decoding
• Leakage control: group-safe, session-aware split
• Live modes: Letter / Word / Sentence, spacebar supported

Visual teaser: synthetic “tap energy” monitor

How to use this page

• Use the tabs in the top bar to jump between sections without endless scrolling.
• Use Downloads to open your full report and research document in a new tab.
• Demo video is embedded under Demo Video.

PS C:\Users\doman\OneDrive\Desktop\project

(.venv) python kbd_live_pipeline.py train

[RETRAIN] stable holdout report + saved model artifacts

(.venv) streamlit run app.py

Launch GUI (single-letter / word / sentence modes)

Suggested demo inputs: this, vote, quit, devil, line · letters: a, q, t, b · SPACE

Home

This site summarizes the project and links to the full documents. Use the tabs above to explore: segmentation and quality control, MFCC feature extraction, leakage-safe evaluation, and the decoding pipeline.

Open GitHub Repo

Overview

Acoustic side-channel attacks exploit information leakage from physical signals such as sound, timing, or electromagnetic emissions. Keyboards are a relevant target because each press produces a short transient with repeatable spectral structure, enabling classification under suitable recording conditions.

Pipeline

End-to-end flow: record audio → bandpass filter → RMS envelope → peak detection + debounce → onset-aligned window extraction → MFCC(+Δ,+ΔΔ) → scaling → SVM prediction → decoding.

Pipeline diagram (SVG)

Segmentation

Segmentation applies a bandpass filter (100–8000 Hz) and computes an RMS energy envelope (20 ms frames, 10 ms hop), which is then smoothed by a moving average. Peaks are detected by thresholding the envelope and debounced to avoid double-counting a single press. Extracted windows are onset-aligned with a fixed pre/post region, so that offline and live processing see identically framed windows.
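The envelope-and-peak stage described above can be sketched in a few lines. This is a minimal illustration using only the Python standard library, not the project's actual code: the function names, the smoothing width, the threshold value, and the debounce length are all assumptions chosen for the example, and the bandpass filter and onset alignment steps are left out.

```python
import math

def rms_envelope(samples, sr, frame_ms=20, hop_ms=10):
    """Frame-wise RMS energy: 20 ms frames with a 10 ms hop by default."""
    frame = int(sr * frame_ms / 1000)
    hop = int(sr * hop_ms / 1000)
    env = []
    for start in range(0, len(samples) - frame + 1, hop):
        chunk = samples[start:start + frame]
        env.append(math.sqrt(sum(x * x for x in chunk) / frame))
    return env

def moving_average(values, width=3):
    """Centered moving average; the divisor is always `width`, so edges are damped."""
    half = width // 2
    out = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        out.append(sum(values[lo:hi]) / width)
    return out

def detect_peaks(env, threshold, debounce_frames):
    """Local maxima at or above `threshold`.

    A candidate is dropped when it falls within `debounce_frames`
    of the last accepted peak (hypothetical debounce rule).
    """
    peaks = []
    for i in range(1, len(env) - 1):
        is_peak = env[i] >= threshold and env[i] > env[i - 1] and env[i] >= env[i + 1]
        if is_peak and (not peaks or i - peaks[-1] >= debounce_frames):
            peaks.append(i)
    return peaks
```

With a 10 ms hop, a peak at frame index i corresponds to roughly i × 10 ms into the recording, and a debounce of 8 frames suppresses any second peak within 80 ms of an accepted one.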

RMS + peak detector (stylized)

Features & Model

Feature pipeline (at a glance)

Features: n_mfcc=20, n_fft=1024, hop=256 → stacked MFCC/Δ/ΔΔ → mean+std vector → StandardScaler → SVM (RBF).
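To make the size of the resulting feature vector concrete: 20 coefficients × 3 streams (MFCC, Δ, ΔΔ) × 2 statistics (mean, std) gives 120 values per keystroke. The sketch below shows only that stacking and summarising step, assuming the MFCC matrix has already been computed elsewhere. The helper names are hypothetical, and the frame-to-frame difference used here is a simple stand-in for the regression-based deltas that audio libraries typically provide.

```python
import statistics

def simple_delta(matrix):
    """First-order frame difference per coefficient (stand-in for a proper delta filter)."""
    return [[0.0] + [row[t] - row[t - 1] for t in range(1, len(row))] for row in matrix]

def summarize(mfcc):
    """Stack MFCC, delta and delta-delta rows, then reduce each row to mean and std over time.

    `mfcc` is a list of n_mfcc rows, each holding one coefficient across all frames.
    """
    d1 = simple_delta(mfcc)
    d2 = simple_delta(d1)
    stacked = mfcc + d1 + d2                              # 3 * n_mfcc rows
    means = [statistics.fmean(row) for row in stacked]
    stds = [statistics.pstdev(row) for row in stacked]
    return means + stds                                   # 2 * 3 * n_mfcc values

# Toy input: 20 coefficients over 10 frames (values are arbitrary).
toy_mfcc = [[float(c + t) for t in range(10)] for c in range(20)]
feature_vector = summarize(toy_mfcc)
```

Because the summary collapses the time axis, every keystroke window yields a vector of the same length regardless of how many frames it contains.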

Why this matters

Compact: MFCCs summarise spectral envelope efficiently for short transients.
Stable evaluation: session/group splits prevent train–test leakage.
Demo-friendly: fast enough for near-real-time decoding without heavy deep learning.

MFCC + deltas · RBF SVM · Group-safe split · Live retrain

Each keystroke window is converted to MFCC (20 coefficients) with Δ and ΔΔ, and summarized via per-coefficient mean and standard deviation. A StandardScaler is fit on training data only, and an RBF SVM performs multi-class classification. Session/group-aware splitting controls leakage.
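The two leakage controls mentioned here, splitting by session and fitting the scaler on training data only, can be illustrated with a small standard-library sketch. The function names, the test fraction, and the session labels are assumptions made for the example; scikit-learn's GroupShuffleSplit and StandardScaler offer comparable functionality, and the source does not say exactly which utilities the project uses for the split.

```python
import random

def group_split(groups, test_fraction=0.25, seed=0):
    """Split sample indices so each group (e.g. a recording session) lands wholly on one side."""
    unique = sorted(set(groups))
    random.Random(seed).shuffle(unique)
    n_test = max(1, round(len(unique) * test_fraction))
    test_groups = set(unique[:n_test])
    train_idx = [i for i, g in enumerate(groups) if g not in test_groups]
    test_idx = [i for i, g in enumerate(groups) if g in test_groups]
    return train_idx, test_idx

def fit_scaler(train_rows):
    """Per-feature mean and std, computed from training rows only."""
    columns = list(zip(*train_rows))
    means = [sum(col) / len(col) for col in columns]
    stds = []
    for col, m in zip(columns, means):
        std = (sum((x - m) ** 2 for x in col) / len(col)) ** 0.5
        stds.append(std if std > 0 else 1.0)  # avoid dividing by zero on constant features
    return means, stds

def transform(rows, means, stds):
    """Apply the training-set statistics to any rows (train or test)."""
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]
```

Splitting by session rather than by individual keystroke means that windows recorded in the same sitting, which share microphone position and background noise, never appear on both sides of the evaluation.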

Decoding

Word decoding combines three elements: per-position probabilities (a softmax over SVM margins), beam search over letter sequences, and dictionary snapping with confusion-aware substitution costs derived from observed misclassifications. Sentence mode splits the keystroke stream at each predicted SPACE and decodes each word chunk with relaxed constraints.
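A rough sketch of how these three pieces could fit together for a fixed-length word follows. It is illustrative only: the margin values, the confusion cost table, and all function names are invented for the example, and the snapping step considers substitutions only (no insertions or deletions), which is a simplification that may differ from the project's decoder.

```python
import math

def softmax(scores, temperature=1.0):
    """Turn per-class scores (e.g. SVM margins) into probabilities."""
    peak = max(scores.values())
    exps = {k: math.exp((v - peak) / temperature) for k, v in scores.items()}
    total = sum(exps.values())
    return {k: e / total for k, e in exps.items()}

def beam_search(position_probs, beam_width=5, top_k=3):
    """Keep the `beam_width` best letter sequences, expanding by the `top_k` letters per position."""
    beams = [("", 0.0)]  # (prefix, log-probability)
    for probs in position_probs:
        best = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
        candidates = [
            (prefix + ch, lp + math.log(p))
            for prefix, lp in beams
            for ch, p in best
            if p > 0
        ]
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
    return beams

def snap_to_dictionary(guess, words, confusion_cost, default_cost=1.0):
    """Pick the same-length dictionary word with the lowest total substitution cost."""
    def cost(word):
        return sum(
            0.0 if a == b else confusion_cost.get((a, b), default_cost)
            for a, b in zip(guess, word)
        )
    same_length = [w for w in words if len(w) == len(guess)]
    return min(same_length, key=cost) if same_length else guess

# Hypothetical margins for a four-keystroke word where the first key is ambiguous.
margins = [
    {"b": 1.2, "v": 1.0, "t": -1.0},
    {"o": 2.0, "a": 0.0},
    {"t": 2.0, "r": 0.5},
    {"e": 2.0, "w": 0.0},
]
probs = [softmax(m) for m in margins]
beams = beam_search(probs)
# Assumed table: "b" predicted when "v" was typed is a cheap substitution.
snapped = snap_to_dictionary(beams[0][0], ["this", "vote", "quit", "devil", "line"], {("b", "v"): 0.2})
```

In this toy case the raw best sequence is not a dictionary word, and the low cost assigned to the b/v confusion lets the snapping step recover "vote". A higher temperature flattens the per-position probabilities, which gives lower-ranked letters a better chance of surviving in the beam.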

GUI

A Streamlit GUI provides three demo modes (letter vote, fixed-length word, sentence) with per-mode tuning controls for decoder temperature, beam width, and top-K expansion. Sentence mode adds sensitivity (threshold K) and debounce controls.

Demo Video

The player below attempts to load the YouTube embed. If your environment blocks embedded playback, use the button to open the video directly.

Open on YouTube

Security

The project highlights acoustic leakage risk under controlled conditions and discusses the constraints that limit real-world applicability: microphone placement, environmental noise, keyboard model, and the user's typing style. It also discusses mitigation directions (noise, acoustics, behavioral factors).

Downloads

Because the showcase must be uploaded as a single HTML file, the documents are stored inside this page as embedded data. Use the buttons below to download the DOCX files. The filename is preserved.


Tip: If your browser blocks automatic downloads, right‑click the button and choose “Save link as…”, or allow downloads for this page.