A research brief on the gap between Android permissions and the policies that are supposed to explain them.

Every app asks.
Few policies answer.

PrivacyTotal is an end-to-end auditing pipeline that downloads an APK, extracts its declared permissions, retrieves the developer's privacy policy, and runs a fine-tuned 2.7 billion-parameter language model to decide, permission by permission, whether the policy actually discloses what the app collects. Every application is then assigned a 0–100 Privacy Health Score, and every verdict cites the specific policy excerpt it was drawn from.

I

Headline numbers

A 2.7 B model on a single RTX 2060 Super, with 92.5% LLM–oracle agreement.

117 Android applications analysed. 3,483 permission–policy pairs drawn across 175 analysis runs. F1 of 0.930 on the covered class against a deterministic keyword oracle over 2,562 labelled pairs.

0.930
F1 · covered class
Precision 0.892, recall 0.971.
92.5%
LLM–oracle agreement
2,931 of 3,167 adjudicated pairs.
117
Applications analysed
Across 10+ Google Play categories.
3,483
Permission–policy pairs
175 independent analysis runs.
2,562
Oracle-labelled pairs
Deterministic keyword ground truth.
< 8 GB
Peak VRAM at inference
Fits entirely on consumer hardware.
II

Why this project

Users accept permissions they don't understand, and policies don't have to explain them.

RTÉ Prime Time's 2025 investigation revealed that granular location data from tens of thousands of Irish smartphones was openly brokered on data markets, tracking journeys to prisons, military bases, and private addresses, despite GDPR Article 13 requiring plain-language disclosure of exactly this kind of collection. Manual auditing does not scale to a Play Store of millions of apps whose policies routinely run past 15,000 words. PrivacyTotal automates the audit end-to-end.

The disclosure gap

Every Android app declares its required device permissions in AndroidManifest.xml. Every app handling user data is supposed to link a privacy policy disclosing what it collects and why. In practice, the two documents diverge — silently, and across every category of app.

  • Silent requests: permission declared, not mentioned in policy.
  • Vague disclosure: policy uses hedge words ("may", "might", "certain data").
  • Negative contradiction: policy denies collection the manifest enables.
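
The "vague disclosure" pattern in particular lends itself to a simple lexical check. A minimal sketch, assuming an illustrative hedge list and function name (not the project's actual detector):

```python
import re

# Illustrative only: the hedge list and is_vague() are assumptions,
# not PrivacyTotal's real implementation.
HEDGE_PHRASES = ["may", "might", "could", "certain", "from time to time"]

def is_vague(sentence: str) -> bool:
    """Flag policy sentences that hedge rather than state collection."""
    lowered = sentence.lower()
    words = set(re.findall(r"[a-z]+", lowered))
    return any(
        (p in words) if " " not in p else (p in lowered)
        for p in HEDGE_PHRASES
    )
```

A real detector would also need to handle negation and scope, but even this crude filter separates "we may collect certain data" from a plain statement of collection.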

What PrivacyTotal does

Given only a Google Play package ID, the tool acquires the APK through a five-source fallback chain, decodes its manifest with aapt2, retrieves the Play Store privacy policy, and asks a fine-tuned MobileLLaMA whether each declared permission is disclosed in the policy text. Results are compiled into a gap report, stored in a versioned database, and surfaced in a Flask web app.

  • Runs on consumer hardware: 8 GB VRAM, no cloud inference required.
  • Longitudinal: re-runs over time detect when policies change.
  • Transparent: every verdict cites the specific policy excerpt.
III

Architecture

Four stages. One package ID in, one Privacy Health Score out.

The full pipeline — acquisition, extraction, classification, scoring — runs unattended from the Flask web UI or the batch CLI. No human intervention after the package ID is entered.

01

Acquisition

APK downloaded through a five-source fallback chain (APKMirror → APKPure → APKCombo → APKMonk → Uptodown), with TLS-fingerprint masking and package-ID verification on every candidate.
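
The fallback chain reduces to a short loop. In this sketch, `download` and `verify` are hypothetical stand-ins for the TLS-fingerprint-masked fetcher and the package-ID check:

```python
# Source order matches the chain described above; the helper functions
# passed in are illustrative stand-ins, not the project's real API.
SOURCES = ["APKMirror", "APKPure", "APKCombo", "APKMonk", "Uptodown"]

def acquire(package_id, download, verify):
    """Try each mirror in order; return the first APK whose embedded
    package ID matches the one requested."""
    for source in SOURCES:
        apk = download(source, package_id)   # masked fetch; None on failure
        if apk is not None and verify(apk, package_id):
            return apk
    raise LookupError(f"{package_id}: no source yielded a verified APK")
```

Verifying the package ID on every candidate matters because mirrors occasionally serve a differently-packaged build under the same listing.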

02

Extraction

Android manifest decoded with aapt2; permissions grouped into 14 semantic categories. Privacy-policy URL pulled from the Play Store listing, HTML stripped to clean text.
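
The manifest step can be sketched as a thin wrapper around aapt2. The helper names here are illustrative, and the dump format assumed by the parser may vary across build-tools versions:

```python
import re
import subprocess

def declared_permissions(apk_path: str) -> list[str]:
    """Decode the manifest with aapt2 and pull out uses-permission names.
    Assumes aapt2 is on PATH; output format varies by build-tools version."""
    out = subprocess.run(
        ["aapt2", "dump", "permissions", apk_path],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_permissions(out)

def parse_permissions(dump_output: str) -> list[str]:
    # Matches lines like: uses-permission: name='android.permission.NFC'
    return re.findall(r"uses-permission: name='([^']+)'", dump_output)
```

The extracted names are then bucketed into the 14 semantic categories before classification.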

03

Classification

Fine-tuned MobileLLaMA 2.7B receives a structured prompt with the permission, its human-readable description, and the top-3 TF-IDF-ranked policy excerpts. Returns “Mentioned:” or “Not mentioned:” with rationale.
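
The top-3 excerpt selection can be sketched with a small TF-IDF ranker. This stdlib version (the function name `top_excerpts` is hypothetical) stands in for whatever vectoriser the pipeline actually uses:

```python
import math
import re
from collections import Counter

def top_excerpts(permission_desc: str, excerpts: list[str], k: int = 3) -> list[str]:
    """Rank policy excerpts by TF-IDF cosine similarity to the
    permission description. Minimal sketch, not the project's ranker."""
    tokenize = lambda s: re.findall(r"[a-z]+", s.lower())
    docs = [tokenize(d) for d in excerpts + [permission_desc]]
    n = len(docs)
    df = Counter(w for d in docs for w in set(d))
    idf = {w: math.log(n / df[w]) for w in df}

    def vec(doc):
        tf = Counter(doc)
        return {w: tf[w] * idf[w] for w in tf}

    def cos(a, b):
        dot = sum(a[w] * b.get(w, 0.0) for w in a)
        na = math.sqrt(sum(v * v for v in a.values()))
        nb = math.sqrt(sum(v * v for v in b.values()))
        return dot / (na * nb) if na and nb else 0.0

    query = vec(docs[-1])
    ranked = sorted(excerpts, key=lambda e: cos(vec(tokenize(e)), query), reverse=True)
    return ranked[:k]
```

Feeding only the top-ranked excerpts keeps the prompt short enough for a 2.7 B model while still grounding every verdict in quotable policy text.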

04

Scoring

Results aggregated into a Privacy Health Score on a 0–100 scale, penalising undisclosed high-risk permissions and vague language. Full report persisted to SQLite for longitudinal tracking.
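
A minimal sketch of how such an aggregation could work; the weights, penalty value, and function name here are assumptions, not the project's actual formula:

```python
# Assumed weights and penalty for illustration only: high-risk misses
# cost more, and a vague disclosure counts as half a miss.
RISK_WEIGHT = {"high": 10.0, "normal": 4.0, "low": 1.0}
VAGUE_PENALTY = 0.5

def privacy_health_score(verdicts):
    """verdicts: list of (risk, status) pairs, where status is
    'covered', 'vague', or 'undisclosed'. Returns a 0-100 score."""
    total = sum(RISK_WEIGHT[r] for r, _ in verdicts)
    if total == 0:
        return 100.0
    lost = sum(
        RISK_WEIGHT[r]
        * (1.0 if s == "undisclosed" else VAGUE_PENALTY if s == "vague" else 0.0)
        for r, s in verdicts
    )
    return round(100.0 * (1.0 - lost / total), 1)
```

Weighting by risk means a single undisclosed high-risk permission moves the score far more than several missed low-risk ones.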

IV

The model

A 2.7 B model, QLoRA-tuned from scratch, on an RTX 2060 Super.

Full fine-tuning at this scale needs more than 40 GB of VRAM. QLoRA with 4-bit NF4 quantisation brings it inside 8 GB, without giving up classification quality.

Training at a glance

Base model          MobileLLaMA 2.7B-Chat
Technique           QLoRA, 4-bit NF4
LoRA rank / alpha   r = 8, α = 16
Learning rate       5 × 10⁻⁵
Epochs              3 (≈ 2.5 h)
Dataset             5,154 examples
Loss                2.91 → 0.34
GPU                 RTX 2060 Super, 8 GB
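
Under the settings above, the setup could be expressed roughly as follows with transformers and peft; the Hugging Face model ID, target modules, and dropout are illustrative assumptions, not the project's exact configuration:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# 4-bit NF4 quantisation keeps the 2.7 B base model inside 8 GB of VRAM.
bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)
model = AutoModelForCausalLM.from_pretrained(
    "mtgv/MobileLLaMA-2.7B-Chat",  # assumed checkpoint ID
    quantization_config=bnb,
    device_map="auto",
)
# LoRA adapters per the table: r = 8, alpha = 16.
lora = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,                   # assumed value
    target_modules=["q_proj", "v_proj"],  # assumed modules
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
```

Only the adapter weights are trained; the quantised base stays frozen, which is what keeps the memory footprint on a consumer card.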

Why MobileLLaMA

Several small-footprint LLMs were evaluated before training began: TinyLLaMA, Mistral 7B, and Phi-2. MobileLLaMA 2.7B won on three criteria: it runs inside 8 GB, it handles legal and technical prose without drifting into hallucination, and it has an open Chat variant suitable for instruction tuning.

Training data was assembled from the PrivacyQA corpus, OPP-115 annotations, and hand-authored permission-style capability templates. Earlier multi-stage adapters over-fit and produced boilerplate; the current model was trained from scratch on the base checkpoint against a single classification task.

V

Privacy Health Score

A single number users and regulators can both reason about.

The Privacy Health Score compresses the gap analysis into a 0–100 metric, weighted by the risk category of each undisclosed permission and the vagueness of the policy text.

83.8
Median · Low Risk
80 – 100 Low Risk. Majority of permissions disclosed. 55 apps · 55.0%
60 – 79 Moderate Risk. Some permissions undisclosed or vague. 19 apps · 19.0%
50 – 59 High Risk. Multiple sensitive permissions undisclosed. 8 apps · 8.0%
0 – 49 Critical Risk. Policy materially under-discloses. 18 apps · 18.0%
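
The banding above maps directly onto a threshold function; a minimal sketch (the function name is illustrative):

```python
# Thresholds taken directly from the band table above.
def risk_band(score: float) -> str:
    if score >= 80:
        return "Low Risk"
    if score >= 60:
        return "Moderate Risk"
    if score >= 50:
        return "High Risk"
    return "Critical Risk"
```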
VI

Empirical findings

The gap is widespread, structured, and concentrated on feature-level permissions.

Across 3,483 permission–policy pairs, roughly 35% of declared Android permissions are either undisclosed or only vaguely disclosed. The pattern is not random: it falls hardest on permissions introduced by specific app features (Bluetooth, notifications, NFC) rather than on core-sensitive ones.

13.6%

NFC disclosure

Only 3 of 22 declared NFC instances were disclosed; 81.8% were not mentioned at all. The least-disclosed permission group in the corpus.

36.8%

Bluetooth disclosure

Only 74 of 201 BLUETOOTH-family declarations are addressed in the policy, despite Bluetooth's role in proximity tracking and beaconing.

39.1%

Notifications group

POST_NOTIFICATIONS (Android 13), VIBRATE, and WAKE_LOCK together form the single largest source of undisclosed declarations. Policy lags platform.

69.2%

Apps with a gap

81 of 117 analysed applications had at least one undisclosed permission. Clean disclosure is the exception, not the norm.

92.5%

LLM – oracle agreement

The classifier agrees with the keyword oracle on 2,931 of 3,167 adjudicated pairs — safe to deploy as a triage layer ahead of human review.

Bimodal

PHS distribution

55% of apps score Low Risk, 18% Critical. Best and worst practices coexist: blanket template reform will not reach the worst offenders.

VII

Breakdown

Disclosure by permission category, and by risk band.

Same 3,483-pair corpus, split two ways. Left: what share of each permission category ever makes it into the policy. Right: where the 100 scored apps land on the Privacy Health Score.

Fig. 1 · Disclosure rate by permission category

Share of declarations in each group that the policy actually covers.
Forest ≥ 70% · Amber 50–69% · Terracotta 30–49% · Crimson < 30%.

Fig. 2 · Privacy Health Score distribution

100 apps with computable PHS across four risk bands.
Mean 73.7 · Median 83.8 · Standard deviation 26.9.
VIII

Built with

Open source end to end.

No cloud APIs, no proprietary dependencies. The full stack is reproducible on a single consumer GPU with off-the-shelf Python libraries.

  • MobileLLaMA 2.7B
  • QLoRA / PEFT
  • PyTorch · transformers
  • bitsandbytes 4-bit NF4
  • Playwright
  • aapt2 · androguard
  • Flask · Jinja2
  • SQLite
  • Bootstrap 5 · Chart.js
  • Python 3.11
  • Windows + WSL · bash
IX

The papers

Full project documentation, read on site or downloaded.

The four formal write-ups behind the project. Click through for the full text rendered in the site's reader, or grab the original PDFs.

№1

Preliminary Research

Background research carried out before the project began — a survey of the legal, technical, and policy landscape that motivated PrivacyTotal. Separate from the final Research Report; this is the groundwork that shaped the project's scope.

Read preliminary research
№2

Project Specification

The opening proposal: problem framing, motivation from the 2025 RTÉ Prime Time disclosure, literature review, proposed system architecture, deliverables, technical constraints, and sources.

Read specification
Oct. 2025 HTML only
№3

Project Report

End-to-end engineering report: system design, APK acquisition pipeline, model training methodology, inference fixes, web application, database design, and a full testing and evaluation chapter.

Read report
№4

Research Report

Empirical study of 117 Play Store apps: corpus construction, permission-level coverage statistics, Privacy Health Score distribution, and case studies. Published in full on site.

Read research
X

Correspondence

Questions, feedback, or collaboration?

PrivacyTotal is a Final Year Project, so the work is ongoing and feedback is welcome. Reach out by email or LinkedIn.

Email
omarramadan21@gmail.com
LinkedIn
in/omar-ramadan21