An OTA update pipeline that can push firmware to your entire device fleet simultaneously is, from an attacker's perspective, the single most valuable target in your IoT infrastructure. A compromised update pipeline is not an incident that affects one device — it is an incident that affects every device. The 2016 Mirai botnet propagated through default credentials, but a compromised update server executing the same attack would not need to find devices with weak passwords. It would just push malicious firmware to every unit that checks in.
This article covers the full architecture of a secure OTA update pipeline: signing keys and key management, update manifests, secure delivery, verification on the device side, and anti-rollback mechanisms. The focus is on the complete chain from build output to device flash, not just the signing step in isolation.
The Threat Model: What You're Defending Against
Before specifying security controls, be explicit about the threats. A secure OTA architecture must address at least four attack classes:
Firmware tampering: An attacker modifies a legitimate firmware image in transit or at rest in the update distribution storage, and the device applies the modified image. Defense: cryptographic signature verification on the device before applying any update.
Supply chain compromise of the signing key: An attacker obtains the code signing private key — via CI/CD system compromise, developer workstation breach, or insider threat — and signs malicious firmware with the legitimate key. Defense: HSM-based signing key storage, separation of signing credentials from build artifacts, audit logs of every signing operation.
Rollback attacks: An attacker replays a legitimate older firmware image (with a valid signature) to downgrade the device to a version with known vulnerabilities. Defense: monotonic version counters in tamper-resistant storage, minimum version enforcement in update manifests.
Partial update / bricking attacks: An attacker (or a network fault) interrupts an update mid-write, leaving the device in an unbootable state — effectively a denial-of-service attack on the device's availability. Defense: A/B partition scheme with atomic swap, watchdog-triggered recovery to the previous valid image.
We're not saying these attacks are equally likely — in most deployments, supply chain compromise of the signing key is the highest-probability high-impact scenario. But partial update resilience is a safety property (a bricked field device may have physical consequences in industrial environments) and rollback prevention is increasingly required by IEC 62443 and automotive cybersecurity standards.
The Signing Key Architecture
The code signing key hierarchy for OTA updates should mirror the CA hierarchy used for device certificates: a root signing key, rarely used and air-gapped, and one or more operational signing keys for routine firmware releases.
Root code signing key: Generated in a FIPS 140-2 Level 3 HSM or equivalent. The public key corresponding to this root is burned into device non-volatile memory (eFuse or OTP region) at manufacturing. All operational signing keys are signed by the root, so the device can verify that an operational key is authorized by checking its signature against the burned-in root public key. The root key is used only to sign new operational keys, to revoke old operational keys, and to recover from a compromise.
Operational signing key: Used by the CI/CD pipeline for each firmware release. It should be stored in hardware that the build system reaches only through a signing API (an HSM, a PKCS#11-compatible token, or a cloud service such as AWS CloudHSM or Google Cloud HSM), so the private key never appears in the build environment. The operational key should have a defined validity period (6-24 months is common) and a rotation plan that leaves time for the new public key to reach devices before the old key expires.
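The two-level hierarchy above implies a chain check on the device: root key endorses operational key, operational key signs firmware. A minimal sketch follows, assuming a hypothetical `OperationalKeyCert` record and a `verify(public_key, message, signature)` callable supplied by the platform's crypto library (for example an Ed25519 or ECDSA P-256 verifier). It also assumes the device has a trustworthy clock, which many constrained devices do not.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical verifier type: verify(public_key, message, signature) -> bool.
# On a real device this wraps the platform's Ed25519 or ECDSA implementation.
Verifier = Callable[[bytes, bytes, bytes], bool]


@dataclass(frozen=True)
class OperationalKeyCert:
    """Hypothetical record binding an operational key to a validity window."""
    public_key: bytes       # operational signing public key
    not_before: int         # validity window, Unix seconds
    not_after: int
    root_signature: bytes   # root key's signature over to_be_signed()

    def to_be_signed(self) -> bytes:
        # Fixed-width integers first, so the signed bytes are unambiguous.
        return (self.not_before.to_bytes(8, "big")
                + self.not_after.to_bytes(8, "big")
                + self.public_key)


def firmware_is_authorized(root_public_key: bytes, cert: OperationalKeyCert,
                           image: bytes, image_signature: bytes,
                           now: int, verify: Verifier) -> bool:
    # 1. The operational key must be endorsed by the burned-in root key.
    if not verify(root_public_key, cert.to_be_signed(), cert.root_signature):
        return False
    # 2. The operational key must be inside its validity period.
    if not (cert.not_before <= now <= cert.not_after):
        return False
    # 3. The image must be signed by that operational key.
    return verify(cert.public_key, image, image_signature)
```

Revocation of an operational key is not modeled here; in practice it needs its own signed message from the root key and a place on the device to record it.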
Algorithm choice: ECDSA with P-256 is the common choice for constrained embedded devices: its keys and signatures are much smaller than RSA-2048's, and many current-generation MCUs accelerate it in hardware. Ed25519 is preferred for new designs because signing is deterministic (no per-signature randomness, so a weak random number generator cannot leak the private key) and key sizes are similar. RSA-2048 should not be chosen for new work.
Update Manifests: What They Must Contain
A firmware image signature alone is not sufficient for a secure OTA system. The update manifest — a separate signed document that accompanies the firmware — carries the metadata that the device uses to determine whether to apply the update. The manifest must be signed by the same key as the firmware (or by a dedicated manifest signing key that the device trusts), and the manifest signature must be verified before any other processing.
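The rule that the manifest signature is verified before any other processing can be sketched as "verify the raw bytes, then parse". In this sketch JSON stands in for a real manifest encoding, and `verify` is a hypothetical callable wrapping whatever signature algorithm the device uses; `open_manifest` and `ManifestRejected` are illustrative names, not part of any library.

```python
import json
from typing import Callable

# Hypothetical verifier type: verify(public_key, message, signature) -> bool.
Verifier = Callable[[bytes, bytes, bytes], bool]


class ManifestRejected(Exception):
    """Raised when a manifest must not be processed any further."""


def open_manifest(payload: bytes, signature: bytes,
                  trusted_key: bytes, verify: Verifier) -> dict:
    # Check the signature over the exact bytes received. The parser never
    # sees attacker-controlled input that has not been authenticated.
    if not verify(trusted_key, payload, signature):
        raise ManifestRejected("bad signature")
    try:
        manifest = json.loads(payload)
    except ValueError as exc:
        raise ManifestRejected("signed but malformed") from exc
    if not isinstance(manifest, dict):
        raise ManifestRejected("manifest is not an object")
    return manifest
```

Verifying the raw bytes rather than a re-serialized structure avoids canonicalization mismatches between signer and device.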
Required fields in an OTA manifest for a production deployment:
- Firmware image hash (SHA-256 or SHA-384): binds the manifest to one specific image. The manifest signature covers only the manifest, so the device must still check the downloaded image against this hash.
- Firmware version number: the version of the firmware in this update.
- Minimum device version: the minimum firmware version the device must be running before this update can be applied. Used to enforce update sequences (e.g., a migration step must be applied before the main update).
- Anti-rollback version (security counter value): the security counter value of this update. The device rejects the update if its stored counter is higher than the manifest's value, because that means the device already has newer security fixes.
- Target device model and hardware revision: prevents cross-model application. A device should reject an update whose manifest names a different model or hardware revision.
- Update expiry timestamp: optional but useful. The update package is valid only until this date, which prevents replay of old-but-valid update packages years later.
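The field list above maps naturally onto a small record plus a policy check. The sketch below is illustrative only: the field names, the use of plain integers for versions, and the reason strings are assumptions, and it presumes the manifest signature has already been verified.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Manifest:
    image_sha256: str                 # hex digest of the firmware image
    version: int                      # firmware version in this update
    min_device_version: int           # device must already run at least this
    security_counter: int             # anti-rollback value of this update
    model: str
    hw_revision: str
    expires_at: Optional[int] = None  # Unix seconds; None means no expiry


@dataclass(frozen=True)
class DeviceState:
    model: str
    hw_revision: str
    running_version: int
    security_counter: int             # read from tamper-resistant storage


def reject_reason(m: Manifest, dev: DeviceState, now: int) -> Optional[str]:
    """Return None if the update may proceed, else a loggable reason."""
    if (m.model, m.hw_revision) != (dev.model, dev.hw_revision):
        return "wrong_target"
    if dev.running_version < m.min_device_version:
        return "prerequisite_missing"
    if m.security_counter < dev.security_counter:
        return "rollback"
    if m.expires_at is not None and now > m.expires_at:
        return "expired"
    return None
```

Returning a distinct reason per check keeps field failures diagnosable from device logs.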
Manifest formats: SUIT (Software Updates for Internet of Things) is the emerging IETF standard for constrained IoT devices; its manifest information model is published as RFC 9124. SUIT manifests are encoded in CBOR (Concise Binary Object Representation, a compact binary format) and signed with COSE (CBOR Object Signing and Encryption), roughly the constrained-device equivalents of JSON and JWS. For Linux-based devices, TUF (The Update Framework) is well-established with a rich ecosystem of tooling.
Scenario: Smart Building Controller, UK, Early 2024
A building automation company operating a fleet of approximately 14,000 smart HVAC controllers across commercial properties in the UK discovers a buffer overflow vulnerability in their MQTT client library. The vulnerability is network-exploitable — an attacker sending a malformed MQTT PUBLISH packet to a device on the building network can achieve code execution. The fix requires a firmware update to all 14,000 units within a 14-day window specified by the building operator's incident response SLA.
The OTA pipeline had been set up with firmware signing but no signed manifests and no rollback prevention. Worse, the firmware build system held the signing key as a plaintext secret in a CI environment variable, and the same vulnerability report had flagged the CI environment as potentially compromised. Before the security patch could be released, the team had to rotate the signing key: generate a new key pair on a YubiKey FIPS, embed the new public key in a transition firmware image as a trusted update key, ship that transition update (signed by the old key, and changing nothing except the trusted key), and only then ship the security patch (signed by the new key).
Total elapsed time from vulnerability disclosure to all 14,000 units patched: 19 days, exceeding the 14-day SLA. The key rotation added 5 days. If the key management architecture had been correct from the start — HSM-based operational key, not CI environment variable — the update would have shipped in the target window. The building automation company subsequently invested in key management infrastructure before the next product line shipped.
Secure Delivery: Beyond HTTPS
TLS with server certificate validation is the baseline transport for OTA firmware delivery. But several additional controls matter in production:
Certificate pinning on the update endpoint: The device should pin the expected server certificate or issuing CA for the update endpoint rather than rely solely on the system trust store. This prevents man-in-the-middle attacks that use a certificate mis-issued by any other CA in the trust store.
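One way to pin the leaf certificate is to compare the SHA-256 fingerprint of the certificate the server presents against a value shipped in firmware. The sketch below uses Python's standard `ssl` module; the function names and the idea of storing the pin as a hex string are assumptions for illustration, and normal chain and hostname validation are left enabled so the pin is an additional check, not a replacement.

```python
import hashlib
import hmac
import socket
import ssl


def fingerprint_matches(der_cert: bytes, pinned_sha256_hex: str) -> bool:
    """Compare the SHA-256 of a DER-encoded certificate with the pinned value."""
    actual = hashlib.sha256(der_cert).hexdigest()
    return hmac.compare_digest(actual, pinned_sha256_hex.lower())


def open_pinned(host: str, port: int, pinned_sha256_hex: str) -> ssl.SSLSocket:
    """Open a TLS connection and refuse it unless the leaf certificate is pinned."""
    ctx = ssl.create_default_context()   # chain and hostname checks stay on
    sock = socket.create_connection((host, port), timeout=10)
    try:
        tls = ctx.wrap_socket(sock, server_hostname=host)
    except Exception:
        sock.close()
        raise
    der = tls.getpeercert(binary_form=True)
    if der is None or not fingerprint_matches(der, pinned_sha256_hex):
        tls.close()
        raise ssl.SSLError("update server certificate does not match pin")
    return tls
```

Pinning a leaf certificate means every server certificate renewal needs a matching pin update on devices, so fleets often pin the issuing CA or carry a backup pin instead.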
Manifest delivery separated from image delivery: Fetch and verify the manifest first. If the manifest signature is invalid — stop. Don't download the firmware image. This saves bandwidth (firmware images are typically 1-10 MB; manifests are a few kilobytes) and fails fast when an attacker is trying to push fake updates.
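The manifest-first flow can be sketched as a function that never requests the image unless the manifest was accepted, and that hashes the image while streaming it to storage. All callables here (`fetch_manifest`, `manifest_is_trusted`, `fetch_image`, `write`, `erase`) are hypothetical hooks standing in for the device's network and flash layers.

```python
import hashlib
import hmac
from typing import Callable, Iterable


def stage_image(chunks: Iterable[bytes], expected_sha256_hex: str,
                write: Callable[[bytes], None]) -> bool:
    """Stream chunks to storage while hashing; True if the digest matches."""
    digest = hashlib.sha256()
    for chunk in chunks:
        digest.update(chunk)
        write(chunk)
    return hmac.compare_digest(digest.hexdigest(), expected_sha256_hex)


def run_update(fetch_manifest: Callable[[], dict],
               manifest_is_trusted: Callable[[dict], bool],
               fetch_image: Callable[[], Iterable[bytes]],
               write: Callable[[bytes], None],
               erase: Callable[[], None]) -> str:
    manifest = fetch_manifest()
    if not manifest_is_trusted(manifest):
        return "manifest_rejected"       # the image is never requested
    if not stage_image(fetch_image(), manifest["image_sha256"], write):
        erase()                          # do not leave unverified bytes staged
        return "hash_mismatch"
    return "staged"
```

Hashing during the download avoids a second pass over flash, which matters on devices where the image does not fit in RAM.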
Rate limiting and authorization at the update server: Devices requesting updates at abnormal frequency may indicate a compromised device used as a probe. Rate limiting per device serial, combined with device certificate authentication, gates update access to enrolled devices only.
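Per-device rate limiting is often implemented as a token bucket keyed by serial number. The sketch below is one such limiter with made-up default limits; it assumes the serial has already been taken from the device's authenticated client certificate, not from a request field the device can choose freely.

```python
from typing import Dict, Tuple


class UpdateRateLimiter:
    """Token bucket per device serial (illustrative defaults, not a recommendation)."""

    def __init__(self, capacity: float = 4.0,
                 refill_per_second: float = 1 / 3600) -> None:
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        # serial -> (tokens remaining, time of last request)
        self._buckets: Dict[str, Tuple[float, float]] = {}

    def allow(self, serial: str, now: float) -> bool:
        tokens, last = self._buckets.get(serial, (self.capacity, now))
        # Refill for the time elapsed since the last request, up to capacity.
        tokens = min(self.capacity,
                     tokens + (now - last) * self.refill_per_second)
        allowed = tokens >= 1.0
        if allowed:
            tokens -= 1.0
        self._buckets[serial] = (tokens, now)
        return allowed
```

A denied request is a useful signal in its own right: logging the serial and timestamp gives the fleet operator a list of devices to investigate.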
Delta updates for bandwidth-constrained devices: For cellular-connected devices, binary delta updates can reduce transfer size substantially, often by 60-90%. The device must verify the result of applying the delta against the manifest's expected final image hash before committing.
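The key property of a delta update is that the reconstructed image, not the delta, is what gets checked against the manifest. The sketch below uses a deliberately simple, hypothetical delta format (a list of `copy` and `insert` operations); real deployments use a binary diff tool such as bsdiff, but the final verification step looks the same.

```python
import hashlib
import hmac
from typing import Optional, Sequence, Tuple


def apply_delta(old: bytes, ops: Sequence[Tuple]) -> bytes:
    """Rebuild an image from ('copy', offset, length) and ('insert', data) ops."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            _, offset, length = op
            if offset < 0 or length < 0 or offset + length > len(old):
                raise ValueError("copy outside the old image")
            out += old[offset:offset + length]
        elif op[0] == "insert":
            out += op[1]
        else:
            raise ValueError("unknown delta operation")
    return bytes(out)


def reconstruct_verified(old: bytes, ops: Sequence[Tuple],
                         expected_sha256_hex: str) -> Optional[bytes]:
    """Return the new image only if it matches the manifest's final hash."""
    try:
        candidate = apply_delta(old, ops)
    except ValueError:
        return None
    actual = hashlib.sha256(candidate).hexdigest()
    if not hmac.compare_digest(actual, expected_sha256_hex):
        return None   # wrong base image or tampered delta: do not commit
    return candidate
```

A hash mismatch here also catches the benign case where the device's current image is not the base version the delta was built against.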
Device-Side Verification Sequence
The correct device-side verification sequence, from first byte received to firmware execution:
1. Download manifest to RAM. Verify the manifest signature against the trusted signing public key stored on the device. If invalid: abort, delete the downloaded manifest, and optionally flag the endpoint as suspect.
2. Download the firmware image to the inactive partition (A/B scheme) or staging area, computing a running hash during the download.
3. After the download completes, compare the computed hash against the manifest's firmware hash. On mismatch: abort, erase the staging area, do not proceed.
4. Verify manifest fields: target device model, minimum version, anti-rollback counter. If any check fails: abort and log a specific error code.
5. Mark the new partition as pending reboot (not yet active). Reboot.
6. Bootloader verifies the pending partition's signature before transferring execution. If valid: switch active partition, increment security counter. If invalid: revert to previous partition.
7. Application boots and runs a self-test. If the self-test passes: confirm the new firmware as valid and finalize the partition switch. If it fails: revert to the previous partition on the next reboot (watchdog-triggered).
Step 6 (bootloader verification) and step 7 (application confirmation) are both required. Bootloader verification ensures the image hasn't been corrupted between download and reboot. Application confirmation ensures the image actually runs correctly on this specific device — important for updates that have hardware-configuration dependencies.
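The interplay between pending, trial, and confirmed states in steps 5 through 7 is easier to see as a small state machine. The sketch below simulates it in memory; `BootState`, `mark_pending`, `bootloader_select`, and `confirm` are hypothetical names, and a real bootloader would keep this state in flash and check signatures with its own crypto code (here reduced to a `signature_ok` callable).

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class BootState:
    active: str = "A"
    previous: Optional[str] = None   # slot to fall back to during a trial
    pending: Optional[str] = None    # slot marked for the next boot
    pending_counter: int = 0         # security counter of the pending image
    unconfirmed: bool = False        # True while a trial image is running
    security_counter: int = 0


def mark_pending(state: BootState, image_counter: int) -> None:
    """Step 5: the updater marks the inactive slot for the next boot."""
    state.pending = "B" if state.active == "A" else "A"
    state.pending_counter = image_counter


def bootloader_select(state: BootState,
                      signature_ok: Callable[[str], bool]) -> str:
    """Step 6, plus the revert path of step 7. Returns the slot to boot."""
    if state.unconfirmed:
        # The last trial boot never confirmed (crash or watchdog reset).
        state.active, state.previous = state.previous, None
        state.unconfirmed = False
    elif state.pending is not None:
        slot, state.pending = state.pending, None
        if signature_ok(slot):
            state.previous, state.active = state.active, slot
            state.unconfirmed = True
            state.security_counter = max(state.security_counter,
                                         state.pending_counter)
    return state.active


def confirm(state: BootState) -> None:
    """Step 7: the application calls this after its self-test passes."""
    state.unconfirmed = False
    state.previous = None
```

This sketch advances the counter at bootloader switch time, as the sequence above describes. Some designs defer that write until confirmation so that a failed trial leaves the counter untouched; which is appropriate depends on whether the bootloader also enforces the counter at boot.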
Key Rotation Across a Live Fleet
When an OTA signing key must be rotated, whether due to expiry, suspected compromise, or algorithm migration, the rotation has to be executed across a live fleet that may have tens of thousands of devices with intermittent connectivity. The procedure:
Devices must trust both the old key and the new key during the transition period. This requires a trusted key list on each device, not just a single hard-coded public key. The key list update is itself a signed OTA operation — signed by the old key — that adds the new key to the trusted set. Once the key list update has propagated to a high enough fraction of the fleet (95-99%, tracked in the certificate/key management system), the old key can be removed from the trust list via a follow-up update signed by the new key.
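The trusted key list and its two signed operations can be sketched as follows. `TrustedKeys` is a hypothetical class, the `b"add:"` and `b"remove:"` message prefixes are an invented wire format, and `verify` stands in for the device's real signature check. A production format would also bind each operation to a sequence number so that old key list messages cannot be replayed.

```python
from typing import Callable, Optional, Set

# Hypothetical verifier type: verify(public_key, message, signature) -> bool.
Verifier = Callable[[bytes, bytes, bytes], bool]


class TrustedKeys:
    def __init__(self, initial_key: bytes, verify: Verifier) -> None:
        self._keys: Set[bytes] = {initial_key}
        self._verify = verify

    def trusts(self, key: bytes) -> bool:
        return key in self._keys

    def _signed_by_trusted(self, message: bytes, signature: bytes,
                           exclude: Optional[bytes] = None) -> bool:
        return any(self._verify(k, message, signature)
                   for k in self._keys if k != exclude)

    def add_key(self, new_key: bytes, signature: bytes) -> bool:
        """Accept a new key only if a currently trusted key signed the request."""
        if self._signed_by_trusted(b"add:" + new_key, signature):
            self._keys.add(new_key)
            return True
        return False

    def remove_key(self, old_key: bytes, signature: bytes) -> bool:
        """Remove a key only on the authority of a different, remaining key."""
        if (old_key in self._keys and len(self._keys) > 1
                and self._signed_by_trusted(b"remove:" + old_key, signature,
                                            exclude=old_key)):
            self._keys.discard(old_key)
            return True
        return False
```

Refusing to remove the last key, and refusing removals signed by the key being removed, keeps a single mistaken or malicious message from leaving the device with no way to verify future updates.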
For devices that have not received the key list update before the old key expires: these devices will be unable to verify updates signed by the new key. Recovery requires either an out-of-band channel (cellular management plane, physical access) or a fail-safe re-enrollment mode. Plan the transition timeline with enough lead time that the percentage of unreachable devices is known and acceptable before the old key is retired.
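Deciding when the old key can be retired comes down to a coverage calculation over fleet records. A minimal sketch, assuming a hypothetical `DeviceRecord` that the key management system fills in from device check-ins, and using the lower end of the 95-99% range as the default threshold:

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class DeviceRecord:
    serial: str
    has_new_key: bool   # device has reported a key list containing the new key


def rotation_readiness(fleet: Iterable[DeviceRecord],
                       threshold: float = 0.95
                       ) -> Tuple[float, List[str], bool]:
    """Return (coverage, serials still lacking the new key, safe to retire)."""
    records = list(fleet)
    if not records:
        return 0.0, [], False
    stranded = [r.serial for r in records if not r.has_new_key]
    coverage = (len(records) - len(stranded)) / len(records)
    return coverage, stranded, coverage >= threshold
```

The list of stranded serials is the input to the out-of-band recovery plan: it tells the operator how many devices will need physical access or a management-plane push if the old key is retired today.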
The OTA update security chain — signing key architecture, manifest integrity, secure delivery, device verification, anti-rollback, and key rotation — is a complete system. Each piece depends on the others, and a gap in any one element degrades the whole. Getting it right requires treating OTA security as a first-class design requirement, not a retrofit.