Firmware Signing and Secure Boot: A Practical Guide

Unsigned firmware is the default state of most IoT devices built before 2020. The reasoning was usually pragmatic: "We're a hardware company, not a security company. The device is on a private network. Nobody is going to replace our firmware." Then came Mirai, then URGENT/11, then a series of industrial SCADA exploits, and together they showed that "private network" is a much weaker perimeter than anyone assumed. Today, shipping unsigned firmware to any connected device is an engineering decision that requires explicit justification, not the default path.

This guide covers the full signing chain: the keys, the boot verification sequence, OTA update signing, and rollback prevention — with enough implementation detail to be actionable for embedded teams working with ARM Cortex-M or similar architectures.

The Signing Key Hierarchy

Firmware signing uses asymmetric cryptography: a private signing key held by the firmware publisher, and a corresponding public verification key embedded in the device. The device only needs the public key; it never has access to the private key. This asymmetry is fundamental: if verification required the private key, a compromised device would expose the ability to sign arbitrary firmware.

In practice, most production systems use at least two levels of signing keys:

Root signing key (code signing root) — generated in an HSM, air-gapped, used only to sign intermediate signing keys or for emergency re-keying. This key should never touch a general-purpose workstation. The ceremony for generating and escrowing this key should be documented, witnessed, and auditable — similar to a CA root key ceremony.

Operational signing key — an intermediate key signed by the root, used in the CI/CD pipeline for routine firmware builds. This key can be automated (hardware token or HSM with API access) without compromising the root. When the operational key is rotated, the root key signs the new operational key and the updated public key is pushed to devices via an OTA update before the old operational key expires.
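The two-level relationship can be sketched as a verification routine. This is a minimal sketch under stated assumptions, not a real certificate format: KeyCert, verify_image_chain, and the verify callable are hypothetical names, and verify stands in for whatever signature primitive the platform actually provides.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical signature primitive: (public_key, message, signature) -> bool.
# On a real device this would wrap the platform's Ed25519/ECDSA/RSA verifier.
Verify = Callable[[bytes, bytes, bytes], bool]


@dataclass(frozen=True)
class KeyCert:
    """Hypothetical minimal 'certificate' for an operational key."""
    public_key: bytes  # the operational public key
    signature: bytes   # root key's signature over public_key


def verify_image_chain(
    root_public_key: bytes,
    cert: KeyCert,
    image: bytes,
    image_signature: bytes,
    verify: Verify,
) -> bool:
    """Accept an image only if root -> operational key -> image all verify."""
    # Step 1: the root key must vouch for the operational key.
    if not verify(root_public_key, cert.public_key, cert.signature):
        return False
    # Step 2: the vouched-for operational key must have signed the image.
    return verify(cert.public_key, image, image_signature)
```

The device only ever holds root_public_key (or its hash); the operational key arrives alongside the image, which is what lets it be rotated without touching the root of trust.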

Many embedded teams start with a single signing key loaded on a developer's laptop. This works fine for prototypes. For production at more than a few hundred units, the single-key model creates an unacceptable single point of failure: one lost laptop or one compromised CI server is enough to expose your signing key.

Secure Boot: The Chain of Trust

Secure boot is the runtime verification mechanism that enforces firmware signing during device startup. The chain of trust works from the innermost trust anchor outward:

ROM bootloader (Stage 0) — code burned into read-only memory at silicon manufacture time, which cannot be modified in the field. This code's only job is to verify the next stage. It contains the public key used to verify the Stage 1 bootloader, or a hash of that key, depending on MCU architecture. On Cortex-M33-class devices with TrustZone running Trusted Firmware-M, this corresponds to the BL1 stage. On NXP i.MX RT series, this is the ROM boot code that reads the HAB (High Assurance Boot) configuration from eFuses.

Stage 1 bootloader — the first piece of mutable code, but verified by Stage 0 before execution. This stage verifies the main application firmware's signature before passing control. Common implementations: MCUboot on Cortex-M, U-Boot with verified boot on application processors.

Application firmware — signed by the operational signing key, verified by Stage 1. If verification fails — signature mismatch, hash mismatch, invalid certificate chain — the bootloader refuses to execute the image and either halts or falls back to a recovery partition.

The chain is only as strong as its weakest link. A Stage 1 bootloader that doesn't actually verify the signature (a surprisingly common finding in production device audits — the verification call is present but the failure path falls through) provides no protection regardless of how well-designed the Stage 0 root of trust is.
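The fall-through failure mode is easiest to avoid by making "do not boot" the default and treating anything other than an explicit success as a failure. The following is a minimal sketch of that pattern, not MCUboot's actual control flow; BootAction, decide_boot, and the verify callable are hypothetical names.

```python
from enum import Enum
from typing import Callable


class BootAction(Enum):
    BOOT = "boot"
    RECOVERY = "recovery"
    HALT = "halt"


def decide_boot(
    image: bytes,
    signature: bytes,
    verify: Callable[[bytes, bytes], bool],
    recovery_available: bool,
) -> BootAction:
    """Return BOOT only when verification explicitly succeeds."""
    verified = False  # deny by default
    try:
        # Only the literal value True counts as success. None, a non-zero
        # status code, or any other truthy value must not boot the image.
        verified = verify(image, signature) is True
    except Exception:
        verified = False  # a crashing verifier is treated as a failed check

    if verified:
        return BootAction.BOOT
    # Every other path ends in recovery or halt; nothing falls through to boot.
    return BootAction.RECOVERY if recovery_available else BootAction.HALT
```

The design choice is that the only way to reach BOOT is through the single "if verified" branch, so a forgotten return or a swallowed error cannot accidentally hand control to an unverified image.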

Scenario: Cortex-M33 Production Device, 2023

Consider an industrial sensor manufacturer shipping a Cortex-M33-based gas detector with a 10-year field lifecycle. Their initial implementation used MCUboot with Ed25519 signing. The development team had implemented signature verification correctly: MCUboot's MCUBOOT_VALIDATE_PRIMARY_SLOT flag was set, the signing key was generated via imgtool keygen, and production images were signed with imgtool sign. So far, so good.

The audit finding was in the key management, not the code: the operational signing key was stored as a plaintext PEM in the CI system's secrets store and exposed as an environment variable to any pipeline that ran in the same CI organization. The fix required: generating a new signing key pair, loading the private key into a hardware token (in this case a YubiKey 5 FIPS), updating the CI pipeline to request signatures from the token (for example through its PKCS#11 interface) rather than reading a key file directly, burning the new public key hash into the eFuse of the next production batch, and issuing a signed OTA update so that field units trust the new public key in parallel.

The eFuse burn for the next batch was irreversible. The key rotation for field units required the old key to still be valid during the transition. The exercise took six weeks of engineering time. The lesson: key management architecture is not separable from secure boot architecture. Getting signing right at the start is vastly cheaper than rotating a burned-in public key hash across a production fleet.

OTA Update Signing

Over-the-air firmware updates are, from a signing perspective, firmware signed by the operational key and delivered to a device that verifies the signature before applying. But the delivery path introduces additional attack surfaces that pure secure boot doesn't address.

An OTA signing architecture must answer four questions:

Who signs what? The firmware image itself is signed by the code signing key. In addition, the update manifest, which contains the image hash, target device model, minimum acceptable version (for rollback prevention), and delivery metadata, should also be signed. Signing only the manifest is sound only if the device recomputes the image hash and compares it with the signed value, and only if that hash is collision-resistant (SHA-256 or stronger); with a weak hash, an attacker could substitute a colliding image under a valid manifest signature. Signing both the image and the manifest, or signing a manifest that covers a strong image hash, provides stronger integrity guarantees.

How does the device fetch the update? HTTPS to a known endpoint with server certificate pinning is the baseline. Unencrypted HTTP delivery of signed firmware is technically verifiable (the signature check still works) but leaks version information and device metadata to passive network observers — relevant for ICS/SCADA deployments where update activity patterns can be intelligence.

How does the device know the update is for its SKU? The signing key alone doesn't encode product targeting. A firmware image signed with a valid key but compiled for Model A will likely brick a Model B device. Update manifests should include explicit device model/hardware revision targeting, and devices should refuse to apply updates not intended for their model.

What happens if the update fails mid-write? A/B partition schemes (two complete firmware slots, atomic swap after verification) are the correct answer. Single-partition schemes with in-place overwrite have a power-loss failure mode: if power is lost during the write, the device may be unbootable. For devices in the field without physical access, an unbootable device is permanently bricked.
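The targeting, image-hash, and A/B answers above can be sketched together as a single staging routine. This is a simplified model with hypothetical names (Manifest, Device, stage_update): it assumes the manifest's signature has already been verified, and it models flash slots as in-memory byte strings.

```python
import hashlib
import hmac
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Manifest:
    image_sha256: str    # hex digest of the firmware image
    model: str           # target device model
    hw_revisions: tuple  # hardware revisions the image supports


@dataclass
class Device:
    model: str
    hw_revision: str
    slots: list = field(default_factory=lambda: [b"", b""])
    active: int = 0      # index of the slot the bootloader will run


def stage_update(device: Device, manifest: Manifest, image: bytes) -> bool:
    """Write to the inactive slot; switch slots only if every check passes."""
    # 1. Targeting: refuse images built for another model or board revision.
    if manifest.model != device.model:
        return False
    if device.hw_revision not in manifest.hw_revisions:
        return False

    # 2. Write to the inactive slot. The running image is never touched,
    #    so a failure from here on leaves the device bootable.
    inactive = 1 - device.active
    device.slots[inactive] = image

    # 3. Re-hash what was actually written and compare it with the
    #    (already signature-checked) manifest value.
    written = hashlib.sha256(device.slots[inactive]).hexdigest()
    if not hmac.compare_digest(written, manifest.image_sha256):
        return False

    # 4. The swap is the last step and the only one that changes boot behavior.
    device.active = inactive
    return True
```

On real hardware the final step would be a single flag or pointer update that the bootloader reads on the next reset; the sketch only shows the ordering of checks relative to that switch.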

Rollback Prevention

Rollback attacks — forcing a device to downgrade to an older, vulnerable firmware version — are a real threat in deployed fleets. If an attacker can replay an older signed firmware package (which has a valid signature from the genuine code signing key), they can undo security patches and re-expose known vulnerabilities.

Prevention mechanisms:

Monotonic version counter — a non-decreasing counter stored in tamper-resistant storage (eFuses or OTP memory, not flash). The bootloader verifies that the new image's security counter is greater than or equal to the stored value. After the new image boots successfully, the stored value is raised to match the image's counter and can never be lowered. MCUboot implements this as a security counter carried in the image's protected TLV area (set with --security-counter in imgtool) when hardware rollback protection is enabled.

Minimum version enforcement in manifest — the update manifest specifies a minimum version floor below which the device will not accept the update. This allows the fleet operator to declare "no device should run anything older than version 3.2.1" and enforce it at the distribution layer.
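The two mechanisms above can be sketched as a pair of checks. This is an illustrative model, not MCUboot's implementation: MonotonicCounter is a software stand-in for fuse- or OTP-backed storage, and accept_image, confirm_boot, and parse_version are hypothetical names. It assumes dotted numeric version strings such as "3.2.1".

```python
class MonotonicCounter:
    """Software stand-in for a fuse- or OTP-backed counter."""

    def __init__(self, value: int = 0) -> None:
        self._value = value

    @property
    def value(self) -> int:
        return self._value

    def advance_to(self, new_value: int) -> None:
        # Real hardware enforces this physically; here we enforce it in code.
        if new_value < self._value:
            raise ValueError("monotonic counter cannot decrease")
        self._value = new_value


def parse_version(text: str) -> tuple:
    """'3.2.1' -> (3, 2, 1), so that 3.10.0 compares greater than 3.2.1."""
    return tuple(int(part) for part in text.split("."))


def accept_image(
    image_counter: int,
    image_version: str,
    counter: MonotonicCounter,
    min_version: str,
) -> bool:
    """Reject images older than the stored counter or below the fleet floor."""
    if image_counter < counter.value:
        return False
    return parse_version(image_version) >= parse_version(min_version)


def confirm_boot(image_counter: int, counter: MonotonicCounter) -> None:
    """Call only after the new image has booted successfully."""
    if image_counter > counter.value:
        counter.advance_to(image_counter)
```

Advancing the counter only after a confirmed boot keeps a bad update from locking the device out of its previous, still-working image.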

We're not saying rollback prevention is free. Monotonic counters consume eFuse bits, which are a finite, one-time-write resource. Planning the counter bit depth (typically 16-32 bits for long-lifecycle devices) and increment granularity (per-build vs per-release) is an early design decision that cannot be easily revisited after silicon is manufactured.
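To make the budget concrete, here is a small sketch that assumes a unary encoding, where each increment burns one more fuse bit. That encoding is a common way to build a counter out of one-time-programmable bits, but it is an assumption here and not something every MCU uses; the function names are hypothetical.

```python
def counter_value(fuse_word: int) -> int:
    """Unary encoding: the counter value is the number of burned (1) bits."""
    return bin(fuse_word).count("1")


def burn_next(fuse_word: int, width: int) -> int:
    """Burn the lowest unburned bit; raise once the fuse bank is exhausted."""
    for bit in range(width):
        mask = 1 << bit
        if not fuse_word & mask:
            return fuse_word | mask
    raise RuntimeError("fuse counter exhausted")


def increments_per_year(width: int, lifecycle_years: int) -> float:
    """Average number of counter bumps the fuse budget allows per year."""
    return width / lifecycle_years
```

Under this assumption, a 32-bit fuse bank on a device with a 10-year lifecycle allows 32 / 10 = 3.2 counter increments per year on average. That is why bumping the counter per security-relevant release, and not per build, is usually the only workable granularity.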

Key Rotation Over a Multi-Year Device Lifecycle

A device shipped today may be in service in 2035. The code signing key used to sign its original firmware needs to either remain valid for the entire lifecycle, or the device needs a mechanism to accept a new signing key. The latter is strongly preferred — cryptographic agility (the ability to update algorithms and keys) is increasingly a requirement in standards like IEC 62443-4-2.

The practical implementation: the device stores a small set of trusted public keys (or their hashes, if the MCU has constrained storage). An OTA update that introduces a new signing key is itself signed by the current operational key, updating the trusted key set. After the key set update is deployed and confirmed, the old operational key can be retired. The device then only accepts images signed by the new key.

This key rotation ceremony requires maintaining overlap: the old key must still be valid during the period when the key-rotation OTA is being deployed to the fleet. For a fleet with variable connectivity — field sensors that report in once a day, industrial devices in areas with intermittent network coverage — "deployed to the fleet" may take weeks. Plan for the overlap period explicitly, not as an afterthought.
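The trusted key set and its overlap period can be sketched as follows. This is a simplified model with hypothetical names (TrustedKeys, key_id): it stores SHA-256 hashes of public keys, and it assumes the signature on each key-update package has already been verified, with signer_public_key being the key that verified it.

```python
import hashlib


def key_id(public_key: bytes) -> str:
    """Identify a key by the SHA-256 hash of its public key bytes."""
    return hashlib.sha256(public_key).hexdigest()


class TrustedKeys:
    """Set of public key hashes the device accepts firmware signatures from."""

    def __init__(self, initial_public_key: bytes) -> None:
        self._ids = {key_id(initial_public_key)}

    def trusts(self, public_key: bytes) -> bool:
        return key_id(public_key) in self._ids

    def add_key(self, new_public_key: bytes, signer_public_key: bytes) -> bool:
        """Accept a new key only if the update came from an already trusted key."""
        if not self.trusts(signer_public_key):
            return False
        self._ids.add(key_id(new_public_key))
        return True

    def retire_key(self, old_public_key: bytes) -> bool:
        """Remove a key, but never the last one, so the device stays updatable."""
        old = key_id(old_public_key)
        if old not in self._ids or len(self._ids) == 1:
            return False
        self._ids.remove(old)
        return True
```

Between add_key and retire_key the device trusts both keys, which is the overlap period. Refusing to retire the last remaining key is a guard against a rotation mistake that would leave the device unable to accept any update.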

CI/CD Integration

Production signing should be integrated into the build pipeline at the image finalization step, not as a manual post-build operation. The signing service — whether an HSM API, a hardware token invoked via PKCS#11, or a cloud KMS call — should be called by the build system, with the signed image and manifest being the build artifacts that proceed to QA and release. Unsigned images should never be deployed to production, and the pipeline should enforce this by refusing to push unsigned images to the OTA distribution endpoint.
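The "refuse to push unsigned images" rule can be sketched as a gate in front of the upload step. Artifact, UnsignedArtifactError, and publish are hypothetical names, and verify and upload are placeholders for the pipeline's real signature check and distribution client.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Artifact:
    name: str
    image: bytes
    signature: Optional[bytes] = None  # None means the signing step never ran


class UnsignedArtifactError(Exception):
    """Raised when an artifact must not reach the OTA distribution endpoint."""


def publish(
    artifact: Artifact,
    verify: Callable[[bytes, bytes], bool],
    upload: Callable[[Artifact], None],
) -> None:
    """Upload only artifacts whose signature is present and verifies."""
    if not artifact.signature:
        raise UnsignedArtifactError(f"{artifact.name}: no signature attached")
    # Re-verify with the public key instead of trusting that signing succeeded.
    if verify(artifact.image, artifact.signature) is not True:
        raise UnsignedArtifactError(f"{artifact.name}: signature did not verify")
    upload(artifact)
```

Raising an exception, as opposed to logging and continuing, makes the pipeline job fail visibly, so an unsigned image cannot slip through as a warning.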

Build system hygiene: signing logs (which key, which key version, which image hash, which build number) should be preserved and auditable. If a firmware vulnerability is discovered post-release, the ability to answer "which devices were updated with which firmware, signed by which key, and when" is operationally valuable for both engineering and compliance purposes.
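A signing log entry covering the fields listed above might look like the following sketch. The field names and the signing_log_entry helper are hypothetical; the output is one JSON object per line, which makes the log easy to append to and to search later.

```python
import hashlib
import json
from datetime import datetime, timezone
from typing import Optional


def signing_log_entry(
    image: bytes,
    key_id: str,
    key_version: int,
    build_number: int,
    signed_at: Optional[datetime] = None,
) -> str:
    """Return one JSON line recording which key signed which image, and when."""
    when = signed_at or datetime.now(timezone.utc)
    record = {
        "build_number": build_number,
        "image_sha256": hashlib.sha256(image).hexdigest(),
        "key_id": key_id,
        "key_version": key_version,
        "signed_at": when.isoformat(),
    }
    # sort_keys gives a stable field order, which keeps diffs and greps simple.
    return json.dumps(record, sort_keys=True)
```

Recording the image hash, and not only the build number, ties the log entry to the exact bytes that were signed, which is what an incident investigation needs when matching field devices to releases.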