An e-mail discussion related to my recent post on IT security has motivated me to ponder about issues with Public Key Infrastructure once more. So I attempt – most likely in vain – to merge a pop-sci introduction to certificates with sort of an attachment to said e-mail discussion.
So this post might be opaque to normal users and too epic and introductory for security geeks. I apologise for the inconvenience.
I mentioned the failed governmental PKI pilot project in that post – a hardware security device destroying the key and there was no backup. I would have made fun of this – hadn’t I experienced it often that it is the so-called simple processes and logistics that can go wrong.
When compiling the following I had in mind what I call infrastructure PKIs – company-internal systems to be used mainly for internal purposes and very often for use by devices rather than by humans. (Ah, the internet of things.)
Issues often arise due to a combination of the following:
- Human project resources assigned to such projects are often limited.
- Many applications simply demand certificates so you need to create them.
Since the best way to understand certificates is probably by comparing them to passports or driver licenses I will nonetheless use one issued to me as a human life-form:
Alternatives to Hardware Security Modules
An HSM is protecting that sacred private key of the certification authority. It is often a computer, running a locked-down version of an operating system, and it is equipped with sensors that detect and attempt to access the key store physically – it should actually destroy the key rather than having an attacker gain access to it.
It allows for implementing science-fiction-style (… Kirk Alpha 2 … Spock Omega 3 …) split administration and provides strong key protection that cannot be provided if the private key is stored in software – somewhere on the hard disk of the machine running the CA.
Modern HSMs have become less cryptic in terms of usage but still: It is a hardware device not used on a daily basis, and requires additional training and management. Storage of physical items like the keys for unlocking the device and the corresponding password(s) is a challenge as is keeping the know-how of admins up to date.
Especially for infrastructure CAs I propose a purely organizational split administration for offline CAs such as a Root CA: Storing the key in software, but treating the whole CA machine as a device to be protected physically. You could store the private key of the Root CA or the virtual machine running the Root CA server on removable media (and at least one backup). The “protocol” provides spilt administration: E.g. one party has the key to the room, the other party has the password to decrypt the removable medium. Or the unencrypted medium is stored in a location protected by a third party – which in turn does only allow two persons to enter the room together.
But before any split administration is applied an evaluation of risks it should be made sure that the overall security strategy does not look like this:You might have to question the holy order (hierarchy) and the security implemented at the lowest levels of CA hierarchies.
Hierarchies and Security
In the simplest case a certification authority issues certificates to end-entities – users or computers. More complex PKIs consist of hierarchies of CAs and thus tree-like structures. The theoretical real-world metaphor would be an agency issuing some license to a subordinate agency that issues passports to citizens.
The Root CA at the top of the hierarchy should be the most secure as if it is compromised (that is: it’s private key has – probably – been stolen) all certificates issued somewhere in the tree should be invalidated.
However, this logic only makes sense:
- if there is or will with high probability be at least a second Issuing CA – otherwise the security of the Issuing CA is as important as that of the Root CA.
- if the only purpose of that Root CA is to revoke the certificate of the Issuing CA. The Root CA’s key is going to sign a blacklist referring to the Issuing CA. Since the Root should not revoke itself its key signing the revocation list should be harder to compromise than the key of the to-be-revoked Issuing CA.
Discussions of the design of such hierarchies focus a lot on the security of the private keys and cryptographic algorithms involved
But yet the effective security of an infrastructure PKI in terms of Who will be able to enroll for certificate type X (that in turn might entitle you to do Y) is often mainly determined by typical access control lists in databases or directories system that are integrated with an infrastructure PKI. Think would-be subscribers logging on to a web portal or to a Windows domain in order to enroll for a certificates. Consider e.g. Windows Autoenrollment (licensed also by non-Windows CAs) or the Simple Certificate Enrollment Protocol used with devices.
You might argue that it should be a no-no to make allegedly weak software-credential-based authentication the only prerequisite for the issuance of certificates that are then considered strong authentication. However, this is one of the things that distinguish primarily infrastructure-focused CAs from, say, governmental CAs, or “High Assurance” smartcard CAs that require a face-to- face enrollment process.
In my opinion certificates are often deployed because their is no other option to provide platform-independent authentication – as cumbersome as it may be to import key and certificate to something like a printer box. Authentication based on something else might be as secure, considering all risks, but not as platform-agnostic. (For geeks: One of my favorites is 802.1x computer authentication via PEAP-TLS versus EAP-TLS.)
It is finally the management of group memberships or access control lists or the like that will determine the security of the PKI.
Hierarchies and Cross-Certification
It is often discussed if it does make sense to deploy more intermediate levels in the hierarchy – each level associated with additional management efforts. In theory you could delegate the management of a whole branch of the CA tree to different organizations, e.g. corresponding to continents in global organizations. Actually, I found that the delegation argument is often used for political reasons – which results in CA-per-local-fiefdom instead of the (in terms of performance much more reasonable) CA-per-continent.
I believe the most important reason to introduce the middle level is for (future) cross-certification: If an external CA cross-certifies yours it issues a certificate to your CA:
Any CA on any level could on principle be cross-certified. It would be easier to cross-certificate the Root CA but then the full tree of CAs subordinate to it will also be certified (For the experts: I am not considering name or other constraints here). If a CA an intermediate level is issued the cross-certificate trust is limited to this branch.
Cross-Certification constitutes a bifurcation in the CA tree and its consequences can be as weird and sci-fi as this sounds. It means that two different paths exists that connect an end-entity certificate to a different Root CA. Which path is actually chosen depends on the application validating the certificate and the protocol involved in exchanging or collecting certificates.
In an SSL handshake (which happens if you access your blog via https://yourblog.wordpress.com, using the certificate with that asterisk) happens if you access the web server is so kind to send the full certificate chain – usually excl. the Root CA – to the client. So the path finally picked by the client depends on the chain the server knows or that takes precedence at the server.
Cross-certification is usually done by CAs considered external, and it is expected that an application in the external world sees the path chaining to the External CAs.
Tongue-in-cheek I had once depicted the world of real PKI hierarchies and their relations as:
Weird things can happen if a web server is available on an internal network and accessible by the external world (…via a reverse proxy. I am assuming there is no deliberate termination of the SSL connection at the proxy – what I call a corporate-approved man-in-the-middle attack). This server knows the internal certificate chain and sends it to the external client – which does not trust the corresponding internal-only Root CA. But the chain sent in the handshake may take precedence over any other chain found elsewhere so the client throws an error.
How to Really Use “Cross-certification”
As confusing cross-certification is – it can be used in a peculiar way to solve other PKI problems – those with applications that cannot deal with the validation of a hierarchy at all or who can deal with only a one-level hierarchy. This is interesting in particular in relation to devices such as embedded industry systems or iPhones.
Assuming that only the needed certificates can be safely injected to the right devices and that you really know what you are doing the fully pesky PKI hierarchy can be circumvented by providing an alternative Root CA certificate to the CA at the bottom of the hierarchy:
The real, full blown hierarchy is
- Root CA issued a root certificate for Root CA (itself). It contains the key 1234.
- Root CA issues a certificate to Some Other CA related to key 5678.
… then the shortcut hierarchy for “dumb devices” looks like:
- Some Other CA issues a root certificate to itself, thus to Subject named Some Other CA. The public key listed in this certificate is 5678 the same as in certificate (2) of the extended hierarchy.
Client certificates can then use either chain – the long chain including several levels or the short one consisting of a single CA only. Thus if certificates have been issued by the full-blown hierarchy they can be “dumbed-down to devices” by creating the “one-level hierarchy” in addition.
Names and Encoding
In the chain of certificates the Issuer field in the certificate of the Subordinate CA needs to be the same as the Subject field of the Root CA – just as the Subject field in my National ID certificate contains my name and the Issuer field that of the signing CA. And it depends on the application how names with be checked. In a global world, names are not simple ASCII strings anymore, but encoding matters.
Certificates are based on an original request sent by the subordinate CA, and this request most often contains the name – the encoded name. I have sometimes seen that CAs changed the encoding of the names when issuing the certificates, or they reshuffled the components of the name – the order of tags like organization and country. An application may except that or not, and the reasons for rejections can be challenging to troubleshoot if the application is running in a blackbox-style device.
Revocation List Headaches
Certificates (X.509) can be invalidated by adding their respective serial number to a blacklist. This list is – or actually: may – be checked by relying parties. So full-blown certificate validation comprises collecting all certificates in the chain up to a self-signed Root CA (Subject=Issuer) and then checking each blacklist signed by each CA in the chain for the serial number of the entity one level below:
The downside: If the CRL isn’t available at all applications following the recommended practices will for example deny network access to thousands of clients. With infrastructure PKIs that means that e.g. access to WLAN or remote access via VPN will fail.
This makes desperate PKI architects (or rather the architects accountable for the application requiring certificate based logon) build all kinds of workarounds, such as switching off CRL checking in case of an emergency or configuring grace periods. Note that this is all heavily application dependent and has to be figured out and documented individually for emergencies for all VPN servers, exotic web servers, Windows domain controllers etc.
A workaround is imperative if a very important application is dependent on a CRL issued by an “external” certificates’ provider. If I would use my Austrian’s digital ID card’s certificate for logging on to server X, that server would need tp have a valid version of this CRL which only lives for 6 hours.
The predicament is that CRLs may be cached for performance reasons. Thus if you publish short-lived CRLs frequently you might face “false negative” outages due to operational issues (web server down…) but if the CRL is too long-lived it does not serve its purpose.
Ideally, CRLs would be valid for a few days, but a current CRL would be published, say every day, AND you could delete the CRL at the validating application every day. That’s exactly how I typically try to configure it. VPN servers, for example, have allowed to delete the CRL cache for a long time and Windows has a supported way to do that since Vista. This allows for reasonable continuity but revocation information would still be current.
If you cannot control the CRL issuance process one workaround is: Pro-active fetching of the CRL in case it is published with an overlap – that is: the next CRL is published while the current one is still valid – and mirroring the repository in question.
As an aside: It is more difficult as it sounds to give internal machines access to a “public” external URL. Machines not necessarily use the proxy server configured for user (which cause false positive results – Look, I tested it by accessing it in the browser and it works), and/or machines in the servers’ network are not necessarily allowed to access “the internet”.
CRLs might also simply be too big – for some devices with limited processing capabilities. Some devices of a major vendor used to refuse to process CRLs larger than 256kB. The CRL associated with my sample certificate is about 700kB:
Emergency Revocation List
In case anything goes wrong – HSM inaccessible, passwords lost, datacenter 1 flooded abd backup datacenter 2 destroyed by a meteorite – there is one remaining option to keep PKI-dependent applications happy:
Prepare a revocation list in advance whose end of life (NextUpdate date) is after the end of validity of the CA certificate. In contrast to any backup of key material this CRL can be “backed up” by pasting the BASE64 string to the documentation as it does not contain sensitive information.
In an emergency this CRL will be published to the locations embedded in certificates. You will never be able to revoke anything anymore as CRLs might be cached – but business continuity is secured.