DANE in SMTP—the sky is not falling

The paper’s abstract purports “pervasive mismanagement” in the DANE (SMTP) ecosystem. We believe that the paper is often misleading and at times outright wrong.

Misleading primary claims:

36% of TLSA records cannot be validated due to missing or incorrect DNSSEC records

This claim is substantially misleading. The 36% is mostly composed of domains which, though internally signed, lack a signed delegation (DS records) in the parent zone, or in other words the domain is effectively unsigned from the perspective of most resolvers¹.

The typical situation is an unsigned domain whose MX hosts are in a signed zone and have TLSA records. This is common because some DNS providers (e.g. Cloudflare) routinely sign hosted zones, even when they are not yet in a position to provision the corresponding DS records.

Signing a zone that is not yet securely delegated is NOT mismanagement. Nor is it mismanagement for an unsigned domain to use a DANE-enabled email hosting provider with signed MX hosts that have TLSA records.

It is claimed that 14% of TLSA records do not match the presented certificates, but this figure is not plausible in light of our own measurements. We find that ~3.4% of TLSA RRsets fail to match the server certificate chain.

Perhaps the authors failed to take into account that in a TLSA RRset it is sufficient for any one TLSA record to match the certificate chain. It is in fact normal to also find additional TLSA records that do not match the certificate chain, especially when key rollover is handled by pre-publishing TLSA records for future keys.
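The difference between counting records and counting RRsets can be sketched as follows. This is a minimal illustration, not the code of any MTA or of our survey: the `TLSA` and `ChainCert` types and the placeholder byte strings are assumptions made for the example, and it only compares digests (no signature or chain verification is attempted).

```python
import hashlib
from typing import NamedTuple, Optional, Sequence


class TLSA(NamedTuple):
    usage: int     # 2 = DANE-TA, 3 = DANE-EE
    selector: int  # 0 = full certificate, 1 = SubjectPublicKeyInfo
    mtype: int     # 0 = exact match, 1 = SHA2-256, 2 = SHA2-512
    data: bytes


class ChainCert(NamedTuple):
    cert_der: bytes  # DER encoding of the whole certificate
    spki_der: bytes  # DER encoding of its SubjectPublicKeyInfo


def _digest(mtype: int, blob: bytes) -> Optional[bytes]:
    if mtype == 0:
        return blob
    if mtype == 1:
        return hashlib.sha256(blob).digest()
    if mtype == 2:
        return hashlib.sha512(blob).digest()
    return None  # unknown matching type: treated as unusable here


def record_matches(rr: TLSA, chain: Sequence[ChainCert]) -> bool:
    # Simplification: DANE-EE(3) is compared with the leaf (chain[0]),
    # DANE-TA(2) with the remaining certificates of the presented chain.
    if rr.selector not in (0, 1):
        return False
    if rr.usage == 3:
        candidates = chain[:1]
    elif rr.usage == 2:
        candidates = chain[1:]
    else:
        return False
    for cert in candidates:
        blob = cert.spki_der if rr.selector == 1 else cert.cert_der
        if _digest(rr.mtype, blob) == rr.data:
            return True
    return False


def rrset_matches(rrset: Sequence[TLSA], chain: Sequence[ChainCert]) -> bool:
    # One matching record is enough for the whole RRset to validate.
    return any(record_matches(rr, chain) for rr in rrset)


# Placeholder bytes stand in for real DER data.
leaf = ChainCert(b"leaf-cert", b"leaf-key")
rrset = [
    TLSA(3, 1, 1, hashlib.sha256(b"leaf-key").digest()),  # current key
    TLSA(3, 1, 1, hashlib.sha256(b"next-key").digest()),  # pre-published
]
per_record = [record_matches(rr, [leaf]) for rr in rrset]
```

Counting per record, half of this RRset "fails"; counting per RRset, the server validates, which is the only outcome a sending MTA cares about.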

As outbound deployment grows, making any misconfiguration more readily apparent to the domain owner, and as more DANE-aware tools for automating certificate updates become available, we expect that the error rate will decline. But even the current rate is roughly four times smaller than reported in the paper, and its impact is much lower still.

14.17% of them (TLSA records) are inconsistent with their certificates

This is misleading, because it fails to weight the TLSA RRsets by the number of affected domains². A problem with the TLSA records of an MX host for a vanity domain hosting the email of a single hobbyist is less significant than a problem with a professionally operated MX host that supports thousands of domains or millions of user mailboxes.

We also take issue with the number. In a more comprehensive survey, covering all signed TLDs, we find 301 TLSA RRsets not consistent with their certificates (including cases where the MX host does not consistently offer STARTTLS) out of 8370 SMTP servers with TLSA RRs.

This shows that only roughly 3.6% of SMTP servers with TLSA records have the reported issues, but as mentioned above, this is not the most relevant metric.

The number of affected domains is 462, as compared with 1.84 million domains with DANE-enabled SMTP servers. This shows that operator error is largely confined to a small number of SOHO domains.
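The two ways of weighting the same problem can be checked with a few lines of arithmetic. The counts are the ones quoted above; the variable names are ours, and 1.84 million is taken as a round figure.

```python
failing_rrsets = 301        # TLSA RRsets not matching their certificates
servers_with_tlsa = 8370    # SMTP servers with TLSA RRs
affected_domains = 462
dane_domains = 1_840_000    # domains with DANE-enabled SMTP servers (rounded)

# Rate per MX host versus rate per affected domain.
server_rate = failing_rrsets / servers_with_tlsa
domain_rate = affected_domains / dane_domains

print(f"per MX host: {server_rate:.2%}")
print(f"per domain:  {domain_rate:.3%}")
```

The per-host quotient comes to roughly 3.6%, while the per-domain quotient is about 0.025%, more than two orders of magnitude smaller.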

only four email service providers support DANE for both outgoing and incoming emails, but two of them have drawbacks of not checking the Certificate Usage in TLSA records

Of course more than just the four reported providers support outbound DANE. Some of the early adopters not mentioned are posteo.de, mailbox.org, kabelmail.de and udmedia.de. In addition to web.de, 1&1 also operate gmx.de (with millions of email users between them); both domains are DANE-enabled inbound and outbound. There are many more, though admittedly most providers, and especially the largest ones such as gmail.com, outlook.com and yahoo.com, are not yet on board.

As for certificate usage checking, the claim is simply wrong, perhaps due to a failure to understand Section 3.1 of RFC7672, where only DANE-EE(3) and DANE-TA(2) are defined as applicable in (MTA-to-MTA) SMTP. The PKIX-EE(1) and PKIX-TA(0) certificate usages are undefined in SMTP and client MTAs MAY treat them as “unusable”.

When DANE was first implemented in Postfix 2.11, it was not yet clear how closely that advice would be heeded by operators implementing inbound DANE. Since a PKIX-EE(1) TLSA record provides sufficient information to identify the end-entity certificate, Postfix treated the undefined PKIX-EE(1) as equivalent to DANE-EE(3).

Once it became clear that operator practice matched the RFC, and PKIX-EE(1) was used only by a tiny number of domains (~30 at present), the ad-hoc mapping of PKIX-EE(1) to DANE-EE(3) was withdrawn in Postfix 3.2.

MTAs running Postfix 2.11, 3.0 or 3.1 do indeed consider PKIX-EE(1) to be equivalent to DANE-EE(3), but they are at liberty to do so, per RFC7672.
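The behaviour described above can be modelled in a few lines. This is a sketch of the text, not Postfix source code: the function name and the `legacy_pkix_ee_mapping` flag are hypothetical, and treating PKIX-TA(0) as unusable in both modes is an assumption of the example.

```python
from typing import Optional

PKIX_TA, PKIX_EE, DANE_TA, DANE_EE = 0, 1, 2, 3


def effective_usage(usage: int, legacy_pkix_ee_mapping: bool = False) -> Optional[int]:
    """Return the usage an SMTP client would apply, or None if unusable.

    legacy_pkix_ee_mapping=True models the ad-hoc mapping described for
    Postfix 2.11 through 3.1; False models the later behaviour.
    """
    if usage in (DANE_TA, DANE_EE):
        return usage  # the only usages defined for MTA-to-MTA SMTP
    if usage == PKIX_EE and legacy_pkix_ee_mapping:
        return DANE_EE  # enough information to pin the end-entity certificate
    return None  # undefined in SMTP, so treated as "unusable"


for legacy in (True, False):
    print(legacy, [effective_usage(u, legacy) for u in (0, 1, 2, 3)])
```

Either mode is permitted by the RFC, since a client MAY, but need not, treat the undefined usages as unusable.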

Other issues

…the complexity of DANE leads to many opportunities for mismanagement… TLSA records may have DNSSEC errors such as expired signatures…

True, but actually very rare. When automatic zone resigning fails, it rarely fails for just the TLSA records. More typically the DNSKEY RRset signature would also expire, and the entire zone becomes “bogus”.

DNSSEC failure is not DANE-specific. All DNS resolution for the domain fails for users behind validating DNS resolvers, such as the popular public resolvers at 8.8.8.8, 1.1.1.1, 9.9.9.9, etc.

Out of ~10.8 million domains in the DANE/DNSSEC survey operated by the authors of this document, ~60 thousand have DNSSEC resolution issues, with the majority of those being long-standing breakage for parked domains where nobody cares whether they’re working or not, and so the problems are ignored.

Working DNS (including DNSSEC) is required for DANE, and though DNS service outages do happen for some domains some of the time, this is not a significant problem.

… or the certificates may be inconsistent with published TLSA records

As already explained, this is not a significant problem in practice, when properly weighted by impact. Also, since email is store-and-forward, a transient mismatch that is promptly corrected generally does not result in loss of email.

… On the client side, DNS resolvers may not validate TLSA records properly

This is speculative. No evidence of this being an actual problem for sites implementing outbound DANE is presented.

… buggy TLS applications do not bother to check the validity of certificates

This is wrong. The authors of the paper failed to note that in SMTP with DANE-EE(3), per RFC7672 both the expiration date and all names in the certificate MUST be ignored.

In this paper, we present a comprehensive study of the entire DANE ecosystem for SMTP

But in fact it covers only the top three (by DANE domain count) TLDs, nl, se and com, plus numbers 12 and 15 (net and org).

though most of the mail servers that provide TLSA records (99.5%) present their certificates through STARTTLS, we find that over 14% of them do not match the presented certificates

As explained above, we find that the actual rate is closer to 3.4% and that this fraction is still misleading, because not all MX hosts are equal. We argue that the impact of that 3.4% (the fraction of email that is delayed or not delivered as a result of TLSA misconfiguration) is much lower (though difficult to estimate correctly).

when focusing on 29 popular email providers, we find that only four of them support DANE for their outgoing and incoming emails and one provider only supports DANE for incoming emails

The ad-hoc list of providers is by no means comprehensive.

we tested four popular MTA … implementations to see if email providers can easily support DANE; we find that two popular MTAs correctly support DANE for both incoming and outgoing emails

The list is far from comprehensive. MTAs that support DANE include not only Postfix and Exim, but also Halon, PowerMTA, CloudMark, Cisco ESA, MeTA1, indimail-mta, …

DANE-TA(2) description is not correct

These are not necessarily root CAs in the usual sense of top-level certification authority self-signed certificates. Nor are the leaf certificates necessarily directly signed by the DANE-TA(2) trust-anchor.

The verification of a certificate chain via a DANE-TA(2) trust anchor is still PKIX validation, it just employs the DANE-supplied trust anchor, rather than one of the (typically CA/B Forum WebPKI) CAs pre-configured on the client.

Thus, the DANE operational practice recommends to avoid using PKIX-EE and PKIX-TA

More to the point, RFC7672 specifically designates these as not defined in SMTP.

Unfortunately, STARTTLS is well-known to be vulnerable to downgrade attacks,… …many TLS clients ignore mismatches between MX records and the domain names in the certificates or continue email transmissions even with invalid certificates

As explained in RFC7435, unauthenticated opportunistic TLS is a feature, not a bug. Its job is to protect against passive monitoring, not active attacks, and it performs that job remarkably well. As a result of allowing less stringent security, to this day, a substantially larger fraction of email (than HTTP) traffic is encrypted, though more recently HTTP has been catching up.

What the authors are calling “invalid certificates” are in fact perfectly valid public key containers (along with some additional baggage required by the TLS protocol) that work reliably with opportunistic TLS, in which the additional useless baggage is ignored and only the key is used to encrypt the SMTP traffic.

Indeed, when doing opportunistic TLS Postfix supports (and prefers) anonymous TLS ciphersuites, which don’t use a certificate at all. Unfortunately, support for these has been removed in TLS 1.3, so use of these will gradually diminish.

Because the SMTP protocol can use three possible port numbers (25, 465, and 587), we send three TLSA record requests for each MX record

This is a mistake. The SMTP protocol is used by two different services, SMTP (MTA-to-MTA) and SUBMIT (MUA-to-MTA). MX records are only defined for SMTP, and NOT for SUBMIT. There is no expectation that the same hosts are both the inbound SMTP servers and the outbound submission servers for a given domain.

MUAs do not perform MX lookups to find their submission servers. And since no MUAs are known to support DANE, it makes no sense to probe for working or non-working TLSA records on the submission ports (587 and 465).

a substantial portion of domains from .com, .net, .org, and .nl partially deployed TLSA records; on average 18% of .com, .net, .org and 39% of .nl domains did not fully deploy TLSA records in our oldest snapshot, which implies that these domains were susceptible to downgrade attacks

This is misleading. For example, in .NL many domains have (for a long time now, likely predating DANE) been using relay.transip.nl as backup MX host. The operators of these domains have not specifically made any conscious decision to deploy DANE.

Some time ago transip.nl published TLSA records for that backup MX host. As a result, a large number of .NL domains have DANE TLSA records for only their backup MX host. But this is neither a configuration error nor a security issue for the domains in question. DANE SMTP security is not expected for these domains, and its deployment on the backup MX is a harmless security feature for any mail that does happen to be sent via the backup MX.

A more accurate rate of partial deployment should be derived only for domains where the primary MX hosts have TLSA RRs, but some secondary MX hosts do not. In our survey, we find that when all the primary MX hosts have TLSA RRs, only ~0.22% of domains then also have some secondary MX hosts that do not have TLSA RRs. If we count domains where at least one primary MX host has TLSA records, then the rate rises to 0.30%.
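The distinction drawn here can be sketched as a small classifier over a domain's MX RRset. The category names and the `(preference, host, has_tlsa)` tuple layout are assumptions of this example, not the data model of our survey.

```python
from typing import List, Tuple

MX = Tuple[int, str, bool]  # (preference, host, has_tlsa)


def classify(mx_rrset: List[MX]) -> str:
    if not mx_rrset:
        raise ValueError("expected at least one MX host")
    best = min(pref for pref, _, _ in mx_rrset)
    primary = [tlsa for pref, _, tlsa in mx_rrset if pref == best]
    secondary = [tlsa for pref, _, tlsa in mx_rrset if pref != best]
    if not any(primary) and not any(secondary):
        return "none"
    if not any(primary):
        return "backup-only"        # e.g. only a shared backup MX has TLSA RRs
    if all(primary) and all(secondary):
        return "full"
    if all(primary):
        return "partial-secondary"  # primaries covered, some secondary is not
    return "partial-primary"        # only some of the primaries are covered


print(classify([(10, "mail.example.nl", False), (20, "relay.transip.nl", True)]))
print(classify([(10, "mx1.example.org", True), (20, "mx2.example.org", False)]))
```

Only the two "partial" categories describe a domain that set out to deploy DANE and left a gap; the "backup-only" case is the relay.transip.nl situation described above.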

Methodology issue.

It appears that no measurements were performed for domains without MX records. But, in SMTP a domain without MX records is implicitly its own MX host, and a non-trivial fraction of DNSSEC-signed domains have no MX RRs.

There is also no mention of checking the security status of the A and AAAA records of the explicit or implicit MX hosts before proceeding to TLSA lookups.

Therefore, the reported number of TLSA lookups each hour is likely much too low, because domains with no MX RRs are not included, and it perhaps also includes some MX hosts that should be excluded, because their address records are unsigned.
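A minimal sketch of the lookup selection we have in mind follows. It is a simplification: the function names and the `address_secure` map (host name to "its A/AAAA records validated as secure") are assumptions of the example, and details such as the security status of the MX RRset itself and CNAME handling are left out.

```python
from typing import Dict, List


def mx_hosts(domain: str, mx_records: List[str]) -> List[str]:
    # A domain without MX records is implicitly its own MX host.
    return list(mx_records) if mx_records else [domain]


def tlsa_qname(host: str, port: int = 25) -> str:
    # MX records apply to MTA-to-MTA SMTP only, hence port 25.
    return f"_{port}._tcp.{host.rstrip('.')}."


def tlsa_queries(domain: str, mx_records: List[str],
                 address_secure: Dict[str, bool]) -> List[str]:
    # Skip hosts whose address records are not DNSSEC-validated.
    return [tlsa_qname(host)
            for host in mx_hosts(domain, mx_records)
            if address_secure.get(host, False)]


print(tlsa_queries("example.org", [], {"example.org": True}))
print(tlsa_queries("example.net", ["mx1.example.net", "mx.unsigned.example"],
                   {"mx1.example.net": True, "mx.unsigned.example": False}))
```

The first call covers a domain a survey limited to explicit MX records would miss; the second drops a host whose TLSA lookup should not be counted.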

we see how many signed TLSA records do not have corresponding DS records

As explained previously, signed zones that are not securely delegated are common and are not an operational error. They do represent known obstacles to getting DS RRs published, but are not a DANE-specific issue.

That said, a claimed 18.5% of TLSA RRsets being in zones that are not delegated signed is much higher than expected. Most likely this is instead counting unsigned domains whose MX hosts are in signed domains with TLSA records (reported as 19% earlier in the paper).

about 30% of signed domains do not upload DS records because of mismanagement by large hosting service providers that provide authoritative DNS servers for their customers

We would be loath to call proactive signing of hosted zones mismanagement. Just because it is not always presently practical to arrange for DS records in the parent zone does not mean that it is then somehow irresponsible or negligent to sign the hosted zone. This is especially so when the DNS hosting provider is not also the domain registrar, and the domain holder has sole authority to publish the DS RRs, while the zone signing is done by the DNS provider.

Signing the zone gives customers the opportunity to later (at their leisure) enable DNSSEC by uploading the associated DS or DNSKEY records to their registrar.

on average 14.17% of the certificates cannot be validated due to a mismatch with their corresponding TLSA records

This number is not consistent with our DANE survey, even if the denominator is distinct MX hosts, rather than affected domains. As noted above, we find that rate to be ~3.4%, but again this is not the right metric for understanding the impact of the problem.

for example, only 0.006% of .se domains cannot be validated due to missing or invalid DNSSEC or STARTTLS configurations, while .org domains show a much higher error rate of 1.65%, which is 275 times higher

This mixes apples and oranges. Missing DNSSEC (i.e. no DS RRs) is NOT a deployment problem; it is merely non-deployment (presently the status quo for most domains). It is wrong to combine the non-deployment number with the number of mismanaged deployments that cause outages.

Out of ~10.1 thousand .org domains with at least one DANE-enabled MX host, 26 (~0.25%) have an MX host that has incorrect TLSA records or fails to perform STARTTLS.

Surprisingly, for almost 8,200 .nl domains, the TLSA records were invalid for 7 hours on October 19, 2019

In fact what the text reports is a transient DNSSEC outage for the entire zone, which is not a TLSA record issue. A validating resolver would not even return address records for the MX host, so no mail would be sent, even without DANE, so long as the sender’s MTA used a validating DNSSEC resolver.

The outage was resolved rather quickly, which hints at timely monitoring and a responsive operational team, albeit after a procedural glitch. Outages of this sort happen from time to time even to much larger providers, and even absent any involvement from DNSSEC.

The text on “Unsuitable Usages” in Section 5.5 of the paper is a muddle.

If the domain owner has a certificate issued by a CA, but serves a TLSA record with the DANE-EE or DANE-TA usage, they do not benefit fully from the security measures that DANE provides (instead, they should use the PKIX-EE or PKIX-TA Certificate Usage)

This is simply wrong. It makes no difference who issued the certificate: with DANE-EE(3) it is validated directly, and with DANE-TA(2) it is validated with respect to the designated trust-anchor, which could be an external or a private CA.

Moreover, the validity periods of such certificates are usually determined by CAs, which are usually short. Thus, domain owners incur additional complexity as they need to update their TLSA records whenever the certificates are re-issued

This is why DANE-EE(3) SPKI(1) SHA2-256(1) TLSA records are the recommended best practice in such cases: they stay valid even when a new certificate is issued for the same key.
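Why a “3 1 1” record survives re-issuance can be seen from how its data is derived. The sketch below assumes the DER encodings are already available; the function name is ours and the byte strings are placeholders rather than real certificates or keys.

```python
import hashlib


def tlsa_rdata(usage: int, selector: int, cert_der: bytes, spki_der: bytes) -> str:
    # Selector 1 hashes only the SubjectPublicKeyInfo; selector 0 hashes
    # the whole certificate. Matching type 1 is SHA2-256.
    blob = spki_der if selector == 1 else cert_der
    return f"{usage} {selector} 1 {hashlib.sha256(blob).hexdigest()}"


# Two certificates issued at different times for the same key (placeholders).
spring = (b"certificate issued in spring", b"same public key")
summer = (b"certificate issued in summer", b"same public key")

spki_unchanged = tlsa_rdata(3, 1, *spring) == tlsa_rdata(3, 1, *summer)
cert_unchanged = tlsa_rdata(3, 0, *spring) == tlsa_rdata(3, 0, *summer)
print(spki_unchanged, cert_unchanged)
```

The SPKI-based record is identical for both certificates, while a full-certificate (“3 0 1”) record would have to be replaced at every renewal.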

Another option is DANE-TA(2) SPKI(1) SHA2-256(1) TLSA records, which remain valid so long as the issuer CA key is unchanged.

Therefore, a domain name owner should avoid setting their TLSA records with the DANE-EE or DANE-TA usage when they serve a certificate issued by a CA

This is outright wrong. ONLY those usages are valid in SMTP, regardless of whether the certificate is issued in-house, or by a third-party CA.

We then configure OpenSSL to trust the set of root CA certificates in the Ubuntu 18.04 LTS root store; the validation would fail if the certificates for the TLSA records are custom certificates. Surprisingly, we find that on average 90.58% and 90.37% of TLSA records with DANE-EE and DANE-TA are still valid, which means that the certificates are valid in terms of PKIX, not custom certificates

It is not at all surprising to find that a large fraction of TLSA records are configured to validate certificates from a public CA, most often Let’s Encrypt.

Consequently, these records could have used PKIX-EE or PKIX-TA Certificate Usages, thus having the additional benefit of certificate validation through two independent mechanisms (DANE and PKIX)

No, this is simply wrong. The only correct certificate usages for SMTP are DANE-EE(3) and DANE-TA(2), regardless of the provenance of the certificate.

To analyze the rollover behaviors more accurately, we remove the TLSA records from our considerations when (1) their TTLs are shorter than our scan resolution (i.e., 1 hour)

This biases the sample away from domains that carefully plan for prompt recovery from inadvertently incorrect TLSA records, by using a short TTL of less than 60 minutes.

In our TLSA RRset dataset, the median TLSA RRset TTL is 2600s. So in a misguided attempt to never miss a failed TLSA RRset change, the above filter removes more than half of the TLSA records, and likely, more often than not, the ones that are well managed.

One might also note that, given the wide use of greylisting with minimum retry times measured in tens of minutes, a transient TLSA mismatch of the same duration is essentially just another (less fragile) form of greylisting. If email delays of that magnitude are expected and acceptable to some operators, then given a sufficiently short TTL, they need not bother with a fancier rollover scheme that avoids transient delays.

We observe that only 124 domains (8.5%) domains have maintained two or more types of TLSA records with mixed usages such as maintaining DANE-EE and DANE-TA together; this allows administrators to change the leaf certificate and its TLSA records with DANE-EE usage immediately as long as it is signed by the certificate that the TLSA records with DANE-TA usage specify. Due to this advantage, we find that 109 (87.9%) of them successfully roll their keys without any validation failures

This is not the only way of proactively making key rollovers more robust. Another approach is to generate the next key and pre-publish a matching DANE-EE(3) SPKI(1) TLSA record as soon as the current key is deployed (the “3 1 1 + 3 1 1” BCP).
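The “3 1 1 + 3 1 1” approach can be sketched as a tiny state machine. The `Rollover` class and its method names are hypothetical, the keys are placeholder bytes, and the sketch assumes that more than one TTL elapses between publishing a record and deploying the matching key.

```python
import hashlib
from dataclasses import dataclass
from typing import List


def spki_311(spki_der: bytes) -> str:
    return "3 1 1 " + hashlib.sha256(spki_der).hexdigest()


@dataclass
class Rollover:
    current_spki: bytes  # key the server presents today
    next_spki: bytes     # key generated in advance, not yet in use

    def rrset(self) -> List[str]:
        # Both records are published at all times.
        return [spki_311(self.current_spki), spki_311(self.next_spki)]

    def rotate(self, fresh_spki: bytes) -> None:
        # Deploy the pre-published key, then pre-publish a record for a
        # freshly generated one.
        self.current_spki, self.next_spki = self.next_spki, fresh_spki


state = Rollover(b"key-A", b"key-B")
published_before = state.rrset()
state.rotate(b"key-C")
still_cached_ok = spki_311(state.current_spki) in published_before
print(still_cached_ok)
```

Because the record for the newly deployed key was already in the RRset before the switch, a sender holding a cached copy of the old RRset still finds a match.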

That said, it is good to see confirmation that the alternative “3 1 1 + 2 1 1” BCP is generally more reliable than average.

1,335 domains (91.4%) have a single TLSA record usage; in this case, the administrators need to make sure that they pre-publish the new TLSA records well in advance of a key rollover. However, we observe that the vast majority of them (1,257 or 94.2%) experience at least one validation failure during their rollovers. From further investigation, we observe that 939 of them (74.7%) introduced new certificates and the corresponding TLSA records at the same time without considering the TTL of the TLSA records or only introduced new TLSA records after changing certificates

This sample is somewhat biased towards the SOHO systems that implement TLSA record updates as part of their “Let’s Encrypt” certificate update process, and don’t presently care about the transient outages, as DANE-validating sending MTAs are also the sort of MTAs that reliably retry deliveries, and so the mail arrives anyway, albeit slightly delayed.

We do not condone this sloppiness, and encourage even the SOHO operators to adopt either the “3 1 1 + 3 1 1” or the “3 1 1 + 2 1 1”³ approach.

Yes, the SOHO operators are prone to fragile key rollover practices. Some learn from their first mistake and make improvements to avoid future issues; others continue to repeat the issue each time the certificate is re-issued.

Fortunately for them (but sadly for the unlucky sender who happens to be sending email to one of these domains) the impact of this malpractice is presently too low to motivate these operators to improve their practices. This may change as more senders begin to validate, and more mail is intermittently delayed. Improved key rollover tooling may also help to move the SOHO operators to better practices by reducing the effort required to implement BCPs.

In order to obtain a list of popular email providers, we use the approach from a previous study [36]; we refer to Adobe’s leaked user email database from 2013 [43] to rank the email domains based on popularity and choose the top 25 providers. We also add recent popular email service providers: protonmail.com, tutanota.com, zoho.in, fastmail.com, and runbox.com

A lot has changed since 2013. Unfortunately (?!?), more recent leaks of that scale were not available. Also DANE SMTP is still in its early deployment phase. While user-facing technology can change rapidly, Internet infrastructure technology upgrade cycles can last decades (can anyone say IPv6?).

Therefore, evidence of DANE deployment is more readily found among mid-size early adopters than among the very largest providers, and some of these are recent arrivals. The manually added providers help, but it is no longer possible to gauge the associated user count from the Adobe sample. More known DANE-enabled providers could have been added.

That said, there is some evidence of upcoming DANE deployment even among the larger players, and the next couple of years should be interesting. But overall it is somewhat early to draw conclusions about where DANE adoption is heading. It could stall at a low rate, or it may yet take off. Time will tell.

In some cases, the DNS resolver that an SMTP client uses resides outside its own administrative domain (e.g., it uses a public DNS resolver like Google Public DNS). We examine whether the DNS resolver is managed by a third party such as a public DNS resolver

Such a test can be misleading, because the SMTP client’s local validating resolver may be forwarding cache misses to an upstream public resolver. When the authoritative server sees queries via a third party it is not necessarily the case that the sender is not also doing validation locally.

Our DNSSEC/DANE survey forwards a subset of its queries to various public resolvers, but the results are ultimately validated locally.

Even more alarmingly, of the seven email service providers that do fetch DNSKEYs and DS records, we find that three email providers (mynet.com, sapo.pt, and sina.com) explicitly disable DNSSEC validation by setting the CD bit

This is a basic misunderstanding of the role of the CD bit. Validating resolvers legitimately set both the DO and CD bits, thereby bypassing any validation issues upstream, and performing all requisite validation for themselves. The CD bit is NOT evidence of lack of validation. In fact it is likely quite the contrary: a resolver that sets the CD bit is at least as likely to be DNSSEC-aware and actually doing its own validation.
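For readers less familiar with the two bits: CD lives in the DNS header flags, while DO lives in the flags portion of the EDNS OPT pseudo-record's TTL field. The sketch below decodes both from raw integer values; the function name and the wording of the returned labels are ours, and the labels are deliberately cautious, since the bits alone cannot prove what a resolver does with the answers.

```python
from typing import Optional

CD_FLAG = 0x0010  # "checking disabled", bit in the 16-bit DNS header flags
DO_FLAG = 0x8000  # "DNSSEC OK", top bit of the low 16 bits of the OPT TTL


def dnssec_posture(header_flags: int, opt_ttl: Optional[int]) -> str:
    # opt_ttl is None when the query carried no EDNS OPT record at all.
    do = opt_ttl is not None and bool(opt_ttl & DO_FLAG)
    cd = bool(header_flags & CD_FLAG)
    if do and cd:
        return "DO+CD: asks for signatures, consistent with validating locally"
    if do:
        return "DO only: asks for signatures, upstream may validate as well"
    if cd:
        return "CD without DO: no signatures requested"
    return "neither: no DNSSEC data requested"


# RD (0x0100) and CD set in the header, DO set in the OPT TTL field.
print(dnssec_posture(0x0110, 0x00008000))
```

A query with both bits set is what one would expect from a forwarding resolver that wants the raw signed data so that it can reach its own verdict.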

Finally, we observe that 9 out of 29 mail service providers use DNS resolvers outside their own network, which makes them vulnerable to man-in-the-middle attacks

As mentioned above, this conclusion is not necessarily valid: a validating forwarding resolver avoids the issue, while taking advantage of an upstream cache. This is likely to become more common as more resolvers add support for DoH, and more users (wisely or otherwise) configure their resolvers to use it.

we also observe that 24 out of the 29 mail service providers support STARTTLS; … However, we find that none of the 24 email service providers correctly verify presented certificates; they successfully complete the TLS handshake even though destination email servers present expired or self-signed certificates, or even certificates whose Common Name fields are inconsistent with their corresponding MX records

In other words, all the providers correctly implement opportunistic TLS, and don’t shoot themselves in the foot by needlessly falling back to clear text in the face of irrelevant certificate details. It is good to see common-sense prevailing.

However, we observe that mail.com, tutanota.com do not check whether the Certificate Usage value of the TLSA record is consistent with the certificate. That is, we present a self-signed certificate through STARTTLS, but the TLSA record sets its Certificate Usage to PKIX-EE. Given that self-signed certificates can never be PKIX valid, they should have rejected the invalid certificates during the TLS handshake

As explained above, this is normal expected behaviour from Postfix 2.11–3.1. The PKIX-EE(1) usage is treated as though it were DANE-EE(3), and no harm is done. Since the benefit of this work-around proved minuscule, it was withdrawn in Postfix 3.2.

So with some confidence we can guess that these providers were using Postfix in that version range.

If email service providers wish to support DANE, the software of their DNS servers and DNS resolvers must be able to understand TLSA records and to support DNSSEC to validate DNS responses

This is not correct; DNSSEC support alone is sufficient. Neither authoritative servers nor, especially, iterative resolvers need any special knowledge of TLSA records.

Only the application consuming the DNS response needs to understand TLSA records. The primary authoritative server for a zone needs to provide some mechanism to insert TLSA records to its database, but this too can often be done generically, if need be, e.g.

_25._tcp.example.org. IN TYPE52 \# 35 03010101ba4719c80b6fe911b091a7c05124b64eeece964e09c058ef8f9805daca546b
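The generic form above is easy to produce mechanically: the RDATA is just the three one-byte fields followed by the digest. A minimal sketch, with a function name of our own choosing, that reproduces the record shown:

```python
def tlsa_generic(owner: str, usage: int, selector: int, mtype: int,
                 data_hex: str) -> str:
    # RDATA = usage, selector and matching type (one byte each) + data.
    rdata = bytes([usage, selector, mtype]) + bytes.fromhex(data_hex)
    # Generic (unknown-type) presentation: "\#", RDATA length, hex RDATA.
    return f"{owner} IN TYPE52 \\# {len(rdata)} {rdata.hex()}"


digest = "01ba4719c80b6fe911b091a7c05124b64eeece964e09c058ef8f9805daca546b"
line = tlsa_generic("_25._tcp.example.org.", 3, 1, 1, digest)
print(line)
```

The length of 35 is the 3 header bytes plus the 32 bytes of a SHA2-256 digest, so server software that has never heard of TLSA can still load and serve the record.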

To obtain a list of popular open source DNS programs, we refer to prior work that identified DNS software programs running on second-level domains for the .com, .net, .org TLDs. In total, we investigated ten DNS software programs

That list is a list of authoritative nameserver implementations, but the goal was ostensibly to understand the capabilities of recursive resolvers used by clients. For this, support for DNSSEC validation is sufficient.

We notice that all of the MTA programs support STARTTLS for both incoming and outgoing emails. However, we find that only Exim and Postfix support DANE

With a sample of just four MTAs, including Sendmail, which is no longer maintained, and Exchange, which sees few new features (Microsoft wants users to move to Cloud services), it is not surprising that just two support DANE.

A better sample could include PowerMTA, Halon and CloudMark (used typically by service providers), and perhaps also note Cisco ESA, and even cloud providers like ProofPoint.

DANE support in practice is poor among 29 popular email service providers: only five of them support DANE for incoming emails and four of them support DANE for outgoing emails

As mentioned above, more DANE-enabled providers could have been included, and the study omits DANE-enabled hosting providers, where instead of registering a user account, one would have to register a domain and relay mail through the provider’s outbound MTAs.

As to four out of five doing outbound, the fifth is protonmail.ch, and their inbound implementation was still quite new at the time the study concluded. If they’ve not yet implemented outbound DANE, they will shortly be shamed into doing so. :-)

Study Conclusions

DANE deployment is scarce but increasing

Correct.

More than one third of all the TLSA records cannot be validated due to missing or incorrect DNSSEC records

Rather misleading by conflating lack of DS RRs with misconfiguration.

14% of the certificates are inconsistent with their TLSA records

The 14% misconfiguration figure is both suspect (inconsistent with our ~3.4% measurement) and fails to take impact into account.

On the SMTP client side, we measured 29 popular email service providers to understand how they support DANE; we found only four of them support DANE for both outgoing and incoming emails, and one email service provider does so only for incoming emails. We also tested four MTA and ten DNS software programs, and found that two of the MTA and seven of the DNS programs support DANE correctly, which implies that the administrators willing to deploy DANE would not find any operational challenges

The selected providers, MTAs and DNS servers are not necessarily the most appropriate choices.

¹ Unless a resolver happens to be explicitly configured with a trust anchor for that domain.
² A better metric would be number of affected recipient mailboxes, but for that one would need to know the number of mailboxes hosted at each domain, which is not something that can be easily measured.
³ Less secure, as a result of also trusting Let’s Encrypt to never misissue a certificate for their SMTP server, but CA domain-control validation is fairly weak.
