March 21, 2020
This post is a brief rebuttal to a recent paper, “A Longitudinal and Comprehensive Study of the DANE Ecosystem in Email”.
The paper’s abstract claims “pervasive mismanagement” in the DANE (SMTP) ecosystem. We believe that the paper is often misleading and at times outright wrong.
36% of TLSA records cannot be validated due to missing or incorrect DNSSEC records
Although the abstract does not explicitly label this figure as "mismanagement", the claim is very misleading. The 36% consists almost entirely of domains that, though signed internally, lack a signed delegation (DS records) in the parent zone. In other words, from the perspective of most resolvers the domain is effectively unsigned.[1]
The typical situation is an unsigned domain whose MX hosts are in a signed zone and have TLSA records. This is common because some DNS providers (e.g. Cloudflare) routinely sign hosted zones, even when they are not yet in a position to provision the corresponding DS records.
A large number of zones cannot be validated because the parent has no DS record for them, and they are therefore part of the unsigned portion of the DNS tree. However, signing a zone that is not yet linked to a Delegation Signer (DS) record in the parent is NOT mismanagement; it merely makes it easy to enable DNSSEC later. Nor is it mismanagement for an unsigned domain to use a DANE-enabled email hosting provider whose signed MX hosts have TLSA records.
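To make the distinction concrete, here is a toy Python model of how a validating resolver classifies a zone as it walks delegations down from the root. It is our own illustration, not a real validator: the function `chain_status` and its tuple format are hypothetical, and the model assumes DS and DNSKEY contents match whenever both are present. A zone that is signed but has no DS in its parent comes out "insecure", exactly like an unsigned zone, rather than "bogus".

```python
from enum import Enum


class Status(Enum):
    SECURE = "secure"
    INSECURE = "insecure"
    BOGUS = "bogus"


def chain_status(zones):
    """Toy DNSSEC status of the last zone in `zones` (not a real validator).

    `zones` lists (name, signed, ds_in_parent) tuples from the root down.
    The root is assumed to be signed and trusted via the resolver's trust
    anchor, and DS/DNSKEY contents are assumed to match whenever both exist.
    """
    status = Status.SECURE
    for _name, signed, ds_in_parent in zones[1:]:
        if status is not Status.SECURE:
            break  # nothing below an insecure or bogus delegation is secure
        if not ds_in_parent:
            # The (secure) parent proves there is no DS record, so the child
            # is insecure whether or not it happens to be signed.
            status = Status.INSECURE
        elif not signed:
            # The parent vouches for keys that the child does not publish.
            status = Status.BOGUS
    return status


if __name__ == "__main__":
    # A zone signed by its DNS provider, but with no DS record yet in .com:
    print(chain_status([(".", True, True),
                        ("com.", True, True),
                        ("example.com.", True, False)]))  # Status.INSECURE
```

In this model, adding the DS record in the parent is the only step needed to turn the already-signed zone from "insecure" into "secure".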
14.17% of them (TLSA records) are inconsistent with their certificates
This figure appears in the paper as originally published, and we found it to be both inaccurate and misleading. Indeed, after we reported the discrepancy to the authors, the number was corrected to 3.68% in an errata document, matching our finding that only ~3.7% of TLSA RRsets (in the 5 TLDs covered by the paper) fail to match the server certificate chain. Across the broader set of TLDs in our survey the figure is slightly higher, at ~4.2%, but even this number is misleading, as explained below.
This percentage of incorrect TLSA records is misleading because it does not weight the TLSA RRsets by the number of affected domains.[2] While the problem MX hosts number ~400 out of ~9000, they serve only ~500 little-known home-office or vanity domains out of more than 2 million domains with DANE TLSA records for SMTP. A problem with the TLSA records of an MX host serving a vanity domain for a single hobbyist's email is less significant than a problem with a professionally operated MX host that supports thousands of domains or millions of user mailboxes.
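The gap between the two ways of counting is easy to see with the approximate figures above. The short Python sketch below (the helper name `rate` is ours) treats those numbers as rounded inputs; since the domain total is "over 2 million", the per-domain figure is an upper bound.

```python
def rate(bad, total):
    """Share of `bad` items among `total` items, as a percentage."""
    return 100.0 * bad / total


# Approximate, rounded figures from the text (illustrative only).
BAD_MX_HOSTS, ALL_MX_HOSTS = 400, 9_000
AFFECTED_DOMAINS, ALL_DANE_DOMAINS = 500, 2_000_000

per_host = rate(BAD_MX_HOSTS, ALL_MX_HOSTS)            # every MX host counts once
per_domain = rate(AFFECTED_DOMAINS, ALL_DANE_DOMAINS)  # every DANE domain counts once

print(f"per MX host: {per_host:.2f}%")    # per MX host: 4.44%
print(f"per domain:  {per_domain:.3f}%")  # per domain:  0.025%
```

Counted per MX host, about 4.4% look broken; counted per affected domain, the share drops to about 0.025%, more than 170 times smaller.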
Of course, even our measurements don't tell the full story of the impact: we don't know how many mailboxes are actively used in each domain. The takeaway is that all such metrics, based solely on what can be directly measured, should be taken with a grain of salt.
As deployment of DANE (TLSA) validation in outbound SMTP grows, misconfigurations become more readily apparent to domain owners, and as more DANE-aware tools for automating certificate updates become available, we expect the error rate to decline. In any case, our measured error rate is ~4x lower than originally reported in the paper, and its real-world impact is many times lower still.
… buggy TLS applications do not bother to check the validity of certificates
This measurement did not properly account for the specification, and therefore produced an inflated result. The authors failed to note that, per RFC7672, SMTP clients using DANE-EE(3) records MUST ignore both the certificate's expiration date and all names in the certificate.
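For concreteness, the following Python sketch shows what matching a DANE-EE(3) record involves. The selected bytes (the whole certificate for selector 0, or its SubjectPublicKeyInfo for selector 1, as defined in RFC 6698) are optionally hashed and compared with the record's data, and nothing else about the certificate is examined. The function names are hypothetical, and the sketch assumes the caller has already extracted the DER-encoded certificate and SPKI (e.g. with an X.509 library, not shown).

```python
import hashlib

# TLSA matching types (RFC 6698): 0 = exact bytes, 1 = SHA-256, 2 = SHA-512.
_DIGESTS = {1: hashlib.sha256, 2: hashlib.sha512}


def tlsa_digest(mtype, selected):
    """Apply TLSA matching type `mtype` to the selected DER bytes."""
    if mtype == 0:
        return selected
    return _DIGESTS[mtype](selected).digest()


def dane_ee_matches(selector, mtype, assoc_data, cert_der, spki_der):
    """Check one DANE-EE(3) TLSA record against the server's leaf certificate.

    Selector 0 selects the full certificate, selector 1 its
    SubjectPublicKeyInfo.  Per RFC7672, names and the validity period are
    ignored for DANE-EE(3), so this function never looks at them.
    """
    if selector not in (0, 1):
        raise ValueError(f"unsupported selector {selector}")
    selected = cert_der if selector == 0 else spki_der
    return tlsa_digest(mtype, selected) == assoc_data


if __name__ == "__main__":
    # Hypothetical placeholder bytes; a "3 1 1" record holds SHA-256(SPKI).
    record_data = hashlib.sha256(b"spki-der").digest()
    print(dane_ee_matches(1, 1, record_data, b"cert-der", b"spki-der"))  # True
```

An expired or "wrongly named" certificate therefore still matches, as long as its key (or full encoding) is the one published in DNS.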
The DANE-TA(2) description is not correct
The authors may not have fully understood the semantics of the DANE-TA(2) and DANE-EE(3) certificate usages. The CAs matched by DANE-TA(2) records are not necessarily root CAs in the usual sense of self-signed, top-level certification authority certificates. Nor are the leaf certificates necessarily signed directly by the DANE-TA(2) trust anchor.
Verification of a certificate chain via a DANE-TA(2) trust anchor still uses PKIX validation; it simply employs the DANE-supplied trust anchor rather than one of the CAs (typically CA/B Forum WebPKI CAs) pre-configured on the client. Specifically, by design it ignores any pre-configured CAs and uses only the DANE-supplied value.
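A deliberately simplified Python sketch of DANE-TA(2) chain validation follows. It is our own toy model: `Cert`, `signed_by` and `dane_ta_validate` are hypothetical names, issuer-name chaining stands in for real signature verification, and only selector 0 with SHA-256 matching is handled. Real validation must also check signatures, validity periods, basic constraints and the server's names. What the sketch does show is that the anchor can be an intermediate CA, that the leaf can sit several links below it, and that no client-side root store is consulted.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Cert:
    subject: str
    issuer: str
    der: bytes  # placeholder for the certificate's DER encoding


def signed_by(child, parent):
    # Toy stand-in for real signature verification: issuer-name chaining only.
    return child.issuer == parent.subject


def dane_ta_validate(chain, ta_sha256):
    """Toy check of a server chain (leaf first) against a DANE-TA(2) record
    with selector 0 (full certificate) and matching type 1 (SHA-256).

    Only the certificate named by the TLSA record is trusted; no client-side
    root store is consulted.  The anchor may be any CA certificate above the
    leaf, and need not be self-signed or the leaf's direct issuer.
    """
    for i, cert in enumerate(chain[1:], start=1):
        if hashlib.sha256(cert.der).digest() == ta_sha256:
            # Every link from the leaf up to the anchor must verify.
            return all(signed_by(chain[k], chain[k + 1]) for k in range(i))
    return False  # no certificate in the chain matches the TLSA record


if __name__ == "__main__":
    leaf = Cert("mx.example.net", "Sub CA", b"leaf")
    sub = Cert("Sub CA", "Org CA", b"sub")
    org = Cert("Org CA", "Some Root", b"org")  # not self-signed
    print(dane_ta_validate([leaf, sub, org], hashlib.sha256(b"org").digest()))  # True
```

Here the TLSA record designates "Org CA", which is neither a self-signed root nor the leaf's direct issuer, yet the chain validates.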
Thus, the DANE operational practice recommends to avoid using PKIX-EE and PKIX-TA
More to the point, RFC7672 specifically designates these usages as "not applicable with opportunistic DANE TLS" and goes on to state: "SMTP client treatment of TLSA RRs with certificate usages PKIX-TA(0) or PKIX-EE(1) is undefined."
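In practice, an SMTP client can simply set such records aside before matching. A minimal Python sketch, assuming a hypothetical `TLSA` record type and taking our reading of RFC7672 that PKIX-TA(0) and PKIX-EE(1) records, like any other unsupported usage, may be treated as "unusable" (this is not a description of any particular MTA):

```python
from typing import NamedTuple


class TLSA(NamedTuple):
    usage: int
    selector: int
    mtype: int
    data: bytes


# Certificate usages that opportunistic DANE TLS for SMTP actually relies on.
DANE_TA, DANE_EE = 2, 3


def usable_for_smtp(rrset):
    """Keep records with usage 2 or 3, selector 0 or 1, matching type 0-2.

    RFC7672 leaves the treatment of PKIX-TA(0) and PKIX-EE(1) undefined;
    this sketch drops them as "unusable", like any unsupported parameter.
    """
    return [r for r in rrset
            if r.usage in (DANE_TA, DANE_EE)
            and r.selector in (0, 1)
            and r.mtype in (0, 1, 2)]
```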
The above is not an exhaustive list of problems, but these are some of the more significant ones that we found in the paper. The paper's measurements and conclusions are substantially undermined by these and other issues.