Understanding modern SMTP and email Anti Spam protocols (SPF, DKIM, DMARC)

SMTP protocol

This article does not go into the details of the protocol itself; for those, see Wikipedia: https://en.wikipedia.org/wiki/Simple_Mail_Transfer_Protocol

There is only one key point here: SMTP distinguishes between the Envelope and the DATA. Everything we normally see in a mail client is DATA, including the From: and To: fields. The Envelope, however, carries its own MAIL FROM and RCPT TO values, and these two are generally not the same as the From/To in DATA! More on this below.
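
To make the Envelope/DATA split concrete, here is a minimal Python sketch that renders the lines a client would send for a single message. The helper name smtp_dialogue and all addresses are made up for illustration, and the dialogue is simplified (no EHLO, no server replies, no dot-stuffing).

```python
from email.message import EmailMessage


def smtp_dialogue(envelope_from, envelope_to, msg):
    """Return the client-side lines of a simplified SMTP session.

    Hypothetical helper for illustration only: EHLO, server replies
    and dot-stuffing are left out.
    """
    lines = [f"MAIL FROM:<{envelope_from}>"]
    # One RCPT TO command per envelope recipient.
    lines += [f"RCPT TO:<{rcpt}>" for rcpt in envelope_to]
    lines.append("DATA")
    # Everything after DATA (headers and body) is what a mail client displays.
    lines += msg.as_string().splitlines()
    lines.append(".")
    return lines


msg = EmailMessage()
msg["From"] = "Alice <alice@example.com>"
msg["To"] = "Bob <bob@example.org>"
msg["Subject"] = "Hello"
msg.set_content("Hi Bob")

# The envelope addresses are chosen independently of the headers above.
dialogue = smtp_dialogue("bounce-123@mailer.example.net", ["bob@example.org"], msg)
print("\n".join(dialogue))
```

Note that the address in MAIL FROM never appears in the headers, and nothing in the protocol forces the two to agree.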

Agents

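Modern SMTP splits the delivery of an email across several kinds of agents: the MUA (Mail User Agent, i.e. the mail client), the MSA (Mail Submission Agent), the MTA (Mail Transfer Agent) and the MDA (Mail Delivery Agent).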

A standard delivery process is:

MUA -> MSA -> MTA -> … -> MTA -> MDA -> MUA

Except for the last hop from MDA to MUA, all agents talk to each other over SMTP. The behaviour of the protocol differs depending on which agents are talking:

MUA -> MSA:

This is the first stop where an email enters the Internet. The MSA is responsible for verifying the authenticity of the email, that is, verifying that the submitting MUA has the right to send it. This is usually done with SMTP AUTH, using the username and password that the mail service provider issues to the user. The MSA also signs the email with DKIM.

MSA -> MTA, MTA -> MTA, MTA -> MDA:

These three hops are equivalent as far as mail relay is concerned (the MSA may even connect directly to the MDA, and if the mail stays within the same service provider, e.g. gmail to gmail, the MSA and MDA can be the same process). The receiving agent also checks the authenticity of the sending agent, but not by username and password (described in more detail below).

It is worth noting that all of these agents decide where mail comes from and where it goes by looking at MAIL FROM and RCPT TO in the Envelope; the From and To in DATA do not determine where the mail goes. We discuss the two Envelope fields separately below.

MAIL FROM

Generally speaking, every agent except the MUA sets MAIL FROM to a value that it controls. As a concrete example, I own the domain c7.io, which is configured with Cloudflare's email forwarding, which means that any email sent to [email protected] will be forwarded to my gmail mailbox.

So, if an outlook.com mailbox sends mail to [email protected], the mail will be delivered like this:

As you can see, the MAIL FROM changes at every hop, yet each value still contains an identifier that uniquely identifies the email. The purpose is to let the mail service provider do its own processing of bounce messages, and to give the anti-spam protocols described below something to check.

RCPT TO

While we're on the subject of MAIL FROM, let's mention RCPT TO. It has nothing to do with spam. RCPT TO is generally the same as To + Cc in DATA, unless there is a Bcc. A Bcc address does not appear in DATA (that's why it is called blind); it only appears in RCPT TO.
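
As a rough Python sketch of this rule, the hypothetical helper split_envelope below collects the RCPT TO list from the To, Cc and Bcc headers and then drops Bcc before the message would be sent as DATA. The addresses are placeholders. Python's own smtplib.SMTP.send_message treats Bcc in much the same way.

```python
from email.message import EmailMessage
from email.utils import getaddresses


def split_envelope(msg):
    """Return (envelope recipients, message with the Bcc header removed)."""
    fields = []
    for name in ("To", "Cc", "Bcc"):
        fields += [str(value) for value in msg.get_all(name, [])]
    recipients = [addr for _name, addr in getaddresses(fields) if addr]
    # Bcc recipients stay in the envelope but must not be visible in DATA.
    del msg["Bcc"]
    return recipients, msg


msg = EmailMessage()
msg["From"] = "alice@example.com"
msg["To"] = "a@example.com"
msg["Cc"] = "b@example.com"
msg["Bcc"] = "c@example.com"
msg.set_content("Hello")

rcpt_to, data = split_envelope(msg)
print(rcpt_to)
print(data.as_string())
```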

Anti spam

The SMTP protocol itself was not designed with spam in mind, so many patches have been added on top of it. As mentioned above, when the MUA submits to the MSA, the MSA checks the user's username and password, and naturally also checks the From field in DATA to make sure the logged-in user has the right to send email under that identity. Between MSA, MTA and MDA, however, the agents are peer programs run by different service providers, so there is no way to set up passwords with each other in advance. This is why SPF and DKIM exist.

SPF

https://en.wikipedia.org/wiki/Sender_Policy_Framework

SPF is a simple, straightforward protocol used when mail is transferred between MSAs, MTAs and MDAs. The receiver looks up the SPF record (a DNS TXT record) of the domain in the sender's MAIL FROM. The record lists the IP addresses that are allowed to send mail for that domain. For example, when the c7.io MTA sends mail to the gmail MDA, the gmail server finds that c7.io has the SPF record: v=spf1 include:_spf.mx.cloudflare.net ~all . If the IP address the mail comes from is not authorised by the record at _spf.mx.cloudflare.net, the SPF check fails (with ~all, as a softfail).
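
The check can be sketched in a few lines of Python. This is a deliberately tiny subset of SPF (only the ip4, include and all mechanisms), and the DNS data is a hard-coded dictionary: the domain names and address ranges in FAKE_DNS are invented for the example and are not the real c7.io or Cloudflare records.

```python
import ipaddress

# Hypothetical TXT records; the ranges are reserved documentation ranges.
FAKE_DNS = {
    "example.com": "v=spf1 include:_spf.mx.example.net ~all",
    "_spf.mx.example.net": "v=spf1 ip4:192.0.2.0/24 ip4:198.51.100.0/24 -all",
}

QUALIFIERS = {"+": "pass", "-": "fail", "~": "softfail", "?": "neutral"}


def check_spf(domain, ip, dns=FAKE_DNS):
    """Evaluate a small subset of SPF for the MAIL FROM domain and sender IP."""
    record = dns.get(domain)
    if record is None or not record.startswith("v=spf1"):
        return "none"
    addr = ipaddress.ip_address(ip)
    for term in record.split()[1:]:
        qualifier = "+"  # a mechanism without a qualifier means "pass"
        if term[0] in QUALIFIERS:
            qualifier, term = term[0], term[1:]
        if term == "all":
            matched = True
        elif term.startswith("ip4:"):
            matched = addr in ipaddress.ip_network(term[4:], strict=False)
        elif term.startswith("include:"):
            # An include matches only when the included record yields "pass".
            matched = check_spf(term[8:], ip, dns) == "pass"
        else:
            continue  # mechanisms outside this sketch are skipped
        if matched:
            return QUALIFIERS[qualifier]
    return "neutral"


print(check_spf("example.com", "192.0.2.10"))
print(check_spf("example.com", "203.0.113.5"))
```

The first call matches the included ip4 range and passes; the second falls through to ~all and is a softfail.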

DKIM

https://en.wikipedia.org/wiki/DomainKeys_Identified_Mail

SPF only checks the identity of the agent at the previous hop; it does not stop a malicious MTA from forging the mail itself. Hence DKIM. When an email passes through an agent (other than the MUA), the agent can sign selected headers and the body of the email (all part of DATA) and place the signature in DATA as well, as a DKIM-Signature header. The agent signs with a private key, and the matching public key is published in a DKIM record (a DNS TXT record) of the signing domain. In the spirit of DKIM, the purpose is to verify that the message in DATA has not been tampered with in transit, so the signature generated by the MSA is the most valuable one. In practice, however, it seems that the receiver verifies the signature of every agent that inserted one.

As a concrete example, take outlook -> c7.io -> gmail again. The outlook MSA inserts a signature made with the outlook private key. When the c7.io MTA receives the mail, it queries outlook.com's DKIM record, that is, outlook's public key, and uses it to verify that the signature is valid. The c7.io MTA then signs the existing DATA again with the c7.io private key. When the gmail MDA receives the email, it queries both the outlook and the c7.io public keys (e.g. k=rsa; p=MIGfMA0GCSqGSIb3DQEBAQUAA4GNADCBiQKBgQDckJFiBtn29uLex8LM2DG4zvZ9doM9v8veISK5rAoS2yU517rqZN/gYGwhKVuvfmp86OJGKG2Z6SQG9JmcNQ7rGiVE6X99M71hm449ShkF29hG65lI9sFpjf/67bjnQcgwwj6q4aNKb9Rh3zc/gV4jtz+vfzaMTTcAdZbd8hKX3wIDAQAB ) and verifies that both signatures are valid. If a signature does not verify, that is a DKIM fail.
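
The mechanical parts of this can be sketched in Python: parsing the tag list used by both the DKIM-Signature header and the key record, deriving the DNS name where the public key is looked up, and computing the body hash. This is only a sketch under assumptions. The function names, the selector selector1 and the domain example.com are invented, the body hash assumes "simple" canonicalization with CRLF line endings, and the RSA verification itself is left out because the Python standard library has no RSA implementation.

```python
import base64
import hashlib


def parse_tags(value):
    """Parse a DKIM tag list such as 'k=rsa; p=...' into a dict."""
    tags = {}
    for part in value.split(";"):
        name, sep, val = part.partition("=")
        if sep:
            # This sketch strips all whitespace from values, which suits
            # base64 tags such as p=, b= and bh=.
            tags[name.strip()] = "".join(val.split())
    return tags


def key_record_name(signature_tags):
    """DNS name of the TXT record with the public key: <s>._domainkey.<d>."""
    return f"{signature_tags['s']}._domainkey.{signature_tags['d']}"


def body_hash_simple(body: bytes) -> str:
    """SHA-256 body hash (bh=) under 'simple' body canonicalization."""
    # Trailing empty lines are ignored; the body ends with exactly one CRLF.
    canon = body.rstrip(b"\r\n") + b"\r\n"
    return base64.b64encode(hashlib.sha256(canon).digest()).decode()


# Hypothetical signature header value; the bh= and b= tags are omitted here.
signature = "v=1; a=rsa-sha256; c=simple/simple; d=example.com; s=selector1; h=from:to:subject"
tags = parse_tags(signature)
print(key_record_name(tags))
print(body_hash_simple(b"Hi Bob\r\n\r\n"))
```

A verifier would compare the computed body hash with the bh= tag, fetch the key from the derived DNS name, and then check the b= signature over the headers listed in h=.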

DMARC

https://en.wikipedia.org/wiki/DMARC

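The purpose of DMARC is to tell a receiving agent what to do with an email that passes neither SPF nor DKIM for the domain in its From header. Like the other two, it is published as a DNS record of the domain (a TXT record at _dmarc.<domain>). A DMARC policy can ask the agent to deliver the email anyway (p=none), but more commonly it asks the agent to treat the email as spam (p=quarantine) or to reject it (p=reject), and to send reports to a specified address. The receiver is not obliged to comply with the policy.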
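
A rough Python sketch of how a receiver might apply such a policy follows. It is an approximation under stated assumptions: the function names and the example record are invented, and the alignment check simply treats a domain and its subdomains as aligned, whereas real implementations compare organizational domains using the Public Suffix List.

```python
def parse_dmarc(record):
    """Parse 'v=DMARC1; p=quarantine; ...' into a dict of tags."""
    tags = {}
    for part in record.split(";"):
        name, sep, val = part.partition("=")
        if sep:
            tags[name.strip()] = val.strip()
    return tags


def aligned(domain, from_domain):
    """Approximate relaxed alignment: equal, or one is a subdomain of the other."""
    a, b = domain.lower(), from_domain.lower()
    return a == b or a.endswith("." + b) or b.endswith("." + a)


def dmarc_disposition(record, from_domain, spf_result, spf_domain,
                      dkim_result, dkim_domain):
    """Return 'deliver', 'spam' or 'reject' for one message."""
    policy = parse_dmarc(record).get("p", "none")
    spf_ok = spf_result == "pass" and aligned(spf_domain, from_domain)
    dkim_ok = dkim_result == "pass" and aligned(dkim_domain, from_domain)
    if spf_ok or dkim_ok:
        return "deliver"  # one aligned pass is enough
    return {"quarantine": "spam", "reject": "reject"}.get(policy, "deliver")


# Hypothetical record for example.com.
record = "v=DMARC1; p=quarantine; rua=mailto:dmarc-reports@example.com"
print(dmarc_disposition(record, "example.com",
                        "pass", "relay.example.net", "fail", "example.com"))
```

In the example the SPF pass belongs to an unrelated domain and DKIM fails, so the quarantine policy applies.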

Other

Many agents add further checks of their own. Gmail, for example, expects the sending IP to have a reverse DNS (PTR) record whose hostname resolves back to the same IP. This is something like a mirror image of SPF.

Send and receive emails with your own domain name

To summarise the above, if you want to send and receive emails with your own domain name, you need to do the following.

To send: configure SPF, DKIM and DMARC correctly. This is not strictly required, but it is recommended. You also need an MSA that holds the corresponding DKIM private key, which is why it is so hard nowadays to send email directly without a mail relay (e.g. AWS SES, Mailgun).

To receive: point the MX record of the domain at the MDA. Most people use a third-party service, so the MX should point to that service's server.

Summary

SMTP is a classic example of a protocol that was not thought through at design time and has therefore been patched ever since. At least Gmail and Outlook now offer a "View Original" feature, which shows the raw DATA part of a message. It shows in detail what each MTA inserted, including the Envelope addresses the agent saw, the SPF and DKIM checks it performed, and the new DKIM signature. You can open a message in your own mailbox and compare it with this article.
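
If you save such a raw message, the check results recorded by the receiver can be pulled out with a short Python sketch like the one below. The sample message in RAW is invented, and the parsing is simplified: it reads only the first Authentication-Results header and does not handle comments in parentheses.

```python
import re
from email import message_from_string

# Hypothetical raw message, shaped like what "View Original" shows.
RAW = """\
Authentication-Results: mx.example.net;
 dkim=pass header.i=@example.com header.s=selector1;
 spf=pass smtp.mailfrom=bounce@example.com;
 dmarc=pass header.from=example.com
From: Alice <alice@example.com>
To: Bob <bob@example.org>
Subject: Hello

Hi Bob
"""


def auth_results(raw):
    """Map each method (dkim, spf, dmarc) to its result."""
    msg = message_from_string(raw)
    header = msg.get("Authentication-Results", "")
    results = {}
    # The part before the first ';' names the server that did the checks;
    # each later entry looks like 'method=result key=value ...'.
    for entry in header.split(";")[1:]:
        match = re.match(r"\s*([a-z0-9-]+)=([a-z]+)", entry, re.IGNORECASE)
        if match:
            results[match.group(1).lower()] = match.group(2).lower()
    return results


print(auth_results(RAW))
```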

This article is a bit sketchy in some details, because some of the complex areas are hard to explain clearly in plain language. I hope it can serve as an outline. If you run into problems in production, you will still need to read the protocol standards.