Understanding modern SMTP and email Anti Spam protocols (SPF, DKIM, DMARC)
SMTP protocol
This article does not go into the details of the protocol itself; for those, see Wikipedia: https://en.wikipedia.org/wiki/Simple_Mail_Transfer_Protocol
There is only one key point here: SMTP distinguishes between the Envelope and the Data. Everything we normally see in a mail client is DATA, including the From: and To: header fields. The Envelope, however, carries its own MAIL FROM and RCPT TO values, and these are generally not the same as From:/To: in DATA. More on this below.
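To make the Envelope/DATA split concrete, here is a minimal sketch that prints the client side of a simplified SMTP session. The addresses and the `smtp_dialogue` helper are made up for illustration, and details such as EHLO, server replies and dot-stuffing are left out. RCPT TO is the actual SMTP command name for the envelope recipient.

```python
from email.message import EmailMessage


def smtp_dialogue(envelope_from, envelope_to, msg):
    """Return the client-side lines of a simplified SMTP session.

    Hypothetical helper: it only illustrates which values travel in the
    Envelope (MAIL FROM / RCPT TO) and which travel inside DATA.
    """
    lines = [f"MAIL FROM:<{envelope_from}>"]
    lines += [f"RCPT TO:<{rcpt}>" for rcpt in envelope_to]
    lines.append("DATA")
    lines += msg.as_string().splitlines()  # headers and body, all inside DATA
    lines.append(".")  # a lone dot ends the DATA section
    return lines


msg = EmailMessage()
msg["From"] = "Alice <alice@example.com>"  # what the mail client displays
msg["To"] = "Bob <bob@example.org>"
msg["Subject"] = "Hello"
msg.set_content("Hi Bob")

# The envelope sender is deliberately different from the From: header.
dialogue = smtp_dialogue("bounce-123@mail.example.com", ["bob@example.org"], msg)
print("\n".join(dialogue))
```

Python's smtplib mirrors this split: `SMTP.sendmail(from_addr, to_addrs, msg)` takes the envelope addresses as arguments separate from the message itself.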
Agents
Modern SMTP divides the process of delivering an email into a number of agents:
- MUA: Mail user agent, i.e. the mail client, such as Thunderbird. With webmail, the web page you see is the MUA.
- MSA: Mail submission agent, the first stop that the mail service provider offers to the user; this is what mail clients call the "SMTP server".
- MTA: Mail transfer agent, which relays mail.
- MDA: Mail delivery agent, the server that hosts the recipient's mailbox. The recipient's MUA contacts it to pick up the mail (e.g. over POP3 or IMAP).
A standard delivery process is:
MUA -> MSA -> MTA -> … -> MTA -> MDA -> MUA
Except for the last hop, MDA to MUA, all agents talk to each other over SMTP. The protocol behaves differently, however, depending on which agents are involved:
MUA -> MSA:
This is the first stop, where an email enters the Internet. The MSA is responsible for verifying that the email is authentic, that is, that the MUA has the right to send it. Usually this is done with SMTP AUTH, using the username and password that the mail service provider issued to the user. The MSA also signs the email with DKIM.
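As a rough sketch of this first hop, the function below submits a message to an MSA using Python's smtplib with STARTTLS and SMTP AUTH. The host name, credentials and the `smtp_factory` parameter are assumptions for illustration (the parameter only exists so the function can be exercised without a real server); port 587 is the standard submission port.

```python
import smtplib
import ssl
from email.message import EmailMessage


def build_message():
    """Build a small example message (all addresses are placeholders)."""
    msg = EmailMessage()
    msg["From"] = "alice@example.com"
    msg["To"] = "bob@example.org"
    msg["Subject"] = "Submitted through an MSA"
    msg.set_content("Hello from the MUA.")
    return msg


def submit(msg, host, username, password, port=587, smtp_factory=smtplib.SMTP):
    """Hand `msg` to the MSA: connect, upgrade to TLS, authenticate, send."""
    with smtp_factory(host, port) as smtp:
        smtp.starttls(context=ssl.create_default_context())
        smtp.login(username, password)  # SMTP AUTH: proves the MUA may send
        smtp.send_message(msg)  # envelope is derived from the headers


# Example call (needs a real account, so it is left commented out):
# submit(build_message(), "smtp.example.com", "alice@example.com", "app-password")
```

Only this hop uses a password. The later hops have no shared secret, which is why they rely on the checks described below.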
MSA -> MTA, MTA -> MTA, MTA -> MDA:
These three are equivalent as far as mail relay is concerned (in the simplest case the MSA connects directly to the MDA, and if the mail stays within one service provider, e.g. gmail to gmail, the MSA and MDA can even be the same process). The receiving agent also checks the authenticity of the sending agent, but not by username and password (described in more detail below).
It is worth noting that every one of these agents decides where the mail comes from and where it goes using MAIL FROM and RCPT TO in the Envelope; From: and To: in DATA do not determine where the mail goes. We discuss the two separately below.
MAIL FROM
Generally speaking, every agent except the MUA sets MAIL FROM to a value that it controls. As a concrete example, I own the domain c7.io, which is configured with Cloudflare (CF) email forwarding, so any email sent to my address at c7.io is forwarded to my gmail mailbox.
So, if an outlook.com mailbox sends mail to that c7.io address, the mail is delivered like this:
- The outlook MSA sends the mail to the c7.io MTA (that is, CF's MTA). MAIL FROM is a long generated address (note that this is not the sender, but an identifier generated by outlook itself).
- The c7.io MTA sends the mail to the gmail MDA. MAIL FROM is another long generated address (here its domain becomes c7.io).
As you can see, MAIL FROM changes at every hop, yet each value still uniquely identifies the corresponding email. This lets the mail service provider do additional processing of bounce messages, and it gives the anti-spam protocols something to check at each hop.
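The idea of a per-message MAIL FROM can be sketched as follows. This is a purely hypothetical scheme, not what outlook or CF actually do: the token format, the `pending` table and the domain `relay.example` are all invented for illustration.

```python
import hashlib

# Maps a generated envelope sender back to the message it was created for,
# so that a later bounce can be matched to the original mail.
pending = {}


def rewrite_mail_from(message_id, original_sender, relay_domain):
    """Return a unique MAIL FROM under a domain that this relay controls."""
    token = hashlib.sha256(message_id.encode()).hexdigest()[:16]
    address = f"bounce-{token}@{relay_domain}"
    pending[address] = (message_id, original_sender)
    return address


first = rewrite_mail_from("<msg-1@example.com>", "alice@example.com", "relay.example")
second = rewrite_mail_from("<msg-2@example.com>", "alice@example.com", "relay.example")
print(first, second)
```

Because the address lives under the relay's own domain, a bounce comes back to the relay, and the SPF check at the next hop is made against the relay's domain rather than the original sender's.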
RCPT TO
While we're on the subject of MAIL FROM, let's mention RCPT TO. It has nothing to do with spam. RCPT TO is generally the same as To: plus Cc: in DATA, unless there is a Bcc. A Bcc'd address does not appear in DATA at all (that's why it is called blind); it appears only in RCPT TO.
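A small sketch of the Bcc behaviour, assuming the sending side starts from a message that still carries a Bcc header: the envelope recipients are collected from To, Cc and Bcc, and the Bcc header is then removed before the message becomes DATA. The addresses and the `envelope_recipients` helper are illustrative.

```python
from email.message import EmailMessage
from email.utils import getaddresses


def envelope_recipients(msg):
    """Collect the RCPT TO addresses from the To, Cc and Bcc headers."""
    fields = []
    for name in ("To", "Cc", "Bcc"):
        fields.extend(msg.get_all(name, []))
    return [addr for _display, addr in getaddresses(fields) if addr]


msg = EmailMessage()
msg["From"] = "alice@example.com"
msg["To"] = "bob@example.org"
msg["Cc"] = "carol@example.org"
msg["Bcc"] = "dave@example.org"
msg.set_content("Quarterly numbers attached.")

rcpt_to = envelope_recipients(msg)  # goes into the Envelope
del msg["Bcc"]  # the blind recipient must not appear in DATA
data = msg.as_string()  # goes into DATA

print(rcpt_to)
print(data)
```

`smtplib.SMTP.send_message` does much the same on its own: it derives the recipients from To, Cc and Bcc and does not transmit Bcc headers.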
Anti spam
The SMTP protocol itself was not designed with spam in mind, so it has been patched many times. As mentioned above, when the MUA connects to the MSA, the MSA checks the user's username and password, and it naturally also checks the From: field in DATA to make sure the logged-in user has the right to send email under that identity. Between MSA, MTA and MDA, however, the agents are peer programs run by different service providers, so there is no way for them to set up passwords with each other in advance. This is why SPF and DKIM exist.
SPF
https://en.wikipedia.org/wiki/Sender_Policy_Framework
SPF is a very simple and straightforward protocol, used when mail is transferred between MSAs, MTAs and MDAs. The receiver queries the SPF record (a DNS TXT record) of the domain in the sender's MAIL FROM. This record lists the IPs that are allowed to send for the domain. For example, when the c7.io MTA sends mail to the gmail MDA, the gmail server finds that c7.io has the SPF record: v=spf1 include:_spf.mx.cloudflare.net ~all . If the IP the mail comes from is not covered by the record published at _spf.mx.cloudflare.net, the mail fails SPF (the ~all qualifier makes this a soft fail).
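The check can be sketched in a few lines. This is a deliberately reduced evaluator that understands only the ip4, ip6, include and all mechanisms; the `records` dictionary stands in for DNS TXT lookups, and the domains and IP ranges in it are invented (documentation ranges), not anyone's real SPF data.

```python
import ipaddress

QUALIFIERS = {"+": "pass", "-": "fail", "~": "softfail", "?": "neutral"}


def check_spf(domain, client_ip, records, depth=0):
    """Evaluate a small subset of SPF for `client_ip` sending as `domain`."""
    record = records.get(domain)
    if record is None or not record.startswith("v=spf1"):
        return "none"
    if depth > 10:  # crude guard against include loops
        return "permerror"
    ip = ipaddress.ip_address(client_ip)
    for term in record.split()[1:]:
        qualifier = "+"  # a mechanism without a qualifier means pass
        if term[0] in QUALIFIERS:
            qualifier, term = term[0], term[1:]
        name, _, value = term.partition(":")
        if name == "all":
            matched = True
        elif name in ("ip4", "ip6"):
            matched = ip in ipaddress.ip_network(value, strict=False)
        elif name == "include":
            # include matches only if the included record yields a pass
            matched = check_spf(value, client_ip, records, depth + 1) == "pass"
        else:
            continue  # a, mx, exists, ... are outside this sketch
        if matched:
            return QUALIFIERS[qualifier]
    return "neutral"


# Hypothetical records standing in for DNS.
records = {
    "example.com": "v=spf1 include:_spf.mailhost.example ~all",
    "_spf.mailhost.example": "v=spf1 ip4:192.0.2.0/24 ip6:2001:db8::/32 -all",
}

print(check_spf("example.com", "192.0.2.10", records))
print(check_spf("example.com", "198.51.100.7", records))
```

The first call comes from an address inside the included range, the second from outside it, so the second one falls through to the ~all term.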
DKIM
https://en.wikipedia.org/wiki/DomainKeys_Identified_Mail
SPF only establishes the identity of the previous hop; it does not stop a malicious MTA from tampering with or forging mail. Hence DKIM. When an email passes through an agent (except the MUA), the agent signs selected headers and the body of the email (all within DATA) and puts the signature into DATA as well, as a DKIM-Signature header. The agent signs with a private key, and the corresponding public key is published in a DKIM DNS record under the signing domain. In spirit, the purpose of DKIM is to verify that the message in DATA has not been tampered with in transit, so the signature generated by the MSA is the most valuable one. In practice, however, it seems that the receiver verifies every signature that the agents along the way have inserted.
As a concrete example, take outlook -> c7.io -> gmail again. The outlook MSA inserts a signature made with outlook's private key. When the c7.io MTA receives the mail, it queries outlook.com's DKIM record, that is, outlook's public key, and uses it to verify the signature. The c7.io MTA then signs the existing DATA again with the c7.io private key. When the gmail MDA receives the email, it queries both the outlook and the c7.io public keys (e.g. k=rsa; p=MIGfMA0GCSqGSIb3DQEBAQUAA4GNADCBiQKBgQDckJFiBtn29uLex8LM2DG4zvZ9doM9v8veISK5rAoS2yU517rqZN/gYGwhKVuvfmp86OJGKG2Z6SQG9JmcNQ7rGiVE6X99M71hm449ShkF29hG65lI9sFpjf/67bjnQcgwwj6q4aNKb9Rh3zc/gV4jtz+vfzaMTTcAdZbd8hKX3wIDAQAB ), then verifies that both signatures are valid. If any signature is invalid, the result is a DKIM fail.
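The sketch below covers two of the simpler pieces of DKIM verification using only the standard library: finding where the public key lives in DNS from the signature's d= (domain) and s= (selector) tags, and recomputing the body hash (bh=) under the "simple" body canonicalization with SHA-256. It does not verify the RSA signature itself or canonicalize headers. The header value is a made-up example, and its b= value is a placeholder rather than a real signature.

```python
import base64
import hashlib


def parse_dkim_tags(header_value):
    """Split a DKIM-Signature value into its tag=value pairs."""
    tags = {}
    for part in header_value.split(";"):
        key, sep, value = part.partition("=")
        if sep:
            # values may be folded across lines, so drop all whitespace
            tags[key.strip()] = "".join(value.split())
    return tags


def dkim_key_name(tags):
    """DNS name of the TXT record that holds the signer's public key."""
    return f"{tags['s']}._domainkey.{tags['d']}"


def simple_body_hash(body):
    """Base64 SHA-256 of the body under 'simple' canonicalization:
    trailing empty lines are ignored and the body ends in one CRLF."""
    while body.endswith(b"\r\n"):
        body = body[:-2]
    body += b"\r\n"
    return base64.b64encode(hashlib.sha256(body).digest()).decode()


# Hypothetical signature header for a message with an empty body.
signature = (
    "v=1; a=rsa-sha256; c=simple/simple; d=example.com; s=sel1;\r\n"
    " h=from:to:subject; bh=frcCV1k9oG9oKj3dpUqdJg1PxRT2RSN/XKdLCPjaYaY=;\r\n"
    " b=ZmFrZQ=="
)

tags = parse_dkim_tags(signature)
print(dkim_key_name(tags))
print(simple_body_hash(b"") == tags["bh"])
```

A real verifier would fetch the TXT record at that DNS name, take the p= value as the public key, and check the b= signature over the listed headers.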
DMARC
https://en.wikipedia.org/wiki/DMARC
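The purpose of DMARC is to tell the receiving agent what to do with an email that fails authentication. Strictly speaking, DMARC passes when at least one of SPF or DKIM passes for a domain that matches the From: header, and the policy applies when neither does. The policy is also published as a DNS record of the domain. It can ask the agent to let the email through anyway (p=none), but more commonly it asks the agent to treat the email as spam (p=quarantine) or to reject it (p=reject), and to send reports to a specified address. The receiver is not obliged to comply.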
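As an illustration, the following sketch parses a DMARC record and maps the p= policy to an action. It assumes the usual DMARC rule that a message passes when either SPF or DKIM passes for a domain aligned with the From: header. The record, the report address and the action names ("deliver", "spam", "reject") are invented for the example.

```python
def parse_tags(record):
    """Split a 'tag=value; tag=value' DNS record into a dictionary."""
    tags = {}
    for part in record.split(";"):
        key, sep, value = part.partition("=")
        if sep:
            tags[key.strip()] = value.strip()
    return tags


def dmarc_disposition(record, spf_aligned_pass, dkim_aligned_pass):
    """Decide what a receiver would do under the sender's DMARC policy."""
    tags = parse_tags(record)
    if tags.get("v") != "DMARC1":
        return "no-policy"
    if spf_aligned_pass or dkim_aligned_pass:
        return "deliver"  # one aligned pass is enough
    actions = {"none": "deliver", "quarantine": "spam", "reject": "reject"}
    return actions.get(tags.get("p"), "no-policy")


# Hypothetical record, as it might be published at _dmarc.example.com.
record = "v=DMARC1; p=quarantine; rua=mailto:dmarc-reports@example.com"

print(dmarc_disposition(record, spf_aligned_pass=False, dkim_aligned_pass=True))
print(dmarc_disposition(record, spf_aligned_pass=False, dkim_aligned_pass=False))
```

The rua= tag names the address that receives aggregate reports; this sketch parses it but does nothing with it.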
Other
Many agents will add additional validation. For example, Gmail requires that the reverse DNS of all incoming IPs correspond to the domain name of their MAIL FROM. This seems to be a mirror image of SPF.
Send and receive emails with your own domain name
To summarise the above, if you want to send and receive emails with your own domain name, you need to do the following:
- Sending mail
Configure SPF, DKIM and DMARC correctly; none of them is strictly required, but all are recommended. You also need an MSA that holds the corresponding DKIM private key, which is why it is so hard nowadays to send email directly without a mail relay service (e.g. AWS SES, Mailgun).
- Receiving mail
Just point the domain's MX record at the MDA. Most people use a third-party service, so the MX should point to that third party's server.
Summary
SMTP is a classic example of a protocol that was not fully thought out at design time and has therefore been patched constantly. Gmail and Outlook, at least, now offer a "View Original" feature that shows the raw DATA part of a message. It records what each MTA inserted, including the Envelope addresses the agent saw, the SPF and DKIM checks it performed, and the new DKIM signature it added. You can open a message in your own mailbox and compare it with this article.
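If you save such an original message to a file, a short script can pull out the interesting parts. The sketch below works on an invented raw message: the Return-Path header records the Envelope MAIL FROM, each Received header is one hop, and Authentication-Results carries the receiver's SPF, DKIM and DMARC verdicts. Real headers vary between providers, so treat the regular expression as a starting point only.

```python
import re
from email import message_from_string

# Invented example of what "View Original" might show (heavily shortened).
RAW = """\
Return-Path: <bounce-abc123@mail.example.com>
Received: from mail.example.com (mail.example.com [192.0.2.10])
        by mx.example.net with ESMTPS id abc123
        for <bob@example.net>; Mon, 01 Jan 2024 10:00:00 +0000
Authentication-Results: mx.example.net;
        dkim=pass header.i=@example.com header.s=sel1;
        spf=pass smtp.mailfrom=bounce-abc123@mail.example.com;
        dmarc=pass header.from=example.com
From: Alice <alice@example.com>
To: Bob <bob@example.net>
Subject: Hello

Hi Bob
"""


def auth_summary(raw):
    """Summarise envelope sender, hop count and authentication verdicts."""
    msg = message_from_string(raw)
    summary = {
        "envelope_from": msg["Return-Path"],
        "header_from": msg["From"],
        "hops": len(msg.get_all("Received", [])),
    }
    for value in msg.get_all("Authentication-Results", []):
        for method, outcome in re.findall(r"\b(spf|dkim|dmarc)=(\w+)", value):
            summary.setdefault(method, outcome)  # keep the first verdict seen
    return summary


summary = auth_summary(RAW)
print(summary)
```

Comparing envelope_from with header_from in your own messages shows the MAIL FROM rewriting described earlier.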
This article is sketchy on some details, because some of the more complex areas are hard to explain clearly in plain language. I hope it can serve as an outline. If you run into specific problems in production, you will still need to read the protocol standards.