explained from first principles

This article
and its code were
first published on 7 May 2021
and last modified
on 9 December 2022.

If you are visiting this website for the first time,
then please first read the front page,
where I explain the intention of this blog
and how to best make use of it.
As far as your privacy is concerned,
all data entered on this page is stored locally in your browser
unless noted otherwise.
While I researched the content on this page thoroughly,
you take or omit actions based on it at your own risk.
In no event shall I as the author be liable
for any damages arising from information or advice
on this website or on referenced websites.

Preface

Being one of the oldest services on the Internet,
email has been with us for decades and will remain with us for at least another decade.
Even though email plays an important role in everyday life, most people know very little about how it works.
Before we roll up our sleeves and change this, here are a few things that you should know:

Impact

This article had the following impact in the email industry (beyond additional DNS records):

If you made changes in your software project because of this article,
let me know so that I can add your change to the list above.

Terminology

Email, which also used to be written as e-mail, stands for electronic mail.
Since the term electronic mail applies to any mail that is transferred electronically, it also encompasses
fax, SMS, and other systems.
For this reason, I use only the short form email in this article and always mean the decentralized system
to transfer messages over the Internet as documented in numerous RFCs.
The term email doesn’t appear in the original RFC,
and many RFCs just use mail or (Internet) message instead.
In ordinary language, email refers both to the system of standards
and to individual messages transmitted via these standards.
While the English language would allow us to distinguish between the two usages
by capitalizing the former but not the latter, I’ve never seen anyone do this.
Even though I’m tempted to pioneer the proper use of grammar here,
I’d rather save my artistic license for other things.
(Proper nouns refer to a single entity,
whereas common nouns refer to a class of entities.
Only proper nouns are capitalized in English.
For example, Earth with a capital E refers to the planet we live on,
whereas earth with a lowercase e refers to the soil in which plants grow.)
Note that this is in contrast to Internet,
which is commonly capitalized because there is only one Internet:
You’re either connected to the Internet or not.
Unfortunately, the Internet is becoming increasingly fragmented along country borders
due to legal reasons, such as copyright licenses,
and political reasons, such as censorship.
Therefore, we might have to degrade Internet to a common noun soon.

Concepts

Before diving into the technical aspects of email, let’s first look at email from the perspective of its users.

Message

The purpose of email is to send messages over the Internet.
A message is a recorded piece of information which is delivered asynchronously from a sender to one or several recipients.
Asynchronous communication
means that the recipient can consume a message at an arbitrary point after it has been produced,
rather than having to interact with the sender concurrently.
A message can be transmitted with a physical object, such as a letter,
or with a physical signal, such as an acoustic or electromagnetic wave.
While humans have delivered messages in the form of objects for millennia
with couriers and pigeons,
it’s only since the invention of the optical telegraph
in the late 18th century and the invention of the electrical telegraph
in the middle of the 19th century that we can signal arbitrary messages over long distances.
The fundamental principle of communication stayed the same over all those years:
You can either start a new conversation or continue an existing one by replying to a previous message.

Mailbox

A mailbox is a box for incoming mail (also called an inbox),
into which anyone can deposit messages but from which ideally only the intended recipient can retrieve them.
In some countries, the privacy of such messages is legally protected by the
secrecy of correspondence.

Provider

There are three things that set email apart from the
traditional postal system,
which is sometimes also referred to as snail mail:

  1. Email conveys digital data,
    whereas a letter is a physical item.
    The former is much more useful for further processing.
  2. Email enables instant global delivery at a marginal cost of zero.
    The only fee you pay is for your access to the Internet.
  3. Mailboxes for email are provided and operated by companies,
    which are called mailbox providers.
    While you could operate your own server since email is an open and decentralized system,
    this is rarely done in practice for reasons we discuss later on.

Terminology: Earlier versions of this article used the term
email service provider (ESP)
instead of mailbox provider.
Since the former term is also used to refer to email delivery vendors,
I decided to replace it with the latter term.
Somewhat confusingly, mail service provider (MSP)
is a synonym for mailbox provider even though mail and email are used interchangeably in the context of email.

Which are the most popular mailbox providers?

Please treat all the numbers in this box with caution.
They were surprisingly hard to come by, with the sources being scattered and not necessarily trustworthy.
Additionally, the numbers were reported in different years,
which distorts the market share of these companies.

It is estimated
that around half of the human population uses email,
with an average of 1.75 active accounts per user.
In the Western world, the consumer market is dominated by Google
with their Gmail service, which has 1.5 billion active users.
In China, the biggest player is Tencent QQ with 900 million active accounts.
Outlook by Microsoft
has 400 million active users,
which is followed by Yahoo! Mail with 225 million active users.
Apple’s iCloud has 850 million users,
but it’s not known how many of those use its email functionality.

Address

Email addresses
are used to identify the sender and the recipient(s) of a message.
They consist of a username followed by the @ symbol and a domain name.
The domain name allows the sender to first determine
and then connect to the mail server of each recipient.
The username allows the mail server to determine the mailbox to which a message should be delivered.
The hierarchical Domain Name System ensures that the domain name is unique,
whereas the mailbox provider has to ensure that the name of each user is unique within its domain.
There doesn’t have to be a one-to-one correspondence between addresses and mailboxes:
A mailbox can be identified by several addresses,
and an email sent to a single address can be delivered to multiple mailboxes.
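
To make the two roles of an address concrete, here is a minimal Python sketch that splits an address into the part used to find the mailbox and the part used to find the mail server. The function name split_address is hypothetical, and splitting at the last @ symbol is my assumption so that an @ inside a quoted username does not confuse the split.

```python
from typing import Tuple

def split_address(address: str) -> Tuple[str, str]:
    """Split an address at its last @ into (username, domain name)."""
    username, at, domain = address.rpartition("@")
    if not at or not username or not domain:
        raise ValueError(f"not an email address: {address!r}")
    return username, domain

# The domain name tells the sender which mail server to contact;
# the username tells that server which mailbox to deliver to.
print(split_address("john.smith@example.com"))
```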

Display name

Email protocols accept an optional display name in most places where an email address is expected.
The format for this is Display Name <user@example.com>
according to RFC 5322.
Mail clients display this name to the user as follows:


How Apple Mail shows the display name in the To and From fields – if you have Smart Addresses disabled, which you totally should.

This feature seems totally benign, but, as we will see later on,
it has serious privacy and security implications.
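
As a small illustration of how a display name travels together with an address, the following sketch uses the email.utils module from Python's standard library. The name and address are made-up example values.

```python
from email.utils import formataddr, parseaddr

# Combine a display name and an address into a single header value.
header_value = formataddr(("Alice Example", "alice@example.com"))
print(header_value)  # Alice Example <alice@example.com>

# Split such a header value into its two components again.
name, address = parseaddr(header_value)
print(name)     # Alice Example
print(address)  # alice@example.com
```

Note that nothing in this format ties the display name to the address: the sender chooses both freely.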

The @ symbol

While most of us know the @ symbol
exclusively from email addresses and social media to tag another user,
it has been used for centuries in commerce.
In Spanish and Portuguese, it denoted a custom unit of weight.
In English, it came to mean “at the rate of”, similar to the French à.
The @ symbol was already included in the first edition of the ASCII character set in 1963,
years before the symbol was first used to designate the network host
in a predecessor of today’s email in 1971.

Normalization

In the standard, the part before the @ symbol
is called the local part of an email address.
The interpretation of the local part is completely up to the receiving mail system specified after the @ symbol.
As a sender, you shouldn’t make any assumptions about the recipient’s address.
In particular, implementations must preserve the case
of the letters in the local part, but mail servers are encouraged to deliver messages case-independently.
In other words, it is recommended but not mandatory
that mail servers treat John.Smith and john.smith as the same user.
Some mailbox providers go further than this:
Gmail, for example, removes all dots
from the local part of an address when determining the mailbox to deliver a message to.
This means that emails addressed to john.smith@gmail.com and johnsmith@gmail.com
are received by the same user – who also gets all messages for j.o.h.n.s.m.i.t.h@gmail.com.
The process of transforming data to its canonical form
is called normalization.
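
The following Python sketch shows what such a normalization could look like on the receiving side. It is only an illustration under stated assumptions: the function name mailbox_key is hypothetical, the case folding reflects the recommendation rather than a requirement, and the dot removal is applied to gmail.com alone because that is the only provider named above. As a sender, you should not normalize other people's addresses like this.

```python
def mailbox_key(address: str) -> str:
    """Map an address to a canonical key for looking up the mailbox."""
    local_part, _, domain = address.rpartition("@")
    domain = domain.lower()          # domain names are case-insensitive
    local_part = local_part.lower()  # recommended but not mandatory
    if domain == "gmail.com":        # provider-specific rule (assumption: Gmail only)
        local_part = local_part.replace(".", "")
    return f"{local_part}@{domain}"

for candidate in ("John.Smith@gmail.com", "j.o.h.n.s.m.i.t.h@gmail.com"):
    print(candidate, "->", mailbox_key(candidate))
```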

Subaddressing

Many mailbox providers support a technique known as
subaddressing as part of their address normalization.
By restricting the character set for usernames more than the standard demands,
a mailbox provider can designate a special character,
which is valid according to the standard but not in its set for usernames,
to split the local part into two.
The part before this special character is used to determine the recipient of a message.
The part after this special character is a tag that the user can choose when they share their address.
Since subaddressing can be implemented by the receiving mail system at will, it has never been formalized
beyond this draft from 2007.
Gmail and
Microsoft Exchange
support subaddressing with a plus.
For example, emails to user+tag@gmail.com are delivered to user@gmail.com.
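
A receiving mail system could split the local part along these lines. This is a minimal sketch: the function name split_subaddress is hypothetical, the plus sign is the separator mentioned above, and splitting at the first separator is my assumption.

```python
from typing import Optional, Tuple

def split_subaddress(local_part: str, separator: str = "+") -> Tuple[str, Optional[str]]:
    """Split a local part into the username and an optional tag."""
    user, found, tag = local_part.partition(separator)
    return (user, tag) if found else (user, None)

print(split_subaddress("user+tag"))  # ('user', 'tag')
print(split_subaddress("user"))      # ('user', None)
```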

If you reply to an email that you received at a subaddress with a plus,
Gmail still uses your main address in the From field, unfortunately.
In order to send emails (including replies) from a subaddress,
you have to add it in the settings:

Go to the Accounts and Import tab of your settings and click on “Add another email address” under “Send mail as”.

Afterwards, enter the preferred display name and subaddress in the new window. You can leave the box “Treat as an alias” checked.

(In either case, Gmail asks the recipient to reply to your subaddress, while the main address is used in the Return-Path header field.)

Click on the button “Next Step” and you’re done. You can now select a different From address the next time you compose a message.

Subaddressing can be useful to filter incoming emails based on their context.
Instead of creating several accounts,
you can separate different areas of your life with the convenience of having just a single account.
Subaddressing also allows you to track whether a company passed your email address on.
When you no longer want to receive emails from a company and its affiliates,
you can simply block all emails sent to the address variant you gave them.
While subaddressing can be used for creating
disposable email addresses on the fly,
this protection against abuse can easily be circumvented.
If the subaddressing scheme is publicly known, spammers can just remove the tag from customized addresses.
A better method against unsolicited messages is to create proper email aliases or forwarding addresses,
which are indistinguishable from ordinary addresses.
The disadvantage of this approach is that you have to set them up before you can use them.
If you use a custom domain for your emails, you might be able to use a so-called catch-all address
or customize the subaddressing scheme by using wildcards.

Alias address

An alias address
doesn’t have a mailbox associated with it but simply
forwards all incoming messages to one or several addresses.
The forwarding is done by the incoming mail server of the alias address
and the expanded addresses may belong to the same or to different hosts.
Unlike in the case of a mailing list,
an automatic response by a recipient is sent to the original sender.
Alias addresses can forward messages to other alias addresses, which can cause mail loops.
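
To see how alias expansion and mail loops relate, consider the following Python sketch. It makes the simplifying assumption that all aliases live in a single table on one host, which is not the case when aliases point to other hosts. The names expand_alias and aliases are hypothetical.

```python
from typing import Dict, FrozenSet, List, Set

def expand_alias(address: str, aliases: Dict[str, List[str]],
                 path: FrozenSet[str] = frozenset()) -> Set[str]:
    """Resolve an address to its final mailboxes, refusing to follow loops."""
    if address in path:
        raise ValueError(f"mail loop detected at {address}")
    if address not in aliases:
        return {address}  # not an alias: this is a real mailbox
    mailboxes: Set[str] = set()
    for target in aliases[address]:
        # Track the aliases visited on the way to detect cycles.
        mailboxes |= expand_alias(target, aliases, path | {address})
    return mailboxes

aliases = {
    "team@example.com": ["alice@example.com", "staff@example.com"],
    "staff@example.com": ["bob@example.com"],
}
print(sorted(expand_alias("team@example.com", aliases)))
```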

Mailing list

A mailing list
is an address which forwards incoming messages to all the subscribers of the list.
The administrator of the list can decide who is allowed to send messages to the list
and whether each message needs to be approved by a moderator before it is forwarded.
Unlike in the case of an alias address,
the mailing list software has to change the envelope of the message
so that automatic responses from subscribers of the list
are sent to the administrator of the list rather than the original sender.

Address syntax

When is an email address valid?
As with many technical standards, the answer to this question looks straightforward at first.
But as soon as you dig a bit deeper, the answer becomes complicated and messy.
What standards allow is often much more than what is widely accepted and used:

[Figure: what a standard allows versus what is actually being used.]

Often only a subset of a standard finds adoption,
while some things become convention without a formal standard.

The syntax of email addresses is specified in
section 3.4.1 of RFC 5322.
As mentioned earlier, an address consists of a local part followed by the @ symbol and a domain name.
If we restrict ourselves to what is widely adopted, the local part has to consist of the characters
a to z, A to Z, 0 to 9, and any of !#$%&'*+-/=?^_`{|}~.
A dot . can be used as long as it is between two of the aforementioned characters.
In other words, you cannot have multiple dots in a row or at the beginning or end of the local part.
The local part has to consist of at least one character,
and every mail system must be able to handle addresses whose local part is up to
64 characters long, including any dots.
While this is the easy part of the standard, you should avoid most of the special characters
if you want to be confident that online services accept your email address.
Twitter, for example, accepts only !+-_ beyond the alphanumeric characters and the dot.
This allows me to sign up with an address such as !+-_@ef1p.com.
Gmail, on the other hand, accepts !#$%&'*+-/=?^_`{|}~@ef1p.com as a recipient
but fails to recognize this character sequence as an email address in text.
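
The rules for the widely adopted form of the local part can be checked with a short regular expression. The following Python sketch is an illustration, not a complete validator: the name is_common_local_part is hypothetical, quoted strings are deliberately not supported, and treating 64 characters as an upper bound is my reading of the length requirement.

```python
import re

# Characters that are allowed besides the dot.
ATEXT = r"[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]"
# One or more runs of such characters, separated by single dots.
DOT_ATOM = re.compile(ATEXT + r"+(?:\." + ATEXT + r"+)*")

def is_common_local_part(local_part: str) -> bool:
    """Check the unquoted form: no leading, trailing, or adjacent dots."""
    return len(local_part) <= 64 and DOT_ATOM.fullmatch(local_part) is not None

for candidate in ("john.smith", "john..smith", ".john", "!+-_"):
    print(candidate, is_common_local_part(candidate))
```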

This paragraph is about the complicated part of the standard,
which is not widely supported and therefore more of theoretical than practical interest.
The local part of an email address can also be a quoted string.
Any printable ASCII character
is allowed inside of double quotes.
If we ignore the obsolete syntax,
which may no longer be generated but must still be accepted,
the quoted string has to be the whole local part,
i.e. it cannot be combined with non-quoted characters.
Both "@"@ef1p.com and ".."@ef1p.com are valid addresses,
and so is ""@ef1p.com (at least for now).
Only " and \ need to be escaped with a backslash in front of them.
This means that "\""@ef1p.com and "\\"@ef1p.com are also valid addresses.
When it comes to whitespace characters,
such as space and tab, the situation is a bit confusing.
A quoted string can contain escaped spaces ("\ ")
through the quoted-pair rule.
The only other way a space can be added to a quoted string
is as folding whitespace.
The standard says that runs of folding whitespace
which occur between lexical tokens in a structured header field
are semantically interpreted as a single space character.
My understanding of this is
that a local part with several unescaped spaces ("   ")
is the same as a local part with a single space (" ").
It’s not clear to me, though, whether " " is to be interpreted as "".
I think this might be the case
because spaces are clearly excluded from the set of characters which don’t need to be escaped.
The qtext rule
doesn’t include the space character, which is %d32 in ASCII,
but this might change in the future.
If unescaped spaces were meant to have meaning beyond just folding lines,
which we’ll discuss later,
they could easily have been added to the qtext rule.
On the other hand,
the equivalent qtextSMTP rule of RFC 5321 does allow spaces.
What the standard does clarify is that the escape character is semantically invisible.
Therefore, "\a" and "a" are equivalent.
I assume this means that mail systems are allowed to remove the backslash in front of characters
which don’t need to be escaped in non-local addresses.

What about the domain part of an email address?
While the Domain Name System allows the use of pretty much any character,
the preferred name syntax
requires that each label
consists only of letters, digits, and hyphens, where labels may neither start nor end with a hyphen.
SMTP restricts domain names to this syntax.
All labels (except the one for the root zone)
have to contain at least one character and at most 63 characters.
The length of the whole domain name is limited to 255 characters,
including the dots.
Domain names are explicitly case-insensitive.
Only fully-qualified domain names may be used in email addresses on the public Internet
and the domain part of an email address is always written without the trailing dot.
The domain name in an email address must have an MX, A, or AAAA resource record.
According to RFC 5321, a CNAME record is also permitted
as long as its target can be resolved to an IP address through one of the just mentioned record types.
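
The syntactic rules for the domain part translate into a few lines of Python. This sketch checks only the syntax described above and performs no DNS lookup, so it says nothing about whether the required resource records exist. The function name is_valid_domain is hypothetical.

```python
import re

# A label consists of letters, digits, and hyphens, is 1 to 63 characters long,
# and neither starts nor ends with a hyphen.
LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")

def is_valid_domain(domain: str) -> bool:
    """Check a domain name as written in an email address (no trailing dot)."""
    if len(domain) > 255:  # length limit including the dots
        return False
    return all(LABEL.fullmatch(label) for label in domain.split("."))

for candidate in ("ef1p.com", "-bad.example", "example.com."):
    print(candidate, is_valid_domain(candidate))
```

Because domain names are case-insensitive, you would typically lowercase them before comparing two addresses.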

Can an email address use an IP address instead of a domain name?
Yes: The address format
allows an IP address in brackets in place of a domain name.
For example, user@[192.0.2.123] is a valid email address.
However, the SMTP specification says that a host should not
be identified by its IP address, unless
the host is not known to the Domain Name System.
One reason for this is that a single mail server can receive emails for multiple domains
and the same user might exist in several of these domains.
If the recipient address doesn’t include a domain name,
the mail server might not know to which mailbox it should deliver the message.
The domain part of an email address thus serves a similar purpose
as the Host header field in HTTP.
One might think that mail servers would reject messages with an IP address in the sender address as spam,
but a reader of this article convinced me that this works just fine in many cases.
Apple Mail,
Thunderbird,
and Gmail also accept such addresses as recipients,
while Outlook.com
and Yahoo! Mail don’t.
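
If you want to detect such addresses in your own code, a sketch like the following might do. It handles only the bracketed IPv4 form shown above, and the function name address_literal is hypothetical.

```python
import ipaddress
from typing import Optional

def address_literal(domain_part: str) -> Optional[ipaddress.IPv4Address]:
    """Return the IP address of a bracketed literal, or None for a domain name."""
    if domain_part.startswith("[") and domain_part.endswith("]"):
        # Raises ValueError if the content is not a valid IPv4 address.
        return ipaddress.IPv4Address(domain_part[1:-1])
    return None

print(address_literal("[192.0.2.123]"))  # 192.0.2.123
print(address_literal("ef1p.com"))       # None
```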

What about characters outside of the English alphabet?
There was a working group dedicated to the
internationalization of email addresses.
RFC 6531 defines an SMTP extension
which allows envelope fields to be encoded in UTF-8
if both the sender and the recipient support it.
I’ll cover this later.

If you have to validate email addresses,
you can use the following regular expression
from the Living HTML Standard:
/^[a-zA-Z0-9.!#$%&'*+\/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$/.
This regular expression allows adjacent dots in the local part but does not allow the local part to be quoted.
You could limit the length of the local part with [a-zA-Z0-9.!#$%&'*+\/=?^_`{|}~-]{1,64},
but you should be liberal in what you accept from others.
And since some top-level domains accept email,
the regular expression intentionally ends with *$/ instead of +$/.
As we will see later on,
the validation of internationalized domain names is much more difficult.
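
The regular expression from the Living HTML Standard is written in JavaScript syntax. Here is how the same pattern could be used in Python. The names HTML_EMAIL and looks_like_email are hypothetical, and I use fullmatch instead of the ^ and $ anchors because $ also matches before a trailing newline in Python.

```python
import re

HTML_EMAIL = re.compile(
    r"[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+"                      # local part
    r"@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?"        # first label
    r"(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*"  # further labels
)

def looks_like_email(candidate: str) -> bool:
    """Apply the pattern to the whole string."""
    return HTML_EMAIL.fullmatch(candidate) is not None

for candidate in ("user@example.com", "a..b@example.com", '"quoted"@example.com', "user@tld"):
    print(candidate, looks_like_email(candidate))
```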

Common addresses

If you use your own domain for email, you can choose the local part of your addresses
however you want as long as you adhere to the address syntax.
Some local parts, though, are commonly used to reach the person with a specific role in an organization:

Address       Expectation
info@         Reach someone from the administrative office.
contact@      Be directed to the desired person within the organization.
sales@        Receive purchase information from the sales person.
support@      Get support for the offered product or service.
marketing@    Provide feedback to marketing campaigns.
abuse@        Report inappropriate public behavior.
security@     Responsibly disclose a security vulnerability.
postmaster@   Reach the email administrator (required according to RFC 5321).
hostmaster@   Reach the DNS administrator.
webmaster@    Reach the Web administrator.
admin@        Reach the technical administrator (as an alternative to the previous three addresses).

Most of these addresses are encouraged by RFC 2142:
“Mailbox names for common services, roles, and functions”.
Role-based addresses are usually configured as aliases
so that incoming emails can be forwarded to several people.

Recipients

You can address the recipients of a message in three different ways:

  • The To field contains the address(es) of the primary recipient(s).
    As a sender, you expect the primary recipient(s) to read and often to react to your message.
    The expected reaction can be a reply or that they perform the requested task.
  • The Cc field contains the address(es) of the secondary recipient(s).
    As a sender, you want to keep the secondary recipient(s) informed
    without expecting them to read or react to your message.
    (Cc stands for carbon copy.)
  • The Bcc field contains the address(es) of the hidden recipient(s).
    Their address(es) are not to be revealed to other recipients of the message.
    The field is usually fully preserved in your folder of sent messages
    but fully removed in the version of the email that is delivered to others.
    Alternatively, a different message could be delivered to each hidden recipient
    where their address alone is listed in the Bcc field.
    The standard also allows hidden recipients to see each other;
    they just have to be removed for the primary and secondary recipients.
    The vague semantics of this feature leads to several problems.
    (Bcc stands for blind carbon copy.)
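
The three kinds of recipients can be illustrated with Python's standard email package. The sketch below follows the common approach described above, where the Bcc field is removed from the version that is delivered to others. All addresses are made up, and the function name envelope_recipients is hypothetical.

```python
from email.message import EmailMessage
from email.utils import getaddresses

msg = EmailMessage()
msg["From"] = "alice@example.com"
msg["To"] = "bob@example.org"
msg["Cc"] = "carol@example.net"
msg["Bcc"] = "dave@example.com"
msg["Subject"] = "Hello"
msg.set_content("Hi there")

def envelope_recipients(message: EmailMessage) -> list:
    """Collect every address to which the message has to be delivered."""
    fields = []
    for name in ("To", "Cc", "Bcc"):
        fields += [str(value) for value in message.get_all(name, [])]
    return [address for _, address in getaddresses(fields)]

recipients = envelope_recipients(msg)  # determined before the Bcc field is removed
del msg["Bcc"]                         # hidden recipients don't appear in the delivered copy
print(recipients)
```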

Important: Just because someone is listed as another recipient
doesn’t mean that they received the same message as you.
The reason for this could be innocuous or malicious.
On the one hand, it may be that the email could simply not be delivered to them.
On the other hand, the sender might have delivered the message only to you in order to mislead you.
Your mailbox provider has no way of verifying
that the same message has also been delivered to the other recipients.
This allows a fraudster to fake a relationship that they do not have
or to lead you to believe that they have done the introduction you asked them for,
even when this is not the case.
If you reply to all, your reply would also be sent to the faked recipients, of course.

Group construct

The address specification
allows senders to group addresses with the following syntax: {GroupName}: {ListOfAddresses};,
where the curly brackets have to be replaced with actual values.
ListOfAddresses is a comma-separated list of addresses, where each address can also have a display name.
You can send an email to several groups, but you cannot nest groups.
The list of addresses can be empty, which allows the sender to hide the recipients of a message.
Even though the To field is optional and can therefore be skipped completely,
some mail clients prefer to put something like undisclosed-recipients:; into this field
when you list all the recipients in the Bcc field.
As far as I can tell, this is the primary use of the group construct nowadays.
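
Python's email.headerregistry module models the group construct directly, which makes for a quick illustration. The group names and addresses below are example values.

```python
from email.headerregistry import Address, Group

# A group with two members, each with a display name.
team = Group("Team", [
    Address("Alice", "alice", "example.com"),
    Address("Bob", "bob", "example.org"),
])
# An empty group, which hides the recipients of a message.
hidden = Group("undisclosed-recipients", [])

print(str(team))    # Team: Alice <alice@example.com>, Bob <bob@example.org>;
print(str(hidden))  # undisclosed-recipients:;
```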

Sender

There are two relevant fields to indicate the originator of a message:

  • The From field contains the address of the person who is responsible for the content of the message.
  • The Reply-To field indicates the address(es) to which replies should be sent.
    If absent, replies are sent to the From address.
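
How a mail client could pick the addresses for a reply is sketched below with Python's standard email package. The function name reply_targets and the sample messages are hypothetical.

```python
from email import message_from_string
from email.utils import getaddresses

def reply_targets(raw_message: str) -> list:
    """Return the Reply-To addresses if present and the From addresses otherwise."""
    message = message_from_string(raw_message)
    fields = message.get_all("Reply-To") or message.get_all("From", [])
    return [address for _, address in getaddresses([str(field) for field in fields])]

with_reply_to = "From: Alice <alice@example.com>\nReply-To: list@example.org\n\nHello\n"
without_reply_to = "From: Alice <alice@example.com>\n\nHello\n"
print(reply_targets(with_reply_to))
print(reply_targets(without_reply_to))
```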

Important: The core email protocols do not authenticate the sender of an email.
It’s called spoofing
when the sender uses a From address which doesn’t belong to them.
Forged sender addresses are a huge problem for the security of email.
There are additional standards to authenticate emails.
For them to have the desired effect, though,
both the sender and the recipients have to use them.

Sender field

RFC 5322
differentiates between the author and the sender of a message.
The person who writes the message is usually also the one who sends it.
If the author and the sender are different, though,
the sender should be provided in the Sender field.
The standard also allows several addresses in the From field.
If this is the case, the email must include a Sender field with a single address.
However, I’m not aware of any mail clients which support this.
In practice, the addresses of the co-authors are simply added to the Cc field.
Their contribution is made clear to the primary recipients
by mentioning the names of all the authors at the end of the message.
Remember that a sender can lie about their co-authors:
The fact that a person’s address is listed in the Cc field
doesn’t imply that the email has been delivered to them
and that they agree with the content of the message.

No reply

Many emails are sent from automated systems, which cannot handle replies.
Examples of such emails are notifications about events on a platform and reports about some usage statistics.
RFC 5322
required each email to have a From field with one or several addresses.
RFC 6854 updated the standard in 2013
to allow the group construct to be used in the From field as well.
This allows automated systems to provide no reply address by using an empty group in the From field,
rather than having to rely on users interpreting an address such as no-reply@example.com correctly.
The automated system can still identify itself by choosing the name of the group appropriately,
for example LinkedIn Notification Bot:;.
In the absence of an alternative to indicate the originating domain to the user,
I strongly advise against using an empty group in the From field, though,
because this defeats all efforts towards domain authentication.
Even the RFC itself recommends against
the general use of this method and says
that it is for limited use only.
Thus, we still have to wait for a usable
No-Reply
header field, unfortunately.
(The empty group construct is used to downgrade internationalized email addresses
as specified in RFC 6857.)

Subject

The Subject field identifies the topic of a message.
Its content is restricted to a single line but the line can be of arbitrary length.
(We’ll talk about encoding later.)
RFC 5322 also defines other informational fields,
namely Comments and Keywords, but I’ve never seen them used.
All informational fields are optional, which means an email doesn’t need a subject line.
The mail clients I’ve checked, though, include the Subject field even when it’s empty.
While the message is transmitted with an empty Subject field,
mail clients usually display “(No subject)” instead of nothing.

Prefixes

When you reply to a message, your mail client automatically suggests the new subject:
“Re: ” followed by the original subject.
While I would argue that “Re” stands for “reply”,
RFC 5322 says
that it is an abbreviation of the Latin “in re”, which means “in the matter of”.
Similarly, if you forward an email to another recipient,
your mail client typically puts “Fwd: ” in front of the original subject.
Using such prefixes in replies and forwarded emails is optional.
In particular, they have no technical significance.
As we will see later, messages are grouped into conversations
based on other, more reliable information.
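
A mail client could derive the suggested subject of a reply as follows. The function name reply_subject is hypothetical, and not stacking a second prefix onto an existing one is my assumption about sensible client behavior.

```python
def reply_subject(subject: str) -> str:
    """Suggest the subject for a reply without stacking prefixes."""
    if subject.lower().startswith("re:"):
        return subject
    return f"Re: {subject}"

print(reply_subject("Lunch on Friday"))      # Re: Lunch on Friday
print(reply_subject("Re: Lunch on Friday"))  # Re: Lunch on Friday
```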

Body

Last but not least, an email has a body
(which is strictly speaking optional).
The body contains the actual content of a message.
It can be formatted in different ways and can consist of different parts.
Splitting the body into several parts is useful,
for example, to send a plaintext version alongside an HTML-encoded message
or to attach files to an email.
We’ll discuss later how all of this works.

Size limit

The email standards impose no size limit on messages.
Since various servers have to store your message at least temporarily,
they are configured to reject messages larger than a certain size.
Many providers have a size limit of around
25 to 50 MB.
Even if your mailbox provider allows you to send larger messages,
such messages might still be rejected by the mail server of the recipient.
Since attachments have to be encoded in a particular way,
their original size can be at most around 70% of the actual size limit.
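
Where does this percentage come from? The following back-of-the-envelope calculation assumes that attachments are encoded with Base64, which turns 3 bytes into 4 characters, and that the encoded text is broken into lines of 76 characters which each end with two line-break characters. It ignores header fields and other body parts, which is why the usable share is slightly lower in practice. Both function names are hypothetical.

```python
import math

def encoded_size(raw_bytes: int, line_length: int = 76) -> int:
    """Size in bytes after Base64 encoding with CRLF line breaks."""
    chars = 4 * math.ceil(raw_bytes / 3)    # 3 raw bytes become 4 characters
    lines = math.ceil(chars / line_length)  # at most 76 characters per line
    return chars + 2 * lines                # each line ends with CR and LF

def max_attachment_size(limit_bytes: int) -> int:
    """Largest raw size whose encoding still fits into the given limit."""
    full_lines, rest = divmod(limit_bytes, 78)  # 76 characters plus CRLF
    chars = full_lines * 76 + max(rest - 2, 0)
    return chars // 4 * 3

limit = 25_000_000
print(max_attachment_size(limit) / limit)  # roughly 0.73
```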

Architecture

There are four separate aspects to understand email from a technical perspective:

  • Format: What is the syntax of email messages?
  • Protocols: How are these messages transmitted?
  • Entities: Who transmits these messages to whom?
  • Architecture: How are these entities arranged?

Let’s go through them one by one in the opposite order.

Simplified architecture

One reason why email is so hard to grasp is that the official terminology
is unnecessarily complicated in most circumstances.
Throughout this article, we’ll work with a much simpler version.
Email follows the client-server model:
A client opens a connection
to a server in order to request some service.
In all the graphics where arrows represent an exchange of data,
the arrows point from the client to the server;
i.e. in the direction of the request, not the response.
The following entities and protocols are involved in the transmission of a message from a sender to a recipient:

[Figure: the mail client, the outgoing mail server, and the incoming mail server of both the sender and the recipient, connected by SMTP for message submission, SMTP for message relay, IMAP or POP3 for message retrieval, and IMAP for message storage.]

The simplified email architecture.
We’ll discuss each entity in the next chapter and the protocols thereafter.

Standardization

If we ignore for a moment that there are separate servers for incoming and for outgoing mail,
we’re left with the following:
The user interacts with a client to read and compose messages.
The client submits the composed messages to a server for delivery.
The client also fetches newly received messages from the server.
The server connects to other servers in order to deliver some messages.
The important thing to note is that the interactions between these entities are independent from one another:

[Figure: user, client, and server on the side of the sender and on the side of the recipient.]

How emails are submitted and accessed (in blue) is independent
from how emails are exchanged between servers (in green).

Let’s have a look at each of these interactions with regard to standardization:

  • Server ➞ server:
    Just as any machines on the Internet can communicate with one another
    (as long as we ignore firewalls),
    any users with an email address can send each other messages
    (as long as we ignore spam filters).
    This works only because the exchange of messages between mail servers is standardized.
    Anyone who adheres to this standard can participate in the global email system.
    In order to maintain compatibility with older servers,
    support for new functionality is always optional.
  • Client ➞ server:
    How clients submit and access emails doesn’t have to be standardized
    for email to remain interoperable according to the previous point.
    Luckily, we do have open standards for accessing one’s mailbox.
    Since these standards are older than all commercial mailbox providers,
    most mailbox providers support at least one of them.
    This has the advantage that you can switch the server without switching the client
    and that you can switch the client without switching the server.
    This reduces vendor lock-in
    on both the client- and the server-side,
    which leads to more choice for consumers.
    However, mailbox providers can still support proprietary features,
    which only their client knows how to make use of.
  • User ➞ client:
    How users interact with mail clients is not standardized.
    In particular, users don’t have to sit directly in front of their mail client.
    They can also interact with a mail client over the Web, for example.
    Some standards demand that certain actions have to be confirmed or initiated by the user.
    Apart from this, mail clients are free to present information to the user in any way they want.
    But similar to how you can drive a car from any brand if you know how to drive a car from one brand,
    users have developed expectations regarding how the above concepts are presented.
    For example, Cc is always called Cc.

Webmail

Mailbox providers usually offer a web interface to their email service.
Instead of configuring a mail client, which runs locally on your device,
you can visit a provider-specific website with a web browser
in order to access your messages.
This is known as webmail
and it has the advantage that you can read and compose emails from any device with a web browser.
From the perspective of email standards, this constitutes remote access to the mail client:

[Figure: Web (web server, web browser, user) + Mail (mail server, mail client, user) = Webmail (mail server, mail client & web server, web browser, user)]

In the case of webmail, the mail client is accessed via a web server using a web browser.

Unlike a dedicated mail client,
which typically stores the downloaded messages in the persistent memory of your device for offline access,
you have to be connected to the Internet to use webmail.
While not generally desirable,
fetching all data only temporarily until you log out
is useful when you want to access your emails from someone else’s device.
In addition, configuring a mail client is more complicated than navigating to a website.
This might explain the popularity of webmail.
In my opinion, the biggest disadvantage of webmail is that
the logic of how you can interact with your messages comes from the provider:

[Figure: Web (the web server sends data and code to the web browser) vs. Mail (the mail server sends only data to the mail client, which contains the code)]

On the left, the code to interact with the data comes from the server.
On the right, the logic is inside the client and only data is exchanged.

If you need additional features,
such as end-to-end encryption or interaction with a service from a different provider,
you have to find workarounds with browser extensions.
Open-source mail clients, on the other hand, can be modified at will.
In order to give you more control over your messages,
most mailbox providers offer an application programming interface (API) for
access to your mailbox, such as IMAP or POP3.
In the case of Gmail, you have to enable the API
through the web interface for your account before a mail client can use it.

When it comes to security, there’s no clear winner.
Webmail has the advantage that you always run the newest version of the code,
which is sandboxed
from the rest of your system by the web browser.
On the downside, attacks like phishing,
cross-site scripting,
and cross-site request forgery
are only possible because the browser runs untrusted code,
which a dedicated mail client doesn’t.
Whether you access your emails via the Web or via a local mail client is a matter of individual preference.

As we’ve learned in the previous box,
how users interact with their mail client isn’t standardized.
Webmail is thus of no interest for the scope of this article.
All you need to know is that email has nothing to do with the Web.
Both are independent services that run over the Internet.
Moreover, email is older than the Web:
SMTP was first defined in 1982,
POP in 1984,
and IMAP
in 1986.
The HyperText Transfer Protocol (HTTP), which underpins the Web,
was introduced around 1990.

Official architecture

For the sake of completeness and to enable you to understand the linked articles,
this subsection covers the official terminology as used, for example, in RFC 5598.
In the official documents, there are five instead of three entities,
with each of them having a more complicated name and, of course, an associated
three-letter acronym (TLA):

The terminology used by the
Internet Engineering Task Force (IETF)
in its official documents, such as RFC 5321.
The terms in italics are used in some newer documents,
such as RFC 8314.
I added them because I like them better.

These terms are not as precise as they seem to be and the boundaries are often fluid in practice.
Having more entities also changes the architecture.
What follows is a nicer version of this ASCII graphic,
which is a masterpiece to be appreciated in its own right.

[Figure: on both the sender's and the recipient's side: mail user agent (MUA), mail submission agent (MSA), mail transfer agent (MTA), mail delivery agent (MDA), and message store (MS)]

The official Internet Mail Architecture
with SMTP connections in green
and IMAP connections in blue.

None of the servers have to be a single machine.
In addition, the incoming MTA and the outgoing MTA don’t have to be the same.

Entities

There are three entities in the simplified architecture:
the mail client,
the outgoing mail server,
and the incoming mail server.

Mail client

The mail client is a computer program to compose, send, retrieve, and read emails.
It provides the interface through which users handle email.
The mail client runs either locally on the user’s device or remotely on a web server.
Examples of the former kind are
Microsoft Outlook,
Apple Mail, and
Mozilla Thunderbird.
Examples of the latter are Google Gmail and
Yahoo! Mail when accessed through a web browser.
(Both companies also provide mobile
apps for Android
and iOS, which fall into the former category.)

The mail client connects to the outgoing mail server to submit messages for delivery to other users
and to the incoming mail server to fetch new messages from the user’s mailbox.
Both servers authenticate the user, typically with a username and a password.
The mail client connects to the incoming mail server through a different interface than outgoing mail servers do,
which can be seen on the recipient’s side of the simplified mail architecture:

[Figure: mail client of sender ➞ outgoing mail server of sender (SMTP for message submission) ➞ incoming mail server of recipient (SMTP for message relay) ➞ mail client of recipient (IMAP or POP3 for message retrieval); the sender's mail client also uses IMAP for message storage on the incoming mail server of the sender]

The recipient’s mail client connects to the incoming mail server
using a different port and protocol than outgoing mail servers.
It’s usually also a different machine with a different domain name and IP address
than the one outgoing mail servers connect to.

This distinction is apparent in the official mail architecture,
where the message store (MS) and the mail transfer agent (MTA) reside in different boxes.
By giving the impression that the incoming mail server is a single machine,
the simplified model doesn’t explain why the incoming mail server needs to be configured
in the mail client of its user but not in the outgoing mail servers of other users.
Since the simplified architecture is less confusing in every other regard,
it’s still the preferred model for the scope of this article.

Configuration

When you add an email account to your mail client,
you usually have to configure the incoming mail server and the outgoing mail server manually,
unless you use a popular mailbox provider.
If manual configuration is required,
you have to look through the documentation of your mailbox provider
to find the domain names of the two servers
and then copy the information to the respective fields in your mail client.
While most mailbox providers use the default port numbers,
which means that you usually don’t have to configure them,
the domain names of the two servers aren’t standardized.
It’s often the case that their addresses are
subdomains of the domain after the @ symbol in your email address,
such as imap.gmail.com and smtp.gmail.com for @gmail.com.
However, many organizations don’t host their emails themselves,
in which case the domains of the two servers are likely
completely different from the organization’s domain.
This is the case for my email configuration:


The simplified email architecture corresponds to what mail clients like Apple Mail display to you.

The domain of the address (ef1p.com) is different from the domain of the servers (mail.gandi.net).

The host names of the incoming mail server and the outgoing mail server are usually not the same.

One more thing that users need to be informed about is whether to use the full email address
or only the local part before the @ symbol (or even something completely different) as the username.
While flexibility is great for customizing a setup to the particular needs of an organization,
it also leads to an unnecessarily complicated experience for users.

Custom domains

Please note that you cannot simply set up CNAME records
in your own domain for the incoming and outgoing mail servers
if you want to avoid instructing your users to use an external domain
because the TLS certificates
used by the mailbox provider would no longer match.
For example, if I point imap.ef1p.com to mail.gandi.net with a CNAME record in my DNS zone
and use the former in the server settings,
then my mail client expects the TLS certificate to be issued for imap.ef1p.com
and aborts the connection when it receives a certificate for mail.gandi.net.
Besides vanity, such a setup could be desirable
because it would allow the IT administrator of an organization
to migrate all messages to another mailbox provider
without requiring every member of the organization to change their email settings.
This can be realized with the technique that I cover in the next box.
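
To make the certificate mismatch concrete, here is a minimal Python sketch of the name check that a mail client performs after the TLS handshake. The function `certificate_covers` is a hypothetical helper and not part of any library. It also simplifies the real matching rules (exact names and a wildcard as the entire leftmost label only); actual mail clients delegate this check to their TLS library.

```python
def certificate_covers(cert_names: list[str], hostname: str) -> bool:
    """Return True if one of the certificate's DNS names matches hostname.

    Simplified: supports exact matches and a wildcard as the entire
    leftmost label (such as *.example.com), nothing else.
    """
    host_labels = hostname.lower().rstrip(".").split(".")
    for name in cert_names:
        name_labels = name.lower().rstrip(".").split(".")
        if len(name_labels) != len(host_labels):
            continue
        if name_labels[0] == "*":
            # A wildcard stands for exactly one label.
            if len(name_labels) > 1 and name_labels[1:] == host_labels[1:]:
                return True
        elif name_labels == host_labels:
            return True
    return False


# The client is configured with imap.ef1p.com (a CNAME for mail.gandi.net),
# but the server presents a certificate issued for mail.gandi.net.
configured_host = "imap.ef1p.com"
presented_names = ["mail.gandi.net"]
connection_ok = certificate_covers(presented_names, configured_host)
```

With these values, `connection_ok` is `False`: the CNAME record changes where the client connects to, but not which name the client expects in the certificate.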

Autoconfiguration

Wouldn’t it be nice if mail clients could configure themselves automatically
by fetching the required information directly from the mailbox providers?
The good news is that we have a standard for exactly this purpose.
The bad news is that almost no one is using it, even though it’s simple and elegant.
RFC 6186 defines
how to use SRV records
in the Domain Name System (DNS)
for locating email submission and access services.
Using DNS for fetching the required information is elegant because
the email protocols already depend on DNS,
the information is provided by the owner of the domain,
and the system scales well thanks to the caching of answers by intermediary resolvers.

However, if the answers are not authenticated with DNSSEC,
an attacker who can spoof DNS responses can direct the mail client to malicious servers.
This attack vector is really bad because TLS doesn’t prevent it
(the malicious servers can have valid certificates for their domains)
and because passwords are often transmitted in cleartext
over the encrypted channel instead of using non-reusable
challenge-response authentication,
such as SCRAM.
The attacker can thus authenticate as the user to the legitimate servers
beyond the duration of the attack until the user changes their password.
The RFC just says that
the domain names of the servers should be confirmed by the user
if they are not subdomains of the queried domain without requiring or even mentioning DNSSEC.
As everyone working in IT security knows, security-critical decisions should not be left to users.

RFC 2782 specifies the format of service (SRV) records.
The basic idea is to use a different subdomain for each protocol and service and list the port number and
domain name of the host which provides the particular service in the data field of the resource record.
The subdomain is constructed as follows: _service._protocol.domain,
where domain is the domain part of the email address, _protocol is _tcp,
and _service is _submission/_submissions, _imap/_imaps, or _pop3/_pop3s in the case of email.
The data of SRV records consist of a priority number, a weight number, a port number,
and the domain name of the target host separated by a single space.
If several records are returned, the client has to connect to the host with the lowest priority number first
and fall back to the host with the next higher priority number
only if all hosts with lower priority numbers are unreachable.
If there are several records with the same priority,
the client should select one at random proportionally to its weight.
This can be useful to balance the load among several hosts.
If there isn’t any server selection to do, then the weight should be set to zero.
For example, if you have the dig command
installed in your command-line interface (CLI),
executing dig srv _submission._tcp.gmail.com +short returns 5 0 587 smtp.gmail.com..
This means that mail clients should submit outgoing emails
to smtp.gmail.com on port 587 when using gmail.com.
If the host name is ., the service is explicitly not available at this domain.
For example, running dig srv _imap._tcp.gmail.com +short returns 0 0 0 .
because Gmail supports IMAP only with Implicit TLS, which is usually called IMAPS.
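
The rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `SrvRecord`, `parse_srv`, `srv_name`, and `connection_order` are made up for this example, the DNS lookup itself is left out, and the handling of zero weights is simpler than what RFC 2782 describes.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class SrvRecord:
    priority: int
    weight: int
    port: int
    target: str


def parse_srv(data: str) -> SrvRecord:
    """Parse the data field 'priority weight port target' of an SRV record."""
    priority, weight, port, target = data.split()
    return SrvRecord(int(priority), int(weight), int(port), target)


def srv_name(service: str, email_address: str) -> str:
    """Build _service._tcp.domain from the domain part of an email address."""
    domain = email_address.rsplit("@", 1)[1]
    return f"_{service}._tcp.{domain}"


def connection_order(records: list[SrvRecord], rng: random.Random) -> list[SrvRecord]:
    """Order hosts by ascending priority; within a priority, draw by weight."""
    # A target of "." means that the service is not available.
    remaining = [r for r in records if r.target != "."]
    ordered = []
    for priority in sorted({r.priority for r in remaining}):
        group = [r for r in remaining if r.priority == priority]
        while group:
            weights = [r.weight for r in group]
            if sum(weights) == 0:
                # random.choices rejects all-zero weights: pick uniformly.
                choice = rng.choice(group)
            else:
                choice = rng.choices(group, weights=weights, k=1)[0]
            ordered.append(choice)
            group.remove(choice)
    return ordered
```

For example, `parse_srv("5 0 587 smtp.gmail.com.")` yields a record with port 587, and `connection_order([parse_srv("0 0 0 .")], random.Random())` returns an empty list because the service is explicitly not offered.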

You can check the email service records of a domain with the following tool,
which uses an API by Google for its DNS queries:

If you played around with the above tool for a while,
you might have realized that not many domains have service records for email.
Probably for this reason, none of the major mail clients actually
use this autoconfiguration method.
In my opinion, this is really unfortunate but not surprising given that
only supply can generate demand and only demand can generate supply.

I can see only three potential weaknesses with this standard:

  1. Service records make the incoming and outgoing mail servers publicly known.
    For public mail services, where anyone can create an account, this is the case anyway.
    For private mail services, on the other hand,
    such knowledge makes attacks on the infrastructure easier
    if the mail servers cannot be guessed otherwise.
  2. Service records provide no information about the username
    and the authentication method.
    The latter can be discovered, though, simply by connecting to the server
    and inquiring about its supported extensions.
  3. The provided information cannot depend on the local part of the email address
    since DNS queries don’t support additional parameters.
    There are autoconfiguration protocols which support this,
    such as the one used by Thunderbird.

Besides improving the experience of users,
service records make it possible to migrate an organization to another mailbox provider
without requiring every member of the organization to change their email settings.
According to RFC 6186,
mail clients should cache the resolved hosts
until they can no longer establish a connection or user authentication fails.
When either of these happens,
mail clients are supposed to fetch the SRV records of the same _service again.
Mail clients may not switch from IMAP to POP3 or vice versa without the user’s consent.

If you want to configure SRV records for your domain,
you can put the following entries into your zone file:

_imap._tcp 10800 IN SRV 0 0 143 {Domain}.
_imaps._tcp 10800 IN SRV 0 0 993 {Domain}.
_jmap._tcp 10800 IN SRV 0 0 443 {Domain}.
_pop3._tcp 10800 IN SRV 0 0 110 {Domain}.
_pop3s._tcp 10800 IN SRV 0 0 995 {Domain}.
_sieve._tcp 10800 IN SRV 0 0 4190 {Domain}.
_submission._tcp 10800 IN SRV 0 0 587 {Domain}.
_submissions._tcp 10800 IN SRV 0 0 465 {Domain}.

Insert the appropriate Domain and use 0 0 0 . for all the services
which are not supported by your mailbox provider.
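
If you prefer to generate these entries, the following Python sketch produces the same lines and fills in 0 0 0 . for every service that you don't list. The function name `zone_entries` and the example host `mail.example.net` are assumptions for illustration; the port numbers and the TTL are the ones from the entries above.

```python
DEFAULT_PORTS = {
    "imap": 143, "imaps": 993, "jmap": 443, "pop3": 110,
    "pop3s": 995, "sieve": 4190, "submission": 587, "submissions": 465,
}


def zone_entries(targets: dict[str, str], ttl: int = 10800) -> list[str]:
    """Return one SRV line per service.

    targets maps each supported service to the host that provides it.
    Services missing from targets are marked as unavailable.
    """
    lines = []
    for service, port in DEFAULT_PORTS.items():
        if service in targets:
            host = targets[service].rstrip(".") + "."  # Fully qualified name.
            lines.append(f"_{service}._tcp {ttl} IN SRV 0 0 {port} {host}")
        else:
            lines.append(f"_{service}._tcp {ttl} IN SRV 0 0 0 .")
    return lines


# Hypothetical provider that offers only IMAPS and message submission.
entries = zone_entries({"imaps": "mail.example.net", "submission": "mail.example.net"})
```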

Configuration database

At this point, you may be wondering how mail clients can often figure out the correct configuration by themselves
despite the lack of an established standard.
Most mail clients look up the configuration for popular mailbox providers
in a database, which is either delivered with the client or centrally hosted by the software manufacturer.
Some mail clients also use custom autoconfiguration protocols,
which typically fetch an XML file hosted at a specific subdomain via HTTPS.

Let’s have a look at how Thunderbird does it.
Its autoconfiguration process is
well documented
and its configuration database is free to use for any mail client.
Given an email address {Address} = {LocalPart}@{Domain},
Thunderbird goes through the following steps from top to bottom until it finds a suitable configuration:

  1. Check the installation directory for a
    configuration file.
    This is useful when the employer administers the user’s device.
  2. Check https://autoconfig.{Domain}/mail/config-v1.1.xml?emailaddress={Address} for a configuration file.
    Unlike the mechanism discussed in the previous box,
    this file can be generated dynamically based on the email address.
    This is useful for when the username is neither the email address nor the local part.
  3. Check https://{Domain}/.well-known/autoconfig/mail/config-v1.1.xml.
    The key difference between this and the previous lookup is that
    the autoconfig subdomain in step 2 can point to a web server operated by your mailbox provider,
    while the lookup in the current step must be handled by the Domain itself.
  4. Look for a configuration file in the central database at https://autoconfig.thunderbird.net/v1.1/{Domain}.
  5. Look up the MX record of the domain in the Domain Name System
    and then check whether the central database has an entry for the so-called apex domain
    at the root of the zone.
    This is useful for custom domains like ef1p.com,
    which has an MX record pointing to spool.mail.gandi.net,
    which belongs to the zone starting at gandi.net.
    The central database has an entry for gandi.net,
    which is how Thunderbird would find the configuration for my email address.
  6. If all previous attempts to find a configuration failed,
    Thunderbird resorts to guessing the mail servers.
    It tries to connect to common server names such as mail.{Domain}, smtp.{Domain}, and imap.{Domain}
    on the default port numbers and checks whether they support TLS or STARTTLS
    and the challenge-response authentication mechanism (CRAM).
    The last check prevents Thunderbird from accidentally revealing the user’s password to the wrong server.
    Unfortunately, CRAM is rather weak.
    The far better salted challenge-response authentication mechanism (SCRAM) should be used instead.
  7. If all of the above steps fail, the user has to enter the configuration manually.
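
The three HTTPS lookups in steps 2 to 4 can be derived from the email address alone. The following Python sketch only builds the URLs in the order in which they are tried; it doesn't fetch or parse anything. The function name `autoconfig_urls` is made up for this example, and percent-encoding the address in the query string is my assumption.

```python
from urllib.parse import urlencode


def autoconfig_urls(address: str) -> list[str]:
    """Return the URLs of steps 2 to 4 for the given email address."""
    _, domain = address.rsplit("@", 1)
    # Assumption: the address is percent-encoded in the query string.
    query = urlencode({"emailaddress": address})
    return [
        f"https://autoconfig.{domain}/mail/config-v1.1.xml?{query}",
        f"https://{domain}/.well-known/autoconfig/mail/config-v1.1.xml",
        f"https://autoconfig.thunderbird.net/v1.1/{domain}",
    ]


urls = autoconfig_urls("alice@example.com")
```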

I’ve implemented steps 2 to 5 of Thunderbird’s discovery procedure
in case you need to configure a mail client and don’t know the required information.
The tool makes requests to the entered domain according to the above description
and, if necessary, to Thunderbird’s database.
If the fifth step is also needed, the DNS queries are made with
Google’s DNS API.
Please note that the requests are sent directly from your browser,
which means that the lookups fail if the server does not allow
cross-origin resource sharing (CORS) with an
Access-Control-Allow-Origin
header field value of *.
Since such a header field is not required for mail clients, this is often not the case.
For this reason, the protocol tools query only Thunderbird’s database.

Outgoing mail server

The outgoing mail server accepts messages from mail clients and queues them for delivery.
It then determines the incoming mail server
of each recipient and delivers the message to them.
The outgoing mail server acts as a server in the interaction with mail clients
but assumes the role of a client
when relaying the message to incoming mail servers.
(Connections are always initiated by clients.)
If the outgoing mail server cannot deliver a message,
it sends a bounce message to the user who submitted the message.
While the outgoing mail server should not change the content of a message,
it adds information about the submitter at the top.
Before accepting a message,
the outgoing mail server authenticates the user,
typically based on a username and a password.

Why do we need outgoing mail servers when mail clients could simply deliver the messages directly?

Before we discuss why we need outgoing mail servers,
let’s first have a look at what the modified architecture would look like:

[Figure: the simplified mail architecture from above with an additional connection labeled "SMTP for direct message delivery?" from the mail client of the sender to the incoming mail server of the recipient]

A hypothetical email architecture without outgoing mail servers.

Since outgoing mail servers are just a piece of software and can thus be integrated into mail clients,
it is technically possible to send emails directly to the incoming mail server of each recipient.
In fact, sending an email to someone from the command line
is my favorite demonstration in the seminars I give.
Only badly configured incoming mail servers accept such messages, though.

There are two main reasons
why outgoing mail servers are used in practice:

  • Shift work from the client to the server:
    Unlike the mail client, which runs on the user’s device,
    the outgoing mail server typically has a fast and permanent Internet connection.
    For example, when you send an email from your smartphone,
    your Internet connection might be slow and also expensive
    due to roaming.
    When you switch off your smartphone on an airplane or overnight,
    its mail client is offline for several hours.
    Thus, it makes sense to implement the following features on a server:

    • Retry after unsuccessful delivery:
      As we will see in the next section,
      an incoming mail server can reject a message for a number of reasons.
      One reason is simply to deter spammers,
      who often won’t attempt to transmit the message again.
      An incoming mail server might also be unreachable due to maintenance or malfunctioning.
      While Internet outages are rare in most areas of the world,
      it might happen that a communication link is temporarily unavailable.
      This is why the standard demands
      that messages which cannot be delivered immediately
      have to be queued and their transmission retried by the sender
      after a delay of at least 30 minutes for several days.
    • Send a single message to several recipients:
      If you send an email to several recipients,
      your mail client submits the email only once to the outgoing mail server.
      The outgoing mail server then delivers a copy of the email to each recipient.
      This is especially useful when you send a big attachment
      to many recipients over a bad Internet connection.
    • Batch messages for delivery:
      In the early days of email, access to the Internet was expensive
      and you often paid for the duration of your connection rather than for the volume of transmitted data.
      Since machines were permanently connected only in the local network of your organization,
      it made sense to collect outgoing mail from members on a local server
      and then deliver the messages once a link had been established.
      Given that most organizations pay a flat rate for their Internet access nowadays,
      this aspect is only of historic relevance.
  • Reduce spam and phishing:
    Unsolicited mail is an annoyance, both in the analog and the digital world.
    Unless we impose a cost on the sender,
    it’s impossible to eliminate spam completely in a decentralized system
    in which everyone is allowed to participate.
    Being able to spoof the sender of an email,
    which is often used for phishing,
    is a real security concern.
    System administrators deploy the following measures to curb the two problems,
    which require the use of outgoing mail servers:

    • Blocked connections:
      Incoming mail servers listen on port 25 for new messages.
      An Internet service provider (ISP)
      can prevent emails from being sent from its network
      by blocking
      all outgoing connections with a destination port of 25.
      Its customers can still connect to an outgoing mail server on port 587,
      which has to be in a different network
      or explicitly whitelisted by the ISP
      in order to be able to deliver messages on behalf of its users.
      This measure makes it technically infeasible to send emails
      directly to the incoming mail server of a recipient.
      Many Internet hosting providers
      also block outgoing traffic on port 25 by default
      to fight spam and to protect the reputation of their IP address range.
      For some providers,
      such as Linode,
      you can contact their customer service to lift this restriction;
      for other providers,
      such as DigitalOcean,
      the restriction is permanent.
    • Address reputation:
      Incoming mail servers learn the sources of legitimate email over time.
      Messages coming from such sources are likely to be delivered to the user’s inbox.
      Messages from sources with a bad reputation are often dropped on arrival.
      Messages from unknown sources are either dropped or put into the user’s spam folder.
      Reputation
      is crucial to build trust among unverified participants.
      Even when the sender of an email is authenticated,
      reputation remains at the core of any effort to fight spam.
      As we will see later on,
      you have to buy into the reputation of others
      if you want to have your emails delivered reliably to your customers.
      A whole industry has developed around this value proposition.
      Since building a reputation as a trustworthy email sender yourself
      is too much of a struggle for most Internet users and companies,
      the port restriction mentioned in the previous bullet point isn’t much of a problem in practice.
    • User authentication:
      Mailbox providers are incentivized to protect their reputation
      because users would no longer use their service if emails were no longer delivered reliably.
      This is why mailbox providers impose sending limits on their users
      and delete accounts when misbehavior is reported to them,
      which is possible only if they authenticate their users before relaying messages.
      For example, Gmail limits
      the number of messages per day to 2’000 and the number of recipients per message to 100
      if the message is submitted from a mail client rather than the web interface.
      Vouching for users could also be done differently,
      for example by delegating trust
      to mail clients with digital signatures.
      However, a mailbox provider could no longer rate limit and filter outgoing messages
      if mail clients delivered them directly.
    • Domain authentication:
      When it comes to information security,
      trust is good but control is better.
      Spam is a problem of quantity:
      You simply want to bring the volume of unsolicited messages to a bearable level.
      Phishing, on the other hand, is a problem of quality:
      A single successful attack can cause a lot of damage.
      A reputation system is great for fighting spam but not good enough for fighting phishing.
      The email delivery protocol itself doesn’t prevent the sender
      from putting an arbitrary address into the From field.
      In the absence of a mechanism to authenticate the sender,
      you can only hope that email servers with a good reputation don’t misuse their reputation
      and send messages with spoofed sender addresses and malicious content to you.
      The idea behind domain authentication is that each domain owner can specify
      which outgoing mail servers are allowed to send messages from their domain.
      Incoming mail servers can then verify
      whether the sender of a message is indeed authorized to send messages from the claimed domain.
      In combination with user authentication,
      where outgoing mail servers prevent their users from sending messages in the name of another user at the same domain,
      the two mechanisms guarantee that the sender of a message owns the claimed From address.
      There would be other ways to achieve a similar result without requiring outgoing mail servers,
      but this is how email works.

[Figure: user authentication between each mail client and its mail servers; domain authentication between the outgoing mail server of the sender and the incoming mail server of the recipient]

The incoming mail server verifies that the outgoing mail server is authorized to send messages from the claimed domain,
while the outgoing mail server of the sender ensures that each user uses their own address in the From field.
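
The division of labor between the two servers can be sketched as two independent checks. Everything in the following Python example is hypothetical: the account table, the list of authorized servers, and the function names only illustrate the idea and don't correspond to a concrete protocol. The sketch also assumes that the user has already proven their identity, for example with a password.

```python
# Hypothetical data: the accounts known to the outgoing mail server ...
ACCOUNT_ADDRESSES = {"alice": "alice@example.com", "bob": "bob@example.com"}
# ... and the servers which each domain owner authorized to send for the domain.
AUTHORIZED_SERVERS = {"example.com": {"192.0.2.10"}}


def outgoing_server_accepts(username: str, from_address: str) -> bool:
    """User authentication: a user may use only their own From address."""
    return ACCOUNT_ADDRESSES.get(username) == from_address


def incoming_server_accepts(sending_ip: str, from_address: str) -> bool:
    """Domain authentication: the sending server must be authorized
    by the owner of the domain in the From address."""
    domain = from_address.rsplit("@", 1)[1]
    return sending_ip in AUTHORIZED_SERVERS.get(domain, set())
```

Only when both checks pass can the recipient rely on the From address: the first check stops Bob from sending as Alice, and the second one stops unrelated servers from sending in the name of example.com.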

As we will see in the next box,
having an audit trail
of sent emails is not among the reasons why outgoing mail servers are used.
And while an outgoing mail server could be useful to hide your IP address from the recipients,
many outgoing mail servers leak your IP address
in a Received header field.
Privacy could be one of the reasons for using an outgoing mail server but often isn’t.

How to avoid submitting the same message to both the outgoing mail server and the incoming mail server?

If you want to keep a record of all emails that you’ve sent,
your mail client has to store each outgoing message in the sent folder on your incoming mail server.
Since we have focused so far on how an email gets to its recipient,
this aspect has been grayed out in the above architecture diagrams.
In most cases, the client has to submit the same message twice:
Once to the outgoing mail server for delivering the message to the recipients,
and once to the incoming mail server for updating the sent folder.

[Figure: the simplified mail architecture from above, highlighting that the mail client of the sender uses SMTP for message submission to the outgoing mail server and IMAP for message storage on the incoming mail server of the sender]

The mail client submits the same message to both the outgoing mail server and the incoming mail server.
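
With Python's standard library, the double submission looks roughly as follows. This is a sketch under assumptions rather than a complete mail client: the host names and credentials are supplied by the caller, the folder name `Sent` is a guess that varies between providers, and the outgoing mail server is assumed to support STARTTLS on port 587 while the incoming mail server is assumed to support IMAP with Implicit TLS.

```python
import imaplib
import smtplib
import ssl
import time
from email.message import EmailMessage


def build_message(sender: str, recipient: str, subject: str, body: str) -> EmailMessage:
    """Compose a simple text message."""
    message = EmailMessage()
    message["From"] = sender
    message["To"] = recipient
    message["Subject"] = subject
    message.set_content(body)
    return message


def submit_and_store(message: EmailMessage, username: str, password: str,
                     smtp_host: str, imap_host: str, sent_folder: str = "Sent") -> None:
    """Transfer the same message twice: once for delivery, once for storage."""
    context = ssl.create_default_context()  # Verifies the server certificates.

    # 1. Submit the message to the outgoing mail server for delivery.
    with smtplib.SMTP(smtp_host, 587) as smtp:
        smtp.starttls(context=context)
        smtp.login(username, password)
        smtp.send_message(message)

    # 2. Upload the same bytes to the sent folder on the incoming mail server.
    with imaplib.IMAP4_SSL(imap_host, ssl_context=context) as imap:
        imap.login(username, password)
        imap.append(sent_folder, "\\Seen",
                    imaplib.Time2Internaldate(time.time()), message.as_bytes())
```

The message leaves the device once in step 1 and once in step 2, which is exactly the overhead that the approaches below try to avoid.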

For a bandwidth-limited mail client, this is not ideal.
There are four different approaches to avoid this double submission:

  • Always Bcc yourself:
    You can configure most mail clients to add yourself as a Bcc recipient whenever you compose an email.
    The outgoing mail server then delivers a copy of each message to your inbox.
    The downside of this method is that your copy doesn’t include the other Bcc recipients.
    Moreover, sent and received messages end up in the same folder, even though keeping them separate may be desirable.

[Figure: 1. Submit (mail client ➞ outgoing mail server), 2. Send (outgoing mail server ➞ incoming mail server)]

The outgoing mail server sends a copy to your inbox.
  • Gmail:
    Google’s outgoing mail server automatically stores a copy of sent messages in the user’s sent folder.
    In order not to end up with
    duplicates in the sent folder,
    the mail client shouldn’t store sent messages in the user’s mailbox.
    Since the mail client
    cannot detect
    this non-standard behavior when submitting a message to the outgoing mail server,
    either the mail client has to treat @gmail.com addresses differently
    or the user has to disable the option to save a copy in the sent folder manually.
    Since mail clients remove the Bcc field before submission,
    Gmail recovers it from the envelope of the message.

[Figure: 1. Submit (mail client ➞ outgoing mail server), 2. Store (outgoing mail server ➞ incoming mail server)]

Gmail automatically stores sent messages.
  • Courier-IMAP:
    The Courier Mail Server
    has a configuration option to designate a mailbox folder as a special outbox folder.
    When the mail client stores a message in this folder,
    the server sends the message to the addresses listed in the To, Cc and Bcc fields.
    What makes this approach interesting is that a mail client can use IMAP for everything
    and no longer needs to support SMTP.
    Unfortunately, this feature is also not standardized
    and mail clients can therefore not rely on its availability.

[Figure: 1. Store (mail client to incoming mail server), 2. Submit (incoming mail server to outgoing mail server).]

The Courier IMAP server can deliver emails.

  • Lemonade profile:
    The only standardized solution to the double-submission problem
    is a collection of extensions to IMAP
    and SMTP submission,
    which is called the lemonade profile.
    The URLAUTH extension to IMAP
    allows mail clients to create references to mailbox data,
    which include the required authorization to access the data.
    The BURL extension to SMTP submission
    allows mail clients to instruct the outgoing mail server to fetch data from the user’s mailbox.
    If the mail servers support these two extensions,
    the mail client can upload the message to be sent to the user’s mailbox on the incoming mail server
    and then instruct the outgoing mail server to deliver this message.

[Figure: 1. Store (mail client to incoming mail server), 2. Reference (mail client to outgoing mail server), 3. Fetch (outgoing mail server from incoming mail server).]

The lemonade profile enables the outgoing mail server to fetch content for delivery from the incoming mail server.

The lemonade profile includes additional extensions,
such as the CATENATE extension to IMAP
and the PIPELINING extension to SMTP.
The former allows mail clients to compose new messages based on existing messages directly on the IMAP server.
This makes it possible to forward large attachments without having to download and upload them first.
The latter allows clients to send several commands in a row
without having to wait for a response from the server between them.
This reduces the number of round trips,
which makes communication over large distances much faster.

Incoming mail server

The incoming mail server waits for connections from outgoing mail servers of other users.
When an outgoing mail server connects to transmit a message,
the incoming mail server records the message together with other information from the session,
such as the sender’s IP address.
The incoming mail server can reject the incoming message for a number of reasons:
The recipient might not exist, their mailbox might be full,
the message might be too long,
or the sender might not be trusted.
If the message is rejected,
the outgoing mail server can either try to retransmit it at some later point
or inform the user about the failed delivery.
If, on the other hand, the incoming mail server accepts the message,
it also assumes responsibility for delivering the message.
If it fails to do so,
for example because a message that needs to be forwarded cannot be passed on,
then the incoming mail server should notify the author of the message.

Once the session with the outgoing mail server is over,
the incoming mail server adds the additional information
collected during the session to the top of the accepted message.
It then evaluates whether the message is likely spam.
Depending on the score of this evaluation,
the message is either delivered to the recipient’s inbox,
quarantined to the recipient’s spam folder,
or discarded without notifying the author.
While the last option violates the principle
that mail is either delivered or returned,
the alternative is often worse.
This is why the standard explicitly allows
incoming mail servers to drop received messages silently.
If the receiving address is an alias,
the incoming mail server forwards the message to the configured email address
instead of delivering it to an inbox.
In case the address denotes a mailing list,
the incoming mail server sends the message to all subscribers of the list.
The incoming mail server also applies filters
and generates automatic responses,
such as delivery failures
and out-of-office replies.

The incoming mail server waits for connections from mail clients on a different interface.
In order to access the mailbox of its user, the mail client has to present appropriate
credentials.
The user’s email address and password are often used to authenticate the client,
which is granted unlimited access to the mailbox on success.
If the incoming mail server supports OAuth,
the mail client can present an access token
to gain potentially limited access to the user’s mailbox.
The scopes offered by Gmail
are an example of what limited access can look like.
While restricted authorization is common for other services, it’s not yet the norm for email.
Once the client is authenticated, it can retrieve, deposit, and delete messages.
It can also mark them as read or flag them for later attention.

Address resolution

How do outgoing mail servers find the incoming mail server of a recipient?
As we learned above, an email address consists of a username and a domain name, separated by the @ symbol.
A sender finds the incoming mail server of a recipient
by querying the Domain Name System (DNS)
for mail exchange (MX) records of the used domain name.
If no such records exist, the sender queries for address records
(A or AAAA) of the domain name instead.
If the DNS response is not authenticated with DNSSEC,
mail might be sent to the server of an attacker.
TLS can prevent this only
if the sender requires that the recipient’s domain is included in the
server certificate,
which is usually not the case.
A standard for securing MX records with TLS exists, though.

A domain can list several servers that handle incoming mail.
MX records assign a priority to each incoming mail server.
The lower the number, the higher its priority.
This is useful for providing redundancy in case the most preferred server is not responding.
Several servers with the same priority can be used for
load balancing.
You can use the following tool to look up the incoming mail servers of a domain you are interested in.
It uses an API by Google to query the Domain Name System
and an API by ipinfo.io to determine the geographic location of each server.
The latter is just to remind you that the Internet is a physical infrastructure.
Outgoing mail servers need to know only the IP address of the incoming mail server, of course.
(A remark on the subdomains you might encounter:
spool is a synonym for
buffer/queue,
fb probably stands for fallback and alt for alternative.)
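The priority rule can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the host names are made up, the function name is hypothetical, and shuffling servers of equal priority is one simple way to spread the load.

```python
import random
from typing import List, Optional, Tuple

MxRecord = Tuple[int, str]  # (priority number, host name)

def order_mx_records(records: List[MxRecord],
                     rng: Optional[random.Random] = None) -> List[str]:
    """Return the host names in the order in which a sender would try them."""
    rng = rng or random.Random()
    candidates = list(records)
    rng.shuffle(candidates)                 # random order among equal priorities
    candidates.sort(key=lambda record: record[0])  # stable sort, lowest number first
    return [host for _, host in candidates]

# Hypothetical records of a domain with two primary servers and one fallback.
records = [
    (20, "alt.mx.example.com"),
    (10, "mx1.example.com"),
    (10, "mx2.example.com"),
]
ordered = order_mx_records(records)
```

Because the sort is stable, the two servers with priority 10 keep their shuffled order, while the fallback with priority 20 always comes last.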

Null MX record

As we’ve seen in the previous box,
outgoing mail servers fall back to A/AAAA records
if no MX records are found at the recipient’s domain.
If no incoming mail server listens at one of the A/AAAA addresses,
an outgoing mail server will attempt to deliver emails to such a domain for days.
This is not just a waste of resources,
it also delays the bounce message to the sender of the message,
who might have simply mistyped the address of the recipient.
In order to prevent this from happening,
RFC 7505 defines a “null MX record” as 0 .
(a single MX record with a priority of 0 and the root domain as the mail server),
similar to how SRV records indicate the unavailability of a service.
You should configure a null MX record on all your organizational domains
which neither send nor receive emails.
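The lookup logic, including the fallback to address records and the null MX record, could look roughly like this. The function names and the example domains are hypothetical, and the DNS answers are passed in as plain values so that the sketch doesn't depend on any particular DNS library.

```python
from typing import List, Tuple

MxRecord = Tuple[int, str]  # (priority number, host name)

def is_null_mx(records: List[MxRecord]) -> bool:
    """A null MX is a single record with priority 0 and the root domain '.'."""
    return len(records) == 1 and records[0] == (0, ".")

def delivery_targets(domain: str, mx_records: List[MxRecord],
                     has_address_record: bool) -> List[str]:
    """Return the hosts to try; an empty list means 'bounce immediately'."""
    if is_null_mx(mx_records):
        return []                      # the domain declares that it accepts no mail
    if mx_records:
        return [host for _, host in sorted(mx_records)]
    if has_address_record:
        return [domain]                # no MX records: fall back to A/AAAA
    return []

targets_with_mx = delivery_targets("example.com", [(10, "mx.example.com")], True)
targets_fallback = delivery_targets("example.org", [], True)
targets_null_mx = delivery_targets("example.net", [(0, ".")], True)
```

With a null MX record, the sender learns right away that delivery is pointless instead of retrying for days.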

Dotless domains

From a technical perspective, top-level domains
are domains like any other in the Domain Name System (DNS).
This means that they can also have A, AAAA, and MX records and receive mail.
Since top-level domains with such records can be used in email and Web addresses without a dot,
they are called dotless domains.
For example, you can visit http://ai/ with your browser.
.ai is the
country-code top-level domain
of Anguilla.
The problem with dotless domains is that single labels are often used to address other machines in the local network.
Having such names resolve in the global DNS poses a
security risk.
Additionally, browsers usually pass your input to a search engine
if you enter a single word into the address bar.
Since dotless domains violate the expectations of users and
the assumptions of programmers,
ICANN has forbidden A, AAAA, and MX records
on new generic top-level domains
since 2013.
Out of the 1’502 top-level domains,
the following 23 had an A, AAAA, or MX record in April 2021:
.ai,
.bh,
.cf,
.cm,
.gp,
.gt,
.hr,
.kh,
.lk,
.mq,
.mr,
.pa,
.ph,
.pn,
.sr,
.tk,
.tt,
.ua,
.uz,
.va,
.ws,
.xn--l1acc, and
.xn--mgbah1a3hjkrd.
(The last two domains are internationalized domain names.)
I’ve determined this list with the script from RFC 7085,
which uses IANA’s
machine-readable list of top-level domains.
On yet another note, the formal grammar
in RFC 2821
required that the domain part of an email address consists of at least two labels.
RFC 5321 no longer has this requirement.

Name collisions

If you run the script from RFC 7085 yourself,
you will notice that many name servers cannot be resolved
and that a few top-level domains have an A record of 127.0.53.53
and an MX record of 10 your-dns-needs-immediate-attention.{TLD}.,
where {TLD} is the corresponding top-level domain.
The following eight top-level domains had such records in April 2021:
.arab,
.cpa,
.llp,
.politie,
.spa,
.watches,
.xn--mxtq1m,
and .xn--ngbrx.
Since 2014,
ICANN has required that
new generic top-level domains undergo a
controlled interruption
for 90 days before becoming operational.
Besides the above A and MX records, a controlled interruption also involves
a TXT record of Your DNS configuration needs immediate attention see https://icann.org/namecollision
and an SRV record of 10 10 0 your-dns-needs-immediate-attention.{TLD}..
New country-code top-level domains
can but don’t have to undergo a controlled interruption.
The goal of controlled interruptions is to give IT administrators an opportunity
to detect when names which are used only locally suddenly resolve differently than before.
This can happen when companies use private top-level domains in their Intranet
or when a local DNS resolver extends relative domain names to
fully qualified domain names (FQDN)
by using search lists.
On Unix-like operating systems,
a search domain can be configured in the file /etc/resolv.conf
with a line such as search example.com.
When the user enters wiki, the local DNS resolver might append the search domain to the input
and resolve it as wiki.example.com only once a query for wiki in the global DNS has returned no results.
When the top-level domain .wiki is introduced,
the user can no longer access the company’s Wiki.
Before employees load a resource from an unintended third party or leak information to the Internet,
the controlled interruption ensures that the lookup fails
and that the name collision can be detected before it causes harm.
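The wiki example can be replayed with a toy resolver. This is a sketch of the lookup order described above, not of any real resolver: the dictionary stands in for the DNS, the internal address 10.0.0.7 is made up, and real resolvers have additional options which change the order of the candidates.

```python
from typing import Dict, List, Optional

CONTROLLED_INTERRUPTION = "127.0.53.53"  # A record used during the interruption

def resolve(name: str, search_domains: List[str],
            dns: Dict[str, str]) -> Optional[str]:
    """Try the name as entered first and then with each search domain appended."""
    candidates = [name] + [f"{name}.{domain}" for domain in search_domains]
    for candidate in candidates:
        address = dns.get(candidate)
        if address is not None:
            return address
    return None

# Before .wiki exists, only the company's internal name resolves.
dns_before = {"wiki.example.com": "10.0.0.7"}
# During the controlled interruption, the top-level domain answers as well.
dns_during = {**dns_before, "wiki": CONTROLLED_INTERRUPTION}

address_before = resolve("wiki", ["example.com"], dns_before)
address_during = resolve("wiki", ["example.com"], dns_during)
collision_detected = address_during == CONTROLLED_INTERRUPTION
```

Since 127.0.53.53 is a loopback address, the connection attempt fails locally and the unusual address gives administrators a hint about what went wrong.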

Protocols

The above entities communicate with two kinds of protocols:
They use delivery protocols to deliver messages
and access protocols to access the user’s mailbox.
As discussed earlier,
only SMTP for message relay is mandatory.
All other protocols can be replaced in a proprietary setup.
For example, there are efforts
to combine message submission
and mailbox access
in a standardized way.

[Figure: the simplified email architecture. Entities: mail client of sender, outgoing mail server of sender, incoming mail server of recipient, mail client of recipient, incoming mail server of sender, outgoing mail server of recipient. Protocols: SMTP for message submission, SMTP for message relay, IMAP or POP3 for message retrieval, IMAP for message storage.]

The simplified email architecture
with delivery protocols in green
and access protocols in blue.

Use of TLS

Historically, SMTP, POP3, and
IMAP ran directly on top of the transport layer
using the Transmission Control Protocol (TCP),
which means that the communication was neither encrypted nor authenticated.
Anyone with access to one of the networks through which the communication was routed
could therefore read and potentially alter your messages.
Even your user password might have been transmitted in the clear.
In theory, the solution is straightforward:
Use Transport Layer Security (TLS)
to encrypt and authenticate the communication between each pair of entities.
In practice, however, you want to be backward compatible:
A server that expects requests to be in a specific format cannot suddenly handle a request for a TLS handshake.
There are two ways around this problem:

  • Implicit TLS:
    Introduce a new port number for each service on which the communication starts directly with a TLS handshake.
    The protocol variant which uses TLS implicitly is denoted by appending an S to its name.
    For example, IMAP becomes IMAPS.
  • Explicit TLS or STARTTLS,
    sometimes mistakenly called opportunistic TLS:
    Allow the client to upgrade an insecure connection to a secure connection with a command
    once the server has indicated that it supports TLS.
    The communication is secured only if the client requests this explicitly.
    The server cannot require the upgrade to TLS as this would break backward compatibility.

With one notable exception,
the longstanding email protocols were adapted to support both Implicit TLS and Explicit TLS.
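To make the difference between the two variants concrete, here is how a client could open an IMAP connection with Python's standard library. Treat it as a sketch: the function name is my own, the host is supplied by the caller, and nothing connects to a server until the function is called.

```python
import imaplib
import ssl

IMAPS_PORT = 993  # Implicit TLS: the TLS handshake comes first
IMAP_PORT = 143   # cleartext at first, upgraded with the STARTTLS command

def connect_imap(host: str, implicit_tls: bool = True) -> imaplib.IMAP4:
    """Open a TLS-protected IMAP connection using either variant."""
    context = ssl.create_default_context()  # verifies certificate and host name
    if implicit_tls:
        return imaplib.IMAP4_SSL(host, IMAPS_PORT, ssl_context=context)
    client = imaplib.IMAP4(host, IMAP_PORT)
    try:
        # Raises an exception if the server doesn't offer STARTTLS.
        client.starttls(ssl_context=context)
    except Exception:
        client.shutdown()  # never continue in plaintext
        raise
    return client
```

In the explicit variant, the client closes the connection if the upgrade fails. A client which silently continued in plaintext would use TLS only opportunistically.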

Implicit TLS versus Explicit TLS

When comparing the two approaches,
Implicit TLS is significantly easier to implement, debug, and deploy than Explicit TLS.
For example, many implementations of Explicit TLS allowed an attacker to
inject commands during the unencrypted phase,
which would then be executed during the encrypted phase of the protocol.
Implicit TLS was once discouraged in favor of Explicit TLS for the
following reasons:

  • Implicit TLS leads to new protocols:
    The discovery of whether TLS is supported should be made by the client
    and not by the user, who is likely confused by additional protocol options.
    However, the same can also be accomplished with Implicit TLS.
  • TLS can be used insecurely:
    Unless prohibited by the client or the server,
    TLS can be used in deprecated versions or with weak security parameters.
    The protocol variant with Implicit TLS can possibly mislead users into a false sense of security.
  • Worse opportunistic mode:
    If the client prefers to proceed without encryption and authentication
    rather than aborting the connection when the server doesn’t support TLS,
    Implicit TLS forces the client to wait for a timeout on the new port
    before establishing another connection on the traditional port.
    Once a secure connection could be established, though,
    the client should no longer accept insecure connections.
    Since the insecure protocol could still advertise when its secure variant is available,
    having only Implicit TLS wouldn’t cause a lot of overhead in practice.
  • Port number exhaustion:
    If every protocol requires two ports (one to be used with TLS and one without TLS),
    only half as many protocols can be accommodated in the limited space of port numbers.
    Luckily, this won’t be a problem anytime soon.

Since the ease of deployment should trump any other concerns when it comes to security,
RFC 8314 has recommended Implicit TLS over Explicit TLS
for IMAP, POP3, and SMTP for message submission since 2018.
When used opportunistically,
Implicit TLS and Explicit TLS provide security only against
passive attacks,
where an attacker can merely eavesdrop on your communication but cannot interfere with it.
In the presence of an active adversary,
who can modify and drop network packets,
neither Explicit TLS nor Implicit TLS is secure
unless the client has a trusted way to know that the server supports TLS.
In the case of Implicit TLS, the attacker just has to drop the client’s communication to the new port,
which forces the client to connect to the old port using the insecure protocol in order to remain backward compatible.
In the case of Explicit TLS,
the server lists TLS among its capabilities while the communication is not yet authenticated.
The attacker can simply strip TLS from the server’s capabilities,
which leaves the client with no other option than to continue in plaintext.
Alternatively, a client can sacrifice compatibility and refuse to exchange messages over an insecure channel.
However, such a change is difficult to introduce because users hate it when their setup no longer works.
It is therefore better if the client has a trusted way to know whether the server supports TLS.
The following three methods are used in practice to inform the client:

  • Previous connections:
    Once a mail server has been upgraded to support TLS,
    it almost certainly won’t be downgraded again.
    Based on this heuristic,
    the client can refuse plaintext connections to any server to which it had a TLS connection in the past.
  • Authenticated channel:
    While the server cannot reliably inform clients about its capabilities over a downgradeable protocol,
    it can use another, already authenticated protocol,
    such as DNSSEC,
    to convey this information to them.
  • User configuration:
    Last but not least, the user can configure the client according to some documentation,
    which has to be trustworthy, of course.
    The server’s capability might be printed on a leaflet or mentioned on a website secured with HTTPS.
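The first method, remembering previous connections, can be sketched as follows. The names are hypothetical and the in-memory set stands in for whatever persistent storage a real client would use. The capability list is what the server announces before the connection is authenticated, which is exactly what an active attacker can tamper with.

```python
from typing import Iterable, Set

class DowngradeError(Exception):
    """Raised when a server that offered TLS before no longer offers it."""

def negotiate(host: str, advertised: Iterable[str],
              tls_seen_before: Set[str]) -> str:
    """Decide how to continue based on the advertised capabilities."""
    capabilities = {capability.upper() for capability in advertised}
    if "STARTTLS" in capabilities:
        tls_seen_before.add(host)      # remember that this server supports TLS
        return "tls"
    if host in tls_seen_before:
        # The capability has probably been stripped by an attacker.
        raise DowngradeError(f"{host} no longer offers STARTTLS")
    return "plaintext"                 # opportunistic fallback for unknown servers

known_hosts: Set[str] = set()
first_contact = negotiate("mx.example.com", ["PIPELINING", "STARTTLS"], known_hosts)
```

A later call for the same host without STARTTLS in the list raises DowngradeError, whereas a server which the client has never seen with TLS is still contacted in plaintext.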

Since securing email deserves much more attention,
I’ve dedicated a whole section to transport security later in this article.

TLS settings in mail clients

If you have to configure your mail client manually,
it will likely choose the right security option automatically
based on the port number that you’ve entered.
Different clients call the two options differently.
Just make sure that one of the two is enabled.

Mail clients often use other names for Implicit TLS and Explicit TLS.

As far as I can tell,
Apple Mail doesn’t distinguish between Implicit TLS and Explicit TLS.
As long as the default ports are used,
this seems like a reasonable simplification.
However, how does Apple Mail determine whether to use Implicit TLS or Explicit TLS
when one of the services is deployed on a custom port?
Will it try both and see which one worked?

Anyhow, we can just hope that mail clients refuse insecure connections when the appropriate TLS option is enabled.
I assume this is the case, but having the actual behavior documented would still be nice.
For example, Apple Mail has an option to allow insecure authentication under “Advanced IMAP Settings”,
which doesn’t disable the “Use TLS/SSL” checkbox as seen below.
The documentation says:
“For accounts that don’t support secure authentication,
let Mail use a non-encrypted version of your user name and password to connect to the mail server.”
What does this mean?
Are they talking about CRAM (challenge-response authentication mechanism),
which uses a hash function and not encryption,
or does this option make TLS opportunistic? 🤷‍♂️


The server settings in Apple Mail when “Automatically manage connection settings” is disabled.
Also somewhat disappointingly, Apple Mail uses Explicit TLS rather than Implicit TLS by default.

Encryption on the Web

Historically, your web browser used the HyperText Transfer Protocol (HTTP)
to fetch websites and other resources from web servers.
Just like the original email protocols, HTTP runs directly on top of TCP,
which means that its communication is neither encrypted nor authenticated.
Since anyone on your network can read the transmitted messages
and hijack your session,
HTTP should no longer be used.
Also similar to the email protocols, there is a variant of HTTP called HTTPS,
which uses Implicit TLS to protect your communication.
In order to remain backward compatible, HTTPS has to use a different port.
While the default port for HTTP is 80, the default port for HTTPS is 443.
What is less well known, because it’s rarely used, is that HTTP supports Explicit TLS as well.
Since version 1.1, HTTP has had an Upgrade header field
to upgrade an insecure connection to a secure one.
Because Explicit TLS maintains backward compatibility,
it can be offered on port 80 as documented in RFC 2817.

Deployment statistics

What percentage of email is encrypted in transit?
Interestingly, Google publishes statistics about this
with the data going back as far as December 2013.
While not necessarily representative of overall email traffic,
the data shows that TLS usage for emails sent from Gmail increased from around 40% in 2013 to around 90% in 2020,
while TLS usage for email sent to Gmail increased from around 30% in 2013 to almost 95% in 2020.
This rapid increase in transport security
is likely due to the Snowden effect,
which sparked initiatives such as HTTPS Everywhere
and STARTTLS Everywhere.
Solely relying on TLS for security, including the protection of passwords,
has its own problems, though.
(You might also be interested in a similar report by Google about
HTTPS usage on the web.)

Port numbers

Each protocol specifies a default port on which servers listen for incoming requests.
Instead of scattering the port numbers used by various email protocols throughout the following subsections,
here is a table with all the relevant information for future reference:

The port numbers used by the various email protocols.


Since RFC 8314,
Implicit TLS is the preferred option and cleartext is considered obsolete on the port for Explicit TLS.
(I put the discouraged ports for Explicit TLS in parentheses.)
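For use in scripts, the default ports can also be written as a small lookup table. The numbers are the registered default ports as far as I know them. The layout of the dictionary and the helper function are my own choices.

```python
from typing import Dict, Optional

# Default ports per protocol: "explicit" is the traditional port (cleartext or
# STARTTLS), "implicit" is the port on which TLS starts immediately.
DEFAULT_PORTS: Dict[str, Dict[str, Optional[int]]] = {
    "SMTP for relay": {"explicit": 25, "implicit": None},
    "SMTP for submission": {"explicit": 587, "implicit": 465},
    "POP3": {"explicit": 110, "implicit": 995},
    "IMAP": {"explicit": 143, "implicit": 993},
}

def preferred_port(protocol: str) -> int:
    """Prefer the port for Implicit TLS wherever one exists."""
    ports = DEFAULT_PORTS[protocol]
    implicit = ports["implicit"]
    if implicit is not None:
        return implicit
    explicit = ports["explicit"]
    assert explicit is not None  # every protocol above has a traditional port
    return explicit

submission_port = preferred_port("SMTP for submission")
relay_port = preferred_port("SMTP for relay")
```

SMTP for relay is the only entry without a port for Implicit TLS, so the helper falls back to port 25 there.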

Why does SMTP for Relay have no port for Implicit TLS?

First of all, we’ll talk in a minute about why SMTP is different for submission and relay.
The official argument
for why SMTP for Relay has no port for Implicit TLS
is that MX records have no way to indicate which port to use and thus port 25 has to be used.
In my opinion, this argument is misleading.
A more accurate answer is that the outgoing mail server had no secure way to discover
whether an incoming mail server supported TLS back then,
so opportunistic security was all one could hope for at the time.
(Manual configuration isn’t an option for relay and DNSSEC was standardized only in 2005 and deployed in 2010.)
Since opportunistic TLS is more easily accomplished with Explicit TLS than with Implicit TLS,
we’re stuck with Explicit TLS for message relay to this day,
even though incoming mail servers can now indicate their TLS capability
in a secure way.

(In a twist of history,
port 465 was briefly registered for SMTP for Relay with Implicit TLS in 1997 before it was revoked again in 1998
when STARTTLS for SMTP was standardized.
Since some mailbox providers began to use this port for message submission with Implicit TLS,
port 465 was officially recognized for this purpose in 2018.)

Delivery protocols

Submission versus relay

The Simple Mail Transfer Protocol (SMTP) is used for two different purposes:
The mail client uses it to submit a message to the outgoing mail server of its user,
while the outgoing mail server uses it to relay the message to the incoming mail servers of the recipients.
Originally, though, mail servers relayed messages from anyone to anyone.
This is called an open mail relay.
In particular, there was no distinction between outgoing mail servers and incoming mail servers.
There were just mail transfer agents, which relayed messages among them.
Mail clients connected to mail transfer agents just like other mail transfer agents did
and asked them to deliver a given message for them.
This approach had two problems:

  • Abuse by spammers:
    By routing their mail through relay servers of reputable organizations,
    spammers made it difficult to block their messages based on their origin.
    Additionally, a single message to a relay server could have a large number of recipients,
    which allowed spammers to exploit the still costly bandwidth of others.
    However, this also meant that a large number of spam messages were identical,
    which made them relatively easy to filter out on the receiving side.
  • Unwanted rewriting:
    Emails have to be in a certain format and mail servers started rewriting them
    so that they adhere to the standard as well as to organization-specific policies.
    However, relay servers are not supposed to modify messages,
    and apparently, such modifications caused more harm than good.

For these reasons, RFC 2476
introduced a separation between submission and relay in 1998.
From then on, mail clients were expected to submit outgoing messages
on port 587 instead of port 25
so that mail servers can handle them differently from relayed messages more easily.
The RFC also allowed submission servers to require authentication
before accepting a message.
In the late 90s, submission was often restricted based on the IP address of the mail client.
Allowing submission only from within the organization meant,
though, that travelling employees couldn’t use the outgoing mail server of their organization.
Just a few months later, SMTP was extended with a flexible authentication mechanism,
which is still in use today.
RFC 2476 also permitted submission servers to modify messages
in specific ways with the intention that relay servers would stop doing so.
Equally importantly, the separation between submission and relay allowed the mail transfer agent of an organization
to reject all messages which were addressed to non-local users.
This is how the modern email architecture
with a server for outgoing mail and a server for incoming mail was born.

As a consequence of this separation,
the original SMTP was split into two protocols:
One for submission and one for relay.
Apart from using different port numbers,
they differ mostly in their use of SMTP extensions.
The submission protocol is specified most recently in RFC 6409,
which also defines what a submission server has to do,
what it should do,
and what it may do.
These aspects affect only how a submission server is supposed to behave
but not how mail clients communicate with the server.
This is why the two protocols are rarely distinguished when talking about SMTP.
For example, Wikipedia also has just a single article
for the two protocols.
When a distinction is required, such as in technical documents,
the submission protocol is called SUBMISSION
(or SUBMISSIONS when Implicit TLS is used),
while the relay protocol kept the name SMTP.
For the reason we discussed above,
SMTPS
doesn’t exist.
This doesn’t stop Wikipedia from having an article about it, though.
By now, you should also understand why the identifier used for the autoconfiguration
of a mail client is _submission and not _smtp.

Header fields and body

We’ll have a closer look at the format of messages in the next chapter,
but since we already want to transmit messages in this section,
we have to cover the basics now.
A message consists of several header fields
and an optional body,
which follows after an empty line.
Each header field has to be on a separate line but can,
if necessary, span several lines.
Just like in HTTP,
header fields are formatted as Name: Value.
What follows is a simple example message.
You can find more examples in RFC 5322.

From: Alice <alice@example.org>
To: Bob <bob@example.com>
Cc: Carol <carol@example.com>
Bcc: IETF <ietf@ietf.org>
Subject: A simple example message
Date: Thu, 01 Oct 2020 14:56:37 +0200
Message-ID: 

Hello Bob,

I think we should switch our roles.
How about you contact me from now on?

Best regards,
Alice

A simple example message with a sender, three recipients, a subject,
a date, a message ID, and a body.
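If you want to experiment with this format, Python's standard library can parse it. The following sketch uses a shortened variant of the example message, in which the address of Bob and the folded Subject line are my own additions. On the wire, lines end with a carriage return and a line feed, but the parser also accepts plain newlines.

```python
from email import message_from_string
from email.policy import default

raw = "\n".join([
    "From: Alice <alice@example.org>",
    "To: Bob <bob@example.com>",
    "Subject: A simple example message",
    " which spans two lines",          # continuation lines start with whitespace
    "",                                # the empty line separates header and body
    "Hello Bob,",
    "",
    "I think we should switch our roles.",
    "",
])

message = message_from_string(raw, policy=default)
sender = message["From"].addresses[0].addr_spec
subject = message["Subject"]           # the folded field is joined again
body = message.get_content()
```

The parser splits the input at the first empty line, so the empty lines inside the body remain part of the body.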

Message versus envelope

While outgoing mail servers may add missing header fields
and sign each message,
incoming mail servers should only add trace information to the top of a message
and leave the message as is otherwise.
The information relevant for handling the message,
such as the addresses to deliver the message to and the address to report failures to,
belongs to the so-called envelope.
The envelope belongs to the Simple Mail Transfer Protocol (SMTP),
and it can change completely during the delivery of a message.
The message, on the other hand, mostly stays the same during delivery,
and its format is also used by two access protocols.
The important thing to remember is that emails are delivered based on the addresses in the envelope
and not the addresses in the header section of the message.
Somewhat unfortunately, the fields in the envelope have names similar to some header fields in the message:
MAIL FROM for the address to report failures to and RCPT TO for each address to deliver the message to.
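The separation between message and envelope can be sketched in code. The function below is hypothetical and shows what a mail client might do before submission: It collects the envelope addresses from the header fields and then drops the Bcc field. The address of Bob is an assumption on my part.

```python
from email import message_from_string
from email.message import EmailMessage
from email.policy import default
from email.utils import getaddresses
from typing import List, Tuple

def prepare_submission(raw: str) -> Tuple[str, List[str], EmailMessage]:
    """Derive the envelope from the header fields and remove the Bcc field."""
    message = message_from_string(raw, policy=default)
    fields = [str(value)
              for name in ("To", "Cc", "Bcc")
              for value in message.get_all(name, [])]
    rcpt_to = [address for _, address in getaddresses(fields)]
    mail_from = message["From"].addresses[0].addr_spec
    del message["Bcc"]                 # Bcc recipients live on only in the envelope
    return mail_from, rcpt_to, message

raw = "\n".join([
    "From: Alice <alice@example.org>",
    "To: Bob <bob@example.com>",
    "Cc: Carol <carol@example.com>",
    "Bcc: IETF <ietf@ietf.org>",
    "Subject: A simple example message",
    "",
    "Hello Bob,",
    "",
])
mail_from, rcpt_to, message = prepare_submission(raw)
```

The two return values mail_from and rcpt_to correspond to the MAIL FROM and RCPT TO fields of the envelope, while the returned message no longer mentions the Bcc recipient.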

Diverging envelope example

Let’s have a look at how the above message is delivered
in order to understand how the envelope addresses diverge from the message addresses:

[Figure: from the mail client of alice@example.org to the outgoing mail server of example.org. Envelope: one MAIL FROM and three RCPT TO fields. Message: From: Alice, To: Bob, Cc: Carol.]

Submission: The mail client of Alice removes the Bcc header field from the message
and submits the message with all recipient addresses in the envelope,
including the ones of Bcc recipients, to the outgoing mail server.
Automatic responses shall be sent to the mailbox of Alice.

[Figure: from the outgoing mail server of example.org to the incoming mail server of example.com. Envelope: one MAIL FROM and two RCPT TO fields. Message: From: Alice, To: Bob, Cc: Carol.]

First relay: The outgoing mail server is now responsible
for delivering the message to the recipients that the mail client specified.
It sees that two recipient addresses are handled by the same domain
and delivers the message in a single envelope to the incoming mail server of this domain.
The outgoing mail server could also connect to the incoming mail server twice,
delivering the message once for Bob and once for Carol.
I don’t know which approach is more common in practice.
RFC 5321 just says that,
when the same message is delivered to multiple recipients in the same session,
it should be delivered with a command sequence of
MAIL FROM, RCPT TO, RCPT TO, DATA rather than
MAIL FROM, RCPT TO, DATA, MAIL FROM, RCPT TO, DATA.
We’ll discuss how the envelope corresponds to protocol messages soon.
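As a rough sketch of this step, an outgoing mail server could group the envelope recipients by domain and then issue one command sequence per group. The function names are hypothetical and the address of Bob is assumed. The replies of the server and the message itself are left out.

```python
from collections import defaultdict
from typing import Dict, List

def group_by_domain(recipients: List[str]) -> Dict[str, List[str]]:
    """Group the envelope recipients by the domain after the last @ symbol."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for address in recipients:
        domain = address.rsplit("@", 1)[1].lower()
        groups[domain].append(address)
    return dict(groups)

def envelope_commands(mail_from: str, recipients: List[str]) -> List[str]:
    """One MAIL FROM, one RCPT TO per recipient, then DATA for the message."""
    commands = [f"MAIL FROM:<{mail_from}>"]
    commands += [f"RCPT TO:<{recipient}>" for recipient in recipients]
    commands.append("DATA")
    return commands

groups = group_by_domain(["bob@example.com", "carol@example.com", "ietf@ietf.org"])
commands = envelope_commands("alice@example.org", groups["example.com"])
```

For example.com, this yields a single sequence with two RCPT TO commands, which is the first of the two variants mentioned above.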

[Figure: from the incoming mail server of example.com to the incoming mail server of example.net. Envelope: one MAIL FROM and one RCPT TO field. Message: From: Alice, To: Bob, Cc: Carol.]

Alias: The incoming mail server of example.com knows that
carol@example.com is an alias for caroline@example.net.
It thus forwards the original message without any modifications to the incoming mail server of example.net.
A potential delivery failure is still reported to Alice.

[Figure: from the outgoing mail server of example.org to the incoming mail server of ietf.org. Envelope: one MAIL FROM and one RCPT TO field. Message: From: Alice, To: Bob, Cc: Carol.]

Second relay: The outgoing mail server of Alice also has to deliver the message to the recipient ietf@ietf.org,
so it does that.

[Figure: from the incoming mail server of ietf.org to the incoming mail server of example.net. Envelope: one MAIL FROM and one RCPT TO field. Message: From: Alice, To: Bob, Cc: Carol.]

Mailing list: It turns out that ietf@ietf.org is a mailing list.
It is now the task of the mail server of ietf.org to deliver the message to all subscribers of this list,
with one of them being dave@example.net.
RFC 5321 requires that
the bounce address
as specified in the MAIL FROM field of the envelope
is changed to the entity who administers the mailing list.
The entity can be a person but is typically a piece of software,
which keeps track of delivery failures in order to revise the list.
The RFC also demands that the From field in the message remains the same.
Mailing list tools
often modify the message in some ways,
for example by adding a field to the header and a footer to the body
in order to let recipients of the message unsubscribe from the mailing list.
Alias addresses and mailing lists cause difficulties for domain authentication.
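The difference between an alias and a mailing list comes down to what happens to the envelope, which the following sketch tries to capture. The class, the function names, and the bounce address ietf-bounces@ietf.org are made up for illustration. The message itself is untouched by both functions.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass(frozen=True)
class Envelope:
    mail_from: str             # the address to report failures to
    rcpt_to: Tuple[str, ...]   # the addresses to deliver the message to

def forward_alias(envelope: Envelope, aliases: Dict[str, str]) -> Envelope:
    """An alias replaces the recipient but keeps the bounce address."""
    targets = tuple(aliases.get(address, address) for address in envelope.rcpt_to)
    return Envelope(envelope.mail_from, targets)

def expand_list(subscribers: List[str], bounce_address: str) -> List[Envelope]:
    """A mailing list sends to each subscriber with its own bounce address."""
    return [Envelope(bounce_address, (subscriber,)) for subscriber in subscribers]

to_carol = Envelope("alice@example.org", ("carol@example.com",))
forwarded = forward_alias(to_carol, {"carol@example.com": "caroline@example.net"})
list_copies = expand_list(["dave@example.net"], "ietf-bounces@ietf.org")
```

After the alias, failures are still reported to Alice. After the list expansion, they go to the administrator of the list.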

Who removes the Bcc header field?

Is removing the Bcc field the job of the mail client or the job of the outgoing mail server?
The relevant standards are silent on this but experts agree
that the software which constructs the envelope from the message is responsible for this.
If SMTP is used for submitting the message to the outgoing mail server
(rather than using one of the custom approaches),
the mail client has to remove the Bcc field for the primary (To) and secondary (Cc) recipients.
Since this is not clearly stated in the standard,
there existed (and maybe still exist) mail clients
which relied on the outgoing mail server to remove the Bcc field.
However, RFC 6409 lists Bcc removal
neither among the mandatory actions nor among the
permitted message modifications for outgoing mail servers.
While some outgoing mail server software, such as Postfix,
which is deployed on around 34% of the reachable mail servers on the Internet,
drops the Bcc header field by default,
other software, such as Exim,
which is deployed on around 57% of the reachable mail servers on the Internet,
does so only if it is invoked with the
-t option.
(This option was introduced for use in pipelines,
such as cat message | sendmail -t.)
As a result, the list of Bcc recipients can be disclosed to non-Bcc recipients,
depending on the user’s specific combination of mail client and outgoing mail server software.
Since neither mail clients nor outgoing mail servers document how they treat Bcc recipients,
you have to send a test email to figure out the behavior of your particular setup.

RFC 5322 allows four different behaviors
when it comes to Bcc recipients, which we’ll study on the basis of another example:

From: Alice 
To: Bob 
Bcc: Carol , Dave 
Subject: Followup to previous message
Date: Thu, 01 Oct 2020 15:04:26 +0200
Message-ID: 

Hi Bob,

I've changed my mind. Please forget the previous message.

All the best,
Alice

Another example message with several Bcc recipients.

  • Complete removal: The mail client removes the Bcc field from the message
    and delivers the message with a single envelope for all recipients to the outgoing mail server.
    We already encountered this behavior in the previous box.
    As far as I can tell, this is by far the most common behavior in practice.

[Diagram: Mail client of alice@example.org ➞ Outgoing mail server of example.org. Envelope: MAIL FROM, three RCPT TO. Message: From: Alice, To: Bob.]

The Bcc field is removed from the message for all recipients.
  • Grouped delivery: The mail client splits the recipients into two groups.
    The non-Bcc recipients get the message in which the Bcc field is removed,
    while the Bcc recipients get the original message,
    in which all Bcc recipients are listed.

[Diagram: Mail client of alice@example.org ➞ Outgoing mail server of example.org. Envelope: MAIL FROM, one RCPT TO. Message: From: Alice, To: Bob.]

The non-Bcc recipients get the redacted message.

[Diagram: Mail client of alice@example.org ➞ Outgoing mail server of example.org. Envelope: MAIL FROM, two RCPT TO. Message: From: Alice, To: Bob, Bcc: Carol, Dave.]

The Bcc recipients get the original message.
  • Individual delivery: While all non-Bcc recipients receive the same message,
    each Bcc recipient receives a separate version of the message,
    in which only they are listed as a Bcc recipient.
    Just like the first approach,
    this prevents Bcc recipients from learning about any other Bcc recipient.

[Diagram: Mail client of alice@example.org ➞ Outgoing mail server of example.org. Envelope: MAIL FROM, one RCPT TO. Message: From: Alice, To: Bob.]

All non-Bcc recipients get the same redacted message.

[Diagram: Mail client of alice@example.org ➞ Outgoing mail server of example.org. Envelope: MAIL FROM, one RCPT TO. Message: From: Alice, To: Bob, Bcc: Carol.]

Carol in Bcc gets her own version of the message.

[Diagram: Mail client of alice@example.org ➞ Outgoing mail server of example.org. Envelope: MAIL FROM, one RCPT TO. Message: From: Alice, To: Bob, Bcc: Dave.]

And so does Dave.
  • Empty field: While the standard requires that Bcc recipients are never disclosed to non-Bcc recipients,
    it allows the sender to indicate with an empty Bcc field that there were hidden Bcc recipients.
    Such a hint can be provided in any of the other three approaches.
    Therefore, this is more of a second dimension than a fourth option,
    increasing the overall number of Bcc possibilities to 3 · 2 = 6.

[Diagram: Mail client of alice@example.org ➞ Outgoing mail server of example.org. Envelope: MAIL FROM, three RCPT TO. Message: From: Alice, To: Bob, Bcc: (empty).]

An empty Bcc field indicates that there were hidden recipients without disclosing them.
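
To make the difference between these behaviors concrete, here is a minimal Python sketch which plans the submissions for each of them. It is not taken from any mail client: the function names and the strategy labels are my own, and the addresses of Bob and Carol in the usage note are made up for illustration.

```python
import copy
from email.message import EmailMessage
from email.utils import getaddresses


def addresses(message: EmailMessage, *fields: str) -> list[str]:
    """Collect the bare addresses from the given header fields."""
    values = [str(v) for field in fields for v in message.get_all(field, [])]
    return [address for _name, address in getaddresses(values)]


def without_bcc(message: EmailMessage, empty_hint: bool = False) -> EmailMessage:
    """Return a copy without Bcc recipients, optionally keeping an empty Bcc field."""
    redacted = copy.deepcopy(message)
    del redacted["Bcc"]
    if empty_hint:
        redacted["Bcc"] = ""
    return redacted


def plan_submissions(message: EmailMessage, strategy: str, empty_hint: bool = False):
    """Return one (rcpt_to, message) pair per submission to the outgoing mail server."""
    visible = addresses(message, "To", "Cc")
    hidden = addresses(message, "Bcc")
    redacted = without_bcc(message, empty_hint)
    if strategy == "complete":
        # One submission: everyone is in the envelope, nobody is in the Bcc field.
        return [(visible + hidden, redacted)]
    if strategy == "grouped":
        # Bcc recipients get the original message and thus see each other.
        return [(visible, redacted), (hidden, copy.deepcopy(message))]
    if strategy == "individual":
        # Each Bcc recipient gets a version which lists only them.
        plans = [(visible, redacted)]
        for address in hidden:
            personal = without_bcc(message)
            personal["Bcc"] = address
            plans.append(([address], personal))
        return plans
    raise ValueError(f"Unknown strategy: {strategy}")
```

For a message to bob@example.com with carol@example.com and dave@example.net in Bcc, the complete strategy yields a single submission with three envelope recipients, whereas the individual strategy yields three submissions with one envelope recipient each.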

The advantage of removing the Bcc field completely is that
the mail client has to submit the message only once to the outgoing mail server.
The disadvantage of this approach is that
Bcc recipients don’t learn why they have received a given message:
They might have been a hidden recipient
or one of the non-hidden addresses might have forwarded the message to their mailbox.
Hidden recipients shouldn’t send a response to non-hidden recipients
because this discloses the fact that they also received the message,
which is what the author of the initial message tried to keep secret.
In my opinion, mail clients should warn users
when they click on “reply to all” for emails that weren’t addressed to them,
but none of the mail clients I tested did.
Even if the Bcc field is removed by the sender,
mail clients could deduce from the added trace information
whether the message was first received for a listed recipient before being forwarded to their mailbox
if the address of the mailbox is not among the recipients.
In other words, mail clients could compensate for the drawback of the complete removal approach, but none of them does.

The only way to be sure that a Bcc recipient won’t reply to all recipients by accident is to
first send the message to the non-Bcc recipients and then forward the message to the hidden recipients.
If the hidden recipients don’t need to be hidden from each other,
you can list them in the To field of the forwarded email.
Otherwise, keep them in the Bcc field.

The Bcc field is often used to send an email to undisclosed recipients:
The primary recipients of the message are put into Bcc in order to prevent them from seeing each other.
Some mail clients, such as the Gmail web interface,
indicate this as the sender by using an empty group construct,
such as undisclosed-recipients:;, in the To field.
As we learned above, this behavior isn’t guaranteed by the standard.
Given how prevalent it is to use Bcc for undisclosed recipients,
I think a new iteration of RFC 5322 should reflect
user expectation and formally deprecate the grouped delivery approach unless the user agreed to this behavior.

While the individual delivery approach is nice in theory
because recipients are informed about why (and to which alias) they received a message,
it isn’t ideal in practice because it shifts work from the server back to the client.
One of the reasons why we use outgoing mail servers
is that mail clients have to submit a message addressed to many recipients only once.
Creating and uploading an individual version of the message for each Bcc recipient
on the client-side defeats this purpose.
While some of the approaches to solve the double-submission problem
can alleviate this issue especially when large attachments are involved,
a simple SMTP extension for submission would do the trick.
Since outgoing mail servers have no standardized way to indicate to mail clients
that they remove the Bcc header field from the message against the intention or at least spirit of the standard,
mail clients might upload individual versions of the message for Bcc recipients in vain.
As a consequence, mail clients should opt for complete Bcc removal by default.
However, they could do much more to recover some of the lost information on the receiving end
and then display this information to their users.
If you know about a free mail client which does this,
please let me know.

Sometimes, the Bcc field is simply used to prevent certain recipients from getting replies
rather than to hide them from other recipients.
An example of this is when you move the person who introduced you to someone else to Bcc
while still thanking them for the introduction in the reply.
This use case could also be addressed with a Do-Not-Reply-To field,
which lists all addresses that should be skipped in a reply.
Such a header field would also solve the no-reply problem.
However, it’s almost impossible to bring innovation to email
because first implementations and then users would have to adopt such a change.

How does Gmail recover the Bcc header field of sent messages?

The Bcc field serves yet another purpose:
It reminds the author to whom they sent a message.
While mail clients should remove the Bcc field
when submitting a message to an outgoing mail server,
they store the message with the Bcc field in the sent folder of the user’s mailbox.
As you might remember, Gmail does things differently, though.
Instead of letting the mail client submit a copy of the message to the sent folder,
the outgoing mail server stores all sent messages in the user’s mailbox automatically.
This leads to the following question:
If mail clients remove the Bcc field from a message before sending it,
does Gmail recover the Bcc field for the user’s copy in the sent folder?
The answer is yes.
I tested this by submitting messages manually with the tool below.
Gmail adds any RCPT TO addresses from the envelope
which are not among the recipients of the message
to a new Bcc field at the very top of the message
(even above the Received and Return-Path header fields,
which emails synchronized via IMAP don’t have).
A consequence is that the display names of Bcc recipients cannot be recovered.
This procedure works reasonably well as long as the mail client submits the message only once
with all recipients in the envelope.
If the mail client opts to deliver a separate version of the message to Bcc recipients,
Gmail fails to merge the Bcc recipients from the second submission into the message from the first submission.
It just ignores the second message with additional Bcc recipients
and the same Message-ID for archiving
even if it is submitted in the same session
by continuing with another MAIL FROM command after submitting the DATA.
If you think you can use this to bypass Gmail’s sent archive,
I must disappoint you:
If you submit another message with the same Message-ID but a different body,
Gmail stores the second message in the sent folder as well.
Moreover, Gmail always removes the Bcc field for recipients,
no matter whether you send the email via SMTP or the website.
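
The recovery which I observed can be approximated with a few lines of Python. This is a guess at the logic based on my tests, not Gmail’s actual implementation, and the function name is hypothetical. Comparing addresses case-insensitively is a simplification on my part.

```python
from email.message import EmailMessage
from email.utils import getaddresses


def recover_bcc(message: EmailMessage, rcpt_to: list[str]) -> list[str]:
    """Return the envelope recipients which are not listed in the message."""
    fields = ("To", "Cc", "Bcc")
    values = [str(v) for field in fields for v in message.get_all(field, [])]
    listed = {address.lower() for _name, address in getaddresses(values)}
    # Only bare addresses are known from the envelope, so display names are lost.
    return [address for address in rcpt_to if address.lower() not in listed]
```

Since the envelope contains only addresses, the result can be stored in a new Bcc field but never with the original display names.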

Simple Mail Transfer Protocol (SMTP)

The Simple Mail Transfer Protocol (SMTP)
was first specified in RFC 821 in 1982.
As its name suggests, it is a fairly simple protocol:

(Open connection)
Server: 220 server.example.com
Client: HELO client.example.org
Server: 250 server.example.com
Client: MAIL FROM:
Server: 250 Ok
Client: RCPT TO:
Server: 250 Ok
Client: DATA
Server: 354 Go ahead
Client: (Actual message)
Server: 250 Ok
Client: QUIT
Server: 221 Bye
(Close connection)

After opening a TCP connection on port 25,
the client sends commands and the server responds with
status codes.
Once they have greeted each other, the client transmits the envelope,
followed by the DATA command and the message.
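
The same flow can be written down as a small Python function which lists each step of the client together with the status code it hopes to receive. This is just a sketch of the sequence above: the function name is made up, and the two entries in parentheses are placeholders rather than SMTP commands.

```python
def session_plan(client: str, mail_from: str, rcpt_to: list[str]):
    """List each client step with the status code of a successful response."""
    plan = [
        ("(open connection)", 220),       # the server greets first
        (f"HELO {client}", 250),
        (f"MAIL FROM:<{mail_from}>", 250),
    ]
    # The envelope contains one RCPT command per recipient.
    plan += [(f"RCPT TO:<{recipient}>", 250) for recipient in rcpt_to]
    plan += [
        ("DATA", 354),                    # the server asks for the message
        ("(message, then a line with a single period)", 250),
        ("QUIT", 221),
    ]
    return plan
```

Each additional recipient adds one RCPT command and one response, while the message itself is transmitted only once.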

Command syntax

The first question that came to your mind after reading the above
sequence diagram
probably was: Is HELO a typo?
No, it’s not.
SMTP commands simply consist of four characters.
They are almost always written in uppercase,
even though they are case insensitive.
But yes, HELO does stand for “hello”.
The purpose of this command is for the client to identify itself to the server with a domain name or an IP address.
The identity provided by the client is relevant only in rare circumstances.

Why are the MAIL FROM and RCPT TO commands longer than four characters, then?
They’re not.
The commands are just MAIL and RCPT.
FROM and TO denote the subsequent parameter value.
Some ESMTP extensions define additional parameters for the MAIL command.
The name and value of these additional parameters are separated by an equals sign rather than a colon, though.
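
The following Python sketch splits such a line into the command, the parameter name, the address, and the additional ESMTP parameters. The regular expression is a simplification of the grammar in RFC 5321 and the function name is hypothetical, so don’t use it to validate real traffic.

```python
import re

# Simplified pattern: command, colon, address in angle brackets, then NAME=value pairs.
ENVELOPE_COMMAND = re.compile(r"^(MAIL FROM|RCPT TO):<([^>]*)>((?: \S+)*)$", re.IGNORECASE)


def parse_envelope_command(line: str):
    """Split 'MAIL FROM:<address> NAME=value' into its components."""
    match = ENVELOPE_COMMAND.match(line.rstrip("\r\n"))
    if match is None:
        raise ValueError(f"Not a MAIL or RCPT command: {line!r}")
    keyword, address, rest = match.groups()
    command, argument = keyword.upper().split(" ")  # commands are case insensitive
    parameters = {}
    for item in rest.split():
        name, _, value = item.partition("=")        # equals sign instead of colon
        parameters[name.upper()] = value
    return command, argument, address, parameters
```

For example, `parse_envelope_command("MAIL FROM:<alice@example.org> SIZE=1234")` separates the four-letter command MAIL from its FROM argument and returns SIZE as an additional parameter.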

Field terminology

Historically, the client could also specify
how the message shall be routed.
For this reason, the MAIL FROM address is also known as the reverse path
and the RCPT TO address is also known as the forward path.
Alternative names
for the MAIL FROM address are bounce address,
return path, envelope from, and 5321 from
(according to the most recent RFC for SMTP).
I will stick to MAIL FROM and RCPT TO for the envelope fields
and to From, To, Cc, and Bcc for the message fields.
As we will see later on,
the MAIL FROM address is added to the message in a Return-Path field.
Return-Path is thus a message field rather than an envelope field.

Extended Simple Mail Transfer Protocol (ESMTP)

A framework for extending SMTP was introduced in RFC 1425 in 1993.
The extensible protocol, which is backward compatible with SMTP,
is called the Extended Simple Mail Transfer Protocol (ESMTP).
ESMTP was revised in RFC 1651 (1994),
RFC 1869 (1995),
RFC 2821 (2001),
and most recently in RFC 5321 (2008).
The basic idea behind ESMTP is that the client greets the server
with the “extended hello” command EHLO
instead of the old “hello” command HELO.
This indicates to the server that the client understands ESMTP.
The server responds with all the SMTP extensions it supports.
For the rest of the session,
the client can then make use of the server’s advertised capabilities.

ESMTP tool

Let’s put theory into practice.
The following tool generates the command sequence to submit or relay an email
with parameters of your choice.
One way of using the tool is simply to observe how parameter changes affect the protocol flow.
The reason for building this tool, however,
is that you can copy the commands to your command-line interface
and send messages without the assistance of a mail client.
Since you shouldn’t enter your email password on a random website like this one,
I recommend that you use the mode for submission only with demo accounts which you’ve created for this purpose.
The password is stored in the local storage
of your browser without any protections until you erase the history.
Having said that, the tool is open source
like the rest of this website, and if you don’t trust me that this website is served from those files,
you can also build and run this website locally.
The tool uses Thunderbird’s database
and Google’s DNS API
to resolve the server you want to connect to and the API by ipinfo.io
to determine your IP address when you click on Determine next to the Client field.
The text in gray mimics what the responses from the server likely look like.
What you actually receive from the server will be different.
As long as the returned status code
starts with a 2 or a 3, you should be fine.
If the returned status code starts with a 4 or a 5, something went wrong.
I list some ideas for things you can try out after the tool.
The boxes after that provide you with more information on various aspects,
which are useful for troubleshooting problems you might run into.
If you need more help, send me an email
(probably with your mail client rather than with this tool). 🙂

Tool instructions
  1. Create a new account at a mailbox provider of your choice.
    If you opt for Gmail, you should read this box first.
  2. Enter the address of your account in the From field and your password in the Password field.
    Set the Mode to Submission.
  3. After composing the message (To, Subject, and Body),
    try to submit it to the outgoing mail server with the listed commands.
  4. The first line opens a TLS channel to the specified Server.
    All other commands are sent to the server inside this channel.
  5. You can copy each line in bold to your clipboard
    by clicking on it, which includes the newline character to submit the command.
  6. If the mail was submitted successfully, you can add more To or Cc recipients.
    By copying only some of the generated RCPT TO commands but the full message,
    you suppress the delivery of the message to the skipped recipients.
    For those that receive the message,
    it looks as if the message was delivered to all the recipients in the message.
    I already mentioned this problem above.
  7. Besides faking recipients, you can also try to fake the sender.
    Switch the mode from Submission to Relay
    and change the From field to an address that you don’t own.
    Now try to send the message directly to the incoming mail server of one of the recipients.
    If the incoming mail server and the domain
    which you try to send the email from are properly configured,
    your message should make it at best into the spam folder of the recipient.
    Chances are that your message will be rejected during the SMTP session or silently dropped thereafter.
    The incoming mail server might also graylist or blacklist your IP address.
    Since you usually don’t relay email from your computer, this is nothing to worry about.
    Forging the sender address is known as spoofing.
    Be careful which domains you try to impersonate.
    If the domain owner configured a DMARC record,
    they might be informed about your spoofing attempt and even receive the content of your message.

Important: Be a nice person and don’t scam others!
If you spoof the sender of an email in bad faith, you likely commit a crime in most countries.
I showed you this attack for educational purposes only because I believe that seeing is believing.
We can improve the state of email security only if consumers start demanding better security.
In this spirit, I encourage you to relay spoofed emails only to your own mailbox.
If such a spoofed email lands in your inbox,
ask your mailbox provider to be more rigorous in filtering scam emails or use the service of a different provider.
You’re hopefully also more motivated now to read the rest of this article.
In short, have fun with the above tool but always remember that with great power comes great responsibility!

Tool explanations
Command-line interface

If you’ve never used the command-line interface
of your operating system before, I suggest that you read a proper introduction first.
If you have no clue about what you’re doing, it’s easy to mess up your computer.
Additionally, you shouldn’t blindly execute arbitrary commands from the Internet.
Ideally, you should always try to understand what a command does based on a separate source first.
Having said that, the default program providing a command-line interface
is called Terminal on macOS.
It’s located in the /Applications/Utilities folder at the top of your file system.
The openssl tool should already be installed.
Continue here to test this.
On Windows, the default command-line program is called Command Prompt.
Various third parties provide OpenSSL binaries for Windows.
Here is a guide
for installing OpenSSL on Windows 10, which you can follow at your own risk.

Clipboard verification

If a website copies commands to your clipboard
for you, you should verify the content of your clipboard
before pasting it into your command-line interface.
Otherwise, a malicious website can display one command and copy another command.
One way to inspect your clipboard is to always paste its content into a
text editor first.
Since this is a hassle, you likely won’t do this for long.
A better approach is to have a window which displays the current content of your clipboard.
On macOS, the Finder has a “Show Clipboard” command in the “Edit” menu.
Unfortunately, this window is visible only if Finder is the active application.
A different approach is to open a new window in your terminal
and paste the clipboard once a second with the watch command:

How to watch your clipboard in your command-line interface.
Press “Control C” (^C) to exit the program.

OpenSSL versus LibreSSL

OpenSSL used to be the most important open-source library for TLS functionality.
(When OpenSSL was first released in 1998, TLS was still called SSL.)
After the Heartbleed security vulnerability in April 2014,
the OpenBSD project forked
LibreSSL from OpenSSL.
In order to remain as compatible as possible, the command-line tool is still called openssl.
Since macOS 10.13.5, Apple ships LibreSSL and no longer OpenSSL.
I mention all of this here only because the arguments of the two commands are no longer identical.
Here is the documentation of the s_client subcommand
for OpenSSL
and for LibreSSL.

Execute the following command to figure out
whether openssl is installed on your system and which implementation you have:

Common SMTP extensions

The difference between ESMTP and SMTP is that ESMTP allows the server to list extended capabilities,
which the client can make use of during the session.
Let’s have a look at some common SMTP extensions on the basis of what Gmail supports:

A transcript of a session with the outgoing mail server of Gmail when using Implicit TLS.
[Brackets indicate redacted information.]

As can be seen in the above transcript,
Gmail’s outgoing mail server supports the following SMTP extensions:

  • SIZE (RFC 1870):
    This extension allows the server to specify an upper limit
    on the size of messages it accepts in bytes as part of the EHLO response.
    Gmail apparently accepts messages of almost 36 MB.
    The extension also allows the client to specify the size of the message in bytes
    as part of the MAIL command: MAIL FROM: SIZE=1234.
    The server can then reject the message for individual recipients
    in its response to each RCPT command,
    for example because a mailbox no longer has enough space to store a message of the stated size.
    Doing so has the advantage that a large message doesn’t even have to be transmitted
    if it will be rejected for all recipients based on its size.
    (The declared size can be an estimate.)
  • 8BITMIME (RFC 6152):
    MIME stands for Multipurpose Internet Mail Extensions
    and we’ll discuss this later.
    SMTP originally required the message to consist of 7-bit ASCII characters.
    This extension allows the server to signal that it’ll preserve the 8th bit of each byte in the message body.
    The client can then indicate in the MAIL command that
    the content of the message contains bytes outside of the ASCII range:
    MAIL FROM: BODY=8BITMIME.
    The server can still enforce a limit on the length of each line, though.
    Therefore, this extension doesn’t enable binary data transfer without encoding.
  • AUTH (RFC 4954):
    This extension allows the server to authenticate
    the user in the submission protocol before accepting a message for relay.
    Since the above tool makes extensive use of this extension,
    it deserves its own information box.
  • ENHANCEDSTATUSCODES (RFC 2034):
    This extension allows the server to respond with more precise
    status codes
    than the ones specified in the original standard.
    The server indicates that it returns enhanced status codes to the client
    by listing the extension in its response to the EHLO command.
    The server then prepends the enhanced status codes to the text part of the original status codes.
    The structure of enhanced status codes is class.subject.detail,
    with the values specified in RFC 3463 and maintained in a
    registry by IANA.
  • PIPELINING (RFC 2920):
    The goal of this extension is to reduce the number of round trips
    during an SMTP session.
    Instead of having to wait for a response from the server after each command,
    it allows the client to send several commands in a single packet to the server.
    The standard requires that EHLO, DATA, and QUIT each be the last command in a batch of commands.
    AUTH must also be the last command in a batch unless the authentication method is PLAIN,
    which makes the command non-interactive.
    The server then returns all the status codes at once,
    matching the order of the transmitted commands.
    I’ve implemented pipelining in the above tool
    to make copying the commands easier.
  • CHUNKING (Section 2 of RFC 3030):
    This extension allows the client to split the message into several chunks and transfer each chunk separately,
    which is especially useful for large messages.
    Instead of the DATA command,
    the client can send one or several BDAT commands,
    which are immediately followed by the respective chunk.
    When using the BDAT command,
    the client specifies the size of the chunk in bytes,
    which has the advantage that the client doesn’t have to escape lines containing a single period
    and that the server doesn’t have to scan the transmitted data for the {CR}{LF}.{CR}{LF} sequence
    in order to determine the end of the message.
    This length prefix turns SMTP into a binary protocol temporarily.
    The client indicates the last chunk by appending LAST after the chunk size to the BDAT command.
    The RFC contains a simple example.
  • SMTPUTF8 (RFC 6531):
    This extension allows the client to use UTF-8
    instead of just ASCII in the MAIL and RCPT commands as well as the message.
    A server which supports the SMTPUTF8 extension also has to support the 8BITMIME extension.
    SMTPUTF8 facilitates the internationalization of email addresses.
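
To illustrate the difference between DATA and BDAT from the CHUNKING extension, here is a short Python sketch which frames the same message both ways. The function names are mine, and the sketch ignores the responses from the server.

```python
def data_payload(message: bytes) -> bytes:
    """Prepare the bytes which follow the DATA command (after the server's 354)."""
    lines = message.split(b"\r\n")
    if lines and lines[-1] == b"":
        lines.pop()  # the message already ended with a line break
    # Lines starting with a period get an extra period so that they
    # cannot be mistaken for the end of the message.
    stuffed = [b"." + line if line.startswith(b".") else line for line in lines]
    return b"".join(line + b"\r\n" for line in stuffed) + b".\r\n"


def bdat_frames(message: bytes, chunk_size: int) -> list[bytes]:
    """Split the message into BDAT commands, each followed by its chunk."""
    chunks = [message[i:i + chunk_size] for i in range(0, len(message), chunk_size)] or [b""]
    frames = []
    for index, chunk in enumerate(chunks):
        last = " LAST" if index == len(chunks) - 1 else ""
        # The length prefix makes escaping and scanning for the end unnecessary.
        frames.append(f"BDAT {len(chunk)}{last}\r\n".encode("ascii") + chunk)
    return frames
```

With DATA, the content has to be inspected line by line, whereas BDAT transfers the bytes unchanged and may even split them in the middle of a line.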

If you connect to a different server,
you likely encounter other extensions as well.
The server indicates the end of the response to the EHLO command
by using a hyphen after the status code for all but the last line.
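
A client can use this rule to parse the response to the EHLO command: every line but the last has a hyphen after the status code, and the last line has a space instead. The following Python sketch does this under the assumption that the whole response has already been read. The function name is hypothetical, and the response in the usage note is an invented example rather than the transcript from Gmail.

```python
def parse_ehlo_response(response: str) -> dict[str, list[str]]:
    """Map each advertised extension to its list of parameters."""
    extensions = {}
    lines = response.splitlines()
    for index, line in enumerate(lines):
        code, separator, text = line[:3], line[3:4], line[4:]
        if code != "250":
            raise ValueError(f"Unexpected status code in {line!r}")
        is_last = index == len(lines) - 1
        if separator != (" " if is_last else "-"):
            raise ValueError(f"Unexpected separator in {line!r}")
        if index == 0:
            continue  # the first line identifies the server, not an extension
        keyword, *parameters = text.split()
        extensions[keyword.upper()] = parameters
    return extensions
```

Given the lines `250-smtp.example.com at your service`, `250-SIZE 35882577`, `250-AUTH LOGIN PLAIN`, and `250 SMTPUTF8`, the function returns SIZE with one parameter, AUTH with two mechanisms, and SMTPUTF8 without parameters.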

Backward compatibility

ESMTP uses the same port as SMTP,
so how does ESMTP ensure backward compatibility with SMTP?
(Since submission was split from relay in 1998 while ESMTP dates back to 1993,
we’re talking only about port 25 here.)
Remember that when an outgoing mail server connects to an incoming mail server,
it assumes the role of the client in that interaction.
There are only two cases to consider:

  • Old client ➞ new server:
    ESMTP servers still have to accept the old HELO command in order to remain compatible.
  • New client ➞ old server:
    SMTP servers which don’t understand the EHLO command respond with the error code 500.
    The client can either QUIT the connection or continue with the HELO command.
    According to this source,
    some mail clients send the EHLO command only if the first line from the server,
    which starts with the status code 220, contains ESMTP.
    This explains why most servers include ESMTP in their greeting even if the standard doesn’t require it.
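
The fallback of a new client can be sketched in Python as follows. The send_command parameter is a hypothetical callable which sends one command and returns the status code and the text of the response. It is not part of any library, but Python’s smtplib offers similar behavior with its ehlo_or_helo_if_needed method.

```python
def greet(send_command, client_name: str):
    """Try EHLO first and fall back to HELO if the server doesn't understand it."""
    code, text = send_command(f"EHLO {client_name}")
    if code == 250:
        return "ESMTP", text
    # An old server rejects the unknown command, so retry with the old greeting.
    code, text = send_command(f"HELO {client_name}")
    if code == 250:
        return "SMTP", text
    raise ConnectionError(f"The server rejected both greetings: {code} {text}")
```

After a fallback to HELO, the client must not use any extensions for the rest of the session.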
STARTTLS extension

Explicit TLS is implemented with an extension called STARTTLS,
which is specified in RFC 3207.
Gmail didn’t list this extension
because we used SMTP with Implicit TLS on port 465.
If we open a TCP connection on port 587, STARTTLS is listed as well:

The STARTTLS extension is listed when we connect without TLS to Gmail’s outgoing mail server.

If the STARTTLS extension is listed in the response to the EHLO command,
the client can ask the server to upgrade the insecure channel to a secure one with the STARTTLS command.
If the server responds with the status code 220,
the client can continue with the TLS handshake.
Once the handshake is completed, the client and the server are
reset to their initial state.
In particular, the server must forget about the client’s argument to the EHLO command,
whereas the client must forget about the extensions supported by the server.
The client should send another EHLO command,
to which the server can respond with a different list of extensions than before the TLS handshake.
For example, the AUTH extension is missing in the above list
because passwords of users shouldn’t be transmitted over an insecure channel.
You can use the following command in your command-line interface
to let openssl issue the STARTTLS command after an initial EHLO command
and then continue with the TLS handshake:

When using -starttls smtp, openssl starts with a TCP connection
and upgrades it to a TLS connection by issuing the STARTTLS command.

If the server doesn’t list the STARTTLS extension
or responds with a status code other than 220 to the STARTTLS command,
the client has to decide whether it wants to continue or abort the connection.
As explained earlier,
neither Explicit TLS nor Implicit TLS is secure against
downgrade attacks when used
opportunistically.
The belief that only Explicit TLS with STARTTLS has this weakness is a common misunderstanding.
Due to backward compatibility,
it’s up to the client to require a secure channel or to abort otherwise.
If the client does require TLS,
it might no longer be able to submit or relay messages to some servers, though.

As a side note: OpenSSL has a
-name option
to let you specify the argument to the initial EHLO command.
Since the server must forget about this argument after the TLS handshake,
I have no idea what the point of providing this option is.
This is likely the reason why LibreSSL doesn’t support this option in the first place.

User authentication

In order to protect their reputation and to reduce spam and phishing,
outgoing mail servers authenticate their users before accepting messages for relay.
This is done with the AUTH extension as specified in RFC 4954.
The AUTH extension itself is also extensible:
Servers can support new mechanisms, which clients can then make use of.
Since many application-layer protocols require authentication,
the IETF community abstracted the various mechanisms into the so-called
Simple Authentication and Security Layer (SASL),
which is specified in RFC 4422.
IANA maintains a list of
SASL mechanisms.
SMTP servers list all the mechanisms that they support after AUTH in their response to the EHLO command.
We’re interested in only four of them:

  • PLAIN (RFC 4616):
    The client sends the Base64 encoding
    of the user’s username and password as an argument to the AUTH command to the server.
    The username and password are separated by the null character.
    If you don’t trust the above tool,
    you can compute the encoding on your command line as echo -ne '\0000username\0000password' | openssl base64.
    The echo command writes the argument to
    its standard output,
    which is then piped to openssl for the Base64 encoding.
    The -n option to echo suppresses the trailing newline in its output,
    and the -e option enables interpretation of backslash escapes.
    I use four zeros instead of just two in the escape sequence
    to avoid problems if your username or password starts with a number.
    And if you’re wondering why there is a leading null character:
    The standard supports an additional field
    at the beginning, which is usually left empty in the case of SMTP.
    The username and password can consist of any Unicode character except the null character.
    All characters have to be encoded with UTF-8.
  • LOGIN (draft-murchison-sasl-login):
    This mechanism is obsolete but since it’s still widely offered,
    I decided to implement it in the above tool as well.
    Instead of sending the username and the password together,
    the server prompts for them separately once the client has initiated the authentication with AUTH LOGIN.
    The LOGIN mechanism has the same security properties as the PLAIN mechanism,
    it just requires more round trips and prevents pipelining because it’s interactive.
  • CRAM-MD5 (RFC 2195
    and draft-ietf-sasl-crammd5):
    As far as I can tell,
    this mechanism is not widely used by mail servers but still widely supported by mail clients.
    I cover this mechanism in more detail in a separate box.
    The summary is that the client puts the password and a challenge from the server
    through a one-way function
    and sends the output of this function to the server instead of the password.
    This was useful against passive attackers before the widespread deployment of TLS.
  • SCRAM (RFC 5802):
    SCRAM is not much more complicated than CRAM-MD5
    but has much better properties.
    Unfortunately, it’s not widely used so I didn’t bother to implement it in the above tool.
    In my opinion, all weaker password-based authentication mechanisms
    should be replaced with SCRAM or another, similarly secure mechanism.
    Therefore, it also deserves its own box.
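
If you prefer Python over the command line, the client’s response for the PLAIN and CRAM-MD5 mechanisms can be computed as follows. This is a minimal sketch with function names of my own choosing. It follows RFC 4616 and RFC 2195 as I understand them and performs no preparation of the username and password beyond UTF-8 encoding.

```python
import base64
import hashlib
import hmac


def plain_response(username: str, password: str, authorization_identity: str = "") -> str:
    """Argument to AUTH PLAIN: three fields separated by null characters."""
    raw = "\0".join([authorization_identity, username, password]).encode("utf-8")
    return base64.b64encode(raw).decode("ascii")


def cram_md5_response(username: str, password: str, challenge_base64: str) -> str:
    """Response to the Base64-encoded challenge of the server in CRAM-MD5."""
    challenge = base64.b64decode(challenge_base64)
    # The password is the key of an HMAC over the challenge and is never sent itself.
    digest = hmac.new(password.encode("utf-8"), challenge, hashlib.md5).hexdigest()
    return base64.b64encode(f"{username} {digest}".encode("utf-8")).decode("ascii")
```

For the username test and the password test1234, plain_response returns AHRlc3QAdGVzdDEyMzQ=, which anyone can decode back to the password. The output of cram_md5_response, on the other hand, changes with every challenge.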

Please note that the tool hides the password in the input field, but unless you use CRAM-MD5,
the generated commands contain the password in its Base64 encoding,
which anyone who can take a picture of your screen can easily decode.
When authenticating to an SMTP server, the server responds with either
235 2.7.0 Authentication successful or 535 5.7.8 Error: authentication failed.
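
Such a response consists of the status code, an optional enhanced status code, and a text for humans. The following Python sketch splits a single-line response into these parts and classifies the outcome based on the first digit of the status code. The function name and the labels are my own, and the pattern is a simplification of the syntax in RFC 2034.

```python
import re

# Status code, optional enhanced status code (class.subject.detail), optional text.
REPLY = re.compile(r"^(\d{3})(?: ([245]\.\d{1,3}\.\d{1,3}))?(?: (.*))?$")

OUTCOMES = {
    "2": "success",
    "3": "intermediate",       # the server waits for more input
    "4": "transient failure",  # trying again later might work
    "5": "permanent failure",
}


def parse_reply(line: str):
    """Split a single-line response into code, enhanced code, text, and outcome."""
    match = REPLY.match(line.rstrip("\r\n"))
    if match is None:
        raise ValueError(f"Not an SMTP response: {line!r}")
    code, enhanced, text = match.groups()
    return int(code), enhanced, text or "", OUTCOMES.get(code[0], "unknown")
```

Responses without an enhanced status code, such as 354 Go ahead, are returned with None in the second position.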

Gmail authentication failure

If you want to submit an email to Gmail with the instructions generated by the above tool,
you have to allow access from less secure apps
in your account settings.
If the authentication still fails,
you might have to complete this page
according to these instructions.
Please note that Google disables access from less secure apps automatically if it’s not being used for some time.

Reverse DNS entry

I didn’t go into much detail about reverse DNS lookups
in my previous article about the Internet.
Simply put, IP address ranges are allocated
to Regional Internet Registries (RIR),
which allocate subranges to regional
Internet service providers (ISP).
The DNS zones under the special in-addr.arpa domain are delegated along the same hierarchy.
For example, when the Internet Assigned Numbers Authority (IANA)
allocated the IP address block 123.xxx.xxx.xxx to
the Asia-Pacific Network Information Centre (APNIC),
it also delegated the DNS zone 123.in-addr.arpa to APNIC.
You can check this
with the DNS tool below:

When APNIC allocates the IP block 123.234.xxx.xxx to an ISP,
it also delegates the DNS zone 234.123.in-addr.arpa to this ISP.
The ISP can then create so-called pointer records (PTR) to map IP addresses to domain names.
While DNS is normally used to resolve domain names to IP addresses,
pointer records under in-addr.arpa are used to do the reverse.
The reason why you have to reverse an IP address when doing a reverse DNS lookup
is that in IP addresses the root of the allocation hierarchy is on the left,
whereas in domain names the root of the delegation hierarchy is on the right.
In reality, the situation is a bit more complicated because the 32-bit IPv4 address ranges
are no longer just allocated along the byte boundaries but also split at arbitrary positions.
This is known as classless inter-domain routing (CIDR)
and solved by classless in-addr.arpa delegation.
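
The reversal of the IP address can be sketched in a few lines of Python. This is only an illustration for IPv4 addresses; the function name is made up, and the sketch merely constructs the domain name under in-addr.arpa without querying DNS.

```python
import ipaddress

def reverse_lookup_name(ipv4_address: str) -> str:
    """Return the domain name under in-addr.arpa for the given IPv4 address."""
    # Validate the input first; malformed addresses raise a ValueError.
    address = ipaddress.IPv4Address(ipv4_address)
    # The allocation hierarchy starts on the left in IP addresses,
    # the delegation hierarchy starts on the right in domain names.
    labels = str(address).split(".")
    return ".".join(reversed(labels)) + ".in-addr.arpa"

print(reverse_lookup_name("123.234.56.78"))
```

Python's standard library offers the same result as the `reverse_pointer` attribute of an address object. Resolving the pointer record itself requires a DNS query, for example with `socket.gethostbyaddr`.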

Since Internet service providers usually don’t configure reverse mappings
for the IP addresses of their residential customers, incoming mail servers use this
as a heuristic to fight spam.
If you use the above tool to relay a message directly to an incoming mail server,
your chances of having the message delivered are much higher
if your public IP address has a reverse DNS entry.
Somewhat ironically, this means that spoofing emails often works better
when you use the Wi-Fi of a hotel or a restaurant instead of your own.
If your public IP address has a reverse DNS entry,
the tool determines it when you click on “Determine”:

Newline characters

In teleprinters
(printers that operated like typewriters),
moving the carriage, which outputs the characters onto paper, back to the start of the same line
and moving the page to the next line were two separate instructions.
The former is known as carriage return (CR),
the latter as line feed (LF).
Both CR and LF were included as control characters
in the American Standard Code for Information Interchange (ASCII).
While some operating systems, such as Windows,
opted to encode a newline as a sequence of both CR and LF,
other operating systems, such as Linux
and macOS, use only LF to encode a newline.
As you can imagine, this causes a lot of
interoperability issues.
Both SMTP
and the message format require that lines end with both CR and LF.
By using the -crlf option,
openssl makes sure that this is the case.
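
If you generate the text to send with your own program rather than with openssl, you have to normalize the line endings yourself. The following Python sketch shows one way to do this; the function name is an assumption for this example.

```python
import re

def to_crlf(text: str) -> str:
    """Convert every newline variant (CRLF, CR, or LF) to CRLF."""
    # CRLF is matched first so that an existing CRLF isn't converted twice.
    return re.sub(r"\r\n|\r|\n", "\r\n", text)

print(repr(to_crlf("Subject: Hello\n\nFirst line\r\nSecond line\n")))
```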

Message termination

When using the DATA command,
the transmission of the message is terminated by a period on a line of its own.
So what happens if you include a line with a single period in an email?
SMTP specifies that
the sender has to insert an additional period at the beginning of every line
which starts with a period before transmitting the message.
The recipient then removes the leading period from every line
which has additional characters in order to restore the original message.
Periods at the beginning of lines in a message are escaped
like this only for transmitting the message with the DATA command but not when storing the message
(or when using the BDAT command).
You don’t have to worry about this;
the tool above does the escaping for you.
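
The escaping described above is simple enough to sketch in Python. The function names are made up for this example, and the sketch assumes that the message has already been split into lines without their line endings.

```python
from typing import List

def dot_stuff(lines: List[str]) -> List[str]:
    """Sender: insert an additional period before every leading period."""
    return ["." + line if line.startswith(".") else line for line in lines]

def dot_unstuff(lines: List[str]) -> List[str]:
    """Recipient: remove the leading period which was added by the sender."""
    return [line[1:] if line.startswith(".") else line for line in lines]

def to_data_payload(lines: List[str]) -> str:
    """Escape the lines and terminate them with a period on a line of its own."""
    return "".join(line + "\r\n" for line in dot_stuff(lines)) + ".\r\n"

message = ["Hello", ".", "..hidden", "Bye"]
print(repr(to_data_payload(message)))
```

The recipient applies `dot_unstuff` only to the lines before the terminating period, which is why a lone period in the original message survives the transmission.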

Origination date

The Date field
indicates the date and time at which the author of the message pushed the “Send” button.
It’s not supposed to reflect when the message is actually sent, though:
If the device is offline when the user clicks on “Send”,
the message is queued locally and the Date field isn’t updated when the message is submitted.
Messages must have a single Date field and
outgoing mail servers may add one if it’s missing.
The outgoing mail servers that I’ve checked don’t enforce any rules on the Date.
I’ve successfully submitted messages whose Date was one year in the past or in the future.
Do mail clients display messages with the date that was chosen by the sender?
Most don’t, some do.
Apple Mail and the webmail interfaces of Gmail, Yahoo, and Outlook
display the date when the message was received,
completely ignoring the Date specified by the sender.
I assume that they determine the received date based on the uppermost Received header field.
Thunderbird and Postbox, on the other hand,
display messages with the sender-chosen Date by default.
Since sender-chosen means attacker-chosen, I think this behavior is problematic,
especially since these mail clients tell all recipients that you’re using them.
For example, a scammer can backdate financial predictions and reference such messages in a current email.
Alternatively, you can backdate an email so that it appears to have met a deadline which has already passed.
Or if you want to make sure that your message lands at the top of the recipient’s inbox,
you can choose a date in the future.
Such tampering is easy to detect if you know how to inspect the raw message.
For ordinary users, however, a warning should be shown, in my opinion.
I reported this issue
to the Thunderbird team, but they were not interested in addressing it.
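
As a rough illustration of how such a warning could be triggered, the following Python sketch compares the Date field with the timestamp of the uppermost Received header field. The threshold of one day and the function names are arbitrary assumptions for this example, not the behavior of any mail client.

```python
from datetime import datetime, timedelta, timezone
from email import message_from_string
from email.message import Message
from email.utils import parsedate_to_datetime

MAX_DEVIATION = timedelta(days=1)  # Assumed threshold, chosen arbitrarily.

def parse_timestamp(value: str) -> datetime:
    timestamp = parsedate_to_datetime(value.strip())
    # Treat timestamps without a usable time zone as UTC to make them comparable.
    return timestamp if timestamp.tzinfo else timestamp.replace(tzinfo=timezone.utc)

def date_looks_suspicious(message: Message) -> bool:
    """Flag messages whose Date deviates from when they were received."""
    received_fields = message.get_all("Received")
    if not received_fields or message["Date"] is None:
        return True  # Nothing to compare, which is suspicious in itself.
    try:
        # The timestamp follows the last semicolon of a Received header field.
        received_at = parse_timestamp(received_fields[0].rpartition(";")[2])
        claimed = parse_timestamp(message["Date"])
    except (TypeError, ValueError):
        return True  # Unparsable timestamps are treated as suspicious.
    return abs(received_at - claimed) > MAX_DEVIATION

raw = (
    "Received: from a.example by b.example; Fri, 7 May 2021 12:00:00 +0000\n"
    "Date: Thu, 7 May 2020 12:00:00 +0000\n"
    "\n"
    "Backdated by one year.\n"
)
print(date_looks_suspicious(message_from_string(raw)))
```

Messages that were queued offline for more than the threshold would also be flagged, so a real implementation would have to choose the threshold with care.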

Spoofed sender during submission

Can you spoof the sender address
not only during relay but also during submission?
Or more precisely: Can you authenticate to an outgoing mail server as one user
but then use the address of a different user in the MAIL FROM and From fields?
According to RFC 6409,
outgoing mail servers may enforce submission rights, but they don’t have to.
If you want to know how your mailbox provider handles such submissions, you have to try it.
Some mail clients, such as Thunderbird
and Roundcube,
support so-called alternative sender identities to spoof the sender address for you.
If you want to do this manually in order to see the response from the server,
you can change the From address in the above tool
after you have already copied the AUTH command to your command-line interface.
Gmail, for example, accepts submissions with the address of a different user in the MAIL FROM and From fields
but then replaces both with the address of the authenticated user
and adds the spoofed sender address in an X-Google-Original-From header field.
Mail server software, such as Postfix,
needs to be configured
to reject submissions where the sender address doesn’t match the authentication address.
Postfix also has an option
to add the authenticated sender to the Received header field.
I think outgoing mail servers should reject spoofed sender addresses even if there is legitimate use.

I reported to Gandi.net on 27 October 2020
that their outgoing mail server accepts submissions with spoofed MAIL FROM and From addresses.
On the one hand, they told me that some of their customers use alternative sender identities
and that they won’t enforce any rules for them.
On the other hand, they let me know that they would address the issue
before my 90-day disclosure deadline.
When I tested this again before publishing this article,
I got the impression that more of my test messages were rejected
by their spam filter,
but I could still authenticate myself as a Gandi user
and then use my Gmail address in the MAIL FROM and From fields.
This allows an attacker to abuse the reputation of Gandi’s mail server at least for targeted attacks.

Limitations of the above tool
Other SMTP commands

Besides EHLO,
MAIL,
RCPT,
DATA,
and QUIT,
there are some other SMTP commands, which are rarely used in practice:

Command  Argument      Description
RSET     (none)        Reset already transmitted sender, recipient, and mail data.
VRFY     Mailbox       Verify whether the given mailbox exists on the server.
EXPN     Mailing list  Expand the given mailing list (i.e. return the members).
HELP     [Command]     Ask for helpful information about the optional command.
NOOP     (none)        Do nothing besides keeping the connection alive.

These commands can be used at any time during a session.
VRFY and EXPN are usually disabled for security reasons.

Let’s look at two examples:

What a response to the VRFY command usually looks like.
(The reply code of a disabled VRFY command should be 252, though.)

How Gmail responds to the HELP command. 😄

Automatic responses

In certain configurations, mail servers send a message in response to an incoming message,
which leads to the following problems.

Mail loops

If a received message causes a mail server to send one or several messages
which in turn trigger further messages,
we end up with a chain reaction.
In the case of email, chain reactions get out of control
if messages are sent in a loop
or if the forwarding rules result in a combinatorial explosion.
Both of them can happen by accident
or as a denial-of-service attack by an attacker.
Depending on the circumstances under which they happen,
chain reactions are prevented with the following measures:

  • Automatic responses are often sent to inform the sender
    that a message could not be delivered
    or that the recipient won’t read the message anytime soon.
    Sometimes, automatic responses are used to pose a challenge to the sender
    which needs to be completed in order for the message to be delivered.
    The incoming mail server of the sender should not respond to such responses automatically
    as this could result in messages being sent back and forth indefinitely between the two systems.
    Automatic responses should always be sent to the MAIL FROM address,
    which was specified in the envelope of the message.
    By using an empty MAIL FROM address
    (MAIL FROM:<>), a sender can indicate that no automatic response shall be sent back.
    Additionally, automatically submitted messages should be marked with the Auto-Submitted header field,
    which is specified in RFC 3834.
    If an automatic process sends a message in response to another message,
    the value of this header field should be set to auto-replied.
    If the message is triggered by another event,
    the value should be set to auto-generated.
    If a message contains an Auto-Submitted header field with
    a value other than no,
    no automatic responses should be sent.
    Furthermore, Microsoft Exchange Server
    introduced the custom header field X-Auto-Response-Suppress, allowing the sender to control
    which types of automatic responses shall be suppressed.
  • Email forwarding can cause loops as well.
    If alice@example.org was an alias for alice@example.com and vice versa,
    emails would be forwarded in an infinite loop.
    Since only loops should be prevented but neither message forwarding nor automatic responses,
    none of the previous techniques can be used.
    Instead, incoming mail servers add a non-standardized header field, such as Delivered-To or X-Loop,
    with the recipient’s address to messages before forwarding them.
    When incoming mail servers receive a message,
    they can simply go through its header fields to determine
    whether the message has already been delivered to the specified mailbox.
    If the message has already been delivered, they respond with a delivery failure.
    Another way to detect loops is to count
    the Received header fields in a message.
    If they exceed a certain threshold, the message is bounced.
    Both techniques require that mail servers only add additional header fields without removing existing ones.
  • Mailing lists forward incoming messages to all subscribers of the list.
    If two mailing lists are subscribed to one another,
    if automatic responses are sent to the mailing list’s address,
    or if a subscriber automatically forwards messages back to the mailing list,
    a mailing list is involved in a mail loop.
    Such a loop can be busted with the same techniques as before:
    Mailing lists shouldn’t forward messages with a header field of Auto-Submitted: auto-replied
    or Delivered-To followed by the address of the mailing list.
    However, mailing lists pose an additional problem:
    If mailing lists are subscribed to one another,
    the number of combinations before a loop occurs
    explodes with the number of involved mailing lists.
    For example, if you’re subscribed to ten mailing lists which are all subscribed to one another,
    a single message to one of them results in almost one million messages delivered to your inbox.
    To prevent this, mailing lists shouldn’t forward messages from other mailing lists,
    which can be detected with List-* header fields, such as List-Id or List-Unsubscribe.
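
The number in the mailing list example can be verified with a short calculation. The Python sketch below assumes that every list forwards each message to all other lists which don't yet appear in its Delivered-To header fields and that every list delivers one copy to you. Both function names are made up for this example.

```python
from math import perm

def copies_received(number_of_lists: int) -> int:
    """Copies of a single message which reach a subscriber of all lists."""
    others = number_of_lists - 1
    # One copy arrives for every forwarding path which starts at the first list
    # and visits k of the other lists in some order without repetition.
    return sum(perm(others, k) for k in range(others + 1))

def simulate(number_of_lists: int) -> int:
    """Count the same copies by following every forwarding path explicitly."""
    def forward(visited: frozenset) -> int:
        copies = 1  # The current list delivers one copy to you.
        for next_list in range(number_of_lists):
            if next_list not in visited:  # The Delivered-To check breaks loops.
                copies += forward(visited | {next_list})
        return copies
    return forward(frozenset({0}))

print(copies_received(10))
```

For ten lists, the sum evaluates to 986,410 messages, which matches the "almost one million" from above.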
Bounce messages

When a mail server fails to deliver a message,
it should send a so-called bounce message
to the sender to notify them about the failed delivery.
Since bounce messages are automatic responses,
they must be sent to the MAIL FROM address of the envelope.
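
Before sending a bounce message or any other automatic response, a mail server can apply the rules from the previous box. The following Python sketch is a simplified illustration of these rules: the function name is hypothetical, and vendor-specific header fields such as X-Auto-Response-Suppress are ignored.

```python
from email import message_from_string
from email.message import Message
from typing import Optional

def auto_response_recipient(mail_from: str, message: Message) -> Optional[str]:
    """Return the address to respond to or None if no response shall be sent."""
    # An empty MAIL FROM address indicates that no response is wanted.
    if mail_from == "":
        return None
    # The header field may carry parameters after a semicolon.
    value = message.get("Auto-Submitted", "no").split(";")[0].strip().lower()
    if value != "no":
        return None  # Never respond automatically to automatic messages.
    # Respond to the envelope address, not to the From header field.
    return mail_from

vacation_reply = message_from_string("Auto-Submitted: auto-replied\n\nI'm away.\n")
print(auto_response_recipient("alice@example.org", vacation_reply))
```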

[Figure: Mail client, Outgoing mail server, Incoming mail server; steps: 1. Submit, 2. Report, 3. Retrieve]

How the user learns about a delivery failure.
If the delivery of a message fails on the recipient’s side,
the bounce message (in red) is generated by a different system.

Historically, bounce messages were in a format that could be interpreted only by a human sender.
However, many messages are sent by automated systems,
which should also be able to detect when a message couldn’t be delivered.
For example, mailing list software
should be able to remove no longer valid addresses from the list automatically.
Two techniques address this issue:

  • Machine-processable non-delivery reports (NDR):
    RFC 3464 specifies how multipart messages
    can be used to send so-called delivery status notifications (DSN) to the sender in a standardized way.
    In short, the bounce message is marked with
    Content-Type: multipart/report; report-type=delivery-status; boundary="…"
    and the machine-processable part is labeled with Content-Type: message/delivery-status.
    The report contains message-specific
    and recipient-specific fields,
    which are separated by a blank line.
    The RFC includes some examples.
    The advantage of this approach is that even mail clients can make use of the report.
    Its disadvantage is that not everyone supports this format and even if everyone did,
    the sender doesn’t learn for which recipient the message couldn’t be delivered
    if it was forwarded by an alias address.
    Since non-delivery reports include the header fields of the original message,
    this could be recovered from the trace information.
  • Variable envelope return path (VERP):
    Since the MAIL FROM address of the envelope can be different
    from the From address of the message, it can encode the recipient’s address.
    For example, when the mailing list server at list@example.com sends a message to alice@example.org,
    it can choose the MAIL FROM address as list-owner+alice=example.org@example.com.
    Since it has to be a valid address,
    the @ of the recipient’s address has to be replaced with something else, such as =.
    As long as the mailing list software can access the automatic responses that were delivered to such addresses,
    it can easily associate a response with the recipient who sent it.
    The software needs to guess only whether the response denotes a failed delivery
    or an out-of-office reply.
    It should remove addresses from the mailing list
    only if messages to a particular recipient cannot be delivered over a period of several weeks.
    If you look at the Return-Path header field of messages
    sent by mailing list providers, such as Mailchimp,
    you see an address which identifies you.
    The good thing about VERP is that it works very reliably.
    On the downside, mail clients can make use of this technique only with subaddressing.
    Since the syntax for this is specific to each mailbox provider if subaddressing is supported at all,
    I’m not aware of any mail clients which use VERP to enhance the user experience of delivery failures.
    Moreover, VERP requires that the message is transmitted separately for each recipient.
    While a single message can be delivered to several recipients by using several RCPT TO commands,
    the MAIL FROM command can be used only once for each message.
    Finally, the delivery of messages can be delayed due to graylisting
    if the MAIL FROM address includes a value which is unique to each message.
    Unique MAIL FROM addresses allow the mailing list software to identify
    which particular message could not be delivered to a particular recipient.
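
The VERP encoding from the example above can be sketched in Python as follows. The function names are made up, and the format with + and = is just the one used in the example; mailing list software is free to choose a different one.

```python
def verp_encode(list_local_part: str, list_domain: str, recipient: str) -> str:
    """Encode the recipient's address in the MAIL FROM address."""
    local, _, domain = recipient.rpartition("@")
    return f"{list_local_part}+{local}={domain}@{list_domain}"

def verp_decode(return_path: str) -> str:
    """Recover the recipient's address from a bounce address."""
    local_part = return_path.rpartition("@")[0]
    # Everything after the first plus sign is the encoded recipient.
    encoded = local_part.partition("+")[2]
    # Domain names cannot contain an equals sign, so the last one is the delimiter.
    local, _, domain = encoded.rpartition("=")
    return f"{local}@{domain}"

bounce_address = verp_encode("list-owner", "example.com", "alice@example.org")
print(bounce_address)
print(verp_decode(bounce_address))
```

The sketch assumes that the local part of the list address itself contains no plus sign.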
Backscatter

Given how easy it is to spoof the sender address,
it’s sometimes better not to send a bounce message.
Otherwise, the owner of the forged address might receive a large number of unsolicited bounce messages.
Such collateral spam is called backscatter.
In order to distinguish legitimate bounce messages from misdirected ones,
the outgoing mail server can authenticate the bounce address by extending it
with a hash-based message authentication code (HMAC).
This allows the incoming mail server to reject bounce messages
which are addressed to a non-authenticated address.
The best-known proposal for how to do this is called
bounce address tag validation (BATV).
It is specified in this draft.
In order to prevent the authenticated bounce address from being abused,
the HMAC is calculated
over the original MAIL FROM address and a timestamp
of when the authenticated address expires.
The timestamp and some part of the HMAC are prepended to the original MAIL FROM address
to form the authenticated bounce address.
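
The idea of authenticating the bounce address can be sketched in Python as follows. The prvs prefix is borrowed from the draft, but the exact tag layout, the use of SHA-256, the tag length, and the function names are assumptions for illustration and not the format specified there.

```python
import hashlib
import hmac
from typing import Optional

TAG_LENGTH = 12  # Hex characters of the HMAC to keep (an arbitrary choice).

def sign_bounce_address(address: str, key: bytes, expiry_day: int) -> str:
    """Prepend the expiry day and a truncated HMAC to the MAIL FROM address."""
    mac = hmac.new(key, f"{expiry_day}:{address}".encode(), hashlib.sha256)
    return f"prvs={expiry_day}-{mac.hexdigest()[:TAG_LENGTH]}={address}"

def verify_bounce_address(signed: str, key: bytes, today: int) -> Optional[str]:
    """Return the original address if the tag is valid and hasn't expired."""
    parts = signed.split("=", 2)
    if len(parts) != 3 or parts[0] != "prvs":
        return None
    expiry_text = parts[1].partition("-")[0]
    if not expiry_text.isdecimal() or today > int(expiry_text):
        return None
    # Recompute the authenticated address and compare it in constant time.
    expected = sign_bounce_address(parts[2], key, int(expiry_text))
    if hmac.compare_digest(signed.encode(), expected.encode()):
        return parts[2]
    return None

key = b"secret key of the outgoing mail server"
signed = sign_bounce_address("alice@example.com", key, expiry_day=19000)
print(signed)
print(verify_bounce_address(signed, key, today=18995))
```

The days are counted from an arbitrary epoch in this sketch. An incoming mail server which shares the key can reject bounce messages for which the verification returns None.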

Password-based authentication mechanisms

The following boxes focus on password-based authentication mechanisms,
which allow users to authenticate themselves to servers with only their username and password.
Due to the nature of the topic, some of the later information boxes are fairly advanced.
If you’re not interested in cryptography,
you may want to skip them.

Dangerous reliance on TLS

As we’ve seen above,
your password is usually sent to the outgoing mail server every time you submit a message.
Many people think that this is no problem because the password is transmitted over a secure channel.
The goal of this box is to convince you that this attitude is naive and dangerous.
In the remaining boxes of this subsection, I will explain how we could do much better.

One of the most important principles in information security
is defense in depth:
Critical systems should have several layers of protections
so that when one layer fails, another can stop the threat.
In risk analysis, this is sometimes referred to as the
Swiss cheese model.
There are three different ways in which
Transport Layer Security (TLS)
can fail to protect sensitive information:

  • Proxy server: TLS connections are often terminated
    at a so-called proxy server,
    which acts as an intermediary between the client and the actual server.
    While such proxies are operated by the same company as the actual server,
    the communication between the client and the server is no longer protected
    between the proxy and the actual server in the company’s private network.
    Running a proxy which appears to the client as the actual server
    is useful for load balancing
    and for accelerating the cryptographic operations with
    special hardware.
    On the downside, an employee or an attacker who compromised the company’s network
    potentially has access to the transmitted information, which is no longer protected by TLS.

[Figure: Client, Proxy, Server]
While the communication between the client and the proxy is protected (indicated by the blue lines),
the communication between the proxy and the server is exposed in the company’s private network.
  • Wrong server: The user’s mail client might be misconfigured
    to connect to a server controlled by the attacker.
    Instead of communicating with the mailbox provider,
    the client communicates with the attacker.
    In order to avoid raising any suspicion,
    the attacker may want to connect to the legitimate server themselves
    and relay all messages in both directions.
    Both the client and the legitimate server have the impression
    that they communicate with the other party over a secure channel
    when in fact the attacker can read and modify the exchanged messages at will.
    But why would the mail client be misconfigured?
    On the one hand, the user might fall for a
    social engineering attack.
    While phishing is much easier when it comes to websites
    because the user just has to click on a malicious link,
    it’s also possible in the case of mail clients:
    The user just has to follow malicious instructions.
    On the other hand, the mail client might be attacked during autoconfiguration.
    Possible attack vectors are spoofed DNS entries if the client doesn’t require DNSSEC,
    or a compromised configuration database.
    Even if the client checks the domain name with some heuristic,
    the heuristic might be vulnerable to similar attacks,
    especially in the case of custom domains.

[Figure: Client, Attacker, Server]
The client is misconfigured and connects directly to the attacker,
who forwards all communication without raising any suspicion.
  • Compromised certification authority or compromised server key:
    While the public-key infrastructure
    worked as intended in the previous example
    (the malicious server had a legitimate certificate for its domain name),
    the current infrastructure is far from perfect.
    Its biggest design flaw is that any vendor-approved
    certification authority (CA)
    can issue certificates for any domain by default.
    There have been several attacks
    on the integrity of TLS, ranging from compromised and
    misbehaving certification
    authorities to surveillance programs
    and search warrants for private keys.
    In other words, TLS isn’t guaranteed to be secure,
    but it’s still much better than having no protection at all, of course.
    An illegitimately issued certificate allows the attacker
    to intercept the communication between the client and the server.
    As long as the client considers the certificate to be valid,
    it will accept the certificate and start communicating with what it believes to be the intended server.
    There have been several efforts to prevent certification authorities
    from issuing certificates without the consent of the domain owner,
    such as HTTP Public-Key Pinning (HPKP)
    and Certificate Transparency (CT).
    We will discuss another approach in the last chapter of this article.
    It’s important that these efforts don’t remain limited to the web
    but that mail servers begin to require more secure certificates as well.
    As I will explain to you in the following boxes,
    we don’t even need better certificates to protect the communication between mail clients and mail servers;
    simply using better authentication mechanisms would be enough.
    And unlike the efforts at the security layer,
    which prevent only attacks with maliciously issued certificates,
    better authentication mechanisms also prevent attacks with compromised server keys,
    where an attacker has gained access to the server’s private key.
    When I say that an organization or key is compromised,
    I just mean that its behavior or use is different from what it should be according to standards and agreements.
    Whether the integrity was lost due to an attack or due to deliberate actions by the owner doesn’t matter.
    For a security analysis, it’s also irrelevant whether the perpetrator acts in good faith or in bad faith.
    This article is not about the ethics of information security and
    government backdoors.

[Figure: Client, Server, Server]
The malicious server (in red) has the same name as the legitimate server (in green).
An attacker can impersonate the server by getting a certificate issued for their public key
or by gaining access to the private key of the server and using the original server certificate.

In these scenarios, the attacker is a so-called
man-in-the-middle (MITM)
in the conversation between the client and the server.

Cryptographic hash functions

Before we can discuss better authentication mechanisms,
we have to cover cryptographic hash functions first.
A cryptographic hash function is an algorithm,
which maps inputs of arbitrary size to outputs of fixed size deterministically and irreversibly:

[Figure: Input of any size, Output of fixed size; directions labeled Efficient and Infeasible]

Cryptographic hash functions are efficient to compute and infeasible to invert.
For the same hash function, the same input always maps to the same output.
The output is also called image
and the corresponding input its preimage.

The output of a hash function is called the hash of the input.
As a verb, hashing refers to applying the hash function to an input.
More formally, cryptographic hash functions have to fulfill the
following properties.
(A function which maps arbitrary inputs to fixed-sized outputs without fulfilling these properties
is just a hash function.
In this article, I always mean the former, though.)

  • Preimage resistance (also known as one-way function):
    It’s infeasible to find an input which hashes to a given output.

[Figure: Find input, Given output]
In these graphics, the given values are displayed in blue and the values to find in green.
  • Second-preimage resistance:
    It’s infeasible to find a different input which hashes to the same output as a given input.

[Figure: Given input 1, Find input 2, Same output]
Knowing one input may not be useful to find another input which hashes to the same output.
  • Collision resistance:
    It’s infeasible to find two different inputs which hash to the same output,
    resulting in a collision.

[Figure: Find input 1, Find input 2, Same output]
Due to the birthday paradox
and the birthday attack based on it,
this is a stronger requirement than the previous one.

Since hash functions map an infinite number of inputs to a finite number of outputs,
they have to produce an infinite number of collisions.
The point of cryptographic hash functions is not that they don’t have any collisions;
it just has to be infeasible to find them.
In practice, cryptographic hash functions should also satisfy the
avalanche criterion:
A small change in the input changes the output completely and unpredictably.
A cryptographic hash function is said to be broken
if a preimage or a collision can be found more efficiently
than with a brute-force search
or if the size of the output is so small that
a brute-force search becomes feasible with modern computers.
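
To get a feeling for why the output size matters, the following Python sketch searches for a collision in a deliberately weakened hash function, which keeps only the first bytes of a SHA-256 hash. The truncation and the function names are assumptions for this demonstration; the sketch says nothing about the security of the full SHA-256.

```python
import hashlib

def truncated_hash(data: bytes, size: int) -> bytes:
    """Keep only the first `size` bytes of the SHA-256 hash."""
    return hashlib.sha256(data).digest()[:size]

def find_collision(size: int) -> tuple:
    """Hash the numbers 0, 1, 2, … until two of them share an output."""
    seen = {}  # Maps each output to the input which produced it.
    counter = 0
    while True:
        candidate = str(counter).encode()
        output = truncated_hash(candidate, size)
        if output in seen:
            return seen[output], candidate
        seen[output] = candidate
        counter += 1

first, second = find_collision(2)
print(first, second, truncated_hash(first, 2).hex())
```

With an output of 16 bits, there are only 65,536 possible outputs, so a collision is guaranteed after at most 65,537 inputs and, due to the birthday paradox, typically shows up after a few hundred. Each additional byte of output makes the search considerably more expensive.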

Regarding notation,
I will use : to assign
a value on the right to a variable on the left in the following boxes.
For example, a hash function is then written as Output: hash(Input).
Sometimes, several values need to be combined into one before hashing.
I’ll use + to concatenate
several values in a secure way: Output: hash(Input1 + Input2).
When implementing this, you can use a special character
which cannot occur in any of the values as a delimiter
so that hash("a" + "bc") ≠ hash("ab" + "c").
The null character
is used for this purpose in the case of PLAIN authentication.
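
The difference between naive and secure concatenation can be demonstrated with a few lines of Python. This is a minimal sketch which uses SHA-256 and the null character as the delimiter; the function name is made up for this example.

```python
import hashlib

def hash_values(*values: str) -> str:
    """Hash several values after joining them with a null character."""
    # The delimiter must not occur in any of the values.
    if any("\0" in value for value in values):
        raise ValueError("Values must not contain the null character.")
    return hashlib.sha256("\0".join(values).encode("utf-8")).hexdigest()

# Without a delimiter, different value pairs result in the same hash input:
naive_1 = hashlib.sha256(("a" + "bc").encode()).hexdigest()
naive_2 = hashlib.sha256(("ab" + "c").encode()).hexdigest()
print(naive_1 == naive_2)

# With a delimiter, the two pairs are hashed to different outputs:
print(hash_values("a", "bc") == hash_values("ab", "c"))
```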

The term “hash” is most likely
borrowed from the kitchen, where it means to chop and mix ingredients when preparing food.

Secure Hash Algorithms (SHA)

The National Institute of Standards and Technology (NIST)
standardizes cryptographic hash functions under the name
Secure Hash Algorithms (SHA).
NIST has published several Secure Hash Algorithms so far:
SHA-1 in 1995,
SHA-2 in 2001,
and SHA-3 in 2015.
The most commonly used cryptographic hash function is SHA-256.
You can try it and other hash functions with this tool:

The three digits at the end of some algorithm names indicate the size of the output in bits.
SHA-256, for example, hashes inputs of arbitrary size to 256 bits.
SHA-224, SHA-256, SHA-384, and SHA-512 belong to the SHA-2 family of hash functions.
Collisions have been found for MD5 and SHA-1,
which hash to 128 and 160 bits respectively.
These algorithms should no longer be used.
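
If you want to check the output sizes yourself, Python's hashlib module exposes them directly. The helper function is made up for this example.

```python
import hashlib

def output_size_in_bits(algorithm: str) -> int:
    """Return the size of the hash produced by the given algorithm."""
    # hashlib reports the size in bytes.
    return hashlib.new(algorithm).digest_size * 8

for name in ("sha1", "sha224", "sha256", "sha384", "sha512"):
    print(name, output_size_in_bits(name))

# The output size doesn't depend on the size of the input:
print(len(hashlib.sha256(b"").digest()), len(hashlib.sha256(b"x" * 10**6).digest()))
```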

Salts against pooled brute-force attacks

The one-way property of cryptographic hash functions doesn’t prevent an attacker
from generating and testing possible inputs.
If the set of possible (or likely) inputs is small enough,
it can be searched exhaustively to find the preimage of a given hash.
If you try to find the preimage of many hashes,
you have to compute the hash of possible inputs only once.
If you want to find further preimages in the future,
you can store the computed input-output pairs in a reverse lookup table.
Instead of trying all possible inputs again,
which can take a long time,
you simply look up the hash to crack in the output column of your precomputed table.
Since computers have limited memory,
this approach works only for a limited number of inputs.
You can enumerate all possible inputs up to a certain length
or choose them from a dictionary.
You can reduce the amount of required memory by accepting longer lookup times.
This is known as a space-time tradeoff
and is achieved with so-called rainbow tables.

Searching for the preimage of many hashes at once can be prevented
by adding a random value to each input and storing this value together with the output.
While an attacker can still generate and test possible inputs,
they have to spend the required effort on each hash separately.
The additional input value is called salt.
In order to have the intended effect, the salt has to be chosen at random for each input
and should be as long as the output size of the used hash function.
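
The effect of a salt on a precomputed table can be sketched in Python as follows. The tiny dictionary, the function names, and the use of plain SHA-256 are assumptions to keep the example short; this is not a recommendation for how to store passwords.

```python
import hashlib
import os
from typing import Optional

def hash_password(password: str, salt: bytes = b"") -> str:
    # The salt has a fixed length and is appended last,
    # which makes the concatenation unambiguous.
    return hashlib.sha256(password.encode("utf-8") + salt).hexdigest()

DICTIONARY = ["123456", "password", "letmein", "qwerty"]

# Computed once and reusable for any number of unsalted hashes.
REVERSE_TABLE = {hash_password(candidate): candidate for candidate in DICTIONARY}

def crack_unsalted(target: str) -> Optional[str]:
    return REVERSE_TABLE.get(target)  # A single lookup per hash.

def crack_salted(target: str, salt: bytes) -> Optional[str]:
    # The work has to be repeated for every salted hash separately.
    for candidate in DICTIONARY:
        if hash_password(candidate, salt) == target:
            return candidate
    return None

salt = os.urandom(32)  # As long as the output of SHA-256.
salted_hash = hash_password("letmein", salt)
print(crack_unsalted(hash_password("letmein")))
print(crack_unsalted(salted_hash))
print(crack_salted(salted_hash, salt))
```

The salt doesn't make an individual weak password any harder to guess; it only prevents the attacker from sharing the effort across many hashes.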

Why is the random value called salt?
No one really knows.
When cooking, salt is something you add to your ingredients before mixing them.
With enough salt, you can make food unenjoyable.
Salting the earth
is also a historic practice to make land less hospitable for your enemy.
We also say to take something with a grain of salt.
Whatever the origin may be, the term fits well.

Nonces against replay attacks

When designing a cryptographic protocol,
you not only want to ensure that an attacker cannot produce certain messages,
you also want to ensure that an attacker cannot record such messages
and reuse them at a later point against one of the legitimate parties.
Such replay attacks
are prevented by including a number which may be used only once,
which is abbreviated to nonce.
The replay of messages needs to be prevented both within a session and across sessions.
The former is typically accomplished with a counter:
A message is accepted if its counter is higher than the previous counter.
The latter is usually achieved by using a random number for the duration of a session
so that no information has to be persisted across sessions.
Instead of including a session-specific value in each message,
choosing some of the cryptographic keys randomly for each session has the same effect.
When the uniqueness is incorporated into temporary keys,
we no longer speak of nonces but rather of ephemeral keys.
Unlike salts, which stay the same throughout the lifetime of a hash and need to be stored,
nonces and ephemeral keys can be thrown away after use.
Besides preventing replay attacks,
mixing some uniqueness into every message has the desirable side effect of preventing an attacker
from learning when the same underlying value is sent again by one of the parties.
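
The combination of a random session value and a counter can be sketched in Python as follows. The class and its methods are hypothetical, and the sketch covers only the bookkeeping: in a real protocol, the nonce and the counter have to be authenticated together with the message, for example with a message authentication code.

```python
import secrets

class ReplayGuard:
    """Accept each message at most once (a simplified sketch)."""

    def __init__(self) -> None:
        # Random value for this session, nothing is persisted across sessions.
        self.session_nonce = secrets.token_hex(16)
        self.last_counter = -1

    def accept(self, session_nonce: str, counter: int) -> bool:
        # Messages recorded in another session carry a different nonce.
        if session_nonce != self.session_nonce:
            return False
        # Within a session, the counter has to increase with every message.
        if counter <= self.last_counter:
            return False
        self.last_counter = counter
        return True

guard = ReplayGuard()
print(guard.accept(guard.session_nonce, 0))  # First message of the session.
print(guard.accept(guard.session_nonce, 0))  # Replay of the same message.
```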

Applications of cryptographic hash functions

Before moving on to authentication mechanisms,
I first want to mention some applications
of cryptographic hash functions:

  • Data integrity:
    Due to their collision resistance, cryptographic hash functions produce a practically unique
    fingerprint of the data which was fed into them.
    In other words, the hash of a file identifies the file for all practical purposes.
    As long as you get the short hash from a trusted source,
    the large file can be downloaded from an untrusted source
    because you can detect potentially malicious changes to the file
    by computing the hash of the file and comparing it with the trusted hash.
    You can compute the SHA-256 hash of a file with openssl sha256 /path/to/file.
    Eliminating trust in the storage provider is really useful for
    content delivery networks (CDN),
    which you might have encountered as mirror sites
    or as subresource integrity (SRI) on the Web.
    This fingerprint property is also used for digital signatures,
    where you sign the hash of a message rather than the message.

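[Figure: the content producer passes Hash to the consumer and File to the storage provider, which passes File on to the consumer; the consumer checks whether hash(File) = Hash.]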

As long as you get the hash from a trusted source,
the delivery of the file can be outsourced to an untrusted third party.
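
The check performed by the consumer could look as follows in Python. This is a minimal sketch which assumes SHA-256 and a hexadecimal encoding of the trusted hash; the function names are made up for this example.

```python
import hashlib


def sha256_of_file(path: str) -> str:
    """Returns the SHA-256 hash of the file as a hexadecimal string."""
    digest = hashlib.sha256()
    with open(path, "rb") as file:
        # Read the file in chunks so that large files don't have to fit into memory.
        for chunk in iter(lambda: file.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_trusted_hash(path: str, trusted_hash: str) -> bool:
    """Compares the hash of the downloaded file with the hash from the trusted source."""
    return sha256_of_file(path) == trusted_hash.strip().lower()
```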
  • Password protection:
    In order to verify whether a user provided the correct password,
    a server doesn’t have to store the password of the user.
    The server can simply store a salted hash of the password
    and then check whether the user provided the same password as before by computing and comparing its hash.
    The advantage of this approach is that an attacker who compromised the database
    cannot log in as the user as they don’t know the preimage of the salted hash.

[Figure: the client of the user sends Password to the server of the provider, which reads Salt and Hash from the database of the provider and checks whether hash(Password + Salt) = Hash.]

A server can reduce the damage of a leaked database by storing individually salted hashes instead of passwords.
  • Key derivation:
    Cryptographic hash functions are designed to run as fast as possible.
    While good performance is desirable for many applications,
    it’s not desirable when hashing passwords.
    Even if the hash of a password is salted,
    an attacker can still perform a brute-force attack
    to find an input which hashes to the given output with the given salt.
    In order to make such attacks costlier,
    passwords are often hashed thousands of times instead of just once.
    Repeated hashing means that you take the output of one round as the input to the next round.
    This also makes the computation costlier for the legitimate parties
    but unlike an attacker, they have to compute the derivation only once per session.
    Making a weak key
    more secure against brute-force attacks by increasing the cost
    is called key stretching.
    One algorithm for doing so is the
    Password-Based Key Derivation Function 2 (PBKDF2),
    which is specified in RFC 8018.
    Additionally, cryptographic keys typically have a desired length,
    which is another reason for using a key derivation function (KDF).

[Figure: Password → repeated hashing (PBKDF) → Cryptographic key]

By hashing an input repeatedly, you can turn an efficient hash function into an inefficient one.
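
Salting and stretching can be combined with the PBKDF2 implementation in Python's standard library. The following sketch makes several assumptions which are not taken from any of the protocols discussed here: HMAC-SHA-256 as the underlying function, 100,000 iterations, and hypothetical function names.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # Assumed work factor; higher values make brute-force attacks costlier.


def create_record(password: str) -> tuple:
    """Returns the random salt and the stretched hash to store instead of the password."""
    salt = os.urandom(32)  # As long as the output of SHA-256.
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, key


def verify_password(password: str, salt: bytes, stored_key: bytes) -> bool:
    """Repeats the derivation with the stored salt and compares the results."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(candidate, stored_key)
```

The server stores only the salt and the derived key for each user.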
  • Independent values:
    Another use case of cryptographic hash functions is
    to generate a sequence of unrelatable values from a single source value.
    Such a source value is called a seed
    because a tree of values can grow from it.
    The seed is then hashed with a counter or a timestamp.
    As long as the seed remains secret,
    others cannot compute the next value from the previous one and vice versa.
    Hash functions are used for this purpose in
    contact tracing apps,
    cryptocurrency wallets,
    and one-time passwords (OTP).
    If a hash function fulfills the strict avalanche criterion,
    it can even be used as a pseudo-random number generator (PRNG)
    or as a block cipher for encryption.
    For all these use cases, the seed has to be chosen randomly,
    which means it has to have enough entropy.
    If you don’t like password managers,
    you can use hash functions to generate site-specific passwords as SitePassword: hash(LoginDomain + MasterPassword).
    Unless you know exactly what you’re doing,
    I advise you not to use this technique as there are many pitfalls,
    such as leaving your password in the command history
    or accidentally including newline characters, but it’s certainly a neat idea.
    (The order of LoginDomain and MasterPassword is important as you might be vulnerable to a
    length-extension attack otherwise, see below.)

Seed Value1: hash(1 + Seed) Value2: hash(2 + Seed) ValueX: hash(X + Seed)

Values can easily be derived from a seed (in green)
but they cannot be related to one another (in red).
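
Deriving values from a seed as hash(Counter + Seed) might be implemented like this in Python. SHA-256 and the fixed-width encoding of the counter are assumptions of this sketch, and the function name is hypothetical.

```python
import hashlib


def derive_value(seed: bytes, counter: int) -> bytes:
    """Computes hash(Counter + Seed) with the counter encoded as eight bytes."""
    # A fixed-width counter avoids ambiguity about where the counter ends and the seed begins.
    return hashlib.sha256(counter.to_bytes(8, "big") + seed).digest()
```

Whoever knows the seed can recompute any value, while the values look unrelated to everyone else.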
  • Commitment schemes:
    A commitment scheme allows you to commit yourself to a value
    while keeping the value secret until you reveal it later.
    You can think of it as giving a locked box to a recipient
    while providing the key to open the box only later.
    A commitment scheme has to be both
    binding and hiding:
    The committer must not be able to change the committed value
    and the recipient must not be able to figure out the committed value.
    In order to understand why this is useful,
    let’s look at an example from Wikipedia.
    Suppose Alice and Bob need to resolve a dispute over the Internet.
    If they were at the same place, they could simply
    flip a coin.
    Since they are remote, one would have to trust the other to report the flip correctly.
    As neither of them is willing to trust the other, they come up with the following procedure.
    Alice flips a coin and hashes the outcome with a random nonce.
    She then sends the output to Bob, who replies with the outcome of his own coin flip.
    Finally, Alice reveals her commitment by sending her coin flip and nonce to Bob.
    By verifying whether the flip and the nonce hash to the value he received earlier,
    Bob can detect if Alice attempted to cheat.
    Alice and Bob agreed that if their coin flips are the same, then Alice wins. If not, Bob wins.
    If the hash function is secure, neither of them can skew the result in their favor.

[Figure: Alice sends hash(CoinFlipAlice + Nonce) to Bob, Bob replies with CoinFlipBob, and Alice then reveals CoinFlipAlice and Nonce.]

If CoinFlipAlice = CoinFlipBob, Alice wins.
If CoinFlipAlice ≠ CoinFlipBob, Bob wins.
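
The coin-flipping procedure can be sketched in Python as follows. The encoding of the coin flip as a single byte, the use of SHA-256, and all function names are assumptions made for this illustration.

```python
import hashlib
import hmac
import secrets


def commit(coin_flip: int) -> tuple:
    """Alice commits to her coin flip (0 or 1); she keeps the nonce secret for now."""
    nonce = secrets.token_bytes(32)
    commitment = hashlib.sha256(bytes([coin_flip]) + nonce).digest()
    return commitment, nonce


def verify_commitment(commitment: bytes, coin_flip: int, nonce: bytes) -> bool:
    """Bob checks that the revealed flip and nonce hash to the value received earlier."""
    expected = hashlib.sha256(bytes([coin_flip]) + nonce).digest()
    return hmac.compare_digest(expected, commitment)


def alice_wins(coin_flip_alice: int, coin_flip_bob: int) -> bool:
    """Alice wins if both coin flips are the same."""
    return coin_flip_alice == coin_flip_bob
```

Without the random nonce, Bob could simply hash both possible outcomes and compare them with the commitment.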
  • Message authentication:
    How can two parties be sure that no one tampered with their communication?
    They can achieve this by extending each message with a value
    which depends on the message and which only they can generate.
    Such a value is called a
    message authentication code (MAC).
    If an attacker modifies a message, the original MAC no longer matches the message
    and the attacker cannot fix this because they cannot generate a valid MAC.
    Both parties compute the MAC for each message they receive
    and reject all messages for which the transmitted MAC is different from the computed MAC.
    One way of implementing message authentication codes is to hash the message together with a value
    which is known only to the legitimate parties.
    This value is a shared secret
    and it is used as a cryptographic key.
    For example, the MAC could be computed as hash(Key + Message).
    Unfortunately, this isn’t secure when used with any of the
    hash functions listed above as they are all vulnerable
    to length-extension attacks.
    The problem is that these algorithms leak their internal state as the result,
    so an attacker can simply continue where the legitimate party left off
    without having to know the shared key.
    This means that, given a Message and the corresponding MAC,
    an attacker can generate a valid MAC for the message Message + MaliciousAddition.
    While swapping the key and the message solves this problem,
    hash(Message + Key) makes the MAC immediately vulnerable
    as soon as the hash function becomes vulnerable to
    collision attacks.
    In order to avoid such issues, cryptographers came up with the
    Hash-based Message Authentication Code (HMAC) in 1996,
    which is defined as follows:
    hmac(Key, Message) = hash([Key' ⊕ OuterPadding] + hash([Key' ⊕ InnerPadding] + Message)),
    where ⊕ denotes the bitwise exclusive-or operation
    and the square brackets are used only to make the parenthesis matching easier.
    The paddings are the same for everyone and their purpose is to make the key in the inner hash
    different from the key in the outer hash.
    If the key is longer than the block size
    of the used hash function, it needs to be hashed: Key' = hash(Key).
    Not always hashing the key leads to trivial collisions,
    which should have been avoided when specifying the algorithm.
    As long as you understand what HMAC is good for, the details don’t matter here.
    The SHA-3 algorithms aren’t susceptible to length-extension attacks
    and my understanding is that the much simpler hash(Key + Message) construction works as intended with them.
    What is important to note is that hash-based message authentication codes are symmetric:
    Whoever can verify them can also generate them.
    Unlike digital signatures,
    message authentication codes allow a party to repudiate
    messages which it authenticated
    because the other party could have generated the corresponding MAC as well.

Client Server ClientMessage, hmac(Key, ClientMessage) ServerMessage, hmac(Key, ServerMessage)

A message authentication code is appended to each message.
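
To make the HMAC definition more tangible, here is a sketch which spells out the construction for SHA-256 in Python. Padding the key with zero bytes to the block size and the padding constants 0x36 and 0x5C follow RFC 2104; the function name is my own. In practice, you would use the `hmac` module of the standard library instead.

```python
import hashlib
import hmac

BLOCK_SIZE = 64  # The block size of SHA-256 in bytes.


def hmac_sha256(key: bytes, message: bytes) -> bytes:
    """Computes hash([Key' xor OuterPadding] + hash([Key' xor InnerPadding] + Message))."""
    if len(key) > BLOCK_SIZE:
        key = hashlib.sha256(key).digest()  # Keys longer than a block are hashed first.
    key = key.ljust(BLOCK_SIZE, b"\x00")  # Shorter keys are padded with zero bytes.
    inner_key = bytes(byte ^ 0x36 for byte in key)
    outer_key = bytes(byte ^ 0x5C for byte in key)
    inner_hash = hashlib.sha256(inner_key + message).digest()
    return hashlib.sha256(outer_key + inner_hash).digest()
```

The result is the same as `hmac.new(key, message, hashlib.sha256).digest()`.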
  • Proof of inclusion:
    When collaborating online,
    it’s sometimes useful to be able to prove to others that a
    record
    has been incorporated into the current state of a system
    without having to share or even disclose all the other records.
    This can be accomplished by repeatedly hashing two hashes into one
    until you’re left with a single hash which captures the state of the whole system.
    The resulting structure is called a Merkle tree.
    Records cannot be added to, removed from, or modified in this structure
    without affecting the so-called root of the tree.
    If someone accepts that a specific root represents the state of the system,
    you can prove to this person that a particular record is included in this state
    by revealing the hash of the branches with which this record has to be hashed in order to arrive at this root.
    This method is interesting for two reasons:
    The proof grows logarithmically
    with the number of records, which makes it scale very well, and the other records have to be
    neither revealed nor transmitted for the verification to succeed.
    Such proofs of inclusion are used in Bitcoin
    for Simplified Payment Verification (SPV),
    in decentralized timestamping
    for document aggregation,
    and in Certificate Transparency for auditing.

Leaf1 Leaf2 Leaf3 Leaf4 Node1: hash(Leaf1) Node2: hash(Leaf2) Node3: hash(Leaf3) Node4: hash(Leaf4) Node5: hash(Node1 + Node2) Node6: hash(Node3 + Node4) Root: hash(Node5 + Node6)

In order to verify that the green leaf is included in the root,
a verifier needs to know only the hashes and the positions of the blue nodes.
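
The following Python sketch builds such a tree and verifies a proof of inclusion. It assumes SHA-256 and a number of leaves which is a power of two, and it omits the distinction between leaf hashes and inner hashes which real systems typically make. All function names are hypothetical.

```python
import hashlib
from typing import List, Tuple


def hash_bytes(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def next_level(level: List[bytes]) -> List[bytes]:
    """Hashes each pair of neighboring nodes into their parent node."""
    return [hash_bytes(level[i] + level[i + 1]) for i in range(0, len(level), 2)]


def merkle_root(leaves: List[bytes]) -> bytes:
    level = [hash_bytes(leaf) for leaf in leaves]
    while len(level) > 1:
        level = next_level(level)
    return level[0]


def inclusion_proof(leaves: List[bytes], index: int) -> List[Tuple[bytes, bool]]:
    """Returns the sibling hashes on the path to the root and whether each is on the left."""
    level = [hash_bytes(leaf) for leaf in leaves]
    proof = []
    while len(level) > 1:
        sibling = index ^ 1  # The neighbor within the same pair.
        proof.append((level[sibling], sibling < index))
        level = next_level(level)
        index //= 2
    return proof


def verify_inclusion(leaf: bytes, proof: List[Tuple[bytes, bool]], root: bytes) -> bool:
    """Recomputes the root from the leaf and the sibling hashes."""
    node = hash_bytes(leaf)
    for sibling, sibling_is_left in proof:
        node = hash_bytes(sibling + node) if sibling_is_left else hash_bytes(node + sibling)
    return node == root
```

For four leaves, the proof consists of just two hashes, and the other leaves are never revealed to the verifier.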
  • Proof of work:
    In publicly accessible systems,
    you want to discourage participants from using a shared and limited resource beyond their fair share.
    One way of doing so is by imposing a cost on using the resource,
    which deters anyone who doesn’t value the resource higher than its associated cost.
    The resource owner can either charge a fee for using the resource
    or require its users to waste a limited resource of their own.
    While the former approach is less wasteful,
    the latter approach doesn’t require a global infrastructure for
    micropayments.
    For example, your mailbox is a publicly accessible system and your time is a limited resource.
    What if you could require every unknown sender to waste one minute of computing power
    before they can deliver an email to your inbox?
    This would prevent spammers from sending millions of emails a day
    – or at least make this antisocial behavior much costlier.
    It turns out that there’s a simple way to achieve this:
    You could require that the hash of incoming messages falls into a tiny range.
    Since one cannot influence the output of a hash function,
    senders have to keep appending different nonces to their message
    until its hash finally falls into the desired range.
    As long as the hash function isn’t broken,
    there’s no better way than to keep trying until you’re lucky.
    It’s like trying to hit the bull’s eye on a target
    when you have zero control over the trajectory of your darts.
    While finding an appropriate nonce requires many computations,
    the recipient has to hash the message just once
    in order to verify whether the required work has been done.
    The average difficulty of the problem can be adjusted by making the target range bigger or smaller.
    This technique was invented in 1992
    as a digital postage stamp
    but saw widespread usage only with the rise of
    cryptocurrency mining.

Content hash(Content + 1) hash(Content + 2) hash(Content + 3) hash(Content + X)

Finding a nonce which makes the hash of the content fall into a certain range requires many attempts.
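
A search for such a nonce can be sketched in Python as follows. The target range is expressed as a number of leading zero bits, which is one common way to define it. SHA-256, the encoding of the nonce, and the function names are assumptions of this example.

```python
import hashlib


def meets_target(content: bytes, nonce: int, difficulty_bits: int) -> bool:
    """Checks whether hash(Content + Nonce) starts with the required number of zero bits."""
    digest = hashlib.sha256(content + nonce.to_bytes(8, "big")).digest()
    return int.from_bytes(digest, "big") < (1 << (256 - difficulty_bits))


def find_nonce(content: bytes, difficulty_bits: int) -> int:
    """Tries one nonce after the other until the hash falls into the target range."""
    nonce = 0
    while not meets_target(content, nonce, difficulty_bits):
        nonce += 1
    return nonce
```

Each additional bit of difficulty doubles the expected number of attempts for the sender, while the recipient still needs only a single call to `meets_target`.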

Exclusive-or operation for perfect encryption

Exclusive or is a binary
truth function,
which means that it combines two inputs into a single output,
where all values are either true or false.
Exclusive or returns true if one of the inputs is true but not both.
Instead of true and false, we will use the symbols 1 and 0.
The operator is often written as a plus in a circle (⊕)
because it corresponds to binary addition
without the carry.
Functions which map a finite combination of inputs to some output
can be specified simply by listing all possible mappings in a table:

A ⊕ B = C
0   0   0
0   1   1
1   0   1
1   1   0

The truth table of the exclusive-or operation.
The output is 1 if and only if the two inputs are unequal.

Since the above table is exhaustive, you can convince yourself that the following properties hold
simply by studying all cases:

  • Commutativity: The order of the inputs doesn’t matter: A ⊕ B = B ⊕ A.
  • Reversibility: Applying ⊕ to the output and one of the inputs
    gives you the other input: C ⊕ B = A and C ⊕ A = B.
  • Entropy-preservation: If one of the inputs has
    a 50% probability of being 0 and a 50% probability of being 1,
    then the output also has a 50% probability of being 0 and a 50% probability of being 1,
    independent of whether the other input is 0 or 1:
    A ⊕ (50%: 0, 50%: 1) = (50%: 0, 50%: 1).
    In other words, if one of the inputs is truly random, then so is the output.
    In information theory,
    the amount of information in a random variable
    is called entropy.
    As long as two random variables are statistically
    independent
    from one another, combining them with exclusive or can only increase the entropy but
    never decrease it.

Instead of applying such a function to two single bits,
we can also apply it to two equally long strings of bits
by combining the bits at each position separately.
Every such function has a bitwise equivalent,
which is usually denoted with the same or a similar symbol.
For example, 0011 ⊕ 0101 = 0110, which corresponds to the columns in the above table.
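
The bitwise variant can be written down directly from the truth table. This Python sketch operates on strings of the characters 0 and 1 for readability; the function name is hypothetical.

```python
def xor_bits(left: str, right: str) -> str:
    """Combines two equally long bit strings position by position."""
    if len(left) != len(right):
        raise ValueError("Both inputs must have the same length.")
    # The output is 1 if and only if the two inputs are unequal.
    return "".join("1" if a != b else "0" for a, b in zip(left, right))
```

Commutativity and reversibility carry over from single bits to bit strings.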

Encryption enables two parties
to transfer confidential information over an insecure channel.
In examples, the parties are typically called Alice and Bob,
while the eavesdropper,
who tries to listen in on their conversation, is usually called
Eve.
The unencrypted message is called plaintext,
the encrypted message is called ciphertext.
In symmetric-key cryptography,
Alice and Bob use the same piece of information,
which is known as a cryptographic key,
to encrypt and decrypt the message.
According to Kerckhoffs’s principle,
the two algorithms should be designed such that the encryption scheme is secure even if the enemy knows them.
It’s considered bad practice to achieve
security through obscurity.
The following graphic depicts all these terms:

Key Key Plaintext Ciphertext Plaintext Encryption Decryption Alice Bob Eve

Eve has access to the ciphertext and knows the algorithms in blue,
while the information in green is known only to Alice and Bob.

The bitwise exclusive-or operation
can be used to construct an encryption scheme,
which is known as the one-time pad.
In order to encrypt a message, Alice computes Ciphertext: Plaintext ⊕ Key.
Bob can then decrypt the message by computing Plaintext: Ciphertext ⊕ Key
thanks to the reversibility property of the exclusive-or operation.
According to the entropy-preservation property,
if the key is completely random, then so is the ciphertext.
To put it differently: If every bit of the key has a 50% probability of being 0
and a 50% probability of being 1, the same is true for every bit of the ciphertext,
regardless of what the plaintext looks like.
Since the ciphertext contains no information at all about the plaintext,
the one-time pad is information-theoretically secure,
which means that even an adversary with infinite computing power cannot break the encryption scheme.
Trying all possible keys doesn’t work
because for every possible plaintext there’s a key (Key: Plaintext ⊕ Ciphertext)
which produces the observed ciphertext.
While this encryption scheme is perfectly secure,
it’s rarely used in practice because the key has to be at least as long as the plaintext
and each key may be used to encrypt only a single message; hence the name one-time pad.
Practical encryption schemes derive an arbitrarily long sequence of key material from a short value,
sacrificing perfect security by doing so.
This makes the distribution of keys much easier,
either by sharing them in advance
or by deriving them when needed.
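
The one-time pad fits into a few lines of Python. The sketch below works on bytes rather than on individual bits and uses hypothetical function names. It also shows why trying all keys is pointless: for any other plaintext of the same length, there is a key which maps it to the same ciphertext.

```python
import secrets


def xor_bytes(left: bytes, right: bytes) -> bytes:
    """Applies the bitwise exclusive-or operation to two equally long byte strings."""
    if len(left) != len(right):
        raise ValueError("The key has to be as long as the message.")
    return bytes(a ^ b for a, b in zip(left, right))


def generate_key(length: int) -> bytes:
    """Generates a random key, which may be used for a single message only."""
    return secrets.token_bytes(length)


plaintext = b"attack at dawn"
key = generate_key(len(plaintext))
ciphertext = xor_bytes(plaintext, key)     # Ciphertext: Plaintext xor Key
decrypted = xor_bytes(ciphertext, key)     # Plaintext: Ciphertext xor Key

# A different plaintext is explained just as well by a different key.
alternative = b"retreat at six"
alternative_key = xor_bytes(alternative, ciphertext)
```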

The one-time pad encryption scheme is highly malleable:
An attacker can truncate the message at an arbitrary position
and flip bits in the ciphertext
to cause a bit flip at the same position in the plaintext,
which allows the attacker to replace all parts of the plaintext which are known to them.
Such attacks can be prevented by protecting the ciphertext
with a message authentication code (MAC)
and requiring the recipient to validate the MAC before decryption.
We’ll revisit this towards the end of this article.
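
The following Python sketch illustrates how an attacker who knows the plaintext but not the key can replace it. The messages and the function names are made up for this example.

```python
import secrets


def xor_bytes(left: bytes, right: bytes) -> bytes:
    """Applies the bitwise exclusive-or operation to two equally long byte strings."""
    if len(left) != len(right):
        raise ValueError("Both inputs must have the same length.")
    return bytes(a ^ b for a, b in zip(left, right))


def replace_plaintext(ciphertext: bytes, known: bytes, desired: bytes) -> bytes:
    """Flips exactly those bits of the ciphertext in which the two plaintexts differ."""
    return xor_bytes(ciphertext, xor_bytes(known, desired))


key = secrets.token_bytes(11)              # Known only to Alice and Bob.
original = b"Pay 100 USD"
ciphertext = xor_bytes(original, key)
forged = replace_plaintext(ciphertext, original, b"Pay 999 USD")
received = xor_bytes(forged, key)          # What Bob gets after decryption.
```

Bob decrypts the forged ciphertext to the attacker's message without noticing anything unless the ciphertext is authenticated.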

Desirable properties of authentication mechanisms

Now that we’ve covered the cryptographic concepts that we will need
(namely hash functions,
salts,
nonces,
key derivation functions,
message authentication codes,
and exclusive or),
we can turn our attention to password-based authentication mechanisms.
What makes them interesting is that we want to arrive at strong security from relatively weak passwords.

Unfortunately, I couldn’t find any good literature on desirable properties of password-based authentication mechanisms,
which is why I made up the following criteria myself.
Since this isn’t my area of expertise,
let me know if I missed an important aspect.
(Section 5 of RFC 7616
is the best source that I could find, covering security considerations of
Digest Access Authentication.)

An ideal password-based authentication mechanism is resistant to:

  1. Database compromise: An attacker who compromised the server’s authentication
    database cannot impersonate its users.
    For this reason, passwords should never be stored in plaintext.
    Given that an attacker who compromised the database no longer has to interact with the server,
    there is no limit on the number of passwords they can try every second.
    In order to increase resistance against such offline brute-force attacks,
    passwords should be individually salted and repeatedly hashed.
    While authentication mechanisms usually don’t dictate
    how servers have to store the information required to authenticate their users,
    their design can prevent the service provider from applying certain techniques such as salting and stretching.
  2. Replay attacks: An attacker who intercepts the communication
    between the client and the server cannot impersonate the user in self-initiated sessions.
    This is accomplished by making the transmitted authentication information valid only in the current session,
    which limits the harm that a man-in-the-middle can cause to the current session.
    This is especially valuable if the client can demand only a single action per session from the server,
    such as submitting a single message to the outgoing mail server per connection.
    Unfortunately, this isn’t the case for any of the protocols discussed in this article.
    By preventing delayed attacks, resistance to replay attacks is still a desirable property
    because it makes attacks much easier to localize.
  3. Pooled brute-force attacks: An attacker who compromised the communication channel
    of several users cannot generate and test password candidates for several users at once.
  4. Individual brute-force attacks: An attacker who compromised a communication channel encounters
    only stretched derivations from the password, which makes brute-force attacks costlier.
  5. Denial-of-service attacks: An attacker cannot launch a computational
    denial-of-service attack against clients.
  6. Server impersonation: An attacker cannot impersonate a server towards a client
    without relaying the authentication messages to the actual server.
    When combined with measures against man-in-the-middle attacks,
    this prevents sending sensitive information to an attacker
    who just fakes that the authentication was successful without knowing whether this is the case.
    For example, a client shouldn’t submit an email to a server
    which cannot verify whether the password was correct.
  7. Wrong server: The client detects when it’s connected to the wrong server
    (see the box on our dangerous reliance on TLS).
  8. Compromised certification authority:
    The client detects when the used certificate doesn’t belong to the actual server.
  9. Compromised server key: The client detects when the server is impersonated
    even if the same certificate is being used.
  10. Comparison attacks: A compromised server cannot learn
    whether two different accounts are protected with the same password,
    neither when creating an account nor during ordinary authentication.
    Such knowledge can be used to infer that the accounts belong to the same person
    – or to contact and bribe one user to compromise the password of the other user.
  11. Wrong server after database compromise:
    There is no risk in connecting to the wrong server
    even if the server’s database has been compromised.
    This property is desirable because the database compromise might remain undetected.
    And even if the data breach is detected,
    many users are likely too lazy to change their passwords.
    The wrong server might also just be another server where a user uses the same password.
    Reusing the same password should be secure even if you don’t trust all service providers.
    (The “should” refers to an ideal authentication mechanism,
    which is implemented with static code on the client.
    Don’t reuse the same password on different websites!
    I have a separate box about authentication on the Web.)
  12. User impersonation after server compromise:
    An attacker who compromised the server cannot impersonate its users.
    This means that even if the server was compromised temporarily,
    users don’t have to change their password.
    Additionally, this property guarantees the user
    that the server is resistant to a database compromise.
    The database could even be public.

The goal of defense in depth is to limit the potential harm as much as possible.
Given that you can reset the password of many of your online accounts through your email account,
you don’t want to send one of your most valuable passwords
directly to a potential attacker when checking your inbox.
Let’s look at how the three authentication mechanisms perform in this regard:

Resistant to PLAIN CRAM SCRAM
Database compromise
Replay attacks
Pooled brute-force attacks
Individual brute-force attacks
Denial-of-service attacks
Server impersonation
Wrong server
Compromised certification authority
Compromised server key
Comparison attacks
Wrong server after database compromise
User impersonation after server compromise
A comparison between password-based authentication mechanisms.


means that the authentication mechanism is resistant to the attack.


means that the resistance depends on choices made by programmers.


means that the authentication mechanism is vulnerable to the attack.

Unfortunately, only the PLAIN
authentication mechanism is widely deployed on mail servers.
Before we discuss how CRAM
and SCRAM
do or don’t fulfill the above properties,
let me mention some aspects which are beyond the scope of this analysis:

  1. Bugs in implementation:
    Even if an authentication mechanism is resistant to an attack in theory,
    it can be vulnerable to it in practice because of software bugs.
    All you can do is to actively look for them,
    encourage their disclosure, and fix them soon.
  2. Account theft:
    Authentication mechanisms usually don’t specify how users set and change their password.
    An attacker who intercepts the daily communication between a client and a server shouldn’t be
    able to change the user’s password,
    thereby stealing their account.
    Since changing the password is even beyond the scope of most protocols,
    there isn’t much to say about this here other than that it’s
    a problem of authorization
    rather than a problem of authentication.
    According to the principle of least privilege,
    the credentials required for changing the password are ideally different
    from the ones required for accessing the system.
    OAuth achieves this with
    restricted scopes.
    Some mailbox providers such as Apple
    and Google
    allow users to generate app-specific passwords,
    which can be revoked individually and which aren’t enough to change the user’s password.
    Given that PLAIN is the dominant authentication mechanism,
    app-specific passwords are highly desirable.
  3. Downgrade attacks:
    If a server supports several authentication mechanisms,
    a man-in-the-middle can remove the stronger ones so that
    the client is forced to continue with the weakest one.
    We discussed measures against downgrade attacks
    in the context of backward compatibility earlier.
    New services can support only the strongest authentication mechanism,
    which eliminates this problem as well.
    The weakness here lies not in individual mechanisms but rather in how they are deployed.
  4. Online attacks:
    If the attacker has to interact repeatedly with one of the legitimate parties,
    we speak of an online attack.
    Since users should be able to authenticate themselves from a different network,
    an attacker can do the same interaction with one guessed password at a time.
    Due to the nature of authentication mechanisms,
    online attacks are always possible.
    However, service providers can make them more difficult by
    limiting the rate at which new passwords can be tried
    and by informing the user about failed attempts.
    If not implemented carefully, legitimate users can also be affected by rate limiting.
  5. Server compromise:
    As long as the server is compromised, there’s nothing left to protect by an authentication mechanism.
  6. Client compromise:
    An authentication mechanism cannot prevent users from entering their password into a compromised client.
    The harm can be limited only by using app-specific passwords or OAuth
    (see the second point about account theft).
  7. Compromised certification authority after database compromise
    or compromised server keys after database compromise:
    What I write here will make more sense once you’ve read the box
    on SCRAM.
    An attacker who compromised the database succeeds in the mutual authentication towards the client.
    Since the relay of messages to the actual server is therefore no longer necessary,
    channel binding can no longer prevent these two variants of the
    man-in-the-middle attack.

Challenge-Response Authentication Mechanism (CRAM)

CRAM is a very simple authentication mechanism,
in which the client has to respond to the challenge received from the server:

Client Server (Connect) Challenge Response

How challenge–response authentication works.

CRAM is specified in RFC 2195
and draft-ietf-sasl-crammd5.
Unlike what some documentation suggests,
CRAM has nothing to do with encryption.
The client computes the response as Response: hmac(Password, Challenge),
where the challenge was chosen randomly by the server.
The HMAC could be instantiated with any hash function
but the standard uses MD5,
which is why the full name of the mechanism is CRAM-MD5.
Let’s evaluate which of the above properties
are fulfilled by CRAM-MD5:

  1. Database compromise: In order to verify the response,
    the server needs to be able to perform the same computation as the client.
    Since the password is directly used as an input to the HMAC,
    the server has to store the password rather than its salted hash.
    In this regard, CRAM is worse than PLAIN,
    where only a derivation
    of the password needs to be stored.
    Both the RFC
    and the draft
    say that the security can be marginally improved by storing the state of the hash function
    after feeding in the password instead of the password itself.
    This is putting the length-extension vulnerability
    of many hash functions such as MD5 to supposedly good use.
    However, this doesn’t help at all because
    if the server can continue from the intermediary state to determine the response,
    then so can the attacker who compromised the database and tries to impersonate users.
  2. Replay attacks: As long as the server never issues the same challenge twice,
    the response from the client is valid only in the current session.
    If an attacker replays an old response to a new challenge,
    the server rejects the received value as invalid.
  3. Pooled brute-force attacks: As a man-in-the-middle,
    the attacker can send the same challenge to several clients.
    Therefore, the attacker can test password candidates for several users at once.
  4. Individual brute-force attacks: An authentication mechanism which isn’t resistant to
    pooled brute-force attacks is also not resistant to individual brute-force attacks.
  5. Denial-of-service attacks: Since the number of hashes
    that a client has to compute per authentication is fixed,
    denial-of-service attacks against the client aren’t possible
    (as long as the size of the challenge is limited by the protocol).
  6. Server impersonation: Since the client doesn’t authenticate the server,
    an attacker can impersonate the server in one of the above-mentioned ways
    and fake the success of the user authentication.
  7. Wrong server: The client cannot detect when it’s connected to the wrong server.
  8. Compromised certification authority:
    The client cannot detect when the used certificate doesn’t belong to the real server.
  9. Compromised server key: The client cannot detect when the server is impersonated
    if the same certificate is being used.
  10. Comparison attacks: Since the server stores the passwords,
    it can easily determine if two accounts use the same password.
  11. Wrong server after database compromise:
    If the server’s authentication database has been compromised,
    users have to change their password wherever they’re using it.
  12. User impersonation after server compromise:
    An attacker who has compromised the server can impersonate its users.

Before we move on to SCRAM,
I wanted to visualize how a man-in-the-middle can relay all messages between the two parties:

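[Sequence diagram: the Client connects to the Attacker, and the Attacker connects to the Server. The Attacker relays the Challenge from the Server to the Client and the Response from the Client to the Server.]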

A man-in-the-middle attack
on a challenge-response authentication mechanism.

Salted Challenge-Response Authentication Mechanism (SCRAM)

Looking at its name, SCRAM
seems to be just a salted version of CRAM.
This is misleading, however, as SCRAM is much more than that.
SCRAM is specified in RFC 5802
and improves on CRAM with the following, now mostly familiar techniques:

  • Key derivation:
    Instead of using the password directly, SCRAM uses
    PBKDF2 to derive a
    cryptographic key.
    By salting the password and hashing it thousands of times,
    a brute-force search
    for the password given the key becomes very costly.
  • Message authentication:
    The derived key is used to authenticate a message from the client to the server
    and a message from the server to the client with an HMAC.
    The server can authenticate its message only if it knows the derived key.
    We thus have mutual authentication:
    The server is certain that the user is who they claim to be
    and the client is certain that the message came from the right server.
  • Exclusive-or encryption:
    The problem is that the server shouldn’t store this key.
    Otherwise, anyone who compromised its database can impersonate the user
    by authenticating the appropriate message with the stolen key.
    This can be solved by storing a hash of the derived key instead.
    Note that the derived key doesn’t need to be salted and stretched here
    because the best way to find the preimage of the hashed key is to guess the low-entropy password,
    which is itself already salted and stretched to arrive at the derived key.
    The client then uses the hashed key to authenticate its message.
    So far we have only moved the problem, though,
    because the hashed key now has the same role as the derived key before.
    The trick is that the client proves to the server
    that it knows the preimage of the stored key
    by encrypting the preimage with the HMAC.
    Since only the legitimate parties can compute the HMAC,
    the server can decrypt the preimage but the attacker cannot.
    If this is confusing, then re-read this paragraph after you’ve seen the protocol flow below.
  • Optional channel binding:
    The authenticated message includes everything
    which the client and the server have to agree on.
    One useful thing to agree on is that they are connected to the same secure channel.
    Binding the channel on the application layer
    to the channel on the security layer
    prevents man-in-the-middle attacks.
    Channel binding is optional in SCRAM.
    There are different ways to bind the inner channel to the outer channel with different tradeoffs.
    We’ll cover them in the next box.

[Figures: Client and Server, connected through an inner channel (green) inside an outer channel (blue).]

Mutual authentication guarantees only that the inner channel (in green) reaches the counterparty.


Channel binding can be used to ensure that the outer channel (in blue) isn’t interrupted by an attacker.

What follows is a simplified version of the SCRAM protocol.
I believe it has the same properties as the official protocol,
and I’m not aware of any vulnerabilities.
However, be aware that my simplifications haven’t been reviewed.
The SCRAM standard might do things differently for good reasons,
which I just haven’t thought of.
For the sake of compatibility and security,
implement the official protocol!
I simplified the protocol only to make it easier to understand.
The biggest differences are that I don’t separate the “server key” from the “client key”
and that I removed the redundancy in the transmitted and thus authenticated messages.
Reducing the number of variables allows me to use less confusing names for them.
I didn’t just simplify SCRAM, though:
I also provide suggestions for improving SCRAM in my analysis below.
Let’s have a look now at how Simplified-SCRAM works.

As with every password-based authentication mechanism,
the user’s credentials are Username and Password.
We also have:

  • Salt and IterationCount:
    The values to derive the Key from the Password.
    The RFC doesn’t specify who chooses the Salt.
  • ClientNonce and ServerNonce:
    Values chosen at random for each session.
    The former by the client, the latter by the server.
  • ChannelBinding: A string which identifies the TLS channel over which the messages are sent.
    See the next box for options.

The client and the server compute the following values based on the above values:

  • Key: pbkdf2(Password, Salt, IterationCount)
  • HashedKey: hash(Key)
  • Message: Username + ClientNonce + ServerNonce + ChannelBinding
  • HashedKeyMac: hmac(HashedKey, Message)
  • KeyXorHashedKeyMac: Key ⊕ HashedKeyMac
  • KeyMac: hmac(Key, Message)
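
The values above can be sketched in Python with nothing but the standard library. This is a minimal sketch of Simplified-SCRAM as described here, not of the official protocol. The choice of SHA-256, the function names `xor` and `derive_values`, and all example values are assumptions made for illustration. The Message is built by plain concatenation to mirror the notation above.

```python
import hashlib
import hmac


def xor(a: bytes, b: bytes) -> bytes:
    """Combine two equally long byte strings with exclusive-or."""
    return bytes(x ^ y for x, y in zip(a, b))


def derive_values(password: str, salt: bytes, iteration_count: int,
                  username: str, client_nonce: bytes, server_nonce: bytes,
                  channel_binding: bytes = b"") -> dict:
    """Compute the Simplified-SCRAM values (illustrative, unreviewed)."""
    # Key: pbkdf2(Password, Salt, IterationCount)
    key = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iteration_count)
    # HashedKey: hash(Key), the value which the server stores
    hashed_key = hashlib.sha256(key).digest()
    # Message: Username + ClientNonce + ServerNonce + ChannelBinding
    message = username.encode() + client_nonce + server_nonce + channel_binding
    # HashedKeyMac: hmac(HashedKey, Message)
    hashed_key_mac = hmac.new(hashed_key, message, hashlib.sha256).digest()
    return {
        "Key": key,
        "HashedKey": hashed_key,
        "Message": message,
        "HashedKeyMac": hashed_key_mac,
        # KeyXorHashedKeyMac: Key ⊕ HashedKeyMac
        "KeyXorHashedKeyMac": xor(key, hashed_key_mac),
        # KeyMac: hmac(Key, Message)
        "KeyMac": hmac.new(key, message, hashlib.sha256).digest(),
    }


# Made-up example values; real nonces have to be chosen at random.
values = derive_values("correct horse", b"example-salt", 4096,
                       "alice", b"client-nonce", b"server-nonce")
```

Because the key derivation and the HMAC both produce 32 bytes with SHA-256, the exclusive-or in `KeyXorHashedKeyMac` lines up without padding.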

For each user, the server stores Username, Salt, IterationCount, and HashedKey.
The following messages are exchanged:

Client → Server: Username, ClientNonce
Server → Client: ServerNonce, Salt, IterationCount
Client → Server: KeyXorHashedKeyMac
Server → Client: KeyMac

The sequence diagram of Simplified-SCRAM.

Since a user has to be able to authenticate themself on a new client with just their Username and Password,
the server has to store the Salt and the IterationCount and provide it to the client on request.
Since the user is not yet authenticated at this stage,
anyone can request the Salt and the IterationCount of any user.
(The IterationCount determines how many times the salted password is hashed.)
After the first two messages, both the client and the server can compose the Message
and compute the HashedKeyMac as the HMAC with the HashedKey.
The client then sends the Key encrypted with the HashedKeyMac to the server,
which decrypts the Key as KeyXorHashedKeyMac ⊕ HashedKeyMac.
In the next step, the server verifies whether hash(Key) = HashedKey.
If this is the case, it has successfully authenticated the client.
If not, the server aborts the session.
At last, the server uses the Key to authenticate the same Message to the client.
By also computing KeyMac, the client can verify that the last message was indeed sent by the server.
Since both parties can compose the Message,
the message authentication codes (MAC) can be sent without the Message.
The Username is included in the Message because it wasn’t authenticated in the first message.
Without this, a man-in-the-middle could replace it to authenticate the user for another account
where they use the same password.
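
The server's part of this exchange can be sketched as follows. It is a minimal sketch under the same assumptions as the simplified protocol: SHA-256 as the hash function, a hypothetical function name `server_verify`, and made-up example values. The client side is simulated inline only to produce an input for the server.

```python
import hashlib
import hmac
from typing import Optional


def server_verify(stored_hashed_key: bytes, message: bytes,
                  key_xor_hashed_key_mac: bytes) -> Optional[bytes]:
    """Return the KeyMac for the client, or None if authentication fails."""
    hashed_key_mac = hmac.new(stored_hashed_key, message, hashlib.sha256).digest()
    if len(key_xor_hashed_key_mac) != len(hashed_key_mac):
        return None
    # Decrypt the Key as KeyXorHashedKeyMac ⊕ HashedKeyMac.
    key = bytes(a ^ b for a, b in zip(key_xor_hashed_key_mac, hashed_key_mac))
    # Verify whether hash(Key) = HashedKey, comparing in constant time.
    if not hmac.compare_digest(hashlib.sha256(key).digest(), stored_hashed_key):
        return None  # The server aborts the session.
    # Authenticate the same Message to the client with the recovered Key.
    return hmac.new(key, message, hashlib.sha256).digest()


# Simulated client with made-up values.
key = hashlib.pbkdf2_hmac("sha256", b"correct horse", b"example-salt", 4096)
hashed_key = hashlib.sha256(key).digest()  # stored by the server
message = b"alice" + b"client-nonce" + b"server-nonce"
mac = hmac.new(hashed_key, message, hashlib.sha256).digest()
proof = bytes(a ^ b for a, b in zip(key, mac))  # KeyXorHashedKeyMac

key_mac = server_verify(hashed_key, message, proof)
```

The client accepts the server only if `key_mac` equals the value it computes itself as `hmac(Key, Message)`.
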
Let’s analyze how Simplified-SCRAM is or can be made resistant
to all but one of the above attacks:

  1. Database compromise thanks to salting and stretching:
    An attacker who compromised the server’s database learns only the HashedKey
    but not the Key, which is required to impersonate a user.
    As noted above, the best way to find the preimage of the HashedKey
    is to guess the low-entropy Password.
    Due to the Salt, the Password of each user has to be attacked separately.
    Due to the IterationCount, the search for the Password
    is slowed down by several orders of magnitude.
  2. Replay attacks thanks to the server nonce:
    As long as the server doesn’t issue the same ServerNonce twice,
    earlier KeyXorHashedKeyMac values cannot be replayed
    because the earlier MAC doesn’t match the current Message.
  3. Pooled brute-force attacks thanks to the client nonce:
    Even if the ServerNonce, Salt, and IterationCount are chosen by a man-in-the-middle,
    KeyXorHashedKeyMac depends on the unique ClientNonce, which prevents pooled brute-force attacks.
  4. Individual brute-force attacks thanks to a minimum iteration count:
    Unfortunately, the standard says only
    that servers should choose an IterationCount of at least 4096.
    It’s important, however, that clients are programmed to reject an IterationCount below a certain threshold.
    Otherwise, a man-in-the-middle can send an IterationCount of 1,
    which makes it much easier to search for the Password that led to KeyXorHashedKeyMac.
    While this weakness can easily be addressed when writing a client,
    not standardizing the minimum iteration count can lead to incompatibilities
    between different implementations of the standard.
  5. Denial-of-service attacks thanks to a maximum iteration count:
    The standard notes
    that a compromised server or a man-in-the-middle can perform a computational denial-of-service attack on clients
    by sending a big IterationCount.
    For this reason, clients should reject an IterationCount above a certain threshold.
    This threshold can be relatively high
    because the derived Key can be cached by the client for future authentications.
    This means that each client has to perform the key derivation only once.
    It’s therefore no problem if the derivation takes several seconds.
  6. Server impersonation thanks to mutual authentication:
    The KeyMac prevents an attacker from faking the authentication success.
    Since the KeyMac depends on the ClientNonce,
    the server messages cannot be replayed from an earlier session.
  7. Wrong server thanks to binding to the domain name:
    We will look at the two standardized options for channel binding in the next box.
    For now, let’s imagine that a variant of SCRAM requires
    that the domain name of the server is appended to the Message.
    In other words, ChannelBinding: ServerDomain.
    Due to mutual authentication, a man-in-the-middle is forced
    to relay the communication between the client and the server.
    If the client connects to the wrong server,
    then the Message on the client is different from the Message on the actual server,
    which causes the authentication to fail.
  8. Compromised certification authority thanks to binding to the server certificate:
    We can improve on the domain binding with ChannelBinding: hash(ServerCertificate).
    This prevents a man-in-the-middle from using a different certificate for the same ServerDomain,
    which they might obtain from a compromised certification authority.
  9. Compromised server key thanks to binding to the session key:
    TLS uses a Diffie–Hellman key exchange
    to derive a session key, which is then used to encrypt and authenticate all messages.
    By choosing ChannelBinding: hash(SessionKey),
    we can detect a man-in-the-middle who compromised the private key of the server’s certificate.
    Either the TLS connections from the client to the attacker
    and from the attacker to the server have different session keys,
    or the attacker can neither decrypt nor modify the communication between the client and the server.
    In the latter case, TLS fulfills its purpose.
  10. Comparison attacks thanks to a user-specific salt:
    If I could have written the standard,
    the Salt would be prefixed with the user’s Username in the key derivation:
    Key: pbkdf2(Password, Username + Salt, IterationCount).
    Not only does this prevent a compromised server from determining
    whether two accounts are protected with the same Password,
    it also guarantees the user that
    an attacker cannot run a pooled brute-force attack after compromising the database.
    Otherwise, a faulty server implementation,
    which chooses the same Salt for every user,
    can ruin the brute-force resistance for its users.
  11. Wrong server after database compromise thanks to a server-specific salt:
    I would even go one step further and prefix the Salt also with the ServerDomain:
    Key: pbkdf2(Password, ServerDomain + Username + Salt, IterationCount).
    This prevents one service provider from impersonating the user at another service provider
    once the database of the latter has been compromised.
    Without this prefix, the former service provider can send back
    the Salt and the IterationCount from the compromised database
    and recover the Key used with the latter service provider.
    Since the former provider also knows the HashedKey,
    the mutual authentication will succeed,
    which makes the attack unnoticeable to the user.
    Another desirable benefit of this prefix is that
    it forces different servers to use separate authentication databases.
    For example, an attacker who compromised the outgoing mail server
    would no longer be able to retrieve the user’s mail at the incoming mail server.
    However, these precautions make sense only
    if the Password is never shared with the server, not even when setting the password.
    This means that the client has to choose the Salt and the IterationCount
    and then generate the string Salt + IterationCount + HashedKey
    so that the user can paste it into the account configuration interface.
    For this to work, setting and replacing the password would have to be standardized as well.
  12. User impersonation after server compromise:
    SCRAM is not resistant to a server compromise.
    If an attacker manages to control the server
    (or alternatively to compromise the database and to intercept the communication channel),
    they learn the Key, which is all that is needed to impersonate the user.
    In order to prevent this, we need public-key cryptography.
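
Points 4, 5, 10, and 11 can be combined in a short client-side sketch. The two thresholds are hypothetical values picked for illustration (the lower one merely reuses the number which the standard recommends to servers), and the prefixed salt is the suggestion made above, not something RFC 5802 specifies. SHA-256 and the function name `derive_key` are assumptions as well.

```python
import hashlib

# Hypothetical client-side thresholds; RFC 5802 doesn't define them.
MIN_ITERATION_COUNT = 4096
MAX_ITERATION_COUNT = 10_000_000


def derive_key(password: str, server_domain: str, username: str,
               salt: bytes, iteration_count: int) -> bytes:
    """Derive the Key with a server- and user-specific salt prefix."""
    # Reject counts that make brute-force attacks cheap (point 4)
    # or that keep the client busy for too long (point 5).
    if not MIN_ITERATION_COUNT <= iteration_count <= MAX_ITERATION_COUNT:
        raise ValueError(f"IterationCount {iteration_count} is out of bounds")
    # Key: pbkdf2(Password, ServerDomain + Username + Salt, IterationCount)
    prefixed_salt = server_domain.encode() + username.encode() + salt
    return hashlib.pbkdf2_hmac("sha256", password.encode(),
                               prefixed_salt, iteration_count)


# The same password and salt lead to different keys for different accounts.
key_alice = derive_key("correct horse", "example.com", "alice", b"salt", 4096)
key_bob = derive_key("correct horse", "example.com", "bob", b"salt", 4096)
```

Since the derived key can be cached by the client, the upper threshold can be generous without slowing down later logins.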

The remaining boxes in this subsection just add more context.
The conclusion of this little detour is the same as the conclusion of the whole article:
We could do so much better if we only wanted to (and were better informed).
The standards exist, we just need to deploy them…

TLS channel bindings (SCRAM-PLUS)

Channel binding is discussed in section 6 of RFC 5802.
If a server supports channel binding,
it advertises the authentication mechanism as SCRAM-<hash-function>-PLUS.
An example is SCRAM-SHA-256-PLUS as specified in RFC 7677.
Since mutual authentication is established on the application layer by SCRAM,
the security layer has to provide only message confidentiality and message authentication
but not party authentication when channel binding is used.
As a consequence, SCRAM-PLUS can be used without a public-key infrastructure,
which means that servers can use self-signed certificates.
Binding the application layer to the security layer doesn’t change the security layer.
A TLS implementation needs to be changed only if it doesn’t allow the application layer to access the necessary values.

RFC 5929 defines three different channel bindings for TLS,
where only two of them are relevant for us:

  • tls-server-end-point
    uses the hash of the server’s certificate: hash(ServerCertificate).
    The advantage of this binding is that
    it can easily be used with a reverse proxy.
    Its disadvantage is that it doesn’t protect against compromised server keys.
  • tls-unique
    uses the first TLS Finished message of the latest
    TLS handshake.
    Since the Finished message contains a hash over all previous handshake messages,
    it uniquely identifies a particular TLS connection.
    For full TLS handshakes, the first Finished message is sent by the client.
    For abbreviated TLS handshakes, the first Finished message is sent by the server.
    Depending on which type of handshake has been performed and which of the two endpoints you implement,
    you have to call either getFinished()
    or getPeerFinished()
    to access the right message for channel binding.
    In theory, tls-unique is the preferred option for channel binding
    because it also prevents attacks with compromised server keys.
    In practice, however, tls-unique requires
    proxy servers to forward the first Finished message to the application server
    so that it can compose the SCRAM Message correctly,
    which makes this option more difficult to deploy.
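
How the binding data might end up in a SCRAM message can be sketched as follows. The sketch assumes that the certificate is signed with SHA-256, in which case tls-server-end-point is the SHA-256 hash of the DER-encoded certificate, and it encodes the result into the `c=` attribute of RFC 5802 with an empty authorization identity. The function names and the placeholder certificate bytes are hypothetical. In Python, the real inputs would come from `ssl.SSLSocket.getpeercert(binary_form=True)` or `ssl.SSLSocket.get_channel_binding("tls-unique")`.

```python
import base64
import hashlib


def tls_server_end_point(der_certificate: bytes) -> bytes:
    """Hash the server certificate (assumes a SHA-256 based signature)."""
    return hashlib.sha256(der_certificate).digest()


def channel_binding_attribute(binding_name: str, binding_data: bytes) -> str:
    """Encode the GS2 header and the binding data as SCRAM's c= attribute."""
    gs2_header = f"p={binding_name},,".encode()  # no authorization identity
    return "c=" + base64.b64encode(gs2_header + binding_data).decode()


# Placeholder bytes instead of a real DER-encoded certificate.
binding_data = tls_server_end_point(b"not a real certificate")
attribute = channel_binding_attribute("tls-server-end-point", binding_data)
```

Both endpoints compute this value independently, which is why a man-in-the-middle who presents a different certificate causes the authenticated messages to differ.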

Authentication on the Web

Given the many desirable properties
of SCRAM,
you might wonder whether we can also use this mechanism when logging in to websites.
The short answer is yes: Apart from channel binding,
you can implement SCRAM with JavaScript
in the browser.
The longer answer is no: Since users cannot trust the code that is loaded by a website,
nothing is gained by implementing SCRAM for logging in to your website.
The most desirable property for authentication mechanisms on the Web
is to prevent phishing,
where a victim is tricked into connecting to the wrong server.
Since in this case the code is loaded directly from the attacker,
it can send your password directly to the attacker.
The only way to make password-based authentication on the Web secure
is to move the functionality from a webpage to the browser and to expose it through an API.
This could be achieved with a SCRAM-SHA-256-PLUS extension to
HTTP authentication,
where the browser takes care of the authentication messages and the server sets the
session cookie on success.
This is not likely to happen anytime soon, though.
The trend goes rather towards replacing or supplementing password-based authentication
with public-key cryptography,
for example with the Web Authentication (WebAuthn) standard.
None of this is a problem for mail clients, though,
since their code isn’t loaded from untrusted sources.

Password-authenticated key exchange (PAKE)

The goal of key exchange protocols is to establish
a shared secret between two parties,
which can then be used to encrypt and authenticate all messages between them.
In order to ensure that the secret is shared between the intended parties,
they need to authenticate themselves initially.
Otherwise, a man-in-the-middle can establish
one secret with the first party and another secret with the second party,
allowing them to intercept all messages between the two parties.
One way of achieving this is by relying on third parties,
so-called certification authorities,
to confirm the identity and the public key of a party.
Another way of achieving this is by relying on a secret that they already share.
Password-based authentication mechanisms
such as SCRAM
accomplish this by binding to the secure channel
after it has already been established.
Password-authenticated key exchange (PAKE) protocols,
on the other hand,
accomplish this by using the password during the key exchange for mutual authentication.
You don’t want to use a key derived from the password as the shared secret
because once the password is compromised, all earlier sessions are compromised as well.
Password-authenticated key exchange protocols avoid this
by using public-key cryptography
to establish a secret which is unique to each session and cannot be derived from the password.
One example is the Secure Remote Password (SRP) protocol.
Not only does it achieve the just-mentioned property,
which is called forward secrecy,
it’s also resistant to user impersonation after server compromise:
The server never learns the necessary information to impersonate its users.
TLS supports SRP as a key exchange algorithm
under the label TLS-SRP
but just like SCRAM it seems to be
rarely used.
One downside of SRP is that it leaks the username to any eavesdropper during
its TLS handshake.

Access protocols

Besides proprietary protocols,
most incoming mail servers allow mail clients to access the user’s mailbox
with POP3 or IMAP.
If your mail client and your mail server support both protocols,
you should choose the latter as it’s much more powerful.
The main reason for including POP3 in this article
is that it’s much easier to use from the command-line interface.

Communication logging in Apple Mail

Apple Mail allows you to inspect its communication with your mail servers
by clicking on “Connection Doctor” in the “Window” menu and then on “Show Detail”.
You can also enable “Log Connection Activity” there to persist the log of its communication
in the folder ~/Library/Containers/com.apple.mail/Data/Library/Logs/Mail/.
Since the log files include the content of all your messages,
including deleted ones and those of removed accounts,
you should enable this option only if you really need it.

Communication logging in Thunderbird

You can inspect how Thunderbird interacts with your mail servers
by logging its communication with the following commands:

Enter the above commands in your command-line interface,
then open the log file
in a text editor, such as Visual Studio Code.

Post Office Protocol Version 3 (POP3)

The Post Office Protocol Version 3 (POP3)
is specified in RFC 1939.
Similar to ESMTP,
POP3 is a text-based
application-layer protocol,
which can be used with Implicit TLS or with Explicit TLS.
POP3 with Implicit TLS is also known as POP3S.
Just like SMTP,
POP3 commands consist of four letters,
and an extension mechanism was introduced after the initial release of the standard.
After authenticating the user,
POP3 allows the client to list, retrieve, and delete messages.
POP3 is designed to move messages from a remote queue into a local queue.
It doesn’t support read statuses, mailbox folders, message uploads, or partial fetches.

The following POP3 tool works in the same way as the ESMTP tool above.
Most of the remarks I made earlier therefore still apply.
In particular, I advise you to use it only with accounts created for this purpose.
The tool uses Thunderbird’s configuration database
and Google’s DNS API
to resolve the server you want to connect to.
Copy the commands in bold to your command-line interface by clicking on them.
The text in gray mimics what the responses from the server look like.
The actual responses will be different.
Each response starts with either +OK or -ERR.
The former indicates that your command was successful,
the latter indicates that an error occurred.
If necessary, you can always kill the current process
and thereby the connection by pressing ^C (control + c).
If you use Gmail, you have to enable POP3 access
in your account settings
and allow access from insecure apps.

POP3 commands

All commands are case-insensitive and must be terminated with CR+LF.
Responses spanning several lines are terminated by a period on a line of its own.
If a line starts with a period, an additional period is prepended to the line.
After user authentication, the server enumerates all messages in the inbox sorted by their date,
where 1 is assigned to the oldest message.
All message numbers are expressed in the decimal system.
The mapping between numbers and messages is valid only for the duration of the session.
To ensure that the numbers remain valid and that the messages remain available for the duration of the session,
the server locks the mailbox.
If the server fails to acquire the lock because the same mailbox is being accessed simultaneously,
it responds with -ERR to the last authentication command.
As long as the server can guarantee consistency for each client, it can allow simultaneous access.
All sizes are specified in bytes.
POP3 servers must support the following commands:

Command  Argument  Response     Description
USER     Username  –            Indicate the user whose messages shall be retrieved.
PASS     Password  –            Transmit the password to authenticate the user.
STAT     –         Count Size   Return the count and size of all messages.
LIST     [Number]  Number Size  List the size of all messages [or of the specified one].
RETR     Number    Message      Retrieve the message with the given number.
DELE     Number    –            Mark the message with the given number as deleted.
RSET     –         –            Unmark all messages that were marked as deleted.
NOOP     –         –            Do nothing besides keeping the connection alive.
QUIT     –         –            Delete the marked messages and close the connection.

The mandatory commands of POP3.
(The USER and PASS commands are strictly speaking optional.)
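
The framing rules for responses can be sketched with a small parser. It is a minimal sketch which assumes that the complete response has already been read from the connection. The function name `parse_pop3_response` and the sample responses are made up for illustration.

```python
def parse_pop3_response(raw: bytes):
    """Split a complete POP3 response into (ok, status_line, body_lines)."""
    status, *rest = raw.split(b"\r\n")
    ok = status.startswith(b"+OK")  # Anything else is treated as -ERR.
    body = []
    for line in rest:
        if line == b".":
            # A period on a line of its own terminates the response.
            return ok, status, body
        # Remove the period that was prepended to lines starting with one.
        body.append(line[1:] if line.startswith(b".") else line)
    # No terminating line found: this was a single-line response.
    return ok, status, []


# A made-up response to the LIST command.
ok, status, body = parse_pop3_response(b"+OK 2 messages\r\n1 120\r\n2 200\r\n.\r\n")
```

Python's standard library ships the `poplib` module, which takes care of this framing for you. The sketch only illustrates what happens on the wire.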

POP3 extensions

RFC 2449 defines an extension mechanism for POP3.
It introduces the CAPA command,
to which the server responds with the supported capabilities.
If a server doesn’t recognize an optional command, such as CAPA,
it responds with -ERR.
Each line in the response to the CAPA command indicates a command that the client can use
or a behavior which the client should know about:

Command  Argument  Response      Description
CAPA     –         Capabilities  List the supported capabilities.
STLS     –         –             Upgrade the connection from TCP to TLS just like STARTTLS.
TOP      Number X  Message       Return the header and the top X body lines of the specified message.
UIDL     [Number]  Number ID     List the permanent ID of all messages [or just the specified one].

Some optional commands of POP3.
These commands extend the basic functionality of POP3.
Unlike the message numbering, the IDs are guaranteed to stay the same across sessions.

The server can indicate additional behavior in its response to the CAPA command.
LOGIN-DELAY and EXPIRE allow the server to conserve its resources.
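
A client might turn the lines of a CAPA response into a lookup table like this. The sketch assumes that the terminating period has already been removed. The function name `parse_capabilities` and the sample lines are made up and don't come from a real server.

```python
def parse_capabilities(lines):
    """Map each capability name to the list of its arguments."""
    capabilities = {}
    for line in lines:
        if not line.strip():
            continue  # Skip empty lines.
        name, *arguments = line.split()
        # Upper-casing the name is a choice of this sketch for easier lookup.
        capabilities[name.upper()] = arguments
    return capabilities


# Hypothetical body of a CAPA response.
capabilities = parse_capabilities([
    "TOP",
    "UIDL",
    "STLS",
    "LOGIN-DELAY 900",
    "EXPIRE 60",
    "SASL PLAIN CRAM-MD5",
])
supports_uidl = "UIDL" in capabilities
```

Capabilities without arguments, such as TOP, indicate an available command, while lines with arguments, such as LOGIN-DELAY, describe a behavior of the server.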

Just like STARTTLS, STLS is advertised only in TCP connections.
Gmail doesn’t support POP3 with Explicit TLS
but Gandi does.

APOP authentication

To the best of my knowledge,
APOP stands for Authenticated Post Office Protocol.
It’s a challenge-response authentication mechanism similar to
CRAM-MD5 with the same properties.
Even though APOP is an optional command,
it’s not advertised in the response to the CAPA command
because a POP3 server already indicates support for the APOP command
by including the challenge in its initial greeting.
The Challenge is of the form <process-ID.clock@hostname>.
The Response is the hexadecimal encoding
of md5(Challenge + Password), where MD5
is a cryptographic hash function.
You find an example session in RFC 1939.
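
The computation of the Response can be sketched in a few lines of Python. The function name `apop_response` is made up. The challenge and the password are the ones used in the example session of RFC 1939, where the challenge includes the angle brackets.

```python
import hashlib


def apop_response(challenge: str, password: str) -> str:
    """Return the hexadecimal encoding of md5(Challenge + Password)."""
    return hashlib.md5((challenge + password).encode()).hexdigest()


# Inputs taken from the example session in RFC 1939.
response = apop_response("<1896.697170952@dbc.mtview.ca.us>", "tanstaaf")
command = f"APOP mrose {response}"
```

The client sends the username together with the Response in a single APOP command, so the password itself never crosses the wire.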

Internet Message Access Protocol (IMAP)

The Internet Message Access Protocol (IMAP)
is specified in RFC 3501.
IMAP works similarly to ESMTP
and POP3
but has many more commands and options.
An IMAP mailbox acts as a remote drive for messages instead of files,
where the drive is being shared among several clients.
IMAP allows users to create, delete, and rename folders,
to upload and move messages between them,
to mark messages as read or as flagged,
to search the mailbox remotely,
and to download messages without their attachments.

The following IMAP tool works just like the ESMTP
and POP3 tools above.
As you might mess up your mailbox or delete messages you still wanted by accident,
you should run the following commands on test accounts only.
If you want to use your real account, you do so at your own risk.
Certain commands have side effects, such as marking messages as read.
Make sure you fully understand a command before using it.
This tool also uses Thunderbird’s configuration database
and Google’s DNS API
to resolve the server you want to connect to.
Neither IMAP nor the tool is self-explanatory.
You find more information in the tooltips and the boxes below.

After the initial greeting by the server,
the client sends commands,
to which the server responds.
Since multiple commands can be in progress at the same time,
the client tags each command with a unique identifier,
such as A, B, C, or a dot (.).
The server prefixes each line of its response with *
and completes its response with a line
which starts with the tag chosen by the client.
The tag is followed by a status response:
OK for success, NO for failure, or BAD for protocol errors.
Don’t worry about reusing tags in a single session:
You can run a command repeatedly with the same tag.
If you want to fetch another message, for example,
just enter another message number and copy the generated command again.
If you use Gmail, you have to enable IMAP access
in your account settings
and allow access from insecure apps.
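
How a client matches a response to its command can be sketched as follows. The sketch assumes that the lines have already been read from the connection and that no other command is in progress. The function name `split_imap_response` and the sample lines are made up for illustration.

```python
def split_imap_response(lines, tag):
    """Return (status, text, untagged_lines) for the command with the given tag."""
    untagged = []
    for line in lines:
        if line.startswith(tag + " "):
            # The tagged line completes the response: OK, NO, or BAD.
            status, _, text = line[len(tag) + 1:].partition(" ")
            return status, text, untagged
        untagged.append(line)  # Lines prefixed with * carry the data.
    raise ValueError(f"No completion line found for tag {tag}")


# Hypothetical response to the command "A SELECT INBOX".
status, text, untagged = split_imap_response([
    "* 8 EXISTS",
    "* FLAGS (\\Seen \\Deleted)",
    "A OK SELECT completed",
], "A")
```

A client with several commands in progress would keep one such buffer per tag instead of assuming that all untagged lines belong to the same command.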

Protocol states

Most IMAP commands can be called only in certain
states.
(The same is true for POP3
but I didn’t deem it worth mentioning.)
Unless the connection has been pre-authenticated,
the IMAP protocol starts in the not-authenticated state.
Before the client can do anything else,
it has to issue a LOGIN
or AUTHENTICATE command.
(When using Explicit TLS,
the client can also send the STARTTLS request.)
While the LOGIN command is followed by the username and the password,
AUTHENTICATE can be used with any SASL mechanism
which is supported by the server.
If the user has been authenticated successfully,
the protocol enters the authenticated state.
The client has to SELECT
or EXAMINE a folder
before it can issue commands that affect existing messages.
Once in the selected state,
the client can SEARCH
and FETCH messages (among other things).
The client can issue the LOGOUT command in any state,
which takes the protocol to the logout state,
in which the server closes the connection.
If you want to inspect, modify, or delete messages in a different folder,
you can CLOSE
or UNSELECT the current folder and open another one.
The difference between these two commands is that
the former removes messages marked for deletion permanently while the latter does not.
(UNSELECT is an extension,
which can be used only if the server supports it.)
By using the new UNAUTHENTICATE command,
which not many servers support yet,
the client can authenticate as a different user without having to re-establish the TCP and TLS connection.
Here is a simplified version of the official state diagram:

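[State diagram with the following transitions:
Not authenticated → Authenticated: LOGIN, AUTHENTICATE
Authenticated → Selected: SELECT, EXAMINE
Selected → Authenticated: CLOSE, UNSELECT
Authenticated → Not authenticated: UNAUTHENTICATE
→ Logout: LOGOUT]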

The protocol states and how to transition between them.
LOGOUT can be called in any state except the logout state.
UNAUTHENTICATE can also be called in the selected state.
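
The simplified state diagram can be written down as a transition table. This sketch covers only the transitions described here, not the complete state machine of the standard, and it assumes that every command succeeds. The names `TRANSITIONS` and `next_state` are made up.

```python
# (current state, command) -> next state, following the simplified diagram.
TRANSITIONS = {
    ("not authenticated", "LOGIN"): "authenticated",
    ("not authenticated", "AUTHENTICATE"): "authenticated",
    ("authenticated", "SELECT"): "selected",
    ("authenticated", "EXAMINE"): "selected",
    ("selected", "CLOSE"): "authenticated",
    ("selected", "UNSELECT"): "authenticated",
    ("authenticated", "UNAUTHENTICATE"): "not authenticated",
    ("selected", "UNAUTHENTICATE"): "not authenticated",
}


def next_state(state: str, command: str) -> str:
    """Return the state after a successful command or raise ValueError."""
    if command == "LOGOUT" and state != "logout":
        return "logout"  # LOGOUT can be called in any other state.
    try:
        return TRANSITIONS[(state, command)]
    except KeyError:
        raise ValueError(f"{command} is not a transition in the {state} state")


# Walk through a typical session.
state = "not authenticated"
for command in ["LOGIN", "SELECT", "CLOSE", "EXAMINE", "LOGOUT"]:
    state = next_state(state, command)
```

Commands which don't change the state, such as SEARCH and FETCH, are deliberately left out of the table.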

A word on terminology:
The standard and some mail clients such as Apple Mail speak of mailboxes rather than folders.
When I speak of mailboxes, I usually refer to the mail account as a whole.
Thunderbird, on the other hand, avoids the term completely.
I mostly ignore IMAP folders and how to
CREATE,
DELETE, and
RENAME them.
The only important aspect for us is that INBOX is a
special name
and always refers to the primary folder of the user.

Data formats

While IMAP is also mostly a text-based protocol,
it’s more difficult to read and to write for humans than SMTP and POP3.
This is due to the various data formats it uses,
which are defined in section 4 of RFC 3501.
We’re interested in just three of them:

  • String:
    Strings
    are either unquoted, quoted, or prefixed with their length.
    Prefixing the length has the advantage that the string doesn’t have to be escaped.
    In particular, no periods have to be added to transmit a message.
    This technique turns IMAP into a binary protocol temporarily,
    making it difficult for humans.
    Let’s look at an example for each string variant.

    Unquoted string: Since INBOX contains no spaces, it doesn’t have to be quoted (but it can be).
    Quoted string: Since Sent Mail contains a space, it has to be quoted.
    Otherwise, Sent Mail constitutes two arguments instead of one.

    Length-prefixed string: If a string contains newline characters,
    its length in bytes has to be prefixed in curly brackets.
    Since each ASCII character
    is encoded in a single byte and newlines consist of two characters,
    namely carriage return (\r) and line feed (\n),
    Subject: Example\r\n\r\n consists of 20 bytes.
    (IMAP includes an empty line after the header and the length-prefixed string is part of a list.)
    The standard calls this the literal form.
    When a literal string is transmitted from the client to the server,
    the client has to wait for a continuation response from the server after sending the length.
    Activate Write and Append in the tool above to see an example.

  • Lists:
    Lists
    are used when a variable number of items are to be transmitted.
    A single space is used to separate adjacent items
    and the list is enclosed by parentheses.
    Lists can be nested in other lists and lists can be empty.
    Let’s look at two examples:

    Nested list: The response contains a nested list of flags.
    Empty list: When the message hasn’t been seen yet, the nested list is empty.
  • Nil:
    NIL indicates that an item doesn’t exist.
    You have to consult the formal syntax to see where NIL is allowed.
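
The three string variants can be sketched with a small encoder. It is deliberately more conservative than the formal syntax: it leaves a string unquoted only if it consists of a few obviously harmless characters, and it uses the literal form for everything that contains newlines or non-ASCII characters. The names `SAFE_UNQUOTED` and `imap_string` are made up.

```python
import string

# Conservative choice of this sketch, narrower than what the standard allows.
SAFE_UNQUOTED = set(string.ascii_letters + string.digits + "-_./")


def imap_string(value: str) -> bytes:
    """Encode a string as an unquoted, quoted, or length-prefixed string."""
    data = value.encode("utf-8")
    if "\r" in value or "\n" in value or not value.isascii():
        # Literal form: the length in bytes in curly brackets, then the data.
        return f"{{{len(data)}}}\r\n".encode() + data
    if value and value.upper() != "NIL" and all(c in SAFE_UNQUOTED for c in value):
        return data  # Unquoted string
    # Quoted string: backslashes and double quotes have to be escaped.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return b'"' + escaped.encode() + b'"'


unquoted = imap_string("INBOX")
quoted = imap_string("Sent Mail")
literal = imap_string("Subject: Example\r\n\r\n")
```

When a client sends a literal to the server, it would have to pause after the length and wait for the continuation response before sending the data, which this sketch doesn't model.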

Message numbers

Similar to POP3,
messages in IMAP can be referenced either by their position in a folder or by their unique identifier (UID):

  • Position:
    If the response to the SELECT or EXAMINE command says with 8 EXISTS that 8 messages exist,
    then 1 refers to the oldest message and 8 to the newest message.
    All numbers in between are guaranteed to refer to messages as well.
    When a message is removed from the folder,
    the position of all subsequent messages is decremented by one.
    Messages are always added at the end of the list:
    When a new message is added to the 8 existing ones,
    it can be referenced by the number 9.
  • UID:
    UIDs are numbers which are assigned in ascending order to messages.
    Unlike the position of a message, which can change within and across sessions,
    its UID is meant to stay the same.
    When a message is deleted, the UIDs of subsequent messages don’t change.
    As a consequence, UIDs are not necessarily contiguous.
    Mail clients use UIDs to synchronize flags and deletions
    of the messages they’ve already retrieved from the server.
    IMAP has a special UID command,
    which allows the client to use SEARCH,
    FETCH,
    STORE, and
    COPY with UIDs instead of positions.
    For example, clients issue the command TAG UID FETCH 1:{LastSeenUIDNEXT-1} FLAGS
    to discover changes to old messages according to the informational
    RFC 4549.
    In other words, clients find out which messages have been deleted while they were offline
    by fetching the flags for all locally stored messages from the server every time they reconnect.
    All messages whose UID is no longer in the response are then removed.
    If the UIDNEXT value in the response to EXAMINE or SELECT
    is bigger than the last time the client connected,
    the client knows that new messages arrived in the meantime.
    If the UIDVALIDITY value in the same response is bigger than the last time it connected,
    the client has to invalidate its UIDs and rebuild its database.
    Due to the overhead this causes, servers should avoid invalidating UIDs.
    However, since folders can be renamed and clients reference them by name,
    the content of a folder can change completely.
    By using the current timestamp as the UIDVALIDITY value
    whenever a folder is created or renamed,
    servers can force clients to refetch all messages in such a folder.
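
The decisions a client makes after reconnecting can be summarized in a short Python sketch. `FolderState` and `plan_sync` are hypothetical names, and `server_uids` stands for the UIDs in the response to the UID FETCH command mentioned above. As an assumption of this sketch, any change of the UIDVALIDITY value (not just a bigger one) invalidates the local copy.

```python
from dataclasses import dataclass, field


@dataclass
class FolderState:
    uid_validity: int
    uid_next: int
    uids: set = field(default_factory=set)  # UIDs of locally stored messages


def plan_sync(local: FolderState, uid_validity: int, uid_next: int,
              server_uids: set) -> dict:
    """Decide what to do after reconnecting to a folder."""
    if uid_validity != local.uid_validity:
        # All cached UIDs are meaningless now: fetch everything again.
        return {"rebuild": True, "deleted": set(), "fetch_from": 1}
    return {
        "rebuild": False,
        # UIDs which the server no longer reports were deleted remotely.
        "deleted": local.uids - server_uids,
        # New messages have UIDs starting at the previously seen UIDNEXT.
        "fetch_from": local.uid_next if uid_next > local.uid_next else None,
    }


local = FolderState(uid_validity=1, uid_next=10, uids={3, 5, 9})
print(plan_sync(local, uid_validity=1, uid_next=12, server_uids={3, 9}))
```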

Message sets

FETCH,
STORE, and
COPY operate on a
set of messages.
You can specify a single number, such as 4,
a range of numbers, such as 6:8,
or a combination thereof, such as 4,6:8.
When referencing messages by their position,
6:8 is guaranteed to select three messages as long as there are at least eight messages in the folder.
When using the UID command,
6:8 selects between zero and three messages,
depending on whether messages with UIDs in this range have been deleted.
* represents the largest number in use.
When referencing messages by their position,
* corresponds to the number of messages in the folder.
If the folder is empty, you get an error when using *.
If you want to fetch the flags of all messages,
you can use F UID FETCH 1:* (FLAGS).
If you want to fetch all new messages,
you can use F UID FETCH {LastSeenUIDNEXT}:* (FLAGS BODY.PEEK[]).
{LastSeenUIDNEXT} needs to be replaced with an actual number, of course.
(You have to replace the curly brackets with an actual value in all my examples
except when the curly brackets are used as the length prefix of a literal string.)
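
To make the difference between positions and UIDs concrete, here is a small Python sketch that expands a message set. Both function names are made up for this illustration, and the value that * stands for is assumed to be known to the caller.

```python
def expand_message_set(spec: str, largest: int) -> list:
    """Expand a set such as '4,6:8' into the numbers it covers.

    `largest` is the number that * stands for.
    """
    numbers = set()
    for part in spec.split(","):
        bounds = [largest if b == "*" else int(b) for b in part.split(":")]
        # A single number is a range of length one; 8:6 is the same as 6:8.
        numbers.update(range(min(bounds), max(bounds) + 1))
    return sorted(numbers)


def select_by_uid(spec: str, existing_uids: list) -> list:
    """With UIDs, only the numbers which still exist select a message."""
    if not existing_uids:
        return []
    wanted = set(expand_message_set(spec, max(existing_uids)))
    return [uid for uid in existing_uids if uid in wanted]


print(expand_message_set("4,6:8", largest=8))  # Positions: always 4 messages
print(select_by_uid("6:8", [2, 5, 7, 11]))     # UIDs: only UID 7 still exists
```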

Message flags

IMAP messages can be tagged with labels, which are called flags.
Most flags are persisted across sessions but some flags are applied only within a session.
Flags defined by IETF standards start with a backslash.
RFC 3501 defines the following flags:

  • \Seen: The message has been seen (i.e. read).
  • \Answered: The message has been answered.
  • \Flagged: The message is flagged for special attention.
  • \Deleted: The message is marked for deletion by
    CLOSE or
    EXPUNGE.
  • \Draft: The message is marked as a draft, i.e. it hasn’t been sent yet.
  • \Recent: The message has arrived in the folder recently.
    This flag cannot be set or removed by the client.
    If the client uses SELECT
    instead of EXAMINE,
    this flag is no longer set in later sessions.

In the response to the EXAMINE or SELECT command,
the server includes the FLAGS
which are defined in the folder.
As part of the PERMANENTFLAGS response,
the server indicates which of the flags the client can set and remove.
If the list includes \*, the client can create custom flags,
which must not start with a backslash.
The formal syntax specifies the permissible characters.

How a custom flag can be created and set if the IMAP server supports it.
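
Before trying to set a flag, a client can check the PERMANENTFLAGS list it received. The following Python sketch shows this check together with the STORE command that adds the flag. The function names, the tag F, and the flag name Important are assumptions for this example, and the comparison is case-sensitive for simplicity.

```python
def can_set_flag(flag: str, permanent_flags: list) -> bool:
    """Check whether a flag can be stored permanently in the current folder."""
    if flag in permanent_flags:
        return True
    # \* means that the client may create new flags,
    # as long as they don't start with a backslash.
    return "\\*" in permanent_flags and not flag.startswith("\\")


def add_flag_command(tag: str, message_set: str, flag: str) -> str:
    """Build the STORE command which adds a flag to the given messages."""
    return f"{tag} STORE {message_set} +FLAGS ({flag})"


permanent_flags = ["\\Seen", "\\Deleted", "\\*"]
if can_set_flag("Important", permanent_flags):
    print(add_flag_command("F", "1", "Important"))
```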

Internal date

Besides flags,
messages have other attributes as well.
One of them is the internal date,
which records when the message was received.
Mail clients can display messages with this date
instead of the sender-chosen origination date.
Since Apple Mail displays the received date instead of the sent date
even when fetching messages via POP3, which has no internal date,
it apparently relies on the Received header field in that case.
Other attributes which can be fetched
are the message size
and the body structure
of multipart messages.

How to fetch when a message was received.
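
Once a client has fetched the internal date, it has to parse it. A minimal Python sketch, assuming the date-time format of RFC 3501 and using a sample value rather than a real response:

```python
from datetime import datetime, timezone


def parse_internal_date(value: str) -> datetime:
    """Parse a date-time such as '17-Jul-1996 02:44:25 -0700'."""
    # Days below 10 can be padded with a space instead of a zero.
    return datetime.strptime(value.strip(), "%d-%b-%Y %H:%M:%S %z")


received = parse_internal_date("17-Jul-1996 02:44:25 -0700")
print(received.astimezone(timezone.utc).isoformat())
```

Because the value includes the time zone offset, the result can be converted to the user’s local time for display.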

IMAP commands

Some of the commands used in the above tool benefit from additional information.
This is what you should know about them:

  • EXAMINE vs.
    SELECT:
    Both commands open a folder in order to search and fetch the messages in it.
    The difference is that EXAMINE opens the folder in read-only mode,
    while SELECT also allows the client to change and delete messages.
    This is made visible in the response line which starts with the tag:
    It contains either [READ-ONLY] or [READ-WRITE].
  • SEARCH:
    Saving a search result for later operations requires the SEARCHRES extension.
    If your IMAP server doesn’t support it,
    you have to search without the RETURN (SAVE) part: S SEARCH {Criterion}.
    The server then returns the positions
    of all the messages that match the criterion: * SEARCH 2 5 8.
    Search criteria can also be combined: S SEARCH {Criterion1} {Criterion2} {etc.}.
    IMAP also supports the logical operators NOT and OR
    besides the implicit “and”: NOT {Criterion} and OR {Criterion1} {Criterion2}.
    As you can see, the query language of IMAP is quite powerful.
  • FETCH:
    The first argument to the FETCH command is a set of messages.
    If the server supports saving the search result with RETURN (SAVE),
    you can alternatively reference the search result with the dollar sign.
    The second argument is a list of the data attributes you want to fetch.
    The difference between BODY[{Section}] and BODY.PEEK[{Section}] is that
    the former sets the \Seen flag while the latter does not.
    You can use either one to fetch the desired section
    of the specified messages.
  • STORE:
    The STORE command allows the client to alter the flags of a message.
    Similar to FETCH, the first argument is either
    a message set or $ for a search result.
    After that, you can replace the flags of the messages with FLAGS ({NewFlags}),
    add additional flags to the existing flags with +FLAGS ({FlagsToAdd}),
    or remove some flags from the existing flags with -FLAGS ({FlagsToRemove}).
    Messages are deleted by flagging them as \Deleted and then using the
    CLOSE or
    EXPUNGE command.
    The former also closes the folder and takes you back to the authenticated state,
    whereas the latter doesn’t do that.
  • APPEND:
    Mail clients use this command to store sent messages in the user’s mailbox.
    Since the target folder is specified in the first argument,
    this command can be used from the authenticated state.
    Besides the flags you want to set on the appended message,
    you can also specify the internal date in an optional third argument.
    The fourth argument is the message that you want to append,
    which has to be transmitted as a length-prefixed string.
    Since counting bytes manually is a hassle,
    the tool does the counting for you when you enable Write and Append.
    You can edit the used message in the ESMTP tool above.
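
The way search criteria are combined can be illustrated with a few Python helpers. The helper names and the tag S are assumptions for this sketch. The helpers only concatenate strings and don’t validate the criteria. A criterion that consists of several combined criteria would have to be wrapped in parentheses before it is passed to `either` or `negate`.

```python
def all_of(*criteria: str) -> str:
    """Criteria separated by spaces are combined with an implicit 'and'."""
    return " ".join(criteria)


def either(first: str, second: str) -> str:
    """OR is written before its two operands."""
    return f"OR {first} {second}"


def negate(criterion: str) -> str:
    return f"NOT {criterion}"


def parse_search_response(line: str) -> list:
    """Extract the message numbers from a line such as '* SEARCH 2 5 8'."""
    parts = line.split()
    if parts[:2] != ["*", "SEARCH"]:
        raise ValueError("not a SEARCH response")
    return [int(number) for number in parts[2:]]


query = all_of("UNSEEN", either("FROM alice", "FROM bob"), negate("DELETED"))
print(f"S SEARCH {query}")
print(parse_search_response("* SEARCH 2 5 8"))
```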

IMAP extensions

Given the importance of IMAP in the email ecosystem,
there are numerous extensions for it.
You can query which extensions a server supports
with the CAPABILITY command.
You see an example when you enable the Search or the Idle option in the tool above.
Before using the enabled commands,
make sure that your server supports the listed extensions.
Issuing C CAPABILITY is often not necessary
since many IMAP servers list their capabilities automatically
in their response to the LOGIN command.
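
A client can turn the capability line into a set and check it before using an extension. The following Python sketch assumes an untagged response of the form `* CAPABILITY …`; the function names are made up for this example.

```python
def parse_capabilities(line: str) -> set:
    """Parse a line such as '* CAPABILITY IMAP4rev1 IDLE ESEARCH'."""
    words = line.split()
    if words[:2] != ["*", "CAPABILITY"]:
        raise ValueError("not a CAPABILITY response")
    # Capability names are compared case-insensitively.
    return {word.upper() for word in words[2:]}


def missing_extensions(capabilities: set, required: set) -> list:
    """Return the required extensions which the server doesn't advertise."""
    return sorted(required - capabilities)


capabilities = parse_capabilities("* CAPABILITY IMAP4rev1 IDLE ESEARCH")
print(missing_extensions(capabilities, {"ESEARCH", "SEARCHRES"}))
```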

The most important extensions to IMAP are (ignoring the ones for
internationalization,
such as support for UTF-8):

  • IMAP4REV1 (RFC 3501):
    By listing this among its capabilities,
    a server indicates that it supports IMAP version 4 revision 1 as published in 2003.
    IMAP4rev1 is the protocol we’ve been discussing in this article.
    A second revision,
    which adds most of the extensions mentioned here to the core protocol,
    was published as RFC 9051 in 2021.
    The changes to the first revision are listed in
    its appendix.
  • STARTTLS (RFC 2595):
    This extension allows the client to upgrade the connection from TCP to TLS with . STARTTLS.
    You have to use telnet {ServerDomain} 143 to see this capability listed by the server.
    (143 is IMAP’s port for Explicit TLS.)
  • SASL-IR (RFC 4959):
    If the server has this capability,
    the client can append its initial SASL response
    to the AUTHENTICATE command,
    which saves one round trip.
    Example: . AUTHENTICATE PLAIN {Base64EncodingOfUsernameAndPassword}.
  • ENABLE (RFC 5161):
    While CAPABILITY allows the server to list the extensions it supports,
    the ENABLE command allows the client to list the extensions it supports.
    This allows the server to send unsolicited responses defined by these extensions.
  • ID (RFC 2971):
    For improving bug reports and assembling usage statistics,
    it’s useful to know which implementation of the protocol the other party uses.
    The ID command allows the client to send a list of key-value pairs to the server
    and receive a list of key-value pairs in return.
    Some keys are specified in the RFC
    but any string of at most 30 bytes can be used as a key.
    For example, a client can send TAG ID ("name" "ef1p") to the server
    and receive * ID ("name" "Dovecot") in return.
  • IDLE (RFC 2177):
    Instead of regularly polling the server for changes,
    a client can instruct the server with the IDLE command
    to transmit changes to the current folder in real time.
    You can enable Idle in the tool above to see an example.
    As long as the TCP connection between the client and the server remains open,
    the client is notified about new messages immediately.
    In order to avoid timeouts due to inactivity,
    the client can send the NOOP command,
    which does nothing, from time to time.
  • ESEARCH (RFC 4731):
    ESEARCH is an extension to the SEARCH and UID SEARCH commands,
    which allows the client to choose between several result options
    by issuing . SEARCH RETURN ({Options}) {Criteria}.
    The options are MIN to return the position or UID
    of the first message in the folder which satisfies the criteria,
    MAX to return the position or UID of the last message in the folder which satisfies the criteria,
    COUNT to return the number of messages in the folder which satisfy the criteria,
    and ALL to return the numbers of all messages which satisfy the criteria.
    When using the ALL option, the messages are returned in the set syntax
    instead of the space-separated enumeration of all messages.
    For example, a client can query how many messages are flagged with TAG SEARCH RETURN (COUNT) FLAGGED.
  • SEARCHRES (RFC 5182):
    SEARCHRES is an extension to the ESEARCH extension.
    Any server which supports SEARCHRES also has to support ESEARCH.
    SEARCHRES adds the result option SAVE,
    which tells the server to save the search result for later use instead of returning it.
    The client can reference the search result with $ in the
    FETCH, STORE, and some other commands.
    One advantage of this is that the client doesn’t have to wait for the search result
    before it can submit a subsequent command.
  • UIDPLUS (RFC 4315):
    UIDPLUS adds the command UID EXPUNGE
    and additional response codes,
    which inform the client about the UID of an appended or copied message.
    This is useful for clients to synchronize with servers more efficiently.
  • CONDSTORE (RFC 4551):
    CONDSTORE is by far the biggest extension in this list.
    It introduces MODSEQ as an additional message attribute
    and HIGHESTMODSEQ as an additional response to the EXAMINE and SELECT commands.
    MODSEQ works like UID but instead of assigning a permanent, strictly increasing number to each message,
    it assigns a permanent, strictly increasing number to each message modification.
    By remembering the HIGHESTMODSEQ value to which they synchronized,
    clients can use the extended STORE or UID STORE commands
    to modify messages on the server only if no other client modified them in the meantime.
    (CONDSTORE stands for conditional STORE.)
    CONDSTORE also extends other commands.
    For example, clients can use the CHANGEDSINCE modifier
    to fetch changes to messages more efficiently.
    Instead of fetching the flags of all messages every time they connect,
    clients can fetch the flags of just the messages which changed since the last time:
    TAG UID FETCH 1:* (FLAGS) (CHANGEDSINCE {LastSeenHIGHESTMODSEQ}).
    Unfortunately, clients can’t detect message deletions like this.
  • QRESYNC (RFC 5162):
    QRESYNC extends CONDSTORE to allow for quick mailbox resynchronization but it’s rarely supported.
    By remembering the UIDs of expunged messages with the corresponding MODSEQ value,
    servers can inform clients efficiently about deleted messages.
    Both CONDSTORE and QRESYNC were updated in RFC 7162.
    In the absence of QRESYNC,
    clients can perform a “binary” search
    to find the first message whose position changed.
    Clients need to do this only when the EXISTS count from the server is different from the local count
    after adding all the newly arrived messages.
    Clients can retrieve the UIDs of several messages at once
    by issuing TAG UID SEARCH {Position1},{Position2},{etc.}.
  • CHILDREN (RFC 3348):
    This extension allows the server to indicate in its response to the
    LIST command
    whether a folder \HasChildren or \HasNoChildren.
    This allows the client to display a folder as expandable
    without having to query for potential children with additional requests.
  • SPECIAL-USE (RFC 6154):
    Folders often have a specific purpose such as storing sent or deleted messages.
    This extension allows clients to inform each other about the special use of a folder
    without having to rely on specific names for the folders and
    without having to ask the user where to store specific messages.
    The defined purposes are
    \All, \Archive, \Drafts, \Flagged, \Junk, \Sent, and \Trash.
    The purpose can be set when creating a new folder
    and is returned in the response to the LIST command.
    Gmail supports this extension and its response is roughly what you see in the tool above.
  • NAMESPACE (RFC 2342):
    This extension introduces a NAMESPACE command,
    which allows clients to discover the namespaces of personal folders and of shared folders.
    My understanding is that this is mostly used in corporate settings.
  • MOVE (RFC 6851):
    This extension defines the commands MOVE
    and UID MOVE to move messages from one folder to another.
    When MOVE is not supported, clients have to COPY
    messages to another folder and then delete the copied messages in the old folder with
    STORE and
    EXPUNGE.
    This is inefficient for both the client and the server
    and can lead to undesirable side effects.
  • QUOTA (RFC 2087):
    This extension allows clients to get and set the storage quota of their mailbox.
    For example, when using Q GETQUOTA "" in my Gmail test account,
    I get * QUOTA "" (STORAGE 145 15728640).
    The first number is the current usage in kibibytes,
    the second number is the resource limit in the same unit:
    15728640 KiB are exactly 15 GiB,
    which matches the 15 GB of free storage as advertised by Google.
    When using Q SETQUOTA "" (STORAGE 1000), I get Q NO [CANNOT] Permission denied. (Failure).
    "" denotes the so-called quota root, which allows different folders to share the same resource limit.

In addition to the extensions which are standardized by IETF,
mailbox providers are free to define their own extensions.
According to the IMAP standard, the name of an experimental or independent extension
has to start with an X.
For example, Gmail’s custom extension
is advertised as X-GM-EXT-1 in the response to the
CAPABILITY command.
Among other things, it allows clients to use
Gmail’s search syntax
and Gmail’s message ID.
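
Coming back to the binary search mentioned under QRESYNC: since UIDs are ascending and messages can only disappear, every position before the first deletion still has its old UID, while every position after it has a larger one. This makes the comparison monotone, which is what a binary search needs. The following Python sketch is one possible way to implement it. `uid_at` is a hypothetical callback that asks the server for the UID at a position, and the server is simulated with a list.

```python
from typing import Callable


def first_changed_position(local_uids: list, server_count: int,
                           uid_at: Callable[[int], int]) -> int:
    """Return the first position whose UID differs from the local copy.

    `local_uids` lists the locally known UIDs in ascending order and
    `server_count` is the EXISTS count without newly arrived messages.
    """
    low, high = 1, min(len(local_uids), server_count)
    while low <= high:
        middle = (low + high) // 2
        if uid_at(middle) == local_uids[middle - 1]:
            low = middle + 1   # Nothing was deleted up to this position.
        else:
            high = middle - 1  # A deletion happened at or before it.
    return low


# Simulated server: the message with UID 3 was deleted in the meantime.
server_uids = [1, 2, 5, 8, 9]
print(first_changed_position([1, 2, 3, 5, 8, 9], len(server_uids),
                             lambda position: server_uids[position - 1]))
```

After removing the local message at the returned position, the client can repeat the search until the local count matches the count on the server.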

JSON Meta Application Protocol (JMAP)

Over the last forty years, email in general and
IMAP in particular
have become a patchwork of extensions.
Given the complexity and the varying support of these extensions,
writing a mail client is much more difficult than it should be.
While there are efforts to
unify the patchwork somewhat,
there has also been a fresh start over the last couple of years.
An IETF working group
designed a modern protocol for client to server interaction:
The JSON Meta Application Protocol (JMAP).
JSON itself stands for JavaScript Object Notation,
which is a popular format for storing and exchanging human-readable data.
JMAP is specified in RFC 8620
and it can be used for more than just email.
The data model for synchronizing email is specified in RFC 8621.
If you don’t like the RFC formatting, you can also read the two standards
here and here.

JMAP is designed to be interoperable with IMAP mailboxes
and thus shares the concepts of folders and flags with IMAP.
The protocol itself, however, is completely new
and addresses the following shortcomings of IMAP (and message submission):

  • Permanent identifiers:
    JMAP servers assign permanent identifiers to all objects.
    In the case of messages, these identifiers can no longer be invalidated
    and they no longer change when a message is moved from one folder to another.
    In the case of folders, JMAP clients can detect when a folder has been renamed
    and no longer need to fetch all the messages in it again.
  • Efficient synchronization:
    JMAP provides a simple method
    for getting the identifiers of created, updated, and destroyed messages and folders.
    As we have seen above, synchronizing a mailbox with IMAP is easy only
    if you stay connected to the server, which isn’t an option for mobile clients.
  • Push mechanism:
    In order to be informed immediately about changes to a folder, such as newly arrived messages,
    IMAP clients use the IDLE command.
    If they want to be informed about changes to several folders,
    they have to open a separate connection for each folder.
    JMAP, on the other hand, allows clients
    to subscribe to all changes on the server at once.
    Clients which can keep a connection to the server open can subscribe via the
    EventSource interface.
    Other clients, such as those on mobile phones, can register a
    callback URL,
    which allows them to use their platform-specific
    push technology.
  • Batching of chained commands:
    When the IMAP server doesn’t support certain extensions such as SEARCHRES,
    IMAP clients often need to wait for the response to one command
    before they can construct the followup command.
    JMAP allows clients to batch several commands
    and to reference the results
    from earlier commands in the same request.
    Doing so avoids round trips
    and makes updates more atomic
    (i.e. it becomes less likely that only some of the issued commands are being executed).
  • Widespread data format:
    JMAP data doesn’t have to be encoded as JSON.
    Future standards can specify other data formats.
    The same is true for the transport protocol:
    While JMAP currently uses HTTPS as its transport protocol,
    other protocols can be added in the future.
    The choice of JSON and HTTPS is mostly due to their widespread adoption:
    There are suitable libraries for all relevant programming languages
    and software engineers know how to use those.
    It’s worth mentioning that JMAP doesn’t wrap binary data in JSON.
    Binary data is exchanged in separate HTTP requests.
  • Complexity on server:
    JMAP moves the complexity of handling email’s message format from the client to the server.
    While clients can still fetch the raw message if needed,
    for example when implementing end-to-end security,
    the server has to deal with multipart messages,
    content encodings, line-length limits, etc.
    Clients can download and upload messages as a
    simple JSON object.
    Please note that this affects neither how messages are stored on servers
    nor how they are relayed to others.
    It just relieves programmers who want to integrate email
    from having to take care of encoding and decoding messages correctly.
  • Message submission:
    The previous point makes sense only if clients can also
    submit messages for delivery in the same format.
    If the JMAP server supports submission,
    a client can instruct it
    to send a stored message to its recipients.
    The client can generate the envelope itself or let the server do it.
    By first storing the message as a draft and then moving it to the sent folder after sending it
    (see this example),
    JMAP also solves the double-submission problem.
  • Flood control:
    Since it’s not always possible to anticipate how much data the server will send back,
    JMAP lets clients restrict the size of responses.
    This feature is especially valuable on devices with limited bandwidth or expensive roaming.
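
The batching of chained commands is easiest to understand with an example. The following Python sketch builds a JMAP request in which the second method call references the result of the first one, based on my reading of RFC 8620 and RFC 8621. The account identifier is a hypothetical value, and the request is only printed, not sent to a server.

```python
import json

ACCOUNT_ID = "u123"  # Hypothetical account identifier

request = {
    "using": ["urn:ietf:params:jmap:core", "urn:ietf:params:jmap:mail"],
    "methodCalls": [
        # First call: find up to ten flagged messages.
        ["Email/query", {
            "accountId": ACCOUNT_ID,
            "filter": {"hasKeyword": "$flagged"},
            "limit": 10,
        }, "c1"],
        # Second call: fetch some properties of the messages found by c1.
        # The "#ids" argument is a reference to the result of the first call.
        ["Email/get", {
            "accountId": ACCOUNT_ID,
            "#ids": {"resultOf": "c1", "name": "Email/query", "path": "/ids"},
            "properties": ["subject", "receivedAt"],
        }, "c2"],
    ],
}

print(json.dumps(request, indent=2))
```

The limit in the first call also illustrates flood control: the client decides how many results it is willing to receive.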

Support for JMAP is still quite rare,
which is not surprising given that the standard was published only in 2019.
We have yet to see whether it will become a relevant protocol for accessing one’s mailbox.
I certainly hope so, but email is really resistant to innovation.

Email filtering

It can be useful to filter incoming messages
according to custom rules.
For example, you may want to move certain messages to a certain folder,
mark certain messages as read, or delete certain messages automatically.
Most mail clients allow their users to configure such rules,
which are executed when the mail client receives a new message.
There are several advantages of filtering incoming mail on the server rather than on the client, though:

  • Synchronization:
    If the filtering rules are stored on the incoming mail server,
    they can be inspected and edited through any of the user’s mail clients.
    Otherwise, users have to remember on which client they’ve created the rule that they want to modify now.
  • No race conditions:
    If the filtering rules are stored on a mail client,
    then the rules are not applied when this mail client is offline.
    In this situation, other mail clients see unfiltered messages.
    If these mail clients apply rules of their own,
    you might run into race conditions,
    where the order in which clients see incoming messages determines the outcome of the filtering.
  • Rules for absence:
    Some rules, such as sending out-of-office replies,
    shall run precisely when all mail clients are offline.
    This is not possible when the rules must be executed by mail clients.
  • Rejection during delivery:
    Unlike clients, incoming mail servers can reject a message
    during its delivery.
    By sending the 550 response code
    during the SMTP session,
    the incoming mail server can inform the sender about the rejection
    without causing backscatter with bounce messages.

To achieve server-side filtering,
we need a standardized mail filtering language
and a standardized filter management protocol.

[Figure: Mail client of sender → Outgoing mail server of sender → Incoming mail server of recipient → Offline mail client of recipient / Online mail client of recipient]

How a message is delivered from the mail client of the sender to the mail clients of the recipient.
Messages can be filtered by the incoming mail server (in green) or by an online mail client (in blue).

Mail filtering language (Sieve)

Sieve
is a language for filtering messages on the incoming mail server.
It is specified in RFC 5228 and it is fairly simple:
Using the control commands if, elsif, and else,
you can specify under which conditions
a specific action shall be applied.
You can find plenty of examples throughout the RFC as well as
here
and here.
There are just a couple of things you should know to understand them:

  • Arguments: Most commands in the Sieve language take arguments.
    Mandatory arguments are determined by their position,
    optional arguments are identified by a
    colon followed by their name.
    Some optional arguments can take arguments themselves: :name value.
    This is similar to arguments in the command-line interface
    but with : instead of - before the name.
    When optional arguments are not provided, their default values are used instead.
  • Extensions: The Sieve language is extensible.
    A script has to list the extensions which it uses at the top of its code
    with require.
  • Implicit keep: Each message is stored in the inbox
    unless it is moved to a folder, forwarded to an address, or discarded explicitly.
  • String lists: Wherever a list of strings
    is expected, such as ["To", "Cc"], a string without brackets, such as "To", can be used.
  • Prefix notation: Commands and arguments are nested not with parentheses
    but by earlier tokens consuming later ones.
    For example, the negation of the condition exists "Date" is not exists "Date".
    This is similar to the prefix notation.
  • Comments: If you use # outside of double quotes,
    the incoming mail server ignores all characters
    including this one until the end of the line.
    Comments
    which span less than or more than a line have to be enclosed in /* and */.
  • No loops: The Sieve language doesn’t support loops.
    Each block is executed once or not at all.

You can generate simple filtering rules with the following tool.
Make sure that the Argument makes sense for the chosen Action.
Move requires the name of a folder, Forward an email address,
Flag the name of a flag, and Reply the text of the reply.
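
How such a rule can be generated is sketched in the following Python code. This is not the code behind the tool but an illustration under my own assumptions: the mapping from the four actions to Sieve commands and extensions is my choice, and the condition is limited to a substring match on a single header field.

```python
def sieve_string(text: str) -> str:
    """Quote a string for Sieve, escaping backslashes and double quotes."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


# Action -> (Sieve command, extension which has to be required or None)
ACTIONS = {
    "move": ("fileinto", "fileinto"),
    "forward": ("redirect", None),
    "flag": ("addflag", "imap4flags"),
    "reply": ("vacation", "vacation"),
}


def sieve_rule(header: str, substring: str, action: str, argument: str) -> str:
    """Generate a script with a single rule."""
    command, extension = ACTIONS[action]
    lines = []
    if extension:
        lines.append(f"require [{sieve_string(extension)}];")
    condition = f"header :contains {sieve_string(header)} {sieve_string(substring)}"
    lines.append(f"if {condition} {{")
    lines.append(f"    {command} {sieve_string(argument)};")
    lines.append("}")
    return "\n".join(lines)


print(sieve_rule("Subject", "Invoice", "move", "Finance"))
```

Messages which don’t match the condition are stored in the inbox due to the implicit keep.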

Users don’t have to learn the Sieve language.
Mail clients can offer a graphical user interface (GUI)
similar to the tool above, where users don’t have to see the generated code.
You find a list of all the extensions to the Sieve mail filtering language
on Wikipedia.

Among the reasons
for sending an automatic response to the sender of a message are:

  • Vacation notice: Inform the sender that the message won’t be read in the coming days.
  • Change-of-address notice: Inform the sender that the recipient’s email address has changed.

Prior to JMAP,
where servers can support
the configuration of vacation responses,
Sieve and ManageSieve
with the vacation extension
were the only standardized way to configure such responses.
According to RFC 3834,
the same response should be sent to the same sender only once within a period of several days
even when the sender sends additional messages.
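
The rule that the same sender receives the response only once per period can be sketched in Python as follows. The class name and the period of seven days are assumptions for this example, and comparing addresses case-insensitively is a simplification. A real server would also have to persist this state.

```python
from datetime import datetime, timedelta


class AutoResponder:
    """Remembers when each sender received the automatic response."""

    def __init__(self, period_days: int = 7):
        self.period = timedelta(days=period_days)
        self.last_reply = {}  # Sender address -> time of the last response

    def should_reply(self, sender: str, now: datetime) -> bool:
        key = sender.lower()
        previous = self.last_reply.get(key)
        if previous is not None and now - previous < self.period:
            return False  # This sender has been informed recently.
        self.last_reply[key] = now
        return True


responder = AutoResponder()
monday = datetime(2022, 12, 5, 9, 0)
print(responder.should_reply("alice@example.com", monday))
print(responder.should_reply("alice@example.com", monday + timedelta(days=2)))
```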

A simple Sieve script for an automatic vacation response, which I’ve adapted
from Gandi.

Support by mailbox providers and mail clients

Unfortunately, none of the big free mailbox providers support Sieve.
If you pay for your mailbox, though, chances are that you can use the Sieve language
since it is implemented by the most popular mail servers.
Providers with Sieve support include Fastmail,
mailbox.org,
Proton Mail,
and Gandi.
Other mailbox providers support server-side filters with proprietary rules through their web interface.
One example is Gmail:

Gmail users can go to the Filters and Blocked Addresses tab of their settings and click on “Create a new filter”. While Gmail has an API for managing filters, other mail clients won’t support such a proprietary protocol.

You might struggle more to find a suitable mail client.
When it comes to desktop clients, there’s basically just a
plugin for Thunderbird.
If you’re willing to use a web client, Roundcube has you covered as well.

Filter management protocol (ManageSieve)

ManageSieve is a protocol for managing Sieve scripts remotely.
It is specified in RFC 5804
and works similarly to the protocols we have seen so far.
After an initial greeting from the server,
the client sends commands to which the server responds.
Just like IMAP,
responses are completed with a line which starts with OK or NO;
but unlike IMAP, the commands are not preceded with a tag.
Just like IMAP, multiline strings are prefixed with their length;
but unlike IMAP, the client can include a plus
to continue with the string without having to wait for a continuation response from the server.
Just like SMTP for Relay,
there’s no variant of ManageSieve which can be used with Implicit TLS.
The server sends its capabilities
automatically in its greeting and after successful
STARTTLS and
AUTHENTICATE commands.
As part of the capabilities, the server indicates which extensions to the Sieve language
and which SASL mechanisms it supports.
According to RFC 5804,
ManageSieve servers have to support PLAIN over TLS
and SCRAM-SHA-1.
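
The length prefix with the plus sign can be illustrated with the PUTSCRIPT command, which uploads a script to the server. The following Python sketch builds the bytes a client would send. The function name is made up, and the script name is assumed to contain neither double quotes nor backslashes.

```python
def putscript_command(name: str, script: str) -> bytes:
    """Build a PUTSCRIPT command with a length-prefixed script."""
    data = script.encode("utf-8")
    # The plus after the length tells the server that the client continues
    # with the string without waiting for a continuation response.
    prefix = b"{" + str(len(data)).encode("ascii") + b"+}\r\n"
    return b'PUTSCRIPT "' + name.encode("utf-8") + b'" ' + prefix + data + b"\r\n"


print(putscript_command("vacation", "keep;\r\n"))
```

The length counts bytes rather than characters, which is why the script is encoded before it is measured.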

The following tool shows you how to use the ManageSieve commands
from your command-line interface.
Unlike the previous tools,
you have to configure the address and the port number of the server manually
as this information is not included in Thunderbird’s configuration files.
The standard describes how to locate the ManageSieve server
with SRV records,
and the autoconfiguration tool above does query the _sieve._tcp subdomain.
However, since virtually no one configures such SRV records (at least not for the ManageSieve protocol),
I didn’t bother to implement this discovery mechanism here.
ManageSieve servers listen on port 4190 by default.
The Thunderbird plugin, which I mentioned earlier,
simply probes this port
on the IMAP server
in order to configure itself.

Important: Since LibreSSL doesn’t support the ManageSieve STARTTLS command,
you have to use OpenSSL
(see the boxes below).

Explanation: While you can have multiple scripts on the server,
at most one of them can be active.
You cannot delete the active script.
You can deactivate the active script by activating
another script or by using an empty script name to set no script active.


You can also generate the argument to PLAIN yourself
with echo -ne '\000username\000password' | openssl base64.
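
If you prefer Python over the command line, the same argument can be computed as follows. The function name is made up for this example. The bytes before the first null byte, which are left empty here, are the optional authorization identity of the PLAIN mechanism.

```python
import base64


def plain_argument(username: str, password: str) -> str:
    """Encode the credentials for the PLAIN mechanism."""
    # Empty authorization identity, then the username and the password,
    # each of them preceded by a null byte.
    raw = b"\0" + username.encode("utf-8") + b"\0" + password.encode("utf-8")
    return base64.b64encode(raw).decode("ascii")


print(plain_argument("username", "password"))
```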

LibreSSL doesn’t support ManageSieve

With the -starttls option,
you tell openssl for which protocol you want to start TLS.
The openssl command is provided by two different implementations:
OpenSSL supports ManageSieve,
whereas LibreSSL doesn’t.
If you provide -starttls sieve, OpenSSL executes
this code.
Can’t we use one of the other protocol options to let LibreSSL send
STARTTLS to the server?
The answer is no, unfortunately:

  • IMAP:
    LibreSSL first issues . CAPABILITY to check whether the server supports STARTTLS.
    ManageSieve servers ignore this as an invalid command.
    LibreSSL then tries to initiate TLS anyway and sends . STARTTLS.
    Since the ManageSieve protocol doesn’t use tags,
    this line fails to achieve what we want.
  • POP3:
    Using -starttls pop3 doesn’t work because POP3 clients use
    STLS instead of STARTTLS to upgrade the connection.
  • SMTP:
    Using -starttls smtp looks like it should work, but it fails as well.
    LibreSSL first sends the EHLO command,
    which is ignored by ManageSieve servers as an invalid command.
    Continuing anyway, LibreSSL sends STARTTLS to the server and doesn’t check the response,
    which is exactly what we were looking for.
    Unfortunately, the upgrade still fails for a reason I couldn’t determine.
    If you know why, please let me know.

Of all the protocols we have seen so far,
no two of them initiate TLS in the same way.
Thus, if you want to use the ManageSieve protocol from the command line,
you have to install OpenSSL.
You can check what you have with openssl version.

How to install OpenSSL on macOS

The easiest way to install OpenSSL on macOS is with Homebrew.
You can check whether Homebrew is already installed with:

If this is not the case, you can install Homebrew with:

Afterwards, you can install OpenSSL with:

By default, OpenSSL is installed in the following location without replacing the preinstalled LibreSSL:


Format

The format of an email message is specified in RFC 5322.
The goal of this chapter is to make you comfortable reading raw messages.

How to display the raw message

Mail clients don’t display all header fields by default.
Here is how you can display the raw message as it arrived in your mailbox:

  • Gmail:
    Open a message, click on ⋮ in the upper right corner, then on “Show original”.
  • Yahoo:
    Open a message, click on ⋯ in the bottom middle, then on “View raw message”.
  • Outlook:
    • Web: Click on ⋯ in the upper right corner, then on “View” and “View message source”.
    • Desktop: Double-click a message, click on the “File” menu and then select “Properties”.
  • Thunderbird:
    • Raw message: Select a message, click on the “More” button and then “View Source” (or use ⌘U).
    • All header fields: Click on the “View” menu, then on “Headers” and “All” (or on “Normal” to go back).
  • Apple Mail:
    • Raw message: Click on the “View” menu, then on “Message” and “Raw Source” (or use the shortcut ⌥⌘U).
    • All header fields:
      Click on the “View” menu, then on “Message” and “All Headers” (or use the shortcut ⇧⌘H).
    • Change preferences:
      In the “Viewing” tab of the preferences, you can configure which header fields are displayed.

File format

Since messages, including attachments, are just text,
they can be stored as simple text files.
A common filename extension
for emails is .eml.
Such files can be viewed with any text editor.
Desktop clients usually have an option to save a message as a file,
and among Web clients, at least Gmail allows you to download a message in the “⋮” menu,
which is located in the upper right corner.

Storage format

For their own purposes, mail clients can store messages in whatever format they want.
The two formats which are used by several mail clients and servers to store messages
are Mbox and Maildir.
By default, Thunderbird uses the former but it can also be configured to use the latter.
The Mbox format is specified in RFC 4155.
All messages are appended in their raw format to a single file.
Mbox is a text-based format,
which means that a given string, namely From …, is used to delimit the messages
and that occurrences of this string in messages have to be escaped.
Storing all the messages in a single file is not ideal
as it might easily get corrupted if it’s not properly locked
while reading from and writing to it.
Additionally, this format is inappropriate for backup systems
that copy the complete file and not just the differences
when the content of a file has changed.
Thunderbird stores the messages at
~/Library/Thunderbird/Profiles/{RandomString}.default/ImapMail/{MailServer} on macOS.
If you use another operating system, you find the storage location on
this page.
This directory contains two files for each of your mailbox folders.
For example, you should have a large INBOX file and a much smaller INBOX.msf file,
which is used to index the messages in the former file.
(MSF stands for mail summary file.)
You can use the tail command
to display the specified number of lines of the last message that you’ve received:
tail -n 100 INBOX.
Unless you want to transfer all your messages to a new computer,
you shouldn’t move or modify such files as this likely causes problems for your mail client.
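If you want to inspect an Mbox file programmatically rather than with tail, Python’s standard library ships a mailbox module which handles the From delimiter lines for you. The following sketch builds a small demo file in a temporary directory and lists the subjects in it. The helper names and the demo messages are made up for this illustration, and you should point such code only at a copy of your real INBOX file, never at the file your mail client is using.

```python
import mailbox
import os
import tempfile
from email.message import EmailMessage

def list_subjects(mbox_path):
    """Return the Subject of every message stored in an Mbox file."""
    box = mailbox.mbox(mbox_path, create=False)
    try:
        return [message.get("Subject", "") for message in box]
    finally:
        box.close()

def build_demo_mbox(mbox_path):
    """Create a small Mbox file with two hypothetical messages."""
    box = mailbox.mbox(mbox_path)
    try:
        for subject in ("First message", "Second message"):
            message = EmailMessage()
            message["From"] = "alice@example.org"
            message["Subject"] = subject
            # The body starts with "From ", which the module has to escape.
            message.set_content("From here on, this is the body.\n")
            box.add(message)
        box.flush()
    finally:
        box.close()

demo_path = os.path.join(tempfile.mkdtemp(), "INBOX")
build_demo_mbox(demo_path)
print(list_subjects(demo_path))
```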

Apple Mail storage format

Similar to Maildir,
Apple Mail stores each message in a separate file at ~/Library/Mail/.
The format used is proprietary, and there’s no official documentation about it,
but it’s fairly easy to reverse engineer.
After a folder with the version number of the format, V7 in my case,
Apple Mail generates a folder for each of the added email accounts
with a Universally Unique Identifier (UUID) as its name.
Inside these account folders, Apple Mail generates a folder ending with .mbox
for each of the IMAP folders,
such as INBOX.mbox, Sent Messages.mbox, and so on.
These mailbox folders contain another folder with a UUID,
which finally contains the Data folder with the actual messages in further folders.
Put together, the folder nesting is as follows: ~/Library/Mail/V7/{UUID}/INBOX.mbox/{UUID}/Data.

Apple Mail enumerates the messages with a single counter across all your accounts.
It uses the filename extensions
.emlx for messages without attachments and .partial.emlx for messages with attachments.
In these emlx files, Apple Mail prepends the length of the message in bytes to the raw message
and appends a property list with additional information.
It’s a text-based format that you can open with any text editor.
The messages are stored in a Messages folder inside the Data folder with their number used as their name.
For example, you might have …/Data/Messages/123.emlx.
If the message contains attachments,
Apple Mail removes the attachments (at least in most cases)
and stores them separately in an Attachments folder.
For example, if message 123 has an attachment,
the message is stored at …/Data/Messages/123.partial.emlx
and its attachment at …/Data/Attachments/123/{Position}/Filename.pdf.
The Position encodes where the attachment was included
in the message’s multipart hierarchy.
In an effort to limit the number of files to 1’000 per folder,
Apple Mail creates subfolders when the message number becomes larger than 999.
For example, message 1234 is stored at …/Data/1/Messages/1234.emlx,
message 12345 at …/Data/2/1/Messages/12345.emlx,
and message 123456 at …/Data/3/2/1/Messages/123456.emlx.
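The layout described above can be sketched in a few lines of Python. Since the format is undocumented, everything here is an assumption derived from the description and the three example paths: the function names are hypothetical, the folder rule is simply extrapolated from those examples, and real files may contain details which this sketch doesn’t handle.

```python
import plistlib

def emlx_folder(message_number):
    """Return the folder below Data which should contain the message."""
    thousands = str(message_number // 1000)  # 123456 -> "123"
    if thousands == "0":
        return "Messages"
    # The digits above the last three appear in reverse order: 3/2/1.
    return "/".join(reversed(thousands)) + "/Messages"

def parse_emlx(data):
    """Split the bytes of an .emlx file into raw message and property list."""
    first_line, _, rest = data.partition(b"\n")
    length = int(first_line.strip())  # Length of the raw message in bytes
    raw_message = rest[:length]
    properties = plistlib.loads(rest[length:].lstrip())
    return raw_message, properties

for number in (123, 1234, 12345, 123456):
    print(number, emlx_folder(number))
```

The raw message returned by parse_emlx could be written to an .eml file as is, which is essentially what conversion tools have to do.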

Please note that you have to give the Terminal
full disk access in the “System Preferences” under “Security & Privacy” and then “Privacy”
if you want to access the ~/Library/Mail/ folder from the command line because of the
System Integrity Protection (SIP) of macOS.
With full disk access enabled, you can find the message with a particular number
with find ~/Library/Mail/ -name '1234.*emlx'.
If you need to convert .emlx files back to .eml files,
for example to migrate them to a different mail client or mailbox provider,
you may want to have a look at this project.

Line-length limit

According to RFC 5322,
each line of a message may consist of at most 1’000 ASCII characters,
including CR + LF.
Implementations are free to accept longer lines,
but since some implementations cannot handle longer lines,
you shouldn’t send them.
The RFC even recommends limiting lines to 80 characters (78 characters plus CR + LF)
to accommodate clients that truncate longer lines in violation of the standard.
In order to leave the line wrapping to the mail client of the recipient,
the mail client of the sender has to encode the body
if the body contains lines which are too long.
If a header field is too long,
it must be broken into several lines with
folding whitespace:
{CR}{LF} followed by at least one space or tab.
If a line in the header section of a message starts with whitespace,
its content belongs to the header field on the previous line.
The procedure of breaking lines as done by the sender is called folding,
the procedure of joining lines as done by the recipient is called unfolding.
When unfolding, runs of whitespace characters are replaced with a single
space character.
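Unfolding as described above can be sketched as follows. The function name and the sample header section are made up for this illustration, and the sketch replaces the line break together with the whitespace that follows it with a single space.

```python
import re

def unfold(header_section):
    """Join continuation lines and return one string per header field."""
    # A line break followed by spaces or tabs continues the previous field.
    unfolded = re.sub(r"\r\n[ \t]+", " ", header_section)
    return [line for line in unfolded.split("\r\n") if line]

folded = (
    "Subject: This is a long subject\r\n"
    " which has been folded\r\n"
    "\t onto three lines\r\n"
    "From: alice@example.org\r\n"
)
for field in unfold(folded):
    print(field)
```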

Message identification

There are three header fields
to identify the current message and the previous messages
in the same thread:

  • Message-ID: The Message-ID identifies the current message.
    Its format is <{Value}@{Domain}>.
    Although outgoing mail servers may add this field
    if it’s missing, the Message-ID should be chosen by the mail client.
    Otherwise, the copy stored in the sent folder
    on the incoming mail server lacks this field, which defeats its purpose.
    Whoever chooses the Message-ID should make sure that it’s unique.
    Mail clients often choose the Value as a universally unique identifier (UUID)
    and the Domain as the domain part of the user’s email address.
    The sender has to decide whether two messages are the same and thus share the same Message-ID.
    If the client generates different versions of the same message due to Bcc recipients,
    it should use the same Message-ID for all of them.
  • In-Reply-To: If a user replies to a message,
    the Message-ID of the replied-to message is put into the In-Reply-To header field.
  • References: While In-Reply-To refers only to the direct parent message,
    the References field lists the Message-IDs of all ancestor messages,
    including the direct parent message.
    This is useful to reconstruct a conversation even
    if not all intermediary messages were sent to you.
    Clients compose this field by adding the Message-ID of the replied-to message
    to the References of the replied-to message.
    When determining which messages belong to the same thread,
    clients use additional heuristics,
    such as comparing the Subject line after stripping common prefixes,
    to avoid grouping messages where a person replies to a message
    just to send an unrelated message to the sender of the message.
Message-ID:  <64F0D157-FD89-4858-9589-9BDD22870B22@example.org>
In-Reply-To: 
References:  
             
An example of what the three message identification header fields look like.
The References field contains the message ID of the In-Reply-To field.

Mandatory header fields

According to RFC 5322,
only two header fields must be included in every message:
the From field and the Date field.
While not strictly mandatory, every message should have a Message-ID,
and every reply should have an In-Reply-To and a References header field
if the replied-to message had a Message-ID.
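How a mail client might fill in the three identification fields of a reply can be sketched like this. The function names are hypothetical, and choosing an uppercase UUID as the Value is just one common convention, not a requirement.

```python
import uuid
from email.message import EmailMessage

def new_message_id(domain):
    """Generate a Message-ID of the form <{Value}@{Domain}>."""
    return f"<{str(uuid.uuid4()).upper()}@{domain}>"

def create_reply(parent, domain):
    """Return a new message whose fields point back to the parent message."""
    reply = EmailMessage()
    reply["Message-ID"] = new_message_id(domain)
    parent_id = parent["Message-ID"]
    if parent_id:
        reply["In-Reply-To"] = str(parent_id)
        # References of the parent plus the Message-ID of the parent.
        references = str(parent.get("References", ""))
        reply["References"] = f"{references} {parent_id}".strip()
    return reply

parent = EmailMessage()
parent["Message-ID"] = "<second@example.org>"
parent["References"] = "<first@example.org>"
reply = create_reply(parent, "example.org")
for name in ("Message-ID", "In-Reply-To", "References"):
    print(f"{name}: {reply[name]}")
```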

Quoting the previous message

It’s a common practice to quote the text of the original message in the reply,
but this is completely optional, and the format for doing so isn’t standardized.
Most mail clients prefix quoted lines
with the greater-than sign in a text-based response and wrap the quoted text in a
<blockquote> element when using HTML.
Modern clients typically display quoted text with a vertical bar.

Text of the current message
> First line of the parent message
> Second line of the parent message
>> Quoted text in the parent message

How text is usually quoted at different nesting levels.
Quoting the text allows you to reply below each paragraph
of the original message.

If mail clients quote the message to which you reply,
they also add an attribution line,
which mentions the author and the date of the original message.
Quoting text with > is mentioned only in RFC 1849
and RFC 3676,
the former being related to Usenet rather than traditional email.
One problem of quoting text with > is that
this can cause lines to exceed the imposed length limit.
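Since the format isn’t standardized, the following is just one plausible way to produce the quoting shown above. The function name and the wording of the attribution line are assumptions for this illustration.

```python
def quote_reply(original, attribution=""):
    """Quote the lines of the original message for a plain-text reply."""
    quoted = []
    for line in original.splitlines():
        # Lines which are already quoted get one more ">" without a space,
        # which produces ">>" at the second nesting level.
        separator = "" if line.startswith(">") else " "
        quoted.append((">" + separator + line).rstrip())
    lines = ([attribution] if attribution else []) + quoted
    return "\n".join(lines)

parent = "First line of the parent message\n> Quoted text in the parent message"
print(quote_reply(parent, "On 3 Dec 2020, Alice wrote:"))
```

Each nesting level makes every quoted line one or two characters longer, which is how quoted lines can end up exceeding the length limit.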

Universally Unique Identifier (UUID)

Universally Unique Identifier (UUID)
is a standard for generating globally unique identifiers without coordination among the involved parties.
The standard has been published by various organizations,
including IETF in RFC 4122.
A universally unique identifier is a 128-bit number,
which is encoded as 32 hexadecimal digits
with 4 hyphens inserted at fixed positions.
The format of UUIDs is XXXXXXXX-XXXX-AXXX-FXXX-XXXXXXXXXXXX,
where the four bits of A encode the used algorithm,
and the first one to three bits of F encode the used format.
Please note that I use A and F as variable names here.
All the bits, including the actual values of A and F, are encoded as hexadecimal digits.
X stands for four bits. How those bits are determined depends on A and F.

The format F is in binary either
0xxx for the original format,
10xx for the RFC format, or
110x for Microsoft’s format.
111x is reserved for a potential future format,
and the lowercase x stands for a single bit.
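To see where A and F sit in an actual identifier, you can pick the two hexadecimal digits out of the string and compare the result with what Python’s uuid module reports. The function name is made up for this sketch, and the example value is the Message-ID from the previous section.

```python
import uuid

def describe_uuid(identifier):
    """Return the algorithm number A and the name of the format F."""
    digits = identifier.replace("-", "")
    algorithm = int(digits[12], 16)             # The digit A
    format_bits = f"{int(digits[16], 16):04b}"  # The four bits of F
    if format_bits.startswith("0"):
        name = "original"
    elif format_bits.startswith("10"):
        name = "RFC"
    elif format_bits.startswith("110"):
        name = "Microsoft"
    else:
        name = "reserved"
    return algorithm, name

example = "64F0D157-FD89-4858-9589-9BDD22870B22"
print(describe_uuid(example))
parsed = uuid.UUID(example)
print(parsed.version, parsed.variant)
```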

When using the RFC format, the algorithm A, which is used to determine the remaining bits, is one of the following:

Trace information

According to RFC 5321,
whenever a mail server receives a message,
it must add a Received header field at the beginning of the message
without changing or deleting already existing Received header fields.
Received header fields have the following format:

Received: from {EhloArgument} ({DnsReverseLookup} {IpAddressOfClient})
    by {DomainNameOfServer}
    with {Protocol}
    id {SessionId}
    for {AddressOfRecipient};
    {DayOfWeek}, {Day} {Month} {Year} {Hour}:{Minute}:{Second} {TimeZone}

The format of Received header fields.
The curly brackets stand for values which need to be inserted.
The with, id, and for clauses are optional.
The newlines can be in other places, and additional information is often added
as comments in parentheses in various places.

According to RFC 5321,
the Protocol is either SMTP or ESMTP.
RFC 3848 specified additional values:
ESMTPA when ESMTP is used
with successful user authentication,
ESMTPS when ESMTP is used with Implicit or Explicit TLS,
and ESMTPSA when the session has been secured and the user has been authenticated.
RFC 8314 specifies an additional tls clause,
which can be used after the for clause to record the
TLS ciphersuite
which has been used.
Gmail adds such information as a comment instead:
(version=TLS1_2 cipher=ECDHE-ECDSA-CHACHA20-POLY1305 bits=256/256).
Checking the Received header fields of a received message gives you an idea
whether the message was secured during transport.
Note, however, that Received header fields are not authenticated:
The mail servers through which a message passes can change the Received header fields
that were added by mail servers through which the message already passed.
In addition, not all mail servers might support the newer protocol values,
and relays over a private network are often not protected with TLS.
A message typically has at least four Received header fields,
which makes sense only when you look at the official architecture
instead of the simplified architecture.
A Received header field is added by the mail submission agent (MSA), the outgoing mail transfer agent (MTA),
the incoming mail transfer agent (MTA), and the mail delivery agent (MDA).
Here is a Received header field, which was added by my outgoing mail server:

Received: from [192.168.1.2] (unknown [203.0.113.167])
    (Authenticated sender: kaspar@ef1p.com)
    by relay12.mail.gandi.net (Postfix) with ESMTPSA id 7974D200009
    for ; Thu, 3 Dec 2020 14:14:48 +0000 (UTC)

What an actual Received header field looks like.
I’ve only replaced my IP address with an address reserved for documentation.
Due to Network Address Translation (NAT),
the private IP address
that my computer used in the EHLO command
and the public IP address that the outgoing mail server saw were different.
Since my public IP address doesn’t have a reverse DNS entry,
the server recorded the name of the client as unknown.
(Authenticated sender: kaspar@ef1p.com) is a comment,
which the server added to indicate which user submitted the message.
(Postfix) is also a comment, indicating the name of the server implementation.
Everything else matches the format which I’ve described above.
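If you want to extract these clauses programmatically, a rough sketch could look as follows. It is deliberately naive: it assumes that comments aren’t nested, that the date follows the last semicolon, and that each clause consists of a keyword followed by a single value. The function name is made up, and the address in the for clause is a hypothetical placeholder.

```python
import re
from email.utils import parsedate_to_datetime

def parse_received(value):
    """Extract the clauses and the date from a Received header field value."""
    without_comments = re.sub(r"\([^()]*\)", "", value)
    clauses, _, date = without_comments.rpartition(";")
    clauses = " ".join(clauses.split())  # Unfold and normalize whitespace
    result = {"date": parsedate_to_datetime(date.strip())}
    for keyword in ("from", "by", "with", "id", "for"):
        match = re.search(rf"(?:^|\s){keyword}\s+(\S+)", clauses)
        if match:
            result[keyword] = match.group(1)
    # Protocol values which indicate that the session was secured with TLS.
    result["tls"] = result.get("with") in {"ESMTPS", "ESMTPSA", "LMTPS", "LMTPSA"}
    return result

received = (
    "from [192.168.1.2] (unknown [203.0.113.167])\r\n"
    "    (Authenticated sender: kaspar@ef1p.com)\r\n"
    "    by relay12.mail.gandi.net (Postfix) with ESMTPSA id 7974D200009\r\n"
    "    for <recipient@example.com>; Thu, 3 Dec 2020 14:14:48 +0000 (UTC)"
)
print(parse_received(received))
```

As the field isn’t authenticated, the output tells you only what the servers claimed, not what actually happened.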

An incoming mail server which delivers a message
must add the MAIL FROM address of the envelope
in a Return-Path header field to the message.
While a message can have several Received header fields,
it may have at most one Return-Path header field.
If a message is resubmitted, for example by a filtering rule,
the Return-Path header field should be removed,
and its value should be used as the MAIL FROM address.
As we discussed earlier,
the Return-Path header field can be different from the From header field.

Return-Path: 
What a Return-Path header field looks like.

Recover why you received a message

Since the Bcc recipients are usually removed from the message
even for the Bcc recipients themselves,
mail clients don’t know whether a message has been forwarded
or whether the user was a hidden recipient
if the user’s address is not listed among the recipients.
By inspecting the Received header fields,
mail clients could easily distinguish between the two scenarios in most cases:
If the message has been forwarded,
there should be a Received header field with one of the recipient addresses in the for clause.
Recovering the address through which a message has been forwarded to your mailbox
could be useful for filtering incoming messages into different folders automatically.
And as we discussed earlier,
mail clients shouldn’t offer a reply-to-all option for messages
where the user was a Bcc recipient
as this would leak what the sender tried to hide by using the Bcc field.

Local Mail Transfer Protocol (LMTP)

The Local Mail Transfer Protocol (LMTP)
is a variant of the Extended Simple Mail Transfer Protocol (ESMTP),
in which the server can reject an incoming message for each recipient individually.
In the case of ESMTP, the server can send only a single reply after the message has been transferred.
If the message can be delivered to some of the recipients but not all of them,
the ESMTP server has to queue the message in order to deliver it to the pending recipients at some later point.
In the case of LMTP, the server has to confirm the acceptance of the message for each recipient
which was provided with the RCPT TO command.
Being able to reject a message for individual recipients frees LMTP servers from having to manage a mail queue.

LMTP is specified in RFC 2033 and may be used only in a local network.
LMTP uses LHLO instead of EHLO to greet the server.
I’ve included LMTP only because you might encounter it as LMTP[S][A]
in the with clause of a Received header field.
LMTP also pops up in other places,
for example in the code
to which I linked earlier.

Content encoding

RFC 5322 specifies a format for text messages,
whose lines may consist of at most 1’000 ASCII characters.
Whenever the content of a message doesn’t fulfill this requirement,
it must be encoded according to the
Multipurpose Internet Mail Extensions (MIME)
as specified in RFC 2045.
When mail clients encode messages according to MIME,
they indicate this with the following header field:

MIME-Version: 1.0
The header field used to indicate that a message is formatted using MIME.

In theory, the version number allows the Internet community to make changes to the standard.
In practice, however, the standard didn’t specify
how mail clients are supposed to handle messages with an unknown MIME version.
As a consequence, you cannot change the version number without breaking email communications,
which makes this header field completely useless.
The version 1.0 survived the last 30 years and will likely survive the next 30 years.
MIME also introduced additional message header fields,
which we’ll cover in this and the following subsections.

Unless all involved SMTP servers support the BINARYMIME extension
as specified in RFC 3030, which is rarely the case,
content containing non-ASCII characters or lines longer than 1’000 characters
must be encoded with one of the following two methods:

  • Quoted-Printable:
    Any byte which doesn’t represent a printable ASCII character
    is encoded with the equality sign
    followed by the value of the byte encoded as two hexadecimal digits.
    Since = is used as the escape character,
    it has to be encoded with its hexadecimal ASCII value as =3D.
    Lines may be at most 78 characters long, including {CR}{LF}.
    Longer lines have to be broken by inserting ={CR}{LF}.
    All sequences of these three characters are removed when decoding the Quoted-Printable encoding.
    Since some mail servers add or remove trailing whitespace,
    tabs and spaces which are followed by {CR}{LF} also need to be encoded with hexadecimal digits.
    Any sequence of bytes can be encoded with this method.
    However, the Quoted-Printable encoding makes sense only
    if most of the bytes are printable ASCII characters.
    This is the case for those European languages which share most of their characters
    with the English alphabet.
    Texts in such languages remain largely readable when using the Quoted-Printable encoding.
    The probability that a random byte falls into the range of printable ASCII characters
    is just a bit bigger than one third, though.
    Thus, the size of binary data, such as images, more than doubles with this encoding.
    The following tool allows you to encode and decode Quoted-Printable:

  • Base64:
    Binary data and non-Western-European languages are best encoded with Base64.
    While hexadecimal digits encode 4 bits each, Base64 digits encode 6 bits each.
    6 bits can represent 2^6 = 64 different values.
    Base64 uses the characters A – Z, a – z, 0 – 9, +, and / to encode these 64 values.
    What makes the Base64 encoding special is that bytes and digits don’t align:
    Three bytes are encoded with four Base64 digits.
    If you shift the input by one or two bytes, the Base64 encoding looks completely different.
    If the size of the input is not a multiple of three,
    one or two equality signs are appended to the output
    in order to make the output a multiple of four.
    This procedure is known as padding.
    In order to respect the line-length limit,
    a line break is inserted after at most 76 Base64 characters.
    Base64 encoding increases the size of the content by 33%
    and the line breaks add another 2.6% on top of that.
    You can encode and decode Base64 with the following tool:

The mail client of the sender informs the mail client of the recipient with the following header field
that the content is encoded:

Content-Transfer-Encoding: {Value}

This header field indicates with which method the content of the message has to be decoded.


The Value can be quoted-printable, base64, or 7bit if no content encoding has been used.


(If the 8BITMIME or BINARYMIME extensions are supported, the value can also be 8bit or binary.)

If the message already consists of only printable ASCII characters,
the line-length limit can also be achieved with soft line breaks.
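Both encodings are available in Python’s standard library, which makes it easy to experiment with them. The following sketch uses the quopri and base64 modules; the sample inputs are arbitrary. Note that encodebytes separates the lines with a single line feed rather than {CR}{LF}.

```python
import base64
import quopri

text = "Zürich = Zurich".encode("utf-8")

# Quoted-Printable: the two bytes of ü and the equality sign are escaped.
encoded = quopri.encodestring(text)
print(encoded)
print(quopri.decodestring(encoded).decode("utf-8"))

# Base64: three bytes become four digits, shorter inputs are padded with =.
for data in (b"a", b"ab", b"abc"):
    print(data, base64.b64encode(data))

# encodebytes inserts a line break after at most 76 Base64 characters.
binary = bytes(300)
wrapped = base64.encodebytes(binary)
print(len(binary), len(base64.b64encode(binary)), len(wrapped))
```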

Character encoding (charset)

The character encoding
determines how each character is encoded as a sequence of zeros and ones.
I’ve already covered this in the previous article.
What we’re interested in now is how this affects the Quoted-Printable and Base64 encodings.
The most popular character encodings are
ASCII,
ISO-8859-1,
and UTF-8.
ASCII encodes each character with 7 bits,
leaving the 8th bit in each byte unused.
Its set of characters (charset for short) includes only the English alphabet.
ISO-8859-1 extends ASCII with characters used in Western European languages,
such as à, á, â, ã, ä, å, and the like.
Each ISO-8859-1 character is encoded with 8 bits.
UTF-8, on the other hand, encodes all the code points
defined by Unicode with 1, 2, 3, or 4 bytes.
In both ISO-8859-1 and UTF-8, a byte whose first bit is zero encodes an ASCII character.
For non-ASCII characters, UTF-8 needs at least two bytes.
Therefore, if all the characters in a message can be encoded with ISO-8859-1,
the Quoted-Printable and the Base64 encodings are shorter
if the input string is encoded with ISO-8859-1 rather than with UTF-8.
You can verify this with the tool above: The Quoted-Printable encoding of the
inverted exclamation mark
is =A1 when using ISO-8859-1 and =C2=A1 when using UTF-8.
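The same comparison can be made in Python. The helper name and the second sample string, which consists of the six accented letters mentioned above, are choices made for this sketch.

```python
import base64
import quopri

def encoded_lengths(text, charset):
    """Return the sizes of the text in the given character encoding."""
    raw = text.encode(charset)
    return {
        "bytes": len(raw),
        "quoted-printable": len(quopri.encodestring(raw)),
        "base64": len(base64.b64encode(raw)),
    }

for charset in ("iso-8859-1", "utf-8"):
    print(charset, quopri.encodestring("¡".encode(charset)))
    print(charset, encoded_lengths("àáâãäå", charset))
```

For this sample, both content encodings are half as long when the input is encoded with ISO-8859-1.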

[Figure: ASCII (7 bits), ISO-8859-1 (8 bits), UTF-8 (8 to 32 bits)]

ASCII encodes only a subset of the characters defined in ISO-8859-1,
and ISO-8859-1 encodes only a fraction of the characters available in UTF-8.

Percent encoding (URL encoding)

If you ever did some web development,
you might have encountered the Percent encoding.
It’s used to encode arbitrary data in a
Uniform Resource Identifier (URI),
such as a Uniform Resource Locator (URL).
The Percent encoding is specified in RFC 3986,
and it works similarly to the Quoted-Printable encoding.
The difference is that the Percent encoding has a longer list of
reserved characters
and that the percent sign
is used as the escape character instead of the equality sign.
Additionally, whitespace,
including newlines, has to be encoded, and Percent encoding is usually used on UTF-8 strings.
I’ve included this box and the following tool mostly for the sake of completeness.
The only place where you might find Percent-encoded strings in emails
is in links of HTML messages.
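In Python, the quote and unquote functions of the urllib.parse module implement this encoding. In the following sketch, safe="" asks quote to also encode the slash, which it leaves alone by default; the sample strings are arbitrary.

```python
from urllib.parse import quote, unquote

# Non-ASCII characters are encoded as UTF-8 and then escaped byte by byte.
print(quote("Zürich", safe=""))

# Whitespace, newlines, and the percent sign itself have to be escaped.
print(quote("100% sure\n", safe=""))

# Decoding reverses the process.
print(unquote("Z%C3%BCrich"))
```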

Decoding on the command line

If you use POP3 or IMAP
to fetch messages from your command-line interface,
you likely also want to decode the received messages on the command line.
The following commands read the string to encode or decode from their
standard input
and write the encoded or decoded string to their
standard output.
This allows you to use the commands both in pipelines,
such as echo -n 'input' | {Command},
and with files,
such as {Command} < input.txt > output.txt.

How to encode (-e) and decode (-d) Quoted-Printable with qprint.
Use brew install qprint to install this command on macOS.

How to encode (-e) and decode (-d) Quoted-Printable with the
quoted-printable
package if you already have Node.js.

How to encode (-e) and decode (-d) Base64 with
OpenSSL.
Use the option -A to have no newline characters inserted or expected.

How to encode and decode Quoted-Printable, Base64, and Percent with Perl,
which is likely preinstalled on your computer.
You can use explainshell.com to learn more about the used options.
The code uses the MIME::QuotedPrint,
MIME::Base64, and
URI::Escape modules.

How to encode and decode Quoted-Printable, Base64, and Percent with
Python,
which is likely preinstalled on your computer.


The commands use the quopri,
base64, and
urllib.parse modules
and the first four commands operate on the raw bytes.


If you want to use a character encoding other than UTF-8,
you can just save the file which you use as the input accordingly.

RFC 2047 specifies how one can use non-ASCII characters
in certain header field values,
such as the subject and the display names.
Instead of introducing new header fields to specify the encoding of existing header fields,
encodings in header fields indicate which character encoding
and which content encoding has been used.
This results in the so-called Encoded-Word encoding.
Its format is as follows: =?{CharacterEncoding}?{ContentEncoding}?{EncodedText}?=,
where CharacterEncoding is usually either ISO-8859-1 or UTF-8,
ContentEncoding is either Q for Quoted-Printable or B for Base64,
and EncodedText is the field value encoded according to the previous parameters.
The Quoted-Printable encoding is slightly modified when used to encode header field values:
Question marks, tabs, and underlines are escaped with their hexadecimal representation
and spaces are encoded with underlines.
In order to adhere to the line-length limit,
whitespace between adjacent Encoded Words is removed completely,
which allows the encoder to break long words with a newline
(and also to mix different character encodings).
The following tool does all of that for you.
It uses Quoted-Printable or Base64 depending on which encoding is shorter,
and it supports only ISO-8859-1 and UTF-8.
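Python’s email.header module can decode and encode such Encoded Words as well. The helper name and the sample values are made up for this sketch, and which content encoding the module chooses when encoding is up to the library.

```python
from email.header import Header, decode_header, make_header

def decode_encoded_words(value):
    """Decode a header field value which may contain Encoded Words."""
    return str(make_header(decode_header(value)))

# Q encoding with ISO-8859-1 and with UTF-8 (underlines stand for spaces).
print(decode_encoded_words("=?ISO-8859-1?Q?Z=FCrich?="))
print(decode_encoded_words("=?UTF-8?Q?Gr=C3=BC=C3=9Fe_aus_Z=C3=BCrich?="))

# B encoding of the same city name.
print(decode_encoded_words("=?UTF-8?B?WsO8cmljaA==?="))

# Whitespace between adjacent Encoded Words is dropped when decoding.
print(decode_encoded_words("=?UTF-8?Q?Z=C3=BC?= =?UTF-8?Q?rich?="))

# Encoding in the other direction.
print(Header("Grüße aus Zürich", "utf-8").encode())
```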

In case you haven’t noticed yet: The ESMTP tool above
automatically encodes the Subject and the Body if necessary.
If you want to use non-ASCII characters in display names,
you have to paste the Encoded Word into the address field yourself.
The following boxes explain how non-ASCII characters are supported in
domain names,
which is really interesting but also fairly advanced.

Punycode encoding

Punycode is yet another encoding
of Unicode with ASCII characters.
While domain names may consist of arbitrary bytes,
many protocols require that the domain names of servers
contain only letters, digits, and hyphens (LDH) from the ASCII character set.
This is known as the preferred name syntax,
and (E)SMTP is one of the protocols
which uphold the LDH rule.
In order to remain backward compatible and to require no changes to the
DNS infrastructure,
domain names with non-ASCII characters have to be encoded with just ASCII letters, digits, and hyphens.
Punycode is an encoding which does exactly that.
It is specified in RFC 3492
and it tries to be as space-efficient as possible.
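If you want to see the result of the encoding before working through its steps, Python ships a punycode codec. The following sketch simply encodes and decodes the example used in this box.

```python
# Encode a Unicode string with Python's built-in punycode codec.
encoded = "Zürich".encode("punycode")
print(encoded)

# Decoding restores the original string.
print(encoded.decode("punycode"))

# A string without non-ASCII characters still gets the separating hyphen.
print("Zurich".encode("punycode"))
```

The ASCII characters come first, followed by a hyphen and the encoded position information for the removed characters.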

Punycode encodes Unicode strings in three steps:

  1. Remove and sort the non-ASCII characters: In the first step,
    the Punycode encoder removes all non-ASCII characters from the string which is to be encoded.
    For example, Zürich, the city in which I live, becomes Zrich.
    Since ü is the only non-ASCII character, there is nothing to sort.
    If there were several non-ASCII characters,
    the encoder would have to sort them according to their Unicode
    code point.
  2. Determine the deltas: In the second step,
    the encoder determines how many iterations the decoder has to do nothing
    before inserting the non-ASCII characters back in.
    The decoder loops through the positions of the current string and through all Unicode characters.
    In the first iteration,
    the decoder would add the first non-ASCII code point at the first position.
    Since ASCII uses all 7-bit numbers from 0 to 127,
    the first non-ASCII code point is 128.
    In the second iteration,
    the decoder would add the character with the code point 128 at the second position, and so on.
    Once the decoder reaches the last position of the string,
    it goes back to the first position to potentially insert the next higher code point there.
    Let’s look at our example again.
    There are six positions where a character might be inserted: ₁Z₂r₃i₄c₅h₆.
    The first (and only) character to insert is ü with the
    code point 252.
    The decoder has to loop through the string 252 – 128 = 124 times
    before it is ready to insert the character ü.
    Since the string has six positions and we want to insert the ü at the second position,
    the decoder has to do nothing for 124 · 6 + 1 = 745 iterations
    before inserting the current character ü at the current position 2.
    If there were more characters to insert,
    there would now be seven positions for doing so.
    The decoder would continue from its current state (ü at position 2)
    and skip again the number of iterations as specified by the encoder.
    The number of skipped iterations is called “delta”.
    The result of this second step is a delta for each non-ASCII character
    which needs to be inserted into the string of ASCII characters as determined by step one.
    While Zrich [745] decodes to Zürich,
    Zrich [745, 0] decodes to Züürich,
    Zrich [745, 1] decodes to Zürüich,
    Zrich [745, 2] decodes to Züriüch, and so on.
  3. Encode the deltas: In the third step,
    the encoder encodes the list of numbers from the second step with letters and digits.
    A hyphen is used to separate the encoded deltas from the string of ASCII characters from step one.
    To make the encodings as compact as possible,
    Punycode encodes the deltas without a delimiter between them.
    It uses variable-base integers
    with a variable termination threshold instead.
    Since domain names in the preferred name syntax are case-insensitive,
    the case of the letters must not matter for the encoding of the deltas.
    The letters a to z represent the decimal numbers 0 to 25
    and the digits 0 to 9 represent the decimal numbers 26 to 35.
    Unlike all the numbers you’re used to,
    the positions of Punycode numbers get more significant to the right.
    Each position has its own threshold and its own base.
    If a digit at a position is below the threshold there,
    it marks the end of the current number.
    Let’s imagine, for a moment, that we use only the digits 0 to 9
    and a fixed threshold value of 5.
    Counting then works as follows: 0, 1, 2, 3, 4
    (so far, each number has been terminated by the digit being below the threshold,
    but from now on we need an additional digit to terminate the number), 50, 60, 70, 80, 90
    (we cannot go back to a digit below 5 in the first position as this would terminate the number
    so we choose the base of the second position to be 10 minus the threshold of the first position),
    51 (5 · 1 + 1 · 5 = 10), 61 (6 + 5 = 11), 71, 81,
    91 (9 + 5 = 14), 52 (5 · 1 + 2 · 5 = 15), and so on.
    After 94 comes 550 and after 990 (9 · 1 + 9 · 5 = 54) comes 551 (5 · 1 + 5 · 5 + 1 · 25 = 55).
    The base in the third position is determined by multiplying the base in the second position
    with the number of available symbols minus the threshold in the second position (5 · (10 – 5) = 25).
    The base in the fourth position will be 25 · (10 – 5) = 125, and so on.
    The higher the threshold value, the more likely it is
    that you don’t need an additional digit to terminate the number.
    On the other hand, a higher threshold value means that the base at the next position is lower.
    This in turn means that less progress is made in the next position
    and an additional position might be needed.
    Punycode sets the threshold as the position times the number of symbols minus the current bias
    and limits all thresholds to a certain range.
    The bias is 72 initially and the range for thresholds is 1 to 26.
    Thus, the threshold at position 1 is max(1, 1 · 36 – 72) = 1,
    the threshold at position 2 is max(1, 2 · 36 – 72) = 1, and
    the threshold at position 3 is min(26, 3 · 36 – 72) = 26 initially.
    The bias is adapted after each delta because
    the current delta indicates the likely size of the next delta.
    In our example, 745 is encoded as kva.
    Since k stands for 10, v for 21, and a for 0,
    10 · 1 + 21 · 35 + 0 · (35 · 35) indeed equals 745.
    Since the threshold is always at least 1,
    a always terminates the current delta.
    While Zrich-kva decodes to Zürich,
    Zrich-kvaa decodes to Züürich,
    Zrich-kvab decodes to Zürüich,
    Zrich-kvac decodes to Züriüch, and so on.
    The bias is adapted to 0 after the first delta,
    which makes the threshold at the first position min(26, 1 · 36 – 0) = 26.
    This means that Zrich-kvaz is a valid encoding while Zrich-kva0 is not
    because the 0 (representing 26) needs to be terminated with an a: Zrich-kva0a.
    You can try all of this yourself with the following tool.
    The domain name option is explained in another box.

Warning: The domain option is a very crude approximation
of the standard.
Use the official utility when correctness matters!
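
The decoding procedure described in steps two and three can be sketched in JavaScript as follows. This is a minimal sketch, not a reference implementation: it assumes the parameters of RFC 3492 (base 36, thresholds between 1 and 26, initial bias 72, skew 38, damp 700), the function names are chosen for illustration, and overflow checks as well as the validation of the ASCII part are omitted.

```javascript
// Parameters as defined in RFC 3492.
const BASE = 36, T_MIN = 1, T_MAX = 26, SKEW = 38, DAMP = 700;
const INITIAL_BIAS = 72, INITIAL_N = 128;

// Adapts the bias after each delta so that the thresholds fit the likely size of the next delta.
function adapt(delta, numPoints, firstTime) {
  delta = Math.floor(delta / (firstTime ? DAMP : 2));
  delta += Math.floor(delta / numPoints);
  let k = 0;
  while (delta > Math.floor(((BASE - T_MIN) * T_MAX) / 2)) {
    delta = Math.floor(delta / (BASE - T_MIN));
    k += BASE;
  }
  return k + Math.floor(((BASE - T_MIN + 1) * delta) / (delta + SKEW));
}

// a to z represent 0 to 25, 0 to 9 represent 26 to 35; the case of letters is ignored.
function digitValue(char) {
  const c = char.toLowerCase();
  if (c >= 'a' && c <= 'z') return c.charCodeAt(0) - 97;
  if (c >= '0' && c <= '9') return c.charCodeAt(0) - 48 + 26;
  throw new RangeError(`Invalid digit: ${char}`);
}

function punycodeDecode(encoded) {
  // The last hyphen separates the ASCII characters from the encoded deltas.
  const delimiter = encoded.lastIndexOf('-');
  const output = delimiter > 0 ? Array.from(encoded.slice(0, delimiter)) : [];
  let pos = delimiter > 0 ? delimiter + 1 : 0;
  let n = INITIAL_N;       // Code point to insert next
  let i = 0;               // Position in the output
  let bias = INITIAL_BIAS;
  while (pos < encoded.length) {
    const oldI = i;
    let weight = 1;
    // Read one delta: the first digit is the least significant one.
    for (let k = BASE; ; k += BASE) {
      if (pos >= encoded.length) throw new RangeError('Unterminated delta');
      const digit = digitValue(encoded[pos++]);
      i += digit * weight;
      const threshold = Math.min(T_MAX, Math.max(T_MIN, k - bias));
      if (digit < threshold) break;  // A digit below the threshold ends the number.
      weight *= BASE - threshold;
    }
    const length = output.length + 1;
    bias = adapt(i - oldI, length, oldI === 0);
    n += Math.floor(i / length);     // Each full pass through the string increases the code point.
    i %= length;
    output.splice(i++, 0, String.fromCodePoint(n));
  }
  return output.join('');
}

console.log(punycodeDecode('Zrich-kva'));   // Zürich
console.log(punycodeDecode('Zrich-kvab'));  // Zürüich
```

Calling punycodeDecode('Zrich-kva0') throws in this sketch because the digit 0 (representing 26) is not below the threshold of 26 and thus doesn't terminate the delta.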

A few additional observations:

  • Punycode transforms a sequence of Unicode code points
    irrespective of their encoding, such as UTF-8
    or UTF-16.
  • The deltas cannot be negative.
    This is why the non-ASCII characters have to be sorted before they can be encoded.
  • If the encoded word contains a hyphen, then the decoded word contains ASCII characters
    and the last hyphen is interpreted as the delimiter between the ASCII characters and the deltas.
    If the decoded word doesn’t contain ASCII characters, then the encoded word doesn’t contain a hyphen.
    a is encoded as a-, - as --, ü as tda, and the empty string as the empty string.
  • Punycode encodes non-ASCII symbols such as ¡ with letters, digits, and hyphens,
    but it doesn’t escape the remaining printable ASCII characters, such as !, =, and &.
    Punycode would be more flexible if the initial state started with a code point of 0 instead of 128.
    As we will see soon, this doesn’t matter for internationalized domain names, though.
  • After a potentially large initial delta,
    the subsequent deltas are small if all the characters come from the same language.
    This is what makes Punycode so efficient.
    For example, Ελληνικά is encoded as twa0c6aifdar,
    which consists of just four more characters.
    Even more astonishingly, the UTF-8 encoding of Ελληνικά takes 16 bytes,
    whereas the UTF-8/ASCII encoding of twa0c6aifdar takes just 12 bytes.
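
To make the second step and the observations about sorted, non-negative deltas more concrete, here is a minimal sketch of how an encoder could compute the deltas. The function name computeDeltas and its return format are assumptions made for illustration; the loop mirrors the state machine of the decoder by counting the skipped iterations.

```javascript
// Returns the ASCII characters (step one) and the deltas (step two) for the given input.
function computeDeltas(input) {
  const codePoints = Array.from(input, char => char.codePointAt(0));
  const ascii = codePoints.filter(codePoint => codePoint < 128);
  const deltas = [];
  let n = 128;                  // Code point which the decoder would insert next
  let delta = 0;                // Number of iterations the decoder has to skip
  let handled = ascii.length;   // Number of characters already in the decoder's string
  while (handled < codePoints.length) {
    // Handle the non-ASCII characters in the order of their code points.
    const next = Math.min(...codePoints.filter(codePoint => codePoint >= n));
    // Skip all insertion positions for the code points which aren't used.
    delta += (next - n) * (handled + 1);
    n = next;
    for (const codePoint of codePoints) {
      if (codePoint < n) delta++;  // Position before an already inserted character
      if (codePoint === n) {
        deltas.push(delta);
        delta = 0;
        handled++;
      }
    }
    delta++;  // Wrap around to the first position with the next higher code point.
    n++;
  }
  return { ascii: String.fromCodePoint(...ascii), deltas };
}

console.log(computeDeltas('Z\u00FCrich'));
// { ascii: 'Zrich', deltas: [ 745 ] }
console.log(computeDeltas('Ελληνικά'.normalize('NFC')).deltas);
// [ 789, 46, 32, 8, 5, 3, 0, 17 ]
```

The Greek example shows the effect described above: after the large initial delta, the remaining deltas are small enough to be encoded with one or two symbols each.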
Unicode normalization

Unicode is designed to be as inclusive as possible.
Any character and symbol that people want to express gets included in the standard.
While a unified encoding of all writing systems
and earlier character encodings
is great for interoperability, it’s really bad for comparing strings because
characters that we humans consider to be equal can be encoded by different code points.
When you search for a string encoded in one variant,
you also want to find strings encoded in other variants.
For this reason, Unicode strings need to be normalized
before comparing them so that equivalent strings have the same binary representation.

Unicode normalization
distinguishes between sequences of code points that represent the same character with the same appearance and meaning
and sequences that are semantically similar but may look or behave differently.
The former is called canonical equivalence, the latter compatibility equivalence.
Additionally, some characters can be represented by a single code point or by several code points.
The former is the composed representation, the latter the decomposed representation.
Based on these options, Unicode defines the following four
normalization forms (NF):

                Composition   Decomposition
Canonical       NFC           NFD
Compatibility   NFKC          NFKD

The four normalization forms of Unicode.

Replacing characters by compatibility equivalence also replaces characters that are canonically equivalent.
There is no normalization form that applies compatibility equivalence without also applying canonical equivalence.
The relationship between canonical equivalence and compatibility equivalence can thus be visualized as follows:

[Diagram: canonical equivalence drawn as a subset of compatibility equivalence.]

Compatibility equivalence includes canonical equivalence.

Before we look at examples, let me introduce the following tool to you.
It outputs the code points of the given input after applying the given normalization.
It uses JavaScript’s normalize function for the Unicode normalization,
and it allows you to input characters by their code point(s)
with JavaScript’s escape notation,
which means that you can specify a code point with two, four, or a variable number of hexadecimal digits:
\xXX, \uXXXX, and \u{X…X}, where X represents a hexadecimal digit.
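
What such a tool does can be approximated with a few lines of JavaScript. The following sketch uses the standard String.prototype.normalize method; the helper name codePoints and its output format are assumptions made for illustration.

```javascript
// Returns the code points of the input after applying the given normalization form.
// If no form is given, the input is left as is.
function codePoints(input, form) {
  const text = form ? input.normalize(form) : input;
  return Array.from(text, char =>
    'U+' + char.codePointAt(0).toString(16).toUpperCase().padStart(4, '0'));
}

// Canonical equivalence: ñ as one code point versus n with a combining tilde.
console.log(codePoints('\u00F1', 'NFD'));   // [ 'U+006E', 'U+0303' ]
console.log(codePoints('n\u0303', 'NFC'));  // [ 'U+00F1' ]

// Compatibility equivalence: the ligature ﬀ survives NFC but not NFKC.
console.log(codePoints('\uFB00', 'NFC'));   // [ 'U+FB00' ]
console.log(codePoints('\uFB00', 'NFKC'));  // [ 'U+0066', 'U+0066' ]

// The trademark symbol is replaced, the copyright symbol is not.
console.log(codePoints('\u2122', 'NFKC'));  // [ 'U+0054', 'U+004D' ]
console.log(codePoints('\u00A9', 'NFKC'));  // [ 'U+00A9' ]
```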

Examples of canonical equivalence:

Examples of compatibility equivalence:

A few additional remarks:

  • Notation: I used ↔ when the left side can be converted to the right side and vice versa,
    and → when the left side is normalized to the right side
    and the right side can no longer be reverted to the left side.
  • Lossy conversion:
    Both the canonical normalization and the compatibility normalization lose information,
    but in the case of canonical normalization, the loss is usually desired.
    In general, a normalized string cannot be reverted to its original form.
  • Idempotence:
    As long as the same normalization is used,
    applying the normalization repeatedly doesn’t change the result.
    More formally, normalize(normalize(Input)) = normalize(Input).
    (Earlier Unicode versions had exceptions to this rule.)
  • Substring:
    If a string is normalized, then so are all its substrings.
  • Concatenation:
    Even if two strings are normalized, their concatenation
    might not be normalized.
  • Growth through NFC normalization:
    In rare situations,
    NFC normalization can make a string longer.
  • Surprises: While most NFKC normalizations are quite reasonable,
    you will get unexpected results if you play long enough with the above tool.
    For example, ⅔ normalizes to 2⁄3, but the latter uses
    the fraction slash
    and not the ASCII slash.
    Similarly, the hyphen
    doesn’t normalize to the ASCII hyphen or minus.
    And while the trademark symbol normalizes to TM,
    the copyright symbol stays the same.
    Depending on your requirements, you may thus want to replace additional characters.
  • Clipboard: It’s not always clear when programs normalize strings, which can lead to subtle bugs.
    For example, when I copy a string to my clipboard with Firefox on macOS,
    the string gets normalized to NFC when pasted into other programs.
    I assume this is due to how Firefox stores text to the clipboard.
    If I copy a string with Chrome, its form is preserved, even when pasted in Firefox.
    This is why I write “depends on your system” next to the no-normalization option in the tool above.
    The tool itself won’t transform the string in this case,
    but the string might have been normalized before it reached the input field.
    Another example is that Chrome used to NFC normalize strings when submitting a form,
    which led to problems for certain languages.
  • Verification: Since you can’t be certain how programs interact with the clipboard,
    you have to do a hex dump
    if you want to verify how text has been stored to a file by a certain program: hexdump -C file.txt.
  • Programming: If you copy 'mañana' === 'mañana' to the JavaScript console of your
    web development tools,
    you get false because 'ma\xF1ana' !== 'man\u0303ana'.
    If you want to prank a friend, replace the ordinary semicolon ;
    with the Greek question mark ; in their
    source code.
    These two problems could be solved by normalizing source code to NFC.
    However, 'hi'.normalize('NFKC') === 'h​i'.normalize('NFKC') would still be false
    because the right h​i contains an invisible zero-width space,
    which is not normalized away even under compatibility equivalence.
    For developers, the complexity of Unicode is quite scary.
    Presumably simple things like counting the number of symbols in a string or reversing a string
    become surprisingly difficult
    – even before considering right-to-left (RTL) text
    and its combination with left-to-right (LTR) text
    into bidirectional (BiDi) text.
  • Backdoors: An attacker can use Unicode to include backdoors
    which cannot be spotted during code review
    unless your code editor warns you about uncommon characters.
    Invisible Unicode characters can be used to introduce
    invisible variables,
    confusable Unicode characters
    in variable names can make conditions pass or fail unexpectedly
    (e.g. the alveolar click character ǃ makes environmentǃ=PRODUCTION
    an assignment
    instead of a comparison),
    Unicode control characters can turn what appears to be a comment
    into source code and vice versa, and so on.
    Since November 2021, the popular code editor Visual Studio Code by Microsoft
    highlights uncommon characters by default.
  • Emojis:
    Modifiers
    change the appearance of the preceding emoji.
    This is how skin tones, hair styles, gender, professions, and families are encoded.
    The following emojis are examples of such sequences:
    👍🏻 ,
    👨🏼‍🦱 ,
    🤷🏽‍♂️ ,
    👩🏾‍🔬 , and
    👨‍👩‍👧‍👦 .
    The zero-width joiner (ZWJ)
    is used to combine characters which also exist separately,
    and the variation selector 16
    with the code point FE0F
    is used to render the preceding character as an emoji rather than as a text symbol.
    For example, \u26A0 gives you ⚠,
    whereas \u26A0\uFE0F gives you ⚠️.
    Please note that such emojification is not supported by all fonts.
  • Artistic use: Unicode can also be used to change the appearance of ASCII text.
    For example, you can flip text upside down
    or overuse diacritics, which results in so-called
    Zalgo text.
  • Sources: To learn more about normalization,
    you can read the technical report
    and the FAQ by the
    Unicode Consortium.
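
Several of the remarks above can be checked directly in a JavaScript console. The following sketch only uses String.prototype.normalize and Array.from; all strings are written with escape sequences so that the clipboard cannot normalize them on the way, and the helper isNfc is an assumption made for illustration.

```javascript
const isNfc = text => text === text.normalize('NFC');

// Programming: the same word in composed and decomposed form.
const composed = 'ma\u00F1ana';
const decomposed = 'man\u0303ana';
console.log(composed === decomposed);                                    // false
console.log(composed.normalize('NFC') === decomposed.normalize('NFC'));  // true

// Idempotence: normalizing twice gives the same result as normalizing once.
const once = decomposed.normalize('NFKC');
console.log(once.normalize('NFKC') === once);                            // true

// Concatenation: both parts are in NFC, but their concatenation is not.
console.log(isNfc('a'), isNfc('\u0301'), isNfc('a' + '\u0301'));         // true true false

// Growth through NFC: U+0958 is excluded from composition and thus decomposed.
console.log('\u0958'.length, '\u0958'.normalize('NFC').length);          // 1 2

// The Greek question mark is canonically equivalent to the semicolon.
console.log('\u037E'.normalize('NFC') === ';');                          // true

// The zero-width space survives even the compatibility normalization.
console.log('h\u200Bi'.normalize('NFKC') === 'hi');                      // false

// Emojis: a thumbs-up with a skin tone modifier and a family joined with ZWJ.
const thumbsUp = '\u{1F44D}\u{1F3FB}';
const family = '\u{1F468}\u200D\u{1F469}\u200D\u{1F467}\u200D\u{1F466}';
console.log(Array.from(thumbsUp).length, Array.from(family).length);     // 2 7
```

Counting with Array.from gives the number of code points, which is neither the number of UTF-16 code units (the length property) nor the number of symbols that the user perceives.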
Unicode case folding

The domain name system has been
case-insensitive since its inception.
This means that if you search for EF1P.com,
you still get the records for ef1p.com.
Furthermore, if I have a DNS record at www.ef1p.com
and a wildcard record at *.ef1p.com,
querying WWW.EF1P.COM returns the former.
Since DNS servers are supposed to preserve the case,
they have to do the case-insensitive comparison of ASCII strings.
(In theory, you’re supposed to get back the domain name
as it’s capitalized in the zone file of the authoritative name server.
In practice, however, many DNS servers use case-insensitive name compression in their responses,
which means that you often get back the domain name as you capitalized it in your query.
Pointing from the answer section to the question section in order to make the DNS response smaller
even if the case doesn’t match is explicitly allowed by the RFC.)
Since DNS servers don’t know about Punycode
and Punycode encodes non-ASCII uppercase and lowercase letters differently,
internationalized domain names have to be case-normalized on the client-side
because users expect that case-insensitivity also applies to internationalized domain names,
such as ÖBB.at and öbb.at.

Unicode distinguishes between case mapping and case folding.
The former maps characters to their lowercase, uppercase,
or titlecase equivalent,
while the latter tries to “remove” the case for case-insensitive comparisons of text.
If uppercase and lowercase letters had a one-to-one correspondence,
we could simply lowercase both strings before comparing them.
Unfortunately, this doesn’t work for Unicode strings
even if we NFKC-normalize both of them
to get rid of ligatures.
(The problem with ligatures is that they often exist only in lowercase,
which means that 'ﬀ'.toUpperCase() === 'FF' and 'ﬀ'.toUpperCase().toLowerCase() !== 'ﬀ',
where ﬀ is the single-character ligature with the code point FB00.)
Two examples of why lowercasing isn’t enough for Unicode strings are
the German eszett ß and
the Greek sigma ς.
The former existed only in lowercase until 2017,
at which point the capital eszett was officially adopted.
While the capital eszett had already been added to Unicode
with the code point 1E9E in 2008,
the capitalization of ß is still defined as SS:
'ß'.toUpperCase() === 'SS' but 'ẞ'.toLowerCase() === 'ß'.
Therefore, neither x.toUpperCase().toLowerCase() === x nor
x.toLowerCase().toUpperCase() === x is true in general.
The lowercase sigma ς is used only at the end of words.
Within words, σ is used.
Since there is only one uppercase sigma,
both 'ς'.toUpperCase() === 'Σ' and 'σ'.toUpperCase() === 'Σ'.
And since Unicode maps the case of characters without considering their context, 'Σ'.toLowerCase() === 'σ'.
For these reasons, ß is mapped to ss and ς to σ before case-insensitive string comparisons.
Since case folding is guaranteed to be stable,
this won’t change in future Unicode versions.
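
The case mappings mentioned above can be verified in a JavaScript console. Since JavaScript has no built-in function for case folding, the helper approximateCaseFold below is a hypothetical and very crude stand-in which handles only the two special cases discussed in this section. A real implementation would use the case folding data published by the Unicode Consortium.

```javascript
// Round trips through uppercase and lowercase don't return the original string.
console.log('\u00DF'.toUpperCase());                 // SS
console.log('\u1E9E'.toLowerCase() === '\u00DF');    // true (capital eszett to ß)
console.log('\uFB00'.toUpperCase());                 // FF (from the ligature ﬀ)
console.log('\u03C2'.toUpperCase() === '\u03A3');    // true (final sigma ς to Σ)
console.log('\u03A3'.toLowerCase() === '\u03C3');    // true (Σ to σ)

// Crude approximation of case folding: lowercase, then map ß to ss and ς to σ.
function approximateCaseFold(text) {
  return text.toLowerCase().replace(/\u00DF/g, 'ss').replace(/\u03C2/g, '\u03C3');
}

// Compares two strings case-insensitively based on the approximation above.
function equalsIgnoringCase(a, b) {
  return approximateCaseFold(a) === approximateCaseFold(b);
}

console.log(equalsIgnoringCase('STRASSE', 'Stra\u00DFe'));  // true
console.log(equalsIgnoringCase('ΣΟΦΟΣ', 'σοφος'));          // true
```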

A few additional remarks:

  • Duplications: In order to keep case operations as context-independent as possible,
    the Latin,
    Greek,
    and Cyrillic scripts
    have separate code points in Unicode
    even for optically identical characters.
    For example, Unicode has a Latin B,
    a Greek Β,
    and a Cyrillic В,
    which map to b,
    β,
    and в.
    While this is great for case operations,
    it’s bad for internationalized domain names.
  • Localization: For some characters, the case mapping still depends on the language.
    This is why JavaScript has a toLocaleLowerCase
    and a toLocaleUpperCase method.
    For example, in the Turkish language,
    'I'.toLocaleLowerCase('tr') === 'ı' and
    'i'.toLocaleUpperCase('tr') === 'İ'.
  • Titlecase: Digraphs,
    such as those used in Eastern European alphabets,
    usually exist in lowercase, uppercase, and
    titlecase.
    For example, Unicode defines dž,
    DŽ, and Dž.
    The Dutch digraph ij, on the other hand,
    is capitalized together, such as in IJsselmeer,
    which is why only ij
    and IJ exist.
    Since digraphs are usually written as two separate characters in practice,
    titlecase algorithms which simply capitalize the first letter get this wrong.
Internationalized domain names (IDNs)

Now that we know what Punycode, Unicode normalization,
and case folding are, we’re finally ready to discuss
internationalized domain names (IDNs).
As you might remember,
domain names consist of labels, which are separated by a dot.
Each label of a domain name is internationalized separately.
In order to distinguish Punycode-encoded labels from ordinary labels,
Punycode-encoded labels are prefixed with xn--.
This is known as the ASCII-Compatible Encoding (ACE) prefix.
A label may be Punycode-encoded only if it contains non-ASCII characters.
This ensures that Punycode-encoded labels never end with a hyphen.
(The preferred name syntax
requires that labels neither start nor end with a hyphen.)
Each Punycode-encoded label may be at most 63 characters long, including the ACE prefix.
If the encoding of a Unicode label is longer, the user input must be rejected.
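
In browsers and in Node.js, the conversion to ASCII labels can be observed with the standard URL class. Note the assumptions of this sketch: the URL parser follows the URL standard, which is based on the compatibility processing of UTS 46 rather than on strict IDNA2003 or IDNA2008, and the helper names toAsciiDomain and inspectLabels are made up for illustration.

```javascript
// Lets the URL parser convert an internationalized domain name to its ASCII form.
function toAsciiDomain(domain) {
  return new URL(`http://${domain}`).hostname;
}

// Reports for each label whether it carries the ACE prefix and respects the length limit.
function inspectLabels(asciiDomain) {
  return asciiDomain.split('.').map(label => ({
    label,
    punycode: label.startsWith('xn--'),
    validLength: label.length >= 1 && label.length <= 63,
  }));
}

console.log(toAsciiDomain('\u00F6bb.at'));   // xn--bb-eka.at
console.log(toAsciiDomain('\u00D6BB.at'));   // xn--bb-eka.at
console.log(inspectLabels('xn--bb-eka.at'));
```

Only the first label is prefixed with xn-- because the top-level domain at consists of ASCII characters only.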

What makes internationalized domain names even more complicated is that there are two versions: IDNA2003 & IDNA2008.
(IDNA stands for Internationalized Domain Names for Applications.)
IDNA2008 supersedes IDNA2003, which means that IDNA2003 should no longer be used.
Since a lot of the confusion comes from the differences between them, we’ll look at both:

  • IDNA2003: IDNA2003 is specified in RFC 3490.
    It uses the Nameprep profile
    as specified in RFC 3491
    of the Stringprep algorithm, which is specified in RFC 3454.
    Nameprep prepares an arbitrary user input to be encoded with Punycode:

Arbitrary user input → Remove certain characters → Case-fold all characters → NFKC-normalize the labels → Reject certain characters → Encode with Punycode → Domain name in ASCII

How IDNA2003 normalizes user input.
The normalization fails only if the output contains prohibited characters
or violates the rules for bidirectional text.

  • IDNA2008: IDNA2008 is specified in RFC 5890,
    RFC 5891,
    RFC 5892,
    RFC 5893,
    and RFC 5894.
    Instead of prohibiting certain characters,
    IDNA2008 accepts only characters with specific properties,
    which makes it easier to migrate to newer versions of Unicode.
    (IDNA2003 used Unicode version 3.2 only.)
    Additionally, IDNA2008 no longer specifies how characters are to be mapped,
    it only encourages applications to meet user expectations.
    Removing the mapping of characters from the standard allows applications to map them
    according to the language which is being used.
    Since the IDNA standard is the same around the globe,
    it cannot consider the local context for character mappings.

Arbitrary user input → Reject symbols and punctuation marks → Lowercase or reject uppercase characters → NFKC-normalize or reject non-normalized labels → Accept only valid characters → Encode with Punycode → Domain name in ASCII

How IDNA2008 normalizes user input.
The steps in gray are required but not standardized.

So how does IDNA2008 differ from IDNA2003? Let’s look at a few examples:

  • Symbols:
    P≠NP.org was valid under IDNA2003
    but is no longer valid under IDNA2008 since symbols are no longer allowed.
    (Due to the limitations of Punycode,
    P=NP.org, on the other hand, was never valid.)
    Disallowing symbols also prevents attackers from faking URL separators
    in domain names, which is a special variant of a homograph attack.
    For example,
    ef1p.com∕email.article.example,
    which uses a division slash in the domain label com∕email
    under the top-level domain .example,
    was also valid under IDNA2003 but is no longer valid under IDNA2008.
  • Emojis:
    Being a kind of symbol, emojis were allowed in IDNA2003 but are no longer allowed in IDNA2008.
    Since IDNA2003 was limited to Unicode version 3.2, only a tiny subset of emojis could be used,
    namely those which were originally added as text characters
    (mostly in Unicode version 1.1 in 1993)
    and given an emoji presentation in 2010.
    The variation selector 16
    was added to Unicode in version 3.2 to render text symbols as emojis;
    just in time for IDNA2003.
    As a consequence, ❤️.com
    was once valid while 💙.com never was.
    Emojis were intentionally disallowed in IDNA2008 because humans likely confuse different emojis
    even without combining characters,
    such as skin tones and hair styles.
    For example, ❤️
    and ♥️
    are two different hearts, both of which were
    valid under IDNA2003.
  • German eszett ß:
    In IDNA2003, ß was case-folded to ss.
    For example, Gießen.de was transformed to giessen.de before making the DNS lookup.
    Since ß is allowed in IDNA2008, Gießen.de is now transformed to xn--gieen-nqa.de.
  • Greek sigma ς:
    Similarly, ς was case-folded to σ in IDNA2003 but is now allowed in IDNA2008.
    For example, ἑλλάς.gr was transformed to xn--hxa3aa7a0420a.gr in IDNA2003
    and is now transformed to xn--hxa3aa3a0982a.gr in IDNA2008.

Since some characters that were previously removed,
such as the zero-width joiner,
are now allowed in certain contexts
and other characters, such as ß and ς, are no longer mapped,
some internationalized domain names are interpreted differently
under IDNA2008 than under IDNA2003.
These changes require a transition period from IDNA2003 to IDNA2008,
where domain name registries
reserve the newer mapping of an internationalized domain name for the registrant of the older mapping,
bundle different mappings of a new registration,
or block the registration of deviating mappings.
You can read more about compatibility processing of internationalized domain names
in the Unicode Technical Standard 46
and the IDN FAQ.

IDNA2008 validation

Unfortunately, I’m not aware of a JavaScript library
which validates internationalized domain names.
I’ve approximated the IDNA2008 rules
in the Punycode tool
as follows:
/^[\p{Letter}\p{Number}][\p{Letter}\p{Mark}\p{Number}\p{Join_Control}]*(?:-+[\p{Letter}\p{Number}][\p{Letter}\p{Mark}\p{Number}\p{Join_Control}]*)*(?:\.[\p{Letter}\p{Number}][\p{Letter}\p{Mark}\p{Number}\p{Join_Control}]*(?:-+[\p{Letter}\p{Number}][\p{Letter}\p{Mark}\p{Number}\p{Join_Control}]*)*)*$/u.

This regular expression uses Unicode property escapes
and is easier to read as /^LD(?:-+LD)*(?:\.LD(?:-+LD)*)*$/u,
where LD is [\p{Letter}\p{Number}][\p{Letter}\p{Mark}\p{Number}\p{Join_Control}]*.
If your input matches the regular expression,
I lowercase your input,
NFKC normalize it, and
make sure that the Unicode normalization has introduced no additional dots
(for example, ⒈ normalizes to the digit 1 followed by a dot).
After Punycode encoding the internationalized domain,
I also check that each label of the domain name consists of at most 63 characters.
If you have a suggestion for how I can improve my validation, let me know.
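
The approximation described above can be sketched as follows. The names LD, DOMAIN_PATTERN, and normalizeDomain are chosen for illustration, and the sketch stops before the Punycode encoding, which means that the check of the label length is not included.

```javascript
// A label part starts with a letter or number and continues with letters, marks,
// numbers, or join controls. Parts within a label are separated by hyphens.
const LD = String.raw`[\p{Letter}\p{Number}][\p{Letter}\p{Mark}\p{Number}\p{Join_Control}]*`;
const DOMAIN_PATTERN = new RegExp(`^${LD}(?:-+${LD})*(?:\\.${LD}(?:-+${LD})*)*$`, 'u');

const countDots = text => text.split('.').length - 1;

// Returns the lowercased and NFKC-normalized domain name
// or null if the input is rejected by this approximation.
function normalizeDomain(input) {
  if (!DOMAIN_PATTERN.test(input)) return null;
  const normalized = input.toLowerCase().normalize('NFKC');
  // The normalization must not introduce additional label separators.
  if (countDots(normalized) !== countDots(input)) return null;
  return normalized;
}

console.log(normalizeDomain('\u00D6BB.at'));    // öbb.at
console.log(normalizeDomain('P\u2260NP.org'));  // null (≠ is a symbol)
console.log(normalizeDomain('-ef1p.com'));      // null (label starts with a hyphen)
console.log(normalizeDomain('\u2488com'));      // null (⒈ normalizes to 1 followed by a dot)
```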

Homograph attack

Domain names which look identical but resolve to different addresses are a serious security issue.
For example, the lowercase letter l, the uppercase letter I, and the number 1
can easily be mistaken for one another depending on the font,
and so can the capital letter O and the number 0.
While the problem already existed with ASCII-only domain names,
internationalized domain names made the situation considerably worse.
For example, the Latin B,
the Greek Β,
and the Cyrillic В all look the same.
While BBC.com takes you to the website of the
British Broadcasting Corporation (BBC),
ВВС.com takes you to a completely different website.
Deceiving users with optically similar characters in order to obtain sensitive information
is known as a homograph attack.
While phishing cannot be fully eliminated,
such attacks can be mitigated by the client, the registry, and the user:

  • Client:
    Browsers and mail clients should warn the user about suspicious domain names
    and display such domain names in Punycode/ASCII rather than Unicode.
    Domain names are suspicious when they use characters which don’t belong to the user’s preferred language
    or when they mix characters from different scripts.
    Additionally, it’s a good idea to lowercase and normalize domain names before displaying them
    in a font which clearly distinguishes between visually similar characters.
  • Registry:
    Domain name registries should develop registration policies
    for their top-level domains.
    Registries are free to permit characters only from certain
    scripts
    or not to support internationalized domain names at all.
    For example, the Russian top-level domain .рф
    permits only subdomains in the Cyrillic script.
    Registries which allow the use of different scripts should ensure
    that the different scripts cannot be mixed in a single label.
    The Unicode Technical Standard 39
    with its data set
    contains more information about confusable characters.
    On top of this, registries should bundle or block variants of the same word
    as outlined in RFC 4290.
    Wikipedia lists which top-level domains support IDNs
    and which top-level domains are internationalized themselves.
  • User:
    Users should be trained to recognize phishing attempts
    and to always enter the address of important online services themselves instead of following a link.
    In the above example, the fact that ВВС.com looks just like BBC.com is not a problem
    if users enter the perceived address into the address field rather than copying it there.
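
A client could detect labels which mix scripts with Unicode property escapes, as in the following sketch. It is limited to the three scripts discussed in this article, and the names SCRIPTS, scriptsInLabel, and mixesScripts are assumptions made for illustration. A production implementation should rather follow the Unicode Technical Standard 39.

```javascript
// The scripts to look for. Digits and hyphens belong to none of them.
const SCRIPTS = {
  Latin: /\p{Script=Latin}/u,
  Greek: /\p{Script=Greek}/u,
  Cyrillic: /\p{Script=Cyrillic}/u,
};

// Returns the names of the scripts which are used in the given label.
function scriptsInLabel(label) {
  const found = new Set();
  for (const char of label) {
    for (const [name, pattern] of Object.entries(SCRIPTS)) {
      if (pattern.test(char)) found.add(name);
    }
  }
  return [...found];
}

// A label is suspicious if it uses characters from more than one script.
function mixesScripts(label) {
  return scriptsInLabel(label).length > 1;
}

console.log(scriptsInLabel('BBC'));                 // [ 'Latin' ]
console.log(scriptsInLabel('\u0412\u0412\u0421'));  // [ 'Cyrillic' ]
console.log(mixesScripts('B\u0412C'));              // true
```

Note that the purely Cyrillic label from the example above doesn't mix scripts, which is why such a check has to be combined with the user's preferred language or with the registration policies of the registry.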
Email address internationalization (EAI)

So far, we have seen how non-ASCII characters can be encoded in the message body,
in header fields and in domain names.
The only thing that is missing is the internationalization
of the local part of email addresses.
This is achieved by the following RFCs, which extend the email protocols
and the message format to allow Unicode characters encoded in UTF-8 everywhere:

  • RFC 6530
    introduces the framework for internationalized email.
    It explains the problem and
    defines the terminology used.
    Unlike earlier proposals,
    internationalized messages
    are no longer downgraded in transit
    because the local part of an address is to be interpreted
    only by the host specified in the domain part of the address.
    If an intermediary mail server doesn’t support UTF-8,
    the message has to be returned to the sender.
    If an internationalized message shall be delivered to legacy mail servers,
    it has to be downgraded before or during message submission.
    Additionally, the incoming mail server of the recipient may
    downgrade messages after the final delivery
    so that they can be retrieved by legacy mail clients of the recipient (see the points below).
    The RFC recommends that
    incoming mail servers normalize the local part of an email address
    ideally to NFKC but at least to NFC as part of the address normalization.
    Senders, however, should not normalize the addresses of recipients.


    Mailbox providers which provide their service to the general public
    need to be aware that allowing Unicode characters in the local part of email addresses
    makes it easier to impersonate their users with homograph attacks.
    Just like domain name registries,
    public mailbox providers should either restrict the permitted characters to ASCII
    or to a single Unicode script.
    Otherwise, they should bundle or block addresses with
    confusable characters.
    Unlike domain names, which are case-insensitive,
    mailbox providers may (but should not) distinguish between different addresses
    based on the capitalization of the local part.
    Therefore, mail clients cannot lowercase the local part before displaying it
    even though this would help to tell characters such as capital i and lowercase L apart.
  • RFC 6531
    defines an SMTP extension with the keyword SMTPUTF8.
    If the SMTP server indicates this capability,
    the SMTP client can transfer a UTF-8 message with UTF-8 envelope addresses
    by using the MAIL FROM command with the SMTPUTF8 parameter.
    This RFC also defines additional protocol types,
    which can be used in the with clause of Received header fields.
  • RFC 6532
    extends the syntax rules of RFC 5322
    to allow the use of UTF-8 characters everywhere.
    It also introduces an additional content type
    with the identifier message/global to describe internationalized messages encoded in UTF-8.
  • RFC 6533
    brings UTF-8 to delivery status notifications (DSN), such as non-delivery reports (NDR).
  • RFC 6855
    specifies an IMAP extension
    which allows mail clients to access internationalized messages
    (and to use Unicode characters in folder names).
    The UTF8=ACCEPT capability
    indicates that the IMAP server supports UTF-8 in strings.
    The UTF8=ONLY capability
    indicates that the IMAP server requires UTF-8 support from clients
    because it won’t downgrade internationalized messages for them.
    The UTF8=ONLY capability implies the UTF8=ACCEPT capability
    and clients have to indicate that they can handle UTF-8
    by sending . ENABLE UTF8=ACCEPT to the server.
  • RFC 6856
    specifies a POP3 extension
    to upgrade an ASCII-only session to a UTF-8 session.
    The POP3 server indicates that it supports UTF-8 with the
    UTF8 capability.
    A POP3 client can then enable the UTF-8 mode with the
    UTF8 command.
    This RFC also introduces a LANG capability and command,
    which allows the client to configure a different language for the response texts.
    This can be useful when the client presents error messages from the server directly to the user.
  • RFC 6857
    specifies an advanced downgrading mechanism for internationalized messages.
    POP3 and IMAP servers can use it to convert UTF-8 messages to ASCII-only messages
    before delivering them to mail clients which don’t support UTF-8.
    The conversion is relatively straightforward:
    Wherever the Encoded-Word encoding is allowed,
    this encoding is used to encode UTF-8 strings as ASCII strings.
    The Encoded-Word encoding is also used if necessary for
    unknown header fields.
    Internationalized domain names are downgraded
    with the Punycode encoding.
    Email addresses with non-ASCII characters in the local part
    are rewritten by encoding the whole address as an Encoded Word
    and replacing the address with an empty group construct.
    For example, From: José <josé@example.com> is converted to
    From: =?UTF-8?Q?Jos=C3=A9_?= =?UTF-8?Q?jos=C3=A9=40example=2Ecom?= :;
    thanks to RFC 6854.
    Since this string encodes an empty group instead of an address,
    the recipient cannot reply to such a message without manual intervention.
    RFC 6857 requires the use of UTF-8 as the character encoding
    and RFC 2047 requires that
    the @ symbol and the period are also encoded when the Encoded Word precedes an address.
    If the internationalized email address is part of an address group,
    the whole group is encoded with this technique because groups cannot be nested.
    Header fields in which addresses are used but the group syntax is not allowed
    need to be encapsulated:
    A header field such as Message-Id is replaced with Downgraded-Message-Id
    so that its value can be encoded as an Encoded Word.
    The Received header fields are an exception to this rule:
    Any clauses with non-ASCII characters are simply removed.
    Lastly, the message body is left as is,
    even if the content transfer encoding is 8bit.
  • RFC 6858
    specifies a simpler downgrading mechanism for internationalized messages,
    which accepts the loss of information in favor of an easier implementation.
    Internationalized email addresses are replaced with an
    invalid address,
    such as invalid@internationalized.invalid.
    The original address can optionally be encoded in the display name of the invalid address.
    The subject field is encoded as an Encoded Word,
    and all other header fields with non-ASCII characters are simply removed.
    This RFC also extends IMAP
    so that the server can indicate to the client which messages were downgraded.
    In order to prevent permanent loss of information,
    mail clients shouldn’t remove the internationalized message from the server.
    Automatically removing retrieved messages from the server is especially common
    among POP3 clients.
    Another problem is that clients often cache messages indefinitely.
    Even if the client is upgraded to support internationalized messages,
    it likely still accesses the downgraded messages from the local message store.
    Last but not least, downgrading message header fields invalidates DKIM signatures.
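The address rewriting of RFC 6857 described above can be sketched in a few lines of Python. The function names q_encode, encoded_word, and downgrade_mailbox are my own, and the sketch ignores the length limit on Encoded Words as well as address groups, so treat it as an illustration rather than a conforming implementation:

```python
import string

# Characters that this sketch leaves unencoded. Everything else,
# including "@" and ".", is written as =XX so that the result
# can no longer be mistaken for an address.
SAFE = set(string.ascii_letters + string.digits)


def q_encode(text: str) -> str:
    """Encode text with the Q encoding of RFC 2047, using UTF-8 bytes."""
    encoded = []
    for byte in text.encode("utf-8"):
        char = chr(byte)
        if char in SAFE:
            encoded.append(char)
        elif char == " ":
            encoded.append("_")  # Q encoding represents a space as "_"
        else:
            encoded.append(f"={byte:02X}")
    return "".join(encoded)


def encoded_word(text: str) -> str:
    """Wrap Q-encoded text into an Encoded Word with the UTF-8 charset."""
    return f"=?UTF-8?Q?{q_encode(text)}?="


def downgrade_mailbox(display_name: str, address: str) -> str:
    """Replace a mailbox with two Encoded Words and an empty group."""
    return f"{encoded_word(display_name + ' ')} {encoded_word(address)} :;"


# Reproduces the From header field from the example above.
print("From: " + downgrade_mailbox("José", "josé@example.com"))
```

The two Encoded Words form the display name of the group, and the trailing ":;" is the empty group itself, which is why the recipient can no longer reply without manual intervention.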

Content type

Now that we can encode arbitrary content,
we need a way to inform the client how to interpret the decoded content.
This is done with the Content-Type header field,
which has the following format:

Content-Type: {Type}/[{Tree}.]{Subtype}[+{Suffix}][; {Parameter}]*

The curly brackets need to be replaced as described below,
the content in the square brackets is optional,
and the asterisk indicates that there may be several parameters.

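The content type is also called the media type.
IANA maintains a long list
of registered media types.
A content type consists of a type (such as text or image),
an optional registration tree (such as vnd for vendor-specific formats),
a subtype, an optional suffix for the underlying structure (such as +xml),
and optional parameters (such as charset=us-ascii).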

The type, the subtype, and the parameter names are case-insensitive.
RFC 6838 doesn’t specify whether the tree and the suffix are also case-insensitive,
but I assume that this is the case.
Whether a parameter value is case-sensitive depends on the parameter.
The default content type for emails
is text/plain; charset=us-ascii.
As specified in RFC 1945,
HTTP uses the same header field with the same media types.
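If you want to see these rules in action, the following sketch parses the Content-Type header field with the email package from Python’s standard library. The two sample messages are made up for this illustration:

```python
from email import message_from_string

# A header field with unusual capitalization to show case-insensitivity.
raw = "Content-Type: TEXT/Plain; Charset=UTF-8\n\nHello\n"
message = message_from_string(raw)

# The type and the subtype are returned in lowercase.
content_type = message.get_content_type()
main_type = message.get_content_maintype()
subtype = message.get_content_subtype()

# Parameter names are matched case-insensitively,
# while the parameter value is returned as written.
charset = message.get_param("charset")

# A message without a Content-Type header field defaults to text/plain.
bare = message_from_string("Subject: Hi\n\nHello\n")
default_type = bare.get_content_type()

print(content_type, main_type, subtype, charset, default_type)
```

Note that the library reports the default type but doesn’t fill in the default charset: bare.get_param("charset") returns None, so a caller that needs us-ascii has to apply this default itself.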

Example content types: text/csv, text/html, image/png, image/svg+xml,
image/vnd.adobe.photoshop, audio/mpeg, video/mp4, font/otf,
application/javascript, application/pdf, application/vnd.apple.pages, and application/vnd.ms-excel.
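The structure of these examples can be made visible with a small helper, which splits a media type into its components. The name split_media_type and the MediaType tuple are hypothetical, and the sketch simply assumes that everything before the first period is the tree and everything after the last plus sign is the suffix, which doesn’t hold for the few grandfathered subtypes that contain a period without belonging to a tree:

```python
from typing import NamedTuple


class MediaType(NamedTuple):
    type: str
    tree: str  # Empty for the standards tree
    subtype: str
    suffix: str  # Empty if there is no structured syntax suffix


def split_media_type(media_type: str) -> MediaType:
    """Split a media type without parameters into its components."""
    top, slash, rest = media_type.strip().lower().partition("/")
    if not slash or not top or not rest:
        raise ValueError(f"Not a media type: {media_type!r}")
    suffix = ""
    if "+" in rest:
        # The suffix follows the last plus sign.
        rest, _, suffix = rest.rpartition("+")
    tree = ""
    if "." in rest:
        # Assumption: the first facet names the registration tree.
        tree, _, rest = rest.partition(".")
    return MediaType(top, tree, rest, suffix)


for example in ["text/csv", "image/svg+xml", "application/vnd.ms-excel"]:
    print(example, "->", split_media_type(example))
```

Since the components are lowercased before they are compared, TEXT/HTML and text/html yield the same result.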

Enriched Text

The ability to send formatted text
was first introduced in 1992 with a content type of text/richtext.
In order to avoid confusion with Microsoft’s Rich Text Format (RTF),
the content type was renamed to text/enriched the following year
and revised again in RFC 1896.
Enriched Text is a
markup language
with HTML-like tags.
Let’s look at a simple example:

MIME-Version: 1.0
Content-Type: text/enriched; charset=us-ascii

Roses are
<bold><color><param>red</param>red</color></bold>.

An example Enriched Text message.

This data format has mostly been superseded by HTML and is not widely supported.
Apple Mail strips all the tags and displays the text without formatting.
Gmail doesn’t recognize the format and offers the option to download the content instead.
Only Thunderbird displays the text with formatting, but it doesn’t support the <color> tag.

HTML emails

Nowadays, most messages are formatted with the Hypertext Markup Language (HTML).
The text/html media type is specified in RFC 2854.
The message from the previous box looks as follows when it is formatted with HTML:

MIME-Version: 1.0
Content-Type: text/html; charset=us-ascii

<html>
  <body>
    Roses are
    <b style="color: red;">red</b>.
  </body>
</html>

An example HTML message.

This example works as intended in Apple Mail, Gmail, and Thunderbird.
We’ll discuss in the next box how to style HTML emails.
For security reasons, mail clients don’t execute sender-provided JavaScript.
Gmail and some other mailbox providers still support dynamic content, though.
Furthermore, HTML messages cause serious privacy issues, which I’ll cover later.
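As far as the header fields are concerned, an Enriched Text message and an HTML message differ only in the subtype of their content type. The following sketch generates both kinds of messages with the email package from Python’s standard library. The helper make_text_message is my own, and the markup in the two bodies is just an illustration that uses the color and bold tags of RFC 1896 and their HTML counterparts:

```python
from email.message import EmailMessage


def make_text_message(body: str, subtype: str) -> EmailMessage:
    """Create a message with the content type text/{subtype}."""
    message = EmailMessage()
    # set_content adds the Content-Type, Content-Transfer-Encoding,
    # and MIME-Version header fields to the message.
    message.set_content(body, subtype=subtype, charset="us-ascii")
    return message


enriched = make_text_message(
    "Roses are\n<bold><color><param>red</param>red</color></bold>.\n",
    "enriched",
)

html = make_text_message(
    "<html>\n"
    "  <body>\n"
    "    Roses are\n"
    '    <b style="color: red;">red</b>.\n'
    "  </body>\n"
    "</html>\n",
    "html",
)

print(enriched.as_string())
print(html.as_string())
```

Because both bodies consist of ASCII characters only, the library can transfer them without any further encoding.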

Email styling

HTML is styled with Cascading Style Sheets (CSS).
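There are three ways to add CSS to an HTML page:

<html>
  <head>
    <link rel="stylesheet" href="styles.css">
  </head>
  <body>
    Hello,
    World!
  </body>
</html>

External CSS: Load an external style sheet with a <link> element.

<html>
  <head>
    <style>…</style>
  </head>
  <body>
    Hello,
    World!
  </body>
</html>

Internal CSS: Embed the style with a <style> element.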

href="https://en.wikipedia.org/wiki/Roses_Are_Red">Roses are red.