Hello. Today I wanted to tell you something about a problem I have recently encountered with Postfix mail server when sending mails through it. I have a mail server running on virtual machine on my company’s quite powerful server. It was always working great, but recently I have stumbled upon a problem with mails with attachments sent from my account transferring really slow. All other services, including mail received by IMAP service was running o.k. and with nominal transfer speeds.
At first I didn’t notice that the problem existed, because I wasn’t sending any mails with big attachments at all. However today, when I wanted to send an attachment weighting few megabytes I have seen that it’s transfer to the SMTP Postfix server is really slow. I decided to investigate the cause.
So the first thing I did was to launch the best network diagnosis tool: Wireshark of course. I have noticed that the Postfix mail server is replying with TCP error packets with “TCP Zero Window Size” set. After encountering those packets and observing that they occur repeatedly over the course of mail transfer I immediately understood that there are three possible causes:
- Application or server buffers are full and the TCP/IP stack is signaling it cannot receive any further data for now, so it’s sending “TCP Zero Window” to postpone client sending until the buffers are emptied.
- Server is out of network sockets, which would mean that mail server is under heavy load, signaling probable Denial Of Service attack.
- Or the application – in this case the Postfix mail server – is running very slow for some reason.
After logging into mail server I have checked “htop” for signs of overload. Situation seemed normal as the server had 2% CPU usage, and less than 1% memory usage. I have assumed that since physical loads on the server were negligible this must not be the cause. However, I still haven’t rooted out the I/O load on the server, but left it for checking later. Checking current physical loads confirmed that buffers are probably not full, but I couldn’t dismiss it for now.
The next step was to ensure that the mail server wasn’t under DOS attack. Quickly checking “netstat” confirmed that indeed there are too few connections to the SMTP service so the network socket pool would be exhausted by the system. Especially when taking into account that IMAP service was working nominally.
So the last thing to check was the Postfix mail server application itself. Since “htop” didn’t tell me it’s overloaded I have began checking if the mail queues are not overloaded. I have issued “mailq” and then “qshape.pl” commands, which told me that mail queues are almost empty. So they were not the cause of the problem.
There was one last thing to check which was the I/O load on the Postfix application. I have issued “iotop” command and observed it for a while. It seemed that one of the Postfix child processes was generating unusually high amount of I/O writes to disk. So I have issued “tail -f /var/log/mail.info” to watch for any errors and immediately I have found the cause for this problem. Postfix was writing heavy amounts of debugging data to the mail log.
Now I knew that I must have screwed the configuration of /etc/postfix/main.cf. Quickly browsing through the configuration file told me that I had set “debug_peer_level = 3” and “debug_peer_list” option value had been set to my home computer IP address. Setting debug_peer_level back to 2, commenting out “debug_peer_list” and restarting Postfix resolved the problem.
It seemed that I have forgotten to remove debugging options within the Postfix server configuration when I had to debug some other issue with mail server previously. 🙂
Anyway. I hope this little tutorial may some day help someone. ’till the next time folks!