How to resolve high CPU usage by mail server and large mail queue
First off, I am using Plesk Onyx 17.0.17 on Linux with the Postfix mail server.
So all my websites became dreadfully slow over the weekend. On signing into Plesk I could see 70% CPU usage.
In the health monitor I could see that the Mailserver CPU usage in services had been spiking ever since the websites slowed down.
What follows is how I went about diagnosing the problem and resolving it. For my own sanity as much as yours.
Identifying what is using the CPU
So first off I ran the top command over SSH
top
That showed me that "postfix" was going mad for the CPU. So I went back to Plesk to check out the mail queue (Tools & Settings > Mail Server Settings > Mail Queue). Here I found over 65k deferred messages. On inspection, most of them had mailto@imf.org as the sender. At this point I knew we definitely had a problem with outgoing spam email.
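The same check can be done over SSH instead of the Plesk screen. This is a minimal sketch, assuming the queue listing has the usual `postqueue -p` layout where the sender is the seventh field of each queue entry line. The function name count_senders is my own, not something Plesk or Postfix ships.

```shell
#!/bin/sh
# Tally queued messages per sender from "postqueue -p" output on stdin.
# Assumption: entry lines start with the queue ID and carry the sender
# in field 7; header, reason, recipient and footer lines start with
# "-", "(" or whitespace and are skipped.
count_senders() {
    awk '/^[0-9A-Za-z]/ && NF >= 7 { print $7 }' |
        sort | uniq -c | sort -rn
}

# Usage on the server (run as root):
#   postqueue -p | count_senders | head
```

The busiest sender ends up on the first line, which is how a single address like the one above stands out from normal traffic.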
So what now? We know the problem but where is it? This domain isn't hosted with me, so how is that email address the sender? Is it a compromised mailbox or a malicious script that has found its way on to the server?
This is what I did next to track down the culprit.
Steps to locate the script sending the email
1) Create the file /usr/sbin/sendmail.postfix-wrapper
vi /usr/sbin/sendmail.postfix-wrapper
Then type this to enter insert mode
i
Then put exactly this in the file
#!/bin/sh
(echo "X-Additional-Header: $PWD"; cat) | tee -a /var/tmp/mail.send | /usr/sbin/sendmail.postfix-bin "$@"
Lastly hit Escape followed by
:wq
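If you would rather not use vi, the file can be written with a here-document instead. This is only a sketch: write_wrapper is a name I made up, and the wrapper body is the same two lines as above with the echo argument quoted so that directory names containing spaces are logged intact.

```shell
#!/bin/sh
# Write the two-line sendmail wrapper to the path given as $1.
# The quoted 'EOF' delimiter stops the shell from expanding $PWD and
# "$@" now; they must be expanded later, when the wrapper itself runs.
write_wrapper() {
    cat > "$1" <<'EOF'
#!/bin/sh
(echo "X-Additional-Header: $PWD"; cat) | tee -a /var/tmp/mail.send | /usr/sbin/sendmail.postfix-bin "$@"
EOF
}

# Usage on the server (run as root):
#   write_wrapper /usr/sbin/sendmail.postfix-wrapper
```

The file still needs to be made executable afterwards, as in step 2c.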
2) Create the /var/tmp/mail.send log file and set a+rw permissions on it. Make the wrapper executable, rename the original sendmail.postfix file, and link the wrapper in its place:
a) Create the log file
touch /var/tmp/mail.send
b) Set a+rw permissions
chmod a+rw /var/tmp/mail.send
c) Make that wrapper executable
chmod a+x /usr/sbin/sendmail.postfix-wrapper
d) Back up the original sendmail.postfix
mv /usr/sbin/sendmail.postfix /usr/sbin/sendmail.postfix-bin
e) Link our sendmail.postfix-wrapper
ln -s /usr/sbin/sendmail.postfix-wrapper /usr/sbin/sendmail.postfix
At this point the mail.send file will start filling up with all the information for the email currently being sent. Wait for a while (5 minutes was enough for a problem of my size).
f) Once enough time has passed to collect enough data, we need to switch back to the original sendmail.postfix file. Confirm removal of the symbolic link when prompted.
rm /usr/sbin/sendmail.postfix
mv /usr/sbin/sendmail.postfix-bin /usr/sbin/sendmail.postfix
At this point everything is working as it originally did.
g) Check out the mail.send file. There will be X-Additional-Header lines that tell you the location of the script that sent the emails. When you've finished with the file, remove it:
rm /var/tmp/mail.send
You may need to repeat this a few times whilst tracking the problem down. If so, repeat steps: 2a, 2b, 2d, 2e, 2f and 2g as required.
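Since the switch may have to be repeated, the steps can be wrapped in two small functions. This is a rough sketch rather than anything official: enable_wrapper, disable_wrapper, SBIN_DIR and LOG_FILE are names I picked, and the guards assume the original sendmail.postfix is a regular file, not a symlink.

```shell
#!/bin/sh
# Locations used in the steps above; override them to try this safely
# in a scratch directory first.
SBIN_DIR=${SBIN_DIR:-/usr/sbin}
LOG_FILE=${LOG_FILE:-/var/tmp/mail.send}

# Steps 2a, 2b, 2d, 2e: create the log, move the real binary aside,
# and link the wrapper in its place.
enable_wrapper() {
    # If sendmail.postfix is already a symlink, the real binary has
    # been moved already; moving again would overwrite it.
    if [ -L "$SBIN_DIR/sendmail.postfix" ]; then
        echo "wrapper already enabled" >&2
        return 1
    fi
    touch "$LOG_FILE" &&
    chmod a+rw "$LOG_FILE" &&
    mv "$SBIN_DIR/sendmail.postfix" "$SBIN_DIR/sendmail.postfix-bin" &&
    ln -s "$SBIN_DIR/sendmail.postfix-wrapper" "$SBIN_DIR/sendmail.postfix"
}

# Step 2f: remove the symlink and put the real binary back.
disable_wrapper() {
    if [ ! -L "$SBIN_DIR/sendmail.postfix" ]; then
        echo "wrapper not enabled" >&2
        return 1
    fi
    rm -f "$SBIN_DIR/sendmail.postfix" &&
    mv "$SBIN_DIR/sendmail.postfix-bin" "$SBIN_DIR/sendmail.postfix"
}

# Usage on the server (run as root):
#   enable_wrapper; sleep 300; disable_wrapper
```

The symlink checks are there so that running either function twice by mistake cannot clobber the real binary.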
See this for more information on finding out what is sending mail: https://support.plesk.com/hc/en-us/articles/213914405-Many-email-message...
What I learned and did to resolve
From looking at the information contained in the mail.send log file, I was able to see that 99.99% of the emails were related to one mailbox. I checked out said mailbox and sure enough, there they were. So I changed the password from my user's poor "blackcat" password. (Note to self - force more complex passwords.) That resolved the problem for me.
However, I was poised for the additional header to show me a directory with a script in it outside of qmail. The additional header that the wrapper adds is the directory containing the script that sent the email. It would likely have been inside a document root somewhere, i.e. a website with malware. Removing the malicious script and locking that domain down properly would have resolved things too.
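With a few minutes of captured mail the log is too big to read by eye, so it helps to tally the header lines instead. This is a small sketch under my own assumptions: count_header is a made-up helper, and it counts any line that starts with the header name, so a message body that happens to contain such a line would be counted too.

```shell
#!/bin/sh
# Count the distinct values of one header in the captured log.
# $1 = header name without the colon, $2 = path to the log file.
# Output is "count header: value", most frequent first.
count_header() {
    grep -i "^$1:" "$2" | sort | uniq -c | sort -rn
}

# Usage on the server:
#   count_header X-Additional-Header /var/tmp/mail.send | head
#   count_header From /var/tmp/mail.send | head
```

The first call shows which directory the mail is being sent from, the second which sender address dominates.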
Clear the mail queue
Finally I cleared the whole queue with this command.
/usr/local/psa/admin/sbin/mailqueuemng --clean
Give it 20 minutes and the server should be back to normal.
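Cleaning the whole queue also throws away any legitimate mail that was waiting. A narrower option is to delete only the messages from the spamming sender using Postfix's own tools. This is a sketch, assuming the standard `postqueue -p` layout (queue ID in field 1, sender in field 7). queue_ids_from_sender is a hypothetical helper name.

```shell
#!/bin/sh
# Print the queue IDs of all messages whose envelope sender is $1,
# reading "postqueue -p" output on stdin. The trailing "*" or "!"
# status marker is stripped so the IDs can be fed to postsuper.
queue_ids_from_sender() {
    awk -v sender="$1" '
        /^[0-9A-Za-z]/ && NF >= 7 && $7 == sender {
            sub(/[*!]$/, "", $1)
            print $1
        }'
}

# Usage on the server (run as root). "postsuper -d -" reads the queue
# IDs to delete from standard input:
#   postqueue -p | queue_ids_from_sender mailto@imf.org | postsuper -d -
```

Run the first two stages on their own before adding the postsuper part, to check that only the expected messages are selected.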