How to monitor if an algo is alive or has crashed?

When you deploy an algo live with real money, there are several events you need to monitor in order to avoid disasters. What if you manage your stops in your code and the algo crashes? If this happens and you don’t notice, your open positions can run into substantial losses with nothing monitoring them. To prevent this, I use a simple shell script and two tools: Pushover and crontab. This is Unix-specific, but you can translate it to Windows with PowerShell and Task Scheduler.

In my algos I persist the algo state into a JSON file that gets updated very frequently. If yours don’t, you can monitor a log file or a dedicated watchdog file that your algo updates frequently. If this file hasn’t changed within a specified time, the script uses Pushover to send a notification to your mobile phone.

#!/bin/sh

# Input file
FILE=/your/algo/path/your_algo_data.json
# How many seconds before file is deemed "older"
OLDTIME=30
# Get current and file times (seconds since the epoch).
# "stat -c %Y" is GNU coreutils syntax; on BSD/macOS use "stat -f %m".
# A missing file gets time 0, so it counts as older and triggers the alert.
CURTIME=$(date +%s)
FILETIME=$(stat -c %Y "$FILE" 2>/dev/null || echo 0)
TIMEDIFF=$((CURTIME - FILETIME))

# Check if file older
if [ "$TIMEDIFF" -gt "$OLDTIME" ]; then
    curl -s \
        -F "user=YOUR_PUSHOVER_USERKEY" \
        -F "token=YOUR_PUSHOVER_TOKEN" \
        -F "title=Algorithm Stopped?" \
        -F "message=Check algo state. $FILE has not been updated in a while." \
        https://api.pushover.net/1/messages.json
fi

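Then you only need to make the script executable and schedule it in cron for the days, hours and frequency you require. The entries below run it every five minutes from Monday to Thursday, and every five minutes on Friday between 00:00 and 22:55: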

*/5 * * * 1-4  /usr/local/bin/checkalgo.sh
*/5 0-22 * * 5  /usr/local/bin/checkalgo.sh

And the magic is done!

I also send Pushover notifications from within the algo code to receive order fills and other important events, such as daily performance reports. The external check, however, lets you verify from outside your code that the algo is alive.
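
Before relying on a check like this with real money, it can help to dry-run the age logic against a throwaway file, without touching the real state file or sending a notification. The following is only a minimal sketch under a few assumptions: `is_stale` is a hypothetical helper name (it is not part of the script above), and it needs GNU coreutils for `stat -c` and `touch -d`.

```shell
#!/bin/sh

# is_stale FILE MAX_AGE
# Succeeds (exit 0) if FILE is missing or was last modified more than
# MAX_AGE seconds ago. Hypothetical helper; assumes GNU stat.
is_stale() {
    mtime=$(stat -c %Y "$1" 2>/dev/null) || return 0
    now=$(date +%s)
    [ $((now - mtime)) -gt "$2" ]
}

# Dry run against a temporary file instead of the real algo state file.
TMP=$(mktemp)

# Pretend the algo last wrote its state two minutes ago (GNU touch syntax).
touch -d '2 minutes ago' "$TMP"
is_stale "$TMP" 30 && echo "stale: a notification would be sent"

# Pretend the algo has just written its state.
touch "$TMP"
is_stale "$TMP" 30 || echo "fresh: no notification"

rm -f "$TMP"
```

Once both branches behave as expected, the same comparison can be trusted in the cron job; temporarily stopping the algo (with no open positions) is still the only way to confirm that the Pushover message itself arrives.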