WATCHDOGD(5) File Formats Manual (smm) WATCHDOGD(5)

watchdogd.conf
watchdogd configuration file

The default watchdogd(8) use-case does not require a configuration file. However, enabling a health monitor plugin or the process supervisor is done using /etc/watchdogd.conf.
Available monitor plugins are:
Process supervisor, monitor the heartbeat of processes
File descriptor monitor, also covers sockets, and other descriptor based resources
CPU load average monitor
Memory usage monitor

This file is a standard UNIX .conf file with sub-sections and '=' for assignment. The '#' character marks the start of a comment that runs to the end of the line, and the '\' character can be used as an escape character. Whitespace is ignored, unless inside a string.
Warning: do not set the below WDT timeout and kick interval too low. The daemon (usually) runs as a regular ‘SCHED_OTHER’ background task and the monitor plugins (as well as your other services) need CPU time as well.
timeout = SEC
The WDT timeout before reset. Default: 20 sec.
interval = SEC
The kick interval, i.e. how often watchdogd(8) should reset the WDT timer. Default: 10 sec.
safe-exit = true | false
With safe-exit enabled (true) the daemon will ask the driver to disable the WDT before exiting (SIGINT). However, some WDT drivers (or HW) may not support this. Default: disabled.
script = /path/to/reboot-action.sh
Script or command to run instead of reboot when a monitor plugin reaches its critical or warning level. Setting this disables the default reboot action on critical, so it is up to the script to perform the reboot, if needed. The script is called as:
script.sh {filenr, loadavg, meminfo} {crit, warn} VALUE
    
Health monitor plugins also have their own local script setting.
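As an illustration of the calling convention above, here is a minimal sketch of a global script. It assumes a POSIX shell and that logger(1) and reboot(8) exist on the target. The function name decide_action and the policy (reboot on crit, log only on warn) are made up for this example and are not part of watchdogd.

```shell
#!/bin/sh
# Hypothetical handler for the global 'script' setting.
# Called as: script.sh {filenr, loadavg, meminfo} {crit, warn} VALUE

# Print the action to take for a given monitor and level.
# The policy here is an assumption made for illustration only.
decide_action() {
    monitor=$1
    level=$2
    case "$monitor" in
        filenr|loadavg|meminfo) ;;
        *) echo "unknown"; return 1 ;;
    esac
    case "$level" in
        crit) echo "reboot" ;;
        warn) echo "log" ;;
        *)    echo "unknown"; return 1 ;;
    esac
}

# Only act when invoked with the three documented arguments.
if [ "$#" -ge 3 ]; then
    action=$(decide_action "$1" "$2")
    logger -t reboot-action "monitor=$1 level=$2 value=$3 action=$action"
    # The default reboot is disabled when a script is set, so do it here.
    if [ "$action" = "reboot" ]; then
        reboot
    fi
fi
```

Because the first argument names the monitor, one script can serve several plugins and branch on it.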
reset-reason {}
This section controls the reset reason, including the reset counter. By default this is disabled, since not all systems allow writing to disk, e.g. embedded systems using MTD devices with a limited number of write cycles.
Another backend can be implemented and linked to the daemon, but make sure to ‘--disable-rcfile’ with the configure script first.
enabled = true | false
Enable or disable storing reset cause, default: disabled
file = /var/lib/watchdogd.state
The default file setting is a non-volatile path, according to the FHS. It can be changed to another location, but make sure that location is writable first.
Note: This section was previously called reset-cause, which is deprecated and may be removed in a future release.

supervisor {}
Instrumented processes can have their main loop supervised. Processes subscribe to this service using the libwdog API, see the docs for more on this. When enabled watchdogd switches to ‘SCHED_RR’ with elevated realtime priority. When disabled it runs as a regular ‘SCHED_OTHER’ process.
enabled = true | false
Enable or disable supervisor, default: disabled
priority = NUM
The realtime priority. Default: 98
script = /path/to/script.sh
When a supervised process fails to meet its deadline the supervisor by default performs an unconditional reset, saving the reset cause first. However, if a script is provided in this section it will be called instead:
script.sh supervisor CAUSE PID LABEL
        
The CAUSE value is documented in watchdogctl(1).
The LABEL can be any free form string the supervised process used when registering with the supervisor, hence it is given as the last argument to the script.
The return value of the script determines how the system continues to operate. POSIX OK (0) means the script has handled the situation in some manner, and watchdogd stops supervising the offending process. A non-zero return value means the script has either failed to handle the situation or prefers to delegate to watchdogd, which then saves the reset cause and performs the actual system reset.
The global script setting does not apply to this section. However, the same script can be used, due to the unique first argument.
IMPORTANT: Calling watchdogctl(1) from the script with the fail command will cause an infinite loop. It is strongly advised to return non-zero from the script instead.
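The return value convention can be sketched as follows. This is a minimal example under stated assumptions: the function name handle_failure and the two process labels are hypothetical, and a real script would do something more useful than print a message before returning.

```shell
#!/bin/sh
# Hypothetical handler for the supervisor 'script' setting.
# Called as: script.sh supervisor CAUSE PID LABEL

# Return 0 if the failure of the process registered as LABEL is handled
# here, non-zero to delegate the reset to watchdogd. The label names
# below are invented for this example.
handle_failure() {
    cause=$1
    pid=$2
    label=$3
    case "$label" in
        logrotate-helper|stats-collector)
            # Non-essential process: note the event and carry on.
            echo "ignoring failure of $label (pid $pid, cause $cause)"
            return 0
            ;;
        *)
            # Anything else: let watchdogd save the cause and reset.
            echo "delegating failure of $label (pid $pid, cause $cause)"
            return 1
            ;;
    esac
}

# Only act when invoked with the documented supervisor arguments.
if [ "$#" -ge 4 ] && [ "$1" = "supervisor" ]; then
    handle_failure "$2" "$3" "$4"
    exit $?
fi
```

Note that the delegating branch simply returns non-zero. It does not call watchdogctl(1), in line with the warning above.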

filenr {}
Monitors file descriptor leaks based on ‘/proc/sys/fs/file-nr’.
enabled = true | false
Enable or disable plugin, default: disabled
interval = SEC
Poll interval, default: 300 sec
logmark = true | false
Log current stats every poll interval. Default: disabled
warning = LEVEL
High watermark level, alert sent to log.
critical = LEVEL
Critical watermark level, alert sent to log, followed by reboot or script action.
script = /path/to/reboot-action.sh
Optional script to run instead of reboot if critical watermark level is reached. If omitted the global ‘script’ action is used. The scripts are called in the same way as the global script, same arguments.
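To get a feel for suitable warning and critical levels for this monitor, the current state can be inspected from the shell. The sketch below assumes the usual three-field layout of ‘/proc/sys/fs/file-nr’ (allocated, unused, maximum) and assumes the level is the fraction of in-use file handles. How watchdogd computes its level internally is not stated here, so treat the formula as an approximation. The function name filenr_level is hypothetical.

```shell
#!/bin/sh
# Print the fraction of file handles in use, given one line in
# /proc/sys/fs/file-nr format (allocated unused max) on stdin.
# ASSUMPTION: level = (allocated - unused) / max.
filenr_level() {
    awk '$3 > 0 { printf "%.2f\n", ($1 - $2) / $3 }'
}
```

On a Linux system it can be run as: filenr_level < /proc/sys/fs/file-nr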

loadavg {}
Monitors load average based on sysinfo(2) from ‘/proc/loadavg’. The trigger level for warning and critical watermarks is composed from the average of the 1 and 5 min marks.
Note: load average is a blunt instrument and highly use-case dependent. Peak loads of 16.00 on an 8 core system may be responsive and still useful but 2.00 on a 2 core system may be completely bogged down. Read up on the subject and test your system before enabling the critical level.
enabled = true | false
Enable or disable plugin, default: disabled
interval = SEC
Poll interval, default: 300 sec
logmark = true | false
Log current stats every poll interval. Default: disabled
warning = LEVEL
High watermark level, alert sent to log.
critical = LEVEL
Critical watermark level, alert sent to log, followed by reboot or script action.
script = /path/to/reboot-action.sh
Optional script to run instead of reboot if critical watermark level is reached. If omitted the global ‘script’ action is used. The scripts are called in the same way as the global script, same arguments.
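When testing a system before choosing levels, the same figure the monitor triggers on (the average of the 1 and 5 min marks) can be computed from the shell. This is a small sketch, assuming the standard ‘/proc/loadavg’ layout where the first two fields are the 1 and 5 minute averages. The function name loadavg_level is hypothetical.

```shell
#!/bin/sh
# Print the mean of the 1 and 5 minute load averages, given one line
# in /proc/loadavg format on stdin.
loadavg_level() {
    awk '{ printf "%.2f\n", ($1 + $2) / 2 }'
}
```

On a Linux system it can be run as: loadavg_level < /proc/loadavg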

meminfo {}
Monitors free RAM based on data from ‘/proc/meminfo’.
enabled = true | false
Enable or disable plugin, default: disabled
interval = SEC
Poll interval, default: 300 sec
logmark = true | false
Log current stats every poll interval. Default: disabled
warning = LEVEL
High watermark level, alert sent to log.
critical = LEVEL
Critical watermark level, alert sent to log, followed by reboot or script action.
script = /path/to/reboot-action.sh
Optional script to run instead of reboot if critical watermark level is reached. If omitted the global ‘script’ action is used. The scripts are called in the same way as the global script, same arguments.

generic {}
Monitor a generic script. Trigger warning and critical actions based on the exit code of the script.
enabled = true | false
Enable or disable plugin, default: disabled
interval = SEC
How often to run the monitor-script, default: 300 sec
timeout = SEC
Maximum number of seconds the monitor-script is allowed to run, default: 300 sec
warning = VAL
High watermark level, alert sent to log if the exit status from monitor-script is greater than or equal to this value.
critical = VAL
Critical watermark level, alert sent to log, followed by reboot or script action if the monitor-script exit status is greater than or equal to this value.
monitor-script = /path/to/generic-script.sh
Monitor script to run every interval seconds.
script = /path/to/reboot-action.sh
Optional script to run instead of reboot if critical watermark level is reached. If omitted the global ‘script’ action is used. The scripts are called in the same way as the global script, same arguments.
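Since the warning and critical actions are driven by the exit status of the monitor script, such a script only needs to map a measurement onto a small integer. The sketch below is one possible monitor script, not something shipped with watchdogd: it exits with the percentage of used space on a filesystem, so it would pair with values such as warning = 80 and critical = 95. The function names, the choice of /var, and the file name monitor-script.sh are all assumptions.

```shell
#!/bin/sh
# Hypothetical monitor-script: the exit status is the percentage of
# used space on one filesystem (0-100), to be compared against the
# 'warning' and 'critical' values.

# Extract the use percentage from 'df -P' output on stdin.
usage_percent() {
    awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Return the use percentage of the filesystem holding path $1.
check_usage() {
    pct=$(df -P "$1" 2>/dev/null | usage_percent)
    # ASSUMPTION: a failed or unparsable df is reported as 0 (healthy).
    case "$pct" in
        ''|*[!0-9]*) pct=0 ;;
    esac
    return "$pct"
}

# Run only when executed under the script's own (hypothetical) name,
# so the functions above can also be sourced and tested.
case "$0" in
    *monitor-script.sh) check_usage /var; exit $? ;;
esac
```

Keep the script fast: it has to finish within the configured timeout.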

### /etc/watchdogd.conf 
timeout   = 20 
interval  = 10 
safe-exit = false 
 
supervisor { 
    enabled  = true 
    priority = 98 
} 
 
reset-reason { 
    enabled = true 
    file    = "/var/lib/watchdogd.state" 
} 
 
### Checkers/Monitors ################################################## 
# 
# Script or command to run instead of reboot when a monitor plugin 
# reaches any of its critical or warning level.  Setting this will 
# disable the built-in reboot on critical, it is therefore up to the 
# script to perform reboot, if needed.  The script is called as: 
# 
#    script.sh {filenr, loadavg, meminfo} {crit, warn} VALUE 
# 
#script = "/path/to/reboot-action.sh" 
 
# Monitors file descriptor leaks based on /proc/sys/fs/file-nr 
filenr { 
    enabled  = true 
    interval = 300 
    logmark  = false 
    warning  = 0.9 
    critical = 0.95 
#    script = "/path/to/alt-reboot-action.sh" 
} 
 
# Monitors load average based on sysinfo() from /proc/loadavg 
# The level is composed from the average of the 1 and 5 min marks. 
loadavg { 
    enabled  = true 
    interval = 300 
    logmark  = false 
    warning  = 1.0 
    critical = 2.0 
#    script = "/path/to/alt-reboot-action.sh" 
} 
 
# Monitors free RAM based on data from /proc/meminfo 
meminfo { 
    enabled  = true 
    interval = 300 
    logmark  = false 
    warning  = 0.9 
    critical = 0.95 
#    script = "/path/to/alt-reboot-action.sh" 
} 
 
# Generic site-specific script 
generic { 
    enabled  = true 
    interval = 60 
    timeout = 10 
    warning = 10 
    critical = 100 
    monitor-script = "/path/to/monitor-script.sh" 
#    script = "/path/to/alt-reboot-action.sh" 
}

watchdogd(8), watchdogctl(1)

watchdogd(8) is an improved version of the original, created by Michele d'Amico and adapted to uClinux-dist by Mike Frysinger. It is maintained by Joachim Wiberg at GitHub.
April 19, 2021 Debian