WATCHDOGD(5) File Formats Manual (smm) WATCHDOGD(5)

NAME

watchdogd.conf - watchdogd configuration file

DESCRIPTION

The default watchdogd(8) use-case does not require a configuration file. However, enabling a health monitor plugin or the process supervisor is done using /etc/watchdogd.conf.
Available monitor plugins are:
supervisor
Process supervisor, monitors the heartbeat of processes
filenr
File descriptor monitor, also covers sockets and other descriptor-based resources
loadavg
CPU load average monitor
meminfo
Memory usage monitor
generic
Generic script monitor, acts on the exit status of a site-specific script

SYNTAX

This file is a standard UNIX .conf file with sub-sections and '=' for assignment. The '#' character starts a comment that runs to the end of the line, and the '\' character can be used as an escape character. Whitespace is ignored, unless inside a string.
Warning: do not set the below WDT timeout and kick interval too low. The daemon (usually) runs as a regular ‘SCHED_OTHER’ background task and the monitor plugins (as well as your other services) need CPU time as well.
timeout = SEC
The WDT timeout before reset. Default: 20 sec.
interval = SEC
The kick interval, i.e. how often watchdogd(8) should kick (reset) the WDT. Default: 10 sec.
safe-exit = true | false
With safe-exit enabled (true) the daemon will ask the driver to disable the WDT before exiting on SIGINT. However, some WDT drivers (or HW) may not support this. Default: disabled
script = /path/to/reboot-action.sh
Script or command to run instead of reboot when a monitor plugin reaches its warning or critical level. Setting this disables the default reboot action on critical, so it is up to the script to perform the reboot, if needed. The script is called as:
script.sh {filenr, loadavg, meminfo} {crit, warn} VALUE
    
Health monitor plugins also have their own local script setting.
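The calling convention above can be illustrated with a minimal sketch of a reboot-action script. The function name decide_action, the log tag, and the policy (log on warn, reboot on crit) are assumptions made for illustration only, and logger(1) and reboot(8) are assumed to be available on the target:

```shell
#!/bin/sh
# Hypothetical reboot-action script for the global "script" setting.
# Arguments follow the man page: MONITOR LEVEL VALUE

# Decide what to do for one event and print the chosen action.
# The policy below is an assumption, adapt it to the site.
decide_action() {
    monitor=$1
    level=$2
    value=$3

    case "$level" in
        warn) echo "log: $monitor warning at $value" ;;
        crit) echo "reboot: $monitor critical at $value" ;;
        *)    echo "ignore: unknown level '$level'" ;;
    esac
}

# Only act when called with the three documented arguments.
if [ "$#" -eq 3 ]; then
    action=$(decide_action "$@")
    logger -t reboot-action "$action"
    case "$action" in
        # The built-in reboot is disabled once a script is set,
        # so the script has to reboot on its own.
        reboot:*) reboot ;;
    esac
fi
```

Keeping the decision in a function separate from the side effects makes it possible to dry-run the policy by hand, e.g. by sourcing the file and calling decide_action filenr crit 0.97.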
reset-reason {}
This section controls the reset reason, including the reset counter. By default this is disabled, since not all systems allow writing to disk, e.g. embedded systems using MTD devices with a limited number of write cycles.
Another backend can be implemented and linked to the daemon, but make sure to pass ‘--disable-rcfile’ to the configure script first.
enabled = true | false
Enable or disable storing reset cause, default: disabled
file = /var/lib/watchdogd.state
The default file setting is a non-volatile path, according to the FHS. It can be changed to another location, but make sure that location is writable first.
Note: This section was previously called reset-cause, which is deprecated and may be removed in a future release.

Process Supervisor

supervisor {}
Instrumented processes can have their main loop supervised. Processes subscribe to this service using the libwdog API, see the docs for more on this. When enabled, watchdogd switches to ‘SCHED_RR’ with elevated realtime priority. When disabled, it runs as a regular ‘SCHED_OTHER’ process.
enabled = true | false
Enable or disable supervisor, default: disabled
priority = NUM
The realtime priority. Default: 98
script = /path/to/script.sh
When a supervised process fails to meet its deadline the supervisor by default performs an unconditional reset, saving the reset cause first. However, if a script is provided in this section it will be called instead:
script.sh supervisor CAUSE PID LABEL
        
The CAUSE value is documented in watchdogctl(1).
The LABEL can be any free form string the supervised process used when registering with the supervisor, hence it is given as the last argument to the script.
The return value of the script determines how the system continues to operate. POSIX OK (0) means the script has handled the situation in some manner, and watchdogd stops supervising the offending process. A non-zero return value means the script has either failed to handle the situation or prefers to delegate to watchdogd, which then saves the reset cause and performs the actual system reset.
The global script setting does not apply to this section. However, the same script can be used, due to the unique first argument.
IMPORTANT: Calling watchdogctl(1) from the script with the fail command will cause an infinite loop. It is strongly advised to return non-zero from the script instead.
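The return value contract described above can be sketched as a small supervisor script. The function name handle_supervisor and the label stats-collector are hypothetical, and the sketch assumes LABEL arrives as a single argument:

```shell
#!/bin/sh
# Hypothetical supervisor script, called as:
#   script.sh supervisor CAUSE PID LABEL

# Return 0 to tell watchdogd the failure has been handled, non-zero
# to hand the reset back to watchdogd.
handle_supervisor() {
    [ "$1" = "supervisor" ] || return 2   # not a supervisor event
    cause=$2
    pid=$3
    label=$4

    case "$label" in
        stats-collector)
            # Hypothetical non-essential process: carry on without it.
            echo "ignoring failure of $label (pid $pid, cause $cause)"
            return 0
            ;;
        *)
            # Anything else: let watchdogd save the cause and reset.
            echo "delegating reset for $label (pid $pid, cause $cause)"
            return 1
            ;;
    esac
}

# The function's return value becomes the exit status of the script.
if [ "$#" -gt 0 ]; then
    handle_supervisor "$@"
fi
```

Note that the sketch returns non-zero to request a reset and never calls watchdogctl(1) fail, in line with the warning above.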

File Descriptor Monitor

filenr {}
Monitors file descriptor leaks based on ‘/proc/sys/fs/file-nr’.
interval = SEC
Poll interval, default: 300 sec
logmark = true | false
Log current stats every poll interval. Default: disabled
warning = LEVEL
High watermark level, alert sent to log.
critical = LEVEL
Critical watermark level, alert sent to log, followed by reboot or script action.
script = /path/to/reboot-action.sh
Optional script to run instead of reboot if the critical watermark level is reached. If omitted, the global ‘script’ action is used. The script is called the same way as the global script, with the same arguments.
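To help pick values for warning and critical, the current level can be estimated by hand. The sketch below assumes the level is the fraction of file handles in use, (allocated - unused) / maximum, taken from the three fields of ‘/proc/sys/fs/file-nr’. That formula is an assumption and not confirmed by this manual, and filenr_level is a hypothetical helper name:

```shell
#!/bin/sh
# Estimate a filenr level from a /proc/sys/fs/file-nr style line:
#   ALLOCATED UNUSED MAXIMUM
# Assumption: level = (allocated - unused) / maximum
filenr_level() {
    echo "$1" | LC_ALL=C awk '$3 > 0 { printf "%.2f\n", ($1 - $2) / $3 }'
}
```

On a running system it could be used as filenr_level "$(cat /proc/sys/fs/file-nr)" and the result compared with the configured warning and critical values.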

CPU Load Average Monitor

loadavg {}
Monitors load average based on sysinfo(2) from ‘/proc/loadavg’. The trigger level for warning and critical watermarks is composed from the average of the 1 and 5 min marks.
Note: load average is a blunt instrument and highly use-case dependent. Peak loads of 16.00 on an 8 core system may be responsive and still useful but 2.00 on a 2 core system may be completely bogged down. Read up on the subject and test your system before enabling the critical level.
interval = SEC
Poll interval, default: 300 sec
logmark = true | false
Log current stats every poll interval. Default: disabled
warning = LEVEL
High watermark level, alert sent to log.
critical = LEVEL
Critical watermark level, alert sent to log, followed by reboot or script action.
script = /path/to/reboot-action.sh
Optional script to run instead of reboot if the critical watermark level is reached. If omitted, the global ‘script’ action is used. The script is called the same way as the global script, with the same arguments.
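The trigger level described for this monitor, the average of the 1 and 5 min marks, can be reproduced by hand when testing a system before enabling the critical level. This is a minimal sketch that parses a ‘/proc/loadavg’ style line; loadavg_level is a hypothetical helper name and any further scaling done by the daemon is not covered:

```shell
#!/bin/sh
# Average of the 1 and 5 minute load marks, taken from the first two
# fields of a /proc/loadavg style line.
loadavg_level() {
    echo "$1" | LC_ALL=C awk '{ printf "%.2f\n", ($1 + $2) / 2 }'
}
```

For example, loadavg_level "$(cat /proc/loadavg)" prints the value to compare with warning and critical.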

Memory Usage Monitor

meminfo {}
Monitors free RAM based on data from ‘/proc/meminfo’.
interval = SEC
Poll interval, default: 300 sec
logmark = true | false
Log current stats every poll interval. Default: disabled
warning = LEVEL
High watermark level, alert sent to log.
critical = LEVEL
Critical watermark level, alert sent to log, followed by reboot or script action.
script = /path/to/reboot-action.sh
Optional script to run instead of reboot if the critical watermark level is reached. If omitted, the global ‘script’ action is used. The script is called the same way as the global script, with the same arguments.

Generic Script Monitor

generic {}
Monitor a generic script. Trigger warning and critical actions based on the exit code of the script.
interval = SEC
How often to run the monitor-script, default: 300 sec
timeout = SEC
Maximum number of seconds the monitor-script is allowed to run, default: 300 sec
warning = VAL
High watermark level, alert sent to log if the exit status from monitor-script is greater than or equal to this value.
critical = VAL
Critical watermark level, alert sent to log, followed by reboot or script action if the monitor-script exit status is greater than or equal to this value.
monitor-script = /path/to/generic-script.sh
Monitor script to run every interval seconds.
script = /path/to/reboot-action.sh
Optional script to run instead of reboot if the critical watermark level is reached. If omitted, the global ‘script’ action is used. The script is called the same way as the global script, with the same arguments.
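Since this monitor acts on the exit status of monitor-script, a site-specific check only has to map its measurement to a number. The following is a minimal sketch, not a reference implementation: it reports the number of entries in a spool directory, and the path /var/spool/example-app as well as all function names are hypothetical:

```shell
#!/bin/sh
# Hypothetical monitor-script for the generic {} section.  Its exit
# status is what gets compared against warning and critical.

# Clamp a non-negative integer to 0..255, the range of an exit status.
clamp_status() {
    if [ "$1" -gt 255 ]; then echo 255; else echo "$1"; fi
}

# Count entries, including dot files, in a directory.  A missing
# directory counts as empty.
count_entries() {
    ls -A "$1" 2>/dev/null | wc -l | tr -d '[:space:]'
}

# Return the clamped entry count as the status.
monitor_status() {
    return "$(clamp_status "$(count_entries "$1")")"
}

# Last command, so its status becomes the exit status of the script.
monitor_status /var/spool/example-app
```

With warning = 10 and critical = 100, a backlog of 10 or more entries would be logged as a warning and 100 or more would trigger the critical action.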

EXAMPLE

### /etc/watchdogd.conf 
timeout   = 20 
interval  = 10 
safe-exit = false 
 
supervisor { 
    enabled  = false 
    priority = 98 
} 
 
reset-reason { 
    enabled = false 
#    file    = "/var/lib/watchdogd.state" 
} 
 
### Checkers/Monitors ################################################## 
# 
# Script or command to run instead of reboot when a monitor plugin 
# reaches any of its critical or warning level.  Setting this will 
# disable the built-in reboot on critical, it is therefore up to the 
# script to perform reboot, if needed.  The script is called as: 
# 
#    script.sh {filenr, loadavg, meminfo} {crit, warn} VALUE 
# 
#script = "/path/to/reboot-action.sh" 
 
# Monitors file descriptor leaks based on /proc/sys/fs/file-nr 
filenr { 
    interval = 300 
    logmark  = false 
    warning  = 0.9 
    critical = 0.95 
#    script = "/path/to/alt-reboot-action.sh" 
} 
 
# Monitors load average based on sysinfo() from /proc/loadavg 
# The level is composed from the average of the 1 and 5 min marks. 
loadavg { 
    interval = 300 
    logmark  = false 
    warning  = 1.0 
    critical = 2.0 
#    script = "/path/to/alt-reboot-action.sh" 
} 
 
# Monitors free RAM based on data from /proc/meminfo 
meminfo { 
    interval = 300 
    logmark  = false 
    warning  = 0.9 
    critical = 0.95 
#    script = "/path/to/alt-reboot-action.sh" 
} 
 
# Generic site-specific script 
generic { 
    interval = 60 
    timeout = 10 
    warning = 10 
    critical = 100 
    monitor-script = "/path/to/monitor-script.sh" 
#    script = "/path/to/alt-reboot-action.sh" 
}

SEE ALSO

watchdogd(8), watchdogctl(1)

AUTHORS

watchdogd(8) is an improved version of the original, created by Michele d'Amico and adapted to uClinux-dist by Mike Frysinger. It is maintained by Joachim Nilsson at GitHub.
January 10, 2020 Debian