NAME

watchdogd - Advanced system & process monitor daemon

SYNOPSIS

watchdogd [-hnsVx] [-f FILE] [-l LEVEL] [-t SEC] [-T SEC] [/dev/watchdogN]

DESCRIPTION

watchdogd is an advanced system and process supervisor daemon, primarily intended for embedded Linux and server systems. It can monitor critical system resources, supervise the heartbeat of processes, record any deadline transgressions, and initiate a controlled reset if needed, all while "kicking" one or more watchdog device nodes.

Available system monitors (see watchdogd.conf(5) for details):

File descriptor leaks
File system usage
Generic script
Load average
Memory leaks
Process live locks
Reset counter, e.g., for snmpEngineBoots (RFC 2574)
Temperature
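Two of the monitors above boil down to sampling a value and comparing it against thresholds. The following C sketch shows how a load-average check and a file system usage check could be written with the standard getloadavg(3) and statvfs(3) calls. The function names and the warning/critical levels are hypothetical; the real settings live in watchdogd.conf(5), and this is not watchdogd's own code.

```c
#define _DEFAULT_SOURCE           /* exposes getloadavg() on glibc */
#include <assert.h>
#include <stdlib.h>
#include <sys/statvfs.h>

/* Hypothetical severity levels for a sampled value. */
enum level { LEVEL_OK, LEVEL_WARNING, LEVEL_CRITICAL };

/* Classify a sample against warning and critical thresholds. */
enum level classify(double value, double warning, double critical)
{
    assert(warning <= critical);
    if (value >= critical)
        return LEVEL_CRITICAL;
    if (value >= warning)
        return LEVEL_WARNING;
    return LEVEL_OK;
}

/* 1-minute load average, or -1.0 if it cannot be read. */
double load_1min(void)
{
    double avg[1];

    if (getloadavg(avg, 1) != 1)
        return -1.0;
    return avg[0];
}

/* Fraction (0..1) of the file system holding 'path' that is in use,
 * computed like df(1): used / (used + available to non-root users).
 * Returns -1.0 on error. */
double fs_usage(const char *path)
{
    struct statvfs st;
    double used, avail;

    if (statvfs(path, &st) != 0)
        return -1.0;
    used  = (double)(st.f_blocks - st.f_bfree);
    avail = (double)st.f_bavail;
    if (used + avail <= 0.0)
        return -1.0;
    return used / (used + avail);
}
```

A monitor would then call, for example, classify(load_1min(), warning, critical) once per polling period and act on the result.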

When the system starts up, watchdogd determines the reset cause by querying the kernel. If the system was reset, rather than losing power, the reset reason is also available in a file that watchdogd stored before the reset. An operator or network management system (NMS) can then use this reset reason to put the system in an operational safe state or a non-operational safe state. Use watchdogctl(1) to query status and control watchdogd.
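Assuming the kernel query mentioned above is the standard Linux watchdog ioctl WDIOC_GETBOOTSTATUS (an assumption; this page does not name the call), decoding the reset cause could look like the sketch below. The WDIOF_* flags come from <linux/watchdog.h>; the reason strings and the function names are made up for illustration.

```c
#include <assert.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/watchdog.h>

/* Map WDIOF_* boot-status bits to a short, hypothetical reason string.
 * The first matching bit wins. */
const char *bootstatus_reason(int status)
{
    if (status & WDIOF_CARDRESET)
        return "watchdog reset";
    if (status & WDIOF_OVERHEAT)
        return "overheat";
    if (status & WDIOF_FANFAULT)
        return "fan fault";
    if (status & WDIOF_POWERUNDER)
        return "power under";
    if (status & WDIOF_POWEROVER)
        return "power over";
    if (status & (WDIOF_EXTERN1 | WDIOF_EXTERN2))
        return "external";
    return "none reported";
}

/* Ask the driver behind an open WDT fd for its boot status.
 * Returns the WDIOF_* bit mask, or -1 on error. */
int wdt_bootstatus(int fd)
{
    int status = 0;

    if (ioctl(fd, WDIOC_GETBOOTSTATUS, &status) < 0)
        return -1;
    return status;
}
```

Not every driver reports boot status, so a zero mask only means the driver recorded no cause, which is one reason a daemon may also keep its own record across boots.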

WATCHDOG

Most laptop and server motherboards today, and virtually all embedded systems, are equipped with a watchdog timer (WDT). It is basically a small timer connected to the reset circuitry, so that it can reset the board when the timer expires. It is up to the software to ensure that it never does.

The Linux kernel provides a common userspace interface, /dev/watchdog, which is created automatically when the appropriate driver module is loaded. If your board does not have a WDT, the kernel provides a "softdog" module, which may be good enough.

The idea is to have a userspace process running in the background whose sole purpose is to keep the HW timer from expiring by "kicking" it periodically. If the system is overloaded and the process no longer gets any CPU time, it fails to kick the kernel WDT driver, the timer expires, and the WDT resets the system.
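The kick cycle described above maps onto the Linux watchdog ioctl interface in <linux/watchdog.h>. A minimal sketch, assuming a driver that accepts WDIOC_SETTIMEOUT and WDIOC_KEEPALIVE; wdt_open(), wdt_kick() and wdt_loop() are hypothetical helper names, not part of watchdogd.

```c
#include <assert.h>
#include <fcntl.h>
#include <signal.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/watchdog.h>

/* Open a WDT device and try to set its timeout in seconds.
 * Returns the fd, or -1 if the device cannot be opened. */
int wdt_open(const char *path, int timeout)
{
    int fd = open(path, O_WRONLY);

    if (fd < 0)
        return -1;

    /* Not all drivers support setting the timeout; ignore failure here. */
    (void)ioctl(fd, WDIOC_SETTIMEOUT, &timeout);
    return fd;
}

/* One "kick": tell the driver to restart its countdown. */
int wdt_kick(int fd)
{
    return ioctl(fd, WDIOC_KEEPALIVE, 0);
}

/* Kick the WDT every 'interval' seconds until *running becomes 0,
 * e.g. when a signal handler clears it. */
void wdt_loop(int fd, int interval, volatile sig_atomic_t *running)
{
    assert(interval > 0);
    while (*running) {
        wdt_kick(fd);
        sleep(interval);
    }
}
```

If the loop stalls, for example because the process is starved of CPU time, no WDIOC_KEEPALIVE reaches the driver and the hardware timer runs out.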

OPTIONS

Without any arguments, watchdogd opens the /dev/watchdog WDT device node, forks to the background, tries to set a 20 sec WDT timeout, and then kicks the WDT every 10 sec. See OPERATION for more information.

watchdogd follows the usual UNIX command line syntax, with long options starting with two dashes (`--'). The options are as follows:

-f, --config FILE: Use FILE for daemon configuration. Default: /etc/watchdogd.conf
-h, --help: Show summary of command line options and exit.
-l, --loglevel LEVEL: Set log level: none, err, info, notice, debug.
-n, --foreground: Run in the foreground. Required when started by systemd or Finit. The default is to daemonize and run in the background.
-s, --syslog: Use syslog(3) for log messages, warnings and error conditions. This is the default when running in the background. When running in the foreground (see -n), log messages are printed to stderr.
-t, --interval SEC: HW watchdog (WDT) kick interval, in seconds, default: 10
-T, --timeout SEC: HW watchdog timer (WDT) timeout, in seconds, default: 20
-V, --version: Show program version and exit.
-x, --safe-exit: Disable the HW watchdog (WDT) on orderly exit from watchdogd. Not all WDT drivers support this, due to HW limitations, and some drivers emulate support with an in-kernel thread that keeps kicking the WDT. Test this first, or check the WDT driver source code.
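On Linux, the usual way to stop a WDT on exit is the "magic close" feature: drivers that advertise WDIOF_MAGICCLOSE disable the timer if the character 'V' is written to the device just before it is closed. Whether watchdogd implements -x exactly this way is an assumption, and wdt_close() is a hypothetical helper.

```c
#include <assert.h>
#include <unistd.h>

/* Close the WDT fd.  With safe_exit set, first write the magic
 * character 'V'; drivers advertising WDIOF_MAGICCLOSE treat this as a
 * request to stop the timer, other drivers may keep counting down. */
int wdt_close(int fd, int safe_exit)
{
    if (safe_exit && write(fd, "V", 1) != 1) {
        close(fd);
        return -1;
    }
    return close(fd);
}
```

Without the magic character, closing the device leaves a magic-close driver running, which is what makes an unplanned exit of the daemon end in a reset.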

Example

watchdogd -T 120 -t 30 /dev/watchdog2
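The options above, including the example invocation, could be parsed with getopt_long(3) roughly as follows. This is a sketch, not watchdogd's actual parser: struct opts, parse_args(), to_sec() and the validation of SEC values are hypothetical, while the option letters, long names, log levels and defaults are taken from this page.

```c
#include <assert.h>
#include <getopt.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

struct opts {
    const char *config;   /* -f, --config */
    const char *loglevel; /* -l, --loglevel; NULL means daemon default */
    const char *device;   /* optional /dev/watchdogN argument */
    int foreground;       /* -n */
    int syslog;           /* -s */
    int safe_exit;        /* -x */
    int interval;         /* -t, kick interval in seconds */
    int timeout;          /* -T, WDT timeout in seconds */
};

/* Parse a positive number of seconds; -1 if invalid. */
static int to_sec(const char *s)
{
    char *end;
    long v = strtol(s, &end, 10);

    return (*end || v <= 0 || v > INT_MAX) ? -1 : (int)v;
}

/* Returns 0 on success, 1 for -h/-V (caller prints and exits), -1 on error. */
int parse_args(int argc, char *argv[], struct opts *o)
{
    static const struct option longopts[] = {
        { "config",     required_argument, NULL, 'f' },
        { "help",       no_argument,       NULL, 'h' },
        { "loglevel",   required_argument, NULL, 'l' },
        { "foreground", no_argument,       NULL, 'n' },
        { "syslog",     no_argument,       NULL, 's' },
        { "interval",   required_argument, NULL, 't' },
        { "timeout",    required_argument, NULL, 'T' },
        { "version",    no_argument,       NULL, 'V' },
        { "safe-exit",  no_argument,       NULL, 'x' },
        { NULL, 0, NULL, 0 }
    };
    static const char *levels[] = { "none", "err", "info", "notice", "debug" };
    int c;

    *o = (struct opts){ .config = "/etc/watchdogd.conf",
                        .device = "/dev/watchdog",
                        .interval = 10, .timeout = 20 };
    optind = 0; /* glibc and musl: fully reinitialize getopt */
    while ((c = getopt_long(argc, argv, "f:hl:nst:T:Vx", longopts, NULL)) != -1) {
        switch (c) {
        case 'f': o->config = optarg; break;
        case 'l':
            o->loglevel = NULL;
            for (size_t i = 0; i < sizeof(levels) / sizeof(levels[0]); i++)
                if (!strcmp(optarg, levels[i]))
                    o->loglevel = levels[i];
            if (!o->loglevel)
                return -1;
            break;
        case 'n': o->foreground = 1; break;
        case 's': o->syslog = 1; break;
        case 't': if ((o->interval = to_sec(optarg)) < 0) return -1; break;
        case 'T': if ((o->timeout = to_sec(optarg)) < 0) return -1; break;
        case 'x': o->safe_exit = 1; break;
        case 'h':
        case 'V': return 1;
        default:  return -1;
        }
    }
    if (optind < argc)
        o->device = argv[optind];
    return 0;
}
```

Setting optind to 0 instead of 1 is a glibc/musl convention rather than POSIX; it lets parse_args() be called more than once, for example from tests.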

OPERATION

By default, watchdogd forks off a daemon in the background, opens the /dev/watchdog device, attempts to set the default WDT timeout to 20 seconds, and then enters its main loop where it kicks the watchdog every 10 seconds.

If the WDT device driver does not support setting the timeout, watchdogd attempts to query the actual (possibly hard-coded) watchdog timeout and then uses half that time as the kick interval.
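The fallback described above can be sketched with three ioctls from <linux/watchdog.h>: WDIOC_GETSUPPORT to check for the WDIOF_SETTIMEOUT capability, WDIOC_SETTIMEOUT, and WDIOC_GETTIMEOUT. pick_interval() and wdt_setup() are hypothetical names, and the 1 second lower bound is an assumption added so that very short hard-coded timeouts still yield a usable interval.

```c
#include <assert.h>
#include <sys/ioctl.h>
#include <linux/watchdog.h>

/* Keep the requested interval if the timeout could be set, otherwise
 * kick at half the driver's actual timeout, but at least every second. */
int pick_interval(int timeout_set, int requested, int actual_timeout)
{
    if (timeout_set)
        return requested;
    return actual_timeout / 2 > 0 ? actual_timeout / 2 : 1;
}

/* Try to set 'timeout' on an open WDT fd and return the kick interval
 * to use, or -1 if the driver's timeout cannot even be queried. */
int wdt_setup(int fd, int timeout, int requested)
{
    struct watchdog_info info = { 0 };
    int actual = 0;

    if (ioctl(fd, WDIOC_GETSUPPORT, &info) == 0 &&
        (info.options & WDIOF_SETTIMEOUT) &&
        ioctl(fd, WDIOC_SETTIMEOUT, &timeout) == 0)
        return pick_interval(1, requested, timeout);

    if (ioctl(fd, WDIOC_GETTIMEOUT, &actual) < 0)
        return -1;
    return pick_interval(0, requested, actual);
}
```

Kicking at half the timeout leaves a full interval of slack, so a single late kick does not by itself trigger a reset.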

When watchdogd backgrounds itself, syslog is implicitly used for all informational and debug messages. When the daemon runs in the foreground, watchdogd logs to stderr, unless the user gives the --syslog option to force the use of syslog.
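The logging rule above, together with the -n and -s options, reduces to a single decision. A minimal sketch using syslog(3) and stderr; use_syslog() and logit() are hypothetical names, and the "watchdogd: " prefix on stderr is made up.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <syslog.h>

/* Syslog is used when running in the background, or when forced with -s. */
int use_syslog(int foreground, int force_syslog)
{
    return !foreground || force_syslog;
}

/* Format one message and send it to syslog or stderr. */
void logit(int foreground, int force_syslog, int prio, const char *fmt, ...)
{
    char buf[256];
    va_list ap;

    va_start(ap, fmt);
    vsnprintf(buf, sizeof(buf), fmt, ap);
    va_end(ap);

    if (use_syslog(foreground, force_syslog))
        syslog(prio, "%s", buf);
    else
        fprintf(stderr, "watchdogd: %s\n", buf);
}
```

Formatting into a buffer first and passing it to syslog(3) with a "%s" format avoids treating message contents as a format string.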

See watchdogd.conf(5) for all available settings, and the command line tool watchdogctl(1) to enable more features, query status, and control operation.

SIGNALS

watchdogd responds to the following signals:

TERM: Safe exit if started with the -x flag, otherwise same as PWR.
INT: Same as TERM
PWR: Force a system reboot. Systems with Finit use this to reboot.
QUIT: Same as TERM
HUP: Reload configuration file
USR1: Ignored, was used in an earlier version.
USR2: Same as USR1
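The signal table above could be implemented with sigaction(2) and a flag polled from the main loop, as in this sketch. The action names and helper functions are hypothetical; only the signal-to-behavior mapping comes from this page. SIGPWR is Linux-specific.

```c
#define _GNU_SOURCE               /* SIGPWR and sigaction() on glibc */
#include <assert.h>
#include <signal.h>
#include <string.h>

enum action { ACT_NONE, ACT_SAFE_EXIT, ACT_REBOOT, ACT_RELOAD };

/* Map a signal to an action; 'safe_exit' reflects the -x flag. */
enum action signal_action(int sig, int safe_exit)
{
    switch (sig) {
    case SIGTERM:
    case SIGINT:
    case SIGQUIT:
        return safe_exit ? ACT_SAFE_EXIT : ACT_REBOOT;
    case SIGPWR:
        return ACT_REBOOT;
    case SIGHUP:
        return ACT_RELOAD;
    default:                      /* USR1, USR2 and anything else */
        return ACT_NONE;
    }
}

/* Set by the handler, polled and acted upon by the main loop. */
volatile sig_atomic_t last_signal;

static void handler(int sig)
{
    last_signal = sig;
}

/* Catch TERM, INT, QUIT, PWR and HUP; ignore USR1 and USR2. */
int install_signals(void)
{
    const int sigs[] = { SIGTERM, SIGINT, SIGQUIT, SIGPWR, SIGHUP };
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = handler;
    sigemptyset(&sa.sa_mask);
    for (size_t i = 0; i < sizeof(sigs) / sizeof(sigs[0]); i++)
        if (sigaction(sigs[i], &sa, NULL) < 0)
            return -1;

    sa.sa_handler = SIG_IGN;
    if (sigaction(SIGUSR1, &sa, NULL) < 0 || sigaction(SIGUSR2, &sa, NULL) < 0)
        return -1;
    return 0;
}
```

Keeping the handler down to a single sig_atomic_t store means the actual work, such as reloading the configuration or rebooting, runs in normal context rather than inside the handler.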

FILES

/etc/watchdogd.conf: Daemon configuration file. Read once when starting up and on SIGHUP or reload command from watchdogctl(1).
/var/lib/misc/watchdogd.state: State saved before a reboot, records the reason for the upcoming (re)boot. Do not rely on the contents of this file; watchdogd uses it to maintain state across boots. For the status and reset reason of the last boot, read /run/watchdogd/status instead, or preferably use watchdogctl(1).
/run/watchdogd/pid: For convenience to other processes when sending signals. Also a useful synchronization point, because the PID file is only created when watchdogd is ready to receive signals and register processes with the process supervisor API. Touched in response to SIGHUP or a reload command from watchdogctl(1).
/run/watchdogd/status: Current status, in JSON format, containing the kernel WDT reset cause, the watchdogd timeout and period, and the reset reason watchdogd determined for this boot. Note: the output format changed to JSON in v4.0, and now shows all configured devices and their status, including capability flags.
/run/watchdogd/sock: UNIX domain socket used by libwdog and watchdogctl to connect to watchdogd
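Since /run/watchdogd/pid exists for processes that want to signal watchdogd, a client could read it and send SIGHUP to trigger a reload. This sketch assumes the file holds a single decimal PID; read_pidfile() and watchdogd_reload() are hypothetical, and watchdogctl(1) remains the preferred interface.

```c
#include <assert.h>
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>

/* Read a PID from 'path'; returns -1 if the file is missing or malformed. */
pid_t read_pidfile(const char *path)
{
    FILE *fp = fopen(path, "r");
    long pid = -1;

    if (!fp)
        return -1;
    if (fscanf(fp, "%ld", &pid) != 1 || pid <= 0)
        pid = -1;
    fclose(fp);
    return (pid_t)pid;
}

/* Ask a running watchdogd to reload its configuration file. */
int watchdogd_reload(const char *pidfile)
{
    pid_t pid = read_pidfile(pidfile);

    if (pid < 0)
        return -1;
    return kill(pid, SIGHUP);
}
```

Usage: watchdogd_reload("/run/watchdogd/pid"). Because the PID file only appears once watchdogd is ready, a missing file (a -1 return here) can also mean the daemon is still starting.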

AUTHORS

watchdogd is an improved version of the original, created by Michele d'Amico and adapted to uClinux-dist by Mike Frysinger. It is maintained by Joachim Wiberg on GitHub.