watchdogd - Advanced system & process monitor daemon
watchdogd [-hnsVx] [-f FILE] [-l LEVEL] [-t SEC] [-T SEC] [/dev/watchdogN]
watchdogd is an advanced system and process supervisor daemon, primarily
intended for embedded Linux and server systems. It can monitor critical
system resources, supervise the heartbeat of processes, record any deadline
transgressions, and initiate a controlled reset if needed, all while taking
care of "kicking" one or more watchdog device nodes.
Available system monitors, see watchdogd.conf(5) for details:
- File descriptor leaks
- File system usage
- Generic script
- Load average
- Memory leaks
- Process live locks
- Reset counter, e.g., for snmpEngineBoots (RFC 2574)
- Temperature
When the system starts up, watchdogd determines the reset cause by querying
the kernel. In case of a system reset, and not power loss, the reset reason
is already available in a file, stored by watchdogd before the reset. This
reset reason can then be used by an operator or network management system
(NMS) to put the system in an operational safe state, or a non-operational
safe state. Use watchdogctl(1) to query status and control watchdogd.
Most laptop and server motherboards, and virtually all embedded systems, are
today equipped with a watchdog timer (WDT). It is basically a small timer
connected to the reset circuitry so that it can reset the board when the
timer expires. It is up to the software to ensure it never does.
The Linux kernel provides a common userspace interface, /dev/watchdog,
created automatically when the appropriate driver module is loaded. If your
board does not have a WDT, the kernel provides a "softdog" module, which may
be good enough.
The idea is to have a process in userspace that runs in the background of your
system, with the sole purpose of making sure the HW timer never expires by
"kicking" it periodically. In case of system overload, when there is
no more CPU time for the process to run, it fails to "kick" the
kernel WDT driver, which in turn causes the WDT to reset the system.
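The kick loop described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how watchdogd itself is implemented. It relies on the Linux watchdog interface treating any write to the device node as a kick, and on the "magic close" convention where writing the character 'V' just before closing asks the driver to stop the timer. The function names kick_loop, disarm, and run_on_device are made up for this example.

```python
import time


def kick_loop(dev, interval, kicks, sleep=time.sleep):
    """Kick the watchdog `kicks` times, pausing `interval` seconds in between.

    `dev` is any writable binary file object, normally /dev/watchdog opened
    unbuffered. Each write restarts the hardware countdown.
    """
    for n in range(kicks):
        dev.write(b"\0")          # any byte counts as a kick
        if n < kicks - 1:
            sleep(interval)       # must stay well below the WDT timeout


def disarm(dev):
    """Write the "magic close" character ahead of closing the device.

    Drivers that support magic close stop the timer when 'V' is the last
    thing written before close; other drivers keep counting.
    """
    dev.write(b"V")


def run_on_device(path="/dev/watchdog"):
    """Example wiring for a real device. Needs root and a loaded WDT driver.

    Not called here: once the device is open, the board resets if this
    process stalls for longer than the driver's timeout.
    """
    with open(path, "wb", buffering=0) as wdt:
        kick_loop(wdt, interval=10, kicks=6)
        disarm(wdt)
```

Passing the sleep function as a parameter keeps the loop easy to exercise against an in-memory buffer instead of real hardware.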
Without any arguments, watchdogd opens the /dev/watchdog WDT device node,
forks to the background, tries to set a 20 sec WDT timeout, and then kicks
the WDT every 10 sec. See OPERATION for more information.
watchdogd follows the usual UNIX command line syntax, with long options
starting with two dashes (`--'). The options are as follows:
-f, --config FILE
- Use FILE for daemon configuration. Default: /etc/watchdogd.conf
-h, --help
- Show summary of command line options and exit.
-l, --loglevel LEVEL
- Set log level: none, err, info, notice, debug.
-n, --foreground
- Start in foreground. Required when started by systemd or Finit. The
default is to daemonize and run in the background.
-s, --syslog
- Use syslog(3) for log messages, warnings and error conditions. This is the
default when running in the background. When running in the foreground,
see -n, log messages are printed to stderr.
-t, --interval SEC
- HW watchdog (WDT) kick interval, in seconds, default: 10
-T, --timeout SEC
- HW watchdog timer (WDT) timeout, in seconds, default: 20
-V, --version
- Show program version and exit.
-x, --safe-exit
- Disable HW watchdog (WDT) on orderly exit from watchdogd. Not supported in
all WDT drivers due to HW limitations. Some drivers emulate support by
keeping an in-kernel thread to continue kicking the WDT. Make sure to try
it first, or verify the WDT driver source code.
watchdogd -T 120 -t 30 /dev/watchdog2
By default, watchdogd forks off a daemon in the background, opens the
/dev/watchdog device, attempts to set the default WDT timeout to 20 seconds,
and then enters its main loop where it kicks the watchdog every 10 seconds.
If a WDT device driver does not support setting the timeout, watchdogd
attempts to query the actual (possibly hard coded) watchdog timeout and then
uses half that time as the kick interval.
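The fallback described above could look roughly like the following sketch. It is not taken from the watchdogd sources. The callables set_timeout and get_timeout are hypothetical stand-ins for whatever talks to the driver (on Linux that would typically be the WDIOC_SETTIMEOUT and WDIOC_GETTIMEOUT ioctls), and the sketch assumes a driver that cannot change its timeout reports this as an OSError.

```python
def choose_interval(set_timeout, get_timeout, timeout=20, interval=10):
    """Return the (timeout, interval) pair to use, both in seconds.

    set_timeout(sec) and get_timeout() are hypothetical wrappers around the
    WDT driver. set_timeout is assumed to raise OSError when the driver does
    not support changing its timeout.
    """
    try:
        set_timeout(timeout)
        return timeout, interval
    except OSError:
        actual = get_timeout()       # possibly hard coded in the driver
        return actual, actual / 2    # kick at half the real timeout
```

For example, with a driver fixed at 60 seconds the function yields a 30 second kick interval instead of the requested 10.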
When watchdogd backgrounds itself, syslog is implicitly used for all
informational and debug messages. If the user requests to run the daemon in
the foreground, watchdogd logs to stderr, unless the user gives the --syslog
option to force use of syslog.
See watchdogd.conf(5) for all available settings, and the command line tool
watchdogctl(1) to enable more features, query status, and control operation.
watchdogd responds to the following signals:
TERM
- Safe exit if started with the -x flag, otherwise same as PWR.
INT
- Same as TERM
PWR
- Force a system reboot. Systems with Finit use this to reboot.
QUIT
- Same as TERM
HUP
- Reload configuration file
USR1
- Ignored, was used in an earlier version.
USR2
- Same as USR1
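As an illustration of how another process might deliver one of these signals, the sketch below reads the daemon's PID file and calls os.kill. It assumes the file /run/watchdogd/pid holds the PID as plain decimal text, which is the usual PID file convention but is not spelled out here. The helper name signal_watchdogd is made up for the example.

```python
import os
import signal


def signal_watchdogd(sig, pid_file="/run/watchdogd/pid"):
    """Send `sig` to the process named in `pid_file` and return its PID.

    Assumes the PID file contains a single decimal PID. Raises
    FileNotFoundError if the daemon has not created the file yet.
    """
    with open(pid_file) as f:
        pid = int(f.read().strip())
    os.kill(pid, sig)
    return pid
```

Calling signal_watchdogd(signal.SIGHUP) would then ask the daemon to reload its configuration file, and passing 0 as the signal only checks that the process exists.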
/etc/watchdogd.conf
- Daemon configuration file. Read once when starting up and on SIGHUP or
reload command from watchdogctl(1).
/var/lib/misc/watchdogd.state
- State saved before a reboot, recording the reason for the upcoming (re)boot.
Do not rely on the contents of this file; it is used by watchdogd to
maintain state across boots. If you want the status and reset reason of
the last boot, read /run/watchdogd/status instead, or preferably, use
watchdogctl(1).
/run/watchdogd/pid
- For convenience to other processes when sending signals. Also a useful
synchronization point, because the PID file is only created when watchdogd
is ready to receive signals and register processes with the process
supervisor API. Touched as a response to SIGHUP or reload command.
/run/watchdogd/status
- Current status, in JSON format. Contains the kernel WDT reset cause, the
watchdogd timeout and period, and the reset reason watchdogd determined
for this boot. Please note, the output format has changed to JSON since
v4.0. It now shows all configured devices and their status, including
capability flags.
/run/watchdogd/sock
- UNIX domain socket used by libwdog and watchdogctl to connect to
watchdogd.
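Since the status file is JSON, it can be read with any JSON parser. The sketch below loads it in Python and lists the top-level field names. The field names themselves are not documented here, so the code deliberately makes no assumption about them. The helper names read_status and top_level_keys are made up for the example.

```python
import json


def read_status(path="/run/watchdogd/status"):
    """Parse the status file and return it as Python data."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def top_level_keys(status):
    """Return the sorted top-level field names, or [] if not a JSON object."""
    return sorted(status) if isinstance(status, dict) else []
```

Inspecting the keys first is a cautious way to explore the file, since its layout changed in v4.0 and may differ between versions.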
watchdogctl(1)
watchdogd.conf(5)
watchdogd is an improved version of the original, created by Michele d'Amico
and adapted to uClinux-dist by Mike Frysinger. It is maintained by Joachim
Wiberg on GitHub.