Problem
When a LaunchAgent fails silently (e.g., script path breaks after a repo rename), there's no notification — the job just stops running and nobody notices for days. Currently the only visibility is manually checking log files in ~/.local/log/.
Proposed Solution: Hybrid Approach
1. Wrapper script with failure notification (scripts/run-with-notify.sh)
- Wraps all scheduled jobs; on non-zero exit, sends a macOS notification via osascript
- Plist templates updated to run jobs through the wrapper
- Handles the common case: script ran but failed
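A minimal sketch of what scripts/run-with-notify.sh could look like. The calling convention (job label first, then the command), the notification text, and the `NOTIFY_CMD` override are assumptions made for illustration and are not part of the proposal.

```shell
#!/usr/bin/env bash
# Sketch of scripts/run-with-notify.sh.
# Assumed usage: run-with-notify.sh <job-label> <command> [args...]

# Send a notification. NOTIFY_CMD is a hypothetical override so the wrapper
# can be exercised on machines without osascript.
notify() {
    local title="$1" message="$2"
    if [ -n "${NOTIFY_CMD:-}" ]; then
        "$NOTIFY_CMD" "$title" "$message"
    elif command -v osascript >/dev/null 2>&1; then
        # Pass the strings as arguments so quotes in them cannot break the script
        osascript -e 'on run argv' \
                  -e 'display notification (item 2 of argv) with title (item 1 of argv)' \
                  -e 'end run' "$title" "$message"
    fi
}

# Run the command, notify on non-zero exit, and preserve the exit code
run_with_notify() {
    local label="$1"
    shift
    local status=0
    "$@" || status=$?
    if [ "$status" -ne 0 ]; then
        notify "LaunchAgent failed: $label" "exit code $status, see ~/.local/log/"
    fi
    return "$status"
}

# Act as a wrapper only when arguments are supplied
if [ "$#" -gt 0 ]; then
    run_with_notify "$@"
    exit $?
fi
```

A plist template would then list the wrapper, the job label, and the original script path as its program arguments, so launchd still sees the job's real exit code.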
2. Lightweight watchdog (scripts/launchd-watchdog.sh)
- Runs every 4 hours via its own LaunchAgent
- Checks if each monitored job's log file was modified in the last 36 hours
- Sends macOS notification for stale (never-ran) jobs
- Handles the edge case the wrapper can't: job never executed (broken path, plist unloaded)
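The freshness check could be sketched roughly as follows. It assumes one job label per line in the config file and a log file named `<label>.log` under the log directory; neither convention is specified in the proposal, and the `WATCHDOG_CONFIG` and `WATCHDOG_LOG_DIR` overrides are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch of scripts/launchd-watchdog.sh.
# Assumptions: one label per line in the config ('#' starts a comment),
# and each job writes to <log dir>/<label>.log.

CONFIG="${WATCHDOG_CONFIG:-$HOME/.config/ns-bootstrap/watchdog-jobs.conf}"
LOG_DIR="${WATCHDOG_LOG_DIR:-$HOME/.local/log}"
MAX_AGE_MINUTES=$((36 * 60))

# Succeeds when the file is missing or was last modified more than
# max_age minutes ago.
is_stale() {
    local file="$1" max_age="$2"
    [ -f "$file" ] || return 0
    # find prints the path only when the mtime is older than max_age minutes
    [ -n "$(find "$file" -mmin +"$max_age" 2>/dev/null)" ]
}

# Print every label from the config whose log is stale, in config order
stale_jobs() {
    local config="$1" log_dir="$2" label
    [ -f "$config" ] || return 0
    while IFS= read -r label || [ -n "$label" ]; do
        case "$label" in
            '' | '#'*) continue ;;
        esac
        if is_stale "$log_dir/$label.log" "$MAX_AGE_MINUTES"; then
            printf '%s\n' "$label"
        fi
    done < "$config"
    return 0
}

# One notification per stale job; falls back to stderr without osascript
stale_jobs "$CONFIG" "$LOG_DIR" | while IFS= read -r label; do
    if command -v osascript >/dev/null 2>&1; then
        osascript -e 'on run argv' \
                  -e 'display notification (item 1 of argv) with title "launchd watchdog"' \
                  -e 'end run' "$label has not written to its log in 36 hours"
    else
        echo "stale job: $label" >&2
    fi
done
```

Treating a missing log file as stale is a deliberate choice in this sketch: a job that has never run has no log at all, which is exactly the case the wrapper cannot report.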
Scope
- Monitor ns-bootstrap's own agents only (com.ns-bootstrap.update-daily, com.ns-bootstrap.update-interactive)
- Config file at ~/.config/ns-bootstrap/watchdog-jobs.conf allows adding labels later, but don't design a multi-project framework upfront
- macOS only initially; Ubuntu support can be deferred since systemd has built-in OnFailure= support
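As an illustration of the default config, bootstrap.sh could seed the file with the two in-scope labels. The helper name `write_default_watchdog_config` and the one-label-per-line format are assumptions; the proposal does not define the file format.

```shell
#!/usr/bin/env bash
# Hypothetical helper for install/bootstrap.sh.
# Assumed config format: one launchd label per line, '#' starts a comment.

write_default_watchdog_config() {
    local config="${1:-$HOME/.config/ns-bootstrap/watchdog-jobs.conf}"
    # Leave an existing file alone so labels added later survive re-runs
    [ -e "$config" ] && return 0
    mkdir -p "$(dirname "$config")"
    cat > "$config" <<'EOF'
# Labels monitored by launchd-watchdog.sh, one per line
com.ns-bootstrap.update-daily
com.ns-bootstrap.update-interactive
EOF
}
```

Calling `write_default_watchdog_config` with no argument writes to the path named in the scope above; passing a path is only meant for trying the helper out somewhere else.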
Files to Add/Modify
scripts/
├── run-with-notify.sh # NEW: wrapper with failure notification
├── launchd-watchdog.sh # NEW: checks jobs ran recently
├── launchd/
│ ├── com.ns-bootstrap.update-daily.plist.template # MODIFY: use wrapper
│ ├── com.ns-bootstrap.update-interactive.plist.template # MODIFY: use wrapper
│ └── com.ns-bootstrap.watchdog.plist.template # NEW: runs watchdog every 4h
install/
└── bootstrap.sh # MODIFY: generate default watchdog config
Why hybrid over pure watchdog?
- The wrapper catches 90% of failures (runtime errors) with near-zero complexity (~15 lines)
- The watchdog catches the remaining edge case ("job never ran") by checking log freshness, so no launchctl list parsing is needed
- A pure watchdog that only polls launchctl list has its own silent-failure problem (turtles all the way down)
Alternatives Considered
| Approach | Catches runtime failures | Catches "never ran" | Complexity |
| --- | --- | --- | --- |
| Watchdog only (polling) | Yes | Yes | Medium: extra plist, config file, launchctl parsing |
| Wrapper only | Yes | No | Low: ~15 lines |
| Hybrid (recommended) | Yes | Yes | Low: two simple scripts |