Linux: Monitoring Concepts and Configurations
Service Monitoring
Service Level Indicators (SLIs)
SLIs are specific metrics such as uptime, response time, or error rates. They are used to measure the performance of a service.
Service Level Objectives (SLOs)
SLOs are targets that the measured SLIs are expected to meet, such as maintaining 99.9 percent uptime.
Service Level Agreements (SLAs)
SLAs are formal promises to customers or stakeholders that outline the expected level of service and the consequences if expectations are not met.
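The relationship between a measured SLI and an SLO target comes down to simple arithmetic. The following is a minimal sketch, not a standard tool: the function names are made up for illustration, and the request counts in the usage lines are invented sample figures. Only the 99.9 percent target comes from the text above.

```shell
# SLI: availability as the percentage of requests that succeeded.
availability_percent() {   # usage: availability_percent TOTAL FAILED
    awk -v total="$1" -v failed="$2" \
        'BEGIN { printf "%.2f\n", (total - failed) * 100 / total }'
}

# SLO check: exit status 0 when the measured SLI meets or exceeds the target.
meets_slo() {              # usage: meets_slo MEASURED TARGET
    awk -v sli="$1" -v slo="$2" 'BEGIN { exit !(sli >= slo) }'
}

# Downtime in minutes that an uptime target still permits over a window of days.
allowed_downtime_minutes() {   # usage: allowed_downtime_minutes TARGET DAYS
    awk -v slo="$1" -v days="$2" \
        'BEGIN { printf "%.1f\n", days * 24 * 60 * (100 - slo) / 100 }'
}

availability_percent 100000 50      # prints 99.95
allowed_downtime_minutes 99.9 30    # prints 43.2
```

In other words, a 99.9 percent uptime objective leaves roughly 43 minutes of downtime in a 30-day window before the objective is missed.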
Network Monitoring
Network monitoring is the process of keeping track of devices like routers, switches, and servers to make sure everything is running properly.
SNMP - Simple Network Management Protocol
SNMP allows devices to report performance data using a structure called the MIB, or Management Information Base. The MIB acts as a built-in database that defines everything that can be monitored on a device, including CPU load, memory usage, and network interface status.
The MIB contains Object Identifiers (OIDs). An OID is a unique numeric identifier, written as a dotted sequence of numbers, that is used to locate and retrieve specific information.
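To make the idea of an OID concrete, here is a small sketch that maps three well-known object names from the standard MIB-2 system group to their numeric OIDs and queries one of them. The helper names oid_for and query_object are hypothetical, the community string "public" is an assumption, and the query needs the Net-SNMP snmpget tool plus a reachable SNMP agent.

```shell
# Look up the numeric OID for a few standard MIB-2 "system" objects.
oid_for() {   # usage: oid_for OBJECT_NAME
    case "$1" in
        sysDescr)  echo 1.3.6.1.2.1.1.1.0 ;;   # textual description of the device
        sysUpTime) echo 1.3.6.1.2.1.1.3.0 ;;   # time since the agent was last initialized
        sysName)   echo 1.3.6.1.2.1.1.5.0 ;;   # administratively assigned name
        *)         echo "unknown object: $1" >&2; return 1 ;;
    esac
}

# Ask the SNMP agent on a host for one value.
# Assumes SNMP v2c with the community string "public"; real devices often differ.
query_object() {   # usage: query_object HOST OBJECT_NAME
    oid=$(oid_for "$2") || return 1
    snmpget -v2c -c public "$1" "$oid"
}
```

For example, query_object localhost sysUpTime would request the value stored at 1.3.6.1.2.1.1.3.0 from a local agent, if one is running.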
SNMP Traps
SNMP traps are automatic alerts that a device sends when specific events occur, such as a hardware failure or a dropped network connection.
Agent-based vs Agentless Monitoring
Agent-based monitoring uses software installed on the monitored device to collect data. SNMP is an example of agent-based monitoring, because an SNMP agent runs on each monitored device.
Agentless monitoring collects data using existing remote access protocols, such as SSH on Linux systems, without requiring any additional software installation on the monitored devices. On Windows systems, interfaces like Windows Management Instrumentation (WMI) allow similar agentless access.
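As an illustration of the agentless approach, the sketch below assumes SSH is the remote access protocol and that key-based login is already set up. The function names and the SSH_CMD override are hypothetical, introduced only so the transport can be swapped out or stubbed when trying the example.

```shell
# Run one read-only command on a remote host over SSH.
# BatchMode=yes makes ssh fail instead of prompting for a password.
# SSH_CMD is a hypothetical override for the transport; it defaults to ssh.
remote_run() {   # usage: remote_run HOST COMMAND [ARGS...]
    host=$1
    shift
    "${SSH_CMD:-ssh}" -o BatchMode=yes -o ConnectTimeout=5 "$host" "$@"
}

# Collect a few basic metrics from a host without installing anything on it.
collect_metrics() {   # usage: collect_metrics HOST
    remote_run "$1" uptime
    remote_run "$1" df -P /
    remote_run "$1" free -m
}
```

Everything here relies on commands that already exist on a typical Linux host, which is the defining trait of agentless collection.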
Event-driven Data Collection
Health Checks
Health checks allow systems to automatically test whether a service is running and responding as expected.
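# check whether a web service returns a success response
# (-I requests headers only; -f makes curl exit non-zero on an HTTP error status)
curl -fI http://localhost
# check whether a systemd service is up and running
# (the SSH unit is named ssh on Debian-based systems and sshd on RHEL-based ones)
systemctl is-active ssh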
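For automated use, a health check usually needs to turn a raw response into a clear verdict. The following is one possible sketch built on the same two tools; the function names are made up for illustration, and treating 2xx and 3xx status codes as healthy is an assumption that a real check might tighten.

```shell
# Map an HTTP status code to a verdict: 2xx and 3xx count as healthy.
classify_status() {   # usage: classify_status HTTP_CODE
    case "$1" in
        2[0-9][0-9]|3[0-9][0-9]) echo healthy ;;
        *)                       echo unhealthy ;;
    esac
}

# Probe a URL and print its verdict.
# curl reports the code 000 when no HTTP response arrives at all.
check_http() {        # usage: check_http URL
    code=$(curl -s -o /dev/null --max-time 5 -w '%{http_code}' "$1")
    classify_status "$code"
}

# Report whether a systemd unit is active, based on the exit status of is-active.
check_unit() {        # usage: check_unit UNIT
    if systemctl is-active --quiet "$1"; then
        echo healthy
    else
        echo unhealthy
    fi
}
```

A scheduler such as cron or a systemd timer could call check_http http://localhost or check_unit ssh at a fixed interval and act on the printed verdict.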
Webhooks
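Webhooks are HTTP callbacks that one service sends to a URL provided by another service when an event occurs. They are often used for real-time integrations between services.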
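In practice, delivering a webhook usually means sending an HTTP POST request with a small JSON body to a URL that the receiving service provides. The sketch below shows one way to do that with curl. The function names and the payload fields (host, message) are hypothetical; a real receiver defines its own URL and payload format.

```shell
# Build a small JSON body. Values are inserted as-is, so this sketch assumes
# they contain no quotes or backslashes that would need escaping.
build_payload() {   # usage: build_payload HOST MESSAGE
    printf '{"host":"%s","message":"%s"}\n' "$1" "$2"
}

# POST the body to the receiving service's webhook URL.
# -f makes curl exit non-zero if the receiver answers with an HTTP error.
send_webhook() {    # usage: send_webhook URL HOST MESSAGE
    build_payload "$2" "$3" |
        curl -fsS -X POST -H 'Content-Type: application/json' --data @- "$1"
}
```

A call might look like send_webhook https://hooks.example.com/notify web01 "disk usage above 90%", where the URL is a placeholder.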
Log Aggregation
Log aggregation is the collection of logs from across the network and their storage in a central location.
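One common way to aggregate logs on Linux is to have each host forward its syslog messages to a central server. The sketch below assumes rsyslog is the logging daemon and uses its legacy forwarding syntax. The function names, the server name in the usage example, and the drop-in file name 90-forward.conf are arbitrary choices made for illustration.

```shell
# Print an rsyslog rule that forwards every message to a central server.
# "*.*" matches all facilities and priorities; "@@" selects TCP, "@" would select UDP.
forward_rule() {        # usage: forward_rule SERVER PORT
    printf '*.* @@%s:%s\n' "$1" "$2"
}

# Install the rule as a drop-in file and restart rsyslog (requires root).
enable_forwarding() {   # usage: enable_forwarding SERVER PORT
    forward_rule "$1" "$2" > /etc/rsyslog.d/90-forward.conf &&
        systemctl restart rsyslog
}
```

For example, forward_rule logs.example.com 514 prints the single configuration line that enable_forwarding would write.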
Event Management
Logging
Logging provides the raw data needed to understand what is happening across a system. Logs are typically stored in the directory /var/log/ and include files like syslog, auth.log, dmesg, and more.
SIEM
SIEM stands for Security Information and Event Management. A SIEM system collects and analyzes logs from across the network to help identify security threats, system issues, and unusual activity in real time.
Events
Events are generated when specific patterns or conditions are detected in the log data that indicate something noteworthy has happened.
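As a rough illustration of pattern-based event detection, the sketch below scans an SSH authentication log and emits an event when one source address accumulates too many failed logins. It assumes the usual OpenSSH "Failed password for ... from ADDRESS port ..." message format. The function name, the event name repeated_login_failure, and the threshold are hypothetical.

```shell
# Emit one EVENT line per source address whose failed logins reach the threshold.
detect_events() {   # usage: detect_events LOGFILE THRESHOLD
    awk -v limit="$2" '
        /Failed password/ {
            # The source address is the field that follows the word "from".
            for (i = 1; i < NF; i++)
                if ($i == "from") failures[$(i + 1)]++
        }
        END {
            for (ip in failures)
                if (failures[ip] >= limit)
                    printf "EVENT repeated_login_failure source=%s count=%d\n", ip, failures[ip]
        }
    ' "$1"
}
```

Running detect_events /var/log/auth.log 5 would report every address with five or more failed password attempts in that file. Individual log lines are only raw data; the event is the conclusion drawn from the pattern.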
Alerting and Notifications
Notifications
Notifications are how a Linux admin is informed when the system detects that something may require attention. They can be sent via email, text messages, desktop pop-ups, ticketing systems, or collaboration platforms.
Alerts
Alerts are the system's internal triggers that cause notifications to be sent.
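The split between an alert (the trigger) and a notification (the delivery) can be sketched as two separate functions. This is only an illustration: the function names and the message format are invented, and writing to standard error stands in for a real channel such as email or a ticketing system.

```shell
# Read the root filesystem's usage as a bare number (for example 42 for 42%).
root_disk_usage() {
    df -P / | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Alert rule: print a message only when the value exceeds the limit.
evaluate_alert() {    # usage: evaluate_alert METRIC VALUE LIMIT
    if [ "$2" -gt "$3" ]; then
        echo "ALERT: $1 is at $2%, above the $3% limit"
    fi
}

# Notification: deliver the message to a person.
# Standard error is a stand-in here for email, chat, or a ticketing system.
notify() {            # usage: notify MESSAGE
    echo "NOTIFY admin: $1" >&2
}

# Tie the two together: the alert fires first, then the notification goes out.
check_root_disk() {   # usage: check_root_disk LIMIT
    message=$(evaluate_alert "root filesystem usage" "$(root_disk_usage)" "$1")
    if [ -n "$message" ]; then
        notify "$message"
    fi
}
```

Calling check_root_disk 90 stays silent while usage is at or below 90 percent and produces a notification once it rises above that limit.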