Monitoring

Linux: Monitoring Concepts and Configurations

Service Monitoring

Service Level Indicators (SLIs) are specific metrics, such as uptime, response time, or error rate, that are used to measure the performance of a service.

Service Level Objectives (SLOs) are targets set for those measurements, such as maintaining 99.9 percent uptime.

Service Level Agreements (SLAs) are formal promises to customers or stakeholders that outline the expected level of service and the consequences if expectations are not met.
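
To make the relationship between an SLI and an SLO concrete, the sketch below turns an availability target into an allowed-downtime budget and computes a measured availability from request counts. The function names and the sample numbers are illustrative assumptions, not part of any standard tool.

```shell
# Allowed downtime (in minutes) for an availability SLO over a window of days.
# Usage: error_budget_minutes <slo_percent> <window_days>
error_budget_minutes() {
    awk -v slo="$1" -v days="$2" \
        'BEGIN { printf "%.1f\n", (100 - slo) / 100 * days * 24 * 60 }'
}

# Measured availability SLI (percent) from good and total request counts.
# Usage: availability_percent <good_requests> <total_requests>
availability_percent() {
    awk -v good="$1" -v total="$2" \
        'BEGIN { if (total == 0) exit 1; printf "%.2f\n", good / total * 100 }'
}

error_budget_minutes 99.9 30        # prints 43.2
availability_percent 99950 100000   # prints 99.95
```

In this example a 99.9 percent SLO over 30 days leaves about 43 minutes of downtime, and a measured availability of 99.95 percent would meet that objective.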

Network Monitoring

Network monitoring is the process of keeping track of devices like routers, switches, and servers to make sure everything is running properly.

SNMP - Simple Network Management Protocol

SNMP allows devices to report performance data using a structure called MIB, or Management Information Base. The MIB acts as a built-in database that defines everything that can be monitored on a device, including CPU load, memory usage, and network interface status.

The MIB contains Object Identifiers (OIDs). An OID is a unique dotted sequence of numbers used to locate and retrieve a specific piece of information.

SNMP Traps are automatic alerts triggered by specific events like hardware failure or dropped network connections.
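
As a rough illustration of looking up a value by OID, the sketch below wraps the Net-SNMP `snmpget` command. It assumes the Net-SNMP tools are installed and that the device runs an SNMP v2c agent; the host address and the `public` community string are placeholders.

```shell
# Query one OID from an SNMP agent using Net-SNMP's snmpget.
# Usage: snmp_get <host> <community> <oid>
snmp_get() {
    snmpget -v2c -c "$2" "$1" "$3"
}

# sysUpTime.0 from the standard MIB-2 system group
SYS_UPTIME_OID="1.3.6.1.2.1.1.3.0"

# Example (placeholder host; needs an agent that accepts this community):
# snmp_get 192.0.2.10 public "$SYS_UPTIME_OID"
#
# Walk the whole MIB-2 system subtree instead of a single object:
# snmpwalk -v2c -c public 192.0.2.10 1.3.6.1.2.1.1
```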

Agent-based vs Agentless Monitoring

Agent-based monitoring uses software installed on the monitored device to collect the monitored information. SNMP is an example of agent-based monitoring: an SNMP agent running on the device answers queries from the monitoring system.

Agentless monitoring collects data using existing remote access protocols, such as SSH on Linux, without requiring any additional software installation on the monitored devices. On Windows systems, protocols like Windows Management Instrumentation (WMI) allow similar agentless access.
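
One common way to do agentless collection on Linux is to run read-only commands over SSH. The sketch below assumes key-based SSH access is already set up; the `monitor` account and the host name are hypothetical.

```shell
# Run a read-only check on a remote host over SSH; nothing is installed there.
# Usage: remote_check <user@host> <command...>
remote_check() {
    target="$1"
    shift
    # BatchMode avoids hanging on a password prompt;
    # ConnectTimeout bounds how long we wait for the connection.
    ssh -o BatchMode=yes -o ConnectTimeout=5 "$target" "$@"
}

# Example (hypothetical host):
# remote_check monitor@web01.example.com uptime
# remote_check monitor@web01.example.com df -h /
```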

Event-driven Data Collection

Health Checks

Health checks allow systems to automatically test whether a service is running and responding as expected.

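# check whether a web service returns a success response
# (-f makes curl exit non-zero on HTTP errors; -sS hides progress but keeps error messages)
curl -fsSI http://localhost

# check whether a systemd service is up and running
systemctl is-active ssh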
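
The two commands above can be combined into a small script that prints one status line per check and returns a non-zero exit code on failure. This is only a sketch: the function names, the 5-second timeout, and the output format are assumptions.

```shell
# Return 0 if the URL answers with an HTTP status below 400.
# Usage: check_http <url>
check_http() {
    # -f: fail on HTTP errors, -s: silent, -S: still show errors,
    # -o /dev/null: discard the body, --max-time: give up after 5 seconds
    curl -fsS -o /dev/null --max-time 5 "$1"
}

# Return 0 if the systemd unit is active.
# Usage: check_service <unit>
check_service() {
    systemctl is-active --quiet "$1"
}

# Run a named check and print one status line.
# Usage: report <label> <command...>
report() {
    label="$1"
    shift
    if "$@" 2>/dev/null; then
        echo "OK   $label"
    else
        echo "FAIL $label"
        return 1
    fi
}

# Example:
# report "web"  check_http http://localhost
# report "sshd" check_service ssh
```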

Webhooks

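A webhook is an HTTP callback: when an event occurs, one service sends an HTTP request (usually a POST) to a URL registered by another service. Webhooks are often used for real-time integrations between services.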
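
As an illustration, a monitoring script could push an event to a webhook with `curl`. The sketch below assumes the receiving service accepts a JSON body with a single `text` field, which varies between services, and the URL shown is a placeholder.

```shell
# Build a minimal JSON body of the form {"text": "..."}.
# Only backslashes and double quotes are escaped; other control
# characters (such as newlines) are assumed absent from the message.
webhook_payload() {
    escaped=$(printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')
    printf '{"text": "%s"}\n' "$escaped"
}

# Send the payload to a webhook URL; -d makes curl issue a POST.
# Usage: send_webhook <url> <message>
send_webhook() {
    curl -fsS -H 'Content-Type: application/json' \
        -d "$(webhook_payload "$2")" "$1"
}

# Example (placeholder URL):
# send_webhook https://hooks.example.com/notify "disk full on web01"
```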

Log Aggregation

Log aggregation is the collection of logs from across the network and their storage in a central location.
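
One way to centralize logs is to have each host forward its syslog messages to a central server. The sketch below assumes rsyslog is the local syslog daemon and uses its classic forwarding rule syntax; the server name, port, and file path are hypothetical.

```shell
# Write an rsyslog rule that forwards every message to a central server.
# "@@" selects TCP; a single "@" would select UDP.
# Usage: write_forward_rule <server> <port> <output_file>
write_forward_rule() {
    printf '*.* @@%s:%s\n' "$1" "$2" > "$3"
}

# Example (needs root, followed by an rsyslog restart):
# write_forward_rule logs.example.com 514 /etc/rsyslog.d/90-forward.conf
# systemctl restart rsyslog
```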

Event Management

Logging

Logging provides the raw data needed to understand what is happening across a system. Logs are typically stored in the /var/log/ directory, which includes files such as syslog, auth.log, dmesg, and more (exact file names vary by distribution).

A SIEM (Security Information and Event Management) system collects and analyzes logs from across the network to help identify security threats, system issues, and unusual activity in real time.

Events

Events are generated when specific patterns or conditions are detected in the log data that indicate something noteworthy has happened.
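
A simple example of turning log data into events is counting failed SSH logins per source address and flagging any address that crosses a threshold. The sketch below assumes the usual OpenSSH "Failed password ... from ADDRESS port ..." log line format; the function names and the threshold are illustrative.

```shell
# Count failed SSH password attempts per source IP in an auth log.
# Usage: failed_logins_by_ip <logfile>
failed_logins_by_ip() {
    grep 'Failed password' "$1" \
        | sed -n 's/.* from \([0-9a-fA-F.:]*\) port .*/\1/p' \
        | sort | uniq -c | sort -rn
}

# Emit an event line for every IP at or above a threshold.
# Usage: brute_force_events <logfile> <threshold>
brute_force_events() {
    failed_logins_by_ip "$1" | awk -v limit="$2" \
        '$1 >= limit { printf "EVENT possible brute force from %s (%d failures)\n", $2, $1 }'
}

# Example:
# brute_force_events /var/log/auth.log 5
```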

Alerting and Notifications

Notifications

Notifications are how a Linux admin is informed when the system detects that something may require attention. They can be sent via email, text messages, desktop pop-ups, ticketing systems, or collaboration platforms.

Alerts

Alerts are the system's internal triggers that cause the notifications to be sent.
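
The split between an alert (the trigger) and a notification (the message) can be sketched with a disk-usage check. The 90 percent threshold, the function names, and the choice of printing to stderr instead of sending an email are all assumptions made for illustration.

```shell
# Print the usage percentage of the filesystem holding <path>.
disk_usage_percent() {
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Alert rule: succeed (exit 0) when the value is at or above the threshold.
# Usage: threshold_exceeded <value> <threshold>
threshold_exceeded() {
    [ "$1" -ge "$2" ]
}

# Notification: here just a line on stderr; a real setup might send
# an email or post to a chat webhook instead.
notify() {
    echo "ALERT: $*" >&2
}

# Tie the two together for one filesystem.
# Usage: check_disk <path> <threshold_percent>
check_disk() {
    usage=$(disk_usage_percent "$1")
    if threshold_exceeded "$usage" "$2"; then
        notify "$1 is at ${usage}% (threshold $2%)"
    fi
}

# Example:
# check_disk / 90
```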