Monitoring
Components to be monitored for Managed Windows
The following components are monitored for Managed Windows:
Component | Metric | Threshold/AlarmState | Alarmtype |
---|---|---|---|
Workload | Availability (ICMP) | Unavailable for over 10 min. | Critical |
RAM | Usage | 95% to 100% after 360 min. | Critical |
RAM | Usage | 85% to 95% after 90 min. | Warning |
CPU | Usage | 90% to 100% after 360 min. | Critical |
CPU | Usage | 85% to 95% after 90 min. | Warning |
*System Drive | Usage | <= 3GB after 25 min. | Critical |
*System Drive | Usage | 3GB to 10GB after 15 min. | Warning |
*) Swisscom is only responsible for monitoring OS drives. All application drives are the responsibility of the customer.
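The duration-based thresholds above can be read as "the metric stays inside a band for at least N minutes". The following Python sketch illustrates that reading with the RAM values. The function name, the rule layout, and the assumption that every sample in the trailing window must lie inside the band are illustrative only and do not describe the actual monitoring implementation.

```python
from typing import List, Optional, Tuple

# (lower bound %, upper bound %, minutes the value must persist, alarm type)
# Values taken from the RAM rows; the data layout itself is hypothetical.
RAM_RULES = [
    (95.0, 100.0, 360, "Critical"),
    (85.0, 95.0, 90, "Warning"),
]

def evaluate(samples: List[Tuple[int, float]], rules=RAM_RULES) -> Optional[str]:
    """Return the alarm type for the newest sample, or None.

    samples: (timestamp in minutes, usage %) pairs in ascending time order.
    A rule fires when the history covers the required duration and every
    sample in that trailing window lies inside the band (assumed semantics).
    """
    if not samples:
        return None
    latest_time = samples[-1][0]
    for low, high, minutes, alarm in rules:
        window_start = latest_time - minutes
        # Skip the rule if the history does not reach back far enough.
        if samples[0][0] > window_start:
            continue
        window = [value for t, value in samples if t >= window_start]
        if all(low <= value <= high for value in window):
            return alarm
    return None
```

For example, 90 minutes of samples at 90% usage would yield "Warning" under these assumptions, while a single sample outside the band resets the evaluation. Under this literal reading, a value above 95% produces no Warning; the source does not state how the real system handles that case.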
In addition, the following Windows services are monitored or configured with desired state:
Service | Display name | Threshold/AlarmState | Alarmtype |
---|---|---|---|
RPC Service | RPC Endpoint Mapper; Remote Procedure Call (RPC) Locator; Remote Procedure Call (RPC) | Not Running | Critical |
windows_exporter | windows_exporter | Not Running | Critical |
Log Management Agent | | Not Running | Warning |
Virus Protection Agent | | Not Running | Critical |
wuauserv | Windows Update | Disabled | Critical |
EventLog | Windows Event Log | Not Running | Critical |
MpsSvc | Windows Firewall | Not Running | Critical |
Netlogon | Netlogon | Not Running | Critical |
Schedule | Task Scheduler | Not Running | Critical |
VMTools | VMTools | Not Running | Warning |
W32Time | Windows Time | Not Running | Critical |
WinRM | Windows Remote Management (WS-Management) | Not Running | Critical |
The thresholds or services may change depending on their nature.
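The service table can be thought of as a mapping from each service to the state that raises an alarm and the resulting alarm type. The Python sketch below shows that comparison for a few rows. The names `MONITORED_SERVICES` and `find_alarms` are hypothetical, and how the observed state is obtained from Windows is deliberately left out.

```python
from typing import Dict, List, Tuple

# service name -> (alarm state, alarm type), copied from a few table rows
MONITORED_SERVICES: Dict[str, Tuple[str, str]] = {
    "windows_exporter": ("Not Running", "Critical"),
    "wuauserv": ("Disabled", "Critical"),
    "VMTools": ("Not Running", "Warning"),
    "W32Time": ("Not Running", "Critical"),
}

def find_alarms(observed: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return (service, alarm type) for each service found in its alarm state.

    observed maps a service name to a state string such as "Running",
    "Not Running" or "Disabled". Services missing from observed are
    ignored in this sketch (an assumption, not documented behaviour).
    """
    alarms = []
    for service, (alarm_state, alarm_type) in MONITORED_SERVICES.items():
        if observed.get(service) == alarm_state:
            alarms.append((service, alarm_type))
    return alarms
```

Keeping the table as data rather than as separate checks means that a changed threshold or an added service only requires a new entry.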
Components to be monitored for Managed RHEL
The following components are monitored for Managed RHEL servers:
Component | Metric |
---|---|
Workload | Availability |
RAM | Usage |
CPU | Usage |
Mountpoint "/" | Usage |
Mountpoint "/boot" | Usage |
Mountpoint "/var" | Usage |
Mountpoint "/var/log" | Usage |
Mountpoint "/var/log/audit" | Usage |
Mountpoint "/opt" | Usage |
Mountpoint "/usr/local" | Usage |
Mountpoint "/tmp" | Usage |
Mountpoint "/opt/ds_agent" | Usage |
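As a rough illustration of the "Usage" metric for the mountpoints listed above, the following Python sketch computes used space as a percentage with the standard library. It is not the monitoring agent's method; the function names are hypothetical, and the result can differ slightly from tools such as `df`, which account for reserved blocks.

```python
import os
import shutil
from typing import Dict, List

# Mountpoints from the table above
MOUNTPOINTS = ["/", "/boot", "/var", "/var/log", "/var/log/audit",
               "/opt", "/usr/local", "/tmp", "/opt/ds_agent"]

def usage_percent(path: str) -> float:
    """Used space of the file system containing path, as a percentage."""
    total, used, _free = shutil.disk_usage(path)
    return 100.0 * used / total if total else 0.0

def collect_usage(paths: List[str] = MOUNTPOINTS) -> Dict[str, float]:
    """Measure every listed path that exists as a directory on this host."""
    return {p: usage_percent(p) for p in paths if os.path.isdir(p)}
```

Paths that do not exist on a given host are skipped, so the same list can be used on systems where not every mountpoint is present.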
Alarming
If one of the defined threshold values is exceeded, an alert is sent to the Swisscom support organisation. With the Managed OS service, no alerts are sent to the customer or the owner of the VM.
Logging
All events relevant to operation are collected centrally. Examples are given below:
Events:
- Rebooting the system
- Critical exceedance of threshold values
Metrics:
- System availability (Monitoring Agent Heartbeat)
- File system load
- CPU load
- Memory utilization
Thresholds are defined for these metrics (see chapter Monitoring above), and an alarm is triggered if the respective threshold is exceeded. System logs and metrics are stored centrally. System logs are delivered via a log forwarder; metrics are collected at 5-minute intervals.
Properties | Description |
---|---|
Restrictions and rules | All log and metering data are assigned to a unique CI. Raw metering data is retained for 6 months while the VM exists and is deleted 15 days after the VM is deleted. Data from monitoring and metering serves as the basis for SLA reporting. |
Log Data | Logs and metrics are collected and stored centrally according to the description of the component. |
Reporting | This component itself provides the basis for reporting. |
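The retention rule for raw metering data in the table above can be sketched in Python as follows. This is only one possible reading: "6 months" is approximated as 183 days, and the function and constant names are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Optional

RAW_RETENTION = timedelta(days=183)       # "6 months", approximated (assumption)
POST_DELETION_GRACE = timedelta(days=15)  # from the table above

def raw_data_expired(recorded_at: datetime, now: datetime,
                     vm_deleted_at: Optional[datetime] = None) -> bool:
    """Return True if a raw metering record should no longer be retained."""
    # All raw data is removed once the grace period after VM deletion has passed.
    if vm_deleted_at is not None and now >= vm_deleted_at + POST_DELETION_GRACE:
        return True
    # Otherwise the regular retention period applies.
    return now - recorded_at > RAW_RETENTION
```

For a VM that still exists, only the age of the record matters; once the VM has been deleted, the 15-day rule can expire a record earlier than the regular retention period would.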