Monitoring

Components to be monitored for Managed Windows

The following components are monitored for Managed Windows:

Component     | Metric              | Threshold/Alarm state          | Alarm type
Workload      | Availability (ICMP) | over 10 min.                   | Critical
RAM           | Usage               | 95% to 100% after 360 min.     | Critical
RAM           | Usage               | 85% to 95% after 90 min.       | Warning
CPU           | Usage               | 90% to 100% after 360 min.     | Critical
CPU           | Usage               | 85% to 95% after 90 min.       | Warning
*System Drive | Usage               | <= 3 GB after 25 min.          | Critical
*System Drive | Usage               | 3 GB to <= 10 GB after 15 min. | Warning

*) Swisscom is only responsible for monitoring OS drives. All application drives are the responsibility of the customer.
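
The time-based thresholds in the table above can be read as "the value has stayed inside the band for the whole period". The following is a minimal sketch of that reading, not the actual monitoring logic. It assumes samples arrive at 5-minute intervals, oldest first, and that "after N min." means the N/5 most recent samples all lie in the band. The names `Rule`, `RAM_RULES` and `evaluate` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

SAMPLE_INTERVAL_MIN = 5  # assumption: one sample every 5 minutes


@dataclass(frozen=True)
class Rule:
    low: float     # inclusive lower bound in percent
    high: float    # inclusive upper bound in percent
    hold_min: int  # minutes the value must stay inside the band
    alarm: str     # alarm type raised when the rule matches


# RAM rows of the table, most severe rule first.
RAM_RULES = (
    Rule(95.0, 100.0, 360, "Critical"),
    Rule(85.0, 95.0, 90, "Warning"),
)


def evaluate(samples: Sequence[float], rules: Sequence[Rule]) -> Optional[str]:
    """Return the alarm type of the first rule whose band held for its
    full duration, or None if no rule matches."""
    for rule in rules:
        needed = rule.hold_min // SAMPLE_INTERVAL_MIN
        recent = samples[-needed:]
        # Not enough history yet, or at least one sample left the band.
        if len(recent) == needed and all(rule.low <= s <= rule.high for s in recent):
            return rule.alarm
    return None
```

Under these assumptions, 72 consecutive samples at 96% yield "Critical", and 18 consecutive samples at 90% yield "Warning". A shorter run returns `None`.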

In addition, the following Windows services are monitored or configured with desired state:

Service                | Display name                              | Threshold/Alarm state | Alarm type
RPC Service            | RPC Endpoint Mapper                       | Not Running           | Critical
RPC Service            | Remote Procedure Call (RPC) Locator       | Not Running           | Critical
RPC Service            | Remote Procedure Call (RPC)               | Not Running           | Critical
windows_exporter       | windows_exporter                          | Not Running           | Critical
Log Management Agent   |                                           | Not Running           | Warning
Virus Protection Agent |                                           | Not Running           | Critical
wuauserv               | Windows Update                            | Disabled              | Critical
EventLog               | Windows Event Log                         | Not Running           | Critical
MpsSvc                 | Windows Firewall                          | Not Running           | Critical
Netlogon               | Netlogon                                  | Not Running           | Critical
Schedule               | Task Scheduler                            | Not Running           | Critical
VMTools                | VMTools                                   | Not Running           | Warning
W32Time                | Windows Time                              | Not Running           | Critical
WinRM                  | Windows Remote Management (WS-Management) | Not Running           | Critical

The thresholds and the monitored services may change depending on their nature.
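
The service table can be thought of as a lookup from service name to alarm state and alarm type. The sketch below illustrates that idea for a subset of the rows (only those whose alarm state is "Not Running"). The function name `alarms_for` and the observed state strings are hypothetical, and how service states are actually collected is not described here.

```python
from typing import Dict, List, Tuple

# Service name -> (state that raises the alarm, alarm type).
# Subset of the table above; only "Not Running" rows are included.
SERVICE_RULES: Dict[str, Tuple[str, str]] = {
    "windows_exporter": ("Not Running", "Critical"),
    "EventLog": ("Not Running", "Critical"),
    "MpsSvc": ("Not Running", "Critical"),
    "Netlogon": ("Not Running", "Critical"),
    "Schedule": ("Not Running", "Critical"),
    "VMTools": ("Not Running", "Warning"),
    "W32Time": ("Not Running", "Critical"),
    "WinRM": ("Not Running", "Critical"),
}


def alarms_for(observed: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return (service, alarm type) for every observed service that is
    currently in its alarm state. Services without an observation are skipped."""
    raised = []
    for service, (alarm_state, alarm_type) in SERVICE_RULES.items():
        if observed.get(service) == alarm_state:
            raised.append((service, alarm_type))
    return raised
```

For example, an observation of `{"VMTools": "Not Running", "WinRM": "Running"}` would raise a single "Warning" for VMTools.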

Components to be monitored for Managed RHEL

The following components are monitored for Managed RHEL servers:

Component                   | Metric
Workload                    | Availability
RAM                         | Usage
CPU                         | Usage
Mountpoint "/"              | Usage
Mountpoint "/boot"          | Usage
Mountpoint "/var"           | Usage
Mountpoint "/var/log"       | Usage
Mountpoint "/var/log/audit" | Usage
Mountpoint "/opt"           | Usage
Mountpoint "/usr/local"     | Usage
Mountpoint "/tmp"           | Usage
Mountpoint "/opt/ds_agent"  | Usage
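
To illustrate what a usage metric for the mount points listed above might look like, here is a minimal sketch based on Python's standard `shutil.disk_usage`. It is not the agent used by the service. The function name `usage_percent` is hypothetical, and no RHEL thresholds are assumed because none are given above.

```python
import os
import shutil
from typing import Dict, Iterable

MOUNTPOINTS = ("/", "/boot", "/var", "/var/log", "/var/log/audit",
               "/opt", "/usr/local", "/tmp", "/opt/ds_agent")


def usage_percent(paths: Iterable[str] = MOUNTPOINTS) -> Dict[str, float]:
    """Return the used space in percent for each path that exists on this host."""
    result = {}
    for path in paths:
        if not os.path.isdir(path):
            continue  # path is not present on this host
        try:
            total, used, _free = shutil.disk_usage(path)
        except OSError:
            continue  # path cannot be inspected, e.g. missing permissions
        if total > 0:
            result[path] = round(100.0 * used / total, 1)
    return result
```

Note that `shutil.disk_usage` reports on the file system that contains the path, so a directory that is not a separate mount point shows the values of its parent file system.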

Alarming

If one of the defined threshold values is exceeded, an alert is sent to the Swisscom support organisation. With the Managed OS service, no alerts are sent to the customer or the owner of the VM.

Logging

All events relevant to operation are collected centrally. Examples are given below:

Events:

  • Rebooting the system
  • Critical exceedance of threshold values

Metrics:

  • System availability (Monitoring Agent Heartbeat)
  • File system load
  • CPU load
  • Memory utilization

Thresholds are defined for these metrics (see chapter Monitoring above); an alarm is triggered if the respective threshold is exceeded. System logs and metrics are stored centrally. System logs are delivered via a log forwarder, and metrics are collected at 5-minute intervals.

Properties             | Description
Restrictions and rules | All log and metering data are assigned to a unique CI.
                       | The retention period of the metering raw data is 6 months while the VM exists. This raw data is also deleted 15 days after the VM is deleted.
                       | Data from monitoring and metering serve as the basis for the reporting of the SLA.
Log Data               | Logs and metrics are collected and stored centrally according to the description of the component.
Reporting              | This component itself provides the basis for reporting.
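
The retention rule for metering raw data can be expressed as a small check. The sketch below is illustrative only: it approximates "6 months" as 183 days, which is an assumption, and the name `is_retained` is hypothetical.

```python
from datetime import datetime, timedelta
from typing import Optional

RETENTION = timedelta(days=183)         # assumption: "6 months" taken as 183 days
POST_DELETE_GRACE = timedelta(days=15)  # raw data is deleted 15 days after VM deletion


def is_retained(sample_time: datetime, now: datetime,
                vm_deleted_at: Optional[datetime] = None) -> bool:
    """Return True if a raw metering sample would still be kept at time `now`."""
    # Once the VM has been deleted for 15 days, all of its raw data is gone.
    if vm_deleted_at is not None and now >= vm_deleted_at + POST_DELETE_GRACE:
        return False
    # Otherwise the sample is kept for the regular retention period.
    return now - sample_time <= RETENTION
```

With these assumptions, a sample from one month ago is kept while the VM exists, and no sample survives once the VM has been deleted for 15 days or more.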