Monitoring

Components to be monitored for Managed Windows

The following components are monitored for Managed Windows:

Component	Metric	Threshold/AlarmState	Alarm type
Workload	Availability (ICMP)	over 10 min.	Critical
RAM	Usage	95% to 100% after 360 min.	Critical
RAM	Usage	85% to 95% after 90 min.	Warning
CPU	Usage	90% to 100% after 360 min.	Critical
CPU	Usage	85% to 95% after 90 min.	Warning
System Drive*	Usage	<= 3 GB after 25 min.	Critical
System Drive*	Usage	> 3 GB to <= 10 GB after 15 min.	Warning

*) Swisscom is only responsible for monitoring OS drives. All application drives are the responsibility of the customer.
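
The RAM and CPU rows of the table above can be read as rules of the form "usage in a band for at least N minutes". The following is a minimal sketch of such an evaluation, not the actual monitoring implementation. The names `Rule`, `RULES` and `classify` are hypothetical, and the sketch assumes that the lower bound of each band is the trigger, so that a value above the Warning band still raises a Warning until the Critical duration is reached.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    low_pct: float   # lower bound of the usage band, in percent
    after_min: int   # minutes the usage must persist before alarming
    alarm: str       # "Critical" or "Warning"


# Values copied from the table above; Critical is checked before Warning.
RULES = {
    "RAM": [Rule(95, 360, "Critical"), Rule(85, 90, "Warning")],
    "CPU": [Rule(90, 360, "Critical"), Rule(85, 90, "Warning")],
}


def classify(component: str, usage_pct: float, sustained_min: int) -> Optional[str]:
    """Return the alarm type for a usage level held for `sustained_min` minutes.

    Assumption: only the lower bound of each band is evaluated.
    """
    for rule in RULES[component]:
        if usage_pct >= rule.low_pct and sustained_min >= rule.after_min:
            return rule.alarm
    return None
```

For example, `classify("RAM", 97, 100)` yields a Warning under these assumptions, because the 360-minute Critical duration has not yet been reached.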

In addition, the following Windows services are monitored or configured with desired state:

Service	Display name	Threshold/AlarmState	Alarm type
RPC Service	RPC Endpoint Mapper, Remote Procedure Call (RPC) Locator, Remote Procedure Call (RPC)	Not Running	Critical
windows_exporter	windows_exporter	Not Running	Critical
Log Management Agent		Not Running	Warning
Virus Protection Agent		Not Running	Critical
wuauserv	Windows Update	Disabled	Critical
EventLog	Windows Event Log	Not Running	Critical
MpsSvc	Windows Firewall	Not Running	Critical
Netlogon	Netlogon	Not Running	Critical
Schedule	Task Scheduler	Not Running	Critical
VMTools	VMTools	Not Running	Warning
W32Time	Windows Time	Not Running	Critical
WinRM	Windows Remote Management (WS-Management)	Not Running	Critical

The thresholds and the monitored services may change depending on their nature.
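
The service table above maps each service to an alarm state and an alarm type. A minimal sketch of that mapping as a lookup is shown below. The function `service_alarms` and the format of the `observed` dictionary (service name to a state string such as "Running", "Not Running" or "Disabled") are assumptions for illustration. Only rows that list an explicit service name are included.

```python
from typing import Dict, List, Tuple

# (alarm state, alarm type) per service, copied from the table above.
# Rows without an explicit service name are left out of this sketch.
MONITORED_SERVICES: Dict[str, Tuple[str, str]] = {
    "windows_exporter": ("Not Running", "Critical"),
    "wuauserv": ("Disabled", "Critical"),
    "EventLog": ("Not Running", "Critical"),
    "MpsSvc": ("Not Running", "Critical"),
    "Netlogon": ("Not Running", "Critical"),
    "Schedule": ("Not Running", "Critical"),
    "VMTools": ("Not Running", "Warning"),
    "W32Time": ("Not Running", "Critical"),
    "WinRM": ("Not Running", "Critical"),
}


def service_alarms(observed: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return (service, alarm type) for every service found in its alarm state.

    `observed` is a hypothetical input: service name -> current state string.
    Services missing from `observed` are skipped rather than alarmed.
    """
    alarms = []
    for name, (alarm_state, alarm_type) in MONITORED_SERVICES.items():
        if observed.get(name) == alarm_state:
            alarms.append((name, alarm_type))
    return alarms
```

Keeping the table as data means a changed threshold or an added service only touches the dictionary, not the evaluation logic.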

Components to be monitored for Managed RHEL

The following components are monitored for Managed RHEL servers:

Component	Metric
Workload	Availability
RAM	Usage
CPU	Usage
Mountpoint "/"	Usage
Mountpoint "/boot"	Usage
Mountpoint "/var"	Usage
Mountpoint "/var/log"	Usage
Mountpoint "/var/log/audit"	Usage
Mountpoint "/opt"	Usage
Mountpoint "/usr/local"	Usage
Mountpoint "/tmp"	Usage
Mountpoint "/opt/ds_agent"	Usage
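
To illustrate what a usage metric per mountpoint can look like, here is a minimal sketch based on Python's standard `shutil.disk_usage`. It is not the agent used by the service. The helper names `usage_percent` and `collect_usage` are hypothetical, and usage is assumed to mean used bytes divided by total bytes.

```python
import os
import shutil
from typing import Dict, Iterable

# Mountpoints copied from the table above.
MOUNTPOINTS = [
    "/", "/boot", "/var", "/var/log", "/var/log/audit",
    "/opt", "/usr/local", "/tmp", "/opt/ds_agent",
]


def usage_percent(path: str) -> float:
    """Used space of the filesystem containing `path`, in percent."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        # Some pseudo filesystems report a size of zero.
        return 0.0
    return 100.0 * usage.used / usage.total


def collect_usage(paths: Iterable[str] = MOUNTPOINTS) -> Dict[str, float]:
    """Return usage per path, skipping paths that do not exist on this host."""
    result = {}
    for path in paths:
        if os.path.exists(path):
            result[path] = round(usage_percent(path), 1)
    return result
```

If a listed path is a plain directory rather than a separate mountpoint, `shutil.disk_usage` reports the filesystem that contains it, so several entries may show the same value.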

Alarming

If one of the defined threshold values is exceeded, an alert is sent to the Swisscom support organisation. With the Managed OS service, no alerts are sent to the customer or the owner of the VM.

Logging

All events relevant to operation are collected centrally. Examples are given below:

Events:

Rebooting the system
Critical exceedance of threshold values

Metrics:

System availability (Monitoring Agent Heartbeat)
File system load
CPU load
Memory utilization

Thresholds are defined for these metrics (see chapter Monitoring above); an alarm is triggered if the respective threshold is exceeded. System logs and metrics are stored centrally. System logs are delivered via a log forwarder, and metrics are collected at 5-minute intervals.
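
With a 5-minute collection interval, a duration such as "after 90 min." corresponds to a number of consecutive samples. The sketch below shows this arithmetic. It assumes that each sample stands for one full 5-minute interval, which the text does not state, and the function names are hypothetical.

```python
from typing import Sequence

SAMPLE_INTERVAL_MIN = 5  # collection interval stated in the text


def samples_required(after_min: int) -> int:
    """Number of consecutive samples needed to cover `after_min` minutes."""
    return -(-after_min // SAMPLE_INTERVAL_MIN)  # ceiling division


def sustained_minutes(samples: Sequence[float], threshold: float) -> int:
    """Minutes covered by the most recent unbroken run at or above `threshold`.

    `samples` is ordered oldest to newest.
    """
    run = 0
    for value in reversed(samples):
        if value < threshold:
            break
        run += 1
    return run * SAMPLE_INTERVAL_MIN
```

Under these assumptions a 90-minute condition needs 18 consecutive samples and a 360-minute condition needs 72.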

Properties	Description
Restrictions and rules	All log and metering data are assigned to a unique configuration item (CI). Raw metering data is retained for 6 months while the VM exists and is deleted 15 days after the VM is deleted. Data from monitoring and metering serve as the basis for SLA reporting.
Log Data	Logs and metrics are collected and stored centrally according to the description of the component.
Reporting	This component itself provides the basis for reporting.
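
The retention rule in the table above can be expressed as a small date calculation. This is a sketch of one possible reading, namely that a raw sample is purged at whichever comes first: 6 months after it was recorded, or 15 days after the VM was deleted. The helpers `add_months` and `purge_time` are hypothetical names.

```python
import calendar
from datetime import datetime, timedelta
from typing import Optional

RETENTION_MONTHS = 6                             # retention while the VM exists
GRACE_AFTER_VM_DELETION = timedelta(days=15)     # retention after VM deletion


def add_months(ts: datetime, months: int) -> datetime:
    """Add calendar months, clamping the day to the end of the target month."""
    month_index = ts.month - 1 + months
    year = ts.year + month_index // 12
    month = month_index % 12 + 1
    day = min(ts.day, calendar.monthrange(year, month)[1])
    return ts.replace(year=year, month=month, day=day)


def purge_time(sample_ts: datetime, vm_deleted_at: Optional[datetime] = None) -> datetime:
    """Earliest time at which a raw sample would be purged (assumed reading)."""
    expiry = add_months(sample_ts, RETENTION_MONTHS)
    if vm_deleted_at is not None:
        expiry = min(expiry, vm_deleted_at + GRACE_AFTER_VM_DELETION)
    return expiry
```

For a sample recorded on 15 January 2024 and a VM deleted on 1 February 2024, this reading gives a purge date of 16 February 2024.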
