Use Health Exposure
Health events will be published by cloud services:
- immediately on status change
- regularly with a frequency (interval) defined by each cloud service
Please read the specific cloud service documentation for more details on the health event frequency.
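Because events arrive both on status change and on a regular interval, a consumer will usually see the same status repeated. The following is a minimal sketch of how a consumer might separate real transitions from periodic repeats; the `StatusTracker` class and its method names are hypothetical and not part of the service.

```python
from typing import Dict, Optional, Tuple


class StatusTracker:
    """Remembers the last status seen for each service instance (hypothetical helper)."""

    def __init__(self) -> None:
        self._last: Dict[str, str] = {}

    def observe(self, instance_id: str, status: str) -> Optional[Tuple[Optional[str], str]]:
        """Return (previous, current) if the status changed, otherwise None.

        The first event for an instance counts as a change from None.
        """
        previous = self._last.get(instance_id)
        self._last[instance_id] = status
        if previous == status:
            return None  # periodic repeat of an unchanged status
        return (previous, status)
```

With this approach, alerting logic reacts only when `observe` returns a tuple, while repeated events can still be used as a liveness signal.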
Health Status Types
The following health statuses are forwarded to the event topic:
- OK: the service instance is functioning normally.
- WARNING: the component/service instance is usable, but there are problems, which the customer of the service can resolve.
- DEGRADED: the component/service instance is usable, but the service promise is not completely fulfilled.
- ERROR: the service instance is either not running or in a state not matching the SLA.
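On the consumer side, the four statuses could be modelled as an enumeration so that unknown values are rejected early. This is only a sketch: the `HealthStatus` enum and the `customer_can_resolve` helper are illustrative names, and the helper simply encodes the WARNING definition given above.

```python
from enum import Enum


class HealthStatus(Enum):
    """The four status values listed above (illustrative model)."""

    OK = "OK"
    WARNING = "WARNING"
    DEGRADED = "DEGRADED"
    ERROR = "ERROR"


def customer_can_resolve(status: HealthStatus) -> bool:
    """True only for WARNING, which is described as resolvable by the customer."""
    return status is HealthStatus.WARNING


def parse_status(value: str) -> HealthStatus:
    """Convert a raw string to HealthStatus; raises ValueError for unknown values."""
    return HealthStatus(value)
```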
Event Schema
The health events forwarded to the topic follow a schema that is described below:
Field | Type | Description |
---|---|---|
version | string | The version of the event schema. Fixed: 1.0 |
timestamp | string | The UTC time of the event in ISO 8601 format. Check the specific cloud service documentation for the precise semantics of this time. |
id | string | The UUID of this event generated by the cloud service |
platform | string | Constant ESC |
tenant_name | string | The name of the tenant owning this event |
business_group_name | string | The name of the business group where the service instance was deployed |
cloud_service_name | string | The name of the cloud service that produced this event (see the Cloud Services section) |
service_instance.id | string | The UUID of the service instance that produced this event |
service_instance.name | string | The name of the service instance that produced this event |
service_instance.instance_class_name | string | The type of the service instance that produced this event |
health.status | string | One of: OK , WARNING , DEGRADED , ERROR |
health.description | string | Details about the health status |
Example:
{
  "version": "1.0",
  "timestamp": "2023-02-14T12:40:00.000Z",
  "id": "958e11dd-a12s-425e-8738-7ba3a83958c6",
  "platform": "ESC",
  "tenant_name": "orion-123",
  "cloud_service_name": "Managed OS RHEL",
  "business_group_name": "marketing",
  "service_instance": {
    "name": "orion1230001",
    "id": "df22bc90-ebdd-4c8e-a051-b088f89f8897",
    "instance_class_name": "Managed RHEL"
  },
  "health_status": "ERROR",
  "health_description": "this is the error description"
}
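A consumer typically turns such a payload into a typed object before acting on it. The sketch below is one possible way to do that in Python; `HealthEvent` and `parse_health_event` are hypothetical names. Note that the schema table lists `health.status` and `health.description`, while the example uses the flat keys `health_status` and `health_description`, so the sketch accepts either form rather than assuming one. IDs are kept as plain strings and are not validated.

```python
import json
from dataclasses import dataclass
from datetime import datetime

VALID_STATUSES = {"OK", "WARNING", "DEGRADED", "ERROR"}


@dataclass(frozen=True)
class HealthEvent:
    """Subset of the event fields a consumer is likely to need (illustrative)."""

    event_id: str
    timestamp: datetime
    tenant_name: str
    cloud_service_name: str
    instance_id: str
    instance_name: str
    status: str
    description: str


def parse_health_event(raw: str) -> HealthEvent:
    """Parse a JSON health event; raises ValueError on an unknown status."""
    data = json.loads(raw)
    # Accept a nested "health" object or the flat keys used in the example.
    health = data.get("health")
    if not isinstance(health, dict):
        health = {}
    status = health.get("status", data.get("health_status"))
    description = health.get("description", data.get("health_description", ""))
    if status not in VALID_STATUSES:
        raise ValueError(f"unexpected health status: {status!r}")
    instance = data["service_instance"]
    # Replace the trailing "Z" so fromisoformat also works before Python 3.11.
    timestamp = datetime.fromisoformat(data["timestamp"].replace("Z", "+00:00"))
    return HealthEvent(
        event_id=data["id"],
        timestamp=timestamp,
        tenant_name=data["tenant_name"],
        cloud_service_name=data["cloud_service_name"],
        instance_id=instance["id"],
        instance_name=instance["name"],
        status=status,
        description=description,
    )
```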
Health Event Stream
┌───────────────┐                                     ┌────────────────┐
│ Service       │          health-orion-123           │ Customer       │
│               ├──────► [ ----------------- ] ◄──────│                │
│ health events │            (Kafka topic)            │ Kafka client   │
└───────────────┘                                     └────────────────┘
Service health events are exposed using the schema above and published to the tenant's Kafka topic. The topic name is the name of your tenant prefixed with health- (for example, health-orion-123 for the tenant orion-123).
The customer consumes the health events by configuring a Kafka consumer client on an ESC VM. A list of Kafka clients is provided below.
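As an illustration, a consumer could look roughly like the following Python sketch. It assumes the third-party kafka-python package, JSON payloads encoded as UTF-8, and a broker address supplied by the caller; the function names and the `health-monitor` consumer group are hypothetical. Authentication and TLS settings depend on your environment and are not shown.

```python
import json
from typing import Any, Callable, Dict, List


def topic_for_tenant(tenant_name: str) -> str:
    """Build the topic name: the tenant name prefixed with 'health-'."""
    return f"health-{tenant_name}"


def decode_event(payload: bytes) -> Dict[str, Any]:
    """Decode a message value, assuming UTF-8 encoded JSON."""
    return json.loads(payload.decode("utf-8"))


def consume_health_events(
    tenant_name: str,
    bootstrap_servers: List[str],
    handle: Callable[[Dict[str, Any]], None],
) -> None:
    """Read health events forever and pass each one to `handle`."""
    # Imported lazily so the helpers above work without the package installed.
    from kafka import KafkaConsumer  # third-party: pip install kafka-python

    consumer = KafkaConsumer(
        topic_for_tenant(tenant_name),
        bootstrap_servers=bootstrap_servers,  # assumption: provided by your environment
        group_id="health-monitor",            # hypothetical consumer group name
        auto_offset_reset="earliest",         # start from the oldest retained event
        value_deserializer=decode_event,
    )
    for record in consumer:
        handle(record.value)
```

A call such as `consume_health_events("orion-123", ["<broker-host>:<port>"], print)` would print each event as a dictionary; the broker address is a placeholder.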
Service Monitoring:
The service produces health events as specified by the Health Exposure Service, at an interval of 1 hour.
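A fixed interval also lets a consumer notice when events stop arriving. The sketch below flags an instance as stale when no event has been seen for longer than the interval plus a tolerance; the `is_stale` helper and the 10-minute grace period are assumptions chosen for illustration, not values defined by the service.

```python
from datetime import datetime, timedelta, timezone


def is_stale(
    last_event_time: datetime,
    now: datetime,
    interval: timedelta = timedelta(hours=1),   # the 1 hour interval stated above
    grace: timedelta = timedelta(minutes=10),   # assumed tolerance for delivery delay
) -> bool:
    """Return True if the last event is older than interval + grace."""
    return now - last_event_time > interval + grace


# Example: last event at 12:40 UTC, checked at 13:45 UTC and again at 13:51 UTC.
last_seen = datetime(2023, 2, 14, 12, 40, tzinfo=timezone.utc)
on_time = is_stale(last_seen, last_seen + timedelta(hours=1, minutes=5))
overdue = is_stale(last_seen, last_seen + timedelta(hours=1, minutes=11))
```

Both timestamps should be timezone-aware (or both naive) so that the subtraction is valid.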
Retention
The Kafka topic will retain events for at least 24 hours.