Use Health Exposure

Cloud services publish health events:

  • immediately on a status change
  • regularly, at an interval defined by each cloud service

Please read the specific cloud service documentation for more details on the health event frequency.

Health Status Types

The following health statuses are forwarded to the event topic:

  • OK: The service instance is functioning normally.
  • WARNING: The component or service instance is usable, but there are problems that the customer of the service can resolve.
  • DEGRADED: The component or service instance is usable, but the service promise is not completely fulfilled.
  • ERROR: The service instance is either not running or is in a state that does not meet the SLA.
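
The four statuses above can be modelled as an enumeration so that a consumer can compare them. The following is a minimal sketch, not part of the platform: the numeric ranking (OK lowest, ERROR highest) is an assumption taken from the order of the list, and the names HealthStatus, parse_status, needs_attention and worst_status are hypothetical.

```python
from enum import IntEnum
from typing import Iterable


class HealthStatus(IntEnum):
    """The four documented statuses. The numeric ranking is an assumption."""
    OK = 0
    WARNING = 1
    DEGRADED = 2
    ERROR = 3


def parse_status(raw: str) -> HealthStatus:
    """Map the status string of an event to the enum; reject unknown values."""
    try:
        return HealthStatus[raw]
    except KeyError:
        raise ValueError(f"unknown health status: {raw!r}") from None


def needs_attention(status: HealthStatus) -> bool:
    """Anything other than OK is worth surfacing to an operator."""
    return status > HealthStatus.OK


def worst_status(statuses: Iterable[HealthStatus]) -> HealthStatus:
    """Return the highest-ranked status seen, or OK if none were given."""
    return max(statuses, default=HealthStatus.OK)
```

With this ranking, worst_status can summarise several service instances in a single value, for example for a dashboard.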

Event Schema

The health events forwarded to the topic follow a schema that is described below:

Field                                 Type    Description
version                               string  The version of the event schema. Fixed: 1.0
timestamp                             string  The UTC time of the event in ISO 8601 format. Check the specific cloud service documentation about the precise semantics of this time.
id                                    string  The UUID of this event generated by the cloud service
platform                              string  Constant ESC
tenant_name                           string  The name of the tenant owning this event
business_group_name                   string  The name of the business group where the service instance was deployed
cloud_service_name                    string  The name of the cloud service that produced this event (see the Cloud Services section)
service_instance.id                   string  The UUID of the service instance that produced this event
service_instance.name                 string  The name of the service instance that produced this event
service_instance.instance_class_name  string  The type of the service instance that produced this event
health.status                         string  One of: OK, WARNING, DEGRADED, ERROR
health.description                    string  Details about the health status

Example:

{
    "version": "1.0",
    "timestamp": "2023-02-14T12:40:00.000Z",
    "id": "958e11dd-a125-425e-8738-7ba3a83958c6",
    "platform": "ESC",
    "tenant_name": "orion-123",
    "cloud_service_name": "Managed OS RHEL",
    "business_group_name": "marketing",
    "service_instance": {
        "name": "orion1230001",
        "id": "df22bc90-ebdd-4c8e-a051-b088f89f8897",
        "instance_class_name": "Managed RHEL"
    },
    "health": {
        "status": "ERROR",
        "description": "this is the error description"
    }
}
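
As an illustration of how a consumer might read such an event, here is a minimal parsing sketch. It is not an official client: the HealthEvent class and parse_health_event function are hypothetical names. Because the health fields can be read either as a nested health object (as the dotted names in the schema suggest) or as flat health_status and health_description keys, the sketch accepts both shapes; check a real event from your topic to see which applies.

```python
import json
from dataclasses import dataclass
from datetime import datetime, timezone

VALID_STATUSES = {"OK", "WARNING", "DEGRADED", "ERROR"}


@dataclass(frozen=True)
class HealthEvent:
    """A subset of the schema fields that a monitoring consumer typically needs."""
    id: str
    timestamp: datetime
    tenant_name: str
    cloud_service_name: str
    instance_id: str
    instance_name: str
    status: str
    description: str


def parse_health_event(payload: str) -> HealthEvent:
    """Parse one JSON health event; raise ValueError on an unknown status."""
    data = json.loads(payload)
    instance = data["service_instance"]

    # Assumption: prefer the nested "health" object, fall back to flat keys.
    health = data.get("health") or {}
    status = health.get("status", data.get("health_status"))
    description = health.get("description", data.get("health_description", ""))
    if status not in VALID_STATUSES:
        raise ValueError(f"unknown health status: {status!r}")

    # Older Python versions do not accept a trailing "Z" in fromisoformat.
    timestamp = datetime.fromisoformat(data["timestamp"].replace("Z", "+00:00"))

    return HealthEvent(
        id=data["id"],
        timestamp=timestamp,
        tenant_name=data["tenant_name"],
        cloud_service_name=data["cloud_service_name"],
        instance_id=instance["id"],
        instance_name=instance["name"],
        status=status,
        description=description,
    )
```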

Health Event Stream

┌───────────────┐                                     ┌────────────────┐
│   Service     │          health-orion-123           │    Customer    │
│               ├──────► [ ----------------- ] ◄──────│                │
│ health events │            (Kafka topic)            │  Kafka client  │
└───────────────┘                                     └────────────────┘

Service health events are exposed using the schema above and published to the tenant's Kafka topic. The topic name is the name of your tenant prefixed with health-. For example, the topic for the tenant orion-123 is health-orion-123.

To consume the health events, the customer needs to configure a Kafka consumer client on an ESC VM. A list of Kafka clients is provided below.
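
The consumer setup described above could look roughly like the following sketch. It assumes the third-party kafka-python package as the client, which is only one possible choice and is not prescribed by this document. The function names, the consumer group name and the broker address are hypothetical placeholders, and any authentication or TLS settings your environment requires are omitted.

```python
import json
from typing import Iterator, List


def health_topic_name(tenant_name: str) -> str:
    """Build the topic name: the tenant name prefixed with 'health-'."""
    return f"health-{tenant_name}"


def decode_event(raw: bytes) -> dict:
    """Deserialize one Kafka message value, assumed to be UTF-8 encoded JSON."""
    return json.loads(raw.decode("utf-8"))


def consume_health_events(
    tenant_name: str,
    bootstrap_servers: List[str],
    group_id: str,
) -> Iterator[dict]:
    """Yield decoded health events from the tenant's topic, blocking while idle."""
    # Imported here so the helpers above work without kafka-python installed.
    from kafka import KafkaConsumer

    consumer = KafkaConsumer(
        health_topic_name(tenant_name),
        bootstrap_servers=bootstrap_servers,
        group_id=group_id,
        # Start from the oldest retained event when the group has no offset yet.
        auto_offset_reset="earliest",
        value_deserializer=decode_event,
    )
    for record in consumer:
        yield record.value


if __name__ == "__main__":
    # Placeholder broker address; replace it with the one for your environment.
    for event in consume_health_events(
        "orion-123", ["<broker-host>:9092"], "health-monitor"
    ):
        print(event["service_instance"]["name"], event.get("health"))
```

Using a fixed group_id lets Kafka track the consumer's position, so a restarted client resumes where it left off instead of re-reading the whole topic.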

Service Monitoring

The service produces health events as specified by the Health Exposure Service, at an interval of 1 hour.
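
Because events arrive at a known interval, a consumer can treat a long silence as a signal in its own right. The sketch below shows one possible check. The 1 hour interval comes from the text above, while the 5 minute grace period and the is_stale function are assumptions made for illustration; the documentation does not define what a missing event means.

```python
from datetime import datetime, timedelta, timezone

PUBLISH_INTERVAL = timedelta(hours=1)  # interval stated in the text above
GRACE_PERIOD = timedelta(minutes=5)    # assumed tolerance for delivery delays


def is_stale(
    last_event_time: datetime,
    now: datetime,
    interval: timedelta = PUBLISH_INTERVAL,
    grace: timedelta = GRACE_PERIOD,
) -> bool:
    """Return True if no event arrived within the interval plus the grace period."""
    return now - last_event_time > interval + grace
```

A caller would pass the timestamp of the most recent event for a service instance together with datetime.now(timezone.utc), and raise its own alert when the function returns True.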

Retention

The Kafka topic will retain events for at least 24 hours.
