Use HES

Exposed Events

Health events are published:

  • immediately on every status change
  • periodically, at an interval defined by each cloud service

Please read the specific cloud service documentation for more details on the health event frequency.
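
Because events arrive both on status changes and periodically, a consumer that only cares about transitions has to remember the last status it saw for each service instance. The following is a minimal Python sketch of that idea, assuming events have already been decoded into dictionaries shaped like the Event Schema section describes. The class name StatusTracker is hypothetical and not part of HES.

```python
from typing import Dict, Optional


class StatusTracker:
    """Remembers the last health status seen for each service instance."""

    def __init__(self) -> None:
        # Maps service_instance.id to the most recent health_status.
        self._last: Dict[str, str] = {}

    def observe(self, event: dict) -> Optional[str]:
        """Record the event; return the previous status if it changed, else None.

        The first event seen for an instance is recorded but not reported
        as a change, because there is nothing to compare it with.
        """
        instance_id = event["service_instance"]["id"]
        status = event["health_status"]
        previous = self._last.get(instance_id)
        self._last[instance_id] = status
        if previous is not None and previous != status:
            return previous
        return None
```

With this approach, periodic events that repeat the current status return None and can be ignored, while a transition returns the status the instance had before.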

Health Status Types

The following health statuses (field health_status) are forwarded to the event topic:

  • OK: the service instance is functioning normally
  • ERROR: the service instance is either not running or in a state not matching the SLA

Details about an error are available in the health_description field.

Cloud Services

Below is the list of cloud services that currently expose health events:

Cloud Service Name     Description
Log Exposure           TODO. ERROR events are published when the LES Kafka topic is not available for consumption, or when log events cannot be forwarded to the Kafka topic because the Log Exposure Service is degraded.
Managed OS Windows     TODO
Managed OS RHEL        TODO
Managed Reverse Proxy  Examples: OK if the instance is active and reachable from the internet; WARNING if the health check is not applicable because the instance was created with internet connection set to false; ERROR if the instance is active but no longer reachable from the internet. Note: no health events are sent during instance provisioning operations.

Event Schema

The health events forwarded to the topic follow a schema that is described below:

Field                                 Type    Description
version                               string  The version of the event schema. Fixed: 1.0
timestamp                             string  The UTC time of the event in ISO 8601 format. Check the specific cloud service documentation about the precise semantics of this time.
id                                    string  The UUID of this event, generated by the cloud service
platform                              string  Constant: ESC
tenant_name                           string  The name of the tenant owning this event
business_group_name                   string  The name of the business group where the service instance was deployed
cloud_service_name                    string  The name of the cloud service that produced this event (see the Cloud Services section)
service_instance.id                   string  The UUID of the service instance that produced this event
service_instance.name                 string  The name of the service instance that produced this event
service_instance.instance_class_name  string  The type of the service instance that produced this event
health_status                         string  One of: OK, ERROR
health_description                    string  Details about this health status

Example:

{
    "version": "1.0",
    "timestamp": "2023-02-14T12:40:00.000Z",
    "id": "958e11dd-a125-425e-8738-7ba3a83958c6",
    "platform": "ESC",
    "tenant_name": "orion-123",
    "cloud_service_name": "Managed OS RHEL",
    "business_group_name": "marketing",
    "service_instance": {
        "name": "orion1230001",
        "id": "df22bc90-ebdd-4c8e-a051-b088f89f8897",
        "instance_class_name": "Managed RHEL"
    },
    "health_status": "ERROR",
    "health_description": "this is the error description"
}
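
A consumer may want to check that an incoming message carries the documented fields before acting on it. The sketch below uses only the Python standard library. The function name parse_health_event is hypothetical, and the check covers field presence only; it deliberately does not restrict the values of health_status, since the Cloud Services section also mentions WARNING for Managed Reverse Proxy.

```python
import json

# Top-level fields listed in the Event Schema table.
REQUIRED_FIELDS = (
    "version", "timestamp", "id", "platform", "tenant_name",
    "business_group_name", "cloud_service_name",
    "service_instance", "health_status", "health_description",
)
# Fields nested under service_instance.
INSTANCE_FIELDS = ("id", "name", "instance_class_name")


def parse_health_event(raw: str) -> dict:
    """Decode a JSON health event and verify that the documented fields exist."""
    event = json.loads(raw)
    if not isinstance(event, dict):
        raise ValueError("health event must be a JSON object")

    missing = [name for name in REQUIRED_FIELDS if name not in event]

    instance = event.get("service_instance")
    if isinstance(instance, dict):
        missing += [
            f"service_instance.{name}"
            for name in INSTANCE_FIELDS
            if name not in instance
        ]
    elif instance is not None:
        raise ValueError("service_instance must be a JSON object")

    if missing:
        raise ValueError("missing fields: " + ", ".join(missing))
    return event
```

Raising ValueError for incomplete events is one possible design; a consumer could equally log and skip such messages.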

Health Event Stream

┌───────────────┐                                     ┌────────────────┐
│   Service     │          health-orion-123           │    Customer    │
│               ├──────► [ ----------------- ] ◄──────┤                │
│ health events │            (Kafka topic)            │  Kafka client  │
└───────────────┘                                     └────────────────┘

Service health events are exposed using the schema above and published to the tenant's Kafka topic. The topic name is the name of your tenant prefixed with health- (for example, health-orion-123 for the tenant orion-123).
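
The naming rule can be expressed as a small helper. This is an illustrative Python sketch; the function name health_topic_name is hypothetical.

```python
TOPIC_PREFIX = "health-"


def health_topic_name(tenant_name: str) -> str:
    """Return the health topic for a tenant: the tenant name prefixed with 'health-'."""
    if not tenant_name:
        raise ValueError("tenant_name must not be empty")
    return TOPIC_PREFIX + tenant_name
```

The returned string is the topic a Kafka consumer would subscribe to.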

To consume the health events, the customer configures a Kafka consumer client on an ESC VM. A list of Kafka clients is provided below.

Retention

The Kafka topic will retain events for at least 24 hours.
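
Since retention is only guaranteed for 24 hours, a consumer that was offline for longer may have missed events. A minimal Python sketch of such a check follows, assuming the consumer records the time it last read from the topic. The function name and the idea of tracking that time are assumptions, not part of HES.

```python
from datetime import datetime, timedelta, timezone

# Guaranteed minimum retention, taken from the Retention section.
MIN_RETENTION = timedelta(hours=24)


def may_have_missed_events(last_consumed_at: datetime, now: datetime) -> bool:
    """Return True if the gap since the last read exceeds the guaranteed retention."""
    return now - last_consumed_at > MIN_RETENTION


if __name__ == "__main__":
    current = datetime.now(timezone.utc)
    print(may_have_missed_events(current - timedelta(hours=30), current))
```

A True result does not prove that events were lost, because the topic may retain them for longer than the minimum; it only indicates that the guarantee no longer covers the gap.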
