Use HES

Exposed Events

Health events will be published:

immediately on status change
regularly with a frequency (interval) defined by each cloud service

Please read the specific cloud service documentation for more details on the health event frequency.
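
Because events are published both on a status change and at a regular interval, a consumer will usually see the same status repeated for a given instance. The sketch below shows one possible way to react only to transitions. The class name `StatusTracker` and its in-memory dictionary are illustrative assumptions. The field names are taken from the event schema in this document.

```python
from typing import Dict, Optional, Tuple


class StatusTracker:
    """Remembers the last health_status seen per service instance (hypothetical helper)."""

    def __init__(self) -> None:
        # Maps service_instance.id -> last observed health_status.
        self._last: Dict[str, str] = {}

    def observe(self, event: dict) -> Optional[Tuple[Optional[str], str]]:
        """Return (previous, current) when the status changed, otherwise None."""
        instance_id = event["service_instance"]["id"]
        current = event["health_status"]
        previous = self._last.get(instance_id)
        self._last[instance_id] = current
        if previous == current:
            return None  # periodic repeat of an already known status
        return (previous, current)  # previous is None for the first event seen
```

State is kept in memory only, so after a consumer restart the first event of every instance is reported as a change.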

Health Status Types

The following health status values (field health_status) are forwarded to the event topic:

OK: the service instance is functioning normally
ERROR: the service instance is either not running or in a state not matching the SLA

Error details are available in the health_description field.

Cloud Services

Below is the list of cloud services currently exposing their events:

Cloud Service Name	Description
`Log Exposure`	TODO `ERROR` events are published when the LES Kafka topic is not available for consumption, or when log events cannot be forwarded to the Kafka topic because the Log Exposure Service is degraded.
`Managed OS Windows`	TODO
`Managed OS RHEL`	TODO
`Managed Reverse Proxy`	Examples: `OK` if the instance is active and reachable from the internet; `WARNING` if the health check is not applicable because the instance was created with internet connection set to false; `ERROR` if the instance is active but no longer reachable from the internet. Note: no health events are sent during instance provisioning operations.

Event Schema

The health events forwarded to the topic follow a schema that is described below:

Field	Type	Description
version	string	The version of the event schema. Fixed: `1.0`
timestamp	string	The UTC time of the event in ISO 8601 format. Check the specific cloud service documentation for the precise semantics of this time.
id	string	The UUID of this event generated by the cloud service
platform	string	Constant `ESC`
tenant_name	string	The name of the tenant owning this event
business_group_name	string	The name of the business group where the service instance was deployed
cloud_service_name	string	The name of the cloud service that produced this event (see the Cloud Services section)
service_instance.id	string	The UUID of the service instance that produced this event
service_instance.name	string	The name of the service instance that produced this event
service_instance.instance_class_name	string	The type of the service instance that produced this event
health_status	string	One of: `OK`, `ERROR`
health_description	string	Details about this health status

Example:

{
    "version": "1.0",
    "timestamp": "2023-02-14T12:40:00.000Z",
    "id": "958e11dd-a12c-425e-8738-7ba3a83958c6",
    "platform": "ESC",
    "tenant_name": "orion-123",
    "cloud_service_name": "Managed OS RHEL",
    "business_group_name": "marketing",
    "service_instance": {
        "name": "orion1230001",
        "id": "df22bc90-ebdd-4c8e-a051-b088f89f8897",
        "instance_class_name": "Managed RHEL"
    },
    "health_status": "ERROR",
    "health_description": "this is the error description"
}
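
To make the schema concrete, here is a minimal sketch of how a consumer might parse one event into a typed object. The names `HealthEvent` and `parse_health_event` are hypothetical, the nested service_instance fields are flattened for convenience, and the timestamp is assumed to always use the form shown in the example above.

```python
import json
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class HealthEvent:
    """Typed view of one health event (hypothetical helper type)."""
    version: str
    timestamp: datetime
    id: str
    platform: str
    tenant_name: str
    business_group_name: str
    cloud_service_name: str
    instance_id: str
    instance_name: str
    instance_class_name: str
    health_status: str
    health_description: str


def parse_health_event(raw: str) -> HealthEvent:
    """Parse a JSON health event; raises KeyError if a required field is missing."""
    data = json.loads(raw)
    instance = data["service_instance"]
    # datetime.fromisoformat() accepts a trailing "Z" only from Python 3.11 on,
    # so it is replaced with an explicit UTC offset first.
    timestamp = datetime.fromisoformat(data["timestamp"].replace("Z", "+00:00"))
    return HealthEvent(
        version=data["version"],
        timestamp=timestamp,
        id=data["id"],
        platform=data["platform"],
        tenant_name=data["tenant_name"],
        business_group_name=data["business_group_name"],
        cloud_service_name=data["cloud_service_name"],
        instance_id=instance["id"],
        instance_name=instance["name"],
        instance_class_name=instance["instance_class_name"],
        health_status=data["health_status"],
        # Assumption: the description may be absent, so default to an empty string.
        health_description=data.get("health_description", ""),
    )
```

Failing early on a missing field is a deliberate choice in this sketch; a production consumer may prefer to log and skip malformed events.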

Health Event Stream

┌───────────────┐                                     ┌────────────────┐
│   Service     │          health-orion-123           │    Customer    │
│               ├──────► [ ----------------- ] ◄──────┤                │
│ health events │            (Kafka topic)            │  Kafka client  │
└───────────────┘                                     └────────────────┘

Service health events are exposed using the schema above and published to the tenant's Kafka topic. The topic name is the name of your tenant prefixed with health- (for example, health-orion-123 for the tenant orion-123).

To consume the health events, the customer needs to configure a Kafka consumer client on an ESC VM. A list of Kafka clients is provided below.
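
As an illustration of the consumer side, the following sketch derives the topic name from the tenant name and prints a one-line summary per event. It assumes the kafka-python package as the client, message values encoded as UTF-8 JSON, and a broker reachable without extra security settings. The function names, the bootstrap server address, and the consumer group ID are placeholders, not values defined by the platform.

```python
import json


def health_topic(tenant_name: str) -> str:
    """Topic name: the tenant name prefixed with 'health-'."""
    return f"health-{tenant_name}"


def summarize(raw_value: bytes) -> str:
    """Turn one Kafka message value into a one-line summary (hypothetical helper)."""
    event = json.loads(raw_value.decode("utf-8"))  # assumption: UTF-8 JSON payload
    instance_name = event["service_instance"]["name"]
    line = (
        f'{event["timestamp"]} '
        f'{event["cloud_service_name"]}/{instance_name}: {event["health_status"]}'
    )
    if event["health_status"] != "OK":
        # Details for non-OK statuses come from health_description.
        line += f' ({event.get("health_description", "")})'
    return line


def consume(tenant_name: str, bootstrap_servers: str, group_id: str) -> None:
    """Print a summary for every health event of the tenant; runs until interrupted."""
    # Imported lazily so the helpers above work without the client library installed.
    from kafka import KafkaConsumer  # kafka-python package (assumed client choice)

    consumer = KafkaConsumer(
        health_topic(tenant_name),
        bootstrap_servers=bootstrap_servers,
        group_id=group_id,
        auto_offset_reset="earliest",  # start from the oldest retained event
    )
    for message in consumer:
        print(summarize(message.value))
```

A call such as `consume("orion-123", "broker.example:9092", "health-monitor")` would then read from health-orion-123; any authentication or TLS options required by the environment must be added to the `KafkaConsumer` arguments.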

Retention

The Kafka topic will retain events for at least 24 hours.
