Use HES
Exposed Events
Health events will be published:
- immediately on status change
- periodically, at an interval defined by each cloud service
Please read the specific cloud service documentation for more details on the health event frequency.
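Because events are published both on status change and periodically, a consumer will usually receive repeated events carrying the same status. The following is a minimal sketch of how a consumer might report only actual transitions. The class name StatusTracker is hypothetical and not part of the service; it assumes events have already been decoded into dictionaries with the service_instance.id and health_status fields of the event schema.

```python
from typing import Dict


class StatusTracker:
    """Remembers the last health_status seen per service instance (hypothetical helper)."""

    def __init__(self) -> None:
        # Maps service_instance.id to the most recently observed health_status.
        self._last: Dict[str, str] = {}

    def observe(self, event: dict) -> bool:
        """Record the event and return True if the status differs from the last one seen.

        The first event seen for an instance counts as a change.
        """
        instance_id = event["service_instance"]["id"]
        status = event["health_status"]
        changed = self._last.get(instance_id) != status
        self._last[instance_id] = status
        return changed
```

A consumer could call observe() for each received event and trigger alerting only when it returns True, so that periodic repeats of the same status are ignored.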
Health Status Types
The following health statuses (health_status field) are forwarded to the event topic:
- OK: the service instance is functioning normally
- ERROR: the service instance is either not running or in a state not matching the SLA
Details of an error are available in the health_description field.
Cloud Services
Below is the list of cloud services currently exposing their events:
Cloud Service Name | Description |
---|---|
Log Exposure | TODO ERROR events are published when the LES Kafka topic is not available for consumption or when log events cannot be forwarded to the Kafka topic because the Log Exposure Service is degraded. |
Managed OS Windows | TODO |
Managed OS RHEL | TODO |
Managed Reverse Proxy | Examples: OK if the instance is active and reachable from the internet; WARNING if the health check is not applicable because the instance was created with internet connection set to false; ERROR if the instance is active but no longer reachable from the internet. Note: no health events are sent during instance provisioning operations. |
Event Schema
The health events forwarded to the topic follow a schema that is described below:
Field | Type | Description |
---|---|---|
version | string | The version of the event schema. Fixed: 1.0 |
timestamp | string | The UTC time of the event in ISO8601 format. See the specific cloud service documentation for the precise semantics of this time. |
id | string | The UUID of this event generated by the cloud service |
platform | string | Constant ESC |
tenant_name | string | The name of the tenant owning this event |
business_group_name | string | The name of the business group where the service instance was deployed |
cloud_service_name | string | The name of the cloud service that produced this event (see the Cloud Services section) |
service_instance.id | string | The UUID of the service instance that produced this event |
service_instance.name | string | The name of the service instance that produced this event |
service_instance.instance_class_name | string | The type of the service instance that produced this event |
health_status | string | One of: OK, ERROR |
health_description | string | Details about this health status |
Example:
{
"version": "1.0",
"timestamp": "2023-02-14T12:40:00.000Z",
"id": "958e11dd-a12s-425e-8738-7ba3a83958c6",
"platform": "ESC",
"tenant_name": "orion-123",
"cloud_service_name": "Managed OS RHEL",
"business_group_name": "marketing",
"service_instance": {
"name": "orion1230001",
"id": "df22bc90-ebdd-4c8e-a051-b088f89f8897",
"instance_class_name": "Managed RHEL"
},
"health_status": "ERROR",
"health_description": "this is the error description"
}
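As an illustration of working with this schema, the sketch below parses an event and checks that the documented fields are present. It rests on assumptions that the source does not confirm: that every field in the table is always present, and that timestamps use the same form as the example above (a trailing Z with millisecond precision). The function names parse_health_event and event_time are hypothetical.

```python
import json
from datetime import datetime

SCHEMA_VERSION = "1.0"
# Top-level fields from the schema table (assumed to be always present).
TOP_LEVEL_FIELDS = (
    "version", "timestamp", "id", "platform", "tenant_name",
    "business_group_name", "cloud_service_name",
    "health_status", "health_description",
)
# Fields nested under service_instance.
INSTANCE_FIELDS = ("id", "name", "instance_class_name")


def parse_health_event(raw: str) -> dict:
    """Decode a JSON health event and raise ValueError if documented fields are missing."""
    event = json.loads(raw)
    if not isinstance(event, dict):
        raise ValueError("health event must be a JSON object")
    missing = [name for name in TOP_LEVEL_FIELDS if name not in event]
    instance = event.get("service_instance")
    if not isinstance(instance, dict):
        missing.append("service_instance")
    else:
        missing += [f"service_instance.{n}" for n in INSTANCE_FIELDS if n not in instance]
    if missing:
        raise ValueError(f"missing fields: {', '.join(missing)}")
    if event["version"] != SCHEMA_VERSION:
        raise ValueError(f"unsupported schema version: {event['version']}")
    return event


def event_time(event: dict) -> datetime:
    """Parse the timestamp, assuming the 'YYYY-MM-DDTHH:MM:SS.mmmZ' form of the example."""
    return datetime.fromisoformat(event["timestamp"].replace("Z", "+00:00"))
```

The sketch deliberately does not reject unknown health_status values, so that a consumer keeps working if a cloud service reports a status it does not expect.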
Health Event Stream
┌───────────────┐                                     ┌────────────────┐
│ Service       │          health-orion-123           │ Customer       │
│               ├──────► [ ----------------- ] ◄──────│                │
│ health events │            (Kafka topic)            │ Kafka client   │
└───────────────┘                                     └────────────────┘
Service health events are exposed using the schema above and published to the tenant's Kafka topic. The topic name is the name of your tenant prefixed with health- (for example, health-orion-123 for the tenant orion-123).
To consume the health events, configure a Kafka consumer client on an ESC VM. A list of Kafka clients is provided below.
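A minimal consumer sketch is shown here, assuming the kafka-python client library is installed and that events are UTF-8 encoded JSON. The bootstrap servers and consumer group are placeholders that you must supply yourself, and any authentication or TLS settings your environment requires are omitted because they are not described here. The function names are hypothetical.

```python
import json
from typing import Callable, List


def health_topic(tenant_name: str) -> str:
    """Build the topic name: the tenant name prefixed with 'health-'."""
    return f"health-{tenant_name}"


def decode_event(value: bytes) -> dict:
    """Decode a message payload, assuming UTF-8 encoded JSON."""
    return json.loads(value.decode("utf-8"))


def consume_health_events(
    tenant_name: str,
    bootstrap_servers: List[str],
    group_id: str,
    handle: Callable[[dict], None],
) -> None:
    """Read health events from the tenant topic and pass each one to handle()."""
    # Imported lazily so the helpers above work without kafka-python installed.
    from kafka import KafkaConsumer

    consumer = KafkaConsumer(
        health_topic(tenant_name),
        bootstrap_servers=bootstrap_servers,  # placeholder: supplied by the caller
        group_id=group_id,
        auto_offset_reset="earliest",  # start from the oldest retained event on first run
        value_deserializer=decode_event,
    )
    for message in consumer:
        handle(message.value)
```

For the tenant orion-123, calling consume_health_events("orion-123", servers, "my-group", print) would subscribe to health-orion-123 and print each decoded event, where servers and "my-group" stand for values from your own environment.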
Retention
The Kafka topic will retain events for at least 24 hours.
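Since retention is only guaranteed for 24 hours, a consumer that has been stopped for longer may find that some events are no longer in the topic. The sketch below shows one way a consumer could detect that situation on restart. The function name and the idea of storing the time of the last consumed event are assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Minimum retention stated for the health topic.
GUARANTEED_RETENTION = timedelta(hours=24)


def may_have_missed_events(last_consumed_at: datetime, now: Optional[datetime] = None) -> bool:
    """Return True if the gap since the last consumed event exceeds the guaranteed retention."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now - last_consumed_at > GUARANTEED_RETENTION
```

Both arguments are expected to be timezone-aware datetimes. When the function returns True, the consumer cannot rely on the topic alone to reconstruct what happened while it was offline.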