Libraries for Player Health Checks

Description of the Feature
At the moment it is quite difficult to monitor the "health" of a player. In particular, if a player takes too long to process the events and calculate the commands, it enters a "delusional state" and believes itself to be in rounds long past. How does the player realize that it is "behind" and that events are missing, and what can it do to prevent robots from sending meaningless or incorrect commands? The commands it sends can be successful "by chance". This also includes the option of querying the player state via defined endpoints in the distributed system. The aim of this feature is to develop a library in each of the player languages (Java, Kotlin, TypeScript, Rust, Python) that can be used to monitor the health of a player.
Author
Omar Fourati
Implemented During
WASP II "Microservices und Event-getriebene Architektur (MEA)", SS2024, as topic (P1)
Status
ongoing
Repositories
Last Update
2025-02-04

Libraries for Player Health Checks

Monitoring the health of a player, especially in distributed systems, is a critical task. The current challenge lies in detecting when a player has fallen “behind,” i.e., when it fails to process game events and calculate commands in real time. If this goes undetected, the player can enter a “delusional state,” where it incorrectly assumes itself to be operating in past rounds and therefore sends inaccurate or outdated commands. These erroneous commands might succeed “by chance,” but over time they can destabilize the game or cause malfunctions in its logic.

To mitigate this, we are implementing a library in each of the main player languages—Java, Kotlin, TypeScript, Rust, and Python—to facilitate health monitoring. Each library will provide a set of tools and endpoints to monitor and query the player’s state, ensuring timely detection of discrepancies between the player’s internal state and the current game state.

Key Objectives

  1. Detecting Delays and Missing Events:
  • The system will automatically check whether a player has processed all game events up to the current round. If the player is lagging behind in event processing, it will be flagged as “behind.”
  2. Preventing Erroneous Command Execution:
  • By analyzing the player’s internal state and comparing it with the game’s current round, the system can determine if the commands the player intends to send are valid or likely to cause errors due to being outdated.
  3. Health Check Endpoints:
  • Each player will expose health check endpoints within the distributed system, allowing external services to query its status. These endpoints will return whether the player is behind and provide additional diagnostic information if necessary.
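
To make the three objectives more concrete, the following is a minimal sketch in plain Java, not the library's actual API. The class name PlayerHealthSnapshot, its methods, and the JSON field names are all assumptions chosen for illustration. It also assumes that a command counts as valid only if it was computed for the game's current round, which is one possible reading of "outdated".

```java
/** Hypothetical snapshot of a player's health; all names are illustrative assumptions. */
final class PlayerHealthSnapshot {
    private final int lastProcessedRound; // last round whose events the player has handled
    private final int currentGameRound;   // round currently reported by the game

    PlayerHealthSnapshot(int lastProcessedRound, int currentGameRound) {
        this.lastProcessedRound = lastProcessedRound;
        this.currentGameRound = currentGameRound;
    }

    /** Objective 1: the player is "behind" if it has not caught up to the current round. */
    boolean isBehind() {
        return lastProcessedRound < currentGameRound;
    }

    /** Number of rounds the player still has to catch up on; 0 when it is up to date. */
    int roundsBehind() {
        return Math.max(0, currentGameRound - lastProcessedRound);
    }

    /** Objective 2 (assumed rule): only commands computed for the current round are sent. */
    boolean isCommandCurrent(int commandRound) {
        return commandRound == currentGameRound;
    }

    /** Objective 3: a diagnostic body that a health check endpoint could return. */
    String toJson() {
        return "{\"behind\":" + isBehind()
                + ",\"lastProcessedRound\":" + lastProcessedRound
                + ",\"currentGameRound\":" + currentGameRound
                + ",\"roundsBehind\":" + roundsBehind() + "}";
    }
}
```

For a player that has processed round 3 while the game is in round 5, isBehind() returns true, roundsBehind() returns 2, and a command computed for round 3 would be rejected by isCommandCurrent(3).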

Current Implementation in Java

We have updated the implementation of the health check mechanism in the Java player to include the following features:

  • Actuator-Based Health Check for Liveness and Readiness: The player now uses the Spring Boot Actuator /actuator/health endpoint for both Kubernetes Liveness and Readiness probes. This ensures that Kubernetes can monitor the basic health status of the player, restarting it if necessary.

  • Scheduled Round Check Mechanism: Using Spring Boot’s scheduler (@Scheduled), the player sends a /roundCheck request at regular intervals to determine whether it is “behind” the current game round. This check compares the player’s last processed round with the current round retrieved from the game server. If the player is behind, a warning is logged for further analysis.

  • Round Comparison Logic: The health check logic compares the current game round (retrieved from the game server) with the player’s last processed round. If the player’s last processed round is less than the current game round, the player is considered “behind.” The result is logged for further analysis.
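
The scheduled check and the round comparison described above can be sketched without Spring as follows. This is not the Java player's actual code: it swaps Spring's @Scheduled for the JDK's ScheduledExecutorService so that the example stays self-contained, and the class name ScheduledRoundCheck and the two IntSupplier parameters are assumptions. In the real player, the second supplier would correspond to the request that retrieves the current round from the game server.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.IntSupplier;
import java.util.logging.Logger;

/** Hypothetical periodic round check; a JDK stand-in for a Spring @Scheduled method. */
final class ScheduledRoundCheck implements Runnable {
    private static final Logger LOG = Logger.getLogger(ScheduledRoundCheck.class.getName());

    private final IntSupplier lastProcessedRound; // assumed: provided by the player
    private final IntSupplier currentGameRound;   // assumed: fetched from the game server
    private volatile boolean behind;

    ScheduledRoundCheck(IntSupplier lastProcessedRound, IntSupplier currentGameRound) {
        this.lastProcessedRound = lastProcessedRound;
        this.currentGameRound = currentGameRound;
    }

    /** One round check: compare both rounds and log a warning if the player is behind. */
    @Override
    public void run() {
        try {
            int playerRound = lastProcessedRound.getAsInt();
            int gameRound = currentGameRound.getAsInt();
            behind = playerRound < gameRound;
            if (behind) {
                LOG.warning("Player is behind: last processed round " + playerRound
                        + ", current game round " + gameRound);
            }
        } catch (RuntimeException e) {
            // An uncaught exception would cancel all later runs of scheduleAtFixedRate,
            // so a failed lookup is logged and the previous result is kept.
            LOG.warning("Round check failed: " + e.getMessage());
        }
    }

    /** Result of the most recent successful check. */
    boolean isBehind() {
        return behind;
    }

    /** Runs the check at a fixed interval; the caller shuts the returned executor down. */
    ScheduledExecutorService start(long intervalSeconds) {
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        executor.scheduleAtFixedRate(this, 0, intervalSeconds, TimeUnit.SECONDS);
        return executor;
    }
}
```

Keeping the check in run() means it can be triggered by a scheduler or called directly, for example from a test or from an endpoint handler. The interval is left as a parameter because the source does not state how often the check runs.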

Logging:

The system utilizes logging to track when a player falls behind and when discrepancies between the player state and the game state are detected. This information can be critical for debugging and ensuring the overall stability of the game environment.

Future Work

In addition to the Java implementation, similar libraries will be developed for the other player languages—Kotlin, TypeScript, Rust, and Python. Each library will follow the same fundamental design principles, ensuring consistency across the different implementations, while being adapted to the unique features and idioms of each language.

Additional Resources

  • Game Status
  • Game Status On Platform
  • Player Health Check
  • New Game Endpoint
  • Player Logs
  • Round Check Endpoint
  • Round Check Endpoint2

Last Update: 2024-07-30