Spring Boot Deep Dive: Operating Actuator Health with Kubernetes Probes

1) Version baseline

Spring Boot release baseline: v4.1.0-M1 (GitHub Releases)
Official documentation baseline: Actuator Endpoints, Kubernetes Probes (operations guide)

2) Core concepts

Spring Boot Actuator exposes application state through observable endpoints. The official reference recommends using health groups and status mappings to keep liveness and readiness concerns separate.

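/actuator/health reports the aggregate health status.
Depending on the environment, you can split out liveness and readiness groups and wire each one to the matching Kubernetes probe.
Reflecting external dependency failures (database, message broker) on the readiness side lets Kubernetes stop routing traffic to the Pod rather than restarting it indiscriminately.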
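To make the split concrete, here is a minimal, framework-free sketch of how two health groups could react to the same database outage. It is a model, not Spring Boot's implementation: the class HealthGroupModel and the rule "any DOWN member makes the group DOWN" are assumptions for illustration, and the component names (livenessState, readinessState, db) are only chosen to resemble the ones Actuator uses.

```java
import java.util.List;
import java.util.Map;

// Hypothetical model, not Spring Boot code: a group is DOWN if any of its members is DOWN.
final class HealthGroupModel {
    enum Status { UP, DOWN }

    static Status aggregate(List<String> members, Map<String, Status> components) {
        for (String name : members) {
            // Members missing from the map are treated as DOWN (a conservative assumption).
            if (components.getOrDefault(name, Status.DOWN) == Status.DOWN) {
                return Status.DOWN;
            }
        }
        return Status.UP;
    }

    public static void main(String[] args) {
        // Component states during a database outage.
        Map<String, Status> components = Map.of(
                "livenessState", Status.UP,
                "readinessState", Status.UP,
                "db", Status.DOWN);

        List<String> liveness = List.of("livenessState");          // no external dependencies
        List<String> readiness = List.of("readinessState", "db");  // includes the database

        System.out.println("liveness=" + aggregate(liveness, components));   // liveness=UP
        System.out.println("readiness=" + aggregate(readiness, components)); // readiness=DOWN
    }
}
```

With this grouping, the outage takes the Pod out of rotation (readiness fails) without giving Kubernetes a reason to restart it (liveness still passes).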

3) Trade-offs

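Including every dependency in liveness: failures are detected quickly, but unnecessary restarts can increase, because restarting the Pod does not fix an external outage.
Keeping dependencies mainly in readiness: restart storms are reduced, but you have to design the logic that decides when the Pod has recovered.
Exposing too much health detail: it helps operational debugging, but it widens the security exposure surface.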

4) Failure reproduction and resolution

Reproduction scenario

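Trigger a temporary database outage.
The application is configured so that database health feeds into both liveness and readiness in the same way.
Kubernetes restarts the Pods repeatedly, and overall latency grows.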

Resolution steps

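Split the Actuator health groups into liveness and readiness.
Reflect the state of external dependencies mainly in the readiness group.
Remap the Kubernetes probes to the corresponding endpoints.
Before deploying, verify the restart count and how the Pod is added to and removed from the Service endpoints.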
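As one way to carry out the verification step, the sketch below polls both probe endpoints and prints the HTTP code and reported status. It assumes the probe endpoints are enabled and reachable at http://localhost:9090 (for example through a port-forward); the class ProbeEndpointCheck, the default base URL, and the regex-based status extraction are illustrative choices, not part of Actuator.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for manual verification; requires Java 11+ for java.net.http.
final class ProbeEndpointCheck {
    private static final Pattern STATUS = Pattern.compile("\"status\"\\s*:\\s*\"([A-Z_]+)\"");

    // Returns the first "status" value found in a health response body, or "UNKNOWN" if none is found.
    static String extractStatus(String body) {
        Matcher m = STATUS.matcher(body);
        return m.find() ? m.group(1) : "UNKNOWN";
    }

    // Calls one endpoint and returns "<http code> <status>".
    static String check(HttpClient client, String url) throws Exception {
        HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                .timeout(Duration.ofSeconds(2))
                .GET()
                .build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        return response.statusCode() + " " + extractStatus(response.body());
    }

    public static void main(String[] args) throws Exception {
        // Assumed base URL; pass a different one as the first argument.
        String base = args.length > 0 ? args[0] : "http://localhost:9090";
        HttpClient client = HttpClient.newBuilder().connectTimeout(Duration.ofSeconds(2)).build();
        for (String probe : new String[] {"liveness", "readiness"}) {
            System.out.println(probe + ": " + check(client, base + "/actuator/health/" + probe));
        }
    }
}
```

During a database outage rehearsal, the outcome to look for is liveness still answering 200 while readiness answers 503.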

5) Checklist

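[ ] Are the liveness and readiness health groups separated?
[ ] Are database and broker failures reflected mainly in readiness?
[ ] Is the exposure of health details restricted (by authorization and network)?
[ ] Do the probe failure thresholds and timeouts match the actual startup time?
[ ] Does a failure rehearsal (database down) complete without a restart storm?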
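For the threshold item, a rough budget calculation helps. The sketch below approximates how long Kubernetes tolerates a container that never passes a probe as initialDelaySeconds + periodSeconds × failureThreshold, and compares that with an observed startup time. The approximation ignores per-probe timeouts, and the class ProbeBudget and the sample numbers are assumptions for illustration.

```java
// Hypothetical helper: rough arithmetic for sizing probe settings against startup time.
final class ProbeBudget {

    // Approximate seconds before the probe is considered failed for good.
    static int toleratedSeconds(int initialDelaySeconds, int periodSeconds, int failureThreshold) {
        return initialDelaySeconds + periodSeconds * failureThreshold;
    }

    // True if the tolerated window covers the observed startup time plus a safety margin.
    static boolean covers(int toleratedSeconds, int observedStartupSeconds, int marginSeconds) {
        return toleratedSeconds >= observedStartupSeconds + marginSeconds;
    }

    public static void main(String[] args) {
        int tight = toleratedSeconds(10, 10, 3);   // 10 + 10 * 3 = 40
        int relaxed = toleratedSeconds(10, 10, 6); // 10 + 10 * 6 = 70

        // Assume the application was measured to start in 45 seconds; keep a 10 second margin.
        System.out.println(tight + " -> " + covers(tight, 45, 10));     // 40 -> false
        System.out.println(relaxed + " -> " + covers(relaxed, 45, 10)); // 70 -> true
    }
}
```

In the first case the container would be restarted before it finishes starting, which is exactly the restart loop the checklist is meant to catch.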

6) Official links

Spring Boot Actuator Reference: https://docs.spring.io/spring-boot/reference/actuator/index.html
Spring Boot Kubernetes Probes: https://docs.spring.io/spring-boot/reference/actuator/endpoints.html#actuator.endpoints.kubernetes-probes
Spring Boot Releases: https://github.com/spring-projects/spring-boot/releases

7) In practice: customizing Actuator endpoint settings

In production, do not expose every endpoint; open only the ones you need. In particular, env, beans, and configprops can contain sensitive information.

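# application.yml: recommended settings for production
management:
  server:
    port: 9090                    # separate from the main port; probes must target this port
  endpoints:
    web:
      exposure:
        include: health, info, metrics, prometheus
      base-path: /actuator
  endpoint:
    health:
      show-details: when-authorized  # details only for authorized requests
      show-components: when-authorized
      probes:
        enabled: true              # enables the K8s liveness/readiness endpoints
    info:
      enabled: true                # deprecated since Spring Boot 3.4 in favor of `access`; verify for your version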
  health:
    diskspace:
      enabled: true
      threshold: 1GB
    db:
      enabled: true
    redis:
      enabled: true

8) Implementing a custom Health Indicator

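Adding health checks for the dependencies specific to your application, such as external APIs, message queues, and caches, makes it faster to pinpoint the cause of a failure.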

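// Package names below follow Spring Boot 3.x. Spring Boot 4 moved the health API into a
// separate module, so verify these imports against the version you build with.
import org.springframework.boot.actuate.health.Health;
import org.springframework.boot.actuate.health.HealthIndicator;
import org.springframework.stereotype.Component;
import org.springframework.web.client.RestTemplate;

@Component
public class PaymentGatewayHealthIndicator implements HealthIndicator {

    // Inject a RestTemplate configured with short connect/read timeouts so that
    // a slow gateway cannot stall the health endpoint.
    private final RestTemplate restTemplate;

    public PaymentGatewayHealthIndicator(RestTemplate restTemplate) {
        this.restTemplate = restTemplate;
    }

    @Override
    public Health health() {
        long startNanos = System.nanoTime();
        try {
            var response = restTemplate.getForEntity(
                "https://api.payment.com/health", String.class
            );
            // Measure the round trip ourselves; the Date response header is not a response time.
            long elapsedMillis = (System.nanoTime() - startNanos) / 1_000_000;
            if (response.getStatusCode().is2xxSuccessful()) {
                return Health.up()
                    .withDetail("service", "payment-gateway")
                    .withDetail("responseTimeMs", elapsedMillis)
                    .build();
            }
            // Non-2xx responses that did not raise an exception (for example 3xx).
            return Health.down()
                .withDetail("service", "payment-gateway")
                .withDetail("status", response.getStatusCode().value())
                .build();
        } catch (Exception e) {
            // With the default error handler, RestTemplate throws on 4xx/5xx;
            // connection failures and timeouts also end up here.
            return Health.down()
                .withException(e)
                .withDetail("service", "payment-gateway")
                .build();
        }
    }
}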
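A health check that calls a remote service should return before the probe's own timeout expires, otherwise a slow dependency looks like an unresponsive application. The following framework-free sketch shows one way to bound any check with a deadline. The class BoundedCheck, the string results, and the timeout values are assumptions for illustration; in a real indicator, client-level connect and read timeouts are usually the first line of defense.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

// Hypothetical helper: runs a check on another thread and gives up after timeoutMillis.
final class BoundedCheck {

    static String run(Supplier<String> check, long timeoutMillis) {
        CompletableFuture<String> future = CompletableFuture.supplyAsync(check);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // marks the future cancelled; the running task is not forcibly stopped
            return "DOWN(timeout)";
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag for the caller
            return "DOWN(interrupted)";
        } catch (ExecutionException e) {
            return "DOWN(" + e.getCause().getClass().getSimpleName() + ")";
        }
    }

    public static void main(String[] args) {
        System.out.println(run(() -> "UP", 1000)); // UP
        System.out.println(run(() -> {
            try {
                Thread.sleep(2000); // simulates a slow dependency
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return "UP";
        }, 50)); // DOWN(timeout)
    }
}
```

Keeping the deadline shorter than the probe's timeoutSeconds means the endpoint answers with an explicit DOWN instead of letting the probe time out.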

9) Prometheus + Grafana integration

Enabling Actuator's /actuator/prometheus endpoint exposes the metrics collected by Micrometer in Prometheus format.

<!-- pom.xml -->
<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-registry-prometheus</artifactId>
</dependency>

<!-- Prometheus scrape config -->
# prometheus.yml
scrape_configs:
  - job_name: 'spring-boot'
    metrics_path: '/actuator/prometheus'
    static_configs:
      - targets: ['app-service:9090']
    scrape_interval: 15s

10) Related posts

Spring Actuator operations deep dive: an in-depth guide that goes as far as custom metrics and Micrometer tag design.
Spring Boot + Kubernetes guide: covers the full flow of using Actuator probes on K8s.
Prometheus + Grafana K8s guide: collects Actuator metrics with Prometheus and builds Grafana dashboards.