Advanced Docker Health Check Operations – 테오의 저장소

What Is a Docker Health Check?

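A container in the "Running" state is not necessarily running a working application. It is common for the process to stay alive while its DB connection has dropped, or while a memory leak keeps it from responding. Docker's HEALTHCHECK periodically runs a command inside the container to check the application and decide whether it can actually serve requests.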

Basic HEALTHCHECK Syntax

# Dockerfile
HEALTHCHECK --interval=30s --timeout=5s --start-period=10s --retries=3 \
  CMD curl -f http://localhost:3000/health || exit 1

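Option	Default	Description
`--interval`	30s	Time between checks
`--timeout`	30s	Maximum time to wait for a check (exceeding it counts as a failure)
`--start-period`	0s	Startup grace period (failures during it are not counted)
`--retries`	3	Consecutive failures before the container is marked unhealthy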

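The Three Health States

starting: the initial state; failures inside start-period do not count toward retries
healthy: the most recent health check succeeded
unhealthy: the check failed retries times in a row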
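
The transitions between these states can be sketched as a small simulation. This is an illustrative model of the documented behavior, not Docker's actual implementation: `simulate_health` and its parameters are hypothetical names, and it assumes the first probe runs one interval after the container starts.

```python
def simulate_health(results, interval=30, start_period=10, retries=3):
    """Return the health status after each probe.

    results: list of booleans, one per probe (True means exit code 0).
    Probe i is assumed to run (i + 1) * interval seconds after start.
    """
    status = "starting"
    failing_streak = 0
    started = False  # becomes True after the first successful probe
    history = []
    for i, ok in enumerate(results):
        elapsed = (i + 1) * interval
        if ok:
            failing_streak = 0
            started = True
            status = "healthy"
        elif started or elapsed > start_period:
            # Failures inside the start period are ignored unless a probe
            # has already succeeded once.
            failing_streak += 1
            if failing_streak >= retries:
                status = "unhealthy"
        history.append(status)
    return history


print(simulate_health([False, False, True, False, False, False],
                      interval=10, start_period=25, retries=3))
# ['starting', 'starting', 'healthy', 'healthy', 'healthy', 'unhealthy']
```

Note that a single failure does not flip a healthy container to unhealthy; only a streak of `retries` failures does.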

Health Check Patterns by Application

Node.js / NestJS

# For slim images without curl (check directly with node)
HEALTHCHECK --interval=30s --timeout=5s --start-period=15s --retries=3 \
  CMD node -e "require('http').get('http://localhost:3000/health', (r) => { process.exit(r.statusCode === 200 ? 0 : 1) }).on('error', () => process.exit(1))"

Combined with NestJS Terminus, the endpoint can also check individual dependencies such as the DB, Redis, and disk.

Spring Boot

# Use the Actuator health endpoint
HEALTHCHECK --interval=30s --timeout=5s --start-period=60s --retries=3 \
  CMD curl -f http://localhost:8080/actuator/health || exit 1

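Spring Boot starts slowly, so set a generous start-period (60s or more). Actuator's /actuator/health aggregates the enabled health indicators, such as the DB, disk space, and external services, into a single status.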

PostgreSQL

HEALTHCHECK --interval=10s --timeout=5s --retries=5 \
  CMD pg_isready -U postgres -d mydb || exit 1

Redis

HEALTHCHECK --interval=10s --timeout=3s --retries=3 \
  CMD redis-cli ping | grep -q PONG || exit 1

Nginx

HEALTHCHECK --interval=30s --timeout=3s --retries=3 \
  CMD curl -f http://localhost/nginx-health || exit 1

# Add a health location to nginx.conf
# location /nginx-health { return 200 'ok'; add_header Content-Type text/plain; }

Health Checks in Docker Compose

# docker-compose.yml
services:
  db:
    image: postgres:16
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 10s
      timeout: 5s
      retries: 5
      start_period: 30s

  redis:
    image: redis:7-alpine
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 10s
      timeout: 3s
      retries: 3

  api:
    build: .
    depends_on:
      db:
        condition: service_healthy   # start only once the DB is healthy
      redis:
        condition: service_healthy   # start only once Redis is healthy
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:3000/health"]
      interval: 30s
      timeout: 5s
      retries: 3
      start_period: 15s

  nginx:
    image: nginx:alpine
    depends_on:
      api:
        condition: service_healthy   # start only once the API is healthy

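Key point: with condition: service_healthy under depends_on, Compose waits until the dependency is actually ready. A plain depends_on only guarantees that the dependency's container has been started, not that the service inside it is ready.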

Controlling Startup Order: A Practical Pattern

# Full dependency chain
services:
  db:
    healthcheck: ...

  migration:
    depends_on:
      db:
        condition: service_healthy
    command: npx prisma migrate deploy
    # Run the migration only once
    restart: "no"

  api:
    depends_on:
      db:
        condition: service_healthy
      migration:
        condition: service_completed_successfully  # after the migration finishes
    healthcheck: ...

With service_completed_successfully, the dependent service starts only after a one-off job (a migration, a seed, and so on) has exited successfully.
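
To see why this chain yields a single valid startup order, the depends_on graph can be sorted topologically. The sketch below only illustrates the ordering; it is not how Compose is implemented, and `start_order` is a hypothetical helper name.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def start_order(depends_on):
    """Return service names in an order that respects every dependency.

    depends_on: dict mapping a service to a dict of {dependency: condition}.
    """
    graph = {svc: set(deps) for svc, deps in depends_on.items()}
    return list(TopologicalSorter(graph).static_order())


# The chain from the Compose file above.
chain = {
    "db": {},
    "migration": {"db": "service_healthy"},
    "api": {
        "db": "service_healthy",
        "migration": "service_completed_successfully",
    },
}
print(start_order(chain))
# ['db', 'migration', 'api']
```

The conditions decide how long each step waits (until healthy, or until the job has exited with code 0), while the graph decides the order.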

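Health Checks in Images Without curl

Distroless and scratch images ship without curl, and usually without a shell, so a shell-form CMD will not run there either; use the exec form with a binary that exists in the image. Alternatives: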

# Option 1: wget (included in Alpine via BusyBox)
HEALTHCHECK CMD wget -qO- http://localhost:3000/health || exit 1

# Option 2: use the language runtime
# Node.js
HEALTHCHECK CMD node -e "..."

# Python
HEALTHCHECK CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:8000/health')"

# Option 3: dedicated binary (built with Go)
# Include the /healthcheck binary via a multi-stage build
COPY --from=builder /healthcheck /healthcheck
HEALTHCHECK CMD ["/healthcheck"]
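
The Python one-liner in option 2 has no timeout and is hard to extend, so a small script is often easier to maintain. The following is a minimal sketch under assumptions: the names `check` and `main` and the default URL are illustrative, and the script treats anything other than HTTP 200 as a failure.

```python
import http.client
import urllib.error
import urllib.request

# Opener that ignores proxy environment variables, so the probe always
# talks to the local process directly.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def check(url, timeout=3.0):
    """Return 0 if the endpoint answers with HTTP 200, otherwise 1."""
    try:
        with _OPENER.open(url, timeout=timeout) as resp:
            return 0 if resp.status == 200 else 1
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        # Connection refused, timeout, HTTP 4xx/5xx, or a malformed URL.
        return 1


def main(argv):
    """Use argv[1] as the URL if given, else a hypothetical default."""
    url = argv[1] if len(argv) > 1 else "http://localhost:8000/health"
    return check(url)
```

Saved as a hypothetical /healthcheck.py with a final line `sys.exit(main(sys.argv))`, it could be wired up in exec form as `HEALTHCHECK CMD ["python", "/healthcheck.py"]`.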

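Automatic Restart on unhealthy

Docker on its own does not restart unhealthy containers: restart policies react to a container exiting, not to its health status. You need Docker Swarm or autoheal for that: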

# Automatic restarts with docker-autoheal
services:
  autoheal:
    image: willfarrell/autoheal
    restart: always
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    environment:
      AUTOHEAL_CONTAINER_LABEL: all  # watch all containers
      AUTOHEAL_INTERVAL: 30          # check every 30 seconds
      AUTOHEAL_START_PERIOD: 60      # wait 60 seconds before the first check
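
Conceptually, a tool like this polls for unhealthy containers and restarts them. The sketch below shows that idea with the Docker CLI; it is a simplified illustration, not autoheal's actual implementation, and the function names and the injectable `run` parameter are assumptions made for testability.

```python
import subprocess


def unhealthy_container_ids(run=subprocess.run):
    """Return the IDs of containers Docker currently reports as unhealthy."""
    result = run(
        ["docker", "ps", "--filter", "health=unhealthy", "--quiet"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.split()


def restart_unhealthy(run=subprocess.run):
    """Restart every unhealthy container once; return the restarted IDs."""
    ids = unhealthy_container_ids(run)
    for cid in ids:
        run(["docker", "restart", cid], check=True)
    return ids
```

Calling `restart_unhealthy()` in a loop with a sleep between iterations approximates the polling interval. Like autoheal, it needs access to the Docker socket.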

Debugging

# Check health status
docker inspect --format='{{json .State.Health}}' my-container | jq

# Example output:
# {
#   "Status": "unhealthy",
#   "FailingStreak": 5,
#   "Log": [
#     {
#       "Start": "2026-03-17T00:00:00Z",
#       "End": "2026-03-17T00:00:05Z",
#       "ExitCode": 1,
#       "Output": "curl: (7) Failed to connect"
#     }
#   ]
# }

# Show only healthy containers
docker ps --filter health=healthy

# Find unhealthy containers
docker ps --filter health=unhealthy
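
When jq is not available, the same JSON can be summarized with a few lines of Python. This sketch assumes the field names shown in the example output above; `summarize_health` is a hypothetical helper, and the handling of `null` assumes that is what Docker prints for a container without a health check.

```python
import json


def summarize_health(inspect_json):
    """Summarize the output of docker inspect --format='{{json .State.Health}}'."""
    health = json.loads(inspect_json)
    if health is None:
        # Assumed output for a container with no HEALTHCHECK configured.
        return {"status": "none", "failing_streak": 0,
                "last_exit_code": None, "last_output": ""}
    log = health.get("Log") or []
    last = log[-1] if log else {}
    return {
        "status": health.get("Status"),
        "failing_streak": health.get("FailingStreak", 0),
        "last_exit_code": last.get("ExitCode"),
        "last_output": (last.get("Output") or "").strip(),
    }
```

The last log entry is usually the most useful one, since its Output field holds whatever the check command printed before it failed.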

Relationship to K8s Probes

In a Kubernetes environment, you use K8s probes (liveness/readiness/startup) instead of Docker HEALTHCHECK, and K8s ignores Docker HEALTHCHECK. If you design both local development (Docker Compose) and production (K8s) around the same /health endpoint, however, you can reuse it on both sides.

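Summary

Docker HEALTHCHECK is the core mechanism for deciding whether a container can actually serve requests. Combined with depends_on: condition: service_healthy, it lets you control service startup order safely. Choose a check command and timing that fit each application, add automatic recovery with a tool such as autoheal, and you get stable container operations.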