K8s Custom Metrics HPA Deep Dive – 테오의 저장소

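Why Custom Metrics HPA?

By default, HPA scales on CPU and memory utilization alone. In production services, however, business-level metrics such as request queue length, requests per second (RPS), message queue backlog, and response latency are often a more accurate scaling signal. Custom Metrics HPA scales Pods automatically based on metrics from an external metrics system such as Prometheus.

This post covers installing Prometheus Adapter, wiring up the Custom Metrics API, configuring HPA for each metric type, and External Metrics, at a level suitable for production operation.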

Architecture: Metrics Pipeline

┌────────────────────┐     ┌───────────────────┐     ┌──────────────────────┐
│  Application       │────▶│  Prometheus       │────▶│  Prometheus Adapter  │
│  (exposes metrics) │     │  (scrape, store)  │     │  (Custom Metrics API)│
└────────────────────┘     └───────────────────┘     └──────────┬───────────┘
                                                                │
                                                    ┌───────────▼───────────┐
                                                    │  HPA Controller       │
                                                    │  (scaling decisions)  │
                                                    └───────────────────────┘

HPA reads metrics through the Kubernetes metrics APIs. There are three of them:

API	Purpose	Example
`metrics.k8s.io`	CPU/memory (default)	Provided by Metrics Server
`custom.metrics.k8s.io`	Custom metrics tied to K8s objects	Per-Pod RPS, queue length
`external.metrics.k8s.io`	Metrics from systems outside K8s	SQS queue message count, RDS CPU

Installing Prometheus Adapter

helm repo add prometheus-community \
  https://prometheus-community.github.io/helm-charts

helm install prometheus-adapter prometheus-community/prometheus-adapter \
  --namespace monitoring \
  --values adapter-values.yaml

# adapter-values.yaml
prometheus:
  url: http://prometheus-server.monitoring.svc
  port: 9090

rules:
  default: false
  custom:
    # Rule 1: HTTP request rate (per Pod)
    - seriesQuery: 'http_requests_total{namespace!="",pod!=""}'
      resources:
        overrides:
          namespace: {resource: "namespace"}
          pod: {resource: "pod"}
      name:
        matches: "^(.*)_total$"
        as: "${1}_per_second"
      # Sum over the remaining labels (e.g. endpoint) so each Pod yields a single value
      metricsQuery: 'sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)'

    # Rule 2: request queue length
    - seriesQuery: 'http_request_queue_length{namespace!="",pod!=""}'
      resources:
        overrides:
          namespace: {resource: "namespace"}
          pod: {resource: "pod"}
      name:
        as: "http_request_queue_length"
      metricsQuery: '<<.Series>>{<<.LabelMatchers>>}'

    # Rule 3: response time P95
    - seriesQuery: 'http_request_duration_seconds_bucket{namespace!="",pod!=""}'
      resources:
        overrides:
          namespace: {resource: "namespace"}
          pod: {resource: "pod"}
      name:
        as: "http_request_duration_p95"
      # Aggregate buckets by le (plus the grouping labels) before taking the quantile
      metricsQuery: 'histogram_quantile(0.95, sum(rate(<<.Series>>{<<.LabelMatchers>>}[5m])) by (le, <<.GroupBy>>))'

  external:
    # External Rule: SQS queue message count
    - seriesQuery: 'aws_sqs_approximate_number_of_messages_visible'
      resources:
        namespaced: false  # the series carries no Kubernetes namespace label
      name:
        as: "sqs_messages_visible"
      metricsQuery: '<<.Series>>{<<.LabelMatchers>>}'
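
The `name.matches` / `name.as` pair in Rule 1 is a plain regex substitution. As a minimal sketch of what that rewrite does to a series name, the Java snippet below applies the same pattern. The class and method names are hypothetical, and note that the adapter's config writes the capture group as `${1}` while Java's replacement syntax is `$1`.

```java
import java.util.regex.Pattern;

// Hypothetical helper that mirrors the name.matches / name.as rewrite of Rule 1.
class MetricNameRewrite {
    private static final Pattern TOTAL_SUFFIX = Pattern.compile("^(.*)_total$");

    // Maps a Prometheus series name to the metric name exposed to HPA.
    // Names that do not end in _total are returned unchanged in this sketch.
    static String toApiName(String seriesName) {
        return TOTAL_SUFFIX.matcher(seriesName).replaceAll("$1_per_second");
    }

    public static void main(String[] args) {
        // Prints http_requests_per_second, the name referenced by the HPA manifest
        System.out.println(toApiName("http_requests_total"));
    }
}
```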

Verifying Custom Metrics

# List the Custom Metrics API
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1" | jq .

# Query a specific metric
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/production/pods/*/http_requests_per_second" | jq .

# Query External Metrics
kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1/namespaces/production/sqs_messages_visible" | jq .

HPA: RPS-Based Scaling

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-server-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  minReplicas: 2
  maxReplicas: 50
  metrics:
    # 1. Use CPU as well (multi-metric)
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70

    # 2. Custom Metric: RPS per Pod
    - type: Pods
      pods:
        metric:
          name: http_requests_per_second
        target:
          type: AverageValue
          averageValue: "100"  # target: 100 RPS per Pod

  behavior:
    scaleUp:
      stabilizationWindowSeconds: 30
      policies:
        - type: Percent
          value: 100        # at most double the replica count per period
          periodSeconds: 60
        - type: Pods
          value: 10          # or add at most 10 Pods per period
          periodSeconds: 60
      selectPolicy: Max
    scaleDown:
      stabilizationWindowSeconds: 300  # 5-minute stabilization window
      policies:
        - type: Percent
          value: 10          # scale down slowly, 10% at a time
          periodSeconds: 60
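
To see how the two scaleUp policies and `selectPolicy: Max` interact, the following Java sketch computes the highest replica count a single period would allow. It is a simplified model, not the controller's code: it ignores replicas already added earlier in the same period, and rounding the Percent result up is an assumption of the sketch. All names are hypothetical.

```java
// Simplified model of the scaleUp limit for one period (hypothetical names).
class ScaleUpLimit {
    // Percent policy: grow by at most `percent` of the current replica count.
    static int percentLimit(int currentReplicas, int percent) {
        return (int) Math.ceil(currentReplicas * (1 + percent / 100.0));
    }

    // Pods policy: add at most `pods` replicas.
    static int podsLimit(int currentReplicas, int pods) {
        return currentReplicas + pods;
    }

    // selectPolicy: Max takes the policy that permits the larger change;
    // maxReplicas still caps the result.
    static int maxAllowed(int currentReplicas, int percent, int pods, int maxReplicas) {
        int limit = Math.max(percentLimit(currentReplicas, percent),
                             podsLimit(currentReplicas, pods));
        return Math.min(limit, maxReplicas);
    }

    public static void main(String[] args) {
        System.out.println(maxAllowed(4, 100, 10, 50));   // 14: the Pods policy wins at small sizes
        System.out.println(maxAllowed(30, 100, 10, 50));  // 50: Percent would allow 60, capped by maxReplicas
    }
}
```

With small Deployments the Pods policy dominates, while the Percent policy takes over once the replica count exceeds 10, which is why combining both gives fast scale-up at every size.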

HPA: Queue-Length-Based Scaling

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: worker-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: queue-worker
  minReplicas: 1
  maxReplicas: 30
  metrics:
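    # Object type: references a metric attached to a specific K8s object.
    # Note: the adapter rule must map a label (e.g. service) to the Service resource;
    # Rule 2 in adapter-values.yaml maps only namespace and pod.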
    - type: Object
      object:
        describedObject:
          apiVersion: v1
          kind: Service
          name: queue-worker
        metric:
          name: http_request_queue_length
        target:
          type: Value
          value: "50"  # keep the queue length at or below 50

  behavior:
    scaleUp:
      stabilizationWindowSeconds: 0  # scale up immediately when the queue backs up
      policies:
        - type: Pods
          value: 5
          periodSeconds: 30

HPA: External Metrics (SQS/Kafka)

Scale on metrics from systems outside Kubernetes. Typical examples are the SQS queue message count and Kafka consumer lag.

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: sqs-consumer-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: sqs-consumer
  minReplicas: 1
  maxReplicas: 20
  metrics:
    - type: External
      external:
        metric:
          name: sqs_messages_visible
          selector:
            matchLabels:
              queue: "order-processing"
        target:
          type: AverageValue
          averageValue: "5"  # target: 5 messages per Pod
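
With an `AverageValue` target on an External metric, the total metric value is divided by the per-Pod target to get the replica count. The Java sketch below works through that arithmetic for the manifest above (target 5, bounds 1 to 20). It is an illustration under stated assumptions: the names are hypothetical, and the controller's tolerance and stabilization logic are left out.

```java
// Hypothetical sketch of the AverageValue arithmetic for an External metric.
class ExternalAverageValue {
    // desired = ceil(total / targetPerPod), clamped to [minReplicas, maxReplicas]
    static int desiredReplicas(long metricTotal, long targetPerPod,
                               int minReplicas, int maxReplicas) {
        int raw = (int) Math.ceil((double) metricTotal / targetPerPod);
        return Math.max(minReplicas, Math.min(maxReplicas, raw));
    }

    public static void main(String[] args) {
        System.out.println(desiredReplicas(60, 5, 1, 20));   // 12
        System.out.println(desiredReplicas(500, 5, 1, 20));  // 20: capped by maxReplicas
        System.out.println(desiredReplicas(0, 5, 1, 20));    // 1: floored by minReplicas
    }
}
```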

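Exposing Application Metrics: Spring Boot

// Spring Boot + Micrometer: expose metrics in Prometheus format
import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.Gauge;
import io.micrometer.core.instrument.MeterRegistry;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class OrderController {

    private final OrderService orderService;
    private final Counter orderCounter;

    // Explicit constructor injection. @RequiredArgsConstructor is dropped because
    // the meters have to be built from the registry inside the constructor.
    public OrderController(MeterRegistry registry, OrderQueue queue, OrderService orderService) {
        this.orderService = orderService;
        // Exposed as http_requests_total (the Prometheus registry appends _total to counters)
        this.orderCounter = Counter.builder("http_requests")
            .tag("endpoint", "/orders")
            .register(registry);
        // The gauge reads queue.size() each time the endpoint is scraped
        Gauge.builder("http_request_queue_length", queue, OrderQueue::size)
            .register(registry);
    }

    @PostMapping("/orders")
    public Order createOrder(@RequestBody OrderDto dto) {
        orderCounter.increment();
        return orderService.create(dto);
    }
}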

# Example /actuator/prometheus output
http_requests_total{endpoint="/orders"} 15234
http_request_queue_length 42
http_request_duration_seconds_bucket{le="0.1"} 12000
http_request_duration_seconds_bucket{le="0.5"} 14500

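How Multi-Metric HPA Works

When multiple metrics are specified, HPA computes a replica count for each metric, rounds each result up, and picks the largest.

Example: 5 Pods currently running
- CPU target 70%, current 35% → 5 * (35/70) = 2.5 → 3 Pods
- RPS target 100, current 150 → 5 * (150/100) = 7.5 → 8 Pods
- Queue target 50, current 200 → 5 * (200/50) = 20 Pods

Result: max(3, 8, 20) = 20, so the Deployment scales up to 20 Pods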
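
The worked example above follows the documented HPA formula, desiredReplicas = ceil(currentReplicas × currentValue / targetValue), applied per metric and then maximized. A minimal Java sketch of that calculation is shown below. The class and method names are hypothetical, and the sketch leaves out the controller's default 10% tolerance and the min/max replica bounds.

```java
// Hypothetical sketch of the per-metric formula and the max-selection step.
class MultiMetricReplicas {
    // desired = ceil(currentReplicas * currentValue / targetValue)
    static int desiredFor(int currentReplicas, double currentValue, double targetValue) {
        return (int) Math.ceil(currentReplicas * currentValue / targetValue);
    }

    // Each row is {currentValue, targetValue}; the largest proposal wins.
    static int desired(int currentReplicas, double[][] currentAndTarget) {
        int result = 0;
        for (double[] metric : currentAndTarget) {
            result = Math.max(result, desiredFor(currentReplicas, metric[0], metric[1]));
        }
        return result;
    }

    public static void main(String[] args) {
        double[][] metrics = {
            {35, 70},    // CPU utilization: proposes 3
            {150, 100},  // RPS per Pod: proposes 8
            {200, 50},   // queue length: proposes 20
        };
        System.out.println(desired(5, metrics)); // 20
    }
}
```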

Troubleshooting

# Check HPA status
kubectl describe hpa api-server-hpa -n production

# If metric lookups fail, check the Adapter logs
kubectl logs -n monitoring deployment/prometheus-adapter

# Test the PromQL query directly
kubectl exec -n monitoring prometheus-server-0 -- \
  promtool query instant http://localhost:9090 \
  'rate(http_requests_total{namespace="production"}[2m])'

# Check API registration status
kubectl get apiservices | grep metrics

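Caveats and Anti-Patterns

Anti-pattern	Problem	Fix
Stabilization window disabled or too short	Frequent scaling on metric jitter	scaleDown stabilizationWindow of 300s or more
rate() window too short	Overreacts to noise	Use a 2 to 5 minute rate window
minReplicas: 0	Cold-start latency, no Pod metrics to collect	Keep at least 1 (scale-to-zero needs the alpha HPAScaleToZero feature gate, or KEDA)
Metric cardinality explosion	Prometheus runs out of memory	Limit labels, use recording rules
CPU + custom metric conflict	Scaling differs from expectations	Understand the max-selection rule, tune behavior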

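Wrapping Up

Custom Metrics HPA moves past the limits of CPU and memory to enable precise autoscaling on business metrics. Prometheus Adapter connects custom metrics to HPA, and the behavior settings give fine-grained control over scaling speed. KEDA's event-driven autoscaling supports scale-to-zero, whereas Custom Metrics HPA relies on the Kubernetes-native HPA controller and needs only a metrics adapter rather than a separate autoscaling operator. Once the application exposes its metrics through Spring Micrometer and a matching adapter rule is in place, HPA can consume them.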