Spring Retry: Retry Strategies in Depth

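What Is Spring Retry, and Why Do We Need Retries?

Transient failures are unavoidable in distributed systems. Network timeouts, an exhausted DB connection pool, a brief outage of an external API: problems like these often succeed if you simply try once more. Spring Retry is a library that handles this retry logic declaratively. A single @Retryable annotation is enough to separate retry handling cleanly from business logic.

This article covers Spring Retry at a practical level, from its core mechanics through BackOff strategies, recovery handling, RetryTemplate customization, and a comparison with Resilience4j.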

Adding the Dependencies and Basic Setup

<!-- Maven -->
<dependency>
    <groupId>org.springframework.retry</groupId>
    <artifactId>spring-retry</artifactId>
</dependency>
<dependency>
    <groupId>org.springframework</groupId>
    <artifactId>spring-aspects</artifactId>
</dependency>

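The dependencies above omit versions, which assumes that a parent such as Spring Boot's dependency management supplies them. Adding @EnableRetry to a configuration class then enables AOP-based retries.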

@Configuration
@EnableRetry
public class RetryConfig {
}

Declarative Retries with @Retryable

This is the simplest usage: annotate a method with @Retryable and it is retried automatically when it throws an exception.

@Service
@Slf4j
public class PaymentGatewayService {

    @Retryable(
        retryFor = {PaymentTimeoutException.class, ConnectionException.class},
        noRetryFor = {PaymentRejectedException.class},
        maxAttempts = 3,
        backoff = @Backoff(delay = 1000, multiplier = 2.0, maxDelay = 10000)
    )
    public PaymentResult processPayment(PaymentRequest request) {
        log.info("Payment attempt: orderId={}", request.getOrderId());
        return gateway.charge(request);
    }

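    // Handles PaymentTimeoutException only. If retries are exhausted on a
    // ConnectionException, no @Recover method matches, so add another
    // @Recover overload (or accept a common supertype) to cover that case.
    @Recover
    public PaymentResult recoverPayment(
            PaymentTimeoutException ex, PaymentRequest request) {
        log.error("Payment failed after all retries: orderId={}", request.getOrderId(), ex);
        // Fallback: switch to an alternative payment method or enqueue for manual processing
        return PaymentResult.pendingManualReview(request.getOrderId());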
    }
}

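The key attributes are summarized below:

Attribute	Description	Default
`retryFor`	Exception types to retry	All exceptions
`noRetryFor`	Exception types not to retry	None
`maxAttempts`	Maximum number of attempts (including the first)	3
`backoff.delay`	Wait between retries (ms)	1000
`backoff.multiplier`	Exponential backoff multiplier	0 (fixed interval)
`backoff.maxDelay`	Upper bound on the wait (ms)	0 (falls back to the 30000 ms default when a multiplier is set)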
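To make the interaction between delay, multiplier, maxDelay, and maxAttempts concrete, the following is a minimal plain-Java sketch of the wait schedule they produce. `BackoffSchedule` and `delays` are hypothetical names used only for illustration; this is not Spring Retry code, and it assumes the first wait equals delay and each later wait is the previous one times multiplier, capped at maxDelay.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of Spring Retry): lists the waits between attempts. */
final class BackoffSchedule {

    /**
     * @param delayMs     initial wait, like @Backoff(delay)
     * @param multiplier  growth factor, like @Backoff(multiplier); values <= 1 mean a fixed interval here
     * @param maxDelayMs  cap on a single wait, like @Backoff(maxDelay)
     * @param maxAttempts total attempts, including the first call
     */
    static List<Long> delays(long delayMs, double multiplier, long maxDelayMs, int maxAttempts) {
        List<Long> waits = new ArrayList<>();
        double current = delayMs;
        // N attempts are separated by N - 1 waits.
        for (int retry = 1; retry < maxAttempts; retry++) {
            waits.add(Math.min((long) current, maxDelayMs));
            if (multiplier > 1.0) {
                current *= multiplier;
            }
        }
        return waits;
    }

    public static void main(String[] args) {
        // Settings from the payment example: 3 attempts, so only 2 waits.
        System.out.println(delays(1000, 2.0, 10000, 3)); // [1000, 2000]
        // With more attempts the cap starts to matter.
        System.out.println(delays(1000, 2.0, 10000, 6)); // [1000, 2000, 4000, 8000, 10000]
    }
}
```

Note that with maxAttempts = 3 the maxDelay of 10000 ms is never reached; the cap only becomes relevant once the number of attempts grows.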

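BackOff Strategies in Depth

The retry interval strategy directly affects system stability. Fixed intervals can cause a thundering herd, because clients that failed together also retry together, so in practice exponential backoff with random jitter is recommended.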

@Retryable(
    retryFor = ExternalApiException.class,
    maxAttempts = 5,
    backoff = @Backoff(
        delay = 500,
        multiplier = 2.0,
        maxDelay = 30000,
        random = true  // add jitter to spread out simultaneous retries
    )
)
public ApiResponse callExternalApi(String endpoint) {
    return restClient.get()
        .uri(endpoint)
        .retrieve()
        .body(ApiResponse.class);
}

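With a multiplier set, random = true randomizes each wait: the deterministic exponential interval is multiplied by a random factor between 1 and multiplier. With delay = 500 and multiplier = 2.0, for example, the first wait falls between 500 and 1000 ms. When hundreds of clients retry at once, this spreads their retries over time instead of letting them arrive in synchronized bursts.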
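As a rough illustration of how randomized exponential backoff spreads retries, here is a small plain-Java sketch. It is modeled on the documented behavior of ExponentialRandomBackOffPolicy (a random factor between 1 and the multiplier applied to the deterministic interval), but `JitteredBackoff` and `jitteredWait` are hypothetical names, and capping the result at maxDelay is an assumption of this sketch rather than a statement about the library's internals.

```java
import java.util.Random;

/** Hypothetical sketch of randomized exponential backoff; not Spring Retry code. */
final class JitteredBackoff {

    /** Returns one randomized wait for the given deterministic interval. */
    static long jitteredWait(long intervalMs, double multiplier, long maxDelayMs, Random random) {
        // Random factor uniformly distributed in [1, multiplier).
        double factor = 1.0 + random.nextDouble() * (multiplier - 1.0);
        // Assumption: the randomized wait is still capped at maxDelayMs.
        return Math.min((long) (intervalMs * factor), maxDelayMs);
    }

    public static void main(String[] args) {
        Random random = new Random();
        long interval = 500; // same settings as the external API example
        for (int retry = 1; retry <= 4; retry++) {
            long wait = jitteredWait(interval, 2.0, 30000, random);
            System.out.printf("retry %d: base=%dms, actual=%dms%n", retry, interval, wait);
            interval = Math.min((long) (interval * 2.0), 30000);
        }
    }
}
```

Two clients that fail at the same instant will now almost never pick the same wait, which is the property that breaks up synchronized retry bursts.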

RetryTemplate: Programmatic Control

When you need finer control than the annotations offer, configure a RetryTemplate directly.

@Configuration
@Slf4j  // provides the `log` field used by the listener below
public class RetryTemplateConfig {

    @Bean
    public RetryTemplate retryTemplate() {
        // Retry policy: specific exceptions only, up to 4 attempts
        Map<Class<? extends Throwable>, Boolean> retryableExceptions = Map.of(
            SocketTimeoutException.class, true,
            ConnectTimeoutException.class, true,
            HttpServerErrorException.class, true
        );

        SimpleRetryPolicy retryPolicy = new SimpleRetryPolicy(
            4, retryableExceptions, true
        );

        // Backoff policy: exponential with jitter
        ExponentialRandomBackOffPolicy backOff = new ExponentialRandomBackOffPolicy();
        backOff.setInitialInterval(1000);
        backOff.setMultiplier(2.0);
        backOff.setMaxInterval(15000);

        return RetryTemplate.builder()
            .customPolicy(retryPolicy)
            .customBackoff(backOff)
            .withListener(new RetryListenerSupport() {
                @Override
                public <T, E extends Throwable> void onError(
                        RetryContext context, RetryCallback<T, E> callback,
                        Throwable throwable) {
                    log.warn("Retry #{}: {}",
                        context.getRetryCount(), throwable.getMessage());
                }
            })
            .build();
    }
}

Usage:

@Service
@RequiredArgsConstructor
public class InventoryService {

    private final RetryTemplate retryTemplate;
    private final InventoryClient client;

    public StockInfo checkStock(String sku) {
        return retryTemplate.execute(
            ctx -> client.getStock(sku),           // retry callback
            ctx -> StockInfo.unknown(sku)           // recovery callback
        );
    }
}
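Conceptually, a stateless execute call with a retry callback and a recovery callback behaves like the loop below. This is a deliberately simplified plain-Java model, not RetryTemplate's actual implementation: `RetryLoopSketch` is a hypothetical name, there is no backoff between attempts, and every exception is treated as retryable.

```java
import java.util.concurrent.Callable;
import java.util.function.Function;

/** Hypothetical, simplified model of a stateless retry loop with a recovery fallback. */
final class RetryLoopSketch {

    static <T> T execute(int maxAttempts, Callable<T> action, Function<Exception, T> recovery) {
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.call(); // success ends the loop immediately
            } catch (Exception e) {
                last = e; // remember the failure and try again on the same thread
            }
        }
        // All attempts failed: hand the last exception to the recovery callback.
        return recovery.apply(last);
    }

    public static void main(String[] args) {
        int[] calls = {0};
        String result = RetryLoopSketch.<String>execute(4, () -> {
            if (++calls[0] < 3) {
                throw new IllegalStateException("transient failure");
            }
            return "in stock";
        }, e -> "unknown");
        System.out.println(result + " after " + calls[0] + " calls"); // in stock after 3 calls
    }
}
```

The caller sees either a successful result or the recovery value, never the intermediate failures, which is why a listener is the usual place to log or count them.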

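@Recover: Handling the Final Failure

Once all retries are exhausted, the @Recover method is called. It must be in the same class as the @Retryable method and declare the same return type. Its first parameter is the exception type it handles, and the remaining parameters match those of the original method.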

@Service
public class NotificationService {

    @Retryable(retryFor = SmtpException.class, maxAttempts = 3)
    public void sendEmail(String to, String subject, String body) {
        mailSender.send(to, subject, body);
    }

    @Recover
    public void recoverSendEmail(SmtpException ex, String to,
                                  String subject, String body) {
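        // Fallback 1: enqueue the message for asynchronous redelivery
        deadLetterQueue.enqueue(new FailedEmail(to, subject, body, ex));

        // Fallback 2: notify through an alternative channel (SMS)
        smsService.sendFallback(to, "Email delivery failed; sending this notice by SMS instead");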
    }
}

Stateful Retry: Working with Transactions

Retries are stateless by default: the attempts run in a loop on the same thread. If a retry is needed after a transaction has rolled back, use stateful retry instead.

@Retryable(
    retryFor = OptimisticLockingFailureException.class,
    maxAttempts = 3,
    stateful = true  // rethrow on each failure; retry state is kept across calls
)
@Transactional
public void updateAccountBalance(Long accountId, BigDecimal amount) {
    Account account = accountRepository.findById(accountId)
        .orElseThrow();
    account.addBalance(amount);
    accountRepository.save(account);
}

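Stateful retry rethrows the exception to the caller and remembers the retry count for the next invocation, so the caller (or the surrounding infrastructure) has to invoke the method again. It is useful with JPA optimistic locking (@Version) when a failed attempt must roll back the enclosing transaction before the operation is tried again.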
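The difference from a stateless loop is easier to see in a toy model. The sketch below is a conceptual plain-Java illustration, not Spring Retry's implementation: `StatefulRetrySketch` is a hypothetical class that keeps a failure count per key, rethrows every failure to the caller, and switches to a recovery action once the count reaches the limit.

```java
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical model of stateful retry: state survives between separate calls. */
final class StatefulRetrySketch {

    private final Map<Object, Integer> failuresByKey = new ConcurrentHashMap<>();
    private final int maxAttempts;

    StatefulRetrySketch(int maxAttempts) {
        this.maxAttempts = maxAttempts;
    }

    <T> T execute(Object key, Callable<T> action, Callable<T> recovery) throws Exception {
        if (failuresByKey.getOrDefault(key, 0) >= maxAttempts) {
            failuresByKey.remove(key);
            return recovery.call(); // attempts exhausted: recover instead of trying again
        }
        try {
            T result = action.call();
            failuresByKey.remove(key); // success clears the remembered state
            return result;
        } catch (Exception e) {
            failuresByKey.merge(key, 1, Integer::sum);
            throw e; // rethrown so an enclosing transaction can roll back
        }
    }

    public static void main(String[] args) {
        StatefulRetrySketch retry = new StatefulRetrySketch(3);
        // The caller drives the retries: calls 1-3 are rethrown, call 4 recovers.
        for (int call = 1; call <= 4; call++) {
            try {
                String result = retry.<String>execute("account-42",
                    () -> { throw new IllegalStateException("version conflict"); },
                    () -> "recovered");
                System.out.println("call " + call + ": " + result);
            } catch (Exception e) {
                System.out.println("call " + call + ": rethrown " + e.getMessage());
            }
        }
    }
}
```

Because the state is looked up by key, the identity of the retried item matters: in this model two calls share a retry count only if their keys are equal.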

RetryListener: Monitoring and Metrics

@Component
@Slf4j
public class MetricsRetryListener implements RetryListener {

    private final MeterRegistry meterRegistry;

    public MetricsRetryListener(MeterRegistry meterRegistry) {
        this.meterRegistry = meterRegistry;
    }

    @Override
    public <T, E extends Throwable> boolean open(
            RetryContext context, RetryCallback<T, E> callback) {
        return true; // true lets the retry proceed; false aborts it
    }

    @Override
    public <T, E extends Throwable> void onSuccess(
            RetryContext context, RetryCallback<T, E> callback, T result) {
        if (context.getRetryCount() > 0) {
            meterRegistry.counter("retry.success",
                "attempts", String.valueOf(context.getRetryCount())
            ).increment();
        }
    }

    @Override
    public <T, E extends Throwable> void onError(
            RetryContext context, RetryCallback<T, E> callback,
            Throwable throwable) {
        meterRegistry.counter("retry.error",
            "exception", throwable.getClass().getSimpleName()
        ).increment();
    }
}

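Spring Retry vs. Resilience4j

Aspect	Spring Retry	Resilience4j Retry
Approach	AOP (@Retryable)	Functional decorators
Circuit Breaker	Basic support (@CircuitBreaker)	Built in
Rate Limiter	Not supported	Built in
Bulkhead	Not supported	Built in
Reactive support	Limited	Native Reactor/RxJava
Configuration style	Annotation-centric	application.yml-centric
Best suited for	Simple retries, Spring ecosystem	Combined fault handling, microservices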

Practical recommendation: choose Spring Retry if you only need simple retries, and Resilience4j if you need to combine Circuit Breaker, Rate Limiter, and Retry. The two can also be used together.

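Anti-Patterns and Pitfalls in Practice

Anti-pattern	Problem	Solution
Retrying every exception	Business exceptions are retried too, causing side effects	Use retryFor to list transient failures only
Fixed-interval retries	Thundering herd, server overload	Exponential backoff + random jitter
No idempotency guarantee	Retries cause duplicate processing	Introduce an idempotency key
Missing @Recover	The exception propagates as-is on final failure	Always implement a recovery strategy
Calls within the same class	The AOP proxy is bypassed, so retries never happen	Move the method to a separate bean or use self-injection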
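The last row, self-invocation, is the least obvious of these, so here is a small plain-Java sketch of why it happens. It uses a JDK dynamic proxy as a stand-in for the Spring AOP proxy; `ReportService`, `ReportServiceImpl`, and `SelfInvocationDemo` are hypothetical names, and the hand-written retry handler only imitates what a retry interceptor would do.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.concurrent.atomic.AtomicInteger;

interface ReportService {
    String fetch();
    String fetchViaInternalCall();
}

final class ReportServiceImpl implements ReportService {
    private final AtomicInteger calls = new AtomicInteger();

    @Override
    public String fetch() {
        // Fails on the first call only, simulating a transient fault.
        if (calls.incrementAndGet() == 1) {
            throw new IllegalStateException("transient failure");
        }
        return "report";
    }

    @Override
    public String fetchViaInternalCall() {
        return this.fetch(); // plain call on `this`: the proxy never sees it
    }
}

final class SelfInvocationDemo {

    /** Wraps the target in a proxy that retries fetch() only, like a method-level annotation. */
    static ReportService withRetry(ReportService target, int maxAttempts) {
        InvocationHandler handler = (proxy, method, args) -> {
            int attempts = method.getName().equals("fetch") ? maxAttempts : 1;
            for (int attempt = 1; ; attempt++) {
                try {
                    return method.invoke(target, args);
                } catch (InvocationTargetException e) {
                    if (attempt >= attempts) {
                        throw e.getCause(); // out of attempts: surface the original exception
                    }
                }
            }
        };
        return (ReportService) Proxy.newProxyInstance(
            ReportService.class.getClassLoader(), new Class<?>[] {ReportService.class}, handler);
    }

    public static void main(String[] args) {
        // External call goes through the proxy, so the first failure is retried.
        System.out.println(withRetry(new ReportServiceImpl(), 3).fetch()); // report

        // The internal this.fetch() call skips the proxy, so the failure propagates.
        try {
            withRetry(new ReportServiceImpl(), 3).fetchViaInternalCall();
        } catch (IllegalStateException e) {
            System.out.println("not retried: " + e.getMessage()); // not retried: transient failure
        }
    }
}
```

Moving fetch() into a separate bean, or calling it through an injected reference to the proxy, puts the call back on the path the interceptor can see.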

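Wrapping Up

Spring Retry is a core tool for handling transient failures in distributed environments gracefully. Choose between the declarative approach of @Retryable and the programmatic approach of RetryTemplate as the situation demands, and always consider exponential backoff, jitter, and idempotency together. Integrating with Micrometer metrics lets you monitor retries in real time, and a custom Actuator endpoint can expose the retry policies for inspection at runtime.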