복잡한 queryDSL에 대한 최적화 고민과 회고

문제 배경

사용자 질의 기록을 분석하여 분석 결과 요약 리스트를 제공하는 API를 개발하면서, 다음과 같은 기능 요구사항이 있었습니다

특정 기간 내 사용자 요청을 기준으로 응답/미응답 여부를 필터링
분석 결과는 3가지 알고리즘 (a 분류, b 분류, c 유사도) 별로 각각 존재
분석 결과 외에도 시나리오 정보, 엔티티 탐지 정보 등 다양한 부가 정보도 함께 보여줘야 함

즉, 기준 테이블(A)에서 다수의 결과 테이블(B, C, D)를 각각 조인 또는 별도로 조회해야 하는 상황이었고, 단일 쿼리로 모든 결과를 가져오면 다음과 같은 문제가 있었습니다

알고리즘마다 JOIN이 추가되며 성능 저하 발생
응답/미응답 여부 판단을 위해 알고리즘별 조건이 달라 별도 조건 분기를 동적으로 처리해야 함
UI에는 하나의 레코드처럼 보여야 하므로, nluId 기준으로 3개의 분석 결과를 통합하여 반환해야 함

해결 전략

불필요한 JOIN 제거, 핵심 정보만 서브 조회
- a 분류, b 분류, c 유사도는 결과 테이블을 통해 조회되며,
- 각 분석 결과는 nluId와 알고리즘 종류를 기준으로 별도 테이블에서 조회 가능
- 따라서 메인 쿼리에서는 JOIN 없이 nluId만 추출한 후, 알고리즘 결과는 별도 Map<Long, T> 으로 조회
성능 최적화를 위한 데이터 분리 구조
- 메인 쿼리: nluId, 사용자 요청, 응답 여부, 엔티티 탐지 결과 등 핵심 필드만 조회
- 보조 쿼리: 알고리즘별 intentName, confidence, 학습 ID 등을 Map으로 캐싱
- 최종적으로 stream.map() 을 통해 nluId 기준으로 HashMap에서 결과를 가져와 DTO로 변환
복잡한 조건 필터링은 BooleanBuilder + 동적 where 처리
- 예: “미응답”의 경우 → 특정 분석 결과가 없거나 OUT_OF_SCOPE(90)로 분석된 경우만 포함
- 동적 where 조건은 BooleanBuilder를 활용하여 filterType에 따라 조립
QueryDSL에서 where 조건과 case 문 중복 방지
- 응답여부 필드는 SELECT CASE 문으로 만들었으나, where 절에서도 동일 조건 사용
- 중복 계산 방지를 위해 별도의 BooleanExpression으로 추출해 재사용

<보안을 위하여 이름을 수정하였습니다.>

불필요한 조회와 조인을 줄이기 위하여, 필터링을 추가하고 동적쿼리로 base 데이터를 조회하게 구현했습니다.

/**
 * 사용자 입력 로그 기반 분석 결과 리스트 조회 서비스
 * - 메인 테이블: UserInputLog
 * - 보조 분석 결과 테이블: AlgoResult (알고리즘별 intentName, confidence 등)
 * - 불필요한 JOIN 제거 및 Map 캐싱으로 최적화한 QueryDSL 예시
 */

public AnalysisResultDto.ListResult getAnalyzedQueryList(QuerySearchParam searchParam, Long tenantId) {
    JPAQueryFactory queryFactory = dbService.getQueryFactory();

    // 1단계: 분석 대상 ID 추출
    JPAQuery<Long> inputIdQuery = queryFactory
        .select(userInputLog.inputId)
        .from(userInputLog)
        .where(userInputLog.tenantId.eq(tenantId));

    BooleanBuilder whereBuilder = new BooleanBuilder()
        .and(searchParam.where());

    // 2단계: 응답/미응답 필터 조건 처리
    AnalysisStatus filterType = Optional.ofNullable(searchParam.getFilter())
        .map(QuerySearchParam.Filter::getResponseStatus)
        .orElse(null);

    if (filterType != null) {
        inputIdQuery.leftJoin(analysisSummary).on(analysisSummary.inputId.eq(userInputLog.inputId));
        if (filterType == AnalysisStatus.NO_RESPONSE) {
            whereBuilder.and(getNoResponseCondition());
        } else if (filterType == AnalysisStatus.RESPONDED) {
            whereBuilder.and(getNoResponseCondition().not());
        }
    }

    inputIdQuery.where(whereBuilder);
    List<Long> inputIds = inputIdQuery.fetch();

JOIN 대신 O(1) Hash 조회로 처리되기 때문에 대규모 데이터에서도 효율적이라 판단

    // 3단계: 알고리즘별 intentName / confidence 정보 Map 조회
    Map<Long, String> conditionNameMap = getIntentNameMap(inputIds, AlgorithmType.CONDITION_CLASSIFIER);
    Map<Long, String> conditionConfMap = getConfidenceMap(inputIds, AlgorithmType.CONDITION_CLASSIFIER);

    Map<Long, String> intentNameMap = getIntentNameMap(inputIds, AlgorithmType.INTENT_CLASSIFIER);
    Map<Long, String> intentConfMap = getConfidenceMap(inputIds, AlgorithmType.INTENT_CLASSIFIER);

    Map<Long, String> vectorNameMap = getIntentNameMap(inputIds, AlgorithmType.VECTOR_SIMILARITY);
    Map<Long, String> vectorConfMap = getConfidenceMap(inputIds, AlgorithmType.VECTOR_SIMILARITY);

여러 개의 필드(컬럼) 를 한 번의 쿼리로 동시에 조회할 때 유용한 JPAQuery<Tuple> 사용

1. DTO로 바로 매핑하지 않고, 여러 컬럼을 유연하게 다루기 위해

다양한 필드를 조회할 경우, DTO 생성자나 Projections.fields/bean을 사용하면 매핑 순서나 타입 제한이 생겨 유연성이 떨어진다.

2. 조건부 컬럼(CASE WHEN 등)을 함께 조회할 때

getResponseStatusCase()는 CASE 문으로 구성된 표현식이라 DTO 매핑이 까다로움.
Tuple.get(expression)을 통해 간단히 처리 가능.

3. 필요한 필드만 뽑고, 추가로 가공하는 로직이 있을 때

코드에서는 각 알고리즘 결과들을 Map 캐싱으로 따로 조회하고 나서 Tuple에서 꺼낸 nluId를 키로 결과를 합쳐줌

4. 간단한 조회지만 DTO로 바로 매핑하면 구조가 복잡해질 때

복잡한 서브쿼리/연산을 포함한 쿼리는 DTO로 직접 매핑하는 것보다 Tuple로 받아서 직접 파싱하는 게 가독성 + 유지보수에 유리함.

튜플은 내부적으로 List<Expression, Object> 구조와 비슷하다고 생각하면된다.

    // 4단계: 메인 쿼리 - 핵심 정보 조회
    JPAQuery<Tuple> baseQuery = queryFactory
        .select(
            userInputLog.inputId,
            userInputLog.sessionId,
            userInputLog.userText,
            getResponseStatusCase(),
            detectedEntity.entityName,
            analysisSummary.scenarioDescription,
            userInputLog.analyzedAt
        )
        .from(userInputLog)
        .leftJoin(detectedEntity).on(detectedEntity.id.inputId.eq(userInputLog.inputId))
        .leftJoin(analysisSummary).on(analysisSummary.inputId.eq(userInputLog.inputId))
        .where(userInputLog.inputId.in(inputIds))
        .orderBy(userInputLog.analyzedAt.desc());

    if (searchParam.isPageValid()) {
        baseQuery.offset(searchParam.getOffset());
        baseQuery.limit(searchParam.getPageSize());
    }

    List<AnalysisResultDto> results = baseQuery.fetch().stream().map(row -> {
        Long inputId = row.get(userInputLog.inputId);
        return AnalysisResultDto.builder()
            .inputId(inputId)
            .sessionId(row.get(userInputLog.sessionId))
            .userText(row.get(userInputLog.userText))
            .responseStatus(row.get(getResponseStatusCase()))
            .entityName(row.get(detectedEntity.entityName))
            .conditionResult(conditionNameMap.get(inputId))
            .conditionConfidence(conditionConfMap.get(inputId))
            .intentResult(intentNameMap.get(inputId))
            .intentConfidence(intentConfMap.get(inputId))
            .vectorResult(vectorNameMap.get(inputId))
            .vectorConfidence(vectorConfMap.get(inputId))
            .analyzedAt(row.get(userInputLog.analyzedAt))
            .build();
    }).collect(Collectors.toList());

    return new AnalysisResultDto.ListResult((long) results.size(), searchParam, results);
}

private BooleanExpression getNoResponseCondition() {
    return userInputLog.analysisResultCode.eq(AnalysisStatusCode.NO_RESPONSE)
        .and(analysisSummary.conditionStatus.eq(AnalysisDetail.FAIL))
        .and(analysisSummary.vectorStatus.eq(AnalysisDetail.FAIL))
        .and(analysisSummary.entityStatus.eq(AnalysisDetail.FAIL));
}

private Expression<Integer> getResponseStatusCase() {
    return new CaseBuilder()
        .when(getNoResponseCondition())
        .then(AnalysisStatusCode.NO_RESPONSE.value())
        .otherwise(AnalysisStatusCode.RESPONDED.value());
}

private Map<Long, String> getIntentNameMap(List<Long> inputIds, AlgorithmType algorithm) {
    QAlgoResult algo = QAlgoResult.algoResult;
    QIntentCatalog intentCatalog = QIntentCatalog.intentCatalog;

    return dbService.getQueryFactory()
        .select(algo.id.inputId, intentCatalog.intentName)
        .from(algo)
        .leftJoin(intentCatalog).on(intentCatalog.intentId.eq(algo.intentId))
        .where(
            algo.id.inputId.in(inputIds),
            algo.algorithm.eq(algorithm)
        )
        .transform(GroupBy.groupBy(algo.id.inputId).as(intentCatalog.intentName));
}

private Map<Long, String> getConfidenceMap(List<Long> inputIds, AlgorithmType algorithm) {
    QAlgoResult algo = QAlgoResult.algoResult;

    return dbService.getQueryFactory()
        .select(algo.id.inputId, algo.confidence.max().stringValue())
        .from(algo)
        .where(
            algo.id.inputId.in(inputIds),
            algo.algorithm.eq(algorithm)
        )
        .groupBy(algo.id.inputId)
        .transform(GroupBy.groupBy(algo.id.inputId).as(algo.confidence.max().stringValue()));
}

성과 및 회고

동적쿼리가 필요하다고 판단하여 처음에는 native query를 이용한 Stored Procedure를 도입하려고 했으나,

현재 상황에서 아직 Sp 도입한 사례가 없어서, 최대한 쿼리로 풀어내보려고 했습니다.

처음 native qeury를 작성하고, 그대로 query DSL로 변환하는 과정에서 머리가 많이 복잡했습니다.

다른 개발자가 이해할 수 있는가, 가독성은 괜찮은지 개발 패쇄의 원칙이 지켜지는지에 대하여..

아직 추가 리펙토링은 하지 않았지만, 엔티티의 연관관계를 더 분석하고 조인 전략을 효율적으로 사용하는 것에 대하여 고민 중에 있습니다.

약 20,000건 이상의 분석 이력을 대상으로도 2초 이내로 응답 시간 유지
JOIN 최소화와 HashMap 캐싱 구조 덕분에 병목 구간 제거
QueryDSL의 동적 쿼리 조립 능력을 적극 활용하면서도, 코드 가독성과 확장성을 확보

결론

처음엔 복잡한 조건 때문에 JOIN 중심으로 접근했지만, 실제 필요한 데이터의 흐름을 정리해보면 굳이 JOIN 없이도 효율적인 방식이 있다는 걸 배웠습니다. 핵심은 "모든 데이터를 한 번에 가져오려 하지 말고, 목적에 따라 쪼개고 캐싱하라"는 점입니다.

리펙토링 배경 (2025-06-15)

처음 쿼리를 작성할 때, join을 어떻게 줄이면서 속도를 높일 수 있을지에 대해 빠져서 다른 것들을 고려하지 못했다..

현재 쿼리의 문제점

QueryDSL 쿼리를 알고리즘별로 3번
시나리오 정보, 엔티티 정보 등 JOIN을 남발
메인 쿼리 + 부가정보 + 알고리즘 결과까지 총 12번 조회

조회 결과를 위해 12번이나 조회하게 작성했다니....

해결 전략: 단일 쿼리 기반 최적화

목표

모든 데이터 조회를 1회로 줄이되, 성능 저하 없이 구성
불필요한 JOIN 제거
분석 결과는 알고리즘별 Map 캐싱
응답/미응답 판단 로직은 CASE 절로 통일

최종 코드

public ChatBotAnalyzeStatSearch.ChatBotAnalyzeStatDto.DtoList getChatBotAnalysisList(ChatBotAnalyzeStatSearch searchParam, Long domainId) {
        JPAQueryFactory queryFactory = rdbService.getQueryFactory();

        QAnalyzeIntent ic = new QAnalyzeIntent("ic");
        QAnalyzeIntent cc = new QAnalyzeIntent("cc");
        QAnalyzeIntent vs = new QAnalyzeIntent("vs");
        QIntentmaster icMaster = new QIntentmaster("icMaster");
        QIntentmaster ccMaster = new QIntentmaster("ccMaster");
        QIntentmaster vsMaster = new QIntentmaster("vsMaster");

        BooleanBuilder where = new BooleanBuilder()
                .and(analyzeHistory.domainId.eq(domainId))
                .and(searchParam.where());

        // DynamicQuery - filter 여부에 대한 동적 쿼리
        Optional.ofNullable(searchParam.getFilter())
                .map(ChatBotAnalyzeStatSearch.AnalyzeHistoryFilters::getIsOutOfScope)
                .ifPresent(filterType -> {
                    if (filterType == NluAnalysedResult.OUT_OF_SCOPE) {
                        where.and(isOutOfScopeCondition());
                    } else if (filterType == NluAnalysedResult.COMPLETE) {
                        where.and(isOutOfScopeCondition().not());
                    }
                });

        JPAQuery<ChatBotAnalyzeStatSearch.ChatBotAnalyzeStatDto> query = queryFactory
                .select(Projections.constructor(ChatBotAnalyzeStatSearch.ChatBotAnalyzeStatDto.class,
                        analyzeHistory.nluId,
                        analyzeHistory.ucid,
                        analyzeHistory.requestQuery,
                        isRespond(),
                        analyzeEntity.entityName,
                        ccMaster.intentName,
                        cc.analysedConfidence.stringValue(),
                        nodeUserSays.userSayValue,
                        analyzeResult.nodeTitle,
                        icMaster.intentName,
                        ic.analysedConfidence.stringValue(),
                        vsMaster.intentName,
                        vs.analysedConfidence.stringValue(),
                        analyzeHistory.workTime
                ))
                .from(analyzeHistory)
                .leftJoin(analyzeEntity).on(analyzeEntity.id.nluId.eq(analyzeHistory.nluId))
                .leftJoin(analyzeResult).on(analyzeResult.nluId.eq(analyzeHistory.nluId))

                // conditionClassification
                .leftJoin(cc).on(cc.id.nluId.eq(analyzeHistory.nluId)
                        .and(cc.analyseAlgorithm.eq(NluAnalyseAlgorithm.CONDITION_CALSSFIER))
                        .and(cc.analysedConfidence.eq(getMaxConfidenceSubQuery(cc, NluAnalyseAlgorithm.CONDITION_CALSSFIER))))
                .leftJoin(ccMaster).on(cc.analysedIntentId.eq(ccMaster.intentId))

                // intentClassification
                .leftJoin(ic).on(ic.id.nluId.eq(analyzeHistory.nluId)
                        .and(ic.analyseAlgorithm.eq(NluAnalyseAlgorithm.INTENT_CLASSIFIER))
                        .and(ic.analysedConfidence.eq(getMaxConfidenceSubQuery(ic, NluAnalyseAlgorithm.INTENT_CLASSIFIER))))
                .leftJoin(icMaster).on(ic.analysedIntentId.eq(icMaster.intentId))

                // vectorSimilarity
                .leftJoin(vs).on(vs.id.nluId.eq(analyzeHistory.nluId)
                        .and(vs.analyseAlgorithm.eq(NluAnalyseAlgorithm.VECTOR_SIMILARITY))
                        .and(vs.analysedConfidence.eq(getMaxConfidenceSubQuery(vs, NluAnalyseAlgorithm.VECTOR_SIMILARITY))))
                .leftJoin(vsMaster).on(vs.analysedIntentId.eq(vsMaster.intentId))

                // keyWordResult
                .leftJoin(nodeUserSays).on(nodeUserSays.nodeMaster.id.scenarioId.eq(analyzeResult.scenarioId)
                        .and(nodeUserSays.nodeMaster.id.scenarioVerId.eq(analyzeResult.scenarioVerId)
                        .and(nodeUserSays.nodeMaster.id.nodeId.eq(analyzeResult.nodeId)))
                )

                .where(where)
                .orderBy(analyzeHistory.responseTime.desc());

        if (searchParam.isPageValid()) {
            query.offset(searchParam.getOffset());
            query.limit(searchParam.getCount());
        }

        List<ChatBotAnalyzeStatSearch.ChatBotAnalyzeStatDto> results = query.fetch();
        return new ChatBotAnalyzeStatSearch.ChatBotAnalyzeStatDto.DtoList((long) results.size(), searchParam, results);
    }
    
      // 응답 여부 CASE 문
    private Expression<Integer> isRespond() {
        return new CaseBuilder()
                .when(isOutOfScopeCondition())
                .then(OUT_OF_SCOPE.value())
                .otherwise(NluAnalysedResult.COMPLETE.value());
    }

    public BooleanExpression isOutOfScopeCondition() {
        return analyzeHistory.analysedResult.eq(OUT_OF_SCOPE)
                .and(analyzeResult.analyzeResultEntity.eq(NluAnalysedResultDetail.FAILURE))
                .and(analyzeResult.analyzeResultCc.eq(NluAnalysedResultDetail.FAILURE))
                .and(analyzeResult.analyzeResultVs.eq(NluAnalysedResultDetail.FAILURE));
    }

가장 고민을 많이 했던 부분

1:N 의 데이터에서 Max 값을 구하는 쿼리

 // Confidence 최대값
    private SubQueryExpression<BigDecimal> getMaxConfidenceSubQuery(QAnalyzeIntent base, NluAnalyseAlgorithm algorithm) {
        QAnalyzeIntent sub = new QAnalyzeIntent(base.getMetadata().getName() + "_sub");
        return JPAExpressions
                .select(sub.analysedConfidence.max())
                .from(sub)
                .where(sub.id.nluId.eq(base.id.nluId)
                        .and(sub.analyseAlgorithm.eq(algorithm)));
    }

base는 외부 쿼리에 사용될 QAnalyzeIntent 인스턴스 (ex : cc, ic, vs)
sub는 서브쿼리 전용 alias로 새롭게 생성 (ex : cc_sub, ic_sub, vs_sub) — alias를 다르게하여 동적 서브쿼리로 구현

문제 배경

해결 전략

1. DTO로 바로 매핑하지 않고, 여러 컬럼을 유연하게 다루기 위해

2. 조건부 컬럼(CASE WHEN 등)을 함께 조회할 때

3. 필요한 필드만 뽑고, 추가로 가공하는 로직이 있을 때

4. 간단한 조회지만 DTO로 바로 매핑하면 구조가 복잡해질 때

성과 및 회고

결론

리펙토링 배경 (2025-06-15)

해결 전략: 단일 쿼리 기반 최적화

목표

티스토리툴바