Public Procurement Service (조달청) Excellent Product Evaluation System · eval-system-premium · HEAD bd5d51a · feat/onprem-h100-deploy (PR #197) · 4 dependency gaps

eval-system-premium on-prem dependency audit + analysis of missing containers

2026-05-22 · Analysis location: pps-mono-repo-pr162-fix (branch h100-pull, tracking origin/feat/onprem-h100-deploy) · Inferred from code alone, with the external DRIVE1 not mounted and the 707MB zip not extracted

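TL;DR: The 8 Docker images in the emailed 707MB zip are not enough to run the on-prem demo. ① hwpx-intelligence calls google/gemma-4-31b-it:exacto via OpenRouter by default → with outbound traffic blocked on-prem, it must be force-overridden to a local vLLM endpoint. ② The /api/chat/*, /api/compare/*, and /api/engine/* endpoints that the frontend calls disappeared from the BFF routers with the PR #163 Revert → the UI gets 404s. ③ The current BFF HEAD does not call the ai_engine container, so it is not needed for the demo (it is needed to bring the chat/compare UI back). ④ Including the sub-compose from the root compose fails on a networks.default conflict → the networks block in the premium sub-compose must be removed.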

1. Call chain (as of HEAD bd5d51a)

[Browser]
│
├── /app/eval-premium/ ───► eval-system-premium-frontend (Next.js, 3030)
│
├── /api/specs ─────────────────┐
├── /api/proposals/* ───────────┼─► eval-system-premium-bff (FastAPI, 4001)
├── /api/admin/migrate-images ──┘        │
│                                        ├── PostgreSQL (BFF DB · eval_system_premium)
│                                        ├── MinIO (optional; inline data_uri when unset)
│                                        └── HTTP → hwpx-intelligence (8001)
│                                                    │
│                                                    ├── LLM : OpenRouter google/gemma-4-31b-it:exacto ◀── default
│                                                    ├── VLM : OpenRouter qwen/qwen3-vl-8b-thinking
│                                                    └── (env override available: HWPX_LLM_BASE_URL → vLLM)
│
├── /api/chat/*        ◀── called by frontend ░░░ no BFF router (404) ░░░ ← PR #163 Revert
├── /api/compare/*     ◀── called by frontend ░░░ no BFF router (404) ░░░
├── /api/engine/*      ◀── called by frontend ░░░ no BFF router (404) ░░░
└── /api/admin/chat/*  ◀── called by frontend ░░░ no BFF router (404) ░░░

Current BFF routers (verified against the code)

Path	Method	File
`/api/specs`	POST	`app/api/specs.py`
`/api/proposals` · `/{id}/file` · `/{id}/images/{name}` · `/{id}/extracted` · `/{id}` DELETE · `/bulk-delete` · `/bulk-bid` · `/stats`	GET/POST/PATCH/DELETE	`app/api/proposals.py`
`/api/admin/migrate-images`	POST	`app/api/admin.py`
`/internal/done`	POST (hwpx webhook)	`app/api/internal.py`

2. Four key findings

Finding ①: hwpx-intelligence defaults to the OpenRouter gemma LLM (blocked on-prem)

~/Mirror/Github/1_Projects/hwpx-intelligence/src/hwpx_intelligence/config.py:

llm_base_url: str = "https://openrouter.ai/api/v1"
llm_model:    str = "google/gemma-4-31b-it:exacto"
vlm_base_url: str = "https://openrouter.ai/api/v1"
vlm_model:    str = "qwen/qwen3-vl-8b-thinking"
# ApiBackend = Literal["openrouter", "openai", "vllm", "custom"]

If .env does not override these on-prem, OpenRouter is blocked and every call fails. HWPX_LLM_BASE_URL and HWPX_VLM_BASE_URL must both be forced to the local vLLM.
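The override requirement above can be checked before boot. The following is a minimal sketch, not part of hwpx-intelligence: the function name external_endpoints and the LOCAL_HOSTS allow-list are hypothetical, and it assumes that an unset or empty variable ends up on the OpenRouter default quoted from config.py.

```python
from urllib.parse import urlparse

# Defaults as quoted from hwpx_intelligence/config.py above.
DEFAULTS = {
    "HWPX_LLM_BASE_URL": "https://openrouter.ai/api/v1",
    "HWPX_VLM_BASE_URL": "https://openrouter.ai/api/v1",
}

# Hypothetical allow-list of hostnames considered reachable on-prem.
LOCAL_HOSTS = {"localhost", "127.0.0.1", "host.docker.internal"}


def external_endpoints(env: dict) -> dict:
    """Return {variable: url} for each base URL that still points off-host."""
    offenders = {}
    for name, default in DEFAULTS.items():
        # Assumption: an unset or empty variable is treated as the default.
        url = env.get(name) or default
        host = urlparse(url).hostname or ""
        if host not in LOCAL_HOSTS:
            offenders[name] = url
    return offenders
```

Calling external_endpoints(dict(os.environ)) inside the container and refusing to start on a non-empty result would surface a missing override immediately rather than at the first LLM call.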

Finding ②: frontend ↔ BFF router mismatch (leftover from the PR #163 Revert)

Endpoints the frontend calls vs. the routers the BFF actually has:

Frontend call (verified)	BFF router exists?
`/api/chat/sessions/...`, `/api/chat/bids/...`	✗ No (PR #163 Revert)
`/api/compare/core-tech/{bid_id}`, `/api/compare/requirements/...`, `/api/compare/evaluation/{bid_id}/proposals/{pid}/summary`	✗ No
`/api/engine/bids/...`	✗ No
`/api/admin/chat/sessions`, `/api/admin/chat/feedbacks`	✗ No
`/api/auth/login`, `/api/auth/session`	✗ No
`/api/bids`, `/api/client`	✗ No
`/api/proposals/*` (8 routes)	✓ Yes
`/api/specs`	✓ Yes

In the git history, f73d326 and 956e497 were attempts at a "Revert Revert #163", but neither is reflected in the current HEAD bd5d51a.
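The mismatch in the table can be reproduced mechanically. The sketch below is illustrative only: the two lists are copied by hand from the tables in this report (with the elided "..." suffixes dropped), and the helper names is_served and unserved are hypothetical.

```python
# Path prefixes the frontend calls, from the comparison table above.
FRONTEND_CALLS = [
    "/api/chat/sessions", "/api/chat/bids",
    "/api/compare/core-tech", "/api/compare/requirements", "/api/compare/evaluation",
    "/api/engine/bids",
    "/api/admin/chat/sessions", "/api/admin/chat/feedbacks",
    "/api/auth/login", "/api/auth/session",
    "/api/bids", "/api/client",
    "/api/proposals", "/api/specs",
]

# Router prefixes present in the BFF at HEAD bd5d51a, from the router table.
BFF_PREFIXES = ["/api/specs", "/api/proposals", "/api/admin/migrate-images", "/internal/done"]


def is_served(path: str, prefixes: list) -> bool:
    """True when path equals a prefix or sits below it on a '/' boundary."""
    return any(path == p or path.startswith(p + "/") for p in prefixes)


def unserved(calls: list, prefixes: list) -> list:
    """Frontend calls that no BFF router prefix covers (these return 404)."""
    return [c for c in calls if not is_served(c, prefixes)]


for path in unserved(FRONTEND_CALLS, BFF_PREFIXES):
    print("404:", path)
```

Matching on a "/" boundary keeps /api/admin/chat/sessions from being treated as covered by /api/admin/migrate-images.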

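Finding ③: the BFF HEAD does not call the ai_engine container (contrary to the user's intuition)

The user's suspicion was that "LLM calls also go through ai_engine function calls". HANDOFF.md's T4 commit b7f923e did add app/adapters/ai_engine_client.py, but the PR #163 Revert removed it. Grepping the current BFF code for ai_engine returns 0 matches.

The LLM calls are actually made by hwpx-intelligence itself (OpenAI-compatible client → vLLM or OpenRouter). ai_engine's MODEL_CONFIG_* variables are for routing inside ai_engine and are not a BFF dependency.

→ For the eval-system-premium demo alone, the ai_engine container is not needed. Bringing the chat/compare UI back as well requires all of ai_engine + redis + celery + analysis_pipeline + knowledge_hub (restoring PR #162).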

Finding ④: networks.default conflicts when the root compose merges the include

The premium sub-compose's networks: default: driver: bridge conflicts with the root's networks.default.name = pps-network, so docker compose config fails with an invalid compose project error.

# apps/eval-system-premium/docker-compose.yml: remove the last 5 lines
- networks:
-   default:
-     driver: bridge
-     labels:
-       service: eval-system-premium

Leave the networks: [- default] entries under services as they are; the root default (pps-network) is then applied automatically.
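If the removal has to be scripted rather than done by hand, a line-based sketch like the one below would do it. It is an assumption-laden helper (the name drop_top_level_block is hypothetical): it is not YAML-aware, only handles the block form shown above, and leaves indented service-level networks: entries untouched.

```python
import re


def drop_top_level_block(compose_text: str, key: str = "networks") -> str:
    """Remove a top-level `key:` mapping and its indented body from YAML text."""
    kept = []
    skipping = False
    for line in compose_text.splitlines():
        # A top-level key starts at column 0; service-level keys are indented.
        if re.match(rf"^{re.escape(key)}:\s*(#.*)?$", line):
            skipping = True
            continue
        # Indented or blank lines directly after the key belong to its body.
        if skipping and (line.startswith((" ", "\t")) or not line.strip()):
            continue
        skipping = False
        kept.append(line)
    return "\n".join(kept) + "\n"
```

Running docker compose config afterwards remains the real check that the merged project is valid.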

3. Container requirement matrix

Container	Image	premium-only (specs/proposals)	+ chat/compare UI	Notes
postgres	`postgres:16-alpine`	Required	Required	BFF DB (eval_system_premium) · standard 5432
eval-system-premium-bff	`pps-eval-system-premium-bff:latest`	Required	Required	4001 · automatic alembic upgrade
eval-system-premium-frontend	`pps-eval-system-premium-frontend:latest`	Required	Required	3030 (Next.js standalone)
hwpx-intelligence	Must be built from a separate private repo	Required	Required	8001 · where the LLM/VLM calls are made
vLLM Qwen3-32B	`vllm/vllm-openai` (or run directly on the host)	Required	Required	8305 (LLM) · H100 80GB
vLLM Qwen3-VL-8B	`vllm/vllm-openai`	Required	Required	8303 (VLM) · can share the same H100 GPU (if memory allows)
minio + minio-init	`minio/minio:latest` · `minio/mc:latest`	Optional	Optional	Persistent image storage · falls back to inline data_uri if absent (5MB+ payloads)
portal-nginx	`pps-portal-nginx` (built)	Optional	Optional	80 · unified routing · SSE proxy_buffering off
pgadmin	`dpage/pgadmin4:latest`	Optional	Optional	5050 · for ops debugging
ai_engine	`pps-ai-engine`	Not needed	Required	8200 · when restoring chat/compare
redis	`redis:7-alpine`	Not needed	Required	6379 · Celery broker + chat cancel-flag
celery-worker	`pps-celery-worker`	Not needed	Required	Handles analysis fanout
knowledge-hub	`pps-knowledge-hub`	Not needed	Required	8100 · eval_db cross-write
analysis-pipeline	`pps-analysis-pipeline`	Not needed	Required	8102 · compare matching
qdrant	`qdrant/qdrant:v1.12.6`	Not needed	Optional	6333 · for KB semantic search
data-pipeline / ai-dashboard-*	Built individually	Not needed	Not needed	RFP flow only

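The list of 8 Docker images in the emailed 707MB zip is an estimate: the Gmail body is HTML multipart, so plain-text extraction failed. Extracting the archive and reading RepoTags from manifest.json inside docker-images/pps-images.tar is the recommended way to get the exact list: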

tar -xOf docker-images/pps-images.tar manifest.json | jq -r '.[].RepoTags[]'

Estimate (what was bundled at the time of the premium demo): postgres:16-alpine, redis:7-alpine, minio/minio:latest, minio/mc:latest, qdrant/qdrant:v1.12.6, pps-portal-nginx, pps-eval-system-premium-bff, pps-eval-system-premium-frontend = 8 images. The hwpx-intelligence image is very likely not included (separate private repo, separate build).
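The same manifest check can be turned into a comparison against the required list from the matrix. This is a minimal sketch using only the standard library; the function names repo_tags and missing_images are hypothetical, and it assumes the tar was produced by docker save, which places manifest.json at the archive root.

```python
import json
import tarfile


def repo_tags(images_tar: str) -> list:
    """List every RepoTags entry in a `docker save` archive's manifest.json."""
    with tarfile.open(images_tar) as tar:
        member = tar.extractfile("manifest.json")
        if member is None:
            raise ValueError("manifest.json is not a regular file in the archive")
        manifest = json.load(member)
    # RepoTags can be null for untagged images, hence the `or []`.
    return [tag for entry in manifest for tag in (entry.get("RepoTags") or [])]


def missing_images(images_tar: str, required: list) -> list:
    """Return required images that match no tag in the archive.

    A required entry may be a full `name:tag` or a bare `name`.
    """
    present = repo_tags(images_tar)
    names = {t.rsplit(":", 1)[0] for t in present} | set(present)
    return [r for r in required if r not in names]
```

For example, missing_images("docker-images/pps-images.tar", ["postgres:16-alpine", "pps-eval-system-premium-bff"]) returns an empty list only if both are in the tar; adding the hwpx-intelligence image name to the list would confirm or refute the estimate above.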

4. On-premises environment variable patches (key)

① hwpx-intelligence: OpenRouter → local vLLM

# Container env (hwpx-intelligence)
HWPX_LLM_BACKEND=vllm
HWPX_LLM_BASE_URL=http://host.docker.internal:8305/v1   # H100 host vLLM
HWPX_LLM_MODEL=Qwen/Qwen3-32B
HWPX_VLM_BACKEND=vllm
HWPX_VLM_BASE_URL=http://host.docker.internal:8303/v1
HWPX_VLM_MODEL=Qwen/Qwen3-VL-8B-Instruct
# Leave OPENROUTER_API_KEY empty (prevents fallback to the default)

② eval-system-premium BFF: pin the hwpx endpoint

HWPX_BASE_URL=http://hwpx-intelligence:8001
HWPX_CALLBACK_BASE=http://eval-system-premium-bff:4001
BFF_DATABASE_URL=postgresql+asyncpg://pps_user:pps_password@postgres:5432/eval_system_premium
MINIO_ENDPOINT=minio:9000           # leave unset if MinIO is absent → data_uri inline
PREMIUM_BFF_DISABLE_SELF_POLLER=false # HANDOFF gap #3: guards against missed webhooks

③ Booting vLLM on the host (H100)

# LLM (32B fp16 ≒ 64GB VRAM)
python -m vllm.entrypoints.openai.api_server \
  --model Qwen/Qwen3-32B --port 8305 \
  --gpu-memory-utilization 0.85 --max-model-len 32768

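# VLM (8B; a separate GPU is recommended, memory is tight on a shared H100 80GB)
# Note: 0.15 × 80GB = 12GB is below the ≈16GB of fp16 weights; raise this value when the VLM has its own GPU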
python -m vllm.entrypoints.openai.api_server \
  --model Qwen/Qwen3-VL-8B-Instruct --port 8303 \
  --gpu-memory-utilization 0.15 --max-model-len 8192

Qwen3-32B fp16 ≈ 64GB + Qwen3-VL-8B fp16 ≈ 16GB → sharing a single H100 80GB leaves almost no room for KV cache. Splitting across 2 GPUs is recommended. If a single GPU is unavoidable, use 4-bit AWQ quantization, or vLLM --enforce-eager with a short context.
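The arithmetic behind that recommendation can be sketched with the numbers from this section alone. This is a rough budget, not a vLLM calculation: the helper name kv_headroom_gb is hypothetical, and it ignores activations and other runtime overhead, so the real headroom is smaller.

```python
H100_GB = 80.0  # per-GPU memory assumed throughout this section


def kv_headroom_gb(weights_gb: float, gpu_memory_utilization: float,
                   gpu_gb: float = H100_GB) -> float:
    """Memory left for KV cache after weights, within the utilization cap.

    A negative result means the weights alone exceed the cap.
    """
    return gpu_gb * gpu_memory_utilization - weights_gb


# Values from the two launch commands above, sharing one H100 80GB.
llm = kv_headroom_gb(weights_gb=64.0, gpu_memory_utilization=0.85)
vlm = kv_headroom_gb(weights_gb=16.0, gpu_memory_utilization=0.15)
print(f"LLM headroom: {llm:.1f} GB, VLM headroom: {vlm:.1f} GB")
```

With these inputs the LLM is left with roughly 4 GB for KV cache and the VLM's 12 GB cap falls about 4 GB short of its weights, which is why the single-GPU layout only works with quantization or a much shorter context.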

5. compose patches (can be applied immediately)

Patch ①: remove the networks block from the premium sub-compose

# apps/eval-system-premium/docker-compose.yml: delete the last 5 lines of the file
- networks:
-   default:
-     driver: bridge
-     labels:
-       service: eval-system-premium

Patch ②: have start.sh inspect all 8 images

# Load the tar only when at least one expected image is missing locally
need_load=""
for img in pps-ai-engine pps-celery-worker pps-knowledge-hub \
           pps-data-pipeline pps-analysis-pipeline \
           pps-ai-dashboard-backend pps-portal-nginx pps-rfp-backend \
           pps-eval-system-premium-bff pps-eval-system-premium-frontend; do
  docker image inspect "$img" >/dev/null 2>&1 || need_load=1
done
# An if-block keeps the script's exit status at 0 when nothing needs loading
if [ -n "$need_load" ]; then
  docker load < "$IMAGES_TAR"
fi

Patch ③: set image: explicitly on the root compose's build-only services

# docker-compose.yml: add an image: line to 7 services
ai-engine:
  image: pps-ai-engine:latest
  build: { context: ., dockerfile: services/ai_engine/Dockerfile }
# Same pattern for celery-worker / knowledge-hub / data-pipeline /
# analysis-pipeline / ai-dashboard-backend / portal-nginx / rfp-backend

6. Option comparison: D-Day decision for the demo

Option	Scope	Images required	frontend chat/compare UI	Estimated effort
A. premium-only mode (specs/proposals only)	Current HEAD code as-is	4 (bff, frontend, postgres, hwpx) + 2 vLLM + minio (opt)	UI must be disabled (to avoid 404s)	1-2h
B. Restore by cherry-picking PR #162	Restore the chat/compare/engine routers in the BFF + Celery fanout	+ ai_engine, redis, celery-worker, analysis-pipeline, knowledge_hub	Works normally	4-6h + regression tests
C. Minimal restore of T4 only (chat only)	chat.py + ai_engine_client.py + redis only	+ ai_engine, redis	chat works / compare returns 404	2-3h

Recommendation: if the 5/22 demo is under time pressure, go with A (hide the frontend's chat/compare menus), then pursue B as a follow-up patch. C is a half-measure and not recommended.

7. Additional on-premises caveats

Outbound calls blocked: assume all outbound traffic is blocked, including hwpx's default OpenRouter as well as Anthropic and Google AI. Setting HWPX_LLM_BASE_URL and OPENROUTER_API_KEY= (empty value) is mandatory.
Image pulls blocked: assume the docker daemon cannot reach hub.docker.com. Every base image (postgres, redis, minio, etc.) must be in the tar. If postgres:16-alpine or any other base image is missing, boot fails.
Alpine vs glibc: the postgres:16-alpine entrypoint installs dos2unix, and apk add fails without internet access. Patch 4: switching to the Debian-based postgres:16 and normalizing the line endings of init-databases.sh to LF is recommended.
portal-nginx SSE: for hwpx's progress SSE to pass through nginx, set proxy_buffering off and a sufficiently long proxy_read_timeout (≥600s). With the default 60s the stream is cut off.
MinIO subpath redirect: if MINIO_BROWSER_REDIRECT_URL points to an external domain (localhost), the console breaks. Change it to the internal IP.
frontend ↔ BFF same-origin: NEXT_PUBLIC_API_BASE_URL=/app/eval-premium-api depends on portal-nginx routing. Without nginx, override it with an absolute URL.
Missed HWPX webhooks (HANDOFF gap #3): hwpx-intelligence's JobManager.set_done_callback is not implemented → turn on the BFF self-poller (PREMIUM_BFF_DISABLE_SELF_POLLER=false).
cross-DB write (HANDOFF gap #4): if T6 is still alive, analysis_pipeline's tech_matcher writes to eval_db while the BFF reads eval_system_premium → compare results stay at their fixture values. Irrelevant for option A; option B needs code that synchronizes the two.

8. Boot verification sequence (on the on-prem server)

# 1) Integrity of the image tar
tar -tf docker-images/pps-images.tar | head
tar -xOf docker-images/pps-images.tar manifest.json | jq '.[].RepoTags'

# 2) Validate the compose project (after applying the networks patch)
docker compose config --services
docker compose config --images

# 3) vLLM health
curl -fsS http://localhost:8305/v1/models | jq '.data[].id'
curl -fsS http://localhost:8303/v1/models | jq '.data[].id'

# 4) hwpx-intelligence health (confirm the vLLM connection)
docker compose up -d hwpx-intelligence
docker compose logs hwpx-intelligence | grep -i "llm_base_url\|vlm_base_url"
curl -fsS http://localhost:8001/api/health

# 5) BFF health
docker compose up -d eval-system-premium-bff
curl -fsS http://localhost:4001/api/health

# 6) e2e: upload one hwpx file from Officer 김기열
curl -X POST http://localhost:4001/api/specs \
  -F "file=@samples/2._배전반_규격서/01.hwpx" | jq '.spec_id'
# After receiving spec_id, wait 60s → call /api/proposals/{spec_id}/extracted
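The fixed 60s wait in step 6 could be replaced by polling with a deadline. The sketch below is generic and makes no claim about the BFF's response format: poll_until is a hypothetical helper, and the caller supplies a fetch function that returns None until the extraction result is ready.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def poll_until(fetch: Callable[[], Optional[T]],
               timeout_s: float = 60.0,
               interval_s: float = 2.0,
               sleep: Callable[[float], None] = time.sleep,
               clock: Callable[[], float] = time.monotonic) -> T:
    """Call fetch() until it returns a non-None value or timeout_s elapses."""
    deadline = clock() + timeout_s
    while True:
        result = fetch()
        if result is not None:
            return result
        if clock() >= deadline:
            raise TimeoutError(f"no result within {timeout_s}s")
        sleep(interval_s)
```

In practice fetch would wrap a GET to /api/proposals/{spec_id}/extracted and map "not ready yet" to None; how the BFF signals that state is not covered by this report, so that mapping is left to the caller. The sleep and clock parameters exist only so the loop can be exercised without real waiting.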