Glass Pipeline is a stateless extraction service. Each request passes through a fixed, deterministic sequence of stages, and every stage is scoped to the factory the request belongs to.
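To make the stateless, factory-scoped flow concrete, here is a minimal Python sketch. The stage names and their order (parse, classify, extract, normalize, match) are assumptions loosely based on the component table in this section. `RequestContext`, `placeholder_stage`, `PIPELINE`, and `run_pipeline` are hypothetical names, and each placeholder stage only records that it ran.

```python
from dataclasses import dataclass, replace
from typing import Callable, Tuple


@dataclass(frozen=True)
class RequestContext:
    # Hypothetical per-request context: everything a stage needs travels with
    # the request, so nothing is kept between requests.
    factory_id: int
    pdf_bytes: bytes
    notes: Tuple[str, ...] = ()


Stage = Callable[[RequestContext], RequestContext]


def placeholder_stage(name: str) -> Stage:
    # Stand-in for a real stage (parser, VLM call, normalizer, matcher); it only
    # appends a trace entry and returns a new, updated context.
    def stage(ctx: RequestContext) -> RequestContext:
        return replace(ctx, notes=ctx.notes + (f"{name}:factory={ctx.factory_id}",))
    return stage


# Assumed order, loosely following the component table.
PIPELINE: Tuple[Stage, ...] = tuple(
    placeholder_stage(n) for n in ("parse", "classify", "extract", "normalize", "match")
)


def run_pipeline(ctx: RequestContext) -> RequestContext:
    # Same stages, same order, for every request; only ctx flows between them.
    for stage in PIPELINE:
        ctx = stage(ctx)
    return ctx
```

Because the context is frozen and each stage returns a new one, two concurrent requests for different factories cannot leak state into each other.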
Complex email documents (threads with attachments) are the hardest case in glass extraction. Our approach:
Email PDF received
│
├─ Split: detect email markers (De:/From:/Envoyé:)
│ ├─ Email body segments (context)
│ └─ Attachment pages (tables, drawings)
│
├─ Extract context from email body (Haiku)
│ → compositions, instructions, modifications
│
└─ Extract measurements from attachments
with email context injected into prompt
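The split step and the context injection can be sketched in Python as follows, assuming page text has already been extracted (for example with PyMuPDF's `page.get_text()`). The page-level heuristic (any page containing a `De:`, `From:`, or `Envoyé:` header line counts as email body, every other page as an attachment) and the helpers `split_email_pdf` and `build_measurement_prompt` are illustrative assumptions, not the service's actual code.

```python
import re
from dataclasses import dataclass
from typing import List

# Header markers from the flow above, matched at the start of a line.
# "\s*" before the colon also accepts French spacing such as "De :".
EMAIL_MARKER = re.compile(r"^\s*(De|From|Envoyé)\s*:", re.MULTILINE)


@dataclass
class SplitResult:
    body_pages: List[int]        # pages treated as email context
    attachment_pages: List[int]  # pages sent to measurement extraction


def split_email_pdf(page_texts: List[str]) -> SplitResult:
    """Assumed heuristic: a page with an email header marker is an email-body
    page; every other page is treated as an attachment page."""
    body: List[int] = []
    attachments: List[int] = []
    for i, text in enumerate(page_texts):
        if EMAIL_MARKER.search(text):
            body.append(i)
        else:
            attachments.append(i)
    return SplitResult(body, attachments)


def build_measurement_prompt(email_context: str, page_text: str) -> str:
    # Hypothetical prompt layout: the context extracted from the email body
    # (compositions, instructions, modifications) precedes the attachment page.
    return (
        "Email context (applies to every line below):\n"
        f"{email_context}\n\n"
        "Attachment page:\n"
        f"{page_text}"
    )
```

Putting the email context ahead of each attachment page lets the measurement call apply, say, a composition stated only in the email body to rows that omit it.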
| Component | Technology | Role |
|---|---|---|
| PDF Parser | PyMuPDF + pdfplumber | Text, tables, page images |
| VLM | Claude Sonnet 4.6 (Bedrock) | Vision extraction |
| Classification | Claude Haiku 4.5 (Bedrock) | Doc type, client info |
| Normalizer | Python (deterministic) | Factory-scoped rules |
| Matcher | Snake API (SAT + Fuzzy) | Article matching |
| Server | FastAPI + uvicorn | Async HTTP |
| Infra | EC2 t3.medium + nginx + certbot | HTTPS, reverse proxy |
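Bedrock region fallback order per model:

- Sonnet: eu-west-3 → eu-central-1 → us-east-1 → us-west-2
- Haiku: eu-west-3 → us-west-2 → us-east-1

On a 500, 503, or timeout, the client automatically moves on to the next region in the list. Non-retryable errors (400, 403) fail immediately without trying the remaining regions.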
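A minimal sketch of the fallback loop, assuming the caller supplies an `invoke(region)` callable that performs one Bedrock request and raises either a hypothetical `ModelCallError` carrying the HTTP status or Python's built-in `TimeoutError`. With boto3 you would typically create a `bedrock-runtime` client per region and read the status of a `botocore` `ClientError` from `err.response["ResponseMetadata"]["HTTPStatusCode"]`; that mapping is left out here.

```python
from typing import Callable, Dict, List, Optional, TypeVar

T = TypeVar("T")

# Region orders as documented above.
REGIONS: Dict[str, List[str]] = {
    "sonnet": ["eu-west-3", "eu-central-1", "us-east-1", "us-west-2"],
    "haiku": ["eu-west-3", "us-west-2", "us-east-1"],
}
RETRYABLE_STATUS = {500, 503}


class ModelCallError(Exception):
    """Hypothetical error type carrying the HTTP status of a failed call."""

    def __init__(self, status: int):
        super().__init__(f"model call failed with status {status}")
        self.status = status


def call_with_region_fallback(model: str, invoke: Callable[[str], T]) -> T:
    last_error: Optional[Exception] = None
    for region in REGIONS[model]:
        try:
            return invoke(region)
        except TimeoutError as exc:
            last_error = exc  # timeouts are retryable: try the next region
        except ModelCallError as exc:
            if exc.status not in RETRYABLE_STATUS:
                raise  # 400/403 and other non-retryable errors fail immediately
            last_error = exc  # 500/503: try the next region
    raise RuntimeError(f"all regions failed for {model}") from last_error
```

Raising immediately on a 400 avoids wasting the remaining regions on a request that is malformed everywhere; only server-side failures and timeouts justify moving on.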
| ID | Factory | Normalization | Snake Scope |
|---|---|---|---|
| 1 | VIT | FE→rTherm, bare spacer→alu gris | VIT articles |
| 3 | Monce | FE→LowE, standard rules | Monce articles |
| 4 | VIP | FE→rTherm, bare spacer→alu gris | Riou articles |
| 9 | Eurovitrage | FE→LowE | Eurovitrage articles |
| 10 | TGVI | FE→LowE | TGVI articles |
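The table above can be read as per-factory configuration. A minimal sketch, assuming "FE" arrives as a standalone coating code and that a "bare spacer" means a spacer extracted without any material; `FactoryRules`, `FACTORIES`, `normalize_coating`, and `normalize_spacer` are hypothetical names, and Monce's "standard rules" are not modelled.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class FactoryRules:
    name: str
    fe_coating: str                # what a bare "FE" coating code becomes
    default_spacer: Optional[str]  # filled in when a spacer has no material
    snake_scope: str               # article catalogue the matcher searches


# Values transcribed from the table; Monce's "standard rules" are omitted.
FACTORIES: Dict[int, FactoryRules] = {
    1: FactoryRules("VIT", "rTherm", "alu gris", "VIT"),
    3: FactoryRules("Monce", "LowE", None, "Monce"),
    4: FactoryRules("VIP", "rTherm", "alu gris", "Riou"),
    9: FactoryRules("Eurovitrage", "LowE", None, "Eurovitrage"),
    10: FactoryRules("TGVI", "LowE", None, "TGVI"),
}


def normalize_coating(code: str, factory_id: int) -> str:
    # Only the exact "FE" code is rewritten; other coatings pass through.
    rules = FACTORIES[factory_id]
    return rules.fe_coating if code.strip().upper() == "FE" else code


def normalize_spacer(material: Optional[str], factory_id: int) -> Optional[str]:
    # A "bare" spacer is assumed to be one extracted without any material.
    if material:
        return material
    return FACTORIES[factory_id].default_spacer
```

Looking up rules by factory ID with no default entry means an unknown ID raises `KeyError` instead of silently borrowing another factory's rules or article scope.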