alden ho

AI document processing pipeline

Fine-tuned classification and extraction models for high-volume document ingestion. Identified a training-data quality regression and architected the remediation path.

Client
Confidential — fintech
Role
AI infrastructure engineer
Year
2025
Stack
OpenAI fine-tuning, Python, AWS, S3, Step Functions

End-to-end ownership of an LLM-powered document processing pipeline: classification, extraction, validation, and human-review escalation.
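
As a rough illustration of the last two stages (validation and human-review escalation), here is a minimal sketch of a routing step. Everything in it is an assumption made for illustration: the `Prediction` shape, the `REQUIRED_FIELDS` table, the 0.9 threshold, and the queue names are hypothetical and are not taken from the client system.

```python
from dataclasses import dataclass, field

# Hypothetical per-class required fields; the real schema is not public.
REQUIRED_FIELDS = {
    "invoice": ["total", "due_date"],
    "statement": ["account_id"],
}


@dataclass
class Prediction:
    doc_id: str
    label: str          # output of the classification model
    confidence: float   # classifier confidence in [0, 1]
    fields: dict = field(default_factory=dict)  # output of the extraction model


def route(pred: Prediction, threshold: float = 0.9) -> str:
    """Return 'auto_accept' or 'human_review' for one processed document."""
    # Low classifier confidence always escalates to a person.
    if pred.confidence < threshold:
        return "human_review"
    # Validation: every required field for this class must be present and non-empty.
    missing = [f for f in REQUIRED_FIELDS.get(pred.label, []) if not pred.fields.get(f)]
    if missing:
        return "human_review"
    return "auto_accept"
```

The design choice in this sketch is to fail toward review: anything uncertain or incomplete goes to a person rather than being accepted silently.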

What we did

  • Built the fine-tuning data pipeline (curation, deduplication, validation against ground truth).
  • Stood up an evaluation harness so model regressions were visible before deploys, not after.
  • Diagnosed a quality regression that turned out to be label drift in the training set, not the model. Wrote up the post-mortem and the data-quality controls that prevent it from recurring.
  • Migrated the production inference path from a fragile single-region setup to a queueable, retry-safe pipeline on Step Functions.
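
The data-quality controls mentioned above can be sketched roughly as follows. This is not the project's actual code: the example format (dicts with `text` and `label`), the normalization rule, and the function names are assumptions for illustration. The sketch deduplicates training examples by a content hash, sets aside duplicates whose labels disagree, and measures how far the label distribution moved between two training-set snapshots.

```python
import hashlib
from collections import Counter, defaultdict


def content_key(text: str) -> str:
    """Hash of the text after lowercasing and collapsing whitespace."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def dedupe_and_flag_conflicts(examples):
    """Split examples into (kept, conflicts).

    kept: one example per unique text whose duplicates all agree on the label.
    conflicts: groups of duplicates with more than one label, for manual review.
    """
    groups = defaultdict(list)
    for ex in examples:
        groups[content_key(ex["text"])].append(ex)

    kept, conflicts = [], []
    for group in groups.values():
        if len({ex["label"] for ex in group}) > 1:
            conflicts.append(group)
        else:
            kept.append(group[0])
    return kept, conflicts


def label_shift(old_labels, new_labels) -> float:
    """Total variation distance between two label distributions (0 = identical, 1 = disjoint)."""
    if not old_labels or not new_labels:
        raise ValueError("both snapshots must be non-empty")
    old, new = Counter(old_labels), Counter(new_labels)
    n_old, n_new = len(old_labels), len(new_labels)
    classes = set(old) | set(new)
    return 0.5 * sum(abs(old[c] / n_old - new[c] / n_new) for c in classes)
```

A retraining job could refuse to start when `conflicts` is non-empty or when `label_shift` against the previous snapshot exceeds an agreed limit; where that limit sits is a judgment call, not something this sketch prescribes.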

Outcome

Measurable accuracy lift on the priority document classes. Clear, repeatable retraining cadence. The team stopped firefighting model regressions.