ML Engineer
About ApprovalMax
ApprovalMax is a fast-growing B2B SaaS company that helps businesses automate their approval workflows and financial controls. With a global team of over 150 people spanning the UK, Europe, North America, Australia, and South Africa, we build software that matters, and we’re scaling quickly.
The Role
Our Capture product extracts structured data from hundreds of thousands of financial documents monthly - invoices, bills, POs - through an OCR pipeline that matches extracted fields against customer accounting systems. Your KPI is zero-touch rate: the percentage of documents where the system output requires zero manual correction. Your job is to move it up - systematically, measurably, and permanently.
We’ve built the foundation: a validated accuracy measurement framework on full production data, a comprehensive error taxonomy of root causes, an error identification methodology, and the first shipped production fixes. You inherit the methodology and the backlog. We need a dedicated owner to execute and scale it. The work splits roughly 70% forensic data investigation / 30% ML engineering, shifting toward 50/50 as models go to production. Four error origins drive the roadmap:
Entity matching (~50% of fixable errors). OCR extracts field values correctly, but the pipeline matches them to the wrong account, supplier, or tax code. Planned: embedding-based similarity search, recommender systems, consensus-based coding prediction - a standalone ML service the core pipeline calls.
Pipeline logic (~25%). Our post-processing pipeline introduces errors through its own deterministic logic - tax treatment misclassification, rounding, spurious adjustment lines. Planned: forensic investigation per pattern, tracing data through processing steps, designing and validating rule-based fixes.
OCR extraction (~25%). The OCR engine misreads the document - wrong currency, phantom line items, structural parsing failures. Planned: build an OCR correction layer - the right approach may be LLM with guardrails, an alternative OCR engine, a correction model from HuggingFace, or a combination. Freedom to choose; rigour required to validate.
User overrides (roughly equal in volume to the three categories above combined; lower priority). Users change correct values for business reasons. Future: learn organisation/vendor correction patterns, build recommendation systems from historical data.
Remote - applicants must be based in the UK, Serbia, or Moldova.
Key Responsibilities:
Accuracy Investigation & Measurement (~70% initially)
Investigate why documents fail at population scale - query production datasets, compare multiple data representations per document, find statistical patterns that explain hundreds of failures at once. Balance population-level analysis with individual-document forensics where needed.
Own and evolve the accuracy measurement framework. Every fix has an expected uplift, a measured uplift, and a post-deployment monitoring plan.
Inherit and improve the error identification methodology. Two modes: LLM-assisted analysis for discovering new patterns across large document batches, and direct SQL investigation for patterns with clean statistical signatures. The methodology is proven and documented; you upgrade it with your own analytical instincts and data science expertise.
Design fixes for pipeline logic errors by reading the C# codebase, understanding the sequence of processing steps, and identifying root causes. Hand validated designs to C# engineers for production implementation. Verify the measured results after deployment.
ML Engineering & Model Development (~30% initially, growing)
Build an embedding-based entity matching service: encode supplier/description signals into vector representations, evaluate retrieval quality against ground truth, iterate on ranking. Deploy as a Python service integrated with the core C# pipeline.
Build an OCR correction layer to fix extraction errors before pipeline processing. Evaluate candidates: vision-capable LLMs with structured output validation, alternative OCR engines, document-oriented correction models. Design evaluation harnesses per error pattern, measure correction rate and false-positive risk, productionise what works.
Set up ML pipeline orchestration and MLOps practices: experiment tracking, model versioning, DAG-based pipeline management (Airflow, Azure ML Pipelines, or equivalent), containerised model serving, production monitoring and alerting.
Explore recommender and pattern detection approaches for user override learning. Build correction history datasets, evaluate consensus algorithms, design explainable recommendations.
Collaboration:
Work embedded in the Capture team - standups, sprint context, understanding of the product. Accuracy is your mission; the team is your operating environment.
Collaborate with the AI team on model architecture, ML/AI best practices, and deployment infrastructure. They own the ML platform; you own the accuracy application on top of it.
Work closely with C# backend engineers on the Capture team. You investigate and design; they implement and ship. Tight, daily collaboration.
Essential Skills:
Investigation & Measurement
Measurement rigour as a core discipline. Ground-truth design, false-positive control, holdout validation, selection-bias awareness. You distrust confident results until independently verified.
Forensic data investigation at scale. You’ve found systematic errors across millions of records in messy, real-world data - fraud detection, payment reconciliation, billing accuracy, data quality, or similar domains. You think in distributions and read raw source data before forming hypotheses.
Strong SQL (PostgreSQL, complex analytical queries) and Python (pandas, NumPy, scikit-learn) as your daily investigation tools.
ML Engineering - Structured Document Processing
Hands-on experience with the structured document processing domain: document layout analysis, table extraction, field-level information extraction, OCR output correction. Understanding of how OCR engines work, where they fail, and how to build post-processing that compensates.
Practical ML skills to deliver projects end-to-end: embeddings (sentence-transformers / Hugging Face) and vector similarity search (FAISS, pgvector); recommender and ranking systems; retrieval evaluation; classification (scikit-learn, gradient boosting). From experiment through validation through production deployment.
LLM integration for structured data tasks: prompt engineering for extraction and correction (LangChain / LangGraph), structured output parsing and validation (Pydantic, JSON-schema), confidence scoring, cost/accuracy/latency tradeoff evaluation.
MLOps and deployment: ML pipeline orchestration (Airflow or Azure ML Pipelines), experiment tracking (MLflow), LLM tracing and evaluation (Langfuse), containerised model serving, production monitoring. You ship models that run reliably, not notebooks that demo well.
Working Style:
Self-directed and autonomous. You’re the only person dedicated to accuracy full-time. You own the analytical direction, prioritise your own investigation, and drive results with minimal supervision.
Collaboration across disciplines. You work daily with C# engineers, a product manager, and an AI team. You communicate findings clearly enough for an engineer to implement and technically enough for the AI team to review.
Comfortable reading and tracing C# / .NET code. Our core platform and processing pipeline are C# - this won’t be rewritten, though new services can be separated. You diagnose failures by reading the pipeline. Writing C# is optional and can be delegated to C# engineers.
Nice to Have:
Financial document or accounting domain knowledge (invoices, charts of accounts, tax treatment, Xero/QBO).
Experience with managed OCR services (Azure DI, Google Cloud Vision, AWS Textract) or open-source alternatives.
Experience with pre-trained document understanding models (LayoutLM, Donut, or similar).
Experience building LLM-as-judge or LLM-as-corrector evaluation systems.
Experience with document-oriented processing tools (docling, pdfplumber, PyPDF, or equivalents).
What We Offer
Growing international business with 20,000+ subscribers
Regular performance-based compensation reviews
26 days of paid time off
1 additional day off for your birthday
Remote office assistance
Financial reward in recognition of years of service
Department: Engineering
Role: ML
Locations: Belgrade, London, Chișinău
Remote status: Fully Remote
About ApprovalMax
ApprovalMax provides end-to-end accounts payable automation software for businesses. With 20,000+ businesses using our software worldwide and strong product-market fit built over 8+ years, we've grown by solving a problem that genuinely matters to the people who deal with it every day: teams who need purchasing and bill approvals to just work, without the back-and-forth and the paperwork.
We're a remote-first, globally distributed team with people across Europe, Australia, and beyond. We move at pace, we're not precious about hierarchy, and we believe the best ideas can come from anywhere in the business. The product has earned its reputation by making our customers' working lives meaningfully easier, and that's something the whole team takes pride in.