The client
Series B HealthTech company in Berlin. Around 120 people.
Diagnostic imaging software for hospitals across the DACH region.
Strong engineering team. Existing computer vision pipeline already in production.
Not a research lab. A regulated medical device company with CE marking obligations.
They had hit the ceiling on their existing models. They needed someone to lead AI research and push model accuracy — while working with product and clinical teams.
Why the client came to us
The CTO had been searching for three months through academic networks and conference contacts. Plenty of researchers. None with production experience in regulated medical devices.
The standard recruiting approach — post on LinkedIn, filter by PhD — was producing the same profile over and over: strong papers, zero shipped products.
They needed someone who understood the difference between a model that wins a benchmark and a model that gets CE clearance. That filter doesn't exist on LinkedIn.
The core difficulty
There was a fundamental contradiction in this profile.
Research-depth people — PhDs, published in top venues — rarely have production experience.
Production-oriented ML engineers rarely have the research depth for medical imaging.
So the person needed to:
- have deep research chops (computer vision, medical imaging),
- speak with engineers AND clinicians,
- understand regulatory constraints (CE marking, medical devices),
- go hands-on when needed,
- eventually build a team of 3–5.
This is the research-to-production chasm. Most people are on one side or the other.
Research-first profile
- Strong publications
- Novel architectures
- Benchmark-focused
- Rarely shipped to production
Production-first profile
- Fewer publications
- Proven architectures
- Real-world data focused
- Shipped and validated
We needed the second profile, with enough of the first to push the science forward.
First round: we warned them
First candidate: PhD from a top European university. Published in MICCAI and Nature Medicine. Glowing academic references.
We raised concerns during screening: strong researcher, never shipped anything to production.
The client was excited about the publication record. They hired him anyway.
Three months in, he was optimizing for paper-worthy accuracy improvements (0.3% gains on benchmarks) while the product team needed models that worked reliably across different scanner types.
He couldn't bridge research perfection and production pragmatism.
They parted ways.
What mattered: we didn't disappear
This was the crucial moment. After the failed first hire, we didn't walk away.
We went back to the CTO for a frank conversation about what exactly hadn't worked. The conclusions:
- pure academia doesn't translate to product,
- publication count is not the filter,
- the real question is “has this person deployed a model that doctors actually used?”,
- one non-negotiable: the person must have worked in a regulated environment.
The focus shifted:
- less publication record,
- more production deployment experience,
- higher bar for regulatory understanding,
- must understand that a model needs to work on a 5-year-old GE scanner, not just on pristine research datasets.
Second round: the right hire
The search took 6 weeks.
We found the candidate in Amsterdam.
She'd been at a medical device company building ML models for radiology.
Fewer publications. But:
- shipped 2 FDA/CE-cleared AI products,
- understood validation protocols,
- understood clinical feedback loops,
- spoke the language of both engineers and radiologists.
Not everyone was convinced immediately. Strong hires at this level rarely produce unanimous enthusiasm.
The CTO made the call.
She passed probation.
Within 5 months:
- real-world model accuracy improved by 12%,
- built a reproducible validation pipeline,
- hired 2 ML engineers.
“You were right about the first candidate. We should have listened.”
— CTO
- 2 search rounds
- 3 months of total active search
- 35+ profiles reviewed
- Passed probation
What this person actually does
This wasn't a pure researcher. This was a research-to-production bridge.
Model Development
Push accuracy on diagnostic imaging models. Design experiments. Choose architectures. But always with production constraints in mind.
Validation & Regulatory
Build reproducible validation pipelines that satisfy CE marking requirements. Understand what “clinically validated” actually means — not just statistically significant.
Clinical Collaboration
Work directly with radiologists. Translate clinical feedback into model improvements. Understand that a 0.3% benchmark gain means nothing if the model fails on a GE scanner from 2019.
Production Engineering
Ensure models work across scanner types, image qualities, and hospital IT environments. Not just research-grade data.
Team Building
Hire 2–3 ML engineers. Set research direction. Build a function that bridges the lab and the product.
Business impact
Two rounds. Three months of active search. One failed hire. But the outcome justified everything:
- Real-world diagnostic accuracy improved by 12%, measured on clinical data, not benchmarks
- Reproducible validation pipeline built, critical for CE marking renewal
- Two ML engineers hired; the research function is now self-sustaining
- Clinical team went from skeptical to collaborative; the new hire spoke their language
- CTO regained confidence in AI research after the first hire's failure
Our value
- Warned the client about the first candidate. They didn't listen, but we were right.
- Didn't disappear after the failure.
- Recalibrated the profile with the CTO.
- Found the balance between research depth and production pragmatism.
Two rounds. Three months. How you handle failure defines whether you're a partner or a vendor.