Medical Knowledge Curation

Accelerating Research and Decision-Making for Doctors

Ada’s medical knowledge base is the foundation of its condition suggestions, built over years by a team of in-house clinicians. Today, it includes around 700 conditions—an impressive number, but still only a fraction of what exists. Adding each new condition requires hours of manual research, clinical judgment, and review. At the current pace, even modest growth would take years—too slow for a system that needs to keep up with the scale and speed of modern medicine.

To address this, we began exploring whether new technologies could help. In particular, we saw potential in natural language processing—a way to automatically extract relevant insights from vast volumes of medical literature. This opened up the possibility of rethinking how medical knowledge is created at scale, without compromising on clinical rigour or trust.

Solution

Accelerating Condition Creation Through Intelligent Symptom Extraction

To support faster and more scalable condition creation, we developed Curator—a tool that helps doctors move through the research phase with greater speed and confidence. Rather than relying on time-consuming manual searches, Curator automatically extracts symptoms from medical literature, aggregates the findings, and presents them in a structured way—giving doctors a clear starting point for building new condition models.

Symptoms

The core purpose of Curator is to accelerate symptom extraction. The interface focuses entirely on presenting symptoms, ranked by relevance to the condition being researched. A visual relevance bar next to each symptom provides an immediate sense of how strong the supporting evidence is—helping doctors focus on what matters most, first.
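
As a rough illustration of this kind of ranking (the case study does not spell out the scoring logic, so the SymptomScore structure, the rank_symptoms helper, and the article-share heuristic below are assumptions, not Curator's actual algorithm), an aggregation step might look like this:

```python
from dataclasses import dataclass

@dataclass
class SymptomScore:
    name: str
    article_count: int   # distinct source articles mentioning the symptom
    relevance: float     # 0-1 value that could drive the relevance bar

def rank_symptoms(mentions: list[tuple[str, str]]) -> list[SymptomScore]:
    """Aggregate (symptom, article_id) mentions into a ranked list.

    Relevance here is simply the share of articles that mention the
    symptom at least once; a hypothetical stand-in for the real scoring.
    """
    all_articles = {article_id for _, article_id in mentions}
    per_symptom: dict[str, set[str]] = {}
    for symptom, article_id in mentions:
        per_symptom.setdefault(symptom, set()).add(article_id)
    scores = [
        SymptomScore(name, len(ids), len(ids) / len(all_articles))
        for name, ids in per_symptom.items()
    ]
    return sorted(scores, key=lambda s: s.relevance, reverse=True)
```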

Evidence Panel

To support transparency, each symptom comes with an evidence panel that reveals the underlying sources used to calculate its relevance. Doctors can see exactly which articles contributed to a symptom’s score, and how often it appears across them—building confidence in the output without slowing down the workflow.

Source Context

For every symptom, users can trace the evidence back to its origin. They can not only access the article it was found in, but also view the exact sentence it was extracted from. This level of traceability reinforces trust in the system and supports deeper clinical review when needed.
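
A minimal sketch of the data such a panel and drill-down need, assuming hypothetical Evidence and ExtractedSymptom structures rather than Curator's actual model:

```python
from dataclasses import dataclass

@dataclass
class Evidence:
    """One piece of supporting evidence behind an extracted symptom."""
    article_id: str      # source identifier (illustrative, not the real scheme)
    article_title: str
    sentence: str        # the exact sentence the symptom was extracted from

@dataclass
class ExtractedSymptom:
    name: str
    relevance: float
    evidence: list[Evidence]   # listed in the evidence panel, traceable per sentence

def evidence_panel_summary(symptom: ExtractedSymptom) -> dict[str, int]:
    """How often the symptom appears in each contributing article."""
    counts: dict[str, int] = {}
    for ev in symptom.evidence:
        counts[ev.article_id] = counts.get(ev.article_id, 0) + 1
    return counts
```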

While Curator is the tool doctors use, its intelligence is powered by earlier work behind the scenes. To train the system to extract symptoms accurately, we first needed to teach it how to read medical literature.

To this end, we built a custom annotation tool that allowed our in-house doctors to tag symptoms and other medical concepts across research articles. This work created the high-quality training data our NLP models relied on—laying the foundation for Curator to deliver relevant, structured insights at scale.

Flexible Annotation System

We designed a robust annotation system that allows clinicians to tag a variety of medical concepts—including symptoms, conditions, and risk factors—with flexibility to mark predictions, confirmations, or edge cases. It also supports complex scenarios like overlapping or discontinuous annotations, ensuring high-fidelity training data.
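
To make this concrete, here is one possible shape for an annotation record; the ConceptType, Status, and Annotation names and the character-offset spans are illustrative assumptions, not the tool's actual schema:

```python
from dataclasses import dataclass
from enum import Enum

class ConceptType(Enum):
    SYMPTOM = "symptom"
    CONDITION = "condition"
    RISK_FACTOR = "risk_factor"

class Status(Enum):
    PREDICTED = "predicted"     # suggested by the model
    CONFIRMED = "confirmed"     # verified by a clinician
    EDGE_CASE = "edge_case"     # flagged for review

@dataclass
class Annotation:
    """A tagged concept over one or more character spans.

    Multiple (start, end) spans allow discontinuous mentions, e.g. a
    symptom split across a sentence; separate Annotation objects over
    the same text allow overlapping tags.
    """
    concept: ConceptType
    status: Status
    spans: list[tuple[int, int]]          # character offsets into the article text
    normalized_term: str | None = None    # standardized concept label, if mapped
```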

Efficient Annotation Workflows

To support large-scale annotation, we built in efficient interaction patterns for adding, correcting, and deleting annotations. The tool also supports concept normalization, allowing users to map varied medical expressions to a single standardized term. This ensures consistency across annotations and improves the quality of training data for our NLP models.
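
A simple way to picture concept normalization, using a hypothetical lookup table (the specific mappings and the normalize helper are invented for illustration):

```python
# Hypothetical normalization table mapping varied expressions to one standard term
NORMALIZATION = {
    "pyrexia": "fever",
    "febrile": "fever",
    "raised temperature": "fever",
    "cephalalgia": "headache",
}

def normalize(expression: str) -> str:
    """Map a varied medical expression to its standardized term, if known."""
    return NORMALIZATION.get(expression.lower().strip(), expression)
```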

Result

Delivering Impact in Research, Not Just in Product

We successfully delivered an MVP of Curator, validating that natural language processing can meaningfully accelerate the creation of medical knowledge. Doctors were able to use the tool to access structured symptom insights extracted from real literature—showing clear potential for reducing manual research time.

The project was funded by the German Federal Ministry of Education and Research as part of a broader initiative to explore machine learning in healthcare. While the product was later hibernated due to shifting company priorities, we concluded the project having met its core goal: demonstrating that clinical expertise and NLP can work hand in hand to scale knowledge creation without sacrificing trust or quality.

In the months following delivery, the project was showcased at the Qurator Conference as an example of applied AI in the medical domain.

Design Decisions

Making Machine Intelligence Usable, Trustworthy, and Collaborative

Designing Curator was less about creating a traditional interface and more about designing a relationship between doctors and machine intelligence. From the beginning, we knew that Curator wouldn’t succeed unless it earned the trust of the people using it—especially in a context as sensitive and high-stakes as medical knowledge creation.

Focusing on One Thing, Really Well

Instead of trying to replicate the full complexity of clinical research, we made a deliberate choice to keep Curator focused on a single task: helping doctors extract relevant symptoms. This narrowed scope allowed us to design with clarity and intention—every interaction, every detail, served that purpose. It also made the tool feel approachable, not overwhelming, especially in an environment where trust is fragile and attention is limited.

Designing for Traceability to Build Trust

For Curator to succeed, doctors needed to trust not only the symptoms it surfaced, but also the process behind them. That meant every piece of information had to be traceable back to its source. We designed the interface so that a symptom’s journey—from extraction to relevance ranking—was never hidden. The evidence panel listed supporting articles, and doctors could drill down to the exact sentence where a symptom was found. This level of traceability didn’t just make the product feel more credible—it made it usable in a clinical context, where verifying evidence is as important as finding it.

Balancing Explicit and Implicit Feedback

To improve both the relevance algorithm and the underlying prediction model, we needed real-world feedback from doctors. But asking for explicit input—like confirming symptoms or rating results—would have interrupted their workflow and added unnecessary friction. At the same time, relying solely on passive usage data risked being too indirect. We chose to capture implicit feedback through natural interactions such as clicks, expansions, and ignored content. This allowed the system to learn and adapt over time without putting extra demands on users—treating feedback as a byproduct of use, not a separate task.
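
A minimal sketch of what capturing such implicit signals could look like, assuming hypothetical event names and a record helper rather than the system's actual telemetry:

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum

class FeedbackEvent(Enum):
    SYMPTOM_CLICKED = "symptom_clicked"       # doctor opened the symptom
    EVIDENCE_EXPANDED = "evidence_expanded"   # doctor drilled into the sources
    SYMPTOM_IGNORED = "symptom_ignored"       # shown but never interacted with

@dataclass
class ImplicitFeedback:
    condition: str
    symptom: str
    event: FeedbackEvent
    timestamp: datetime

def record(condition: str, symptom: str, event: FeedbackEvent) -> ImplicitFeedback:
    """Capture feedback as a byproduct of normal use; no explicit rating required."""
    return ImplicitFeedback(condition, symptom, event, datetime.now(timezone.utc))
```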

Co-Designing with Clinicians, Not Just Testing with Them

Throughout the project, our in-house doctors weren’t just users—they were collaborators. Their input shaped everything from terminology and tone to interaction patterns and thresholds for confidence. This early involvement helped us catch edge cases, make smarter defaults, and ensure the product reflected real clinical needs—not just engineering goals. It also laid the groundwork for trust, which was essential for a tool that could eventually shift how they work.

Supporting a Shift in Research Mindset

One of the most subtle but important challenges was the shift in how doctors approached medical research. Traditionally, they relied on a few high-quality articles—often behind paywalls—that provided curated, peer-reviewed insights. Curator flipped that model, using NLP to process hundreds of open-access papers and surface symptoms based on volume and pattern.

This introduced a more quantitative, probabilistic view of research, which required a change in how doctors interpreted relevance and trust. We addressed this through design—by surfacing evidence transparently, making predictions traceable, and clearly communicating what the system could and couldn’t do. Designing for adoption meant not just delivering value, but helping doctors feel in control of a new kind of workflow.

Closing Thoughts

This project pushed me to think differently about what it means to design for intelligence. Working with NLP required balancing technical ambition with clinical rigour, and finding ways to make complex systems feel transparent and trustworthy. I learned how to design for uncertainty—how to surface it, not hide it—and how to build trust into every interaction.

It also reinforced the importance of close collaboration. By involving doctors early and often, we were able to build something that respected their expertise and earned their confidence. Even though Curator didn’t move beyond MVP, it shaped how I approach designing with emerging technologies: by staying grounded in real-world use, and by designing with—not just for—the people who rely on it.