AI for Biomedical Research Data: The BimmoH Dataset and EU Compliance

Biomedical research increasingly depends on structured, machine-readable datasets derived from complex human biology models. The BimmoH project — Biomedical models Hub — represents one of the more ambitious efforts to use artificial intelligence tools to collect, curate, and structure data arising from human-biology-based model systems. As BimmoH and similar initiatives scale, they intersect with a dense web of EU legal obligations spanning data protection, AI governance, and the regulation of experimental biology.

What Is the BimmoH Project?

BimmoH is an initiative aimed at creating a centralised, interoperable hub for computational and experimental biomedical models. Its scope covers mathematical representations of human physiology, organ-on-chip experimental outputs, organoid characterisation data, and systems-biology network models. The project uses AI-assisted tools to extract structured metadata from published literature and laboratory information management systems, enabling researchers to query, compare, and reuse model data across institutions and disciplines.

The practical problem BimmoH addresses is fragmentation: biomedical model data currently exists in siloed formats across thousands of research groups, journals, and institutional repositories. Manual curation is slow and error-prone. AI extraction tools that parse experimental protocols, extract quantitative parameters, and populate standardised ontological fields can accelerate the assembly of a coherent, searchable knowledge base by orders of magnitude compared to human-only approaches.
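To make "populating standardised ontological fields" more concrete, the following Python sketch shows what a structured record and a basic validation pass might look like. The field names, the small controlled vocabulary, and the `validate` helper are illustrative assumptions and are not taken from BimmoH's actual data model.

```python
from dataclasses import dataclass, field

# Hypothetical controlled vocabulary; a real hub would reference a shared ontology.
MODEL_TYPES = {"organoid", "organ-on-chip", "in-silico", "systems-biology-network"}


@dataclass
class ModelRecord:
    """One extracted entry describing a human-biology-based model (illustrative schema)."""
    source_doi: str
    model_type: str
    tissue: str
    parameters: dict[str, float] = field(default_factory=dict)
    extraction_confidence: float = 0.0  # score reported by the extraction tool, 0..1


def validate(record: ModelRecord) -> list[str]:
    """Return a list of human-readable problems; an empty list means the record passes."""
    problems = []
    if record.model_type not in MODEL_TYPES:
        problems.append(f"unknown model_type: {record.model_type!r}")
    if not record.source_doi.startswith("10."):
        problems.append("source_doi does not look like a DOI")
    if not 0.0 <= record.extraction_confidence <= 1.0:
        problems.append("extraction_confidence must lie in [0, 1]")
    return problems


# Placeholder values for illustration only.
example = ModelRecord("10.1234/example", "organoid", "liver", {"diameter_um": 250.0}, 0.92)
issues = validate(example)
```

Returning a list of problems rather than raising on the first one lets a curator see every issue with an extracted record in a single pass.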

GDPR Obligations: Article 9 and the Research Derogation

The first and most significant compliance consideration for any project handling biomedical model data is whether that data constitutes special category personal data under Article 9 of the General Data Protection Regulation. Article 9(1) prohibits the processing of special categories of personal data, including genetic data and data concerning health, unless one of the exhaustive derogations in Article 9(2) applies.

Not all biomedical model data is personal data. Mathematical models of generic human physiology, anonymised organoid parameters, and published in-vitro experimental results may fall outside the personal data definition entirely where no individual is identifiable. The critical analysis is whether the data can be linked back to a natural person directly or indirectly, considering all means reasonably likely to be used by the controller or a third party.

Where data does involve identifiable health information — for example, patient-derived organoid lines with retained biobank identifiers, or clinical trial datasets feeding into model calibration — Article 9(2)(j) provides the relevant derogation: processing is permitted where necessary for scientific research purposes, subject to Union or member state law and to appropriate safeguards. Article 89 GDPR elaborates on those safeguards, requiring technical and organisational measures that ensure data minimisation, pseudonymisation where possible, and restrictions on use for purposes incompatible with the original research aim.
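One common way to pseudonymise retained identifiers is a keyed hash. The sketch below uses HMAC-SHA256 from the Python standard library; the `pseudonymise` function name, the example identifier format, and the key handling are assumptions made for illustration, and this is one possible technique rather than a method prescribed by Article 89.

```python
import hashlib
import hmac


def pseudonymise(identifier: str, secret_key: bytes) -> str:
    """Replace an identifier with a keyed hash (HMAC-SHA256, hex encoded).

    The same identifier and key always give the same pseudonym, so records
    can still be linked to each other without exposing the original code.
    """
    return hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


# Hypothetical biobank code and key; in practice the key would be stored
# separately from the dataset and access to it would be restricted.
pseudonym = pseudonymise("BB-0001", b"replace-with-a-managed-secret")
```

Pseudonymised data of this kind remains personal data for as long as someone holding the key or the original mapping can re-link it, so the technique is a safeguard and not a route out of GDPR scope.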

Research institutions running BimmoH data pipelines must conduct a data protection impact assessment under Article 35 GDPR where the processing is likely to result in a high risk to individuals. Article 35(3)(b) names large-scale processing of special category data as a case where an assessment is required. The Article 29 Working Party guidelines on DPIA (WP248 rev.01) additionally list sensitive data, large-scale processing, and innovative use of new technology among nine criteria, and state that processing meeting two or more of them will in most cases require a DPIA.
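The screening approach in the Working Party guidelines (WP248 rev.01) can be sketched as a count over their nine criteria, with two or more met criteria treated as a signal that a DPIA is needed in most cases. The enum names below paraphrase the guidelines, and `dpia_indicated` is a hypothetical helper intended as a screening aid, not a legal determination.

```python
from enum import Enum


class Criterion(Enum):
    """The nine WP248 rev.01 criteria, paraphrased."""
    EVALUATION_OR_SCORING = 1
    AUTOMATED_DECISIONS_WITH_SIGNIFICANT_EFFECT = 2
    SYSTEMATIC_MONITORING = 3
    SENSITIVE_DATA = 4
    LARGE_SCALE = 5
    MATCHING_OR_COMBINING_DATASETS = 6
    VULNERABLE_DATA_SUBJECTS = 7
    INNOVATIVE_TECHNOLOGY = 8
    PREVENTS_EXERCISE_OF_RIGHT_OR_SERVICE = 9


def dpia_indicated(criteria_met: set[Criterion], threshold: int = 2) -> bool:
    """Return True when the number of criteria met reaches the threshold."""
    return len(criteria_met) >= threshold


# A pipeline processing health data at scale with a novel AI extraction tool.
pipeline = {Criterion.SENSITIVE_DATA, Criterion.LARGE_SCALE, Criterion.INNOVATIVE_TECHNOLOGY}
needs_dpia = dpia_indicated(pipeline)
```

The guidelines also note that a single criterion can be enough in some cases, so a result of `False` here should prompt a documented judgement, not end the analysis.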

EU AI Act Obligations for Research AI Tools

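The AI extraction and structuring tools used in BimmoH meet the EU AI Act's definition of an AI system. Article 3(1) defines an AI system as a machine-based system designed to operate with varying levels of autonomy that infers, from the input it receives, how to generate outputs such as predictions, content, recommendations, or decisions. Literature extraction systems that classify experimental parameters, infer biological relationships, and populate structured databases clearly satisfy this definition. Whether the Act's obligations actually apply is a separate question: Article 2(6) excludes AI systems specifically developed and put into service for the sole purpose of scientific research and development, so institutions should assess whether a given tool stays within that exclusion.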

The risk classification of such systems is critical. Article 6 and Annex III of the AI Act define high-risk AI systems; research AI tools used purely for internal data curation and not deployed to make decisions affecting individuals are unlikely to fall into the high-risk categories listed in Annex III. However, AI systems used to assist in clinical research design or that feed outputs into regulatory submissions may cross into higher-risk territory and should be assessed case by case.

Transparency obligations depend on the risk tier. Article 13 applies to high-risk AI systems and requires that they be designed so that deployers can interpret the system's output and understand its capabilities and limitations. For research AI tools that are not high-risk, Article 13 is not binding, but its content is a useful benchmark: documentation of training data provenance, model architecture, validation methodology, and known failure modes aligns well with good scientific practice.
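The documentation items named above (training data provenance, model architecture, validation methodology, known failure modes) can be tracked as a simple record with a completeness check. The `ToolDocumentation` class and `missing_sections` helper below are hypothetical names used for illustration; neither the AI Act nor BimmoH prescribes this structure.

```python
from dataclasses import dataclass, fields


@dataclass
class ToolDocumentation:
    """Documentation record for one version of an AI extraction tool (illustrative)."""
    tool_name: str
    version: str
    training_data_provenance: str = ""
    model_architecture: str = ""
    validation_methodology: str = ""
    known_failure_modes: str = ""


def missing_sections(doc: ToolDocumentation) -> list[str]:
    """List the names of fields that are still empty, in declaration order."""
    return [f.name for f in fields(doc) if not str(getattr(doc, f.name)).strip()]


# Placeholder content for illustration only.
doc = ToolDocumentation(
    tool_name="literature-extractor",
    version="0.1",
    training_data_provenance="open-access abstracts",
)
gaps = missing_sections(doc)
```

Keeping one such record per tool version, under version control, gives reviewers a direct view of what changed between model updates.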

Article 10 data governance obligations, which likewise apply to high-risk AI systems, require that training, validation, and testing data meet quality criteria appropriate to the system's intended purpose. For AI tools trained on scientific literature, this means attention to corpus composition, temporal coverage, and representation of less common biological models that might otherwise be underrepresented in published literature.

Directive 2010/63/EU and the 3Rs Principle

Directive 2010/63/EU on the protection of animals used for scientific purposes sets out in Article 4 the three Rs principle (replacement, reduction, and refinement) as a binding obligation on member states and the research establishments they authorise. Competent authorities are required to consider whether the proposed scientific purpose can be achieved without the use of animals, and whether the number of animals used is the minimum necessary.

BimmoH's computational and organ-on-chip model data directly supports the Replacement and Reduction prongs of Article 4. By providing validated, reusable computational models of human biology, BimmoH reduces the pressure on researchers to conduct redundant animal experiments when in-silico or in-vitro alternatives exist. Research institutions that contribute to and draw from BimmoH can document their 3R compliance efforts with reference to the availability of validated computational alternatives in the Hub.

Competent authorities evaluating project authorisation applications for animal experiments under Article 38 of the Directive are increasingly receptive to evidence that applicants have searched structured model repositories before concluding that animal use is necessary. A well-curated BimmoH database thus functions as part of the compliance infrastructure for 3R obligations.

Practical Steps for Research Institutions

Research institutions engaging with BimmoH or similar AI-driven biomedical data platforms should implement a structured compliance programme covering four domains. First, data protection: classify all datasets by personal data status, conduct DPIAs for health data processing, and establish data sharing agreements that specify the legal basis and safeguards applicable to cross-institutional transfers. Second, AI governance: document the AI tools used for data extraction against Article 13 and Article 10 standards, maintain version-controlled records of model updates, and implement human review checkpoints for high-consequence classifications. Third, 3R documentation: integrate BimmoH search results into pre-experiment planning records submitted to institutional animal welfare bodies and competent authorities. Fourth, data quality: establish metadata validation procedures that catch classification errors before they propagate across the hub, given the systemic risk that a widely shared dataset carries.
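The first step, classifying datasets by personal data status, can be sketched as a first-pass triage function. The `Dataset` fields and the status labels are assumptions chosen for illustration; the output is a prompt for legal review and documentation, not a legal determination.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    """Minimal description of a dataset for triage purposes (illustrative fields)."""
    name: str
    derived_from_individuals: bool
    retains_linkage_key: bool  # e.g. a biobank code that can be traced back to a donor
    concerns_health: bool


def gdpr_status(ds: Dataset) -> str:
    """Return a first-pass GDPR status label for a dataset."""
    if not ds.derived_from_individuals:
        # Synthetic or population-average models: record why no individual is identifiable.
        return "outside scope (document rationale)"
    if ds.retains_linkage_key:
        return "special category personal data" if ds.concerns_health else "personal data"
    # No retained key, but indirect identification may still be possible.
    return "needs re-identification assessment"


biopsy_line = Dataset("patient-derived organoid line", True, True, True)
generic_model = Dataset("population-average cardiac model", False, False, True)
statuses = {ds.name: gdpr_status(ds) for ds in (biopsy_line, generic_model)}
```

The third branch is deliberately cautious: removing a linkage key does not by itself make data anonymous, so those datasets are routed to a documented re-identification assessment.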

Frequently Asked Questions

Does anonymised biomedical model data fall outside GDPR scope? Only if re-identification is not reasonably possible, considering all the means reasonably likely to be used by the controller or a third party. Organ-on-chip data derived from patient biopsies with retained biobank codes remains personal data. Purely synthetic or population-average computational models with no linkage to identifiable individuals are generally outside GDPR scope, but institutions should document the anonymisation rationale explicitly.

What documentation must accompany an AI literature extraction tool under Article 13 EU AI Act? For high-risk systems, Article 13 requires documentation sufficient for deployers to understand capabilities, limitations, and appropriate use conditions. In practice this means training data description, validation benchmark results, confidence scoring methodology, and a record of known failure modes. For research tools not classified as high-risk, Article 13 does not apply as a legal requirement, but the same documentation represents scientific best practice in any event.

How does BimmoH data help with project authorisation applications under Directive 2010/63/EU? Project evaluation under Article 38 assesses whether a project complies with the requirement of replacement, reduction, and refinement, so applicants need to show that they have considered non-animal alternatives. A documented search of validated computational and in-vitro models in a hub like BimmoH, with a reasoned explanation of why available models are insufficient for the proposed research question, supports compliance with the replacement requirement in Article 4.

Sources

Regulation (EU) 2016/679 (GDPR), Articles 9, 35, 89
Regulation (EU) 2024/1689 (EU AI Act), Articles 3, 6, 10, 13, Annex III
Directive 2010/63/EU of the European Parliament and of the Council on the protection of animals used for scientific purposes, Article 4, Article 38
Article 29 Working Party, Guidelines on Data Protection Impact Assessment, WP248rev.01
BimmoH Project Consortium, Technical Description and Data Model, 2024
European Commission, Recommendation on responsible conduct of research in relation to AI and data, C(2022)4704

Leveraging artificial intelligence tools to collect and structure information on human biology-based models used in biomedical research : the Biomedical models Hub (BimmoH) dataset
