Responsible use of AI in scientific advice

Impact lead: David Budtz Pedersen

Subproject: ADD Knowledge Broker Unit

Introduction to the case by Prof. David Budtz Pedersen

Context

Across the world, science advisers act as knowledge brokers providing decision-makers, civil servants, and politicians with information on how science and technology intersect with societal issues. Recent advances in artificial intelligence (AI) have stoked various discussions around large language models (LLMs), such as ChatGPT and others, that can generate text in response to typed prompts. Less discussed is how such technologies might be used constructively to create tools that summarize scientific evidence for policymaking.

AI-based tools could increase the capacity of science advisers and help policymakers stay informed about rapidly evolving issues, emergencies, and debates. But how should such tools be designed and used? In this ADD subproject, we engaged with an international group of scholars and policy professionals to highlight the need for rigorous testing, bias screening, and safeguarding against errors and hallucinations. While LLMs could increase the availability of evidence in government, the use of AI to inform policymakers poses serious risks and uncertainties that research needs to address.

The main finding of the international project is that institutions should experiment with using LLMs to provide decision support, but that such exercises should be closely monitored, supported by codes of conduct, and embedded in ethical reviews. This call for action, and the policy recommendations based on the subproject, resonated with a broader policy community who attended workshops, downloaded our research commentaries, and expressed deep engagement with the findings.

The study

AI-based tools are emerging that present new opportunities to improve scientific advice to policymakers, making access to evidence more agile, rigorous, and targeted. But leveraging such tools for good will require science advisers and policy institutions to create guidelines and to carefully consider the design and responsible use of this nascent technology. In this study, we explored two tasks for which generative AI tools hold promise for policy guidance: (1) synthesising evidence and (2) drafting briefing papers.

Currently, evidence searches are time-consuming and involve substantial human resources. Hard-pressed science advisers must take what they can get. But what if evidence searches could draw on algorithmic outputs? Two main approaches are used to synthesise evidence for policymakers: systematic reviews and subject-wide evidence syntheses. Both require time and resources to run. AI-based platforms are increasingly seeking to make such syntheses less time-consuming, freeing subject-matter experts to focus on more complex analytical aspects.


Systematic reviews – such as Cochrane reviews in health and medicine – identify a question of interest and then systematically locate and analyse all relevant studies to find the best answer. Increasingly, machine learning can automate the search, screening, and data-extraction processes that form the early stages of systematic reviews. We found that AI tools can be useful in making sense of emerging domains of research, in which review papers and disciplinary journals might be lacking. For instance, techniques for natural language processing can systematically classify research on AI itself, and graph algorithms are being used to detect emerging ‘clusters’ of research in the broader literature.
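Graph-based cluster detection of the kind described above can be sketched in a few lines. The example below is a minimal illustration, not the project's actual pipeline: the paper IDs and citation links are invented, and connected components stand in for the more sophisticated community-detection algorithms applied to real bibliometric data.

```python
from collections import defaultdict

def citation_clusters(citations):
    """Group papers into connected components of an undirected view of
    the citation graph -- a simple stand-in for the graph algorithms
    used to detect emerging clusters in the research literature."""
    graph = defaultdict(set)
    for citing, cited in citations:
        graph[citing].add(cited)
        graph[cited].add(citing)

    seen, clusters = set(), []
    for node in graph:
        if node in seen:
            continue
        # traverse one connected component starting from this paper
        component, stack = set(), [node]
        while stack:
            paper = stack.pop()
            if paper in component:
                continue
            component.add(paper)
            stack.extend(graph[paper])
        seen |= component
        clusters.append(component)
    return clusters

# Invented toy citation pairs forming two separate research clusters
links = [("p1", "p2"), ("p2", "p3"), ("p4", "p5")]
print(citation_clusters(links))  # two clusters: {p1, p2, p3} and {p4, p5}
```

On real data, tools typically replace connected components with community-detection methods (e.g., modularity-based clustering) and weight edges by co-citation strength, but the underlying idea of grouping densely linked papers is the same.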

Main finding

Large language models are rapidly improving in their ability to synthesise scientific evidence for policymakers. While such technologies might be used constructively for identifying research gaps and simulating policy outcomes, experiments should be safeguarded and closely monitored by humans and professionals, supported by codes of conduct and ethical review guidelines. The study identified the following key aspects of AI in science for policy:

  • Strengths:
    AI can rapidly process evidence, identify gaps, and create predictive models (e.g., in public health or environmental policy). Specialized AI can also flag errors in manuscripts.
  • Risks and pitfalls:
    Large Language Models (LLMs) can produce “hallucinations” (false information), inherit biases from training data, and lack the moral judgment necessary for ethical policymaking.
  • Human-in-the-loop:
    Effective use requires human experts to verify AI-generated content, ensure transparency, and manage ethical concerns.
  • Policy adoption:
    Research institutions should develop frameworks to ensure responsible AI use, prioritizing accountability and transparency in scientific research.

Pathways to impact

The research project’s pathway to impact was achieved through strategic and interactive communication with international science-to-policy communities, networks, and decision-makers. Meetings, panels, and conferences were used to convey the messages and open avenues for mutual discussion between project participants and policymakers, creating relational and conceptual impact pathways. The research paper from the project was published in the leading international journal Nature, which helped to disseminate the project’s results to a wider audience.

According to the article's PlumX metrics, it has three policy citations and one further report mention:

  1. Tuomi, Ilkka (2024). Fostering knowledge-sharing within and among S4P actors. 7 October 2024. Publications Office of the European Union by Directorate-General for Research and Innovation (European Commission).
  2. European Commission (2024). Successful and timely uptake of artificial intelligence in science in the EU. 15 April 2024. Publications Office of the European Union by Directorate-General for Research and Innovation.
  3. Royal Netherlands Academy of Arts and Sciences (2024). Successful and timely uptake of Artificial Intelligence in science in the EU. 12 April 2024. KNAW.
  4. The Behavioural Insights Team (2024). A Blueprint for Better International Collaboration on Evidence. 9 Sep 2024.

Furthermore, the article and the research were presented at the following events and policy meetings:

  • World Science Forum, Budapest, 20-23 November 2024
  • American Association for the Advancement of Science Annual Meeting, Denver, 2024
  • European Commission & US State Department Meeting, “How Might Artificial Intelligence Support Evidence Informed Policymaking?”. 10 November 2023.
  • International Network for Government Science Advice (INGSA) Global Meeting “The Transformation Imperative: Expanded Evidence for Inclusive Policies in Diverse Contexts”. Kigali, Rwanda 1-3 May 2024
  • EU Scientific Advice Mechanism and ALLEA – The European Federation of Academies of Sciences and Humanities, “Upholding Integrity in Scientific Advice: Key Principles and Challenges”, Budapest, 4 November 2025.
  • OECD and Quebec Chief Scientist Meeting in Brussels, “AI as a critical infrastructure for evidence-informed policymaking?”, hosted by the General Delegate of Québec, 20 November 2024.

Policy impact

The study resulting from the subproject was published in Nature in September 2023. It circulated widely in news posts and on social media, and was highlighted by the Nature journal itself (see PlumX metrics). The article was mentioned at a US House Administration Committee hearing (September 27, 2023) in Washington DC, where Timothy M. Persons (Chief Scientist and Managing Director of the Science, Technology Assessment, and Analytics team of the United States Government Accountability Office) included a reference to the project in his remarks.

On November 10, 2023, the ADD Knowledge Broker David Budtz Pedersen presented the recommendations to the US State Department and US National Science Foundation in Washington DC. On the same occasion, high-level policymakers from the European Commission were present and engaged with the findings and recommendations.

The article was debated at the US-EU Transatlantic Science Policy Forum “How Might Artificial Intelligence Support Evidence Informed Policymaking?”. It also formed the basis for a scientific session adopted into the official programme of the American Association for the Advancement of Science (AAAS) Annual Meeting in Denver on February 14, 2024, where David Budtz Pedersen presented the findings to an audience of 150 people.

Conclusion

To realise the potential of AI tools in drawing together evidence while minimising possible drawbacks, the following three recommendations from the project were communicated in open conversation with policymakers:

  1. Consistency. Many academic journals use standardised formats for reporting study results,
    but there is great variation across disciplines. Other sources of information, including working
    papers, project reports and publications from international agencies, non-governmental organisations and industry, are even more mismatched. Such diversity in presentation makes it difficult to develop fully automated methods to identify specific findings and study criteria. For example, it is usually important to know over what period an effect was measured or how large the sample was, but this information can be buried in the text. Presenting the research methodology and results in a more consistent manner could help.

  2. Credibility. Science advisers judge whether evidence is trustworthy in five ways: the plausibility
    of the findings (assessed on the basis of the advisers’ subject knowledge and evaluation of the research); the authors’ reputations; the standing of the authors’ institutions; the views of others in the field; and the perspectives of colleagues and peers. This multifaceted judgement is hard to replicate in an AI tool. Publication metrics, such as impact factors and citation counts, are found to be poor measures of research quality. Which dimensions of credibility are most important might also differ, depending on the policy question and context. Experts will need to agree on standards for research quality before these can be automated in AI-based tools – a significant task, although progress is being made.

  3. Database selection and access. Currently, conducting systematic reviews requires searching across databases – mostly proprietary ones – to identify relevant scientific literature. The choice of database matters and can have a substantial impact on the outcome. But requirements by governments to publish funded research as open access could make it easier to retrieve study results. For research topics that governments deem as funding priorities, eliminating paywalls will enable the creation of evidence databases and ensure alignment with copyright laws.

While policymakers might use LLMs in the preparation of policies, experiments should be safeguarded and closely monitored by humans and professional staff, supported by codes of conduct, ethical review guidelines, and continued joint organisational learning.

This case study shows that AI tools may strengthen science-for-policy work by helping advisers synthesise evidence, identify research gaps, and support the preparation of briefings in fast-moving policy contexts. At the same time, the study makes clear that such tools are not ready to replace expert judgement: risks of hallucinations, embedded bias, weak credibility assessment, and uneven access to research all make human oversight essential.

The project’s main contribution has been to frame a responsible pathway for experimentation, emphasizing consistency in research reporting, clearer standards of credibility, and improved access to evidence as key preconditions for trustworthy use. Through publication and engagement with international science-for-policy networks and policy institutions, the study has helped place these questions on the agenda and
contributed to ongoing discussion about how AI can be used responsibly in evidence-informed policymaking.

References

Tyler, C., Akerlof, K. L., Allegra, A., Arnold, Z., Canino, H., Doornenbal, M. A., Goldstein, J. A., Budtz Pedersen, D. & Sutherland, W. J. (2023). AI tools as science policy advisers? The potential and the pitfalls. Nature, 622(7981).

Pedersen, D. B. (2025). Principles for Science Advice: A Comparative Framework of International Guidelines (submitted). Issues in Science and Technology.

Pedersen, D. B. (2024). Mapping and Strengthening Ecosystems of Science for Policy. In: The Transformation Imperative: Expanded Evidence for Inclusive Policies in Diverse Contexts. Quebec Office of Science / International Network for Government Science Advice.