Miranda Sheild Johansson is an academic in UCL Anthropology who has just published a book with Cambridge University Press and was asked to sign a new generative AI licensing agreement.
Christine Daoutis is a copyright librarian at UCL and part of a working group looking at copyright and AI.

Attribution: Created by ChatGPT (DALL·E, 20 December 2024). Prompt by Miranda Sheild Johansson: “Create an image that illustrates the legal complexity around AI and copyright.”
The researcher’s perspective
As educators we’ve been thinking a lot about how to deal with AI in teaching, but another part of this story is how our research outputs are used to train AI. Recently, publishers have been asking authors to sign generative AI licensing agreements (see here for updates on which publishers are doing this: https://sr.ithaka.org/our-work/generative-ai-licensing-agreement-tracker/). In theory, these agreements shore up copyright protection with regard to AI, allow for royalties to be collected when material is used by AI, and pave the way for publishers to create their own AI tools. In reality, we know very little about the future relationship between our research outputs and AI products, but perhaps these licensing agreements provide a space for us as ‘content creators and trainers’ to have a say and influence the future use of AI in higher education and research? This short piece asks some initial questions that might be useful to think through when it comes to us and AI. It also provides key information and tips on how to talk to publishers about these agreements from Christine Daoutis, a copyright librarian at UCL.
When I, Miranda, received one of these licensing agreements from my publisher, I wanted to stop and think before I signed. The AI industry is fast moving, and the legal and regulatory landscape around it is ever evolving, desperately and often chaotically trying to keep pace with technological advances. In early 2024 the UK government planned to broker a code of best practice across different industries, but the effort was abandoned when the many stakeholders couldn’t reach a consensus on the basic principles of any agreement (see here for the latest developments from the UK IP Office consultation on copyright and AI: https://www.gov.uk/government/consultations/copyright-and-artificial-intelligence). Legal cases are currently being fought which will determine the legal future of AI and copyright. For instance, the Authors Guild lawsuit against OpenAI in the Southern District of New York, for copyright infringement of their works of fiction, will set a precedent for the future. As the use of AI products proliferates, it mostly feels like this is a ship we, as authors, do not captain, and we simply have to buckle in for the ride. But the licensing agreements need our consent, and therefore might offer us a little room to question the ‘what’ and ‘how’ of these agreements.
I got in touch with the copyright librarian at my institution, Christine Daoutis, and found out that librarians have been doing a lot of work around this already, especially clarifying how current copyright law applies to AI usage. Christine is following live legal cases around copyright and AI, and creating and updating guidance for staff and students about the legal use of AI and the concerns to consider.
Talking to Christine about her findings and thoughts convinced me that we, as a community of researchers, writers, and curators of knowledge (like librarians), need to have a larger conversation about our role in training AI. What are our worst fears and what are our hopes? Can we influence the content of these new licensing agreements? Going forward, the quality and impact of AI will depend on what content is legally accessible for its training. Making high-quality and rigorous research available to AI democratises knowledge. However, concerns over attribution, misattribution, infringement, bias, and error abound, as well as broader concerns over the ethical responsibilities of AI companies, working conditions for AI ‘trainers’, and the environmental impact of accelerating AI usage. And, as Christine has highlighted in her conversations with me and the department, how do we deal with the risk of models stifling creativity (by producing outputs based on averages rather than encouraging outliers, which would be much more creative)? The image above illustrates this, being immediately recognisable as an AI original, and lacking many things that an anthropologist might have included if given the same prompt – Create an image that illustrates the legal complexity around AI and copyright – for instance, humans at work.
As a first step, we should all be discussing this within departments and institutions. What does this, the licensing agreements and AI more broadly, mean for us, as authors, and what part can we play in these unfolding AI issues?
Christine has put together some clear and current guidance, which is helpful for this moment in time. We are building a small resources bank with recent relevant publications and websites all linked below. We hope this can be of use to all.
The copyright librarian’s perspective
This commentary draws from guidance on the UCL Library website, https://library-guides.ucl.ac.uk/generative-ai/copyright
The explosion of GenAI in late 2022 certainly added new perspectives to my copyright support role at UCL. Copyright is complex and fascinating enough when humans are involved; add to this tools that can access and process a huge amount of information in record time, not to mention generated works that may or may not be seen as the original creations of a human, and you have copyright conundrums to last a lifetime.
As Miranda mentions above, our main focus has been the use of GenAI in teaching and learning and, to some extent, the use of AI in research. Using AI tools may or may not constitute copyright infringement, a question complicated by the fact that there is currently little transparency about what AI models are trained on. Are the sources lawfully accessed and used, and what are the implications of this for AI developers, but also for users? Answers will be largely shaped by ongoing court cases (for example, Getty Images v Stability AI and the Authors Guild v OpenAI), but it may be years before a precedent is established. Furthermore, what is decided in one jurisdiction (for example, that training an AI model in the US was ‘fair use’) might be different to a decision in other countries, where different laws apply.
Besides possible infringement and liabilities, there are broader concerns around accuracy, biases and lack of attribution (or misattribution) of the generated material. These points, and even broader concerns around the ethics and environmental impact of AI companies, are some of the issues being highlighted to students and staff in Higher Education. However, relatively little attention has been given to the impact of our own research being used to train AI.
Miranda contacted me a few months ago to discuss an agreement she was asked to consider signing with the publisher of her upcoming book. This was shortly after Taylor and Francis (Informa) announced a deal with Microsoft allowing the use of their content for AI training, without informing or seeking the consent of their authors. While it could be argued that the copyright to publications included in the deal had been assigned to the publisher and therefore consent was not necessary, contractual interpretation suggests that if an agreement did not specify AI use, permission should be sought from the authors (see the Kluwer Copyright Blog; also see the positions of the Authors Guild in the US and the Society of Authors in the UK). Authors being asked to give consent for their works to be used in AI training, receiving royalties and being attributed are among the requirements set out in their positions.
So, what should I advise an author to do when receiving such an agreement? More generally, what should we be considering when facing the prospect of our works being accessed, processed and possibly reused in new AI generated works?
The answer to this is not simple, and the first thing that comes to mind is that there is no ‘one size fits all’ approach. Commercial authors in the creative industries will have different interests, concerns and opinions to academic authors; within academia, too, publishing a textbook that may attract royalties will raise different questions and concerns to publishing a journal article with no remuneration. Authors in the humanities will almost certainly have different concerns to STEM authors, and authors of open access publications (intended to be accessed and reused under open licences) may react differently to the prospect of their works, already licensed for reuse, being used by AI.
Other important factors when thinking about AI include who is using the materials in AI tools, how, and for what purpose. We may think differently if a large commercial company is using materials to train its models for profit; differently if a research student is mining our research to look for patterns with the help of AI; and differently again if a commercial research company draws on our publications to create solutions that might save lives. Overall, it is important to note that training AI models on peer-reviewed scholarly research is likely to reduce biases and increase the accuracy and reliability of the models. (For a more extensive discussion, please see the Knowledge Rights 21 Principles on Artificial Intelligence, Science and Research.)
This diversity in interests, priorities, concerns and approaches must be acknowledged and captured in any consultations and decisions made by policy and law makers. The current UK government consultation on copyright and AI is focussing on introducing a broad copyright exception allowing the use of lawfully accessed materials for AI purposes unless the rights holder has opted out of the exception: in that case, a licence would be necessary. While this solution may be the preferred one for some stakeholders (the consultation proposals were developed with the creative industries in mind, not researchers), it will introduce barriers to accessing and reusing scholarly research with the help of AI.
But back to publisher agreements… Bearing in mind that an AI using your research may well be considered lawful anyway (depending on the jurisdiction), what should you do when asked to agree to your work being included in AI training datasets?
The decision is yours and it will greatly depend on many factors besides copyright. The points below are just some issues you will need to consider and discuss with your publisher.
- Accurate attribution. If parts of your publication are to be reproduced in GenAI outputs, how will you be attributed?
- The purpose and nature of the reuse. What is covered by current licensing agreements with AI providers? For example, your work could be used to improve the accuracy of the tools or to develop new tools. How would this be achieved?
- Are there any providers or uses that you have an objection to, for ethical or other reasons?
- Control over new uses in the future
- Royalties (in the case of books)
You should be able to ask your editor for more information before you make a decision.
Resources:
Gershon, I. (2023). Bullshit Genres: What to Watch for When Studying the New Actant ChatGPT and Its Siblings. Suomen Antropologi: Journal of the Finnish Anthropological Society, 47(3), 115–131. https://doi.org/10.30676/jfas.137824
Nawar, T. (2024). Generative Artificial Intelligence and Authorship Gaps. American Philosophical Quarterly, 61(4), 355–367. https://doi.org/10.5406/21521123.61.4.05
https://sr.ithaka.org/our-work/generative-ai-licensing-agreement-tracker/
https://www.gtlaw.com.au/insights/getty-images-vs-stability-ai
https://cla.co.uk/ai-and-copyright/principles-for-copyright-and-generative-ai/
https://library-guides.ucl.ac.uk/generative-ai/copyright
https://www.gov.uk/government/consultations/copyright-and-artificial-intelligence
