June 24, 2025
EUIPO publishes report on the development of generative AI from a copyright perspective
In May 2025, the European Union Intellectual Property Office (EUIPO) released the report “The Development of Generative Artificial Intelligence from a Copyright Perspective”, which analyzes the challenges and opportunities brought by generative artificial intelligence (GenAI) in the context of European copyright law.
The aim of the study is to clarify how GenAI interacts with copyright from technical, legal, and economic perspectives. To this end, the document examines how copyright-protected content is used in the training of GenAI models, what the applicable legal framework is in the European Union, how creators can reserve their rights through opt-out mechanisms, and what technologies exist to mark or identify AI-generated content.
At the outset, the study explains that GenAI systems collect data, often through techniques such as web scraping, that may be protected by copyright, and process it to generate new content. The study refers to this initial data collection and training process as “GenAI Input”, defined as the set of steps comprising data collection, annotation, cleaning, and processing, together with the training phases of the model, from pre-training to fine-tuning, aimed at optimizing performance on specific tasks. The process may also include adding new data at the reinforcement learning stage.
The report also explains that, beyond using copyright-protected content in its training phase, GenAI may also use such content when generating new outputs. Because training AI systems is costly and complex, Retrieval-Augmented Generation (RAG) technologies are increasingly used. These technologies combine generative AI with search mechanisms to access up-to-date information online and produce more current outputs without completely retraining the model. The content generated by GenAI is referred to as “GenAI Output.”
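To make the RAG idea concrete, the following is a minimal sketch, not taken from the report, of the retrieval step: it ranks a few stored passages by word overlap with a question and places the best match in the prompt that would be sent to a generative model. The function names, the scoring method, and the sample passages are illustrative assumptions; production systems typically use vector embeddings and a live search index.

```python
import math
from collections import Counter

PUNCT = ".,;:!?()\"'"

def tokenize(text):
    """Lowercase the text and strip surrounding punctuation from each word."""
    words = (w.strip(PUNCT).lower() for w in text.split())
    return [w for w in words if w]

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0

def retrieve(query, documents, k=1):
    """Return the k documents most similar to the query (hypothetical helper)."""
    q = Counter(tokenize(query))
    ranked = sorted(
        documents, key=lambda d: cosine(q, Counter(tokenize(d))), reverse=True
    )
    return ranked[:k]

def build_prompt(query, documents, k=1):
    """Combine retrieved context with the question; no model is called here."""
    context = "\n".join(retrieve(query, documents, k))
    return f"Context:\n{context}\n\nQuestion: {query}"

if __name__ == "__main__":
    docs = [
        "The EUIPO published its GenAI copyright report in May 2025.",
        "Robots.txt files tell crawlers which paths to avoid.",
    ]
    print(build_prompt("When did the EUIPO publish the report?", docs))
```

The point relevant to copyright is that the retrieved passage is used at generation time, so protected content can be involved even though the model itself was never retrained on it.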
Next, the report presents the legal framework in the European Union (EU) for the use of protected content by AI, highlighting the Copyright in the Digital Single Market Directive (CDSM Directive) and the EU Artificial Intelligence Act (AI Act). The CDSM Directive provides for exceptions to the exclusive reproduction right of copyright holders. Article 3 allows text and data mining (TDM), including for AI training, without rights holders’ authorization, provided it is carried out for scientific research purposes. Article 4, in turn, allows TDM for other purposes, provided that rights holders are able to opt out and have not reserved their rights against such use of their works. The AI Act imposes obligations on AI in the EU, requiring compliance with TDM opt-outs, transparency about training data, and clear identification of content generated by GenAI.
From an economic perspective, the study highlights the role of direct licensing agreements between copyright holders and GenAI developers for the use of protected content in AI training. These agreements are driven by factors such as the scarcity of high-quality data and the power dynamics in negotiations. Sectors such as news media and scientific publishing are identified as being best positioned to benefit from these opportunities, especially in applications based on RAG technologies.
The document also examines legal and technical tools available for rights holders to protect their works during the Input phase. One such mechanism is the Robots Exclusion Protocol (REP), which is used by websites to try to control access by data collection bots. However, the study notes that REP is not an effective mechanism, functioning only as a temporary solution.
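As an illustration of how REP works in practice, the sketch below uses Python's standard `urllib.robotparser` module to evaluate a sample robots.txt file. The bot name `ExampleAIBot` and the URLs are hypothetical, and the rules shown are only one possible configuration.

```python
from urllib.robotparser import RobotFileParser

# Sample robots.txt: block one (hypothetical) AI crawler, allow everyone else.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""

def may_crawl(robots_txt, user_agent, url):
    """Return True if the robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

if __name__ == "__main__":
    page = "https://example.com/news/article.html"
    print(may_crawl(ROBOTS_TXT, "ExampleAIBot", page))  # blocked crawler
    print(may_crawl(ROBOTS_TXT, "OtherBot", page))      # any other crawler
```

Note that this check runs on the crawler's side: the file only expresses the website's preference, and nothing in the protocol technically prevents a bot from ignoring it, which is consistent with the study's view of REP as a limited solution.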
Given that no single solution effectively enables a TDM opt-out, the study indicates that a combination of legal and technical measures is often employed to safeguard copyright. The legal measures for rights reservation mentioned in the report include unilateral declarations by copyright holders, contractual licensing restrictions, and website terms and conditions.
Technical measures for rights reservation, beyond REP, include: solutions specifically designed to support TDM opt-outs (such as TDM reservation protocols); technologies being adapted for this purpose (such as C2PA content authenticity initiatives); and tools still under development that aim to build broader infrastructures for online copyright management, such as the Liccium Trust Engine Infrastructure or Valuenode’s Open Rights Data Exchange platform. In addition, the study points out that the EUIPO plays a crucial role in providing technical support and raising awareness about the management of rights reservations applied to GenAI.
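For a sense of what a dedicated TDM opt-out signal can look like, the sketch below checks HTTP response headers for a rights reservation. It assumes the header names `tdm-reservation` (value `1` meaning rights are reserved) and `tdm-policy` as described in the W3C community group's TDM Reservation Protocol (TDMRep); the function names and sample values are hypothetical, and the header names should be verified against the current specification.

```python
from typing import Dict, Optional

def _normalize(headers: Dict[str, str]) -> Dict[str, str]:
    """HTTP header names are case-insensitive, so compare them in lowercase."""
    return {name.lower(): value.strip() for name, value in headers.items()}

def tdm_rights_reserved(headers: Dict[str, str]) -> bool:
    """True if the response declares a TDM rights reservation (assumed: value "1")."""
    return _normalize(headers).get("tdm-reservation") == "1"

def tdm_policy_url(headers: Dict[str, str]) -> Optional[str]:
    """Return the URL of the licensing policy, if the response declares one."""
    return _normalize(headers).get("tdm-policy")

if __name__ == "__main__":
    # Hypothetical response headers from a publisher's website.
    response_headers = {
        "Content-Type": "text/html",
        "TDM-Reservation": "1",
        "TDM-Policy": "https://example.com/tdm-policy.json",
    }
    if tdm_rights_reserved(response_headers):
        print("TDM rights reserved; policy:", tdm_policy_url(response_headers))
```

As with REP, such a signal only works if the party collecting the data reads and honors it.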
Finally, the report notes that the AI Act requires transparency regarding outputs generated by GenAI systems and assesses possible solutions for identifying and disclosing the nature of such content. These measures include provenance tracking, AI-generated content detection, and re-identification processing—each with its own strengths and limitations. Developers are also adopting techniques such as filtering and model editing to reduce the risk of copyright violations in GenAI outputs.
The document can be accessed via the link: The Development of Generative Artificial Intelligence from a Copyright Perspective
Note: For quick release, this English version is provided by automated translation without human review.