Large-scale AI language models that generate text or programme code have the potential to significantly increase the competitiveness and innovative strength of the German economy. However, the most advanced generative AI models currently come mainly from the USA and China and often do not meet the ethical and legal standards under discussion in Europe. A recently published white paper from Plattform Lernende Systeme illustrates the opportunities and challenges of such language models using concrete practical examples. It also analyses the conditions under which companies can exploit the potential of this technology with confidence and legal certainty.
The experts recommend creating an open, commercially usable German-language data set that complies with European values and rules. This data set should support the development of language models in Germany and ensure that they fulfil local requirements.
AI language models as an economic booster
The economic potential of generative AI language models is enormous. For example, the technology promises improved access to company knowledge for employees and the automation of repetitive and error-prone processes such as the processing of business documents. In medicine, generative AI models can support a more precise prediction of disease progression and relieve healthcare professionals of documentation and administrative tasks.
"From the perspective of AI research, large language models represent a significant technological breakthrough. They unlock the intelligence of language. They enable solutions that were previously beyond technological possibilities and now seem to be taken for granted," says Volker Tresp, Professor of Machine Learning at the Ludwig Maximilian University of Munich and Head of the Technological Pathfinders and Data Science working group of Plattform Lernende Systeme.
AI models are reusable
One important reason for the wide range of possible applications is that the models, once trained at great expense, are reusable and can be adapted to industry- and company-specific requirements. Companies no longer need to carry out the resource-intensive training of their own AI model. In many applications, however, the AI models must be able to access sensitive data, such as patient information in medicine or business data in companies. According to the white paper, few companies would be prepared to hand over such data to external model providers.
Legally compliant use of large AI models is important for companies. The models should be developed in line with European values and rules, and it should be transparent which data was used to train them. Prominent models from the USA and China largely do not fulfil these requirements. In addition, non-European AI models contain only a comparatively small proportion of German text in their training data, which can lead to errors in the German texts they generate.
European AI models are necessary
In view of the increasing influence of non-European models, alternatives are needed to drive innovation and competitiveness in Germany and Europe, the authors of the white paper emphasise. For the sake of digital sovereignty, language models are needed that are built for the German language and in line with our value systems. The basic prerequisite for training large language models is a correspondingly extensive, curated training data set.
"Many companies currently want to use language models. Unfortunately, there are hardly any high-performance German-language models on offer. A strong open source offering made in Germany can create new competition. This only requires 20 terabytes, i.e. 10 laptop hard drives, of German-language data. An indispensable prerequisite is therefore an open source project that makes this data commercially usable in high quality and offers it licence-free, so that a broad community can create secure AI," says Alexander Löser, founder and spokesperson of the Data Science Research Centre at the Berlin University of Applied Sciences and member of Plattform Lernende Systeme.
Another prerequisite for training and applying large AI language models in Germany is a powerful computing infrastructure. This should grow in line with the rising computing demands of AI models, the authors recommend. In addition, the AI community should be strengthened, and talent in research and industry with the relevant AI knowledge and domain expertise should be fostered.
The full white paper can be downloaded free of charge from the Plattform Lernende Systeme website.