Morten Hesseldahl on the development of Danish language models

IDA Conference, November 6th, 9:00 a.m. - 2:00 p.m.

Thank you for the invitation to participate in the conversation about a Danish language model. Everything is very new, and as a society, we still lack a full understanding of what we’re talking about when we discuss artificial intelligence and language models. And that understanding is crucial.

Without language, we have no understanding, no awareness, and no capacity to make sensible choices. The opening line of the Gospel of John is not coincidental: “In the beginning was the word.”

Words enable us to distinguish, think, and act.

Right now, for example, there is some confusion about what a “language model” actually is, so briefly—and primarily for my own benefit: “A language model is a type of artificial intelligence trained to understand, generate, and predict text based on an analysis of vast amounts of uploaded data.”

With artificial intelligence, Microsoft and Google are shifting from being “search engines” to becoming “answer engines.”

While search engines directed users to various sources—influencers, Wikipedia, Lex.dk, you name it—with answer engines, one receives a source-less but otherwise coherent answer generated from what is currently available online.

Is this a problem?

If answer engines of the future simply lead to an improvement in quality, it will be good. However, if future answer engines weaken the accumulation and dissemination of more qualified knowledge, it will be bad. And it will be very bad if we can’t scrutinize the answers.

For creative people, answer engines will be a powerful tool for shaping inspiring scenarios and providing useful building blocks for their work. They will likely also lead many others to feign creativity and artistry, since those people will be able to deliver something that looks the part yet simply isn’t.

Two issues stand out regardless: First, that future answers will not be equipped with immediately verifiable sources, and second, that answers will most likely contain a bias depending on who has designed the language model’s architecture.

A Value-Based Language Model

When the idea of developing a Danish language model arises, it’s likely because of a concern about whether Danish values can hold their own against answer engines based on, for example, American or Chinese values.

It’s not problematic if one asks about the distance to Mars or the recipe for Peking Duck, but it becomes potentially toxic if one asks about active euthanasia, abortion, and suicide.

Or whether Israel is a legitimate state, whether Putin is the aggressor, or whether it’s the West that is. And what about the colonial period? Did it pave the way for Third World freedom, or is it purely a story of exploitation and oppression?

In the future, a potential moral and political coloring of the answer engines will be an influential factor in crucial areas.

This brings me to another aspect that, in my view, is receiving far too little attention, namely the question of “intention.”

Intention

When we go to a sports store and buy a football, we expect that the ball can be used for play and competition. The identity of the creator isn’t important, as long as the ball can bounce.

In other cases, however, the creator’s identity and intention are integral to the “product.”

If someone expresses their love, we want to know that it’s sincerely meant. When we interact with art, we want to know that there’s a will behind it—and thus a person who wants something of us. Otherwise, it’s not art we’re dealing with.

Mozart composed more than 600 works. Artificial intelligence can compose endlessly many more pieces that sound just like Mozart, but those pieces aren’t Mozart, and they would be as lifeless as a marriage to a mindless robot.

When we seek Mozart, we’re not looking for an imitation. The moment we realize that’s what it is, we lose interest. That’s why market value plummets as soon as buyers realize that the art they were considering isn’t by Banksy or Rembrandt.

Counterfeit art is not art. False love is not love. The intention makes the difference. That’s why we need to know it.

The originator may not matter when we buy a ball or use a calculator, but it does matter every time the creator’s intention is integral to what we seek. And that applies to most areas that aren’t simply trivial.

Artificial intelligence and future answer engines blur both sources and intentions. And that is, of course, a problem.

Danish Language Models

In an interview with Forbes early last year, Bill Gates predicted that “AI will be the hottest topic in 2023.” And he continued, “And you know what? It’s justified. AI is just as important as the PC was, as the internet was.”

The expectation is, in other words, that artificial intelligence will change everything. Including the way we think. Perhaps that’s true.

What is definitely true…

  1. is that the copyright of the texts that language models are based on is undermined
  2. that sources are blurred
  3. that answers can and will be biased by opaque values
  4. that the growing prevalence of tools like ChatGPT will weaken our already limited ability to express ourselves in writing. A skill that isn’t constantly practiced will indeed wither.

Some skills are trivial, while others are not. The ability to read and write is not trivial. On the contrary, it is probably the most crucial skill when distinguishing between a strong and a weak civilization.

Therefore, it’s fair to ask whether we should simply let this development happen. We might also ask whether there’s anything we can do about it. Hasn’t that ship already sailed?

For example, can we in Denmark even take on the task of creating a Danish language model? Or would it be more realistic to apply some form of Danish value and cultural veneer on top of foreign models?

Large language models such as GPT and BERT are already trained on text in multiple languages, allowing them to understand and produce texts in Danish and thus provide responses that are not inferior to those generated in any other language.

Are we comfortable with this? A clarification would be helpful.

We are well underway in deploying artificial intelligence within the judicial system, healthcare, education, indeed across the entire public administration. These are sectors where a precise understanding of the Danish language, culture, and values has always been foundational, and we cannot simply replace that with foreign-controlled language models without risking something valuable.

The obvious gain would be shorter case processing times, while the risk is that we face a massive loss of control, education, and identity, simply because we don’t understand what source validation means and don’t clearly grasp that knowledge does not flow neutrally or value-free from friendly answer engines.

If you asked one of these machines, it might tell you that culture is a dynamic entity. Something that has always been influenced by other cultures and foreign trends. And that’s true.

But it’s also true that just as every culture begins with language and develops through it, it dies the moment it loses precisely that. Its own language.

“Good luck,” one might say to all of us. Or, of course, I mean: “held og lykke!”

Morten Hesseldahl