
AI, and all that I fear

First published in The Mandarin, 26 March 2025

“It’s too new. It’s just guessing. I don’t know how it works. It’s just a probabilistic word generator. It’s taking my job. I don’t know how to use it.”

These are a few of the concerns raised at a recent Australian public sector conference on AI, signalling a reluctance to embrace it. The worst of all?

“Studies show the public doesn’t trust it, so we [the public sector] shouldn’t use it.”

The evidence is conflicted. More than half our population doesn’t trust AI, yet Australia is a world-leading adopter, with one of the highest use rates anywhere. The public is showing that its willingness to use AI outweighs its concerns over trust.

The development of generative AI, and Large Language Models (LLMs) in particular, is transformational. The way we store and access data is being turned on its head. We now have access to vast knowledge never before available to us, almost instantaneously, almost magically. We access it through natural language, and receive results that are aggregated, conversational, and complete. Compare that to the traditional search engine response: a list of thousands of articles for the reader to review. This is the visible end of LLMs, but they offer so much more. We need to embrace the opportunity, nationally, to increase productivity and secure a more competitive position internationally.

While there is clearly some adoption in the public sector, it seems limited, reticent, almost as if the public sector is an introverted bystander waiting to see what unfolds. Unfortunately, it’s not just the lost opportunity within the public sector that’s concerning; it’s the further impact on national confidence that may act as a brake on national adoption. Ideally the public sector will become a visible leader, bolstering confidence through rapid experimentation and adoption.

Cautions in adopting AI are real and reasonable, but conversations at a recent conference suggested a high level of pseudo-knowledge underpinned by shallow understanding, reinforced by misconceptions and group storytelling.

Without being able to conceptualise maths, vectors, matrices, dot products, parallel processing and an alternate universe, AI feels like magic.  A better understanding of what is happening in this space is needed, and ten-minute YouTube videos are not the answer.

I’m hoping this article finds a balance between technical depth and accessibility. My apologies to the real experts: I have taken some shortcuts, it’s not always as technically accurate as they might like, and some metaphors are a stretch. It’s still a bit of a challenge to read. But if we understand the concepts better, we can have a better conversation about our concerns, and perhaps lead the country a little better.

Here’s a quick look. We store data in books, on shelves and in computers in very structured, rule-based ways. We built the internet, providing access to immense data. We query it through search engines, and if we use the right keywords, and the data is structured properly, we are given a list of results relevant to our query.

A seminal paper by eight collaborating authors from Google, “Attention Is All You Need”, turned that approach on its head. Google and OpenAI shared the concepts with the world, sparking extraordinary developments that are reshaping our interactions with technology.

Artificial Intelligence (AI) is a broad field of computer science focused on reasoning and learning. Generative AI generates music, video, imagery and text. Large Language Models (LLMs) are a subset designed for text and natural language processing. They are trained on vast amounts of text and respond in human language. GPT (Generative Pre-trained Transformer) is a proprietary implementation of an LLM, built on an AI architecture called a Transformer. Most modern LLMs rely on Transformers, though alternative architectures are emerging.

Transformers, introduced in 2017, revolutionised natural language processing. Think of them as an engine that allows an AI to process long pieces of text efficiently, capturing relationships between words, concepts, and context.
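The heart of the Transformer is an operation called attention, which lets every token weigh its relationship to every other token in the text. A minimal sketch, using toy two-dimensional vectors invented purely for illustration (real models use hundreds of dimensions and many attention layers):

```python
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention: each output row is a weighted
    blend of the value rows, weighted by query-key similarity."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # similarity of every query to every key
    # Softmax turns similarity scores into weights that sum to 1
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ V

# Three toy token vectors standing in for a three-word sentence.
x = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
out = attention(x, x, x)  # self-attention: tokens attend to each other
print(out.shape)          # one blended vector per token: (3, 2)
```

Each output vector mixes in information from every other token, which is how the model captures relationships and context across a whole passage.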

The approach is very different to most of our traditional algorithmic computing experience. Same input, same data, same algorithm, same output, every time. Repeatable, certain, predictable. It has worked well, albeit with some notable failures, from payroll systems to moonshots.

Traditionally our knowledge is captured in books, rules, and algorithms. While explicit and structured, and easily understood, the approach lacks nuance and context. For example, in conversation we instinctively adjust the meaning of the term “model” to reflect the context in which we are using the word. It could mean an engineering model, an airplane model or a fashion model. We then effortlessly attach layers of additional meaning and associations based on our experiences.

LLMs achieve something similar.

This is the breakthrough concept of the modern LLM. They don’t store knowledge in structured databases, retrieve facts like search engines, or follow fixed rules like algorithms. They operate in a huge multi-dimensional environment, a “universe of meaning”, called the latent space. A word (called a token) is given a position in this universe, which allows the LLM to determine meaning, relationships and concepts.

Like standing on the bridge of Star Trek’s Enterprise, on the edge of the Klingon empire, your position determines your context (battle stations) and relationships (threat). The position of a token is determined during the training of the LLM.

Traditional systems would store the word “King” in a dictionary or table of definitions. An LLM embeds “King” in the latent space close to “Queen” because they share meaning, but far from “Apple” because they are unrelated. “King” also has a relationship with gender, because it is male; “Apple” does not.
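The King/Queen/Apple intuition can be sketched with toy numbers. The vectors below are invented for illustration (real embeddings have hundreds of dimensions), but the measure, cosine similarity, is the one genuinely used to compare positions in latent space:

```python
import numpy as np

# Hypothetical 4-dimensional embeddings; the values are invented
# purely to illustrate the idea of "closeness" in latent space.
vectors = {
    "king":  np.array([0.9, 0.8, 0.1, 0.7]),   # royalty, male
    "queen": np.array([0.9, 0.8, 0.9, 0.7]),   # royalty, female
    "apple": np.array([0.1, 0.0, 0.5, 0.9]),   # fruit
}

def cosine(a, b):
    """Similarity of direction in latent space: near 1 = closely
    related meaning, near 0 = unrelated."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(round(cosine(vectors["king"], vectors["queen"]), 2))
print(round(cosine(vectors["king"], vectors["apple"]), 2))
```

Running this shows “king” sitting much closer to “queen” than to “apple”, which is all that “shared meaning” amounts to mathematically.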

When an LLM is trained, it ingests massive amounts of text. Every word or phrase (token) is mapped into latent space as a mathematical vector. This vector positions the token. These vectors are not static; they shift as the model refines its understanding of the patterns, relationships, and contexts associated with that word (token).

The LLM doesn’t just store facts, it understands associations like:

Paris → France (explicit factual knowledge)

Vaccine → Efficacy (probabilistic relationships from data)

Cause → Effect (abstract reasoning based on language patterns).

This blending of explicit facts, structured knowledge, and inferred meaning is what makes LLMs so powerful. Unlike traditional systems, LLMs can adapt to new contexts and accommodate ambiguity. They generate responses based on context, usage and current understanding, not on pre-written rules. They use complex probability calculations that ask, “if this occurred, and this occurred, and this occurred, what is the probability of that occurring?”. For the mathematically oriented, it is Bayesian-like, but not exactly Bayesian.
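That chained “if this occurred, and this occurred” calculation can be sketched with a toy bigram model, a drastically simplified stand-in for what an LLM computes (the corpus is invented for illustration):

```python
from collections import defaultdict, Counter

# A tiny "training corpus" to estimate P(next word | previous word).
corpus = "the cat sat on the mat the cat ate".split()

counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1

def p(nxt, prev):
    """Probability of `nxt` following `prev`, estimated from the corpus."""
    total = sum(counts[prev].values())
    return counts[prev][nxt] / total

# Chain of conditional probabilities for the phrase "the cat sat":
# P(cat | the) * P(sat | cat)
prob = p("cat", "the") * p("sat", "cat")
print(prob)
```

A real LLM conditions each word on the entire preceding context, not just one word, and over billions of parameters rather than a nine-word corpus, but the chaining of conditional probabilities is the same idea.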

Early LLMs relied solely on language patterns, making them prone to filling gaps with plausible but incorrect statements, known as hallucinations. Many modern systems mitigate this with retrieval-augmented generation (RAG), which retrieves relevant external sources and grounds the response in them, improving factual accuracy.
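A minimal sketch of the RAG idea, with invented documents and simple word overlap standing in for the vector search a real system would use:

```python
# Toy document store; real systems index thousands of documents
# as embedding vectors.
documents = [
    "The Transformer architecture was introduced in 2017.",
    "RAG grounds model answers in retrieved source documents.",
    "Canberra is the capital of Australia.",
]

def retrieve(question, docs):
    """Return the document sharing the most words with the question,
    a crude stand-in for similarity search in latent space."""
    q_words = set(question.lower().split())
    return max(docs, key=lambda d: len(q_words & set(d.lower().split())))

question = "What is the capital of Australia?"
context = retrieve(question, documents)
# The retrieved passage is prepended to the prompt so the model
# answers from the source rather than from memory alone.
prompt = f"Answer using this source: {context}\nQuestion: {question}"
print(context)
```

The crucial design choice is that the model is asked to answer *from the retrieved text*, which is what anchors the response to verifiable sources.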

The key point is that LLMs don’t capture, store and retrieve data and facts the way we do. They capture the knowledge (facts, data, context, associations, relationships) by encoding the way we use our knowledge in our language. They do this at a scale that we cannot contemplate or track, and it changes the way we access that knowledge.

We can ask a question of an LLM in natural language, called a prompt. Unlike a search engine, which retrieves pre-existing information, an LLM constructs its answer in real time. It does this by predicting the next most likely word one step at a time.

Initially the LLM breaks the prompt into separate tokens (words, parts of words or some other element) and embeds them into the latent space, based on context. It then compares the prompt against all the relationships it has learned, building up a complete response based on billions of patterns and relationships across massive amounts of data.

It uses probabilities to choose the most likely next word when generating its response, which introduces some creativity and diversity. That is why the same question asked twice doesn’t deliver an identical answer.
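That sampling step can be sketched with a toy next-word distribution (the candidate words and probabilities are invented for illustration):

```python
import numpy as np

# Imagined model output after the prompt "The capital of Australia is".
candidates = ["Canberra", "Sydney", "Melbourne"]
probs = np.array([0.85, 0.10, 0.05])

rng = np.random.default_rng()
# Sampling from the distribution, rather than always taking the top
# word, is why the same question can produce different answers.
words = [rng.choice(candidates, p=probs) for _ in range(5)]
print(words)
```

Most runs will print “Canberra” most of the time, but not every time, which is exactly the creativity and diversity described above.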

LLMs aren’t guessing blindly when they generate a response; they are drawing on immense volumes of internal knowledge. Many modern systems can also search external sources to verify facts, adjust style to the context of the prompt (summarising versus explaining, for example), and modify outputs based on dynamic user interaction.

The better the prompt, the better the response. A vague question leads to a vague answer. A precise, well-structured prompt guides the model toward higher-quality, more useful insights.

LLMs don’t actually use spreadsheets; they use matrices and vectors, mathematical structures that allow them to process information in bulk. For those not mathematically inclined, a vector is like a single column in a spreadsheet, representing a piece of data, and a matrix is like the spreadsheet itself.

Traditional computing operates sequentially, one calculation at a time: when you multiply numbers together, you do it step by step. LLMs, however, multiply entire matrices of data at once, operating on every cell in a vector or matrix simultaneously. Crunch. Done. Millions of parameters updated at once, thanks to parallel processing, neural networks and matrices.
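The difference can be sketched in a few lines of NumPy: one matrix multiplication transforms every token vector in a single operation, with no explicit loop over cells (the numbers are invented for illustration):

```python
import numpy as np

# A tiny 3x2 "layer" of model parameters.
weights = np.array([[0.2, 0.8],
                    [0.5, 0.5],
                    [0.9, 0.1]])

# Two token vectors, three features each.
tokens = np.array([[1.0, 0.0, 1.0],
                   [0.0, 1.0, 0.0]])

# Both token vectors are transformed in one matrix multiplication;
# hardware performs the cell-by-cell arithmetic in parallel.
out = tokens @ weights
print(out)
```

Scale this from a 3×2 matrix to billions of parameters and you have the core workload of an LLM, which is why they run on parallel hardware such as GPUs.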

LLMs operate at scale, processing complex relationships at unparalleled speed, but at a cost: huge storage requirements, massive computational power and energy-hungry data centres. GPT-3 holds some 175 billion parameters. Advances in hardware efficiency, optimisation techniques, and perhaps even quantum computing will drive costs down over time, as they have for all our technology advances.

Human reasoning is probabilistic. We don’t store huge databases in our brains. We make decisions based on intuition, judgment and with incomplete knowledge, drawing on past patterns and experience to predict what happens next.

LLMs capture our knowledge by encoding the way we refer to that knowledge in our language. They respond to prompts by unpacking the most likely pathway through their knowledge universe. In doing so they deal with gaps in knowledge, context and ambiguity, making best choices along the most probable path. Like us, they get things right most of the time, but not always.

Unlike us, though, LLMs have vastly more data, global patterns of knowledge and language, and an ability to analyse and connect issues almost instantly.

The opportunity for us to use LLMs to enhance our decision making is immense.

It’s time to go boldly where no one has gone before.
