In the last blog post, we argued that LLMs are a new type of computational unit: a general-purpose one. We discussed their core and emergent properties and observed how different they are from all other tools in the software engineering world. We concluded that they are a fundamentally new computational paradigm, which now requires a new set of frameworks and design patterns for building software.
In this blog post, we’ll dive deeper and start by defining what a computational unit is. Then we’ll see how to build a real-world software system following this new computational paradigm.
Since LLMs are self-contained computational units, let’s start with this as our layer of abstraction. Let’s generalize the concept of a computational unit such that it also encapsulates traditional software development and human-in-the-loop (HITL) systems.
Let’s assume that a computational unit can be any self-contained, executable operator which transforms a sequence of bits (input data) to another sequence of bits (output data), such as:
• LLMs
• Vector DB (e.g. processing input data and returning k-most similar examples in a DB)
• Python functions (i.e. mapping a list of input variables to return objects)
• Microservices APIs (i.e. mapping an input query to a return object)
• Humans (e.g. mapping input data and returning output data)
• Groups of humans (e.g. mapping input data and returning output data)
• …
This is a very general definition. Under it, even a chain of units is itself a computational unit. For example, two LLMs coupled in a chain form a single computational unit.
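To make the "a chain is itself a unit" point concrete, here is a minimal sketch. The `Unit` type alias and the `chain`, `upper` and `exclaim` names are my own illustrative assumptions, and the two toy functions merely stand in for real units such as LLMs.

```python
from typing import Callable

# Hypothetical type alias: a unit maps input bytes to output bytes.
Unit = Callable[[bytes], bytes]


def chain(*units: Unit) -> Unit:
    """Compose units left to right; the result is itself a Unit."""
    def chained(data: bytes) -> bytes:
        for unit in units:
            data = unit(data)
        return data
    return chained


def upper(data: bytes) -> bytes:
    # Toy unit standing in for something more interesting (e.g. an LLM call).
    return data.upper()


def exclaim(data: bytes) -> bytes:
    # Second toy unit: appends a single byte.
    return data + b"!"


shout = chain(upper, exclaim)
print(shout(b"hello"))  # b'HELLO!'
```

Because `chain` returns something of the same type as its arguments, chains can be nested inside other chains without any special handling.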
For simplicity, let’s further assume that each computational unit is stateless and deterministic: given the same input, it will always produce the same output. Under this framework, stateful data can be modeled as an input to a given computational unit (e.g. a database row is fed as input to the computational unit, and given the same row the unit will give the same output). Similarly, randomness can be modeled as an input (e.g. a random seed is passed in alongside the data, and given the same seed and data the unit is expected to always return the same output).
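As a rough sketch of treating randomness as an input, the toy unit below takes the seed as an explicit argument and uses no global random state. The `noisy_label` function and its scoring rule are invented purely for illustration.

```python
import random


def noisy_label(email_text: str, seed: int) -> str:
    """Toy 'probabilistic' unit whose randomness comes only from its inputs."""
    rng = random.Random(seed)  # local generator, so no hidden global state
    # Made-up score: a deterministic part plus seeded noise.
    score = (len(email_text) % 10) / 10 + rng.uniform(-0.1, 0.1)
    return "spam" if score > 0.5 else "not spam"


# Same text and same seed give the same output on every call.
first = noisy_label("Win a free cruise today", seed=42)
second = noisy_label("Win a free cruise today", seed=42)
print(first == second)  # True
```

State can be handled the same way: instead of reading a database inside the unit, the relevant row is passed in as an argument.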
This framework gives us a way to define and talk about computational units, but how can we connect them together to build software systems?
Every computational unit takes as input a sequence of bits and outputs another sequence of bits. Therefore, the connection (or the “link”) between computational units simply represents the flow of bits between them (the “flow of information” or the “flow of data”).
Now keep in mind that we’re operating in a probabilistic framework where the inputs at any given point in the system may involve randomness such as random errors. In addition, new information is generated throughout the system as different inputs are processed by different computational units. The more computations are executed, the more new information the system has generated.
The best possible decision on how to link any pair of computational units can be made only when the maximum amount of information is available to the system. In other words, the decision of how to link a pair of computational units should be made at the last possible moment, when the most information is available.
This leads to the further constraint that every computational unit in the system must output both 1) a sequence of bits (output data), and 2) the next computational unit to execute. This is a very general and powerful approach, because it enables each computational unit to change the flow of information in the system. In other words, the system has probabilistic links.
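One way to picture this constraint is a tiny runner in which every unit returns a pair: its output and the name of the next unit to execute (or `None` to stop). This is only a sketch under my own assumptions: the `RoutedUnit` shape, the `run` helper and the two toy units are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Assumed shape: a unit returns (output data, name of the next unit or None to stop).
RoutedUnit = Callable[[str], Tuple[str, Optional[str]]]


def run(units: Dict[str, RoutedUnit], start: str, data: str,
        max_steps: int = 100) -> Tuple[str, List[str]]:
    """Execute units one at a time, letting each unit pick its successor."""
    path: List[str] = []
    current: Optional[str] = start
    while current is not None:
        if len(path) >= max_steps:
            raise RuntimeError("too many steps; possible routing loop")
        path.append(current)
        data, current = units[current](data)
    return data, path


def check_length(text: str) -> Tuple[str, Optional[str]]:
    # The link is chosen only now, once the input has actually been seen.
    return text, ("truncate" if len(text) > 5 else None)


def truncate(text: str) -> Tuple[str, Optional[str]]:
    return text[:5], None


units = {"check_length": check_length, "truncate": truncate}
print(run(units, "check_length", "hello world"))  # ('hello', ['check_length', 'truncate'])
print(run(units, "check_length", "hi"))           # ('hi', ['check_length'])
```

The runner itself knows nothing about the graph; the links exist only in what each unit returns at execution time.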
Let me illustrate this with a concrete example. Let’s say I am exhausted by endless spam emails and want to filter them out. And let’s say I don’t like to check my email multiple times a day, but I still sometimes get important, urgent emails that I need to respond to quickly.
I can build an email assistant to solve this problem!
I’ll think about the problem at hand and dig into my toolbox of computational units to solve it!
First, we have an incoming email. How should the system handle this email? The important question here is whether this is a spam email to be ignored, a regular (non-spam) email to be read later, or an urgent (non-spam) email to be read now. If we know the answer to this question, then the rest of the system’s behavior is going to be simple to implement.
LLMs are general-purpose computational units and should be able to answer this question. So, we’ll take the incoming email and give it as an input to a computational unit which is an LLM. We’ll query the LLM to classify the email as spam or not spam. This is a well-defined problem and we can easily inspect whether it can distinguish spam from non-spam emails. Based on its output, this LLM will then decide where the information flows to next (i.e. which is the next computational unit to execute).
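Here is a minimal sketch of what this first unit could look like. I am assuming the LLM is available as a plain function from prompt text to answer text (`llm` below); the prompt wording, the unit names and the `fake_llm` stand-in are all hypothetical, and no real LLM API is being called.

```python
from typing import Callable, Tuple

# Hypothetical prompt; a real one would need tuning against test emails.
SPAM_PROMPT = (
    "Classify the following email as 'spam' or 'not spam'. "
    "Answer with exactly one of those two labels.\n\nEmail:\n{email}"
)


def classify_spam(email: str, llm: Callable[[str], str]) -> Tuple[str, str]:
    """Return (label, name of the next unit to execute)."""
    answer = llm(SPAM_PROMPT.format(email=email)).strip().lower()
    # Anything other than an explicit 'spam' counts as not spam, so an
    # unparseable answer never routes an email to deletion.
    label = "spam" if answer == "spam" else "not spam"
    next_unit = "delete_email" if label == "spam" else "classify_importance"
    return label, next_unit


def fake_llm(prompt: str) -> str:
    # Stand-in for a real model, used only to make the sketch runnable.
    return "Spam" if "lottery" in prompt.lower() else "Not spam"


print(classify_spam("You won the lottery!", fake_llm))  # ('spam', 'delete_email')
print(classify_spam("Meeting moved to 3pm", fake_llm))  # ('not spam', 'classify_importance')
```

Passing the model in as an argument keeps the unit easy to test, since a fake can be swapped in for the real LLM.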
If it’s a spam email, we should tell the email client to delete it. This part is so trivial that we can write a small Python function to do it for us. Given the email id, delete the corresponding email. In this case, the computational unit is simply a deterministic Python function.
If it’s not a spam email, then the next question is whether it’s an important email to be read now. This is also a well-defined task, which can be handled by an LLM as the computational unit. We’ll query the LLM to classify the email as important or not important. Although well-defined, this task is much more complex than detecting if it’s spam or not. Whether an email is important or not will depend on my personal preferences (e.g. who the sender is, what the topic is etc.). It will also depend on context around it (e.g. the day and time, other responses in the same email thread and whether it’s related to other emails). We can apply a number of design patterns to solve this problem, including:
• Engineering the LLM prompt to understand my personal preferences and context:
    • For example, adding into the LLM prompt my most important contacts
    • For example, adding into the LLM prompt the topics most important to me
• Few-shot learning based on examples of other urgent emails
• Fine-tuning the LLM with examples of other urgent emails
• Applying Chain of Thought (CoT) reasoning to decide if it’s urgent or not
We’ll discuss all of these design patterns later, but for now let’s consider it a single, self-contained computational unit.
Based on its classification of the email, this second LLM will now decide where the information flows to next. If it’s not an important email, then the system should do nothing and terminate.
If it’s an important email, then the information should flow to another computational unit which will somehow notify me. Let’s say I always want to be notified on Slack when an important email comes so that I can respond to it quickly. In this case, this last computational unit could be another, small Python function. This Python function will generate a query for a Slack Client and then have it send me a Slack message notifying me about the important email.
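Putting the four units together, the whole assistant could be wired up roughly as follows. This is a sketch under stated assumptions: the two classifiers are passed in as plain functions (in practice they would wrap LLM calls), and the `deleted` and `notified` lists stand in for the real email client and Slack client, neither of which is actually called here.

```python
from typing import Callable, Dict, List, Optional

Email = Dict[str, str]                   # assumed shape: {"id": ..., "body": ...}
Unit = Callable[[Email], Optional[str]]  # returns the next unit's name, or None to stop


def build_assistant(is_spam: Callable[[str], bool],
                    is_important: Callable[[str], bool],
                    deleted: List[str],
                    notified: List[str]) -> Dict[str, Unit]:
    def classify_spam(email: Email) -> Optional[str]:
        return "delete_email" if is_spam(email["body"]) else "classify_importance"

    def delete_email(email: Email) -> Optional[str]:
        deleted.append(email["id"])      # placeholder for the email client call
        return None

    def classify_importance(email: Email) -> Optional[str]:
        return "notify_slack" if is_important(email["body"]) else None

    def notify_slack(email: Email) -> Optional[str]:
        notified.append(email["id"])     # placeholder for the Slack client call
        return None

    return {"classify_spam": classify_spam, "delete_email": delete_email,
            "classify_importance": classify_importance, "notify_slack": notify_slack}


def handle(units: Dict[str, Unit], email: Email, start: str = "classify_spam") -> List[str]:
    """Route one email through the system and return the path it took."""
    path: List[str] = []
    current: Optional[str] = start
    while current is not None:
        path.append(current)
        current = units[current](email)
    return path


deleted: List[str] = []
notified: List[str] = []
units = build_assistant(lambda body: "lottery" in body, lambda body: "urgent" in body,
                        deleted, notified)
print(handle(units, {"id": "1", "body": "lottery winner"}))   # ['classify_spam', 'delete_email']
print(handle(units, {"id": "2", "body": "urgent: sign-off"}))  # ['classify_spam', 'classify_importance', 'notify_slack']
print(handle(units, {"id": "3", "body": "weekly digest"}))     # ['classify_spam', 'classify_importance']
```

The returned path is useful for debugging, since it records which units each email flowed through.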
Here’s the flowchart of such a system:
As you can see above, the computational units are represented by boxes, external components are represented by circles and the flow of information is represented by arrows.
Importantly, the flow of information is itself probabilistic. The first LLM may have a 90% probability of correctly classifying incoming emails as spam or non-spam. This means there is a 90% probability that information is routed to the correct next computational unit, but also a 10% probability that information is routed to an incorrect computational unit. This is important, because it captures the (probabilistic) decisions made by the different computational units and the fact that errors can cascade throughout the system.
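To see how errors cascade, here is a back-of-the-envelope calculation. The 90% figure comes from the example above; the 80% accuracy for the second LLM is a number I made up for illustration, and the calculation assumes the two units' errors are independent, which may not hold in practice.

```python
from typing import Iterable


def path_probability(step_accuracies: Iterable[float]) -> float:
    """Probability that every unit on a path decides correctly,
    assuming the units' errors are independent of each other."""
    probability = 1.0
    for accuracy in step_accuracies:
        probability *= accuracy
    return probability


# 0.9: first LLM (from the text); 0.8: second LLM (assumed for illustration).
p_notified = path_probability([0.9, 0.8])
print(round(p_notified, 2))  # 0.72
```

Under these assumptions, an important email reaches Slack only about 72% of the time, even though each unit looks reasonably accurate on its own.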
I can now easily look at the flowchart to determine how to build and improve my email assistant.
Among other things, the flowchart implies that test cases are needed for each computational unit. I can quickly determine what these test cases should be based on each computational unit’s impact on the rest of the system. In particular, for LLMs (and other probabilistic computational units), these would usually be statistical tests as we need to evaluate the system against a statistically representative set of inputs. The same way one might test an ML model against a test set of given input-output pairs, one can also test a computational unit.
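A statistical test for a single unit could be sketched like this: run the unit over a labeled test set and report its accuracy together with a rough 95% confidence interval (a normal approximation, which is only reasonable for fairly large test sets). The `evaluate` helper, the toy unit and the four example emails are all hypothetical.

```python
import math
from typing import Callable, List, Tuple


def evaluate(unit: Callable[[str], str],
             labeled_examples: List[Tuple[str, str]],
             z: float = 1.96) -> Tuple[float, float, float]:
    """Return (accuracy, lower bound, upper bound) on a labeled test set."""
    n = len(labeled_examples)
    if n == 0:
        raise ValueError("need at least one labeled example")
    correct = sum(1 for text, label in labeled_examples if unit(text) == label)
    accuracy = correct / n
    # Normal-approximation interval; crude for small n or extreme accuracies.
    half_width = z * math.sqrt(accuracy * (1 - accuracy) / n)
    return accuracy, max(0.0, accuracy - half_width), min(1.0, accuracy + half_width)


def toy_spam_unit(text: str) -> str:
    # Stand-in for the LLM-based classifier.
    return "spam" if "lottery" in text.lower() else "not spam"


test_set = [
    ("You won the lottery", "spam"),
    ("Lunch tomorrow?", "not spam"),
    ("Cheap pills online", "spam"),       # the toy unit gets this one wrong
    ("Quarterly report attached", "not spam"),
]
accuracy, low, high = evaluate(toy_spam_unit, test_set)
print(accuracy)  # 0.75
```

A test like this could then assert that the lower bound stays above whatever threshold the rest of the system can tolerate.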
Now, let’s say I’ve built the email assistant and after a few days find that many emails are incorrectly classified as “important”. This is annoying and slowing me down because now my Slack keeps pinging all the time about emails that aren’t important.
I can look at the flowchart and quickly identify the second LLM as the culprit. I can now decide to implement any of the approaches and design patterns from earlier: engineer the prompt to incorporate information about my personal preferences and context, apply few-shot learning, fine-tune the LLM, apply Chain of Thought (CoT) reasoning and so on.
Let’s suppose I wanted to apply a few-shot learning approach to improve the system. In this case, I might start labeling examples of important and unimportant emails and feed them back into the system with few-shot learning to improve its accuracy.
To accomplish this, I would add a new computational unit in the form of a “human annotator”, which takes a given email as input and outputs a label for it (“important” or “not important”). The email and its label are then stored in a Vector DB. When a new email comes in, the Vector DB is queried for the most similar emails and their labels, and these are provided as few-shot examples to the LLM that classifies emails as important or not important.
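The retrieval step can be sketched with a toy in-memory store. To keep the example self-contained, I am substituting word counts and cosine similarity for a real embedding model and Vector DB; `ToyVectorStore`, `embed` and the prompt format are all illustrative assumptions.

```python
import math
from collections import Counter
from typing import List, Tuple


def embed(text: str) -> Counter:
    """Toy embedding: word counts. A real system would use a learned embedding model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[word] * b[word] for word in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


class ToyVectorStore:
    """In-memory stand-in for a Vector DB holding human-labeled emails."""

    def __init__(self) -> None:
        self.items: List[Tuple[Counter, str, str]] = []

    def add(self, email: str, label: str) -> None:
        self.items.append((embed(email), email, label))

    def most_similar(self, email: str, k: int = 2) -> List[Tuple[str, str]]:
        query = embed(email)
        ranked = sorted(self.items, key=lambda item: cosine(query, item[0]), reverse=True)
        return [(text, label) for _, text, label in ranked[:k]]


def few_shot_prompt(store: ToyVectorStore, email: str, k: int = 2) -> str:
    """Build a prompt whose examples are the k most similar labeled emails."""
    lines = ["Classify the email as 'important' or 'not important'.", ""]
    for example, label in store.most_similar(email, k):
        lines += [f"Email: {example}", f"Label: {label}", ""]
    lines += [f"Email: {email}", "Label:"]
    return "\n".join(lines)


store = ToyVectorStore()
store.add("production server is down", "important")      # labels from the human annotator
store.add("weekly newsletter digest", "not important")
store.add("invoice payment overdue", "important")
print(few_shot_prompt(store, "the production server crashed", k=1))
```

The resulting prompt would then be sent to the second LLM in place of the original zero-shot prompt.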
Here is the new flowchart with the human annotator and Vector DB. The Vector DB computational unit actually has to read from and write to a DB, but for simplicity I have left this out of the diagram:
Notice how I achieved all of this by simply adding two new computational units into the system.
We’ve now seen how to apply this new computational paradigm to build real-world software applications, and how to think about computational units and the flow of information in the system.
In the next blog post, we’ll look at a set of design patterns for LLMs, which we can use to construct computational units and systems solving highly complex problems.