#1: What constitutes Artificial Intelligence?
This post is part of a series on the similarities and differences between natural and artificial intelligence. You can find the first introductory post here, which includes a description of all the topics that I’ll be covering: https://ishaan-b.medium.com/the-nature-of-intelligence-in-man-and-machine-a-series-c9b6c8a5e2a6
If you’re excited by this subject, feel free to follow my Medium profile here to stay up to date with new posts: https://ishaan-b.medium.com/subscribe
— —
At the outset, it’s important to establish a very clear, concise definition of artificial intelligence.
AI can be described as any non-biological system or agent that has the ability to act rationally. This definition also extends to how we think about natural intelligence, both in humans and the broader animal kingdom, as we will see in my next post. Rationality is, of course, decided and defined by humans, but can also usually be described in some mathematical terms. Acting is the other key operative word here: AI agents must be able to make decisions and/or produce output. And this output is the basis on which we decide whether the artificial system is ‘intelligent’.
In fact, a 4-quadrant figure capturing this framing is displayed during the very first lecture of the graduate course on Artificial Intelligence at UPenn’s computer science department. I spent the last few months sitting in on that course without any formal enrollment, just to better understand the academic view of how to model and create AI systems.
Armed with this definition of artificial intelligence, it is now clear to us what AI is not:
- It is not about self-awareness, meta-awareness, and the like. Metaphysical awareness, though it stems from intelligence, is not the same thing as intelligence.
- It is not about creating human-like robots. Intelligence is not an intrinsically human trait. Robotics is an active and interesting field, and AI has many applications within robotics, but AI does not exist for the purpose of creating human-likeness.
- It is not defined by the process of decision-making, or the way it arrives at a ‘rational action’. This is critical to note. AI is defined by the outcome, not the process. We have many methods at our disposal to create forms of intelligence that produce the same outcomes as humans. We can analyze how these AI systems work and compare that to the way natural intelligence works; in fact, this is an active area of research for many computer scientists and cognitive scientists. We can look for inspiration from the human mind in how to design AI systems, and we have done so on multiple occasions. But the process itself is not what makes the artificial system ‘intelligent’. What makes the system intelligent is a rational outcome.
How to create artificial intelligence
What exactly are the methods that we can use to create artificial intelligence? The diagram below illustrates the various methodologies at our disposal, the models we can create using them, and the outcomes and tools that we produce. These final tools are the AI systems that a layman would interact with.
This flowchart is illustrative and non-exhaustive, and certainly lacks technical rigor. The reason I created it is to show the wide range of systems that can be called ‘artificial intelligence’, and the various approaches to build them.
The methodologies on the left can be combined in different ways to form specific types of AI models. Logical sets of rules are the simplest method for creating AI, through ‘reasoning machines’. Planning and search agents execute these rules to arrive at an output. One example is a tic-tac-toe computer opponent that follows a fixed set of rules to decide the best square to play next.
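To make this concrete, here is a minimal sketch of such a rule-following tic-tac-toe opponent in Python. The rules and their ordering are my own illustrative choice, not from the post or the course:

```python
# A minimal rule-based tic-tac-toe opponent (illustrative sketch).
# The board is a list of 9 cells, indexed 0-8: 'X', 'O', or None.

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
         (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
         (0, 4, 8), (2, 4, 6)]              # diagonals

def choose_move(board, me='O', opponent='X'):
    """Pick a square via ordered rules: win, block, center, corner, anything."""
    def winning_move(player):
        # Find a line where `player` has two marks and one empty square.
        for a, b, c in LINES:
            cells = [board[a], board[b], board[c]]
            if cells.count(player) == 2 and cells.count(None) == 1:
                return (a, b, c)[cells.index(None)]
        return None

    move = winning_move(me)            # Rule 1: complete our own line
    if move is None:
        move = winning_move(opponent)  # Rule 2: block the opponent's line
    if move is None and board[4] is None:
        move = 4                       # Rule 3: take the center
    if move is None:
        # Rule 4: prefer a corner, otherwise any free square
        for i in (0, 2, 6, 8) + tuple(range(9)):
            if board[i] is None:
                move = i
                break
    return move
```

Given a board where X holds squares 0 and 1, the agent blocks at square 2; on an empty board it takes the center. Every decision is traceable to one explicit rule, which is exactly what makes this kind of agent easy to inspect but hard to scale.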
Knowledge-bases (KBs) are incorporated into AI agents that need a representation of the world in which they’re operating. KBs can be added to search algorithms when the algorithm needs to know how the world is structured in order to make intermediate decisions. The A* algorithm is a ubiquitous example in computer science: it finds the shortest path between two nodes on a graph, using built-in knowledge of the problem (a heuristic estimate of the remaining distance) to guide its intermediate decisions. Its applications range from pathfinding in video games to parsing sentences in some natural language processing (NLP) use cases.
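Here is a generic A* sketch (my own illustrative code, not from the post), applied to pathfinding on a small grid with a Manhattan-distance heuristic:

```python
import heapq

def a_star(start, goal, neighbors, heuristic):
    """A* search: expand nodes in order of f(n) = g(n) + h(n),
    where g is the cost so far and h is an estimate of the cost to the goal."""
    frontier = [(heuristic(start), 0, start, [start])]
    best_g = {start: 0}
    while frontier:
        f, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return path, g
        for nxt, cost in neighbors(node):
            new_g = g + cost
            if new_g < best_g.get(nxt, float('inf')):
                best_g[nxt] = new_g
                heapq.heappush(frontier,
                               (new_g + heuristic(nxt), new_g, nxt, path + [nxt]))
    return None, float('inf')

# Example: shortest path across a 5x5 grid of unit-cost moves.
def neighbors(p):
    x, y = p
    return [((x + dx, y + dy), 1)
            for dx, dy in [(1, 0), (-1, 0), (0, 1), (0, -1)]
            if 0 <= x + dx < 5 and 0 <= y + dy < 5]

path, cost = a_star((0, 0), (4, 4), neighbors,
                    lambda p: abs(p[0] - 4) + abs(p[1] - 4))
```

The heuristic is where the “knowledge of the world” lives: as long as it never overestimates the true remaining cost, A* is guaranteed to return a shortest path.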
We use probabilistic frameworks to help models make decisions in non-deterministic situations. Two examples are Markov Decision Processes (MDPs) and Bayes Nets. An MDP models a system that we have some control over, though the outcomes of our actions are not always certain: it describes the states a system can be in, the actions available in each state, the probabilities of moving between states, and the rewards received along the way. MDPs are often used alongside machine learning algorithms to implement reinforcement learning, which involves trying actions, observing outcomes, getting rewards based on how ‘good’ each outcome is, and adjusting your strategy to maximize rewards. Bayes Nets are another way of representing probabilistic scenarios, used to model systems that evolve outside our control but whose structure and influencing factors we partly understand. A Bayes Net converts a joint probability distribution over many variables into a single directed acyclic graph, where each node depends only on its parent nodes via a conditional probability. Thus we can replace long chains of dependence with local, conditional ones. This is an extremely powerful tool, and Bayesian ideas also appear alongside neural nets in some deep-learning models. In fact, it is often said that Bayes’ theorem of conditional probability (yes, the same one that everyone studies in high school statistics) is the most important equation in all of artificial intelligence.
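As a concrete sketch of how an MDP gets solved, here is value iteration on a tiny made-up two-state MDP. The states, actions, and numbers are all hypothetical, chosen only to illustrate the mechanics:

```python
# Toy MDP: a machine that can run 'slow' (safe, small reward) or 'fast'
# (bigger reward, but risks overheating). All numbers are invented for illustration.
# transitions[state][action] = list of (probability, next_state, reward).
transitions = {
    'cool': {'slow': [(1.0, 'cool', 1)],
             'fast': [(0.5, 'cool', 2), (0.5, 'hot', 2)]},
    'hot':  {'slow': [(1.0, 'cool', 1)],
             'fast': [(1.0, 'hot', -10)]},
}

def value_iteration(transitions, gamma=0.9, iters=100):
    """Repeatedly apply the Bellman update:
    V(s) = max over actions a of sum over outcomes of P * (reward + gamma * V(s'))."""
    V = {s: 0.0 for s in transitions}
    for _ in range(iters):
        V = {s: max(sum(p * (r + gamma * V[s2])
                        for p, s2, r in transitions[s][a])
                    for a in transitions[s])
             for s in transitions}
    return V
```

Running `value_iteration(transitions)` converges to values where the ‘cool’ state is worth more than the ‘hot’ one, reflecting the risk of the reckless action. Reinforcement learning tackles the same objective when the transition probabilities and rewards are not known in advance and must be discovered by trial.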
Machine learning is used to create a model from data/experience, as opposed to using an already-specified model to make optimal decisions. Filtering and classification systems are examples of models created with machine learning from ‘labeled’ training data. Such models learn to perform tasks like spam filtering or fraud detection from the examples in the labeled training dataset; this method is called ‘supervised learning’. It differs from the reinforcement learning mentioned earlier, which is also a form of ML but involves interacting with the environment and learning from the outcomes. Let’s compare Deep Blue and AlphaGo to see where machine learning, including reinforcement learning, comes into play.
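As a minimal sketch of supervised learning, here is a toy Naive Bayes spam filter trained on a handful of hand-labeled examples. The training data and the whitespace word-splitting are deliberately simplistic; a real filter would use far more data and preprocessing:

```python
from collections import Counter
import math

# Hypothetical labeled training examples (text, label).
train = [
    ("win free money now", "spam"),
    ("free prize claim now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on monday?", "ham"),
]

def fit(examples):
    """Count words per label: this is the entire 'training' step."""
    counts = {"spam": Counter(), "ham": Counter()}
    labels = Counter()
    for text, label in examples:
        labels[label] += 1
        counts[label].update(text.split())
    return counts, labels

def predict(text, counts, labels):
    """Score each label by log P(label) + sum of log P(word | label),
    with add-one smoothing so unseen words don't zero out a label."""
    vocab = set(counts["spam"]) | set(counts["ham"])
    best, best_score = None, -math.inf
    for label in counts:
        score = math.log(labels[label] / sum(labels.values()))
        total = sum(counts[label].values())
        for w in text.split():
            score += math.log((counts[label][w] + 1) / (total + len(vocab)))
        if score > best_score:
            best, best_score = label, score
    return best

counts, labels = fit(train)
```

The model never sees a rule like “the word free means spam”; it infers word-label associations purely from the labeled examples, which is the essence of supervised learning.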
Deep Blue was an AI chess player developed by IBM in the mid-1990s, while AlphaGo was created by DeepMind (a Google subsidiary) to play the ancient Chinese game of Go. In 1997, Deep Blue captured the world’s imagination by becoming the first machine to beat a reigning chess world champion, Garry Kasparov, in a full match. It used game-tree search across vast numbers of possible move sequences, guided by a handcrafted evaluation function encoding expert chess knowledge, to identify the best move. Deep Blue was considered a monumental breakthrough in artificial intelligence. But machine learning was in its infancy at the time, and was not much help in building it. When it comes to the game of Go, however, the sample space is orders of magnitude larger than that of chess, due to a much bigger board and many more possible moves. Additionally, it is much more difficult to evaluate the strength of a mid-game board position in Go. Thus it becomes extremely complex (not to mention inaccurate) to represent the decision-making process through a set of handcrafted rules. Instead, AlphaGo was created using machine learning: it ‘learned’ from a huge dataset of past Go games to understand the various board positions, the types of decisions that can be made in each position, and the types of outcomes that can emerge from those decisions. It was then further trained through reinforcement learning, playing millions of games against itself to hone its decision-making. This learning process did not use pre-determined logical frameworks like Deep Blue’s; it was accomplished with neural networks.
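The classical game-tree idea behind engines like Deep Blue can be sketched in a few lines. This is a bare-bones, depth-limited minimax with a plug-in evaluation function; Deep Blue’s actual engine was a vastly more elaborate alpha-beta search running on custom hardware:

```python
# Generic depth-limited minimax (illustrative sketch of classical game-tree search).
def minimax(state, depth, maximizing, moves, apply_move, evaluate):
    """Return the best score reachable from `state`, looking `depth` plies ahead.
    The maximizing player picks the highest score; the opponent picks the lowest."""
    legal = moves(state)
    if depth == 0 or not legal:
        return evaluate(state)  # leaf: fall back to the evaluation function
    scores = (minimax(apply_move(state, m), depth - 1, not maximizing,
                      moves, apply_move, evaluate)
              for m in legal)
    return max(scores) if maximizing else min(scores)

# Example: a two-ply "game" encoded as nested lists of leaf scores.
def moves(s):       return list(range(len(s))) if isinstance(s, list) else []
def apply_move(s, m): return s[m]
def evaluate(s):    return s

best = minimax([[3, 5], [2, 9]], depth=2, maximizing=True,
               moves=moves, apply_move=apply_move, evaluate=evaluate)
```

The maximizer picks the branch whose worst case, after the minimizer responds, is best. The hard part in a real engine is the evaluation function at the leaves, which is precisely what is intractable to hand-craft for Go.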
Neural networks are a method of analyzing vast amounts of data in layered structures that enable parallel processing to identify ‘deeper’ patterns. I will dedicate an entire post later in this series just to the topic of neural nets, but for now it’s helpful to know that they are used for deep learning, a form of machine learning in which the machine interprets data with minimal direction from us. The connections between nodes in the network carry millions, or sometimes billions, of parameters that the deep-learning model fine-tunes by itself throughout the training process to increase the accuracy of its predictions.
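To see what “fine-tuning parameters” means in miniature, here is a single artificial neuron with just two parameters, trained by gradient descent to fit the line y = 2x + 1. This is a toy sketch; real networks stack millions of such units in layers and add non-linearities:

```python
import random

# One neuron, two trainable parameters (a weight and a bias), fitting y = 2x + 1.
random.seed(0)
w, b = random.random(), random.random()   # start from arbitrary values
lr = 0.05                                 # learning rate: size of each adjustment

data = [(x, 2 * x + 1) for x in [-2, -1, 0, 1, 2]]  # training examples

for _ in range(2000):                     # repeated passes over the data
    for x, y in data:
        pred = w * x + b                  # forward pass: the neuron's prediction
        err = pred - y                    # how wrong the prediction was
        w -= lr * err * x                 # nudge each parameter to reduce the error
        b -= lr * err
```

After training, `w` and `b` end up close to 2 and 1: nobody told the model those numbers, it discovered them by repeatedly adjusting its own parameters against the data. Deep learning is this same loop, scaled up by many orders of magnitude.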
The first simple language models were created using probabilistic frameworks and NLP techniques: they predicted the next word based on the words that preceded it, by referencing the text in some original dataset. Large language models (LLMs), on the other hand, use deep learning to make much more complex predictions that generalize beyond what is seen in the training data: LLMs capture semantic relationships and can generalize to related words or word sequences.
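A toy version of those early statistical language models fits in a few lines: a bigram model that predicts the next word from the single word before it. The corpus here is made up, and real n-gram models used huge corpora plus smoothing techniques:

```python
from collections import Counter, defaultdict

# Tiny made-up corpus; a real model would train on millions of sentences.
corpus = "the cat sat on the mat . the dog sat on the rug .".split()

# Count which words follow which: following[prev][next] = count.
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def predict_next(word):
    """Return the word most often seen after `word`, or None if never seen."""
    return following[word].most_common(1)[0][0] if following[word] else None

# predict_next("sat") -> "on", because "on" always follows "sat" in the corpus
```

The model can only echo sequences it has counted; ask it about a word it has never seen and it has nothing to say. That brittleness is exactly what the deep-learning approach of LLMs overcomes.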
As we can see, these methodologies are not mutually exclusive; they can be combined to produce models capable of different types of actions. For example, DALL·E (OpenAI’s text-to-image generation tool) combines features of NLP, LLMs, and another method known as diffusion modeling. But there is one significant exception: planning/logical agents have proven very hard to combine with deep-learning neural network models. This has an important implication, which I will discuss in detail when comparing natural and artificial intelligence.
I could have spent this entire post just talking about deep learning. It is the foundation of the current wave of generative AI, and the area that has seen the most advancement in recent years. In fact, whenever anyone discusses AI tools today, they are primarily referring to gen-AI applications. And whenever anyone discusses AI itself, they are often just referring to machine learning. That said, AI models were developed long before machine learning became mainstream. While ML is entirely predictive/statistical in nature, models based on symbolic logic are useful for deterministic processes (where there is only one correct outcome, dictated by a universal set of rules). That is why we have struggled to combine these two approaches, statistical and deterministic, within the same AI algorithm.
I had originally written more detailed explanations and examples for some of the methodologies, such as a sample Bayes Net for a collection of variables with some joint probabilities. But I removed those descriptions to keep this post as lean as possible, so as not to distract from the main point: AI is not one monolithic block. AI consists of multiple methodologies for creating machines that can reach certain outcomes, almost always built with specific applications or tasks in mind (so far). This is worth emphasizing in the wake of today’s machine-learning-led gen-AI revolution. As we turn our attention next towards natural intelligence, we must keep this whole universe of AI approaches in mind in order to compare it with the different types of human reasoning.