1. Foreword
The Covid-19 virus has been spreading globally for three years, with mutated strains still prevalent worldwide. Compared to when the virus first appeared three years ago, its infectivity has significantly increased, while its lethality and pathogenicity have decreased. According to the principles of viral genetics, greater infectivity and reduced pathogenicity facilitate the spread of viruses. This year may not mark the end of Covid-19 mutations; new variants are likely to continue emerging in human populations. It is not only viruses that undergo mutations—other organisms such as bacteria, fungi, plants, and animals also experience genetic variations. Why does life mutate? There is a mathematical principle underlying this biological phenomenon. This article will explore the important computational significance of heredity and mutation in the biological world, beginning with the Travelling Salesman Problem (TSP), drawing on the insights of Yongle Li.
2. Travelling Salesman Problem (TSP)
The Travelling Salesman Problem (TSP) is a well-known combinatorial optimisation problem in computer science and operations research. The goal of TSP is to find the shortest route that visits a given set of cities and returns to the starting point, visiting each city only once. TSP has numerous practical applications, including vehicle routing, logistics planning, and facility localisation.
For example, imagine four cities—[A, B, C, D]. The distance from A to B is 3 units, from B to C is 7 units, and from A to D is 5 units, and so on. The challenge is to determine the shortest route that visits all cities exactly once.
TSP is classified as NP-hard, meaning that the time complexity of finding the optimal solution increases rapidly with the number of cities. Consequently, finding an optimal solution for TSP is challenging and computationally expensive, especially for large instances. Two commonly used methods to solve TSP are the brute force search algorithm and the greedy algorithm.
(1) Brute Force Algorithm
The brute force search algorithm, also known as exhaustive search, solves TSP by generating all possible combinations of cities and selecting the shortest route.
This algorithm finds the optimal solution by listing all possible routes and comparing them. For example, using the figure above, the algorithm would list all the routes (assuming it starts from city A):
This produces six paths with three distances—35, 39, and 28 units, respectively. Using brute force, every possible combination is considered, which is feasible only for a small number of cities (nodes).
For n
cities, how many routes are possible? Starting from one city, there are (n-1)
options for the next city, then (n-2)
for the following, and so on until there is only one city left. Considering the reversal symmetry of routes, the total number of strategies is half the factorial of n
:
For increasing values of n
, the number of possible routes grows rapidly, making the time complexity of this method exponential—making it impractical for large instances of TSP. While the brute force algorithm is accurate, it is very slow and unsuitable for more complex problems.
(2) Greedy Algorithm
The greedy algorithm is a heuristic method that starts from an initial city and selects the nearest unvisited city as the next step. This process repeats until all cities are visited, and then the algorithm returns to the starting point. The greedy algorithm makes a locally optimal decision at each step, prioritising the shortest available distance.
For example, as shown above, the greedy algorithm selects the shortest distance at each step, but the overall route is not optimal. The local optimum may not represent the global optimum. Although the greedy algorithm provides reasonable solutions for small instances of TSP, it cannot guarantee an optimal solution for larger problems. A greedy algorithm is fast, but not always accurate. This leads to the question: Can an algorithm combine the advantages of brute force search and greedy strategies to find an optimal solution both efficiently and accurately? The human pursuit of such a method led to the development of the genetic algorithm
.
3. Genetic Variation
Genetic variation is a key concept in evolutionary algorithms, including genetic algorithms. It refers to the random variations in individual characteristics within a population across generations. These variations are necessary for a population to evolve and adapt to its environment.
Charles Darwin’s theory of natural selection states that biological species evolve over time through a selection process.
Selection occurs because individuals within a population have varying success in reproducing and passing on their genes to future generations. Better-adapted individuals are more likely to survive and reproduce, while less-adapted individuals are less successful. Over time, these small changes accumulate, leading to changes in the overall characteristics of the population.
The theory is often summarised as “survival of the fittest” because individuals best suited to their environment are more likely to survive and pass on their genes.
Gregor Mendel, considered the father of modern genetics, conducted experiments on pea plants and discovered fundamental principles of inheritance. He found that traits are passed from generation to generation through segregation and recombination of genes. Thomas Hunt Morgan, an American geneticist, studied fruit flies and found that genes are located on specific chromosomes and that genetic recombination occurs when chromosomes exchange genetic material. This leads to new combinations of genes, resulting in evolutionary changes.
In conclusion, the work of Mendel and Morgan is crucial to understanding inheritance, evolution, and genetic mutation.
Here, we discuss gene recombination (crossover) and mutation, using biological concepts to explain these phenomena:
Genetic recombination involves the exchange of genetic material between non-sister chromatids of homologous chromosomes. Homologous chromosomes align in pairs, forming a tetrad of four chromatids. During this phase, chromatids exchange genetic material, creating unique gene combinations. This recombination can occur multiple times within a homologous chromosome pair, producing diverse genetic combinations.
The above explanation provides an overview of genetic variation from a molecular perspective, primarily involving the separation, recombination, and swapping of genes.
4. An Overview of Genetic Algorithms (GA)
Genetic algorithms (GAs) are computational simulations of natural selection and evolution, used to solve optimisation problems. They mimic how organisms evolve across generations through mutation, selection, and recombination.
GAs begin with a population of randomly generated individuals, each representing a potential solution to an optimisation problem. These individuals are evaluated using a fitness function to determine their ability to solve the problem. The fittest individuals are selected for breeding, and their genetic material is combined to produce offspring. This process continues over multiple generations, resulting in a population with progressively higher fitness.
As illustrated above, the genetic algorithm starts with initialising the population, then calculates each individual’s fitness, and processes genetic variations to produce offspring. This cycle continues until a stopping condition is met.
Genetic algorithms have applications across artificial intelligence, machine learning, finance, and engineering. They are used to create AI agents for games, optimise neural networks, and tune machine learning parameters. In finance, GAs help optimise investment portfolios and trading strategies, while in engineering, they optimise designs for structures and electrical circuits.
The advantage of genetic algorithms lies in their ability to find globally optimal solutions even in complex, non-linear problems, making them suitable for a wide range of applications such as scheduling, resource allocation, and route optimisation.
5. Application of Genetic Algorithm to TSP
(1) Ideas and Principles
Applying GAs to TSP involves representing city routes as chromosomes, with each city corresponding to a gene. The fitness of each route is evaluated based on the total path length. The fittest individuals are selected for breeding, and genetic information is combined to produce offspring. Mutation is introduced to add diversity.
Suppose there are six cities, and a travelling salesman wants to visit all of them. There are 60 potential routes (5!/2 = 60
). Instead of evaluating all routes, we use the genetic algorithm approach. We begin by creating an initial population—e.g., four individuals representing different paths.
S1 = [1, 2, 3, 4, 5, 6]
S2 = [1, 3, 2, 6, 5, 4]
S3 = [1, 4, 2, 6, 3, 5]
S4 = [1, 6, 2, 5, 3, 4]
These paths are initial random paths, similar to primitive life forms. We then calculate the fitness of each path. Let’s assume the cost (or distance) of each path is as follows:
- S1: 10
- S2: 20
- S3: 20
- S4: 50
The fitness value, f
, is given by f = 1/d
(where d
is the distance). Therefore, the fitness of each path is:
- S1: 0.1
- S2: 0.05
- S3: 0.05
- S4: 0.02
Next, we use a probabilistic approach to determine which paths are selected for inheritance. The probability of inheritance for each path is proportional to its fitness, which can be visualised as segments on a roulette wheel:
As shown above, each section of the roulette wheel represents a path’s probability of selection, with larger areas indicating higher fitness. Spinning the wheel selects paths probabilistically, ensuring that fitter paths are more likely to be chosen but not guaranteed.
Once two parents are selected, crossover and mutation are applied. For crossover, DNA fragments of S1 and S2 are swapped at a random point, generating new offspring paths S1’ and S2’. Mutation involves randomly altering the order of cities in a path. This helps maintain diversity and prevents premature convergence.
This process repeats—evaluating the fitness of offspring, selecting parents, and generating the next generation—until a sufficiently optimal solution is found.
Overall, TSP is a challenging optimisation problem with many practical applications. The brute force and greedy algorithms may be effective for small instances but are less suitable for larger cases. Genetic algorithms, inspired by natural selection, offer a promising solution, though they can be computationally expensive for larger problems.
(2) Code Implementation
A random population of potential solutions is initially generated, and genetic operators like mutation and crossover are applied iteratively until an optimal solution is found. The code below is generated using ChatGPT:
1 | import random |
Copyright Notice
This article, except for the referenced content below, is the original work of Junhao. The author retains the exclusive rights to its final interpretation. If there are any issues regarding copyright infringement, please contact me for removal. Reproduction or distribution of this content without my explicit permission is prohibited.
6. References
[1]. Biology LibreTexts. (2018). 7.6: Genetic Variation. article.
[2]. Clever algorithms : nature-inspired programming recipes. United Kingdom: Lulu.com.Li, Y. (2022).
[3]. Why do viruses keep mutating? Traveling salesman problem and genetic algorithm. [online] Yongle Li. Available at: https://youtu.be/iM-VKHWd_JE [Accessed 9 Feb. 2023].
[4]. OpenAI (nd). ChatGPT. [online] chatGPT. Available at: https://chat.openai.com/chat.