1. Foreword
The Covid-19 virus has been spreading around the world for three years, and mutated strains are still raging across the globe. Compared with when the virus first appeared, its infectivity has greatly increased while its lethality and pathogenicity have both decreased. By the general rules of viral evolution, stronger infectivity combined with lower pathogenicity is more conducive to a virus's spread. This year will not necessarily see the end of Covid-19's mutation, and new variants will keep appearing in human society. In fact, it is not just viruses that mutate: variation also occurs in bacteria, fungi, plants, animals and other higher organisms. Why does life mutate? Behind this biological question lies a mathematical principle. Starting from the traveling salesman problem (TSP), and drawing inspiration from Yongle Li's explanation, this article will gradually explain the significance of heredity and variation for computer science.
2. Traveling Salesman Problem (TSP)
The traveling salesman problem (TSP) is a well-known combinatorial optimization problem in computer science and operations research. The goal of TSP is to find the shortest route that visits every city in a given set exactly once and returns to the starting city. TSP has many practical applications, including vehicle routing, logistics, and facility location.
For example: suppose there are four cities [A, B, C, D], where A to B is 3 distance units, B to C is 7 distance units, A to D is 5 distance units, and so on. The salesman must traverse all cities, visiting each city exactly once. Which route should he choose to make the total distance the shortest?
TSP is NP-hard, which means that no known algorithm finds the optimal solution in polynomial time: the cost of exact search grows rapidly with the number of cities. Finding the optimal solution for TSP can therefore be a challenging and time-consuming task, especially for large instances. Two commonly used methods for attacking TSP are the brute force search algorithm and the greedy algorithm.
(1). Brute force algorithm
The brute force search algorithm, also known as brute force or exhaustive search, is a method of solving TSP by generating all possible combinations of cities and selecting the shortest one.
The brute force search algorithm finds the optimal solution by listing all the routes and then comparing them. For the four-city example in the figure above, the algorithm lists every route (assuming the search starts from city A):
This quickly yields six paths. Since a route and its reverse have the same length, these collapse into three distinct strategies, with total distances of 35, 39 and 28. In this way every possible route can be enumerated exhaustively. This works when there are only a few cities (nodes).
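As a rough sketch, the exhaustive search can be written in a few lines of Python. Only the distances A-B = 3, B-C = 7 and A-D = 5 are given in the text; the remaining values below are assumptions chosen so that the three route totals come out to 28, 35 and 39 as in the figure.

from itertools import permutations

# Pairwise distances; A-C, B-D and C-D are assumed values, not from the original text.
dist = {('A', 'B'): 3, ('B', 'C'): 7, ('A', 'D'): 5,
        ('A', 'C'): 11, ('B', 'D'): 12, ('C', 'D'): 13}

def d(x, y):
    return dist.get((x, y), dist.get((y, x)))

def tour_length(route):
    # Total length of the closed tour, returning to the starting city.
    return sum(d(route[i], route[(i + 1) % len(route)]) for i in range(len(route)))

# Enumerate every ordering that starts from A and keep the shortest closed tour.
routes = [('A',) + p for p in permutations('BCD')]
best = min(routes, key=tour_length)
print(best, tour_length(best))   # ('A', 'B', 'C', 'D') 28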
Now suppose there are n cities: how many strategies are there? Starting from one city, there are (n-1) options for the first move, (n-2) for the second, and so on, until only 1 choice remains for the last move, giving (n-1)! orderings. Because a route and its reverse are the same strategy, we divide the factorial by 2:

(n-1)!/2
Try a few values of n and you will quickly find that the number of routes grows factorially, even faster than exponentially, so this method is almost infeasible for large instances of TSP. The algorithm is exact, but very slow: as soon as the problem becomes slightly larger, the optimal solution can no longer be obtained in reasonable time.
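A quick check of how the number of distinct round trips, (n-1)!/2, explodes as n grows (the sample values of n are arbitrary):

from math import factorial

for n in [4, 6, 10, 15, 20]:
    print(n, factorial(n - 1) // 2)
# 4 -> 3, 6 -> 60, 10 -> 181440, 15 -> 43589145600, 20 -> 60822550204416000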
(2). Greedy algorithm
The greedy algorithm is a heuristic method that starts from an initial city and selects the nearest unvisited city as the next move. The algorithm continues until all cities have been visited and then returns to the starting city. At every step, the greedy algorithm chooses the locally optimal move: each individual step is the best available.
For example, as shown above, the greedy algorithm picks the shortest available edge at every step, yet the overall distance turns out to be the highest: a local optimum does not necessarily lead to the global optimum. Although a greedy algorithm can provide good solutions for small instances of TSP, it cannot guarantee optimal solutions for large instances. A greedy algorithm finds a solution quickly, but not accurately. Is there an algorithm that combines the accuracy of brute force search with the speed of the greedy approach, so that a near-optimal solution can be found both quickly and reliably? By imitating the evolution of living creatures, researchers soon arrived at the genetic algorithm.
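A minimal sketch of this greedy (nearest-neighbour) strategy, assuming the cities are numbered 0 to n-1 and pairwise distances are given in a matrix (the function name and setup are illustrative):

def greedy_tsp(dist, start=0):
    # From the current city, always move to the closest city not yet visited.
    n = len(dist)
    route = [start]
    unvisited = set(range(n)) - {start}
    while unvisited:
        current = route[-1]
        nearest = min(unvisited, key=lambda city: dist[current][city])
        route.append(nearest)
        unvisited.remove(nearest)
    return route  # the salesman finally returns from route[-1] to start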
3. Genetic Variation
Genetic variation is a key concept in evolutionary algorithms (including genetic algorithms). It refers to the random variation in the characteristics of individuals in a population over generations. This variation is necessary for a population to evolve and adapt to its environment.
Charles Darwin's theory of natural selection states that biological species evolve over time through a process of selection.
This selection occurs because individuals in a population will have varying degrees of success in reproducing and passing their genes on to the next generation. Individuals that are better adapted are more likely to survive and reproduce, while those less adapted will struggle to survive and have fewer offspring. Over time, these small changes accumulate, causing the overall characteristics of the population to change.
The theory is often referred to as “survival of the fittest” because it is those individuals that are best suited to their environment that are more likely to survive and pass their genes on to the next generation.
At the heart of Darwin’s theory of natural selection is the idea that species evolve over time through a process of selection based on changes in characteristics and the ability of individuals to survive and reproduce in their environment.
Gregor Mendel is considered the father of modern genetics. He performed a series of experiments on pea plants and discovered the fundamentals of inheritance, including the fact that traits are passed from one generation to the next through the segregation and recombination of genes. He found that genes exist in pairs and that each parent contributes one gene to the offspring. Thomas Hunt Morgan was an American biologist and geneticist who conducted groundbreaking research on the role of chromosomes in heredity. Studying the fruit fly Drosophila melanogaster, he found that genes are located on specific chromosomes and that mutations in these genes lead to changes in the organism's traits. He also showed that genetic recombination occurs when homologous chromosomes pair up and exchange genetic material. This process generates new combinations of genes, driving the evolution of new traits and the emergence of new species.
In conclusion, Mendel’s and Morgan’s research on genetic recombination is crucial to the fundamentals of inheritance and the role of genes in the inheritance of traits. The findings provide a greater understanding of how organisms evolve over time and how genetic mutations play a role in the development of new species.
Here we will talk about gene recombination (crossover) and gene variation, and will cite a lot of biological knowledge to explain:
The process of genetic recombination is the exchange of genetic material between non-sister chromatids of homologous chromosomes. Homologous chromosomes line up in pairs, gene by gene along their entire length, forming a configuration of four chromatids called a tetrad. At this point the chromatids are in close contact, and segments of two non-sister chromatids break off and reattach at the same position on the homologous chromosome; in other words, they cross over. This exchange of genetic material can occur multiple times within the same pair of homologous chromosomes, creating unique combinations of genes.
Image reference: Biology LibreTexts (2018). 7.6: Genetic Variation. [online]
At this stage, the tetrads move to the metaphase plate, with kinetochores facing opposite poles. The homologous pairs orient themselves randomly at the equatorial plate, and this random orientation is the second mechanism by which variation is introduced into gametes or spores: in every cell that undergoes meiosis, the tetrads may line up differently. The amount of variation depends on the number of chromosomes that make up a set. Since each pair can be positioned at the plate in two ways, the number of possible alignments equals 2^n, where n is the number of chromosomes in a set. Given these two mechanisms, it is very unlikely that any two haploid cells produced by meiosis will have the same genetic composition.
The random, independent assortment of chromosomes during metaphase I of meiosis can be demonstrated by considering a cell with a set of two chromosomes (n = 2). In this case, there are two possible arrangements at the equatorial plane in metaphase I. The total number of possible different gametes is 2^n, where n is the number of chromosomes in a set, so in this example the gametes have four possible genetic combinations. With n = 23 in a human cell, there are over 8 million (2^23) possible combinations of paternal and maternal chromosomes.
Image reference: Biology LibreTexts (2018). 7.6: Genetic Variation. [online]
The above explains the process and principle of genetic variation from a molecular point of view, based mainly on the segregation, recombination and exchange of genes. "The same is true in nature: from the original universe, the original Earth and the original life, continuous genetic variation and natural selection have produced today's rich and colorful world."
4. An Overview of Genetic Algorithms (GA)
Genetic Algorithms (GA) are computer simulations of natural selection and evolution used to solve optimization problems. The algorithm works by mimicking the way organisms evolve over generations via selection, recombination and mutation. GA is an evolutionary algorithm that uses the principles of natural selection and genetics to search for the best solution to a problem. In GA, a population of candidate solutions evolves over multiple generations through a process of selection, crossover, and mutation.
A genetic algorithm starts with a population of randomly generated individuals, each representing a possible solution to the optimization problem. These individuals are then evaluated with a fitness function to determine how well each one solves the problem. The fittest individuals are selected for breeding, and their genetic material is recombined to produce new offspring. This process is repeated for many generations, with each iteration tending to produce a population of individuals with higher fitness.
As shown in the figure above, the basic process of the genetic algorithm is to initialize the population and calculate the fitness of each individual in it. Genetic operators (selection, crossover and mutation) are then applied to the population, the fitness of each individual is recalculated, and this cycle repeats until the stopping condition is met.
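One possible skeleton of this loop in Python, with the problem-specific pieces (individual generation, fitness, selection, crossover, mutation) passed in as functions; this is a sketch of the flow chart rather than a full implementation:

def genetic_algorithm(new_individual, fitness, select, crossover, mutate,
                      pop_size=50, generations=200):
    # Generic GA loop mirroring the flow chart: initialise the population,
    # evaluate fitness, then repeat selection -> crossover -> mutation.
    population = [new_individual() for _ in range(pop_size)]
    for _ in range(generations):
        fitnesses = [fitness(ind) for ind in population]
        offspring = []
        while len(offspring) < pop_size:
            parent1 = select(population, fitnesses)
            parent2 = select(population, fitnesses)
            child1, child2 = crossover(parent1, parent2)
            offspring.extend([mutate(child1), mutate(child2)])
        population = offspring[:pop_size]
    return max(population, key=fitness)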
In the field of artificial intelligence, genetic algorithms have been used to create agents that can learn to play games, solve puzzles, and perform other tasks. For example, genetic algorithms have been used to train neural networks, which are mathematical models that can learn to recognize patterns in data. Another application of genetic algorithms is in the field of machine learning, where they can be used to optimize the parameters of algorithms such as decision trees, support vector machines, and neural networks. By using genetic algorithms, machine learning algorithms can learn from data in an efficient and automated manner without manual tuning of parameters. In finance, genetic algorithms have been used to optimize investment portfolios, formulate trading strategies, and optimize risk management. For example, genetic algorithms can be used to find the optimal portfolio of assets to maximize returns while minimizing risk. Finally, genetic algorithms are also used in various engineering fields such as aerodynamics, structural engineering, and electrical engineering. For example, genetic algorithms have been used to optimize the design of aircraft wings, bridges and electrical circuits.
One of the main advantages of genetic algorithms is that they can escape local optima and often come close to the global optimum, even in complex and highly nonlinear problems where traditional optimization techniques may fail. This makes them powerful tools for a wide range of optimization problems, such as scheduling, resource allocation, and route optimization.
5. Application of Genetic Algorithm on TSP
(1). Ideas and principles
The application of GA to TSP involves representing a route between cities as a chromosome, with each city represented by a gene. The fitness of a candidate solution is evaluated based on the total length of its path. The fittest individuals in the population are then selected for breeding, and their genetic information is recombined (crossover) to produce offspring. Finally, mutation is applied to introduce random variation into the population.
In the context of TSP, GA has been shown to provide good results for small to medium instances. However, it can be computationally expensive for large instances, and the choice of parameters such as population size and mutation rate can greatly affect the performance of the algorithm.
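In code this mapping is simple. A minimal sketch, assuming the distances are stored in a matrix indexed by city number (the helper names below are illustrative):

# A chromosome is just an ordering of city indices, e.g. [0, 3, 1, 2, 4, 5].
def tour_length(route, dist):
    # Sum the edges of the closed tour, including the edge back to the start.
    return sum(dist[route[i]][route[(i + 1) % len(route)]]
               for i in range(len(route)))

def fitness(route, dist):
    # Shorter tours get higher fitness.
    return 1.0 / tour_length(route, dist)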
For example, suppose there are 6 cities in total and a traveling salesman who wants to visit all of them. With 6 cities there are 60 possible routes (5!/2 = 60). By traversing all 60 of them you could find the best one, but we do not want to do that here. Instead we use the genetic algorithm: first set up a population, say of 4 individuals. These 4 individuals are actually 4 paths.
The first path is S1 = [1, 2, 3, 4, 5, 6], which returns to city 1 at the end. The second path is S2 = [1, 3, 2, 6, 5, 4]. The third path is S3 = [1, 4, 2, 6, 3, 5], and the fourth path is, for example, S4 = [1, 6, 2, 5, 3, 4].
We set these paths randomly, just like primitive life in the natural world. Then we calculate the fitness of each individual. What are we interested in? Which route is the shortest, in other words the cheapest. So we list the costs: suppose the total cost of the first route is 10, the second 20, the third 20, and the fourth 50. The first route looks the cheapest, but we are not sure it is the best. Can we improve on it and find something better?
Next we define fitness: the lower the cost, the fitter the route. So we can write the fitness as f = 1/d, where d is the total cost. The lower the cost, the higher the fitness. The fitness of the first route is then 0.1, the second 0.05, the third 0.05, and the fourth 0.02, so again the first route looks best.
What is the next step after computing the fitness? According to the laws of natural selection, individuals better adapted to the environment are more likely to reproduce and pass on their genes. So we say that the higher the fitness, the more likely an individual is to be selected, and we let selection happen according to probability. How do we calculate this selection probability? We divide the fitness of each path by the sum of all fitness values: this is the share of that individual's fitness within the population, and the higher the share, the more likely it is to be selected. How can we explain this from a probabilistic point of view?
As shown above, imagine a disk divided into sectors. The fitness share of f₁ is the largest, so its sector is correspondingly the largest; f₂ and f₃ have smaller fitness, so their sectors are smaller; f₄ is the smallest, so it gets the smallest sector. Now spin a pointer over the disk: whichever sector the pointer stops on, that individual is selected to reproduce. The higher the fitness, the larger the sector, the better suited the individual is to the environment, and the more likely it is to be chosen; of course this is not 100% guaranteed.
Say the first spin selects S₁ and the second spin selects S₂; after selection we move on to the next step.
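A sketch of this roulette-wheel selection for the four example routes (the helper below is illustrative, not a fixed API):

import random

def roulette_select(individuals, fitnesses):
    # Spin a pointer over a wheel whose sectors are proportional to fitness.
    total = sum(fitnesses)
    spin = random.uniform(0, total)
    running = 0.0
    for ind, f in zip(individuals, fitnesses):
        running += f
        if spin <= running:
            return ind
    return individuals[-1]  # guard against floating-point rounding

costs = [10, 20, 20, 50]
fitnesses = [1 / c for c in costs]          # 0.1, 0.05, 0.05, 0.02
# S1's selection probability is 0.1 / 0.22, roughly 45%.
parent = roulette_select(['S1', 'S2', 'S3', 'S4'], fitnesses)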
The next step is to apply genetic operators, of which there are two kinds. The first is crossover: treating S₁ and S₂ as DNA fragments, we cut both at the same point somewhere in the middle and, imitating Morgan's gene exchange, swap the segments after the cut. After the exchange, S₁' becomes 123654 and S₂' becomes 132456. This is crossover. We must also make sure there are at least as many offspring as parents, otherwise the population would die out after a few generations. So suppose we spin the wheel two more times, pick S₁ and S₄, and cross them to obtain new offspring S₃' and S₄'.
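The tail-swap crossover described above can be sketched as follows. Note that for arbitrary parents this simple cut-and-swap can duplicate cities in a tour (here 123456 and 132654 happen to stay valid); practical TSP GAs usually use order-preserving operators such as order crossover (OX) instead.

def one_point_crossover(p1, p2, cut):
    # Cut both parents at the same position and exchange the tails.
    c1 = p1[:cut] + p2[cut:]
    c2 = p2[:cut] + p1[cut:]
    return c1, c2

s1 = [1, 2, 3, 4, 5, 6]
s2 = [1, 3, 2, 6, 5, 4]
print(one_point_crossover(s1, s2, 3))  # ([1, 2, 3, 6, 5, 4], [1, 3, 2, 4, 5, 6])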
After crossover, the next step is mutation, which imitates the variation of biological DNA. For example, in S₁' = 123654 we can reverse the order of the segment 3, 6, 5, so that S₁'' becomes 125634. That is one mutation method, and there are others: we can also exchange two cities. For instance, swapping 3 and 6 in S₂' = 132456 gives S₂'' = 162453. This is another way.
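Both mutation operators from the example in a short sketch (the indices are fixed here to reproduce the example; in a real GA they would be chosen at random):

def reverse_segment(route, i, j):
    # Reverse the slice route[i:j], e.g. 123654 -> 125634.
    mutated = route[:]
    mutated[i:j] = reversed(mutated[i:j])
    return mutated

def swap_cities(route, i, j):
    # Exchange the cities at positions i and j, e.g. 132456 -> 162453.
    mutated = route[:]
    mutated[i], mutated[j] = mutated[j], mutated[i]
    return mutated

print(reverse_segment([1, 2, 3, 6, 5, 4], 2, 5))  # [1, 2, 5, 6, 3, 4]
print(swap_cities([1, 3, 2, 4, 5, 6], 1, 5))      # [1, 6, 2, 4, 5, 3]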
So we first select good parents and then let them produce offspring with a small amount of variation. The next step is to recalculate the individual fitness: what is the cost of each offspring, and what is its fitness? Then we enter the roulette wheel again and iterate. As long as the initial population is large enough and the number of iterations is large enough, we should be able to find a solution close to the optimal one.
Overall, the traveling salesman problem is a challenging optimization problem with many practical applications. The brute force search algorithm and the greedy algorithm are two common methods for solving TSP, but they may not be suitable for large instances. Genetic algorithms are a promising approach that uses the principles of natural selection and genetics to search for near-optimal solutions, though they too can become computationally expensive for very large instances of TSP.
(2). Code implementation
A random population of potential solutions is first generated, and this population is evolved iteratively by applying genetic operators such as selection, crossover and mutation until a satisfactory solution is found. The code below was generated with ChatGPT.
import random
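As an illustrative, self-contained sketch (not the original generated listing; the random city coordinates, the order-crossover operator and all parameter values below are assumptions for demonstration), such a GA for TSP might look like this:

import random

def tour_length(route, dist):
    # Total length of the closed tour.
    return sum(dist[route[i]][route[(i + 1) % len(route)]]
               for i in range(len(route)))

def roulette_select(population, fitnesses):
    # Fitness-proportionate (roulette wheel) selection.
    return random.choices(population, weights=fitnesses, k=1)[0]

def order_crossover(p1, p2):
    # Order crossover (OX): copy a slice from one parent, then fill the rest
    # with the other parent's cities in their original order (keeps the tour valid).
    n = len(p1)
    a, b = sorted(random.sample(range(n), 2))
    child = [None] * n
    child[a:b] = p1[a:b]
    fill = [c for c in p2 if c not in child]
    for i in range(n):
        if child[i] is None:
            child[i] = fill.pop(0)
    return child

def mutate(route, rate=0.2):
    # With some probability, reverse a random segment of the tour.
    route = route[:]
    if random.random() < rate:
        i, j = sorted(random.sample(range(len(route)), 2))
        route[i:j] = reversed(route[i:j])
    return route

def ga_tsp(dist, pop_size=50, generations=300):
    n = len(dist)
    population = [random.sample(range(n), n) for _ in range(pop_size)]
    best = min(population, key=lambda r: tour_length(r, dist))
    for _ in range(generations):
        fitnesses = [1.0 / tour_length(r, dist) for r in population]
        offspring = []
        while len(offspring) < pop_size:
            p1 = roulette_select(population, fitnesses)
            p2 = roulette_select(population, fitnesses)
            offspring.append(mutate(order_crossover(p1, p2)))
        population = offspring
        best = min(population + [best], key=lambda r: tour_length(r, dist))
    return best, tour_length(best, dist)

if __name__ == '__main__':
    random.seed(0)
    # 10 random cities in the unit square; Euclidean distance matrix.
    pts = [(random.random(), random.random()) for _ in range(10)]
    dist = [[((x1 - x2) ** 2 + (y1 - y2) ** 2) ** 0.5 for (x2, y2) in pts]
            for (x1, y1) in pts]
    route, length = ga_tsp(dist)
    print(route, round(length, 3))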
Copyright Notice
This article is original content by Junhao, except for the referenced material below, and the final right of interpretation belongs to the original author. In case of any infringement, please contact the author for removal. Please do not repost without authorization.
6. References
[1]. Biology LibreTexts (2018). 7.6: Genetic Variation. [online]
[2]. Clever Algorithms: Nature-Inspired Programming Recipes. United Kingdom: Lulu.com.
[3]. Li, Y. (2022). Why do viruses keep mutating? Traveling salesman problem and genetic algorithm. [online] Available at: https://youtu.be/iM-VKHWd_JE [Accessed 9 Feb. 2023].
[4]. OpenAI (n.d.). ChatGPT. [online] Available at: https://chat.openai.com/chat.