(en) Homework
Grading:
[(0.8 * H
1 + 0.2 * H
1') + (0.8 * H
2 + 0.2 * H
2') + H
3] / 3
You need at least 7 attendances and 5 points in the above formula, for the lab.
The lab grade (between 0 and 10) will contribute 50% to the final grade; the other 50% will be from the exam.
If the exam grade is below 4, the (Lab + Exam) / 2 grade will be truncated down, not rounded.
The final grade needs to be at least 5.
Report structure
- Title, author
- Abstract
- Introduction: it should clearly explain the content of the paper. A basic requirement is: if a reader chooses to read only a single section of your paper, and the indroduction, they should fully understand that section.
- Methods: the algorithms used, the choices and variants made, etc. You need not write down the algorithms as written in the slides; your modifications are of far greater iterest.
- Experimental setup description: dimensions, precision, number of iterations/repeats.
- Experimental results
- Tables with at least 30 runs for each parameter set, and statistical metadata (such as mean + standard deviation or median + interquartilic Q1-Q3 distance)
- plots: histograms or timewise fitness evolution or whatever important, relevant and true information you believe is important to show. Statistical 'lies' will be penalised here: for instance, if you plot the run number versus the result quality, since the runs are statistically independent from each other.
- Describe how the parameters used and choices made influence the performance of the algorithm.
- Compare the methods used.
- Conclusions. Talk about the larger-scale implications of the results, and try to explain them better than in the Results section. No more than 2 future research ideas.
- Bibliography. Include all the relevant resources, be them scientific papers, the lectures slides, youtube videos or stack overflow questions.
LaTeX code example:
https://gitlab.com/eugennc/teaching/-/tree/master/GA
- Homework H1: Find the minimum for the following functions:
De Jong 1,
Schwefel's,
Rastrigin's,
Michalewicz's
using the Hill Climbing (the first improvement, best improvement and worst improvement variants) and Simulated Annealing algorithms (hybridising one of the Hill Climbing variants).
Analyse the 5-, 10- and 30-dimensional versions of all functions. (Perhaps run the 100-dimensional version as well, as an upper bound). Use a precision of at least 5 decimal places after 0.
Soft deadline: week 5 (-10% points for each week of delay)
Send the homework: source code without binary object and .txt or .pdf file, in a .zip archive (no other archive format allowed) to my e-mail, with a subject written like: [GA]_Name_Surname_Group_T1
- Homework H1'
For the function f=x3-60x2+900x+100, with x in [0, 31], find the maximum.
On [0, 31], the function is uni-modal, having a single maximum point, for which the value is 10.
Study the maximisation of the function using a Hill Climbing algorithm where a candidate solution is represented on 5 bits (32 possible values, so all the integers from 0 to 31). A candidate's neighbourhood is all the bitstrings at a Hamming distance of 1.
Study and explain the function landscape in the context of the first improvement and best improvement variants of the Hill Climbing algorithm. Specify the attraction basing of all local maximum points (attraction basin: the set of points for which the gradient search leads to the same optimum).
You can solve this on paper, +photograph, check the quality of the photo, then send it on e-mail.
Hard deadline: week 6
- Homework H2: Solve Homework T1 using a Genetic Algorithm. Compare with the results obtained in Homework T1 and offer explanations.
Soft deadline: week 9
Homework H2'
Optimise the Genetic Algorithm used for Homework H2 by adding adaptive parameters, adaptive operators, Gray coding, etc. Write a short report, detailing and explaining the performance differences.
Hard deadline: week 10
- Homework H3: Solve one of the problems below with a Genetic Algorithm and another heuristic:
1. The Quadratic Assignment Problem. Test instances
2. The Graph Colouring Problem.(Test instances)
3. The Traveling Salesperson Problem (TSP)(Test instances)
4. The Logical Satisfiability Problem (SAT). (Test instances Test instances)
Use only instances from instance libraries (don't generate your own test instances). Use at least 10 instances, with 3 of them the largest and most difficult available. Suggestion: 1 easy instance, then increase the size and difficulty until the 3 most difficult ones.
Penalty: 0.5 point each if you go over 2000 generations or 200 individuals after selection. (Try to work efficiently, and balance exploration and exploitation in the algorithm).
Deadline: Second-to-last lab, try to finish it earlier
Homework H0
Implement two methods (a heuristic one and a deterministic one) to find the minimum of a function with an arbitrary number of variables. For the best grade, study at least 4 functions (2 of which must be Rastrigin's and Michalewicz's), compare the search algorithm performance between the 2-dimensional and 10-dimensional versions of the functions, and represent number with at least 5 decimal points after 0.
You can find test functions
here.
Write a report (preferably LaTeX + .pdf) containing, for each function instance and algorithm tested, the minimum, average (or median) and maximum run time, and the best, average (or median) and worst solution found.
It is essential to compare the methods and explain the results. (Due to its short deadline, this homework is entirely optional: its points are not considered when defining the minimum number of points required to pass this lab.) Deadline: lab 2.
(ro) Notiuni: notiuni de baza
- Probleme P, NP, NP-complete
- Problema de optimizare, optime locale/globale
- Metode exacte, euristici, algoritmi determministi/probabilisti, metaeuristici
Tema H0
Implementati doua metode (una euristica, cealalta determinista) pentru gasirea punctului de maxim sau de minim al unei functii cu un numar arbitrar de variabile. Pentru punctaj maxim, luati cel putin 4 functii, dintre care Rastrigin's si Michalewicz's, faceti o comparatie intre versiunea 2-dimensionala si versiunea 10-dimensionala, si folositi o precizie de reprezentare de minim 5 zecimale dupa 0.
Testati metoda pe una sau mai multe din functiile de
aici.
Intocmiti un raport (preferabil, LaTeX + .pdf, sau text) in care sa precizati, pentru fiecare functie folosita pentru testare: timpul minim, mediu si maxim de executie cea mai buna si cea mai slaba solutie, precum si media solutiilor obtinute dupa un numar de rulari. Este esential sa faceti o comparatie intre metode, si sa explicati diferentele si rezultatele obtinute. Pana in laboratorul 2. Punctajul, impartit la 10, va fi adaugat punctajului de laborator, ca bonus.
(ro) Notiuni: Fitness landscape (relief fitness)
Notiunea a aparut mai intai in genetica (Wright 1932) semnificand variatia din cadrul speciilor, axele spatiului reprezentand combinatii nespecificate de gene. Dobzhansky (1951) restrange domeniul de definitie a axelor, acestea semnificand frecventa unei alele particulare a unei gene particulare intr-o populatie particulara. Precursor al dezvoltarii unui formalism matematic al conceptului de fitness landscape, Eigen (1989) inroduce notiunea de cvasi-specie ca fiind un grup de secvente similare (siruri peste un anumit alfabet), diferentele intre membrii cvasi-speciei corespunzand la mutatii punctuale. Stadler a dezvoltat ulterior (1995) un formalism matematic relativ complet al teoriei landscape.
O problema de optimizare are in general urmatoarea forma:
Se da o functie f:X->R
si se doreste sa se determine x* in X a.i. f(x*)>=f(x), oricare ar fi x din X. Echivalent, x*=argmaxx din X(f)
Daca X este un spatiu discret avem o problema de optimizare combinatoriala. Daca X este un spatiu real n-dimensional avem o problema de optimizare numerica. In ambele cazuri o solutie candidat x din X ia forma unui sir de variabile iar functia f poarta numele de functie obiectiv. In campul algoritmilor evolutivi f este denumita si functie fitness iar peisajul/relieful asociat poarta numele de relief fitness (fitness landscape).
In cazul spatiilor continue elemente ale calculului diferential permit caracterizarea acestui relief oferind indicii asupra localizarii optimului global. Relieful este unic determinat de functia fitness.
In cazul spatiilor discrete lucrurile sunt diferite. Notiunea de relief implica existenta unor relatii topologice intre punctele spatiului de cautare X. Spre deosebire de spatiile continue, in spatiile discrete acestea nu sunt predefinite; ele sunt de fapt generate in momentul utilizarii unei anumite tehnici de cautare.
(ro) Notiuni: Algoritmi Genetici
Inspiratie
Algoritmii genetici au fost propusi de John Holland in 1973 dupa multi ani de studiere a ideii de simulare a evolutiei. Acesti algoritmi modeleaza mostenirea genetica si lupta Darwiniana pentru supravietuire. Alaturi de alte doua directii:
strategiile evolutive si
programarea evolutiva, formeaza clasa
algoritmilor evolutivi.
Schema generala a algoritmilor evolutivi
Sunt algoritmi probabilisti care:
- mentin o populatie de reprezentari de solutii candidat
- care evolueaza de-a lungul unor generatii/iteratii
- sub controlul unei functii fitness care masoara meritul individual.
Terminologie
Algoritmii evolutivi utilizeaza un vocabular imprumutat din genetica:
- evolutia este simulata printr-o succesiune de generatii ale unei populatii de solutii candidat;
- o solutie candidat poarta numele de cromozom si este reprezentata ca un sir de gene;
- gena este informatia atomica dintr-un cromozom;
- pozitia pe care o ocupa o gena se numeste locus;
- toate valorile posibile pentru o gena formeaza setul de alele ale genei;
- populatia evolueaza prin aplicarea operatorilor genetici: mutatia si incrucisarea;
- cromozomul asupra caruia se aplica un operator genetic se numeste parinte iar cromozomul rezultat se numeste descendent;
- selectia este procedura prin care sunt alesi cromozomii ce vor supravietui in generatia urmatoare; indivizilor mai bine adaptati li se vor da sanse mai mari;
- gradul de adaptare la mediu este masurat de functia fitness;
- solutia returnata de un algoritm genetic este cel mai bun individ din ultima generatie.
Pseudocod
t := 0
generate the starting population P(t)
evaluate P(t)
while not StoppingCondition
t := t + 1
select P(t) from P(t-1)
mutate P(t)
cross-over P(t)
evaluate P(t)
Reprezentarea
Pentru problemele de optimizare numerica se va utiliza reprezentarea binara.
Functia fitness
Functia fitness este utilizata pentru a masura calitatea cromozomilor. Ea este formulata plecand de la functia numerica de optimizat. Trebuie sa fie pozitiva si este construita pentru maximizare (indivizii adaptati obtin valori mari ale functiei fitness). Daca functia
g de optimizat
Selectia
Scopul selectiei este de a alege pentru supravietuire cei mai adaptati indivizi din populatie cu speranta ca descendentii acestora vor avea un fitness mai ridicat. In combinatie cu variatii ale operatorilor de incrucisare si mutatie, selectia trebuie sa pastreze un echilibru intre explorare si exploatare. O presiune de selectie ridicata duce la crearea unei diversitati scazute in populatia creata din indivizi bine adaptati dar suboptimali, ceea ce duce la o limitare a modificarilor si implicit a progresului. Pe de alta parte, o presiune de selectie scazuta incetineste evolutia.
- Roata norocului
Aceasta este schema de selectie introdusa de Holland in algoritmul genetic original. Numarul estimat de copii pe care le primeste un individ este proportional cu fitnessul sau impartit la fitnessul total al populatiei.
Evaluate P for i:=0 to POP_SIZE
eval[i] = f( P(i) )
Total fitness for i:=0 to POP_SIZE
T += eval[i]
Individual sel.prob. for i:=0 to POP_SIZE
p[i] = eval[i] / T
Accumulated sel.prob. q[0] = 0
for i:=0 to POP_SIZE
q[i+1] = q[i] + p[i]
Selection for i:=0 to POP_SIZE
uniform random r in (0,1]
select for the next generation the individual j
for which q[j] < r <= q[j+1]
- Selectia bazata pe rang previne convergenta prematura ce apare la utilizarea schemelor de selectie proportionala cu fitnessul. Indivizii sunt ordonati in functie de valoarea fitnessului iar probabilitatea de selctie este proportionala cu rangul ocupat. Presiunea de selectie este in acest mod scazuta in cazul in care varianta fitnessului este mare si este crescuta daca varianta fitnessului este mica.
- Selectia turneu alege in mod aleatoriu k indivizi iar dintre acestia doar cei mai buni j sunt selectati pentru supravietuire. Procedra se repeta pana se obtine numarul dorit de indivizi. Este cea mai eficienta din punct de vedere al complexitatii timp. Din punct de vedere al presiunii de selectie se aseamana selectiei bazata pe rang.
- Elitismul, introdus de Ken DeJong (1975) este o schema aditionala oricarui mecanism de selectie al carei scop este de a retine cei mai buni k indivizi la fiecare generatie; astfel de idivizi ar putea altfel disparea din cauza ca nu au fost selectati pentru supravietuire sau fiindca au fost distrusi prin aplicarea operatorilor. Studii experimentale au dovedit ca de multe ori elitismul imbunatateste semnificativ performanta algortimului genetic.
Operatorii genetici
- Mutatia - modificarea unei gene alese aleatoriu
001011000 -> 001011010
- Incrucisarea are ca scop schimbul de informatie genetica intre doi sau mai multi cromozomi.
- incrucisarea cu un punct de taiere, ales aleator.
01 001011 _\ 01 111100
10 111100 / 10 001011
- Incrucisarea cu n puncte de taiere, generate aleator. Exemplu (n=3):
01 0 010 11 _\ 01 1 010 00
10 1 111 00 / 10 0 111 11
- Incrucisare uniforma: pentru fiecare locus, se selecteaza probabilist
gena unuia dintre parinti.