A tutorial on multiobjective optimization: fundamentals and evolutionary methods

A tutorial on multiobjective optimization: fundamentals and evolutionary methods In almost no other field of computer science, the idea of using bio-inspired search paradigms has been so useful as in solving multiobjective optimization problems. The idea of using a population of search agents that collectively approxi- mate the Pareto front resonates well with processes in natural evolution, immune systems, and swarm intelligence. Methods such as NSGA-II, SPEA2, SMS-EMOA, MOPSO, and MOEA/D became standard solvers when it comes to solving multiobjective optimization problems. This tutorial will review some of the most important fundamentals in multiobjective optimization and then introduce representative algorithms, illustrate their working principles, and discuss their application scope. In addition, the tutorial will discuss statistical performance assessment. Finally, it highlights recent important trends and closely related research fields. The tutorial is intended for readers, who want to acquire basic knowledge on the mathematical foundations of multiobjective optimization and state-of-the-art methods in evolutionary multiobjective optimization. The aim is to provide a starting point for researching in this active area, and it should also help the advanced reader to identify open research topics. Keywords Multiobjective optimization  Multiobjective evolutionary algorithms  Decomposition-based MOEAs Indicator-based MOEAs  Pareto-based MOEAs  Performance assessment 1 Introduction needed objectives (van der Horst et al. 2012; Rosenthal and Borschbach 2017). Consider making investment choices for an industrial There are countless other examples where multiobjec- process. On the one hand the profit should be maximized tive optimization has been applied or is recently considered and on the other hand environmental emissions should be as a promising field of study. Think, for instance, of the minimized. Another goal is to improve safety and quality minimization of different types of error rates in machine of life of employees. Even in the light of mere economical learning (false positives, false negatives) (Yevseyeva et al. decision making, just following the legal constraints and 2013; Wang et al. 2015), the optimization of delivery costs minimizing production costs can take a turn for the worse. and inventory costs in logistics(Geiger and Sevaux 2011), Another application of multiobjective optimization can the optimization of building designs with respect to health, be found in the medical field. When searching for new energy efficiency, and cost criteria (Hopfe et al. 2012). therapeutic drugs, obviously the potency of the drug is to In the following, we consider a scenario where given the be maximized. But also the minimization of synthesis costs solutions in some space of possible solutions, the so-called and the minimization of unwanted side effects are much- decision space which can be evaluated using the so-called objective functions. These are typically based on com- putable equations but might also be the results of physical experiments. Ultimately, the goal is to find a solution on & Michael T. M. Emmerich which the decision maker can agree, and that is optimal in m.t.m.emmerich@liacs.leidenuniv.nl some sense. Andre H. Deutz When searching for such solutions, it can be interesting a.h.deutz@liacs.leidenuniv.nl to pre-compute or approximate a set of interesting solutions LIACS, Leiden University, Leiden, The Netherlands 123 M. T. M. Emmerich , A. H. Deutz that reveal the essential trade-offs between the objectives. this order. The decision maker has to state additional This strategy implies to avoid so-called Pareto dominated preferences, e.g., weights of the objectives, prior to the solutions, that is solutions that can improve in one objec- optimization. tive without deteriorating the performance in any other 2. A posteriori: A partial order is defined on the objective space R , typically the Pareto order, and the algorithm objective. The Pareto dominance is named after Vilfredo Pareto, an Italian economist. As it was earlier mentioned by searches for the minimal set concerning this partial order over the set of all feasible solutions. The user has Francis Y.Edgeworth, it is also sometimes called Edge- worth-Pareto dominance (see Ehrgott 2012 for some his- to state his/her preferences a posteriori, that is after being informed about the trade-offs among non- torical background). To find or to approximate the set of non-dominated solutions and make a selection among them dominated solutions. is the main topic of multiobjective optimization and multi- 3. Interactive (aka Progressive): The objective functions criterion decision making. Moreover, in case the set of non- and constraints and their prioritization are refined by dominated solutions is known in advance, to aid the deci- requesting user feedback on preferences at multiple sion maker in selecting solutions from this set is the realm points in time during the execution of an algorithm. of decision analysis (aka decision aiding) which is also part In the sequel, the focus will be on a posteriori approaches of multi-criterion decision making. to multiobjective optimization. The a priori approach is often supported by classical single-objective optimization Definition 1 Multiobjective Optimization. Given m ob- algorithms, and we refer to the large body of the literature jective functions f : X! R; ...; f : X! R which map a 1 m that exists for such methods. The a posteriori approach, decision space X into R, a multiobjective optimization however, requires interesting modifications of theorems problem (MOP) is given by the following problem and optimization algorithms—in essence due to the use of statement: partial orders and the desire to compute a set of solutions minimize f ðxÞ; .. .; minimize f ðxÞ; x 2X ð1Þ 1 m rather than a single solution. Interactive methods are highly interesting in real-world applications, but they typically Remark 1 In general, we would demand m [ 1 when we rely upon algorithmic techniques used in a priori and a talk about multiobjective optimization problems. More- posteriori approaches and combine them with intermediate over, there is the convention to call problems with large m, steps of preference elicitation. We will discuss this topic not multiobjective optimization problems but many-ob- briefly at the end of the tutorial. jective optimization problems (see Fleming et al. 2005;Li et al. 2015). The latter problems form a special, albeit important case of multiobjective optimization problems. 2 Related work Remark 2 Definition 1 does not explicitly state constraint functions. However, in practical applications constraints There is a multiple of introductory articles that preceded have to be handled. Mathematical programming techniques this tutorial: often use linear or quadratic approximations of the feasible • In Zitzler et al. (2004) a tutorial on state-of-the-art space to deal with constraints, whereas in evolutionary evolutionary computation methods in 2004 is provided multiobjective optimization constraints are handled by including Strength Pareto Evolutionary Algorithm penalties that increase the objective function values in Version 2 (SPEA2) (Zitzler et al. 2001), Non-domi- proportion to the constraint violation. Typically, penalized nated Sorting Genetic Algorithm II (NSGA-II) (Deb objective function values are always higher than objective et al. 2002), Multiobjective Genetic Algorithm function values of feasible solutions. As it distracts the (MOGA) (Fonseca and Fleming 1993) and Pareto- attention from particular techniques in MOP solving, we Archived Evolution Strategy (PAES) (Knowles and will only consider unconstrained problems. For strategies Corne 2000) method. Indicator-based methods and to handle constraints, see Coello Coello (2013). modern variants of decomposition based methods, that Considering the point(s) in time when the decision our tutorial includes, were not available at that time. maker interacts or provides additional preference infor- • In Deb (2008) an introduction to earlier multiobjective mation, one distinguishes three general approaches to optimization methods is provided, and also in the form multiobjective optimization (Miettinen 2012): of a tutorial. The article contains references to early books in this field and key articles and also discusses 1. A priori: A total order is defined on the objective applications. space, for instance by defining a utility function R ! R and the optimization algorithm finds a minimal point (that is a point in X ) and minimum value concerning 123 A tutorial on multiobjective optimization: fundamentals and evolutionary methods • Derivative-free methods for multiobjective optimiza- – reflexive, if and only if 8x 2 X : ðx; xÞ2 R, tion, including evolutionary and direct search methods, – irreflexive, if and only if 8x 2 X; ðx; xÞ 62 R, are discussed in Custodio et al. (2012). – symmetric, if and only if 8x 2 X : 8y 2 X : ðx; yÞ • On conferences such as GECCO, PPSN, and EMO 2 R ,ðy; xÞ2 R, there have been regularly tutorials and for some of – asymmetric, if and only if 8x 2 X : 8y 2 X : ðx; yÞ these slides are available. A very extensive tutorial 2 R )ðy; xÞ 62 R, based on slides is the citable tutorial by Brockhoff – antisymmetric, if and only if 8x 2 X : 8y 2 X : (2017). ðx; yÞ2 R ^ðy; xÞ2 R ) x ¼ y, – transitive, if and only if 8x 2 X : 8y 2 X : 8z 2 X : Our tutorial is based on teaching material and a reader for a ðx; yÞ2 R ^ðy; zÞ2 R )ðx; zÞ2 R. course on Multiobjective Optimization and Decision Analysis at Leiden University, The Netherlands (http:// Remark 3 Sometimes we will also write xRy for moda.liacs.nl). Besides going into details of algorithm ðx; yÞ2 R. design methodology, it also discusses foundations of mul- tiobjective optimization and order theory. In the light of Now we can define different types of orders: recent developments on hybrid algorithms and links to Definition 3 Pre-order, Partial Order, Strict Partial Order. computational geometry, we considered it valuable to not A binary relation R is said to be a only cover evolutionary methods but also include the basic principles from deterministic multiobjective optimization – pre-order (aka quasi-order), if and only if it is transitive and scalarization-based methods in our tutorial. and reflexive, – partial order, if and only if it is an antisymmetric pre- order, 3 Order and dominance – strict partial order, if and only if it is irreflexive and transitive For the notions we discuss in this section a good reference is Ehrgott (2005). Remark 4 Note that a strict partial order is necessarily The concept of Pareto dominance is of fundamental asymmetric (and therefore also anti-symmetric). importance to multiobjective optimization, as it allows to Proposition 1 Let X be a set and D ¼fðx; xÞjx 2 Xg be compare two objective vectors in a precise sense. That is, the diagonal of X. they can be compared without adding any additional preference information to the problem definition as stated – If R is an anti-symmetric binary relation on X, then any in Definition 1. subset of R is also an anti-symmetric binary relation. In this section, we first discuss partial orders, pre-orders, – If R is irreflexive, then (R is asymmetric if and only if and cones. For partial orders on R there is an important R is antisymmetric). Or: the relation R is asymmetric if geometric way of specifying them with cones. We will and only if R is anti-symmetric and irreflexive. define the Pareto order (aka Edgeworth-Pareto order) on – If R is a pre-order on X, then R . The concept of Pareto dominance is of fundamental fðx; yÞjðx; yÞ2 R and ðy; xÞ 62 Rg, denoted by R , strict importance for multiobjective optimization, as it allows to is transitive and irreflexive. In other words, R is a strict compare two objective vectors in a precise sense (see strict partial order associated to the pre-order R. Definition 5 below). That is, comparisons do not require – If R is a partial order on X, then R n D is irreflexive and adding any additional preference information to the prob- transitive. In other words, R n D is a strict partial lem definition as stated in Definition 1. This way of com- order. Moreover R n D is anti-symmetric (or parison establishes a pre-order (to be defined below) on the asymmetric). set of possible solutions (i.e., the decision space), and it is – If R is a pre-order on X, then (R n D is a strict partial possible to search for the set of its minimal elements—the order if and only if R is asymmetric). efficient set. As partial orders and pre-orders are special binary Remark 5 In general, if R is a pre-order, then R n D does relations, we digress with a discussion on binary relations, not have to be transitive. Therefore, in general, R n D will orders, and pre-orders. not be a strict partial order. Definition 2 Properties of Binary Relations. Given a set X, Definition 4 Minimal Element. A minimal element x 2 X a binary relation on X—that is a set R with R  X  X—is in a (strictly) partially ordered set (X, R) is an element for 0 0 0 said to be which there does not exist an x 2 X with x Rx and x 6¼ x. 123 M. T. M. Emmerich , A. H. Deutz 0 m (In case, the order R is a strict partial order, x Rx implies Definition 7 Strict Component Order on R . Let 0 m x 6¼ x). x; y 2 R . We say x is less than y in the strict component order, denoted by x\y, if and only if x \y ; i ¼ 1; ...; m. i i Definition 5 Pareto Dominance. Given two vectors in the ð1Þ m ð2Þ m Definition 8 (Weakly) Efficient Point, Efficient Set, and objective space, that is y 2 R and y 2 R , then the ð1Þ ð2Þ Pareto Front. point y 2 R is said to Pareto dominate the point y (in ð1Þ ð2Þ symbols y  y Þ, if and only if Pareto – The minimal elements of the Pareto order  on X are ð1Þ ð2Þ ð1Þ ð2Þ called efficient points. 8i 2f1; .. .; mg : y  y and 9j 2f1; .. .; mg : y \y : i i j j – The subset X of all efficient points in X is called the efficient set. ð1Þ ð2Þ In words, in case that y  y the first vector is not Pareto – Let us denote the set of attainable objective vectors worse in each of the objectives and better in at least one with Y :¼ fðXÞ. Then the minimal elements of the objective than the second vector. Pareto order on Y are called the non-dominated or Proposition 2 The Pareto order  on the objective Pareto optimal objective vectors. The subset of all non- Pareto space R is a strict partial order. Moreover ð [ DÞ Pareto dominated objective vectors in Y is called the Pareto is a partial order. We denote this by  or also by  if front. We denote it with Y . Pareto the context provides enough clarity. – A point x 2X is called weakly efficient if and only if there does not exist u 2X such that fðuÞ\fðxÞ. In multiobjective optimization we have to deal with two Moreover, fðxÞ is called weakly non-dominated. spaces: The decision space, which comprises all candidate solutions, and the objective space which is identical to R Remark 7 Clearly, fðX Þ¼Y . E N and it is the space in which the objective function vectors are represented. The vector-valued function f ¼ 3.1 Cone orders ðf ; ...; f Þ maps the decision space X to the objective 1 m m m space R . This mapping and the Pareto order on R as The Pareto order is a special case of a cone order, which defined in Definition 5 can be used to define a pre-order on are orders defined on vector spaces. Defining the Pareto the decision space X as follows. order as a cone order gives rise to geometrical interpreta- tions. We will introduce definitions for R , although cones Definition 6 Pre-order on Search Space. Let x ; x 2X . 1 2 can be defined on more general vector spaces, too. The The solution x is said to Pareto dominate the solution x if 1 2 m m binary relations in this subsection are subsets of R  R and only if fðx Þ fðx Þ. Notation: x Pareto domi- 1 Pareto 2 1 and the cones are subsets of R . nates x is denoted by x  x . 2 1 f 2 Definition 9 Non-trivial Cone. A set C R with ; 6¼ Remark 6 The binary relation  on X is a strict partial C 6¼ R is called a non-trivial cone, if and only if order on X and ð [fðx; xÞj x 2XgÞ is a partial order on 8a 2 R; a [ 0; 8c 2C : ac 2C. X . Note that the pre-order R associated to  via f ( Pareto i.e., x Rx if and only if fðx Þ fðx Þ ) is, in general, 1 2 1 Pareto 2 Remark 8 In the sequel when we say cone we mean non- not asymmetric and therefore, in general, trivial cone. 6¼ R nfðx; xÞj x 2Xg. Definition 10 Minkowski Sum. The Minkowski sum (aka m m Sometimes we need the notion of the so called strict algebraic sum) of two sets A 2 R and B 2 R is defined component order on R and its accompanying notion of as A B :¼fa þ b j a 2 A ^ b 2 Bg. Moreover we define weak non-dominance. aA ¼faaj a 2 Ag. f f f 2 2 2 y ⊕C y ⊕ R 2 C 2 2 y y 1 1 1 f f f 1 1 1 1 2 1 2 1 2 (0, 0) (0, 0) (0, 0) Fig. 1 Example of a cone C(left), Minkowski sum of a singleton fyg and C (middle), and Minkowski sum of fyg and the cone R . The latter is equal to the non-negative quadrant from which the origin is deleted, see also Definition 13 123 A tutorial on multiobjective optimization: fundamentals and evolutionary methods Remark 9 For an illustration of the cone notion and Proposition 3 Let C be a cone and R its associated examples of Minkowski sums see Fig. 1. binary relation (i.e., R ¼fðx; yÞj y x 2Cg). Then the following statements hold. – R is translation and positive multiplication invariant, Definition 11 The binary relation, R , associated to the – R is anti-symmetric if and only if C is pointed, cone C. Given a cone C the binary relation associated to this – R is transitive if and only if C is convex, and moreover, m C cone, notation R , is defined as follows: 8x 2 R : 8y 2 – R is reflexive if and only if 0 2C. R : ðx; yÞ2 R if and only if y 2fxg C. A similar statement can be made if we go in the other Remark 10 It is clear that for any cone C the associated direction, i.e.: binary relation is translation invariant (i.e, if 8u 2 R : ðx; yÞ2 R )ðx þ u; y þ uÞ2 R ) and also Proposition 4 Let R be a translation and positive multi- C C multiplication invariant by any positive real (i.e., plication invariant binary relation and the C the associ- 8a [ 0 : ðx; yÞ2 R )ðax; ayÞ2 R ). Conversely, given ated cone (i.e., C ¼fy xjðx; yÞ2 Rg). Then the C C R a binary relation R which is translation invariant and following statements hold. multiplication invariant by any positive real, the set C :¼ – C is a cone, fy xjðx; yÞ2 Rg is a cone. The above two operations are – R is anti-symmetric if and only if C is pointed, inverses of each other, i.e., to a cone C one associates a – R is transitive if and only if C is convex, and moreover, binary relation R which is translation invariant and mul- – R is reflexive if and only if 0 2C . tiplication invariant by any positive real, and the associated cone of R is C, and conversely starting from a binary In the following definition some important subsets in relation R which is translation invariant and multiplication R ; m 1 are introduced. invariant by any positive real one obtains the cone C and Definition 13 Let m be a natural number bigger or equal the binary relation associated to this cone is R. In short, to 1. The non-negative orthant (aka hyperoctant) of R , there is a natural one to one correspondence between cones m m denoted by R is the set of all elements in R whose and translation invariant and multiplication-invariant-by- coordinates are non-negative. Furthermore, the zero-dom- positive-reals binary relations on R . m m inated orthant, denoted by R , is the set R nf0g. 0 0 Note that for a positive multiplication invariant relation Analogously we define the non-positive orthant of R , R the set C ¼fy x j xRy g is a cone. We restrict our denoted by R , as the set of elements in R the coordi- attention to relations which are translation invariant as well nates of which are non-positive. Furthermore, the set of in order to get the above mentioned bijection between elements in R which dominate the zero vector 0, denoted cones and relations. m m by R , is the set R nf0g. The set of positive reals is 0  0 Also note if a positive multiplication invariant and denoted by R and the set of non-negative reals is [ 0 translation invariant relation R is such that m m denoted by R . ; 6¼ R 6¼ R  R , then the associated cone C is non- trivial. Relations associated to non-trivial cones are non- Remark 12 The sets defined in the previous definition are m m empty and not equal to all of R  R . cones. Remark 11 In general the binary relation R associated to Proposition 5 The Pareto order  on R is given by Pareto a cone is not reflexive nor transitive nor anti-symmetric. the cone order with cone R , also referred to as the For instance, the binary relation R is reflexive if and only Pareto cone. if 0 2C. The following definitions are needed in order to Remark 13 As R is a pointed and convex cone, the state for which cones the associated binary relation is anti- associated binary relation is irreflexive, anti-symmetric and symmetric and/or transitive. transitive (see Proposition 3). Of course, this can be veri- Definition 12 Pointed cone and convex cone. A cone C is fied more directly. pointed if and only if C\ C  f0g where C ¼ The reason to view the Pareto order as derived from a f c j c 2Cg and C is convex if and only if 8c 2C; c 2 1 2 cone is that it gives the opportunity to study this order more C; 8a such that 0  a  1 : ac þð1 aÞc 2C. 1 2 geometrically. For instance, the definition and many of the With these definitions we can specify for which cones properties of the very important hypervolume indicator (to the associated relation is transitive and/or anti-symmetric: be defined later) readily become intuitive. A reason for deviating from the Pareto cone could be to add constraints to the trade-off between solutions. Moreover, see later for a 123 M. T. M. Emmerich , A. H. Deutz discussion, the more general cones turned out to be very of small finite decision spaces, efficient solutions can be useful in generalizing the hypervolume indicator and identified without much effort. In the case of large com- influence the distribution of points in the approximation set binatorial or continuous search spaces, however, opti- to the Pareto front. mization algorithms are needed to find them. Alternatives to the standard Pareto order on R can be easily imagined and constructed by using pointed, convex cones. The alternatives can be used, for instance, in pref- 4 Scalarization techniques erence articulation. Classically, multiobjective optimization problems are often 3.2 Time complexity of basic operations solved using scalarization techniques (see, for instance, on ordered sets Miettinen 2012). Also in the theory and practice of evo- lutionary multiobjective optimization scalarization plays an Partial orders do not have cycles. Let R be a partial order. It important role, especially in the so-called decomposition is easy to see that R does not have cycles. We show that the based approaches. associated strict partial order does not have cycles. That is, In brief, scalarization means that the objective functions there do not exist are aggregated (or reformulated as constraints), and then a constrained single-objective problem is solved. By using ðb ; b Þ2 R n D; ðb ; b Þ2 R n D; ; ðb ; b Þ2 R n D 1 2 2 3 t 1 1 different parameters of the constraints and aggregation function, it is possible to obtain different points on the where D is the diagonal. For suppose such b ; i ¼ 1; ; t Pareto front. However, when using such techniques, certain 1 can be found with this property. Then by transitivity of caveats have to be considered. In fact, one should always R n D (see Proposition 1), we get ðb ; b Þ2 R n D.By 1 t 1 ask the following two questions: assumption, we have ðb ; b Þ2 R n D. Again by transi- t 1 1 tivity, we get ðb ; b Þ2 R n D which is a contradiction. In 1 1 1. Does the optimization of scalarized problems result in other words, R does not have cycles. (The essence of the efficient points? above argument is, that any strict partial order does not 2. Can we obtain all efficient points or vectors on the have cycles.) The absence of cycles for (strict) partial Pareto front by changing the parameters of the orders gives rise to the following proposition. scalarization function or constraints? We will provide four representative examples of scalar- Proposition 6 Let S be a (strict) partially ordered set. ization approaches and analyze whether they have these Then any finite, non-empty subset of S has minimal ele- properties. ments (with respect to the partial order). In particular, any finite, non-empty subset Y  R has minimal elements with 4.1 Linear weighting respect to the Pareto order  . Also any, finite non- Pareto empty subset X X has minimal elements with respect to A simple means to scalarize a problem is to attach non- negative weights (at least one of them positive) to each The question arises: How fast can the minimal elements objective function and then to minimize the weighted sum be obtained? of objective functions. Hence, the multiobjective opti- mization problem is reformulated to: Proposition 7 Given a finite partially ordered set ðX; Þ, the set of minimal elements can be obtained in time Hðn Þ. Definition 14 Linear Scalarization Problem. The linear scalarization problem (LSP) of an MOP using a weight Proof A double nested loop can check non-domination for vector w 2 R , is given by each element. For the lower bound consider the case that all elements in X are incomparable. Only in this case is minimize w f ðxÞ; x 2X : i i X the minimal set. It requires time Xðn Þ to compare all i¼1 pairs (Daskalakis et al. 2011). h Proposition 8 The solution of an LSP is on the Pareto Fortunately, in case of the Pareto ordered set front, no matter which weights in R are chosen. ðX;  Þ, one can find the minimal set faster. The Pareto algorithm suggested by Kung et al. (1975) combines a Proof We show that the solution of the LSP cannot be a dimension sweep algorithm with a divide and conquer dominated point, and therefore, if it exists, it must neces- algorithm and finds the minimal set in time Oðnðlog nÞÞ for sarily be a non-dominated point. Consider a solution of the d 2 m LSP against some weights w 2 R , say x and suppose d ¼ 2 and in time Oðnðlog nÞ Þ for d 3. Hence, in case 123 A tutorial on multiobjective optimization: fundamentals and evolutionary methods this minimal point is dominated. Then there exists an the weighted Chebychev distance to a reference point as an objective vector y 2 fðXÞ with 8i 2f1; ...; mg y  f ðx Þ objective function. i i and for some index j 2f1; ...; mg it holds that y \f ðx Þ. j j P P Definition 17 Chebychev Scalarization Problem. The m m Hence, it must also hold that w y \ w f ðx Þ, i i i i i¼1 i¼1 Chebychev scalarization problem (CSP) of an MOP using a which contradicts the assumption that x is minimal. h weight vector k 2 R , is given by In the literature the notion of convexity (concavity) of minimize max k jf ðxÞ z j; x 2X ; i i i2f1;...;mg Pareto fronts is for the most part not defined formally. Possible formal definitions for convexity and concavity are where z is a reference point, i.e., the ideal point defined as as follows. z ¼ inf f ðxÞ with i ¼ 1; ; m. x2X i Definition 15 Convex Pareto front. A Pareto front is Proposition 10 Let us assume a given set of mutually non- convex if and only if Y R is convex. N m dominated solutions in R (e.g., a Pareto front). Then for every non-dominated point p there exists a set of weights Definition 16 Concave Pareto front. A Pareto front is for a CSP, that makes this point a minimizer of the CSP concave if and only if Y R is convex. provided the reference point z is properly chosen (i.e., the Proposition 9 In case of a convex Pareto front, for each vector p z either lies in the positive or negative solution in Y there is a solution of a linear scalarization orthant). problem for some weight vector w 2 R . Practically speaking, Proposition 10 ensures that by changing the weights, all points of the Pareto front can, in If the Pareto front is non-convex, then, in general, there principle, occur as minimizers of CSP. For the two can be points on the Pareto front which are the solutions of example Pareto fronts, the minimizers of the Chebychev no LSP. Practically speaking, in the case of concave Pareto scalarization function are points on the iso-height lines of fronts, the LSP will tend to give only extremal solutions, the smallest CSP function value which still intersect with that is, solutions that are optimal in one of the objectives. the Pareto front. Clearly, such points are potentially found This phenomenon is illustrated in Fig. 2, where the tan- in convex parts of Pareto fronts as illustrated in Fig. 3 (left) gential points of the dashed lines indicate the solution as well as in concave parts (right).However, it is easy to obtained by minimizing an LSP for different weight choi- construct examples where a CSP obtains minimizers in ces (colors). In the case of the non-convex Pareto front weakly dominated points (see Definition 8). Think for (Fig. 2, right), even equal weights (dark green) cannot lead 2 instance of the case fðXÞ ¼ ½0; 1 . In this case all points on to a solution in the middle part of the Pareto front. Also, by > > the line segment ð0; 0Þ ; ð0; 1Þ and on the line segment solving a series of LSPs with minimizing different > > weighted aggregation functions, it is not possible to obtain ð0; 0Þ ð1; 0Þ are solutions of some Chebychev scalariza- this interesting part of the Pareto front. tion. (The ideal point is 0 ¼ð0; 0Þ , one can take as weights (0, 1) for the first scalarization, and (1, 0) for the 4.1.1 Chebychev scalarization second scalarization; the Pareto front is equal to fð0; 0Þ g). In order to prevent this, the augmented Chebychev Another means of scalarization, that will also uncover scalarization provides a solution. It reads: points in concave parts of the Pareto front, is to formulate Fig. 2 Linear scalarization problems with different weights f f 2 2 for (1) convex Pareto fronts, and (2) concave Pareto fronts 3 3 ∗ ∗ 2 y 2 1 1 f f 1 1 1 2 3 1 2 3 (0, 0) (0, 0) 123 M. T. M. Emmerich , A. H. Deutz Fig. 3 Chebychev scalarization problems with different weights f f 2 2 for (1) convex Pareto fronts, and (2) concave Pareto fronts 3 3 ∗ 2 2 ∗ y 1 1 f f 1 1 1 2 3 1 2 3 (0, 0) (0, 0) Fig. 4 Re-formulation of f f 2 2 multiobjective optimziation problems as single-objective constraint handling optimization problems 3 3 2 2 1 1 f f 1 1 1 2 3 1 2 3 (0, 0) (0, 0) difficult to choose an appropriate range for the  values, if minimize max k f ðxÞþ  f ðxÞ; x 2X ; ð2Þ i i i there is no prior knowledge of the location of the Pareto i2f1;...;mg i¼1 m front in R . where  is a sufficiently small, positive constant. 4.1.3 Boundary intersection methods 4.1.2 -constraint method Another often suggested way to find an optimizer is to A rather straightforward approach to turn a multiobjective search for intersection points of rays with the attained optimization problem into a constraint single-objective subset fðXÞ (Jaszkiewicz and Słowin ´ ski 1999). For this optimization problem is the -constraint method. method, one needs to choose a reference point in R , say r, which, if possible, dominates all points in the Pareto front. Definition 18 –constraint Scalarization. Given a MOP, Alternatively, in the Normal Boundary Intersection method the –constraint scalarization is defined as follows. Given (Das and Dennis 1998) the rays can emanate from a line (in m 1 constants  2 R; ...; 2 R, 1 m 1 the bi-objective case) or an m 1 dimensional hyperplane, minimize f ðxÞ; subject to g ðxÞ  ; ...; g ðxÞ  ; 1 1 1 m 1 m 1 in which case lines originate from different evenly spaced reference points (Das and Dennis 1998). Then the follow- where f ; g ; ...; g constitute the m components of 1 1 m 1 ing problem is solved: vector function f of the multiobjective optimization prob- Definition 19 Boundary Intersection Problem. Let d 2 lem (see Definition 1). m m R denote a direction vector and r 2 R denote the ref- erence vector. Then the boundary intersection problem is The method is illustrated in Fig. 4 (left) for  ¼ 2:5 for formulated as: a biobjective problem. Again, by varying the constants 2 R; ...; 2 R, one can obtain different points on 1 m 1 the Pareto front. And again, among the solutions weakly dominated solutions may occur. It can, moreover, be 123 A tutorial on multiobjective optimization: fundamentals and evolutionary methods conditions and for which all objective functions are convex minimize t; in some open -ball B ðxÞ around x. subject to Remark 14 The equation in the Fritz–John Condition ðaÞ r þ td fðxÞ¼ 0; typically does not result in a unique solution. For an n- ðbÞ x 2X ; and dimensional decision space X we have n þ 1 equations and ðcÞ t 2 R we have m þ n unknowns (including the k multipliers). Hence, in a non-degenerate case, the solution set is of dimension m 1. Constraints (a) and (b) in the above problem formulation It is possible to use continuation and homotopy methods enforce that the point is on the ray and also that there exists to obtain all the solutions. The main idea of continuation a pre-image of the point in the decision space. Because t is methods is to find a single solution of the equation system minimized, we obtain the point that is closest to the ref- and then to expand the solution set in the neighborhood of erence point in the direction of d. This method allows some this solution. To decide in which direction to expand, it is intuitive control on the position of resulting Pareto front necessary to maintain an archive, say A, of points that have points. Excepting rare degenerate cases, it will obtain already been obtained. To obtain a new point x in the points on the boundary of the attainable set fðXÞ. However, new neighborhood of a given point from the archive x 2 A the it also requires an approximate knowledge of the position homotopy method conducts the following steps: of the Pareto front. Moreover, it might result in dominated points if the Pareto front is not convex. The method is 1. Using the implicit function theorem a tangent space at illustrated in Fig. 4 (left) for a single direction and refer- the current point is obtained. It yielded an m 1 ence point. dimensional hyperplane that is tangential to fðxÞ and used to obtain a predictor. See for the implicit function theorem, for instance, Krantz and Parks (2003). 5 Numerical algorithms 2. A point on the hyperplane in the desired direction is obtained, thereby avoiding regions that are already Many of the numerical algorithms for solving multiobjec- well covered in A. tive optimization problems make use of scalarization with 3. A corrector is computed minimizing the residual varying parameters. It is then possible to use single-ob- jj k f ðxÞjj. i i jective numerical optimization methods for finding differ- 4. In case the corrector method succeeded to obtain a new ent points on the Pareto front. point in the desired neighborhood, the new point is Besides these, there are methods that focus on solving added to the archive. Otherwise, the direction is saved the Karush-Kuhn-Tucker conditions. These methods aim (to avoid trying it a second time). for covering all solutions to the typically underdetermined See Hillermeier (2001) and Schu ¨ tze et al. (2005) for nonlinear equation system given by these condition. Again, examples and more detailed descriptions. The continuation for the sake of clarity and brevity, in the following treat- and homotopy methods require the efficient set to be ment, we will focus on the unconstrained case, noting that connected. Moreover, they require points to satisfy certain the full Karush-Kuhn-Tucker and Fritz-John conditions regularity conditions (local convexity and differentiability). also feature equality and inequality constraints (Kuhn and Global multiobjective optimization research is still a Tucker 1951). very active field of research. There are some promising Definition 20 Local Efficient Point. A point x 2X is directions, such as subdivision techniques (Dellnitz et al. locally efficient, if there exists  2 R such that [ 0 2005), Bayesian global optimization (Emmerich et al. 6 9y 2B ðxÞ : y  x and x 6¼ y, where B ðxÞ denotes the ˇ f  2016), and Lipschitz optimization (Zilinskas 2013). How- open -ball around x. ever, these require the decision space to be of low dimension. Theorem 1 Fritz–John Conditions. A neccessary condi- Moreover, there is active research on derivative-free tion for x 2X to be locally efficient is given by methods for numerical multiobjective optimization. Direct m m X X search techniques have been devised, for instance, Custo- 9k 0 : k rf ðxÞ¼ 0 and k ¼ 1: i i i dio et al. (2011), and by Audet et al. (2010). For a sum- i¼1 i¼1 mary of derivative-free methods, see Custo ´ dio et al. Theorem 2 Karush–Kuhn–Tucker Conditions. A point (2012). x 2X is locally efficient, if it satisfies the Fritz–John 123 M. T. M. Emmerich , A. H. Deutz spaces. Whereas the selection operators stay the same, the 6 Evolutionary multiobjective optimization variation operators (mutation, recombination) must be adapted to the representations of solutions in the decision Evolutionary algorithms are a major branch of bio-inspired space. search heuristics, which originated in the 1960ties and are There are currently three main paradigms for MOEA widely applied to solve combinatorial and non-convex designs. These are: numerical optimization problems. In short, they use para- digms from natural evolution, such as selection, recombi- 1. Pareto based MOEAs: The Pareto based MOEAs use a nation, and mutation to steer a population (set) of two-level ranking scheme. The Pareto dominance individuals (decision vectors) towards optimal or near-op- relation governs the first ranking and contributions of timal solutions (Ba ¨ck 1996). points to diversity is the principle of the second level Multiobjective evolutionary algorithms (MOEAs) gen- ranking. The second level ranking applies to points that eralize this idea, and typically they are designed to grad- share the same position in the first ranking. NSGA-II ually approach sets of Pareto optimal solutions that are (see Deb et al. 2002) and SPEA2 (see Zitzler and well-distributed across the Pareto front. As there are—in Thiele 1999) are two popular algorithms that fall into general—no single-best solutions in multiobjective opti- this category. mization, the selection schemes of such algorithms differ 2. Indicator based MOEAs: These MOEAs are guided by from those used in single-objective optimization. First an indicator that measures the performance of a set, for MOEAs were developed in the 1990ties—see, e.g., Kur- instance, the hypervolume indicator or the R2 indica- sawe (1990) and Fonseca and Fleming (1993), but since tor. The MOEAs are designed in a way that improve- around the year 2001, after the first book devoted exclu- ments concerning this indicator determine the selection sively to this topic was published by Deb (2001), the procedure or the ranking of individuals. number of methods and results in this field grew rapidly. 3. Decomposition based MOEAs: Here, the algorithm With some exceptions, the distinction between different decomposes the problem into several subproblems, classes of evolutionary multiobjective optimization algo- each one of them targeting different parts of the Pareto rithms is mainly due to the differences in the paradigms front. For each subproblem, a different parametrization used to define the selection operators, whereas the choice (or weighting) of a scalarization method is used. of the variation operators is generic and dependent on the MOEA/D and NSGA-III are well-known methods in problem. As an example, one might consider NSGA-II (see this domain. Deb et al. 2002) as a typical evolutionary multiobjective In this tutorial, we will introduce typical algorithms for optimization algorithm; NSGA-II can be applied to con- each of these paradigms: NSGA-II, SMS-EMOA, and tinuous search spaces as well as to combinatorial search Algorithm 1 NSGA-II Algorithm 1: initialize population P ⊂X 2: while not terminate do 3: {Begin variate} 4: Q ←∅ 5: for all i ∈{1,...,μ} do (1) (2) (1) (2) 6: (x , x ) ← select mates(P ) {select two parent individuals x ∈ P and x ∈ t t P } (i) (1) (2) 7: r ← recombine(x , x ) (i) 8: q ← mutate(r) (i) 9: Q ← Q ∪{q } t t 10: end for 11: {End variate} 12: {Selection step, select μ-”best” out of (P ∪ Q ) by a two step procedure:} t t 13: (R , ..., R ) ← non-dom sort(f,P ∪ Q ) 1 t t 14: Find the element of the partition, R , for which the sum of the cardinalities |R | + i 1 · ··+|R | is for the first time ≥ μ.If |R |+···+|R | = μ, P ←∪ R , otherwise i 1 i t+1 i µ µ i=1 determine set H containing μ − (|R | + ··· + |R |) elements from R with the 1 i −1 i µ µ i −1 highest crowding distance and P ← (∪ R ) ∪ H. t+1 i i=1 15: {End of selection step.} 16: t ← t +1 17: end while 18: return P 123 A tutorial on multiobjective optimization: fundamentals and evolutionary methods Fig. 5 Illustration of non- f f 2 2 dominated sorting (left) and (9) crowding distance (right) (1) (5) (1) y y y (6) (8) 3 3 y y (2) (2) 2 y 2 y (3) (7) (3) y y y (4) (4) 1 y 1 y (5) f f 1 1 1 2 3 1 2 3 (0, 0) (0, 0) MOEA/D. We will discuss important design choices, and two levels. First, non-dominated sorting is performed. This how and why other, similar algorithms deviate in these ranking solely depends on the Pareto order and does not choices. depend on diversity. Secondly, individuals which share the same rank after the first ranking are then ranked according 6.1 Pareto based algorithms: NSGA-II to the crowding distance criterion which is a strong reflection of the diversity. The basic loop of NSGA-II (Deb et al. 2002) is given by Let NDðPÞ denote the non-dominated solutions in some Algorithm 1. population. Non-dominated sorting partitions the popula- tions into subsets (layers) based on Pareto non-dominance Firstly, a population of points is initialized. Then the and it can be specified through recursion as follows. following generational loop is repeated. This loop consists R ¼NDðPÞð3Þ of two parts. In the first, the population undergoes a vari- ation. In the second part, a selection takes place which R ¼NDðP n[ R Þ; k ¼ 1; 2; ... ð4Þ kþ1 i i¼1 results in the new generation-population. The generational As in each step of the recursion at least one solution is loop repeats until it meets some termination criterion, removed from the population, the maximal number of which could be convergence detection criterion (cf. Wag- layers is |P|. We will use the index ‘ to denote the highest ner et al. 2009) or the exceedance of a maximal compu- non-empty layer. The rank of the solution after non-dom- tational budget. inated sorting is given by the subindex k of R . It is clear In the variation part of the loop k offspring are gener- that solutions in the same layer are mutually incomparable. ated. For each offspring, two parents are selected. Each one The non-dominated sorting procedure is illustrated in of them is selected using binary tournament selection, that Fig. 5 (upper left). The solutions are ranked as follows is drawing randomly two individuals from P and selecting ð1Þ ð2Þ ð3Þ ð4Þ ð5Þ ð6Þ ð7Þ R ¼fy ; y ; y ; y g, R ¼fy ; y ; y g, R ¼ 1 2 3 the better one concerning its rank in the population. The ð8Þ ð9Þ fy ; y g. parents are then recombined using a standard recombina- Now, if there is more than one solution in a layer, say R, tion operator. For real-valued problems simulated binary a secondary ranking procedure is used to rank solutions crossover (SBX) is used (see Deb and Argawal 1995). within that layer. This procedure applies the crowding Then the resulting individual is mutated. For real-valued distance criterion. The crowding distance of a solution x 2 problem polynomial mutation (PM) is used (see Mateo and R is computed by a sum over contributions c of the i-th Alberto 2012). This way, k individuals are created, which objective function: are all combinations or modifications of individuals in P . Then the parent and the offspring populations are merged l ðxÞ :¼ maxðff ðyÞjy 2 R nfxg^ f ðyÞ f ðxÞg [ f 1gÞ i i i i into P [ Q . t t ð5Þ In the second part, the selection part, the l best indi- viduals of P [ Q with respect to a multiobjective ranking u ðxÞ :¼ minðff ðyÞjy 2 R nfxg^ f ðyÞ f ðxÞg [ f1gÞ t t i i i i are selected as the new population P . tþ1 ð6Þ Next we digress in order to explain the multiobjective c ðxÞ :¼u l ; i ¼ 1; ...; m ð7Þ i i i ranking which is used in NSGA-II. The key ingredient of NSGA-II that distinguishes it from genetic algorithms for The crowding distance is now given as: single-objective optimization, is the way the individuals are ranked. The ranking procedure of NSGA-II consists of 123 M. T. M. Emmerich , A. H. Deutz X individual depends on how many other individuals it cðxÞ :¼ c ðxÞ; x 2 R ð8Þ dominates and by how many other individuals dominate it. i¼1 Moreover, clustering serves as a secondary ranking crite- For m ¼ 2 the crowding distances of a set of mutually rion. Both operators have been refined in SPEA2 (Zitzler non-dominated points are illustrated in Fig. 5 (upper right). et al. 2001), and also it features a strategy to maintain an In this particular case, they are proportional to the archive of non-dominated solutions. The Multiobjective perimeter of a rectangle that just is intersecting with the Micro GA. Coello and Pulido (2001) is an algorithm that uses a very neighboring points (up to a factor of ). Practically small population size in conjunction with an archive. speaking, the value of l is determined by the nearest Finally, the Differential Evolution Multiobjective Opti- neighbor of x to the left according to the i-coordinate, and mization (DEMO) (Robic and Filipic 2005) algorithm l is equal to the i-th coordinate of this nearest neighbor, combines concepts from Pareto-based MOEAs with a similarly the value of u is determined by the nearest variation operator from differential evolution, which leads neighbor of x to the right according to the i-coordinate, to improved efficiency and more precise results in partic- and u is equal to the i-th coordinate of this right nearest ular for continuous problems. neighbor. The more space there is around a solution, the higher is the crowding distance. Therefore, solutions with a 6.2 Indicator-based algorithms: SMS-EMOA high crowding distance should be ranked better than those with a low crowding distance in order to maintain diversity A second algorithm that we will discuss is a classical in the population. This way we establish a second order algorithm following the paradigm of indicator-based mul- ranking. If the crowding distance is the same for two tiobjective optimization. In the context of MOEAs, by a points, then it is randomly decided which point is ranked performance indicator (or just indicator), we denote a higher. scalar measure of the quality of a Pareto front approxi- Now we explain the non-dom_sort procedure in line 13 mation. Indicators can be unary, meaning that they yield an of Algorithm 1 the role of P is taken over by P \ Q :In t t absolute measure of the quality of a Pareto front approxi- order to select the l best members of P [ Q according to t t mation. They are called binary, whenever they measure the above described two level ranking, we proceed as how much better one Pareto front approximation is con- follows. Create the partition R ; R ; ; R of P [ Q as 1 2 ‘ t t cerning another Pareto front approximation. described above. For this partition one finds the first index The SMS-EMOA (Emmerich et al. 2005) uses the i for which the sum of the cardinalities jR j þ  þ jR j is l 1 i hypervolume indicator as a performance indicator. Theo- for the first time l.If jR j þ  þ jR j¼ l, then set 1 i i retical analysis attests that this indicator has some favor- P to [ R , otherwise determine the set H containing tþ1 i i¼1 able properties, as the maximization of it yields l ðjR j þ  þ jR jÞ elements from R with the 1 i 1 i l l approximations of the Pareto front with points located on highest crowding distance and set the next generation- the Pareto front and well distributed across the Pareto front. i 1 population, P ,to ð[ R Þ[ H. tþ1 i The hypervolume indicator measures the size of the dom- i¼1 Pareto-based Algorithms are probably the largest class inated space, bound from above by a reference point. of MOEAs. They have in common that they combine a For an approximation set A  R it is defined as ranking criterion based on Pareto dominance with a follows: diversity based secondary ranking. Other common algo- HIðAÞ¼ Volðfy 2 R : y  r^9a 2 A: a  ygÞ Pareto Pareto rithms that belong to this class are as follows. The Mul- ð9Þ tiobjective Genetic Algorithm (MOGA) (Fonseca and Fleming 1993), which was one of the first MOEAs. The Here, Vol ð:Þ denotes the Lebesgue measure of a set in PAES (Knowles and Corne 2000), which uses a grid par- dimension m. This is length for m ¼ 1, area for m ¼ 2, titioning of the objective space in order to make sure that volume for m ¼ 3, and hypervolume for m 4. Practically certain regions of the objective space do not get too speaking, the hypervolume indicator of A measures the size crowded. Within a single grid cell, only one solution is of the space that is dominated by A. The closer points move selected. The Strength Pareto Evolutionary Algorithm to the Pareto front, and the more they distribute along the (SPEA) (Zitzler and Thiele 1999) uses a different criterion Pareto front, the more space gets dominated. As the size of for ranking based on Pareto dominance. The strength of an 123 A tutorial on multiobjective optimization: fundamentals and evolutionary methods Algorithm 2 SMS-EMOA initialize P ⊂X while not terminate do {Begin variate} (1) (2) (1) (2) (x , x ) ← select mates(P ) {select two parent individuals x ∈ P and x ∈ P } t t t (1) (2) c ← recombine(x , x ) q ← mutate(c ) t t {End variate} {Begin selection} P ← select (P ∪{q }) {Select subset of size μ with maximal hypervolume indicator t+1 t t from P ∪{q }} {End selection} t ← t +1 end while return P Fig. 6 Illustration of 2-D f f 2 2 r r hypervolume (top left), 2-d hypervolume contributions (top (1) (1) y y right), 3-D hypervolume 3 Y ⊕ R 3 (bottom left), and 3-D hypervolume contributions (2) (2) 2 y 2 y (bottom right) (3) (3) y y (4) (4) 1 y 1 y (5) (5) y y f f 1 1 1 2 3 1 2 3 (0, 0) (0, 0) (4) (4) y y f f 3 3 (3) (3) y y (2) (2) y y (1) (1) y y (5) (5) y y f f f f 1 2 1 2 the dominated space is infinite, it is necessary to bound it. remains unchanged, the hypervolume indicator of P can For this reason, the reference point r is introduced. only grow or stay equal with an increasing number of The SMS-EMOA seeks to maximize the hypervolume iterations t. indicator of a population which serves as an approximation Next, the details of the selection procedure will be dis- set. This is achieved by considering the contribution of cussed. If all solutions in P are non-dominated, the points to the hypervolume indicator in the selection pro- selection of a subset of maximal hypervolume is equivalent cedure. Algorithm 2 describes the basic loop of the stan- to deleting the point with the smallest (exclusive) hyper- dard implementation of the SMS-EMOA. volume contribution. The hypervolume contribution is The algorithm starts with the initialization of a popula- defined as: tion in the search space. Then it creates only one offspring DHIðy; YÞ¼ HIðYÞ HIðY nfygÞ individual by recombination and mutation. This new indi- vidual enters the population, which has now size l þ 1. To An illustration of the hypervolume indicator and reduce the population size again to the size of l, a subset of hypervolume contributions for m ¼ 2 and, respectively, size l with maximal hypervolume is selected. This way as m ¼ 3 is given in Fig. 6. Efficient computation of all long as the reference point for computing the hypervolume hypervolume contributions of a population can be achieved 123 M. T. M. Emmerich , A. H. Deutz in time Hðl log lÞ for m ¼ 2 and m ¼ 3 (Emmerich and scaling of the objective functions. Although the hypervol- Fonseca 2011). For m ¼ 3 or 4, fast implementations are ume indicator has been very prominent in IBEAs, there are described in Guerreiro and Fonseca (2017). Moreover, for some algorithms using other indicators, notably this is the fast logarithmic-time incremental updates for 2-D an R2 indicator (Trautmann et al. 2013), which features an algorithm is described in Hupkens and Emmerich (2013). ideal point as a reference point, and the averaged Hausdorff For achieving logarithmic time updates in SMS-EMOA, distance (D indicator) (Rudolph et al. 2016), which the non-dominated sorting procedure was replaced by a requires an aspiration set or estimation of the Pareto front procedure, that sorts dominated solutions based on age. For which is dynamically updated and used as a reference. The m [ 2, fast incremental updates of the hypervolume indi- idea of aspiration sets for indicators that require knowledge cator and its contributions were proposed in for more than of the ‘true’ Pareto front also occurred in conjunction with two dimensions (Guerreiro and Fonseca 2017). the a-indicator (Wagner et al. 2015), which generalizes the In case dominated solutions appear the standard imple- approximation ratio in numerical single-objective opti- mentation of SMS-EMOA partitions the population into mization. The Portfolio Selection Multiobjective Opti- layers of equal dominance ranks, just like in NSGA-II. mization Algorithm (POSEA) (Yevseyeva et al. 2014) uses Subsequently, the solution with the smallest hypervolume the Sharpe Index from financial portfolio theory as an contribution on the worst ranked layer gets discarded. indicator, which applies the hypervolume indicator of SMS-EMOA typically converges to regularly spaced singletons as a utility function and a definition of the Pareto front approximations. The density of these approx- covariances based on their overlap. The Sharpe index imations depends on the local curvature of the Pareto front. combines the cumulated performance of single individuals For biobjective problems, it is highest at points where the with the covariance information (related to diversity), and slope is equal to 45 (Auger et al. 2009). It is possible to it has interesting theoretical properties. influence the distribution of the points in the approximation set by using a generalized cone-based hypervolume indi- 6.3 Decomposition-based algorithm: MOEA/D cator. These indicators measure the hypervolume domi- nated by a cone-order of a given cone, and the resulting Decomposition-based algorithms divide the problem into optimal distribution gets more uniform if the cones are subproblems using scalarizations based on different acute, and more concentrated when using obtuse cones (see weights. Each scalarization defines a subproblem. The Emmerich et al. 2013). subproblems are then solved simultaneously by dynami- Besides the SMS-EMOA, there are various other indi- cally assigning and re-assigning points to subproblems and cator-based MOEAs. Some of them also use the hyper- exchanging information from solutions to neighboring sub- volume indicator. The original idea to use the hypervolume problems. indicator in an MOEA was proposed in the context of The method defines neighborhoods on the set of these archiving methods for non-dominated points. Here the subproblems based on the distances between their aggre- hypervolume indicator was used for keeping a bounded- gation coefficient vectors. When optimizing a subproblem, size archive (Knowles et al. 2003). Besides, in an early information from neighboring subproblems can be work hypervolume-based selection which also introduced a exchanged, thereby increasing the efficiency of the search novel mutation scheme, which was the focus of the paper as compared to methods that independently solve the (Huband et al. 2003). The term Indicator-based Evolu- subproblems. tionary Algorithms (IBEA) (Zitzler and Ku ¨ nzli 2004) was MOEA/D (Zhang and Li 2007) is a very commonly used introduced in a paper that proposed an algorithm design, in decomposition based method, that succeeded a couple of which the choice of indicators is generic. The hypervol- preceding algorithms based on the idea of combining ume-based IBEA was discussed as one instance of this decomposition, scalarization and local search(Ishibuchi class. Its design is however different to SMS-EMOA and and Murata 1996; Jin et al. 2001; Jaszkiewicz 2002). Note makes no specific use of the characteristics of the hyper- that even the early proposed algorithms VEGA (Schaffer volume indicator. The Hypervolume Estimation Algorithm 1985) and the vector optimization approach of Kursawe (HypE) (Bader and Zitzler 2011) uses a Monte Carlo (see Kursawe 1990) can be considered as rudimentary Estimation for the hypervolume in high dimensions and decomposition based approaches, where these algorithms thus it can be used for optimization with a high number of obtain a problem decomposition by assigning different objectives (so-called many-objective optimization prob- members of a population to different objective functions. lems). MO-CMA-ES (Igel et al. 2006) is another hyper- These early algorithmic designs used subpopulations to volume-based MOEA. It uses the covariance-matrix solve different scalarized problems. In contrast, in MOEA/ adaptation in its mutation operator, which enables it to adapt its mutation distribution to the local curvature and 123 A tutorial on multiobjective optimization: fundamentals and evolutionary methods is denoted with B(i). Given these preliminaries, the MOEA/ D one population with interacting neighboring individuals is applied, which reduces the complexity of the algorithm. D algorithm—using Chebychev scalarization— reads as described in Algorithm 3. Typically, MOEA/D works with Chebychev scalariza- tions, but the authors also suggest other scalarization Note the following two remarks about MOEA/D: (1) Many parts of the algorithm are kept generic. Here, generic methods, namely scalarization based on linear weighting— which however has problems with approximating non- options are recombination, typically instantiated by stan- dard recombination operators from genetic algorithms, and convex Pareto fronts—and scalarization based on boundary intersection methods—which requires additional parame- local improvement heuristic. The local improvement heuristic should find a solution in the vicinity of a given ters and might also obtain strictly dominated points. solution that does not violate constraints and has a rela- MOEA/D evolves a population of individuals, each ðiÞ tively good performance concerning the objective function individual x 2 P being associated with a weight vector ðiÞ ðiÞ values. (2) MOEA/D has additional statements to collect all k . The weight vectors k are evenly distributed in the non-dominated solutions it generates during a run in an search space, e.g., for two objectives a possible choice is: external archive. Because this external archive is only used ðiÞ > k i i k ¼ð ; Þ ; i ¼ 0; ...; l. k k in the final output and does not influence the search The i-th subproblem gðxjk ; z Þ is defined by the Che- dynamics, it can be seen as a generic feature of the algo- bychev scalarization function (see also Eq. 2): rithm. In principle, an external archive can be used in all EMOAs and could therefore also be done in SMS-EMOA ðiÞ ðiÞ gðxjk ; z Þ¼ max fk jf ðxÞ z jg þ  f ðxÞ z j j j j j and NSGA-II. To make comparisons to NSGA-II and j2f1;...;mg j¼1 SMS-EMOA easier, we omitted the archiving strategy in ð10Þ the description. Recently, decomposition-based MOEAs became very The main idea is that in the creation of a new candidate popular, also because they scale well to problems with solution for the i-th individual the neighbors of this indi- many objective functions. The NSGA-III (Deb and Jain vidual are considered. A neighbor is an incumbent solution 2014) algorithm is specially designed for many-objective optimization and uses a set of reference points that is dynamically updated in its decomposition. Another Algorithm 3 MOEA/D (1) (µ) input: Λ = {λ , ..., λ }{weight vectors} input: z : reference point for Chebychev distance initialize P ⊂X (i) initialize neighborhoods B(i) by collecting k nearest weight vectors in Λ for each λ while not terminate do for all i ∈{1, ..., μ} do (1) (2) Select randomly two solutions x , x in the neighborhood B(i). (1) (2) y ← Recombine x , x by a problem specific recombination operator. y ← Local problem specific, heuristic improvement of y, e.g. local search, based on (i) ∗ the scalarized objective function g(x|λ , z ). (i) ∗ (i) (i) ∗ if g(y |λ , z ) <g(x |λ , z ) then (i) x ← y end if Update z , if neccessary, i.e, one of its component is larger than one of the corre- (i) sponding components of f(x ). end for t ← t +1 end while return P of a subproblem with similar weight vectors. The neigh- decomposition based technique is called Generalized borhood of i-th individual is the set of k subproblems, for Decomposition (Giagkiozis et al. 2014). It uses a mathe- ðiÞ matical programming solver to compute updates, and it was so predefined constant k, that is closest to k in the shown to perform well on continuous problems. The Euclidean distance, including the i-th subproblem itself. It combination of mathematical programming and decom- position techniques is also explored in other, more novel, hybrid techniques, such as Directed Search (Schu ¨ tze et al. 123 M. T. M. Emmerich , A. H. Deutz Fig. 7 In the left picture, the set f f 2 2 of points denoted by blue squares is better than (.) the set consisting of the red-circle ◦ ◦ points. Also in the right picture ◦ ◦ the set consisting of blue squares is better than the set of ◦ ◦ red-circle points—in this case . . . . the intersection of the two sets is . . non-empty 2 2 1 1 f f 1 1 ... .. . 1 2 1 2 (0, 0) (0, 0) 2016), which utilizes the Jacobian matrix of the vector- in R . We say that A is better than B if and only if every valued objective function (or approximations to it) to find b 2 B is weakly dominated by at least one element a 2 A promising directions in the search space, based on desired and A 6¼ B. Notation: A.B. directions in the objective space. In Fig. 7 examples are given of the case of one set being better than another and in Fig. 8 examples are given of the case that a set is not better than another. 7 Performance assessment This relation on sets has been used in Zitzler et al. (2003) to classify performance indicators for Pareto fronts. In order to make a judgement (that is, gain insight into the To do so, they introduced the notion of completeness and advantages and disadvantages) of multiobjective evolu- compatibility of these indicators with respect to the set tionary (or for that matter also deterministic) optimizers we relation ‘is better than’. need to take into account (1) the computational resources used, and (2) the quality of the produced result(s). Definition 23 Unary Set Indicator. A unary set indicator is The current state of the art of multiobjective optimiza- a mapping from finite subsets of the objective space to the set of real numbers. It is used to compare (finite) approx- tion approaches are mainly compared empirically though theoretical analyses exist (see, for instance, the conver- imations to the Pareto front. gence results described in Rudolph and Agapie (2000), Definition 24 Compatibility of Unary Set Indicators Beume et al. (2011) albeit for rather simple problems as concerning the ‘is better than’ order on Approximation more realistic problems elude mathematical analysis. Sets. A unary set indicator I is compatible concerning the The most commonly computational resource which is ‘is better than’ or .-relation if and only if taken into account is the computation time which is very IðAÞ [ IðBÞ) A.B. A unary set indicator I is complete often measured implicitly by counting fitness function with respect to the ‘is better than’ or .-relation if and only evaluations—in this respect, there is no difference with if A.B ) IðAÞ [ IðBÞ. If in the last definition we replace[ single-objective optimization. In contrast to single-objec- by then the indicator is called weakly-complete. tive optimization, in multiobjective optimization, a close The hypervolume indicator and some of its variations distance to a (Pareto) optimal solution is not the only thing required but also good coverage of the entire Pareto front. are complete. Other indicators compared in the paper (Zitzler et al. 2003) are weakly-complete or not even As the results of multiobjective optimization algorithms are (finite) approximation sets to the Pareto front we need to be weakly-complete. It has been proven in the same paper that no unary indicator exists that is complete and compatible at able to say when one Pareto front approximation is better than another. One good way to define when one approxi- the same time. Moreover for the hypervolume indicator mation set is better than another is as in Definition 22 (see HI it has be shown that HI ðAÞ [ HI ðBÞ) :ðB.AÞ. Zitzler et al. 2003). The latter we call weakly-compatible. In all the discussions of the hypervolume indicator, the Definition 21 Approximation Set of a Pareto Front. A assumption is that all points of the approximation sets finite subset A of R is an approximation set of a Pareto under consideration dominate the reference point. front if and only if A consists of mutually (Pareto) non- Recently, variations of the hypervolume indicator have dominated points. been proposed—the so-called free hypervolume indica- tors—that do not require the definition of a reference point Definition 22 Comparing Approximation Sets of a Pareto Front. Let A and B be approximation sets of a Pareto front 123 A tutorial on multiobjective optimization: fundamentals and evolutionary methods Fig. 8 In each of the pictures, f f 2 2 the set consisting of the blue square points is not better than the set consisting of the red ◦ ◦ circle points. Clearly, in each of the two pictures on the right the set consisting of the red circle points is better than the set . . . . ◦ consisting of the blue square . . points 2 2 1 1 f f 1 1 ... .. . 1 2 1 2 (0, 0) (0, 0) f f 2 2 . . . . ◦ . . ◦ ◦ 2 2 1 1 f f 1 1 ... .. . 1 2 1 2 (0, 0) (0, 0) and are complete and weakly-compatible for all approxi- properties hold: A.B ) I ðB; AÞ [ 0, the second mation sets (Zitzler et al. 2003). notable property is as follows: Besides unary indicators, one has introduced binary I ðA; BÞ 0 and I ðB; AÞ [ 0 ) A.B. These two proper- indicators (see Riquelme et al. 2015). The most used ones ties show that based on the binary -indicator it is possible are unary indicators followed up by binary indicators. For to decide whether or not A is better than B. In contrast, the binary indicators, the input is a pair of approximation sets knowledge of the hypervolume indicator on the sets A and and the output is again a real number. Here the goal is to B does not allow to decide whether or not A is better than determine which of the two approximation sets is the better B. one (and how much better it is) Some of indicators are useful in case there is knowledge . Binary indicators can also be used, if the true Pareto front is known, e.g., in bench- or complete knowledge about the Pareto front. For instance marking on test problems. A common binary indicator is (see Rudolph et al. 2016), it has been suggested to compute the binary -indicator. In order to introduce this indicator the Hausdorff distance (or variations of it) of an approxi- we first define for each d 2 R a binary relation on the mation set to the Pareto front. Moreover, the binary - points in R . indicator can be transformed into a complete unary indi- cator in case the second input is the known Pareto front— Definition 25 d-domination. Let d 2 R and let a 2 R note that this indicator needs to be minimized. and b 2 R . We say that a d-dominates b (notation: Another useful way to learn about the behavior of a  b) if and only if a  b þ d; i ¼ 1; ...; m. d i i evolutionary multiobjective algorithms is the attainment Next, we can define the binary indicator I . curve approach (see da Fonseca et al. 2001). The idea is to generalize the cumulative distribution function and for the Definition 26 The Binary Indicator I . Given two study of algorithms it is approximated empirically. The approximation sets A and B, then distribution is defined on the set of (finite) approximation I ðA; BÞ :¼ inf f8b 2 B 9a 2 A such that a  bg. d2R d sets of the Pareto front. For each point in the objective Clearly for a fixed B the smaller I ðA; BÞ is the better the space R it is the probability that the Pareto front approximation set A is relative to B. The following approximation attains this point (that is, it is either one point in the approximation set, or it is dominated by some point in the approximation set). Formally, it reads Conceivably one can can introduce k-ary indicators. To our knowledge, so far they have not been used in multiobjective optimization. 123 M. T. M. Emmerich , A. H. Deutz levels than 0.5 in order to get an optimistic or respectively a pessimistic assessment of the performance. In Fig. 9 an example of the median attainment curve is shown. We assume that the four approximation sets are provided by some algorithm. 8 Recent topics in multiobjective optimization Recently, there are many new developments in the field of multiobjective optimization. Next we will list some of the most important trends. ... 1 2 (0, 0) 8.1 Many-objective optimization Fig. 9 The median attainment curve for the case of four approxima- Optimization with more than 3 objectives is currently ter- tion sets; one approximation set consists of the blue squares, the second set consists of points denoted by brown triangles, the third med many-objective optimization [see, for instance, the consists of the red circles, and the fourth consists of points denoted by survey (Li et al. 2015)]. This is to stress the challenges one black crosses; the darker gray the region is the more approximation meets when dealing with more than 3 objectives. The main sets dominate it. The median attainment curve is the black polygonal reasons are: line 1. problems with many objectives have a Pareto front which cannot be visualized in conventional 2D or 3D plots instead other approaches to deal with this are ð1Þ ð2Þ ðkÞ Pða  z _ a  z _ ... _ a  zÞ; needed; ð1Þ ð2Þ ðkÞ 2. the computation time for many indicators and selection where A ¼fa ; a ; .. .; a g is the approximation set and schemes become computationally hard, for instance, z 2 R . In other words P is the probability of an algorithm time complexity of the hypervolume indicator compu- to find at least one solution which attains the goal z in a tation grows super-polynomially with the number of single run. We define the attainment function a : R ! objectives, under the assumption that P 6¼ NP; ½0; 1 associated to the approximation set A as follows: 3. last but not least the ratio of non-dominated points ð1Þ ð2Þ ðkÞ a ðzÞ :¼ Pða  z _ a  z _ ... _ a  zÞ: tends to increase rapidly with the number of objectives. For instance, the probability that a point is non- This function can be approximated by dominated in a uniformly distributed set of sample points grows exponentially fast towards 1 with the a ðzÞ :¼ IðA ; zÞ; s i number of objectives. i¼1 In the field of many-objective optimization different tech- where A ; ...; A are the outcome approximation sets of an 1 s niques are used to deal with these challenges. For the first algorithm in s runs of the algorithm and I : challenge, various visualization techniques are used such as ð set of approximation sets Þ R !f0; 1g is a func- projection to lower dimensional spaces or parallel coordi- tion which associates to an approximation set and a vector nate diagrams. In practice, one can, if the dimension is only in R the value 1 in case the vector is a member of the slightly bigger than 3, express the coordinate values by approximation set or if some element of the approximation colors and shape in 3D plots. set dominates it, otherwise the value is 0. For m ¼ 2or3 Naturally, in many-objective optimization indicators we can plot the boundaries where this function changes its which scale well with the number of objectives (say value. These are the attainment curves (m ¼ 2Þ and polynomially) are very much desired. Moreover, decom- attainment surfaces (m ¼ 3). In particular the median position based approaches are typically preferred to indi- attainment curve/surface gives a good summary of the cator based approaches. behavior of the optimizer. It is the boundary where the The last problem requires, however, more radical devi- function changes from a level below 0.5 to a level higher ations from standard approaches. In many cases, the than 0.5. Alternatively one can look at lower and higher reduction of the search space achieved by reducing it to the efficient set is not sufficiently adequate to allow for 123 A tutorial on multiobjective optimization: fundamentals and evolutionary methods subsequent decision making because too many alternatives simulations, and finite element computations of mechanical remain. In such cases, a stricter order than the Pareto order structures. To deal with such problems techniques that need is required which requires additional preference knowl- only a limited number of function evaluations have been edge. To elicit preference knowledge, interactive methods devised. A common approach is to learn a surrogate model often come to the rescue. Moreover, in some cases, of the objective functions by using all available past objectives are correlated which allows for grouping of evaluations. This is called surrogate model assisted opti- objectives, and in turn, such groups can be aggregated to a mization. One common approach is to optimize on the single objective. Dimensionality reduction and community surrogate model to predict promising locations for evalu- detection techniques have been proposed for identifying ation and use these evaluations to further improve the meaningful aggregation of objective functions. model. In such methods, it is also important to add points for developing the model in under-explored regions of the 8.2 Preference modeling search space. Some criteria such as expected improvement take both exploitation and exploration into account. Sec- The Pareto order is the most applied order in multiobjective ondly, surrogate models can be used in pre-processing in optimization. However, different ranking schemes and the selection phase of evolutionary algorithms. To save partial orders have been proposed in the literature for time, less interesting points can be discarded before they various reasons. Often additional knowledge of user pref- would be evaluated by the costly and precise evaluator. erences is integrated. For instance, One distinguishes Typically regression methods are used to construct surro- preference modeling according to at what stage of the gate models; Gaussian processes and neural networks are optimization the preference information is collected (a standard choices. Surrogate modeling has in the last decade priori, interactively, and a posteriori). Secondly one can been generalized to multiobjective optimization in various distinguish the type of information requested from the ways. Some important early work in this field was on decision maker, for instance, constraints on the trade-offs, surrogate assisted MOEAs (Emmerich et al. 2006) and relative importance of the objectives, and preferred regions ParEGO algorithm (Knowles 2006). A state of the art on the Pareto front. Another way to elicit preference review can be found in Allmendinger et al. (2017). information is by ordinal regression; here the user is asked for pairwise comparisons of some of the solutions. From 8.4 New bio-inspired paradigms this data, the weights of utility functions are learned (Branke et al. 2015). The interested reader is also referred Inspiration by nature has been a creative force for dealing to interesting work on non-monotonic utility functions, with optimization algorithm design. Apart from biological using the Choquet integral (Branke et al. 2016). Notably, evolution, many other natural phenomena have been con- the topic of preference elicitation is one of the main topics sidered. While many of these algorithmic ideas have so far in Multiple Criteria Decision Analysis (MCDA). In recent remained in a somewhat experimental and immature state, years collaboration between MCDA and multiobjective some non-evolutionary bio-inspired optimization algo- optimization (MOO) brought forward many new useful rithms have gained maturation and competitive perfor- approaches. A recommended reference for MCDA is Bel- mance. Among others, this seems to hold for particle ton and Stewart (2002). For a comprehensive overview of swarm optimization (Reyes-Sierra and Coello Coello preference modelling in multiobjective optimization we 2006), ant colony optimization (Baran and Schaerer 2003), refer to Li et al. (2017) and Hakanen et al. (2016). More- and artificial immune systems Coello Coello and Corte ´s over Greco et al. (2016) contains an updated collection of (2005). As with evolutionary algorithms, also these algo- state of the art surveys on MCDA. A good reference dis- rithms have first been developed for single-objective opti- cussing the integration of MCDA into MOO is Branke mization, and subsequently, they have been generalized to et al. (2008). multiobjective optimization. Moreover, there is some recent research on bio-inspired techniques that are specif- 8.3 Optimization with costly function ically developed for multiobjective optimization. An evaluations example of such a development is the Predator-Prey Evo- lutionary Algorithm, where different objectives are repre- In industrial optimization very often the evaluation of (an) sented by different types of predators to which the prey objective function(s) is achieved by simulation or experi- (solutions) have to adapt (Laumanns et al. 1998; Grimme ments. Such evaluations are typically time-consuming and and Schmitt 2006). expensive. Examples of such costly evaluations occur in In the field of natural computing, it is also investigated the optimization based on crash tests of automobiles, whether algorithms can serve as models for nature. It is an chemical experiments, computational fluid dynamics interesting new research direction to view aspects of 123 M. T. M. Emmerich , A. H. Deutz Table 1 Table of (evolutionary) multiobjective optimization software Libraries (evolutionary) multiobjective optimization Name Scope Prog. Lang. url or ref Public domain ecr EA and EMO R Bossek (2017) JMetal Metatheuristics/EMO Java Barba-Gonzalez et al. (2017) Liger MOO/Design Optim. C?? Giagkiozis et al. (2013) MOEA framework EMO Java moeaframework.org/ Opt4J EMO Java opt4j.sourceforge.net PISA EMO C?? Bleuler et al. (2003) PyMOO EMO Python www.openhub.net/p/pymoo RODEOlib Robust Optimization Matlab sourceforge.net/projects/rodeolib/ Shark Library Machine Learning C# image.diku.dk/shark/ SUMO Bayesian Optimization Matlab sumo.intec.ugent.be/SUMO TEA classical EA and MOO C?? Emmerich and Hosenberg (2000) vOptSolver (Linear) MOO Julia voptsolver.github.io/vOptSolver Commercial software EASY Design Optimization C?? velos0.ltt.mech.ntua.gr/EASY/ IND-NIMBUS Design Optimization N/A ind-nimbus.it.jyu.fi ISight Design Optimization N/A www.simuleon.com MODEfrontier Design Optimization N/A www.esteco.com Optimus Design Optimization N/A www.noesissolutions.com WWW-NIMBUS Design Optimization N/A Miettinen and Ma ¨kela ¨ (2000) Performance assessment Performance assessment test problems BBOB/COCO Benchmarking Tool C?? coco.gforge.inria.fr/ WFG Test Suite C?? www.wfg.csse.uwa.edu.au/toolkit/ ZDT/DTLZ Test Suite C?? esa.github.io/pagmo2/ Performance assessment software Attainment surfaces R/C lopez-ibanez.eu/eaftools Hypervolume computation C lopez-ibanez.eu/hypervolume Hypervolume computation Link Collection Various ls11-www.cs.tu-dortmund.de/rudolph/hypervolume/start natural evolution as a multiobjective optimization process, have been developed, and techniques that originated from and first such models have been explored in Rueffler evolutionary multiobjective optimization have been trans- (2006) and Sterck et al. (2011). ferred into deterministic methods. A notable example is the hypervolume indicator gradient ascent method for multi- 8.5 Set-oriented numerical optimization objective optimization (HIGA-MO) (Wang et al. 2017). In this method a set of say l points is represented as a single Traditionally, numerical techniques for multiobjective vector of dimension ln, where n is the dimension of the optimization are single point techniques: They construct a search space. In real-valued decision space the mapping ld Pareto front by formulating a series of single-objective HI: R ! R from the such population vectors to the optimization problems (with different weights or con- hypervolume indicator has a well-defined derivative in straints) or by expanding a Pareto front from a single almost all points. The computation of such derivatives has solution point by point using continuation. In contrast, set- been described in Emmerich and Deutz (2014). Viewing oriented numerical multiobjective optimization operates on the vector of partial derivatives as a gradient, conventional the level of solution sets, the points of which are simulta- gradient methods can be used. It requires, however, some neously improved, and that converge to a well-distributed specific adaptations in order to construct robust and prac- set on the Pareto front. Only very recently such methods tically usable methods for local optimization. On convex 123 A tutorial on multiobjective optimization: fundamentals and evolutionary methods problems, fast linear convergence can be achieved. By table of MOO Software, including also some packages using second-order derivatives in a hypervolume-based that include deterministic solvers. Newton-Raphson method, even quadratic convergence Conferences and Journals: speed has been demonstrated empirically on a set of con- – Conferences: vex bi-objective problems. The theory of such second- order methods is subject to ongoing research (Herna ´ndez – Conference on Evolutionary Computation et al. 2014). (CEC), annual, published by IEEE – Evolutionary Multi-criterion Optimization 8.6 Advanced performance assessment (EMO) biannual conference, proceedings pub- lished by Springer LNCS Despite significant progress on the topic of performance – EVOLVE—a Bridge between Probability, Set assessment in recent years, there are still many unanswered Oriented Numerics and Evolutionary Computa- questions. A bigger field of research is on performance tion, annual until 2015, published by Springer assessment of interactive and many objective optimization. Studies in Computational Intelligence, continued Moreover, the dependency of performance measures on as NEO see below parameters, such as the reference point of the hypervolume – GECCO with EMO track, annual, published by indicator requires further investigation. Some promising ACM work in that direction was recently provided in Ishibuchi – Global Optimization Workshop (GO), biannual, et al. (2017). published by diverse publishers (as special issues, and post-proceedings) – MCDM with EMO track, biannual, published by 9 How to get started? MCDM International Society – Numerical and Evolutionary Optimiza- In the following, we list some of the main resources for the tion(NEO), annual, published by Springer field of (Evolutionary) Multiobjective Optimization. Advances in Computational Intelligence – and others Introductory Books: –Ju ¨ rgen Branke, Kalyanmoy Deb, Kaisa Miettinen, – Journals : COAP, ECJ, EJOR, IEEE TEVC, JOGO, ´ MCDA Journal, and other Optimization, and Oper- Roman Slowinski Multiobjective Optimization : Interactive and evolutionary approaches, Springer, ations Research journals. Aside from the resources mentioned above, there are many – Carlos Coello Coello et al. Evolutionary Algorithms research groups and labs which maintain a repository of for Solving Multi-Objective Problems, 2007, software accompanying their published research, e.g., the Springer MODA group at Leiden University http://moda.liacs.nl and – Kalyanmoy Deb Multi-Objective Optimization using the research group of Carlos Fonseca at Coimbra Univer- Evolutionary Algorithms, Wiley, 2001 sity eden.dei.uc.pt/cmfonsec/software.html. – Matthias Ehrgott Multicriteria Optimization, Springer, 2005 – Joshua Knowles, David Corne, Kalyanmoy Deb 10 Summary and outlook Multiobjective Problem Solving from Nature, Springer, 2007 In this tutorial, we gave an introduction to the field of – Kaisa Miettinen Multiobjective Nonlinear Optimiza- multiobjective optimization. We covered the topics of tion, Kluwer, 2012 order-theoretical foundations, scalarization approaches, Websites: and optimality conditions. As solution methods, we dis- cussed homotopy and evolutionary methods. In the context – EMOO Repository by Carlos Coello Coello http:// of Evolutionary methods, we discussed three state-of-the- neo.lcc.uma.es/emoo/ art techniques in detail, namely NSGA-II, SMS-EMOA, – SIMCO Open Problems http://simco.gforge.inria.fr/ and MOEA/D, each representing a key paradigm in evo- doku.php?id=openproblems; a collection of open lutionary multiobjective algorithm design. NSGA-II served problems and theoretical results on indicator based approaches and complexity theory. There are many implementations of multiobjective a selection of journals in which many articles are published on optimization algorithms available. Table 1 provides a (Evolutionary) Multiobjective Optimization. 123 M. T. M. Emmerich , A. H. Deutz as a representative of Pareto based approaches, SMS- computational complexity, generalizations, for instance, EMOA as an example of Indicator-based approaches, and level set approximation, diversity optimization, and set- MOEA/D as an example of decomposition based approa- oriented optimization, customization and integration into ches. These algorithms have some advantages and multidisciplinary workflows, and scalability to big data, or disadvantages: expensive evaluations. – Pareto-based approaches follow a straightforward Acknowledgements The authors thank the anonymous reviewers very design principle, that is directly based on Pareto much for their constructive and detailed feedback. dominance and diversity preservation (using, for Open Access This article is distributed under the terms of the Creative instance, crowding distance). Usually, these algorithms Commons Attribution 4.0 International License (http://creative require only a few parameters, and larger numbers of commons.org/licenses/by/4.0/), which permits unrestricted use, dis- objective functions do not cause problems. However, it tribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a might be difficult to guarantee and measure conver- link to the Creative Commons license, and indicate if changes were gence and achieve a very regular spacing of solutions. made. – Indicator-based approaches use an indicator for the performance of an approximation set to guide the search. It is possible to assess their convergence References behavior online, and they hold the promise to be more amenable to theoretical analysis. However, the com- Allmendinger R, Emmerich M, Hakanen J, Jin Y, Rigoni E (2017) Surrogate-assisted multicriteria optimization: complexities, putation time often increases rapidly with the number prospective solutions, and business case. J Multi-Criteria Decis of dimensions and the distribution of points in the Anal 24(1–2):5–24 approximation sets might depend critically on the Audet C, Savard G, Zghal W (2010) A mesh adaptive direct search settings of the reference point or other parameters of algorithm for multiobjective optimization. Eur J Oper Res 204(3):545–556 the indicator. Auger A, Bader J, Brockhoff D, Zitzler E (2009) Theory of the – Decomposition-based methods provide a very flexible hypervolume indicator: optimal l-distributions and the choice of framework for algorithm design, as they can incorpo- the reference point. In: Proceedings of the tenth ACM SIGEVO rate various scalarization methods. A disadvantage is workshop on foundations of genetic algorithms, pp 87–102. ACM that they require some a priori knowledge of the Ba ¨ck T (1996) Evolutionary algorithms in theory and practice: position of the Pareto front in the objective space and evolution strategies, evolutionary programming, genetic algo- the number of weight vectors might grow exponentially rithms. Oxford University Press, Oxford with the objective space size, even if the Pareto front is Bader J, Zitzler E (2011) Hype: an algorithm for fast hypervolume- based many-objective optimization. Evol Comput 19(1):45–76 of low dimension. Bara ´n B, Schaerer M (2003) A multiobjective ant colony system for According to the above, choosing the right method depends vehicle routing problem with time windows. In: Proceedings of much on the dimension of the objective space, the number the twenty first IASTED international conference on applied informatics, vol 378. Insbruck, Austria, pp 97–102 of solutions one wants to output, the desired distribution of ´ ´ Barba-Gonzalez C, Garcıa-Nieto J, Nebro AJ, Montes JFA (2017) the solutions (knee-point focused or uniformly spread) and Multi-objective big data optimization with jmetal and spark. In: the a priori knowledge on the location and shape of the EMO, volume 10173 of lecture notes in computer science, Pareto front. pp 16–30. Springer Belton V, Stewart T (2002) Multiple criteria decision analysis: an Due to space constraints, many advanced topics in integrated approach. Springer, Berlin multiobjective optimization are not covered in depth. We Beume N, Laumanns M, Rudolph G (2011). Convergence rates of refer for these topics to the literature. For instance, con- SMS-EMOA on continuous bi-objective problem classes. In: straint handling (Coello Coello 2013), multimodality FOGA, pages 243–252. ACM Bleuler S, Laumanns M, Thiele L, Zitzler E (2003). PISA: A platform (Kerschke et al. 2016), non-convex global optimization and programming language independent interface for search (Zilinskas 2013), and combinatorial optimization (Ehrgott algorithms. In: EMO, volume 2632 of lecture notes in computer and Gandibleux 2000). science, pp 494–508. Springer Bossek J (2017). Ecr 2.0: A modular framework for evolutionary Multiobjective Optimization is a very active field of computation in r. In: Proceedings of the genetic and evolutionary research. There are still many open, challenging problems computation conference companion, GECCO ‘17, in the area. For future development of the research field it pp 1187–1193, New York, NY, USA. ACM will be essential to provide EMO algorithms that are built Branke J, Corrente S, Slowin ´ ski R, Zielniewicz P (2016). Using Choquet integral as preference model in interactive evolutionary around a robust notion of performance and, ideally, also multiobjective optimization. In European Jounal of Operational can be analyzed more rigorously. Major topics for current Research, volume 250, pp 884–901. Springer research are also uncertainty handling and robustness, many-objective optimization, theoretical foundations and 123 A tutorial on multiobjective optimization: fundamentals and evolutionary methods Branke J, Deb K, Miettinen K, Slowinski R, e. (2008). Multiobjective volume 3410 of lecture notes in computer science, pp 62–76. optimization: interactive and evolutionary approaches. In vol- Springer ume 5252 of lecture notes in computer science. Springer Emmerich M, Deutz A (2014) Time complexity and zeros of the Branke J, Greco S, Slowinski R, Zielniewicz P (2015) Learning value hypervolume indicator gradient field. In: EVOLVE-a bridge functions in interactive and evolutionary multiobjective opti- between probability, set oriented numerics, and evolutionary mization. IEEE Trans Evol Comput 19(1):88–102 computation III, pp 169–193. Springer Brockhoff D (2017). GECCO 2017 tutorial on evolutionary multiob- Emmerich M, Deutz A, Kruisselbrink J, Shukla PK (2013) Cone- jective optimization. In: Proceedings of the genetic and evolu- based hypervolume indicators: construction, properties, and tionary computation conference companion, GECCO ‘17, efficient computation. In: International conference on evolution- pp 335–358, New York, NY, USA. ACM ary multi-criterion optimization, pp 111–127. Springer Coello CAC, Pulido GT (2001) A micro-genetic algorithm for Emmerich M, Hosenberg R (2000) Tea: a toolbox for the design of multiobjective optimization. In: EMO, volume 1993 of lecture parallel evolutionary algorithms. Technical report, C??-tech- notes in computer science, pp 126–140. Springer nical report, CI-106/01 Collaborative Research Center (Sonder- Coello Coello CA (2013) Constraint-handling techniques used with forschungsbereich) DFG-SFB 531, Design and Management of evolutionary algorithms. In GECCO (Companion), pp 521–544. Complex Technical Processes and Systems by Means of ACM Computational Intelligence Methods, University of Dortmund Coello Coello CA, Corte ´s NC (2005) Solving multiobjective Emmerich M, Yang K, Deutz A, Wang H, Fonseca CM (2016) A optimization problems using an artificial immune system. Genet multicriteria generalization of bayesian global optimization. Program Evol Mach 6(2):163–190 Springer, Cham, pp 229–242 Coello Coello CA, Van Veldhuizen DA, Lamont GA (2007) Evolu- Emmerich MT, Giannakoglou KC, Naujoks B (2006) Single-and tionary algorithms for solving multi-objective problems, second multiobjective evolutionary optimization assisted by Gaussian edition. Springer Science & Business Media random field metamodels. IEEE Trans Evol Comput Custodio A, Emmerich M, Madeira J (2012) Recent developments in 10(4):421–439 derivative-free multiobjective optimization. Comput Technol Emmerich MTM, Fonseca CM (2011) Computing hypervolume Rev 5:1–30 contributions in low dimensions: asymptotically optimal algo- Custodio AL, Madeira JA, Vaz AIF, Vicente LN (2011) Direct rithm and complexity results. In: EMO, volume 6576 of lecture multisearch for multiobjective optimization. SIAM J Optim notes in computer science, pp 121–135. Springer 21(3):1109–1140 Fleming PJ, Purshouse RC, Lygoe RJ (2005) Many-objective da Fonseca VG, Fonseca CM, Hall AO (2001) Inferential perfor- optimization: an engineering design perspective. In: EMO, mance assessment of stochastic optimisers and the attainment volume 3410 of lecture notes in computer science, pp 14–32. function. In: EMO, volume 1993 of Lecture notes in computer Springer science, pp 213–225. Springer Fonseca CM, Fleming PJ (1993) Genetic algorithms for multiobjec- Das I, Dennis JE (1998) Normal-boundary intersection: a new method tive optimization: Formulation, discussion and generalization. for generating the Pareto surface in nonlinear multicriteria In: ICGA, pp 416–423. Morgan Kaufmann optimization problems. SIAM J Optim 8(3):631–657 Geiger MJ, Sevaux M (2011) The biobjective inventory routing Daskalakis C, Karp RM, Mossel E, Riesenfeld SJ, Verbin E (2011) problem–problem solution and decision support. In: Network Sorting and selection in posets. SIAM J Comput 40(3):597–622 optimization, pp 365–378. Springer Deb K (2001) Multi-objective optimization using evolutionary Giagkiozis I, Lygoe RJ, Fleming PJ (2013) Liger: an open source algorithms. John-Wiley, Chichester integrated optimization environment. In: Proceedings of the 15th Deb K (2008). Introduction to evolutionary multiobjective optimiza- annual conference companion on Genetic and evolutionary tion. In: Branke J, Deb K, Miettinen K, Słowin ´ ski R (eds) computation, pp 1089–1096. ACM Multiobjective optimization: interactive and evolutionary Giagkiozis I, Purshouse RC, Fleming PJ (2014) Generalized decom- approaches, lecture notes in computer science 5252, pp 59–96, position and cross entropy methods for many-objective opti- Berlin, Heidelberg. Springer mization. Inf Sci 282:363–387 Deb K, Argawal RB (1995) Simulated binary crossover for contin- Greco A, Ehrgott M, Figueira J (2016) Multiple criteria decision uous search space. Complex Syst 9(2):115–148 analysis: state of the art surveys, 2nd edn. Springer, Berlin Deb K, Jain H (2014) An evolutionary many-objective optimization Grimme C, Schmitt K (2006) Inside a predator-prey model for multi- algorithm using reference-point-based nondominated sorting objective optimization: a second study. In: GECCO, pp 707–714. approach, part I: solving problems with box constraints. IEEE ACM Trans Evolut Comput 18(4):577–601 Guerreiro AP, Fonseca CM (2017) Computing and updating hyper- Deb K, Pratap A, Agarwal S, Meyarivan T (2002) A fast and elitist volume contributions in up to four dimensions. Technical report, multiobjective genetic algorithm: NSGA-II. IEEE Trans Evol CISUC Technical Report TR-2017-001, University of Coimbra Comput 6(2):182–197 Hakanen J, Chugh T, Sindhya K, Jin Y, Miettinen K (2016) Dellnitz M, Schu ¨ tze O, Hestermeyer T (2005) Covering pareto sets by Connections of reference vectors and different types of prefer- multilevel subdivision techniques. J Optim Theory Appl ence information in interactive multiobjective evolutionary 124(1):113–136 algorithms. In: SSCI, pp 1–8. IEEE ´ ¨ Ehrgott M (2005) Multicriteria optimization. Springer, Berlin Hernandez V A S, Schutze O, Emmerich M (2014) Hypervolume maximization via set based Newton‘s method. In: EVOLVE-a Ehrgott M (2012) Vilfredo Pareto and multi-objective optimization. Optimization stories. Journal der Deutschen Mathematiker- bridge between probability, set oriented numerics, and evolu- Vereiningung, Extra 21:447–453 tionary computation V, pp 15–28. Springer Ehrgott M, Gandibleux X (2000) A survey and annotated bibliogra- Hillermeier C (2001) Nonlinear multiobjective optimization: a phy of multiobjective combinatorial optimization. OR Spectr generalized homotopy approach, vol 135. Springer, Berlin 22(4):425–460 Hopfe CJ, Emmerich MT, Marijt R, Hensen J (2012) Robust multi- Emmerich M, Beume N, Naujoks B (2005). An EMO algorithm using criteria design optimisation in building design. In: Proceedings the hypervolume measure as selection criterion. In: EMO, of building simulation and optimization, Loughborough, UK, pp 118–125 123 M. T. M. Emmerich , A. H. Deutz Huband S, Hingston P, While L, Barone L (2003) An evolution Mateo P, Alberto I (2012) A mutation operator based on a Pareto strategy with probabilistic mutation for multi-objective optimi- ranking for multi-objective evolutionary algorithms. J Heuristics sation. In: The 2003 congress on evolutionary computation, 18(1):53–89 2003. CEC‘03. volume 4, pp 2284–2291. IEEE Miettinen K (2012) Nonlinear multiobjective optimization, vol 12. Hupkens I, Emmerich M (2013) Logarithmic-time updates in SMS- Springer, Berlin EMOA and hypervolume-based archiving. In: EVOLVE—a Miettinen K, Ma ¨kela ¨ MM (2000) Interactive multiobjective opti- bridge between probability, set oriented numerics, and evolu- mization system WWW-NIMBUS on the internet. Comput OR tionary computation IV, volume 227 of advances in intelligent 27(7–8):709–723 systems and computing. Springer, Heidelberg Reyes-Sierra M, Coello Coello C (2006) Multi-objective particle Igel C, Suttorp T, Hansen N (2006) Steady-state selection and swarm optimizers: a survey of the state-of-the-art. Int J Comput efficient covariance matrix update in the multi-objective CMA- Intel Res 2(3):287–308 ES. In: EMO, volume 4403 of lecture notes in computer science, Riquelme N, Von Lu ¨ cken C, Bara ´n B (2015) Performance metrics in pp 171–185. Springer multi-objective optimization. In: 2015 XLI Latin American Ishibuchi H, Imada R, Setoguchi Y, Nojima Y (2017) Reference point computing conference. IEEE specification in hypervolume calculation for fair comparison and Robic T, Filipic B (2005) DEMO: differential evolution for multi- efficient search. In: Proceedings of the genetic and evolutionary objective optimization. In: EMO, volume 3410 of lecture notes computation conference, GECCO ‘17, pp 585–592, New York, in computer science, pp 520–533. Springer NY, USA. ACM Rosenthal S, Borschbach M (2017) Design perspectives of an Ishibuchi H, Murata T (1996) Multi-objective genetic local search evolutionary process for multi-objective molecular optimization. algorithm. In: Proceedings of IEEE international conference on In EMO, volume 10173 of lecture notes in computer science, evolutionary computation, 1996. pp 119–124. IEEE pp 529–544. Springer Jaszkiewicz A (2002) On the performance of multiple-objective Rudolph G, Agapie A (2000) Convergence properties of some multi- genetic local search on the 0/1 knapsack problem-a comparative objective evolutionary algorithms. In: Proceedings of the 2000 experiment. IEEE Trans Evol Comput 6(4):402–412 Congress on evolutionary computation, 2000, volume 2, Jaszkiewicz A, Słowinski R (1999) The ‘Light Beam Search’ pp 1010–1016. IEEE approach-an overview of methodology applications. Eur J Oper Rudolph G, Schu ¨ tze O, Grimme C, Domınguez-Medina C, Trautmann Res 113(2):300–314 H (2016) Optimal averaged Hausdorff archives for bi-objective Jin Y, Okabe T, Sendho B (2001) Adapting weighted aggregation for problems: theoretical and numerical results. Comp Opt Appl multiobjective evolution strategies. In: Evolutionary multi- 64(2):589–618 criterion optimization, pp 96–110. Springer Rueffler C et al (2006) Traits traded off. techreport, Institute of Kerschke P, Wang H, Preuss M, Grimme C, Deutz A H, Trautmann Biology Leiden, Theoretical Biology; Faculty of Mathematics H, Emmerich M (2016) Towards analyzing multimodality of and Natural Sciences; Leiden University continuous multiobjective landscapes. In: PPSN, volume 9921 of Schaffer JD (1985) Multiple objective optimization with vector lecture notes in computer science, pp 962–972. Springer evaluated genetic algorithm. In: Proceeding of the first interna- Knowles J (2006) Parego: a hybrid algorithm with on-line landscape tional conference of genetic algorithms and their application, approximation for expensive multiobjective optimization prob- pp 93–100 lems. IEEE Trans Evol Comput 10(1):50–66 Schu ¨ tze O, Dell’Aere A, Dellnitz M (2005) On continuation methods Knowles J, Corne D, Deb K (2007) Multiobjective problem solving for the numerical treatment of multi-objective optimization from nature. Springer, Berlin problems. In: Dagstuhl Seminar Proceedings. Schloss Dagstuhl- Knowles JD, Corne DW (2000) Approximating the nondominated Leibniz-Zentrum fu ¨ r Informatik front using the Pareto archived evolution strategy. Evol Comput Schu ¨ tze O, Martı ´n A, Lara A, Alvarado S, Salinas E, Coello CAC 8(2):149–172 (2016) The directed search method for multi-objective memetic Knowles JD, Corne DW, Fleischer M (2003) Bounded archiving algorithms. Comp Opt Appl 63(2):305–332 using the Lebesgue measure. In: The 2003 congress on Sterck F, Markesteijn L, Schieving F, Poorter L (2011) Functional evolutionary computation, 2003. CEC‘03. volume 4, traits determine trade-offs and niches in a tropical forest pp 2490–2497. IEEE community. Proc Nat Acad Sci 108(51):20627–20632 Krantz S, Parks H (2003) Implicit function theorem: history, theory, Trautmann H, Wagner T, Brockhoff D (2013) R2-EMOA: focused and applications. Springer, New York multiobjective search using R2-Indicator-Based selection. In Kuhn HW, Tucker AW (1951) Nonlinear programming. In: Proceed- LION, volume 7997 of Lecture Notes in Computer Science, ings of 2nd Berkeley symposium. Berkeley, Berkeley and Los pages 70–74. Springer Angeles. University of California Press, pp 481–492 van der Horst E, Marques-Gallego P, Mulder-Krieger T, van Kung H-T, Luccio F, Preparata FP (1975) On finding the maxima of a Veldhoven J, Kruisselbrink J, Aleman A, Emmerich MT, set of vectors. J ACM (JACM) 22(4):469–476 Brussee J, Bender A, IJzerman AP (2012) Multi-objective Kursawe F (1990) A variant of evolution strategies for vector evolutionary design of adenosine receptor ligands. J Chem Inf optimization. In: PPSN, volume 496 of lecture notes in computer Model 52(7):1713–1721 science, pp 193–197. Springer Wagner M, Bringmann K, Friedrich T, Neumann F (2015) Efficient Laumanns M, Rudolph G, Schwefel H (1998) A spatial predator-prey optimization of many objectives by approximation-guided evolution. Eur J Oper Res 243(2):465–479 approach to multi-objective optimization: a preliminary study. In: PPSN, volume 1498 of lecture notes in computer science, Wagner T, Trautmann H, Naujoks B (2009) OCD: Online conver- pp 241–249. Springer gence detection for evolutionary multi-objective algorithms Li B, Li J, Tang K, Yao X (2015) Many-objective evolutionary based on statistical testing. In Ehrgott, M., Fonseca, C. M., algorithms: a survey. ACM Comput Surv 48(1):13:1–13:35 Gandibleux, X., Hao, J.-K., and Sevaux, M., editors, Evolution- Li L, Yevseyeva I, Fernandes V B, Trautmann H, Jing N, Emmerich ary Multi-Criterion Optimization: 5th International Conference, M (2017) Building and using an ontology of preference-based EMO 2009, Nantes, France, April 7-10, 2009. Proceedings, multiobjective evolutionary algorithms. In EMO, volume 10173 pages 198–215. Springer Berlin Heidelberg, Berlin, Heidelberg of lecture notes in computer science, pp 406–421. Springer 123 A tutorial on multiobjective optimization: fundamentals and evolutionary methods Wang H, Deutz A H, Back T, Emmerich M (2017) Hypervolume Zilinskas A (2013) On the worst-case optimal multi-objective global indicator gradient ascent multi-objective optimization. In EMO, optimization. Optimization Letters 7(8):1921–1928 volume 10173 of Lecture Notes in Computer Science, pages Zitzler E, Kunzli S (2004) Indicator-based selection in multiobjective 654–669. Springer search. In PPSN, volume 3242 of Lecture Notes in Computer Wang P, Emmerich M, Li R, Tang K, Ba ¨ck T, Yao X (2015) Convex Science, pages 832–842. Springer hull-based multiobjective genetic programming for maximizing Zitzler E, Laumanns M, Bleuler S (2004) A tutorial on evolutionary receiver operating characteristic performance. IEEE Trans Evol multiobjective optimization. Metaheuristics for multiobjective Comput 19(2):188–200 optimisation, pages 3–37 Yevseyeva I, Basto-Fernandes V, Ruano-Orda ´sD, Me ´ndez JR (2013) Zitzler E, Laumanns M, Thiele L (2001) SPEA2: Improving the Optimising anti-spam filters with evolutionary algorithms. strength pareto evolutionary algorithm. TIK – Report 103, Expert Syst Appl 40(10):4010–4021 Eidgeno ¨ ssische Technische Hochschule Zu ¨ rich (ETH), Institut Yevseyeva I, Guerreiro A P, Emmerich M T M, Fonseca C M (2014) fu ¨ r Technische Informatik und Kommunikationsnetze (TIK) A portfolio optimization approach to selection in multiobjective Zitzler E, Thiele L (1999) Multiobjective evolutionary algorithms: a evolutionary algorithms. In PPSN, volume 8672 of Lecture comparative case study and the strength Pareto approach. IEEE Notes in Computer Science, pages 672–681. Springer Trans. Evolutionary Computation 3(4):257–271 Zhang Q, Li H (2007) MOEA/D: A multiobjective evolutionary Zitzler E, Thiele L, Laumanns M, Fonseca CM, Da Fonseca VG algorithm based on decomposition. IEEE Trans Evol Comput (2003) Performance assessment of multiobjective optimizers: An 11(6):712–731 analysis and review. IEEE Trans Evol Comput 7(2):117–132 http://www.deepdyve.com/assets/images/DeepDyve-Logo-lg.png Natural Computing Springer Journals

A tutorial on multiobjective optimization: fundamentals and evolutionary methods

Free
25 pages
Loading next page...
 
/lp/springer_journal/a-tutorial-on-multiobjective-optimization-fundamentals-and-yAvI8vmfQk
Publisher
Springer Netherlands
Copyright
Copyright © 2018 by The Author(s)
Subject
Computer Science; Theory of Computation; Evolutionary Biology; Processor Architectures; Artificial Intelligence (incl. Robotics); Complex Systems
ISSN
1567-7818
eISSN
1572-9796
D.O.I.
10.1007/s11047-018-9685-y
Publisher site
See Article on Publisher Site

Abstract

In almost no other field of computer science, the idea of using bio-inspired search paradigms has been so useful as in solving multiobjective optimization problems. The idea of using a population of search agents that collectively approxi- mate the Pareto front resonates well with processes in natural evolution, immune systems, and swarm intelligence. Methods such as NSGA-II, SPEA2, SMS-EMOA, MOPSO, and MOEA/D became standard solvers when it comes to solving multiobjective optimization problems. This tutorial will review some of the most important fundamentals in multiobjective optimization and then introduce representative algorithms, illustrate their working principles, and discuss their application scope. In addition, the tutorial will discuss statistical performance assessment. Finally, it highlights recent important trends and closely related research fields. The tutorial is intended for readers, who want to acquire basic knowledge on the mathematical foundations of multiobjective optimization and state-of-the-art methods in evolutionary multiobjective optimization. The aim is to provide a starting point for researching in this active area, and it should also help the advanced reader to identify open research topics. Keywords Multiobjective optimization  Multiobjective evolutionary algorithms  Decomposition-based MOEAs Indicator-based MOEAs  Pareto-based MOEAs  Performance assessment 1 Introduction needed objectives (van der Horst et al. 2012; Rosenthal and Borschbach 2017). Consider making investment choices for an industrial There are countless other examples where multiobjec- process. On the one hand the profit should be maximized tive optimization has been applied or is recently considered and on the other hand environmental emissions should be as a promising field of study. Think, for instance, of the minimized. Another goal is to improve safety and quality minimization of different types of error rates in machine of life of employees. Even in the light of mere economical learning (false positives, false negatives) (Yevseyeva et al. decision making, just following the legal constraints and 2013; Wang et al. 2015), the optimization of delivery costs minimizing production costs can take a turn for the worse. and inventory costs in logistics(Geiger and Sevaux 2011), Another application of multiobjective optimization can the optimization of building designs with respect to health, be found in the medical field. When searching for new energy efficiency, and cost criteria (Hopfe et al. 2012). therapeutic drugs, obviously the potency of the drug is to In the following, we consider a scenario where given the be maximized. But also the minimization of synthesis costs solutions in some space of possible solutions, the so-called and the minimization of unwanted side effects are much- decision space which can be evaluated using the so-called objective functions. These are typically based on com- putable equations but might also be the results of physical experiments. Ultimately, the goal is to find a solution on & Michael T. M. Emmerich which the decision maker can agree, and that is optimal in m.t.m.emmerich@liacs.leidenuniv.nl some sense. Andre H. Deutz When searching for such solutions, it can be interesting a.h.deutz@liacs.leidenuniv.nl to pre-compute or approximate a set of interesting solutions LIACS, Leiden University, Leiden, The Netherlands 123 M. T. M. Emmerich , A. H. Deutz that reveal the essential trade-offs between the objectives. this order. The decision maker has to state additional This strategy implies to avoid so-called Pareto dominated preferences, e.g., weights of the objectives, prior to the solutions, that is solutions that can improve in one objec- optimization. tive without deteriorating the performance in any other 2. A posteriori: A partial order is defined on the objective space R , typically the Pareto order, and the algorithm objective. The Pareto dominance is named after Vilfredo Pareto, an Italian economist. As it was earlier mentioned by searches for the minimal set concerning this partial order over the set of all feasible solutions. The user has Francis Y.Edgeworth, it is also sometimes called Edge- worth-Pareto dominance (see Ehrgott 2012 for some his- to state his/her preferences a posteriori, that is after being informed about the trade-offs among non- torical background). To find or to approximate the set of non-dominated solutions and make a selection among them dominated solutions. is the main topic of multiobjective optimization and multi- 3. Interactive (aka Progressive): The objective functions criterion decision making. Moreover, in case the set of non- and constraints and their prioritization are refined by dominated solutions is known in advance, to aid the deci- requesting user feedback on preferences at multiple sion maker in selecting solutions from this set is the realm points in time during the execution of an algorithm. of decision analysis (aka decision aiding) which is also part In the sequel, the focus will be on a posteriori approaches of multi-criterion decision making. to multiobjective optimization. The a priori approach is often supported by classical single-objective optimization Definition 1 Multiobjective Optimization. Given m ob- algorithms, and we refer to the large body of the literature jective functions f : X! R; ...; f : X! R which map a 1 m that exists for such methods. The a posteriori approach, decision space X into R, a multiobjective optimization however, requires interesting modifications of theorems problem (MOP) is given by the following problem and optimization algorithms—in essence due to the use of statement: partial orders and the desire to compute a set of solutions minimize f ðxÞ; .. .; minimize f ðxÞ; x 2X ð1Þ 1 m rather than a single solution. Interactive methods are highly interesting in real-world applications, but they typically Remark 1 In general, we would demand m [ 1 when we rely upon algorithmic techniques used in a priori and a talk about multiobjective optimization problems. More- posteriori approaches and combine them with intermediate over, there is the convention to call problems with large m, steps of preference elicitation. We will discuss this topic not multiobjective optimization problems but many-ob- briefly at the end of the tutorial. jective optimization problems (see Fleming et al. 2005;Li et al. 2015). The latter problems form a special, albeit important case of multiobjective optimization problems. 2 Related work Remark 2 Definition 1 does not explicitly state constraint functions. However, in practical applications constraints There is a multiple of introductory articles that preceded have to be handled. Mathematical programming techniques this tutorial: often use linear or quadratic approximations of the feasible • In Zitzler et al. (2004) a tutorial on state-of-the-art space to deal with constraints, whereas in evolutionary evolutionary computation methods in 2004 is provided multiobjective optimization constraints are handled by including Strength Pareto Evolutionary Algorithm penalties that increase the objective function values in Version 2 (SPEA2) (Zitzler et al. 2001), Non-domi- proportion to the constraint violation. Typically, penalized nated Sorting Genetic Algorithm II (NSGA-II) (Deb objective function values are always higher than objective et al. 2002), Multiobjective Genetic Algorithm function values of feasible solutions. As it distracts the (MOGA) (Fonseca and Fleming 1993) and Pareto- attention from particular techniques in MOP solving, we Archived Evolution Strategy (PAES) (Knowles and will only consider unconstrained problems. For strategies Corne 2000) method. Indicator-based methods and to handle constraints, see Coello Coello (2013). modern variants of decomposition based methods, that Considering the point(s) in time when the decision our tutorial includes, were not available at that time. maker interacts or provides additional preference infor- • In Deb (2008) an introduction to earlier multiobjective mation, one distinguishes three general approaches to optimization methods is provided, and also in the form multiobjective optimization (Miettinen 2012): of a tutorial. The article contains references to early books in this field and key articles and also discusses 1. A priori: A total order is defined on the objective applications. space, for instance by defining a utility function R ! R and the optimization algorithm finds a minimal point (that is a point in X ) and minimum value concerning 123 A tutorial on multiobjective optimization: fundamentals and evolutionary methods • Derivative-free methods for multiobjective optimiza- – reflexive, if and only if 8x 2 X : ðx; xÞ2 R, tion, including evolutionary and direct search methods, – irreflexive, if and only if 8x 2 X; ðx; xÞ 62 R, are discussed in Custodio et al. (2012). – symmetric, if and only if 8x 2 X : 8y 2 X : ðx; yÞ • On conferences such as GECCO, PPSN, and EMO 2 R ,ðy; xÞ2 R, there have been regularly tutorials and for some of – asymmetric, if and only if 8x 2 X : 8y 2 X : ðx; yÞ these slides are available. A very extensive tutorial 2 R )ðy; xÞ 62 R, based on slides is the citable tutorial by Brockhoff – antisymmetric, if and only if 8x 2 X : 8y 2 X : (2017). ðx; yÞ2 R ^ðy; xÞ2 R ) x ¼ y, – transitive, if and only if 8x 2 X : 8y 2 X : 8z 2 X : Our tutorial is based on teaching material and a reader for a ðx; yÞ2 R ^ðy; zÞ2 R )ðx; zÞ2 R. course on Multiobjective Optimization and Decision Analysis at Leiden University, The Netherlands (http:// Remark 3 Sometimes we will also write xRy for moda.liacs.nl). Besides going into details of algorithm ðx; yÞ2 R. design methodology, it also discusses foundations of mul- tiobjective optimization and order theory. In the light of Now we can define different types of orders: recent developments on hybrid algorithms and links to Definition 3 Pre-order, Partial Order, Strict Partial Order. computational geometry, we considered it valuable to not A binary relation R is said to be a only cover evolutionary methods but also include the basic principles from deterministic multiobjective optimization – pre-order (aka quasi-order), if and only if it is transitive and scalarization-based methods in our tutorial. and reflexive, – partial order, if and only if it is an antisymmetric pre- order, 3 Order and dominance – strict partial order, if and only if it is irreflexive and transitive For the notions we discuss in this section a good reference is Ehrgott (2005). Remark 4 Note that a strict partial order is necessarily The concept of Pareto dominance is of fundamental asymmetric (and therefore also anti-symmetric). importance to multiobjective optimization, as it allows to Proposition 1 Let X be a set and D ¼fðx; xÞjx 2 Xg be compare two objective vectors in a precise sense. That is, the diagonal of X. they can be compared without adding any additional preference information to the problem definition as stated – If R is an anti-symmetric binary relation on X, then any in Definition 1. subset of R is also an anti-symmetric binary relation. In this section, we first discuss partial orders, pre-orders, – If R is irreflexive, then (R is asymmetric if and only if and cones. For partial orders on R there is an important R is antisymmetric). Or: the relation R is asymmetric if geometric way of specifying them with cones. We will and only if R is anti-symmetric and irreflexive. define the Pareto order (aka Edgeworth-Pareto order) on – If R is a pre-order on X, then R . The concept of Pareto dominance is of fundamental fðx; yÞjðx; yÞ2 R and ðy; xÞ 62 Rg, denoted by R , strict importance for multiobjective optimization, as it allows to is transitive and irreflexive. In other words, R is a strict compare two objective vectors in a precise sense (see strict partial order associated to the pre-order R. Definition 5 below). That is, comparisons do not require – If R is a partial order on X, then R n D is irreflexive and adding any additional preference information to the prob- transitive. In other words, R n D is a strict partial lem definition as stated in Definition 1. This way of com- order. Moreover R n D is anti-symmetric (or parison establishes a pre-order (to be defined below) on the asymmetric). set of possible solutions (i.e., the decision space), and it is – If R is a pre-order on X, then (R n D is a strict partial possible to search for the set of its minimal elements—the order if and only if R is asymmetric). efficient set. As partial orders and pre-orders are special binary Remark 5 In general, if R is a pre-order, then R n D does relations, we digress with a discussion on binary relations, not have to be transitive. Therefore, in general, R n D will orders, and pre-orders. not be a strict partial order. Definition 2 Properties of Binary Relations. Given a set X, Definition 4 Minimal Element. A minimal element x 2 X a binary relation on X—that is a set R with R  X  X—is in a (strictly) partially ordered set (X, R) is an element for 0 0 0 said to be which there does not exist an x 2 X with x Rx and x 6¼ x. 123 M. T. M. Emmerich , A. H. Deutz 0 m (In case, the order R is a strict partial order, x Rx implies Definition 7 Strict Component Order on R . Let 0 m x 6¼ x). x; y 2 R . We say x is less than y in the strict component order, denoted by x\y, if and only if x \y ; i ¼ 1; ...; m. i i Definition 5 Pareto Dominance. Given two vectors in the ð1Þ m ð2Þ m Definition 8 (Weakly) Efficient Point, Efficient Set, and objective space, that is y 2 R and y 2 R , then the ð1Þ ð2Þ Pareto Front. point y 2 R is said to Pareto dominate the point y (in ð1Þ ð2Þ symbols y  y Þ, if and only if Pareto – The minimal elements of the Pareto order  on X are ð1Þ ð2Þ ð1Þ ð2Þ called efficient points. 8i 2f1; .. .; mg : y  y and 9j 2f1; .. .; mg : y \y : i i j j – The subset X of all efficient points in X is called the efficient set. ð1Þ ð2Þ In words, in case that y  y the first vector is not Pareto – Let us denote the set of attainable objective vectors worse in each of the objectives and better in at least one with Y :¼ fðXÞ. Then the minimal elements of the objective than the second vector. Pareto order on Y are called the non-dominated or Proposition 2 The Pareto order  on the objective Pareto optimal objective vectors. The subset of all non- Pareto space R is a strict partial order. Moreover ð [ DÞ Pareto dominated objective vectors in Y is called the Pareto is a partial order. We denote this by  or also by  if front. We denote it with Y . Pareto the context provides enough clarity. – A point x 2X is called weakly efficient if and only if there does not exist u 2X such that fðuÞ\fðxÞ. In multiobjective optimization we have to deal with two Moreover, fðxÞ is called weakly non-dominated. spaces: The decision space, which comprises all candidate solutions, and the objective space which is identical to R Remark 7 Clearly, fðX Þ¼Y . E N and it is the space in which the objective function vectors are represented. The vector-valued function f ¼ 3.1 Cone orders ðf ; ...; f Þ maps the decision space X to the objective 1 m m m space R . This mapping and the Pareto order on R as The Pareto order is a special case of a cone order, which defined in Definition 5 can be used to define a pre-order on are orders defined on vector spaces. Defining the Pareto the decision space X as follows. order as a cone order gives rise to geometrical interpreta- tions. We will introduce definitions for R , although cones Definition 6 Pre-order on Search Space. Let x ; x 2X . 1 2 can be defined on more general vector spaces, too. The The solution x is said to Pareto dominate the solution x if 1 2 m m binary relations in this subsection are subsets of R  R and only if fðx Þ fðx Þ. Notation: x Pareto domi- 1 Pareto 2 1 and the cones are subsets of R . nates x is denoted by x  x . 2 1 f 2 Definition 9 Non-trivial Cone. A set C R with ; 6¼ Remark 6 The binary relation  on X is a strict partial C 6¼ R is called a non-trivial cone, if and only if order on X and ð [fðx; xÞj x 2XgÞ is a partial order on 8a 2 R; a [ 0; 8c 2C : ac 2C. X . Note that the pre-order R associated to  via f ( Pareto i.e., x Rx if and only if fðx Þ fðx Þ ) is, in general, 1 2 1 Pareto 2 Remark 8 In the sequel when we say cone we mean non- not asymmetric and therefore, in general, trivial cone. 6¼ R nfðx; xÞj x 2Xg. Definition 10 Minkowski Sum. The Minkowski sum (aka m m Sometimes we need the notion of the so called strict algebraic sum) of two sets A 2 R and B 2 R is defined component order on R and its accompanying notion of as A B :¼fa þ b j a 2 A ^ b 2 Bg. Moreover we define weak non-dominance. aA ¼faaj a 2 Ag. f f f 2 2 2 y ⊕C y ⊕ R 2 C 2 2 y y 1 1 1 f f f 1 1 1 1 2 1 2 1 2 (0, 0) (0, 0) (0, 0) Fig. 1 Example of a cone C(left), Minkowski sum of a singleton fyg and C (middle), and Minkowski sum of fyg and the cone R . The latter is equal to the non-negative quadrant from which the origin is deleted, see also Definition 13 123 A tutorial on multiobjective optimization: fundamentals and evolutionary methods Remark 9 For an illustration of the cone notion and Proposition 3 Let C be a cone and R its associated examples of Minkowski sums see Fig. 1. binary relation (i.e., R ¼fðx; yÞj y x 2Cg). Then the following statements hold. – R is translation and positive multiplication invariant, Definition 11 The binary relation, R , associated to the – R is anti-symmetric if and only if C is pointed, cone C. Given a cone C the binary relation associated to this – R is transitive if and only if C is convex, and moreover, m C cone, notation R , is defined as follows: 8x 2 R : 8y 2 – R is reflexive if and only if 0 2C. R : ðx; yÞ2 R if and only if y 2fxg C. A similar statement can be made if we go in the other Remark 10 It is clear that for any cone C the associated direction, i.e.: binary relation is translation invariant (i.e, if 8u 2 R : ðx; yÞ2 R )ðx þ u; y þ uÞ2 R ) and also Proposition 4 Let R be a translation and positive multi- C C multiplication invariant by any positive real (i.e., plication invariant binary relation and the C the associ- 8a [ 0 : ðx; yÞ2 R )ðax; ayÞ2 R ). Conversely, given ated cone (i.e., C ¼fy xjðx; yÞ2 Rg). Then the C C R a binary relation R which is translation invariant and following statements hold. multiplication invariant by any positive real, the set C :¼ – C is a cone, fy xjðx; yÞ2 Rg is a cone. The above two operations are – R is anti-symmetric if and only if C is pointed, inverses of each other, i.e., to a cone C one associates a – R is transitive if and only if C is convex, and moreover, binary relation R which is translation invariant and mul- – R is reflexive if and only if 0 2C . tiplication invariant by any positive real, and the associated cone of R is C, and conversely starting from a binary In the following definition some important subsets in relation R which is translation invariant and multiplication R ; m 1 are introduced. invariant by any positive real one obtains the cone C and Definition 13 Let m be a natural number bigger or equal the binary relation associated to this cone is R. In short, to 1. The non-negative orthant (aka hyperoctant) of R , there is a natural one to one correspondence between cones m m denoted by R is the set of all elements in R whose and translation invariant and multiplication-invariant-by- coordinates are non-negative. Furthermore, the zero-dom- positive-reals binary relations on R . m m inated orthant, denoted by R , is the set R nf0g. 0 0 Note that for a positive multiplication invariant relation Analogously we define the non-positive orthant of R , R the set C ¼fy x j xRy g is a cone. We restrict our denoted by R , as the set of elements in R the coordi- attention to relations which are translation invariant as well nates of which are non-positive. Furthermore, the set of in order to get the above mentioned bijection between elements in R which dominate the zero vector 0, denoted cones and relations. m m by R , is the set R nf0g. The set of positive reals is 0  0 Also note if a positive multiplication invariant and denoted by R and the set of non-negative reals is [ 0 translation invariant relation R is such that m m denoted by R . ; 6¼ R 6¼ R  R , then the associated cone C is non- trivial. Relations associated to non-trivial cones are non- Remark 12 The sets defined in the previous definition are m m empty and not equal to all of R  R . cones. Remark 11 In general the binary relation R associated to Proposition 5 The Pareto order  on R is given by Pareto a cone is not reflexive nor transitive nor anti-symmetric. the cone order with cone R , also referred to as the For instance, the binary relation R is reflexive if and only Pareto cone. if 0 2C. The following definitions are needed in order to Remark 13 As R is a pointed and convex cone, the state for which cones the associated binary relation is anti- associated binary relation is irreflexive, anti-symmetric and symmetric and/or transitive. transitive (see Proposition 3). Of course, this can be veri- Definition 12 Pointed cone and convex cone. A cone C is fied more directly. pointed if and only if C\ C  f0g where C ¼ The reason to view the Pareto order as derived from a f c j c 2Cg and C is convex if and only if 8c 2C; c 2 1 2 cone is that it gives the opportunity to study this order more C; 8a such that 0  a  1 : ac þð1 aÞc 2C. 1 2 geometrically. For instance, the definition and many of the With these definitions we can specify for which cones properties of the very important hypervolume indicator (to the associated relation is transitive and/or anti-symmetric: be defined later) readily become intuitive. A reason for deviating from the Pareto cone could be to add constraints to the trade-off between solutions. Moreover, see later for a 123 M. T. M. Emmerich , A. H. Deutz discussion, the more general cones turned out to be very of small finite decision spaces, efficient solutions can be useful in generalizing the hypervolume indicator and identified without much effort. In the case of large com- influence the distribution of points in the approximation set binatorial or continuous search spaces, however, opti- to the Pareto front. mization algorithms are needed to find them. Alternatives to the standard Pareto order on R can be easily imagined and constructed by using pointed, convex cones. The alternatives can be used, for instance, in pref- 4 Scalarization techniques erence articulation. Classically, multiobjective optimization problems are often 3.2 Time complexity of basic operations solved using scalarization techniques (see, for instance, on ordered sets Miettinen 2012). Also in the theory and practice of evo- lutionary multiobjective optimization scalarization plays an Partial orders do not have cycles. Let R be a partial order. It important role, especially in the so-called decomposition is easy to see that R does not have cycles. We show that the based approaches. associated strict partial order does not have cycles. That is, In brief, scalarization means that the objective functions there do not exist are aggregated (or reformulated as constraints), and then a constrained single-objective problem is solved. By using ðb ; b Þ2 R n D; ðb ; b Þ2 R n D; ; ðb ; b Þ2 R n D 1 2 2 3 t 1 1 different parameters of the constraints and aggregation function, it is possible to obtain different points on the where D is the diagonal. For suppose such b ; i ¼ 1; ; t Pareto front. However, when using such techniques, certain 1 can be found with this property. Then by transitivity of caveats have to be considered. In fact, one should always R n D (see Proposition 1), we get ðb ; b Þ2 R n D.By 1 t 1 ask the following two questions: assumption, we have ðb ; b Þ2 R n D. Again by transi- t 1 1 tivity, we get ðb ; b Þ2 R n D which is a contradiction. In 1 1 1. Does the optimization of scalarized problems result in other words, R does not have cycles. (The essence of the efficient points? above argument is, that any strict partial order does not 2. Can we obtain all efficient points or vectors on the have cycles.) The absence of cycles for (strict) partial Pareto front by changing the parameters of the orders gives rise to the following proposition. scalarization function or constraints? We will provide four representative examples of scalar- Proposition 6 Let S be a (strict) partially ordered set. ization approaches and analyze whether they have these Then any finite, non-empty subset of S has minimal ele- properties. ments (with respect to the partial order). In particular, any finite, non-empty subset Y  R has minimal elements with 4.1 Linear weighting respect to the Pareto order  . Also any, finite non- Pareto empty subset X X has minimal elements with respect to A simple means to scalarize a problem is to attach non- negative weights (at least one of them positive) to each The question arises: How fast can the minimal elements objective function and then to minimize the weighted sum be obtained? of objective functions. Hence, the multiobjective opti- mization problem is reformulated to: Proposition 7 Given a finite partially ordered set ðX; Þ, the set of minimal elements can be obtained in time Hðn Þ. Definition 14 Linear Scalarization Problem. The linear scalarization problem (LSP) of an MOP using a weight Proof A double nested loop can check non-domination for vector w 2 R , is given by each element. For the lower bound consider the case that all elements in X are incomparable. Only in this case is minimize w f ðxÞ; x 2X : i i X the minimal set. It requires time Xðn Þ to compare all i¼1 pairs (Daskalakis et al. 2011). h Proposition 8 The solution of an LSP is on the Pareto Fortunately, in case of the Pareto ordered set front, no matter which weights in R are chosen. ðX;  Þ, one can find the minimal set faster. The Pareto algorithm suggested by Kung et al. (1975) combines a Proof We show that the solution of the LSP cannot be a dimension sweep algorithm with a divide and conquer dominated point, and therefore, if it exists, it must neces- algorithm and finds the minimal set in time Oðnðlog nÞÞ for sarily be a non-dominated point. Consider a solution of the d 2 m LSP against some weights w 2 R , say x and suppose d ¼ 2 and in time Oðnðlog nÞ Þ for d 3. Hence, in case 123 A tutorial on multiobjective optimization: fundamentals and evolutionary methods this minimal point is dominated. Then there exists an the weighted Chebychev distance to a reference point as an objective vector y 2 fðXÞ with 8i 2f1; ...; mg y  f ðx Þ objective function. i i and for some index j 2f1; ...; mg it holds that y \f ðx Þ. j j P P Definition 17 Chebychev Scalarization Problem. The m m Hence, it must also hold that w y \ w f ðx Þ, i i i i i¼1 i¼1 Chebychev scalarization problem (CSP) of an MOP using a which contradicts the assumption that x is minimal. h weight vector k 2 R , is given by In the literature the notion of convexity (concavity) of minimize max k jf ðxÞ z j; x 2X ; i i i2f1;...;mg Pareto fronts is for the most part not defined formally. Possible formal definitions for convexity and concavity are where z is a reference point, i.e., the ideal point defined as as follows. z ¼ inf f ðxÞ with i ¼ 1; ; m. x2X i Definition 15 Convex Pareto front. A Pareto front is Proposition 10 Let us assume a given set of mutually non- convex if and only if Y R is convex. N m dominated solutions in R (e.g., a Pareto front). Then for every non-dominated point p there exists a set of weights Definition 16 Concave Pareto front. A Pareto front is for a CSP, that makes this point a minimizer of the CSP concave if and only if Y R is convex. provided the reference point z is properly chosen (i.e., the Proposition 9 In case of a convex Pareto front, for each vector p z either lies in the positive or negative solution in Y there is a solution of a linear scalarization orthant). problem for some weight vector w 2 R . Practically speaking, Proposition 10 ensures that by changing the weights, all points of the Pareto front can, in If the Pareto front is non-convex, then, in general, there principle, occur as minimizers of CSP. For the two can be points on the Pareto front which are the solutions of example Pareto fronts, the minimizers of the Chebychev no LSP. Practically speaking, in the case of concave Pareto scalarization function are points on the iso-height lines of fronts, the LSP will tend to give only extremal solutions, the smallest CSP function value which still intersect with that is, solutions that are optimal in one of the objectives. the Pareto front. Clearly, such points are potentially found This phenomenon is illustrated in Fig. 2, where the tan- in convex parts of Pareto fronts as illustrated in Fig. 3 (left) gential points of the dashed lines indicate the solution as well as in concave parts (right).However, it is easy to obtained by minimizing an LSP for different weight choi- construct examples where a CSP obtains minimizers in ces (colors). In the case of the non-convex Pareto front weakly dominated points (see Definition 8). Think for (Fig. 2, right), even equal weights (dark green) cannot lead 2 instance of the case fðXÞ ¼ ½0; 1 . In this case all points on to a solution in the middle part of the Pareto front. Also, by > > the line segment ð0; 0Þ ; ð0; 1Þ and on the line segment solving a series of LSPs with minimizing different > > weighted aggregation functions, it is not possible to obtain ð0; 0Þ ð1; 0Þ are solutions of some Chebychev scalariza- this interesting part of the Pareto front. tion. (The ideal point is 0 ¼ð0; 0Þ , one can take as weights (0, 1) for the first scalarization, and (1, 0) for the 4.1.1 Chebychev scalarization second scalarization; the Pareto front is equal to fð0; 0Þ g). In order to prevent this, the augmented Chebychev Another means of scalarization, that will also uncover scalarization provides a solution. It reads: points in concave parts of the Pareto front, is to formulate Fig. 2 Linear scalarization problems with different weights f f 2 2 for (1) convex Pareto fronts, and (2) concave Pareto fronts 3 3 ∗ ∗ 2 y 2 1 1 f f 1 1 1 2 3 1 2 3 (0, 0) (0, 0) 123 M. T. M. Emmerich , A. H. Deutz Fig. 3 Chebychev scalarization problems with different weights f f 2 2 for (1) convex Pareto fronts, and (2) concave Pareto fronts 3 3 ∗ 2 2 ∗ y 1 1 f f 1 1 1 2 3 1 2 3 (0, 0) (0, 0) Fig. 4 Re-formulation of f f 2 2 multiobjective optimziation problems as single-objective constraint handling optimization problems 3 3 2 2 1 1 f f 1 1 1 2 3 1 2 3 (0, 0) (0, 0) difficult to choose an appropriate range for the  values, if minimize max k f ðxÞþ  f ðxÞ; x 2X ; ð2Þ i i i there is no prior knowledge of the location of the Pareto i2f1;...;mg i¼1 m front in R . where  is a sufficiently small, positive constant. 4.1.3 Boundary intersection methods 4.1.2 -constraint method Another often suggested way to find an optimizer is to A rather straightforward approach to turn a multiobjective search for intersection points of rays with the attained optimization problem into a constraint single-objective subset fðXÞ (Jaszkiewicz and Słowin ´ ski 1999). For this optimization problem is the -constraint method. method, one needs to choose a reference point in R , say r, which, if possible, dominates all points in the Pareto front. Definition 18 –constraint Scalarization. Given a MOP, Alternatively, in the Normal Boundary Intersection method the –constraint scalarization is defined as follows. Given (Das and Dennis 1998) the rays can emanate from a line (in m 1 constants  2 R; ...; 2 R, 1 m 1 the bi-objective case) or an m 1 dimensional hyperplane, minimize f ðxÞ; subject to g ðxÞ  ; ...; g ðxÞ  ; 1 1 1 m 1 m 1 in which case lines originate from different evenly spaced reference points (Das and Dennis 1998). Then the follow- where f ; g ; ...; g constitute the m components of 1 1 m 1 ing problem is solved: vector function f of the multiobjective optimization prob- Definition 19 Boundary Intersection Problem. Let d 2 lem (see Definition 1). m m R denote a direction vector and r 2 R denote the ref- erence vector. Then the boundary intersection problem is The method is illustrated in Fig. 4 (left) for  ¼ 2:5 for formulated as: a biobjective problem. Again, by varying the constants 2 R; ...; 2 R, one can obtain different points on 1 m 1 the Pareto front. And again, among the solutions weakly dominated solutions may occur. It can, moreover, be 123 A tutorial on multiobjective optimization: fundamentals and evolutionary methods conditions and for which all objective functions are convex minimize t; in some open -ball B ðxÞ around x. subject to Remark 14 The equation in the Fritz–John Condition ðaÞ r þ td fðxÞ¼ 0; typically does not result in a unique solution. For an n- ðbÞ x 2X ; and dimensional decision space X we have n þ 1 equations and ðcÞ t 2 R we have m þ n unknowns (including the k multipliers). Hence, in a non-degenerate case, the solution set is of dimension m 1. Constraints (a) and (b) in the above problem formulation It is possible to use continuation and homotopy methods enforce that the point is on the ray and also that there exists to obtain all the solutions. The main idea of continuation a pre-image of the point in the decision space. Because t is methods is to find a single solution of the equation system minimized, we obtain the point that is closest to the ref- and then to expand the solution set in the neighborhood of erence point in the direction of d. This method allows some this solution. To decide in which direction to expand, it is intuitive control on the position of resulting Pareto front necessary to maintain an archive, say A, of points that have points. Excepting rare degenerate cases, it will obtain already been obtained. To obtain a new point x in the points on the boundary of the attainable set fðXÞ. However, new neighborhood of a given point from the archive x 2 A the it also requires an approximate knowledge of the position homotopy method conducts the following steps: of the Pareto front. Moreover, it might result in dominated points if the Pareto front is not convex. The method is 1. Using the implicit function theorem a tangent space at illustrated in Fig. 4 (left) for a single direction and refer- the current point is obtained. It yielded an m 1 ence point. dimensional hyperplane that is tangential to fðxÞ and used to obtain a predictor. See for the implicit function theorem, for instance, Krantz and Parks (2003). 5 Numerical algorithms 2. A point on the hyperplane in the desired direction is obtained, thereby avoiding regions that are already Many of the numerical algorithms for solving multiobjec- well covered in A. tive optimization problems make use of scalarization with 3. A corrector is computed minimizing the residual varying parameters. It is then possible to use single-ob- jj k f ðxÞjj. i i jective numerical optimization methods for finding differ- 4. In case the corrector method succeeded to obtain a new ent points on the Pareto front. point in the desired neighborhood, the new point is Besides these, there are methods that focus on solving added to the archive. Otherwise, the direction is saved the Karush-Kuhn-Tucker conditions. These methods aim (to avoid trying it a second time). for covering all solutions to the typically underdetermined See Hillermeier (2001) and Schu ¨ tze et al. (2005) for nonlinear equation system given by these condition. Again, examples and more detailed descriptions. The continuation for the sake of clarity and brevity, in the following treat- and homotopy methods require the efficient set to be ment, we will focus on the unconstrained case, noting that connected. Moreover, they require points to satisfy certain the full Karush-Kuhn-Tucker and Fritz-John conditions regularity conditions (local convexity and differentiability). also feature equality and inequality constraints (Kuhn and Global multiobjective optimization research is still a Tucker 1951). very active field of research. There are some promising Definition 20 Local Efficient Point. A point x 2X is directions, such as subdivision techniques (Dellnitz et al. locally efficient, if there exists  2 R such that [ 0 2005), Bayesian global optimization (Emmerich et al. 6 9y 2B ðxÞ : y  x and x 6¼ y, where B ðxÞ denotes the ˇ f  2016), and Lipschitz optimization (Zilinskas 2013). How- open -ball around x. ever, these require the decision space to be of low dimension. Theorem 1 Fritz–John Conditions. A neccessary condi- Moreover, there is active research on derivative-free tion for x 2X to be locally efficient is given by methods for numerical multiobjective optimization. Direct m m X X search techniques have been devised, for instance, Custo- 9k 0 : k rf ðxÞ¼ 0 and k ¼ 1: i i i dio et al. (2011), and by Audet et al. (2010). For a sum- i¼1 i¼1 mary of derivative-free methods, see Custo ´ dio et al. Theorem 2 Karush–Kuhn–Tucker Conditions. A point (2012). x 2X is locally efficient, if it satisfies the Fritz–John 123 M. T. M. Emmerich , A. H. Deutz spaces. Whereas the selection operators stay the same, the 6 Evolutionary multiobjective optimization variation operators (mutation, recombination) must be adapted to the representations of solutions in the decision Evolutionary algorithms are a major branch of bio-inspired space. search heuristics, which originated in the 1960ties and are There are currently three main paradigms for MOEA widely applied to solve combinatorial and non-convex designs. These are: numerical optimization problems. In short, they use para- digms from natural evolution, such as selection, recombi- 1. Pareto based MOEAs: The Pareto based MOEAs use a nation, and mutation to steer a population (set) of two-level ranking scheme. The Pareto dominance individuals (decision vectors) towards optimal or near-op- relation governs the first ranking and contributions of timal solutions (Ba ¨ck 1996). points to diversity is the principle of the second level Multiobjective evolutionary algorithms (MOEAs) gen- ranking. The second level ranking applies to points that eralize this idea, and typically they are designed to grad- share the same position in the first ranking. NSGA-II ually approach sets of Pareto optimal solutions that are (see Deb et al. 2002) and SPEA2 (see Zitzler and well-distributed across the Pareto front. As there are—in Thiele 1999) are two popular algorithms that fall into general—no single-best solutions in multiobjective opti- this category. mization, the selection schemes of such algorithms differ 2. Indicator based MOEAs: These MOEAs are guided by from those used in single-objective optimization. First an indicator that measures the performance of a set, for MOEAs were developed in the 1990ties—see, e.g., Kur- instance, the hypervolume indicator or the R2 indica- sawe (1990) and Fonseca and Fleming (1993), but since tor. The MOEAs are designed in a way that improve- around the year 2001, after the first book devoted exclu- ments concerning this indicator determine the selection sively to this topic was published by Deb (2001), the procedure or the ranking of individuals. number of methods and results in this field grew rapidly. 3. Decomposition based MOEAs: Here, the algorithm With some exceptions, the distinction between different decomposes the problem into several subproblems, classes of evolutionary multiobjective optimization algo- each one of them targeting different parts of the Pareto rithms is mainly due to the differences in the paradigms front. For each subproblem, a different parametrization used to define the selection operators, whereas the choice (or weighting) of a scalarization method is used. of the variation operators is generic and dependent on the MOEA/D and NSGA-III are well-known methods in problem. As an example, one might consider NSGA-II (see this domain. Deb et al. 2002) as a typical evolutionary multiobjective In this tutorial, we will introduce typical algorithms for optimization algorithm; NSGA-II can be applied to con- each of these paradigms: NSGA-II, SMS-EMOA, and tinuous search spaces as well as to combinatorial search Algorithm 1 NSGA-II Algorithm 1: initialize population P ⊂X 2: while not terminate do 3: {Begin variate} 4: Q ←∅ 5: for all i ∈{1,...,μ} do (1) (2) (1) (2) 6: (x , x ) ← select mates(P ) {select two parent individuals x ∈ P and x ∈ t t P } (i) (1) (2) 7: r ← recombine(x , x ) (i) 8: q ← mutate(r) (i) 9: Q ← Q ∪{q } t t 10: end for 11: {End variate} 12: {Selection step, select μ-”best” out of (P ∪ Q ) by a two step procedure:} t t 13: (R , ..., R ) ← non-dom sort(f,P ∪ Q ) 1 t t 14: Find the element of the partition, R , for which the sum of the cardinalities |R | + i 1 · ··+|R | is for the first time ≥ μ.If |R |+···+|R | = μ, P ←∪ R , otherwise i 1 i t+1 i µ µ i=1 determine set H containing μ − (|R | + ··· + |R |) elements from R with the 1 i −1 i µ µ i −1 highest crowding distance and P ← (∪ R ) ∪ H. t+1 i i=1 15: {End of selection step.} 16: t ← t +1 17: end while 18: return P 123 A tutorial on multiobjective optimization: fundamentals and evolutionary methods Fig. 5 Illustration of non- f f 2 2 dominated sorting (left) and (9) crowding distance (right) (1) (5) (1) y y y (6) (8) 3 3 y y (2) (2) 2 y 2 y (3) (7) (3) y y y (4) (4) 1 y 1 y (5) f f 1 1 1 2 3 1 2 3 (0, 0) (0, 0) MOEA/D. We will discuss important design choices, and two levels. First, non-dominated sorting is performed. This how and why other, similar algorithms deviate in these ranking solely depends on the Pareto order and does not choices. depend on diversity. Secondly, individuals which share the same rank after the first ranking are then ranked according 6.1 Pareto based algorithms: NSGA-II to the crowding distance criterion which is a strong reflection of the diversity. The basic loop of NSGA-II (Deb et al. 2002) is given by Let NDðPÞ denote the non-dominated solutions in some Algorithm 1. population. Non-dominated sorting partitions the popula- tions into subsets (layers) based on Pareto non-dominance Firstly, a population of points is initialized. Then the and it can be specified through recursion as follows. following generational loop is repeated. This loop consists R ¼NDðPÞð3Þ of two parts. In the first, the population undergoes a vari- ation. In the second part, a selection takes place which R ¼NDðP n[ R Þ; k ¼ 1; 2; ... ð4Þ kþ1 i i¼1 results in the new generation-population. The generational As in each step of the recursion at least one solution is loop repeats until it meets some termination criterion, removed from the population, the maximal number of which could be convergence detection criterion (cf. Wag- layers is |P|. We will use the index ‘ to denote the highest ner et al. 2009) or the exceedance of a maximal compu- non-empty layer. The rank of the solution after non-dom- tational budget. inated sorting is given by the subindex k of R . It is clear In the variation part of the loop k offspring are gener- that solutions in the same layer are mutually incomparable. ated. For each offspring, two parents are selected. Each one The non-dominated sorting procedure is illustrated in of them is selected using binary tournament selection, that Fig. 5 (upper left). The solutions are ranked as follows is drawing randomly two individuals from P and selecting ð1Þ ð2Þ ð3Þ ð4Þ ð5Þ ð6Þ ð7Þ R ¼fy ; y ; y ; y g, R ¼fy ; y ; y g, R ¼ 1 2 3 the better one concerning its rank in the population. The ð8Þ ð9Þ fy ; y g. parents are then recombined using a standard recombina- Now, if there is more than one solution in a layer, say R, tion operator. For real-valued problems simulated binary a secondary ranking procedure is used to rank solutions crossover (SBX) is used (see Deb and Argawal 1995). within that layer. This procedure applies the crowding Then the resulting individual is mutated. For real-valued distance criterion. The crowding distance of a solution x 2 problem polynomial mutation (PM) is used (see Mateo and R is computed by a sum over contributions c of the i-th Alberto 2012). This way, k individuals are created, which objective function: are all combinations or modifications of individuals in P . Then the parent and the offspring populations are merged l ðxÞ :¼ maxðff ðyÞjy 2 R nfxg^ f ðyÞ f ðxÞg [ f 1gÞ i i i i into P [ Q . t t ð5Þ In the second part, the selection part, the l best indi- viduals of P [ Q with respect to a multiobjective ranking u ðxÞ :¼ minðff ðyÞjy 2 R nfxg^ f ðyÞ f ðxÞg [ f1gÞ t t i i i i are selected as the new population P . tþ1 ð6Þ Next we digress in order to explain the multiobjective c ðxÞ :¼u l ; i ¼ 1; ...; m ð7Þ i i i ranking which is used in NSGA-II. The key ingredient of NSGA-II that distinguishes it from genetic algorithms for The crowding distance is now given as: single-objective optimization, is the way the individuals are ranked. The ranking procedure of NSGA-II consists of 123 M. T. M. Emmerich , A. H. Deutz X individual depends on how many other individuals it cðxÞ :¼ c ðxÞ; x 2 R ð8Þ dominates and by how many other individuals dominate it. i¼1 Moreover, clustering serves as a secondary ranking crite- For m ¼ 2 the crowding distances of a set of mutually rion. Both operators have been refined in SPEA2 (Zitzler non-dominated points are illustrated in Fig. 5 (upper right). et al. 2001), and also it features a strategy to maintain an In this particular case, they are proportional to the archive of non-dominated solutions. The Multiobjective perimeter of a rectangle that just is intersecting with the Micro GA. Coello and Pulido (2001) is an algorithm that uses a very neighboring points (up to a factor of ). Practically small population size in conjunction with an archive. speaking, the value of l is determined by the nearest Finally, the Differential Evolution Multiobjective Opti- neighbor of x to the left according to the i-coordinate, and mization (DEMO) (Robic and Filipic 2005) algorithm l is equal to the i-th coordinate of this nearest neighbor, combines concepts from Pareto-based MOEAs with a similarly the value of u is determined by the nearest variation operator from differential evolution, which leads neighbor of x to the right according to the i-coordinate, to improved efficiency and more precise results in partic- and u is equal to the i-th coordinate of this right nearest ular for continuous problems. neighbor. The more space there is around a solution, the higher is the crowding distance. Therefore, solutions with a 6.2 Indicator-based algorithms: SMS-EMOA high crowding distance should be ranked better than those with a low crowding distance in order to maintain diversity A second algorithm that we will discuss is a classical in the population. This way we establish a second order algorithm following the paradigm of indicator-based mul- ranking. If the crowding distance is the same for two tiobjective optimization. In the context of MOEAs, by a points, then it is randomly decided which point is ranked performance indicator (or just indicator), we denote a higher. scalar measure of the quality of a Pareto front approxi- Now we explain the non-dom_sort procedure in line 13 mation. Indicators can be unary, meaning that they yield an of Algorithm 1 the role of P is taken over by P \ Q :In t t absolute measure of the quality of a Pareto front approxi- order to select the l best members of P [ Q according to t t mation. They are called binary, whenever they measure the above described two level ranking, we proceed as how much better one Pareto front approximation is con- follows. Create the partition R ; R ; ; R of P [ Q as 1 2 ‘ t t cerning another Pareto front approximation. described above. For this partition one finds the first index The SMS-EMOA (Emmerich et al. 2005) uses the i for which the sum of the cardinalities jR j þ  þ jR j is l 1 i hypervolume indicator as a performance indicator. Theo- for the first time l.If jR j þ  þ jR j¼ l, then set 1 i i retical analysis attests that this indicator has some favor- P to [ R , otherwise determine the set H containing tþ1 i i¼1 able properties, as the maximization of it yields l ðjR j þ  þ jR jÞ elements from R with the 1 i 1 i l l approximations of the Pareto front with points located on highest crowding distance and set the next generation- the Pareto front and well distributed across the Pareto front. i 1 population, P ,to ð[ R Þ[ H. tþ1 i The hypervolume indicator measures the size of the dom- i¼1 Pareto-based Algorithms are probably the largest class inated space, bound from above by a reference point. of MOEAs. They have in common that they combine a For an approximation set A  R it is defined as ranking criterion based on Pareto dominance with a follows: diversity based secondary ranking. Other common algo- HIðAÞ¼ Volðfy 2 R : y  r^9a 2 A: a  ygÞ Pareto Pareto rithms that belong to this class are as follows. The Mul- ð9Þ tiobjective Genetic Algorithm (MOGA) (Fonseca and Fleming 1993), which was one of the first MOEAs. The Here, Vol ð:Þ denotes the Lebesgue measure of a set in PAES (Knowles and Corne 2000), which uses a grid par- dimension m. This is length for m ¼ 1, area for m ¼ 2, titioning of the objective space in order to make sure that volume for m ¼ 3, and hypervolume for m 4. Practically certain regions of the objective space do not get too speaking, the hypervolume indicator of A measures the size crowded. Within a single grid cell, only one solution is of the space that is dominated by A. The closer points move selected. The Strength Pareto Evolutionary Algorithm to the Pareto front, and the more they distribute along the (SPEA) (Zitzler and Thiele 1999) uses a different criterion Pareto front, the more space gets dominated. As the size of for ranking based on Pareto dominance. The strength of an 123 A tutorial on multiobjective optimization: fundamentals and evolutionary methods Algorithm 2 SMS-EMOA initialize P ⊂X while not terminate do {Begin variate} (1) (2) (1) (2) (x , x ) ← select mates(P ) {select two parent individuals x ∈ P and x ∈ P } t t t (1) (2) c ← recombine(x , x ) q ← mutate(c ) t t {End variate} {Begin selection} P ← select (P ∪{q }) {Select subset of size μ with maximal hypervolume indicator t+1 t t from P ∪{q }} {End selection} t ← t +1 end while return P Fig. 6 Illustration of 2-D f f 2 2 r r hypervolume (top left), 2-d hypervolume contributions (top (1) (1) y y right), 3-D hypervolume 3 Y ⊕ R 3 (bottom left), and 3-D hypervolume contributions (2) (2) 2 y 2 y (bottom right) (3) (3) y y (4) (4) 1 y 1 y (5) (5) y y f f 1 1 1 2 3 1 2 3 (0, 0) (0, 0) (4) (4) y y f f 3 3 (3) (3) y y (2) (2) y y (1) (1) y y (5) (5) y y f f f f 1 2 1 2 the dominated space is infinite, it is necessary to bound it. remains unchanged, the hypervolume indicator of P can For this reason, the reference point r is introduced. only grow or stay equal with an increasing number of The SMS-EMOA seeks to maximize the hypervolume iterations t. indicator of a population which serves as an approximation Next, the details of the selection procedure will be dis- set. This is achieved by considering the contribution of cussed. If all solutions in P are non-dominated, the points to the hypervolume indicator in the selection pro- selection of a subset of maximal hypervolume is equivalent cedure. Algorithm 2 describes the basic loop of the stan- to deleting the point with the smallest (exclusive) hyper- dard implementation of the SMS-EMOA. volume contribution. The hypervolume contribution is The algorithm starts with the initialization of a popula- defined as: tion in the search space. Then it creates only one offspring DHIðy; YÞ¼ HIðYÞ HIðY nfygÞ individual by recombination and mutation. This new indi- vidual enters the population, which has now size l þ 1. To An illustration of the hypervolume indicator and reduce the population size again to the size of l, a subset of hypervolume contributions for m ¼ 2 and, respectively, size l with maximal hypervolume is selected. This way as m ¼ 3 is given in Fig. 6. Efficient computation of all long as the reference point for computing the hypervolume hypervolume contributions of a population can be achieved 123 M. T. M. Emmerich , A. H. Deutz in time Hðl log lÞ for m ¼ 2 and m ¼ 3 (Emmerich and scaling of the objective functions. Although the hypervol- Fonseca 2011). For m ¼ 3 or 4, fast implementations are ume indicator has been very prominent in IBEAs, there are described in Guerreiro and Fonseca (2017). Moreover, for some algorithms using other indicators, notably this is the fast logarithmic-time incremental updates for 2-D an R2 indicator (Trautmann et al. 2013), which features an algorithm is described in Hupkens and Emmerich (2013). ideal point as a reference point, and the averaged Hausdorff For achieving logarithmic time updates in SMS-EMOA, distance (D indicator) (Rudolph et al. 2016), which the non-dominated sorting procedure was replaced by a requires an aspiration set or estimation of the Pareto front procedure, that sorts dominated solutions based on age. For which is dynamically updated and used as a reference. The m [ 2, fast incremental updates of the hypervolume indi- idea of aspiration sets for indicators that require knowledge cator and its contributions were proposed in for more than of the ‘true’ Pareto front also occurred in conjunction with two dimensions (Guerreiro and Fonseca 2017). the a-indicator (Wagner et al. 2015), which generalizes the In case dominated solutions appear the standard imple- approximation ratio in numerical single-objective opti- mentation of SMS-EMOA partitions the population into mization. The Portfolio Selection Multiobjective Opti- layers of equal dominance ranks, just like in NSGA-II. mization Algorithm (POSEA) (Yevseyeva et al. 2014) uses Subsequently, the solution with the smallest hypervolume the Sharpe Index from financial portfolio theory as an contribution on the worst ranked layer gets discarded. indicator, which applies the hypervolume indicator of SMS-EMOA typically converges to regularly spaced singletons as a utility function and a definition of the Pareto front approximations. The density of these approx- covariances based on their overlap. The Sharpe index imations depends on the local curvature of the Pareto front. combines the cumulated performance of single individuals For biobjective problems, it is highest at points where the with the covariance information (related to diversity), and slope is equal to 45 (Auger et al. 2009). It is possible to it has interesting theoretical properties. influence the distribution of the points in the approximation set by using a generalized cone-based hypervolume indi- 6.3 Decomposition-based algorithm: MOEA/D cator. These indicators measure the hypervolume domi- nated by a cone-order of a given cone, and the resulting Decomposition-based algorithms divide the problem into optimal distribution gets more uniform if the cones are subproblems using scalarizations based on different acute, and more concentrated when using obtuse cones (see weights. Each scalarization defines a subproblem. The Emmerich et al. 2013). subproblems are then solved simultaneously by dynami- Besides the SMS-EMOA, there are various other indi- cally assigning and re-assigning points to subproblems and cator-based MOEAs. Some of them also use the hyper- exchanging information from solutions to neighboring sub- volume indicator. The original idea to use the hypervolume problems. indicator in an MOEA was proposed in the context of The method defines neighborhoods on the set of these archiving methods for non-dominated points. Here the subproblems based on the distances between their aggre- hypervolume indicator was used for keeping a bounded- gation coefficient vectors. When optimizing a subproblem, size archive (Knowles et al. 2003). Besides, in an early information from neighboring subproblems can be work hypervolume-based selection which also introduced a exchanged, thereby increasing the efficiency of the search novel mutation scheme, which was the focus of the paper as compared to methods that independently solve the (Huband et al. 2003). The term Indicator-based Evolu- subproblems. tionary Algorithms (IBEA) (Zitzler and Ku ¨ nzli 2004) was MOEA/D (Zhang and Li 2007) is a very commonly used introduced in a paper that proposed an algorithm design, in decomposition based method, that succeeded a couple of which the choice of indicators is generic. The hypervol- preceding algorithms based on the idea of combining ume-based IBEA was discussed as one instance of this decomposition, scalarization and local search(Ishibuchi class. Its design is however different to SMS-EMOA and and Murata 1996; Jin et al. 2001; Jaszkiewicz 2002). Note makes no specific use of the characteristics of the hyper- that even the early proposed algorithms VEGA (Schaffer volume indicator. The Hypervolume Estimation Algorithm 1985) and the vector optimization approach of Kursawe (HypE) (Bader and Zitzler 2011) uses a Monte Carlo (see Kursawe 1990) can be considered as rudimentary Estimation for the hypervolume in high dimensions and decomposition based approaches, where these algorithms thus it can be used for optimization with a high number of obtain a problem decomposition by assigning different objectives (so-called many-objective optimization prob- members of a population to different objective functions. lems). MO-CMA-ES (Igel et al. 2006) is another hyper- These early algorithmic designs used subpopulations to volume-based MOEA. It uses the covariance-matrix solve different scalarized problems. In contrast, in MOEA/ adaptation in its mutation operator, which enables it to adapt its mutation distribution to the local curvature and 123 A tutorial on multiobjective optimization: fundamentals and evolutionary methods is denoted with B(i). Given these preliminaries, the MOEA/ D one population with interacting neighboring individuals is applied, which reduces the complexity of the algorithm. D algorithm—using Chebychev scalarization— reads as described in Algorithm 3. Typically, MOEA/D works with Chebychev scalariza- tions, but the authors also suggest other scalarization Note the following two remarks about MOEA/D: (1) Many parts of the algorithm are kept generic. Here, generic methods, namely scalarization based on linear weighting— which however has problems with approximating non- options are recombination, typically instantiated by stan- dard recombination operators from genetic algorithms, and convex Pareto fronts—and scalarization based on boundary intersection methods—which requires additional parame- local improvement heuristic. The local improvement heuristic should find a solution in the vicinity of a given ters and might also obtain strictly dominated points. solution that does not violate constraints and has a rela- MOEA/D evolves a population of individuals, each ðiÞ tively good performance concerning the objective function individual x 2 P being associated with a weight vector ðiÞ ðiÞ values. (2) MOEA/D has additional statements to collect all k . The weight vectors k are evenly distributed in the non-dominated solutions it generates during a run in an search space, e.g., for two objectives a possible choice is: external archive. Because this external archive is only used ðiÞ > k i i k ¼ð ; Þ ; i ¼ 0; ...; l. k k in the final output and does not influence the search The i-th subproblem gðxjk ; z Þ is defined by the Che- dynamics, it can be seen as a generic feature of the algo- bychev scalarization function (see also Eq. 2): rithm. In principle, an external archive can be used in all EMOAs and could therefore also be done in SMS-EMOA ðiÞ ðiÞ gðxjk ; z Þ¼ max fk jf ðxÞ z jg þ  f ðxÞ z j j j j j and NSGA-II. To make comparisons to NSGA-II and j2f1;...;mg j¼1 SMS-EMOA easier, we omitted the archiving strategy in ð10Þ the description. Recently, decomposition-based MOEAs became very The main idea is that in the creation of a new candidate popular, also because they scale well to problems with solution for the i-th individual the neighbors of this indi- many objective functions. The NSGA-III (Deb and Jain vidual are considered. A neighbor is an incumbent solution 2014) algorithm is specially designed for many-objective optimization and uses a set of reference points that is dynamically updated in its decomposition. Another Algorithm 3 MOEA/D (1) (µ) input: Λ = {λ , ..., λ }{weight vectors} input: z : reference point for Chebychev distance initialize P ⊂X (i) initialize neighborhoods B(i) by collecting k nearest weight vectors in Λ for each λ while not terminate do for all i ∈{1, ..., μ} do (1) (2) Select randomly two solutions x , x in the neighborhood B(i). (1) (2) y ← Recombine x , x by a problem specific recombination operator. y ← Local problem specific, heuristic improvement of y, e.g. local search, based on (i) ∗ the scalarized objective function g(x|λ , z ). (i) ∗ (i) (i) ∗ if g(y |λ , z ) <g(x |λ , z ) then (i) x ← y end if Update z , if neccessary, i.e, one of its component is larger than one of the corre- (i) sponding components of f(x ). end for t ← t +1 end while return P of a subproblem with similar weight vectors. The neigh- decomposition based technique is called Generalized borhood of i-th individual is the set of k subproblems, for Decomposition (Giagkiozis et al. 2014). It uses a mathe- ðiÞ matical programming solver to compute updates, and it was so predefined constant k, that is closest to k in the shown to perform well on continuous problems. The Euclidean distance, including the i-th subproblem itself. It combination of mathematical programming and decom- position techniques is also explored in other, more novel, hybrid techniques, such as Directed Search (Schu ¨ tze et al. 123 M. T. M. Emmerich , A. H. Deutz Fig. 7 In the left picture, the set f f 2 2 of points denoted by blue squares is better than (.) the set consisting of the red-circle ◦ ◦ points. Also in the right picture ◦ ◦ the set consisting of blue squares is better than the set of ◦ ◦ red-circle points—in this case . . . . the intersection of the two sets is . . non-empty 2 2 1 1 f f 1 1 ... .. . 1 2 1 2 (0, 0) (0, 0) 2016), which utilizes the Jacobian matrix of the vector- in R . We say that A is better than B if and only if every valued objective function (or approximations to it) to find b 2 B is weakly dominated by at least one element a 2 A promising directions in the search space, based on desired and A 6¼ B. Notation: A.B. directions in the objective space. In Fig. 7 examples are given of the case of one set being better than another and in Fig. 8 examples are given of the case that a set is not better than another. 7 Performance assessment This relation on sets has been used in Zitzler et al. (2003) to classify performance indicators for Pareto fronts. In order to make a judgement (that is, gain insight into the To do so, they introduced the notion of completeness and advantages and disadvantages) of multiobjective evolu- compatibility of these indicators with respect to the set tionary (or for that matter also deterministic) optimizers we relation ‘is better than’. need to take into account (1) the computational resources used, and (2) the quality of the produced result(s). Definition 23 Unary Set Indicator. A unary set indicator is The current state of the art of multiobjective optimiza- a mapping from finite subsets of the objective space to the set of real numbers. It is used to compare (finite) approx- tion approaches are mainly compared empirically though theoretical analyses exist (see, for instance, the conver- imations to the Pareto front. gence results described in Rudolph and Agapie (2000), Definition 24 Compatibility of Unary Set Indicators Beume et al. (2011) albeit for rather simple problems as concerning the ‘is better than’ order on Approximation more realistic problems elude mathematical analysis. Sets. A unary set indicator I is compatible concerning the The most commonly computational resource which is ‘is better than’ or .-relation if and only if taken into account is the computation time which is very IðAÞ [ IðBÞ) A.B. A unary set indicator I is complete often measured implicitly by counting fitness function with respect to the ‘is better than’ or .-relation if and only evaluations—in this respect, there is no difference with if A.B ) IðAÞ [ IðBÞ. If in the last definition we replace[ single-objective optimization. In contrast to single-objec- by then the indicator is called weakly-complete. tive optimization, in multiobjective optimization, a close The hypervolume indicator and some of its variations distance to a (Pareto) optimal solution is not the only thing required but also good coverage of the entire Pareto front. are complete. Other indicators compared in the paper (Zitzler et al. 2003) are weakly-complete or not even As the results of multiobjective optimization algorithms are (finite) approximation sets to the Pareto front we need to be weakly-complete. It has been proven in the same paper that no unary indicator exists that is complete and compatible at able to say when one Pareto front approximation is better than another. One good way to define when one approxi- the same time. Moreover for the hypervolume indicator mation set is better than another is as in Definition 22 (see HI it has be shown that HI ðAÞ [ HI ðBÞ) :ðB.AÞ. Zitzler et al. 2003). The latter we call weakly-compatible. In all the discussions of the hypervolume indicator, the Definition 21 Approximation Set of a Pareto Front. A assumption is that all points of the approximation sets finite subset A of R is an approximation set of a Pareto under consideration dominate the reference point. front if and only if A consists of mutually (Pareto) non- Recently, variations of the hypervolume indicator have dominated points. been proposed—the so-called free hypervolume indica- tors—that do not require the definition of a reference point Definition 22 Comparing Approximation Sets of a Pareto Front. Let A and B be approximation sets of a Pareto front 123 A tutorial on multiobjective optimization: fundamentals and evolutionary methods Fig. 8 In each of the pictures, f f 2 2 the set consisting of the blue square points is not better than the set consisting of the red ◦ ◦ circle points. Clearly, in each of the two pictures on the right the set consisting of the red circle points is better than the set . . . . ◦ consisting of the blue square . . points 2 2 1 1 f f 1 1 ... .. . 1 2 1 2 (0, 0) (0, 0) f f 2 2 . . . . ◦ . . ◦ ◦ 2 2 1 1 f f 1 1 ... .. . 1 2 1 2 (0, 0) (0, 0) and are complete and weakly-compatible for all approxi- properties hold: A.B ) I ðB; AÞ [ 0, the second mation sets (Zitzler et al. 2003). notable property is as follows: Besides unary indicators, one has introduced binary I ðA; BÞ 0 and I ðB; AÞ [ 0 ) A.B. These two proper- indicators (see Riquelme et al. 2015). The most used ones ties show that based on the binary -indicator it is possible are unary indicators followed up by binary indicators. For to decide whether or not A is better than B. In contrast, the binary indicators, the input is a pair of approximation sets knowledge of the hypervolume indicator on the sets A and and the output is again a real number. Here the goal is to B does not allow to decide whether or not A is better than determine which of the two approximation sets is the better B. one (and how much better it is) Some of indicators are useful in case there is knowledge . Binary indicators can also be used, if the true Pareto front is known, e.g., in bench- or complete knowledge about the Pareto front. For instance marking on test problems. A common binary indicator is (see Rudolph et al. 2016), it has been suggested to compute the binary -indicator. In order to introduce this indicator the Hausdorff distance (or variations of it) of an approxi- we first define for each d 2 R a binary relation on the mation set to the Pareto front. Moreover, the binary - points in R . indicator can be transformed into a complete unary indi- cator in case the second input is the known Pareto front— Definition 25 d-domination. Let d 2 R and let a 2 R note that this indicator needs to be minimized. and b 2 R . We say that a d-dominates b (notation: Another useful way to learn about the behavior of a  b) if and only if a  b þ d; i ¼ 1; ...; m. d i i evolutionary multiobjective algorithms is the attainment Next, we can define the binary indicator I . curve approach (see da Fonseca et al. 2001). The idea is to generalize the cumulative distribution function and for the Definition 26 The Binary Indicator I . Given two study of algorithms it is approximated empirically. The approximation sets A and B, then distribution is defined on the set of (finite) approximation I ðA; BÞ :¼ inf f8b 2 B 9a 2 A such that a  bg. d2R d sets of the Pareto front. For each point in the objective Clearly for a fixed B the smaller I ðA; BÞ is the better the space R it is the probability that the Pareto front approximation set A is relative to B. The following approximation attains this point (that is, it is either one point in the approximation set, or it is dominated by some point in the approximation set). Formally, it reads Conceivably one can can introduce k-ary indicators. To our knowledge, so far they have not been used in multiobjective optimization. 123 M. T. M. Emmerich , A. H. Deutz levels than 0.5 in order to get an optimistic or respectively a pessimistic assessment of the performance. In Fig. 9 an example of the median attainment curve is shown. We assume that the four approximation sets are provided by some algorithm. 8 Recent topics in multiobjective optimization Recently, there are many new developments in the field of multiobjective optimization. Next we will list some of the most important trends. ... 1 2 (0, 0) 8.1 Many-objective optimization Fig. 9 The median attainment curve for the case of four approxima- Optimization with more than 3 objectives is currently ter- tion sets; one approximation set consists of the blue squares, the second set consists of points denoted by brown triangles, the third med many-objective optimization [see, for instance, the consists of the red circles, and the fourth consists of points denoted by survey (Li et al. 2015)]. This is to stress the challenges one black crosses; the darker gray the region is the more approximation meets when dealing with more than 3 objectives. The main sets dominate it. The median attainment curve is the black polygonal reasons are: line 1. problems with many objectives have a Pareto front which cannot be visualized in conventional 2D or 3D plots instead other approaches to deal with this are ð1Þ ð2Þ ðkÞ Pða  z _ a  z _ ... _ a  zÞ; needed; ð1Þ ð2Þ ðkÞ 2. the computation time for many indicators and selection where A ¼fa ; a ; .. .; a g is the approximation set and schemes become computationally hard, for instance, z 2 R . In other words P is the probability of an algorithm time complexity of the hypervolume indicator compu- to find at least one solution which attains the goal z in a tation grows super-polynomially with the number of single run. We define the attainment function a : R ! objectives, under the assumption that P 6¼ NP; ½0; 1 associated to the approximation set A as follows: 3. last but not least the ratio of non-dominated points ð1Þ ð2Þ ðkÞ a ðzÞ :¼ Pða  z _ a  z _ ... _ a  zÞ: tends to increase rapidly with the number of objectives. For instance, the probability that a point is non- This function can be approximated by dominated in a uniformly distributed set of sample points grows exponentially fast towards 1 with the a ðzÞ :¼ IðA ; zÞ; s i number of objectives. i¼1 In the field of many-objective optimization different tech- where A ; ...; A are the outcome approximation sets of an 1 s niques are used to deal with these challenges. For the first algorithm in s runs of the algorithm and I : challenge, various visualization techniques are used such as ð set of approximation sets Þ R !f0; 1g is a func- projection to lower dimensional spaces or parallel coordi- tion which associates to an approximation set and a vector nate diagrams. In practice, one can, if the dimension is only in R the value 1 in case the vector is a member of the slightly bigger than 3, express the coordinate values by approximation set or if some element of the approximation colors and shape in 3D plots. set dominates it, otherwise the value is 0. For m ¼ 2or3 Naturally, in many-objective optimization indicators we can plot the boundaries where this function changes its which scale well with the number of objectives (say value. These are the attainment curves (m ¼ 2Þ and polynomially) are very much desired. Moreover, decom- attainment surfaces (m ¼ 3). In particular the median position based approaches are typically preferred to indi- attainment curve/surface gives a good summary of the cator based approaches. behavior of the optimizer. It is the boundary where the The last problem requires, however, more radical devi- function changes from a level below 0.5 to a level higher ations from standard approaches. In many cases, the than 0.5. Alternatively one can look at lower and higher reduction of the search space achieved by reducing it to the efficient set is not sufficiently adequate to allow for 123 A tutorial on multiobjective optimization: fundamentals and evolutionary methods subsequent decision making because too many alternatives simulations, and finite element computations of mechanical remain. In such cases, a stricter order than the Pareto order structures. To deal with such problems techniques that need is required which requires additional preference knowl- only a limited number of function evaluations have been edge. To elicit preference knowledge, interactive methods devised. A common approach is to learn a surrogate model often come to the rescue. Moreover, in some cases, of the objective functions by using all available past objectives are correlated which allows for grouping of evaluations. This is called surrogate model assisted opti- objectives, and in turn, such groups can be aggregated to a mization. One common approach is to optimize on the single objective. Dimensionality reduction and community surrogate model to predict promising locations for evalu- detection techniques have been proposed for identifying ation and use these evaluations to further improve the meaningful aggregation of objective functions. model. In such methods, it is also important to add points for developing the model in under-explored regions of the 8.2 Preference modeling search space. Some criteria such as expected improvement take both exploitation and exploration into account. Sec- The Pareto order is the most applied order in multiobjective ondly, surrogate models can be used in pre-processing in optimization. However, different ranking schemes and the selection phase of evolutionary algorithms. To save partial orders have been proposed in the literature for time, less interesting points can be discarded before they various reasons. Often additional knowledge of user pref- would be evaluated by the costly and precise evaluator. erences is integrated. For instance, One distinguishes Typically regression methods are used to construct surro- preference modeling according to at what stage of the gate models; Gaussian processes and neural networks are optimization the preference information is collected (a standard choices. Surrogate modeling has in the last decade priori, interactively, and a posteriori). Secondly one can been generalized to multiobjective optimization in various distinguish the type of information requested from the ways. Some important early work in this field was on decision maker, for instance, constraints on the trade-offs, surrogate assisted MOEAs (Emmerich et al. 2006) and relative importance of the objectives, and preferred regions ParEGO algorithm (Knowles 2006). A state of the art on the Pareto front. Another way to elicit preference review can be found in Allmendinger et al. (2017). information is by ordinal regression; here the user is asked for pairwise comparisons of some of the solutions. From 8.4 New bio-inspired paradigms this data, the weights of utility functions are learned (Branke et al. 2015). The interested reader is also referred Inspiration by nature has been a creative force for dealing to interesting work on non-monotonic utility functions, with optimization algorithm design. Apart from biological using the Choquet integral (Branke et al. 2016). Notably, evolution, many other natural phenomena have been con- the topic of preference elicitation is one of the main topics sidered. While many of these algorithmic ideas have so far in Multiple Criteria Decision Analysis (MCDA). In recent remained in a somewhat experimental and immature state, years collaboration between MCDA and multiobjective some non-evolutionary bio-inspired optimization algo- optimization (MOO) brought forward many new useful rithms have gained maturation and competitive perfor- approaches. A recommended reference for MCDA is Bel- mance. Among others, this seems to hold for particle ton and Stewart (2002). For a comprehensive overview of swarm optimization (Reyes-Sierra and Coello Coello preference modelling in multiobjective optimization we 2006), ant colony optimization (Baran and Schaerer 2003), refer to Li et al. (2017) and Hakanen et al. (2016). More- and artificial immune systems Coello Coello and Corte ´s over Greco et al. (2016) contains an updated collection of (2005). As with evolutionary algorithms, also these algo- state of the art surveys on MCDA. A good reference dis- rithms have first been developed for single-objective opti- cussing the integration of MCDA into MOO is Branke mization, and subsequently, they have been generalized to et al. (2008). multiobjective optimization. Moreover, there is some recent research on bio-inspired techniques that are specif- 8.3 Optimization with costly function ically developed for multiobjective optimization. An evaluations example of such a development is the Predator-Prey Evo- lutionary Algorithm, where different objectives are repre- In industrial optimization very often the evaluation of (an) sented by different types of predators to which the prey objective function(s) is achieved by simulation or experi- (solutions) have to adapt (Laumanns et al. 1998; Grimme ments. Such evaluations are typically time-consuming and and Schmitt 2006). expensive. Examples of such costly evaluations occur in In the field of natural computing, it is also investigated the optimization based on crash tests of automobiles, whether algorithms can serve as models for nature. It is an chemical experiments, computational fluid dynamics interesting new research direction to view aspects of 123 M. T. M. Emmerich , A. H. Deutz Table 1 Table of (evolutionary) multiobjective optimization software Libraries (evolutionary) multiobjective optimization Name Scope Prog. Lang. url or ref Public domain ecr EA and EMO R Bossek (2017) JMetal Metatheuristics/EMO Java Barba-Gonzalez et al. (2017) Liger MOO/Design Optim. C?? Giagkiozis et al. (2013) MOEA framework EMO Java moeaframework.org/ Opt4J EMO Java opt4j.sourceforge.net PISA EMO C?? Bleuler et al. (2003) PyMOO EMO Python www.openhub.net/p/pymoo RODEOlib Robust Optimization Matlab sourceforge.net/projects/rodeolib/ Shark Library Machine Learning C# image.diku.dk/shark/ SUMO Bayesian Optimization Matlab sumo.intec.ugent.be/SUMO TEA classical EA and MOO C?? Emmerich and Hosenberg (2000) vOptSolver (Linear) MOO Julia voptsolver.github.io/vOptSolver Commercial software EASY Design Optimization C?? velos0.ltt.mech.ntua.gr/EASY/ IND-NIMBUS Design Optimization N/A ind-nimbus.it.jyu.fi ISight Design Optimization N/A www.simuleon.com MODEfrontier Design Optimization N/A www.esteco.com Optimus Design Optimization N/A www.noesissolutions.com WWW-NIMBUS Design Optimization N/A Miettinen and Ma ¨kela ¨ (2000) Performance assessment Performance assessment test problems BBOB/COCO Benchmarking Tool C?? coco.gforge.inria.fr/ WFG Test Suite C?? www.wfg.csse.uwa.edu.au/toolkit/ ZDT/DTLZ Test Suite C?? esa.github.io/pagmo2/ Performance assessment software Attainment surfaces R/C lopez-ibanez.eu/eaftools Hypervolume computation C lopez-ibanez.eu/hypervolume Hypervolume computation Link Collection Various ls11-www.cs.tu-dortmund.de/rudolph/hypervolume/start natural evolution as a multiobjective optimization process, have been developed, and techniques that originated from and first such models have been explored in Rueffler evolutionary multiobjective optimization have been trans- (2006) and Sterck et al. (2011). ferred into deterministic methods. A notable example is the hypervolume indicator gradient ascent method for multi- 8.5 Set-oriented numerical optimization objective optimization (HIGA-MO) (Wang et al. 2017). In this method a set of say l points is represented as a single Traditionally, numerical techniques for multiobjective vector of dimension ln, where n is the dimension of the optimization are single point techniques: They construct a search space. In real-valued decision space the mapping ld Pareto front by formulating a series of single-objective HI: R ! R from the such population vectors to the optimization problems (with different weights or con- hypervolume indicator has a well-defined derivative in straints) or by expanding a Pareto front from a single almost all points. The computation of such derivatives has solution point by point using continuation. In contrast, set- been described in Emmerich and Deutz (2014). Viewing oriented numerical multiobjective optimization operates on the vector of partial derivatives as a gradient, conventional the level of solution sets, the points of which are simulta- gradient methods can be used. It requires, however, some neously improved, and that converge to a well-distributed specific adaptations in order to construct robust and prac- set on the Pareto front. Only very recently such methods tically usable methods for local optimization. On convex 123 A tutorial on multiobjective optimization: fundamentals and evolutionary methods problems, fast linear convergence can be achieved. By table of MOO Software, including also some packages using second-order derivatives in a hypervolume-based that include deterministic solvers. Newton-Raphson method, even quadratic convergence Conferences and Journals: speed has been demonstrated empirically on a set of con- – Conferences: vex bi-objective problems. The theory of such second- order methods is subject to ongoing research (Herna ´ndez – Conference on Evolutionary Computation et al. 2014). (CEC), annual, published by IEEE – Evolutionary Multi-criterion Optimization 8.6 Advanced performance assessment (EMO) biannual conference, proceedings pub- lished by Springer LNCS Despite significant progress on the topic of performance – EVOLVE—a Bridge between Probability, Set assessment in recent years, there are still many unanswered Oriented Numerics and Evolutionary Computa- questions. A bigger field of research is on performance tion, annual until 2015, published by Springer assessment of interactive and many objective optimization. Studies in Computational Intelligence, continued Moreover, the dependency of performance measures on as NEO see below parameters, such as the reference point of the hypervolume – GECCO with EMO track, annual, published by indicator requires further investigation. Some promising ACM work in that direction was recently provided in Ishibuchi – Global Optimization Workshop (GO), biannual, et al. (2017). published by diverse publishers (as special issues, and post-proceedings) – MCDM with EMO track, biannual, published by 9 How to get started? MCDM International Society – Numerical and Evolutionary Optimiza- In the following, we list some of the main resources for the tion(NEO), annual, published by Springer field of (Evolutionary) Multiobjective Optimization. Advances in Computational Intelligence – and others Introductory Books: –Ju ¨ rgen Branke, Kalyanmoy Deb, Kaisa Miettinen, – Journals : COAP, ECJ, EJOR, IEEE TEVC, JOGO, ´ MCDA Journal, and other Optimization, and Oper- Roman Slowinski Multiobjective Optimization : Interactive and evolutionary approaches, Springer, ations Research journals. Aside from the resources mentioned above, there are many – Carlos Coello Coello et al. Evolutionary Algorithms research groups and labs which maintain a repository of for Solving Multi-Objective Problems, 2007, software accompanying their published research, e.g., the Springer MODA group at Leiden University http://moda.liacs.nl and – Kalyanmoy Deb Multi-Objective Optimization using the research group of Carlos Fonseca at Coimbra Univer- Evolutionary Algorithms, Wiley, 2001 sity eden.dei.uc.pt/cmfonsec/software.html. – Matthias Ehrgott Multicriteria Optimization, Springer, 2005 – Joshua Knowles, David Corne, Kalyanmoy Deb 10 Summary and outlook Multiobjective Problem Solving from Nature, Springer, 2007 In this tutorial, we gave an introduction to the field of – Kaisa Miettinen Multiobjective Nonlinear Optimiza- multiobjective optimization. We covered the topics of tion, Kluwer, 2012 order-theoretical foundations, scalarization approaches, Websites: and optimality conditions. As solution methods, we dis- cussed homotopy and evolutionary methods. In the context – EMOO Repository by Carlos Coello Coello http:// of Evolutionary methods, we discussed three state-of-the- neo.lcc.uma.es/emoo/ art techniques in detail, namely NSGA-II, SMS-EMOA, – SIMCO Open Problems http://simco.gforge.inria.fr/ and MOEA/D, each representing a key paradigm in evo- doku.php?id=openproblems; a collection of open lutionary multiobjective algorithm design. NSGA-II served problems and theoretical results on indicator based approaches and complexity theory. There are many implementations of multiobjective a selection of journals in which many articles are published on optimization algorithms available. Table 1 provides a (Evolutionary) Multiobjective Optimization. 123 M. T. M. Emmerich , A. H. Deutz as a representative of Pareto based approaches, SMS- computational complexity, generalizations, for instance, EMOA as an example of Indicator-based approaches, and level set approximation, diversity optimization, and set- MOEA/D as an example of decomposition based approa- oriented optimization, customization and integration into ches. These algorithms have some advantages and multidisciplinary workflows, and scalability to big data, or disadvantages: expensive evaluations. – Pareto-based approaches follow a straightforward Acknowledgements The authors thank the anonymous reviewers very design principle, that is directly based on Pareto much for their constructive and detailed feedback. dominance and diversity preservation (using, for Open Access This article is distributed under the terms of the Creative instance, crowding distance). Usually, these algorithms Commons Attribution 4.0 International License (http://creative require only a few parameters, and larger numbers of commons.org/licenses/by/4.0/), which permits unrestricted use, dis- objective functions do not cause problems. However, it tribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a might be difficult to guarantee and measure conver- link to the Creative Commons license, and indicate if changes were gence and achieve a very regular spacing of solutions. made. – Indicator-based approaches use an indicator for the performance of an approximation set to guide the search. It is possible to assess their convergence References behavior online, and they hold the promise to be more amenable to theoretical analysis. However, the com- Allmendinger R, Emmerich M, Hakanen J, Jin Y, Rigoni E (2017) Surrogate-assisted multicriteria optimization: complexities, putation time often increases rapidly with the number prospective solutions, and business case. J Multi-Criteria Decis of dimensions and the distribution of points in the Anal 24(1–2):5–24 approximation sets might depend critically on the Audet C, Savard G, Zghal W (2010) A mesh adaptive direct search settings of the reference point or other parameters of algorithm for multiobjective optimization. Eur J Oper Res 204(3):545–556 the indicator. Auger A, Bader J, Brockhoff D, Zitzler E (2009) Theory of the – Decomposition-based methods provide a very flexible hypervolume indicator: optimal l-distributions and the choice of framework for algorithm design, as they can incorpo- the reference point. In: Proceedings of the tenth ACM SIGEVO rate various scalarization methods. A disadvantage is workshop on foundations of genetic algorithms, pp 87–102. ACM that they require some a priori knowledge of the Ba ¨ck T (1996) Evolutionary algorithms in theory and practice: position of the Pareto front in the objective space and evolution strategies, evolutionary programming, genetic algo- the number of weight vectors might grow exponentially rithms. Oxford University Press, Oxford with the objective space size, even if the Pareto front is Bader J, Zitzler E (2011) Hype: an algorithm for fast hypervolume- based many-objective optimization. Evol Comput 19(1):45–76 of low dimension. Bara ´n B, Schaerer M (2003) A multiobjective ant colony system for According to the above, choosing the right method depends vehicle routing problem with time windows. In: Proceedings of much on the dimension of the objective space, the number the twenty first IASTED international conference on applied informatics, vol 378. Insbruck, Austria, pp 97–102 of solutions one wants to output, the desired distribution of ´ ´ Barba-Gonzalez C, Garcıa-Nieto J, Nebro AJ, Montes JFA (2017) the solutions (knee-point focused or uniformly spread) and Multi-objective big data optimization with jmetal and spark. In: the a priori knowledge on the location and shape of the EMO, volume 10173 of lecture notes in computer science, Pareto front. pp 16–30. Springer Belton V, Stewart T (2002) Multiple criteria decision analysis: an Due to space constraints, many advanced topics in integrated approach. Springer, Berlin multiobjective optimization are not covered in depth. We Beume N, Laumanns M, Rudolph G (2011). Convergence rates of refer for these topics to the literature. For instance, con- SMS-EMOA on continuous bi-objective problem classes. In: straint handling (Coello Coello 2013), multimodality FOGA, pages 243–252. ACM Bleuler S, Laumanns M, Thiele L, Zitzler E (2003). PISA: A platform (Kerschke et al. 2016), non-convex global optimization and programming language independent interface for search (Zilinskas 2013), and combinatorial optimization (Ehrgott algorithms. In: EMO, volume 2632 of lecture notes in computer and Gandibleux 2000). science, pp 494–508. Springer Bossek J (2017). Ecr 2.0: A modular framework for evolutionary Multiobjective Optimization is a very active field of computation in r. In: Proceedings of the genetic and evolutionary research. There are still many open, challenging problems computation conference companion, GECCO ‘17, in the area. For future development of the research field it pp 1187–1193, New York, NY, USA. ACM will be essential to provide EMO algorithms that are built Branke J, Corrente S, Slowin ´ ski R, Zielniewicz P (2016). Using Choquet integral as preference model in interactive evolutionary around a robust notion of performance and, ideally, also multiobjective optimization. In European Jounal of Operational can be analyzed more rigorously. Major topics for current Research, volume 250, pp 884–901. Springer research are also uncertainty handling and robustness, many-objective optimization, theoretical foundations and 123 A tutorial on multiobjective optimization: fundamentals and evolutionary methods Branke J, Deb K, Miettinen K, Slowinski R, e. (2008). Multiobjective volume 3410 of lecture notes in computer science, pp 62–76. optimization: interactive and evolutionary approaches. In vol- Springer ume 5252 of lecture notes in computer science. Springer Emmerich M, Deutz A (2014) Time complexity and zeros of the Branke J, Greco S, Slowinski R, Zielniewicz P (2015) Learning value hypervolume indicator gradient field. In: EVOLVE-a bridge functions in interactive and evolutionary multiobjective opti- between probability, set oriented numerics, and evolutionary mization. IEEE Trans Evol Comput 19(1):88–102 computation III, pp 169–193. Springer Brockhoff D (2017). GECCO 2017 tutorial on evolutionary multiob- Emmerich M, Deutz A, Kruisselbrink J, Shukla PK (2013) Cone- jective optimization. In: Proceedings of the genetic and evolu- based hypervolume indicators: construction, properties, and tionary computation conference companion, GECCO ‘17, efficient computation. In: International conference on evolution- pp 335–358, New York, NY, USA. ACM ary multi-criterion optimization, pp 111–127. Springer Coello CAC, Pulido GT (2001) A micro-genetic algorithm for Emmerich M, Hosenberg R (2000) Tea: a toolbox for the design of multiobjective optimization. In: EMO, volume 1993 of lecture parallel evolutionary algorithms. Technical report, C??-tech- notes in computer science, pp 126–140. Springer nical report, CI-106/01 Collaborative Research Center (Sonder- Coello Coello CA (2013) Constraint-handling techniques used with forschungsbereich) DFG-SFB 531, Design and Management of evolutionary algorithms. In GECCO (Companion), pp 521–544. Complex Technical Processes and Systems by Means of ACM Computational Intelligence Methods, University of Dortmund Coello Coello CA, Corte ´s NC (2005) Solving multiobjective Emmerich M, Yang K, Deutz A, Wang H, Fonseca CM (2016) A optimization problems using an artificial immune system. Genet multicriteria generalization of bayesian global optimization. Program Evol Mach 6(2):163–190 Springer, Cham, pp 229–242 Coello Coello CA, Van Veldhuizen DA, Lamont GA (2007) Evolu- Emmerich MT, Giannakoglou KC, Naujoks B (2006) Single-and tionary algorithms for solving multi-objective problems, second multiobjective evolutionary optimization assisted by Gaussian edition. Springer Science & Business Media random field metamodels. IEEE Trans Evol Comput Custodio A, Emmerich M, Madeira J (2012) Recent developments in 10(4):421–439 derivative-free multiobjective optimization. Comput Technol Emmerich MTM, Fonseca CM (2011) Computing hypervolume Rev 5:1–30 contributions in low dimensions: asymptotically optimal algo- Custodio AL, Madeira JA, Vaz AIF, Vicente LN (2011) Direct rithm and complexity results. In: EMO, volume 6576 of lecture multisearch for multiobjective optimization. SIAM J Optim notes in computer science, pp 121–135. Springer 21(3):1109–1140 Fleming PJ, Purshouse RC, Lygoe RJ (2005) Many-objective da Fonseca VG, Fonseca CM, Hall AO (2001) Inferential perfor- optimization: an engineering design perspective. In: EMO, mance assessment of stochastic optimisers and the attainment volume 3410 of lecture notes in computer science, pp 14–32. function. In: EMO, volume 1993 of Lecture notes in computer Springer science, pp 213–225. Springer Fonseca CM, Fleming PJ (1993) Genetic algorithms for multiobjec- Das I, Dennis JE (1998) Normal-boundary intersection: a new method tive optimization: Formulation, discussion and generalization. for generating the Pareto surface in nonlinear multicriteria In: ICGA, pp 416–423. Morgan Kaufmann optimization problems. SIAM J Optim 8(3):631–657 Geiger MJ, Sevaux M (2011) The biobjective inventory routing Daskalakis C, Karp RM, Mossel E, Riesenfeld SJ, Verbin E (2011) problem–problem solution and decision support. In: Network Sorting and selection in posets. SIAM J Comput 40(3):597–622 optimization, pp 365–378. Springer Deb K (2001) Multi-objective optimization using evolutionary Giagkiozis I, Lygoe RJ, Fleming PJ (2013) Liger: an open source algorithms. John-Wiley, Chichester integrated optimization environment. In: Proceedings of the 15th Deb K (2008). Introduction to evolutionary multiobjective optimiza- annual conference companion on Genetic and evolutionary tion. In: Branke J, Deb K, Miettinen K, Słowin ´ ski R (eds) computation, pp 1089–1096. ACM Multiobjective optimization: interactive and evolutionary Giagkiozis I, Purshouse RC, Fleming PJ (2014) Generalized decom- approaches, lecture notes in computer science 5252, pp 59–96, position and cross entropy methods for many-objective opti- Berlin, Heidelberg. Springer mization. Inf Sci 282:363–387 Deb K, Argawal RB (1995) Simulated binary crossover for contin- Greco A, Ehrgott M, Figueira J (2016) Multiple criteria decision uous search space. Complex Syst 9(2):115–148 analysis: state of the art surveys, 2nd edn. Springer, Berlin Deb K, Jain H (2014) An evolutionary many-objective optimization Grimme C, Schmitt K (2006) Inside a predator-prey model for multi- algorithm using reference-point-based nondominated sorting objective optimization: a second study. In: GECCO, pp 707–714. approach, part I: solving problems with box constraints. IEEE ACM Trans Evolut Comput 18(4):577–601 Guerreiro AP, Fonseca CM (2017) Computing and updating hyper- Deb K, Pratap A, Agarwal S, Meyarivan T (2002) A fast and elitist volume contributions in up to four dimensions. Technical report, multiobjective genetic algorithm: NSGA-II. IEEE Trans Evol CISUC Technical Report TR-2017-001, University of Coimbra Comput 6(2):182–197 Hakanen J, Chugh T, Sindhya K, Jin Y, Miettinen K (2016) Dellnitz M, Schu ¨ tze O, Hestermeyer T (2005) Covering pareto sets by Connections of reference vectors and different types of prefer- multilevel subdivision techniques. J Optim Theory Appl ence information in interactive multiobjective evolutionary 124(1):113–136 algorithms. In: SSCI, pp 1–8. IEEE ´ ¨ Ehrgott M (2005) Multicriteria optimization. Springer, Berlin Hernandez V A S, Schutze O, Emmerich M (2014) Hypervolume maximization via set based Newton‘s method. In: EVOLVE-a Ehrgott M (2012) Vilfredo Pareto and multi-objective optimization. Optimization stories. Journal der Deutschen Mathematiker- bridge between probability, set oriented numerics, and evolu- Vereiningung, Extra 21:447–453 tionary computation V, pp 15–28. Springer Ehrgott M, Gandibleux X (2000) A survey and annotated bibliogra- Hillermeier C (2001) Nonlinear multiobjective optimization: a phy of multiobjective combinatorial optimization. OR Spectr generalized homotopy approach, vol 135. Springer, Berlin 22(4):425–460 Hopfe CJ, Emmerich MT, Marijt R, Hensen J (2012) Robust multi- Emmerich M, Beume N, Naujoks B (2005). An EMO algorithm using criteria design optimisation in building design. In: Proceedings the hypervolume measure as selection criterion. In: EMO, of building simulation and optimization, Loughborough, UK, pp 118–125 123 M. T. M. Emmerich , A. H. Deutz Huband S, Hingston P, While L, Barone L (2003) An evolution Mateo P, Alberto I (2012) A mutation operator based on a Pareto strategy with probabilistic mutation for multi-objective optimi- ranking for multi-objective evolutionary algorithms. J Heuristics sation. In: The 2003 congress on evolutionary computation, 18(1):53–89 2003. CEC‘03. volume 4, pp 2284–2291. IEEE Miettinen K (2012) Nonlinear multiobjective optimization, vol 12. Hupkens I, Emmerich M (2013) Logarithmic-time updates in SMS- Springer, Berlin EMOA and hypervolume-based archiving. In: EVOLVE—a Miettinen K, Ma ¨kela ¨ MM (2000) Interactive multiobjective opti- bridge between probability, set oriented numerics, and evolu- mization system WWW-NIMBUS on the internet. Comput OR tionary computation IV, volume 227 of advances in intelligent 27(7–8):709–723 systems and computing. Springer, Heidelberg Reyes-Sierra M, Coello Coello C (2006) Multi-objective particle Igel C, Suttorp T, Hansen N (2006) Steady-state selection and swarm optimizers: a survey of the state-of-the-art. Int J Comput efficient covariance matrix update in the multi-objective CMA- Intel Res 2(3):287–308 ES. In: EMO, volume 4403 of lecture notes in computer science, Riquelme N, Von Lu ¨ cken C, Bara ´n B (2015) Performance metrics in pp 171–185. Springer multi-objective optimization. In: 2015 XLI Latin American Ishibuchi H, Imada R, Setoguchi Y, Nojima Y (2017) Reference point computing conference. IEEE specification in hypervolume calculation for fair comparison and Robic T, Filipic B (2005) DEMO: differential evolution for multi- efficient search. In: Proceedings of the genetic and evolutionary objective optimization. In: EMO, volume 3410 of lecture notes computation conference, GECCO ‘17, pp 585–592, New York, in computer science, pp 520–533. Springer NY, USA. ACM Rosenthal S, Borschbach M (2017) Design perspectives of an Ishibuchi H, Murata T (1996) Multi-objective genetic local search evolutionary process for multi-objective molecular optimization. algorithm. In: Proceedings of IEEE international conference on In EMO, volume 10173 of lecture notes in computer science, evolutionary computation, 1996. pp 119–124. IEEE pp 529–544. Springer Jaszkiewicz A (2002) On the performance of multiple-objective Rudolph G, Agapie A (2000) Convergence properties of some multi- genetic local search on the 0/1 knapsack problem-a comparative objective evolutionary algorithms. In: Proceedings of the 2000 experiment. IEEE Trans Evol Comput 6(4):402–412 Congress on evolutionary computation, 2000, volume 2, Jaszkiewicz A, Słowinski R (1999) The ‘Light Beam Search’ pp 1010–1016. IEEE approach-an overview of methodology applications. Eur J Oper Rudolph G, Schu ¨ tze O, Grimme C, Domınguez-Medina C, Trautmann Res 113(2):300–314 H (2016) Optimal averaged Hausdorff archives for bi-objective Jin Y, Okabe T, Sendho B (2001) Adapting weighted aggregation for problems: theoretical and numerical results. Comp Opt Appl multiobjective evolution strategies. In: Evolutionary multi- 64(2):589–618 criterion optimization, pp 96–110. Springer Rueffler C et al (2006) Traits traded off. techreport, Institute of Kerschke P, Wang H, Preuss M, Grimme C, Deutz A H, Trautmann Biology Leiden, Theoretical Biology; Faculty of Mathematics H, Emmerich M (2016) Towards analyzing multimodality of and Natural Sciences; Leiden University continuous multiobjective landscapes. In: PPSN, volume 9921 of Schaffer JD (1985) Multiple objective optimization with vector lecture notes in computer science, pp 962–972. Springer evaluated genetic algorithm. In: Proceeding of the first interna- Knowles J (2006) Parego: a hybrid algorithm with on-line landscape tional conference of genetic algorithms and their application, approximation for expensive multiobjective optimization prob- pp 93–100 lems. IEEE Trans Evol Comput 10(1):50–66 Schu ¨ tze O, Dell’Aere A, Dellnitz M (2005) On continuation methods Knowles J, Corne D, Deb K (2007) Multiobjective problem solving for the numerical treatment of multi-objective optimization from nature. Springer, Berlin problems. In: Dagstuhl Seminar Proceedings. Schloss Dagstuhl- Knowles JD, Corne DW (2000) Approximating the nondominated Leibniz-Zentrum fu ¨ r Informatik front using the Pareto archived evolution strategy. Evol Comput Schu ¨ tze O, Martı ´n A, Lara A, Alvarado S, Salinas E, Coello CAC 8(2):149–172 (2016) The directed search method for multi-objective memetic Knowles JD, Corne DW, Fleischer M (2003) Bounded archiving algorithms. Comp Opt Appl 63(2):305–332 using the Lebesgue measure. In: The 2003 congress on Sterck F, Markesteijn L, Schieving F, Poorter L (2011) Functional evolutionary computation, 2003. CEC‘03. volume 4, traits determine trade-offs and niches in a tropical forest pp 2490–2497. IEEE community. Proc Nat Acad Sci 108(51):20627–20632 Krantz S, Parks H (2003) Implicit function theorem: history, theory, Trautmann H, Wagner T, Brockhoff D (2013) R2-EMOA: focused and applications. Springer, New York multiobjective search using R2-Indicator-Based selection. In Kuhn HW, Tucker AW (1951) Nonlinear programming. In: Proceed- LION, volume 7997 of Lecture Notes in Computer Science, ings of 2nd Berkeley symposium. Berkeley, Berkeley and Los pages 70–74. Springer Angeles. University of California Press, pp 481–492 van der Horst E, Marques-Gallego P, Mulder-Krieger T, van Kung H-T, Luccio F, Preparata FP (1975) On finding the maxima of a Veldhoven J, Kruisselbrink J, Aleman A, Emmerich MT, set of vectors. J ACM (JACM) 22(4):469–476 Brussee J, Bender A, IJzerman AP (2012) Multi-objective Kursawe F (1990) A variant of evolution strategies for vector evolutionary design of adenosine receptor ligands. J Chem Inf optimization. In: PPSN, volume 496 of lecture notes in computer Model 52(7):1713–1721 science, pp 193–197. Springer Wagner M, Bringmann K, Friedrich T, Neumann F (2015) Efficient Laumanns M, Rudolph G, Schwefel H (1998) A spatial predator-prey optimization of many objectives by approximation-guided evolution. Eur J Oper Res 243(2):465–479 approach to multi-objective optimization: a preliminary study. In: PPSN, volume 1498 of lecture notes in computer science, Wagner T, Trautmann H, Naujoks B (2009) OCD: Online conver- pp 241–249. Springer gence detection for evolutionary multi-objective algorithms Li B, Li J, Tang K, Yao X (2015) Many-objective evolutionary based on statistical testing. In Ehrgott, M., Fonseca, C. M., algorithms: a survey. ACM Comput Surv 48(1):13:1–13:35 Gandibleux, X., Hao, J.-K., and Sevaux, M., editors, Evolution- Li L, Yevseyeva I, Fernandes V B, Trautmann H, Jing N, Emmerich ary Multi-Criterion Optimization: 5th International Conference, M (2017) Building and using an ontology of preference-based EMO 2009, Nantes, France, April 7-10, 2009. Proceedings, multiobjective evolutionary algorithms. In EMO, volume 10173 pages 198–215. Springer Berlin Heidelberg, Berlin, Heidelberg of lecture notes in computer science, pp 406–421. Springer 123 A tutorial on multiobjective optimization: fundamentals and evolutionary methods Wang H, Deutz A H, Back T, Emmerich M (2017) Hypervolume Zilinskas A (2013) On the worst-case optimal multi-objective global indicator gradient ascent multi-objective optimization. In EMO, optimization. Optimization Letters 7(8):1921–1928 volume 10173 of Lecture Notes in Computer Science, pages Zitzler E, Kunzli S (2004) Indicator-based selection in multiobjective 654–669. Springer search. In PPSN, volume 3242 of Lecture Notes in Computer Wang P, Emmerich M, Li R, Tang K, Ba ¨ck T, Yao X (2015) Convex Science, pages 832–842. Springer hull-based multiobjective genetic programming for maximizing Zitzler E, Laumanns M, Bleuler S (2004) A tutorial on evolutionary receiver operating characteristic performance. IEEE Trans Evol multiobjective optimization. Metaheuristics for multiobjective Comput 19(2):188–200 optimisation, pages 3–37 Yevseyeva I, Basto-Fernandes V, Ruano-Orda ´sD, Me ´ndez JR (2013) Zitzler E, Laumanns M, Thiele L (2001) SPEA2: Improving the Optimising anti-spam filters with evolutionary algorithms. strength pareto evolutionary algorithm. TIK – Report 103, Expert Syst Appl 40(10):4010–4021 Eidgeno ¨ ssische Technische Hochschule Zu ¨ rich (ETH), Institut Yevseyeva I, Guerreiro A P, Emmerich M T M, Fonseca C M (2014) fu ¨ r Technische Informatik und Kommunikationsnetze (TIK) A portfolio optimization approach to selection in multiobjective Zitzler E, Thiele L (1999) Multiobjective evolutionary algorithms: a evolutionary algorithms. In PPSN, volume 8672 of Lecture comparative case study and the strength Pareto approach. IEEE Notes in Computer Science, pages 672–681. Springer Trans. Evolutionary Computation 3(4):257–271 Zhang Q, Li H (2007) MOEA/D: A multiobjective evolutionary Zitzler E, Thiele L, Laumanns M, Fonseca CM, Da Fonseca VG algorithm based on decomposition. IEEE Trans Evol Comput (2003) Performance assessment of multiobjective optimizers: An 11(6):712–731 analysis and review. IEEE Trans Evol Comput 7(2):117–132

Journal

Natural ComputingSpringer Journals

Published: May 31, 2018

References

You’re reading a free preview. Subscribe to read the entire article.


DeepDyve is your
personal research library

It’s your single place to instantly
discover and read the research
that matters to you.

Enjoy affordable access to
over 18 million articles from more than
15,000 peer-reviewed journals.

All for just $49/month

Explore the DeepDyve Library

Search

Query the DeepDyve database, plus search all of PubMed and Google Scholar seamlessly

Organize

Save any article or search result from DeepDyve, PubMed, and Google Scholar... all in one place.

Access

Get unlimited, online access to over 18 million full-text articles from more than 15,000 scientific journals.

Your journals are on DeepDyve

Read from thousands of the leading scholarly journals from SpringerNature, Elsevier, Wiley-Blackwell, Oxford University Press and more.

All the latest content is available, no embargo periods.

See the journals in your area

DeepDyve

Freelancer

DeepDyve

Pro

Price

FREE

$49/month
$360/year

Save searches from
Google Scholar,
PubMed

Create lists to
organize your research

Export lists, citations

Read DeepDyve articles

Abstract access only

Unlimited access to over
18 million full-text articles

Print

20 pages / month

PDF Discount

20% off