Abstract

Quadratically constrained quadratic programming (QCQP) appears widely in engineering applications such as wireless communications, networking and multiuser detection, with examples like the MAXCUT problem and boolean optimization. A general QCQP problem is NP-hard. We propose a penalty formulation for the QCQP problem based on semidefinite relaxation. Under suitable assumptions we show that the optimal solutions of the penalty problem are the same as those of the original QCQP problem if the penalty parameter is sufficiently large. Then, to solve the penalty problem, we present a proximal point algorithm and an update rule for the penalty parameter. Numerically, we test our algorithm on two well-studied QCQP problems. The results show that our proposed algorithm is very effective in finding high-quality solutions.

1. Introduction

In this paper we consider the following quadratically constrained quadratic programming (QCQP) problem: $$\begin{equation} \begin{array}{cl} \min\limits_{x\in \mathbb{R}^n} & x^{\textrm T}Q_0x+2g_0^{\textrm T}x\\ \text{s.t.} & x^{\textrm T}Q_ix+2g_i^{\textrm T}x+c_i= 0,\quad i=1,\ldots,m_e,\\[2pt] & x^{\textrm T}Q_ix+2g_i^{\textrm T}x+c_i\geqslant 0,\quad i=m_e+1,\ldots,m, \end{array} \end{equation}$$(1.1) where $m$ is the number of constraints, $Q_0\in \mathbb S^{n\times n}$ and $Q_i \in \mathbb S^{n\times n}$ $(i=1,\ldots ,m)$ are real constant $n\times n$ symmetric matrices, $g_0\in \mathbb R^n$ and $g_i \in \mathbb R^n$ $(i=1,\ldots ,m)$ are $n$-dimensional real vectors and $c_i \in \mathbb R$ $(i=1,\ldots ,m)$ are real scalars.

Concerning (1.1), Vavasis (1990) proved that the general QCQP is NP-hard. Bar-On & Grasse (1994) derived optimality conditions for QCQP. Peng & Yuan (1997) further analyzed these conditions when the number of constraints is two. Wang & Xia (2015) focused on the case in which the quadratic terms are uniform. Some special instances possess hidden convexity (see Ben-Tal & Teboulle, 1996; Ye & Zhang, 2003; Beck & Eldar, 2006; Wu et al., 2007; Ai & Zhang, 2009).

Intuitively, the difficulty of solving (1.1) lies not only in the nonconvexity of the objective but also in that of the constraints. The feasible region may be nonconvex or even disconnected in some problems. For example, the boolean quadratic programming (BQP) problem is given by Luo et al. (2010): $$\begin{equation} \begin{array}{cl} \min\limits_{x\in \mathbb{R}^n} & x^{\textrm T}Q_0x\\ \text{s.t.} & x_i^2=1,\quad i=1,\ldots,n. \end{array} \end{equation}$$(1.2) The constraint $x_i^2=1$ implies that $x_i\in \{1,-1\}$. In this case the feasible region consists of $2^n$ isolated points. On the other hand, the problem of finding a feasible point of (1.1) may itself be NP-hard. An example of the latter is the sensor location problem: with $V_s=\{1,\ldots ,n\}$ and $V_a=\{n+1,\ldots ,n+m\}$ denoting the positions of sensors and anchors, respectively, $E_{ss}$ representing the edges between pairs of sensors and $E_{sa}$ representing the edges between sensors and anchors, let the positions of all the anchors and the Euclidean distances along the edges in $E_{ss}$ and $E_{sa}$ be specified; the aim is then to find the positions of all the sensors.
The sensor location problem can be written as $$\begin{equation} \begin{array}{cl} \textrm{find} & x_1,\ldots,x_n\in\mathbb R^2\\[2pt] \text{s.t.} & \|x_i-x_k\|^2=d_{ik}^2,\quad (i,k)\in E_{ss},\\[2pt] & \|a_i-x_k\|^2=\bar d_{ik}^2,\quad (i,k)\in E_{sa}, \end{array} \end{equation}$$(1.3) that is, a problem of finding a feasible point with quadratic constraints, which has been shown to be NP-hard (Saxe, 1979).

1.1 Related works

Algorithms for solving QCQP can in general be divided into two categories, namely, exact and approximate. In this paper, since we are interested in methods that can quickly find approximate solutions, we do not discuss exact methods. Readers can find studies of exact methods, such as various branch and bound methods (Bar-On & Grasse, 1994; Dinh & Le Thi, 1998; Raber, 1998; Cartis et al., 2015) and Lasserre's method (Lasserre, 2001).

To overcome the nonconvexity of the feasible region of QCQP, relaxation techniques have been used in the literature. The reformulation-linearization technique (RLT) (Sherali, 2007) is designed for QCQP with bounded variables. Without loss of generality we assume that the lower bound of each component of the variable is 0 and the upper bound is 1. The key point of RLT is to add a matrix variable $X\in \mathbb S^{n\times n}$ to represent $xx^{\textrm T}$. Using the fact that $x\geqslant 0$ and $e-x\geqslant 0$ we have $$\begin{equation*} \begin{array}{ll} X_{ij}\geqslant 0, & i,j=1,\ldots,n,\\[2pt] X_{ij}-x_j-x_i\geqslant -1, & i,j=1,\ldots,n,\\[2pt] X_{ij}-x_i\leqslant 0, & i,j=1,\ldots,n. \end{array} \end{equation*}$$ RLT transforms all the quadratic constraints into linear ones and adds all the above constraints. Then one can obtain a relaxed problem as follows: $$\begin{equation*} \begin{array}{cl} \min\limits_{x\in \mathbb{R}^n,X\in\mathbb S^n} & Q_0\cdot X+2g_0^{\textrm T}x\\[2pt] \text{s.t.} & Q_i\cdot X+2g_i^{\textrm T}x+c_i= 0,\quad i=1,\ldots,m_e,\\[2pt] & Q_i\cdot X+2g_i^{\textrm T}x+c_i\geqslant 0,\quad i=m_e+1,\ldots,m,\\[2pt] & X\geqslant 0,\\[2pt] & ee^{\textrm T}-ex^{\textrm T}-xe^{\textrm T}+X\geqslant 0,\\[2pt] & xe^{\textrm T}-X\geqslant 0, \end{array} \end{equation*}$$ where '$\cdot$' is the Frobenius inner product and $e$ is the vector with all elements equal to 1. This relaxed problem is a linear programming problem, which can be solved in various ways.

Positive semidefinite relaxation (SDR) (Fujie & Kojima, 1997) uses the same lifting to relax QCQP to a convex problem in matrix space. More specifically, SDR relaxes the constraint $X=xx^{\textrm T}$ to $X-xx^{\textrm T}\succeq 0$, where '$\succeq 0$' means positive semidefinite. Then QCQP is relaxed to a semidefinite programming problem (SDP) $$\begin{equation} \begin{array}{cl} \min\limits_{x\in \mathbb{R}^n,X\in\mathbb S^n} & Q_0\cdot X+2g_0^{\textrm T}x\\[2pt] \text{s.t.} & Q_i\cdot X+2g_i^{\textrm T}x+c_i=0,\quad i=1,\ldots,m_e,\\[2pt] & Q_i\cdot X+2g_i^{\textrm T}x+c_i\geqslant 0,\quad i=m_e+1,\ldots,m,\\[2pt] & X\succeq xx^{\textrm T}, \end{array} \end{equation}$$(1.4) where $X\succeq xx^{\textrm T}$ can be regarded as $$\left ( \begin{smallmatrix} 1 & x^{\textrm T} \\[2pt] x & X \end{smallmatrix}\right )\succeq 0. $$ SDR has been widely used in practice (Luo et al., 2010). For some special QCQP the approximation ratio, a standard measure of the quality of the approximation, has been analyzed (Ma et al., 2002; Steingrimsson et al., 2003).
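For concreteness, the relaxation (1.4) is a standard SDP and can be stated directly in a modelling language. The following Python sketch (using cvxpy; the function name and data layout are our own conventions, not from the paper) lifts $x$ and $X$ into the bordered matrix $\left(\begin{smallmatrix}1 & x^{\textrm T}\\ x & X\end{smallmatrix}\right)\succeq 0$:

```python
import cvxpy as cp
import numpy as np

def solve_sdr(Q, g, c, m_e):
    """Solve the semidefinite relaxation (1.4).

    Q, g, c: lists indexed 0 for the objective and 1..m for the
    constraints; the first m_e constraints are equalities. A minimal
    sketch, not the authors' implementation.
    """
    n = g[0].shape[0]
    W = cp.Variable((n + 1, n + 1), symmetric=True)  # [[1, x^T], [x, X]]
    x, X = W[1:, 0], W[1:, 1:]
    cons = [W >> 0, W[0, 0] == 1]
    for i in range(1, len(Q)):
        expr = cp.trace(Q[i] @ X) + 2 * g[i] @ x + c[i]
        cons.append(expr == 0 if i <= m_e else expr >= 0)
    prob = cp.Problem(cp.Minimize(cp.trace(Q[0] @ X) + 2 * g[0] @ x), cons)
    prob.solve()
    return x.value, X.value, prob.value
```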
In particular, for BQP in (1.2), when $Q_0\geqslant 0$ the ratio is 1, which means that the solution of the relaxed problem is that of the original problem (Luo et al., 2010). More extended SDR-type relaxations are discussed by Bao et al. (2011). A comparison between RLT and SDR can be found in Anstreicher (2009). One can also find more studies on approximation methods in Beck et al. (2010) and Lu et al. (2011).

1.2 Our contribution

In this paper we take a different angle to interpret SDR. To be more precise, we regard SDR as a penalty function method with the penalty term being zero. By showing that the penalty function is exact for sufficiently large penalty parameters, we argue that it is reasonable to study approaches based on the penalty formulation with a nonzero penalty term. Accordingly, we propose an algorithm to solve the penalty problem, together with an update rule for the penalty parameter. Numerical tests are presented to demonstrate that the proposed algorithm achieves our main target of improving the quality of solutions, in comparison with SDR, at the expense of possibly longer computational time. Moreover, in a majority of the test cases, as quantified in the reported data, our algorithm is able to find the global solutions of the prescribed problems.

1.3 Organization

This paper is organized as follows: in Section 2 we propose our new penalty problem and prove that the penalty is exact. In Section 3 we give our algorithm to solve the penalty problem. In Section 4 we use numerical tests to illustrate the performance of our proposed algorithm. Finally, Section 5 contains a summary of the main contributions of the paper and a perspective on some outstanding challenges.

2. Penalty problem

The QCQP problem (1.1) can be transformed into the following equivalent problem: $$\begin{equation} \begin{array}{cl} \min\limits_{x\in \mathbb{R}^n} & \begin{pmatrix} 1 & x^{\textrm T} \\ x & xx^{\textrm T} \\ \end{pmatrix}\cdot \begin{pmatrix} 0 & g_0^{\textrm T} \\ g_0 & Q_0 \\ \end{pmatrix}\\[10pt] \text{s.t.} & \begin{pmatrix} 1 & x^{\textrm T} \\ x & xx^{\textrm T} \\ \end{pmatrix}\cdot \begin{pmatrix} c_i & g_i^{\textrm T} \\ g_i & Q_i \\ \end{pmatrix}= 0,\quad i=1,\ldots,m_e,\\[10pt] & \begin{pmatrix} 1 & x^{\textrm T} \\ x & xx^{\textrm T} \\ \end{pmatrix}\cdot \begin{pmatrix} c_i & g_i^{\textrm T} \\ g_i & Q_i \\ \end{pmatrix}\geqslant 0,\quad i=m_e+1,\ldots,m. \end{array} \end{equation}$$(2.1) We can add a matrix variable $Z$ to the above problem with an additional constraint $Z=0$; then the QCQP problem (1.1) is equivalent to $$\begin{equation*} \begin{array}{cl} \min\limits_{x\in \mathbb{R}^n, Z\in \mathbb S^n} & \begin{pmatrix} 1 & x^{\textrm T} \\ x & xx^{\textrm T}+Z \\ \end{pmatrix}\cdot \begin{pmatrix} 0 & g_0^{\textrm T} \\ g_0 & Q_0 \\ \end{pmatrix}\\[10pt] \text{s.t.} & \begin{pmatrix} 1 & x^{\textrm T} \\ x & xx^{\textrm T}+Z \\ \end{pmatrix}\cdot \begin{pmatrix} c_i & g_i^{\textrm T} \\ g_i & Q_i \\ \end{pmatrix}= 0,\quad i=1,\ldots,m_e,\\[10pt] & \begin{pmatrix} 1 & x^{\textrm T} \\ x & xx^{\textrm T}+Z \\ \end{pmatrix}\cdot \begin{pmatrix} c_i & g_i^{\textrm T} \\ g_i & Q_i \\ \end{pmatrix}\geqslant 0,\quad i=m_e+1,\ldots,m,\\[10pt] & Z=0. \end{array} \end{equation*}$$ Here, $Z$ can be regarded as $X-xx^{\textrm T}$ in Section 1.
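The key fact behind this lifting, formalized in Lemma 2.1 below, can be sanity-checked numerically. The following small numpy sketch uses a random instance of our own choosing:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5
x = rng.standard_normal(n)
B = rng.standard_normal((n, n))
Z = B @ B.T  # a positive semidefinite Z

# Bordered matrix [[1, x^T], [x, x x^T + Z]] as in (2.1)/(2.2)
M = np.block([[np.ones((1, 1)), x[None, :]],
              [x[:, None], np.outer(x, x) + Z]])

# Both smallest eigenvalues should be (numerically) nonnegative at the
# same time, as Lemma 2.1 asserts via Sylvester's law of inertia.
print("min eig of Z:", np.linalg.eigvalsh(Z).min())
print("min eig of bordered matrix:", np.linalg.eigvalsh(M).min())
```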
Notice that the constraint $Z=0$ can be reformulated as two constraints, $Z\succeq 0$ and $Z\preceq 0$; then the above problem is equivalent to $$\begin{equation} \begin{array}{cl} \min\limits_{x\in \mathbb{R}^n, Z\in \mathbb S^n} & \begin{pmatrix} 1 & x^{\textrm T} \\ x & xx^{\textrm T}+Z \\ \end{pmatrix}\cdot \begin{pmatrix} 0 & g_0^{\textrm T} \\ g_0 & Q_0 \\ \end{pmatrix}\\[10pt] \text{s.t.} & \begin{pmatrix} 1 & x^{\textrm T} \\ x & xx^{\textrm T}+Z \\ \end{pmatrix}\cdot \begin{pmatrix} c_i & g_i^{\textrm T} \\ g_i & Q_i \\ \end{pmatrix}= 0,\quad i=1,\ldots,m_e,\\[10pt] & \begin{pmatrix} 1 & x^{\textrm T} \\ x & xx^{\textrm T}+Z \\ \end{pmatrix}\cdot \begin{pmatrix} c_i & g_i^{\textrm T} \\ g_i & Q_i \\ \end{pmatrix}\geqslant 0,\quad i=m_e+1,\ldots,m,\\[10pt] & Z\succeq 0, \\[5pt] &Z\preceq 0. \end{array} \end{equation}$$(2.2) The positive semidefiniteness of $Z$ and of $$\left(\begin{smallmatrix} 1 & x^{\textrm T} \\[2pt] x & xx^{\textrm T}+Z \end{smallmatrix}\right)$$ are equivalent, as shown in the following lemma.

Lemma 2.1 For any real vector $x\in \mathbb R^n$ and real symmetric matrix $Z\in \mathbb S^n$, $Z\succeq 0$ is equivalent to $\left(\begin{smallmatrix} 1 & x^{\textrm T} \\[1pt] x & xx^{\textrm T}+Z \end{smallmatrix}\right)\succeq 0$.

Proof. The matrix $\left(\begin{smallmatrix} 1 & x^{\textrm T} \\[1pt] x & xx^{\textrm T}+Z \end{smallmatrix}\right)$ becomes $\left (\begin{smallmatrix} 1 & 0 \\[2pt] 0 & Z \end{smallmatrix} \right )$ under the congruence transformation with $\left(\begin{smallmatrix} 1 & 0 \\[2pt] -x & I \end{smallmatrix}\right)$. Then by Sylvester's law of inertia we have that $\left(\begin{smallmatrix} 1 & x^{\textrm T} \\[1pt] x & xx^{\textrm T}+Z\end{smallmatrix}\right)\succeq 0$ if and only if $Z\succeq 0$.

Now using Lemma 2.1, SDR (1.4) can be viewed as a relaxation of (2.2) obtained by dropping the constraint $Z\preceq 0$. We would like to add a penalty term $P\cdot Z$ to the objective function. Here $P$ is a positive definite matrix to be properly chosen so that the effect of $P\cdot Z$ is to control the sizes of the positive eigenvalues of $Z$, if any. For more discussion on the choices of $P$ we refer to Section 3. Thus, the new penalty formulation of the original problem is given as $$\begin{equation} \begin{array}{cl} \min\limits_{x\in \mathbb{R}^n, Z\in \mathbb S^n} & \begin{pmatrix} 1 & x^{\textrm T} \\ x & xx^{\textrm T}+Z \\ \end{pmatrix}\cdot \begin{pmatrix} 0 & g_0^{\textrm T} \\ g_0 & Q_0 \\ \end{pmatrix} +P\cdot Z\\ \\ \text{s.t.} & \begin{pmatrix} 1 & x^{\textrm T} \\ x & xx^{\textrm T}+Z \\ \end{pmatrix}\cdot \begin{pmatrix} c_i & g_i^{\textrm T} \\ g_i & Q_i \\ \end{pmatrix}= 0,\quad i=1,\ldots,m_e,\\ \\ & \begin{pmatrix} 1 & x^{\textrm T} \\ x & xx^{\textrm T}+Z \\ \end{pmatrix}\cdot \begin{pmatrix} c_i & g_i^{\textrm T} \\ g_i & Q_i \\ \end{pmatrix}\geqslant 0,\quad i=m_e+1,\ldots,m,\\ \\ & Z\succeq 0. \end{array} \end{equation}$$(2.3) In the special case where $P=0$, problem (2.3) reduces to SDR.

2.1 Notation

Before we analyze the properties of the penalty problem (2.3) we first define some notation to simplify the description of the problem.
Let $$\begin{align*} f(x):=&\ x^{\textrm T}Q_0x+2g_0^{\textrm T}x,\\ F:=&\ \Bigg\{x\ \Bigg| \begin{pmatrix} 1 & x^{\textrm T} \\ x & xx^{\textrm T} \\ \end{pmatrix}\cdot \begin{pmatrix} c_i & g_i^{\textrm T} \\ g_i & Q_i \\ \end{pmatrix}= 0,\quad i=1,\ldots,m_e,\notag\\ &\qquad \begin{pmatrix} 1 & x^{\textrm T} \\ x & xx^{\textrm T} \\ \end{pmatrix}\cdot \begin{pmatrix} c_i & g_i^{\textrm T} \\ g_i & Q_i \\ \end{pmatrix}\geqslant 0,\quad i=m_e+1,\ldots,m\Bigg\},\\ G:=&\ \Bigg\{(x,Z)\ \Bigg|\ Z\succeq 0,\begin{pmatrix} 1 & x^{\textrm T} \\ x & xx^{\textrm T}+Z \\ \end{pmatrix}\cdot \begin{pmatrix} c_i & g_i^{\textrm T} \\ g_i & Q_i \\ \end{pmatrix}= 0,\quad i=1,\ldots,m_e,\notag\\ &\hskip65pt\begin{pmatrix} 1 & x^{\textrm T} \\ x & xx^{\textrm T}+Z \\ \end{pmatrix}\cdot \begin{pmatrix} c_i & g_i^{\textrm T} \\ g_i & Q_i \\ \end{pmatrix}\geqslant 0,\quad i=m_e+1,\ldots,m\Bigg\},\\ \mathcal L\,(x,Z,P):=&\ f(x)+Q_0\cdot Z+P\cdot Z. \end{align*}$$ Then, $F$ and $G$ are related by $$\begin{equation} F=\{x\ |\ (x,0)\in G\}. \end{equation}$$(2.4) Using the above notation we rewrite the QCQP problem (1.1) as $$\begin{equation} \begin{array}{cl} \min\limits_x & f(x)\\ \text{s.t.}& x\in F. \end{array} \end{equation}$$(2.5) We also rewrite the penalty problem (2.3) as $$\begin{equation} \min\limits_{(x,Z)\in G} \mathcal L\,(x,Z,P). \end{equation}$$(2.6) One can view $\mathcal L$ as the Lagrangian function corresponding to the constraint $Z\preceq 0$, with $P$ being the dual variable.

2.2 Exact penalty

In this subsection we prove that the penalty problem (2.6) is exact, which means that its optimal solution is also an optimal solution of the QCQP problem (2.5) when all the eigenvalues of $P$ are larger than a certain constant. These results can be found in Theorems 2.4 and 2.5. First, we give the following lemma on the relationship between problem (2.6) and $P$, which reveals a monotone dependence.

Lemma 2.2 $\min\limits _{(x,Z)\in G}\mathcal L\,(x,Z,P_1)\leqslant \min \limits _{(x,Z)\in G}\mathcal L\,(x,Z,P_2)\leqslant \min \limits _{x\in F}\,f(x)$ for all $P_1\preceq P_2$.

Proof. Due to (2.4), for any $P$ we have $$\begin{equation*} \min\limits_{(x,Z)\in G}\mathcal L\,(x,Z,P)\leqslant \min\limits_{(x,0)\in G}\mathcal L\,(x,0,P)=\min\limits_{x\in F}\,f(x). \end{equation*}$$ For any $P_1\preceq P_2$ we have $P_2-P_1\succeq 0$, which implies $(P_2-P_1)\cdot Z\geqslant 0$ for all $Z\succeq 0$. Consequently, $P_2\cdot Z\geqslant P_1\cdot Z$. Thus we have $$\begin{equation*}\mathcal L\,(x,Z,P_1)\leqslant \mathcal L\,(x,Z,P_2) \quad \forall\, \; Z\succeq 0. \end{equation*}$$ Then $$\begin{equation*}\min\limits_{(x,Z)\in G}\mathcal L\,(x,Z,P_1)\leqslant\min\limits_{(x,Z)\in G}\mathcal L\,(x,Z,P_2),\end{equation*}$$ which completes our proof.

Suppose that $x^\star$ is a local minimizer of problem (2.5). For brevity we denote each of the constraints of problem (2.5) by $$\begin{align} &c_i(x):=x^{\textrm T}Q_ix+2g_i^{\textrm T}x+c_i= 0, \quad i=1,\ldots,m_e,\notag\\ &c_i(x):=x^{\textrm T}Q_ix+2g_i^{\textrm T}x+c_i\geqslant 0, \quad i=m_e+1,\ldots,m.\notag \end{align}$$ In the following we demonstrate the exact penalty via the Karush–Kuhn–Tucker (KKT) conditions. In order for a minimum point to satisfy the KKT conditions the problem should satisfy some regularity constraint qualifications. Peterson (1973) reviewed various constraint qualifications and stated their relationships.
We choose the Abadie constraint qualification (ACQ) described below for our use (Abadie, 1966), which is considered rather mild. Let the set of sequential feasible directions be defined by (see Sun & Yuan, 2006) $$\begin{equation*} \textrm{SFD}(x)=\left\{d\ |\ \exists\{d^k\}\rightarrow d,\{t_k\}\downarrow 0:x+t_kd^k\in F \ \forall\, k\right\} \end{equation*}$$ and the set of linearized feasible directions be defined by $$\begin{align} \textrm{LFD}(x)=\Big\{d\ | &\nabla c_i(x)^{\textrm T}d=0,\ i=1,\ldots,m_e,\notag\\ &\nabla c_j(x)^{\textrm T}d\geqslant 0,\ c_j(x)=0, \ j=m_e+1,\ldots,m \Big\}.\notag \end{align}$$ The following assumption ensures that a minimum point satisfies the KKT conditions.

Assumption 2.3 (ACQ) At the local minimizer $x^\star$ of problem (2.5), $$\begin{equation*}\textrm{SFD}(x^\star)=\textrm{LFD}(x^\star).\end{equation*}$$

We then state our theorems.

Theorem 2.4 Suppose that $x^\star$ is a local minimizer of the QCQP problem (2.5) and satisfies Assumption 2.3 and the second-order sufficient condition, which means that for any nonzero direction $d\in C:=\textrm{SFD}(x^\star )$, $$\begin{equation} d^{\textrm T}\left(Q_0-\sum_{i=1}^m \alpha_i^\star Q_i\right)d>0, \end{equation}$$(2.7) where $\alpha ^\star$ is the multiplier corresponding to $x^\star$. Then there exists $P^\star \succeq 0$ such that $(x^\star , 0)$ is a strict local minimizer of (2.6) for all $P\succ P^\star$.

Proof. Since $x^\star$ is the local minimizer of problem (2.5), under Assumption 2.3, $(x^\star , \alpha ^\star )$ satisfies the following first-order necessary conditions: $$\begin{equation*} \begin{array}{l} \nabla f(x^\star)-\sum_{i=1}^m \alpha_i^\star\nabla c_i(x^\star)=0,\\[3pt] c_i(x^\star)= 0,\quad i=1,\ldots,m_e,\\[3pt] c_i(x^\star)\geqslant 0,\quad \alpha_i^\star\geqslant 0,\quad i=m_e+1,\ldots,m,\\[3pt] \sum_{i=m_e+1}^m \alpha_i^\star c_i(x^\star)=0. \end{array} \end{equation*}$$ Let $M=Q_0-\sum _{i=1}^m \alpha _i^\star Q_i$. The second-order sufficient condition (2.7) implies that there is a constant $\delta>0$ such that the following inequality holds for all $d\in C$: $$\begin{equation} d^{\textrm T}Md\geqslant \delta\|d\|^2. \end{equation}$$(2.8) Otherwise, there exists a sequence $\{d_k\}\subset C$ such that $d^{\textrm T}_kMd_k/\|d_k\|^2\rightarrow 0$. Let $\hat d_k=d_k/\|d_k\|$; then $\hat d^{\textrm T}_kM\hat d_k\rightarrow 0$ and $\|\hat d_k\|=1$. Since $C$ is closed and the unit sphere is compact, $\{\hat d_k\}$ has a cluster point $\bar d\in C$ such that $\|\bar d\|=1$ and $\bar d^{\textrm T}M\bar d=0$. This is in contradiction with (2.7). With regard to the QCQP in matrix form (2.2), its primal and dual solution $(x,Z,\alpha ,S_1,S)$ at $x=x^\star , Z=Z^\star =0$ satisfies $$\begin{equation*} Q_0-\sum \alpha_i^\star Q_i-S_1^\star+S^\star=0,\quad S_1^\star\succeq 0,\quad S^\star\succeq 0, \end{equation*}$$ where $S_1^\star$ is the multiplier corresponding to $Z\succeq 0$ and $S^\star$ is the multiplier corresponding to $Z\preceq 0$. We consider the following problem: $$\begin{equation} \begin{array}{cl} \min\limits_{x,Z} & f(x)+Q_0\cdot Z+S^\star \cdot Z+\frac{\sigma_1}{2}\|Z\|_F^2\\[3pt] \text{s.t.} & c_i(x)+Q_i\cdot Z= 0,\quad i=1,\ldots,m_e,\\[3pt] & c_i(x)+Q_i\cdot Z\geqslant 0,\quad i=m_e+1,\ldots,m,\\[3pt] & Z\succeq 0. \end{array} \end{equation}$$(2.9) Here we introduce problem (2.9) as an intermediate tool for this proof. We will prove that $(x^\star , 0)$ is its strict local minimizer once a sufficiently large value is assigned to the coefficient of the quadratic term.
Then we will prove that $(x^\star , 0)$ is a strict local minimizer of problem (2.6) by using the fact that, in a neighbourhood of 0, the squared Frobenius norm is dominated by the linear penalty term. First we need to prove that when $\sigma _1$ is greater than some constant, $(x^\star , 0)$ is a strict local minimizer of problem (2.9). The Lagrangian function of problem (2.9) is $$\begin{equation*} \bar{\mathcal L}\,(x,Z,\alpha,S_1):=f(x)+Q_0\cdot Z+S^\star\cdot Z+\frac{\sigma_1}{2}\|Z\|_F^2-\sum_{i=1}^m \alpha_i(c_i(x)+Q_i\cdot Z)-S_1\cdot Z. \end{equation*}$$ Moreover, its KKT conditions are given by $$\begin{equation*} \begin{array}{l} \nabla f(x)-\sum_{i=1}^m \alpha_i\nabla c_i(x)=0,\\[3pt] Q_0+S^\star +\sigma_1Z-\sum_{i=1}^m \alpha_iQ_i-S_1=0,\\[3pt] c_i(x)+Q_i\cdot Z= 0,\quad i=1,\ldots,m_e,\\[3pt] c_i(x)+Q_i\cdot Z\geqslant 0,\quad \alpha_i\geqslant 0,\quad i=m_e+1,\ldots, m,\\[3pt] \sum_{i=m_e+1}^m \alpha_i(c_i(x)+Q_i\cdot Z)=0,\\[3pt] Z\succeq 0,\quad S_1\succeq 0,\quad Z\cdot S_1=0. \end{array} \end{equation*}$$ If we set $x=x^\star$, $Z=0$, $\alpha =\alpha ^\star$, $S_1=S_1^\star$, the above conditions are satisfied. The second derivatives of $\bar{\mathcal L}\, (x, Z, \alpha , S_1)$ with respect to the primal variables are $$\begin{equation*} \begin{array}{rcl} \nabla^2_{xx}\bar{\mathcal L}&=&Q_0-\sum\alpha_i Q_i,\\ \nabla^2_{x~\textrm{vec}(Z)}\bar{\mathcal L}&=&0,\\ \nabla^2_{\textrm{vec}(Z)~\textrm{vec}(Z)}\bar{\mathcal L}&=&\sigma_1 I, \end{array} \end{equation*}$$ where $I$ is the identity matrix and $\textrm{vec}(Z)$ represents the matrix $Z$ in vector form. Then for any feasible direction $(d,\textrm{vec}(D))$ at $(x^\star , 0)$ in problem (2.9) we have $$\begin{equation*} (d,\textrm{vec}(D))^{\textrm T}\left(\nabla^2_{(x,\textrm{vec}(Z))(x,\textrm{vec}(Z))}\bar{\mathcal L}\,\right)(d,\textrm{vec}(D))=d^{\textrm T}Md+\sigma_1\|D\|_F^2. \end{equation*}$$ By the definitions of distance and projection from Rockafellar (2015), the distance between $d$ and $C$ and the projection of $d$ onto $C$ are given by $$\begin{align*} \textrm{dist}(d,C)=&\ \min\limits_{y\in C}\|d-y\|,\\ P_C(d)=&\ \left\{d_1\in C\ |\ \textrm{dist}(d,C)=\|d-d_1\|\right\}. \end{align*}$$ Here we fix any $d_1\in P_C(d)$ and let $d_2=d-d_1$. If $d\notin C$ then by Assumption 2.3 there exists some $i\in \{1,\ldots ,m_e\}$ satisfying $\nabla c_i(x^\star )^{\textrm T} d \neq 0$ or some $i\in \{m_e+1,\ldots ,m\}$ satisfying $c_i(x^\star )=0$ and $\nabla c_i(x^\star )^{\textrm T} d < 0$. Let us denote $$\begin{equation*} I=\{i\in\{m_e+1,\ldots,m\}\ |\ c_i(x^\star)=0, \nabla c_i(x^\star)^{\textrm T}d<0\}. \end{equation*}$$ From the error bounds of Hoffman (2003) there exist $\tilde d\in C$ and a constant $a_1>0$, independent of $d$, such that $$\begin{equation*}\max\left\{\max\limits_{i\in \{1,\ldots,m_e\}}|\nabla c_i (x^\star) ^{\textrm T}d|, \max\limits_{i\in I} -\nabla c_i (x^\star) ^{\textrm T}d\right\}\geqslant a_1\|d-\tilde d\|\geqslant a_1\|d_2\|.\end{equation*}$$ Notice that $(d,D)$ is a feasible direction; then for the above $i$ we have $$\begin{equation*} \|Q_i\|\|D\|\geqslant Q_i\cdot D \geqslant -\nabla c_i(x^\star)^{\textrm T}d\geqslant a_1\|d_2\|. \end{equation*}$$ Since there are only finitely many different sets $I$ there is a constant $a_2>0$ depending only on $x^\star$ and satisfying $$\begin{equation} \|d_2\|\leqslant a_2\|D\|. \end{equation}$$(2.10)
The case $d\in C$ is also covered because then $d_2=0$. Then $$\begin{equation*} \begin{array}{rcl} d^{\textrm T}Md+\sigma_1\|D\|_F^2&=&d_1^{\textrm T}Md_1+2d_1^{\textrm T}Md_2+d_2^{\textrm T}Md_2+\sigma_1\|D\|_F^2\\[3pt] &\geqslant &d_1^{\textrm T}Md_1-\left(\frac{\delta}{2}\|d_1\|^2+2\delta^{-1}d_2^{\textrm T}M^2d_2\right)+d_2^{\textrm T}Md_2+\sigma_1\|D\|_F^2\\[3pt] & = & \left(d_1^{\textrm T}Md_1-\frac{\delta}{2}\|d_1\|^2\right)+d_2^{\textrm T}(M-2\delta^{-1}M^2)d_2+\sigma_1\|D\|_F^2\\[3pt] &> & 0 \end{array} \end{equation*}$$ if we choose $\sigma _1>a_2^2\|M-2\delta ^{-1}M^2\|_F$. From (2.8) and (2.10) we know that $d^{\textrm T}Md+\sigma _1\|D\|_F^2\geqslant 0$, with equality if and only if $d=0$ and $D=0$. Hence we obtain that $(x^\star ,0)$ is a strict local minimizer of problem (2.9). Then there exists a constant $\delta _1>0$ such that when $\max \{\|x-x^\star \|,\|Z\|_F\}\leqslant \delta _1$ and $(x,Z)\in G$ we have $$\begin{equation*} f(x) + Q_0 \cdot Z + S^\star \cdot Z +\frac{\sigma_1}{2}\|Z\|_F^2\geqslant f(x^\star), \end{equation*}$$ where the equality holds if and only if $x=x^\star$ and $Z=0$. When $P-S^\star \succ 0$ there is a constant $\delta _2>0$ such that for any $Z\succeq 0$ satisfying $\|Z\|_F\leqslant \delta _2$, $$\begin{equation*} \frac{\sigma_1}{2}\|Z\|_F^2\leqslant (P-S^\star)\cdot Z. \end{equation*}$$ So far, only $\delta _2$ depends on $P$. One can take $$\begin{equation} \delta_2\leqslant 2\lambda/\sigma_1, \end{equation}$$(2.11) where $\lambda$ is the smallest eigenvalue of $P-S^\star$, because $$\begin{equation*} \|Z\|^2_F\leqslant \|Z\|_F \textrm{tr}(Z)\leqslant \delta_2\textrm{tr}(Z)\leqslant(2\lambda/\sigma_1)\textrm{tr}(Z)\leqslant (2/\sigma_1)(P-S^\star)\cdot Z. \end{equation*}$$ This statement is used in the proof of the next theorem. Therefore, when $\max \{\|x-x^\star \|,\|Z\|_F\}\leqslant \min \{\delta _1,\delta _2\}$ and $(x,Z)\in G$, $$\begin{equation*} \begin{array}{rcl} f(x)+Q_0\cdot Z+P\cdot Z&=&f(x)+Q_0\cdot Z+S^\star\cdot Z+(P-S^\star)\cdot Z\\ &\geqslant&f(x)+Q_0\cdot Z+S^\star\cdot Z+\frac{\sigma_1}{2}\|Z\|_F^2\\ &\geqslant&f(x^\star), \end{array} \end{equation*}$$ where the equality holds if and only if $x=x^\star$ and $Z=0$. That is to say, $(x^\star , 0)$ is a strict local minimizer of problem (2.6) when $P\succ S^\star$.

We note that Theorem 2.4 is focused on local minimizers. Next we discuss global minimizers in the following theorem.

Theorem 2.5 Suppose that $x^\star$ is a strict global minimizer of the QCQP problem (2.5) and satisfies Assumption 2.3 and the second-order sufficient condition. Assume that $$\begin{equation*}\textrm{LS}:=\left\{x\ \big|\ f(x)\leqslant \min\limits_{x\in F}\,f(x),\ (x,Z)\in G \text{ for some } Z\right\}\end{equation*}$$ is bounded. Then there exists a $\bar P\succeq 0$ such that $(x^\star , 0)$ is a strict global minimizer of (2.6) for all $P\succ \bar P$.

Proof. If there exists a $\bar P$ such that $(\bar x,0 )$ is a global minimizer of $$\begin{equation*}\min\limits_{(x,Z)\in G}\mathcal L\,(x,Z,\bar P),\end{equation*}$$ then for any $P\succeq \bar P$ we obtain the following relationship from Lemma 2.2: $$\begin{equation*} \mathcal L\,(\bar x,0,\bar P)\leqslant \min\limits_{(x,Z)\in G} \mathcal L\,(x,Z,P)\leqslant \mathcal L\,(\bar x,0, P)=\mathcal L\,(\bar x,0, \bar P). \end{equation*}$$ Thus $$\begin{equation*} \min\limits_{(x,Z)\in G}\mathcal L\,(x,Z,P)=\mathcal L\,(\bar x,0, P). \end{equation*}$$
Hence $(\bar x, 0)$ is a global minimizer of problem (2.6). Since $x^\star$ is a strict global minimizer of problem (2.5) we have $\bar x=x^\star$. If there does not exist any $P$ such that $Z=0$ is part of a global minimizer of problem (2.6), then one of the following two cases occurs.

Case 1: for any $\hat P\succeq S^\star +(\rho (Q_0)+\epsilon )I$, the global minimizer $(\hat x,\hat Z)$ of problem (2.6) satisfies $\|\hat Z\|_F\geqslant \delta>0$, where $\rho (Q_0)$ is the spectral radius of $Q_0$, $\epsilon$ is an arbitrary positive constant scalar and $I$ is the identity matrix. Since $\hat x\in \textrm{LS}$, both $\hat x$ and $f(\hat x)$ are bounded; by letting $\hat P=\lambda I$ and $\lambda \rightarrow +\infty$ we get $$\begin{equation*}\mathcal L\,(\hat x,\hat Z,\hat P)=f(\hat x)+(\hat P+Q_0)\cdot \hat Z\rightarrow +\infty.\end{equation*}$$ This is in contradiction with Lemma 2.2.

Case 2: there exists a sequence $\{P^k\}\succeq S^\star +(\rho (Q_0)+\epsilon )I$ such that $Z^k\rightarrow 0$ but $Z^k\neq 0$, where $(x^k, Z^k)$ is a global minimizer of $\min\nolimits _{(x,Z)\in G} \mathcal L\,(x,Z,P^k)$. Since $x^k\in \textrm{LS}$, the sequence $\{x^k\}$ is bounded and thus has a cluster point, denoted by $\bar x$. Suppose that $x^{i_k}\rightarrow \bar x$. Due to the fact that $Z^{i_k}\rightarrow 0$ and the closedness of $G$ we have $(\bar x, 0)\in G$, i.e. $\bar x\in F$. It is known from Lemma 2.2 that $$\begin{equation} f(x^{i_k})+(Q_0+P^{i_k})\cdot Z^{i_k}\leqslant \min\limits_{x\in F}\,f(x). \end{equation}$$(2.12) Since $(Q_0+P^{i_k})\cdot Z^{i_k}>0$ we have $$\begin{equation*} f(x^{i_k})< \min\limits_{x\in F}\,f(x). \end{equation*}$$ Taking the limit $i_k\rightarrow +\infty$ on both sides we have $f(\bar x)\leqslant \min\nolimits _{x\in F}f(x)$. On the other hand, $\bar x\in F$, thus $f(\bar x)\geqslant \min \nolimits _{x\in F}f(x)$. Therefore, $f(\bar x)= \min \nolimits _{x\in F}f(x)$, which implies that $\bar x$ is a global minimizer of (2.5); then $\bar x=x^\star$. According to Theorem 2.4, for all $P\succeq S^\star +(\rho (Q_0)+\epsilon )I$, $(x^\star , 0)$ is a strict local minimizer of problem (2.6). Moreover, as mentioned in the proof of Theorem 2.4, namely (2.11), we can take $\delta _2=2\epsilon /\sigma _1$ to ensure that when $\max \{\|x-x^\star \|,\|Z\|_F\}\leqslant \min \{\delta _1,\delta _2\}$ and $(x,Z)\in G$ we have $$\begin{equation*} f(x)+Q_0\cdot Z+P\cdot Z\geqslant f(x^\star), \end{equation*}$$ where the equality holds if and only if $x=x^\star$ and $Z=0$. This is in contradiction with $(x^{i_k}, Z^{i_k})\rightarrow (x^\star , 0)$ and (2.12). In summary, there exists a $\bar P$ such that $(x^\star ,0 )$ is a global minimizer of problem (2.6) when $P\succ \bar P$.

3. Algorithm

Compared with the feasible region of the QCQP problem (2.5), which may be nonconvex or even disconnected, the feasible region of our penalty formulation (2.6) involves the convex cone of positive semidefinite matrices, which makes it more amenable to classical optimization algorithms such as the projected gradient method. However, an essential difficulty of the problem still remains. Indeed, problem (2.6) is still a nonconvex quadratic semidefinite programming problem, which is NP-hard as well. As discussed in Section 2 the penalty factor $P$ is a positive semidefinite matrix. The choice of $P$ will be discussed later.
When $P$ is given, problem (2.6) can be viewed as a difference-of-convex (DC) programming problem because it can be written as $$\begin{equation*} \begin{array}{cl} \min\limits_{x\in \mathbb{R}^n, Z\in \mathbb S^n} & \begin{pmatrix} 1 & x^{\textrm T} \\ x & xx^{\textrm T}+Z \\ \end{pmatrix}\cdot \begin{pmatrix} 0 & g_0^{\textrm T} \\ g_0 & Q_0+P \\ \end{pmatrix} -x^{\textrm T}Px\\[10pt] \text{s.t.} & \begin{pmatrix} 1 & x^{\textrm T} \\ x & xx^{\textrm T}+Z \\ \end{pmatrix}\cdot \begin{pmatrix} c_i & g_i^{\textrm T} \\ g_i & Q_i \\ \end{pmatrix}= 0,\quad i=1,\ldots,m_e,\\[10pt] & \begin{pmatrix} 1 & x^{\textrm T} \\ x & xx^{\textrm T}+Z \\ \end{pmatrix}\cdot \begin{pmatrix} c_i & g_i^{\textrm T} \\ g_i & Q_i \\ \end{pmatrix}\geqslant 0,\quad i=m_e+1,\ldots,m,\\[10pt] & Z\succeq 0, \end{array} \end{equation*}$$ and by Lemma 2.1 it is equivalent to $$\begin{equation*} \begin{array}{cl} \min\limits_{x\in \mathbb{R}^n, W\in \mathbb S^{n+1}} & W\cdot \begin{pmatrix} 0 & g_0^{\textrm T} \\ g_0 & Q_0+P \\ \end{pmatrix} -x^{\textrm T}Px\\[10pt] \text{s.t.} & W\cdot \begin{pmatrix} c_i & g_i^{\textrm T} \\ g_i & Q_i \\ \end{pmatrix}= 0,\quad i=1,\ldots,m_e,\\[10pt] & W\cdot \begin{pmatrix} c_i & g_i^{\textrm T} \\ g_i & Q_i \\ \end{pmatrix}\geqslant 0,\quad i=m_e+1,\ldots,m,\\[10pt] &W_{11}=1,\\[3pt] &W_{\{2,\ldots,n+1\}~1}=x,\\[3pt] & W\succeq 0. \end{array} \end{equation*}$$ As far as the authors know, there is no mature algorithm for solving a nonconvex quadratic positive semidefinite programming problem. One can use one of the methods for solving DC problems to solve problem (2.6). There have been many studies devoted to the theory of DC functions in the literature (see, for example, Hiriart-Urruty, 1985, 1989). For DC algorithms (DCA) one can find references that use the regularization approach (Fernández Cara & Moreno, 1988), the dual approach (Auchmuty, 1989), the subgradient method (Dinh & Souad, 1986) and the proximal point algorithm (Sun et al., 2003). Le Thi & Dinh (2018) offer a survey of 30 years of development of these theoretical and algorithmic tools and provide more relevant information about DC programming and DCA.

Due to the NP-hardness we try to design an algorithm to find solutions satisfying the first-order optimality conditions: $$\begin{equation} \begin{array}{l} Q_0x+g_0-\sum\limits_{i=1}^m \alpha_i(Q_ix+g_i)=0,\\[10pt] x^{\textrm T}Q_ix+2g_i^{\textrm T}x+c_i+Q_i\cdot Z= 0,\quad i=1,\ldots,m_e,\\[5pt] x^{\textrm T}Q_ix+2g_i^{\textrm T}x+c_i+Q_i\cdot Z\geqslant 0,\quad i=m_e+1,\ldots,m,\\[5pt] \alpha_i\geqslant 0,\quad i=m_e+1,\ldots,m,\\[5pt] \alpha_i\left(x^{\textrm T}Q_ix+2g_i^{\textrm T}x+c_i+Q_i\cdot Z\right)=0,\quad i=m_e+1,\ldots,m,\\[5pt] \left(Q_0+P-\sum\limits_{i=1}^m \alpha_iQ_i\right)\cdot Z=0,\\[10pt] Z\succeq 0,\\[5pt] \left(Q_0+P-\sum\limits_{i=1}^m \alpha_iQ_i\right)\succeq 0. \end{array} \end{equation}$$(3.1) We propose the entire algorithmic framework in this section, including an algorithm for solving the penalty problem and an update rule for $P$.

3.1 Proximal point algorithm

We choose the proximal point algorithm (PPA) proposed by Rockafellar (1976) to solve the penalty problem (2.6). Sun et al. (2003) extended PPA to solve general DC programs. The basic properties of PPA and applications of PPA in machine learning, statistics, data analysis, signal and image processing, computational economics and finance, engineering design, scheduling and resource allocation and other areas have been discussed by Parikh & Boyd (2014).
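Since the algorithms below aim at points satisfying (3.1), it can be handy in practice to monitor the residuals of these conditions. The following numpy sketch does so; the function name and the data layout (index 0 for the objective, 1 to $m$ for the constraints, as in the SDR sketch of Section 1.1) are our own conventions:

```python
import numpy as np

def kkt_residuals(x, Z, alpha, Q, g, c, P, m_e):
    """Residuals of the first-order conditions (3.1); a monitoring
    sketch, with alpha[i-1] holding the multiplier of constraint i."""
    alpha = np.asarray(alpha)
    m = len(Q) - 1
    S = Q[0] + P - sum(alpha[i - 1] * Q[i] for i in range(1, m + 1))
    stat = Q[0] @ x + g[0] - sum(
        alpha[i - 1] * (Q[i] @ x + g[i]) for i in range(1, m + 1))
    cons = np.array([x @ Q[i] @ x + 2 * g[i] @ x + c[i] + np.sum(Q[i] * Z)
                     for i in range(1, m + 1)])
    return {
        "stationarity": np.linalg.norm(stat),
        "eq_violation": np.linalg.norm(cons[:m_e]),
        "ineq_violation": np.linalg.norm(np.minimum(cons[m_e:], 0.0)),
        "complementarity": float(np.abs(alpha[m_e:] * cons[m_e:]).max(initial=0.0)),
        "S_dot_Z": abs(float(np.sum(S * Z))),
        "min_eig_Z": float(np.linalg.eigvalsh(Z).min()),
        "min_eig_S": float(np.linalg.eigvalsh(S).min()),
    }
```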
For our problem, to be more specific, we solve the following subproblem in the $k$th iteration: $$\begin{equation} \begin{array}{rcl} (d^k,Z^{k+1})=& \underset{d\in \mathbb R^n,Z\in \mathbb S^n}{\textrm{arg min}} & \mathcal L\,(x^k+d,Z,P)+d^{\textrm T}Pd\\ & \text{s.t.} & (x^k+d,Z)\in G, \end{array} \end{equation}$$(3.2) where $d^{\textrm T}Pd$ is the proximal term. Then we update $$\begin{equation*} x^{k+1}=x^k+d^k. \end{equation*}$$ In fact, problem (3.2) is a standard positive semidefinite programming problem, which can be written as $$\begin{equation} \begin{array}{cl} \min\limits_{d\in \mathbb{R}^n, Z\in \mathbb S^n} &\begin{pmatrix} 1 & d^{\textrm T} \\ d & dd^{\textrm T}+Z \\ \end{pmatrix}\cdot \begin{pmatrix} 0 & (Q_0x^k+g_0)^{\textrm T} \\ Q_0x^k+g_0 & Q_0+P \\ \end{pmatrix} \\ \\ \text{s.t.} & \begin{pmatrix} 1 & d^{\textrm T} \\ d & dd^{\textrm T}+Z \\ \end{pmatrix}\cdot \begin{pmatrix} c_i+2g_i^{\textrm T}x^k+{x^k}^{\textrm T}Q_ix^k & (Q_ix^k+g_i)^{\textrm T} \\ Q_ix^k+g_i & Q_i \\ \end{pmatrix} = 0, \qquad i=1,\ldots,m_e,\\ \\ & \begin{pmatrix} 1 & d^{\textrm T} \\ d & dd^{\textrm T}+Z \\ \end{pmatrix}\cdot \begin{pmatrix} c_i+2g_i^{\textrm T}x^k+{x^k}^{\textrm T}Q_ix^k & (Q_ix^k+g_i)^{\textrm T} \\ Q_ix^k+g_i & Q_i \\ \end{pmatrix}\geqslant 0, \qquad i=m_e+1,\ldots,m,\\ \\ & Z\succeq 0 \end{array} \end{equation}$$(3.3) or, in the lifted form, $$\begin{equation*} \begin{array}{cl} \min\limits_{d\in \mathbb{R}^n, W\in \mathbb S^{n+1}} &W\cdot \begin{pmatrix} 0 & (Q_0x^k+g_0)^{\textrm T} \\ Q_0x^k+g_0 & Q_0+P \\ \end{pmatrix} \\ \\ \text{s.t.} & W\cdot \begin{pmatrix} c_i+2g_i^{\textrm T}x^k+{x^k}^{\textrm T}Q_ix^k & (Q_ix^k+g_i)^{\textrm T} \\ Q_ix^k+g_i & Q_i \\ \end{pmatrix} = 0, \qquad i=1,\ldots,m_e,\\ \\ & W\cdot \begin{pmatrix} c_i+2g_i^{\textrm T}x^k+{x^k}^{\textrm T}Q_ix^k & (Q_ix^k+g_i)^{\textrm T} \\ Q_ix^k+g_i & Q_i \\ \end{pmatrix} \geqslant 0, \qquad i=m_e+1,\ldots,m,\\ \\ &W_{11}=1,\\[5pt] &W_{\{2,\ldots,n+1\}~1}=d,\\[5pt] & W\succeq 0, \end{array} \end{equation*}$$ for ease of understanding. The above problem can be solved in polynomial time by various methods, including off-the-shelf software (for example, SDPT3, Toh et al., 2009, and SDPNAL+, Yang et al., 2015). Its solution $(d,Z,\alpha , S)$ satisfies the following first-order conditions: $$\begin{equation} \begin{array}{l} Q_0(x^k+d)+g_0-\sum\limits_{i=1}^m \alpha_i\left(Q_i(x^k+d)+g_i\right)+Pd=0,\\[10pt] (x^k+d)^{\textrm T}Q_i(x^k+d)+2g_i^{\textrm T}(x^k+d)+c_i+Q_i\cdot Z=0,\quad i=1,\ldots,m_e,\\[5pt] (x^k+d)^{\textrm T}Q_i(x^k+d)+2g_i^{\textrm T}(x^k+d)+c_i+Q_i\cdot Z\geqslant 0,\quad i=m_e+1,\ldots,m,\\[5pt] \alpha_i\geqslant 0,\quad i=m_e+1,\ldots,m,\\[5pt] \alpha_i\left((x^k+d)^{\textrm T}Q_i(x^k+d)+2g_i^{\textrm T}(x^k+d)+c_i+Q_i\cdot Z\right)=0, \quad i=m_e+1,\ldots,m,\\[5pt] \left(Q_0+P-\sum\limits_{i=1}^m \alpha_iQ_i\right)\cdot Z=0,\\[10pt] Z\succeq 0,\\[5pt] \left(Q_0+P-\sum\limits_{i=1}^m \alpha_iQ_i\right)\succeq 0. \end{array} \end{equation}$$(3.4)

Here we present one ingredient of our algorithmic framework, namely an algorithm for solving the penalty problem (2.6). The monotone descent property and convergence analysis of the algorithm are given in Lemma 3.1 and Theorem 3.2.

Algorithm 3.1 Proximal point algorithm for the penalty problem
Step 0: give the initial point $x^0$; set $k:=0$.
Step 1: solve problem (3.3) to obtain $d^k$.
Step 2: $x^{k+1}:=x^k+d^k$.
Step 3: if the termination criterion is satisfied, stop. Otherwise, $k:=k+1$, go to Step 1.
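For illustration, one iteration of Algorithm 3.1 amounts to solving the lifted SDP above. A minimal cvxpy sketch (reusing the data layout of the earlier SDR sketch; not the authors' implementation) is:

```python
import cvxpy as cp
import numpy as np

def ppa_step(xk, Q, g, c, P, m_e):
    """One iteration of Algorithm 3.1: solve subproblem (3.3) in its
    lifted W-form, W = [[1, d^T], [d, d d^T + Z]], and return d^k and
    Z^{k+1}. The constant term f(x^k) is dropped from the objective,
    which does not change the minimizer."""
    n = xk.shape[0]
    W = cp.Variable((n + 1, n + 1), symmetric=True)
    d = W[1:, 0]

    def lifted(A, b, a0):
        # W . [[a0, b^T], [b, A]] = a0 + 2 b^T d + A . (d d^T + Z)
        return a0 + 2 * b @ d + cp.trace(A @ W[1:, 1:])

    objective = lifted(Q[0] + P, Q[0] @ xk + g[0], 0.0)
    cons = [W >> 0, W[0, 0] == 1]
    for i in range(1, len(Q)):
        expr = lifted(Q[i], Q[i] @ xk + g[i],
                      c[i] + 2 * g[i] @ xk + xk @ Q[i] @ xk)
        cons.append(expr == 0 if i <= m_e else expr >= 0)
    cp.Problem(cp.Minimize(objective), cons).solve()
    dk = d.value
    return dk, W[1:, 1:].value - np.outer(dk, dk)  # Z = (dd^T+Z) - dd^T
```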
Lemma 3.1 Suppose that $\{x^k\}$ and $\{Z^k\}$ are generated by Algorithm 3.1; then $$\begin{equation} \mathcal L\,(x^{k+1},Z^{k+1},P)\leqslant \mathcal L\,(x^k,Z^k,P)-{d^k}^{\textrm T}Pd^k. \end{equation}$$(3.5)

Proof. Since $(x^k, Z^{k})\in G$ for all $k$, we have $$\begin{equation*} \begin{array}{rcl} \mathcal L\,(x^{k+1},Z^{k+1},P)&=&\left(\mathcal L\,(x^k+d^k,Z^{k+1},P)+(d^k)^{\textrm T}Pd^k\right)-(d^k)^{\textrm T}Pd^k\\ &\leqslant &\mathcal L\,(x^k,Z^{k},P)-(d^k)^{\textrm T}Pd^k. \end{array} \end{equation*}$$ Here, the inequality holds due to (3.2).

Theorem 3.2 Suppose that $\{d^k\}$, $\{x^k\}$ and $\{Z^k\}$ are generated by Algorithm 3.1. If the primal and dual solutions $\{x^k\},\{Z^k\},\{\alpha ^k\},\{S^k\}$ of (3.3) are bounded, where $S^k=Q_0+P-\sum _{i=1}^m \alpha ^k_iQ_i$, then $\{d^k\}$ converges to 0 and any cluster point of $\{(x^k,Z^k,\alpha ^k,S^k)\}$ is a first-order stationary point of (2.6).

Proof. From (3.5) we know that $$\begin{equation} \mathcal L\,(x^{k+1},Z^{k+1},P)\leqslant \mathcal L\,(x^0,Z^0,P)-\sum_{i=1}^k{d^i}^{\textrm T}Pd^i. \end{equation}$$(3.6) The boundedness assumption implies that $\mathcal L$ is bounded below along the iterates, and hence $$\begin{equation} \sum_{i=1}^\infty{d^i}^{\textrm T}Pd^i<+\infty. \end{equation}$$(3.7) Since $P\succ 0$, it follows that $d^i\rightarrow 0$. Our boundedness assumptions imply that $\{(x^k,Z^k,\alpha ^k,S^k)\}$ has at least one cluster point. Let $(\bar x, \bar Z,\bar \alpha , \bar S)$ be any one of the cluster points, with subsequence $\{(x^{i_k},Z^{i_k},\alpha ^{i_k},S^{i_k})\}\rightarrow (\bar x, \bar Z,\bar \alpha , \bar S)$. Taking $i_k\rightarrow \infty$ in (3.4) we find that $(\bar x, \bar Z,\bar \alpha , \bar S)$ satisfies (3.1).

3.2 Update of penalty

When solving the problem $\min \nolimits _{(x,Z)\in G} \mathcal L\,(x,Z,P)$ we hope that the solution satisfies $Z=0$. By putting $Z=0$ in the first-order conditions (3.1) we obtain $$\begin{equation} \begin{array}{l} Q_0x+g_0-\sum\limits_{i=1}^m \alpha_i(Q_ix+g_i)=0,\\[10pt] x^{\textrm T}Q_ix+2g_i^{\textrm T}x+c_i= 0,\quad i=1,\ldots,m_e,\\[10pt] x^{\textrm T}Q_ix+2g_i^{\textrm T}x+c_i\geqslant 0,\quad i=m_e+1,\ldots,m,\\[10pt] \alpha_i\geqslant 0,\quad i=m_e+1,\ldots,m,\\[10pt] \alpha_i\left(x^{\textrm T}Q_ix+2g_i^{\textrm T}x+c_i\right)=0,\quad i=m_e+1,\ldots,m,\\[10pt] \left(Q_0+P-\sum\limits_{i=1}^m \alpha_iQ_i\right)\succeq 0. \end{array} \end{equation}$$(3.8) In fact, only the final condition above contains $P$. This means that, for any $x$ and $\alpha$ satisfying the other conditions, (3.8) can be satisfied if $P$ is sufficiently large. We consider a simple example, namely $$\begin{equation*}\begin{array}{cl} \min & x^{\textrm T}\left( \begin{array}{cc} 6 & -3 \\ -3 & 1 \end{array} \right) x\\[10pt] \text{s.t.} & x^{\textrm T}\left( \begin{array}{cc} 1 & 0 \\ 0 & 0 \end{array} \right)x-\left( \begin{array}{cc} 1 & 0 \end{array} \right)x =0,\\[10pt] & x^{\textrm T}\left( \begin{array}{cc} 0 & 0 \\ 0 & 1 \end{array} \right)x-\left( \begin{array}{cc} 0 & 1 \end{array} \right)x =0. \end{array} \end{equation*}$$
There are four solutions satisfying the first five conditions of (3.8), which are $$\begin{equation*}\begin{array}{lll} x^{(1)}=\left( \begin{array}{cc} 0 & 0 \end{array}\right)^{\textrm T}, & \alpha_1^{(1)}=0, & \alpha_2^{(1)}=0,\\[5pt] x^{(2)}=\left( \begin{array}{cc} 1 & 1 \end{array}\right)^{\textrm T}, & \alpha_1^{(2)}=6, & \alpha_2^{(2)}=-4,\\[5pt] x^{(3)}=\left( \begin{array}{cc} 0 & 1 \end{array}\right)^{\textrm T}, & \alpha_1^{(3)}=6, & \alpha_2^{(3)}=2,\\[5pt] x^{(4)}=\left( \begin{array}{cc} 1 & 0 \end{array}\right)^{\textrm T}, & \alpha_1^{(4)}=12, & \alpha_2^{(4)}=6.\\ \end{array}\end{equation*}$$ The global minimizer is $x^{(1)}$. One may note that if $P=0$ then none of the four solutions satisfies the last condition of (3.8), so that no solution satisfies the KKT conditions. The last condition of (3.8) is satisfied only by the first solution $x^{(1)}$, $\alpha _1^{(1)}$ and $\alpha _2^{(1)}$ if $P = I$. If $P=2I$ the last condition of (3.8) is satisfied by the first two solutions. If $P=4I$ then the last condition of (3.8) is satisfied by the first three solutions. Moreover, if $P=9I$ then all four solutions satisfy (3.8). On the other hand, if the initially selected $P$ is too large (in the sense of $P\succ \succ 0$), Algorithm 3.1 might stop, with a high probability, at a solution that is not the global minimizer. Therefore, we propose to update $P$ iteratively, rather than directly assigning a fixed and large $P$ initially.

Notice that the penalty problem (2.6) is based on the Lagrangian function of problem (2.2) corresponding to $Z\preceq 0$, where $P$ is its dual variable. The update of $P$ can be obtained from the update criteria in the Lagrangian relaxation method (Guignard, 2003). We regard the optimal objective value of the penalty problem as a function of $P$: $$\begin{equation} z(P):=\min\limits_{(x,Z)\in G}\mathcal L\,(x,Z,P). \end{equation}$$(3.9) Let $(x^k, Z^k)$ be the optimal solution of $z(P^k)$. In fact, $Z^k$ is a subgradient of $z(P)$ at $P^k$. Hence, we can increase $P^{k+1}$ along the direction $Z^k$, that is, $$\begin{equation} P^{k+1}=P^k+\mu Z^k. \end{equation}$$(3.10) We would like to achieve $\mathcal L\,(x^{k+1},Z^{k+1},P^{k+1})=\min \nolimits _{x\in F}f(x)$, though $(x^{k+1},Z^{k+1})$ is unknown. Nevertheless, we may use $(x^{k},Z^{k})$ as an approximation. Imposing the approximate equation $$\begin{equation*} \mathcal L\,(x^k,Z^k,P^{k+1})=\min\limits_{x\in F}\,f(x), \end{equation*}$$ we have $$\begin{equation*} \mu=\left(\min\limits_{x\in F}\,f(x)-\mathcal L\,(x^k,Z^k,P^k)\right)/\|Z^k\|^2_F. \end{equation*}$$ Here, because $\min \nolimits _{x\in F}f(x)$ is unknown, we use an upper bound for it in the computation. For example, we can generate a feasible solution $\bar x^k\in F$ from $x^k$ and take the upper bound to be $f(\bar x^k)$. For simple constraints such as $x_i^2=1$, when $x^k_i\geqslant 0$ we let $\bar x^k_i=1$, and when $x^k_i< 0$ we let $\bar x^k_i=-1$. For more complicated constraints, if a feasible solution cannot be generated from $x^k$, a good strategy is to add $\mu Z^k$ with a constant $\mu$ as shown in (3.10) or a fixed scalar multiple of the identity $\delta I$ to $P$ each time. In practice, adding $\mu Z^k$ to $P^k$ as in (3.10) usually offers better performance than adding a scalar multiple of the identity matrix.
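As an illustration, the update (3.10) with the step length derived above fits in a few lines of Python. This is a sketch of the rule in the text (the function name and the guard against $Z^k=0$ are ours), not the authors' implementation:

```python
import numpy as np

def update_penalty(P, Z, L_val, f_upper):
    """Subgradient-style update (3.10) of the penalty matrix.

    L_val is L(x^k, Z^k, P^k) and f_upper an upper bound on
    min_{x in F} f(x), e.g. the value of a feasibilized iterate."""
    nrm = np.linalg.norm(Z, "fro")
    if nrm == 0:
        return P  # Z^k = 0: the current iterate is already feasible
    mu = (f_upper - L_val) / nrm ** 2
    return P + mu * Z
```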
Moreover, the combination $\mu Z^k+\delta I$ is also a possible choice. We can design different choices of the scalar for different problems. We now present the framework of the positive semidefinite penalty method (PSDP) in Algorithm 3.2.

Algorithm 3.2 PSDP
Step 0: let $P^0=0$; set $k:=0$.
Step 1: compute $(x^{k},Z^{k})=\underset{(x,Z)\in G}{\textrm{arg min}}\ \mathcal L\,(x,Z,P^k)$ by Algorithm 3.1.
Step 2: if the termination criteria are satisfied, stop. Otherwise, update $P^{k+1}$ by (3.10), $k:=k+1$, go to Step 1.

When performing Step 1 one can use $(x^{k-1},Z^{k-1})$ as an initial point if $k\geqslant 1$. Notice that when $P^0=0$, $\min \nolimits _{(x,Z)\in G}\mathcal L\,(x,Z,P^0)$ is SDR. Consequently, Algorithm 3.1 converges in one step from any initial point, which can be derived from the KKT conditions (3.4). Therefore, $x^0$ in PSDP is the solution of SDR. We note that $Z^k=0$ is one of the termination criteria: it means that Algorithm 3.2 has found a feasible solution and, since the update (3.10) then leaves $P$ unchanged, no better solution will be found in subsequent penalty updates. Another criterion is $\mathcal L\, (x^k, Z^k, P^k) -\min _{x\in F}\,f(x)> 0$ because, by Lemma 2.2, it implies that Algorithm 3.1 has obtained a bad solution, and there is no need to implement more updates. In practice, $\min _{x\in F}\,f(x)$ can be approximated via the generated sequence $\{x^k\}$. The maximum number of penalty updates can also be a criterion, as the efficiency of the algorithm is very much affected by the number of updates taken. These termination criteria are used in our numerical tests. Interestingly, we can see that the number of updates required for a good solution is often small, so that the added computational cost can be kept under control.

4. Numerical tests

In this section we report numerical results to demonstrate the performance of our proposed algorithm PSDP (Algorithm 3.2) in terms of the quality of solutions. We solve two kinds of problems, namely binary quadratic programming and binary quadratic equations. In these tests PSDP is compared with the classic SDR.

4.1 Binary quadratic programming

We consider a binary quadratic programming problem given by $$\begin{equation*} \begin{array}{cl} \min\limits_{x\in \mathbb{R}^n} & x^{\textrm T}Qx+2g^{\textrm T}x\\ \text{s.t.} & x_i^2-x_i=0,\quad i=1,\ldots,n. \end{array} \end{equation*}$$ The constraint $x_i^2-x_i=0$ implies $x_i\in \{0,1\}$. The problem is known to be NP-hard (Garey & Johnson, 1979). It has many applications such as capital budgeting and financial analysis problems (Laughunn, 1970; McBride & Yormark, 1980), CAD problems (Krarup & Pruzan, 1978), traffic message management problems (Gallo et al., 1980), machine scheduling (Alidaee et al., 1994) and molecular conformation (Phillips & Rosen, 1994). In our tests $Q$ is a random real symmetric matrix. All the entries of $Q$ are independent identically distributed with expectation 0 and variance 1. For any $x^k$ generated in Algorithm 3.2 we obtain a feasible solution by binary thresholding (rounding off to 0 or 1); that is to say, we set $$\begin{equation} \bar x^k_i=\begin{cases} 1& \mbox{if}\ \; x^k_i\geqslant 0.5,\\[5pt] 0 & \mbox{if}\ \; x^k_i< 0.5. \end{cases} \end{equation}$$(4.1) The operation given in (4.1) of getting $\bar x^k$ from $x^k$ is referred to as feasibilization or rounding off. Let $$\begin{equation*} f_{\min}^k=\min_{i\leqslant k}\{\,f(\bar x^i)\}. \end{equation*}$$
Then we update $P$ by $$\begin{equation*} P^{k+1}=P^k+\min\left\{\frac{f_{\min}^k-\mathcal L\,(x^k,Z^k,P^k)}{\|Z^k\|^2_F},\frac{1}{\|Z^k\|_F}\right\}Z^k. \end{equation*}$$ We set up $\mathcal L\, (x^k, Z^k, P^k) -f_{\min }^k> 0$ and $x^k\in F$ as the termination criteria. By Lemma 2.2 the global solution $(x_\star ^k,Z_\star ^k)$ satisfies $\mathcal L\,(x_\star ^k,Z_\star ^k,P^k)-f_{\min }^k\leqslant 0$; thus the first criterion means that the current penalty problem is badly solved, and the second implies that $(x^k,0)$ is also the solution when $P=P^{k+1}$. If either of them occurs the algorithm stops.

We first consider a 10-dimensional example, whose $g$ equals 0 and whose $Q$ is a fixed $10\times 10$ symmetric matrix generated as described above. The optimal solution $x^\star$ is $$\begin{equation*} \left( \begin{array}{rrrrrrrrrr} 0 & 1 & 1 & 1 & 0 & 1 & 0 & 0 & 0 & 0 \end{array} \right)^{\textrm T}. \end{equation*}$$ The solution of SDR is given below, with only four decimal places shown to save space, by $$\begin{equation*} \left( \begin{array}{rrrrrrrrrr} 0.2504 & 0.7005 & 0.3709 & 0.7968 & 0.1731 & 0.9559 & 0.0002 & 0.2462 & 0.0474 & 0.4757 \end{array} \right)^{\textrm T}. \end{equation*}$$ With the feasibilization given by (4.1) the above is transformed into $$\begin{equation*} \left( \begin{array}{rrrrrrrrrr} 0 & 1 & 0 & 1 & 0 & 1 & 0 & 0 & 0 & 0 \end{array} \right)^{\textrm T}, \end{equation*}$$ which is not the global solution. In PSDP the solution of SDR corresponds to the first solution when $P^0=0$. Then we update $P^1$ and obtain $x^1=$ $$\begin{equation*} \left(\begin{array}{rrrrrrrrrr} 0.1065 & 0.9441 & 0.6733 & 0.9696 & 0.0552 & 0.9945 & 0.0008 & 0.0299 & 0.0005 & 0.0056 \end{array} \right)^{\textrm T}, \end{equation*}$$ which is again shown with only four decimal places. With feasibilization, the rounded-off solution $\bar x^1$ is $$\begin{equation*} \left( \begin{array}{rrrrrrrrrr} 0 & 1 & 1 & 1 & 0 & 1 & 0 & 0 & 0 & 0 \end{array} \right)^{\textrm T}, \end{equation*}$$ which is the optimal solution $x^\star$.

When we focus only on the sequence $\{x^k\}$ and not on the solutions obtained by feasibilization (i.e. rounding off via (4.1)), we find that in the subsequent iterations the difference between $x^k$ and $x^\star$ drops gradually, then jumps down to $1{\rm e}{-}6$ at the 24th iteration and remains almost unchanged for the rest of the iterations, as shown in Fig. 1. The error tolerance $1{\rm e}{-}6$ is also used here to measure the accuracy of solving the SDP subproblems. We regard $x^k$ as a feasible solution if $\|x^k-\bar x^k\|\leqslant 1{\rm e}{-}6$. Moreover, the jump in $\|x^k-\bar x^k\|$ gives an indication of the exact penalty.

Fig. 1. Convergence of a BQP instance.

This example illustrates that, rather than directly rounding off an SDR solution, our Algorithm 3.2 can offer a more systematic and effective way to make the solution feasible and get closer to a global solution. As additional examples we choose three different $n$ ($n=10,15,20$) and generate 50 random examples for each $n$. For these 150 examples, Fig. 2 reports the numbers of examples in which PSDP terminates within a given number of penalty updates. From Fig. 2 we see that PSDP satisfies the stopping criteria within 51 updates of the penalty parameter for all the examples. In fact, the number of penalty updates in PSDP can be further reduced if feasibilization is adopted.
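For the BQP tests, the rounding (4.1) and the safeguarded update displayed above can be sketched in a few lines of Python (hypothetical helpers; the function names are ours):

```python
import numpy as np

def feasibilize_bqp(x):
    """Rounding (4.1): threshold at 0.5 to obtain a binary point."""
    return (x >= 0.5).astype(float)

def update_penalty_bqp(P, Z, L_val, f_min):
    """Safeguarded version of (3.10) used for BQP: the step length is
    capped by 1/||Z||_F, as in the update displayed above."""
    nrm = np.linalg.norm(Z, "fro")
    if nrm == 0:
        return P
    mu = min((f_min - L_val) / nrm ** 2, 1.0 / nrm)
    return P + mu * Z
```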
As shown in the example stated earlier, the number of penalty updates is 24, but just 1 update is enough for us to obtain the global solution when we consider rounding off the solutions via (4.1).

Fig. 2. Number of penalty updates of PSDP for BQP.

The comparison between PSDP and SDR in the tests is shown in Table 1. Here we consider only the objective values of the rounded-off solutions, where $f$ represents the objective value of the solution generated by PSDP, $f^{\textrm{SDR}}$ represents the objective value of the solution generated by SDR and $f^{\textrm{OPT}}$ represents the optimal objective value. Since we use the solutions obtained by rounding off we have $f^{\textrm{OPT}}\leqslant f^{\textrm{SDR}}$ and $f^{\textrm{OPT}}\leqslant f$. Moreover, SDR is the first step in PSDP; thus we always have $f\leqslant f^{\textrm{SDR}}$. In this table we list the numbers of examples associated with the different cases. For instance, when $n=20$, of all the 50 examples, SDR gets 18 optimal solutions, while PSDP solves 42, which is 24 more than SDR. For the remaining 8 examples that PSDP is not able to globally solve, PSDP improves in 6 of these examples on the basis of SDR and retains the same solutions as SDR in the other 2 cases. We see from Table 1 that, as $n$ gradually increases, PSDP maintains the ability to find global solutions.

Table 1 BQP: PSDP vs. SDR. For each $n$ the table reports the number of examples with $f=f^{\textrm{OPT}}$ and, among those with $f>f^{\textrm{OPT}}$, the numbers with $f<f^{\textrm{SDR}}$ and with $f=f^{\textrm{SDR}}$.
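To make the whole pipeline concrete, the following hypothetical driver strings together the sketches given earlier (solve_sdr, ppa_step, feasibilize_bqp and update_penalty_bqp) on a random BQP instance with constraints $x_i^2-x_i=0$; the tolerances and iteration caps are arbitrary, and none of this is the authors' code:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8
A = rng.standard_normal((n, n))
Q0 = (A + A.T) / 2
eye = np.eye(n)
Q = [Q0] + [np.outer(eye[i], eye[i]) for i in range(n)]   # Q_i = e_i e_i^T
g = [np.zeros(n)] + [-0.5 * eye[i] for i in range(n)]     # 2 g_i^T x = -x_i
c = [0.0] * (n + 1)
m_e = n                                                   # all equalities

x, _, _ = solve_sdr(Q, g, c, m_e)       # P^0 = 0, i.e. plain SDR
P, f_min = np.zeros((n, n)), np.inf
for outer in range(20):                 # outer loop: Algorithm 3.2
    for inner in range(50):             # inner loop: Algorithm 3.1
        d, Z = ppa_step(x, Q, g, c, P, m_e)
        x = x + d
        if np.linalg.norm(d) <= 1e-6:
            break
    xbar = feasibilize_bqp(x)           # rounding (4.1)
    f_min = min(f_min, xbar @ Q0 @ xbar)
    L_val = x @ Q0 @ x + np.sum((Q0 + P) * Z)   # L(x, Z, P) with g_0 = 0
    if np.linalg.norm(Z, "fro") <= 1e-6:
        break                           # Z^k = 0: x is (nearly) feasible
    P = update_penalty_bqp(P, Z, L_val, f_min)
print("rounded solution:", xbar, "objective:", xbar @ Q0 @ xbar)
```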