Measuring static complexity

B. GOERTZEL
University of Nevada, Las Vegas
Las Vegas NV 89154

(Received August 17, 1990 and in revised form March 1, 1991)

ABSTRACT: The concept of "pattern" is introduced, formally defined, and used to analyze various measures of the complexity of finite binary sequences and other objects. The standard Kolmogorov-Chaitin-Solomonoff complexity measure is considered, along with Bennett's "logical depth", Koppel's "sophistication", and Chaitin's analysis of the complexity of geometric objects. The pattern-theoretic point of view illuminates the shortcomings of these measures and leads to specific improvements; it gives rise to two novel mathematical concepts, "orders" of complexity and "levels" of pattern; and it yields a new measure of complexity, the "structural complexity", which measures the total amount of structure an entity possesses.

KEY WORDS AND PHRASES. Kolmogorov complexity, algorithmic information, pattern, sophistication, structure, depth.

AMS SUBJECT CLASSIFICATION CODE. 68Q30

1. INTRODUCTION

Different contexts require different concepts of "complexity". In the theory of computational complexity, as outlined for instance by Kronsjo [1], one deals with the complexity of problems. And the complexity of evolving systems falls under the aegis of dynamical systems theory, as considered for example by Bowen [2]. The present paper, however, is concerned with the complexity of static objects, a subject which has received rather little attention. Although most of the discussion focuses on binary sequences, the implications are much more general.

The first mathematically precise measure of the complexity of static objects was invented simultaneously by Kolmogorov [3], Chaitin [4] and Solomonoff [5].

DEFINITION 1. Let M be a universal Turing machine. Let us say that a program for M is self-delimiting if it contains a segment telling M its total length in bits.
Then, the KCS complexity of a finite binary sequence x is the length of the shortest self-delimiting program which computes x on M.

In the decades since its conception, this definition has led to a number of interesting developments. Chaitin [4] has used it to provide an interesting new proof of Gödel's theorem; and Bennett [6], Zurek [7] and others have applied it to problems in thermodynamics. However, it has increasingly been realized that the concept of KCS complexity fails to capture the intuitive meaning of "complexity". The problem is that, according to the KCS definition, "random", structureless sequences are judged the most complex. The least complex sequences are those like 000000000...000, 010101010101...010101, and 1010010001000010000...0, which can be computed by very short programs. And the most complex sequences x are those which cannot be computed by any program shorter than "print x". There is a sense in which this is not a desirable property for a definition of complexity to have: a sense in which a human or a tree or the sequence of prime numbers is more "complex" than a random sequence.

Over the past decade, there have been two noteworthy attempts to remedy this deficiency: Bennett's [6] "logical depth", and Koppel's [8] "sophistication". We outline a general mathematical framework within which various measures of complexity may be formulated, analyzed and compared. This approach yields significant modifications of these measures, as well as several novel, general concepts for the analysis of complexity. Furthermore, it gives rise to an entirely new complexity measure, the "structural complexity", which measures the total amount of structure an entity possesses. Intuitively, this tells one "how much there is to say" about a given object.

2. PATTERN

DEFINITION 2. A pattern space is a set (S,*,| |), where S is a set, * is a binary operation defined on some subset of SxS, and | | is a map from S into the nonnegative real numbers.
Let us consider a simple example: Turing machines and finite binary sequences.

DEFINITION 3. Let y be a program for a universal Turing machine; let z be a finite binary sequence. Define y*z to be the binary sequence which appears on the memory tape of the Turing machine after, having been started with z on its input tape beginning directly under the tape head and extending to the right, program y finishes running. If y never stops running, then let y*z be undefined. Let |z| denote the length of z, and let |y| denote the length of the program y.

Now we are ready to give a general definition of pattern.

DEFINITION 4. Let a, b, and c denote constant, nonnegative numbers. Then an ordered pair (y,z) is a pattern in x if x=y*z and a|y| + b|z| + cC(y,z) < |x|, where C(y,z) denotes the complexity of obtaining x from (y,z).

DEFINITION 6. If y is a Turing machine program and z is a finite binary sequence, C(y,z) denotes the number of time steps which the Turing machine takes to stop when equipped with program y and given z as initial input.

For many purposes, the numbers a, b and c are not important. Often they can all be taken to equal 1, so that they do not appear in the formula at all. But in some cases it may be useful to, for instance, set a=b=1 and c=0. Then the formula reads |y| + |z| < |x|. The constants could be dispensed with, but then it would be necessary to redefine | | and C more often.

Intuitively, an ordered pair (y,z) is a pattern in x if the complexity of y, plus the complexity of z, plus the complexity of getting x out of y and z, is less than the complexity of x. In other words, an ordered pair (y,z) is a pattern in x if it is simpler to represent x in terms of y and z than it is to say "x". The constants a, b and c are, of course, weights: if a=3/4 and b=5/4, for example, then the complexity of y is counted less than the complexity of z.
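The inequality in Definition 4 can be checked mechanically once the three quantities |y|, |z| and C(y,z) are known. The following Python sketch is only an illustration under that assumption: the function name is_pattern and its argument names are hypothetical and do not appear in the paper, the sizes used are invented, and the code does not compute y*z or C(y,z) itself.

```python
def is_pattern(len_y, len_z, cost, len_x, a=1.0, b=1.0, c=1.0):
    """Return True when a|y| + b|z| + cC(y,z) < |x| (Definition 4).

    Assumes the caller has already verified that y*z = x; only the
    weighted-size inequality is tested here.
    """
    if min(a, b, c) < 0:
        raise ValueError("the weights a, b and c must be nonnegative")
    return a * len_y + b * len_z + c * cost < len_x


# Hypothetical sizes, chosen only to exercise the inequality.
print(is_pattern(len_y=10, len_z=20, cost=30, len_x=100))        # 60 < 100 holds
print(is_pattern(len_y=10, len_z=20, cost=30, len_x=50))         # 60 < 50 fails
print(is_pattern(len_y=10, len_z=20, cost=30, len_x=50, c=0.0))  # 30 < 50 holds
```

Setting c=0.0 in the last call corresponds to ignoring the cost of obtaining x from (y,z), as in the case a=b=1, c=0 mentioned above.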
The definition of pattern can be generalized to ordered n-tuples, and to take into account the possibility of different kinds of combination, say *_1 and *_2:

DEFINITION 7: An ordered set of n entities (x_1,x_2,...,x_n) is a pattern in x if x = x_1 *_1 x_2 *_2 ... *_{n-1} x_n and a_1|x_1| + a_2|x_2| + ... + a_n|x_n| + a_{n+1}C(x_1,...,x_n) < |x|, where C(x_1,...,x_n) is the complexity of computing x_1 *_1 x_2 *_2 ... *_{n-1} x_n and a_1,...,a_{n+1} are nonnegative numbers.

Also, the following concept will be of use:

DEFINITION 8: The intensity in x of an ordered pair (y,z) such that y*z=x may be defined as

IN[(y,z)|x] = ( |x| - [a|y| + b|z| + cC(y,z)] ) / |x|.

Obviously, this quantity is positive whenever (y,z) is a pattern in x, and negative or zero whenever it is not; and its maximum value is 1.

3. AN EXAMPLE: GEOMETRIC PATTERN

Most of the present paper is devoted to Turing machines and binary sequences. However, the definition of pattern does not involve the theory of computation; essentially, a pattern is a "representation as something simpler". Instead of Turing machines and binary sequences let us now consider pictures. Suppose that A is a one inch square picture, and B is a five inch square picture made up of twenty-five non-overlapping one-inch pictures identical to A. Intuitively, it is simpler to represent B as an arrangement of copies of A than it is to simply consider B as a "thing in itself". Very roughly speaking, it would seem likely that part of the process of remembering what B looks like consists of representing B as an arrangement of copies of A. This intuition may be expressed in terms of the definition of pattern.
Where y and z are square regions, let:
y *_1 z denote the region obtained by placing y to the right of z,
y *_2 z denote the region obtained by placing y to the left of z,
y *_3 z denote the region obtained by placing y below z,
y *_4 z denote the region obtained by placing y above z.

And, although this is obviously a very crude measure, let us define the complexity |x| of a square region with a black-and-white picture drawn in it as the proportion of the region covered with black. Also, let us assume that two pictures are identical if one can be obtained by a rigid motion of the other.

The operations *_1, *_2, *_3 and *_4 may be called simple operations. Compound operations are, then, compositions of simple operations, such as the operation (x *_1 w *_2 x) *_4 w. If y is a compound operation, let us define its complexity |y| to be the length of the shortest program which computes the actual statement of the compound operation. For instance, |(x *_1 w *_2 x) *_4 w| is defined to be the length of the shortest program which outputs the sequence of symbols "(x *_1 w *_2 x) *_4 w".

Where y is a simple operation and z is a square region, let y*z denote the region that results from applying y to z. A compound operation acts on a number of square regions. For instance, (x *_1 w *_2 x) *_4 w acts on w and x both. We may consider it to act on the ordered pair (x,w). In general, we may consider a compound operation y to act on an ordered set of square regions (x_1,x_2,...,x_n), where x_1 is the letter that occurs first in the statement of y, x_2 is the letter that occurs second, etc. And we may define y*(x_1,...,x_n) to be the region that results from applying the compound operation y to the ordered set of regions (x_1,...,x_n).

Let us return to the two pictures, A and B, discussed above. Let q = A *_1 A *_1 A *_1 A *_1 A. Then, it is easy to see that B = q *_4 q *_4 q *_4 q *_4 q. In other words,

B = (A *_1 A *_1 A *_1 A *_1 A) *_4 (A *_1 A *_1 A *_1 A *_1 A) *_4 (A *_1 A *_1 A *_1 A *_1 A) *_4 (A *_1 A *_1 A *_1 A *_1 A) *_4 (A *_1 A *_1 A *_1 A *_1 A).
Where y is the compound operation given in the previous sentence, we have B=y*A. The complexity of that compound operation, |y|, is certainly very close to the length of the program "Let q = A *_1 A *_1 A *_1 A *_1 A; print q *_4 q *_4 q *_4 q *_4 q". Note that this program is shorter than the program "Print (A *_1 A *_1 A *_1 A *_1 A) *_4 (A *_1 A *_1 A *_1 A *_1 A) *_4 (A *_1 A *_1 A *_1 A *_1 A) *_4 (A *_1 A *_1 A *_1 A *_1 A) *_4 (A *_1 A *_1 A *_1 A *_1 A)", so it is clear that the latter should not be used in the computation of |y|.

We have not yet discussed the term C(y,(x_1,...,x_n)), which represents the amount of effort required to execute the compound operation y on the regions (x_1,...,x_n). For simplicity's sake, we shall simply set it equal to the number of times the symbol "*" appears in the statement of y; that is, to the number of simple operations involved in y.

So, is (y,A) a pattern in B? Let us assume that the constants a, b and c are all equal to 1. We know y*A=B; the question is whether |y| + |A| + C(y,A) < |B|.

According to the above definitions, |y| is 37 symbols long. Obviously this is a matter of the particular notation being used. For instance, it would be less if only one character were used to denote *_1, and it would be more if it were written in binary code. C(y,A) is even easier to compute: there are 24 simple operations involved in the construction of B from A. So we have, very roughly speaking, 37 + |A| + 24 < |B|. This is the inequality that must be satisfied if (y,A) is to be considered a pattern in B. Rearranging, we find: |A| < |B| - 61. Recall that we defined the complexity of a region as the proportion of black which it contains. This means that (y,A) is a pattern in B if and only if the amount of black required to draw B exceeds the amount of black required to draw A by more than 61. Obviously, whether or not this is the case depends on the units of measurement.

This is a very simple example, in that the compound operation y involves only one region.
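The operation count used above can be reproduced by writing out the expanded statement of y and counting its "*" symbols. The Python sketch below is an illustration only: it drops the subscripts on the operations, takes the value 37 for |y| directly from the text rather than computing it, and uses hypothetical names (row, expanded, threshold, is_geometric_pattern).

```python
# Expanded statement of the compound operation, without operation subscripts.
row = "(" + "*".join(["A"] * 5) + ")"   # one row of five copies of A
expanded = "*".join([row] * 5)           # five rows joined together

# C(y, A): number of simple operations = occurrences of "*" in the statement.
cost = expanded.count("*")

len_y = 37                 # |y| as stated in the text (notation dependent)
threshold = len_y + cost   # (y, A) is a pattern in B iff |A| < |B| - threshold


def is_geometric_pattern(size_a, size_b):
    """Check |y| + |A| + C(y, A) < |B| with a = b = c = 1."""
    return len_y + size_a + cost < size_b


print(cost, threshold)   # 24 61
```

Each row contributes four operations and joining the five rows contributes four more, which gives the 24 operations counted in the text.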
In general, we may define |(x_1,...,x_n)| = |x_1| + ... + |x_n|, assuming that the amount of black in a union of disjoint regions is the sum of the amounts of black in the individual regions. From this it follows that (y,(x_1,...,x_n)) is a pattern in x if and only if a|y| + b(|x_1| + ... + |x_n|) + cC(y,(x_1,...,x_n)) < |x|.

Results similar to these could also be obtained from a different sort of analysis. In order to deal with regions other than squares, it is desirable to replace *_1, *_2, *_3, *_4 with a single "joining" operation *, namely the set-theoretic union U. Let z=(x_1,...,x_n), let y be a Turing machine, let f be a method for converting a picture into a binary sequence, and let g be a method for converting a binary sequence into a picture. Then we have

DEFINITION 9: If x = x_1 U x_2 U ... U x_n, then (y,z,f,g) is a pattern in x if a|y| + b|z| + c|f| + d|g| + eC(y,z,f,g) < |x|.

We have not said how |f| and |g| are to be defined. However, this would require a detailed consideration of the geometric space containing x, and that would take us too far afield. This general approach is somewhat similar to that taken in Chaitin [9].

4. ORDERS OF COMPLEXITY

It should be apparent from the foregoing that complexity and pattern are deeply interrelated. In this and the following sections, we shall explore several different approaches to measuring complexity, all of which seek to go beyond the simplistic KCS approach. According to the KCS approach, complexity means structurelessness. The most "random", least structured sequences are the most complex. The formulation of this approach was a great step forward. But it seems clear that the next step is to give formulas which capture more of the intuitive meaning of the word "complexity".

First, we shall consider the idea that pattern itself may be used to define complexity. Recall the geometric example of the previous section, in which the complexity of a black-and-white picture in a square region was defined as the amount of black required to draw it. This measure did not even presume to gauge the effort required to represent a black-and-white picture in a square region. One way to measure the effort required to represent such a picture, call it x, is to look at all compound operations y, and all sets of square black-and-white pictures (x_1,...,x_n), such that y*(x_1,...,x_n)=x. One may then ask which y and (x_1,...,x_n) give the smallest value of a|y| + b(|x_1| + ... + |x_n|) + cC(y,(x_1,...,x_n)). This minimal value may be defined to be the "second-order" complexity of x. The second-order complexity is then a measure of how simply x can be represented in terms of compound operations on square regions.

In general, given any complexity measure | |, we may use this sort of reasoning to define a complexity measure | |'.

DEFINITION 10: If | | is a complexity measure, | |' is the complexity measure defined so that, for all x, |x|' is the smallest value that the quantity a|y| + b|z| + cC(y,z) takes on, for any (y,z) such that y*z=x.

|x|' measures how complex the simplest representation of x is, where complexity is measured by | |. Sometimes, as in our geometric example, | | and | |' will measure very different things. But it is not impossible for them to be identical.

Extending this process, one can derive from | |' a measure | |'': the smallest value that the quantity

a|y|' + b|z|' + cC(y,z)     (4.1)

takes on, for any (y,z) such that y*z=x. |x|'' measures the complexity of the simplest representation of x, where complexity is measured by | |'. And from | |'', one may obtain a measure | |'''. It is clear that this process may be continued indefinitely.

It is interesting to ask when | | and | |' are equivalent, or almost equivalent. For instance, assume that y is a Turing machine, and x and z are binary sequences. If, in the notation given above, we let | | be the length measure of Definition 3, then |x|' is a natural measure of the complexity of a sequence x.
In fact, if a=b=1 and c=0, it is exactly the KCS complexity of x. Without specifying a, b and c, let us nonetheless use Chaitin's [4] notation for this complexity: I(x). Also, let us adopt Chaitin's notation I(v|w) for the complexity of v relative to w.

DEFINITION 11. Let y be a Turing machine program, v and w binary sequences; then I(v|w) denotes the smallest value the quantity a|y| + cC(y,w) takes on for any self-delimiting program y that computes v when its input consists of a minimal-length program for computing w.

Intuitively, this measures how hard it is to compute v given complete knowledge of w.

Finally, it should be noted that | | and | |' are not always substantially different:

THEOREM 1. If |x|'=I(x), a=b=1, and c=0, then there is some K so that for all x, | |x|' - |x|'' | < K.

PROOF: a|y|' + b|z|' + cC(y,z) = |y|' + |z|'. So, what is the smallest value that |y|' + |z|' assumes for any (y,z) such that y*z=x? Clearly, this smallest value must be either equal to |x|', or very close to it. For, what if |y|' + |z|' is bigger than |x|'? Then it cannot be the smallest |y|' + |z|', because if one took z to be the "empty sequence" (the sequence consisting of no characters) and then took y to be the shortest program for computing x, one would have |z|'=0 and |y|'=|x|'. And, on the other hand, is it possible for |y|' + |z|' to be smaller than |x|'? If |y|' + |z|' were smaller than |x|', then one could supply a Turing machine with a program saying "Plug the sequence z into the program y," and the length of this program would be greater than |y|' + |z|' by no more than the length of the program P(y,z)="Plug the sequence z into the program y". This length is the constant K in the theorem.

COROLLARY 1. For a Turing machine for which the program P(y,z) mentioned in the proof is a "hardware function" which takes only one unit of length to program, | |'' = | |'.

PROOF: Both | |' and | |'' are integer valued, and by the theorem, for any x, |x|' ≤ |x|'' ≤ |x|' + 1.

5. PATTERNS IN PATTERNS; SUBSTITUTION MACHINES

We have discussed pattern in sequences, and patterns in pictures. It is also quite possible to analyze patterns in other patterns. This is interesting for many reasons, one being that when dealing with machines more restricted than Turing machines, it may often be the case that the only way to express an intuitively simple phenomenon is as a pattern in another pattern.

Let us consider a simple example. Suppose that we are dealing not with Turing machines, but rather with "substitution machines": machines which are capable of running only programs of the form P(A,B,C)="Wherever sequence B occurs in sequence C, replace it with sequence A". Instead of writing P(A,B,C) each time, we shall denote such a program with the symbol (A,B,C). For instance, (1,10001,1000110001100011000110001) = 11111. (A,B,C) should be read "substitute A for B in C".

We may define the complexity |x| of a sequence x as the length of the sequence, as in Definition 3, and the complexity |y| of a substitution program y as the number of symbols required to express y in the form (A,B,C). Then, |1000110001100011000110001| = 25, |11111| = 5, and |(10001,1,z)| = 11. If z=11111, then (10001,1,z) = 1000110001100011000110001.

For example, is ((10001,1,z),11111) a pattern in 1000110001100011000110001? What is required is that a(11) + b(5) + cC((10001,1,z),11111) < 25. If we take a=b=1 and c=0 (thus ignoring time complexity), this reduces to 11 + 5 < 25, so it is indeed a pattern. If we take c=1 instead of c=0, and leave a and b equal to one, then this will still be a pattern, as long as the computational complexity of obtaining 1000110001100011000110001 from (10001,1,11111) is less than 9. It would seem most intuitive to assume that this computational complexity C((10001,1,z),11111) is equal to 5, since there are 5 ones into which 10001 must be substituted, and there is no effort involved in locating these 1's.
In that case the fundamental inequality reads 11 + 5 + 5 < 25, which verifies that a pattern is indeed present.

Now, let us look at the sequence

x = 10010010010010010001110011001001001001001001001011101110100100100100100100110111100100100100100100.

Remember, we are not dealing with general Turing machines, we are only dealing with substitution machines, and anything which cannot be represented in the form (A,B,C), in the notation given above, is not a substitution machine. There are two obvious ways to compute this sequence x on a substitution machine. First of all, one can let y=(100100100100100100,B,z), and z= B 0111001 B 1011101110 B 110111 B. This amounts to recognizing that 100100100100100100 is repeated in x. Alternatively, one can let y'=(100,B,z'), and let z'= BBBBBB 0111001 BBBBBB 1011101110 BBBBBB 110111 BBBBBB. This amounts to recognizing that 100 is a pattern in x. Let us assume that a=b=1, and c=0. Then in the first case |y| + |z| = 24 + 27 = 51; and in the second case |y'| + |z'| = 9 + 47 = 56. Since |x| = 95, both (y,z) and (y',z') are patterns in x.

The problem is that, since we are only using substitution machines, there is no way to combine the two patterns. One may say that 100100100100100100 is a pattern in x, that 100 is a pattern in x, that 100 is a pattern in 100100100100100100. But, using only substitution machines, there is no way to say that the simplest way to look at x is as "a form involving repetition of 100100100100100100, which is itself a repetition of 100".

Let us first consider |x|'. It is not hard to see that, of all (y,z) such that y is a substitution machine and z is a sequence, the minimum of |y| + |z| is obtained when y=(100100100100100100,B,z), and z= B 0111001 B 1011101110 B 110111 B. Thus, assuming as we have that a=b=1 and c=0, |x|'=51. This is much less than |x|, which equals 95.

Now, let us consider this optimal y. It contains the sequence 100100100100100100.
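The substitution-machine examples of this section can be checked with a few lines of code. The Python sketch below is an illustration under stated assumptions: the helper names substitute, program_text and intensity are hypothetical, a substitution program is measured by the number of characters in its written form (A,B,z) with the data z counted as a single symbol, and a=b=1, c=0 throughout.

```python
def substitute(a_seq, b_seq, c_seq):
    """Run the program (A,B,C): replace every occurrence of B in C by A."""
    return c_seq.replace(b_seq, a_seq)


def program_text(a_seq, b_seq, data_name="z"):
    """Written form of the program, with the data shown as one symbol."""
    return "(" + a_seq + "," + b_seq + "," + data_name + ")"


def intensity(len_y, len_z, len_x):
    """Intensity of Definition 8 with a = b = 1 and c = 0."""
    return (len_x - (len_y + len_z)) / len_x


# First example: (1,10001,C) collapses C to 11111, and (10001,1,z) expands it.
c_seq = "10001" * 5
assert substitute("1", "10001", c_seq) == "11111"
assert substitute("10001", "1", "11111") == c_seq
len_y = len(program_text("10001", "1"))
print(len_y, len("11111"), len(c_seq))   # 11 5 25

# Second example: x is four copies of a block separated by three fillers.
block = "100" * 6
fillers = ["0111001", "1011101110", "110111"]
x = block + block.join(fillers) + block

z = "B" + "B".join(fillers) + "B"                        # data for y
z_prime = "BBBBBB" + "BBBBBB".join(fillers) + "BBBBBB"   # data for y'
assert substitute(block, "B", z) == x
assert substitute("100", "B", z_prime) == x

size_first = len(program_text(block, "B")) + len(z)          # 24 + 27
size_second = len(program_text("100", "B")) + len(z_prime)   # 9 + 47
print(len(x), size_first, size_second)   # 95 51 56
```

Under these conventions intensity(24, 27, 95) exceeds intensity(9, 47, 95), which matches the observation that the first representation is the more economical one.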
If we ignore the fact that y denotes a substitution machine, and simply consider the sequence of characters "(100100100100100100,B,z)", we can search for patterns in this sequence, just as we would in any other sequence. For instance, if we let y_1=(100,C,z_1), and z_1=CCCCCC, then y_1*z_1 reproduces the sequence 100100100100100100 contained in y, |y_1|=10, and |z_1|=6. It is apparent that (y_1,z_1) is a pattern in this part of y, since |y_1| + |z_1| = 10 + 6 = 16, whereas the sequence it replaces has length 18. By recognizing the pattern (y,z) in x, and then recognizing the pattern (y_1,z_1) in y, one may express both the repetition of 100100100100100100 in x and the repetition of 100 in 100100100100100100 as patterns in x, using only substitution machines.

Is (y_1,z_1) a pattern in x? Strictly speaking, it is not. But we might call it a second-level pattern in x. It is a pattern in a pattern in x. And, if there were a pattern (y_2,z_2) in the sequences of symbols representing y_1 or z_1, we could call that a third-level pattern in x, etc.

In general, we may make the following definition:

DEFINITION 12. Let F be a map from S into S. Where a first-level pattern in x is simply a pattern in x, and n is an integer greater than one, we shall say that P is an nth-level pattern in x if there is some Q so that Q is an (n-1)th-level pattern in x and P is a pattern in F(Q).

In the examples we have given, the map F has been, implicitly, the map from substitution machines into their expression in (A,B,C) notation.

6. APPROXIMATE PATTERN

Suppose that y_1*z_1=x, whereas y_2*z_2 does not equal x, but is still very close to x. Say |x|=1000. Then, even if |y_1|+|z_1|=900 and |y_2|+|z_2|=10, (y_2,z_2) is not a pattern in x, but (y_1,z_1) is. This is not a flaw in the definition of pattern; after all, computing something near x is not the same as computing x. Indeed, it might seem that if (y_2,z_2) were really so close to computing x, it could be modified into a pattern in x without sacrificing much simplicity. However, the extent to which this is the case is unknown.
In order to incorporate pairs like (y_2,z_2), we shall introduce the notion of approximate pattern. In order to deal with approximate pattern, we must assume that it is meaningful to talk about the distance d(x,y) between two elements of S. Let (y,z) be any ordered pair for which y*z is defined. Then we have

DEFINITION 13. The ordered pair (y,z) is an approximate pattern in x if [1 + d(x,y*z)][a|y| + b|z| + cC(y,z)] < |x|, where a, b, c and C are defined as in the ordinary definition of pattern.

Obviously, when x=y*z, the distance d(x,y*z) between x and y*z is equal to zero, and the definition of approximate pattern reduces to the normal definition. And the larger d(x,y*z) gets, the smaller a|y| + b|z| + cC(y,z) must be in order for (y,z) to qualify as a pattern in x. Of course, if the distance measure d is defined so that d(u,v) is infinite whenever u and v are not the same, then an approximate pattern is an exact pattern. This means that when one speaks of "approximate pattern", one is also speaking of ordinary, exact pattern.

Most concepts involving ordinary or "strict" pattern may be generalized to the case of approximate pattern. For instance, we have:

DEFINITION 14: The intensity of an approximate pattern (y,z) in x is IN[(y,z)|x] = ( |x| - [1 + d(x,y*z)][a|y| + b|z| + cC(y,z)] ) / |x|.

DEFINITION 15: Where v and w are binary sequences, the approximate complexity of v relative to w, I_a(v|w), is the smallest value that [1 + d(v,y*w)][a|y| + cC(y,w)] takes on for any program y with input consisting of a minimal program for w.

The incorporation of inexactitude permits the definition of pattern to encompass all sorts of interesting practical problems. For example, suppose x is a curve in the plane or some other space, z is a set of points in that space, and y is some interpolation formula which assigns to each set of points a curve passing through those points.
Then IN[(y,z)|x] is an indicator of how much use it is to approximate the curve x by applying the interpolation formula y to the set of points z.

7. SOPHISTICATION AND CRUDITY

As indicated above, Koppel [8] has recently proposed an alternative to the KCS complexity measure. According to Koppel's measure, the sequences which are most complex are not the structureless ones. Neither, of course, are they the ones with very simple structures, like 00000000000.... Rather, the more complex sequences are the ones with more "sophisticated" structures.

The basic idea [10] is that a sequence with a sophisticated structure is part of a natural class of sequences, all of which are computed by the same program. The program produces different sequences depending on the data it is given, but these sequences all possess the same underlying structure. Essentially, the program represents the structured part of the sequence, and the data the random part. Therefore, the "sophistication" of a sequence x should be defined as the size of the program defining the "natural class" containing x.

But how is this "natural" program to be found? As above, where y is a program and z is a binary sequence, let |y| and |z| denote the length of y and z respectively. Koppel proposes the following:

ALGORITHM 1:
1) Search over all pairs of binary sequences (y,z) for which the two-tape Turing machine with program y and data z computes x, and find those pairs for which |y| + |z| is smallest.
2) Search over all pairs found in Step 1, and find the one for which |y| is biggest. This value of |y| is the "sophistication" of x.

All the pairs found in Step 1 are "best" representations of x. Step 2 searches all the "best" representations of x, and finds the one with the most program (as opposed to data). This program is assumed to be the natural structure of x, and its length is therefore taken as a measure of the sophistication of the structure of x.
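Algorithm 1 quantifies over all program/data pairs and so cannot be run as stated; restricted to a finite list of candidate pairs already known to compute x, however, its two steps are easy to express. The Python sketch below makes that restriction explicit. The function name sophistication and the candidate lengths are hypothetical, and each pair is represented only by its lengths (|y|, |z|).

```python
def sophistication(candidates):
    """Two-step selection of Algorithm 1 over a finite candidate list.

    candidates: list of (len_y, len_z) for pairs (y, z) assumed to compute x.
    Returns the selected pair; its first entry plays the role of the
    sophistication of x relative to this list.
    """
    if not candidates:
        raise ValueError("at least one candidate pair is required")
    # Step 1: keep the pairs minimizing |y| + |z|.
    best_total = min(len_y + len_z for len_y, len_z in candidates)
    best = [(len_y, len_z) for len_y, len_z in candidates
            if len_y + len_z == best_total]
    # Step 2: among those, take the pair with the largest |y|.
    return max(best, key=lambda pair: pair[0])


# Hypothetical lengths: three pairs tie at total 120, one is longer.
pairs = [(20, 100), (60, 60), (45, 75), (90, 40)]
print(sophistication(pairs))   # (60, 60)
```

The pair (90, 40) has the longest program but is discarded in Step 1 because its total length is not minimal.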
There is no doubt that the decomposition of a sequence into a structured part and a random part is an important and useful idea. But Koppel's algorithm for achieving it is conceptually problematic. Suppose the program/data pairs (y_1,z_1) and (y_2,z_2) both cause a Turing machine to output x, but whereas |y_1|=50 and |z_1|=300, |y_2|=250 and |z_2|=110. Since |y_1| + |z_1| = 350, whereas |y_2| + |z_2| = 360, (y_2,z_2) will not be selected in Step 1, which searches for those pairs (y,z) that minimize |y| + |z|. What if, in Step 2, (y_1,z_1) is chosen as the pair with maximum |y|? Then the sophistication of x will be set at |y_1|=50. Does it not seem that the intuitively much more sophisticated program y_2, which computes x almost as well as y_1, should count toward the sophistication of x?

In the language of pattern, what Koppel's algorithm does is:
1) Locate the pairs (y,z) that are the most intense patterns in x according to the length measure | |, with a=b=1, c=0.
2) Among these pairs, select the one which is the most intense pattern in x according to the length measure | |, with a=1, b=c=0.
It applies two different special cases of the definition of pattern, one after the other.

How can all this be modified to accommodate examples like the pairs (y_1,z_1), (y_2,z_2) given above? One approach is to look at some sort of combination of |y| + |z| with |y|. |y| + |z| measures the combined length of program and data, and |y| measures the length of the program. What is desired is a small |y| + |z| but a large |y|. This is some motivation for looking at (|y| + |z|)/|y|. The smaller |y| + |z| gets, the smaller this quantity gets; and the bigger |y| gets, the smaller it gets. One approach to measuring complexity, then, is to search all (y,z) such that x=y*z, and pick the one which makes (|y| + |z|)/|y| smallest. Of course, (|y| + |z|)/|y| = 1 + |z|/|y|, so whatever makes (|y| + |z|)/|y| smallest also makes |z|/|y| smallest. Hence, in this context, the following is natural:

DEFINITION 16. The crudity of a pattern (y,z) is |z|/|y|.
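Applied to the two pairs discussed above, Definition 16 selects differently from Algorithm 1. The following Python sketch is illustrative only; the names crudity and least_crude are hypothetical, and the pairs are represented by their lengths (|y|, |z|) as given in the text.

```python
def crudity(len_y, len_z):
    """Crudity |z| / |y| of a pattern (y, z): the ratio of data to program."""
    if len_y <= 0:
        raise ValueError("the program length |y| must be positive")
    return len_z / len_y


def least_crude(candidates):
    """Return the (len_y, len_z) pair with the smallest crudity."""
    return min(candidates, key=lambda pair: crudity(*pair))


pair_1 = (50, 300)    # |y_1| = 50,  |z_1| = 300, total 350
pair_2 = (250, 110)   # |y_2| = 250, |z_2| = 110, total 360

print(crudity(*pair_1))                 # 6.0
print(crudity(*pair_2))                 # 0.44
print(least_crude([pair_1, pair_2]))    # (250, 110)
```

Here the pair with total length 360 is preferred to the one with total length 350, because far more of its length is program than data.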
The crudity is simply the ratio of data to program. The cruder a pattern is, the greater the proportion of data to program. A very crude pattern is mostly data; and a pattern which is mostly program is not very crude. Obviously, "crudity" is intended as an intuitive opposite to "sophistication"; however, it is not exactly the opposite of "sophistication" as Koppel defined it.

This approach can also be interpreted to assign each x a "natural program" and hence a "natural class". One must simply look at the pattern (y,z) in x whose crudity is the smallest. The program y associated with this pattern is, in a sense, the most natural program for x.

8. LOGICAL DEPTH

Bennett [6], as mentioned above, has proposed a complexity measure called "logical depth", which incorporates the time factor in an interesting way. The KCS complexity of x measures only the length of the shortest program required for computing x; it says nothing about how long this program takes to run. Is it really correct to call a sequence of length 1000 simple if it can be computed by a short program which takes a thousand years to run? Bennett's idea is to look at the running time of the shortest program for computing a sequence x. This quantity he calls the logical depth of the sequence.

One of the motivations for this approach was a desire to capture the sense in which a biological organism is more complex than a random sequence. Indeed, it is easy to see that a sequence x with no patterns in it has the smallest logical depth of any sequence. The shortest program for computing it is "Print x", which obviously runs faster than any other program computing a sequence of the same length as x. And there is no reason to doubt the hypothesis that biological organisms have a high logical depth. But it seems to us that, in some ways, Bennett's definition is nearly as counterintuitive as the KCS approach.

Suppose there are two competing programs for computing x, program y and program y'.
What if y has a length of 1000 and a running time of 10 minutes, but y' has a length of 999 and a running time of 10 years? Then if y' is the shortest program for computing x, the logical depth of x is ten years. Intuitively, this doesn't seem quite right: it is not the case that x fundamentally requires ten years to compute.

At the core of Bennett's measure is the idea that the shortest program for computing x is the most natural representation of x. Otherwise why would the running time of this particular program be a meaningful measure of the amount of time x requires to evolve naturally? But one may define the "most natural representation" of a given entity in many different ways. Bennett's is only the simplest. For instance, one may study the quantity dC(y,z) + e|z|/|y| + f(|y| + |z|), where d, e and f are nonnegative constants defined so that d+e+f=3. The motivation for this is as follows. The smaller |z|/|y| is, the less crude is the pattern (y,z). And, as indicated above, the crudity of a pattern (y,z) may be interpreted as a measure of how natural a representation it is. The smaller C(y,z) is, the less time it takes to get x out of (y,z). And, finally, the smaller |y| + |z| is, the more intense a pattern (y,z) is. All these facts suggest the following:

DEFINITION 17: Let m denote the smallest value that the quantity dC(y,z) + e|z|/|y| + f(|y| + |z|) assumes for any pair (y,z) such that x=y*z (assuming there is such a minimum value). The depth complexity of x may then be defined as the time complexity C(y,z) of the pattern (y,z) at which this minimum m is attained.

Setting d=e=0 reduces the depth complexity to the logical depth as Bennett defined it. Setting e=0 means that everything is as Bennett's definition would have it, except that cases such as the patterns (y_1,z_1), (y_2,z_2) described above are resolved in a more intuitive manner.
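Definition 17 can likewise be sketched over a finite list of candidate representations, which is an assumption made here for illustration, since the definition itself ranges over all pairs (y,z) with x=y*z. In the Python sketch below the name depth_complexity and the candidate values are hypothetical; each candidate is a triple (|y|, |z|, C(y,z)), with running time in arbitrary units.

```python
def depth_complexity(candidates, d=1.0, e=1.0, f=1.0):
    """Running time C(y,z) of the candidate minimizing
    d*C(y,z) + e*|z|/|y| + f*(|y| + |z|)   (Definition 17).

    candidates: list of (len_y, len_z, cost) for pairs assumed to satisfy
    x = y*z, with len_y > 0.
    """
    def score(candidate):
        len_y, len_z, cost = candidate
        return d * cost + e * len_z / len_y + f * (len_y + len_z)

    return min(candidates, key=score)[2]


# Hypothetical candidates modeled on two competing programs of lengths
# 1000 and 999 with running times of 10 and 5,000,000 units, and no data.
candidates = [(1000, 0, 10), (999, 0, 5_000_000)]

print(depth_complexity(candidates, d=0.0, e=0.0, f=3.0))   # 5000000
print(depth_complexity(candidates, d=1.0, e=1.0, f=1.0))   # 10
```

With d=e=0 the shortest program alone determines the result, as in Bennett's definition; once running time carries weight, the marginally shorter but far slower program is no longer selected.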
Setting $f = 0$ means that one is considering the time complexity of the most sophisticated (least crude, most structured) representation of x, rather than merely the shortest. And keeping all the constants nonzero ensures a balance between time, space, and sophistication.

Admittedly, this approach is not nearly so tidy as Bennett's. Its key shortcoming is its failure to yield any particular number of crucial significance; everything depends on various factors which may be given various weights. But there is something to be said for considering all the relevant factors.

9. STRUCTURE AND STRUCTURAL COMPLEXITY

We have discussed several different measures of static complexity, which measure rather different things. But all these measures have one thing in common: they work by singling out the one pattern which minimizes some quantity. It is equally interesting to study the total amount of structure in an entity. For instance, suppose $x_1$ and $x_2$ both have KCS complexity A, but whereas $x_1$ can only be computed by one program of length A, $x_2$ can be computed by a hundred totally different programs of length A. Does it not seem that $x_2$ is in some sense more complex than $x_1$, that there is more to $x_2$ than to $x_1$?

Let us define the structure of x as the set of all (y,z) which are approximate patterns in x (assuming the constants a, b, and c, and the metric $d(v,w)$, have previously been fixed), and denote it $P(x)$. Then the question is: what is a meaningful way to measure the size of $P(x)$?

At first one might think to add up the intensities $[1 + d(y*z, x)]\,[a|y| + b|z| + cC(y,z)]$ of all the elements in $P(x)$. But this approach has one crucial flaw, revealed by the following example. Say x is a sequence of 10,000 characters, and $(y_1,z_1)$ is a pattern in x with $|z_1| = 70$, $|y_1| = 1000$, and $C(y_1,z_1) = 2000$. Suppose that $y_1$ computes the first 1000 digits of x from the first 7 digits of $z_1$, according to a certain algorithm A.
And suppose it computes the second 1000 digits of x from the next 7 digits of $z_1$, according to the same algorithm A. And so on for the third 1000 digits of x, etc., always using the same algorithm A.

Next, consider the pair $(y_2,z_2)$ which computes the first 9000 digits of x in the same manner as $(y_1,z_1)$, but computes the last 1000 digits of x by storing them in $z_2$ and printing them after the rest of its program finishes. We have $|z_2| = 9 \cdot 7 + 1000 = 1063$, and surely $|y_2|$ is not much larger than $|y_1|$. Let us say $|y_2|$ exceeds $|y_1|$ only slightly. Furthermore, $C(y_2,z_2)$ is certainly no greater than $C(y_1,z_1)$: after all, the change from $(y_1,z_1)$ to $(y_2,z_2)$ involved the replacement of serious computation with simple storage and printing.

The point is that both $(y_1,z_1)$ and $(y_2,z_2)$ are patterns in x, but in computing the total amount of structure in x, it would be foolish to count both of them. In general, the problem is that different patterns may share similar components, and it is unacceptable to count each of these components several times. In the present example the solution is easy: don't count $(y_2,z_2)$. But one may also construct examples of very different patterns which have a significant, sophisticated component in common. Clearly, what is needed is a general method of dealing with similarities between patterns.

Recall that $I_a(v|w)$ was defined as the approximate version of the effort required to compute v from a minimal program for w, so that if v and w have nothing in common, $I_a(v|w) = I_a(v)$. And, on the other hand, if v and w have a large common component, then both $I_a(v|w)$ and $I_a(w|v)$ are very small. $I_a(v|w)$ is defined only when v and w are sequences. But we shall also need to talk about one program being similar to another. In order to do this, it suffices to assume some standard "programming language" L, which assigns to each program y a certain binary sequence $L(y)$.
The specifics of L are irrelevant, so long as it is computable on a Turing machine, and it does not assign the same sequence to any two different programs.

The introduction of a programming language L permits us to define the complexity of a program y as $I_a(L(y))$, and to define the complexity of one program $y_1$ relative to another program $y_2$ as $I_a(L(y_1)\,|\,L(y_2))$. As the lengths of the programs involved increase, the differences between programming languages matter less and less. To be precise, let $L$ and $L_1$ be any two programming languages, computable on Turing machines. Then it can be shown that, as the lengths of $L(y_1)$ and $L(y_2)$ approach infinity, the ratios $I_a(L(y_1))/I_a(L_1(y_1))$ and $I_a(L(y_1)\,|\,L(y_2))/I_a(L_1(y_1)\,|\,L_1(y_2))$ both approach 1.

Where z is any binary sequence of length n, let $D(z)$ be the binary sequence of length 2n obtained by replacing each 1 in z with 01, and each 0 in z with 10. Where w and z are any two binary sequences, let wz denote the sequence obtained by placing the sequence 111 at the end of $D(w)$, and placing $D(z)$ at the end of this composite sequence. The point is that 111 cannot occur in either $D(z)$ or $D(w)$, so that wz is essentially w juxtaposed with z, with 111 as a marker in between.

Now, we may define the complexity of a program-data pair (y,z) as $I_a(L(y)z)$, and we may define the complexity of (y,z) relative to $(y_1,z_1)$ as $I_a(L(y)z\,|\,L(y_1)z_1)$. We may define the complexity of (y,z) relative to a set of pairs $\{(y_1,z_1),(y_2,z_2),\ldots,(y_k,z_k)\}$ to be $I_a(L(y)z\,|\,L(y_1)z_1\,L(y_2)z_2 \ldots L(y_k)z_k)$. This is the tool we need to make sense of the phrase "the total amount of structure of x".

Let S be any set of program-data pairs (y,z). Then we may define the size $|S|$ of S as the result of the following process:

ALGORITHM 2:
Step 0. Make a list of all the patterns in S, and label them $(y_1,z_1), (y_2,z_2), \ldots, (y_N,z_N)$.
Step 1. Let $s_1(x) = I_a(L(y_1)z_1)$.
Step 2. Let $s_2(x) = s_1(x) + I_a(L(y_2)z_2\,|\,L(y_1)z_1)$.
Step 3. Let $s_3(x) = s_2(x) + I_a(L(y_3)z_3\,|\,L(y_1)z_1\,L(y_2)z_2)$.
Step 4. Let $s_4(x) = s_3(x) + I_a(L(y_4)z_4\,|\,L(y_1)z_1\,L(y_2)z_2\,L(y_3)z_3)$.
...
Step N. Let $|S| = s_N(x) = s_{N-1}(x) + I_a(L(y_N)z_N\,|\,L(y_1)z_1\,L(y_2)z_2 \ldots L(y_{N-1})z_{N-1})$.

At the kth step, only that portion of $(y_k,z_k)$ which is independent of $\{(y_1,z_1),\ldots,(y_{k-1},z_{k-1})\}$ is added onto the current estimate of $|S|$. For instance, in Step 2, if $(y_2,z_2)$ is independent of $(y_1,z_1)$, then this step increases the initial estimate of $|S|$ by the complexity of $(y_2,z_2)$. But if $(y_2,z_2)$ is highly dependent on $(y_1,z_1)$, not much will be added onto the first estimate. It is not difficult to see that this process will arrive at the same answer regardless of the order in which the $(y_i,z_i)$ appear:

THEOREM 2: The result of Algorithm 2 is invariant under permutation of the $(y_i,z_i)$.

Where $P(x)$ is the set of all patterns in x, we may now define the structural complexity of x to be the quantity $|P(x)|$. This, we suggest, is the sense of the word "complexity" that one uses when one says that a person is more complex than a tree, which is more complex than a bacterium. In a way, structural complexity measures how many insightful statements can possibly be made about something. There is much more to say about a person than about a tree, and much more to say about a tree than a bacterium.
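The doubling code D, the 111 marker, and the accumulation in Algorithm 2 can be sketched in Python as follows. The relative complexity used by the algorithm is not computable, so this sketch substitutes a crude stand-in that is purely an assumption of the example: the number of extra bytes `zlib` needs to compress a sequence after the sequences already seen. Programs are taken to be given directly as bit strings, standing in for $L(y)$. The helper names (`join`, `split`, `approx_complexity`, `size_of_pattern_set`, `demo_bits`) are hypothetical, and the order invariance asserted in Theorem 2 is not guaranteed for this compression-based proxy.

```python
import hashlib
import zlib


def D(bits: str) -> str:
    """Double each bit: 1 -> 01, 0 -> 10, so '111' never occurs in the output."""
    return "".join("01" if b == "1" else "10" for b in bits)


def join(w: str, z: str) -> str:
    """The sequence 'wz' of the text: D(w), then the marker 111, then D(z)."""
    return D(w) + "111" + D(z)


def split(joined: str):
    """Invert join() by reading two bits at a time until the invalid pair '11'."""
    inverse = {"01": "1", "10": "0"}
    i, w = 0, []
    while joined[i:i + 2] != "11":
        w.append(inverse[joined[i:i + 2]])
        i += 2
    rest = joined[i + 3:]  # skip the three marker bits
    z = [inverse[rest[j:j + 2]] for j in range(0, len(rest), 2)]
    return "".join(w), "".join(z)


def approx_complexity(v: str, given: str = "") -> int:
    """Assumed proxy for the complexity of v relative to `given`:
    extra compressed bytes needed for v once `given` has been seen."""
    base = len(zlib.compress(given.encode("ascii"), 9))
    both = len(zlib.compress((given + v).encode("ascii"), 9))
    return max(0, both - base)


def size_of_pattern_set(pairs) -> int:
    """Algorithm 2: for each pair in turn, add only the part that the
    previously listed pairs do not already account for."""
    total, seen = 0, ""
    for program_bits, data_bits in pairs:
        encoded = join(program_bits, data_bits)
        total += approx_complexity(encoded, seen)
        seen += encoded
    return total


def demo_bits(seed: bytes) -> str:
    """Illustrative helper: 256 deterministic, hard-to-compress bits."""
    return "".join(f"{byte:08b}" for byte in hashlib.sha256(seed).digest())


pair = (demo_bits(b"program"), demo_bits(b"data"))
once = size_of_pattern_set([pair])
twice = size_of_pattern_set([pair, pair])  # the repeated pair adds little
```

Listing the same pair twice increases the estimate by far less than the pair's own size, which mirrors the remark that a pattern highly dependent on the earlier ones contributes little to the total.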


Aug 23, 2007
14 pages

Publisher
Hindawi Publishing Corporation
Copyright
Copyright © 1992 Hindawi Publishing Corporation.

Abstract

University of Nevada, Las Vegas Las Vegas NV 89154 (Received August 17, 1990 and in revised form March 1, 1991} ABSTRACT: The concept of "pattern" is introduced, formally defined, and used to analyze various measures of the complexity of finite binary sequences and other objects. The standard Kolmogoroff.Chaitin.Solomonoff complexity measure is considered, along with Bennett’s ’logical depth; Koppel’s "sophistication’, and Chaitin’s analysis of the complexity of geometric objects. The pattern.theoretic point of view illuminates the shortcomings of these measures and leads to specific improvements, it giles rise to two novel mathematical concepts "orders" of complexity and "levels" of pattern, and it yields a new measure of complexity, the "structural complexity’, which measures the total amount of structure an entity possesses. KEY WORDS AND PHRASES. Kolmogorov complexity, algorithritic Information, pattern, sophistication, structure, depth AMS SUBJECT CLASSIFICATION CODE. 68Q30 1. INTRODUCTION Different contexts require different concepts of "complexity’. In the theory of computational complexity, as outlined for instance by Kronsjo [1], one deals with the complexity of problems. And the complexity of evolving systems falls under the aegis of dynamical systems theory, as considered for example by Bowen [2]. The present paper, however, is concerned with the complexity of static objects, a subject which has receiled rather little attention. Although most of the discussion focuses on binary sequences, the implications are much more general. The first mathematically precise measure of the complexity of static objects was invented simultaneously by Kolmogorov [3], Chaitin [4] and Solomonoff [5]. DEFINITION 1. Let M be a universal Turing machine. Let us say that a program for M is serf.delimiting if it contains a segment telling M its total length in bits. Then, the KCS complexity of a finite binary sequence x is the length of the shortest program which computes x on M. 
In the decades since its conception, this definition has led to a number of Interesting developments. Chaitin [4] has used it to provide an Interesting new proof of Godel’s theorem; and Bennett [6], Zurek [7] and others have applied it to problems in thermodynamics. However, it has Increasingly been realized that the concept of KCS complexity falls to capture the Intuitive meaning of "complexity." The problem is that, according to the KCS definition, "random", structureless sequences are judged the most complex. The least complex sequences are those like O(X)IX)O000...O00, 010101010101...010101, and 1010010001000010(XXX)...O, which can be computed by very short programs. And the most complex sequences x are those which cannot be computed by any program shorter than "print X’. There is a sense in which this is not a desirable property for a definition of complexity to have In which a human or a ree or the sequence of prime numbers is more "complex" than a random sequence. Over the past decade, there have been two noteworthy attempts to remedy this deficiency: Bennett’s [6] "logical depth", and Koppel’s [8] "sophistication." We outline a general mathematical framework within which various measures of complexity may be formulated, analyzed and compared. This approach yields significant modifications of these measures, as well as several novel, general concepts for the analysis of complexity. Furthermore, it giles rise to an entirely new complexity measure, the "structural complexity’, which measures the total amount of structure an entity possesses. Intuitilely, this tells one "how much there is to say" about a gilen object. 2. PATTERN DEFINITION 2. A pattern space is a set (S,*,l I), where S is a set, * is a binary is a map from S into the nonnegatile operation defined on some subset of SxS, and real numbers. Let us consider a simple example: Turing machines and finite binary sequences. DEFINITION 3. 
Let y be a program for a unilersal Turing machine; let be a finite binary sequence. Define y*z to be the binary sequence which appears on the memory tape of the Turing machine after, having been started with on its input tape beginning directly under the tape head and extending to the right, program y finishes running. If y never Mops running, then let y*z be undefined. Let zl denote the length of as Its length, and let yl denote the length of the program y. Now we are ready to gile a general definItion of pattern. DEFINITION 4. Let a, b, and c denote constant, nonnegative numbers. Then an ordered pair (y,z) is a paftern In x if x=y*z and alYl + blzl + cC(y,z) < Ixl, where C(y,z) denotes the complexity of obtaining x from (y,z). DEFINITION 6. ff y is a Turing machine program and is a finIte binary sequence, C(y,z) denotes the number of time steps which the Turlng machine takes to stop when equipped with program y and given z as initial input. For many purposes, the numbers a, b and c are not Important. Often they can all be taken to equal 1, so that they do not appear in the formula at all. But in some cases it may be useful to, for instance, set a=b= 1 and c=O. Then the formula reads Yl + Izl Ixl. and The constants could be dispensed with, but then it would be necessary to redefine C more often. Intuitively, an ordered pair (y,z) is a pattern in x if the complexity of y, plus the complexity of z, plus the complexity of getting x out of y and z, is less than the complexity of x. In other words, an ordered pair (y,z) is a pattern in x if it is simpler to represent x in terms of y and z than it is to say "x’. The constants a, b and c are, of course, weights: If a=3/4 and b=5/4, for example, then the complexity of a is counted less than the complexity of b. The definition of pattern can be generalized to ordered n.tuples, and to take into account the possibility of different kinds of combination, say *, and *: DEFINITION 7: An ordered set of n entities (x, ,x2 ,...,x. 
) is a pattern in x if x=x,*, x:*: ...*. x. and a, lx, +a:lx:l +...+a.lx.I + a./, C(x, ,...,x. ) < Ixl, where C(x, ,...,x. ) is the complexity of computing x,*x,*...*x, and a, ,...,a., are nonnegative numbers. Also, the following concept will be of use: DEFINITION 8: The intensity in x of a ordered pair (y,z) such that y*z=x may be defined as IN[(y,z)lx] Ixl -/alyl bizl /cC(y,z)])/Ixl Obviously, this quantity is positive whenever (y,z) is a pattern in x, and negative or zero whenever it is not; and its maximum value is 1. 3. AN EXAMPLE: GEOMETRIC PATTERN Most of the present paper is devoted to Turing machines and binary sequences. However, the definition of pattern does not Involve the theory of computation; essentially, a pattern is a "representation as something simple’. Instead of Turlng machines and binary sequences let us now consider pictures. Suppose that A is a one inch square picture, and B is a five inch square picture made up of twenty.five non.overlapping one. inch pictures identical to A. Intuitively, it is simpler to represent B as an arrangement of copies of A, than it is to simply consider B as a "thing in itself’. Very roughly speaking, it would seem likely that part of the process of remembering what B looks like consists of representing B as an arrangement of copies of A. This Intuition may be expressed in terms of the definition of pattern. Where x and y are square regions, let: y *, z denote the region obtained by placing y to the right of z Y "2 z denote the region obtained by placing y to the left of z y z denote the region obtained by placing y below z y *, z denote the region obtained by placing y above z And, although this is obviously a very crude measure, let us define the complexity Ix of a square region with a black.and.white picture drawn in it as the proportion of the region covered with black. Also, let us assume that two pictures are identical if one can be obtained by a rigid motion of the other. 
*, may be called simple operations. * andsimple operations, such as the operation Compound operations are, then, compositions of (x*, w*, x)*, The operations *,, "2 w. If y is a compound operation, let us define its complexity Yl to be the length of the shortest program which computes the actual statement of the compound operation. For Instance, }(x* w* x)*, w is defined to be the length of the shortest program which outputs the sequence of symbols "(x* w* x)*, w’. Where y is a simple operation and z is a square region, let y*z denote the region that results from applying y to z. A compound operation acts on a number of square regions. For instance, (x*, w*, x)*, w acts on w and x both. We may consider it to act on the ordered pair (x,w). In general, we may consider a compound operation y to act on an ordered set of square regions (x, ,x, xn ), where x, is the letter that occurs first in the statement of y, x2 is the letter that occurs second, etc. And we may define y*(x, xn ) to be the region that results from applying the compound operation y to the ordered set of regions (x, x ). Let us return to the two pictures, A and B, discussed above. Let q=A*, A*, A*, A*, A. Then, it is easy to see that B=q*,q*,q*,q*,q. In other words, B (A*, A *, A*, A*, A)*,(A*, A*, A*, A*, A)*,(A*, A*, A*, A*, A)*, (A*, A*, A*, A*, A)*,(A*, A*, A*, A*, A). Where y is the compound operation given in the previous sentence, we have B=y*A. The complexity of that compound operation, lYl, is certainly very close to the length of the program "Let q=A *, A *, A *, A *, A; print q*,q*,q*,q*’. Note that this program is shorter than the program "Print(A*, A*, A*, A*, A)*,(A*, A*, A*, A*, A)*,(A*, A*, A*, A*, A)*(A*, A*, A*, A*, )*, (A*, A*, A*, A*, A)", so it is clear that the latter should not be used in the computation of Yl- We have not yet discussed the term C(y,(B, ,B. )), which represents the amount of effort required to execute the compound operation y on the regions (x, ,...,x. ). 
For simplicity’s sake, we shall simply set it equal to the number of times the symbol "*" appears in the statement of y; that is, to the number of simple operations involved in y. So, is (y,A) a pattern in B? Let us assume that the constants a, b and c are all equal to 1. We know y*A=B; the question is whether lYI / IAI /C(y,A) < IBI. According to the above definitions, Yl is 37 symbols long. Obviously this is a matter of the particular notation being used. For instance, it would be less if only one character were used to denote *,, and it would be more if it were written in binary code. C(y,z) is even easier to compute: there are 24 simple operations involved in the construction of B from A. So we have, very roughly speaking, 37 + Izl + 24 < Ixl. This is the inequality that must be satisfied ff (y,z) is to be considered a pattern in x. Rearranging, we find: Izl < Ixl 61. Recall that we defined the complexity of a region as the proportion of black which it contains. This means that (y,z) is a pattern in x if and only if it the amount of black required to draw B exceeds amount of black required to draw A by at least 62. Obviously, whether or not this is the case depends on the units of measurement. This is a very simple example, in that the compound operation y involves only one MEASURING S’rATIC COMPIEXI’IY 1)5 region. In general, we may define I(x, ,x, )1 {x,I +...+ Ixnl, assuming that the amount of black in a union of disjoint regions is the sum of the amounts of black in the individual regions. From this it follows that (y,(x, xn )) is a pattern in x if and only if a lYl + b(Ix, l/.../lx.I) / cC(y,(x, ,...,x.)) < Ixl. Results similar to these could also be obtained from a different sort of analysis. In order to deal with regions other than squares, it is desirable to replace *,, *,, %, with a single "joining" operation *, namely the the set.theoretic union U. 
Let z=(x,, xn ), let ** y be a Turing machine, let f be a method for converting a picture into a binary sequence, and let g be a method for converting a binary sequence into a picture. Then we have DEFINITION 9: If x U xn then (y,z,f,g) is a pattern in x if x, U x U alYl +blzl +clfl +dlgl +eC(y,z,f,g) < Ixl. We have not said how Ill and gl are to be defined. However, this would require a detailed consideration of the geometric space containing x, and that would take us too far afield. This general approach is somewhat similar to that taken in Chaitin [9]. 4. ORDERS OF COMPLEXITY It should be apparent from the foregoing that complexity and pattern are deeply interrelated. In this and the following sections, we shall explore several different approaches to measuring complexity, all of which seek to go beyond the simplistic KCS approach. According to the KCS approach, complexity means structurelessness. The most "random; least structured sequences are the most complex. The formulation of this approach was a great step forward. But it seems clear that the next step is to give formulas which capture more of the intuitive meaning of the word "complexity". First, we shall consider the idea that pattern Itself may be used to define complexity. Recall the geometric example of the previous section, in which the complexity of a black. and.white picture in a square region was defined as the amount of black required to draw gauge the effort required to represent a black. and.white picture in a square region. One way to measure the effort required to represent such a picture, call it x, is to look at all compound operations y, and all sets of square black.and.white pictures (x, ,...,x, ), such that y*(x, ,...,x. )=x. One may then ask which y and (x, x,) give the smalleMvalue of alYl + b(Ix, + + Ix.l) / cc(y,(x, ,...,x.)). Th#s minimal value of alYl + b(Ix, l+...+lx, I) may be defined to be the "second.order" complexity of x. 
The second.order complexity is then be a measure of how simply x can be represented in terms of compound operations on square regions. In general, given any complexity measure I, we may use this sort of reasoning to define a complexity measure I’. is a complexity measure, DEFINITION 10: If I’ is the complexity measure defined so that, for all x, Ixl’ is the smallest value that the quantity alYl + blzl + cC(y,z) takes on, for any (y,z) such that y*z=x. xl’ measures how complex the simplest representation of x is, where complexity and I’ will measure is measured by I. Sometimes, as in our geometric example, very different things. But It is not Impossible for them to be Identical. it. This measure did not even presume to B. GOERTZEI Extending this process, one can derive from that the quantity takes on, for any (y,z) such that y*z=x. I’ a measure I": the smallest value (4.1) complexity of the simplest alYl’ + blzl’ + cC(y,z) Ixl" measures the representation of x, where complexity is measured by I’. And from I", one may obtain It is clear that this process may be continued indefinitely. It is interesting to ask when and I’ are equivalent, or almost equivalent. For Instance, assume that y is a Turing machine, and x and are binary sequences. If, in the notation given above, we let I,, then Ix l’ is a natural measure of the complexity of a sequence x. In fact, if a=b= 1 and c=O,/t is exactly the KCS comp/ex/ty of x. Without specifying a, b and c, let us nonetheless use Chaitin’s [4] notation for this complexity: I(x). Also, let us adopt Chaitin’s notation I(vl w) for the complexity of v relative to w. DEFINITION 11. Let y be a Turing machine program, v and w binary sequences; then I(vlw) denotes the smallest value the quantity alYl,+cC, (y,w) takes on for any self. delimiting program y that computes v when its input consists of a minimal.length program for computing w. Intuitively, this measures how hard it is to compute v given complete knowledge of w. 
and I’ are not always substantially different: Finally, it should be noted that THEOREM 1. If Ixl’=l(x), a=b=l, and c=O, then there is some K so that for all x, lxl’- Ixl"l < K. PROOF: alYl’ + blzl’ + cC(y,z) lYI’ + Izl: So, what is the smallest value that lyl’ + Izl’ assumes for any (y,z) such that y*z=x? Clearly, this smallest value must be either equal to xl’. For, what ff YI’ + zl’ is bigger than xl’? Then it cannot be the smallest yl’ + zl; because ff one took z to be the "empty sequence" (the sequence consisting of no characters) and then took y to be the shortest program for computing x, one would have Izl’=o and lYl ’= Ixl: And, on the other hand, is it possible for lYI’+ Izl’ to be smaller than Ix ’? If Y I’+ z I’ were smaller than x, then ore could supply a Turing machine with a program saying "Plug the sequence z into the program y," and the length of this program would be greater than Ix l’ by no more than the length of the program P(y,z)=’Plug the sequence z into the program y". This length is the constant K in the theorem. a measure I’". COROLLARY 1. For a Turing machine for which the program P(y,z) mentioned in the proof is a "hardware function" which takes only one unit of length to program, ’’= I’PROOF: Both I’ and I" are Integer valued, and by the theorem, for any x, Ixl’-< Ixl"<- Ixl’+l. 5. PATTERNS IN PATTERNS; SUBSTITUTION MACHINES We have discussed pattern in sequences, and patterns in pictures. It is also quite possible to analyze patterns in other patterns. This is interesting for many reasons, one being that when dealing with machines more restricted than Turing machines, it may often be the case that the only way to express an intuitively simple phenomenon is as a pattern in another pattern. Let us consider a simple example. 
Suppose that we are dealing not with Turing machines, but rather with "substitution machines" machines which are capable of running only programs of the form P(A,B,C)=’Wherever sequence B occurs in sequence C, replace it with sequence A’. Instead of writing P(A,B,C) each time, we shall denote such a program with the symbol (A,B,C). For instance, (1,10001,1000110001100011000110001) = 11111. (A,B,C) should be read "substitute A for B in C’. We may define the complexity xl of a sequence x as the length of the sequence, Le. xl Ix I,, and the complexity yl of a substitution program y as the number of symbols required to express y in the form (A,B,C). Then, 110001100011000110001100011 =25, I1 5and I(OOO, ,z) = ,z=l, (ooo, ,z)= ooo ooo ooo OOOl lOOO. For example, is ((10001,1,z), 11111 ) a pattern in 1000110001100011000110001? What is required is that a(11) + b(5) + cC((10001,1,z), 11111) < 25. If we take a=b= 1 and c=O (thus Ignoring time complexity), this reduces to 11 + 5 < 25, so it is indeed a pattern. If we take c= 1 instead of c=O, and leave a and b equal to one, then this will still be a pattern, as long as the computational complexity of obtaining 1000110001100011000110001 from (10001,1,11111) does not exceed 9. It would seem most intuitive to assume that this computational complexity C((10001,1,z), 11111) is equal to 5, since there are 5 ones into which 10001 must be substituted, and there is no effort involved in locating these l’s. In that case the fundamental inequality reads 11 + 5 + 5 < 25, which verifies that a pattern is indeed present. Now, let us look at the sequence x 1001001001001001000111001 1001001001001001001011101110100100100100100100110111100100100100100100. Remember, we are not dealing with general Turing machines, we are only dealing with substitution machines, and anything which cannot be represented in the form (A,B,C), in the notation given above, is not a substitution machine. 
There are two obvious ways to compute this sequence x o0 a substitution machine. First of all, one can let y=(100100100100100100,B,z), and z= B 0111001 B 1011101110 B 110111 B. This amounts to recognizing that 100100100100100100 is repeated in x. Alternatively, one can let y’=(lOO, B,z’), and let z’= BBBBBB 0111001 BBBBBB 1011101110 BBBBBB 110111 BBBBBB. This amounts to recognizing that 100 is a pattern in x. Let us assume that a=b=l, and c=O. Then in the first case lYl + Izl 24 + 27 51; and in the second case ly’l / Iz’l 9 + 47 56. Since Ixl 95, both (y,z) and (y’,z’) are patterns inx. The problem is that, since we are only using substitution machines, there is no way to combine the two patterns. One may say that 100100100100100100 a pattern in x, that 100 is a pattern in x, that 100 is a pattern in 100100100100100100. BUt, using only substitution machines, there is no way to say that the simplest way to look at x is as "a form involving repetition of 100100100100100100, which is itse/f a repetition of 100. Let us first consider Ixl ’. It is not hard to see that, of all (y,z) such that y is a substitution machine and is a sequence, the minimum of Yl + zl is obtained when y=(100100100100100100,B,z), and z= B 0111001 B 1011101110 B 110111 B. Thus, assuming as we have that a=b=l and c=O, Ixl’=51. This is much less than Ixl, which equals 95. Now, let us consider this optimal y. It contains the sequence 100100100100100100. If we ignore the fact that y denotes a substitution machine, and simply consider the sequence of characters "(lO0100100100100100,B,z)’, we can search for patterns in this sequence, just as we would in any other sequence. For instance, if we let y,=(lOO, C,z, ), and z,=CCCCCC, then y,*z,=y, lY, =1o, and Iz, =6. It is apparent that (y, z, ) is a 10 + 6 pattern in y, since lY, + Iz, 18. 
By recognizing the 16, whereas lYl pattern (y,z) in x, and then recognizing the pattern (y, z, ) in y, one may express both the repetition of 100100100100100100 in x and the repetition of 100 in 100100100100100100 as patterns in x, using only substitution machines, Is (y, z, ) a pattern in x? Strictly speaking, it is not. But we might call it a secondlevel pattern in x. It is a pattern in a pattern in x. And, if there were a pattern (Y2 z2 ) in the sequences of symbols representing y, or z,, we could call that a third-level pattern in x, etc. In general, we may make the following definition: DEFINITION 12. Let F be a map from S into S. Where a first.level pattern in x is simply a pattern in x, and n is an integer greater than one, we shall say that P is an nh. level pattern in x if there is some Q so that P is an (n.1) h.level pattern in x and P is a pattern in F(Q). In the examples we have given, the map F has been, implicity, the map from substitution machines into their expression in (A,B,C) notation. 6. APPROXIMATE PATTERN Suppose that y,*z,=x, whereas y2*z2 does not equal x, but is still very close to x. Say Ixl =1ooo. Then, even if ly, l+lz, l=9OO and ly, + lz, =lO, (y, z, ) is not a pattern in x, but (y, z, ) is. This is not a flaw in the definition of pattern after all, computing near x is not the same as computing x. Indeed, it might seem that if (y= z ) something were really so close to computing x, it could be modified into a pattern in x without sacrificing much simplicity. However, the extent to which this is the case is unknown. In order to incorporate pairs like (y= z= ), we shall introduce the notion of approximate pattern. In order to deal with approximate pattern, we must assume that It is meaningful to talk about the distance d(x,y) between two elements of S. Let (y,z) be any ordered pair for which y*z is defined. Then we have DEFINITION 13. 
The ordered pair (y,z) is an approximate pattern in x if [ 1 + d(x,y*z) ][ alYl + blzl + cC(y,z) ] < Ixl, where a, b, c and C are defined as in the ordinary definition of pattern. Obviously, when x=y*z, the distance d(x,y*z) between x and y*z is equal to zero, and the definition of approximate pattern reduces to the normal definition. And the larger d(x,y*z) gets, the smaller alYl+blzl+cC(y,z) must be in order for (y,z) to qualify as a pattern in x. Of course, if the distance measure d is defined so that d(a,b) is infinite whenever a and b are not the same, then an approximate pattem is an exact pattern. This means that when one speaks of "approximate pattern’, one is also speaking of ordinary, exact pattern. Most concepts involving ordinary or "strict" pattern may be generalized to the case of approximate pattern. For instance, we have: DEFINITION 14: The intensity of an approximate pattern (y,z) in x is IN[(y,z)lx] = ( Ixl-[ +d(x,y*z)][alYl +blzl +cC(y,z)] )/Ix I. DEFINITION 15: Where v and w are binary sequences, the approximate of v relative to w, I.(v,w), is the smallest value that [1 +d(v,y*w)][a }y} +cC(y,w)] takes on for any program y with input consisting of a minimal program for w. The Incorporation of inexactitude permits the definition of pattern to encompass all sods of Interesting practical problems. For example, suppose x is a curve in the plane or some other space, z is a set of points in that space, and y is some interpolation formula which assigns to each set of points a curve passing through those points. Then I,[(y,z) ix] is an Indicator of how much use it is to approximate the curve x by applying the Interpolation formula y to the set of points z. 7. SOPHISTICATION AND CRUDITY As Indicated above, Koppel [8] has recently proposed an alternative to the KCS complexity measure. According to Koppel’s measure, the sequences which are most complex are not the structureless ones. 
Neither, of course, are they the ones with very simple structures, like 000000000.... Rather, the more complex sequences are the ones with more "sophisticated" structures.

The basic idea [10] is that a sequence with a sophisticated structure is part of a natural class of sequences, all of which are computed by the same program. The program produces different sequences depending on the data it is given, but these sequences all possess the same underlying structure. Essentially, the program represents the structured part of the sequence, and the data the random part. Therefore, the "sophistication" of a sequence x should be defined as the size of the program defining the "natural class" containing x.

But how is this "natural" program to be found? As above, where y is a program and z is a binary sequence, let $|y|$ and $|z|$ denote the length of y and z respectively. Koppel proposes the following:

ALGORITHM 1:
1) Search over all pairs of binary sequences (y,z) for which the two-tape Turing machine with program y and data z computes x, and find those pairs for which $|y| + |z|$ is smallest.
2) Search over all pairs found in Step 1, and find the one for which $|y|$ is biggest. This value of $|y|$ is the "sophistication" of x.

All the pairs found in Step 1 are "best" representations of x. Step 2 searches all the "best" representations of x, and finds the one with the most program (as opposed to data). This program is assumed to be the natural structure of x, and its length is therefore taken as a measure of the sophistication of the structure of x.

There is no doubt that the decomposition of a sequence into a structured part and a random part is an important and useful idea. But Koppel's algorithm for achieving it is conceptually problematic. Suppose the program-data pairs $(y_1,z_1)$ and $(y_2,z_2)$ both cause a Turing machine to output x, but whereas $|y_1|=50$ and $|z_1|=300$, $|y_2|=250$ and $|z_2|=110$.
Since $|y_1|+|z_1|=350$, whereas $|y_2|+|z_2|=360$, $(y_2,z_2)$ will not be selected in Step 1, which searches for those pairs (y,z) that minimize $|y|+|z|$. What if, in Step 2, $(y_1,z_1)$ is chosen as the pair with maximum $|y|$? Then the sophistication of x will be set at $|y_1|=50$. Does it not seem that the intuitively much more sophisticated program $y_2$, which computes x almost as well as $y_1$, should count toward the sophistication of x?

In the language of pattern, what Koppel's algorithm does is:
1) Locate the pairs (y,z) that are the most intense patterns in x according to the definition of pattern with a=b=1, c=0.
2) Among these pairs, select the one which is the most intense pattern in x according to the definition of pattern with a=1, b=c=0.

It applies two different special cases of the definition of pattern, one after the other.

How can all this be modified to accommodate examples like the pairs $(y_1,z_1)$, $(y_2,z_2)$ given above? One approach is to look at some sort of combination of $|y|+|z|$ with $|y|$. Here $|y|+|z|$ measures the combined length of program and data, and $|y|$ measures the length of the program. What is desired is a small $|y|+|z|$ but a large $|y|$. This is some motivation for looking at $(|y|+|z|)/|y|$. The smaller $|y|+|z|$ gets, the smaller this quantity gets; and the bigger $|y|$ gets, the smaller it gets. One approach to measuring complexity, then, is to search all (y,z) such that x=y*z, and pick the one which makes $(|y|+|z|)/|y|$ smallest. Of course, $(|y|+|z|)/|y| = 1 + |z|/|y|$, so whatever makes $(|y|+|z|)/|y|$ smallest also makes $|z|/|y|$ smallest. Hence, in this context, the following is natural:

DEFINITION 16. The crudity of a pattern (y,z) is $|z|/|y|$.

The crudity is simply the ratio of data to program. The cruder a pattern is, the greater the proportion of data to program. A very crude pattern is mostly data; and a pattern which is mostly program is not very crude. Obviously, "crudity" is intended as an intuitive opposite to "sophistication"; however, it is not exactly the opposite of "sophistication" as Koppel defined it.
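To make the contrast concrete, here is a minimal Python sketch that applies both selection rules to the two pairs from the example above ($|y_1|=50$, $|z_1|=300$; $|y_2|=250$, $|z_2|=110$). It works only on the lengths, not on actual programs, and the names `Candidate`, `koppel_choice` and `least_crude` are illustrative assumptions rather than notation from Koppel or from this paper.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    """A program-data pair (y, z) that computes x, reduced to its lengths."""
    name: str
    program_len: int  # |y|
    data_len: int     # |z|


def koppel_choice(cands: List[Candidate]) -> Candidate:
    """Algorithm 1: minimize |y| + |z|, then maximize |y| among the ties."""
    best_total = min(c.program_len + c.data_len for c in cands)
    shortlist = [c for c in cands if c.program_len + c.data_len == best_total]
    return max(shortlist, key=lambda c: c.program_len)


def crudity(c: Candidate) -> float:
    """Definition 16: ratio of data length to program length, |z| / |y|."""
    return c.data_len / c.program_len


def least_crude(cands: List[Candidate]) -> Candidate:
    """Pick the pair whose crudity is smallest."""
    return min(cands, key=crudity)


# Hypothetical candidates carrying the lengths used in the text.
pairs = [Candidate("(y1, z1)", 50, 300), Candidate("(y2, z2)", 250, 110)]
print(koppel_choice(pairs).name, koppel_choice(pairs).program_len)
print(least_crude(pairs).name, round(crudity(least_crude(pairs)), 2))
```

Koppel's two steps select $(y_1,z_1)$, giving sophistication 50, because $(y_2,z_2)$ is already eliminated in Step 1. Minimizing crudity instead selects $(y_2,z_2)$, whose ratio 110/250 = 0.44 is far below 300/50 = 6.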
This approach can also be interpreted to assign each x a "natural program" and hence a "natural class". One must simply look at the pattern (y,z) in x whose crudity is the smallest. The program y associated with this pattern is, in a sense, the most natural program for x.

8. LOGICAL DEPTH

Bennett [9], as mentioned above, has proposed a complexity measure called "logical depth", which incorporates the time factor in an interesting way. The KCS complexity of x measures only the length of the shortest program required for computing x; it says nothing about how long this program takes to run. Is it really correct to call a sequence of length 1000 simple if it can be computed by a short program which takes a thousand years to run? Bennett's idea is to look at the running time of the shortest program for computing a sequence x. This quantity he calls the logical depth of the sequence.

One of the motivations for this approach was a desire to capture the sense in which a biological organism is more complex than a random sequence. Indeed, it is easy to see that a sequence x with no patterns in it has the smallest logical depth of any sequence. The shortest program for computing it is "Print x", which obviously runs faster than any other program computing a sequence of the same length as x. And there is no reason to doubt the hypothesis that biological organisms have a high logical depth.

But it seems to us that, in some ways, Bennett's definition is nearly as counterintuitive as the KCS approach. Suppose there are two competing programs for computing x, program y and program y'. What if y has a length of 1000 and a running time of 10 minutes, but y' has a length of 999 and a running time of 10 years? Then if y' is the shortest program for computing x, the logical depth of x is ten years. Intuitively, this doesn't seem quite right: it is not the case that x fundamentally requires ten years to compute.
At the core of Bennett's measure is the idea that the shortest program for computing x is the most natural representation of x. Otherwise, why would the running time of this particular program be a meaningful measure of the amount of time x requires to evolve naturally? But one may define the "most natural representation" of a given entity in many different ways. Bennett's is only the simplest.

For instance, one may study the quantity $dC(y,z) + e|z|/|y| + f(|y| + |z|)$, where d, e and f are positive constants defined so that d+e+f=3. The motivation for this is as follows. The smaller $|z|/|y|$ is, the less crude is the pattern (y,z). And, as indicated above, the crudity of a pattern (y,z) may be interpreted as a measure of how natural a representation it is. The smaller C(y,z) is, the less time it takes to get x out of (y,z). And, finally, the smaller $|y| + |z|$ is, the more intense a pattern (y,z) is. All these facts suggest the following:

DEFINITION 17: Let m denote the smallest value that the quantity $dC(y,z) + e|z|/|y| + f(|y| + |z|)$ assumes for any pair (y,z) such that x=y*z (assuming there is such a minimum value). The depth complexity of x may then be defined as the time complexity C(y,z) of the pattern (y,z) at which this minimum m is attained.

Setting d=e=0 reduces the depth complexity to the logical depth as Bennett defined it. Setting e=0 means that everything is as Bennett's definition would have it, except that cases such as the patterns $(y_1,z_1)$, $(y_2,z_2)$ described above are resolved in a more intuitive manner. Setting f=0 means that one is considering the time complexity of the most sophisticated (least crude, most structured) representation of x, rather than merely the shortest. And keeping all the constants nonzero ensures a balance between time, space, and sophistication.

Admittedly, this approach is not nearly so tidy as Bennett's.
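As a rough illustration of Definition 17, the sketch below evaluates the weighted quantity for the two competing programs y and y' from the example above (lengths 1000 and 999, running times 10 minutes and 10 years). Several details are assumptions made only for the sake of the example: both programs are taken to need no data ($|z|=0$), running time is measured in minutes, and the names `Representation`, `weighted_cost` and `depth_complexity` are hypothetical.

```python
from typing import List, NamedTuple


class Representation(NamedTuple):
    """A pair (y, z) with x = y*z, reduced to the quantities Definition 17 uses."""
    name: str
    program_len: int  # |y|
    data_len: int     # |z| (assumed 0 for both programs below)
    run_time: float   # C(y,z), in minutes (an assumed unit)


def weighted_cost(r: Representation, d: float, e: float, f: float) -> float:
    """d*C(y,z) + e*|z|/|y| + f*(|y| + |z|)."""
    return (d * r.run_time
            + e * r.data_len / r.program_len
            + f * (r.program_len + r.data_len))


def depth_complexity(reps: List[Representation], d: float, e: float, f: float) -> float:
    """Running time of the representation that minimizes the weighted cost."""
    best = min(reps, key=lambda r: weighted_cost(r, d, e, f))
    return best.run_time


TEN_YEARS = 10 * 365 * 24 * 60  # minutes, ignoring leap days
reps = [
    Representation("y", 1000, 0, 10.0),
    Representation("y'", 999, 0, float(TEN_YEARS)),
]

bennett = depth_complexity(reps, d=0, e=0, f=3)   # length is all that counts
balanced = depth_complexity(reps, d=1, e=1, f=1)  # time, crudity and length weighted equally
print(bennett, balanced)
```

With d=e=0 the one-bit-shorter program y' wins and the result is ten years, as in Bennett's definition; with equal weights the ten-minute program y wins. The same x thus receives very different values of the depth complexity depending on how the weights are chosen.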
Its key shortcoming is its failure to yield any particular number of crucial significance: everything depends on various factors which may be given various weights. But there is something to be said for considering all the relevant factors.

9. STRUCTURE AND STRUCTURAL COMPLEXITY

We have discussed several different measures of static complexity, which measure rather different things. But all these measures have one thing in common: they work by singling out the one pattern which minimizes some quantity. It is equally interesting to study the total amount of structure in an entity. For instance, suppose $x_1$ and $x_2$ both have KCS complexity A, but whereas $x_1$ can only be computed by one program of length A, $x_2$ can be computed by a hundred totally different programs of length A. Does it not seem that $x_2$ is in some sense more complex than $x_1$, that there is more to $x_2$ than to $x_1$?

Let us define the structure of x as the set of all (y,z) which are approximate patterns in x (assuming the constants a, b, and c, and the metric d(v,w), have previously been fixed), and denote it P(x). Then the question is: what is a meaningful way to measure the size of P(x)?

At first one might think to add up the intensities $[1+d(y*z,x)][a|y|+b|z|+cC(y,z)]$ of all the elements in P(x). But this approach has one crucial flaw, revealed by the following example.

Say x is a sequence of 10,000 characters, and $(y_1,z_1)$ is a pattern in x with $|z_1|=70$, $|y_1|=1000$, and $C(y_1,z_1)=2000$. Suppose that $y_1$ computes the first 1000 digits of x from the first 7 digits of $z_1$, according to a certain algorithm A. And suppose it computes the second 1000 digits of x from the next 7 digits of $z_1$, according to the same algorithm A. And so on for the third 1000 digits of x, etc.,
always using the same algorithm A.

Next, consider the pair $(y_2,z_2)$ which computes the first 9000 digits of x in the same manner as $(y_1,z_1)$, but computes the last 1000 digits of x by storing them in $z_2$ and printing them after the rest of its program finishes. We have $|z_2|=1063$, and surely $|y_2|$ is not much larger than $|y_1|$; let us say it is only slightly larger. Furthermore, $C(y_2,z_2)$ is certainly no greater than $C(y_1,z_1)$: after all, the change from $(y_1,z_1)$ to $(y_2,z_2)$ involved the replacement of serious computation with simple storage and printing.

The point is that both $(y_1,z_1)$ and $(y_2,z_2)$ are patterns in x, but in computing the total amount of structure in x, it would be foolish to count both of them. In general, the problem is that different patterns may share similar components, and it is unacceptable to count each of these components several times. In the present example the solution is easy: don't count $(y_2,z_2)$. But one may also construct examples of very different patterns which have a significant, sophisticated component in common. Clearly, what is needed is a general method of dealing with similarities between patterns.

Recall that $I_a(v|w)$ was defined as the approximate version of the effort required to compute v from a minimal program for w, so that if v and w have nothing in common, $I_a(v|w)=I_a(v)$. And, on the other hand, if v and w have a large common component, then both $I_a(v|w)$ and $I_a(w|v)$ are very small.

$I_a(v|w)$ is defined only when v and w are sequences. But we shall also need to talk about one program being similar to another. In order to do this, it suffices to assume some standard "programming language" L, which assigns to each program y a certain binary sequence L(y). The specifics of L are irrelevant, so long as it is computable on a Turing machine, and it does not assign the same sequence to any two different programs.
The introduction of a programming language L permits us to define the complexity of a program y as $I_a(L(y))$, and to define the complexity of one program $y_1$ relative to another program $y_2$ as $I_a(L(y_1)|L(y_2))$. As the lengths of the programs involved increase, the differences between programming languages matter less and less. To be precise, let L and $L_1$ be any two programming languages, computable on Turing machines. Then it can be shown that, as the lengths of $L(y_1)$ and $L(y_2)$ approach infinity, the ratios $I_a(L(y_1))/I_a(L_1(y_1))$ and $I_a(L(y_1)|L(y_2))/I_a(L_1(y_1)|L_1(y_2))$ both approach 1.

Where z is any binary sequence of length n, let D(z) be the binary sequence of length 2n obtained by replacing each 1 in z with 01, and each 0 in z with 10. Where w and z are any two binary sequences, let wz denote the sequence obtained by placing the sequence 111 at the end of D(w), and placing D(z) at the end of this composite sequence. The point is that 111 cannot occur in either D(z) or D(w), so that wz is essentially w juxtaposed with z, with 111 as a marker in between.

Now, we may define the complexity of a program-data pair (y,z) as $I_a(L(y)z)$, and we may define the complexity of (y,z) relative to $(y_1,z_1)$ as $I_a(L(y)z \mid L(y_1)z_1)$. We may define the complexity of (y,z) relative to a set of pairs $\{(y_1,z_1),(y_2,z_2),\ldots,(y_k,z_k)\}$ to be $I_a(L(y)z \mid L(y_1)z_1 L(y_2)z_2 \ldots L(y_k)z_k)$. This is the tool we need to make sense of the phrase "the total amount of structure of x".

Let S be any set of program-data pairs (y,z). Then we may define the size $|S|$ of S as the result of the following process:

ALGORITHM 2:
Step 0. Make a list of all the patterns in S, and label them $(y_1,z_1), (y_2,z_2), \ldots, (y_N,z_N)$.
Step 1. Let $s_1(x) = I_a(L(y_1)z_1)$.
Step 2. Let $s_2(x) = s_1(x) + I_a(L(y_2)z_2 \mid L(y_1)z_1)$.
Step 3. Let $s_3(x) = s_2(x) + I_a(L(y_3)z_3 \mid L(y_1)z_1 L(y_2)z_2)$.
Step 4. Let $s_4(x) = s_3(x) + I_a(L(y_4)z_4 \mid L(y_1)z_1 L(y_2)z_2 L(y_3)z_3)$.
...
Step N. Let $|S| = s_N(x) = s_{N-1}(x) + I_a(L(y_N)z_N \mid L(y_1)z_1 L(y_2)z_2 \ldots L(y_{N-1})z_{N-1})$.

At the kth step, only that portion of $(y_k,z_k)$ which is independent of $\{(y_1,z_1),\ldots,(y_{k-1},z_{k-1})\}$ is added onto the current estimate of $|S|$. For instance, in Step 2, if $(y_2,z_2)$ is independent of $(y_1,z_1)$, then this step increases the initial estimate of $|S|$ by the complexity of $(y_2,z_2)$. But if $(y_2,z_2)$ is highly dependent on $(y_1,z_1)$, not much will be added onto the first estimate. It is not difficult to see that this process will arrive at the same answer regardless of the order in which the $(y_i,z_i)$ appear:

THEOREM 2: The result of Algorithm 2 is invariant under permutation of the $(y_i,z_i)$.

Where P(x) is the set of all patterns in x, we may now define the structural complexity of x to be the quantity $|P(x)|$. This, we suggest, is the sense of the word "complexity" that one uses when one says that a person is more complex than a tree, which is more complex than a bacterium. In a way, structural complexity measures how many insightful statements can possibly be made about something. There is much more to say about a person than about a tree, and much more to say about a tree than a bacterium.
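The marker construction and the accumulation in Algorithm 2 can be sketched in Python as follows. The encoding functions follow the definitions of D and of the composite sequence wz directly. The relative complexity $I_a$ is not computable in general, so `structure_size` takes it as a parameter, and `toy_cond` is only a crude, hypothetical stand-in used to exercise the loop; it is not the measure defined in the text. All function names here are assumptions introduced for illustration.

```python
from typing import Callable, Sequence, Tuple

MARKER = "111"
_DECODE = {"01": "1", "10": "0"}
Pair = Tuple[str, str]  # (L(y), z), both given as bit strings


def double_encode(bits: str) -> str:
    """D(z): replace each 1 with 01 and each 0 with 10."""
    return "".join("01" if b == "1" else "10" for b in bits)


def join(w: str, z: str) -> str:
    """The composite sequence wz = D(w) followed by 111 followed by D(z)."""
    return double_encode(w) + MARKER + double_encode(z)


def split(joined: str) -> Pair:
    """Recover (w, z) by reading two-bit blocks until the marker is reached.

    Searching for the first "111" is not enough: when D(w) ends in 1 the
    run of ones starts one position early, so we decode block by block.
    A malformed tail raises KeyError.
    """
    w_bits, i = [], 0
    while joined[i:i + 2] in _DECODE:
        w_bits.append(_DECODE[joined[i:i + 2]])
        i += 2
    if joined[i:i + 3] != MARKER:
        raise ValueError("marker 111 not found at a block boundary")
    rest = joined[i + 3:]
    z_bits = [_DECODE[rest[j:j + 2]] for j in range(0, len(rest), 2)]
    return "".join(w_bits), "".join(z_bits)


def structure_size(pairs: Sequence[Pair],
                   cond: Callable[[str, str], float]) -> float:
    """Algorithm 2: add each pair's complexity relative to the pairs before it."""
    total, context = 0.0, ""
    for ly, z in pairs:
        encoded = join(ly, z)
        total += cond(encoded, context)  # stands in for I_a(encoded | context)
        context += encoded
    return total


def toy_cond(v: str, context: str) -> float:
    """Toy stand-in for I_a(v|w): free if v already occurs in w, else len(v)."""
    return 0.0 if v in context else float(len(v))


p, q = ("101", "01"), ("0", "1")
print(join(*p), split(join(*p)))
print(structure_size([p, q], toy_cond), structure_size([p, q, p], toy_cond))
```

Under the toy measure, listing the pair p a second time adds nothing to the total, which mirrors the requirement that shared components not be counted twice. The stand-in says nothing about Theorem 2, which concerns $I_a$ itself.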
