paul-almond.com
Occam's Razor Part 6: Partial Models as "Envelopes"

By Paul Almond, 26 February 2006

Previous Articles in this Series

Readers are advised to read the following articles before this one:

Occam's Razor Part 1: What Is Occam's Razor? http://www.paul-almond.com/OccamsRazorPart01.htm. [1]

Occam's Razor Part 2: Principles of Language http://www.paul-almond.com/OccamsRazorPart02.htm. [2]

Occam's Razor Part 3: Assumptions About Reality http://www.paul-almond.com/OccamsRazorPart03.htm. [3]

Occam's Razor Part 4: An Overview of How Occam's Razor Works http://www.paul-almond.com/OccamsRazorPart04.htm. [4]

Occam's Razor Part 5: How Mapping Can Work http://www.paul-almond.com/OccamsRazorPart05.htm. [5]

The following article, although not part of this series, is closely related to it and is a prequel to it:

What is a Low Level Language? http://www.paul-almond.com/WhatIsALowLevelLanguage.htm. [6]

Introduction

In previous articles a theory - a partial model - was considered to be an algorithm that generates expected future observations of reality - a prediction machine. While there was some success in justifying Occam's razor on statistical grounds, limitations in this way of representing partial models became apparent. These are:

  1. It does not allow for multiple theories, but only a single "monolithic" theory.
  2. There is no concept of hierarchy.
  3. Models are required to be totally accurate.
  4. Models cannot be easily translated into human terms.

Another way of representing partial models is required before we can proceed much further with Occam's razor. How we represent a worldview is closely related to how Occam's razor works.

The issue has practical importance. If we hope to use Occam's razor in artificial intelligence, or to test real claims, we need a better way of representing models of reality than what we have been discussing so far. It must be general enough to express any human model of reality and must somehow allow concepts within a model to relate to concepts within a human equivalent of the same model.

This will be the subject of this article. It will not deal directly with Occam's razor, but rather with how we can describe partial models differently to enable a complete worldview to be described by a number of partial models being used together. The main issue is still Occam's razor, but we need to deal with this first.

This article certainly puts us into the area of "ensemble" views of nature. In particular, it will make use of the idea of algorithms which extract meaning from data. I should point out that such algorithms have previously been considered by Standish in his article Why Occam's Razor [7].

A Correction

There is a minor error in previous articles in this series that I will correct here. It relates only to terminology.

In Occam's Razor Part 2: Principles of Language, I stated a requirement for algorithmic language that I referred to as the principle of algorithmic description. In Occam's Razor Part 3: Assumptions About Reality, I stated an assumption that reality is "generated" by some algorithm and also referred to that as the principle of algorithmic description.

This means that I have applied the term principle of algorithmic description to two things. Although closely related, they are different: one is a specified requirement of language and the other is an assumption about reality. The terminology should not have been reused.

To correct this, from now on I will refer to the language requirement in Part 2 as the principle of algorithmic description and the assumption in Part 3 as the assumption of algorithmic description. This means that, restating some of the text from previous articles with a small amount of editing so that it makes sense out of context, we will have:

The principle of algorithmic description

The language used to describe a model should only allow formal description. There must be no vagueness in the predictions made by a model or in interpreting the model to determine what the predictions are. The best way of avoiding vagueness in how predictions are to be extracted from a model is not for the model merely to be a static entity, admitting to such interpretation as may seem desirable, but for it to be a dynamic thing that actually makes its predictions. This demands a great deal of flexibility from the language.

All of this demands that the languages used to describe models should be algorithmic ones. Models are algorithms that accept input data about the current state of reality and make predictions; that is to say models are like computer programs.

The assumption of algorithmic description

If we consider all the possible algorithmically expressed models of reality that could be made by following the principles in Occam's Razor Part 2: Principles of Language, one of these is correct. We may not know which algorithm is correct, but there is a correct algorithm out there somewhere that describes reality.

Practical Application of Occam's Razor

This series of articles has a number of purposes, beyond mere discussion of the philosophy of Occam's razor. Two important purposes are:

  • to describe Occam's razor in a way matching the way that humans view models. Such a description may allow Occam's razor to be applied formally by humans, without the current vagueness in the understanding of it that allows both sides of a debate to invoke it in areas such as the existence of God and the plausibility of the many-worlds hypothesis. It would also be of interest in cognitive science as it would be a formalized description of how our brains model reality.
  • to specify a description of Occam's razor which could be useful in artificial intelligence. It would be useful in the implementation of theory synthesis and evaluation systems in software, so that computer programs could model reality as we do. It is this application of a deep understanding of Occam's razor that particularly interests me.

Both of these objectives dictate some of the approach that is to be used. If our only motive were to show Occam's razor to be correct, we might be satisfied with a "minimalist" description of it that does not fit in very well with how we view the world, but still contains the essential characteristics of the idea. The above objectives require a version of Occam's razor which can be practically applied, and this means that I will be going further than a "minimalist" idea of worldview description and Occam's razor.

Some of the ideas that I introduce will be motivated by practicality. When I say that a particular feature has been introduced for practicality, to give a practical version of Occam's razor, or anything equivalent, it will relate to the objectives mentioned here.

Limitations of the Previous View

Previous articles viewed an observer's experience of reality as being a sequence of binary digits (bits) generated by the actual model. Occam's razor was viewed as a criterion for selecting partial models. A partial model was viewed as being an algorithm to generate a future sequence of bits which is supposed to match, as closely as possible, the one generated by the actual model. I referred to these as partial models to avoid any pretence that there would be a reasonable chance of total correctness. This view has some limitations:

  1. It does not allow for multiple theories, but only a single "monolithic" theory.
  2. There is no concept of hierarchy.
  3. Models are required to be totally accurate.
  4. Models cannot be easily translated into human terms.

A different way of expressing partial models is needed to resolve this.

Partial Models as "Envelope" Descriptions

The problem arises because a partial model, as I have defined it, is actually required to predict the results of future observations. It is this requirement that causes the problems. It precludes simultaneous use of multiple theories (multiple partial models) because each will simply state what the results of future observations will be. If they agree exactly then they are equivalent anyway and we only need one of them. If they disagree then we have no way of combining their predictions and must discard one of them. Either way, we can only have a use for one of them. This is a severe weakness in any practical way of expressing models: we have practically no chance of assembling a single monolithic model to deal with reality, or even with many small parts of it in specific real-world modelling situations.

Humans, however, seem to make predictions using many partial models of the world. How do they avoid this problem?

The answer is that having models explicitly make predictions is only one way to get predictions done. Another way is to make partial models work as "envelope" descriptions of reality. This means that each partial model would not explicitly describe what reality does but merely some condition(s) that reality has to satisfy. This would not amount to a prediction in itself, but it would unambiguously mean that any prediction would either agree with the partial model - meaning that the predicted observations would meet the condition(s) specified by the partial model - or would not agree with it - meaning that the predicted observations would not meet the condition(s) specified by the partial model.

Here is a simple example to illustrate this. Rather than treating the sequence of observational results as bits, I will treat it as a sequence of base-ten digits.

We have this sequence of "observations":

272541676189414345832147265…

and we want to make a model for this. As was discussed in previous articles, the only good reason to make a model is to know what is likely to happen next, so any model we make of this should give us an idea of what will come next in the sequence.

We could attempt to make a complete model that is expected to predict the entire sequence indefinitely with complete precision. As has been discussed, that is not going to work when dealing with reality.

We could attempt to produce a partial model as defined by the previous articles, but this approach would have problems that have already been discussed. We need, instead, a different way of representing partial models.

The solution is for the partial model, instead of making statements about what the sequence of numbers will do in the future, to declare some condition that the sequence of numbers will satisfy. If we look at the sequence of numbers we should be able to detect a pattern (I know because I put it there) and make a partial model consisting of the following statement:

"Every occurrence of an odd digit (1, 3, 5, 7 or 9) is immediately followed by an occurrence of an even digit (0, 2, 4, 6 or 8)."

This does not explicitly tell us what is going to happen next, but it does tell us something about what is going to happen next. The last digit in the sequence was "5", so it tells us that, if the partial model continues to apply, the next digit must be an even digit (0, 2, 4, 6 or 8). It cannot be an odd digit (1, 3, 5, 7 or 9), as this would result in a sequence of digits that does not comply with the partial model. The partial model can be seen as setting an imaginary boundary - an "envelope" - around a set of possible future sequences of digits. Any hypothetical future sequence of digits can be tested to see if it is within the "envelope" - and if it is not then, according to the partial model, it cannot happen.

In itself, this does not allow the partial model to make a specific prediction, but it allows hypothetical future sequences to be accepted as possible or discarded and in that sense it does make a kind of vague prediction.
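To make the "envelope" idea concrete, here is a minimal Python sketch of the odd-then-even partial model. It is only an illustration, not part of the formal treatment: the function names are hypothetical choices of my own, and the sketch uses just the digits shown above (the trailing "…" is left out). A final odd digit whose successor has not yet been observed is treated as not violating the condition.

```python
# A minimal sketch of the odd-then-even partial model used as an "envelope" test.
ODD = set("13579")
EVEN = set("02468")


def satisfies_envelope(digits: str) -> bool:
    """Return True if every odd digit that already has a successor is
    immediately followed by an even digit. A final odd digit has no
    successor yet, so it cannot violate the condition."""
    return all(b in EVEN for a, b in zip(digits, digits[1:]) if a in ODD)


def allowed_next_digits(observed: str) -> list:
    """Return the digits that keep the sequence inside the envelope if appended."""
    return [d for d in "0123456789" if satisfies_envelope(observed + d)]


observed = "272541676189414345832147265"  # the digits shown above, without the "…"
print(satisfies_envelope(observed))   # True
print(allowed_next_digits(observed))  # ['0', '2', '4', '6', '8']
```

The partial model never names the next digit; it only sorts candidate futures into those inside and outside its envelope.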

Why would we want partial models which work like this? This approach becomes valuable in dealing with the problem of not being able to use multiple partial models simultaneously and needing a single "monolithic" model. In this approach there is nothing to stop multiple partial models being used. One partial model could declare some condition that the sequence of digits has to satisfy and another partial model could declare another condition. There need be no conflict between the two models, as there would be if each were stating exactly what will happen in the future. Having two models would simply mean that, to be consistent with both partial models, any future digits in the sequence would have to satisfy the conditions declared in both models, and there is no reason why a sequence of digits cannot satisfy two conditions. In fact, there is no reason why a sequence of digits cannot satisfy all of a very large number of conditions.

In the example that I gave, the partial model did not constrain the sequence of numbers very much: a lot of sequences could still exist that complied with it. This need not be the case for all partial models: some partial models may specify conditions that are satisfied by very few sequences and which tightly constrain the future. Even where this is not the case, many partial models, each of which does not in itself constrain the future development of a sequence very much, could combine to produce a very large amount of constraint.
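The combining effect of several envelopes can be sketched by counting continuations. In this sketch the second partial model, "no digit occurs twice in a row", is a hypothetical condition added purely for illustration (the observed digits happen to satisfy it); the count simply checks every possible continuation of a given length against each model.

```python
from itertools import product

ODD, EVEN = set("13579"), set("02468")


def odd_then_even(digits: str) -> bool:
    # The article's partial model: every odd digit is immediately followed by an even digit.
    return all(b in EVEN for a, b in zip(digits, digits[1:]) if a in ODD)


def no_immediate_repeat(digits: str) -> bool:
    # A hypothetical second partial model: no digit occurs twice in a row.
    return all(a != b for a, b in zip(digits, digits[1:]))


def count_futures(observed: str, length: int, models) -> int:
    """Count continuations of the given length that lie inside every model's envelope."""
    return sum(
        all(model(observed + "".join(future)) for model in models)
        for future in product("0123456789", repeat=length)
    )


observed = "272541676189414345832147265"
print(count_futures(observed, 2, [odd_then_even]))                       # 50
print(count_futures(observed, 2, [no_immediate_repeat]))                 # 81
print(count_futures(observed, 2, [odd_then_even, no_immediate_repeat]))  # 45
```

Of the 100 possible two-digit continuations, the two models together allow fewer than either allows on its own, without either model ever stating what the next digits will be.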

Taking this view of partial models solves the "monolithic model" problem. It allows us to have multiple partial models that work together.

This weakens the principle of algorithmic description a bit. That principle described a model as "…a dynamic thing that actually makes its predictions." A partial model will no longer make its predictions autonomously, but will simply be used to specify conditions that the predictions need to satisfy. Something else - that we have not considered yet - would need to make the actual predictions. If we wanted, we could always say that this "something else" and the partial model together form the true model referred to in this principle.

Contradictory "Envelope" Partial Models

We do, of course, have to be careful about how we construct these models. It would be possible for the set of possible sequences allowed by one partial model not to include any possible sequences allowed by another and this would mean that the two partial models would be contradicting each other: in fact, I would say that this is what is happening when any two contradictory statements about physical reality are made.
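A contradiction of this kind can be detected, at least for sequences of one fixed length, by searching for any sequence that lies inside both envelopes. The two models below ("every digit is even" and "at least one digit is odd") are hypothetical examples chosen to contradict each other. A bounded search like this one can only show that no sequence of the examined length satisfies both; it does not settle the question for all possible sequences.

```python
from itertools import product


def all_even(digits: str) -> bool:
    # Hypothetical partial model A: every digit is even.
    return all(d in "02468" for d in digits)


def contains_odd(digits: str) -> bool:
    # Hypothetical partial model B: at least one digit is odd.
    return any(d in "13579" for d in digits)


def odd_then_even(digits: str) -> bool:
    # The article's partial model: every odd digit is immediately followed by an even digit.
    return all(b in "02468" for a, b in zip(digits, digits[1:]) if a in "13579")


def jointly_satisfiable(models, length: int) -> bool:
    """Return True if some digit sequence of the given length lies inside every
    model's envelope. Only sequences of exactly this length are examined."""
    return any(
        all(model("".join(seq)) for model in models)
        for seq in product("0123456789", repeat=length)
    )


print(jointly_satisfiable([all_even, contains_odd], 3))   # False: A and B contradict
print(jointly_satisfiable([all_even, odd_then_even], 3))  # True: e.g. "000"
```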

The Need for Formalization

The example of the numerical sequence that I gave shows how partial models can be considered as "envelopes", but it is not formalized adequately to allow proper philosophical consideration. It would also need to be formalized further to provide the general, formal way of describing models that a practical version of this approach would require, allowing us, for example, to automate Occam's razor in a computer program.

We will not achieve such formalization by relying on partial models described in everyday human language. As previously, we will implement this by using algorithms to describe partial models. The main idea that we will use is that of a meaning extraction algorithm. A meaning extraction algorithm will be an algorithmic description of a partial model that is working as an envelope.

An Algorithm to Extract Meaning

An algorithm to extract meaning from the sequence of bits representing all the observations of reality that can be made could work as follows:

Let B = the sequence of bits representing the sequence of observations of reality that can be made and that is generated by the actual model. Basically, B is the sequence of bits that we regard as being reality and which conforms to the assumption of algorithmic description.

Let i = an index, which is a position in the sequence of bits B.

Let M(B,i) be a partial model - a meaning extraction algorithm - that accepts B as input data and returns some sequence of bits R corresponding to meaning that it has extracted.

That is to say:

R = M(B,i) for some partial model (meaning extraction algorithm) acting on a sequence of observations B.
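As a sketch only, the signature R = M(B,i) can be written as a Python type, with B as a sequence of 0s and 1s and R as a list of bits. The particular model below, which counts the 1s in an eight-bit window starting at i and returns the count in binary, is a hypothetical example invented for illustration.

```python
from typing import Callable, List, Sequence

Bits = Sequence[int]                                 # B: observation bits, each 0 or 1
MeaningExtractor = Callable[[Bits, int], List[int]]  # M: (B, i) -> R


def ones_in_window(B: Bits, i: int) -> List[int]:
    """Hypothetical M(B, i): count the 1s among the (up to) 8 bits starting at
    index i and return that count as a 4-bit sequence R, most significant bit first."""
    count = sum(B[i:i + 8])
    return [(count >> k) & 1 for k in (3, 2, 1, 0)]


M: MeaningExtractor = ones_in_window
B = [0, 1, 1, 0, 1, 1, 1, 0, 1, 0, 0, 1]
print(M(B, 0))  # [0, 1, 0, 1]  (five 1s in B[0:8])
print(M(B, 3))  # [0, 1, 0, 0]  (four 1s in B[3:11])
```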

Purpose of the Index i

The partial model will need to read bits from the sequence of bits representing the observations of reality. We can imagine it scanning up and down the sequence of observation bits, reading bits and then processing them according to some algorithm to generate its result - a bit like the way in which Turing machines are often depicted.

To do this it has to have some reference point. For example, if one of its instructions says "Read bit 7 from the sequence of observations" and another instruction says "Read bit 10 from the sequence of observations", then we can see that the two instructions are reading two bits that are three bit positions apart, but this does not tell us where the bits actually are in the sequence unless we know where bit 0 is.

This is dealt with by the index i. i defines some reference point in B. Whenever commands in the partial model M refer to bit positions in B they define such positions, not absolutely, but relative to this position given by index i. When the meaning extraction algorithm is used an actual value is assigned to i and all the relative positions in the meaning extraction algorithm now become absolute positions.

Some readers will recognize this as being similar to relative addressing in machine programming languages. An alternative way of thinking of this would be that the index i simply specifies where bit 0 is and, from this, it can be said where bits 1, 2, 3, etc. and bits -1, -2, -3, etc. all are. This is the same concept, just expressed slightly differently.
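The relative addressing described above can be sketched as follows, assuming B is held as a Python list. The helper make_reader and the model two_bit_model are hypothetical names; the model simply contains the two instructions "read bit 7" and "read bit 10" from the earlier example, and the index i decides which absolute bits they refer to.

```python
from typing import List, Sequence


def make_reader(B: Sequence[int], i: int):
    """Return read(offset), which fetches the bit at relative position `offset`,
    i.e. absolute position i + offset in B."""
    def read(offset: int) -> int:
        position = i + offset
        # Python would silently wrap negative positions, so check bounds explicitly.
        if not 0 <= position < len(B):
            raise IndexError(f"relative position {offset} lies outside B")
        return B[position]
    return read


def two_bit_model(B: Sequence[int], i: int) -> List[int]:
    """Hypothetical partial model containing the instructions
    "read bit 7" and "read bit 10", both relative to the index i."""
    read = make_reader(B, i)
    return [read(7), read(10)]


B = [0] * 32
B[12] = B[15] = 1
print(two_bit_model(B, 0))  # [0, 0]  reads absolute bits 7 and 10
print(two_bit_model(B, 5))  # [1, 1]  reads absolute bits 12 and 15
```

The model's own instructions never change; only the value supplied for i moves them along the sequence.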

There is still some arbitrariness about specification of positions. It relates to specification of the index i itself. There are also deeper reasons for use of the index i.

The (Trivial) Subjective Nature of the Index i

We are assuming, at least for now, that any partial model is being applied by an observer. There is a rather trivial form of subjectivity involved in specification of the index i.

The index i deals with the problem that any positions of bits in the sequence of observations need to be relative to some position and i defines that position. Some readers may object to this by asking how the position of i is specified. To what is that relative?

Here is an example:

A partial model may contain a command to read the bit at position i+1001 from the sequence of observations. If the index used were i=10,000, then this would specify the bit at position 11,001. What does it mean, however, to specify an index i=10,000? Where is the position specified by i=0? To what is i relative?

This is not a significant issue. An observer simply needs to select a bit position in the sequence of observations as being the position specified by i=0. The position can be arbitrarily chosen and once it has been selected then any future use of any index with any partial model will refer to an absolute position in the sequence.

Some readers may wonder why I actually introduced the index if an arbitrary i=0 position needs to be chosen anyway: why not just say that any bit positions in the sequence are relative to this arbitrary position? I have not, however, introduced the index i in a flawed attempt to avoid having to specify an arbitrary "zero" position, but for two other reasons:

  1. to allow the description of a meaning extraction algorithm to be independent of any arbitrary position choice made by an observer: although an observer needs to specify where i=0 is, this can be done outside the meaning extraction algorithm, and the meaning extraction algorithm can still make sense to a different observer.
  2. for a reason relating to laws of nature that will be discussed shortly.

What "Meaning" Means

Meaning is simply the data R that is extracted by the partial model (meaning extraction algorithm) M. The best way of thinking about this is that the partial model is asking a question - one phrased in a rather abstract way - about physical reality (or whatever system it is being applied to) and the returned sequence of bits R is the answer.

It may seem as though R would have to contain a long sequence of bits to extract very much meaning, but this is not necessarily the case. The minimum amount of information which a meaning extraction algorithm needs to return is a single bit, and this could be considered equivalent to the answer to a yes/no question. Some yes/no questions could be very profound. For example:

"Does reality obey Newton's inverse-square law of gravity?"

This question has a simple yes/no answer, but it is not trivial. A "yes" answer to it conveys a lot of meaning: it tells us a lot about how reality works. If we had a sequence of observations and wanted to know if they implied that reality obeyed Newton's law of gravity, then we would have to make a partial model to ask such a question. It would have to analyse the sequence of observations and check to see if they were consistent with the idea that Newton's law is correct. To do this the partial model would itself have to contain a description of Newton's law of gravity, and this means that it would itself contain some meaning. When the partial model returned a "yes" - which may be a "1" - all of the meaning of Newton's theory - which is built into the partial model - would be implied by this simple answer.

Of course, we could run a partial model on any number of hypothetical sequences of observations and we could find some for which it would return "no". It is in this way that extraction of meaning can enable us to do prediction: if we think that a partial model should give a certain result, then this rules out any future sequences of observations that would cause it to give a different result.
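The way a yes/no answer rules out hypothetical futures can be sketched with a bit-level analogue of the earlier digit example. The partial model here ("every 1 is immediately followed by a 0") is hypothetical, and for brevity it ignores the index i; the sketch keeps only those three-bit futures for which the model would still answer "yes".

```python
from itertools import product


def every_one_followed_by_zero(B) -> int:
    """Hypothetical Boolean partial model (index ignored for brevity). Answers
    "is every 1 that has a successor immediately followed by a 0?" with 1 or 0."""
    return int(all(b == 0 for a, b in zip(B, B[1:]) if a == 1))


observed = [0, 1, 0, 0, 1, 0, 1]
expected = 1  # we believe the partial model holds, so we expect the answer "yes"
print(every_one_followed_by_zero(observed))  # 1

# Keep only the 3-bit futures for which the partial model would still answer "yes".
futures = [
    list(f)
    for f in product((0, 1), repeat=3)
    if every_one_followed_by_zero(observed + list(f)) == expected
]
print(futures)  # [[0, 0, 0], [0, 0, 1], [0, 1, 0]]
```

Five of the eight possible futures are ruled out purely because they would change the model's answer from "yes" to "no".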

Meaning Extraction Algorithms with Parameters

Some partial models will be related to others. A single partial model can be made to perform the same task as a group of such partial models by allowing it to accept a parameter. The parameter is a sequence of bits provided as input to the partial model and which can be used in any way by its code.

So we now have:

Let B = the sequence of bits representing the sequence of observations of reality that can be made and that is generated by the actual model. B is the sequence of bits that we regard as being reality and which conforms to the assumption of algorithmic description.

Let i = an index, which is a position in the sequence of bits B.

Let P = a sequence of bits provided as a parameter to any partial model (meaning extraction algorithm).

Let M(P,B,i) be a partial model - a meaning extraction algorithm - that accepts B as input data and returns some sequence of bits R corresponding to meaning that it has extracted.

That is to say:

R = M(P,B,i) for some partial model (meaning extraction algorithm) acting on a sequence of observations B.

Having partial models accept parameters is important with regard to practical versions of Occam's razor.
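A parameter lets one meaning extraction algorithm stand in for a whole family. In the following sketch (hypothetical, like the other examples), the parameter P is a bit pattern and the model reports whether that pattern starts at index i, so a single M(P,B,i) replaces one parameterless model per pattern.

```python
from typing import List, Sequence


def pattern_at(P: Sequence[int], B: Sequence[int], i: int) -> List[int]:
    """Hypothetical M(P, B, i): return [1] if the bit pattern P occurs in B
    starting at index i, else [0]. One parameterized model stands in for a
    whole group of parameterless ones, one per pattern."""
    return [1] if list(B[i:i + len(P)]) == list(P) else [0]


B = [0, 1, 1, 0, 1, 1, 0]
print(pattern_at([1, 1], B, 1))     # [1]
print(pattern_at([1, 1], B, 0))     # [0]
print(pattern_at([1, 1, 0], B, 4))  # [1]
```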

Simplifications of Partial Models

We may sometimes want to consider partial models in a simpler way. To do this we could make these obvious simplifications:

  • We could consider partial models without the index i. If I use this idea in future I will refer to these as indexless partial models.
  • We could consider partial models without parameters. I have already described such partial models that would take the form R=M(B,i). If I use this idea in future I will refer to these as parameterless partial models.
  • We could consider "yes/no" partial models that are restricted to returning only a single bit as the result, so that if R=M(P,B,i) then R can only be a single bit. This would not prevent such partial models from dealing with profound issues: one could specify an entire scientific theory and return an answer about whether or not a specific sequence of observations agrees with it. Such partial models would have some limitations. If I use this idea in future I will refer to these as Boolean partial models.
  • We could also combine these methods of simplification; for example, we could consider indexless, parameterless and Boolean partial models.
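The simplifications above can be illustrated by deriving them from a general model, as in the sketch below. These are illustrative choices only, not definitions: fixing the parameter with functools.partial gives a parameterless model, scanning every index is one possible way to obtain an indexless model that asks about B as a whole, and unpacking a one-bit result gives a Boolean model. All names are hypothetical.

```python
from functools import partial


def pattern_at(P, B, i):
    """Hypothetical general partial model M(P, B, i): [1] if pattern P starts at index i."""
    return [1] if list(B[i:i + len(P)]) == list(P) else [0]


# Parameterless: fix P once, leaving a model of the form R = M(B, i).
pattern_101_at = partial(pattern_at, [1, 0, 1])


# Indexless: one illustrative choice is a model that asks about B as a whole.
def contains_101(B):
    return [1] if any(pattern_101_at(B, i) == [1] for i in range(len(B))) else [0]


# Boolean: R is exactly one bit, so it can be returned as a bare 0 or 1.
def contains_101_boolean(B) -> int:
    (bit,) = contains_101(B)
    return bit


print(pattern_101_at([0, 1, 0, 1], 1))     # [1]
print(contains_101([1, 1, 0, 0]))          # [0]
print(contains_101_boolean([0, 1, 0, 1]))  # 1
```

The last function is at once indexless, parameterless and Boolean, which is the kind of combined simplification mentioned in the final bullet.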

Physics-Like Partial Models

I define a physics-like partial model as follows:

A physics-like partial model is one for which the returned meaning is not dependent on the index i.

It is simpler to consider this first using the simplification of parameterless partial models:

Let B = the sequence of bits representing the sequence of observations of reality that can be made and that is generated by the actual model. B is the sequence of bits that we regard as being reality and which conforms to the assumption of algorithmic description.

Let i = an index, which is a position in the sequence of bits B.

Let M(B,i) be a partial model - a meaning extraction algorithm - that accepts B as input data and returns some sequence of bits R corresponding to meaning that it has extracted.

So we have:

R = M(B,i) for some partial model (meaning extraction algorithm) acting on a sequence of observations B.

and for M(B,i) to be a physics-like partial model (meaning extraction algorithm) R is not dependent on i: R is always the same for some sequence of observations B irrespective of what index i is chosen.
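Whether a parameterless partial model is physics-like for a given B can be tested empirically by running it at many indices and checking that R never changes. The sketch below assumes finite lists of bits, and its two example models are hypothetical. Such a test can show that a model is not physics-like, but a finite sample of indices can never prove that it is.

```python
from typing import Callable, Iterable, List, Sequence


def local_rule(B: Sequence[int], i: int) -> List[int]:
    """Hypothetical parameterless model M(B, i): [1] unless the bit at i is a 1
    immediately followed by another 1 (a local 'law' checked at position i)."""
    if B[i] == 1 and i + 1 < len(B) and B[i + 1] == 1:
        return [0]
    return [1]


def bit_at(B: Sequence[int], i: int) -> List[int]:
    """Hypothetical model that simply reports the bit at i: clearly index-dependent."""
    return [B[i]]


def looks_physics_like(M: Callable, B: Sequence[int], indices: Iterable[int]) -> bool:
    """Return True if M(B, i) gave the same R for every index tried (indices must be
    non-empty). A finite sample can refute physics-like behaviour, never prove it."""
    return len({tuple(M(B, i)) for i in indices}) == 1


B_ok = [0, 1, 0, 0, 1, 0, 1, 0]  # never contains 1,1
B_bad = [0, 1, 1, 0, 1, 0]       # contains 1,1 at positions 1-2
print(looks_physics_like(local_rule, B_ok, range(len(B_ok))))    # True
print(looks_physics_like(local_rule, B_bad, range(len(B_bad))))  # False
print(looks_physics_like(bit_at, B_ok, range(len(B_ok))))        # False
```

The same local rule passes on B_ok but fails on B_bad, in line with the definition: physics-like behaviour is a property of a partial model acting on a particular sequence of observations B.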

For partial models that are not parameterless the situation is slightly more complicated. We may declare a partial model to be physics-like, but this statement would only be valid for a single parameter. That is to say, when that parameter is used, the returned result R is not dependent on i. Changing the parameter may change the returned result R and even if the partial model is physics-like for a particular parameter, it may or may not be physics-like for other parameters.

That is to say if we have:

i = an index, which is a position in the sequence of bits B.

P = a particular sequence of bits provided as a parameter to a partial model (meaning extraction algorithm).

M(P,B,i) is a partial model (meaning extraction algorithm) that accepts B as input data and returns some sequence of bits R corresponding to meaning that it has extracted.

and R = M(P,B,i) for some partial model (meaning extraction algorithm) acting on a sequence of observations B.

then M(P,B,i) is a physics-like partial model - a meaning extraction algorithm - for that parameter P if R is always the same irrespective of i: changing the index does not alter the returned meaning.

For a different parameter P, however, R=M(P,B,i) may not be independent of i and M may not be physics-like: each parameter needs to be considered separately.

R is not required to stay the same irrespective of the parameter P. Furthermore, if M(P,B,i) is physics-like for two different parameters it does not follow that the returned result R is the same for both parameters: all that is required for each instance of M(P,B,i) to be physics-like is for the result not to depend on the index i for a given parameter P.
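For parameterized models the same empirical test has to be run separately for each parameter. In this hypothetical sketch, forbidden_at(P,B,i) returns [0] when the pattern P starts at i; on this B it turns out to be physics-like for P = [1,1], which never occurs, but not for P = [1,0]. As before, only a finite sample of indices is checked.

```python
def forbidden_at(P, B, i):
    """Hypothetical M(P, B, i): [0] if pattern P starts at index i, otherwise [1]."""
    return [0] if list(B[i:i + len(P)]) == list(P) else [1]


def physics_like_for(M, P, B, indices) -> bool:
    """Empirical check of physics-like behaviour for one fixed parameter P."""
    return len({tuple(M(P, B, i)) for i in indices}) == 1


B = [0, 1, 0, 0, 1, 0, 1, 0]
idx = range(len(B))
print(physics_like_for(forbidden_at, [1, 1], B, idx))  # True: 1,1 never occurs
print(physics_like_for(forbidden_at, [1, 0], B, idx))  # False: 1,0 occurs at some indices only
```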

The concept of physics-like behaviour does not have to refer to what people think of as "the laws of physics". It could relate to all kinds of "general" statements about reality, expressed algorithmically in this way, many of which would not seem particularly "physics-like" to most people.

Extra Terminology for Physics-Like Partial Models

For completeness, I will introduce some extra terminology regarding physics-like partial models that take parameters.

If a partial model M(P,B,i) is physics-like for each of the parameters P in some set of possible parameters then it will be said to be completely physics-like for that set of parameters. We may also say completely physics-like over a set - referring to the set of possible parameters. This could also be abbreviated to physics-like over a set.

Any parameter P is a sequence of binary digits, and so could be considered as a binary number. This means that a set of parameters could be considered to be every parameter P within some range of parameters. If it is physics-like for every parameter in such a range then a partial model M(P,B,i) could be said to be completely physics-like over a range. Being completely physics-like over a range is a special case of being completely physics-like over a set. This could also be abbreviated to physics-like over a range.

If a partial model is completely physics-like over a set or range, this does not mean that the returned result R is the same for every parameter in the set or range. It merely means that for any single parameter P in the set or range the returned result R is not dependent on the index i: for a given parameter P the returned result R must be the same whatever the index, but it could be different for two different parameters. There are possible cases, however, in which the returned result R is the same for every parameter in a given set or range, independently of both the parameter P and the index i. Such partial models will be said to be uniformly physics-like over a set or uniformly physics-like over a range.

If a meaning extraction algorithm is uniformly physics-like over the set of all possible parameters - that is to say that the result R=M(P,B,i) returned by it is always the same for any P or i - then it will be said to be totally physics-like.
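The terminology above can be turned into a rough classifier over a finite set of parameters. This is a sketch under the same assumptions as before (a finite B and a finite sample of indices), and the example models are hypothetical; parameters_in_range reads integers as fixed-width binary parameters, matching the idea of a range of parameters. Being totally physics-like cannot be checked this way, since the set of all possible parameters is unbounded.

```python
def forbidden_at(P, B, i):
    """Hypothetical M(P, B, i): [0] if pattern P starts at index i, otherwise [1]."""
    return [0] if list(B[i:i + len(P)]) == list(P) else [1]


def occurs_anywhere(P, B, i):
    """Hypothetical M(P, B, i) that ignores i: [1] if P occurs anywhere in B, else [0]."""
    n = len(P)
    return [1] if any(list(B[j:j + n]) == list(P) for j in range(len(B) - n + 1)) else [0]


def parameters_in_range(low, high, width):
    """Read each integer in [low, high] as a width-bit parameter P (most significant bit first)."""
    return [[(n >> k) & 1 for k in range(width - 1, -1, -1)] for n in range(low, high + 1)]


def classify(M, params, B, indices) -> str:
    """Classify M over a finite set of parameters, using a finite sample of indices."""
    results_per_param = []
    for P in params:
        results = {tuple(M(P, B, i)) for i in indices}
        if len(results) != 1:
            return "not completely physics-like"
        results_per_param.append(results.pop())
    if len(set(results_per_param)) == 1:
        return "uniformly physics-like"
    return "completely (but not uniformly) physics-like"


B = [0, 1, 0, 0, 1, 0, 1, 0]
idx = range(len(B))
print(classify(forbidden_at, [[1, 1], [1, 1, 1]], B, idx))  # uniformly physics-like
print(classify(forbidden_at, [[1, 1], [1, 0]], B, idx))     # not completely physics-like
print(classify(occurs_anywhere, parameters_in_range(0, 3, 2), B, idx))
# completely (but not uniformly) physics-like
```

The last case shows a model that ignores the index entirely, and so is completely physics-like over the range, while still returning different results for different parameters.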

The Purpose of the Physics-Like Concept

The purpose of the physics-like concept is to capture the generality of laws of nature. Some claims about reality are specific; for example "There is a vase on the red table." Some are more general - laws of nature being an example.

What does it mean to say that a claim about reality is general? It is supposed to be true always - anywhere and at any time. This is what is captured by the physics-like concept. If the sequence of bits representing the observations of reality consists of observations made at different times and places, then changing the index for a partial model is equivalent to its being applied by observers at different times and places, and if the partial model really is a law of nature then it should give the same results independently of any reference to place or time.

The physics-like concept expresses this generality of laws of nature.

This also shows my true purpose in introducing the index i. Because the index allows a partial model to be applied at different positions, a partial model whose result changes with the index can be shown not to be physics-like. This is necessary for the concept of a physics-like partial model to have any meaning.

Object-Like Partial Model Results

Not all partial models are physics-like. The concept of objects will relate to the results of other partial models.

The term object-like will be used to describe a particular set of results returned by a particular partial model without the requirement of any interesting degree of physics-like behaviour. We will decide what set of results constitutes the object-like behaviour for the algorithm, and when the algorithm returns a result that is a member of that set the result will be considered object-like. When the result is not within that set it will not be considered object-like - unless, of course, it happens to be part of a different set also declared to constitute object-like behaviour for the same partial model.

A particularly convenient application of this is likely to be with Boolean partial models that are not physics-like. One of the two possible returned values may be considered to constitute the entire set for object-like behaviour. This means that we may consider the returned result of "1" to mean that the result is object-like and a returned result of "0" to mean that the result is not object-like. I will refer to this as object detection.

As the index i for an algorithm is changed it may return results that are sometimes object-like and sometimes not object-like.
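Object detection with a Boolean partial model can be sketched by scanning the index across B and recording where the object-like result 1 is returned. The detector below, which treats the bit pattern 1,0,1 as an "object", is a hypothetical example.

```python
def object_detector(B, i) -> int:
    """Hypothetical Boolean partial model: 1 (object-like) if the bits 1,0,1
    start at index i, else 0 (not object-like)."""
    return int(list(B[i:i + 3]) == [1, 0, 1])


def detections(M, B):
    """Indices at which the Boolean partial model M returns the object-like result 1."""
    return [i for i in range(len(B)) if M(B, i) == 1]


B = [0, 1, 0, 1, 1, 0, 1, 0]
print(detections(object_detector, B))  # [1, 4]
```

The detector is plainly not physics-like: its result depends on where it is applied, which is exactly what lets it pick out particular places in the sequence.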

If objects do not have to be physics-like it may be asked what purpose they can serve in helping us to make predictions. This will relate, in part, to the concept of hierarchy and we will explore this later.

The Purpose of the Object-Like Concept

The physics-like concept was introduced to allow general claims to be made. I am introducing the object-like concept to allow less general claims to be made.

An object-like partial model result is likely to be of interest if it occurs with high frequency, while lacking the generality of physics-like partial models.

A common use of the object-like concept is likely to be in object detection. If we have a partial model that returns two possible results, one of which is considered object-like, then when we use an index that causes the partial model to generate the object-like result we can consider this a successful object detection, and when we use an index that causes the partial model to generate the result that is not considered object-like we can consider this a failed object detection.
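The sketch below applies an assumed edge-detecting partial model at every index of a small B and sorts the indices into successful and failed detections. The fraction of successes is one crude way to see how frequently an object-like result occurs, which, as noted above, is what makes such results interesting. The model and the data are invented for illustration.

```python
from typing import Callable, Iterable, Sequence

Bits = Sequence[int]
PartialModel = Callable[[Bits, Bits, int], int]


def edge_at(P: Bits, B: Bits, i: int) -> int:
    """Hypothetical Boolean partial model: 1 where the observation changes between i and i + 1."""
    return 1 if B[i] != B[i + 1] else 0


def scan(M: PartialModel, P: Bits, B: Bits, indices: Iterable[int],
         object_like: frozenset[int] = frozenset({1})) -> tuple[list[int], list[int]]:
    """Split the indices into successful and failed object detections."""
    hits: list[int] = []
    misses: list[int] = []
    for i in indices:
        (hits if M(P, B, i) in object_like else misses).append(i)
    return hits, misses


B = [0, 0, 1, 1, 1, 0, 0, 1]
hits, misses = scan(edge_at, [], B, range(len(B) - 1))
print(hits, misses)                           # [1, 4, 6] [0, 2, 3, 5]
print(len(hits) / (len(hits) + len(misses)))  # fraction of indices giving a detection
```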

This raises the question of what an object actually is. The simplest answer would be that an object is a "thing" - external to everything that we are doing - and that an object detection simply tells us about this.

There is another way of looking at things. In the way that we consider the concept of "object" here, an object is simply a pattern in the sequence of bits representing the sequence of observations. Object-like results of partial models can be used to detect objects in the sequence of observations. I suggest that this is all there is, as far as objects are concerned. Partial models do not detect objects that have an independent existence in their own right: it is ontologically meaningless to talk about such existence. Objects exist only by virtue of being revealed by object detection in partial models. The process of object detection does not really detect the object (a term I use merely to fit in with our everyday approach to the world): it is the object.

I want to be clear here that I am not elevating human observation or consciousness, or anything else's observation or consciousness, to any important role in this. I am not saying that we make objects exist by using partial models to "detect" them. I am saying that they exist by virtue of the capability for an algorithm that "detects" them to be stated, which is not the same thing.

The Importance of Results

I want to ensure that I have made the purpose of partial models and their results clear.

Partial models are used to make claims about reality. The sort of claim made by a partial model is that the sequence of observations must fulfil certain criteria. This means that the partial model serves as an envelope of possible future behaviour of reality, as discussed earlier. Provided that the partial model has defined an appropriate envelope, any future observations that do not meet the requirements of that envelope are considered impossible. It is in this way that the partial model constrains reality and makes prediction possible. This makes the partial models described here a more sophisticated version of the partial models discussed in previous articles.

This does not mean, however, that a partial model amounts in itself to a claim of such an envelope. A partial model can return different results, and nothing intrinsic to it says which results are valid. That decision comes from how we interpret the partial model's results. When a partial model is used there may be an expectation about its results, and there may be a similar expectation about future uses of it; any future observations which would cause the results to violate this expectation are then considered incapable of happening. This is most obvious for physics-like partial models. A physics-like partial model is expected to return the same result all the time, probably because it has previously been observed to do so, and this leads to the expectation that it will continue to return that result. This, in turn, acts as a constraint or "envelope" on possible future observations: they must produce a sequence of observations that continues to cause the partial model to return that result. In this way the partial model helps to make predictions, though more indirectly than the partial models in earlier articles.
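The envelope idea can be illustrated by brute force. The Python sketch below rests on assumptions of my own: a one-dimensional B, an invented partial model that we suppose has always returned 1 so far, and a tiny horizon so that every possible continuation can be enumerated. It keeps only those future observations that let the partial model go on returning the expected result.

```python
from itertools import product
from typing import Callable, Sequence

Bits = Sequence[int]
PartialModel = Callable[[Bits, Bits, int], int]


def no_double_ones(P: Bits, B: Bits, i: int) -> int:
    """Invented partial model, assumed physics-like: 1 unless B[i] and B[i + 1] are both 1."""
    return 0 if (B[i] == 1 and B[i + 1] == 1) else 1


def envelope(M: PartialModel, P: Bits, observed: Bits,
             expected: int, horizon: int) -> list[tuple[int, ...]]:
    """Return every continuation of `observed` (of length `horizon`) that keeps M
    returning `expected` at every index where it can be applied."""
    allowed = []
    for future in product((0, 1), repeat=horizon):
        full = list(observed) + list(future)
        if all(M(P, full, i) == expected for i in range(len(full) - 1)):
            allowed.append(future)
    return allowed


observed = [1, 0, 0, 1, 0, 1]
print(envelope(no_double_ones, [], observed, expected=1, horizon=2))  # [(0, 0), (0, 1)]
```

Because the observed sequence ends in 1, every continuation starting with 1 falls outside the envelope. Enumerating all continuations grows as 2 to the power of the horizon, so this is only a way of seeing the idea, not a practical method.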

Uncertainty

After what I have just said, one issue that I should mention is that of uncertainty. It may seem that I am saying that we can have a particular partial model and be 100% confident that it is physics-like and will always obtain the same result. We cannot have such certainty. Our main evidence that a partial model is physics-like will be from using it a lot and noticing that the result is the same. This could never constitute absolute proof: it could only ever be statistical evidence.

This does not mean that any of the definitions that I have given in this article are wrong. If a partial model is physics-like then it must always give the same results. The problem is in whether or not we can know with total confidence that any given partial model is physics-like. We may think that a particular algorithm is physics-like, while admitting to some uncertainty about whether or not this is really the case.
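The article does not say how such statistical evidence should be quantified. Purely as an illustration, and as my own choice rather than the author's, the sketch below uses Laplace's rule of succession, which estimates the probability that the next application agrees as (s + 1)/(n + 2) after s agreements in n trials. It shows the point made above: repeated agreement raises confidence but never yields certainty.

```python
from fractions import Fraction


def rule_of_succession(agreeing: int, trials: int) -> Fraction:
    """Laplace's rule of succession: estimated probability that the next application
    agrees, after `agreeing` of `trials` applications gave the expected result."""
    if not 0 <= agreeing <= trials:
        raise ValueError("agreeing must be between 0 and trials")
    return Fraction(agreeing + 1, trials + 2)


# However many consistent applications we observe, the estimate never reaches 1.
for n in (1, 10, 1000):
    print(n, rule_of_succession(n, n))  # 2/3, 11/12, 1001/1002
```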

I do not intend to go too deeply into this issue of uncertainty now. We have enough to do in determining how to represent other aspects of reality. It will, however, need considering later.

Multi-Dimensional Indices

Some readers may have noted that representing reality as a sequence of observations (bits) does not map very easily onto how we perceive reality. Humans view reality as multi-dimensional. In principle, we can manage quite well using merely a one-dimensional sequence of bits. For a practical version of Occam's razor, however, it would be better to be able to express multi-dimensional aspects of reality without complications. For this reason, I will now take the practical step of allowing models to express multi-dimensional aspects of reality directly.

The one-dimensional sequence of observations that has been discussed throughout these articles now becomes an n-dimensional matrix of observations. This simply means that the observations are not required to be arranged in a linear sequence occurring over time. They may form a two-dimensional grid of bits, or a three-dimensional matrix of bits. Any number of dimensions can be accommodated to reflect the dimensionality of whatever situation we are trying to model.

The index i needs to refer to a specific location within the matrix of observations. This means that the index has to have the same dimensionality as the matrix of observations. The index now becomes a vector.

All of the definitions that have been given still apply with the index simply becoming an n-dimensional vector.

For example, the definition of a partial model with parameters would now be:

Let B = an n-dimensional matrix of bits representing observations of reality that is generated by the actual model. B is the matrix of bits that we regard as being reality and which conforms to the assumption of algorithmic description.

Let i = an index, which is an n-dimensional vector giving a position in the n-dimensional matrix of bits B.

Let P = a sequence of bits provided as a parameter to any meaning extraction algorithm.

Let M(P,B,i) be a partial model - a meaning extraction algorithm - that accepts B as input data and returns some sequence of bits R corresponding to meaning that it has extracted.

That is to say:

R = M(P,B,i) for some partial model (meaning extraction algorithm) acting on an n-dimensional matrix of observations B.
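A minimal Python sketch of these definitions follows, assuming B is stored as nested lists (one level of nesting per dimension) and i as a tuple of coordinates. The partial model is invented for illustration, and its P is simplified to a one-element list holding an axis number rather than an arbitrary sequence of bits.

```python
from typing import Any, Sequence

Index = tuple[int, ...]


def get_bit(B: Any, i: Index) -> int:
    """Read the bit at n-dimensional index i from B stored as nested lists."""
    value = B
    for coordinate in i:
        value = value[coordinate]
    return value


def same_as_next_along_axis(P: Sequence[int], B: Any, i: Index) -> int:
    """Hypothetical partial model M(P, B, i). P is a one-element list holding an axis
    number (a simplification: the article's P is a sequence of bits). Returns 1 if the
    bit at i equals the bit one step further along that axis, else 0."""
    axis = P[0]
    j = list(i)
    j[axis] += 1
    return 1 if get_bit(B, i) == get_bit(B, tuple(j)) else 0


B2 = [[0, 0, 1],
      [0, 1, 1],
      [1, 1, 1]]
print(same_as_next_along_axis([1], B2, (0, 0)))  # 1: B2[0][0] == B2[0][1]
print(same_as_next_along_axis([1], B2, (0, 1)))  # 0: B2[0][1] != B2[0][2]

B1 = [0, 0, 1]  # the one-dimensional special case, with a 1-tuple index
print(same_as_next_along_axis([0], B1, (1,)))    # 0
```

The same function works unchanged for any number of dimensions, including one; the caller must keep the neighbouring position inside B, or Python raises IndexError.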

I will not list all the other definitions revised to allow this multi-dimensionality as I think it will be quite obvious what they will be now.

We should ask what the assumption of algorithmic description means in such a multi-dimensional context. We will leave this issue for now, but I will return to it later.

Does this mean that the idea of representing observations of reality as a one-dimensional sequence of bits has been abandoned totally? It has not been abandoned at all. I may use it later when I want to consider simplified situations. It is simply a special case of what we now have. If we set the number of dimensions to one, the matrix of observations once more becomes a sequence of bits and the index simply becomes a point in that sequence. We can consider this special case whenever it is convenient to do so.

Applying Meaning Extraction Algorithms to Meaning Extraction Algorithms

Partial models (meaning extraction algorithms) operate on matrices of bits. A partial model itself is simply a sequence of bits - as is any algorithm - and there is therefore the potential to have partial models operate on other partial models to extract meaning from them.

At this stage it is too early to explore this in detail. I merely make this comment as it could be useful in future.

Making Occam's Razor Work

There is not much about Occam's razor itself in this article - nor, very likely, in the next one. How does all this relate to Occam's razor?

In previous articles it became apparent that the way in which models were being represented was too limited. Before any further discussion can be useful, a more suitable method of describing models is needed, and providing one is the purpose of this article.

At some stage, we will return to the issue of Occam's razor. An infinite number of partial models can be created, and Occam's razor will provide guidance on which ones we should use. Suitable partial models will need to satisfy various criteria to be useful to us, and it is these criteria that we will be establishing in our future consideration of Occam's razor.

The Need for Hierarchy

In human generated models we are familiar with the concept of hierarchy. Hierarchy is used both in scientific models and in models that represent our everyday understanding of the world. As examples:

  • chemistry is based on fundamental physics.
  • the concept of the atom is based on the concept of various subatomic particles.
  • the concept of a crowd of people is based on the concepts of individual people within that crowd.

So far, in the modelling system as I have described it there is no concept of hierarchy. Hierarchy clearly needs to be introduced to have a practical modelling system and any chance of applying Occam's razor practically.

What would hierarchy mean in the context of this sort of modelling system? We cannot use vague ideas such as one thing containing another or being based on another because we do not really have things any more. We have dispensed with things and we are seeing reality merely as patterns that can be detected and analysed by partial models. How can hierarchy be obtained from all of this mess of patterns?

Hierarchy in human models works by means of meaning being extracted from reality and this meaning then being treated as a new "layer" of reality - as if it is fundamental in nature - from which "higher level" meaning is then extracted.

Our hierarchy will work in the same way. Partial models will extract meaning from the matrix of observation bits. This meaning will be treated as if it itself constitutes a matrix of observation bits. Other partial models will be applied to this meaning and extract "higher level" meaning which will form a further matrix of observation bits, to which still further partial models can be applied, and so on.
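The layering can be sketched in Python for the one-dimensional case. The two partial models are invented for this example, and each layer is one bit shorter than the one before because each model looks at pairs of adjacent positions; neither choice comes from the article.

```python
from typing import Callable, Sequence

Bits = Sequence[int]
PartialModel = Callable[[Bits, Bits, int], int]


def edge_at(P: Bits, B: Bits, i: int) -> int:
    """Layer-1 model (illustrative): 1 where the observation changes between i and i + 1."""
    return 1 if B[i] != B[i + 1] else 0


def both_set(P: Bits, B: Bits, i: int) -> int:
    """Layer-2 model (illustrative): 1 where two adjacent bits of its input are both 1."""
    return 1 if B[i] == 1 and B[i + 1] == 1 else 0


def build_layers(B: Bits, stages: Sequence[tuple[PartialModel, Bits]]) -> list[list[int]]:
    """Apply each (model, parameter) stage at every valid index of the previous layer;
    the extracted meaning becomes the 'observations' for the next stage."""
    layers = [list(B)]
    for M, P in stages:
        current = layers[-1]
        layers.append([M(P, current, i) for i in range(len(current) - 1)])
    return layers


B = [0, 0, 1, 0, 0, 1, 1, 0, 0]
for layer in build_layers(B, [(edge_at, []), (both_set, [])]):
    print(layer)
# [0, 0, 1, 0, 0, 1, 1, 0, 0]
# [0, 1, 1, 0, 1, 0, 1, 0]
# [0, 1, 0, 0, 0, 0, 0]
```

Layer 2 marks two edges in a row, which corresponds to a one-bit-wide pulse in B (at position 2); the two-bit pulse at positions 5 and 6 is not marked. The layer-2 model never sees B itself, only the "reality" extracted by layer 1.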

Hierarchy is essential to what I intend. For a start, it is an important aspect of human modelling, making it an essential part of any practical model representation system. There is a further reason for having hierarchy, however.

If we are using hierarchy because humans use it, we should ask why we use it. This may seem a trivial question. An easy answer is that we use hierarchy because the universe is hierarchical - that we see giraffes as containing atoms and atoms containing electrons because that is how the universe is.

I am not satisfied with this. If we apply partial models in the hierarchical way that I described, then we will have the experience of a hierarchy, but that experience is based on nothing but our ability to apply partial models hierarchically. This suggests that hierarchy in nature is simply the capability to employ partial models hierarchically, and that it is more nebulous than many people think. Why should the mere capability to do this make it a useful thing to do? It may be that some feature, some pattern in the matrix of observations, allows it, but there will be many other features that our brains do not similarly make into a big issue.

What would it be like if we did not use a hierarchical approach in our thinking? Can we even imagine this? I think we should be able to. Instead of our meaning of the world being built on layer upon layer of "invented" realities, we would simply extract all meaning directly, with no intermediate stages. Instead of many short partial models, each creating the "reality" for the next, such an entity would simply use a very long one. The partial models would have to be made according to some criteria that we have not considered yet: they would basically have to satisfy Occam's razor and, possibly, other criteria. Generating very long partial models is probably harder than generating shorter ones, because the number of possible algorithms of a given length grows rapidly with that length, and attempting to assemble a worldview in one leap of modelling like this is probably computationally infeasible. Our brains have instead evolved to extract meaning progressively, in small amounts, so that the results can be tested at each stage and then used as the "reality" for the next stage of meaning extraction.
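A rough count shows why finding one long partial model in a single leap is so much harder than finding several short ones. The sketch rests on simplifying assumptions of my own: partial models are treated as raw bit strings of a fixed length, and each stage of a hierarchy is assumed to be searchable and testable independently of the others. Real searches would prune heavily, so the numbers only illustrate the shape of the difference.

```python
def candidates(length: int) -> int:
    """Number of distinct bit strings (candidate partial models) of exactly `length` bits."""
    return 2 ** length


def one_step_search(total_length: int) -> int:
    """Candidates to consider if one long partial model must be found in a single leap."""
    return candidates(total_length)


def staged_search(stage_lengths: list[int]) -> int:
    """Candidates to consider if each stage's shorter model can be found and tested
    on its own before moving on (an idealising assumption)."""
    return sum(candidates(n) for n in stage_lengths)


print(one_step_search(40))      # 1099511627776
print(staged_search([10] * 4))  # 4096
```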

An analogy for this is the travelling salesman problem. If we are given a map of points and asked to find the shortest route passing through all of them, there is a very large number of possible routes and, even using various techniques to make the problem easier, the problem takes longer to solve as the number of points grows, eventually becoming intractable. The travelling salesman problem is NP-hard (its decision version is NP-complete), and no efficient method of solving it exactly in general is known. One way to deal with the problem would be to split it into manageable pieces: we could divide the map into sets of points and find the most efficient route through each set. This would allow us to obtain an answer, but at a cost: we would no longer be guaranteed the optimal solution, although the answer may still be acceptable.
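The analogy can be tried out on a toy map. In the Python sketch below (my own illustration; the strip-splitting heuristic is a deliberately simple choice, not a standard travelling salesman algorithm), the exact answer is found by brute force, which is only feasible for a handful of points, and compared with the answer obtained by splitting the map into two strips and solving each strip separately.

```python
import math
from itertools import permutations

Point = tuple[float, float]


def path_length(route: list[Point]) -> float:
    """Length of an open path through the points in order."""
    return sum(math.dist(route[k], route[k + 1]) for k in range(len(route) - 1))


def tour_length(route: list[Point]) -> float:
    """Length of the closed tour that returns to its starting point."""
    return path_length(route) + math.dist(route[-1], route[0])


def optimal_tour(points: list[Point]) -> list[Point]:
    """Exact answer by brute force: fix the first point and try every ordering of the rest."""
    first, rest = points[0], points[1:]
    best = min(permutations(rest), key=lambda p: tour_length([first, *p]))
    return [first, *best]


def best_open_path(points: list[Point]) -> list[Point]:
    """Shortest open path through one group of points, by brute force."""
    return list(min(permutations(points), key=lambda p: path_length(list(p))))


def split_tour(points: list[Point], groups: int = 2) -> list[Point]:
    """Heuristic: split the map into vertical strips, solve each strip exactly,
    and join the strips in order. Cheaper, but optimality is no longer guaranteed."""
    ordered = sorted(points)  # sort by x, then by y
    size = math.ceil(len(ordered) / groups)
    route: list[Point] = []
    for start in range(0, len(ordered), size):
        route.extend(best_open_path(ordered[start:start + size]))
    return route


points = [(0, 0), (1, 0), (2, 0), (3, 0), (0, 1), (1, 1), (2, 1), (3, 1)]
print(tour_length(optimal_tour(points)))  # 8.0
print(tour_length(split_tour(points)))    # 10.0
```

On this small grid each strip is solved perfectly, yet the joined route is 10 units long against an optimal 8, because the pieces were chosen without regard to how they fit together.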

I suggest something very similar is happening with the hierarchical way in which we view physical reality. We are splitting a basic sequence of observations into manageable pieces, again at a cost: there may be meanings which could be extracted if we could do everything in one step, but which are lost when we rely only on the meaning conveyed through each "layer" of short partial models.

Reality is not hierarchical, or, if it is, it is hierarchical only by virtue of our ability to use the computational trick of hierarchy on it successfully: the semantics of how we regard this do not really matter. Our use of hierarchy does not capture some important feature of reality. It is a computational trick which evolution built into our brains, driven by the benefit that it allows us to make a workable worldview despite our being too stupid to do it all at once. Some readers may notice something resembling irony here: processes of evolution and, ultimately, processes in our own brains produce partial modelling systems that use hierarchy simply as a computational trick to process reality, yet it is that same computational trick that gives us the hierarchical view which supplies concepts like "evolution" and "brains" in the first place. This is rather strange. It gives a picture of a computational trick being capable of seeing itself, but I am not suggesting we go too far with this. Nor am I suggesting that we cause the computational trick that causes us. As I said previously, the best way of looking at this is in terms of the logical capability for an algorithm to be expressed, not the actual expression.

A computational limit of this kind will be imposed on anything else, too, by the difficulty of making appropriate partial models. This gives us another reason for including hierarchy in a practical version of Occam's razor. We will need it if we want a computer program that generates partial models according to a working artificial intelligence version of Occam's razor, while also taking into account any other criteria that partial models need to satisfy; that is to say, a machine that makes its own model of the world. Hierarchy is critical to being able to build brains. This series of articles has now started to head more firmly into the area of artificial intelligence, which is where I wanted it all along, and from now on most of our discussion about how to create models and use Occam's razor will be linked to the issue of how we get machines to do all this.

This should also give us an idea of the purpose of objects. They involve extracted meaning which forms part of the invented "reality" from which physics-like partial models extract meaning. This is not the only purpose of objects - as we will see later.

In a later article I will show in more detail how we can establish the concept of hierarchy within the approach that is described here.

The Need to Remove Dependence on the Assumption of Algorithmic Description

So far, throughout this entire series of articles we have been treating reality as if it is a sequence of observations being churned out by an algorithm - the actual algorithm. This is the assumption of algorithmic description.

I have never liked having to use the assumption of algorithmic description. It is too much like a cosmological assumption - a scientific theory - and that is undesirable in a system that is supposed to be above, and to contain, all scientific theories.

It may seem to some readers that nothing I have done so far amounted to assuming such a principle. Of course, I have to assume that we can describe things algorithmically (or at least formally, which people like Penrose may say means a rather different thing), but why should this amount to assuming that the universe actually is algorithmic? To such readers it may seem that the assumption of algorithmic description (about reality) and the principle of algorithmic description are the same thing, and it may be hard to see the wide-sweeping, and undesirable, cosmological assumption implied in the assumption of algorithmic description, or why it may be philosophically problematic. The assumption of algorithmic description, however, is genuinely an assumption about reality. The question of why it should be valid is really the same question that Hume asked about the principle of induction.

We need to be able to philosophically operate without the assumption of algorithmic description. In later articles I will show how we can do this by modifying the way that we express models accordingly. The assumption of algorithmic description is linked very strongly to our notion of time. If we discard the assumption of algorithmic description we will lose the concept of time as fundamental - as part of the framework of the universe. Some philosophers have already taken this view about time. I will be doing this as well. Later articles will allow the ideas of time and space to be derived as part of a worldview - that is to say, as part of the meaning obtained by partial models, rather than being implicitly assumed.

The assumption of algorithmic description does have its uses. I think that in many real modelling situations we will still find it useful simply to assume that it applies, and I do not see any problem with this at all. We know that Einstein's theory of general relativity is more accurate than Newton's theory of gravity, yet we still use Newton's theory to compute spacecraft trajectories. This is because Newton's theory is not simply wrong: it emerges from Einstein's theory as a limiting case, for weak gravitational fields and speeds much lower than that of light. It will be just this way with the assumption of algorithmic description and the concepts of time and space. Although we will be able to discard these as fundamental ideas, we will be able to show how they can be derived and why they feature in our models as characteristics of reality. In many practical modelling situations, for example (possibly) in some artificial intelligence programs, we may simply ignore any artificiality in the assumption of algorithmic description, time and space. All of this will be discussed more deeply in later articles.

Future Intentions

In future articles I intend to:

  • discuss the criteria that particular partial models need to satisfy Occam's razor, and why these criteria should make them suitable for adoption as possible models.
  • describe in more detail how hierarchy could be introduced.
  • remove the need for the assumption of algorithmic description.
  • discuss various ontological issues.

Conclusion

The main principle established in this article is as follows:

Let B = an n-dimensional matrix of bits representing observations of reality that is generated by the actual model. B is the matrix of bits that we regard as being reality and which conforms to the assumption of algorithmic description.

Let i = an index, which is an n-dimensional vector giving a position in the n-dimensional matrix of bits B.

Let P = a sequence of bits provided as a parameter to any meaning extraction algorithm.

Let M(P,B,i) be a partial model - a meaning extraction algorithm - that accepts B as input data and returns some sequence of bits R corresponding to meaning that it has extracted.

That is to say:

R = M(P,B,i) for some partial model (meaning extraction algorithm) acting on an n-dimensional matrix of observations B.

Use of an index allows us to differentiate between physics-like meaning extraction algorithms - ones that have more generality in their behaviour than others - and less general ones which will give us an idea of "objects".

The criteria we should use to assess the value of specific partial models like this in modelling will need to be considered in future articles.

A way for hierarchy to work will also need further consideration.

References

[1] Web Reference: Almond, P. (2005). Occam's Razor Part 1: What Is Occam's Razor? Retrieved 22 August 2005 from http://www.paul-almond.com/OccamsRazorPart01.htm.

[2] Web Reference: Almond, P. (2005). Occam's Razor Part 2: Principles of Language. Retrieved 9 October 2005 from http://www.paul-almond.com/OccamsRazorPart02.htm.

[3] Web Reference: Almond, P. (2005). Occam's Razor Part 3: Assumptions About Reality. Retrieved 13 November 2005 from http://www.paul-almond.com/OccamsRazorPart03.htm.

[4] Web Reference: Almond, P. (2005). Occam's Razor Part 4: An Overview of How Occam's Razor Works. Retrieved 24 December 2005 from http://www.paul-almond.com/OccamsRazorPart04.htm.

[5] Web Reference: Almond, P. (2006). Occam's Razor Part 5: How Mapping Can Work. Retrieved 14 January 2006 from http://www.paul-almond.com/OccamsRazorPart05.htm.

[6] Web Reference: Almond, P. (2005). What is a Low Level Language? Retrieved 17 July 2005 from http://www.paul-almond.com/WhatIsALowLevelLanguage.htm.

[7] Web Reference: Standish, R. K. (2002). Why Occam's Razor. Retrieved 24 December 2005 from http://parallel.hpc.unsw.edu.au/rks/docs/occam/occam.html.


© Copyright Paul Almond 2003-2006. All Rights Reserved. Email: info@paul-almond.com
This page last modified: Sunday April 23, 2006 4:46