By Paul Almond, 13 November 2005
Previous Articles in this Series
Reading of the following articles is suggested before reading this article:
Occam's Razor Part 1: What Is Occam's Razor? http://www.paul-almond.com/OccamsRazorPart01.htm. [1]
Occam's Razor Part 2: Principles of Language http://www.paul-almond.com/OccamsRazorPart02.htm. [2]
The following article, although not part of this series, is closely related to it and is a prequel to it:
What is a Low Level Language? http://www.paul-almond.com/WhatIsALowLevelLanguage.htm. [3]
Introduction
This is the third article in the series about Occam's razor. The previous article, Occam's Razor Part 2: Principles of Language [2], discussed the requirements for languages used to represent models when we are attempting to derive and apply Occam's razor.
This in itself is not enough for us to be able to assert Occam's razor. Somehow, these models have to apply to reality. There needs to be a way of looking at reality that allows us to determine which model is best for describing it. This article will deal with this. It will present the minimal meta-assumptions regarding reality that are needed to allow Occam's razor to be obtained. I use the term meta-assumptions to distinguish these from conventional assumptions about reality and will explain this in the article. For now, it may be convenient just to assume that we are discussing the minimum assumptions needed about reality.
Meta-Assumptions About Reality
Some assumptions are implied by the requirements for language stated in the previous article. For example, the principle of formal description stated in the previous article, although it was stated as a requirement for language, implies a clear assumption that reality can be formally described and that any formal description can represent reality without omission. Because these principles imply such assumptions they will not be stated here without good reason. Instead, I will try to deal with the new assumptions that do not obviously follow from the previous article.
Meta-Assumption 1: The principle of algorithmic description
While it may appear that I am going to say here that we have to assume that reality can be algorithmically described, despite having just said that I would try not to repeat previous assumptions, that is not really the point. The point of this assumption is that, if we consider all the possible algorithmically expressed models of reality that could be made by following the principles in the previous article, one of these is correct. We may not know which algorithm is correct, but there is a correct algorithm out there somewhere that describes reality.
This principle does actually need to be stated, even though I have previously said that models have to be algorithmic. There is a clear assumption here about how descriptions relate to reality and about the objectivity of reality. It implies that if we get the right algorithmic model we can be correct.
Definition: The actual model or actual algorithm is the model that correctly describes reality. The difference between the actual model and other models is that it is the one that is actually in effect, regardless of whether or not humans know what it is. The best model that could be conceived would be equivalent to the actual model.
Definition: A candidate model or candidate algorithm is a model proposed for the purpose of modelling reality. It is desirable that candidate models match the behaviour of the actual model as much as possible, but it is not necessary for a candidate model to agree very well with reality: simply being proposed as a model of reality is adequate, no matter how good the model is.
The term "actual model" could be viewed as oxymoronic. We are talking about the idea that reality develops according to some algorithm and the objection could therefore be made that this algorithm is simply the "actual algorithm" and that it is not a model - that it is, in fact, the real algorithm. In this view the "actual model" is the thing that any other proposed algorithm is supposed to be modelling. Against this, it could be argued that the actual model is itself simply a model of how reality behaves and that while reality is modelled by the actual algorithm it is not the same as it. In this view the actual model is really just the best model and, like all the other proposed models, it merely describes something else. There is no obvious resolution to the difference between these two views and the difference is probably just semantics. I have decided to use the term "actual model", regardless of whether or not it is oxymoronic, because it is convenient and fits in with the other terminology that I am using: all that is really important is that we know what we mean.
Meta-Assumption 2: The principle of elimination of inconsistent models
Any candidate model that is not consistent with known data cannot be the actual model and can be eliminated.
An infinite number of possible candidate models can be conceived. Some of these, however, can be easily eliminated. Models are not made in the absence of any knowledge about reality. We have data from observations about reality and, if it is to be considered viable, any candidate model should agree with this data. Any model that is not consistent with observations that have already been made can be eliminated.
Definition: A consistent model or consistent algorithm is a candidate model that agrees with observations about reality that have already been made.
Definition: An inconsistent model or inconsistent algorithm is a candidate model that disagrees with observations about reality that have already been made. Such a model need not be considered further and can be eliminated from the set of algorithms that can possibly make predictions that agree with the actual model.
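The elimination step can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions that are not part of the article: each candidate model is reduced to a function from a time step to a predicted value, and the observations are a short made-up list. The names observations, candidate_models and is_consistent are hypothetical.

```python
# Hypothetical observations: (time step, observed value) pairs.
observations = [(0, 0), (1, 1), (2, 4)]

# Hypothetical candidate models: each maps a time step to a predicted value.
candidate_models = {
    "square": lambda t: t * t,
    "double": lambda t: 2 * t,
    "square_until_3_then_zero": lambda t: t * t if t < 3 else 0,
}


def is_consistent(model, data):
    """A model is consistent if it reproduces every observation made so far."""
    return all(model(t) == value for t, value in data)


# Keep only the candidates that agree with all known data.
consistent = {
    name for name, model in candidate_models.items()
    if is_consistent(model, observations)
}
print(sorted(consistent))
```

In this toy case "double" is eliminated because it disagrees with the observation at step 1. The two surviving models agree with everything observed so far yet disagree about step 3, so elimination alone does not pick out the actual model.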
When we have eliminated all the possible models that could not be the actual model we are still left with an infinite set of models that are consistent with known data. What do we know about these? We should not let any prejudice affect our preferences here and it is better to start with what we do not know:
Meta-Assumption 3: The principle of statistical impartiality with respect to algorithms
Any particular consistent model is as likely as any other to be the true representation of reality in the absence of any knowledge about reality that contradicts this.
I have said that a particular algorithmic model would correctly describe reality, but we do not know exactly which one it is. What can we say? This assumption simply involves not assuming anything that we cannot justify about the probabilities of various models being correct.
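As a rough illustration of statistical impartiality, the following Python sketch assigns the same probability to every member of a set of consistent models. It assumes a finite set purely so that the numbers can be written down; the article itself is concerned with an infinite set, which is taken up under Objection 6. The function name and the model names are hypothetical.

```python
from fractions import Fraction


def impartial_probabilities(consistent_models):
    """Assign the same probability to every consistent model."""
    n = len(consistent_models)
    if n == 0:
        raise ValueError("no consistent models to choose between")
    return {name: Fraction(1, n) for name in consistent_models}


# Four hypothetical consistent models, none favoured over any other.
probs = impartial_probabilities(["model_a", "model_b", "model_c", "model_d"])
print(probs["model_a"])
```

Nothing about the models themselves, such as the length of their descriptions, enters the calculation: only the number of models still under consideration matters.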
Assumptions, Meta-Assumptions and Meta-Reality
The meta-assumptions above may appear to be assumptions about reality. It does not really do any harm to consider them as such, but if we want to be pedantic we should really describe them differently. This is why I give them the term meta-assumptions.
Why are they not really assumptions about reality? A model, which I consider to be an algorithm, is the only thing that can really contain assumptions about reality. These meta-assumptions, however, are not associated with any particular model: they relate to what we can say about all the possible models.
As an example, let us consider Meta-Assumption 3, the principle of statistical impartiality. All this really assumes is that we should use our conventional understanding of probability when considering all the possible consistent models that could be true. If we have a number of possibilities - and there is no reason why one should be more likely than any of the others - then we should consider them all equally likely. I know that this may appear to be a conventional assumption that reality follows our ideas of probability, but it cannot really be this. If it were, then this idea should be built into an appropriate model. The principle cannot be put inside a model, however, because it stands outside models and makes an assertion about how we should treat groups of models. The principle transcends individual models and says things about models in general. It is therefore not an assumption about reality: instead it is a meta-assumption. It could also be said that it is not totally correct to assert that such meta-assumptions are about reality when they do not relate to one specific model of reality but instead to the entire logical "framework" of possible models of reality - which we could maybe call meta-reality. I decided to let the article's title simply refer to "reality" for simplicity.
With all this talk of meta-assumptions and meta-reality we are clearly doing metaphysics here. Some people think that metaphysics is always meaningless with no relevance to reality. I disagree: how Occam's razor should be stated and its validity are clearly metaphysical issues that have relevance to reality.
Objections
Objection 1: Are you not assuming too much?
Answer
This is assuming a good deal less than Occam's razor. Occam's razor would actually have us favour a particular sort of model: a model with a short description, probably. What we have done here is taken all that away and simply said that any model is as likely as any other. This applies irrespective of model description length. A good way of imagining this is to think about every possible model description that can be conceived being laid out in front of you. There is an infinite set of such possible model descriptions, ranging from very short ones to very long ones, with no upper limit on the length of the description. We cannot assume that any of these is more likely than any other. If we made such an assumption we would be assuming Occam's razor and we simply cannot do that in this sort of project. Really, in the absence of any principle like Occam's razor, the assumption of statistical lack of bias is all that we can assert.
"It is all we can do" may seem weak to some readers. I would make the following points to support the case for assuming that we can use ideas of probability:
- Necessity independently of Occam's razor - We have to use the idea of probability anyway. Even after Occam's razor has been accepted, the idea of probability is needed to make sense of the world. Even with the models that we have we need to use probability to describe the possible outcomes, for example when we roll a die. In some cases we need to use probability to derive models of reality - the derivation of the ideal gas equations in physics, which is based on statistical arguments about large numbers of particles, being an example.
- No viable alternative to impartiality - If you disapprove of me saying each of the possible models is equally likely, what position do you think I should take? The only alternative would be to say that some models are more likely than others for arbitrary reasons.
Objection 2: Your answers to the previous objection were weak. You just want to assume that some "law of probability" applies in nature. You are actually assuming that a model about reality is correct. You have no grounds for this. If there is a number of possible models, and there is no way of knowing which is right, you are not entitled simply to declare them all equally likely. The fact is we just don't know.
Answer
Some people persist in thinking that we cannot make probability judgements in the absence of some extra knowledge about the situation that should tell us what the probabilities are.
As an example, let us suppose that we have two mutually exclusive outcomes and we know that one of them must happen - we just do not know which is going to happen, nor do we have any idea which is more likely to happen. People who make this sort of objection would say, "You can't say it is 50/50 - we just don't know enough to say what is more likely." They would also make a similar objection to what I have said here about models. They would say that we cannot say that every candidate model is equally likely to be the actual model and that we just do not know.
People making such objections think that if we declare the probabilities of two mutually exclusive events to be 50/50 then we are actually claiming some knowledge about the events, as if their probabilities of happening are properties of the events that we are claiming to know something about. This view shows a failure to understand what statements about probability actually mean.
Probability is not included in statements to express knowledge about the properties of things: it is actually included to express our level of ignorance. To demonstrate this, let us imagine a situation in which there are two possible events, A and B, just one of which must occur. We will look at three examples of different amounts of knowledge that we have about this situation:
For our first example, let us imagine that we have some knowledge that makes us confident in saying that Event A has a 100% chance of happening and Event B has a 0% chance of happening. In this case we are claiming to have complete knowledge about which of the events is going to happen. In fact, when we speak casually, we rarely even mention probability in such situations: we are more likely simply to say, "Event A will happen and Event B will not happen."
For our second example, let us consider a situation in which we cannot state anything with certainty, but think that we can say that Event A is 80% likely and Event B is 20% likely. In this situation we are still claiming to have some knowledge about the outcome - we think Event A is favoured for some reason - but we are not claiming the complete knowledge of the last example. When we have less knowledge about the outcome we use probability to express our ignorance about it. Probability is used to express our degree of lack of knowledge.
For our third example, let us consider the case where we have no knowledge at all about whether Event A or Event B is favoured. Here all we can do is express our lack of knowledge and we do this by saying that both Event A and Event B have a 50% chance of happening. To say anything else would be to express some information about Event A or Event B - information that we just do not have.
This point seems to be lost on some people who seem to think that statements of probability are claims of knowledge and that they should not be made in the absence of more knowledge about "what is more likely". This is strange as the probability statements are admissions of the very lack of knowledge that they think we should accept!
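The three examples above can be put side by side numerically. The sketch below uses Shannon entropy, measured in bits, as one possible way of quantifying "level of ignorance". That choice of measure is an assumption added here for illustration and is not something the article itself relies on. The function name ignorance_in_bits is hypothetical.

```python
import math


def ignorance_in_bits(p_a):
    """Shannon entropy of a two-outcome distribution with P(A) = p_a."""
    total = 0.0
    for p in (p_a, 1.0 - p_a):
        if p > 0:  # an outcome with zero probability contributes nothing
            total -= p * math.log2(p)
    return total


# The three examples: complete knowledge, partial knowledge, no knowledge.
for p_a in (1.0, 0.8, 0.5):
    print(p_a, round(ignorance_in_bits(p_a), 3))
```

On this measure the 100/0 statement carries 0 bits of ignorance, the 80/20 statement about 0.72 bits and the 50/50 statement exactly 1 bit, which is the maximum for two outcomes. The 50/50 statement is therefore the one that claims least, not most.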
One possibility for what may be psychologically going on here is that some sort of cognitive illusion [4] may be involved. When statements are made we are used to looking for some hidden layer of knowledge that gives us confidence in those statements. As an example, if a statement is made about economics we may test the statement by checking to see if it complies with principles of economics. We may then be extra careful and want to make sure that these principles are sound. We may look for some extra layer of principles that can be used to test the principle we have just used, and so on. We can generally test some system of knowledge by finding some wider system in which it is asserted. There may well be an intuitive desire to do this even when statements about probability are made, but this ignores the fact that statements about probability are generally admitting to lack of knowledge. We do not need to find some other "layer" of knowledge to justify an assertion of lack of knowledge.
For this reason, when we have a set of models, each of which is a possible candidate for being the actual model, and we have no way of knowing which is favoured, it is quite valid to say that each has the same chance of being the actual model. All we are doing when we say this is admitting our lack of knowledge. We are not assigning properties to models.
Some readers may recognise this as conforming to the Bayesian interpretation of probability.
Objection 3: You have a naive view of probability. You seem to be saying that if we have two possibilities, one of which must be true, then we can always assume that they each have a 50% chance of being true. This sort of reasoning is the sort of fallacy used to support absurd arguments like Pascal's Wager - something which I would expect you to find nonsensical. It is a surprise to see you accepting such a simplistic idea of probability.
Answer
This objection is a straw man argument: my position is not as this objection tries to make it appear. I do not think that statements involving probabilities should never be questioned! Assertions of probability values should be questioned in the following situations:
- when the set of outcomes has been incorrectly restricted, excluding some outcomes that could occur, which also has the effect of assigning inappropriately high probabilities to those outcomes which have been included.
- when the set of outcomes has been incorrectly expanded, including some outcomes that could never occur.
- when information is assumed to be correct about certain outcomes being favoured but there are actually no grounds for thinking this information to be correct and it could actually be incorrect.
- when information that is possessed about certain outcomes being favoured is ignored, leading to a probability assertion that needlessly claims less knowledge than can actually be claimed.
Not all of these situations imply that an assertion of probability is wrong. It is a little more complex than that for the second and fourth situations. Let us just consider the fourth situation:
Someone who lacks any information about which of Event A or Event B is favoured may say that there is a 50% chance of each. Let us assume that we have some information that causes us to assert that Event A has an 80% chance and Event B has a 20% chance. We will feel "correct" in any debate with someone who lacks the information. Does it mean that he/she is wrong to give the 50/50 estimate? Not really: "wrong" is a harsh way of putting it. If someone does not possess any information then admitting this in probability terms (the 50/50 statement) is all that he/she should do. On the other hand, if we were to do this, while in possession of our information about Event A being favoured, we would be making a probability statement that was needlessly conservative, in terms of assuming too much ignorance on our part. Likewise, if someone lacks the information needed to do anything but give a 50/50 estimate and we provide him/her with information about the outcome then he/she should adjust his/her probability statement accordingly.
Does this not just mean that we are "correct" and that, when confronted with our information, the person who made the 50/50 statement should admit his/her "wrongness" and defer to our judgement? Not really: even after we have persuaded him/her about the error of his/her ways and got him/her to accept our revised estimates, someone who knows some extra information (that is unknown to us) could do the same to us.
One analogy for looking at this is to imagine a probability claim as being like a bitmapped image of a scene - like the sort of image obtained by a digital camera in which the image is encoded as a grid of pixels. If we have an image of a scene then we have a certain amount of information about that scene. If someone else has the same image, but with a lower resolution (fewer pixels), then he/she has less information about that scene. Does it make his/her image "wrong"? Not really - but it does make the information possessed by him/her inferior. He/she has a lower resolution, cruder view of the scene than ours. We may feel that our view of the scene is the "correct" one, but if someone were to come along with a still higher resolution image of the scene then he/she would know more than either of us. We can consider someone who makes probability claims that do not take account of information that is known to us to have a "lower resolution" or "cruder" view of the statistics of the situation.
This does not, of course, provide any excuse for someone who makes a probability claim based on information that is wrong or claims of possession of information that are unjustified. Nor does it make it sensible for someone to ignore a good argument that provides information that has an effect on a claim.
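The revision of estimates described in this answer can be sketched with Bayes' rule, in keeping with the Bayesian interpretation mentioned earlier. The likelihood values below are invented solely so that the first update reproduces the 50/50 to 80/20 revision used in the text. They, and the function name update, are hypothetical and not taken from the article.

```python
def update(prior_a, likelihood_given_a, likelihood_given_b):
    """Return P(A | evidence) for two exclusive, exhaustive events A and B."""
    prior_b = 1.0 - prior_a
    evidence = prior_a * likelihood_given_a + prior_b * likelihood_given_b
    return prior_a * likelihood_given_a / evidence


# Someone with no information about which event is favoured.
no_information = 0.5

# Hypothetical evidence that is four times as likely if A is going to happen.
after_first_evidence = update(no_information, 0.8, 0.2)

# Someone else holds further hypothetical evidence and revises again.
after_second_evidence = update(after_first_evidence, 0.9, 0.3)

print(round(after_first_evidence, 3), round(after_second_evidence, 3))
```

Each estimate is the appropriate one for the information held at that stage. The 50/50 figure is not refuted by the later figures, it is refined by them, in the same way as a low resolution image is refined by a higher resolution one.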
Objection 4: You are assuming that reality is deterministic. You are living in the 19th century. Quantum mechanics shows that reality is non-deterministic. Reality, as we understand it, could not even be described in this sort of system.
Answer
The issue of whether or not reality is really non-deterministic, what non-determinism really means and whether or not it is a coherent concept is actually debatable.
At this stage, however, I do not even intend to deal with this objection. We need some principles to allow us to make progress with Occam's razor and if non-determinism causes any problems alterations can always be made later - for example by allowing non-determinism in the way models are described and admitting the possibility of a non-deterministic actual model. The main thing is that the general ideas on which things will be based later are reasonable.
I will return to the issue of non-determinism later.
Objection 5: You have said that any consistent candidate model is as likely as any other to be the actual model, yet you say that the only support for this statement is that we make it from a position of ignorance. This is clearly wrong. Suppose you did not understand that candidate models needed to be consistent or even realize that models can be inconsistent? You would have said that any candidate model is as likely as any other and all the probabilities of the consistent models being true would be different. Clearly, the probabilities obtained in this way would all be wrong and it shows that assigning probabilities from a position of ignorance leads to errors.
Answer
There is no problem here. Such a position would simply be a cruder version of what we have when we know that models need to be consistent.
Objection 6: You have said that any consistent model is as likely as any other to be the actual model, but there is an infinite number of possible consistent candidate models. If they are all equally likely then the chance of any one of these being "correct" is very small - actually 1/∞. Whether or not such a number means anything is debatable, but it is certain that the probability of any model being correct is for all practical purposes zero, making the whole discussion meaningless. We could multiply the probability of a particular model being correct by a million and it would still essentially be zero! This result is useless.
Answer
The result would be useless if we seriously expected consideration of a single consistent candidate model to achieve anything by itself. In later articles we will be considering groups of models and the problem will disappear then.
As an analogy, in geometry we can consider a line to be made of an infinite number of points. Whether or not "infinite number" means anything need not concern us. If someone chooses a point on a line then the chances of us guessing it by choosing another single point are 1/∞, so a similar sort of situation exists, but if we were to select a range of points - a piece of the line - then it would be meaningful to talk about our chances of choosing a piece of the line that contains this point. Similarly, when we later start to discuss groups of consistent candidate models meaningful results can be obtained. In the meantime, any discussion of single consistent candidate models is a mathematical contrivance expected to achieve a useful end - rather like the dxs and dys of differential or integral calculus.
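The line analogy can be made concrete with a small Python sketch. It assumes, for illustration only, a line of finite length from which the hidden point is chosen uniformly. The length of 10 units, the segment endpoints and the function name chance_of_containing are all hypothetical.

```python
from fractions import Fraction


def chance_of_containing(segment_start, segment_end, line_length):
    """Chance that a uniformly chosen point on [0, line_length] lies in the segment."""
    return Fraction(segment_end - segment_start) / Fraction(line_length)


# Guessing with a single point: a segment of zero width.
single_point = chance_of_containing(3, 3, 10)

# Guessing with a piece of the line two units long.
piece_of_line = chance_of_containing(3, 5, 10)

print(single_point, piece_of_line)
```

The single point has a chance of zero however it is chosen, while the piece of the line has a chance of one in five. The useful quantity belongs to the group of points, not to any individual point.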
Objection 7: What do you think you are doing? You are supposed to be trying to demonstrate that Occam's razor is correct. Instead, you have declared that any model is as likely to be the "true" model as any other. You have actually stated that Occam's razor is not true. You have left us worse off!
Answer
I fully intend to end up asserting Occam's razor, but we cannot start by assuming it. It may seem as though I am saying that it is not true, but that is only because I have removed it as an assumption. Later articles will show how Occam's razor actually does follow from this. The two problems of the probability values all being 1/∞ (that is to say, not being meaningful) and the probability values all being the same (that is to say, nothing like Occam's razor being in effect) will be resolved in the same way.
This objection does raise one interesting idea: it suggests that the justification for Occam's razor should be primarily statistical and based on the issue of which models have the greatest probability of being true. This basic idea is not too far from what we will be using later.
Conclusion
This article has added to the previous two articles in setting down some of what we need as a starting point before we can assert Occam's razor.
This article has stated the following ideas:
- Reality can be described by a single algorithm. The algorithm which would do this correctly is known as the actual model.
- We can consider many possible models - actually there is no limit on the number - as candidates for the actual model. These models will be known as candidate models.
- We know things about reality - things we have observed and the results of experiments that we have done. Any candidate model that disagrees with such knowledge is an inconsistent model and need not be considered further as a possible candidate for the actual model.
- After inconsistent models have been discarded the only candidate models that remain are consistent models: they agree with previous observations of reality.
- Any of the consistent models could potentially be the actual model, but we do not know which one actually is.
- In the absence of any reason to think otherwise we can only say that each consistent model is equally likely to be the actual model.
If we expect a statistical justification of Occam's razor, this may seem to leave us worse off than before. If we want to show that Occam's razor is viable then we need a reason for thinking that some models are preferable to others and the assumption that all models are equally likely to be "true" may appear to contradict this. This is because we are being careful not to assume Occam's razor at this stage, but instead will be obtaining it later and the problem will be resolved in future articles. For now, it is important that no unnecessary assumptions are made.
References
[1] Almond, P. (2005). Occam's Razor Part 1: What Is Occam's Razor? Retrieved 22 August 2005 from http://www.paul-almond.com/OccamsRazorPart01.htm.
[2] Almond, P. (2005). Occam's Razor Part 2: Principles of Language. Retrieved 9 October 2005 from http://www.paul-almond.com/OccamsRazorPart02.htm.
[3] Almond, P. (2005). What is a Low Level Language? Retrieved 17 July 2005 from http://www.paul-almond.com/WhatIsALowLevelLanguage.htm.
[4] Piattelli-Palmarini, M. (1996). Inevitable Illusions: How Mistakes of Reason Rule Our Minds. Re-issue. John Wiley and Sons. (Originally published 1994, in Italian, as L'Illusione Di Sapere.)