Is a General Theory of Intelligence Possible?

[Figure: int_diag_2]

Statistical thinking is merely an artifact of human ignorance. We need to use statistics; nature does not.

-- Pierre-Simon Laplace

The orthogonality thesis speaks not of rationality or reason, but of intelligence.

-- Nick Bostrom, Superintelligence: Paths, Dangers, Strategies

How much intelligence does it take to solve Fermat's Last Theorem?

-- George Hotz

Introduction

At the time of writing this note, it is beyond any doubt that we are living through the dawn of artificial intelligence. Just how high the sun will rise remains to be seen, but it is also beyond any doubt that a lot is at stake and the world as we know it will likely change irrevocably after this inflection point in our technological history as a species.

The word "intelligence" carries with it the force behind most of the worries that we might have about it. If some system is intelligent, that means it can change its own environment, solve complex problems (possibly better than humans), adapt, and dominate competitive niches. We ought to respect this prospective competition, even at this early date. While today's AIs are super-human at narrow ranges of tasks, no current AI has been clearly demonstrated to be generally intelligent. It is general intelligence that this note is concerned with.

Regardless of your stance on the current trajectory of AI technology, I believe there is one question that we should at least try to answer, or at least consider the nature of, as we continue the breakneck-paced development of artificially intelligent systems. That question is: What is intelligence? More precisely:

Can we have a general, predictive, and quantitative theory of intelligence?

When is an AI system generally intelligent? What degree of intelligence does it have? Why is AI A smarter than AI B? Is this AI smarter than a squirrel? A dog? How much smarter is some system than us? These are all questions that we might be able to answer more precisely with such a theory.

Let's define some terms at the outset. I will use a very broad working definition of intelligence:

A system is generally intelligent if: given a goal or set of goals, it is capable of accomplishing those goals with a broad array of solutions, a broad array of types of solutions, and recursively so for sub-goals. (Meaning it can have goals and sub-goals, each of which it is capable of accomplishing with a broad array of solutions and a broad array of types of solutions.)

(Clearly, there are many psychological theories of intelligence. This definition loosely glosses over them as my purpose here is a bit different.)

If we want to explain and predict intelligent systems we need a theory of intelligence that is general, predictive, and quantitative:

  • I will take a general theory to mean some theory that applies to a very broad array of cases, across different species and different substrates (biological or machine). A theory of dolphin intelligence or LLM intelligence is not a general theory of intelligence. A theory that can attribute intelligence to almost any system is.

  • I will take a predictive theory to mean some theory that can tell us whether some phenomenon is an instance of intelligence with some reasonable level of confidence. We have a predictive theory of "water", for example: it is H2O. We see behavior that tells us that we have two hydrogen atoms and one oxygen atom in the right configuration, and we know we have water.

  • I will take a quantitative theory to mean that a mathematical model with well-defined terms can explain the phenomena and make predictions. For example, we have a predictive and quantitative theory of electromagnetism. If we know the value of certain vectors, we can compute the rest with a high degree of accuracy.


I will argue that there are reasons to doubt that such a theory of intelligence is possible. These reasons are not overwhelming, but they prompt methodological considerations in other directions. I will also argue that if we had such a theory of intelligence, it would be unlikely that it would be sufficiently predictive over a broad array of cases. I will first introduce several positions on the possibility of a theory of intelligence, show some reasons for doubting the possibility of such a general theory, and explain why narrower theories of intelligence (e.g. theories of intelligence for one particular type of system) are more likely to be predictively useful. On the basis of these claims I will argue that instead of searching for a fundamental theory of general intelligence, we should search for "intelligence constraining factors". I will conclude by showing how the arguments here apply to a broader class of concepts that contains 'intelligence'.

Why Do We Care?

Why might such a general, quantitative, and predictive theory of intelligence be important? Why is something like the working definition above not good enough? The simple answer is that it cannot predict intelligence from the simpler properties of a system and therefore can really only be applied post hoc, once we see the system behave. Imagine living on a planet with water and schwater. Schwater is behaviorally identical to water except it explodes when it contacts certain bacteria found in the human gut. We would want to have a theory of what distinguishes schwater from water, in exact detail, as soon as possible. Obviously, we can tell someone drank schwater and not water if they explode, but what good is the knowledge at that point? I believe the analogy holds for where we stand with AI. If there is some turn-key element of information processing that makes a system intelligent, we want to be able to measure and distinguish that before the starting pistol fires, for both doing good and preventing bad.

Some Model or No Model?

There are two routes we can go with the question of whether there could be a general, quantitative, and predictive theory of intelligence. I will call them the 'Some Model View' and the 'No Model View'.

The Some Model View claims that at least one mathematical model can map a well-defined type of information processing to the concept of intelligence and remain predictive, allowing for a reasonable number of counterexamples or edge cases.


A predictive model is not the same as a descriptive one. I can describe a basketball game play-by-play in perfect physical detail, but no two games are alike. We need generality for a model to be predictive. Notice that the Some Model View seems committed to the multiple realizability or substrate independence (in principle) of intelligent systems, in the idea that some purely mathematical theory, presumably one of information processing, could explain their behavior.

The No Model View claims that there can be no general, predictive, and quantitative model that would count as a theory of intelligence, so defined.


For some theory of intelligence to meet the criteria laid out here, it seems like it would have to exist at a lower level of description than a behavioral one like the working definition given in this note. I think there is little doubt that we can have a definition like this note's working definition, but its behavioral nature seems to preclude it from being precise, and therefore from being quantitative and predictive. It also precludes it from applying to systems whose behavior we cannot interpret saliently. The question at hand is not whether we can estimate intelligence post hoc, based on some loose definition and intuitions, but whether we can predict its presence in any system based on some lower-level behavior of that system.

Some Kind or No Kind?

It is a separate question from there being a model or theory that can quantitatively and predictively explain intelligence, whether or not there is a real kind that underlies intelligence. To clarify, something has a kind when it has clear identity criteria. A "Royal Flush" has a kind. One particular set of phenomena is called a royal flush and it has clear identity conditions. A hand of standard playing cards is a royal flush if and only if it consists of the five highest-ranking cards in a standard deck of playing cards, all of which belong to the same suit. "Whimsy" on the other hand is a concept that we can confidently say has no kind. Whether a person is whimsical is not a matter of clear and well-definable criteria.

I will largely steer clear of the question of whether intelligence has a kind, and whether it is more like a royal flush or whimsy. I think this question is a moving target, as the definition might shift along with the adoption of a new theoretical framework. There most likely is not a kind for the folk-concept of intelligence, as we use the word in everyday life, but a more precise concept may capture lots of what that folk one does, and it is more likely to have some real kind. I will remain largely agnostic on whether or not intelligence has a kind, though admittedly I lean strongly in the direction that it does, but perhaps only at a high level of description.

A Note on Scientific and Folk-Concepts

What does "intelligence" mean? Can that meaning change over time? At some point, scientific and folk-concepts have to diverge for scientific progress to be made and for folk-talk to resume, unimpeded by tedious clarification or changes in idioms. I have used the example of time in these notes before to illustrate this point. We can safely say that it is the same time on the other side of the room or on the moon as it is where I sit right now, and also acknowledge that we use the concept differently when talking about near-light-speed travel and real simultaneity. Folk "time" is not relativistic "time". Intelligence will doubtless be the same if we end up with a rigorous theory of how it works. If we had a more rigorous theory, we might, using the folk-concept, say that parakeets are intelligent, while in the back of our minds we know that parakeets don't make the cut and some GPU farm doing the right kind of information processing does.

The Some Model View

The Some Model View of intelligence claims that there is some well-defined mathematical model that can underpin our understanding of intelligence in a general and predictive way. I want to be clear that this model or theory would not have to be immune to all counterexamples; it would merely have to be a very, very strong predictor of a system's intelligence and come to be the meaning of "intelligence" when used in a more rigorous scientific sense.

Photosynthesis is an example from the biological world where we already have a Some Model theory. There is a model of photosynthesis, and it is described by this equation:

CO2 + 2H2A + photons → [CH2O] + 2A + H2O

Photosynthesis is any reaction where carbon dioxide and an electron donor are converted into carbohydrates, an oxidized electron donor, and water in the presence of light energy (photons). If we see that process, which we can rigorously test for, we've got photosynthesis. There are weird edge cases, of course, and other processes in some archaea that are like photosynthesis but not quite, etc. That's OK. This is a predictive model of photosynthesis because we can show up to some green thing harvesting sunlight, really tell whether it is photosynthesizing, and predict its behavior. The model is general: something need not be a plant for it to be photosynthesizing; it just has to host the above chemical reaction. Additionally, the model is quantitative because we can use a mathematically precise theory to show that it holds true. Compare this with a merely descriptive account where we sit organisms in the sun and see whether or not they die. While that might describe lots of organisms that photosynthesize, it does not predict photosynthesis.

Considering the general, predictive, and quantitative model of photosynthesis, could we have something analogous for intelligence? In a very interesting stream, one which was the tipping point for me writing this note, tech-wizard and renowned hacker George Hotz asks:

Thermodynamics is to Energy as ??? is to Intelligence

This is a truly fascinating question. Hotz also points out that we don't have a basic theory of intelligence, yet we are cranking out AI at a blinding pace. My question is different from Hotz's, but it shares a lot of common ground. Something analogous to thermodynamics, perhaps harnessing information theory, computer science, and who knows what else, could be how we get a photosynthesis- or thermodynamics-esque theory of intelligence.

Hotz makes the following suggestion via an analogy to "Horse Power" (which literally referred to horses at one point):

"Person-years" (7300 PPFLOP-days)

FLOPS (floating-point operations per second) are a measure of how fast a computer can perform calculations. A floating-point operation is a type of mathematical calculation dealing with numbers that can have both whole and decimal parts (e.g. 7.233). Peta, denoted P, is 10^15 (1,000,000,000,000,000). According to some credible estimates(?), the brain performs between 0.9 × 10^16 and 33.7 × 10^16 FLOPS, or roughly 9–337 PFLOPS. Can we measure intelligence with FLOPS?
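For concreteness, here is the arithmetic behind Hotz's figure as a minimal Python sketch. The 20 PFLOPS brain throughput is my own assumption, chosen because it sits inside the estimate range above and reproduces the 7300 number:

```python
PETA = 1e15                       # peta = 10^15

brain_flops = 20 * PETA           # assumed sustained brain throughput (FLOP/s)
days_per_year = 365

# One PFLOP-day = sustaining 10^15 FLOP/s for one day, so a "person-year"
# of compute at this assumed throughput is:
person_year = (brain_flops / PETA) * days_per_year
print(person_year)                # 7300.0 PFLOP-days, matching Hotz's unit
```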

This approach is no doubt going to have all kinds of strong counterexamples and is very computer-centric. On what epistemic basis do we know that FLOPS is a good measure of intelligence? Surely it will be correlated, but we need a bit more than that to really have a theory of intelligence. We can easily imagine a system with an astounding magnitude of FLOPS that just performs some demanding but narrowly scoped task like cracking hashes. My intention is not to go down the rabbit hole with the FLOPS theory; it is just to give an example candidate for what a theory of intelligence might look like if we think the Some Model View is correct. Perhaps FLOPS plus some level of complexity, recursion, and some other secret ingredients are enough for a theory of intelligence, but there are good reasons to doubt that, as the next section will show.

[Figure: horse_wires]

The No Model View

The No Model View of intelligence claims that there is no well-defined general, predictive, and quantitative model that can underpin our understanding of intelligence. There could be many reasons for adopting the No Model View. Perhaps we think that "intelligence" is just too vague a concept, that it is inherently mysterious, or that it is a strictly high-level concept that can only be applied on the behavioral level (post hoc) and at no finer-grained level of detail. Maybe there is no breaking it down into the behavior of a Universal Turing Machine (UTM), a theoretical machine capable of computing anything that is computable, and no breaking it down into some type of physical theory either.

I am presently in favor of the No Model View over the Some Model View. I am a "Some Kind, No Model" intelligence theorist, when speaking of intelligence generally, in any system. How can there be a kind, but no model? The existence of a kind does not entail either that it is useful in capturing the folk-concept or that it is sufficient for allowing predictive models. To clarify, I do not claim that the folk-concept of intelligence as it currently stands has a kind (I doubt that very much). I rather believe that there is some more rigorous definition yet to come (or never to be discovered) that describes some kind and sufficiently captures the folk-concept. I believe that there could be some descriptive account of intelligence, and perhaps predictive theories for narrower meanings of intelligence for specific types of systems, but I think it is unlikely that there will be a predictive and general account of intelligence. I will introduce three reasons to be skeptical of the Some Model View (which is an affirmation of the No Model View).

The first is that higher-level (behavioral-level) concepts are complex and supervene on lower-level phenomena. This means there can never be a change in the higher-level phenomena (intelligent behavior) without a change in the lower-level phenomena (bits flipping or particle behavior), but there can be a change in lower-level phenomena without a change in the higher-level phenomena. Our attribution of the higher-level concept of "intelligence" to the phenomena is what counts as its being an instance of intelligence, and that seems to suggest that the higher-level concepts are really in charge (for now). As long as that remains the case, the theory is hardly more predictive than our high-level estimations are in the first place, unless the lower-level theory begins to take the wheel.

The second and related reason is that computations themselves, as instances of information processing, are multiply realizable. Information processing can happen in a circuit board and in the brain. A given computation, like "add two integers", can happen in many ways as well: by summing the lengths of lines or by counting marbles, for example. There is no one organization of bits in a UTM, or anywhere else, that counts fundamentally as the computation "add two integers". I argue that this is a powerful reason to doubt that there could be any fundamental computation for intelligence.

The third reason to be skeptical of the Some Model View is that the complexity of a hypothesis is usually inversely related to the generality of its predictions. We need complex models to model complex phenomena (if we want to model them on a lower level), which is exactly what a theory of intelligence seems like it needs to do.

High-Level Concepts and Supervenience

The Some Model View hinges on the ability to somehow map lower-level attributes onto higher-level ones. This can be easy or tricky depending on the concepts at play, even assuming the concepts are clearly defined (e.g. not whimsy). Heat is somewhat straightforwardly the kinetic energy associated with the motion of some atoms and molecules. We can tell a system is "hot" based only on information about the atoms and molecules. Heat supervenes on these lower-level properties of molecules and atoms, as intelligence does on whatever lower-level phenomena underlie some instance of intelligent behavior.

In the case of "heat", we can imagine that at one point we might have checked whether "the kinetic energy of atoms and molecules" was a valid theory of "heat" (where "heat" could previously only have meant "felt heat") by feeling a liquid and then taking some measurements. We can now check whether something that "feels hot" is in fact "hot" by checking the kinetic energy of its atoms and molecules. Metals with high thermal conductivity feel hotter or cooler than they really are, so we can even flip the old concept completely backward: this seems to be hot, but it can be shown to be the same temperature as your skin. What justifies this shift from folk-concept to scientific concept, and is it reasonable to expect this shift to be possible for the concept of intelligence?

Let's say we have a model like the Some Model View claims we can have. Suppose we start making predictions with it. How do we know our model has captured intelligence and not something else? The ultimate criterion for "intelligence" being what we are tracking is a higher-level criterion, until or unless "intelligence" comes to mean something beyond the folk-concept. So either the concept changes in meaning, like heat, or the attribution of intelligence to some system just is a higher-level attribution. That would mean that any lower-level model that predicted intelligence being there would still face the tribunal of the higher-level criterion. Could a conceptual paradigm shift, like that of heat, also occur for the concept of intelligence?

To make the situation clearer, let's imagine that someone comes up with a theory of intelligence I that is fairly comprehensive when it comes to artificial systems, but it declares that monkeys and octopuses are not intelligent. We might then use the higher-level criteria of something like this note's working definition of intelligence to claim that I is just not a theory of intelligence. It is tracking something, maybe, but not intelligence. There is, at present, no other way of verifying that than checking against our intuitions about the meaning of "intelligence". That does not mean there can never be a lower-level theory that tracks something close to what "intelligence" tracks, but it does not give us strong reasons to believe there will be either. Until I reaches an overwhelmingly critical mass of success among cases where the higher-level attribution is the judge, it will not really be a theory of intelligence.

Let's now suppose that I correctly attributes intelligence to all expected cases with the exception of birds. At that point, birds may be to the folk-concept of intelligence as things with high thermal conductivity were to the folk-concept of heat. A paradigm shift occurs when I achieves enough success in explanation and prediction that we begin to measure the folk-concept against it, rather than judging success solely by the folk-concept. At this point, when the comprehensive theory has a few obvious counterexamples when compared with the higher-level model, it will likely cause a divergence of the folk-concept from the new theoretical one. This is what we saw with heat: the theory of heat via kinetic energy reached a critical explanatory mass in physics and everyday life, and the folk and scientific concepts diverged. At present, it does not seem like we even have a roadmap for a technical theory of intelligence that could get us to this critical-mass point, but this is exactly how we would expect things to look before a potential paradigm shift. We should remain agnostic for now, but acknowledge the challenges for a potential theory of intelligence.

Higher-level concepts tend to be coextensive with many complex lower-level descriptions, as the supervenience relationship entails. This one-to-many complex relationship makes it seem unlikely, though not impossible, that a predictive model could glean something from complex lower-level behavior that would get us to the high-level concept. The more complex the concept, the less likely it seems. We know the information that higher-level models compress is not random, by virtue of it being compressible, but that does not mean that it has to track any lower-level pattern by any necessity. It is important to note that the supervenience relationship alone does not entail reducibility or laws. We might have 'intelligent' systems and no law or definition by which we can predict intelligence from the lower-level phenomena alone. If there is no law, then a successful model is far less likely, and we can be more skeptical of any paradigm shift analogous to the heat case.

[Figure: CROW]

The Multiple Realizability of Computation

We might want to try to pin down intelligence as information processing which is reducible to certain types of complex computations. I will argue that another reason to doubt the Some Model View is that what counts as a given computation might have no determinate fundamental case. That means there is very likely no "one computational type" that corresponds to intelligence. If there is no such "motion(s) of the bits" that constitutes intelligence, then if there is a Some Model View of intelligence, it cannot be at the low level of information processing. This abstraction to the level of information is important because if a lower-level theory of intelligence were to make sense across substrates (circuit boards, neuron activity) information would be the common denominator. If a given computation cannot definitively be a certain type of information processing, then there is reason to doubt that any lower-level substrate-independent theory is possible.

If there are fundamental forms of a given computation, that would mean that there are certain types of behaviors of bits in a system that count as fundamental cases of that computation. If there is a "fundamental computational model" for an extremely simple operation, maybe we can expect there to be one for a more complex operation. I will argue the reverse as well: if there is no "fundamental computational model" for a simple operation, then there almost certainly is not one for an operation as complex as we could expect intelligence to be.

Let's take integer addition for our simple example: 5 + 5. There are many ways a UTM could add two integers (many ways to computably add two integers). Presumably, if there is a "fundamental" way to compute something, a UTM should be the first place to look, as it is the simplest theoretical computing machine.

  0101   (5 in binary)
+ 0101   (5 in binary)
--------
  1010   (10 in binary)

This can be done by binary addition: checking places from right to left and adding the digits that occupy the same place. If the two bits are 1 + 1, the result is 10, so that place becomes a 0 and the 1 carries over to the next pair of bits to the left. Let's call any computation that has the correct solutions for the addition of any two integers from the set of all integers an integer addition computation, a computation that counts as 'integer addition'. It is an integer addition computation if and only if it has the correct solution set for all pairs of integers.

  0101
+ 0101
--------
  1010

The right-most places are both 1, so that place becomes a 0 and a 1 carries to the second place from the right; the same carry operation repeats at the third place. The bottom row, 1010, is the result.
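For readers who prefer code to diagrams, here is the same right-to-left, carry-the-one procedure as a minimal Python sketch (the function name and the bit-list representation are mine, purely for illustration):

```python
def add_binary(a_bits, b_bits):
    """Add two equal-length binary numbers given as lists of bits,
    most significant bit first, e.g. 5 -> [0, 1, 0, 1]."""
    result, carry = [], 0
    # Walk the places from right to left, as in the diagram above.
    for a, b in zip(reversed(a_bits), reversed(b_bits)):
        total = a + b + carry
        result.append(total % 2)   # the bit that stays in this place
        carry = total // 2         # the bit that carries one place leftward
    if carry:
        result.append(carry)       # a final carry opens a new place
    return list(reversed(result))

print(add_binary([0, 1, 0, 1], [0, 1, 0, 1]))  # [1, 0, 1, 0], i.e. 10
```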

Binary addition, as in the diagram and sketch above, is how a UTM could simply do "integer addition", but it is not the only way to have an integer addition computation. Suppose we had a lookup table like this:


| + | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|-----|----|----|----|----|----|----|----|----|----|----|
| 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 |
| 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
| 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 |
| 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 |
| 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 |
| 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 |
| 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 |
| 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 |
| 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 |
| 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 |

Now, our UTM can look through this table, stored in some binary format, and come up with sums for each pair. Or it could combine sets, compare line lengths, or do any other mathematical operation (however complex) that is equivalent to addition, and so on. There are infinite possibilities. They are all integer addition computations. Is any one of them the fundamental computation for integer addition in some sense?

We might want to say that only the one that uses the fewest bits or fewest operations is the real computation that should count as an instance of integer addition, or be fundamental. But why should it be more fundamental? The UTM's binary addition might be said to be the most fundamental because all others in the set of integer addition computations can be shown to reduce to it or to be special cases of it. But in what sense is the lookup table a special case of binary addition in a way that does not apply vice-versa? Maybe if I can show that the process has certain mathematical properties, I can instantly know it is a special case of binary addition. But what properties are those, other than having the right solutions? The identity of integer addition computations is purely extensional.
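To put the extensionality point in concrete terms, here is a hedged sketch (the function names and the ten-by-ten domain are mine) in which a lookup-table adder and a carrying adder are distinguished by nothing except their shared outputs:

```python
# Two very different "integer addition computations". In a physical
# lookup-table machine these entries would simply be stored; building
# them with `a + b` here is only a convenience.
table = {(a, b): a + b for a in range(1, 11) for b in range(1, 11)}

def add_by_table(a, b):
    return table[(a, b)]           # pure lookup, no carrying at all

def add_by_carry(a, b):
    while b:                       # bitwise ripple-carry: XOR sums, AND carries
        a, b = a ^ b, (a & b) << 1
    return a

# Extensionally identical over the table's domain, and that shared
# solution set is the only thing that makes both count as addition.
assert all(add_by_table(a, b) == add_by_carry(a, b)
           for a in range(1, 11) for b in range(1, 11))
```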

To make the case even clearer, allow me to introduce one more admittedly silly example: the Integer Addition Rube Goldberg Machine (IARGM). You give me two numbers; I drop a marble in a hole based on the first number and another marble in another hole for the second, and a hilarious chain of events ensues in which both marbles roll down ramps, light matches, knock over dominoes, etc., until finally I get some output number of marbles from some contraption that represents the solution. You give me any two integers; I give you a number of marbles of some color or other that represents the solution. My wacky machine is never wrong. Now here's the challenge: is there any property other than the solution set (the extension) that gives the actions of my ridiculous IARGM the rightful title of integer addition computation? I think not. Even if there is, that underlying theory is probably so complex that it fails to be predictive or general. I will say more on that in the coming sections.

I think this attempt to operationalize concepts is barking up the wrong tree. Maybe the level of "raw bits" is too low, and something mathematically precise, within reason, but slightly more abstract is the right tool in the toolkit. The proof seems to be nowhere but the pudding for higher-level concepts like intelligence. So it seems that whether we use bits or FLOPS or IARGMs, we end up with the same problem. Any candidate theory of intelligence needs to at least climb the ladder of the high-level concept, even if it can eventually kick it away. Starting from the ground floor seems impossible.

[Figure: RubeG]

Complexity, Simplicity, and Prediction

What if the Some Model View is correct and there is some pattern in the data that can generally, predictively, and quantitatively define intelligence, but it is just intractably complex? Intelligent systems are no doubt complex phenomena, like stock markets or weather systems, so any successful theory will in some sense reflect that complexity, whether by detail or abstraction. I have two points to make here: one is to compare other disciplines' approaches to these complex phenomena, and the other is to make some probabilistic claims about simplicity, complexity, and prediction showing that narrower, system-dependent theories of intelligence are far more likely to be useful than a general theory, if such a general theory is possible.

Suppose we are studying hurricanes, another complex adaptive system. Our goal is to predict hurricane paths and formations, mitigating disasters. It would be a bad idea to begin looking for the hurricane at the low level of individual anemometer measurements or individual temperature readings and to try to amass them into a lower-level theory of hurricane formation. Perhaps that strategy seems reasonable at first glance, but oftentimes all of those small lower-level details line up and no hurricane forms. Instead, we look for higher-level patterns and work backward. There might be some exhaustive set of Laplacean physical details that is a quantitative and predictive theory of hurricanes, but assembling those details is not how we predict them. We start from the high level and work down. Atmospheric scientists come up with a rough list of hurricane-enabling factors and apply predictive models to existing weather systems at a high level of abstraction.

The "Hurricane Model" we apply might only ever be a black-box prediction algorithm too detailed and full of ad hoc details to ever be human-readable. The same might be true of intelligence. In fact, one could argue that each of our brains has a black-box theory of intelligence that evolution has made handy for us to interpret other systems with. We can predict intelligence, but we cannot readily know exactly how our brains are doing that. Black box algorithms are complex beyond human readability. This is not a problem for model-predictiveness. However, as we continue up the sliding scale of complexity, problems start to arise.

Even supposing we had a near-perfect black box algorithm, there is a further problem. It is a model-fitting problem: the better our model fits the data, the worse it will be at making a broader range of predictions. Even the best hurricane model is likely an "earth hurricane model". There is an inverse relationship between simplicity (and therefore broader applicability) on the one hand and predictive accuracy on the other. Some information is lost between the representation of the data (the model) and the data itself. This relationship can be demonstrated with Akaike's Information Criterion and with some simpler Bayesian examples as well. I will keep it even simpler here. Let's imagine two extreme hypotheses about intelligence to illustrate this point about model fitting:

Super Fancy Hypothesis (F): Intelligence is {{ Insert a perfect Laplacean description of every intelligent system that ever existed or will exist in the entire history and future of the universe }}

Super Simple Hypothesis (S): Intelligence is having a working human brain.


We might immediately say that F is better, but not so fast. What if we replay the entire history of the universe, only some things are slightly different? F has an incredibly large set of adjustable parameters in its model. It takes account of every single atomic particle's behavior from the dawn of the universe till the end. When we replay the tape, F will say, "Wait a minute, particle 112,313,364,335... is not there. Well, no intelligence here." Obviously S is a terrible hypothesis too, because there are clearly animals right alongside us that are intelligent, and so on. The model we are looking for needs to balance between S and F such that it is still predictive but does not have too many obvious counterexamples in the data. Too much detail reduces prediction to description.
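This trade-off is easy to reproduce in miniature. The sketch below (the data, noise level, and polynomial degrees are all invented for illustration) pits a modest model against an F-style over-parameterized one: the flexible model fits its training data better and predicts held-out data worse. Akaike's criterion, AIC = 2k - 2 ln L, formalizes exactly this penalty on the parameter count k.

```python
import numpy as np

rng = np.random.default_rng(0)

def world(x):                      # invented ground truth: a simple quadratic
    return 1.0 + 2.0 * x - 0.5 * x**2

x_train = np.linspace(0, 4, 12)
y_train = world(x_train) + rng.normal(0, 0.4, x_train.size)
x_test = np.linspace(0, 4, 100)
y_test = world(x_test) + rng.normal(0, 0.4, x_test.size)

for degree in (2, 9):              # a modest model vs. an over-fitted one
    coeffs = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree {degree}: train MSE {train_mse:.3f}, test MSE {test_mse:.3f}")
```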

Let's revisit the first sentence of this section:

What if the Some Model View is correct and there is some pattern in the data that can generally, predictively, and quantitatively define intelligence, but it is just intractably complex?


Is an intractably complex model actually a theory? At some point, it is not. Spurious correlations illustrate this point extremely well. The divorce rate in Maine has a correlation coefficient of r = 0.99255 with the U.S. per capita consumption of margarine. Some Laplacean hypothesis can be imagined that links these two things together, but that hypothesis is virtually guaranteed to be predictively useless. (Under certain physical models, there need not actually be such a Laplacean hypothesis.)
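A quick simulation shows how cheap such correlations are. In the sketch below (both series are invented), two causally unrelated quantities that merely share a downward drift produce a correlation coefficient rivaling the margarine one:

```python
import numpy as np

rng = np.random.default_rng(1)
years = np.arange(2000, 2010)

# Two causally unrelated quantities that both happen to drift downward.
divorce_rate = 5.0 - 0.10 * (years - 2000) + rng.normal(0, 0.05, years.size)
margarine_lbs = 8.2 - 0.40 * (years - 2000) + rng.normal(0, 0.20, years.size)

r = np.corrcoef(divorce_rate, margarine_lbs)[0, 1]
print(f"r = {r:.3f}")              # typically > 0.9: impressive, and meaningless
```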

The more complex a phenomenon, the more complex a theory is needed to account for it (if we want a lower-level description, e.g. at the "bits" or "particles" level), and the less likely it is that a highly detailed theory of that phenomenon will be predictive. If we want to predict from the low level whether intelligence (defined, at the outset, as a high-level pattern) is there, we should expect the model that makes that prediction either to be very inaccurate generally or to fit only a very narrow set of observations well. This conclusion does not bode well for any theory attempting to graft law-like relationships between high-level concepts and low-level phenomena. To avoid being merely descriptive, the model must be compressive. It must simplify. If it is too simple, it does no better than the high-level description and has a plethora of counterexamples. If it is too complex, then it becomes increasingly less likely that the model will be predictive over a general set of phenomena. This is by no means an impossible line to walk, but it points towards domain-specific theories of intelligence being far more likely to be predictive than general ones.

[Figure: circuit_face]

Conclusion: Constraining/Enabling Factors & Narrow Theories

There is an allure to what I have called the Some Model View of a potential theory of intelligence. However, there are reasons to doubt that something like it is attainable or even worth chasing. In summary, these are the reasons why a Some Model View theory of intelligence seems implausible or not worth pursuing:

  • High-level concepts and paradigm shifts: Typically, explaining high-level concepts via lower-level ones is done when some lower-level theory achieves a critical mass and is able to outrun the folk-concept, predictively speaking. We have yet to see any indication of this for intelligence.

  • Supervenience: Explaining high-level concepts in low-level terms has the inherent difficulty of the high-level concept supervening on the low-level one which makes the existence of laws or generalities from the low to high level dubious.

  • The proof is only in the pudding: The lower-level behavior of a system constituting even a simple computation is purely extensional. The existence of particular 'behaviors of bits' that underlie intelligence seems dubious.

  • Model fitting: Extremely complex models become either poorly predictive or narrow in scope as the adjustable parameters of the model increase.

The entire inquiry into the viability of a general theory of intelligence is motivated by the desire for prediction, both for our safety and innovation. The primary goals that we should have with the development of AI are:

  • Safe development (being one step ahead whenever possible).
  • Creating AI technology that helps make the world a better place, not a worse one.

Something like a general, quantitative, and predictive theory of intelligence would help us greatly in achieving these ends. However, given the strong reasons to doubt the possibility of such a theory, I would like to propose two alternatives.

Alternative Approaches

The first is system-specific or domain-specific intelligence theories. We may be able to have quantitative and predictive but highly specific theories of intelligence. Starting from the high level, it may be easy to show that once a certain metric is reached or a certain ability enabled, a specific type of system skyrockets in its ability to satisfy high-level criteria for intelligence. Through experimentation, we may be able to advance causal hypotheses about what 'intelligence' means for that type of system. For example, perhaps we will be able to show that computers of a certain organization and processing power are intelligent under specific circumstances, or that neurons in a certain configuration and in sufficient number make an organic brain intelligent.

The second alternative is searching for general Intelligence Constraining (or Enabling) Factors. We likely cannot have laws that bridge the gap between information processing and goal-directed solution-finding. We also likely cannot have well-defined sets of metrics that constitute necessary and sufficient conditions for intelligence. What we can have, generally and for any system, is an account of which factors widen or tighten the bottleneck. All AI may be tightly bottlenecked by FLOPS or by energy flux through the system, for example. General intelligence may struggle to develop without multimodal training data or without some kind of bottom-up process. These might be examples of constraining or enabling factors.

Even system-specific theories of intelligence will be highly complex, and finding and understanding these Intelligence Constraining (or Enabling) Factors will be no simple task either, but both are far more expedient and conceptually salient than a general theory. This is where we should focus in order to keep developing incredible technology and stay safe.

Mind Concepts

Almost everything I have said here with regard to a "theory of intelligence" applies equally well to a "theory of consciousness" or a "theory of minds". Whether a system is conscious or has a mind also depends on complex high-level concepts that are still in their scientific infancy. Turning the thoughts in this note toward these more general mind concepts, we should be open-minded about paradigm-shifting theories that alter their meaning, understand that the concepts are complex, accept that there may not be psycho-physical laws, and not cling too tightly to counterexamples from our folk-theories.