Yampolskiy usefully lays out a new impossibility result for explainability, one of the “important requirements for intelligent systems deployed in real-world domains.” To summarize the argument: the complexity of an AI can be high enough that explaining it to a limited human mind is impossible, because the AI is larger than the mind trying to contain it, in the sense Demski and Garrabrant use when discussing embedded agents. While correct, this argument runs exactly at odds with the poorly understood reasons why deep learning succeeds in the first place: the structures discovered are in some sense “already there” to be found.
The expressiveness of deep learning models is almost always sufficient to overfit the available training data, yet such models are empirically successful at out-of-sample prediction and extrapolation. It seems that it is in some sense “easy” to learn some domains with NNs. (Unsurprisingly, perhaps, these tend to be the domains where humans were initially surprised machines lacked intuition; they are things we do easily, which are by definition the types of things that neural networks, in both the original sense and the modern computational-model sense, excel at doing.) This leads to the conclusion that DNNs are appropriate because they are similar to their creators; they are not too distant in the space of all possible minds. To support this claim, for example, some work has found that it is usually possible to train smaller models to emulate larger ones once the larger model is trained. This works because the underlying structure being discovered is less complex than the model used to discover it, which must be true: otherwise, the expressive power of DNNs would make overfitting so easy that generalizing would be impossible.
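The smaller-model-emulating-larger-model observation can be sketched in miniature. In this toy example (all choices of model, degree, and data are illustrative, not from the original work), an over-parameterized “teacher” is stood in for by a degree-9 polynomial fit to data whose true structure is linear; a far smaller “student” is then fit to the teacher’s outputs rather than to the original labels, and closely reproduces the teacher because the structure the teacher discovered is simple:

```python
import numpy as np

rng = np.random.default_rng(0)

# Data whose true underlying structure is simple: y = 2x + 0.5 plus noise.
x = rng.uniform(-1, 1, 200)
y = 2.0 * x + 0.5 + rng.normal(0, 0.05, 200)

# Hypothetical "teacher": an over-expressive degree-9 polynomial model.
teacher = np.polynomial.Polynomial.fit(x, y, deg=9)

# "Distillation": fit a much smaller student (degree-1) to the teacher's
# *outputs* on a dense grid of query points, not to the original labels.
x_query = np.linspace(-1, 1, 500)
student = np.polynomial.Polynomial.fit(x_query, teacher(x_query), deg=1)

# Because the structure the teacher found is simple, the student tracks it
# closely everywhere on the domain.
gap = np.max(np.abs(student(x_query) - teacher(x_query)))
print(f"max student-teacher gap: {gap:.3f}")
```

The point is not the specific models, but that the distillation succeeds exactly when the discovered structure is less complex than the machinery that discovered it.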
To explain this, it is useful to note a fact which may be obvious to others: the complexity of an AI system is bounded jointly by three factors, namely the expressive power of the model, the number of available samples, and the complexity of the domain. To illustrate, we can dissect each case. A linear model can only find a simple relationship, no matter how much data is provided or how complex the task to be performed may be. An expressive DNN given a difficult image classification task and only a few samples will (without strong structural prior knowledge and/or pre-training) invariably fit the training set but fail to generalize. And a complex model given a large number of samples for classifying pictures into “dark” or “light” will still learn something relatively simple.
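Each of the three bounds can be demonstrated in a toy setting. The sketch below (polynomial degree standing in for model expressiveness; all tasks and numbers are illustrative assumptions, not from the source) shows an under-expressive model failing on a complex target, an expressive model memorizing scarce noisy samples, and the same expressive model recovering only a simple rule on a simple domain:

```python
import numpy as np

rng = np.random.default_rng(1)

# 1. Expressive power bounds complexity: a linear (degree-1) model cannot
#    recover y = x^2, no matter how much data it sees.
x = rng.uniform(-1, 1, 10_000)
y = x ** 2
linear = np.polynomial.Polynomial.fit(x, y, deg=1)
expressiveness_err = np.mean((linear(x) - y) ** 2)

# 2. Sample count bounds complexity: an expressive (degree-4) model given
#    only 5 noisy samples of a linear relationship memorizes the noise
#    (near-zero training error) and does worse off the training points.
x_few = rng.uniform(-1, 1, 5)
y_few = 2 * x_few + rng.normal(0, 0.1, 5)
flexible = np.polynomial.Polynomial.fit(x_few, y_few, deg=4)
train_err = np.mean((flexible(x_few) - y_few) ** 2)
x_test = rng.uniform(-1, 1, 1000)
test_err = np.mean((flexible(x_test) - 2 * x_test) ** 2)

# 3. Domain complexity bounds complexity: an over-expressive (degree-9)
#    model on a trivial "dark vs. light" task just recovers a threshold.
brightness = rng.uniform(0, 1, 1000)
label = (brightness > 0.5).astype(float)
clf = np.polynomial.Polynomial.fit(brightness, label, deg=9)
accuracy = np.mean((clf(brightness) > 0.5) == label)

print(expressiveness_err, train_err, test_err, accuracy)
```

Whichever of the three factors is smallest caps what the fitted model can actually encode.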
This points to a loophole, not a flaw, in Yampolskiy’s analysis. He is right that the pigeonhole principle implies there is no way to make all possible AI systems explainable, but it does not rule out interpretability schemes that allow cogent explanations in, say, all cases maximizing a specific utility function. More likely, there is a scheme that can cover all systems that humans are capable of understanding at all. That is, if a system we build is incompressible and unexplainable, it means not only that the system is complex and the sample size large, but also that the domain being modeled is fundamentally complex, likely in ways not capable of being understood by humans. Further, given that, as noted above, DNNs seem to excel at the types of analysis and classification that human minds also perform, albeit sometimes less well, it seems plausible or even likely that the impossibility result matters most for domains poorly suited to the kinds of tractable analysis both human minds and DNNs succeed at, rather than for domains that are “only” impossible to communicate.