Essay talk:A counterexample to a theorem of William Dembski and Richard Marks

This is the draft for a letter to Journal of Advanced Computational Intelligence and Intelligent Informatics (JACIII), criticizing the article The Search for a Search - Measuring the Information Cost of Higher Level Search.

I'm grateful for any input!

13:02, 24 September 2010 (UTC)

P.S.: In fact, there is a lot more to say about this article. I'm pondering writing an extensive rebuttal, but wanted to keep my letter short and snappy...


 * Go for it! Although I have only a basic grasp of what you are saying. 17:52, 27 September 2010 (UTC)

Ugh
The more I think I understand this, the more hopelessly moronic it looks. The notation is driving me crazy (this is not my field of course), but I think I'm getting most of it. Here's my beef with the article, and you (more knowledgeable than I here) can perhaps resolve any lingering confusion I have:

An "assisted search" only counts as "assisted" if you do get some extra information that helps you find something. The phrase "uninformed assisted search" seems to be a rank contradiction. The only way I can possibly make sense of it is if "assisted" is taken to merely mean "non-uniform", which is a clear abuse of language.

Your third paragraph elucidates a lot. In fact, I think that this objection holds any time the search has more than one iteration (unless the search is so dumb that it always looks in the same place every time). Even their uniform search would not give a good probability measure here, would it? I mean, if it was searching for one of three candidates, and got two guesses, even if it was too dumb to not look in the same place twice, I'm getting a 5/9 chance of it guessing the right answer regardless of which one is the target, which, summed over three possible targets, gives 5/3...
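The 5/9 and 5/3 figures above can be checked by brute force. This is only a sketch under the assumptions stated in the paragraph (three candidates, two independent uniform guesses, repeats allowed); the function name `hit_probability` is made up for illustration.

```python
from fractions import Fraction
from itertools import product


def hit_probability(n_candidates, n_guesses, target):
    """Chance that independent uniform guesses (repeats allowed) include target."""
    sequences = list(product(range(n_candidates), repeat=n_guesses))
    hits = sum(1 for seq in sequences if target in seq)
    return Fraction(hits, len(sequences))


# One success probability per possible target; a probability measure
# over the three candidates would have to sum to 1.
per_target = [hit_probability(3, 2, t) for t in range(3)]
total = sum(per_target)
print(per_target, total)
```

If the per-target values are read as a "measure" on the candidates, the total is what shows whether that reading can work for a two-guess search.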

So it looks like we can immediately discount this derivation for all "searches" which are not one-shot (i.e. most things that would count as "searches"). Furthermore, there are some problems with their use of the K-L divergence.

For one, the "active information" seems like a very poor measure of how well a search does. It penalizes searches way too hard for ignoring a possible answer, while not rewarding searches comparably for focusing on equally probable answers. As a result, you have bizarre cases like your coin flipping example. There's an even worse case.

Say that you have a blind uniform search for a coin flip. Then you have an "assisted" search, which has been told that the coin is weighted towards heads. This search therefore guesses "heads" with 75% probability. The actual probability of the coin coming up heads, unbeknownst to either searcher, is 63%. In this case, the expected "active information" is not the K-L divergence, but instead will be 0.63*log(0.75/0.5)+0.37*log(0.25/0.5), which is about -10^-3 (using natural logarithms). So the assisted search, which is slightly closer to the true probability and has a 56.5% chance of success, is rated worse than the uniform search. Even worse, the optimal search in this case (what the assisted searcher should have chosen) is to guess heads every time. But the "active information" for this option is always infinitely bad, as long as there is the tiniest, most minute chance of tails coming up.
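For anyone who wants to redo the coin arithmetic, here is a small sketch. It assumes natural logarithms and a one-shot guess, and the function names are mine, not anything from the paper.

```python
import math


def expected_active_information(p_heads, q_heads, baseline=0.5):
    """Average of log(q/baseline) over outcomes, weighted by the true coin bias.

    p_heads is the true chance of heads, q_heads the chance that the
    search guesses heads. Natural logarithm is an assumption here.
    """
    total = 0.0
    for p, q, b in ((p_heads, q_heads, baseline),
                    (1 - p_heads, 1 - q_heads, 1 - baseline)):
        if p == 0:
            continue  # outcome never happens, contributes nothing
        if q == 0:
            return -math.inf  # search never guesses an outcome that can occur
        total += p * math.log(q / b)
    return total


def success_probability(p_heads, q_heads):
    """Chance that a single guess matches the coin."""
    return p_heads * q_heads + (1 - p_heads) * (1 - q_heads)


for q in (0.5, 0.75, 1.0):
    print(q, success_probability(0.63, q), expected_active_information(0.63, q))
```

The loop prints the success chance next to the averaged "active information" for the uniform guesser, the 75% guesser, and the always-heads guesser, so the two rankings can be compared directly.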

This leads to an unfortunate conclusion. Not only does this measure punish searches simply for being non-uniform, but it can actually punish them so much that it prefers a less accurate uniform distribution to a more accurate non-uniform one. This is actually not so surprising. What the active information is actually measuring is, given a target, how surprising it is for the assisted search to succeed in finding it (as compared to the uniform search). But the question that people actually care about is how likely it is that the assisted search will succeed, when one does not know the target in advance. If the assisted search underestimates the chance of something by a factor of a million, but that thing only has a one in 10^23 chance of happening anyway, we really shouldn't care.

Of course, Dembski and Marks might object to my modification above; they could reply that they used the K-L divergence (rather than a generic averaging) to emphasize that the actual probability distribution for the target is unknown. To which I can say that their use of the K-L divergence measures nothing but the non-uniformity of a distribution. Furthermore, if the "assisted" search has no correspondence with reality, not even the simple feedback of "you've got it wrong, try again", it's not assisted at all. One must take into account, somewhere, that the assisted search gets feedback which is not used by the uniform search.

Anyway, that's my rambling reaction. If any of that is useful, feel free to cannibalize it, or come back and tell me that I totally misunderstood something. --Quantheory (talk) 13:18, 28 September 2010 (UTC)


 * Ah, thanks for pointing out this stuff about the KL distance. I think you're right that this isn't what they should be looking at.  And I agree with LArron that the remark after the corollary to the HNFLT is plainly false even in the case he gives.  So assuming that whatever quantitative statement they make about $$H_+^{\tilde{T}}(\phi | \psi)$$ is correct, they must be interpreting it wrong.  Can you shed any light on the claim "The NFLT dictates that any search without active information will, on average, perform no better than blind search."?  Wasn't this their definition of active information in the first place?  But even then I'm not sure where the condition of having active information enters into their statement of the HNFLT at all.
 * Could you make anything of the vertical NFLT? All I could discern is that they're saying that as some threshold $$\hat{q}$$ increases, the proportion of searches (as measured by the Wasserstein metric on $$M(\Omega)$$; but they call this endogenous information) which have active information greater than $$\hat{q}$$ goes to 0.  But surely this is not surprising? --MarkGall (talk) 14:40, 28 September 2010 (UTC)


 * Oh, sorry, I got caught up on my own tangent and managed to mentally drag myself into the draft version of the article. To bring it back to the published version, their claim is that the active information "characterizes the amount of information that ϕ (representing the assisted search) adds with respect to U (representing the blind search) in the search for T."


 * I don't know why this would be so. The way they initially define their "active information" is as the difference between the self-information for the directed search visiting T and the self-information for the undirected search visiting T, for a specific given target. One could imagine two distinct events, $$S$$, which represents the search succeeding, and $$C_T$$, which represents a given candidate $$C$$ being an actual target. Knowing $$P(S|C_T)$$ for each $$C$$ does not give you $$P(S)$$, unless you also know the distribution $$P(C_T)$$ for each $$C$$ (they assume a uniform distribution). Furthermore, if you want to extract a probability measure over the candidates, what you actually want is the fourth set of quantities $$P(C_T|S)$$. I feel like this paper just switches willy-nilly between them, especially with the quantities "q" and "$$\phi(T)$$". Using one quantity to sub in for another quantity, then throwing it into a logarithm and averaging it, that all seems like sophistry to me. For one, shouldn't you calculate the average probability over targets first, then calculate the "information"? They do this the other way around. And since the K-L divergence is clearly not monotonic in $$P(S)$$, it simply doesn't represent anything about whether the search will work.


 * I'm a bit under-motivated to go into all the details of the vertical NFLT, because I'm already convinced that using this "active information" is a bit of symbol abuse. But if there's something specific happening there I might consider looking it up. --Quantheory (talk) 04:30, 29 September 2010 (UTC)
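The distinction drawn in this thread between $$P(S|C_T)$$, $$P(C_T)$$, $$P(S)$$ and $$P(C_T|S)$$, and the question of whether to average before or after taking the logarithm, can be made concrete with a toy case. Everything numeric below is invented for illustration: three candidates, a one-shot search that picks them with probabilities 1/2, 1/4, 1/4, and two hypothetical priors on where the target is.

```python
import math
from fractions import Fraction as F

# P(S | C_T): chance the search succeeds if candidate C is the target.
# For a one-shot search this is just the chance of guessing C.
p_success_given_target = [F(1, 2), F(1, 4), F(1, 4)]


def total_success(prior):
    """P(S) = sum over candidates of P(S | C_T) * P(C_T)."""
    return sum(s * p for s, p in zip(p_success_given_target, prior))


def posterior(prior):
    """P(C_T | S) via Bayes' rule."""
    p_s = total_success(prior)
    return [s * p / p_s for s, p in zip(p_success_given_target, prior)]


uniform = [F(1, 3)] * 3
skewed = [F(4, 5), F(1, 10), F(1, 10)]  # hypothetical non-uniform prior

# Same conditional probabilities, different P(S), depending on the prior.
print(total_success(uniform), total_success(skewed), posterior(skewed))

# Order of operations under the uniform prior, relative to blind search (1/3):
blind = F(1, 3)
log_of_average = math.log2(float(total_success(uniform) / blind))
average_of_log = sum(math.log2(float(s / blind))
                     for s in p_success_given_target) / 3
print(log_of_average, average_of_log)
```

In this toy case the search succeeds exactly as often as blind search under the uniform prior, so taking the logarithm of the averaged probability gives zero, while averaging the logarithms does not.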


 * Thanks - Mark and Quant - for your ideas on these matters!
 * The way they initially define their "active information" is as the difference between the self-information for the directed search visiting T and the self-information for the undirected search visiting T, for a specific given target. That is one of my main concerns, too: they use searches for targets for which they weren't defined (designed), and complain that the searches don't do well..
 * The language of the paper is - at best - imprecise, leading to all kinds of possible (mis)interpretations...
 * "The NFLT dictates that any search without active information will, on average, perform no better than blind search." The NFLT dictates that any search (fulfilling a few simple restrictions) will, on average, perform as well as blind search on a set of functions closed under permutation. It is very surprising that uninformed assisted searches should do worse! Which average are they taking? Questions about questions :-)
 * 14:18, 30 September 2010 (UTC)

Ugh 2
I had a quick read of the article and your letter, but not yet your full rebuttal (disclaimer: this sort of thing is quite far removed from math I actually know anything about, so forgive me if I make no sense). I think most of my confusion stems from the fact that they never really clearly say what the problem is, or what they mean by a measure on search space. Maybe it's a standard thing in this field and their readers will know. Or maybe this obfuscation is intentional. Can you tell me if this example I'm imagining is an example of the problem they have in mind, and the rest of my summary is right?

Take, say, a 4x4 grid with an easter egg in one square (unknown to the searcher). The searcher wants to find the egg; he gets to choose one of the squares as a guess, so $$\Omega$$ is just a set with 16 elements, and $$T$$ is just some target (a singleton, in my case). Suppose he's trying to find the egg within $$Q=10$$ searches. The space $$\Omega_Q$$ consists of all length-10 strings of (distinct, though see later) guesses, each guess being one of the 16 squares.

One strategy is to guess randomly each time. This induces the uniform distribution on $$\Omega_Q$$. Alternatively, we could do an "assisted search", where I'm told "warmer" or "colder" after each guess by someone who knows where the egg is. Assuming I have some strategy, my full query string is no longer going to be uniform on $$\Omega_Q$$ -- it's going to favor strings that don't move away from the egg. So a strategy, perhaps assisted in whatever sense, induces a non-uniform distribution $$\mu$$ on $$\Omega_Q$$. Then they make up a lot of words for quantities related to $$\mu(T_Q)$$ which are supposed to measure how much better our strategy is than the uniform one. Some of these useless terms sound like things creationists would want to use informally in other contexts ("active entropy"?).
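To make the grid example concrete, here is a rough sketch of the two strategies. It is only one possible reading: "warmer"/"colder" is taken to mean a change in Manhattan distance, a third "same" hint is added for ties, the blind searcher is allowed repeats, and the assisted searcher simply keeps every square still consistent with the hints and guesses the first one. None of these choices come from the paper.

```python
from itertools import product

SIDE = 4
SQUARES = list(product(range(SIDE), repeat=2))  # Omega: the 16 squares


def dist(a, b):
    """Manhattan distance between two squares (an assumed notion of 'warmth')."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def sign(x):
    """-1, 0 or +1: warmer, same, colder when applied to a distance change."""
    return (x > 0) - (x < 0)


def blind_success(q, n=len(SQUARES)):
    """Chance that q independent uniform guesses (repeats allowed) hit the egg."""
    return 1 - (1 - 1 / n) ** q


def assisted_guesses(egg):
    """Guesses used by a searcher that discards squares contradicting the hints."""
    candidates = list(SQUARES)
    prev = None
    count = 0
    while True:
        guess = candidates[0]
        count += 1
        if guess == egg:
            return count
        candidates.remove(guess)
        if prev is not None:
            # Hint from the helper: did this guess move closer to the egg?
            hint = sign(dist(guess, egg) - dist(prev, egg))
            candidates = [c for c in candidates
                          if sign(dist(guess, c) - dist(prev, c)) == hint]
        prev = guess


guesses_needed = {egg: assisted_guesses(egg) for egg in SQUARES}
print(blind_success(10), max(guesses_needed.values()))
```

Comparing `blind_success(10)` with how many eggs the assisted searcher finds within 10 guesses gives the kind of quantity that $$\mu(T_Q)$$ is presumably meant to capture.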

One of these is "active information". They write "The NFLT dictates that any search without active information will, on average, perform no better than blind search." Hopefully I am misunderstanding this, because it appears to me that a search without active information is by definition one which doesn't perform better than blind search. Whatever the case, no theorem about what's going on so far could possibly have any interesting content, because they haven't done anything besides fix notation and make up words.

Now, I'm interpreting a search strategy (maybe assisted) to be basically equivalent to a measure on $$\Omega_Q$$. They huff and puff about Borel measures and so on, but I think this is stupid because $$\Omega$$ (and thus $$\Omega_Q$$) appears to be finite. Probably it matters later when we do a "search for search" since $$M(\Omega)$$ isn't finite. I think they want a Radon measure anyway, but I didn't think very hard about it.

But we can do a "search for search"! By this we mean choosing a measure on $$M(\Omega_Q)$$ (which inherits a metric space structure from $$\Omega$$ in a pretty reasonable way), and making $$Q^\prime$$ guesses. I don't remember what they call such a thing, so I'll go with $$\eta$$. We can talk about the active information of $$\eta$$, since it induces a measure on $$\Omega_Q$$ via integration. Fine. Now we define a target space $$T_2 \subset M(\Omega_Q)$$ to be the set of all search strategies on $$\Omega$$ which are better than the uniform search by some threshold, and we want to do a search on $$M(\Omega_Q)$$ to find these. The vertical NFLT says that a search on $$M(\Omega_Q)$$ that wins has to have high endogenous information. But I thought endogenous information only depends on the uniform distribution's win probability on $$M(\Omega_Q)_{Q^\prime}$$. If this is right, then the VNFLT says that if we set our threshold $$\hat{q}$$ too high, then the search for search will probably fail. There's a shocker. I assume I'm no longer understanding this correctly, since I don't see the content. Then the generalized VNFLT is some equally obvious generalization to searches on $$M^q(\Omega)$$.

As for your letter, I don't know what an "uninformed assisted search" is either -- it's a good question. I'd interpret your example of coin flips as giving two different strategies both with zero active information, applied in the case $$Q=1$$, but I don't know how to make any sense of "assisted" (maybe it means positive active information?). Their suppression of the $$Q$$ subscript is the source of some of my confusion -- I thought we want distributions on $$\Omega_Q$$ with $$Q>1$$, which is the case "assistance" would help with, whatever it means. I think a "perfect search" makes sense in my interpretation -- e.g. we're told the answer beforehand and so always win, in that $$\mu(T_Q)=1$$ (or, if we get "warmer" and "colder" hints and $$Q$$ is fairly big, we can always win). We could also have a search guaranteed to fail by being told the answer and then guessing other things. Such a thing also exists, as long as we either drop the unnecessarily complicating condition that we choose Q distinct elements, or else insist that $$Q < |\Omega|$$.

Anyway, that's all I could make of it. Not sure which part is supposed to be surprising, if this interpretation is correct. What do you think? The letter sounds good to me, because we need a definition of "assisted" better than "provides more information about the search environment or candidate solutions than a blind search" (positive active information?). --MarkGall (talk) 01:48, 28 September 2010 (UTC)


 * I agree: it's hard to find the surprising part... the interesting thing may be that they took the NFLTs from their discrete, finite habitat and tried to prove something for infinite sets... But I'm not convinced that Dembski's uniform distribution is the right one to use, and all the musings about Bernoulli's PrOIR aren't convincing, either.
 * 14:24, 30 September 2010 (UTC)