The density-probability principle is true!
Dec. 6th, 2025 07:00 pmI never properly wrote about this question here -- only obliquely referred to it here with the text "proving it seems annoying" -- but now it has an answer!
This is a question about automatic sets, so I should probably explain what those are, since a lot of you won't know. A b-automatic set is a set of whole numbers which is given by, you have some DFA on the alphabet {0,...,b-1}, you write numbers in base b and run them through the DFA, and the numbers it accepts are the ones it accepts. (And an automatic set is a b-automatic set for some b. There are also automatic sequences, where instead of just accept/reject, the DFA returns some other output. But we'll stick to sets here.)
Now, there's a number of questions you might have here about the details, but the concept is actually quite robust to small modifications and so precisely how you handle the details in your definition is just not going to matter. For instance, do you write numbers little-endian or big-endian? It doesn't matter, because regular languages are closed under reversal.
For the purposes of this question, though, that particular detail will matter: We're going to assume numbers are fed into the machine little-end first.
So let's say we have a b-automatic set, and it's given by a machine M when the numbers are written from the little end. And let's suppose that M has a particular property: That from every state, it is possible to reach a "sink state" -- a state that only ever loops back to itself. In this case, we can meaningfully ask the question, if we feed into M a uniformly random string of digits, with what probability does it accept? Since with probability 1, it will always hit a sink state eventually, and then we can judge acceptance or rejection by that. Let's call this probability p(M).
The question, then, is: Say S is the automatic set determined by M. Is d(S), the natural density of S, equal to p(M)?
It sure seemed like the answer should be yes, because both can be thought of as plugging random digits into M. But I couldn't prove it! And surprisingly, I couldn't find any results on this in the literature, either. Now I'm no expert on automatic sets, so I could have missed something, but still, I couldn't find it. People said the problem sounded easy -- and then produced proofs that were incorrect. No correct ones.
Well, I asked around, I asked on MathOverflow, and now, a year and a half later, I finally got an answer, due to Alex Mennen. Yes it's true, and it turns out it really wasn't that hard after all! I'm just not much of an analyst. :P
(On MathOverflow, after feedback from Jeffrey Shallit, I ended up reformulating the question a bit; but the version I posted is in fact equivalent to my original version, and also, my original version is what Mennen ended up proving.)
But, the problem as stated above wasn't really what I wanted. Let's throw in some complications.
Complication #1: Multidimensional sets. One can define multidimensional b-automatic sets, subsets of Nk, where the alphabet is {0,...,b-1}k. Does it still work here?
Complication #2: The set I cared about wasn't actually a b-automatic set at all, but a Fibonacci-automatic set, where the input is in Zeckendorf! (So, some parts of the setup above have to be appropriately modified; the distribution on digit strings is no longer uniform, for instance.) Can we get it to work here?
Complication #3: Actually, it was a 2-dimensional Fibonacci-automatic set.
Fortunately I was able to modify Mennen's proof to handle these complications (going to multiple dimensions is easy; handling Fibonacci required slightly more work). So yay, it's true!
But, there's more we could ask about. Like, what about other numeral systems? Yeah, I don't want to go there. (Handling bijective base-b I'm pretty sure should work the same by the same proof. I expect dual Zeckendorf should also work, but I haven't tried. But those are minor variations.)
And the other question I have to wonder about is... what about the big end? If we feed in numbers from the big end, then it's easy to come up with counterexamples; but maybe it would work if instead of natural density we used logarithmic density? (Recall, logarithmic density isn't really a separate quantity from natural density -- it just exists more often.) The failures I've seen in the big-endian case have always been of the form "the natural density doesn't exist", not "the natural density exists but has the wrong value". I'm guessing it's probably true! But again, I don't really want to be the one to do it.
Jeff says I should write this up as a paper along with Alex Mennen, which, yeah, I guess I should. Kind of annoying, because it'll push other things back. But, oh well, one more thing for the backlog I guess. Still, first I think I might ask around a bit more, see if anyone knows of a result like this already out there. If it's just something I can cite, great! But my guess is that even if someone has done it for base b, they probably wouldn't have done it for Fibonacci...
-Harry
This is a question about automatic sets, so I should probably explain what those are, since a lot of you won't know. A b-automatic set is a set of whole numbers which is given by, you have some DFA on the alphabet {0,...,b-1}, you write numbers in base b and run them through the DFA, and the numbers it accepts are the ones it accepts. (And an automatic set is a b-automatic set for some b. There are also automatic sequences, where instead of just accept/reject, the DFA returns some other output. But we'll stick to sets here.)
Now, there's a number of questions you might have here about the details, but the concept is actually quite robust to small modifications and so precisely how you handle the details in your definition is just not going to matter. For instance, do you write numbers little-endian or big-endian? It doesn't matter, because regular languages are closed under reversal.
For the purposes of this question, though, that particular detail will matter: We're going to assume numbers are fed into the machine little-end first.
So let's say we have a b-automatic set, and it's given by a machine M when the numbers are written from the little end. And let's suppose that M has a particular property: That from every state, it is possible to reach a "sink state" -- a state that only ever loops back to itself. In this case, we can meaningfully ask the question, if we feed into M a uniformly random string of digits, with what probability does it accept? Since with probability 1, it will always hit a sink state eventually, and then we can judge acceptance or rejection by that. Let's call this probability p(M).
The question, then, is: Say S is the automatic set determined by M. Is d(S), the natural density of S, equal to p(M)?
It sure seemed like the answer should be yes, because both can be thought of as plugging random digits into M. But I couldn't prove it! And surprisingly, I couldn't find any results on this in the literature, either. Now I'm no expert on automatic sets, so I could have missed something, but still, I couldn't find it. People said the problem sounded easy -- and then produced proofs that were incorrect. No correct ones.
Well, I asked around, I asked on MathOverflow, and now, a year and a half later, I finally got an answer, due to Alex Mennen. Yes it's true, and it turns out it really wasn't that hard after all! I'm just not much of an analyst. :P
(On MathOverflow, after feedback from Jeffrey Shallit, I ended up reformulating the question a bit; but the version I posted is in fact equivalent to my original version, and also, my original version is what Mennen ended up proving.)
But, the problem as stated above wasn't really what I wanted. Let's throw in some complications.
Complication #1: Multidimensional sets. One can define multidimensional b-automatic sets, subsets of Nk, where the alphabet is {0,...,b-1}k. Does it still work here?
Complication #2: The set I cared about wasn't actually a b-automatic set at all, but a Fibonacci-automatic set, where the input is in Zeckendorf! (So, some parts of the setup above have to be appropriately modified; the distribution on digit strings is no longer uniform, for instance.) Can we get it to work here?
Complication #3: Actually, it was a 2-dimensional Fibonacci-automatic set.
Fortunately I was able to modify Mennen's proof to handle these complications (going to multiple dimensions is easy; handling Fibonacci required slightly more work). So yay, it's true!
But, there's more we could ask about. Like, what about other numeral systems? Yeah, I don't want to go there. (Handling bijective base-b I'm pretty sure should work the same by the same proof. I expect dual Zeckendorf should also work, but I haven't tried. But those are minor variations.)
And the other question I have to wonder about is... what about the big end? If we feed in numbers from the big end, then it's easy to come up with counterexamples; but maybe it would work if instead of natural density we used logarithmic density? (Recall, logarithmic density isn't really a separate quantity from natural density -- it just exists more often.) The failures I've seen in the big-endian case have always been of the form "the natural density doesn't exist", not "the natural density exists but has the wrong value". I'm guessing it's probably true! But again, I don't really want to be the one to do it.
Jeff says I should write this up as a paper along with Alex Mennen, which, yeah, I guess I should. Kind of annoying, because it'll push other things back. But, oh well, one more thing for the backlog I guess. Still, first I think I might ask around a bit more, see if anyone knows of a result like this already out there. If it's just something I can cite, great! But my guess is that even if someone has done it for base b, they probably wouldn't have done it for Fibonacci...
-Harry