So, you've likely heard that Harvey and van der Hoeven recently proved that integer multiplication can be done in time O(n log n), solving a long-standing open problem. (Some people will say they only solved half of a long-standing open problem, because they didn't prove the corresponding lower bound. I say this is silly; showing it can be done in time O(n log n) was itself a long-standing open problem, and there's no need to talk about a lower bound in this context.)
Of course, when we say that it can be done in time O(n log n), this means, as is standard, in the multi-tape Turing machine model. Which has got me thinking once again, just what is the right model of computation, anyway? Now I am not any sort of expert on this -- although I gather that not much is proven here anyway -- but here are some scattered thoughts. Some of this is stuff I've said before, some of it isn't. Likely very little is original.
Now, famously, if you're a complexity theorist you don't care which model you use, so long as they're all polynomially equivalent; but here we're talking algorithms, so we do need to dig into the differences.
We start with the single-tape Turing machine. These are famously slow; for instance, detecting whether your input is a palindrome takes Θ(n²) time. (This one actually is proven; the usual proof is Hennie's crossing-sequence argument.) That's ridiculous, right? Clearly not a good model. (Or is it? We'll get back to this.)
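For the curious, here's a rough sketch of that crossing-sequence argument -- a back-of-the-envelope version from memory, not a careful writeup:

```latex
% Take inputs of the form x = w 0^{n/2} w^R, with |w| = n/4, so there
% are 2^{n/4} choices of w.  If two distinct w, w' produced the same
% crossing sequence at some boundary i inside the block of 0s, we
% could splice the two runs together and the machine would accept
% w 0^{n/2} w'^R, which is not a palindrome.  So at each of the n/2
% middle boundaries, all 2^{n/4} crossing sequences must be distinct;
% since there are at most about |Q|^l sequences of length l, the
% average length at each boundary is Omega(n).  Hence, for some x,
\[
  T(x) \;\ge\; \sum_i |\mathrm{CS}_i(x)|
       \;\ge\; \frac{n}{2}\cdot\Omega(n) \;=\; \Omega(n^2).
\]
```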
The standard model is the multi-tape Turing machine, but this has always seemed to me to be kind of poorly motivated. Still, results there match much better with what we might expect.
But the multidimensional Turing machine has always made more sense to me, although it's not what's standard. For a long time I figured it was probably basically equivalent to multitape, right? Now, well, I don't think that's the case. But let's get back to this.
Finally we have the many variants on the register machine. I'm going to ignore the more primitive variants and just consider the random-access machine (I don't really care whether we have a stored program or not). This is, on the surface, the closest to how actual computers work. But it yields results that are... counterintuitively fast. For instance, on a unit-cost random-access machine, integer multiplication can be performed in linear time, which seems faster than should be possible.
What's going on here? If a random-access machine looks like the closest to a real computer, why is it faster? Well, because, of course, it's an abstraction, and so abstracts away various messy details of real computers. Now there are good reasons we want to abstract such things away. Like, a number of features of real computers just make things more complicated and don't really affect the overall result, so we're better off ignoring them. And some abstractions are crucial to the whole enterprise -- you can't really throw out the idea of an infinite tape, for instance, without trivializing the whole theory. So those are some abstractions we want to make.
But other sorts of abstractions could be considered something of a model failure, and I think the fact that multiplication is possible in linear time on register machines is an indicator that something is wrong with this model. Specifically, something is wrong with the assumption that we can look up memory in unit time.
But on a real computer we can look up memory in constant time, can't we? I mean, yeah, there's caching and all, but we don't care about that; it doesn't change the fact that lookup time is still bounded above by a constant. Except the thing is, we can't, not really. Sure, for anything stored in RAM we can... but let's consider our abstraction again. In the abstract register machine, "memory" is an abstraction of all the storage resources the computer has available. And, remember, it's supposed to be unlimited.
RAM on an actual computer is not unlimited... but we can give it more to work with if we let it use the hard drive too. Again, we're trying to make it match the register machine model, so we'll allow the CPU to access the hard drive directly as if it were an extension of memory, just with higher addresses. (Word size? What's that? OK, maybe these larger addresses have to be stored in memory, but whatever.) That'll help... but what if that's not enough? OK, give it additional hard drives. But maybe we're not sure in advance how many hard drives we need. So, OK, we need a warehouse full of hard drives, with an automated system whereby, when the CPU looks up memory with a really high address, a machine will find the appropriate hard drive and plug it into the computer, which then continues its computation.
Constant-time lookup doesn't make so much sense anymore, does it? Remember, the warehouse may have to be arbitrarily large.
So you'll notice I started with a real computer here, but quickly the problem became about modelling not what a real computer is like, but rather the limits of physical reality. These aren't unrelated, to be clear; our usual ideas of how a real computer works are themselves an abstraction, and when things get truly extreme, these abstractions leak -- lookup is no longer constant-time -- and it's the limits of physical reality that become more important. So it's really those that we want our abstract model of computation to reflect, not the particular way our computers work. (Except, as mentioned, that we really do want infinite memory; that particular physical limit, we want to ignore.) Looked at in this light, Turing-machine models make more sense.
So maybe we should try to model computation with a register machine, but with built-in lookup delays: a memory access no longer has unit cost, but instead costs some amount depending on the address you're looking up (let's call it n). How much of a delay is appropriate?
Well, if we imagine filling the universe with storage, down to some fine resolution, then, in a d-dimensional Euclidean universe, lookup ought to take Θ(n^(1/d)) time, right? In a hyperbolic universe, though, it might take only Θ(log n) time. I have to wonder -- is a logarithmic delay really a delay at all? That is to say, given that you have to already write out all the logarithmically-many bits of n before you can use it, the built-in delay would only double this, and so maybe this is actually essentially equivalent to the constant-time lookup model. Or maybe it's not, since you might just alter n slightly and reuse it, and then the extra cost becomes a real factor. But who knows, maybe in a hyperbolic universe you really could compute integer multiplication in linear time. (Or perhaps not, for other reasons; we'll return to this.)
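Where do those two bounds come from? Here's the back-of-the-envelope version, treating space as packed with unit-sized memory cells:

```latex
% Euclidean case: a ball of radius r in d dimensions holds
% \Theta(r^d) unit cells, so the n-th nearest cell is at distance
\[
  r(n) = \Theta(n^{1/d}).
\]
% Hyperbolic case: ball volume grows exponentially with radius,
% roughly like e^{(d-1)r}, so n cells fit within distance
\[
  r(n) = \Theta(\log n).
\]
```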
(What about a universe like ours where the presence of matter itself affects the curvature of space? Yeah, uh, let's just ignore that.)
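Just to make the delayed-lookup model concrete, here's a toy sketch in Python. To be clear, everything here (the class, the exact cost function) is made up for illustration; the point is only that an access to address a gets charged roughly a^(1/d) steps instead of 1:

```python
import math

class DelayedRAM:
    """Toy register machine where memory access is not unit-cost:
    touching address a costs ~ a^(1/d) steps, modeling storage
    packed into d-dimensional space."""

    def __init__(self, d):
        self.d = d      # dimension of the ambient space
        self.mem = {}   # sparse memory: address -> word
        self.time = 0   # total cost accrued so far

    def _delay(self, addr):
        # Only ~ r^d cells fit within distance r of the head,
        # so reaching address addr costs ~ addr^(1/d) steps.
        return max(1, math.ceil(addr ** (1.0 / self.d)))

    def read(self, addr):
        self.time += self._delay(addr)
        return self.mem.get(addr, 0)

    def write(self, addr, value):
        self.time += self._delay(addr)
        self.mem[addr] = value

# With d = 2, just touching addresses 0..n-1 once each costs about
# (2/3) * n^(3/2) -- already more than linear time.
ram = DelayedRAM(d=2)
for a in range(10_000):
    ram.write(a, a)
print(ram.time)  # roughly 6.7e5, vs. 10^4 on a unit-cost RAM
```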
So this is how I was thinking about things until recently -- that a d-dimensional Turing machine is likely equivalent to a register machine with Θ(n^(1/d)) access delays; and that probably this is all basically equivalent to a multitape Turing machine as well. And now, well, I strongly suspect the second is false, and the first seems to have some problems as well (though I won't get to those till later).
So here's the thing: It was pointed out to me the other day by Reddit commenter orangejake that the multitape Turing machine model violates locality! Because the multiple heads have to be able to communicate in constant time regardless of how far they are from one another. And, well, the impossibility of signalling at arbitrary speed seems like one of those physical constraints that we don't want to ignore.
So this has got me thinking that the multitape Turing machine model might just be, well, not a good model after all. But its time estimates are still reasonable, right? Because a multidimensional Turing machine doesn't have this locality problem, and those should be similar timewise, right?
Well... no, probably not. I'd always thought of single-tape Turing machines as slow because, well, they're 1-dimensional, so you have to always go back and forth over the same area. I had assumed the additional freedom of movement in multiple dimensions meant that wasn't a problem in a multidimensional Turing machine, so it would be more like the multitape model. But orangejake pointed out to me that I was wrong about that as well.
Let's consider that palindrome problem again. Say you've got two dimensions to work with. How does that second dimension help? Well, it doesn't. Since you can only store finitely much state in the head itself, you still have to go back and forth across the string to check each character one-by-one, taking quadratic time! Sure, you can go off the string itself and move through empty space... but how does that help? It doesn't. If we imagine the string being laid out diagonally, well, that doesn't help either, same reason. (Though more on this in a bit.)
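To put a number on all that shuttling (this only prices the obvious strategy, of course -- the real content of the claim is that you can't do better):

```latex
% Comparing position i with position n-1-i means carrying O(1) bits
% of state across a distance of n-1-2i, and the second dimension
% offers no shortcut between the two endpoints.  Total movement:
\[
  \sum_{i=0}^{n/2-1} (n-1-2i) \;=\; \Theta(n^2).
\]
```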
So, I think now I was seriously wrong to imagine that the multidimensional Turing machine model was basically similar to the multitape model. It isn't; it's much more similar to the single-tape model. Perhaps it still achieves speedups in some cases, but -- well, maybe we should just accept that yes, it really does take Θ(n²) time to check for a palindrome. I don't actually expect algorithmists to switch over to such a model anytime soon (especially as I'm not one!), but, well, that's what seems correct to me now.
But wait! Maybe we can save all this with data layout? Sure, maybe detecting a palindrome takes quadratic time when your input is laid out in a straight line... but what if it were tightly packed in a spiral? Then it might be faster! And so maybe integer multiplication might be linear time in hyperbolic space, but only with appropriate data layout?
And, yeah, maybe. But here's the thing: in CS, if you change the encoding of a problem's input, it's not the same problem anymore. So sure, maybe detecting palindromicity of strings encoded in a spiral is fast -- but then it's not the same problem. OTOH, maybe we should just define the model that way, that input is always laid out in a spiral (or the d-dimensional equivalent), and then we would indeed have a model where it's faster than quadratic. (But still not linear time, like it is in the multitape model, and like it intuitively seems like it should be.)
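Here's the rough arithmetic for that, same back-of-the-envelope style as before:

```latex
% With a 2-dimensional spiral packing, symbol i sits at distance
% \Theta(\sqrt{i}) from the center, so comparing position i with
% position n-1-i costs O(\sqrt{n}) movement, giving in total
\[
  \sum_{i=0}^{n/2-1} O(\sqrt{n}) \;=\; O(n^{3/2}).
\]
% In d dimensions the packing radius is \Theta(n^{1/d}), so the same
% strategy costs O(n^{1+1/d}): subquadratic, but dimension-dependent,
% and still not linear.
```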
In fact, that model, where input strings are spirally-packed by definition, might actually be equivalent to the register-with-access-delays model. So maybe that should be our model after all!
...actually, it might be faster than the register-with-delays model, because the register-with-delays model is basically assuming that the head always returns to the center after each access, when in reality, it has greater freedom of movement than that. Hm. Not sure what to make of that.
Except that there's still a problem: How many dimensions? I mean, arbitrarily many, right? But if how long it takes to detect a palindrome depends on how many dimensions (which is, note, only because of the convention I suggested that this determines how you're packing it -- not because of any direct help from the extra dimensions themselves), well, you see the problem...
So maybe we just have to give up on that packing idea. And maybe plain old multidimensional Turing machines, no packing -- or even 1-dimensional, single-tape Turing machines, which might be equivalent -- are the right model after all. And maybe we really do have to accept that, when dealing with the limits of physical reality, determining palindromicity really takes Θ(n²) time, and basically every problem we know in fact requires much longer to solve than we're used to thinking...
-Harry
(Next time, maybe: Why I think we should reject the Satan's Apple problem)
(Edit: Or maybe tales of drunk people on the bus, instead :P )