Figure 2. Three simplified models of the representation of the position of a moving object throughout the visual processing hierarchy under the predictive coding framework. Each rectangle denotes the neural representation of the position of the object at a given hierarchical level and at a given time, with the filled circle indicating the object in one of five possible positions. In this simplified representation, all connections are modeled as incurring an equal transmission delay (Δt). Colored bands link corresponding representations, and numbered circles highlight the core features of each model. a, The classical model of predictive coding (model A) comprises forward connections from each hierarchical level to the next (solid lines) and backward connections to the previous level (dashed lines). No allowance is made for neural transmission delays, such that backward connections carry a position representation (asterisks) that is outdated by the time the signal arrives. The resulting mismatch with more recent sensory input generates large errors, which subsequently propagate through the hierarchy (emphasized with starbursts). b, In the predictive model with extrapolated feedback (model B), an extrapolation mechanism operates on the predictive backward projections, anticipating the future position of the object. This mechanism compensates for the total delay incurred during both the forward and the backward leg of the loop, which would minimize total prediction error in this simplified model. However, the mechanism would rapidly become more complex when one considers that individual areas tend to send signals to, and receive signals from, multiple levels of the hierarchy. c, In the predictive model with real-time alignment (model C), extrapolation mechanisms compensate for neural delays at both forward and backward steps. This parsimoniously minimizes total prediction error, even for more complex connectivity patterns. The model also differs from the first two in that, at any given time, all hierarchical levels represent the same position; in the first two models, by contrast, neural transmission delays mean that at any given time each hierarchical level represents a different position. This crucial difference is evident as vertical, rather than diagonal, colored bands linking matching representations across the hierarchy. The consequence of this hypothesis is that the entire visual hierarchy becomes temporally aligned. This provides an automatic and elegant solution to the computational challenge of establishing which neural signals belong together in time: the temporal binding problem. It is also consistent with demonstrated extrapolation mechanisms in forward pathways and provides a parsimonious explanation for a range of motion-induced position shifts.
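As an illustration only, and not part of the figure itself, the minimal Python sketch below simulates the three models under the same simplifying assumptions: an object moving at constant velocity, a small toy hierarchy, and an equal delay Δt per connection. All names and parameters here (DT, V, N_LEVELS, true_pos, snapshot) are hypothetical choices for this sketch; the only difference between the models is the amount of extrapolation applied at each forward and each backward step. The printed output reproduces the qualitative pattern described above: persistent prediction errors in model A, zero error but temporally staggered representations in model B, and zero error with all levels representing the same position in model C.

DT = 1          # transmission delay per connection (hypothetical time units)
V = 1           # object velocity, in positions per time unit (hypothetical)
N_LEVELS = 4    # number of hierarchical levels in the toy hierarchy
T = 12          # time point at which the snapshot is taken

def true_pos(t):
    # Position of the object at time t, assuming constant-velocity motion.
    return V * t

def snapshot(t, forward_extrap, backward_extrap):
    # Returns the position represented at each level and the prediction
    # error at each lower level, given the amount of extrapolation (in time
    # units) applied at each forward and each backward step.
    reps, errors = [], []
    for level in range(N_LEVELS):
        # The forward signal has crossed `level` connections, each adding a
        # delay DT but also `forward_extrap` units of extrapolation.
        reps.append(true_pos(t - level * (DT - forward_extrap)))
    for level in range(N_LEVELS - 1):
        # The backward prediction arriving at `level` now left level+1 one
        # DT ago, carrying that level's representation at that moment,
        # optionally extrapolated forward in time by `backward_extrap`.
        sent_at = t - DT
        prediction = true_pos(sent_at - (level + 1) * (DT - forward_extrap)
                              + backward_extrap)
        errors.append(abs(reps[level] - prediction))
    return reps, errors

models = {
    "A: classical, no extrapolation":           dict(forward_extrap=0, backward_extrap=0),
    "B: extrapolated feedback (2*DT per loop)": dict(forward_extrap=0, backward_extrap=2 * DT),
    "C: real-time alignment (DT per step)":     dict(forward_extrap=DT, backward_extrap=DT),
}

for name, params in models.items():
    reps, errors = snapshot(T, **params)
    print(name)
    print("  represented position per level:", reps)
    print("  prediction error at each lower level:", errors)

Running the sketch shows constant errors of 2·Δt·v at every level for model A, zero errors but representations lagging by one Δt per level for model B (the diagonal bands), and zero errors with identical represented positions at all levels for model C (the vertical bands).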