That is not magic, and it is not great timing. I'm pretty sure all he did was composite it together from different elements in After Effects or a similar compositing program. Getting the timing right all in one take would be incredibly difficult. Not saying it can't be done, just really hard.
If I were to do something similar, I would have three iPod touches to do my spiel with and just move them around according to the script. Basically, you would plan out the hand movements like a choreographed dance, practice until the timing and movements are the way you want them, then film it.
Then you come up with what you want to show on the three touches. You track the screen of each one, build a composition for each, and composite all three over the tracked hand-dance footage.
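For anyone curious what "tracking the screen" buys you, here is a rough sketch of the idea and not what he actually used. It assumes the tracker gives you the four corner positions of one screen for a given frame, and it maps a point in your screen content onto the filmed frame with plain bilinear interpolation. That is a simplification: a real corner pin in a compositing program uses a perspective transform. The function name `corner_pin` and the sample corner values are made up for illustration.

```python
def corner_pin(u, v, corners):
    """Map a normalized content coordinate (u, v), each in [0, 1],
    onto the tracked screen in the filmed frame.

    corners: (top_left, top_right, bottom_right, bottom_left), each an
    (x, y) pixel position from the track for a single frame (assumed input).
    """
    tl, tr, br, bl = corners
    # Slide along the top edge and the bottom edge by the same fraction u.
    top = (tl[0] + (tr[0] - tl[0]) * u, tl[1] + (tr[1] - tl[1]) * u)
    bottom = (bl[0] + (br[0] - bl[0]) * u, bl[1] + (br[1] - bl[1]) * u)
    # Then slide between those two points by the fraction v.
    return (top[0] + (bottom[0] - top[0]) * v,
            top[1] + (bottom[1] - top[1]) * v)


# Hypothetical tracked corners for one slightly tilted screen.
corners = ((100, 50), (200, 60), (210, 260), (90, 250))
print(corner_pin(0.5, 0.5, corners))  # where the middle of the content lands
```

Run that per frame for each of the three screens and the content follows the devices no matter how the hands move, which is why the timing never has to be perfect on set.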
Some of the ways the videos on each iPod interacted would be very difficult to match up so perfectly when doing it "live". The way the butterfly wings and other elements lined up across three iPods that were not themselves perfectly lined up suggests this was composited. Look at the video's still frame before playing and you will notice the videos look perfectly lined up, but the iPod in the center is slightly to the right.
Must say, the idea is still very well done.