Well, there’s a lot to it, but I’ll try to answer at least some of your questions.
Basically, moving from one architecture to another is difficult because of the lack of compatibility. Most codebases are not platform-independent, which means you usually need to invest a huge amount of effort into porting them. This is one of the reasons why Apple puts so much pressure on developers to use the official APIs and Swift. Once that’s done, it’s Apple’s job to make those work on whatever hardware it chooses, and developers usually need only minor adjustments to get their products running on a different kind of system.
Electrical signals travel very fast (it’s the signal, not the individual electrons, that moves at a large fraction of the speed of light), so that isn’t the main limitation. Space, however, is one of them. You can only make transistors and other elements as small as current technology allows, and don’t forget that you need to connect them as well. Also, the more transistors there are, the more heat they can potentially generate. A transistor in a CPU consumes very little power when it isn’t switching (mostly leakage current); most of the energy goes into the switching itself. However, you do want them to work; otherwise, why would you put them there?
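To put rough numbers on the switching-versus-idle point, here is a minimal sketch of the textbook CMOS power model: dynamic power P_dyn = alpha * C * V^2 * f (alpha is the fraction of the circuit switching per cycle, C the switched capacitance, V the supply voltage, f the clock frequency) plus a leakage term V * I_leak. The function names and all the chip parameters below are hypothetical, chosen only for illustration.

```python
# Minimal sketch of the classic CMOS power model (illustrative numbers only).
# Dynamic (switching) power: P_dyn = alpha * C * V^2 * f
# Static (leakage) power:    P_static = V * I_leak


def dynamic_power(alpha: float, capacitance_f: float, voltage_v: float, freq_hz: float) -> float:
    """Power spent charging/discharging the switched capacitance, in watts."""
    return alpha * capacitance_f * voltage_v ** 2 * freq_hz


def static_power(voltage_v: float, leakage_a: float) -> float:
    """Power lost to leakage current even when nothing switches, in watts."""
    return voltage_v * leakage_a


if __name__ == "__main__":
    # Hypothetical chip: 10% activity factor, 10 nF total switched capacitance,
    # 1.0 V supply, 3 GHz clock, 0.5 A total leakage current.
    p_dyn = dynamic_power(alpha=0.1, capacitance_f=1e-8, voltage_v=1.0, freq_hz=3e9)
    p_leak = static_power(voltage_v=1.0, leakage_a=0.5)
    print(f"dynamic: {p_dyn:.2f} W, leakage: {p_leak:.2f} W")
    # Doubling the number of switching transistors roughly doubles C, and so P_dyn.
    print(f"2x transistors: {dynamic_power(0.1, 2e-8, 1.0, 3e9):.2f} W")
    # Setting alpha to 0 (nothing switching) leaves only the leakage term.
    print(f"idle: {dynamic_power(0.0, 1e-8, 1.0, 3e9) + p_leak:.2f} W")
```

The V^2 term is also why lowering the supply voltage is such an effective way to cut heat.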
That’s a whole other story. One issue is how you “draw” your design on a silicon wafer: the process (photolithography) uses light to project the pattern onto the wafer. If you want to make transistors smaller, you also need a light source with a shorter wavelength, which gets more and more complicated as the wavelength decreases.
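The link between wavelength and feature size is usually summarized by the Rayleigh-style estimate: smallest printable feature ≈ k1 * wavelength / NA, where NA is the numerical aperture of the projection optics and k1 is a process-dependent factor (0.25 is the theoretical single-exposure limit). The sketch below just evaluates that formula; the k1 and NA values are assumed for illustration and aren’t tied to any particular machine.

```python
# Rayleigh-style resolution estimate for optical lithography:
#   minimum feature size ≈ k1 * wavelength / NA
# k1: process-dependent factor (0.25 is the theoretical single-exposure limit)
# NA: numerical aperture of the projection optics


def min_feature_size_nm(wavelength_nm: float, numerical_aperture: float, k1: float) -> float:
    """Approximate smallest printable half-pitch, in nanometres."""
    return k1 * wavelength_nm / numerical_aperture


if __name__ == "__main__":
    # Illustrative (assumed) settings for two light sources.
    scenarios = {
        "ArF immersion (193 nm)": (193.0, 1.35, 0.28),
        "EUV (13.5 nm)": (13.5, 0.33, 0.40),
    }
    for name, (wl, na, k1) in scenarios.items():
        print(f"{name}: ~{min_feature_size_nm(wl, na, k1):.1f} nm")
```

Because k1 can’t go much below its limit and NA is bounded by the optics, a shorter wavelength is the main lever left, which is exactly why the light source becomes the hard part.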
There’s also another, more important issue: quantum tunneling, which I’d recommend reading about, as it’s pretty complex but also very interesting. Basically, as the structures get smaller and the insulating layers thinner, there’s a bigger chance for electrons to pass through the potential barrier between circuits even though they are electrically insulated. It’s a little like a short circuit: current leaks where it shouldn’t.
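To see why thinner barriers matter so much, here is a rough sketch using the standard thick-barrier approximation for a rectangular barrier, T ≈ exp(-2 * kappa * d) with kappa = sqrt(2m(V - E)) / hbar. The 3 eV barrier height, the 0.1 eV electron energy and the use of the free-electron mass are simplifying assumptions; real gate oxides need an effective mass and more careful models.

```python
import math

# Physical constants (CODATA values).
HBAR = 1.054571817e-34            # reduced Planck constant, J*s
ELECTRON_MASS = 9.1093837015e-31  # free-electron mass, kg
EV = 1.602176634e-19              # joules per electronvolt


def tunneling_probability(barrier_height_ev: float, electron_energy_ev: float,
                          width_m: float, mass_kg: float = ELECTRON_MASS) -> float:
    """Thick-barrier approximation T ≈ exp(-2*kappa*d) for a rectangular barrier with E < V."""
    if electron_energy_ev >= barrier_height_ev:
        raise ValueError("approximation only valid when E < V")
    kappa = math.sqrt(2 * mass_kg * (barrier_height_ev - electron_energy_ev) * EV) / HBAR
    return math.exp(-2 * kappa * width_m)


if __name__ == "__main__":
    # Assumed: 3 eV barrier, 0.1 eV electron, free-electron mass.
    for width_nm in (3.0, 2.0, 1.0):
        t = tunneling_probability(3.0, 0.1, width_nm * 1e-9)
        print(f"{width_nm:.0f} nm barrier: T ≈ {t:.1e}")
```

The key takeaway is the exponential: each nanometre you shave off the insulator raises the tunneling probability by several orders of magnitude.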
Size and space are one of them, as I explained above; cost is another. You also need an OS sophisticated enough to spread the load between the cores effectively, since that doesn’t happen at the CPU level. And your software has to support parallel processing as well. So, if most of your programs run sequentially, you won’t benefit from the extra cores at all, unless you run a large number of different programs at the same time.
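The “sequential programs don’t benefit” point is usually quantified with Amdahl’s law: if a fraction p of a program’s work can run in parallel on n cores, the best-case speedup is 1 / ((1 - p) + p / n). The small sketch below just evaluates that formula; the function name and sample fractions are my own choices for illustration.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Amdahl's law: speedup = 1 / ((1 - p) + p / n)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if cores < 1:
        raise ValueError("cores must be >= 1")
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / cores)


if __name__ == "__main__":
    # p = 0.0 is a purely sequential program: extra cores do nothing for it.
    for p in (0.0, 0.5, 0.95):
        row = ", ".join(f"{n} cores: {amdahl_speedup(p, n):.2f}x" for n in (1, 4, 16))
        print(f"p = {p:.2f} -> {row}")
```

Even with 95% of the work parallelized, 16 cores give well under a 16x speedup, because the sequential part ends up dominating.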
Also remember that even though you have multiple cores, they still share many other resources, such as system buses and memory, and access to these has to be synchronized to keep the data consistent. That synchronization can cost a lot of time.
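As a software-level analogy (the hardware itself uses bus arbitration and cache-coherence protocols, which this doesn’t model), here is a sketch of several threads updating one shared counter. The lock keeps the result consistent, but every increment has to wait its turn. The function names are hypothetical, and in CPython the timings also include thread start-up and the global interpreter lock, so treat them as a rough illustration only.

```python
import threading
import time


def count_with_lock(num_threads: int, increments_per_thread: int) -> int:
    """Each thread increments a shared counter; the lock keeps updates consistent."""
    counter = 0
    lock = threading.Lock()

    def worker() -> None:
        nonlocal counter
        for _ in range(increments_per_thread):
            with lock:  # only one thread may read-modify-write the counter at a time
                counter += 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter


def count_without_sharing(num_threads: int, increments_per_thread: int) -> int:
    """Same total work in one thread with no shared state and no lock (timing baseline)."""
    total = 0
    for _ in range(num_threads * increments_per_thread):
        total += 1
    return total


if __name__ == "__main__":
    for fn in (count_without_sharing, count_with_lock):
        start = time.perf_counter()
        result = fn(4, 100_000)
        elapsed = time.perf_counter() - start
        print(f"{fn.__name__}: result={result}, {elapsed * 1000:.1f} ms")
```

Both versions produce the same total; the difference is the coordination overhead you pay for sharing.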
I hope this sheds some more light on those issues.