Bits and bytes, binary code question

I am trying to understand how bits and bytes work.
There are 8 bits in a byte and each bit represents a 1 or a 0, which means on or off. I read that this was originally based on the telephone relays Konrad Zuse used.

Why does it have to turn off? What actually happens in the CPU? Does one of the transistors turn off? Is the faster CPU essentially the one with more transistors, because it can run more code at once? Can computers only register up to 64 bits on a 64-bit CPU, so once a number needs more than 64 bits, does it have to use floating point?

For brevity (and this has turned into a long post even so), I have simplified some terms and cut some deeper explanations of how registers etc. work.

A computer reads the bits in 8-bit blocks. Each bit is flipped on or off (holding a charge or not) to represent a 1 (on) or a 0 (off); those are your transistors in the RAM chips. Those 8 bits are then a byte - early computers also worked with half a byte, called a nibble, yes, seriously!

Early home computers worked with 8 bits, so they read and processed one byte at a time. That meant they could deal with values from 0 to 255, or signed -128 to +127. They could use 16 bits for larger numbers, by combining 2 bytes, but they did that in software, either by working on the individual bytes or by using a pair of registers.

If even bigger numbers were needed, the software had to calculate that itself, by using multiple bytes and performing the math on each bit and byte at a time, and carrying over the result etc.

Modern computers work with 8 bytes at a time, for 64-bit registers and 64-bit numbers. They can work on smaller numbers of bytes, down to a single byte for ASCII characters, for example, but those are loaded into a 64-bit register and the upper 56 bits are then ignored (or simply zeroed).

That means that the processor can calculate with huge numbers automatically these days (0 to 18,446,744,073,709,551,615 for unsigned, and -9,223,372,036,854,775,808 to +9,223,372,036,854,775,807 for signed).

A floating point number is something totally different. Processors generally work in integers (whole numbers, with no fractions, no decimal points). A floating point number is a number with a decimal point. In the old days, the processor and the software had to manipulate integers and work out the floating point part the hard way. Modern processors include a so-called math co-processor (they were originally additional chips alongside early Motorola 68xxx and Intel 80x86 processors, but these days they are usually integrated into the main CPU as an additional processing unit). The encoding of the floating point (we’ll just take 64-bit for brevity) covers magnitudes from a smallest normal value of approximately 2.2250738585072014 E-308 up to a maximum value of approximately 1.7976931348623157 E+308.

Integers and floats (decimal numbers) are not interchangeable, and if you want to move from an integer to a float or back, the number actually has to be re-encoded into the other format so that the CPU doesn’t choke on it. Modern CPUs have dedicated conversion instructions for this, but the bit pattern genuinely changes - the same value is stored completely differently in the two formats.

Like in the 8-bit days, if you need really big numbers, you need to combine multiple 64-bit blocks, deal with each block on its own, then take any carry into the next one. It isn’t as efficient as native 64-bit math, but it is possible, with a lot of work.

That is a very brief introduction to how computers work with bits and bytes.

It isn’t the number of transistors that determines the speed. Intel chips have a lot more transistors than some ARM processors, but on certain types of calculation a smaller ARM processor will be quicker, due to its design. At the end of the day, many factors come into the equation of what makes the fastest computer:

  • Clock speed - how fast the processor works, essentially how many instruction cycles it can perform per second. With modern CISC processors this is a bit of a misnomer: many instructions take several cycles to execute, and with multi-core processors one core will sometimes have to wait for another to finish, because it needs a result from it, for example. The clock speed is, shall we say, the theoretical maximum number of “simple” instructions the processor can perform per second.
  • Bus and memory speed - how fast the processor gets information from memory and other peripherals on the system bus. The faster the bus and the memory, the more information you can pass to the CPU, or put back into memory, in a given period of time. While the system is busy moving bits into the CPU cache, for example, the processor generally can’t do anything; it is waiting for the relevant information to become available.
  • Caching - how the CPU manages small amounts of data in its cache of very fast (and very expensive) memory. A computer with 64GB of RAM can hold a lot of data, but a processor only has a few tens of megabytes of fast cache memory - a tiny fraction of the RAM. You could use more cache memory, but the price of the computer would quickly climb into the tens of thousands, if not hundreds of thousands, of dollars.
  • Number of cores - the more cores, generally the lower the clock speed, but the more calculations can be performed at the same time - assuming the work can be parallelized and the processing pushed out to the other cores. Some tasks can’t be parallelized and can only run on a single core, so throwing more cores at them won’t improve performance.

Computers and CPUs trade all of this off to give certain levels of performance - which is also why multi-core processors can shut down cores and boost the speed of a single core to calculate something more quickly, if the task can’t be done in parallel.

Apple has moved most of the memory and storage onto the same package as the CPU (and GPU, NPU etc.), meaning huge numbers of transistors all sitting on a short, fast bus, not having to send signals over long distances or through subsidiary components. This makes them incredibly fast, even if the “CPU” itself doesn’t have all of those transistors.

But there are negatives: you only have a comparatively small amount of memory and storage built into the package, and with Apple devices at the moment, once you have used all that memory, that’s it - you can’t add any more. With the Ultra chips you can go up to 192GB of RAM, but that is two packages “welded” together. That might sound like a lot, but modern, high-powered servers often have access to terabytes of RAM, for example. The same goes for storage: the first few GB or TB of storage are quick, but if you need more - modern high-powered systems use petabytes of storage - you need to put that additional storage out on the slower bus that every other component also uses…

It is all about trade-offs and flexibility, price and common sense. You really need to work out what you are trying to achieve and how much you can spend and see what best fits your needs.


Your question sparked me to go look at https://www.youtube.com/@BranchEducation/videos to see if they had something relevant, and while they have a lot of videos about how things work, there doesn’t appear to be one on microprocessors or their internals. Still, I provided the link because they do have lots of other interesting videos about how computer peripherals work to learn from.

When I was a kid, I learned from a show called “Bits & Bytes” by TVO (a Canadian province’s public TV channel). I see they’re on YouTube now, so maybe you’ll enjoy this retro https://www.youtube.com/watch?v=5vSap60SAKM (all the other episodes are there too, https://www.youtube.com/@bitsandbytestvo/videos , but I’m sure they’re less relevant now.)


Hi Mason,

In essence, yes. The RAM is where bits are held in short-term, relatively fast storage. They are then read by the CPU, either as part of the program it is executing (the bytes tell the CPU which instruction to execute) or as the data the program is working on. The data is either pulled into a register, where execution of instructions on it is at its fastest - but there are only a limited number of registers - or the CPU instructions use it directly in memory.

There is an in-between stage, the cache memory on the processor, that I mentioned above. That is much faster than normal memory, but incredibly expensive per byte in comparison, and it is built directly into the CPU, where space for such memory is limited. Part of the processor looks at the code being executed and the memory it is working on, and tries to predict which parts of memory will be used next, pulling them into the cache in anticipation of them being used. If the CPU guessed correctly, execution is much faster; if it guessed wrong, there is a short pause (nanoseconds) as the relevant cache lines are evicted and the correct areas of memory are pulled into the cache.

Sort of. Most languages, like C or C++, are compiled. This means the translation from human-readable code to the 1s and 0s the processor needs happens once, at compile time. The developer finishes the code and runs it through the compiler, and the compiler spits out the machine-readable version of the code at the other end. A single statement in C can compile to tens of bytes of machine code, because what is simple for a human to read can be a complex task for the processor. E.g. for (a = 0; a < 100; a++) { do_something(); } is, for a programmer, a nice easy shorthand for a loop, but in executable code terms there are a lot more individual instructions involved: the compiler must initialize a and move it into a register, perform what is in the body of the loop, increment the register at the end of each pass, check that the register hasn’t reached the value that ends the loop, and jump back to the top of the loop. Once the executable code has been created, it is stored in a file, which can then be read into memory and executed.

There are also interpreted languages, or scripts, where a program runs on the CPU, reads in the script in real time and converts the instructions into something the CPU can understand on the fly. This has the advantage that the code is flexible and easily modified, without having to be recompiled, but it isn’t as efficient or as fast as compiled code.

(There is also a middle ground of so-called p-code, which covers languages like Java, .Net etc. These take the program code and translate it into executable tokens, which can also be platform independent, converting the p-code tokens into machine code at run time. This is faster than interpreted code, but generally not as fast as natively compiled code.)

Essentially, yes. All math falls back to those 4 operations. The CPU can perform them on integers (whole numbers), while the math co-processor, or Floating Point Unit (FPU, as it is called these days), does the floating point math (decimals) - although on modern chips this is integrated into the main CPU package and isn’t generally talked about; it is “just there” these days.

I was talking about the bus on the motherboard. The CPU can’t talk directly to the memory; it doesn’t know how to. There is a memory controller. Again, on many modern chips this has been moved onto the CPU die to make things quicker and simpler, but it is still a separate entity. It knows how to talk to the memory bus, pulls data from memory into the processor’s cache, and writes changes back into main memory. The CPU doesn’t have to worry itself with such “nonsense”; it can get on with what it does best.

Modern CPUs are much more than CPUs: they often include the FPU, GPU (graphics), NPU (neural, for AI), memory controller, bus controller and peripheral controllers (USB, SATA/M.2 for storage, PCIe and various others). These were all discrete chips until a few years ago, and some desktop CPUs still have external controllers. For mobile “CPUs”, much of that functionality is on the die or in the chip package, meaning it all comes from one company.

Many Intel desktop processors, for example, have USB and storage built in, but you will find additional controllers as discrete chips on the motherboard, meaning that, instead of the limited number of lanes available for USB (usually 2-6 ports, depending on the chip), an additional USB controller will be on the PCIe bus and provide additional ports, allowing more devices to be attached to the PC. This is also why some PCs have different speed ports and you have to be careful where you plug your peripherals in, to get the best performance out of them.

In essence, yes - the Apple parts are on a par with desktop processors in many instances, and even their phone processors are much faster than low-end desktop parts (Celeron, Pentium, Atom and some low-end mobile Core i3 parts, for example). But it comes down to efficiency and power use.

Apple’s mobile chips (for the iPhone and iPad) are very low power but incredibly efficient, meaning they get better performance than low-end Intel and AMD processors, which use several times as much power to reach similar performance.

Apple’s desktop chips (laptops, mini, iMac, Studio, Pro) can use more power and are thus much faster and often comparable to Intel and AMD desktop parts in their performance, but still at a fraction of the power those parts use.

This is the problem that Intel currently faces with the introduction of the Qualcomm Snapdragon X Elite processors for Windows. They have the same performance as mid-range Intel laptop processors, but they use a lot less power to get to that level of performance, meaning that, for the same battery size, the Qualcomm (and Apple) parts will run for much longer. They also need much less cooling, so they don’t sound like jet planes taking off when they are under load.

This is something Intel has been ignoring since the mid-2000s. To get better performance, they threw more power at the problem. They did get a more efficient design from their Israeli skunkworks division, but they never kept improving its efficiency with newer generations. Now they are left with impressive desktop PC processors, where power consumption and noise aren’t such a problem, but their laptop processors run too hot and use too much power to compete. Their high-end laptop parts might be a bit quicker, but they’ll run out of battery after just a couple of hours, not 15-20 hours, and they generate so much heat in the process that the fans will be running very loudly for most of that time (Meteor Lake is especially bad).

Paul Thurrott stated over on his site that the current Intel chips have lost about 30% battery life, compared to Intel chips from 4-5 years ago. This is something they really need to tackle, going forward, if they want to remain competitive in the laptop market.


The code might be 664KB compiled, but it will use external libraries that are loaded with it, and the program will also reserve space for its working values etc. - it all adds up. Without actually studying the code and seeing how it is running on your system, I can’t say.

Don’t forget the operating system and other applications will also swallow a lot of RAM.

As to the decompiled version, it has taken the machine code and converted it into assembler - a human-readable version of machine code. Whilst the individual bits are all on or off (represented by 1s and 0s), most processors group the bits into multiples of 8 bits (1 byte). This is one of the reasons we use hexadecimal to represent the numbers: a byte can be written as 00 through FF hex, so 2 digits, whereas in decimal you would need 3 digits.

Each opcode is one or more bytes long (on Intel chips, whole instructions are variable length), with its data in multiples of bytes (1 byte, a 2-byte Word, a 4-byte Double Word or an 8-byte Quad Word, which equate to 8, 16, 32 or 64 bits of 1s and 0s).

If you are really interested, the Intel software developer manuals are a good learning source, if very heavy.

https://www.intel.com/content/www/us/en/developer/articles/technical/intel-sdm.html
