The concept behind the assembler language in micro-controllers

Attention! These pages are about programming micro-controllers, the small mice, not about PCs with Linux or Windows operating systems and similar elephants. They are not about programming Ethernet mega-machines but about the question of why a beginner should start with assembler and not with a complex high-level language.
This page shows the concept behind assembler, what those familiar with high-level languages have to give up to learn assembler, and why assembler is not machine language.




The hardware of micro-controllers


What does the hardware have to do with assembler? Much, as can be seen from the following.
The concept behind assembler is to make the hardware resources of the processor accessible. Resources mean all hardware components, like
  • the central processing unit (CPU) and its math servant, the arithmetic and logic unit (ALU),
  • the diverse storage units (internal and external RAM, EEPROM storage),
  • the ports that control characteristics of port-bits, timers, AD converters, and other devices.
Accessible means directly accessible, not via drivers or other interfaces that an operating system provides. That means you control the serial interface or the AD converter yourself, with no other layer between you and the hardware. As a reward for your efforts, the complete hardware is at your command, not only the part that the compiler designer and the operating system programmer provide for you.

How the CPU works

Most important for understanding assembler is to understand how the CPU works. The CPU reads instructions (instruction fetch) from the program storage (the flash), translates those into executable steps, and executes those. In AVRs, those instructions are written as 16-bit numbers to the flash storage, and are read from there (first step). The number read then translates (second step) e.g. to transporting the content of the two registers R0 and R1 to the ALU (third step), to adding those (fourth step), and to writing the result into the register R0 (fifth step). Registers are simple 8-bit wide storage cells that can directly be tied to the ALU to be read from and to be written to.
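The five steps can be played through on a PC. The following is only a minimal sketch in Python, not AVR code: the names FLASH, REGS and step() are made up for this illustration, only the ADD instruction is modelled, and of the status flags only the carry is tracked.

```python
# Toy model of the five steps for one AVR instruction (ADD Rd, Rr).
# FLASH, REGS and step() are hypothetical names used only in this sketch.

FLASH = [0x0C01]          # program storage: one 16-bit word, ADD R0, R1
REGS = [0] * 32           # the 32 general-purpose 8-bit registers


def step(pc):
    """Fetch, decode and execute the instruction word at address pc."""
    word = FLASH[pc]                              # step 1: instruction fetch
    if word & 0xFC00 != 0x0C00:                   # step 2: decode (ADD is 0000 11rd dddd rrrr)
        raise NotImplementedError(f"opcode {word:04X} not modelled")
    d = (word >> 4) & 0x1F                        # number of the destination register
    r = ((word >> 5) & 0x10) | (word & 0x0F)      # number of the source register
    a, b = REGS[d], REGS[r]                       # step 3: both registers go to the ALU
    total = a + b                                 # step 4: the ALU adds
    REGS[d] = total & 0xFF                        # step 5: the 8-bit result goes back to Rd
    carry = total > 0xFF                          # overflow of the 8 bits sets the carry
    return pc + 1, carry
```

With REGS[0] = 200 and REGS[1] = 100, a call of step(0) leaves 44 in REGS[0] and reports a carry, because 300 does not fit into 8 bits.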

Instructions in assembler

There is no need to learn 16-bit numbers and the crazy placement of bits within those, because in assembler you'll use human-readable abbreviations for that, so-called mnemonics, an aid to memory. The assembler representation for hex 9588 is simply the abbreviation "SLEEP". In contrast to 9588, SLEEP is easy to remember, even for someone like me who has difficulties remembering his own phone number. Adding is simply "ADD". The two registers that are to be added are written as parameters. (No, not in brackets. C programmers, forget those brackets. You don't need those in assembler.) Simply type "ADD R0, R1". The assembler translates this line to a single 16-bit word, 0C01. The CPU only understands 0C01: this word is written to the flash storage, read from there by the CPU, and executed. Each instruction that the CPU understands has such a mnemonic. And vice versa: each mnemonic has exactly one corresponding CPU instruction with a certain course of actions. The abilities of the CPU determine the range of instructions that are available in assembler. The language of the CPU is the base, the mnemonics only represent the abilities of the CPU itself.
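What the assembler does with such a line can be sketched in a few lines of Python. This is a toy for illustration, not a real assembler: the function name assemble() and its strict input format are assumptions, and it knows just the two mnemonics mentioned above. The bit pattern used for ADD is 0000 11rd dddd rrrr, with d the destination and r the source register number.

```python
# Toy assembler for two AVR mnemonics. assemble() and FIXED are
# hypothetical names used only in this sketch.

FIXED = {"SLEEP": 0x9588}     # mnemonics without parameters


def assemble(line):
    """Translate one source line into its 16-bit instruction word."""
    mnemonic, _, params = line.strip().partition(" ")
    mnemonic = mnemonic.upper()
    if mnemonic in FIXED:
        return FIXED[mnemonic]
    if mnemonic == "ADD":
        # "R0, R1" becomes the two register numbers 0 and 1
        rd, rr = (int(p.strip().upper().lstrip("R")) for p in params.split(","))
        if not (0 <= rd <= 31 and 0 <= rr <= 31):
            raise ValueError("register number out of range")
        # place the bits: 0000 11rd dddd rrrr
        return 0x0C00 | ((rr & 0x10) << 5) | (rd << 4) | (rr & 0x0F)
    raise ValueError(f"unknown mnemonic: {mnemonic}")


print(f"{assemble('ADD R0, R1'):04X}")   # prints 0C01
print(f"{assemble('SLEEP'):04X}")        # prints 9588
```

The "crazy placement of bits" shows in the source register: its highest bit sits apart from the other four.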

Differences to high-level languages


Here are some hints for high-level programmers. In high-level languages, the constructions do not depend on the hardware or the abilities of a CPU. Those constructions work on very different processors, provided a compiler for that language and that processor family is available. The compiler translates those language constructions to the processor's binary language. A GOTO in Basic looks like a JMP in assembler, but there is a difference in the whole concept between those two.

A transfer of program code to another processor hardware only works if the hardware is able to do the same. If a processor doesn't have a 16-bit timer, the compiler for a high-level language has to simulate one, using an 8-bit timer and some time-consuming code. If three timers are available, and the compiler is written for only one or two timers, the remaining hardware stays unused. So you totally depend on the compiler's abilities, not on the CPU's abilities.

Another example is the instruction "MUL". In assembler, the target processor determines whether you can use this instruction or whether you have to write a multiplication routine. If you use a multiplication in a high-level language, the compiler inserts a math library that multiplies every kind of number, even if you only have 8-by-8-bit numbers and MUL alone would do it. The lib offers integer, long-word, and other multiplication routines that you don't need. A whole package of things you don't really need. So you run out of flash in a small AVR tiny, and you change to a mega with 35 unused port pins. Or to an xmega, just to get your elephant lib with its superfluous routines into the flash. That is what you get from a simple "*", without even being asked.
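How small a hand-written multiplication routine for 8-by-8-bit numbers is can be seen from the following sketch. It is written in Python as a model of the classic shift-and-add method, assuming unsigned operands; the name mul8x8() is made up, and a real routine on a tiny would do the same with a handful of shift, add and branch instructions on registers.

```python
def mul8x8(a, b):
    """Multiply two unsigned 8-bit numbers, giving a 16-bit product."""
    if not (0 <= a <= 0xFF and 0 <= b <= 0xFF):
        raise ValueError("operands must fit into 8 bits")
    product = 0
    multiplicand = a            # grows to 16 bits while it is shifted left
    for _ in range(8):          # one pass per bit of the multiplier
        if b & 1:               # lowest bit set: add the shifted multiplicand
            product += multiplicand
        multiplicand <<= 1      # in assembler: shift a register pair left
        b >>= 1                 # in assembler: shift the multiplier right
    return product & 0xFFFF


print(mul8x8(255, 255))         # prints 65025
```

Eight passes, one addition and two shifts each: that is all an 8-by-8-bit multiplication needs if MUL is missing.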

Assembler is not machine language

Because assembler is closer to the hardware than any other language, it is often called machine language. This is not exact, because the CPU only understands 16-bit instruction words in binary form. The string "ADD R0, R1" cannot be executed. Assembler is much simpler to read and write than machine language. Similarities between machine language and assembler are a feature, not a bug.

High-level languages and Assembler 


High-level languages insert additional nontransparent separation levels between the CPU and the source code. An example of such a nontransparent concept is variables. These variables are storage places that can hold a number, a text string, or a single Boolean value. In the source code, a variable name represents the place where the variable is located and, through its declaration, its type (numbers and their format, strings and their length, etc.).

For learning assembler, just forget the high-level language concept of variables. Assembler only knows bits, bytes, registers, and SRAM bytes. The term "variable" has no meaning in assembler. Related terms like "type" are useless too and do not make any sense here.

High-level languages require you to declare variables prior to their first use in the source code, e.g. as byte (8-bit), word (16-bit), or integer (15-bit plus 1 sign bit). Compilers for that language place such declared variables somewhere in the available storage space, including the 32 registers. Whether the compiler selects this placement rather blindly or follows some priority rule, as the assembler programmer carefully does, depends mostly on the price of the compiler. The programmer can only try to understand what the compiler "thought" when it placed the variable. The power to decide has been given to the compiler. That "relieves" the programmer from the trouble of that decision, but makes him a slave of the compiler.

The instruction "A = A + B" is now type-checked: if A is defined as a character and B as a number (e.g. = 2), the statement isn't accepted because character codes cannot be added to numbers. Programmers in high-level languages believe that this type check prevents them from programming nonsense. The protection that the compiler provides by prohibiting your type error is rather useless: adding 2 to the character "F" should of course yield an "H" as the result, what else?
Assembler allows you to do that, a compiler does not.
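What happens on the byte level can be shown in a small sketch. It is written in Python as a model; the helper name add_to_byte() is made up. In assembler, the character "F" is nothing but the byte hex 46 in a register, and adding 2 to it is ordinary 8-bit arithmetic.

```python
def add_to_byte(value, n):
    """Add n to an 8-bit value, wrapping around like a register does."""
    return (value + n) & 0xFF


code = ord("F")                 # the ASCII code of "F" is 0x46
result = add_to_byte(code, 2)   # 0x46 + 2 = 0x48
print(chr(result))              # prints H
```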

What is really easier in assembler? 

All words and concepts that the assembler programmer needs are in the datasheet of the processor: the instruction table and the port table. Done! With the words found there, anything can be constructed. No other documents are necessary. How the timer is started (is writing "Timer.Start(8)" somehow easier to understand than "LDI R16, 0x02" and "OUT TCCR0, R16"?), how the timer is restarted at zero ("CLR R16" and "OUT TCNT0, R16"), it is all in the datasheet. No need to consult more or less good documentation on how a compiler defines this or that. No special, compiler-designed words and concepts to be learned here, all is in the datasheet. If you want to use a certain timer in the processor for a certain purpose in a certain mode of the 15 different possible modes, nothing is in the way to access the timer, to stop and start it, etc.

What makes writing "A = A * B" in a high-level language easier than "MUL R16, R17"? Not much. If A and B aren't defined as bytes, or if the processor is a tiny and doesn't understand MUL, the simple MUL has to be exchanged for some other source code, designed by the assembler programmer or copied, pasted, and adapted to the needs. No reason to import a nontransparent library instead, just because you're too lazy to start your brain and learn.

Assembler teaches you directly how the processor works. Because no compiler takes over your tasks, you are completely the master of the processor. As a reward for doing this work, you are granted full access to everything. If you want, you can program a baud rate of 45.45 bps on the UART. A speed setting that no Windows PC allows, because the operating system allows only multiples of 75. (Why? Because some historic mechanical teletype writers had those special mechanical gearboxes, allowing quick selection of either 75 or 300 bps.) If, in addition, you want one and a half stop bits instead of either 1 or 2, why not program your own serial device with assembler software? No reason to give things up.
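Whether the 45.45 bps can be reached is a matter of one division. The following Python sketch assumes an AVR with a USART in asynchronous normal-speed mode and a 12-bit wide baud rate register, where the divider is clock / (16 * baud) - 1; the 1 MHz clock and the name ubrr_for() are assumptions made for this example, so check the datasheet of your device.

```python
def ubrr_for(f_clk_hz, baud):
    """Return (divider, actual baud rate) for asynchronous normal-speed mode."""
    ubrr = round(f_clk_hz / (16 * baud) - 1)
    if not 0 <= ubrr <= 0x0FFF:          # assumed 12-bit baud rate register
        raise ValueError("baud rate not reachable at this clock")
    actual = f_clk_hz / (16 * (ubrr + 1))
    return ubrr, actual


ubrr, actual = ubrr_for(1_000_000, 45.45)
print(ubrr)                              # prints 1374
```

At 1 MHz the divider 1374 yields about 45.4545 bps, far closer to the target than any serial line needs. At higher clock frequencies the divider no longer fits into the register, and a software-made serial device would be the way to go.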
