
ARM Instruction Set (Part 1): Initialization


I recently borrowed the ARM edition of the book Computer Organization and Design by Patterson and Hennessy from the library. Halfway through the book, I thought of sharing its content in a more elaborate manner by making a series of tutorials on the ARM instruction set.
So, here’s part one of the series.


  • Basic definitions
  • Types of Instruction Set Architecture (ISA)
    • General purpose register architecture
    • Stack architecture
    • Accumulator architecture

Basic Definitions  


Benchmark:

A program selected for use in comparing computer performance. SPEC (System Performance Evaluation Cooperative) is one such effort, supported by a number of vendors, to create standard sets of benchmarks for modern computing systems.
SPEC CPU2006 is a set of 12 integer benchmarks and 17 floating-point benchmarks. The integer benchmarks vary from part of a C compiler to a chess program to a quantum computer simulation.
Today (2017), 3DMark, PCMark 10 and VRMark are some of the most popular benchmarks.
Have a look at to compare various processors.

Amdahl’s Law: 

Make the common case fast!!
A rule stating that the performance enhancement possible with a given improvement is limited by the amount that the improved feature is used.
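
To put a number on the rule, here is a minimal Python sketch. It assumes the usual form of Amdahl's Law, overall speedup = 1 / ((1 - p) + p / s), where p is the fraction of the original execution time that uses the improved feature and s is how much faster that feature becomes. The function name and the parameter names are my own choices, not something from the book.

```python
def amdahl_speedup(fraction_enhanced: float, speedup_enhanced: float) -> float:
    """Overall speedup when `fraction_enhanced` of the original execution
    time is made `speedup_enhanced` times faster (names are illustrative)."""
    if not 0.0 <= fraction_enhanced <= 1.0:
        raise ValueError("fraction_enhanced must be between 0 and 1")
    if speedup_enhanced <= 0:
        raise ValueError("speedup_enhanced must be positive")
    # The untouched part keeps its time; the enhanced part shrinks by s.
    new_time = (1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced
    return 1.0 / new_time


if __name__ == "__main__":
    # A feature used 40% of the time, made 10x faster.
    print(amdahl_speedup(0.4, 10))
    # Even an enormous improvement cannot beat 1 / (1 - p).
    print(amdahl_speedup(0.4, 1e9))
```

With a feature used 40% of the time, a 10x improvement gives only about a 1.56x overall speedup, and no improvement of that feature can ever exceed 1 / 0.6, roughly 1.67x. That is why the common case is the one worth making fast.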


MIPS (Million Instructions Per Second) is a measurement of program execution speed based on the number of millions of instructions executed per second. MIPS is computed as the instruction count divided by the product of the execution time and 10^6:

MIPS = Ic / (T × 10^6) = f / (CPI × 10^6)

Ic – Instruction count

T  – Execution time

CPI – Clock cycles per instruction
f – Clock frequency
Note: MIPS also refers to a RISC ISA developed by MIPS Technologies.
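
As a quick check of the definition, here is a small Python sketch that computes MIPS in two ways: from the instruction count and execution time, and from the clock frequency and CPI. The second form assumes the usual relation T = Ic × CPI / f. The function names and the sample numbers are made up for illustration.

```python
def mips_from_time(instruction_count: float, execution_time_s: float) -> float:
    """MIPS = Ic / (T * 10^6)."""
    return instruction_count / (execution_time_s * 1e6)


def mips_from_clock(clock_hz: float, cpi: float) -> float:
    """MIPS = f / (CPI * 10^6), assuming T = Ic * CPI / f."""
    return clock_hz / (cpi * 1e6)


if __name__ == "__main__":
    ic = 2e9    # hypothetical program: 2 billion instructions
    cpi = 2.0   # hypothetical average clock cycles per instruction
    f = 1e9     # hypothetical 1 GHz clock
    t = ic * cpi / f  # execution time in seconds
    print(t, mips_from_time(ic, t), mips_from_clock(f, cpi))
```

Both functions give the same rating for the same machine and program, which shows that MIPS depends only on the clock frequency and the CPI, not on how many instructions the program needs.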

Stored Program Concept:

The idea that instructions and data of many types can be stored in memory as numbers, leading to stored-program computers. This single concept enables today’s computers to run and share a wide variety of programs just by executing them from memory.


Endianness is one of the distinguishing factors that define an architecture.
It’s based on whether the least significant byte (LSB) or the most significant byte (MSB) of a multi-byte number is stored first, i.e. at the lowest address, in memory.
It’s generally of two types: Little Endian and Big Endian.
Little Endian: Here, the least significant byte is stored first. For example, the 32-bit number 0x12345678 is stored as the bytes 78 56 34 12 at increasing addresses.
Big Endian: Here, the most significant byte is stored first. The same number is stored as the bytes 12 34 56 78.
Many early architectures had a fixed endianness (ARM started out as Little Endian), but in many of the latest architectures endianness is configurable.
But how does simply storing the bytes in reverse order help?
While adding two numbers, we generally add the LSB first and then the MSB, carrying into the higher byte.
On an architecture like the Intel 8088, where the size of the data bus (8 bits) is not equal to the size of the registers (16 bits), storing numbers in little-endian format reduces the work of an addition, because the LSB is fetched first and then the MSB. So the low bytes can already be added while the high bytes are still being fetched.
Big-endian numbers, on the other hand, can be compared faster than little-endian numbers. Since their first bit (the top bit of the MSB) is the sign bit itself, we can easily compare them on the basis of their signs.
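
To see both layouts concretely, here is a minimal Python sketch built on the standard `int.to_bytes` and `int.from_bytes` methods. The helper names `memory_layout` and `add_bytewise_little_endian` are my own; the second one only imitates, in software, how an 8-bit data path could add a little-endian number one byte at a time, and is not a model of any real CPU.

```python
def memory_layout(value: int, size: int, byteorder: str) -> list:
    """Bytes of `value` in the order they sit at increasing addresses."""
    return list(value.to_bytes(size, byteorder=byteorder))


def add_bytewise_little_endian(a: bytes, b: bytes) -> bytes:
    """Add two equal-length little-endian numbers one byte at a time.

    The lowest address holds the LSB, so the sum can start with the very
    first byte fetched; the carry moves on to the next (higher) byte.
    The final carry is dropped, as in a fixed-width register.
    """
    result = bytearray()
    carry = 0
    for x, y in zip(a, b):
        total = x + y + carry
        result.append(total & 0xFF)
        carry = total >> 8
    return bytes(result)


if __name__ == "__main__":
    value = 0x12345678
    for order in ("little", "big"):
        layout = memory_layout(value, 4, order)
        print(order, " ".join(f"{byte:02x}" for byte in layout))

    a = (0x12FF).to_bytes(2, "little")
    b = (0x0001).to_bytes(2, "little")
    total = add_bytewise_little_endian(a, b)
    print(hex(int.from_bytes(total, "little")))
```

The loop in `add_bytewise_little_endian` walks the bytes in exactly the order they are stored, which is the point of the 8088 argument above: with a big-endian layout the same loop would have to wait for the last byte before it could start.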

Big-endian: IBM, SPARC, Motorola
Little-endian: Intel, DEC
Supporting both: MIPS, PowerPC


  • PowerPC (Performance Optimization With Enhanced RISC – Performance Computing) is a RISC instruction set architecture created by the Apple–IBM–Motorola alliance, known as AIM.
  • Scalable Processor Architecture (SPARC) is a reduced instruction set computing (RISC) instruction set architecture (ISA) originally developed by Sun Microsystems.
Note: Before moving forward, I would like to clear up a very basic doubt that I had when reading this for the first time:

What’s the difference between ‘Registers’ and ‘Memory’?
Here, by registers I mean the registers internal to a CPU, while by memory I mean RAM.
ARM has 16 internal registers: r0, r1, … r12, SP, LR and PC.
Access to registers is faster than access to memory, so using registers for common operations reduces spilling to memory and memory traffic.
MIPS64 has 32 such general-purpose registers.


Instruction sets can be broadly divided into 3 main categories:
  1. General purpose register architecture

    Let’s take the example of a higher-level statement, A=B+C. In this architecture, the statement can be written as:
    LDR R1,B
    LDR R2,C
    ADD R3,R1,R2
    STR R3,A

    i.e., here the values of B and C (assuming these variables are in memory) are loaded into registers, and then the operations are performed on the registers themselves.
    This is one of the most widely used architectures, with the added advantage that a variable has to be loaded from memory only once, thus reducing LOAD/STORE instructions.
    It is further divided into 3 types:

    • Register-Register type (2 address):
      • Here, both operands are registers and the result of the operation overwrites one of them.
      • For ex: ADD R1,R2; (R1 = R1 + R2)
    • Register-Register type (3 address):
      • Here, both operands are registers and the result is stored in another register.
      • For ex: ADD R1,R2,R3; (R1 = R2 + R3)
    • Register-Memory type:
      • Here, one operand is a register and the other is the content of a memory location.

    There is another type of architecture, the Memory-Memory type, in which both operands come from memory, but it’s obsolete nowadays. Both x86 and ARM are general-purpose register machines, but x86 has a Register-Memory architecture along with some instructions of the 2-address Register-Register type, while ARM (and most RISC machines) has a Register-Register architecture (both 2 and 3 addresses).

  2. Stack architecture

    Here, the statement A=B+C will be translated as:
    PUSH B;
    PUSH C;
    ADD;
    POP A;
    i.e., here a stack is used instead of registers. Variables are pushed onto and popped from the stack, and operations are performed on the values at the top of the stack: ADD pops B and C and pushes their sum, which POP A then stores.
    It’s very rare to find a stack architecture in modern chips; rather, it’s quite popular in designing virtual machines and interpreters, such as the one for Java bytecode.

  3. Accumulator architecture

    It’s the second most popular architecture after the general-purpose register (GPR) architecture.
    Here, A=B+C will be converted to:

    LOAD B into Accumulator
    ADD C to Accumulator
    Store Accumulator to A

    Here, the accumulator is a register located directly within the ALU, so accessing it is very fast. It has a drawback: a variable needs to be loaded from memory again and again when several operations use the same variable.
    One of the operands in this architecture is almost always the accumulator.
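
To compare the three styles side by side, here is a small Python sketch that runs A=B+C the way each kind of machine would. It is a toy model, not a real simulator: memory is a plain dictionary, the function names are my own, and the stack version includes an explicit ADD step between the pushes and the pop. The comment on each line shows the instruction it stands for.

```python
def register_machine(memory: dict) -> dict:
    """General-purpose register style: load, operate on registers, store."""
    mem = dict(memory)                      # work on a copy
    regs = {}
    regs["R1"] = mem["B"]                   # LDR R1,B
    regs["R2"] = mem["C"]                   # LDR R2,C
    regs["R3"] = regs["R1"] + regs["R2"]    # ADD R3,R1,R2
    mem["A"] = regs["R3"]                   # STR R3,A
    return mem


def stack_machine(memory: dict) -> dict:
    """Stack style: operands are implicit, always the top of the stack."""
    mem = dict(memory)
    stack = []
    stack.append(mem["B"])                  # PUSH B
    stack.append(mem["C"])                  # PUSH C
    right = stack.pop()                     # ADD: pop two values...
    left = stack.pop()
    stack.append(left + right)              # ...and push their sum
    mem["A"] = stack.pop()                  # POP A
    return mem


def accumulator_machine(memory: dict) -> dict:
    """Accumulator style: one operand is always the accumulator."""
    mem = dict(memory)
    acc = mem["B"]                          # LOAD B into accumulator
    acc = acc + mem["C"]                    # ADD C to accumulator
    mem["A"] = acc                          # STORE accumulator to A
    return mem


if __name__ == "__main__":
    start = {"A": 0, "B": 2, "C": 3}
    for machine in (register_machine, stack_machine, accumulator_machine):
        print(machine.__name__, machine(start)["A"])
```

All three produce the same value of A. What differs is where the operands live while the addition happens: in named registers, on top of a stack, or in a single accumulator.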


Next post topics:

  • RISC, CISC and EPIC / Harvard and Von Neumann architecture
  • Instruction set extensions
  • Difference between ARM, AVR, MSP430, 8085 and PIC on the basis of instruction set architecture and organization

Please comment below to suggest any interesting topic for my next post.
If you like it don’t forget to share it.
At last, Feedback???
