
Difference between revisions of "Assembly"

From NetSec
* Assemble-time: Assembly & operands -> Opcode Sequence
 
* Link-time: Flat binary of opcode sequence -> executable file format for OS
 
* Runtime: Opcode Sequence -> hardware gates (may interact with ram etc)
 
  
 

Revision as of 17:40, 25 May 2012

Assembly is currently in-progress. You are viewing an entry that is unfinished.
Assembly requires a basic understanding of bitwise math


Introduction

  • Assembler

An assembler is a program that translates relatively human-readable operations into instructions that the processor can execute directly. Assembly language can be seen as a set of mnemonics for machine instructions: it consists of words that are easier for humans to remember than the machine-language encodings, for example "int" for the interrupt operation rather than "0xcd". The assembler produces "objects", translating these mnemonics into machine code, also known as opcodes or, in popular culture, as "binary code". By convention, many assembly programmers use the .s extension for assembly source and the .o extension for assembled objects.

On Linux platforms, 'as' is the standard assembler.
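
The mnemonic-to-opcode translation described above can be sketched with a toy lookup table. This is only an illustration written in Python, not how 'as' works internally: the names OPCODES and assemble_line are hypothetical, and only three single-byte x86 opcodes are covered (nop = 0x90, ret = 0xc3, and int = 0xcd followed by a one-byte interrupt number).

```python
# Toy illustration only: a real assembler also handles operands, addressing
# modes, labels, and relocations. OPCODES and assemble_line are hypothetical.
OPCODES = {
    "nop": 0x90,  # no operation
    "ret": 0xC3,  # near return
    "int": 0xCD,  # software interrupt, followed by one immediate byte
}


def assemble_line(line):
    """Translate one mnemonic line into its machine-code bytes."""
    parts = line.split()
    mnemonic = parts[0].lower()
    encoded = [OPCODES[mnemonic]]
    if mnemonic == "int":
        # The interrupt number is encoded as a single byte after the opcode.
        encoded.append(int(parts[1], 0) & 0xFF)
    return bytes(encoded)


print(assemble_line("int 0x80").hex())  # cd80
```

The table lookup is the whole idea: each mnemonic is a memorable name for a byte pattern, and the assembler's job is to emit that pattern.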

  • Linker

A linker is a program that combines the compiled assembly objects into a binary. 'ld' is the standard linker on Linux platforms.

Assembling and linking assembly code works much like the later stages of a higher-level compiler toolchain. A compiler driver such as GCC translates C source into assembly according to the rules of the C programming language, then invokes an assembler and a linker to finish the job. The assembler turns the code into a number of object files, and the linker combines the opcode sequences created at the assembly stage into a single binary encoded in the executable format of the operating system. Because that format differs between operating systems, the same code must be rebuilt for each operating system it is meant to run on. At runtime, the operating system loads the binary into RAM and the processor executes its opcodes, performing the desired function of the program.


Binary

Main article: Bitwise Math
  • counting
  • endianness
  • nybble (also spelled nibble) - An uncommon unit of memory equivalent to 4 bits, or half a byte.
  • byte - A unit of memory equivalent to 8 bits.
  • word - 2 bytes (16 bits) in x86 terminology
  • dword - 4 bytes (32 bits), also called a long
  • qword - 8 bytes (64 bits)
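
As a rough illustration of the units and of endianness listed above, the following Python sketch packs one dword in both byte orders using the standard struct module. The sample value 0x12345678 is arbitrary, and little-endian is shown first because it is the order x86 uses.

```python
import struct

value = 0x12345678  # an arbitrary dword (4 bytes)

little = struct.pack("<I", value)  # least significant byte first (x86 order)
big = struct.pack(">I", value)     # most significant byte first

print(little.hex())  # 78563412
print(big.hex())     # 12345678

# Sizes in bytes of the units listed above, via struct's standard-size codes.
sizes = {
    "byte": struct.calcsize("<B"),
    "word": struct.calcsize("<H"),
    "dword": struct.calcsize("<I"),
    "qword": struct.calcsize("<Q"),
}
print(sizes)  # {'byte': 1, 'word': 2, 'dword': 4, 'qword': 8}

# A byte splits into two nybbles of 4 bits each.
byte = 0xAB
high_nybble, low_nybble = byte >> 4, byte & 0xF
print(hex(high_nybble), hex(low_nybble))  # 0xa 0xb
```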

Number handling

Two's complement is the mathematical principle behind the computer's ability to represent both positive and negative numbers.
  • signed - Signed values are required to represent negative numbers. Many languages assume values are signed by default. The most significant bit acts as the sign, so an n-bit signed value ranges from -2^(n-1) to 2^(n-1) - 1 (for example, -128 to 127 for a byte).
  • unsigned - Unsigned values cannot represent negative numbers, but the bit patterns that would have encoded the negative range extend the positive range instead. An n-bit unsigned value ranges from 0 to 2^n - 1 (0 to 255 for a byte), roughly twice the positive range of the signed type.
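
The signed and unsigned readings of the same bits can be sketched in Python. The helper names to_signed and to_unsigned are hypothetical; they only show that one bit pattern has two interpretations under two's complement.

```python
def to_signed(value, bits):
    """Interpret an unsigned bit pattern as a two's complement signed value."""
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):  # sign bit set: the value is negative
        return value - (1 << bits)
    return value


def to_unsigned(value, bits):
    """Return the bit pattern that stores `value` in `bits` bits."""
    return value & ((1 << bits) - 1)


print(to_signed(0xFF, 8))   # -1
print(to_unsigned(-1, 8))   # 255
print(to_signed(0x80, 8))   # -128
print(to_signed(0x7F, 8))   # 127
```

The byte 0xff is therefore 255 when read as unsigned and -1 when read as signed; the hardware stores the same eight bits either way.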

Data storage

register

A small storage location inside the CPU where data can be held temporarily. A general-purpose register is as wide as the CPU's bit description. On a 32 bit system a register is 32 bits (4 bytes or a doubleword, also called a long), whereas on a 64 bit system a register is 64 bits (8 bytes or a qword).

pointer

A value that holds the address of another location in memory.

sub-register

A portion of a larger register. Its size is always a multiple of 8 bits (1 byte).

cpu flag registers

  • pflag - parity flag
  • zflag - zero flag
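
Assuming the two flags listed above are the x86 zero flag and parity flag, their behaviour can be sketched as follows. The function names are hypothetical. On x86 the zero flag is set when a result is zero, and the parity flag is set when the low byte of the result contains an even number of 1 bits.

```python
def zero_flag(result, bits=32):
    """1 if the result, truncated to `bits` bits, is zero; otherwise 0."""
    return int((result & ((1 << bits) - 1)) == 0)


def parity_flag(result):
    """1 if the low byte of the result has an even number of 1 bits."""
    low_byte = result & 0xFF
    return int(bin(low_byte).count("1") % 2 == 0)


print(zero_flag(5 - 5))    # 1
print(zero_flag(5 - 4))    # 0
print(parity_flag(0b11))   # 1  (two bits set: even)
print(parity_flag(0b111))  # 0  (three bits set: odd)
```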

architecture-specific registers and sub-registers

x86

32 bit general purpose

  • eax
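      eax:  ## ## ## ##   (all 32 bits)
       ax:        ## ##   (low 16 bits of eax)
  ah | al:        ah al   ('ah' is the high byte of ax, 'al' is the low byte)
    (each ## is one byte)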
  • ebp
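
The relationship between eax and its sub-registers can be modelled with masks and shifts. This is a Python sketch rather than real register access: the starting value is arbitrary and write_al is a hypothetical helper.

```python
eax = 0x12345678  # arbitrary example contents

ax = eax & 0xFFFF        # low 16 bits of eax
ah = (eax >> 8) & 0xFF   # high byte of ax (bits 8-15 of eax)
al = eax & 0xFF          # low byte of ax (bits 0-7 of eax)

print(hex(ax), hex(ah), hex(al))  # 0x5678 0x56 0x78


def write_al(eax, value):
    """Model writing to al: only the lowest byte of eax changes."""
    return (eax & 0xFFFFFF00) | (value & 0xFF)


print(hex(write_al(eax, 0xFF)))  # 0x123456ff
```

Writing to a sub-register changes part of the larger register, which is why al, ah, ax and eax cannot be treated as independent storage.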

64 bit general purpose

  • rax
  • r8-15

mmx

sse

Memory Addressing

Stack Pointer

Known as ESP in 32 bit x86 Assembly, the stack pointer is a register that contains the address of the top of the stack.

Instruction Pointer

Known as EIP in 32 bit x86 Assembly, the instruction pointer is a register that holds the address of the next instruction to execute. When a return instruction is executed, the return address stored on the stack is popped into the instruction pointer.
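
How the stack pointer and instruction pointer interact can be sketched with a small model. The Machine class below is hypothetical and heavily simplified: memory is a Python dict of 4-byte slots, and the addresses are arbitrary. It assumes the 32 bit x86 behaviour in which the stack grows toward lower addresses.

```python
class Machine:
    """Hypothetical, simplified model of ESP and EIP on 32 bit x86."""

    def __init__(self, esp=0x1000, eip=0):
        self.esp = esp
        self.eip = eip
        self.memory = {}  # address -> 32 bit value

    def push(self, value):
        self.esp -= 4  # the stack grows toward lower addresses
        self.memory[self.esp] = value & 0xFFFFFFFF

    def pop(self):
        value = self.memory[self.esp]
        self.esp += 4
        return value

    def call(self, target, return_address):
        self.push(return_address)  # save where execution should resume
        self.eip = target

    def ret(self):
        self.eip = self.pop()  # the return address comes from the stack


m = Machine()
m.call(target=0x400, return_address=0x104)
print(hex(m.eip), hex(m.esp))  # 0x400 0xffc
m.ret()
print(hex(m.eip), hex(m.esp))  # 0x104 0x1000
```

Because ret takes whatever value is on top of the stack, anything that overwrites the saved return address also controls where execution continues.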

Base Pointer

Commonly known as the EBP register in 32 bit x86 Assembly, the base pointer is generally used to find local variables and parameters on the stack.
  • addressing mode
  • index
  • scalar multiplier
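
The addressing terms above combine into a single address calculation. The sketch below assumes the usual x86 form, base + index * scale + displacement, with a scale of 1, 2, 4 or 8. The function name and the sample addresses are hypothetical, and the ebp offsets assume a conventional 32 bit stack frame.

```python
def effective_address(base, index=0, scale=1, displacement=0, bits=32):
    """Compute base + index * scale + displacement, wrapped to `bits` bits."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("x86 scale must be 1, 2, 4, or 8")
    return (base + index * scale + displacement) & ((1 << bits) - 1)


ebp = 0x1000
# Local variables usually sit below the base pointer, parameters above it.
print(hex(effective_address(ebp, displacement=-4)))  # 0xffc
print(hex(effective_address(ebp, displacement=8)))   # 0x1008

# Element 3 of an array of dwords that starts at 0x2000.
print(hex(effective_address(0x2000, index=3, scale=4)))  # 0x200c
```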

Instructions

Syntaxes

Two assembly syntaxes have been the most prominent to date. Intel syntax is traditionally used in Microsoft Windows environments, whereas AT&T System V syntax is generally used on Linux and Unix machines.

Intel Syntax (dest, src)

In Intel syntax, instructions generally take their operands in destination, source order. For example, to move 8 into the eax register:

mov eax, 8h

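ATT Syntax (src, dest)

In AT&T syntax the operand order is reversed: the source comes first, then the destination. Immediate values are prefixed with $ and register names with %. The same move of 8 into the eax register becomes:

movl $0x8, %eax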

Data manipulation basic primitives

  • mov
  • push
  • pop


Basic arithmetic

  • add
  • sub
  • div
  • mul
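
The arithmetic instructions above work on fixed-width registers, so results wrap around. The Python sketch below models the 32 bit unsigned case under stated assumptions: add and sub report a carry or borrow, the one-operand mul leaves a 64 bit product split across edx:eax, and the one-operand div divides edx:eax and leaves the quotient in eax and the remainder in edx. The function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def add32(a, b):
    """Return (result, carry) for a 32 bit unsigned add."""
    total = (a & MASK32) + (b & MASK32)
    return total & MASK32, int(total > MASK32)


def sub32(a, b):
    """Return (result, borrow) for a 32 bit unsigned subtract."""
    diff = (a & MASK32) - (b & MASK32)
    return diff & MASK32, int(diff < 0)


def mul32(eax, operand):
    """Model unsigned mul: return (edx, eax) holding the 64 bit product."""
    product = (eax & MASK32) * (operand & MASK32)
    return product >> 32, product & MASK32


def div32(edx, eax, operand):
    """Model unsigned div of edx:eax: return (quotient, remainder)."""
    dividend = ((edx & MASK32) << 32) | (eax & MASK32)
    quotient, remainder = divmod(dividend, operand & MASK32)
    if quotient > MASK32:
        raise OverflowError("quotient does not fit in 32 bits")
    return quotient, remainder


print(add32(0xFFFFFFFF, 1))  # (0, 1)
print(sub32(0, 1))           # (4294967295, 1)
print(mul32(0xFFFFFFFF, 2))  # (1, 4294967294)
print(div32(0, 17, 5))       # (3, 2)
```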


Bitwise mathematics operators

  • and
  • not
  • or
  • xor
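
The bitwise operators above map directly onto Python's integer operators, which makes them easy to try out. The only adjustment in this sketch is masking the result of not to 32 bits, since Python integers are not fixed-width. The name not32 is hypothetical.

```python
MASK32 = 0xFFFFFFFF


def not32(value):
    """Flip every bit of a 32 bit value."""
    return ~value & MASK32


a, b = 0b1100, 0b1010

print(bin(a & b))     # 0b1000  (and: bits set in both)
print(bin(a | b))     # 0b1110  (or: bits set in either)
print(bin(a ^ b))     # 0b110   (xor: bits set in exactly one)
print(hex(not32(a)))  # 0xfffffff3

# xor of a value with itself is always zero, a common way to clear a register.
eax = 0x12345678
print(eax ^ eax)      # 0
```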

Shifts and rotations

  • shl
  • shr
  • rol
  • ror
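
The difference between a shift and a rotation is what happens to the bits that fall off the end: a shift discards them, a rotation feeds them back in on the other side. The sketch below models the 32 bit forms in Python. The function bodies are illustrative rather than definitive, and the count is masked to 5 bits to mirror how x86 treats 32 bit operands.

```python
MASK32 = 0xFFFFFFFF


def shl(value, count):
    """Shift left; bits pushed past bit 31 are discarded."""
    return (value << (count & 31)) & MASK32


def shr(value, count):
    """Logical shift right; zeros are shifted in from the left."""
    return (value & MASK32) >> (count & 31)


def rol(value, count):
    """Rotate left; bits leaving the top re-enter at the bottom."""
    count &= 31
    value &= MASK32
    return ((value << count) | (value >> (32 - count))) & MASK32


def ror(value, count):
    """Rotate right; bits leaving the bottom re-enter at the top."""
    count &= 31
    value &= MASK32
    return ((value >> count) | (value << (32 - count))) & MASK32


print(hex(shl(0x80000001, 1)))  # 0x2
print(hex(shr(0x80000001, 1)))  # 0x40000000
print(hex(rol(0x80000001, 1)))  # 0x3
print(hex(ror(0x80000001, 1)))  # 0xc0000000
```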

Control flow operators

  • cmp
  • jmp
  • call
  • ret
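
Control flow is usually built by pairing cmp with a jump that depends on the flags it sets. The sketch below models that pattern in Python under stated assumptions: cmp is treated as a subtraction that keeps only the zero and carry flags, and the conditional jump je (jump if equal), which is not in the list above, is introduced purely for illustration.

```python
def cmp(a, b, bits=32):
    """Model cmp: compute a - b, discard the result, keep the flags."""
    mask = (1 << bits) - 1
    diff = (a & mask) - (b & mask)
    return {"zf": int((diff & mask) == 0), "cf": int(diff < 0)}


# A countdown loop written the way it would look in assembly.
ecx = 3
iterations = 0
while True:                  # loop_start:
    iterations += 1          #   (loop body)
    ecx -= 1                 #   sub ecx, 1
    flags = cmp(ecx, 0)      #   cmp ecx, 0
    if flags["zf"]:          #   je done
        break
                             #   jmp loop_start
print(iterations)  # 3       # done:
```

The jump itself does no comparing; it only inspects flags left behind by an earlier instruction.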

Taking it further

  • kernel interrupt
  • architecture - i386, i686, x86_64
  • operating system