Questions about this topic? Sign up to ask in the talk tab.

Byte

From NetSec
Jump to: navigation, search

A byte represents (most often) 8 (can be 10, 12, 15... depending on the architecture) bits of data. A bit of data is simply a 1 or 0, that may represent an "off" or "on" state, or whatever the programmer decides.

Representation

8-bits bytes are typically described by two hexadecimal characters.

Hexadecimal is often notated with character ranges from 0 to 9 and then from A F. A represents "10", B 11, C 12, D 13, E 14, F 15.

How to read an hexadecimal number, and a byte

Taking a random byte: 11010011 In hexadecimal, it is: D3

Hexadecimal, means (that's greek) "radix 16", or more commonly "base 16".

This means, reading from right to left, the first ranking number will be multipled by 16^0 (equals 1), second ranking number will be multipled by 161 (equals 16), third by 162, nth by 16n+1...

  • 3 in hexadecimal is 3 in decimal, too (how surprising!)
  • D in hexadecimal is 13 in decimal, too.

To convert this number to base 10, the following calculations are used:

3 * 1 + 13 * 16 = 3 + 208 = 211


Computer data storage

Bytes are a unity of storage. However, in most processors, data is stored as "words", which are constituted from 2 or more bytes. When storing data, a question arises, which byte shall be stored first?

There are (mainly) two answers to this.


Big-endianness

The naive, instinctive and natural storage order is to store the number just as most english-speaking humans read it: most significant (heaviest) byte first, smallest byte rightmost.

To store 0XDEADBEEF in memory, it will be stored like so:

Memory slots:   [1 |2 |3 |4 ]
Bytes:          [DE|AD|BE|EF]

This has good advantages for networking: no matter the atomic transmission word size, the message will not change its signature, it will stay in the same order.

However this has implications, like, to determine parity, the 4th memory slot has to be read. Some people invented another way to store data.

Processors such as PowerPC, Motorola 68x, and others are big-endian. The human brain is big-endian too.

Little-endianness

Little endianness is commonly found on x86 processors. In little endianness the least-significant byte is stored leftmost in memory, then the more significant... to the most significant byte, staying rightmost.

Taking 0XDEADBEEF, it will be stored in memory as

Memory slots:   [1 |2 |3 |4 ]
Bytes:          [EF|EB|AD|DE]

if the memory has 1 byte slots.

If the memory or whatever storage/transmission medium has a word size bigger than 1 byte, then the data is stored with the least significant word first, that is if 2-byte words are being used:

Memory slots:   [1 |2 |3 |4 ]
Words:          [1    |2    ]
Bytes:          [BE EF|DE AD]

Since most machines are at least 16 bits, this poses big problem when communicating from little- to big-endian and vice versa. Further information can be found by googling "the NUXI problem" to understand the problem. When decoding data, the position of the data has to be taken into account to ensure accurate decryption.

You cannot understand little-endian data if you read it with a big-endian point of view, thus for this reason, each endian form must be taken from the correct point of view.

Reference

  0   1   2   3   4   5   6   7   8   9   A   B   C   D   E   F
  0   1   2   3   4   5   6   7   8   9   10  11  12  13  14  15