Disassembly: Difference between revisions

From LRREW
Revision as of 14:38, 1 July 2023

Assembly is something we all have to learn eventually in order to properly modify Roblox without having its source code.

Usually, we use a tool such as IDA Pro or x32dbg. Because Roblox (before Byfron) uses VMProtect, simply modifying its executable isn't possible, and you must attach a debugger to it while it's running.

This article isn't finished yet, sorry.

This article assumes you have basic knowledge of C++ and of computer science in general.

Instructions

The x86 instruction set is a vast instruction set with various extensions. Luckily, you'll only really see basic x86 instructions when debugging Roblox.

These are some common instructions (but not every instruction) that can be seen whilst debugging Roblox.

{| class="wikitable"
|+ x86 instructions (partial list)
! Instruction (NASM syntax) !! Name !! Purpose
|-
| jne [address] || Jump if Not Equal || The processor will set EIP to [address] if the zero flag (ZF) in EFLAGS is clear, meaning the last comparison found its operands unequal.
|-
| jnz [address] || Jump if Not Zero || The same instruction as jne under another name: the processor will set EIP to [address] if the zero flag (ZF) in EFLAGS is clear.
|-
| call [address] || CALL || The processor will push the return address (the address of the instruction after the call) onto the stack, then set EIP to [address].
|-
| cmp [a], [b] || CoMPare || The processor will subtract [b] from [a], discard the difference, and set EFLAGS according to the result.
|-
| mov [a], [b] || MOVe || The processor will copy [b] into [a].
|-
| nop || NO Operation || The processor will not do anything.
|}
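
For readers coming from C, the relationship between cmp and jne/jnz can be sketched as below. This is a minimal model that tracks only the zero flag; the `Flags` struct and the `cmp` and `jne` function names are hypothetical helpers made up for this illustration, and a real cmp also updates several other flags.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* EIP is modelled as a plain 32-bit value in this sketch. */
static_assert(sizeof(uint32_t) == 4, "EIP is modelled as a 32-bit value");

/* Hypothetical model of the single EFLAGS bit that jne/jnz test. */
typedef struct {
    bool zf; /* zero flag: set when the last cmp found its operands equal */
} Flags;

/* cmp a, b: compute a - b, discard the difference, keep only the flags. */
Flags cmp(uint32_t a, uint32_t b) {
    Flags f;
    f.zf = ((uint32_t)(a - b) == 0);
    return f;
}

/* jne/jnz target: return the new EIP.
   fallthrough is the address of the instruction after the jump. */
uint32_t jne(Flags f, uint32_t target, uint32_t fallthrough) {
    return f.zf ? fallthrough : target;
}
```

With equal operands the jump is not taken and execution falls through; with unequal operands EIP becomes the jump target. The addresses passed in are arbitrary example values.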

Where's All the Data?

You may have noticed that the table above uses terms such as 'EFLAGS' and 'EIP'. These are CPU registers. CPU registers are the fastest way to retrieve, manipulate and store data, but they are limited in size and number.

{| class="wikitable"
|+ x86 registers (partial list)
! Register !! Purpose
|-
| EAX || General purpose register, sometimes called the Accumulator register
|-
| EBX || General purpose register, sometimes called the Base register
|-
| ECX || General purpose register, sometimes used to store the loop counter. In C++ member functions that use the thiscall calling convention, this holds the this pointer (the current object).
|-
| EDX || General purpose register
|-
| EBP || Stack Frame Pointer. This is how programs will typically safely address other values in the stack, because ESP will fluctuate wildly during execution.
|-
| ESP || Stack Pointer. This holds the address of the top of the stack. It decrements (decreases) when a value is PUSHed, and increments (increases) when a value is POPed.
|-
| EDI || Destination index (typically used for arrays)
|-
| ESI || Source index (typically used for arrays)
|-
| EIP || Instruction Pointer. This holds the address the x86 fetches the next instruction from, and it advances by the size of each decoded instruction.
|-
| EFLAGS || FLAGS. This is where the cmp instruction stores its results. It cannot be used as an ordinary operand by code, but conditional jumps such as JNZ, JNE and JE read it to control program flow.
|}

All of the registers here are 32-bit registers, and each takes up 4 bytes on the stack when PUSHed.

Memory

Memory is the second fastest way to store data on x86. It is a large array of sorts, storing the program itself, all of the data the program reads and writes to, and everything else necessary for system functioning.

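Values have different sizes depending on their storage type.

{| class="wikitable"
! Type !! sizeof(Type) !! Range
|-
| int64_t / uint64_t (long long) || 8 || -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807 (signed), 0 to 18,446,744,073,709,551,615 (unsigned)
|-
| int32_t / uint32_t (int) || 4 || -2,147,483,648 to 2,147,483,647 (signed), 0 to 4,294,967,295 (unsigned)
|-
| int16_t / uint16_t (short) || 2 || -32,768 to 32,767 (signed), 0 to 65,535 (unsigned)
|-
| int8_t / uint8_t (char) || 1 || -128 to 127 (signed), 0 to 255 (unsigned)
|}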
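
The sizes and limits of the fixed-width types can be checked by the compiler itself. The sketch below uses C11 `static_assert` with the constants from `<stdint.h>`; the `add_u8` helper is a hypothetical function added only to show what happens when an unsigned value goes past its maximum.

```c
#include <assert.h>
#include <stdint.h>

/* Sizes in bytes of the fixed-width types. */
static_assert(sizeof(uint8_t)  == 1, "uint8_t is 1 byte");
static_assert(sizeof(uint16_t) == 2, "uint16_t is 2 bytes");
static_assert(sizeof(uint32_t) == 4, "uint32_t is 4 bytes");
static_assert(sizeof(uint64_t) == 8, "uint64_t is 8 bytes");

/* Limits: the signed minimum is one further from zero than the maximum. */
static_assert(INT8_MIN == -128 && INT8_MAX == 127, "int8_t range");
static_assert(UINT8_MAX == 255, "uint8_t max");
static_assert(INT32_MIN == -2147483647 - 1, "int32_t min");
static_assert(INT32_MAX == 2147483647, "int32_t max");
static_assert(UINT32_MAX == 4294967295u, "uint32_t max");
static_assert(INT64_MAX == 9223372036854775807LL, "int64_t max");
static_assert(UINT64_MAX == 18446744073709551615ULL, "uint64_t max");

/* Hypothetical helper: 8-bit addition that wraps around past 255. */
uint8_t add_u8(uint8_t a, uint8_t b) {
    return (uint8_t)(a + b);
}
```

If any of the assertions were false the file would not compile. `add_u8(255, 1)` wraps around to 0, which is the same behaviour you will see when an 8-bit register overflows.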

Stacks

Stacks are a form of data storage employed by most CPU architectures, including x86. In x86, the stack can be imagined as a stack of plates. You can put a plate on top of the stack (PUSH to the stack), and take the topmost one off (POP from the stack).

When you execute a CALL instruction, for example, the processor will PUSH the return address (the value EIP would have had after the CALL), then go to the new address. When that subroutine eventually executes a RETurn instruction, the processor will POP the last value on the stack (which in this case is that return address) and set EIP to it.

The stack of plates analogy breaks down because it is possible to access ANY value in the stack at ANY time without POPing it: x86 has full access to the memory that holds the stack. This is useful when you use a calling convention, which is explained further below.
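
The behaviour described above can be modelled in C. This is a minimal sketch under stated assumptions: the `Stack` struct, the 64-byte size, and the `stack_init`, `push32`, `pop32` and `peek32` names are all hypothetical, and ESP is modelled as an offset into a small array rather than a real address.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define STACK_BYTES 64u

/* Hypothetical model: a small slice of memory plus an ESP that indexes it. */
typedef struct {
    uint8_t  mem[STACK_BYTES];
    uint32_t esp; /* offset of the current top of the stack inside mem */
} Stack;

/* An empty stack starts with ESP just past the highest address. */
void stack_init(Stack *s) {
    memset(s->mem, 0, sizeof s->mem);
    s->esp = STACK_BYTES;
}

/* PUSH: ESP decreases by 4, then the value is stored at the new top. */
void push32(Stack *s, uint32_t value) {
    assert(s->esp >= 4);
    s->esp -= 4;
    memcpy(&s->mem[s->esp], &value, sizeof value);
}

/* POP: the value at the top is loaded, then ESP increases by 4. */
uint32_t pop32(Stack *s) {
    uint32_t value;
    assert(s->esp + 4 <= STACK_BYTES);
    memcpy(&value, &s->mem[s->esp], sizeof value);
    s->esp += 4;
    return value;
}

/* Read [esp + offset] without popping anything. */
uint32_t peek32(const Stack *s, uint32_t offset) {
    uint32_t value;
    assert(s->esp + offset + 4 <= STACK_BYTES);
    memcpy(&value, &s->mem[s->esp + offset], sizeof value);
    return value;
}
```

After pushing 20 and then a return address, `peek32(&s, 4)` reads the 20 while the return address stays on top, which is exactly where the plate analogy stops working.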

Calling Conventions

To complicate stacks further, most programs employ a calling convention. This is most used by programming languages such as C, in order to keep track of data between functions. When a function runs, it is free to overwrite registers, so unless their initial values are stored somewhere first, the data held beforehand will be lost. This is a problem in a case like this:

int x = 10; // imagine x is held in a register
printf("Hello World"); // this function will likely need to use that same register in its lifetime
printf("%i\n", x); // x may no longer be 10: the call before overwrote the register, 'mangling' it

The solution to this is to use the aforementioned stack. Before a function is called, the caller will PUSH certain registers and then CALL the address. Take a basic calling convention where the caller (the code that calls the function) saves EAX:

caller:
 mov eax, 20
 push eax              ; save eax on the stack
 call function
 ; eax now equals 10
 pop eax               ; restore the saved value
 ; eax now equals 20
function:
 mov eax, 10
 ret
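
The listing above can also be followed step by step in C. This is only a sketch: the `Cpu` struct, its 16-slot stack, and the `push`, `pop`, `function` and `caller` names are hypothetical stand-ins for the register, the stack and the two labels, and the return address that CALL pushes is left out to keep the model small.

```c
#include <assert.h>
#include <stdint.h>

#define SLOTS 16

/* Hypothetical model of one register plus a small stack of 32-bit values. */
typedef struct {
    uint32_t eax;
    uint32_t stack[SLOTS];
    int      top; /* number of values currently on the stack */
} Cpu;

void push(Cpu *c, uint32_t value) {
    assert(c->top < SLOTS);
    c->stack[c->top++] = value;
}

uint32_t pop(Cpu *c) {
    assert(c->top > 0);
    return c->stack[--c->top];
}

/* function: mov eax, 10 ; ret */
void function(Cpu *c) {
    c->eax = 10;
}

/* caller: returns what eax held right after the call, before the pop. */
uint32_t caller(Cpu *c) {
    uint32_t after_call;
    c->eax = 20;           /* mov eax, 20 */
    push(c, c->eax);       /* push eax */
    function(c);           /* call function */
    after_call = c->eax;   /* eax now equals 10 */
    c->eax = pop(c);       /* pop eax: eax equals 20 again */
    return after_call;
}
```

Starting from a zeroed `Cpu`, `caller` returns 10 (the value the function left in eax), while the eax field ends up back at 20 and the stack is empty again.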

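There is a catch here if function needs to work with data on the stack. When function reaches its first instruction, this is what memory looks like at ESP. The stack grows towards lower addresses, so values pushed earlier sit at higher addresses:

{| class="wikitable"
|+ x86 stack frame
! Offset !! Name
|-
| esp+0 || Return Address
|-
| esp+4 || EAX (saved by the caller)
|}

By accessing [esp+4], the function can read or modify the value that was pushed before the call. Only ESP offsets are shown because this example never sets EBP. The same mechanism is used inside functions too, to allocate values on the stack.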

int function(void) {
  int x = 0;
  x = x + 1;
  return x;
}

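This will compile to something like

caller:                ; fake function, just showing how it's called
 call function         ; pushes the return address, then jumps
 ; eax now equals 1
function:
 push ebp              ; save the old stack frame pointer
 mov ebp, esp          ; start a new stack frame
 push 0                ; int x = 0; (x now lives at [ebp-4])
 mov eax, [ebp-4]
 add eax, 1            ; x = x + 1
 mov [ebp-4], eax
 pop eax               ; return x; in C, eax is the return value
 pop ebp               ; restore the old stack frame pointer
 ret

In memory, the function will see this before POPing to eax:

{| class="wikitable"
|+ x86 stack frame
! Offset !! Name
|-
| esp+0, ebp-4 || int x;
|-
| esp+4, ebp+0 || EBP (old stack frame, pushed by function)
|-
| esp+8, ebp+4 || Return Address
|}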

Why does Roblox keep on stopping?

This is probably because you ran into an exception. Exceptions are used in C++ to signal that a function must exit quickly and return some error data to the parent function. For example, when a trust check fails, Roblox throws a C++ exception and then outputs an error to the Roblox console.

Uh... it says ACCESS_VIOLATION though...

That probably means something broke in Roblox. An access violation is when the program attempts to access unallocated or unusable memory, and the OS notices this. It signals to the program "You have done something wrong" (an access violation), and in Roblox's case, it shuts down and shows an "An unexpected error has occurred and ROBLOX needs to quit. We're sorry!" error message.

Why is there an endless stream of exceptions when Internet Explorer opens in Studio?

That's because Internet Explorer sucks.