0x00 Syscall

In this section we review the syscall mechanism from both a high-level and a low-level perspective. We also go through related mechanisms, such as interrupts, to get a clearer view of the operating system. Let’s start with something simple: adding a system call.

0x01 sys_n132

In this section, we are going to add a new syscall, SYS_n132, to the xv6 system. We need to modify the source code. There are 5 related files: syscall.h, syscall.c, sysproc.c, usys.S, and user.h.

In order to add a new syscall, we need to add a new definition in syscall.h.

...
#define SYS_close  21
#define SYS_n132   22

Then add the new syscall to the table of function pointers in syscall.c. This is the first time I have seen this kind of initialization.

static int (*syscalls[])(void) = {
[SYS_fork]    sys_fork,
[SYS_exit]    sys_exit,
[SYS_wait]    sys_wait,
[SYS_pipe]    sys_pipe,
[SYS_read]    sys_read,
[SYS_kill]    sys_kill,
[SYS_exec]    sys_exec,
[SYS_fstat]   sys_fstat,
[SYS_chdir]   sys_chdir,
[SYS_dup]     sys_dup,
[SYS_getpid]  sys_getpid,
[SYS_sbrk]    sys_sbrk,
[SYS_sleep]   sys_sleep,
[SYS_uptime]  sys_uptime,
[SYS_open]    sys_open,
[SYS_write]   sys_write,
[SYS_mknod]   sys_mknod,
[SYS_unlink]  sys_unlink,
[SYS_link]    sys_link,
[SYS_mkdir]   sys_mkdir,
[SYS_close]   sys_close,
};

This is a table of function pointers: the number in [] is the index (the syscall number) and the value is the address of the handler. Each handler takes no C parameters and returns an int. Instead of receiving arguments, it fetches them from the user stack. This is a little different from Linux, where x86 and x86-64 syscalls take their arguments in registers.
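The `[index] value` form that xv6 uses is an old GNU spelling of what standard C99 writes as `[index] = value`, a designated initializer. Here is a minimal sketch of how such a table behaves. The DEMO_* numbers and demo_* functions are made up for illustration and are not part of xv6.

```c
#include <stddef.h>

/* Hypothetical syscall numbers, chosen to leave gaps in the table. */
#define DEMO_GETPID 1
#define DEMO_N132   3

static int demo_getpid(void) { return 42; }
static int demo_n132(void)   { return 0x132; }

/* Each entry names its own index, so the order in the source does
 * not matter. Slots that are never named (0 and 2 here) are
 * zero-initialized, which means they hold null function pointers. */
static int (*demo_table[])(void) = {
    [DEMO_N132]   = demo_n132,
    [DEMO_GETPID] = demo_getpid,
};

/* The array length is the highest named index plus one. */
size_t demo_table_len(void)
{
    return sizeof(demo_table) / sizeof(demo_table[0]);
}

/* A slot is usable only if it is in range and was actually filled in. */
int demo_has_handler(size_t num)
{
    return num < demo_table_len() && demo_table[num] != NULL;
}
```

This is why a dispatcher has to check both the range and the pointer before calling through the table: a syscall number that falls into a gap would otherwise jump to address 0.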

To register the entry point, we need to declare the function and add our syscall to this table.

extern int sys_n132(void);
static int (*syscalls[])(void) = {
...
[SYS_close]   sys_close,
[SYS_n132]    sys_n132,
};

The next step is to write sys_n132, which actually implements our syscall. The process-related syscall handlers live in sysproc.c (the file-related ones are in sysfile.c), so we add it there.

int sys_n132(void)
{
  return 0x132;
}
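sys_n132 takes no arguments, but a handler that does has to read them from the user stack, as mentioned earlier. The sketch below models that lookup in plain C so it can run anywhere. demo_proc, demo_fetchint and demo_argint are illustrative names that imitate the idea behind xv6's argint helper. They are not its actual code, and the fake 64-byte "address space" is an assumption made only for the demo.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a process: a tiny fake user address
 * space plus the user stack pointer saved at trap time. */
struct demo_proc {
    uint8_t  mem[64];   /* fake user memory                    */
    uint32_t sz;        /* size of the address space in bytes  */
    uint32_t esp;       /* user %esp when the syscall trapped  */
};

/* Read a 32-bit value at user address addr, refusing addresses
 * that fall outside the process's memory. */
int demo_fetchint(const struct demo_proc *p, uint32_t addr, int32_t *ip)
{
    if (addr >= p->sz || addr + 4 > p->sz)
        return -1;
    memcpy(ip, &p->mem[addr], sizeof(*ip));
    return 0;
}

/* Fetch the n-th 32-bit argument. The word at esp is the return
 * address pushed by the call into the user-space stub, so the
 * arguments start at esp + 4 and are 4 bytes apart. */
int demo_argint(const struct demo_proc *p, int n, int32_t *ip)
{
    return demo_fetchint(p, p->esp + 4 + 4 * (uint32_t)n, ip);
}
```

The bounds check matters because the stack pointer comes from user space and cannot be trusted by the kernel.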

After that, we need to provide an interface to user programs: declare the function in user.h and add a stub for it in usys.S.

//user.h
...
int uptime(void);
int n132(void);
...
//usys.S
...
SYSCALL(uptime)
SYSCALL(n132)

Compile the test code and the system. Running the test program should print 306, which is 0x132 in decimal.

//TestCode
#include "types.h"
#include "user.h"
#include "stat.h"
int main(void)
{
  printf(1, "%d\n", n132());
  exit();  // stock xv6 declares exit(void) in user.h
}

0x02 Trace the syscall

We can trigger a syscall through its user-space interface, such as exit.

//TestCode
#include "types.h"
#include "user.h"
#include "stat.h"
int main(void)
{
  printf(1, "%d\n", n132());
  exit();  // stock xv6 declares exit(void) in user.h
}

Take this program as an example: it triggers three syscalls, SYS_n132, SYS_write, and SYS_exit. We can see the syscall in the disassembly.

.text:00000000 ; int __cdecl main(int argc, const char **argv, const char **envp)
.text:00000000                 public main
.text:00000000 main            proc near
.text:00000000
.text:00000000 argc            = dword ptr  8
.text:00000000 argv            = dword ptr  0Ch
.text:00000000 envp            = dword ptr  10h
.text:00000000
.text:00000000                 lea     ecx, [esp+4]
.text:00000004                 and     esp, 0FFFFFFF0h
.text:00000007                 push    dword ptr [ecx-4]
.text:0000000A                 push    ebp
.text:0000000B                 mov     ebp, esp
.text:0000000D                 push    ecx
.text:0000000E                 sub     esp, 4
.text:00000011                 call    n132
.text:00000016                 sub     esp, 4
.text:00000019                 push    eax
.text:0000001A                 push    offset fmt      ; fmt
.text:0000001F                 push    1               ; fd
.text:00000021                 call    printf
.text:00000026                 call    exit
...
.text:00000322                 public n132
.text:00000322 n132            proc near               ; CODE XREF: main+11↑p
.text:00000322                 mov     eax, 16h
.text:00000327                 int     40h             ; Hard disk - Relocated Floppy Handler (original INT 13h)
.text:00000329                 retn
.text:00000329 n132            endp

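At .text+0x11, main calls n132, the user-space stub for the real syscall. The stub loads the syscall number into eax (16h = 22 = SYS_n132) and then uses a software interrupt to jump into the kernel. As you can see in the code above, that is INT 40h, which invokes the handler for interrupt vector 0x40. (The comment IDA attached to that instruction describes the legacy BIOS meaning of INT 40h and does not apply to xv6.)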

You can find the definitions of all the traps and interrupts in trap.h, including the one for syscalls (64 = 0x40):

#define T_SYSCALL 64 // system call

Hardware interrupts, such as the periodic timer interrupt that keeps the OS in control, enter the kernel the same way as the software interrupt int 0x40: the per-vector stubs in vectors.S all jump to alltraps.

#include "mmu.h"

  # vectors.S sends all traps here.
.globl alltraps
alltraps:
  # Build trap frame.
  pushl %ds
  pushl %es
  pushl %fs
  pushl %gs
  pushal
  
  # Set up data segments.
  movw $(SEG_KDATA<<3), %ax
  movw %ax, %ds
  movw %ax, %es

  # Call trap(tf), where tf=%esp
  pushl %esp
  call trap
  addl $4, %esp

  # Return falls through to trapret...
.globl trapret
trapret:
  popal
  popl %gs
  popl %fs
  popl %es
  popl %ds
  addl $0x8, %esp  # trapno and errcode
  iret

alltraps is the common entry point for every handler, and you can find its source code in trapasm.S. It first builds the trap frame by saving the user registers so they can be restored later, then loads the kernel data segment selector into %ds and %es so the kernel runs with its own data segments. After that, it calls trap to actually handle the interrupt. The return path is trapret: it reverses the pushes from the first part of alltraps, skips trapno and the error code, and executes iret to switch back to user space.
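To make the push order concrete, here is a sketch of the frame that alltraps leaves on the kernel stack, written as a C struct with the lowest address first. It assumes the push order shown in the listing and standard 32-bit x86 trap behavior. The name demo_trapframe and the pad fields are illustrative, so check x86.h in your own tree for the real definition.

```c
#include <stddef.h>
#include <stdint.h>

struct demo_trapframe {
    /* Saved by pushal. It pushes eax first and edi last, so edi
     * ends up at the lowest address. oesp is the stale %esp value. */
    uint32_t edi, esi, ebp, oesp, ebx, edx, ecx, eax;

    /* Saved by alltraps: ds is pushed first and gs last. Each
     * 16-bit selector occupies a 32-bit stack slot. */
    uint16_t gs, pad1, fs, pad2, es, pad3, ds, pad4;

    /* Pushed by the per-vector stub before it jumps to alltraps. */
    uint32_t trapno;

    /* Pushed by the CPU (for vectors without a hardware error
     * code, the stub pushes a placeholder for err). */
    uint32_t err, eip;
    uint16_t cs, pad5;
    uint32_t eflags;

    /* Only pushed when the trap crosses from user to kernel mode. */
    uint32_t esp;
    uint16_t ss, pad6;
};

/* Bytes between the end of the saved segment registers and the
 * frame that iret expects: this is what trapret has to skip. */
size_t demo_skip_before_iret(void)
{
    return offsetof(struct demo_trapframe, eip)
         - offsetof(struct demo_trapframe, trapno);
}
```

demo_skip_before_iret() evaluates to 8, which matches the `addl $0x8, %esp` in trapret, and the `pushl %esp` before `call trap` is simply a pointer to the start of this struct.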

//trap.c
void
trap(struct trapframe *tf)
{
  if(tf->trapno == T_SYSCALL){
    if(myproc()->killed)
      exit();
    myproc()->tf = tf;
    syscall();
    if(myproc()->killed)
      exit();
    return;
  }

  switch(tf->trapno){
  case T_IRQ0 + IRQ_TIMER:
    if(cpuid() == 0){
      acquire(&tickslock);
      ticks++;
      wakeup(&ticks);
      release(&tickslock);
    }
    lapiceoi();
    break;
  case T_IRQ0 + IRQ_IDE:
    ideintr();
    lapiceoi();
    break;
  case T_IRQ0 + IRQ_IDE+1:
    // Bochs generates spurious IDE1 interrupts.
    break;
  case T_IRQ0 + IRQ_KBD:
    kbdintr();
    lapiceoi();
    break;
...

The trap function handles not only syscalls but also other kinds of traps and interrupts, such as keyboard input and timer interrupts. For a syscall, it calls syscall(), whose source code is in syscall.c. syscall() reads the syscall number from the eax saved in tf (the trap frame) and chooses the corresponding function. After the handler finishes, syscall() stores the return value in tf’s eax, and control returns through syscall(), trap(), and alltraps, which uses the saved trap frame to restore the user-space context.
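The round trip through eax can be sketched in a few lines. This is a simplified model, not the code from syscall.c: demo_tf keeps only the one field needed here, and the demo_* names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Only the field this sketch needs. A real trap frame holds
 * every saved register. */
struct demo_tf {
    uint32_t eax;
};

#define DEMO_SYS_n132 22

static int demo_sys_n132(void) { return 0x132; }

static int (*demo_syscalls[])(void) = {
    [DEMO_SYS_n132] = demo_sys_n132,
};

#define DEMO_NELEM(a) (sizeof(a) / sizeof((a)[0]))

/* On entry, the saved eax holds the syscall number placed there by
 * the user-space stub. On exit, the same slot holds the result, so
 * it lands in %eax when the frame is restored. */
void demo_syscall(struct demo_tf *tf)
{
    uint32_t num = tf->eax;

    if (num > 0 && num < DEMO_NELEM(demo_syscalls) && demo_syscalls[num] != NULL)
        tf->eax = (uint32_t)demo_syscalls[num]();
    else
        tf->eax = (uint32_t)-1;   /* unknown syscall number */
}
```

From the user program's point of view, the value that n132() returns is whatever was written into the saved eax before the frame was popped.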

That completes the whole syscall path.

0x03 scheduler

During boot, each CPU starts its work by running the scheduler. We can find the source code of this function in proc.c. This piece of code is super important: it runs many times every second.

void
scheduler(void)
{
  struct proc *p;
  struct cpu *c = mycpu();
  c->proc = 0;
  
  for(;;){
    // Enable interrupts on this processor.
    sti();

    // Loop over process table looking for process to run.
    acquire(&ptable.lock);
    for(p = ptable.proc; p < &ptable.proc[NPROC]; p++){
      if(p->state != RUNNABLE)
        continue;

      // Switch to chosen process.  It is the process's job
      // to release ptable.lock and then reacquire it
      // before jumping back to us.
      c->proc = p;
      switchuvm(p);
      p->state = RUNNING;

      swtch(&(c->scheduler), p->context);
      switchkvm();

      // Process is done running for now.
      // It should have changed its p->state before coming back.
      c->proc = 0;
    }
    release(&ptable.lock);

  }
}

It is an infinite loop. On each iteration it walks the ptable (process table) and, whenever it finds a RUNNABLE process, switches to it: switchuvm switches to the process’s address space, and swtch switches to the process’s saved kernel context so the process runs.
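The scan itself is easy to model outside the kernel. The sketch below is a simplification with made-up names (demo_proc, demo_scheduler_pass): locking is left out and the context switch is replaced by a counter, so it only shows which table entries get a turn during one pass.

```c
enum demo_state {
    DEMO_UNUSED, DEMO_SLEEPING, DEMO_RUNNABLE, DEMO_RUNNING, DEMO_ZOMBIE
};

#define DEMO_NPROC 4

struct demo_proc {
    enum demo_state state;
    int runs;               /* how many times this entry was scheduled */
};

/* One pass over the table, like the inner for loop of scheduler():
 * every RUNNABLE entry gets a turn, in table order, and everything
 * else is skipped. Returns the number of entries that ran. */
int demo_scheduler_pass(struct demo_proc *table)
{
    int ran = 0;

    for (struct demo_proc *p = table; p < &table[DEMO_NPROC]; p++) {
        if (p->state != DEMO_RUNNABLE)
            continue;

        p->state = DEMO_RUNNING;
        p->runs++;                  /* stands in for swtch() into the process */
        p->state = DEMO_RUNNABLE;   /* the process changes its own state
                                       before control comes back */
        ran++;
    }
    return ran;
}
```

Note that the loop does not restart from the top after running a process. It continues with the next entry, which is what gives every runnable process a turn.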

I found an interesting fact about swtch: it does not return to its caller in the usual way. The process gets back to the scheduler by calling swtch again with the parameters swapped (&p->context, mycpu()->scheduler)!

# Context switch
#
#   void swtch(struct context **old, struct context *new);
# 
# Save the current registers on the stack, creating
# a struct context, and save its address in *old.
# Switch stacks to new and pop previously-saved registers.

.globl swtch
swtch:
  movl 4(%esp), %eax
  movl 8(%esp), %edx

  # Save old callee-saved registers
  pushl %ebp
  pushl %ebx
  pushl %esi
  pushl %edi

  # Switch stacks
  movl %esp, (%eax)
  movl %edx, %esp

  # Load new callee-saved registers
  popl %edi
  popl %esi
  popl %ebx
  popl %ebp
  ret

The source code of swtch is in swtch.S, and it’s quite simple but elegant. The first two instructions load the parameters into eax and edx. The following four pushes save the current callee-saved registers. After that, swtch stores the old stack pointer through the first parameter (movl %esp, (%eax)) and switches stacks with movl %edx, %esp (we stored the second parameter in edx). Then it uses four pops to restore the new context’s registers, and ret jumps to the return address saved on the new stack. As long as every context follows this convention and stores its registers on the stack in the same order, swtch can switch between any two of them!
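The same hand-off pattern can be tried in user space. The sketch below is only an analogy: it uses the POSIX ucontext functions (getcontext, makecontext, swapcontext) instead of xv6's swtch, and every demo_* name is made up. What it does share with swtch is the key property that neither side "returns" to the other. Each one gets control back only when the other explicitly switches to it.

```c
#include <ucontext.h>

#define DEMO_STACK_SIZE (64 * 1024)

static ucontext_t demo_sched_ctx;          /* plays the role of c->scheduler */
static ucontext_t demo_proc_ctx;           /* plays the role of p->context   */
static char demo_stack[DEMO_STACK_SIZE];   /* stack for the "process"        */
static char demo_log[16];                  /* records who ran, in order      */
static int  demo_log_len;

static void demo_note(char c)
{
    if (demo_log_len < (int)sizeof(demo_log) - 1)
        demo_log[demo_log_len++] = c;
}

/* The "process": it hands control back by switching explicitly,
 * the way sched() calls swtch(&p->context, mycpu()->scheduler). */
static void demo_process(void)
{
    for (int i = 0; i < 2; i++) {
        demo_note('P');
        swapcontext(&demo_proc_ctx, &demo_sched_ctx);   /* "yield" */
    }
    demo_note('p');
    /* Falling off the end resumes uc_link, the scheduler context. */
}

/* The "scheduler": switch into the process three times and return
 * the log of who ran. */
const char *demo_run(void)
{
    demo_log_len = 0;

    getcontext(&demo_proc_ctx);
    demo_proc_ctx.uc_stack.ss_sp = demo_stack;
    demo_proc_ctx.uc_stack.ss_size = sizeof(demo_stack);
    demo_proc_ctx.uc_link = &demo_sched_ctx;
    makecontext(&demo_proc_ctx, demo_process, 0);

    for (int i = 0; i < 3; i++) {
        demo_note('S');
        swapcontext(&demo_sched_ctx, &demo_proc_ctx);
    }
    demo_log[demo_log_len] = '\0';
    return demo_log;
}
```

Each 'S' is the scheduler switching in and each 'P' is the process running until it switches back. On its second and third turns the process resumes right after its own swapcontext call, just as a process in xv6 resumes inside sched() right after its call to swtch.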

Roughly speaking, a process has four main states: RUNNABLE, RUNNING, SLEEPING, and ZOMBIE. The scheduler runs a runnable process, that is, it changes RUNNABLE to RUNNING. There are some other simple transitions that I didn’t take notes on:

sleep would change RUNNING to SLEEPING
exit would change RUNNING to ZOMBIE
wakeup would change SLEEPING to RUNNABLE
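These transitions fit in a small lookup function. The sketch covers only the four states discussed here, plus the timer-driven RUNNING to RUNNABLE transition that is covered next. The demo_* names are hypothetical and this is a summary of the text, not code from proc.c.

```c
enum demo_procstate {
    DEMO_SLEEPING, DEMO_RUNNABLE, DEMO_RUNNING, DEMO_ZOMBIE
};

/* Returns 1 if the transition from -> to is one of those described
 * in the text, and 0 otherwise. */
int demo_transition_ok(enum demo_procstate from, enum demo_procstate to)
{
    switch (from) {
    case DEMO_RUNNABLE:
        return to == DEMO_RUNNING;      /* picked by the scheduler */
    case DEMO_RUNNING:
        return to == DEMO_SLEEPING      /* sleep */
            || to == DEMO_ZOMBIE        /* exit */
            || to == DEMO_RUNNABLE;     /* timer tick, via yield */
    case DEMO_SLEEPING:
        return to == DEMO_RUNNABLE;     /* wakeup */
    case DEMO_ZOMBIE:
        return 0;                       /* no way back among these four */
    }
    return 0;
}
```

Every path into RUNNING goes through RUNNABLE, which is why the scheduler only ever has to look for RUNNABLE entries.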

So far, we still haven’t talked about the last transition: from RUNNING to RUNNABLE. A process can be forced off the CPU. More specifically, the timer sends an IRQ_TIMER interrupt and the CPU handles it in the trap function.

  // Force process to give up CPU on clock tick.
  // If interrupts were on while locks held, would need to check nlock.
  if(myproc() && myproc()->state == RUNNING &&
     tf->trapno == T_IRQ0+IRQ_TIMER)
    yield();

The yield function is a wrapper around sched, and sched is in turn a wrapper around swtch.

//proc.c
// Give up the CPU for one scheduling round.
void
yield(void)
{
  acquire(&ptable.lock);  //DOC: yieldlock
  myproc()->state = RUNNABLE;
  sched();
  release(&ptable.lock);
}
// Enter scheduler.  Must hold only ptable.lock
// and have changed proc->state. Saves and restores
// intena because intena is a property of this
// kernel thread, not this CPU. It should
// be proc->intena and proc->ncli, but that would
// break in the few places where a lock is held but
// there's no process.
void
sched(void)
{
  int intena;
  struct proc *p = myproc();

  if(!holding(&ptable.lock))
    panic("sched ptable.lock");
  if(mycpu()->ncli != 1)
    panic("sched locks");
  if(p->state == RUNNING)
    panic("sched running");
  if(readeflags()&FL_IF)
    panic("sched interruptible");
  intena = mycpu()->intena;
  swtch(&p->context, mycpu()->scheduler);
  mycpu()->intena = intena;
}

In the code above, the yield function changes the state of the process and dives into the sched function. After a few sanity checks, sched calls swtch to give control back to the scheduler.

That totally makes sense!

0x04 Process Switching

Putting it all together, here is the call trace of a switch from process A to process B, triggered by a timer interrupt:

Process A
Timer Interrupt
alltraps()
trap()
yield()
sched()
swtch()
scheduler()
swtch()
sched()
yield()
trap()
alltraps()
Process B