Prologue
The first task of a system is boot.
real mode, aka real address mode. It uses the real address and there is no protection.
bootasm.S
This file would xxxxxx.
The BIOS would jmp
to the real address 0x7c00 to start first operation system’s instruction and that’s also the start
of bootasm.S
. It’s still in 16-bit real mode and would finish some important work before jumping to the bootmain.c
.
#include "asm.h"
#include "memlayout.h"
#include "mmu.h"
# Start the first CPU: switch to 32-bit protected mode, jump into C.
# The BIOS loads this code from the first sector of the hard disk into
# memory at physical address 0x7c00 and starts executing in real mode
# with %cs=0 %ip=7c00.
.code16 # Assemble for 16-bit mode
.globl start
start:
cli # BIOS enabled interrupts; disable
# Zero data segment registers DS, ES, and SS.
xorw %ax,%ax # Set %ax to zero
movw %ax,%ds # -> Data Segment
movw %ax,%es # -> Extra Segment
movw %ax,%ss # -> Stack Segment
# Physical address line A20 is tied to zero so that the first PCs
# with 2 MB would run software that assumed 1 MB. Undo that.
The first part is kind of straight forward. It uses cli
instruction to disable the interrupts and inits the sgement registers to clear the original garbage data.
# Physical address line A20 is tied to zero so that the first PCs
# with 2 MB would run software that assumed 1 MB. Undo that.
seta20.1:
inb $0x64,%al # Wait for not busy
testb $0x2,%al
jnz seta20.1
movb $0xd1,%al # 0xd1 -> port 0x64
outb %al,$0x64
seta20.2:
inb $0x64,%al # Wait for not busy
testb $0x2,%al
jnz seta20.2
movb $0xdf,%al # 0xdf -> port 0x60
outb %al,$0x60
# Switch from real to protected mode. Use a bootstrap GDT that makes
# virtual addresses map directly to physical addresses so that the
# effective memory map doesn't change during the transition.
lgdt gdtdesc
movl %cr0, %eax
orl $CR0_PE, %eax
movl %eax, %cr0
//PAGEBREAK!
# Complete the transition to 32-bit protected mode by using a long jmp
# to reload %cs and %eip. The segment descriptors are set up with no
# translation, so that the mapping is still the identity mapping.
ljmp $(SEG_KCODE<<3), $start32
And then the system would waite for a ‘signal’ from 0x64 port and sends 0xd1 to the same port. Also my professor talked about the every step’s meaning but I choose to skip that because I garuantee you and me would forget that in one day. Anyway, the comment says seta20.1
and seta20.2
are remove the limitation to A20. After that, the system would switch to the protected mode.
GDT, The Global Descriptor Table
. It records the Segment Descriptors, which stores the information to the corresponding sgement, including the address and the permissions. In order to jmp to the protected mode, the code firstly sets the CR0_PE(protect mode enable) and secondly use a long jump to jump to the 32bit-start. Btw, the long jump woould also set the cs register to $(SEG_KCODE<<3)
.
.code32 # Tell assembler to generate 32-bit code now.
start32:
# Set up the protected-mode data segment registers
movw $(SEG_KDATA<<3), %ax # Our data segment selector
movw %ax, %ds # -> DS: Data Segment
movw %ax, %es # -> ES: Extra Segment
movw %ax, %ss # -> SS: Stack Segment
movw $0, %ax # Zero segments not ready for use
movw %ax, %fs # -> FS
movw %ax, %gs # -> GS
# Set up the stack pointer and call into C.
movl $start, %esp
call bootmain
# If bootmain returns (it shouldn't), trigger a Bochs
# breakpoint if running under Bochs, then loop.
movw $0x8a00, %ax # 0x8a00 -> port 0x8a00
movw %ax, %dx
outw %ax, %dx
movw $0x8ae0, %ax # 0x8ae0 -> port 0x8a00
outw %ax, %dx
spin:
jmp spin
Similar as the start
, start32
would also set the register and jump to next boot handle: bootmain
. That’s a c
code function in bootmain.c
.
bootmain.c
void
bootmain(void)
{
struct elfhdr *elf;
struct proghdr *ph, *eph;
void (*entry)(void);
uchar* pa;
elf = (struct elfhdr*)0x10000; // scratch space
// Read 1st page off disk
readseg((uchar*)elf, 4096, 0);
// Is this an ELF executable?
if(elf->magic != ELF_MAGIC)
return; // let bootasm.S handle error
// Load each program segment (ignores ph flags).
ph = (struct proghdr*)((uchar*)elf + elf->phoff);
eph = ph + elf->phnum;
for(; ph < eph; ph++){
pa = (uchar*)ph->paddr;
readseg(pa, ph->filesz, ph->off);
if(ph->memsz > ph->filesz)
stosb(pa + ph->filesz, 0, ph->memsz - ph->filesz);
}
// Call the entry point from the ELF header.
// Does not return!
entry = (void(*)(void))(elf->entry);
entry();
}
The bootmain loads the real kernel. First, it read the kernel-elf from the disk and does some simple checkes. Second, it loads the segments and jmp to the entry. So it’s the loader of the kernel. And we only have the physical address now but the kernel should be on 0x80100000 virtual memory so we need a simple linker. You can find it in kernel.ld
file. By the linker, We could map the entry function to the correct virtual memory address.
The start of the systerm
In entry.S:
.globl entry
entry:
# Turn on page size extension for 4Mbyte pages
movl %cr4, %eax
orl $(CR4_PSE), %eax
movl %eax, %cr4
# Set page directory
movl $(V2P_WO(entrypgdir)), %eax
movl %eax, %cr3
# Turn on paging.
movl %cr0, %eax
orl $(CR0_PG|CR0_WP), %eax
movl %eax, %cr0
# Set up the stack pointer.
movl $(stack + KSTACKSIZE), %esp
# Jump to main(), and switch to executing at
# high addresses. The indirect call is needed because
# the assembler produces a PC-relative instruction
# for a direct jump.
mov $main, %eax
jmp *%eax
It did some preparations and jump to the kernel main, that’s in main.c
int
main(void)
{
kinit1(end, P2V(4*1024*1024)); // phys page allocator
kvmalloc(); // kernel page table
mpinit(); // detect other processors
lapicinit(); // interrupt controller
seginit(); // segment descriptors
picinit(); // disable pic
ioapicinit(); // another interrupt controller
consoleinit(); // console hardware
uartinit(); // serial port
pinit(); // process table
tvinit(); // trap vectors
binit(); // buffer cache
fileinit(); // file table
ideinit(); // disk
startothers(); // start other processors
kinit2(P2V(4*1024*1024), P2V(PHYSTOP)); // must come after startothers()
userinit(); // first user process
mpmain(); // finish this processor's setup
}
Here are all the procedures before the main. This passage tells nothing important, it’s just a simple “go through” passage.
kinit1
kinit1
is the first function call of the main, it would init the memory manager of xv6. Actually, it’s pretty easy. It has 1024 chunks whose size euqals the pagesize. It may lead to great waste but simple. kinit would add 1024 pages to the freelist. It’s a linked list of freed pages.
void
kfree(char *v)
{
struct run *r;
if((uint)v % PGSIZE || v < end || V2P(v) >= PHYSTOP)
panic("kfree");
// Fill with junk to catch dangling refs.
memset(v, 1, PGSIZE);
if(kmem.use_lock)
acquire(&kmem.lock);
r = (struct run*)v;
r->next = kmem.freelist;
kmem.freelist = r;
if(kmem.use_lock)
release(&kmem.lock);
}
// Allocate one 4096-byte page of physical memory.
// Returns a pointer that the kernel can use.
// Returns 0 if the memory cannot be allocated.
char*
kalloc(void)
{
struct run *r;
if(kmem.use_lock)
acquire(&kmem.lock);
r = kmem.freelist;
if(r)
kmem.freelist = r->next;
if(kmem.use_lock)
release(&kmem.lock);
return (char*)r;
}
virtual memory
In main, the kvmalloc is the entry of setting up the virtual memory. As you can see in the following code, it’s just a simple warrper. The setupkvm
would set up the virtual memory, while the sitchkvm
would set the cr3
register so that the whole system is going to run in virtual memory.
void
kvmalloc(void)
{
kpgdir = setupkvm();
switchkvm();
}
// Set up kernel part of a page table.
pde_t*
setupkvm(void)
{
pde_t *pgdir;
struct kmap *k;
if((pgdir = (pde_t*)kalloc()) == 0)
return 0;
memset(pgdir, 0, PGSIZE);
if (P2V(PHYSTOP) > (void*)DEVSPACE)
panic("PHYSTOP too high");
for(k = kmap; k < &kmap[NELEM(kmap)]; k++)
if(mappages(pgdir, k->virt, k->phys_end - k->phys_start,
(uint)k->phys_start, k->perm) < 0) {
freevm(pgdir);
return 0;
}
return pgdir;
}
And in the for loop
, the system maps elements in the kmap to the vertial memory. The struction is shown in the same file.
static struct kmap {
void *virt;
uint phys_start;
uint phys_end;
int perm;
} kmap[] = {
{ (void*)KERNBASE, 0, EXTMEM, PTE_W}, // I/O space
{ (void*)KERNLINK, V2P(KERNLINK), V2P(data), 0}, // kern text+rodata
{ (void*)data, V2P(data), PHYSTOP, PTE_W}, // kern data+memory
{ (void*)DEVSPACE, DEVSPACE, 0, PTE_W}, // more devices
};
The Physical adddress layout is similar to
0x...fffffff
---------------
Devices
---------------> 0xFE000000
...
---------------> 0x0E000000
kernel data
--------------->
kernel text
---------------> 0x100000
I/O space
---------------> 0x0
And these segments’ virtual address equals physical address+0x80000000
.
The mappages
function is very important, we can have deeper nderstanding about the virtual memory by reading its code.
// Create PTEs for virtual addresses starting at va that refer to
// physical addresses starting at pa. va and size might not
// be page-aligned.
static int
mappages(pde_t *pgdir, void *va, uint size, uint pa, int perm)
{
char *a, *last;
pte_t *pte;
a = (char*)PGROUNDDOWN((uint)va);
last = (char*)PGROUNDDOWN(((uint)va) + size - 1);
for(;;){
if((pte = walkpgdir(pgdir, a, 1)) == 0)
return -1;
if(*pte & PTE_P)
panic("remap");
*pte = pa | perm | PTE_P;
if(a == last)
break;
a += PGSIZE;
pa += PGSIZE;
}
return 0;
}
The mappages
function takes 5 parameters, the pgdir
means Page Directory
and the av
means Virtual Address
. It uses walkpgdir
function to interacte with the Page Directory
.
// Return the address of the PTE in page table pgdir
// that corresponds to virtual address va. If alloc!=0,
// create any required page table pages.
static pte_t *
walkpgdir(pde_t *pgdir, const void *va, int alloc)
{
pde_t *pde;
pte_t *pgtab;
pde = &pgdir[PDX(va)];
if(*pde & PTE_P){
pgtab = (pte_t*)P2V(PTE_ADDR(*pde));
} else {
if(!alloc || (pgtab = (pte_t*)kalloc()) == 0)
return 0;
// Make sure all those PTE_P bits are zero.
memset(pgtab, 0, PGSIZE);
// The permissions here are overly generous, but they can
// be further restricted by the permissions in the page table
// entries, if necessary.
*pde = V2P(pgtab) | PTE_P | PTE_W | PTE_U;
}
return &pgtab[PTX(va)];
}
It would firstly take the PDX(Page Directory Index) of the virtual address (pde = &pgdir[PDX(va)];
) and check the corresponding page table
’s entry on the Page Directory
. If the entry is NULL, it would call kalloc()
to create a new page table
. Besides, the function will find and return the PTE(page table entry
) by using the PTX(Page Table Index
). In summary, this function would check if the virtual memory page exists or not.
- If not, this function would allocate the page and return the physical address.
- If yes, return the physical address.
The virtual memory could be split to three parts(PDX,PTX,OFFSET). PDX and PTX would be used to locate the physical memory pages.
// A virtual address 'la' has a three-part structure as follows:
//
// +--------10------+-------10-------+---------12----------+
// | Page Directory | Page Table | Offset within Page |
// | Index | Index | |
// +----------------+----------------+---------------------+
// \--- PDX(va) --/ \--- PTX(va) --/
Finally, in one word, setupkvm
sets up the virtual memory from the given kmap
and return the page directory.
After inititalizing the virtual memory, the system also calls tons of init functions. I read the code and have a more completed image about the system. But I don’t totally understand these topics so I would not have intro about these inits.
Run The System
mpmain
is the last function call of the main
function.
// Common CPU setup code.
static void
mpmain(void)
{
cprintf("cpu%d: starting %d\n", cpuid(), cpuid());
idtinit(); // load idt register
xchg(&(mycpu()->started), 1); // tell startothers() we're up
scheduler(); // start running processes
}
In this function CPUs would start their work by running the scheduler
. I’ll read the code about schduler
in the next note.