0xFF tl;dr

This article document what I learned from @zolutal’s blog Understanding x86_64 Paging

0x00 Prologue

This article will go through paging and understand the weird values in memory.

0x01 Enviroment

KZong: git@github.com:n132/KZone.git

0x02 Paging

We need paging since we want to provide virtual memory space for each process so all processes can assume they own the whole memory space. However, we can’t satisfy the huge amount of memory requests: a pointer in x64 has 64 bits so we can logically visit 2^64(more than 16777216 TB) memory but as we know, our little laptops don’t have so much RAM. Therefore, we should have a way to solve the problem and let the processes feel like they have 2^64 bytes of memory space. So a method to map the physical memory address to the virtual memory address is designed.

The smallest element of mapping for x86 linux is page, which is 0x1000 bytes for x86-64. We’ll introduce the method to transform a virtual memory address into a physical memory address.

0x02 Virtual Address

You may feel confused if you debugged user space programs with GDB when you find that all addresses didn’t use the high 2 bytes. For example, if you turn off aslr, 0x7ffffffde000 could be a stack address. If we counter it meticulously, we’ll find there are 12 hex characters, which means 6 bytes are used but a pointer has 8 bytes why do we only use low 48 bits(6 bytes)?

The virtual address is a set of indexes. It includes page offset and the indexes for physically addressing paging tables(directories). The paging table is also some page in on your RAM so we have at most 0x1000 bytes to store the values(512 entries), which is not enough to accommodate all page addresses so we need multi-level level tables.

In the implementation, there are 4 levels of paging tables: Page Global Directory(PGD), Page Upper Directory(PUD), Page Mid-level Directory(PMD), and Page Table Entry(PTE). Each directory requires one index to represent the entries to the next level paging directory. Knowing that each directory has 0x1000 bytes, we can easily compute that we at most have 0x200 entries, which need 9 bits to represent. Plug the 12 bits to represent the in-page offset, we need at least 48(9*4+12) bits. This is the reason why you see so many 6-byte addresses in user space.

There is a little function to get each level’s index from a virtual address:

def page_dir_index(addr):
    offset = addr&0xfff
    PET    = (addr>>12)&0x1ff
    PMD    = (addr>>12>>9)&0x1ff
    PUD    = (addr>>12>>9>>9)&0x1ff
    PGD    = (addr>>12>>9>>9>>9)&0x1ff
    return PGD, PUD, PMD, PET, offset

Assume we have a virtual memory address 0x7f9e8f25b001:

pwndbg> x/8gx 0x7f9e8f25b000
0x7f9e8f25b000: 0x00010102464c457f  0x0000000000000000
0x7f9e8f25b010: 0x00000001003e0003  0x00000000000020e0
0x7f9e8f25b020: 0x0000000000000040  0x000000000000b2c8
0x7f9e8f25b030: 0x0038004000000000  0x001d001e00400007

The corresponding indexes are: (255, 122, 121, 91, 1)

$CR3

We have all the indexes but where should we start? Also, for different processes, we may use the same virtual address. For example, we fork from one process. The child process has the same memory as the parent. However, what they do to their memory space should not influence each other. If we only use the knowledge we talked about in the previous section, we can’t make it. We have to have something special for each process.

The answer is $CR3 register. $CR3 stores the PGD value for each process. The child process has a different $CR3 value than the parent so we solve the problem!

While debugging, you can print the value of $CR3: (If you are not debugging the kernel, you can’t check $cr3)

pwndbg> p/x $cr3
$1 = 0x138138000

Also, you can find the value of PGD by checking the value of task_struct->mm->pgd for a specific task/process.

So we have all the information we need to convert a virtual memory address to a physical memory address!

0x03 Multi-level Page Directory

The more detailed structure of the Multi-level Paging Directory is demonstrated at Zolutal’s blog. I’ll give a lite and the not precise version here for later reference.

PGD

Assume our $cr3 value is 0x138138000 (if it doesn’t end with 000, please zero the last 12 bits to get PGD) and our address’s page directory indexes are (255, 122, 121, 91) so we need to visit the 255th slot of physical memory 0x138138000 to find the entry of next level page directory.

We have two methods to check the physical memory in GDB: monitor xp/gx <physical address> or x/gx physmap+<physical address>. Physmap is an area directory mapped from physical memory and it’s stored in the symbol page_offset_base and its default value is 0xffff888000000000 if there is no kaslr.

So we can use the above two commands to check the entry of next level page directory:

pwndbg> x/gx 0xffff888000000000+0x138138000+255*8
0xffff8881381387f8: 0x000000013836e067
pwndbg> monitor xp/gx 0x138138000+255*8
00000001381387f8: 0x000000013836e067

PUD

We’ll find another physical address: 0x000000013836e067. So we can simply zero the last 12 bits and the first 13 bits to get the PUD Physical Address:

# Since for most cases the high 2 bytes are zero, which means the first 13 bits should be zero I just simply zero the last 12 bits.
pwndbg> p/x (0x000000013836e067>>12<<12)
$1 = 0x13836e000
pwndbg> x/8gx 0xffff888000000000+0x13836e000+8*122
0xffff88813836e3d0: 0x000000013824c067  0x0000000000000000
0xffff88813836e3e0: 0x0000000000000000  0x0000000000000000
0xffff88813836e3f0: 0x0000000000000000  0x0000000000000000
0xffff88813836e400: 0x0000000000000000  0x0000000000000000
pwndbg> 

Another thing we should be careful of is the 7th bit for PUD and PWD. If it’s set, it’s hugfied, which means the physical address we got is the physical address of the final physical page page. But, for our example, it’s not hugified.

PMD

We use the same method to process the entry we got from PUD.

pwndbg> p/x (0x000000013824c067>>12<<12)
$2 = 0x13824c000
pwndbg> x/8gx 0xffff888000000000+0x13824c000+121*8
0xffff88813824c3c8: 0x00000001382fc067  0x00000001382fb067
0xffff88813824c3d8: 0x00000001382fa067  0x0000000138251067
0xffff88813824c3e8: 0x000000013701a067  0x0000000137017067
0xffff88813824c3f8: 0x0000000137016067  0x0000000137015067

So we got the next level entry: 0x00000001382fc067

PET

Apply the same method again, we’ll get the physical address.

pwndbg> p/x (0x00000001382fc067>>12<<12)
$3 = 0x1382fc000
pwndbg> monitor xp/8gx 0x1382fc000+0x8*91
00000001382fc2d8: 0x0000000139523025 0x000000013bb39025
00000001382fc2e8: 0x000000013bab9025 0x000000013958b025
00000001382fc2f8: 0x0000000138f60025 0x0000000138f61025
00000001382fc308: 0x0000000138f7d025 0x0000000138f5e025

pwndbg> monitor xp/8gx 0x0000000139523000+1
0000000139523001: 0x0000010102464c45 0x0300000000000000
0000000139523011: 0xe000000001003e00 0x4000000000000020
0000000139523021: 0xc800000000000000 0x00000000000000b2
0000000139523031: 0x0700380040000000 0x01001d001e004000

pwndbg> x/8gx 0x7f9e8f25b001
0x7f9e8f25b001: 0x0000010102464c45  0x0300000000000000
0x7f9e8f25b011: 0xe000000001003e00  0x4000000000000020
0x7f9e8f25b021: 0xc800000000000000  0x00000000000000b2
0x7f9e8f25b031: 0x0700380040000000  0x01001d001e004000

0x03 TLB

TLB is Translation Lookaside Buffer. As we talked above, we need to access the memory 4 times to get the physical address of a virtual address, which is expensive. But if we maintain a table to record the mapping, we can finish translation faster. Therefore, TLB is used.

0x04 Epilogue

In short, all virtual memory addresses can be transformed into physical memory addresses through paging.