Introduction
I took ECE 565 in the final semester of my bachelor’s degree. It was generally known to be a hard class, and while I did find it challenging, it was also very interesting. We covered a variety of topics including scheduling, synchronization, virtual memory, and file systems.
There were four class projects. Each project required us to modify the XINU operating system to implement various features. XINU is an educational x86 operating system which we ran inside VirtualBox. Everything was programmed in C.
My instructor was Dr. Michela Becchi. She did a very good job explaining the course content and seemed to have a genuine interest in the material.
Project 1
For the first project, we modified XINU’s process creation and termination logic. The highlight of this project was the implementation of the UNIX fork function.
XINU is not a UNIX-compliant OS, and all processes are created through a create system call. The create function takes a function pointer and sets everything up for a brand-new process to spawn running that function. Unlike fork, however, it cannot copy memory from an existing process’s stack.
UNIX Fork
To implement the fork function, I started with XINU’s existing code for creating a new process. My new code had to copy the stack of the current process up to the point where fork was called. Because XINU does not implement virtual memory or address spaces, I had to modify the new stack to ensure its frame pointers were still valid. I also had to push registers onto the new stack so that the new process spawned with the correct return value from fork.
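The stack-copy step can be sketched in ordinary C. This is a toy model with hypothetical names (`copy_and_relocate`, `STACK_WORDS`), not XINU’s actual code: each saved frame pointer in the child’s copy is shifted by the distance between the two stacks, since without virtual memory both stacks live at fixed addresses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define STACK_WORDS 16          /* toy stack size, in machine words */

/* Copy a parent "stack" into a child buffer and rebase the chain of
 * saved frame pointers so they point into the child's copy. */
static void copy_and_relocate(uintptr_t *parent, uintptr_t *child,
                              size_t fp_slot)   /* slot of first saved FP */
{
    memcpy(child, parent, sizeof(uintptr_t) * STACK_WORDS);
    ptrdiff_t delta = (uint8_t *)child - (uint8_t *)parent;

    /* Walk the chain of saved frame pointers; a saved value of 0
     * terminates the chain in this toy model. */
    for (;;) {
        uintptr_t old = parent[fp_slot];
        if (old == 0)
            break;
        child[fp_slot] = old + (uintptr_t)delta;       /* point into child */
        fp_slot = (size_t)((uintptr_t *)old - parent); /* next saved FP slot */
    }
}
```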
There were some flaws with our approach for this project. If the parent process stored a pointer to its own stack, that pointer would still point to the parent’s stack when used by the child. In a modern system, virtual memory would address this issue: the child and parent could use the same virtual addresses while being backed by separate physical pages.
Project 2
For the second project, we implemented a few new schedulers for the XINU system.
Lottery Scheduler
The first scheduler I implemented was a lottery scheduler. In a lottery scheduler, each process is assigned a certain number of tickets. Whenever processes are eligible for execution, a “lottery” is held and a random ticket is selected. The process holding this ticket gets to execute.
One of the major advantages of a lottery scheduler is that it reduces task starvation. Unlike a standard priority system where high priority tasks always preempt low priority tasks, a process with fewer tickets still has a chance to run every now and then.
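The draw itself is simple to sketch. The function below uses hypothetical names and a plain ticket array rather than XINU’s ready list: given a randomly chosen winning ticket number, it walks the processes until the running ticket total passes the winner.

```c
#include <assert.h>
#include <stddef.h>

/* Return the index of the process holding the winning ticket, or -1 if
 * the winning number is out of range. */
static int lottery_pick(const int *tickets, size_t nprocs, int winner)
{
    int total = 0;
    for (size_t i = 0; i < nprocs; i++) {
        total += tickets[i];
        if (winner < total)
            return (int)i;      /* this process holds the winning ticket */
    }
    return -1;
}
```

In the scheduler itself, `winner` would come from something like `rand() % total_tickets`, so each process’s chance of being picked is proportional to its ticket count.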
Multilevel Feedback Queue
The second scheduler I implemented was a MLFQ, or multilevel feedback queue, scheduler. This scheduler works with several different levels of “queues.” Initially, new processes are placed in the highest priority queue. However, each process has a fixed processing time allotment for each level of the queue. When the time allotment runs out, the process is moved to a lower priority queue.
Over time, this means all processes will eventually use up their time allotments and all have the same lowest priority. To avoid this, the priority of all tasks is periodically boosted back to the highest priority and the entire cycle repeats.
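The demotion and boost rules can be sketched with some toy bookkeeping (hypothetical `mlfq_proc` fields and constants; XINU’s process table looks different):

```c
#include <assert.h>

#define LEVELS 3                /* number of queues; 0 is highest priority */
#define ALLOTMENT 10            /* ticks a process may use per level */

struct mlfq_proc {
    int level;                  /* current queue */
    int used;                   /* ticks consumed at this level */
};

/* Burn part of a process's allotment; demote it when the allotment at
 * its current level is used up. */
static void charge_time(struct mlfq_proc *p, int ticks)
{
    p->used += ticks;
    if (p->used >= ALLOTMENT && p->level < LEVELS - 1) {
        p->level++;             /* move to the next lower-priority queue */
        p->used = 0;
    }
}

/* Periodic boost: everyone returns to the highest-priority queue. */
static void boost(struct mlfq_proc *procs, int n)
{
    for (int i = 0; i < n; i++) {
        procs[i].level = 0;
        procs[i].used = 0;
    }
}
```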
An MLFQ scheduler is primarily beneficial for I/O-bound processes. A process’s time allotment only decreases while it is actually running and taking up CPU time. On modern systems, where many processes spend most of their time waiting on I/O such as disk, network, or user input, these I/O-bound processes can remain in the higher-priority queues much longer.
Project 3
For the third project, we implemented various synchronization mechanisms for XINU. We implemented a lock using the x86 XCHG instruction, and then had to implement deadlock and priority inversion detection for it.
Test and Set
For the first task, we implemented a test_and_set function which atomically sets a variable and returns its old value. This function had to be written in x86 assembly using the XCHG instruction, which can atomically exchange the contents of a register and a memory location. This atomicity is important to ensure that two processes can’t acquire the same lock at the same time.
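A sketch of the idea in C: on x86 the exchange can be written as a single XCHG, as the project required; the other branch uses the GCC/Clang `__atomic_exchange_n` builtin as a portable stand-in so the sketch isn’t tied to x86.

```c
#include <assert.h>

/* Atomically store 1 into *lock and return the previous value. */
static int test_and_set(volatile int *lock)
{
#if defined(__i386__) || defined(__x86_64__)
    int value = 1;
    /* XCHG with a memory operand is implicitly locked on x86. */
    __asm__ volatile("xchg %0, %1"
                     : "+r"(value), "+m"(*lock)
                     :
                     : "memory");
    return value;               /* previous contents of *lock */
#else
    return __atomic_exchange_n(lock, 1, __ATOMIC_SEQ_CST);
#endif
}
```

A spin lock can then acquire with `while (test_and_set(&lock)) ;` and release by storing 0.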
Lock Parking
Although the first implementation of the lock simply used a spin loop to wait for the lock to become available, the second implementation was more advanced. Spin locks can easily cause deadlocks when dealing with different process priorities, especially since a spinning process never sleeps while waiting. They also waste system resources.
As described in the textbook, my code first acquired a spin-lock “guard” to ensure it was safe to modify the lock’s variables. This guard is mostly harmless since it is released quickly after being acquired. After acquiring the guard, if the lock has already been taken, the kernel puts the process into a waiting queue. Finally, the process goes into a “parked,” or sleeping, state.
When the lock is released, the guard must be acquired as before. If there are other processes in the queue, we can unpark the next one and let it run. By parking processes, busy waiting is avoided and system resources are saved.
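Putting the pieces together, the lock looks roughly like the queue lock described in the textbook. The `park`/`unpark` stubs below stand in for the kernel calls that actually sleep and wake a process, and the field names are my own:

```c
#include <assert.h>

#define QMAX 8

struct parking_lock {
    volatile int guard;         /* spin lock protecting the fields below */
    int flag;                   /* 1 if the lock is held */
    int queue[QMAX];            /* PIDs of parked waiters (ring buffer) */
    int head, tail, count;
};

static int test_and_set(volatile int *v)
{
    return __atomic_exchange_n(v, 1, __ATOMIC_SEQ_CST);
}

static void park(int pid)   { (void)pid; /* kernel: sleep until unparked */ }
static void unpark(int pid) { (void)pid; /* kernel: wake a parked process */ }

static void lock_acquire(struct parking_lock *m, int pid)
{
    while (test_and_set(&m->guard))
        ;                       /* guard is only held momentarily */
    if (m->flag == 0) {
        m->flag = 1;            /* lock was free: take it */
        m->guard = 0;
    } else {
        m->queue[m->tail] = pid;            /* join the wait queue */
        m->tail = (m->tail + 1) % QMAX;
        m->count++;
        m->guard = 0;
        park(pid);              /* sleep; the releaser hands us the lock */
    }
}

static void lock_release(struct parking_lock *m)
{
    while (test_and_set(&m->guard))
        ;
    if (m->count == 0) {
        m->flag = 0;            /* nobody waiting: free the lock */
    } else {
        int pid = m->queue[m->head];
        m->head = (m->head + 1) % QMAX;
        m->count--;
        unpark(pid);            /* flag stays 1: ownership transfers */
    }
    m->guard = 0;
}
```

Note that when a waiter is unparked, `flag` stays 1: the lock is handed directly to the next process in the queue rather than being released and re-contended.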
Deadlock Detection
A deadlock occurs when a process (A) tries to acquire a lock held by a second process (B) which is waiting for a lock held by the first process (A). Since the second process (B) might be waiting indirectly on the first process (A), this can be a bit tricky to detect. For example, the second process (B) might be waiting on a third process (C) which is waiting on the first process (A).
To detect deadlocks, whenever a process tried to acquire a lock, I checked the process holding that lock. If that process was itself waiting to acquire a lock, I checked the process currently holding that lock, and so on. If, as the code looped through these checks, it found the original process, I knew there was a cyclic dependency and a deadlock had occurred.
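The check amounts to walking a wait-for chain until it either ends or loops back to the requester. A sketch with hypothetical arrays (XINU keeps this state in its lock and process tables): `lock_holder[l]` is the PID holding lock `l` (or -1 if free), and `waiting_on[p]` is the lock process `p` is waiting for (or -1).

```c
#include <assert.h>

#define NPROC 8                 /* upper bound on the chain length */

/* Return 1 if granting `lock` to `requester` would close a cycle. */
static int causes_deadlock(int requester, int lock,
                           const int *lock_holder, const int *waiting_on)
{
    int steps = 0;
    while (lock != -1 && steps++ < NPROC) {
        int holder = lock_holder[lock];
        if (holder == -1)
            return 0;           /* lock is free: chain ends, no cycle */
        if (holder == requester)
            return 1;           /* chain loops back to us: deadlock */
        lock = waiting_on[holder];  /* follow the wait-for chain */
    }
    return 0;
}
```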
Priority Inversion
Priority inversion occurs when a high priority process waits for a lock held by a low priority process. If the system is busy and the low priority process is being starved, the high priority process could be stuck for quite a while waiting for the low priority process to finish.
To avoid priority inversion, I upgraded the priority of the process holding the lock to the highest priority of any process waiting on or holding the lock. By upgrading the priority, a low priority process won’t be delayed unless the highest priority process would also have been delayed.
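The boost itself is just a maximum over the waiters; a tiny sketch with hypothetical names:

```c
#include <assert.h>

/* Return the priority the lock holder should run at: its own priority,
 * raised to the highest priority among the waiters. */
static int boosted_priority(int holder_prio, const int *waiter_prios, int n)
{
    int p = holder_prio;
    for (int i = 0; i < n; i++)
        if (waiter_prios[i] > p)
            p = waiter_prios[i];    /* inherit a higher waiting priority */
    return p;
}
```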
Project 4
In the final project, we implemented virtual memory for XINU. Unlike the other projects, this one was a group project, so I worked on a team of two. We only implemented virtual memory for user space processes. All processes were provided access to the kernel’s memory, which greatly simplified the programming.
Virtual Memory
Virtual memory is a way of separating physical memory addresses from the addresses which individual processes use. Memory is divided into chunks called “pages,” and each page can be mapped to different virtual addresses depending on the current page table.
Paging and virtual memory have a multitude of advantages. For example, suppose processes are assigned a stack at the beginning of a memory space and the heap at the end after a ~4GB gap. In a traditional physically addressed system, this would be wildly inefficient since the 4GB gap is wasted. With virtual memory, however, the operating system can wait to assign physical pages of memory to the unused gap until it is accessed.
Page Tables
One of the trickiest aspects of adding virtual memory was initializing the page directories and page tables. The page tables contain physical addresses pointing to different pages in memory. Each process has a page directory which points to the process’s page tables. The x86 processor uses a combination of registers and the entries in these tables to translate virtual addresses into their corresponding physical addresses.
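The translation starts by splitting the 32-bit virtual address into three fields: the top 10 bits index the page directory, the next 10 bits index a page table, and the low 12 bits are the offset within the 4 KB page. These helpers mirror that split:

```c
#include <assert.h>
#include <stdint.h>

/* Field extraction for 32-bit x86 paging (no PAE). */
static uint32_t pd_index(uint32_t vaddr)  { return vaddr >> 22; }         /* bits 31-22 */
static uint32_t pt_index(uint32_t vaddr)  { return (vaddr >> 12) & 0x3FF; } /* bits 21-12 */
static uint32_t pg_offset(uint32_t vaddr) { return vaddr & 0xFFF; }       /* bits 11-0 */
```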
Lazy Allocation
As part of the project, my teammate and I implemented a lazy malloc function. Our malloc function found free space in the virtual address space. It marked the corresponding page table entries as valid, but it did not set the present flag or allocate a physical page for the entry.
The first time this memory is accessed, the x86 hardware throws a page fault because the present bit is unset. We wrote a page fault handler which allocates a physical page and sets the present bit when this occurs.
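The scheme can be simulated with a pair of flags (a hypothetical layout; real x86 page table entries differ): malloc marks an entry valid without attaching a frame, and the fault handler attaches the frame on first access.

```c
#include <assert.h>
#include <stdint.h>

#define PTE_VALID   0x1         /* handed out by our lazy malloc */
#define PTE_PRESENT 0x2         /* backed by a physical frame */
#define NPAGES 4

static uint32_t pte[NPAGES];
static int frames_allocated;

/* malloc's job: reserve the entry, but attach no frame yet. */
static int lazy_alloc(int page)
{
    if (pte[page] & PTE_VALID)
        return -1;              /* already in use */
    pte[page] = PTE_VALID;      /* PRESENT stays clear */
    return 0;
}

/* Fault handler's job: attach a frame on first access. */
static int page_fault(int page)
{
    if (!(pte[page] & PTE_VALID))
        return -1;              /* never allocated: a real fault */
    pte[page] |= PTE_PRESENT;
    frames_allocated++;
    return 0;
}
```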
Swapping
For extra credit, my teammate implemented swapping. Unfortunately, I was busy with other end-of-semester projects, so I was not able to help much with it. However, I helped enough to understand how our implementation works.
Whenever a page is accessed, the x86 hardware sets an accessed flag on the corresponding page table entry. If the kernel ever tries to allocate a physical page but fails, it can “swap” some pages out to make room. Pages which do not have this bit set are the first ones to go.
In our implementation, we swapped the pages to a different section of memory, set a custom swapped flag, and unset the present flag. In a real system, the pages would likely be swapped to a storage device, such as an SSD. Since this would complicate the project and distract from virtual memory, we were not required to do this.
When the swapped memory is accessed, a page fault is thrown. Similarly to how we lazily allocate pages, we detect the swapped flag and move the page back into physical memory.
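A simulation of the policy (hypothetical flags and names; our real code also copied the page contents to the reserved region of memory): victims are preferably present pages whose accessed bit is clear.

```c
#include <assert.h>
#include <stdint.h>

#define P_PRESENT  0x1
#define P_ACCESSED 0x2          /* set by hardware on access */
#define P_SWAPPED  0x4          /* our custom flag */
#define NPAGES 4

/* Choose a page to evict: prefer un-accessed present pages. */
static int pick_victim(const uint32_t *flags, int n)
{
    for (int i = 0; i < n; i++)
        if ((flags[i] & P_PRESENT) && !(flags[i] & P_ACCESSED))
            return i;
    for (int i = 0; i < n; i++)         /* fall back to any present page */
        if (flags[i] & P_PRESENT)
            return i;
    return -1;
}

static void swap_out(uint32_t *flags, int page)
{
    flags[page] &= ~(uint32_t)P_PRESENT;    /* no longer in physical memory */
    flags[page] |= P_SWAPPED;
}

/* Called from the page fault handler when the swapped flag is detected. */
static void swap_in(uint32_t *flags, int page)
{
    flags[page] &= ~(uint32_t)P_SWAPPED;
    flags[page] |= P_PRESENT;
}
```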
Review
If you are a Computer Engineering student at NCSU, I would highly recommend this course if it sounds interesting. However, be warned that the projects, although interesting, are a lot of work. If you don’t enjoy the subjects or have trouble grasping the concepts, this course will be very difficult and might not be a good choice.
I have probably done a poor job explaining how most of these concepts work, but Dr. Becchi does a much better job over the course of a semester.