As with previous assignments, we will be using GitHub to distribute skeleton code and collect submissions. Please refer to our Git Workflow guide for more details. Note that we will be using multiple tags for this assignment, for each deliverable part.
NOTE: If at all possible, please try to submit using x86. If one of your group
members owns an x86 machine, test on that machine prior to submitting, and do not
commit a .armpls
file. This will make grading much easier for us.
For students on arm64 computers (e.g. M1/M2 machines): if you want your
submission to be built/tested for ARM, you must create and submit a file called
.armpls
in the top-level directory of your repo; feel free to use the
following one-liner:
cd "$(git rev-parse --show-toplevel)" && touch .armpls && git add -f .armpls && git commit .armpls -m "ARM pls"
You should do this first so that this file is present in all parts.
There is a script in the skeleton code named run_checkpatch.sh
. It is a
wrapper over linux/scripts/checkpatch.pl
, which is a Perl script that comes
with the Linux kernel that checks if your code conforms to the kernel coding
style.
Execute run_checkpatch.sh
to see if your code conforms to the kernel style –
it’ll let you know what changes you should make. You must make these changes
before pushing a tag. Passing run_checkpatch.sh
with no warnings and no errors
is required for this assignment.
First, read the OSTEP and LKD reading assignments on scheduling and the Linux CFS scheduler.
Start 10 processes of a CPU-bound program in a VM that has multiple CPUs.
Make all 10 processes run on a single CPU. Arrange the relative priorities
between them so that 5 of them will use about 70% of the CPU and the other 5
will use the remaining 30%. The processes in each group of 5 will have equal
priorities between them. Verify it using the top
or htop
command.
Here is a part of the screen of a sample top
session:
top - 15:25:39 up 30 min, 3 users, load average: 9.59, 6.08, 2.58
Tasks: 220 total, 11 running, 209 sleeping, 0 stopped, 0 zombie
%Cpu0 : 0.0/0.7 1[ ]
%Cpu1 : 0.3/0.0 0[ ]
%Cpu2 : 0.0/0.0 0[ ]
%Cpu3 : 100.0/0.0 100[|||||||||||||||||||||||||||||||||||||||||||]
%Cpu4 : 0.3/0.3 1[ ]
%Cpu5 : 0.0/0.0 0[ ]
%Cpu6 : 0.0/0.3 0[ ]
%Cpu7 : 0.0/0.0 0[ ]
MiB Mem : 20.2/3895.9 [|||||||||||||||| ]
MiB Swap: 0.0/975.0 [ ]
PID %CPU COMMAND
1282 13.3 `- myprogram
1283 13.3 `- myprogram
1284 13.3 `- myprogram
1285 13.3 `- myprogram
1286 13.3 `- myprogram
1287 6.6 `- myprogram
1288 6.6 `- myprogram
1289 6.6 `- myprogram
1290 6.6 `- myprogram
1291 6.6 `- myprogram
It shows the 10 processes of myprogram
, dividing the CPU time
as specified. I chose to run them on the Cpu2
among the
8 available cores.
You can code up your own program (and you don’t have to name it
myprogram
), or you can use any existing command line tools to create the
scenario.
Now, add one more process of myprogram
, but this time with real-time
priority. Verify that the real-time process preempts all the other 10
processes. Here is a sample top
session for this:
top - 15:28:32 up 33 min, 3 users, load average: 10.33, 7.92, 3.90
Tasks: 219 total, 13 running, 206 sleeping, 0 stopped, 0 zombie
%Cpu0 : 0.0/2.9 3[|| ]
%Cpu1 : 0.0/0.0 0[ ]
%Cpu2 : 0.0/0.0 0[ ]
%Cpu3 : 100.0/0.0 100[|||||||||||||||||||||||||||||||||||||||||||]
%Cpu4 : 0.0/0.0 0[ ]
%Cpu5 : 0.0/0.0 0[ ]
%Cpu6 : 0.3/0.0 0[ ]
%Cpu7 : 0.0/0.0 0[ ]
MiB Mem : 21.2/3895.9 [|||||||||||||||| ]
MiB Swap: 0.0/975.0 [ ]
PID %CPU COMMAND
1282 0.7 `- myprogram
1283 0.7 `- myprogram
1284 0.7 `- myprogram
1285 0.7 `- myprogram
1286 0.7 `- myprogram
1287 0.3 `- myprogram
1288 0.3 `- myprogram
1289 0.3 `- myprogram
1290 0.3 `- myprogram
1291 0.3 `- myprogram
1297 95.0 `- myprogram
Describe in your written_answers.txt
how you created the task #1 scenario.
Include the command lines you executed
Include the top
output
You can remove the irrelevant rows like I did above, but do NOT remove any columns
Make sure to include the part where it shows the loads of all CPUs
Include your program in the user/test/
subdirectory if you wrote any
Do the same for the task #2
To submit this part, push the hw6p1handin
tag with the following:
$ git tag -a -m "Completed hw6 part1." hw6p1handin
$ git push origin master
$ git push origin hw6p1handin
LKD chapter 11
Pages 207 - 217: Required
Pages 217 - 230: Recommended – It will help round out your understanding of the topic; at the very least, skim the pages.
Con Kolivas’s BFS FAQ
Con Kolivas’s MuQSS guide
Please write the answers to the following questions in your written_answers.txt
.
Briefly describe the advantages and disadvantages of a larger HZ.
What is the HZ currently configured for your running Linux system?
What are jiffies? Explain the relationship between jiffies, HZ, and time.
Find the current value of jiffies
in your system, and report it in your
written_answers.txt
.
In minutes, how much time does this jiffies
value represent?
Does it match the uptime reported by the uptime
command? (Hint: it
doesn’t.) Please give the formula to convert jiffies
to the current
(real) uptime, in minutes.
Why does this large difference exist? (Hint: in 32-bit Linux systems,
jiffies
is a 32-bit value.)
What are Niffies? How do they differ from Jiffies?
To submit this part, push the hw6p2handin
tag with the following:
$ git tag -a -m "Completed hw6 part2." hw6p2handin
$ git push origin master
$ git push origin hw6p2handin
In this assignment, we add a new Linux scheduler called Freezer. It has the following features and characteristics:
Implements a simple round-robin scheduling algorithm.
Supports SMP.
Every task has the same time slice of 100 milliseconds.
When deciding which CPU a task should be assigned to, it should be assigned to the CPU with the fewest Freezer tasks. In addition, when a CPU becomes idle, it will steal a task from another CPU.
The Freezer scheduling policy, SCHED_FREEZER
, sits right in between
SCHED_NORMAL
and the real-time scheduling policies. That is,
SCHED_FREEZER
takes priority over SCHED_NORMAL
, but not over SCHED_RR
or SCHED_FIFO
.
Parts 3-6 are structured to guide you in developing a new scheduler. Each part represents a milestone towards a working scheduler implementation. This forces you to develop incrementally. This also allows us to award you partial credit if you cannot get the whole scheduler working at the end.
In part 3, you will code up all the necessary data structures for Freezer and add them to the right places, but they will remain unused by the rest of the kernel.
In part 4, you will flesh out the implementation of Freezer and enable it in the kernel. But you will keep the old CFS scheduler as the default, so that the system will boot even if your Freezer implementation is still unstable.
In part 5, you will make Freezer the default scheduler of your kernel. All normal processes and kernel threads will run on Freezer (unless they are explicitly assigned to a specific scheduling policy).
In part 6, you will add idle load balancing to Freezer. This allows processes to make better use of available CPUs.
Please make sure you have done the OSTEP and LKD reading assignments on scheduling and reviewed the lecture slides on Linux scheduling classes.
A section from another book, Professional Linux Kernel Architecture, by Wolfgang Mauerer
Implement the following data structures and definitions. For grading purposes, please use the EXACT names.
/* SCHED_FREEZER must be defined to be 7 */
#define SCHED_FREEZER 7
/*
* define FREEZER_TIMESLICE so that every task receives
* a time slice of 100 milliseconds
*/
#define FREEZER_TIMESLICE <the-right-timeslice-value>
/* freezer run queue and entity */
struct freezer_rq;
struct sched_freezer_entity;
/*
* freezer scheduling policy implementation;
* put the code in kernel/sched/freezer.c
*/
struct sched_class freezer_sched_class;
The freezer_sched_class
methods can be left incomplete at this point because
no one is going to call them. Maybe have the struct and functions defined,
but leave most of them empty. Maybe put some TODO
comments. Your objective
in this part is to get all your data structures in the right place and get
them compiled.
idle.c
for an example of a very minimal sched_class
object.Learn from the man page of the ps
command what the following command shows
you:
ps -e --forest -o sched,policy,psr,pcpu,c,pid,user,cmd
Use the ps
command above before and after this part, to verify that your
addition to the kernel had no effect.
Points for this part will be based on your design and placement of these data structures.
Please put freezer_sched_class
in kernel/sched/freezer.c
in your Linux
source tree.
To submit this part, push the hw6p3handin
tag with the following:
$ git tag -a -m "Completed hw6 part3." hw6p3handin
$ git push origin master
$ git push origin hw6p3handin
Now we turn on the freezer, and hook it up to the rest of the kernel. You will
have to modify various parts of the kernel to enable the new SCHED_FREEZER
scheduling policy. However, in this part, SCHED_NORMAL
will remain as the
default policy.
Begin to implement the Freezer scheduler according to the specifications in
the overview, such that you should be able to set a process to
SCHED_FREEZER
and see it run.
ps
command
from part 3 to verify that everything is still the same when you booted
into your new Freezer-enabled kernel.Write set-freezer
, a command line program that changes a process’s
scheduling policy to SCHED_FREEZER
. It takes a PID as a argument.
Your objective in this part is to have your kernel and the set-freezer
program to work together successfully, so you can change a process’s
scheduling policy to Freezer, and verify it with the ps
command from part
3.
If your Freezer code is not quite stable at this point, it might be easier
to get things to work if you use I/O-bound processes for testing, like your
editor. Here is a sample session of changing 6 gvim
processes, and an
excerpt from the ps
output:
$ ./set-freezer 1522
[1522] sched policy changed: 0 --> 7
$ ./set-freezer 1527
[1527] sched policy changed: 0 --> 7
$ ./set-freezer 1532
[1532] sched policy changed: 0 --> 7
$ ./set-freezer 1537
[1537] sched policy changed: 0 --> 7
$ ./set-freezer 1542
[1542] sched policy changed: 0 --> 7
$ ./set-freezer 1547
[1547] sched policy changed: 0 --> 7
0 TS 0 0.0 0 514 jae /usr/lib/at-spi2-core/at-spi2-registryd
0 TS 2 0.0 0 1165 root /usr/sbin/rpc.statd --no-notify
0 TS 1 0.0 0 1167 rpc /usr/bin/rpcbind -w
7 #7 0 0.0 0 1522 jae gvim -R set-freezer.c
7 #7 1 0.0 0 1527 jae gvim -R set-freezer.c
7 #7 1 0.0 0 1532 jae gvim -R set-freezer.c
7 #7 1 0.0 0 1537 jae gvim -R set-freezer.c
7 #7 0 0.0 0 1542 jae gvim -R set-freezer.c
7 #7 2 0.0 0 1547 jae gvim -R set-freezer.c
Some relevant source files: core.c
, rt.c
, fair.c
This is going to be really hard to debug. You might want to invest a little
bit of time up front and learn how to use trace_printk()
, and the ftrace
tracing framework in general. Googling turns up many resources.
You might also find that your kernel will crash before you can see your
debug messages show up in the log buffer (ie. before you can run dmesg
).
Instead of printing to the kernel log buffer, you can “redirect” your
messages to a file on your host machine by setting up a serial port. See the
Serial Console section of our Linux Kernel Debugging guide
for details.
Make sure that your Freezer task is actually using freezer_sched_class
.
For example, you can confirm that your functions are being called by adding
a pr_info()
to a function like freezer_enqueue_task()
, and verify that
it appears in dmesg
output upon running your Freezer program. You should
not submit with these debugging prints; these can break your Part 5 if
invoked too frequently.
Modifications to the Linux source tree implementing the Freezer scheduler.
Please implement your freezer_sched_class
in kernel/sched/freezer.c
in your Linux source tree.
You may also need to modify other source files in the Linux kernel.
A set-freezer
program that takes in a PID as an argument and sets the
scheduling policy of that process to SCHED_FREEZER
.
Makefile
for set-freezer
in
user/test/set-freezer/
. Please make sure to put these files in the
set-freezer
subdirectory, not in the test
directory.A shell session in your README.txt
showing you successfully setting a
process’s scheduling policy to SCHED_FREEZER
with set-freezer
, and
verifying it with ps
.
To submit this part, push the hw6p4handin
tag with the following:
$ git tag -a -m "Completed hw6 part4." hw6p4handin
$ git push origin master
$ git push origin hw6p4handin
Now let’s make Freezer the default scheduling policy. This means that all system
and user tasks will have the SCHED_FREEZER
policy by default. You need to set
the Freezer scheduling policy for the init_task
, which will be inherited by
all its descendants. You also need to set it for all kernel threads, which is
done in kernel/kthread.c
.
Modify the Linux kernel source to use SCHED_FREEZER
as the default
scheduling policy.
Verify that the ps
output shows that most tasks, including kernel threads,
are running under Freezer. The SCH column will show 7
and the POL column
will show #7
.
Verify that new tasks get the Freezer policy.
Verify that the tasks are assigned to CPUs as specified.
Verify that the tasks run in round-robin fashion.
Verify that your scheduler is stable and can handle running multiple CPU-intensive tasks.
Here is a part of a ps
command output that shows 20 CPU-bound processes (not
counting the four parent processes that forked them) running across 4 CPUs:
$ ps -e --forest -o sched,policy,psr,pcpu,c,pid,user,cmd
SCH POL PSR %CPU C PID USER CMD
7 #7 3 0.0 0 2 root [kthreadd]
7 #7 0 0.0 0 3 root \_ [rcu_gp]
7 #7 0 0.0 0 4 root \_ [rcu_par_gp]
7 #7 0 0.0 0 6 root \_ [kworker/0:0H-kblockd]
7 #7 1 0.0 0 7 root \_ [kworker/u8:0-events_unbound]
7 #7 0 0.0 0 8 root \_ [mm_percpu_wq]
7 #7 0 0.0 0 9 root \_ [ksoftirqd/0]
7 #7 2 0.0 0 10 root \_ [rcu_sched]
7 #7 0 0.0 0 11 root \_ [rcu_bh]
1 FF 0 0.0 0 12 root \_ [migration/0]
7 #7 0 0.0 0 13 root \_ [kworker/0:1-events]
7 #7 0 0.0 0 14 root \_ [cpuhp/0]
7 #7 1 0.0 0 15 root \_ [cpuhp/1]
1 FF 1 0.0 0 16 root \_ [migration/1]
...
7 #7 3 0.1 0 1 root /sbin/init
7 #7 3 0.0 0 221 root /lib/systemd/systemd-journald
7 #7 3 0.0 0 241 root /lib/systemd/systemd-udevd
7 #7 1 0.0 0 454 root /usr/sbin/ModemManager --filter-policy=strict
7 #7 2 0.0 0 455 root /usr/sbin/cron -f
7 #7 0 0.0 0 456 avahi avahi-daemon: running [porygon.local]
7 #7 2 0.0 0 509 avahi \_ avahi-daemon: chroot helper
...
7 #7 3 0.0 0 465 root /lib/systemd/systemd-logind
7 #7 1 0.0 0 510 root /usr/lib/policykit-1/polkitd --no-debug
7 #7 1 0.0 0 511 root /usr/sbin/alsactl -E HOME=/run/alsa -s -n 19 -c rdaemon
7 #7 0 0.0 0 535 root /usr/sbin/sshd -D
7 #7 0 0.0 0 761 root \_ sshd: hans [priv]
7 #7 2 0.0 0 779 hans \_ sshd: hans@pts/0
7 #7 0 0.0 0 780 hans \_ -bash
7 #7 1 0.0 0 808 hans \_ tmux
...
7 #7 2 0.0 0 810 hans tmux
7 #7 2 0.0 0 811 hans \_ -bash
7 #7 0 0.0 0 1001 hans | \_ ps -e --forest -o sched,policy,psr,pcpu,c,pid,user,cmd
7 #7 0 0.0 0 822 hans \_ -bash
7 #7 0 0.0 0 970 hans \_ ./myprogram
7 #7 0 19.9 19 972 hans | \_ ./myprogram
7 #7 0 19.8 19 973 hans | \_ ./myprogram
7 #7 0 19.8 19 974 hans | \_ ./myprogram
7 #7 0 19.9 19 975 hans | \_ ./myprogram
7 #7 0 19.9 19 976 hans | \_ ./myprogram
7 #7 1 0.0 0 977 hans \_ ./myprogram
7 #7 1 19.8 19 979 hans | \_ ./myprogram
7 #7 1 19.8 19 980 hans | \_ ./myprogram
7 #7 1 19.9 19 981 hans | \_ ./myprogram
7 #7 1 19.9 19 982 hans | \_ ./myprogram
7 #7 1 19.8 19 983 hans | \_ ./myprogram
7 #7 2 0.0 0 985 hans \_ ./myprogram
7 #7 2 20.0 20 987 hans | \_ ./myprogram
7 #7 2 20.0 20 988 hans | \_ ./myprogram
7 #7 2 19.9 19 989 hans | \_ ./myprogram
7 #7 2 19.8 19 990 hans | \_ ./myprogram
7 #7 2 19.8 19 991 hans | \_ ./myprogram
7 #7 3 0.0 0 992 hans \_ ./myprogram
7 #7 3 19.9 19 994 hans \_ ./myprogram
7 #7 3 20.0 20 995 hans \_ ./myprogram
7 #7 3 19.9 19 996 hans \_ ./myprogram
7 #7 3 19.9 19 997 hans \_ ./myprogram
7 #7 3 19.9 19 998 hans \_ ./myprogram
Focus on making a stable system that can handle running multiple CPU-bound processes, distributed to all available CPUs.
Refer to the debugging tips in Part 4.
Note that the ps
output above shows balanced usage stats in the “%CPU”
column, as expected. If you can get the stats to display correctly, awesome.
If not, and yours shows nonsensical numbers, don’t worry about it.
Modifications to the Linux source tree setting the Freezer scheduler to be the default scheduler.
Documentation of any testing you did to verify that your Freezer scheduler is
indeed the default, is stable, and schedules processes according to
specification. Write this in README.txt
.
To submit this part, push the hw6p5handin
tag with the following:
$ git tag -a -m "Completed hw6 part5." hw6p5handin
$ git push origin master
$ git push origin hw6p5handin
Finally, let’s add idle load balancing to Freezer. If a CPU is about to start
running the idle_task
because there are no other tasks running on it, try to
move a task from another CPU to the current CPU. This helps us utilize as many
CPUs as possible at all times, speeding up performance.
Our load balancing policy is as follows:
Try to find the CPU with the highest number of running Freezer tasks and manually move a task from that CPU to the current CPU.
There are a few exceptions to the above rule:
Do not steal a task from CPUs with fewer than 2 tasks.
Make sure to respect the CPU affinity of a given task.
Make sure to respect per-CPU kthreads. These should remain on their specified CPUs.
Do not move tasks that are currently running on a CPU (obviously).
If there is no movable task on the busiest CPU based on the above criteria, then it’s ok to not idle balance. Just give up.
The relevant sched_class
function for load balancing is .balance
.
Investigate how this is implemented and used in the CFS, RT, and DL
schedulers.
You may find referencing the following function helpful:
can_migrate_task()
.
Avoid deadlocks! To move a task from one rq
to another, you may need to
manually grab a rq
lock. Make sure you do this in a deadlock-free manner.
It’s okay if the busiest CPU you found is no longer the busiest CPU when you are actually moving the task.
To submit this part, push the hw6p6handin
tag with the following:
$ git tag -a -m "Completed hw6 part6." hw6p6handin
$ git push origin master
$ git push origin hw6p6handin
Compare Freezer with CFS.
If you got Freezer working, say a few words in your README.txt
about how it
performs compared to CFS. Your comparison doesn’t have to be a quantitative one.
You can explain the perceived difference as you have interacted with the two
schedulers.
If your Freezer is rock solid, you may perform a shoot-out between CFS and Freezer by timing kernel builds in both your fallback and your Freezer kernels:
Note that there is no right answer here. Your mileage may vary. The fact that we are running in a VM affects the results too.
Run make mrproper
to clean up your kernel build directory before you
build. Note that this will delete your .config
as well.
If you’ve optimized your kernel builds with ccache
, you should disable
that for this experiment.
Make sure to compile with make -j$(nproc)
.
You may find the time
utility handy.
Some other activities you might like to consider:
Watching hi-res video
Highly interactive tasks
CPU-intensive tasks
If you’d like to submit this part, push the hw6p7handin
tag with the
following:
git tag -a -m "Completed hw6 part7." hw6p7handin
git push origin master
git push origin hw6p7handin
Good luck!
Sankalp Singayapally, a TA for COMS W4118 Operating Systems I, Spring 2015, Columbia University, wrote the solution code for this assignment, based on his work in Fall 2014 in collaboration with Nicolas Mesa and Di Ruan.
Mitchell Gouzenko and Akira Baruah, TAs in Spring 2016, updated the code for the 4.1.18 kernel.
Benjamin Hanser and Emily Meng, TAs in Spring 2017, updated the code for the 4.4.50 kernel.
John Hui, TA in Spring 2018, updated the code for the 64-bit 4.9.81 kernel, and updated the BFS portion of the assignment to MuQSS.
Jonas Guan, TA in Spring 2019, updated the code for the 4.9.153 kernel.
Dave Dirnfeld and Hans Montero, TAs in Spring 2020, updated the assignment for the 4.19.50 kernel.
Kent Hall, John Hui, Xijiao Li, Hans Montero, Akhil Ravipati, Maÿlis Whetsel, and Tal Zussman, TAs in Fall 2021, updated the assignment for the 5.10.57 kernel.
Andy Cheng, Helen Chu, Phoebe Lu, Cynthia Zhang, and Tal Zussman, TAs in Spring 2023, updated the assignment for the 5.10.158 kernel and implemented idle load balancing.
Brennan McManus, TA in Spring 2024, updated the code for the 5.10.205 kernel.
Last updated: 2024-03-14