freezer

Freezer

Why's this one called freezer? ...Oh

Submission

As with previous assignments, we will be using GitHub to distribute skeleton code and collect submissions. Please refer to our Git Workflow guide for more details. Note that we will be using multiple tags for this assignment, for each deliverable part.

NOTE: If at all possible, please try to submit using x86. If one of your group members owns an x86 machine, test on that machine prior to submitting, and do not commit a .armpls file. This will make grading much easier for us.

For students on arm64 computers (e.g. M1/M2 machines): if you want your submission to be built/tested for ARM, you must create and submit a file called .armpls in the top-level directory of your repo; feel free to use the following one-liner:

cd "$(git rev-parse --show-toplevel)" && touch .armpls && git add -f .armpls && git commit .armpls -m "ARM pls"

You should do this first so that this file is present in all parts.

Code Style

There is a script in the skeleton code named run_checkpatch.sh. It is a wrapper over linux/scripts/checkpatch.pl, which is a Perl script that comes with the Linux kernel that checks if your code conforms to the kernel coding style.

Execute run_checkpatch.sh to see if your code conforms to the kernel style – it’ll let you know what changes you should make. You must make these changes before pushing a tag. Passing run_checkpatch.sh with no warnings and no errors is required for this assignment.

Part 1: Priorities in CFS

Tasks

First, read the OSTEP and LKD reading assignments on scheduling and the Linux CFS scheduler.

Start 10 processes of a CPU-bound program in a VM that has multiple CPUs. Make all 10 processes run on a single CPU. Arrange the relative priorities between them so that 5 of them will use about 70% of the CPU and the other 5 will use the remaining 30%. The processes in each group of 5 will have equal priorities between them. Verify it using the top or htop command.

Here is a part of the screen of a sample top session:

top - 15:25:39 up 30 min,  3 users,  load average: 9.59, 6.08, 2.58
Tasks: 220 total,  11 running, 209 sleeping,   0 stopped,   0 zombie
%Cpu0  :   0.0/0.7     1[                                           ]
%Cpu1  :   0.3/0.0     0[                                           ]
%Cpu2  :   0.0/0.0     0[                                           ]
%Cpu3  : 100.0/0.0   100[|||||||||||||||||||||||||||||||||||||||||||]
%Cpu4  :   0.3/0.3     1[                                           ]
%Cpu5  :   0.0/0.0     0[                                           ]
%Cpu6  :   0.0/0.3     0[                                           ]
%Cpu7  :   0.0/0.0     0[                                           ]
MiB Mem : 20.2/3895.9   [||||||||||||||||                           ]
MiB Swap:  0.0/975.0    [                                           ]

  PID      %CPU     COMMAND

 1282       13.3                                    `- myprogram
 1283       13.3                                    `- myprogram
 1284       13.3                                    `- myprogram
 1285       13.3                                    `- myprogram
 1286       13.3                                    `- myprogram

 1287       6.6                                     `- myprogram
 1288       6.6                                     `- myprogram
 1289       6.6                                     `- myprogram
 1290       6.6                                     `- myprogram
 1291       6.6                                     `- myprogram

It shows the 10 processes of myprogram, dividing the CPU time as specified. I chose to run them on the Cpu2 among the 8 available cores.

You can code up your own program (and you don’t have to name it myprogram), or you can use any existing command line tools to create the scenario.

Now, add one more process of myprogram, but this time with real-time priority. Verify that the real-time process preempts all the other 10 processes. Here is a sample top session for this:

top - 15:28:32 up 33 min,  3 users,  load average: 10.33, 7.92, 3.90
Tasks: 219 total,  13 running, 206 sleeping,   0 stopped,   0 zombie
%Cpu0  :   0.0/2.9     3[||                                         ] 
%Cpu1  :   0.0/0.0     0[                                           ]
%Cpu2  :   0.0/0.0     0[                                           ]
%Cpu3  : 100.0/0.0   100[|||||||||||||||||||||||||||||||||||||||||||]
%Cpu4  :   0.0/0.0     0[                                           ]
%Cpu5  :   0.0/0.0     0[                                           ]
%Cpu6  :   0.3/0.0     0[                                           ]
%Cpu7  :   0.0/0.0     0[                                           ]
MiB Mem : 21.2/3895.9   [||||||||||||||||                           ]
MiB Swap:  0.0/975.0    [                                           ]

  PID      %CPU     COMMAND

 1282       0.7                                    `- myprogram
 1283       0.7                                    `- myprogram
 1284       0.7                                    `- myprogram
 1285       0.7                                    `- myprogram
 1286       0.7                                    `- myprogram

 1287       0.3                                    `- myprogram
 1288       0.3                                    `- myprogram
 1289       0.3                                    `- myprogram
 1290       0.3                                    `- myprogram
 1291       0.3                                    `- myprogram

 1297      95.0                                    `- myprogram

Deliverables

Describe in your written_answers.txt how you created the task #1 scenario.
- Include the command lines you executed
  - Indicate if you needed root privileges for any of those commands
- Include the top output
  - You can remove the irrelevant rows like I did above, but do NOT remove any columns
  - Make sure to include the part where it shows the loads of all CPUs
- Include your program in the user/test/ subdirectory if you wrote any
Do the same for the task #2

Submission

To submit this part, push the hw6p1handin tag with the following:

$ git tag -a -m "Completed hw6 part1." hw6p1handin
$ git push origin master
$ git push origin hw6p1handin

Part 2: HZ and Jiffies

Reading assignment

LKD chapter 11
- Pages 207 - 217: Required
- Pages 217 - 230: Recommended – It will help round out your understanding of the topic; at the very least, skim the pages.
Con Kolivas’s BFS FAQ
- In particular, read the part that explains, “What about other HZ, no preempt and dynamic ticks?”
Con Kolivas’s MuQSS guide
- In particular, read about “Tickless expiry” and “Niffies”

Deliverables

Please write the answers to the following questions in your written_answers.txt.

Briefly describe the advantages and disadvantages of a larger HZ.
What is the HZ currently configured for your running Linux system?
- Using a kernel module might be convenient for this. No need to submit your code, but please describe what you did.
What are jiffies? Explain the relationship between jiffies, HZ, and time.
Find the current value of jiffies in your system, and report it in your written_answers.txt.
- In minutes, how much time does this jiffies value represent?
- Does it match the uptime reported by the uptime command? (Hint: it doesn’t.) Please give the formula to convert jiffies to the current (real) uptime, in minutes.
- Why does this large difference exist? (Hint: in 32-bit Linux systems, jiffies is a 32-bit value.)
What are Niffies? How do they differ from Jiffies?

Submission

To submit this part, push the hw6p2handin tag with the following:

$ git tag -a -m "Completed hw6 part2." hw6p2handin
$ git push origin master
$ git push origin hw6p2handin

Freezer Overview

In this assignment, we add a new Linux scheduler called Freezer. It has the following features and characteristics:

Implements a simple round-robin scheduling algorithm.
Supports SMP.
Every task has the same time slice of 100 milliseconds.
When deciding which CPU a task should be assigned to, it should be assigned to the CPU with the fewest Freezer tasks. In addition, when a CPU becomes idle, it will steal a task from another CPU.
The Freezer scheduling policy, SCHED_FREEZER, sits right in between SCHED_NORMAL and the real-time scheduling policies. That is, SCHED_FREEZER takes priority over SCHED_NORMAL, but not over SCHED_RR or SCHED_FIFO.

Parts 3-6 are structured to guide you in developing a new scheduler. Each part represents a milestone towards a working scheduler implementation. This forces you to develop incrementally. This also allows us to award you partial credit if you cannot get the whole scheduler working at the end.

In part 3, you will code up all the necessary data structures for Freezer and add them to the right places, but they will remain unused by the rest of the kernel.

In part 4, you will flesh out the implementation of Freezer and enable it in the kernel. But you will keep the old CFS scheduler as the default, so that the system will boot even if your Freezer implementation is still unstable.

In part 5, you will make Freezer the default scheduler of your kernel. All normal processes and kernel threads will run on Freezer (unless they are explicitly assigned to a specific scheduling policy).

In part 6, you will add idle load balancing to Freezer. This allows processes to make better use of available CPUs.

Part 3: Freezer, unplugged

Reading assignment

Please make sure you have done the OSTEP and LKD reading assignments on scheduling and reviewed the lecture slides on Linux scheduling classes.
A section from another book, Professional Linux Kernel Architecture, by Wolfgang Mauerer
- Section 2.5, Implementation of the Scheduler (pages 83 - 105)

Tasks

Implement the following data structures and definitions. For grading purposes, please use the EXACT names.

/* SCHED_FREEZER must be defined to be 7 */
#define  SCHED_FREEZER  7

/*
 * define FREEZER_TIMESLICE so that every task receives
 * a time slice of 100 milliseconds
 */
#define  FREEZER_TIMESLICE  <the-right-timeslice-value>

/* freezer run queue and entity */
struct freezer_rq;
struct sched_freezer_entity;

/*
 * freezer scheduling policy implementation;
 * put the code in kernel/sched/freezer.c
 */
struct sched_class freezer_sched_class;

The freezer_sched_class methods can be left incomplete at this point because no one is going to call them. Maybe have the struct and functions defined, but leave most of them empty. Maybe put some TODO comments. Your objective in this part is to get all your data structures in the right place and get them compiled.
- See idle.c for an example of a very minimal sched_class object.
Learn from the man page of the ps command what the following command shows you:
```
ps -e --forest -o sched,policy,psr,pcpu,c,pid,user,cmd
```
Use the ps command above before and after this part, to verify that your addition to the kernel had no effect.

Deliverables

Points for this part will be based on your design and placement of these data structures.
Please put freezer_sched_class in kernel/sched/freezer.c in your Linux source tree.

Submission

To submit this part, push the hw6p3handin tag with the following:

$ git tag -a -m "Completed hw6 part3." hw6p3handin
$ git push origin master
$ git push origin hw6p3handin

Part 4: Turn on the Freezer

Now we turn on the freezer, and hook it up to the rest of the kernel. You will have to modify various parts of the kernel to enable the new SCHED_FREEZER scheduling policy. However, in this part, SCHED_NORMAL will remain as the default policy.

Tasks

Begin to implement the Freezer scheduler according to the specifications in the overview, such that you should be able to set a process to SCHED_FREEZER and see it run.
- Make sure that CFS is still the default scheduler. Use the ps command from part 3 to verify that everything is still the same when you booted into your new Freezer-enabled kernel.
Write set-freezer, a command line program that changes a process’s scheduling policy to SCHED_FREEZER. It takes a PID as a argument.

Your objective in this part is to have your kernel and the set-freezer program to work together successfully, so you can change a process’s scheduling policy to Freezer, and verify it with the ps command from part 3.

If your Freezer code is not quite stable at this point, it might be easier to get things to work if you use I/O-bound processes for testing, like your editor. Here is a sample session of changing 6 gvim processes, and an excerpt from the ps output:

$ ./set-freezer 1522
[1522] sched policy changed: 0 --> 7
$ ./set-freezer 1527
[1527] sched policy changed: 0 --> 7
$ ./set-freezer 1532
[1532] sched policy changed: 0 --> 7
$ ./set-freezer 1537
[1537] sched policy changed: 0 --> 7
$ ./set-freezer 1542
[1542] sched policy changed: 0 --> 7
$ ./set-freezer 1547
[1547] sched policy changed: 0 --> 7

TS    0  0.0  0   514 jae    /usr/lib/at-spi2-core/at-spi2-registryd
TS    2  0.0  0  1165 root   /usr/sbin/rpc.statd --no-notify
TS    1  0.0  0  1167 rpc    /usr/bin/rpcbind -w
#7    0  0.0  0  1522 jae    gvim -R set-freezer.c
#7    1  0.0  0  1527 jae    gvim -R set-freezer.c
#7    1  0.0  0  1532 jae    gvim -R set-freezer.c
#7    1  0.0  0  1537 jae    gvim -R set-freezer.c
#7    0  0.0  0  1542 jae    gvim -R set-freezer.c
#7    2  0.0  0  1547 jae    gvim -R set-freezer.c

Tips

Some relevant source files: core.c, rt.c, fair.c
- There is a lot of code in these files, for sure. An important goal here is for you to understand how to abstract the scheduler code so that you learn in detail the parts that matter, and ignore the parts that don’t.
This is going to be really hard to debug. You might want to invest a little bit of time up front and learn how to use trace_printk(), and the ftrace tracing framework in general. Googling turns up many resources.
You might also find that your kernel will crash before you can see your debug messages show up in the log buffer (ie. before you can run dmesg). Instead of printing to the kernel log buffer, you can “redirect” your messages to a file on your host machine by setting up a serial port. See the Serial Console section of our Linux Kernel Debugging guide for details.
Make sure that your Freezer task is actually using freezer_sched_class. For example, you can confirm that your functions are being called by adding a pr_info() to a function like freezer_enqueue_task(), and verify that it appears in dmesg output upon running your Freezer program. You should not submit with these debugging prints; these can break your Part 5 if invoked too frequently.

Deliverables

Modifications to the Linux source tree implementing the Freezer scheduler.
- Please implement your freezer_sched_class in kernel/sched/freezer.c in your Linux source tree.
- You may also need to modify other source files in the Linux kernel.
A set-freezer program that takes in a PID as an argument and sets the scheduling policy of that process to SCHED_FREEZER.
- Please place the source code and Makefile for set-freezer in user/test/set-freezer/. Please make sure to put these files in the set-freezer subdirectory, not in the test directory.
A shell session in your README.txt showing you successfully setting a process’s scheduling policy to SCHED_FREEZER with set-freezer, and verifying it with ps.

Submission

To submit this part, push the hw6p4handin tag with the following:

$ git tag -a -m "Completed hw6 part4." hw6p4handin
$ git push origin master
$ git push origin hw6p4handin

Part 5: Freeze everything

Now let’s make Freezer the default scheduling policy. This means that all system and user tasks will have the SCHED_FREEZER policy by default. You need to set the Freezer scheduling policy for the init_task, which will be inherited by all its descendants. You also need to set it for all kernel threads, which is done in kernel/kthread.c.

Tasks

Modify the Linux kernel source to use SCHED_FREEZER as the default scheduling policy.
Verify that the ps output shows that most tasks, including kernel threads, are running under Freezer. The SCH column will show 7 and the POL column will show #7.
Verify that new tasks get the Freezer policy.
Verify that the tasks are assigned to CPUs as specified.
Verify that the tasks run in round-robin fashion.
Verify that your scheduler is stable and can handle running multiple CPU-intensive tasks.
- You may write your own stress-test program, use an existing utility, or reuse what you wrote in part 1.

Here is a part of a ps command output that shows 20 CPU-bound processes (not counting the four parent processes that forked them) running across 4 CPUs:

$ ps -e --forest -o sched,policy,psr,pcpu,c,pid,user,cmd

SCH POL PSR %CPU  C   PID USER     CMD

#7    3  0.0  0     2 root     [kthreadd]
#7    0  0.0  0     3 root      \_ [rcu_gp]
#7    0  0.0  0     4 root      \_ [rcu_par_gp]
#7    0  0.0  0     6 root      \_ [kworker/0:0H-kblockd]
#7    1  0.0  0     7 root      \_ [kworker/u8:0-events_unbound]
#7    0  0.0  0     8 root      \_ [mm_percpu_wq]
#7    0  0.0  0     9 root      \_ [ksoftirqd/0]
#7    2  0.0  0    10 root      \_ [rcu_sched]
#7    0  0.0  0    11 root      \_ [rcu_bh]
FF    0  0.0  0    12 root      \_ [migration/0]
#7    0  0.0  0    13 root      \_ [kworker/0:1-events]
#7    0  0.0  0    14 root      \_ [cpuhp/0]
#7    1  0.0  0    15 root      \_ [cpuhp/1]
FF    1  0.0  0    16 root      \_ [migration/1]

                                  ...

#7    3  0.1  0     1 root     /sbin/init
#7    3  0.0  0   221 root     /lib/systemd/systemd-journald
#7    3  0.0  0   241 root     /lib/systemd/systemd-udevd
#7    1  0.0  0   454 root     /usr/sbin/ModemManager --filter-policy=strict
#7    2  0.0  0   455 root     /usr/sbin/cron -f
#7    0  0.0  0   456 avahi    avahi-daemon: running [porygon.local]
#7    2  0.0  0   509 avahi     \_ avahi-daemon: chroot helper

                                  ...
    
#7    3  0.0  0   465 root     /lib/systemd/systemd-logind
#7    1  0.0  0   510 root     /usr/lib/policykit-1/polkitd --no-debug
#7    1  0.0  0   511 root     /usr/sbin/alsactl -E HOME=/run/alsa -s -n 19 -c rdaemon
#7    0  0.0  0   535 root     /usr/sbin/sshd -D
#7    0  0.0  0   761 root      \_ sshd: hans [priv]
#7    2  0.0  0   779 hans          \_ sshd: hans@pts/0
#7    0  0.0  0   780 hans              \_ -bash
#7    1  0.0  0   808 hans                  \_ tmux
        
                                  ...

#7    2  0.0  0   810 hans     tmux
#7    2  0.0  0   811 hans      \_ -bash
#7    0  0.0  0  1001 hans      |   \_ ps -e --forest -o sched,policy,psr,pcpu,c,pid,user,cmd
#7    0  0.0  0   822 hans      \_ -bash
#7    0  0.0  0   970 hans          \_ ./myprogram
#7    0 19.9 19   972 hans          |   \_ ./myprogram
#7    0 19.8 19   973 hans          |   \_ ./myprogram
#7    0 19.8 19   974 hans          |   \_ ./myprogram
#7    0 19.9 19   975 hans          |   \_ ./myprogram
#7    0 19.9 19   976 hans          |   \_ ./myprogram
#7    1  0.0  0   977 hans          \_ ./myprogram
#7    1 19.8 19   979 hans          |   \_ ./myprogram
#7    1 19.8 19   980 hans          |   \_ ./myprogram
#7    1 19.9 19   981 hans          |   \_ ./myprogram
#7    1 19.9 19   982 hans          |   \_ ./myprogram
#7    1 19.8 19   983 hans          |   \_ ./myprogram
#7    2  0.0  0   985 hans          \_ ./myprogram
#7    2 20.0 20   987 hans          |   \_ ./myprogram
#7    2 20.0 20   988 hans          |   \_ ./myprogram
#7    2 19.9 19   989 hans          |   \_ ./myprogram
#7    2 19.8 19   990 hans          |   \_ ./myprogram
#7    2 19.8 19   991 hans          |   \_ ./myprogram
#7    3  0.0  0   992 hans          \_ ./myprogram
#7    3 19.9 19   994 hans              \_ ./myprogram
#7    3 20.0 20   995 hans              \_ ./myprogram
#7    3 19.9 19   996 hans              \_ ./myprogram
#7    3 19.9 19   997 hans              \_ ./myprogram
#7    3 19.9 19   998 hans              \_ ./myprogram

Tips

Focus on making a stable system that can handle running multiple CPU-bound processes, distributed to all available CPUs.
Refer to the debugging tips in Part 4.
Note that the ps output above shows balanced usage stats in the “%CPU” column, as expected. If you can get the stats to display correctly, awesome. If not, and yours shows nonsensical numbers, don’t worry about it.

Deliverables

Modifications to the Linux source tree setting the Freezer scheduler to be the default scheduler.
Documentation of any testing you did to verify that your Freezer scheduler is indeed the default, is stable, and schedules processes according to specification. Write this in README.txt.

Submission

To submit this part, push the hw6p5handin tag with the following:

$ git tag -a -m "Completed hw6 part5." hw6p5handin
$ git push origin master
$ git push origin hw6p5handin

Part 6: Freezer Load Balancing

Finally, let’s add idle load balancing to Freezer. If a CPU is about to start running the idle_task because there are no other tasks running on it, try to move a task from another CPU to the current CPU. This helps us utilize as many CPUs as possible at all times, speeding up performance.

Our load balancing policy is as follows:

Try to find the CPU with the highest number of running Freezer tasks and manually move a task from that CPU to the current CPU.
There are a few exceptions to the above rule:
- Do not steal a task from CPUs with fewer than 2 tasks.
- Make sure to respect the CPU affinity of a given task.
- Make sure to respect per-CPU kthreads. These should remain on their specified CPUs.
- Do not move tasks that are currently running on a CPU (obviously).
If there is no movable task on the busiest CPU based on the above criteria, then it’s ok to not idle balance. Just give up.

Tips

The relevant sched_class function for load balancing is .balance. Investigate how this is implemented and used in the CFS, RT, and DL schedulers.
You may find referencing the following function helpful: can_migrate_task().
Avoid deadlocks! To move a task from one rq to another, you may need to manually grab a rq lock. Make sure you do this in a deadlock-free manner.
It’s okay if the busiest CPU you found is no longer the busiest CPU when you are actually moving the task.

Submission

To submit this part, push the hw6p6handin tag with the following:

$ git tag -a -m "Completed hw6 part6." hw6p6handin
$ git push origin master
$ git push origin hw6p6handin

Part 7: CFS v. Freezer (optional)

Compare Freezer with CFS.

Tasks

If you got Freezer working, say a few words in your README.txt about how it performs compared to CFS. Your comparison doesn’t have to be a quantitative one. You can explain the perceived difference as you have interacted with the two schedulers.
If your Freezer is rock solid, you may perform a shoot-out between CFS and Freezer by timing kernel builds in both your fallback and your Freezer kernels:
- Note that there is no right answer here. Your mileage may vary. The fact that we are running in a VM affects the results too.
- Run make mrproper to clean up your kernel build directory before you build. Note that this will delete your .config as well.
- If you’ve optimized your kernel builds with ccache, you should disable that for this experiment.
- Make sure to compile with make -j$(nproc).
- You may find the time utility handy.
Some other activities you might like to consider:
- Watching hi-res video
- Highly interactive tasks
- CPU-intensive tasks

Submission (optional)

If you’d like to submit this part, push the hw6p7handin tag with the following:

git tag -a -m "Completed hw6 part7." hw6p7handin
git push origin master
git push origin hw6p7handin

Good luck!

Acknowledgments

Sankalp Singayapally, a TA for COMS W4118 Operating Systems I, Spring 2015, Columbia University, wrote the solution code for this assignment, based on his work in Fall 2014 in collaboration with Nicolas Mesa and Di Ruan.

Mitchell Gouzenko and Akira Baruah, TAs in Spring 2016, updated the code for the 4.1.18 kernel.

Benjamin Hanser and Emily Meng, TAs in Spring 2017, updated the code for the 4.4.50 kernel.

John Hui, TA in Spring 2018, updated the code for the 64-bit 4.9.81 kernel, and updated the BFS portion of the assignment to MuQSS.

Jonas Guan, TA in Spring 2019, updated the code for the 4.9.153 kernel.

Dave Dirnfeld and Hans Montero, TAs in Spring 2020, updated the assignment for the 4.19.50 kernel.

Kent Hall, John Hui, Xijiao Li, Hans Montero, Akhil Ravipati, Maÿlis Whetsel, and Tal Zussman, TAs in Fall 2021, updated the assignment for the 5.10.57 kernel.

Andy Cheng, Helen Chu, Phoebe Lu, Cynthia Zhang, and Tal Zussman, TAs in Spring 2023, updated the assignment for the 5.10.158 kernel and implemented idle load balancing.

Brennan McManus, TA in Spring 2024, updated the code for the 5.10.205 kernel.

Last updated: 2024-03-14