freezer-Sp2025

Freezer


Why's this one called freezer? ...Oh

Submission

As with previous assignments, we wil be using GitHub to distribute skeleton code and collect submissions. Please refer to our Git Workflow guide for more details. Note for this homework, you do not need to commit and push tags for any of the part. We will simply grade your last submission before the deadline.

For students on arm64 computers (e.g. M1/M2 machines): if you want your submission to be built/tested for ARM, you must create and submit a file called .armpls in the top-level directory of your repo; feel free to use the following one-liner:

cd "$(git rev-parse --show-toplevel)" && touch .armpls && git add -f .armpls && git commit .armpls -m "ARM pls"

You should do this first so that this file is present in all parts.

Code Style

There is a script in the skeleton code named run_checkpatch.sh. It is a wrapper over linux/scripts/checkpatch.pl, which is a Perl script that comes with the Linux kernel that checks if your code conforms to the kernel coding style.

Execute run_checkpatch.sh to see if your code conforms to the kernel style – it’ll let you know what changes you should make. You must make these changes before pushing a tag. Passing run_checkpatch.sh with no warnings and no errors is required for this assignment.

Part 0: Before Start

In this assignment, we will work on the Linux kernel’s scheduler, specifically modifying and analyzing scheduling policies. To help you understand the impact of different schedulers, we will introduce a simulated workload using a Fibonacci calculator.

The user/taskset1.txt and user/taskset2.txt provide two task set workloads, each consisting of a list of jobs, with each line representing the N-th Fibonacci number to compute.

Start by taking a look at user/fibonacci.c and user/run_tasks.sh. Play around with the scripts to get familiar with how tasks are executed and scheduled.

Note: The Fibonacci calculator has been provided as user/fibonacci.c. It uses an inefficient algorithm by design to produce differing job lengths. You can modify the file, but do not modify the fib function.

Part 1: Measure scheduler performance with eBPF

To see how well a scheduler performs, it is necessary to first have a way to measure performance,specifically completion time of tasks. In this part, we are looking for a way to trace scheduling events and determine how well a scheduler functions by measuring tasks’ run queue latency and total duration from start to finish.

Extended Berkeley Packet Filter (eBPF) is a powerful feature of the Linux kernel that allows programs to inject code into the kernel at runtime to gather metrics and observe kernel state. You will use eBPF to trace scheduler events. Tracepoints for these scheduler events are already available in the scheduler (for example, you can search in kernel/sched/core.c for trace_sched_*), so your job is to write code that will be injected into the kernel to use them. You should not need to modify any kernel source code files for this first part of the assignment.

bpftrace is a Linux tool that allows using eBPF with a high level, script-like language. Install it on your VM with:

sudo apt install bpftrace

Instead of searching through source code, you can run sudo bpftrace -l to see what available tracepoints there are in the kernel.

Write a bpftrace script named trace.bt to trace how much time a task spends on the run queue, and how much time has elapsed until the task completed. Your profiler should run until stopped with Ctrl-C. While tracing, your script should, at tasks’ exits, print out a table of the task name, task PID, milliseconds spent on the run queue (not including time spent running), and milliseconds until task completion. Note that task PID here is the actual pid field in a task_struct, not what is returned by getpid. The output should be comma-delimited and follows the format:

COMM,PID,RUNQ_MS,TOTAL_MS
myprogram1,5000,2,452
myprogram2,5001,0,15
...

Ensure that the output of your eBPF script is synchronous. That is, it should print each line immediately after a task completes, and not when Ctrl-C is sent to the eBPF script. You should also only print traces from processes that have started after your script begins running. You should trace all such process and perform no additional filtering.

Test your script by running sudo bpftrace trace.bt in one terminal. In a separate terminal, run a few commands and observe the trace metrics in your first terminal. Experiment with different task sizes. Submit your eBPF script as user/trace.bt.

Now that you have an eBPF measurement tool, use it to measure the completion times for the the two task set workloads when running on the default CFS scheduler in Linux. What is the resulting average completion time for each workload? Write the average completion times of both workloads in your README. You may find user/run_tasks.sh helpful for running your tasks. You should perform your measurements for two different VM configurations, a single CPU VM and a four CPU VM. We are using the term CPU here to mean a core, so if your hardware has a single CPU but four cores, that should be sufficient for running a four CPU VM. If your hardware does not support four cores, you may instead run with a two CPU VM. Specify in your README the number of CPUs used with your VM for your measurements.

Hint: You may find it useful to reference the bpftrace GitHub project, which contains a manual and a list of sample scripts. A useful example script to start with is runqlat.bt.

Note: Since your profiler will run on the same machine as the test workloads, we will need to ensure that the profiler gets sufficient CPU time to record data. Consider why running chrt --fifo 90 is helpful in this context in user/run_tasks.sh. You may find chrt helpful.

Deliverables

Freezer/Heater Overview

In this assignment, we will add two new Linux scheduler classes called Freezer and Heater(optional, 50 extra credits). Freezer will be default to Linux scheduler in the end, while Heater does not.

Freezer will have the following features and characteristics:

Heater(optional, 50 extra credits) will have the following features and characteristics:

Parts 2-7 are structured to guide you in developing new schedulers. Each part represents a milestone towards a working scheduler implementation. This forces you to develop incrementally. This also allows us to award you partial credit if you cannot get the whole scheduler working at the end.

In part 2, you will code up all the necessary data structures for Freezer and add them to the right places, but they will remain unused by the rest of the kernel.

In part 3, you will flesh out the implementation of Freezer and enable it in the kernel. But you will keep the old CFS scheduler as the default, so that the system will boot even if your Freezer implementation is still unstable.

In part 4(optional), you will code up the necessary data structures for Heater, flesh out the implementation of Heater and enable it in the kernel. Still, you will keep the old CFS scheduler as the default.

In part 5, you will compare the performance of Fibonacci tasks across different scheduling policies, including FIFO, Freezer and Heater.

In part 6, you will make Freezer the default scheduler of your kernel. All normal processes and kernel threads will run on Freezer (unless they are explicitly assigned to a specific scheduling policy).

In part 7, you will add idle load balancing to Freezer. This allows processes to make better use of available CPUs.

Part 2: Freezer, unplugged

Reading assignment

Tasks

Deliverables

Part 3: Turn on the Freezer

Now we turn on the freezer, and hook it up to the rest of the kernel. You will have to modify various parts of the kernel to enable the new SCHED_FREEZER scheduling policy. However, in this part, SCHED_NORMAL will remain as the default policy.

Tasks

Tips

Deliverables

Part 4: Turn on the Heater(optional, 50 extra credits)

Now we will implement the heater, and hook it up to the rest of the kernel. Again, You will have to modify various parts of the kernel to enable the new SCHED_HEATER scheduling policy. SCHED_HEATER could sit either above or below the SCHED_FREEZER at this time. Still, SCHED_NORMAL will remain as the default policy.

Tasks

Tips:

Deliverables(optional)

Part 5: Freezer or Heater? or FIFO?

The Heater measurement is only required if you chose to implement it.

Recall your average completion time measurements from Part 1, which used the default CFS scheduler. Now, evaluate the performance of the Freezer and Heater you implemented in the previous parts. Additionally, compare these results with FIFO, a real-time scheduling policy already implemented in Linux.

Modify the user/fibonacci.c or user/tun_tasks.sh to measure the performance for Freezer and Heater. Again, conduct these measurements for two different VM configurations, a single CPU VM and a four CPU VM. Record the average completion times of both workloads in your README, and specify the number of CPUs used with your VM for your measurements.

Next, assess the performance of FIFO. Note that Linux’s built-in FIFO scheduling policy includes load balancing and task migration support. To ensure a fair comparison with Freezer and Heater, you should disable these features. One approach to achieving this is by setting the CPU affinity of tasks to restrict them to a specific core, preventing unintended migrations. You may find the taskset command helpful for this purpose. Once again, Record the average completion times of both workloads in your README.

After completing your measurements, compare the results of the three scheduling policies. Which scheduler performed best in terms of average completion time? Which one was the worst? What are the possible reasons?

Another important metric to consider is tail completion time, which is the maximum completion time across 99% of all jobs. Tail completion time focuses on ensuring that the completion time of most jobs is no worse than some amount. Using tail completion time as the performance metric, repeat the measurement process described above. Record the tail completion times for both workloads across all three scheduling policies in your README. Finally, revisit the comparison and analysis: Which scheduler performed best? Which performed the worst? What factors contributed to these results?

Deliverables

Part 6: Freeze everything

Now let’s make Freezer the default scheduling policy. This means that all system and user tasks will have the SCHED_FREEZER policy by default. You need to set the Freezer scheduling policy for the init_task, which will be inherited by all its descendants. You also need to set it for all kernel threads, which is done in kernel/kthread.c.

Tasks

Here is a part of a ps command output that shows 20 CPU-bound processes (not counting the four parent processes that forked them) running across 4 CPUs:

$ ps -e --forest -o sched,policy,psr,pcpu,c,pid,user,cmd

SCH POL PSR %CPU  C   PID USER     CMD

  7 #7    3  0.0  0     2 root     [kthreadd]
  7 #7    0  0.0  0     3 root      \_ [rcu_gp]
  7 #7    0  0.0  0     4 root      \_ [rcu_par_gp]
  7 #7    0  0.0  0     6 root      \_ [kworker/0:0H-kblockd]
  7 #7    1  0.0  0     7 root      \_ [kworker/u8:0-events_unbound]
  7 #7    0  0.0  0     8 root      \_ [mm_percpu_wq]
  7 #7    0  0.0  0     9 root      \_ [ksoftirqd/0]
  7 #7    2  0.0  0    10 root      \_ [rcu_sched]
  7 #7    0  0.0  0    11 root      \_ [rcu_bh]
  1 FF    0  0.0  0    12 root      \_ [migration/0]
  7 #7    0  0.0  0    13 root      \_ [kworker/0:1-events]
  7 #7    0  0.0  0    14 root      \_ [cpuhp/0]
  7 #7    1  0.0  0    15 root      \_ [cpuhp/1]
  1 FF    1  0.0  0    16 root      \_ [migration/1]

                                  ...

  7 #7    3  0.1  0     1 root     /sbin/init
  7 #7    3  0.0  0   221 root     /lib/systemd/systemd-journald
  7 #7    3  0.0  0   241 root     /lib/systemd/systemd-udevd
  7 #7    1  0.0  0   454 root     /usr/sbin/ModemManager --filter-policy=strict
  7 #7    2  0.0  0   455 root     /usr/sbin/cron -f
  7 #7    0  0.0  0   456 avahi    avahi-daemon: running [porygon.local]
  7 #7    2  0.0  0   509 avahi     \_ avahi-daemon: chroot helper

                                  ...
    
  7 #7    3  0.0  0   465 root     /lib/systemd/systemd-logind
  7 #7    1  0.0  0   510 root     /usr/lib/policykit-1/polkitd --no-debug
  7 #7    1  0.0  0   511 root     /usr/sbin/alsactl -E HOME=/run/alsa -s -n 19 -c rdaemon
  7 #7    0  0.0  0   535 root     /usr/sbin/sshd -D
  7 #7    0  0.0  0   761 root      \_ sshd: hans [priv]
  7 #7    2  0.0  0   779 hans          \_ sshd: hans@pts/0
  7 #7    0  0.0  0   780 hans              \_ -bash
  7 #7    1  0.0  0   808 hans                  \_ tmux
        
                                  ...

  7 #7    2  0.0  0   810 hans     tmux
  7 #7    2  0.0  0   811 hans      \_ -bash
  7 #7    0  0.0  0  1001 hans      |   \_ ps -e --forest -o sched,policy,psr,pcpu,c,pid,user,cmd
  7 #7    0  0.0  0   822 hans      \_ -bash
  7 #7    0  0.0  0   970 hans          \_ ./myprogram
  7 #7    0 19.9 19   972 hans          |   \_ ./myprogram
  7 #7    0 19.8 19   973 hans          |   \_ ./myprogram
  7 #7    0 19.8 19   974 hans          |   \_ ./myprogram
  7 #7    0 19.9 19   975 hans          |   \_ ./myprogram
  7 #7    0 19.9 19   976 hans          |   \_ ./myprogram
  7 #7    1  0.0  0   977 hans          \_ ./myprogram
  7 #7    1 19.8 19   979 hans          |   \_ ./myprogram
  7 #7    1 19.8 19   980 hans          |   \_ ./myprogram
  7 #7    1 19.9 19   981 hans          |   \_ ./myprogram
  7 #7    1 19.9 19   982 hans          |   \_ ./myprogram
  7 #7    1 19.8 19   983 hans          |   \_ ./myprogram
  7 #7    2  0.0  0   985 hans          \_ ./myprogram
  7 #7    2 20.0 20   987 hans          |   \_ ./myprogram
  7 #7    2 20.0 20   988 hans          |   \_ ./myprogram
  7 #7    2 19.9 19   989 hans          |   \_ ./myprogram
  7 #7    2 19.8 19   990 hans          |   \_ ./myprogram
  7 #7    2 19.8 19   991 hans          |   \_ ./myprogram
  7 #7    3  0.0  0   992 hans          \_ ./myprogram
  7 #7    3 19.9 19   994 hans              \_ ./myprogram
  7 #7    3 20.0 20   995 hans              \_ ./myprogram
  7 #7    3 19.9 19   996 hans              \_ ./myprogram
  7 #7    3 19.9 19   997 hans              \_ ./myprogram
  7 #7    3 19.9 19   998 hans              \_ ./myprogram

Tips

Deliverables

Part 7: Freezer Load Balancing

Finally, let’s add idle load balancing to Freezer. If a CPU is about to start running the idle_task because there are no other tasks running on it, try to move a task from another CPU to the current CPU. This helps us utilize as many CPUs as possible at all times, speeding up performance.

Our load balancing policy is as follows:

Tips

Deliverables

Good luck!


Acknowledgments

Sankalp Singayapally, a TA for COMS W4118 Operating Systems I, Spring 2015, Columbia University, wrote the solution code for this assignment, based on his work in Fall 2014 in collaboration with Nicolas Mesa and Di Ruan.

Mitchell Gouzenko and Akira Baruah, TAs in Spring 2016, updated the code for the 4.1.18 kernel.

Benjamin Hanser and Emily Meng, TAs in Spring 2017, updated the code for the 4.4.50 kernel.

John Hui, TA in Spring 2018, updated the code for the 64-bit 4.9.81 kernel, and updated the BFS portion of the assignment to MuQSS.

Jonas Guan, TA in Spring 2019, updated the code for the 4.9.153 kernel.

Dave Dirnfeld and Hans Montero, TAs in Spring 2020, updated the assignment for the 4.19.50 kernel.

Kent Hall, John Hui, Xijiao Li, Hans Montero, Akhil Ravipati, Maÿlis Whetsel, and Tal Zussman, TAs in Fall 2021, updated the assignment for the 5.10.57 kernel.

Andy Cheng, Helen Chu, Phoebe Lu, Cynthia Zhang, and Tal Zussman, TAs in Spring 2023, updated the assignment for the 5.10.158 kernel and implemented idle load balancing.

Brennan McManus, TA in Spring 2024, updated the code for the 5.10.205 kernel.

Ruizhe Fu and Nicholas Yap, TAs in Spring 2025, created and changed part 1, 4, 5, and updated code and solution to the 6.8 kernel.


Last updated: 2025-03-22