This guide was written by Ryan Wee in Spring 2024. The code snippets and links in this post correspond to Linux v5.10.205.
First of all, it is entirely possible (and probable) that there are mistakes in this guide. If so, feel free to contact the TA team.
This guide is meant to complement Mitchell’s guide, and it’s a good idea to read both of them. Mitchell’s guide explains what each of the sched_class functions does. The aim of this guide is to provide an event-driven perspective of how these functions are invoked. In particular, how is the freezer runqueue modified in response to different events?
In short:
- Freezer’s pick_next_task() implementation does NOT modify the freezer runqueue. It only picks the task at the front of the freezer runqueue.
- If a task uses up its timeslice: In this case, freezer’s task_tick() implementation is responsible for modifying the freezer runqueue.
- If a task gives up the CPU but is still RUNNABLE, i.e. it would like to stay on the runqueue: In this case, freezer’s yield_task() and task_tick() implementations are responsible for modifying the freezer runqueue.
- If a task gives up the CPU and is no longer RUNNABLE, i.e. it would like to leave the runqueue: In this case, freezer’s dequeue_task() implementation is responsible for modifying the freezer runqueue.

Case 1: A task uses up its timeslice

- On each timer tick, the kernel calls scheduler_tick().
- scheduler_tick() calls curr->sched_class->task_tick().
- Freezer’s task_tick() implementation should decrement the current task’s timeslice. (Note that the timeslice should therefore use jiffies as the unit.)
- Once the timeslice runs out, freezer’s task_tick() implementation should move the current task to the end of the runqueue. At this point, the task is still running on the CPU! We’ve just modified its position on the runqueue.
- Freezer’s task_tick() implementation should then call resched_curr(), which sets the TIF_NEED_RESCHED flag. Again, at this point, the task is still running on the CPU!
- The kernel checks TIF_NEED_RESCHED on interrupt and userspace return paths. When it’s safe (e.g. the process isn’t holding any spinlocks), the kernel calls schedule(), which in turn invokes __schedule().
- In __schedule(), the if (!preempt && prev_state) block is NOT executed.
  - Each task_struct has a state field. When a task is RUNNABLE, the state field is zero. When a task is not RUNNABLE, the state field is non-zero.
  - In this case, the task is still RUNNABLE, i.e. prev_state is zero. So the if block is not executed.
- __schedule() eventually calls pick_next_task(), which calls freezer’s pick_next_task() implementation.
- Freezer’s pick_next_task() implementation should return the task that is now at the front of the freezer runqueue.
- __schedule() also eventually calls context_switch(), which is where the actual context switch happens.

Case 2: A task gives up the CPU but stays RUNNABLE

In this case, the task wants to remain on the runqueue. It’s just telling the kernel: “Okay, you can shift me to the back of the runqueue because I’m nice. But feel free to select me again when you want to!”
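From user space, that request is a single library call. The sketch below is only an illustration: yield_n_times is a hypothetical helper name made up for this example, not something from the assignment. It calls sched_yield() a few times and counts the calls that returned 0. A task running under the freezer class could use something like it to exercise the yield path.

```c
#include <assert.h>
#include <sched.h>   /* sched_yield() */

/*
 * Hypothetical helper (not part of the assignment): yield the CPU n times
 * and return how many of those calls reported success (returned 0).
 */
int yield_n_times(int n)
{
    int ok = 0;

    assert(n >= 0);
    for (int i = 0; i < n; i++) {
        /* The caller stays RUNNABLE; it only asks to be scheduled later. */
        if (sched_yield() == 0)
            ok++;
    }
    return ok;
}
```

Note that the task never blocks here. Each call returns as soon as the scheduler picks the task again.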
- The task calls the sched_yield syscall, which in turn invokes do_sched_yield().
- do_sched_yield() calls current->sched_class->yield_task().
- Freezer’s yield_task() implementation should set the timeslice of the task to 0.
- The task then gets “pre-empted” by freezer’s task_tick() implementation, just like a task that used up its timeslice. (I put that in quotation marks because the task voluntarily decided to let itself be pre-empted.)

Case 3: A task gives up the CPU and is no longer RUNNABLE

In this case, the task wants to be taken off the runqueue. This could be because the task called __wait_event_interruptible(), or sleep(), or anything that basically indicates it wants to suspend its execution for the near future. Another common case would be when the task finishes its execution.
- The task changes its state to something other than RUNNABLE. It then ends up in __schedule(), with false passed as the preempt argument.
  - When a task exits:
    - The task calls the exit() syscall, which calls do_exit().
    - do_exit() calls do_task_dead().
    - do_task_dead() modifies the task state, and then calls __schedule().
  - When a task calls sleep():
    - The task calls the nanosleep() syscall, which calls hrtimer_nanosleep() and then do_nanosleep().
    - do_nanosleep() modifies the task state, and then calls freezable_schedule(). This in turn calls schedule().
- When __schedule() is called, the state field of the task_struct is no longer RUNNABLE, i.e. it is non-zero. Since state is non-zero and since the preempt argument was false as mentioned above, we go into the if (!preempt && prev_state) block.
- Inside that block, __schedule() calls deactivate_task().
- deactivate_task() calls dequeue_task(), which in turn calls the dequeue_task() implementation of the task’s scheduling class.
- Freezer’s dequeue_task() implementation should remove the current task from the freezer runqueue.
- Execution then continues in __schedule(). This invocation calls pick_next_task() and context_switch() as mentioned above. Since the current task has been removed from the freezer runqueue, this causes the next task in the runqueue to be run.
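
The three cases above can be summarised with a toy user-space model of the runqueue. This is a sketch under stated assumptions, not kernel code and not the reference solution: every name (toy_rq, toy_task_tick, and so on), the array-based queue, the capacity, and the choice to refill the timeslice on enqueue are all made up for illustration. The only things taken from this guide are which function is allowed to modify the queue in each case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_CAP        8   /* assumed capacity of the toy runqueue */
#define TOY_TIMESLICE  3   /* assumed timeslice, counted in ticks */

struct toy_task { int pid; int timeslice; };

struct toy_rq {
    struct toy_task *q[TOY_CAP];  /* q[0] is the front of the queue */
    size_t len;
    bool need_resched;            /* stands in for TIF_NEED_RESCHED */
};

/* Add t at the back. Refilling the timeslice here is an assumption. */
void toy_enqueue(struct toy_rq *rq, struct toy_task *t)
{
    assert(rq->len < TOY_CAP);
    t->timeslice = TOY_TIMESLICE;
    rq->q[rq->len++] = t;
}

/* Case 3, dequeue_task(): remove t from wherever it sits in the queue. */
void toy_dequeue(struct toy_rq *rq, struct toy_task *t)
{
    for (size_t i = 0; i < rq->len; i++) {
        if (rq->q[i] != t)
            continue;
        for (size_t j = i + 1; j < rq->len; j++)
            rq->q[j - 1] = rq->q[j];
        rq->len--;
        return;
    }
}

/* pick_next_task(): only looks at the front, never modifies the queue. */
struct toy_task *toy_pick_next(const struct toy_rq *rq)
{
    return rq->len ? rq->q[0] : NULL;
}

/* Case 1, task_tick(): count down, then rotate to the back when expired. */
void toy_task_tick(struct toy_rq *rq, struct toy_task *curr)
{
    if (curr->timeslice > 0)
        curr->timeslice--;
    if (curr->timeslice > 0)
        return;
    toy_dequeue(rq, curr);
    toy_enqueue(rq, curr);        /* curr is "still running" at this point */
    rq->need_resched = true;      /* models resched_curr() */
}

/* Case 2, yield_task(): only zero the timeslice; the next tick rotates. */
void toy_yield(struct toy_task *curr)
{
    curr->timeslice = 0;
}

/* The part of __schedule() this guide cares about: clear the flag, pick. */
struct toy_task *toy_schedule(struct toy_rq *rq)
{
    rq->need_resched = false;
    return toy_pick_next(rq);
}
```

With two tasks A and B enqueued in that order, three calls to toy_task_tick() on A rotate the queue to B, A. A toy_yield() on B followed by one tick rotates it back to A, B. Calling toy_dequeue() on A then leaves only B for toy_schedule() to pick.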