(with a bowl because we insist on being kitchen-themed)
As with previous assignments, we will be using GitHub to distribute skeleton code and collect submissions. Please refer to our Git Workflow guide for more details. Note that we will be using multiple tags for this assignment, one for each deliverable part.
NOTE: If at all possible, please try to submit using x86. If one of your group
members owns an x86 machine, test on that machine prior to submitting, and do not
commit a .armpls file. This will make grading much easier for us.
For students on arm64 computers (e.g. M1/M2 machines): if you want your
submission to be built/tested for ARM, you must create and submit a file called
.armpls in the top-level directory of your repo; feel free to use the
following one-liner:
cd "$(git rev-parse --show-toplevel)" && \
touch .armpls && \
git add -f .armpls && \
git commit .armpls -m "ARM pls"
You should do this first so that this file is present in all parts.
There is a script in the skeleton code named run_checkpatch.sh. It is a
wrapper over linux/scripts/checkpatch.pl, which is a Perl script that comes
with the Linux kernel that checks if your code conforms to the kernel coding
style.
Execute run_checkpatch.sh to see if your code conforms to the kernel style –
it’ll let you know what changes you should make. You must make these changes
before pushing a tag. Passing run_checkpatch.sh with no warnings and no errors
is required for this assignment.
In addition to the pristine Linux kernel source tree (now under linux/) we’ve
provided a patch file which will create the syscall stubs for you. You will
need to apply this patch to your repo.
The patch is under the following path:
patch/farfetch.patch
You can use git apply to apply this patch. First, check which files will be
modified by the patch:
git apply --stat patch/farfetch.patch
You should also inspect what the patch is doing by reading the diffs inside. Finally, you can apply the patch with the following:
git apply patch/farfetch.patch
Now, when you run git status, you should see some files modified, as well as
some .c and .h files added. After verifying that these changes worked as
intended, commit them.
Build your kernel. Make sure you’re building with a local version that is
different from your fallback (-cs4118), so you don’t overwrite it; set your
local version to your UNI (i.e. -<uni>-HW6).
Now, when you build your kernel, you should have the farfetch() syscall stub
in your kernel.
The syscall you will implement has a cmd parameter whose possible values
(defined by an enum) are unique to the syscall, and which must be known by the
caller. This means that the enum definition needs to be available in both kernel
and user land. You’ll need to install the farfetch header
(include/uapi/linux/farfetch.h) from the kernel source tree to userspace.
Once you’ve built your farfetch()-stubbed kernel, run the following command:
sudo make headers_install INSTALL_HDR_PATH=/usr
This command will install the headers found under include/uapi/ in your Linux
source tree into /usr/include/. Now you should be able to #include
<linux/farfetch.h> from userspace! Additionally, the syscall number should be
available as __NR_farfetch from #include <asm-generic/unistd.h>. Try
compiling the userspace utility (see below) to make sure this works.
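As a quick sanity check of the installed headers, here is a minimal sketch of a userspace wrapper. It assumes the five-argument farfetch() prototype from this handout. The wrapper name is hypothetical (glibc provides no farfetch() function), and the fallback number 505 is only there so the file still compiles on a machine where the patched headers are not installed.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Fallback for machines without the patched headers; 505 is the
 * syscall number this assignment uses for farfetch(). */
#ifndef __NR_farfetch
#define __NR_farfetch 505
#endif

/* Hypothetical wrapper: there is no libc stub, so go through syscall(2).
 * Returns the number of bytes copied, or -1 with errno set. */
static long farfetch(unsigned int cmd, void *addr, pid_t target_pid,
		     unsigned long target_addr, size_t len)
{
	return syscall(__NR_farfetch, cmd, addr, target_pid,
		       target_addr, len);
}
```

On a kernel without the stub, the call should simply fail (typically with ENOSYS), which is a quick way to tell whether you booted the right kernel.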
farfetch: Fetching Pages from Afar
For this assignment, you will be implementing farfetch(), a kernel function
that allows you to manipulate the memory of a specified process. This function
will be defined within a kernel module. At first, we will use a custom system
call with number 505 in order to call farfetch().
The function prototype for farfetch() is the following:
long farfetch(unsigned int cmd, void __user *addr, pid_t target_pid,
unsigned long target_addr, size_t len);
farfetch() will take in five arguments:
cmd: Indicates whether to read or write the remote memory by specifying
FAR_READ or FAR_WRITE, respectively (defined in the uapi header)
addr: A pointer to the caller’s user-space buffer
target_pid: The PID of the process whose memory is to be fetched
target_addr: The starting memory address in the target process’s virtual
address space
len: The maximum number of bytes to copy
On success, farfetch() should return the number of bytes copied. Make sure
the returned value is <= len.
If the user calling the function is not root, fail with the errno value
EPERM.
If cmd is not FAR_READ or FAR_WRITE, fail with the errno
value EINVAL.
If the specified PID does not exist, fail with the errno value ESRCH.
If copying from/to addr/target_addr fails, fail with the errno value
EFAULT.
If target_addr is not a valid user-space address, or is unmapped in the target
virtual address space, fail with the errno value EFAULT.
You will be implementing farfetch() in this part, but with a few simplifying
limitations. Most significantly, you will only be dealing with the single
physical page that is associated with target_addr, so there’s no need to worry
about traversing to any subsequent pages. You will copy to/from this page up
until either len bytes or the end of the page (whichever comes first).
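The "len bytes or end of page, whichever comes first" rule is plain arithmetic on the page offset. Below is a minimal userspace sketch, assuming 4 KiB pages; `MODEL_PAGE_SIZE` and `bytes_in_page` are hypothetical names, and kernel code would use PAGE_SIZE instead.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 4 KiB pages. Kernel code would use PAGE_SIZE. */
#define MODEL_PAGE_SIZE 4096UL

/* Number of bytes that can be copied starting at target_addr without
 * crossing into the next page, capped at len. */
static size_t bytes_in_page(unsigned long target_addr, size_t len)
{
	size_t offset = target_addr & (MODEL_PAGE_SIZE - 1);
	size_t room = MODEL_PAGE_SIZE - offset;

	return len < room ? len : room;
}
```

For example, a request for 4096 bytes starting 16 bytes before a page boundary yields only 16 bytes under this rule.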
There is one restriction on your implementation for this part: you may NOT use
get_user_pages_remote()/pin_user_pages_remote(), nor anything which invokes
them. You may reference their implementation for performing a page walk, but
note that the relevant bits are buried in logic that deals with things you don’t
need to worry about (traversing arbitrary address ranges, huge pages, special
mappings, faulting in pages, etc.)—if your module contains such extraneous code,
it will incur a steep deduction. Every line you write should be with purpose, so
avoid haphazardly copy-pasting functions or large chunks of code.
Consequently, you will need to manually perform the 5-level page walk. Some additional simplifying limitations:
If you encounter an entry which is not present in memory, just report
EFAULT.
Do NOT allow writing to a non-writable PTE; in the event you are asked to do
so by FAR_WRITE, just report EFAULT.
If performing a FAR_WRITE, you should mark the modified page as dirty using
set_page_dirty_lock().
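To see what the 5-level walk consumes at each step, here is a userspace sketch that splits a virtual address into per-level table indices. It assumes x86-64 with 5-level paging (9 index bits per level, 4 KiB pages); the shifts can differ on other architectures or paging modes. `split_vaddr` and `struct walk_idx` are hypothetical names used only for illustration. Inside the kernel you would rely on the existing helpers (pgd_offset(), p4d_offset(), pud_offset(), pmd_offset(), and so on) rather than hand-rolled shifts.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: x86-64, 5-level paging, 4 KiB pages. */
enum {
	M_PAGE_SHIFT = 12,
	M_PMD_SHIFT  = 21,
	M_PUD_SHIFT  = 30,
	M_P4D_SHIFT  = 39,
	M_PGD_SHIFT  = 48,
	M_IDX_MASK   = 0x1ff,	/* 9 bits: 512 entries per table */
};

struct walk_idx {
	unsigned int pgd, p4d, pud, pmd, pte, offset;
};

/* Decompose a virtual address into the index used at each level
 * of the walk, plus the byte offset within the final page. */
static struct walk_idx split_vaddr(uint64_t va)
{
	struct walk_idx w = {
		.pgd    = (va >> M_PGD_SHIFT) & M_IDX_MASK,
		.p4d    = (va >> M_P4D_SHIFT) & M_IDX_MASK,
		.pud    = (va >> M_PUD_SHIFT) & M_IDX_MASK,
		.pmd    = (va >> M_PMD_SHIFT) & M_IDX_MASK,
		.pte    = (va >> M_PAGE_SHIFT) & M_IDX_MASK,
		.offset = va & ((1u << M_PAGE_SHIFT) - 1),
	};

	return w;
}
```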
To determine if target_addr is a valid user-space address, it is sufficient to
check against the end of the target process’s virtual address space, which is
evaluated by the TASK_SIZE_OF() macro; anything >= TASK_SIZE_OF() cannot be
a valid user address for the task.
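The up-front checks can be modeled in userspace to make the expected errno values concrete. This is only a sketch: the struct, the enum values, and the order of the checks are assumptions made for illustration (the real cmd values come from the uapi header, and the handout does not prescribe a precedence between errors).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the values in <linux/farfetch.h>. */
enum model_cmd { MODEL_FAR_READ, MODEL_FAR_WRITE };

/* Hypothetical bundle of everything the checks need to know. */
struct far_req_model {
	bool caller_is_root;
	unsigned int cmd;
	bool target_exists;
	unsigned long target_addr;
	unsigned long task_size;	/* what TASK_SIZE_OF() would give */
};

/* Returns 0 if the request may proceed, else a negative errno,
 * in the same style a kernel function would return it. */
static long far_check_model(const struct far_req_model *r)
{
	if (!r->caller_is_root)
		return -EPERM;
	if (r->cmd != MODEL_FAR_READ && r->cmd != MODEL_FAR_WRITE)
		return -EINVAL;
	if (!r->target_exists)
		return -ESRCH;
	if (r->target_addr >= r->task_size)
		return -EFAULT;
	return 0;
}
```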
Our recommendation is to start with the resources linked below before looking at kernel code, as those more directly get at what you need to implement the page walk.
Able to copy up to a page of memory into/out of any target process.
Doesn’t allow writing to any write-protected PTE.
Does NOT (even indirectly) invoke
get_user_pages_remote()/pin_user_pages_remote().
No significantly extraneous code.
Proper error handling in all specified cases.
Test your implementation as described below.
After testing, answer the following in your written_answers.txt:
Observe and explain any difference in behavior when using farfetchd on the
provided mmap target versus the malloc target.
Hint: try fetching a full page (i.e. 4096 bytes); how many bytes are actually fetched in each case?
Hint: man mmap.
Observe and explain any difference in behavior when using farfetchd on the
provided mmap target versus the fork target (in both the parent and
child).
Observe and explain the behavior of farfetchd on the strlit target.
Try going through Session 2 without using setarch -R, which
is used to disable ASLR for the process; that is, run the twecho target
directly. Briefly describe what ASLR is, and explain how it affects finding
the argv strings.
Hint: compare /proc/<pid>/maps (as done in Session 3) with and without setarch -R.
To submit this part, push the hw6p1handin tag with the following:
git tag -a -m "Completed hw6 part1." hw6p1handin
git push origin master
git push origin hw6p1handin
For this part, we are lifting the main restriction of Part 1 and encouraging
you to use get_user_pages_remote(). You can let the internal “GUP” logic
(belonging to the get_user_pages_* family of functions) handle the details of
the walk.
The use of GUP logic provides the following functionalities which were not required in Part 1:
Deal with arbitrary address ranges (potentially spanning multiple pages)
Modify non-writable memory
Remember to mark any modified pages dirty (as in Part 1).
All the functionality of Part 1.
Able to read/write arbitrary address ranges (potentially > PAGE_SIZE).
Able to write to non-writable memory (e.g. strlit target).
Invoke get_user_pages_remote() exactly once (there should be no need for
repeated calls, e.g. in a loop).
If get_user_pages_remote() fails, relay the errno back to the user.
If it reports fewer than the requested number of pages, adjust the length of
the copy (< len).
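Both the page count to request and the adjusted copy length follow from the page offset of target_addr. Here is a userspace sketch of that arithmetic, assuming 4 KiB pages; `pages_spanned` and `usable_len` are hypothetical helper names, not kernel functions.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 4 KiB pages. Kernel code would use PAGE_SIZE. */
#define MODEL_PAGE_SIZE 4096UL

/* How many pages the range [addr, addr + len) touches. */
static unsigned long pages_spanned(unsigned long addr, size_t len)
{
	unsigned long offset = addr & (MODEL_PAGE_SIZE - 1);

	return (offset + len + MODEL_PAGE_SIZE - 1) / MODEL_PAGE_SIZE;
}

/* How many bytes can actually be copied if only `pinned` pages
 * (counted from the first page of the range) were obtained. */
static size_t usable_len(unsigned long addr, size_t len, unsigned long pinned)
{
	unsigned long offset = addr & (MODEL_PAGE_SIZE - 1);
	size_t avail;

	if (pinned == 0)
		return 0;
	avail = pinned * MODEL_PAGE_SIZE - offset;
	return len < avail ? len : avail;
}
```

For instance, an 8192-byte request starting 16 bytes before a page boundary touches three pages; if only the first one comes back, 16 bytes are usable.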
Answer the following in your written_answers.txt:
Ensure that the behavior observed in Part 1 for the fork target is
remedied; explain generally how the GUP logic handles this case. Feel free
to reference line numbers in mm/gup.c.
Hint: there is a FOLL_* flag which is pertinent; see where this is set.
To submit this part, push the hw6p2handin tag with the following:
git tag -a -m "Completed hw6 part2." hw6p2handin
git push origin master
git push origin hw6p2handin
farfetchd Hacker Utility
We’ve provided a userspace utility to test your implementation, under the following path:
user/test/farfetchd/
In particular, farfetchd takes a target PID, address, and maximum length, and
will execute your syscall up to two times; once to FAR_READ from the target,
and then if you choose to modify any memory, once to FAR_WRITE it.
You will need to install bvi before using farfetchd:
sudo apt install bvi
You will find the provided target programs useful for testing under the following path:
user/test/targets/
Though feel free to write your own for additional testing.
Linked below are some example shell sessions of testing with farfetchd, using
the final Part 2 version. Note that the behavior will be different for Part 1
in some cases.
Implement the farfetch() syscall in a kernel module using the function
pointer technique from HW4.
You will find a module stub in your skeleton repo at the path
user/module/farfetch/. Implement your modularized system call here
in farfetch.c.
Don’t modify the existing boilerplate code.
You should start your code in the farfetch() function.
Feel free to define and call any more functions inside your module.
You can find the farfetch cmd values (FAR_READ/FAR_WRITE) defined for
you in the Linux kernel source tree, under include/uapi/linux/farfetch.h.
Remember to install these during the setup stage so that you can include
this file from userspace.
You do not have to worry about ensuring your solution is architecture-independent. That is, we will only test your solution on your specified architecture.
Include your answers to Part 1 and Part 2 questions in written_answers.txt.
In practice, creating new system calls is incredibly rare. This is largely due to broader architectural decisions about the Linux kernel and Linux kernel politics (you’d be surprised how heated things can get in the Linux mailing list!). Once introduced, a system call will need to be maintained in perpetuity, as the golden rule of Linux kernel development is to never break user space; once any widely used piece of software starts using the system call, removing or greatly altering the system call’s behavior would break that program without any easy solution. Besides, if our end goal is to use our kernel function to perform some kind of malicious attack, what kind of idiot would install a custom kernel onto their computer?
If we can’t use system calls, how are we supposed to call farfetch() when
it’s located within the kernel? More broadly, how are we supposed to interact
with kernel code without the use of system calls? There are several methods of
doing this (virtual file systems like sysfs, debugfs, or configfs; AF_NETLINK;
eBPF; io_uring; etc.), but we’ll take one of the most common approaches:
writing a device driver for a custom pseudo-device, which we will then
communicate with using ioctl()s.
Here’s the problem we need to solve: we have a CPU running an operating system, wired to a bunch of very useful hardware devices that we would like to make use of in software (e.g. storage devices like SSDs, networking cards, GPUs, USB ports, monitors, etc.). If everything on the hardware side is set up correctly, what do we have to do on the software side? More broadly, how do we communicate with and control specific hardware from software? The answer is through the use of device drivers.
Device drivers are just code that implements and exposes a software interface for interacting with a piece of hardware. This approach is used both because giving software direct access to hardware sounds dangerous, and because the lower-level interfaces for hardware can be ferociously complicated and unintuitive. It’s the same philosophy for APIs in general: abstract lower level operations into a set of higher level operations that are simpler to use, without sacrificing too much performance.
As discussed in class (hopefully), the kernel’s main job is to act as the arbiter between software and hardware. As such, device drivers must always lie, at least in some part, within the kernel. With the Linux kernel and its monolithic design, device drivers lie exclusively within the kernel, loaded in through Loadable Kernel Modules (LKMs); this is just the official term for the kernel modules we’ve been using for the past couple of assignments.
UNIX - the operating system Linux is largely based on - relies on a very
powerful philosophy: in UNIX, everything is a file. More accurately, everything
can be interacted with as if it were a file. With devices in particular, this
philosophy is implemented through the use of virtual files, which are files
that can be interacted with using syscalls like open() and read(), but
aren’t actually backed by a physical file. Instead, these syscalls are
mapped to perform different operations: read(), for instance, would naturally
be mapped to a function that can be used to retrieve the contents of the
device. The functions that correspond to these syscalls are specified in struct
file_operations.
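The dispatch idea can be illustrated with a small userspace model: a table of function pointers stands in for struct file_operations, and a "read" on the virtual file becomes an indirect call through that table. Everything here (`model_file_ops`, `zero_read`, `model_read`) is hypothetical and far simpler than the real kernel structure.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Userspace model only: mimics the idea of struct file_operations. */
struct model_file_ops {
	long (*read)(void *dev, char *buf, size_t len);
};

/* A pretend driver whose "device" always yields zero bytes. */
static long zero_read(void *dev, char *buf, size_t len)
{
	(void)dev;
	memset(buf, 0, len);
	return (long)len;
}

static const struct model_file_ops zero_ops = { .read = zero_read };

/* What a read() on the virtual file boils down to: look up the
 * driver's handler and call it, or fail if there is none. */
static long model_read(const struct model_file_ops *ops, void *dev,
		       char *buf, size_t len)
{
	if (!ops->read)
		return -1;
	return ops->read(dev, buf, len);
}
```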
An interesting result of the device driver model is that, technically, there is no requirement that our device driver needs to communicate with an actual piece of hardware. If we just emulate the device’s behavior purely in software, we can have a device driver for a device that doesn’t actually exist! These devices are called pseudo-devices.
One example of such a pseudo-device is /dev/urandom. When you read() from it,
the device returns as many random bytes as the read() asked for. In
general, pseudo-devices are good for exposing a set of kernel functions or
operations to user space in a format already familiar to user-space: file
operations.
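From userspace, talking to such a pseudo-device looks like ordinary file I/O. Here is a minimal sketch that pulls random bytes from /dev/urandom; the helper name `read_urandom` is hypothetical.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Read up to len random bytes from /dev/urandom into buf.
 * Returns the number of bytes read, or -1 on error. */
static ssize_t read_urandom(void *buf, size_t len)
{
	ssize_t n;
	int fd = open("/dev/urandom", O_RDONLY);

	if (fd < 0)
		return -1;
	n = read(fd, buf, len);
	close(fd);
	return n;
}
```

No file on disk backs this read; the driver's read handler generates the bytes on demand.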
Now that we have all the necessary background information, let’s implement our
own farfetchd pseudo-device!
Every device driver has a major number which identifies it, and every
individual device (whether it’s a real or pseudo-device) has a minor number
which identifies it to the driver. For example, /dev/null, /dev/zero,
/dev/random, and several other pseudo-devices all belong to the same devmem
device driver within the Linux kernel, meaning they all share the same major
number. If you’re curious, you can see that implementation in
drivers/char/mem.c.
If you stat these on the command line, you can see in the Device type field
that the major identifier (the first number) is the same between them,
confirming that they share that same driver.
$ stat /dev/null
File: /dev/null
Size: 0 Blocks: 0 IO Block: 4096 character special file
Device: 0,6 Inode: 4 Links: 1 Device type: 1,3
Access: (0666/crw-rw-rw-) Uid: ( 0/ root) Gid: ( 0/ root)
Access: 2026-04-12 22:39:33.198843756 -0400
Modify: 2026-04-12 22:39:33.198843756 -0400
Change: 2026-04-12 22:39:33.198843756 -0400
Birth: 2026-04-12 22:39:27.172000000 -0400
$ stat /dev/zero
File: /dev/zero
Size: 0 Blocks: 0 IO Block: 4096 character special file
Device: 0,6 Inode: 6 Links: 1 Device type: 1,5
Access: (0666/crw-rw-rw-) Uid: ( 0/ root) Gid: ( 0/ root)
Access: 2026-04-12 22:39:33.199190388 -0400
Modify: 2026-04-12 22:39:33.199190388 -0400
Change: 2026-04-12 22:39:33.199190388 -0400
Birth: 2026-04-12 22:39:27.172000000 -0400
$ stat /dev/random
File: /dev/random
Size: 0 Blocks: 0 IO Block: 4096 character special file
Device: 0,6 Inode: 8 Links: 1 Device type: 1,8
Access: (0666/crw-rw-rw-) Uid: ( 0/ root) Gid: ( 0/ root)
Access: 2026-04-12 22:39:33.198602692 -0400
Modify: 2026-04-12 22:39:33.198602692 -0400
Change: 2026-04-12 22:39:33.198602692 -0400
Birth: 2026-04-12 22:39:27.172000000 -0400
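The same numbers can be read programmatically. The sketch below uses stat(2) together with the major() and minor() macros from <sys/sysmacros.h> (a glibc header, so this assumes Linux); `dev_major` and `dev_minor` are hypothetical helper names.

```c
#include <assert.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>

/* Major number of the character device at path, or -1 if path
 * cannot be stat'ed or is not a character device. */
static int dev_major(const char *path)
{
	struct stat st;

	if (stat(path, &st) != 0 || !S_ISCHR(st.st_mode))
		return -1;
	return (int)major(st.st_rdev);
}

/* Minor number of the character device at path, or -1 on failure. */
static int dev_minor(const char *path)
{
	struct stat st;

	if (stat(path, &st) != 0 || !S_ISCHR(st.st_mode))
		return -1;
	return (int)minor(st.st_rdev);
}
```

Note that the device identity lives in st_rdev; st_dev describes the filesystem that holds the node, which is the "Device: 0,6" field in the output above.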
To move away from using system calls, you will change your existing kernel
module to instead register a device driver for a pseudo-device, which will then
be used to call farfetch(). To be safe, make all future changes in the
directory user/module/farfetch_p3/, copying over the necessary code from
parts 1 and 2.
The existing skeleton code should already handle most of the annoying stuff,
but you will need to implement parts of farfetchd_init() and
farfetchd_exit() on your own.
If you need some more references on what the device driver boilerplate should look like, look at the following references:
https://docs.kernel.org/driver-api/index.html
https://lyngvaer.no/log/writing-pseudo-device-driver
You don’t need to worry about the “state control” global variables present in the second example, as we don’t care to track whether our driver is “busy”.
You shouldn’t have to do much of anything for this; just copy the boilerplate code present in the tutorial.
mknod
Once you have finished setting up your currently non-functional device, you
will notice that the virtual file /dev/farfetch does not exist. Don’t worry,
this is correct: you need to create this virtual file manually. This can be
done using the mknod system call, which creates a filesystem node at the
specified path. If you’re wondering what on earth a filesystem node is, you’ll
find out later ;). For now, all you need to know is that mknod - which is also
available as a shell command with the same functionality as the system call -
will create /dev/farfetch.
$ grep "farfetch" /proc/devices
238 farfetch
$ sudo mknod -m 0666 /dev/farfetch c 238 0
If all goes to plan, you should end up with a device at /dev/farfetch that
does nothing. You can then proceed.
To remove the device, you just need to call the following:
$ sudo rm /dev/farfetch
$
We still have one big problem: how do we specify the target_pid? There isn’t
an existing syscall that neatly maps to this operation, so won’t we have to
create a new syscall? Fortunately, Linux has our back, and has the solution for
us: ioctl().
ioctl() is a general purpose syscall for communicating with devices through a
series of driver-specific operations. These operations are almost always
functions that fall outside the scope of existing syscalls like read() or
write(), usually related to larger device control. For instance, a storage
device could define an ioctl() operation that returns the device’s total
size, or an ioctl() for flushing all pending write()s to the device.
Another way of thinking of it is that ioctl() facilitates defining
device-specific APIs, solving the syscall problem from earlier; rather than
having a set of syscalls for each device, we have one syscall that all devices
can use!
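For a feel of what an existing ioctl() request looks like, here is a small sketch using FIONREAD, a standard Linux request that reports how many bytes are waiting to be read on a file descriptor. It is unrelated to farfetch and only illustrates the calling convention; `pending_bytes` is a hypothetical helper name.

```c
#include <assert.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Number of bytes currently readable on fd, or -1 on error.
 * The third ioctl() argument is an out-pointer for this request. */
static int pending_bytes(int fd)
{
	int n = 0;

	if (ioctl(fd, FIONREAD, &n) < 0)
		return -1;
	return n;
}
```

Writing five bytes into a pipe and calling this on the read end reports 5, a piece of information no read() or write() call would give you without consuming the data.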
In our case, we will be creating an ioctl() operation for specifying the
target PID. The skeleton code already has the correct field of struct
file_operations set; you just need to implement farfetch_ioctl().
int fd = open("/dev/farfetch", O_RDWR);
// Assume this passes
int target_pid = 1000; // Or whatever PID you want
// This should return an error if `target_pid` doesn't exist!
ioctl(fd, 0, &target_pid);
Since we’ll be using our own special device controls, we do require a dedicated
C program to make the necessary ioctl() call before reading/writing. Provided
is user/test/farfetchd_p3/ which contains a version of farfetchd that does
not use any special system call, interfacing with /dev/farfetch via
ioctl()/lseek()/read()/write(). Note that the “request” argument passed
to ioctl() is ignored; our driver only has a single IOCTL request, so to
keep things simple, we’ll ignore the op field and just take the argument
after op as the PID. Don’t forget to add a check to ensure this target PID
exists!
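Put together, the userspace side of the device interface might look like the sketch below. It is not the provided farfetchd_p3 code. It assumes, based on the ioctl()/lseek()/read()/write() list above, that the file offset set by lseek() carries the target address, and it passes the PID by pointer with request number 0 as in the earlier snippet. `far_read_dev` is a hypothetical helper name.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <sys/types.h>
#include <unistd.h>

/* Read up to len bytes at target_addr in process pid through the
 * device node at dev. Returns bytes read, or -1 on any failure. */
static ssize_t far_read_dev(const char *dev, pid_t pid, off_t target_addr,
			    void *buf, size_t len)
{
	ssize_t n = -1;
	int target_pid = pid;
	int fd = open(dev, O_RDWR);

	if (fd < 0)
		return -1;
	/* Request number 0 mirrors the example above; the driver ignores it. */
	if (ioctl(fd, 0, &target_pid) == 0 &&
	    lseek(fd, target_addr, SEEK_SET) != (off_t)-1)
		n = read(fd, buf, len);	/* assumption: offset == target address */
	close(fd);
	return n;
}
```

A write path would look the same with write() in place of read().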
And now we can use farfetchd the same as before, on any process we like, for
as long as our module is inserted. Only now, our kernel module can be built
against anyone’s Linux kernel, assuming the version is close enough, and our
user is stupid enough to download a custom kernel module. Also, you can get rid
of the root permission check from earlier, as the file’s access controls
already take care of that!
To submit this part, push the hw6p3handin tag with the following:
git tag -a -m "Completed hw6 part3." hw6p3handin
git push origin master
git push origin hw6p3handin
Below is some online reading material that you may find helpful for this assignment:
For official Linux documentation on memory management:

The Farfetch’d assignment and reference implementation were designed and implemented by the following TAs of COMS W4118 Operating Systems I, Spring 2022, Columbia University:
The Farfetch’d assignment was further extended by the following TAs of COMS W4118 Operating Systems I, Spring 2026, Columbia University:
Last updated: 2026-04-12