Archive for the ‘kernel’ Category

Writing a new memory device - /dev/clipboard

April 29th, 2021

This article contains code for how to implement a simple device driver in the kernel. The quality of this code is... proof of concept quality. If you're planning to do serious kernel development please don't use this code as a starting point.

Today we're going to use what we learned last time about /dev/null and take it to the next level. We're going to write a new memory device in the kernel. We will call it /dev/clipboard. No, it's not an existing device. Go ahead and check on your system! :)

Why the name /dev/clipboard? Because our new device will work just like a clipboard does. First you write bytes to it, and those bytes are saved. Later you come back and read from the device and it gives you back the same bytes. If you write to it again you overwrite what was there before. (This is not one of those fancy clipboards with multiple slots where you can choose which slot to retrieve.)

Here's a demo of how it works:

# save to clipboard
$ echo "abc 123" > /dev/clipboard 

# retrieve from clipboard
$ cat /dev/clipboard
abc 123

For convenience we will also print a message to the kernel log whenever someone writes to or reads from the clipboard:

$ dmesg  # just after the 'echo' command
Saved 8 bytes to /dev/clipboard

$ dmesg  # just after the 'cat' command
Returned 8 bytes from /dev/clipboard

Pretty neat, huh? Being able to just add a feature to the kernel like that. :cool:

We'll do all our work in drivers/char/mem.c, right next to where /dev/null and the other /dev/xxx memory device drivers live.

First, let's think about where the user's bytes will be kept.. they obviously have to be kept somewhere by the kernel, in the kernel's own memory. We'll keep it simple here and just use a static array of a fixed size. That means we're placing this array in one of the data segments of the kernel's binary, a segment that will be loaded and available at runtime. We'll use 4096 bytes as the size of the array, and that means we're adding 4kb to the kernel's memory footprint (so we don't want to make it too big):

#define CLIPBOARD_BUFSIZE 4096
static char clipboard_buffer[CLIPBOARD_BUFSIZE];
static size_t clipboard_size = 0;

CLIPBOARD_BUFSIZE is a macro that decides how big the fixed array is - a constant. clipboard_buffer is the array itself.

clipboard_size is a variable that reflects how many bytes of user data the array currently holds. This will change every time we write to the clipboard.

Okay, that's it for storage. Now let's turn to how we tell the kernel that this new device exists. We'll add it to the bottom of the list of devices implemented in this file:

static const struct memdev {
	const char *name;
	umode_t mode;
	const struct file_operations *fops;
	fmode_t fmode;
} devlist[] = {
	 [DEVMEM_MINOR] = { "mem", 0, &mem_fops, FMODE_UNSIGNED_OFFSET },
	 [2] = { "kmem", 0, &kmem_fops, FMODE_UNSIGNED_OFFSET },
	 [3] = { "null", 0666, &null_fops, 0 },
	 [4] = { "port", 0, &port_fops, 0 },
	 [5] = { "zero", 0666, &zero_fops, 0 },
	 [7] = { "full", 0666, &full_fops, 0 },
	 [8] = { "random", 0666, &random_fops, 0 },
	 [9] = { "urandom", 0666, &urandom_fops, 0 },
	[11] = { "kmsg", 0644, &kmsg_fops, 0 },
	[12] = { "clipboard", 0666, &clipboard_fops, 0 },
};

And we need to populate a file_operations struct. This is how we tell the kernel what a user can actually do with this new device. In our case, we want the user to be able to open it, read from it, or write to it, then close it. To achieve that we just need to implement a read and a write function. We can populate the other fields with stub functions that do nothing:

static const struct file_operations clipboard_fops = {
	.llseek		= null_lseek,
	.read		= read_clipboard,
	.write		= write_clipboard,
	.read_iter	= read_iter_null,
	.write_iter	= write_iter_null,
	.splice_write = splice_write_null,
};

Alright, we need a read function and a write function. Let's start with write first, because it's what the user will use first. What does the write function need to do?

Well, when the user writes to the clipboard that write should overwrite whatever was in the clipboard before (as we said before). So let's zero the memory in the array.

Next, we have an array of a fixed size. The user might write a byte string that fits or a byte string that exceeds our capacity. If the input is too big to store, we will just truncate the remainder that doesn't fit. (I haven't checked how other clipboards do this, but presumably they too have an upper limit.) Once we know how many bytes to keep, we'll copy those bytes from the user's buffer (which is transient) into our fixed array.

Then we'll print a message to the kernel log to explain what we did. And finally return the number of bytes the user sent us, in order to confirm that the write succeeded. Put all of that together and we get more or less the following:

static ssize_t write_clipboard(struct file *file, const char __user *buf,
			  size_t count, loff_t *ppos)
{
    // erase the clipboard to an empty state
    memset(clipboard_buffer, 0, CLIPBOARD_BUFSIZE * sizeof(char));

    // decide how many bytes of input to keep
    clipboard_size = count;
    if (clipboard_size > CLIPBOARD_BUFSIZE) {
        clipboard_size = CLIPBOARD_BUFSIZE;
    }

    // populate the clipboard
    if (copy_from_user(clipboard_buffer, buf, clipboard_size)) {
        return -EFAULT;
    }

    printk("Saved %lu bytes to /dev/clipboard\n", clipboard_size);

    // acknowledge all the bytes we received
    return count;
}

What's the EFAULT thing all about? copy_from_user is a function provided by the kernel. If it fails to copy all the bytes we requested it will return a non-zero value. This makes the boolean predicate true and we enter the if block. In the kernel the convention is to return a pre-defined error code prefixed with a minus sign to signal an error. We'll gloss over that here.

That does it for the write function. Finally, we need the read function.

The read function is a bit more tricky, because it's intended to be used to read incrementally. So a user can issue read(1) to read a single byte and then call read(1) again to read the next byte. Each time read will return the bytes themselves and an integer which is a count of the number of bytes returned. Once there are no more bytes to return read will return the integer 0.

How does the read function know how many bytes have been returned thus far? It needs to store this somewhere between calls. It turns out this is already provided for us - it's what the argument ppos is for. ppos is used as a cursor from the beginning of the array. We'll update ppos each time to reflect where the cursor should be, and the function calling our read function will store it for us until the next call.

Other than that, the read function is pretty analogous to the write function:

static ssize_t read_clipboard(struct file *file, char __user *buf,
			 size_t count, loff_t *ppos)
{
    size_t how_many;

    // we've already read the whole buffer, nothing more to do here
    if (*ppos >= clipboard_size) {
        return 0;
    }

    // how many more bytes are there in the clipboard left to read?
    how_many = clipboard_size - *ppos;

    // see if we have space for the whole clipboard in the user buffer
    // if not we'll only return the first part of the clipboard
    if (how_many > count) {
        how_many = count;
    }

    // populate the user buffer using the clipboard buffer
    if (copy_to_user(buf, clipboard_buffer + *ppos, how_many)) {
        return -EFAULT;
    }

    // advance the cursor to the end position in the clipboard that we are
    // returning
    *ppos += how_many;

    printk("Returned %lu bytes from /dev/clipboard\n", how_many);

    // return the number of bytes we put into the user buffer
    return how_many;
}

And that's it! That's all the code needed to implement a simple memory device in the kernel! :party:

What happens when you pipe stuff to /dev/null?

April 28th, 2021

This question is one of those old chestnuts that have been around for ages. What happens when you send output to /dev/null? It's singular because the bytes just seem to disappear, like into a black hole. No other file on the computer works like that. The question has taken on something of a philosophical dimension in the debates among programmers. It's almost like asking: how do you take a piece of matter and compress it down into nothing - how is that possible?

Well, as of about a week ago I know the answer. And soon you will, too.

/dev/null appears as a file on the filesystem, but it's not really a file. Instead, it's something more like a border crossing. On one side - our side - /dev/null is a file that we can open and write bytes into. But from the kernel's point of view, /dev/null is a device. A device just like a physical piece of hardware: a mouse, or a network card. Of course, /dev/null is not actually a physical device, it's a pseudo device if you will. But the point is that the kernel treats it as a device - it uses the same vocabulary of concepts when dealing with /dev/null as it does dealing with hardware devices.

So if we have a device - and this is not a trick question - what do we need to have to able to use it? A device driver, exactly. So for the kernel /dev/null is a device and its behavior is defined by its device driver. That driver lives in drivers/char/mem.c. Let's have a look!

Near the bottom of the file we find an array definition:

static const struct memdev {
	const char *name;
	umode_t mode;
	const struct file_operations *fops;
	fmode_t fmode;
} devlist[] = {
	 [DEVMEM_MINOR] = { "mem", 0, &mem_fops, FMODE_UNSIGNED_OFFSET },
	 [2] = { "kmem", 0, &kmem_fops, FMODE_UNSIGNED_OFFSET },
	 [3] = { "null", 0666, &null_fops, 0 },
	 [4] = { "port", 0, &port_fops, 0 },
	 [5] = { "zero", 0666, &zero_fops, 0 },
	 [7] = { "full", 0666, &full_fops, 0 },
	 [8] = { "random", 0666, &random_fops, 0 },
	 [9] = { "urandom", 0666, &urandom_fops, 0 },
	[11] = { "kmsg", 0644, &kmsg_fops, 0 },
};

We don't need to understand all the details here, but just look at the names being defined: mem, null, zero, random, urandom. Where have we seen these names before? We've seen them endless times as /dev/mem, /dev/null, /dev/zero, /dev/random, /dev/urandom. And it's in this file that they are actually defined.

If we focus on the line that defines null we can see that it mentions something called null_fops. This is a struct that defines the behavior of /dev/null. And the struct looks like this:

static const struct file_operations null_fops = {
	.llseek		= null_lseek,
	.read		= read_null,
	.write		= write_null,
	.read_iter	= read_iter_null,
	.write_iter	= write_iter_null,
	.splice_write	= splice_write_null,
};

The values being used to populate this struct are function pointers. So when /dev/null is being written to the function that is responsible for this is write_null. And when /dev/null is being read from (it's not often we read from /dev/null) the function responsible for that is read_null.

Alright, we are just about to find out what happens when you write to /dev/null. The moment you have been waiting for. Are you ready for this? Here we go:

static ssize_t write_null(struct file *file, const char __user *buf,
                          size_t count, loff_t *ppos)
{
	return count;
}

write_null gets two arguments that are of interest to us: the bytes that we sent to /dev/null - represented by buf, and the size of that buffer - represented by count. And what does write_null do with these bytes? Nothing, nothing at all! All it does is return count to confirm that "we've received your bytes, sir". And that's it.

Writing to /dev/null is literally calling a function that ignores its argument. As simple as that. It doesn't look like that because /dev/null appears to us as a file, and the write causes a switch from user mode into kernel mode, and the bytes we send pass through a whole bunch of functions inside the kernel, but at the very end they are sent to the device driver that implements /dev/null and that driver does... nothing! How do you implement doing nothing? You write a function that takes an argument and doesn't use it. Genius!

And considering how often we use /dev/null to discard bytes we don't want to write to a (real) file and we don't want to appear in the terminal, it's actually an incredibly valuable implementation of nothing! It's the most useful nothing on the computer!