Difference between revisions of "LKM"

From NetSec
{{Inprog}}
 
 
 
LKM stands for "Linux Kernel Module" or "Loadable Kernel Module". As the name implies, it is a way to allow code to interact directly with the kernel and extend its functionality. The ability to insert modular components into the kernel allows it to remain relatively lightweight, as it does not need to include every driver ever created. The ability to load modules on-the-fly also saves us from having to recompile every time a change needs to be made.
 
  
 
If you have followed this article to this point, you are able to write a basic kernel module with parameters and define its behavior when loaded or unloaded. Writing your own devices is one of the ways you can extend kernel functionality, allowing for interaction between userland and the mysteries of the kernel.
 
  
Like the other device objects found in the /dev folder, a character device is an object that behaves like a regular file, allowing you to read from or write to it. Character devices specifically behave like a pipe, meaning that data is written to or read from them immediately, as a byte-by-byte stream.
  
 
You can identify the type of a device in the /dev folder when running ls:
 
 
As you can see, all character devices have a 'c' in the first column. The other devices, which are identified by a 'b', are block devices: they are buffered, and they support reads, writes and seeks as though they were regular files.
  
== Major & Minor Numbers ==

Every Linux device has a major number and a minor number. The major number is used by the kernel to identify the correct device driver when the device is accessed, while the minor number is interpreted only by the driver itself, typically to tell apart the individual devices it manages. Listing a device also tells you the major and minor numbers of that device.

For example, in the ls output above we can see that the last line contains the numbers "29, 0". This tells us that the major number for /dev/fb0 is 29, and the minor number is 0. This will be important later, as our LKM will need these numbers to successfully create and interact with a device.

== Additional includes ==

Our module will need the three includes used in the previous examples for basic kernel functionality. In addition, we will need to include the following headers:

{{code|text=
<source lang="c">
#include <linux/device.h>
#include <linux/fs.h>
#include <linux/uaccess.h> //older tutorials include <asm/uaccess.h> directly, which newer kernels discourage
</source>
}}

<b><linux/device.h></b> is needed to support the kernel driver model, allowing us to register devices with the kernel.

<b><linux/fs.h></b> is needed for Linux file system support, allowing us to map our device to a node on the root fs.

<b><linux/uaccess.h></b> is used for the copy_to_user() function, which sends data from kernel land to userland via the device.

== Defining file operations ==

Before we get started registering a device, we need to define our file operations. These define the ways in which your driver can interact with the device once it's created. They are collected in a "struct file_operations", which is a struct defined in <linux/fs.h>.

We can see a full list of the operations we can define in the file_operations struct in <linux/fs.h> (the exact list varies between kernel versions):

{{code|text=
<source lang="c">
struct file_operations {
  struct module *owner;
  loff_t (*llseek) (struct file *, loff_t, int);
  ssize_t (*read) (struct file *, char __user *, size_t, loff_t *);
  ssize_t (*write) (struct file *, const char __user *, size_t, loff_t *);
  ssize_t (*aio_read) (struct kiocb *, const struct iovec *, unsigned long, loff_t);
  ssize_t (*aio_write) (struct kiocb *, const struct iovec *, unsigned long, loff_t);
  ssize_t (*read_iter) (struct kiocb *, struct iov_iter *);
  ssize_t (*write_iter) (struct kiocb *, struct iov_iter *);
  int (*iterate) (struct file *, struct dir_context *);
  unsigned int (*poll) (struct file *, struct poll_table_struct *);
  long (*unlocked_ioctl) (struct file *, unsigned int, unsigned long);
  long (*compat_ioctl) (struct file *, unsigned int, unsigned long);
  int (*mmap) (struct file *, struct vm_area_struct *);
  int (*mremap)(struct file *, struct vm_area_struct *);
  int (*open) (struct inode *, struct file *);
  int (*flush) (struct file *, fl_owner_t id);
  int (*release) (struct inode *, struct file *);
  int (*fsync) (struct file *, loff_t, loff_t, int datasync);
  int (*aio_fsync) (struct kiocb *, int datasync);
  int (*fasync) (int, struct file *, int);
  int (*lock) (struct file *, int, struct file_lock *);
  …and so on
};
</source>
}}

So let's say we just want our driver to do something every time the device is opened, and not do anything else with it (for simplicity's sake). We can see the file_operations struct has a definition for an open() function:

{{code|text=
<source lang="c">
  int (*open) (struct inode *, struct file *);            // first operation performed on a device file
</source>
}}

First of all, we need to implement the function we want to map to open() in our own code:

{{code|text=
<source lang="c">
static int device_open(struct inode *inode_pointer, struct file *file_pointer)
{
    printk(KERN_INFO "The device was just opened!\n");
    return 0;
}
</source>
}}

Now that we've done that, we can create a "struct file_operations" variable that maps our function to the open member declared in <linux/fs.h>:

{{code|text=
<source lang="c">
static struct file_operations fops =
{
    .open = device_open, //map device_open() to the open() file op
};
</source>
}}

Note that if your struct definition appears before you define device_open() in the code, you will need a function prototype:

{{code|text=
<source lang="c">
static int device_open(struct inode *, struct file *);
</source>
}}

In this case we have used the open() hook, but you can do this with any of the declarations in the file_operations struct that you want to hook.
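
The mechanism behind this is an ordinary C table of function pointers. The sketch below models it in plain userland C; struct demo_ops, demo_open(), call_open() and call_release() are hypothetical names invented for this example, and the "missing hook returns 0" fallback is a simplification rather than a description of what the kernel does for every operation.

```c
#include <stddef.h>

/* Hypothetical, much smaller cousin of struct file_operations. */
struct demo_ops {
    int (*open)(const char *name);
    int (*release)(const char *name);
};

static int open_count; /* how many times demo_open() has run */

static int demo_open(const char *name)
{
    (void)name; /* unused in this sketch */
    open_count++;
    return 0;
}

/* Only .open is set; members not named in the initializer are NULL. */
static const struct demo_ops demo_fops = {
    .open = demo_open,
};

/* Call the open hook if one was supplied, otherwise report success. */
static int call_open(const struct demo_ops *ops, const char *name)
{
    return ops->open ? ops->open(name) : 0;
}

/* Same idea for release: a NULL member simply means "no custom behavior". */
static int call_release(const struct demo_ops *ops, const char *name)
{
    return ops->release ? ops->release(name) : 0;
}
```

The designated initializer is what lets a driver fill in one or two operations and leave the rest of the struct untouched.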

== Registering a major number ==

Now that we have a struct that defines how our LKM will interact with the device file once it's created, we can begin going through the steps of creating the device.

If you want to create a device file, you will first need to register a major number in your init function, using the register_chrdev() function. This function takes three arguments. The first argument is the major number you want to request, or 0 to have an unused number assigned dynamically. The second argument is a string containing the name of the device you would like to register. The third argument is the address of the file_operations struct we defined earlier.

Here is an example of how registration of a major number works:

{{code|text=
<source lang="c">
static int majorNumber; //declare this at file scope, so that the exit function can use it too

//inside the init function:
majorNumber = register_chrdev(0, "myModule", &fops);
</source>
}}

When 0 was passed as the first argument, a successful call returns a positive integer into majorNumber, which is the major number you have been assigned (if you request a specific number instead, 0 is returned on success). In both cases a negative return value indicates an error. There is a corresponding unregister_chrdev() function which should be part of your cleanup function:

{{code|text=
<source lang="c">
unregister_chrdev(majorNumber, "myModule");
</source>
}}

== Registering a class ==

Devices have both a device name and a class name. Once you have registered a major number for the device, you must register a class.

{{code|text=
<source lang="c">
static struct class* myModule_class = NULL; //declare this at file scope, outside of the init function

//inside the init function:
myModule_class = class_create(THIS_MODULE, "myMod");
</source>
}}

After you create the class, there will be a corresponding folder at /sys/class/myMod. On failure class_create() returns an error pointer rather than NULL, which you can test for with IS_ERR(). (On kernels 6.4 and later, class_create() takes only the class name as its argument.) As with the major number, there is a corresponding function to destroy the class, which should be part of your cleanup before the major number is unregistered. class_destroy() already unregisters the class internally, so there is no need to call class_unregister() as well:

{{code|text=
<source lang="c">
class_destroy(myModule_class);
</source>
}}

== Registering the device ==

Now that we have registered a major number and a class for the device, we are finally ready to create the device itself, which requires that you have both the major number and the device class:

{{code|text=
<source lang="c">
static struct device* myModule_device = NULL; //declare this at file scope, outside of the init function

//inside the init function:
myModule_device = device_create(myModule_class, NULL, MKDEV(majorNumber, 0), NULL, "mod0");
</source>
}}

Once this is done, your device file should have been created at /dev/mod0. If it's there, then your device was successfully created. Remember that we mapped open() on our device to our device_open() function in the file_operations struct whose address we passed when we registered a major number. Therefore, the function should execute every time someone tries to open the device. You can test whether the open() hook works by using any command that opens the device:

{{code|text=
<source lang="bash">
$ cat /dev/mod0
</source>
}}

Then check dmesg, and you should see something like:

{{code|text=
<source lang="bash">
[22973.469043] The device was just opened!
</source>
}}

If you see that, then congratulations! You have successfully created a character driver that prints to the kernel log every time it is opened. Note that, as with the other steps, there is a corresponding function to destroy the device. This should be part of your cleanup, before you destroy the class or unregister the major number:

{{code|text=
<source lang="c">
device_destroy(myModule_class, MKDEV(majorNumber, 0));
</source>
}}
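
The teardown order described above (device, then class, then major number) is simply the setup order reversed, and the same reversal applies when a setup step fails halfway through. The following plain userland C sketch models that pattern with goto-based unwinding; setup_step(), undo_step(), demo_init() and demo_exit() are hypothetical stand-ins for register_chrdev(), class_create(), device_create() and their counterparts, not kernel functions.

```c
/* Hypothetical stand-ins for the three setup steps, as bit flags. */
enum { STEP_MAJOR = 1, STEP_CLASS = 2, STEP_DEVICE = 4 };

static int active_steps; /* bitmask of the steps that are currently set up */

/* Pretend to perform one setup step; should_fail simulates an error. */
static int setup_step(int step, int should_fail)
{
    if (should_fail)
        return -1;
    active_steps |= step;
    return 0;
}

static void undo_step(int step)
{
    active_steps &= ~step;
}

/* Run the steps in order. If one fails, undo the completed ones in reverse.
 * fail_at selects the step that should fail (0 means none). */
static int demo_init(int fail_at)
{
    if (setup_step(STEP_MAJOR, fail_at == STEP_MAJOR) < 0)
        goto err;
    if (setup_step(STEP_CLASS, fail_at == STEP_CLASS) < 0)
        goto err_major;
    if (setup_step(STEP_DEVICE, fail_at == STEP_DEVICE) < 0)
        goto err_class;
    return 0;

err_class:
    undo_step(STEP_CLASS);
err_major:
    undo_step(STEP_MAJOR);
err:
    return -1;
}

/* Full teardown: the exact reverse of the setup order. */
static void demo_exit(void)
{
    undo_step(STEP_DEVICE);
    undo_step(STEP_CLASS);
    undo_step(STEP_MAJOR);
}
```

A failed init that leaves nothing behind matters in a real module, because the exit function is not run when the init function returns an error.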

== Example drivers ==

If you put all the sample code above into a single program and check the return value of each step, you get a functional (though pointless) device driver that simply writes to the kernel log every time it is opened.

{{code|text=
<b>device.c</b>
<source lang="c">
#include <linux/init.h>
#include <linux/module.h>
#include <linux/device.h>
#include <linux/kernel.h>
#include <linux/fs.h>
#include <linux/err.h>     //IS_ERR() and PTR_ERR()
#include <linux/uaccess.h>

MODULE_LICENSE("GPL");
MODULE_AUTHOR("Dade Murphy");
MODULE_DESCRIPTION("1507 systems in one day");
MODULE_VERSION("0.1");

//initialize variables used by the driver

static int majorNumber;
static struct class* myModule_class = NULL;
static struct device* myModule_device = NULL;

//function prototype for our file operations struct

static int device_open(struct inode *, struct file *);

//struct mapping file operations to custom functions
//this is passed when registering a major number

static struct file_operations fops =
{
    .open = device_open
};

static int __init myModule_init(void)
{
    majorNumber = register_chrdev(0, "myModule", &fops);
    if (majorNumber < 0)
        return majorNumber;

    myModule_class = class_create(THIS_MODULE, "myModule");
    if (IS_ERR(myModule_class)) {
        unregister_chrdev(majorNumber, "myModule");
        return PTR_ERR(myModule_class);
    }

    myModule_device = device_create(myModule_class, NULL, MKDEV(majorNumber, 0), NULL, "mod0");
    if (IS_ERR(myModule_device)) {
        class_destroy(myModule_class);
        unregister_chrdev(majorNumber, "myModule");
        return PTR_ERR(myModule_device);
    }

    printk(KERN_INFO "Hello, I can be found at /dev/mod0 and my major number is %d.\n", majorNumber);
    return 0;
}

static void __exit myModule_exit(void)
{
    //undo the setup steps in reverse order
    device_destroy(myModule_class, MKDEV(majorNumber, 0));
    class_destroy(myModule_class); //this also unregisters the class
    unregister_chrdev(majorNumber, "myModule");
    printk(KERN_INFO "This device, class and major number were successfully destroyed.\n");
}

static int device_open(struct inode *inode_pointer, struct file *file_pointer)
{
    printk(KERN_INFO "The device was just opened!\n");
    return 0;
}

module_init(myModule_init);
module_exit(myModule_exit);
</source>
}}

For a more functional example of what a device driver can do, see [[LKM/chardev.c|here]] for a driver that can be both written to and read from (for example with the <i>cat</i> command), storing the last thing that was sent to it.

== File operation examples ==

Although open() was used for the worked example given above, any of the file operations in <linux/fs.h> can be implemented in order to extend the functionality of a driver. A few examples are given here; hopefully, reading these will give you an understanding of what is required to implement any of the operations in <linux/fs.h>.

=== OPEN ===

As we have seen, the open() definition looks like this:

{{code|text=
<source lang="c">
int (*open) (struct inode *, struct file *);            // first operation performed on a device file
</source>
}}

Writing a handler to map to this function is extremely simple. There is not much to interact with - just put any code that you want to run when the device is opened here:

{{code|text=
<source lang="c">
static int device_open(struct inode *inode_ptr, struct file *file_ptr)
{
    printk(KERN_INFO "The device was just opened!\n");
    return 0;
}
</source>
}}

=== RELEASE ===

The release() definition is very similar to the open() definition:

{{code|text=
<source lang="c">
int (*release) (struct inode *, struct file *);          // called when a file structure is being released
</source>
}}

Again, writing a handler is simply a matter of writing code that does whatever you want done when the file is released:

{{code|text=
<source lang="c">
static int device_close(struct inode *inode_ptr, struct file *file_ptr)
{
    printk(KERN_INFO "The device was just closed!\n");
    return 0;
}
</source>
}}

=== WRITE ===

The write() function is called whenever the file is written to. Its definition is as follows:

{{code|text=
<source lang="c">
ssize_t (*write) (struct file *, const char __user *, size_t, loff_t *);  // Used to send data to the device
</source>
}}

The write() function has several interesting parameters that are supplied when it is called. The second parameter is a pointer to the userland buffer containing the data that is being written, and the third parameter is the number of bytes the caller wants to write. Because the buffer lives in userland, it must be copied with copy_from_user() rather than dereferenced directly, and you must never copy more than your own buffer can hold. There is one more important thing to note about write(): the return value tells the caller how many bytes were actually consumed. If it is smaller than the requested length, most programs call write() again with the remainder, and if it is always 0 they may retry forever, hanging the writing process. The offset that is passed along each time, which starts out at 0, can be used to keep track of the total amount of data written so far.

{{code|text=
<source lang="c">
static ssize_t device_write(struct file *file_ptr, const char __user *input, size_t len, loff_t *offset)
{
    char buf[50];
    size_t bytes = min(len, sizeof(buf) - 1); //never copy more than the buffer can hold

    if (copy_from_user(buf, input, bytes))
        return -EFAULT; //the userland pointer was invalid
    buf[bytes] = 0x00; //terminate the string for printk
    printk(KERN_INFO "The device was just written with: %s\n", buf);
    (*offset) += bytes;
    return bytes; //tell the caller how many bytes were consumed
}
</source>
}}

Effectively, it works like this. If you send the device 10 bytes of data, the value of "len" will be 10, so "bytes" will also be 10. All 10 bytes are copied, "offset" is advanced to 10, and 10 is returned by the function. Because the whole request was consumed, the caller has no reason to call write() again. If you send it 100 bytes instead, only the first 49 fit in the buffer, so 49 is returned; a well-behaved caller then calls write() again with the remaining 51 bytes, and the function fires again until everything has been consumed.

=== READ ===

The read() function is called whenever the file is read from. Its definition is as follows:

{{code|text=
<source lang="c">
ssize_t (*read) (struct file *, char __user *, size_t, loff_t *);    // Used to retrieve data from the device
</source>
}}

When implementing this function, we must use the copy_to_user() function to send data back to userland. This function takes three arguments: the userland buffer to write to, the kernel buffer to copy from, and the number of bytes to copy. Programs such as cat keep calling read() until it returns 0 (end of file), so we use the offset to keep track of how much of the message has already been sent, and return 0 once all of it has been delivered:

{{code|text=
<source lang="c">
static ssize_t device_read(struct file *file_ptr, char __user *output, size_t len, loff_t *offset)
{
    static const char message[] = "hello there\n";
    size_t msg_len = sizeof(message) - 1; //length without the trailing null byte
    size_t bytes;

    printk(KERN_INFO "The device was just read! Sending it some data...\n");
    if (*offset >= msg_len)
        return 0; //everything has been sent already: end of file
    bytes = min_t(size_t, len, msg_len - *offset);
    if (copy_to_user(output, message + *offset, bytes))
        return -EFAULT; //the userland pointer was invalid
    (*offset) += bytes;
    return bytes;
}
</source>
}}
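
To see why the handler must eventually return 0, it helps to model both sides of the conversation in ordinary userland C. In the sketch below, demo_read() mimics the offset bookkeeping of a read handler (with memcpy() standing in for copy_to_user()), and drain() mimics a program like cat that keeps reading until it gets 0 back. Both function names are invented for this illustration.

```c
#include <string.h>
#include <sys/types.h>

static const char demo_message[] = "hello there\n";

/* Userland model of a read handler: copy at most len bytes of the message,
 * starting at *offset, and return 0 once everything has been delivered. */
static ssize_t demo_read(char *output, size_t len, off_t *offset)
{
    size_t msg_len = sizeof(demo_message) - 1;
    size_t bytes;

    if ((size_t)*offset >= msg_len)
        return 0; /* end of file */
    bytes = msg_len - (size_t)*offset;
    if (bytes > len)
        bytes = len;
    memcpy(output, demo_message + *offset, bytes); /* copy_to_user() in a driver */
    *offset += (off_t)bytes;
    return (ssize_t)bytes;
}

/* Model of the caller: read in pieces of at most chunk bytes until 0 comes back.
 * Returns the number of calls made, including the final one that returned 0. */
static int drain(char *dest, size_t dest_size, size_t chunk)
{
    off_t offset = 0;
    size_t total = 0;
    int calls = 0;
    ssize_t n;

    do {
        size_t room = dest_size - 1 - total;
        size_t want = chunk < room ? chunk : room;

        n = demo_read(dest + total, want, &offset);
        calls++;
        total += (size_t)n;
    } while (n > 0);
    dest[total] = '\0';
    return calls;
}
```

If demo_read() never returned 0, the loop in drain() would never end, which is the userland view of the endless firing described above.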
  
 
= Hooking system calls =
 
While device files are one way in which you can store data and communicate with processes, the real bare-bones interface to the kernel, and therefore a natural target for an LKM, is the system call. You will see it everywhere in assembly:

{{code|text=
<source lang="asm">
movl $1, %eax      # syscall number 1 (exit on 32-bit x86)
int $0x80          # trap into the kernel
</source>
}}

System calls are used for all of the basic functionality of the Linux kernel. Exiting a process, opening a file for reading, forking a new process, requesting new memory - all of these have an associated syscall. Incidentally, you can use the strace command to see which system calls are used by a program:

{{code|text=
<source lang="bash">
$ strace ls
</source>
}}

System calls are an exception to the protections that exist between userland and the kernel. Userland applications can't access the kernel; they can't access kernel memory, and they can't call kernel functions - the only way they can interact with the kernel directly is with a syscall. They do this by filling the appropriate registers with the correct values, then trapping into the kernel - on 32-bit x86 this is done with software interrupt 0x80, while 64-bit x86 uses the dedicated syscall instruction instead. Once you perform the trap, your process jumps to a previously defined location in the kernel, and the hardware knows that you are now operating in kernel mode rather than the more restrictive user mode.
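
From C you rarely write the trap instruction yourself, but you can still skip the usual library wrapper. A small sketch, assuming a Linux system with glibc, whose syscall() function loads the registers and performs the trap for you; raw_getpid() is just a name chosen for this example.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Ask the kernel for our process ID through the raw syscall interface,
 * bypassing the getpid() wrapper function. */
static pid_t raw_getpid(void)
{
    return (pid_t)syscall(SYS_getpid);
}
```

Running a program that calls raw_getpid() under strace should show a getpid entry, just as it would for the ordinary wrapper.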

The procedure that is jumped to when the trap is performed is called system_call(). When it is executed, it checks the table of system calls (sys_call_table) for the address of the kernel function to execute. Then it calls the function, waits for it to finish, does a bit of cleanup and checking, and returns to the process that made the system call.

If we want to change the way a system call works ("hook" it), we need to write our own function to implement it - usually the safest way to do this is by writing a function that executes a bit of our own code, then calls the original. We then need to change the pointer in the sys_call_table to point to our function instead of the normal one - this is the difficult part.

In older versions of the Linux kernel (before 2.6.x), finding the syscall table programmatically was easy. It was simply a symbol that you could declare as an external variable, and the kernel would resolve it for you automatically when the module was loaded:

{{code|text=
<source lang="c">
extern void *sys_call_table[];
</source>
}}

In all modern kernels, however, the syscall table is not exported by the kernel. This means that you must find the addresses of system calls by some other technique.

== Technique: /proc/kallsyms ==

The /proc/kallsyms file is generated automatically by the kernel, and is a comprehensive list of all of the global symbols (variables or functions) used by the kernel, along with their memory addresses. On some systems, when /proc/kallsyms is opened by a non-root user, the memory addresses of some or all of the symbols are replaced by zeros - this is a protective measure, to make it more difficult for non-root users to locate them.

{{code|text=
<source lang="bash">
$ head /proc/kallsyms

0000000000000000 A irq_stack_union
0000000000000000 A __per_cpu_start
ffffffff810002b8 T _stext
ffffffff81001000 T hypercall_page
ffffffff81001000 t xen_hypercall_set_trap_table
ffffffff81001020 t xen_hypercall_mmu_update
ffffffff81001040 t xen_hypercall_set_gdt
ffffffff81001060 t xen_hypercall_stack_switch
ffffffff81001080 t xen_hypercall_set_callbacks
ffffffff810010a0 t xen_hypercall_fpu_taskswitch
</source>
}}

The first column is the memory address of the symbol in question. The second column is the symbol's type - e.g. "A" for absolute or "T" for text. You can see a full list of symbol types in the man page for the nm command. The third column is the name of the symbol.
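
Because every line follows the same three-column layout, a single sscanf() call is enough to take one apart. The sketch below is userland C and assumes nothing beyond that layout; struct ksym and parse_ksym_line() are names invented for this example. The same parser would apply to lines of a System.map file, which uses the same columns.

```c
#include <stdio.h>

/* One parsed line: address, type letter and symbol name. */
struct ksym {
    unsigned long long address;
    char type;
    char name[128];
};

/* Parse a line of the form "address type name".
 * Returns 1 on success and 0 if the line does not match that layout. */
static int parse_ksym_line(const char *line, struct ksym *out)
{
    return sscanf(line, "%llx %c %127s",
                  &out->address, &out->type, out->name) == 3;
}
```

A program could feed each line of the file to parse_ksym_line() and stop at the first entry whose name matches the symbol it is looking for.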

So what if we want to use a symbol from the Linux kernel in our LKM? Since we are operating in kernel mode, this is easy. Just include <linux/kallsyms.h> and <linux/unistd.h> and use the kallsyms_lookup_name() function to get the memory address of the sys_call_table. (Note that since Linux 5.7 kallsyms_lookup_name() is no longer exported to modules, so this approach applies as written only to older kernels.)

{{code|text=
<source lang="c">
#include <linux/kallsyms.h>
#include <linux/unistd.h>

static unsigned long *syscall_table; //this should be declared outside of the init function

//inside the init function:
syscall_table = (unsigned long *) kallsyms_lookup_name("sys_call_table");
</source>
}}

This gives us a pointer to the syscall table, allowing us to look up the handler of any system call we want by using the __NR_* index macros from <linux/unistd.h>. For example, if we want to extract the sys_write system call:

{{code|text=
<source lang="c">
asmlinkage long (*sys_write)(unsigned int, const char __user *, size_t);
//the datatype & params for this function prototype come from fs/read_write.c
sys_write = (void *) syscall_table[__NR_write];
</source>
}}

Now the address of the original sys_write() has been stored in a function pointer, which we can call to access the original write whenever we want.

Note that you can find a full list of these index macros in the /usr/include/asm/unistd.h files (there are multiple for different architectures, e.g. "unistd_64.h"). You don't need to use the macros at all if you don't want to - they just map to the relevant syscall number. For example, on 32-bit x86 "__NR_write" is equivalent to 4, because write is syscall 4 there (on x86-64 it is 1). If you wanted, you could just do:

{{code|text=
<source lang="c">
sys_write = (void *) syscall_table[4]; //32-bit x86 only; the macro is the portable choice
</source>
}}

Note that if /proc/kallsyms is not available for whatever reason, the symbol table can sometimes be found at '/boot/System.map-$(uname -r)'. Unlike kallsyms, however, there is no easily included header with functions that access the System.map file, so you will need to open the file for reading and parse it manually for the addresses of the symbols you want to hook.

== Hooking the syscall ==

Whichever technique you used to do it, you now have the memory address of the syscall table, and the ability to load any entry you want from it. Now we can begin the process of hooking the syscall table. We do this in a few steps - for instance, say sys_write is the syscall we have chosen to hook:

#As demonstrated above, extract the original syscall (sys_write) from the syscall table.
#Write a function that does some stuff, then calls the original sys_write to avoid breaking anything.
#Overwrite sys_write in the syscall table with our new function.

However, before we can do this, there is one problem we must address: on modern versions of Linux, the syscall table is not writable - we will need to change this if we want to be able to overwrite the address of sys_write in the syscall table. The cr0 register is responsible for our lack of access - it is one of the control registers that affect basic CPU functionality. Bit 16 of this register (counting from 0, i.e. the mask 0x10000) is the "Write Protect" bit, which when set prevents even kernel-mode code from writing to read-only memory pages. But since we're the kernel, of course, we can just... unset the bit. We do this with the write_cr0() function:

{{code|text=
<source lang="c">
write_cr0 (read_cr0() & (~0x10000));
</source>
}}
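
The bit arithmetic in that one-liner can be checked in userland without touching any real control register. This sketch only manipulates an ordinary integer; CR0_WP, wp_disable(), wp_enable() and wp_is_set() are names made up for the illustration.

```c
#include <stdint.h>

/* Write Protect is bit 16 of cr0 (counting from 0), i.e. the mask 0x10000. */
#define CR0_WP (UINT64_C(1) << 16)

/* Return the given cr0 value with the Write Protect bit cleared. */
static uint64_t wp_disable(uint64_t cr0)
{
    return cr0 & ~CR0_WP;
}

/* Return the given cr0 value with the Write Protect bit set. */
static uint64_t wp_enable(uint64_t cr0)
{
    return cr0 | CR0_WP;
}

/* Report whether the Write Protect bit is set in the given value. */
static int wp_is_set(uint64_t cr0)
{
    return (cr0 & CR0_WP) != 0;
}
```

Clearing with AND-NOT and setting with OR leaves every other bit of the register exactly as it was, which is why the read-modify-write form is used instead of writing a constant.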

Now that we have the permissions to do everything we need, we can get started. First of all, we extract the original sys_write call from the syscall table:

{{code|text=
<source lang="c">
asmlinkage long (*original_write)(unsigned int, const char __user *, size_t);
//the datatype & params for this function prototype come from fs/read_write.c
original_write = (void *) syscall_table[__NR_write];
</source>
}}

The next step is to write the function that we will hook sys_write with. This function will do some stuff, then call the original sys_write that we saved:

{{code|text=
<source lang="c">
asmlinkage long new_write(unsigned int fd, const char __user *buf, size_t count)
{
    printk(KERN_INFO "Write successfully hooked!\n");
    return (*original_write)(fd, buf, count);
}
</source>
}}

Now that everything is set up, we overwrite sys_write in the syscall table with the address of our new function:

{{code|text=
<source lang="c">
syscall_table[__NR_write] = (unsigned long) new_write;
</source>
}}

As long as everything was done correctly, you should see a message in dmesg every time sys_write is called, which will be a lot.
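
Stripped of the kernel specifics, the three steps are a save, a wrap and a swap on a table of function pointers. The following userland sketch shows only that pattern; demo_table stands in for the syscall table and original_double() for the original handler, and all of the names are hypothetical.

```c
typedef long (*demo_call_t)(long arg);

/* The "original syscall": doubles its argument. */
static long original_double(long arg)
{
    return arg * 2;
}

/* Hypothetical one-entry stand-in for the syscall table. */
static demo_call_t demo_table[1] = { original_double };

static demo_call_t saved_entry; /* original handler, kept for the wrapper and for unhooking */
static int hook_hits;           /* how many calls went through the hook */

/* The hook: do something of our own, then defer to the original. */
static long hooked_double(long arg)
{
    hook_hits++;
    return saved_entry(arg);
}

/* Steps 1 and 3: save the original entry, then overwrite it. */
static void install_hook(void)
{
    saved_entry = demo_table[0];
    demo_table[0] = hooked_double;
}

/* Unhooking: write the saved original back into the table. */
static void remove_hook(void)
{
    demo_table[0] = saved_entry;
}
```

Callers that go through the table see the same results before, during and after the hook; only the side effect (here, the counter) reveals that the hook ran.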

== Cleaning Up ==

Hooking syscalls involves some pretty in-depth modification of the kernel, so it's very important to clean up properly in your exit function. If you don't, it's very likely that your module will cause a kernel panic and crash your system when it is unloaded.

Firstly, unset bit 16 of the cr0 register again, just to make sure we still have full write access to the syscall table:

{{code|text=
<source lang="c">
write_cr0 (read_cr0() & (~0x10000));
</source>
}}

Then we need to write the original address of sys_write back to the syscall table, thereby unhooking it:

{{code|text=
<source lang="c">
syscall_table[__NR_write] = (unsigned long) original_write;
</source>
}}

Then finally, we set bit 16 of the cr0 register again, restoring write protection to prevent any instability:

{{code|text=
<source lang="c">
write_cr0 (read_cr0() | (0x10000));
</source>
}}

An example of an LKM that hooks the sys_write call can be found [[LKM/syscall.c|here]] for your viewing pleasure.

= Further reading =

[http://www.tldp.org/LDP/lkmpg/2.6/html/index.html The Linux Kernel Module Programming Guide] - an outdated but solid tutorial covering many of the concepts that will help you to understand the Linux kernel.

[http://lxr.free-electrons.com/source/include/linux/fs.h Online reference for <linux/fs.h>] - see line 1461 for the definitions of various file operations that can be implemented by a character device.

[https://filippo.io/linux-syscall-table/ Searchable Linux syscall table] - a good online reference for syscall numbers, which can be used to determine the offset in sys_call_table where a syscall will be located.

[https://github.com/mncoppola/suterusu Suterusu] - an example of a relatively recent LKM rootkit written in C.

Latest revision as of 14:05, 26 June 2016

LKM stands for "Linux Kernel Module" or "Loadable Kernel Module". As the name implies, it is a way to allow code to interact directly with the kernel and extend its functionality. The ability to insert modular components into the kernel allows it to remain relatively lightweight, as it does not need to include every driver ever created. The ability to load modules on-the-fly also saves us from having to recompile every time a change needs to be made.

It goes without saying that you need root to modify the kernel. With this restriction in mind, however, LKMs can be very powerful if used correctly, since the kernel operates under significantly elevated privileges compared to userland. In particular, the functionality provided by extending the kernel can be used to great effect in the development of Rootkits.

LKMs interact with your system on the kernel level, executing with the highest possible level of privilege. A poorly-designed kernel module may make your OS unstable, corrupt your filesystem and even brick your computer. You have been warned.

You can see a list of currently loaded kernel modules in two ways:

 
$ lsmod
$ cat /proc/modules
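
The lsmod output is essentially a formatted view of /proc/modules, so a program can read the same information directly. A minimal userland sketch, assuming the usual layout in which each line begins with the module name, its size in bytes and its reference count; struct module_entry and parse_module_line() are names invented for this example.

```c
#include <stdio.h>

/* The first three fields of a /proc/modules line. */
struct module_entry {
    char name[64];
    unsigned long size; /* size of the module in bytes */
    int refcount;       /* number of current users of the module */
};

/* Parse the leading "name size refcount" fields of one line.
 * Returns 1 on success and 0 if the line does not start with those fields. */
static int parse_module_line(const char *line, struct module_entry *out)
{
    return sscanf(line, "%63s %lu %d",
                  out->name, &out->size, &out->refcount) == 3;
}
```

Any remaining fields on the line are simply ignored by this parser.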
 

You can (as root) add modules to your kernel with the insmod command, and remove them again with rmmod:

 
$ insmod modname.ko
$ rmmod modname
 

These two utilities provide a simple, clean way to insert or remove modules from the kernel. If you need more advanced control over the insertion, removal and alteration of modules in the kernel, use the more fully-featured modprobe utility instead.


Writing a basic LKM

Linux kernel modules are written in C and compiled from one or more source files into a kernel object (.ko) file. In order to write an LKM, you will need a strong grasp of the fundamentals of C programming and at least a basic understanding of the way linux manages files, processes and devices.

Although they are written in C, there are several differences you should keep in mind before you begin writing your first module.

  • There is no standard entry point for an LKM - no main() function. Instead, an initialization function runs and terminates when the module is first loaded, setting itself up to handle any requests it receives - an event-driven model.
  • LKMs operate at a much higher level of privilege than userland programs. In addition to being able to do and access more, this means that they are assigned higher priority when handing out CPU cycles and resources. A poorly-written LKM can easily consume too much of a machine's processing power for anything else to function properly.
  • LKMs do not have automatic cleanup, garbage collection, or much of the other convenience functionality that userland applications do. If you allocate memory without freeing it, it will remain allocated. If your module continues to allocate memory over time, it will negatively affect your system's performance.
  • LKMs can be simultaneously accessed by multiple processes, and they need to be able to gracefully handle being interrupted. If two processes ask a module for output at the same time, it needs to be able to keep track of which is which and avoid mixing the data.

Essential includes

A large number of low-level and kernel-level headers are available for inclusion, as we will see when designing more fully-featured modules. However, in order to support a module's basic functionality, we will need only three includes:

 
#include <linux/init.h>
#include <linux/module.h>
#include <linux/kernel.h>
 

<linux/init.h> contains macros such as __init and __exit, which are used to mark up your initialization and cleanup functions.

<linux/module.h> is the core header for loading modules into the kernel. It includes the macros and functions that allow you to register various aspects of your module with the kernel.

<linux/kernel.h> provides various functions and macros for interacting with the kernel - for example, this header is where we find the printk() function.

Registering your module

Introduced by <linux/module.h> is a series of macros used to declare information about your LKM. This information will be displayed when someone uses a command like modinfo on your module:

 
MODULE_LICENSE("GPL");
MODULE_AUTHOR("Dade Murphy");
MODULE_DESCRIPTION("1507 systems in one day.");
MODULE_VERSION("0.1");
 

Registering parameters

It is possible to pass command-line arguments to the module at the time it is inserted into the kernel. In order to specify a parameter for your module, you must first create a static variable initialized with a default value. As a rule, variables within kernel modules should be static and not global, as global variables are shared kernel-wide.

The next step is to register the parameter with the module_param() function - and optionally with MODULE_PARM_DESC(), which is used to give the parameter some descriptive text for modinfo. The module_param() function takes three arguments:

  • The variable used to store the parameter.
  • The datatype of the parameter, which can be one of: byte, int, uint, long, ulong, short, ushort, bool, an inverse Boolean invbool, or a char pointer charp.
  • The permissions of the parameter - these can either be classic octal permissions (e.g. "0664") or the macro equivalents (e.g. "S_IRUSR|S_IWUSR").

For example:

 
static char *arg1 = "default";
module_param(arg1, charp, 0664);
MODULE_PARM_DESC(arg1, "The description to display in /var/log/kern.log");
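
To see how the octal and macro forms of the permissions relate, the values can be checked in user space, where <sys/stat.h> defines the same S_I* constants. This is only a sketch for illustration; param_perms() is a hypothetical helper, not something the kernel provides.

```c
#include <sys/stat.h>

/* Build the same value as octal 0664 from the symbolic constants. */
unsigned int param_perms(void)
{
	return S_IRUSR | S_IWUSR	/* owner: read + write */
	     | S_IRGRP | S_IWGRP	/* group: read + write */
	     | S_IROTH;			/* others: read only   */
}
```

Either spelling can be passed as the third argument to module_param(); the octal form is shorter, the macro form is more explicit about who gets which access.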
 

Initialization and cleanup

In order for your module to actually do anything after insertion, it needs an __init and __exit function. Any setup, preparation of devices, hooking of syscalls and so on should go into the initialization function. Any cleanup, deallocation of memory, and restoration of changes should go into the cleanup function.

To define the LKM initialization function, create a static function with the "int __init" datatype, which returns 0 on success. This is the function that will execute when the module is loaded into the kernel. The __init macro specifies that the function is only used at initialization time and that it can be discarded after that point:

 
static int __init myModule_init(void)
{
	printk(KERN_INFO "Hello %s from this example LKM!\n", arg1);
	return 0;
}
 

The exit function is similar - it should be of type "void __exit", and is executed when the module is unloaded from the kernel:

 
static void __exit myModule_exit(void)
{
	printk(KERN_INFO "Goodbye %s from this example LKM!\n", arg1);
}
 

After you have defined your init and exit functions, you must register them so that the kernel knows about them:

 
module_init(myModule_init);
module_exit(myModule_exit);
 

Example code

Based on all of the examples we have given so far, it is possible to construct a (very basic) kernel module. It won't do much besides print to the kernel log when it is loaded or unloaded, but it should compile into a kernel object without any issues.

module.c

 
#include <linux/init.h> 
#include <linux/module.h>
#include <linux/kernel.h>
 
MODULE_LICENSE("GPL");
MODULE_AUTHOR("Dade Murphy");
MODULE_DESCRIPTION("1507 systems in one day.");
MODULE_VERSION("0.1");
 
static char *arg1 = "default";
module_param(arg1, charp, 0664);
MODULE_PARM_DESC(arg1, "The description to display in /var/log/kern.log");
 
static int __init myModule_init(void)
{
	printk(KERN_INFO "Hello %s from this example LKM!\n", arg1);
	return 0;
}
 
static void __exit myModule_exit(void)
{
	printk(KERN_INFO "Goodbye %s from this example LKM!\n", arg1);
}
 
module_init(myModule_init);
module_exit(myModule_exit);
 

Compiling your LKM

In order to compile the module we have just written, you will need to write a Makefile that looks something like this:

Makefile

 
obj-m+=module.o
 
all:
	make -C /lib/modules/$(shell uname -r)/build/ M=$(PWD) modules
clean:
	make -C /lib/modules/$(shell uname -r)/build/ M=$(PWD) clean
 

The first line of this Makefile is a goal definition, which defines the object to be built. The "obj-m" keyword defines a loadable module goal, as opposed to something like "obj-y" which would be a built-in object goal. The remaining lines are more like a traditional Makefile. The -C parameter makes make change into the kernel build directory before performing any tasks, using "uname -r" to find the build directory for the running kernel. The "M=$(PWD)" part tells the kernel build system where the actual project files exist, and the "modules" target is the default target for kernel modules.

In order to compile the module, simply run "make" in the same directory as the Makefile and module.c (root is not needed to build the module, only to load it). Assuming your code did not contain any errors and you have the correct version of the linux kernel headers, it should compile, producing a number of files in the current directory. One of these files will be called something like module.ko - this is your kernel object file.

In order to insert it into the kernel, do:

 
$ insmod module.ko arg1=null
 

Then, to confirm it has been inserted:

 
$ lsmod | grep module
 
module        16384    0
 

To unload it from the kernel:

 
$ rmmod module
 

Now you have managed to compile your module, insert it and remove it from the kernel, but how do you know if it actually worked? We used the printk() function to print to the kernel message buffer, so let's check that:

 
$ dmesg
 
[ 3728.160984] Hello null from this example LKM!
[ 3730.248728] Goodbye null from this example LKM!
 
$ tail -n 2 /var/log/kern.log
 
Jun 18 20:04:58 Gibson kernel: [ 3728.160984] Hello null from this example LKM!
Jun 18 20:05:00 Gibson kernel: [ 3730.248728] Goodbye null from this example LKM!
 

Compiling multiple source files

If, for whatever reason, your LKM is composed of multiple source files, your Makefile will look a little different. All you need to do is invent an object name for your combined module, then tell make what object files are part of that module. For example, if you have source files named start.c and stop.c:

Makefile

 
obj-m+=startstop.o
startstop-objs := start.o stop.o
 
all:
	make -C /lib/modules/$(shell uname -r)/build/ M=$(PWD) modules
clean:
	make -C /lib/modules/$(shell uname -r)/build/ M=$(PWD) clean
 
 

Permanently adding LKMs

If you're writing a rootkit as a kernel module, you're going to want it to persist and load automatically on boot. In order to do this, you must first install it. First of all, copy it to the correct folder and run depmod:

 
$ cp module.ko /lib/modules/$(uname -r)/misc/
$ depmod
 

The depmod utility will check all of the modules in the /lib/modules/whatever/ folder for dependencies, reading each module and determining what symbols it exports and what symbols it needs. This adds it to modprobe's database, but doesn't set it to automatically load on boot. You must add the name of your LKM (without the .ko) to either /etc/modules or /etc/modules.conf, depending on what distro you're using.

This tells modprobe to automatically try to load that module on boot. To test it, try to reboot and see if your module appears in the output of lsmod. If it does, your module has been loaded and will persist between reboots.

Creating character devices

If you have followed this article to this point, you are able to write a basic kernel module with parameters and define its behavior when loaded or unloaded. Writing your own devices is one of the ways you can extend kernel functionality, allowing for interaction between userland and the mysteries of the kernel.

Like the other device objects found in the /dev folder, a character device is an object that behaves like a regular file, allowing you to read from it or write to it. Character devices specifically behave like a pipe, meaning that data is written to or read from them instantly in a byte-by-byte stream.

You can identify the type of a device in the /dev folder when running ls:

 
$ ls -l /dev
 
crw------- 1 root root   10, 175 Jun 18 09:27 agpgart
crw------- 1 root root   10, 235 Jun 18 09:27 autofs
crw------- 1 root root   10, 234 Jun 18 09:27 btrfs-control
crw------- 1 root root    5,   1 Jun 18 09:27 console
crw------- 1 root root   10,  62 Jun 18 09:27 cpu_dma_latency
crw------- 1 root root   10, 203 Jun 18 09:27 cuse
brw-rw---- 1 root disk  254,   0 Jun 18 09:27 dm-0
brw-rw---- 1 root disk  254,   1 Jun 18 11:12 dm-1
brw-rw---- 1 root disk  254,   2 Jun 18 11:12 dm-2
brw-rw---- 1 root disk  254,   3 Jun 18 16:36 dm-3
crw-rw---- 1 root video  29,   0 Jun 18 09:27 fb0
 

As you can see, all character devices have a 'c' in the first column. The other devices, which are identified by a 'b', are block devices, which are buffered and support reads, writes and seeks as though they were regular files.

Every linux device has a major and minor number. The major number is used by the kernel to identify the correct device driver when the device is accessed, while the minor number is internal and is used differently by different drivers. Listing a device also tells you the major and minor numbers of that device.

For example, in the ls output above we can see that the last line contains the numbers "29, 0". This tells us that the major number for /dev/fb0 is 29, and the minor number is 0. This will be important later, as our LKM will need these numbers to successfully create and interact with a device.
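
The same numbers can be read from a program with stat(). The sketch below is a user-space illustration, assuming a glibc-style <sys/sysmacros.h> that provides major(), minor() and makedev(); device_numbers() is a hypothetical helper written for this example.

```c
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>

/*
 * Look up the device numbers behind a node such as /dev/fb0.
 * Returns 'c' or 'b' for character/block devices, 0 for any other
 * file type, and -1 if the path cannot be examined.
 */
int device_numbers(const char *path, unsigned int *maj_out, unsigned int *min_out)
{
	struct stat st;

	if (stat(path, &st) != 0)
		return -1;
	if (!S_ISCHR(st.st_mode) && !S_ISBLK(st.st_mode))
		return 0;
	*maj_out = major(st.st_rdev);	/* selects the driver */
	*min_out = minor(st.st_rdev);	/* interpreted by that driver */
	return S_ISCHR(st.st_mode) ? 'c' : 'b';
}
```

Inside the kernel the equivalent of makedev() is the MKDEV() macro, which we will use later when creating the device.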

Additional includes

Our module will need the three includes used in the previous examples for basic kernel functionality. In addition, we will need to include the following headers:

 
#include <linux/device.h>		
#include <linux/fs.h>			
#include <asm/uaccess.h>		
 

<linux/device.h> is needed to support the kernel driver model, allowing us to register devices with the kernel.

<linux/fs.h> is needed for linux file system support, allowing us to map our device to a node on the root fs.

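<asm/uaccess.h> is used for the copy_to_user() function, which sends data from kernel land to userland via the device. On recent kernels, include <linux/uaccess.h> instead, which pulls in the architecture-specific header for you.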

Defining file operations

Before we get started registering a device, we need to define our file operations. These define the ways in which your driver responds when userland interacts with the device, once it's created. They are collected in a "struct file_operations", which is a struct defined in <linux/fs.h>.

We can see a full list of the operations we can define in the file_operations struct, in <linux/fs.h>:

 
struct file_operations {
   struct module *owner;                             
   loff_t (*llseek) (struct file *, loff_t, int);    
   ssize_t (*read) (struct file *, char __user *, size_t, loff_t *); 
   ssize_t (*write) (struct file *, const char __user *, size_t, loff_t *); 
   ssize_t (*aio_read) (struct kiocb *, const struct iovec *, unsigned long, loff_t);  
   ssize_t (*aio_write) (struct kiocb *, const struct iovec *, unsigned long, loff_t); 
   ssize_t (*read_iter) (struct kiocb *, struct iov_iter *);          
   ssize_t (*write_iter) (struct kiocb *, struct iov_iter *);       
   int (*iterate) (struct file *, struct dir_context *);       
   unsigned int (*poll) (struct file *, struct poll_table_struct *);    
   long (*unlocked_ioctl) (struct file *, unsigned int, unsigned long);
   long (*compat_ioctl) (struct file *, unsigned int, unsigned long);  
   int (*mmap) (struct file *, struct vm_area_struct *);          
   int (*mremap)(struct file *, struct vm_area_struct *);         
   int (*open) (struct inode *, struct file *);          
   int (*flush) (struct file *, fl_owner_t id);        
   int (*release) (struct inode *, struct file *);         
   int (*fsync) (struct file *, loff_t, loff_t, int datasync);  
   int (*aio_fsync) (struct kiocb *, int datasync);         
   int (*fasync) (int, struct file *, int);                 
   int (*lock) (struct file *, int, struct file_lock *);    
   …and so on
};
 

So let's say we just want our driver to do something every time the device is opened, and not do anything else with it (for simplicity's sake). We can see the file_operations struct has a definition for an open() function:

 
   int (*open) (struct inode *, struct file *);             // first operation performed on a device file
 

We need to implement the function we want to map to open() in our own code, first of all:

 
static int device_open(struct inode *inode_pointer, struct file *file_pointer)
{
	printk(KERN_INFO "The device was just opened!\n");
	return 0;
}
 

Now that we've done that, we can create a "struct file_operations" variable that maps our function to the declaration in <linux/fs.h>:

 
static struct file_operations fops =
{
	.open = device_open,	//map device_open() to the open() file op
};
 

Note that if your struct declaration appears before you define device_open() in the code, you will need a function prototype:

 
static int device_open(struct inode *, struct file *);
 

In this case we have used the open() hook, but you can do this with any of the declarations in the file_operations struct that you want to hook.
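
The mechanism behind this is nothing more than a struct of function pointers filled in with designated initializers. The following user-space sketch shows the idea with a cut-down stand-in; struct mini_fops, demo_open() and call_open() are hypothetical names for illustration and are not kernel APIs.

```c
#include <stddef.h>

/* A cut-down stand-in for struct file_operations (illustration only). */
struct mini_fops {
	int (*open)(const char *name);
	int (*release)(const char *name);
};

int demo_open(const char *name)
{
	(void)name;	/* unused in this sketch */
	return 0;	/* 0 means success, as in the kernel */
}

/* Only .open is named; every other member is implicitly NULL. */
struct mini_fops demo_fops = {
	.open = demo_open,
};

/* Call the open hook if one was provided; treat a missing hook as success. */
int call_open(const struct mini_fops *ops, const char *name)
{
	if (ops->open == NULL)
		return 0;
	return ops->open(name);
}
```

Members you do not name are left as NULL, which is why a driver only has to mention the operations it actually implements.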

Registering a major number

Now that we have a struct that defines how our LKM will interact with the device file once it's created, we can begin going through the steps of creating the device.

If you want to create a device file, you will need to register a major number in your init function. This requires you to use the register_chrdev() function, which takes 3 arguments. The first argument is the major number you want to request - or 0 to have an unused number assigned dynamically. The second argument is a string containing the name of the device you would like to register. The third argument is the address of the file_operations struct we mentioned earlier. When 0 is passed as the first argument, the function returns the major number that was assigned; when a specific number is requested, it returns 0 on success.

Here is an example of how registration of a major number works:

 
static int majorNumber; //this should be defined outside of the init function, just in case
majorNumber = register_chrdev(0, "myModule", &fops);
 

If successful, a positive integer will be returned into majorNumber, which is the major number you have been assigned. Otherwise, the return value will be negative. There is a corresponding unregister_chrdev() function which should be part of your cleanup function:

 
unregister_chrdev(majorNumber, "myModule");
 

Registering a class

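Devices have both a device name and a class name. Once you have registered a major number for the device, you must register a class. (The two-argument form of class_create() shown here applies to kernels before 6.4; from 6.4 onward it takes only the name.)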

 
static struct class* myModule_class = NULL;	//this should be defined outside of the init function
myModule_class = class_create(THIS_MODULE, "myMod");
 

After you create the class, there will be a corresponding folder at /sys/class/myMod. As with the major number, there is a corresponding function to destroy the class, which should be part of your cleanup before the major number is unregistered. Note that class_destroy() unregisters the class itself, so class_unregister() must not also be called on a class made with class_create():

 
class_destroy(myModule_class);
 

Registering the device

Now that we have registered a major number and a class for the device, we are finally ready to create the device itself, which requires that you have both the major number and the device class:

 
static struct device* myModule_device = NULL;	//this should be defined outside of the init function
myModule_device = device_create(myModule_class, NULL, MKDEV(majorNumber, 0), NULL, "mod0");
 

Once this is done, your device file should have been faithfully created at /dev/mod0. If it's there, then your device was successfully created. Remember that we mapped open() on our device to our device_open() function in the file_operations pointer that we passed when we registered a major number. Therefore, the function should execute every time someone tries to open the device. You can test to see if the open() hook works by using any command that opens the device:

 
$ cat /dev/mod0
 

Then check dmesg, and you should see something like:

 
[22973.469043] The device was just opened!
 

If you see that, then congratulations! You have successfully created a character driver that prints to the kernel log every time it is opened. Note that, like the other functions, there is a corresponding function to destroy the device. This should be part of your cleanup, before you destroy the class or unregister the major number:

 
device_destroy(myModule_class, MKDEV(majorNumber, 0));
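
The ordering matters in both directions: things are torn down in the reverse of the order they were set up, and if a later setup step fails, the earlier ones have to be undone. Kernel code usually expresses this with goto labels. The sketch below models that pattern in user space; the get_*/put_* functions are hypothetical stand-ins for register_chrdev(), class_create(), device_create() and their counterparts, and simply record the order in which they ran.

```c
#include <string.h>

char trace[64];		/* records the order of operations */
static void note(const char *s) { strcat(trace, s); }

/* Stand-ins for register_chrdev(), class_create() and device_create(). */
static int get_major(int fail)  { if (fail) return -1; note("M"); return 0; }
static int get_class(int fail)  { if (fail) return -1; note("C"); return 0; }
static int get_device(int fail) { if (fail) return -1; note("D"); return 0; }
static void put_device(void) { note("d"); }
static void put_class(void)  { note("c"); }
static void put_major(void)  { note("m"); }

/* fail_at: 0 = nothing fails, 1..3 = that step fails. */
int model_init(int fail_at)
{
	trace[0] = '\0';
	if (get_major(fail_at == 1))
		return -1;
	if (get_class(fail_at == 2))
		goto err_major;
	if (get_device(fail_at == 3))
		goto err_class;
	return 0;

err_class:
	put_class();	/* undo step 2 */
err_major:
	put_major();	/* undo step 1 */
	return -1;
}

/* Full teardown, in the reverse order of setup. */
void model_exit(void)
{
	put_device();
	put_class();
	put_major();
}
```

In a real module the checks would test the actual return values (a negative major number, or an error pointer from class_create() and device_create()), but the shape of the function stays the same.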
 

Example drivers

If you put all the sample code above into a single program, you get a functional (though pointless) device driver that simply writes to the kernel output every time it is opened.

device.c

 
#include <linux/init.h>
#include <linux/module.h>
#include <linux/device.h>
#include <linux/kernel.h>
#include <linux/fs.h>
#include <asm/uaccess.h>
 
MODULE_LICENSE("GPL");
MODULE_AUTHOR("Dade Murphy");
MODULE_DESCRIPTION("1507 systems in one day");
MODULE_VERSION("0.1");
 
//initialize variables used by the driver
 
static int majorNumber;
static struct class* myModule_class = NULL;
static struct device* myModule_device = NULL;
 
//function prototype for our file operations struct
 
static int device_open(struct inode *, struct file *);
 
//struct mapping file operations to custom functions
//this is passed when registering a major number
 
static struct file_operations fops =
{
	.open = device_open
};
 
static int __init myModule_init(void)
{
	majorNumber = register_chrdev(0, "myModule", &fops);
	myModule_class = class_create(THIS_MODULE, "myModule");
	myModule_device = device_create(myModule_class, NULL, MKDEV(majorNumber, 0), NULL, "mod0");
	printk(KERN_INFO "Hello, I can be found at /dev/mod0 and my major number is %d.\n", majorNumber);
	return 0;
}
 
static void __exit myModule_exit(void)
{
	device_destroy(myModule_class, MKDEV(majorNumber, 0));
	class_destroy(myModule_class);	//this also unregisters the class
	unregister_chrdev(majorNumber, "myModule");
	printk(KERN_INFO "This device, class and major number were successfully destroyed.\n");
}
 
static int device_open(struct inode *inode_pointer, struct file *file_pointer)
{
	printk(KERN_INFO "The device was just opened!\n");
	return 0;
}
 
module_init(myModule_init);
module_exit(myModule_exit);
 

For a more functional example of what a device driver can do, check here for a piece of code that allows data to be both written to and read from the device with the cat command, storing the last thing that was sent to it.

File operation examples

Although open() was used for the worked example given above, any of the file operations in <linux/fs.h> can be implemented in order to extend the functionality of a driver. A few examples are given here; hopefully, reading these will give you an understanding of what is required to implement any of the operations in <linux/fs.h>.

OPEN

As we have seen, the open() definition looks like this:

 
int (*open) (struct inode *, struct file *);             // first operation performed on a device file
 

Writing a handler to map to this function is extremely simple. There is not much to interact with - just put any code that you want to run when the device is opened here:

 
int device_open(struct inode *inode_ptr, struct file *file_ptr)
{
	printk(KERN_INFO "The device was just opened!\n");
	return 0;
}
 

RELEASE

The release() definition is very similar to the open() definition:

 
int (*release) (struct inode *, struct file *);          // called when a file structure is being released
 

Again, writing a handler is simply a matter of writing code that does whatever you want done when the file is released:

 
int device_close(struct inode *inode_ptr, struct file *file_ptr)
{
	printk(KERN_INFO "The device was just closed!\n");
	return 0;
}
 

WRITE

The write() function is called whenever the file is written to. Its definition is as follows:

 
ssize_t (*write) (struct file *, const char __user *, size_t, loff_t *);   // Used to send data to the device
 

The write() function has several interesting parameters that are supplied when it is called. The second parameter is a pointer to the userland buffer holding the data that is being written, the third parameter contains the number of bytes the caller wants to write, and the fourth is a pointer to the current offset into the file. There is one important thing to note about write(): its return value tells the caller how many bytes were actually consumed. If it returns less than "len", most programs will call it again with the remaining data, so a handler that always returns 0 makes the writing process loop forever. The simplest correct handler therefore reports every byte as consumed, even though it only keeps the first 49:

 
static ssize_t device_write(struct file *file_ptr, const char __user *input, size_t len, loff_t *offset)
{
	char buf[50];
	size_t n = min(len, sizeof(buf) - 1);	//never copy more than the buffer holds

	if (copy_from_user(buf, input, n))
		return -EFAULT;			//the userland pointer was bad
	buf[n] = 0x00;
	printk(KERN_INFO "The device was just written with: %s\n", buf);
	(*offset) += len;
	return len;				//report every byte as consumed
}
 

Effectively, the way this works is as follows. If you send it 10 bytes of data, the value of "len" will be 10. Up to 49 bytes are copied into "buf" (so all 10 here), the offset is advanced from 0 to 10, and 10 is returned by the function. Because the return value equals the amount the caller asked to write, the caller knows that nothing is left over and does not call write() again.

READ

The read() function is called whenever the file is read from. Its definition is as follows:

 
ssize_t (*read) (struct file *, char __user *, size_t, loff_t *);    // Used to retrieve data from the device
 

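When implementing this function, we must use the copy_to_user() function to send data back to userland. This function takes three arguments: the userland buffer to write to, the kernel buffer to copy from, and the number of bytes to copy. It returns the number of bytes that could not be copied, so anything other than 0 indicates a failure. Programs such as cat keep calling read() until it returns 0, so we use the offset to keep track of how much of the message has already been sent, and return 0 once nothing is left:

 
static ssize_t device_read(struct file *file_ptr, char __user *output, size_t len, loff_t *offset)
{
	static const char message[] = "hello there\n";
	size_t msg_len = sizeof(message) - 1;	//length without the trailing NUL
	size_t bytes;

	printk(KERN_INFO "The device was just read! Sending it some data...\n");
	if (*offset >= msg_len)
		return 0;			//everything has been sent: end of file
	bytes = min(len, msg_len - (size_t)*offset);
	if (copy_to_user(output, message + *offset, bytes))
		return -EFAULT;			//the userland pointer was bad
	(*offset) += bytes;
	return bytes;				//only report what was really copied
}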
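
The call-until-zero behaviour is easy to model in user space. The sketch below is an illustration only: model_read() is a hypothetical stand-in for a read handler that uses memcpy() where the kernel would use copy_to_user(), and count_calls() plays the part of a program like cat, calling it until it reports 0.

```c
#include <string.h>
#include <sys/types.h>

static const char message[] = "hello there\n";

/* User-space stand-in for a read handler: copy what is left, advance the offset. */
ssize_t model_read(char *out, size_t len, off_t *offset)
{
	size_t msg_len = sizeof(message) - 1;	/* ignore the trailing NUL */
	size_t left;

	if ((size_t)*offset >= msg_len)
		return 0;			/* end of file */
	left = msg_len - (size_t)*offset;
	if (len > left)
		len = left;
	memcpy(out, message + *offset, len);
	*offset += (off_t)len;
	return (ssize_t)len;
}

/* Call model_read() the way cat does: repeat until it reports 0. */
int count_calls(size_t chunk, size_t *total)
{
	char buf[64];
	off_t offset = 0;
	ssize_t n;
	int calls = 0;

	if (chunk > sizeof(buf))
		chunk = sizeof(buf);	/* never ask for more than buf holds */
	*total = 0;
	do {
		n = model_read(buf, chunk, &offset);
		*total += (size_t)n;
		calls++;
	} while (n > 0);
	return calls;
}
```

With a large buffer the 12-byte message arrives in one call and a second call returns 0; with a 5-byte buffer it takes three calls to deliver the data and a fourth to signal the end.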
 

Hooking system calls

While device files are one way in which you can store data and communicate with processes, the real bare-bones way in which an LKM can interact with the kernel is the system call. As seen ubiquitously in assembly:

 
movl $1, %eax	       
int $0x80	
 

System calls are used for all of the basic functionality of the linux kernel. Exiting a process, opening a file for reading, forking a new process, requesting new memory - all of these have an associated syscall. Incidentally, you can use the strace command to see which system calls are used by a program:

 
$ strace ls
 

System calls are an exception to the protections that exist between userland and the kernel. Userland applications can't access the kernel; they can't access kernel memory, and they can't call kernel functions - the only way they can interact with the kernel directly is with a syscall. They do this by filling the appropriate registers with the correct values, then initiating a kernel interrupt - on 32-bit Intel architectures, this is interrupt 0x80 (64-bit code normally uses the dedicated syscall instruction instead). Once you perform a kernel interrupt, your process jumps to a previously defined location in the kernel, and the hardware knows that you are now operating in kernel mode rather than the more restrictive user mode.

The procedure that is jumped to when a kernel interrupt is performed is called system_call(). When it is executed, it checks the table of system calls (sys_call_table) for the address of the kernel function to execute. Then it calls the function, waits for it to execute, does a bit of cleanup and checking and returns to the process that made the kernel interrupt.
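
From C you rarely write the interrupt yourself; the C library's syscall() function loads the registers and enters the kernel for you. The sketch below uses it to make a system call by number. It picks getpid rather than exit purely so that the result can be inspected; raw_getpid() is a hypothetical helper name.

```c
#define _GNU_SOURCE
#include <unistd.h>
#include <sys/syscall.h>

/* Ask the kernel for our process ID by syscall number, bypassing the libc wrapper function. */
long raw_getpid(void)
{
	return syscall(SYS_getpid);
}
```

The value returned is the same one the ordinary getpid() wrapper gives you, because both end up at the same entry in the kernel's table of system calls.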

If we want to change the way a system call works ("hook" it), we need to write our own function to implement it - usually the safest way to do this is by writing a function that executes a bit of our own code, then calls the original. We then need to change the pointer on the sys_call_table to point to our function, instead of the normal one - this is the difficult part.

In older versions of the linux kernel (before 2.6.x), finding the syscall table programmatically was easy. It was simply a symbol that you could declare as an external variable, and the kernel would resolve it for you automatically:

 
extern void *sys_call_table[];
 

In all modern kernels, however, the syscall table is not exported by the kernel. This means that you must find the addresses of system calls by some other technique.

Technique: /proc/kallsyms

The /proc/kallsyms file is generated automatically by the kernel, and is a comprehensive list of all of the global symbols (variables or functions) used by the kernel, along with their memory address. On some systems, upon opening /proc/kallsyms as a non-root user, the memory address of some or all of the symbols may be replaced by zeros - this is a protective measure, to make it more difficult for non-root users to access them.

 
$ head /proc/kallsyms
 
0000000000000000 A irq_stack_union
0000000000000000 A __per_cpu_start
ffffffff810002b8 T _stext
ffffffff81001000 T hypercall_page
ffffffff81001000 t xen_hypercall_set_trap_table
ffffffff81001020 t xen_hypercall_mmu_update
ffffffff81001040 t xen_hypercall_set_gdt
ffffffff81001060 t xen_hypercall_stack_switch
ffffffff81001080 t xen_hypercall_set_callbacks
ffffffff810010a0 t xen_hypercall_fpu_taskswitch
 

The first column is, obviously, the memory address of the symbol in question. The second column is the symbol's type - e.g. "A" for absolute or "T" for text. You can see a full list of symbol types at the manpage for the nm command. The third column is the name of the symbol.
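
If you want to pull an address out of this file from a userland helper, each line can be split into those three columns. This is a minimal sketch under the assumption that lines follow the "address type name" layout shown above (any trailing module name is ignored); struct ksym and parse_kallsyms_line() are hypothetical names for this example.

```c
#include <stdio.h>

/* One line of /proc/kallsyms: address, type letter, symbol name. */
struct ksym {
	unsigned long long addr;
	char type;
	char name[128];
};

/* Returns 0 on success, -1 if the line does not have all three columns. */
int parse_kallsyms_line(const char *line, struct ksym *out)
{
	if (sscanf(line, "%llx %c %127s", &out->addr, &out->type, out->name) != 3)
		return -1;
	return 0;
}
```

Remember that a non-root reader may see every address as zero, so a result of 0 does not necessarily mean the symbol is missing.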

So what if we want to include a symbol from the linux kernel in our LKM? Since we are operating in kernel mode, this is easy. Just include <linux/kallsyms.h> and <linux/unistd.h> and use the kallsyms_lookup_name() function to get the memory address of the sys_call_table. Note that this relies on kallsyms_lookup_name() being exported to modules, which is no longer the case from kernel 5.7 onward:

 
#include <linux/kallsyms.h>
#include <linux/unistd.h>
 
static unsigned long *syscall_table; //this should be declared outside of the init function
syscall_table = (unsigned long *) kallsyms_lookup_name("sys_call_table");
 

This gives us a pointer to the syscall table, allowing us to lookup any symbol or system call that we want using the offset macros from <linux/unistd.h>. For example, if we want to extract the sys_write system call:

 
asmlinkage int (*sys_write)(unsigned int, const char __user *, size_t); 
//the datatype & params for this function prototype come from fs/read_write.c
sys_write = (void *) syscall_table[__NR_write];
 

Now the address of the original sys_write() has been stored in a function pointer, which we can call to access the original write whenever we want.

Note that you can find a full list of these offset macros in the /usr/include/asm/unistd.h files (there are multiple for different architectures, e.g. "unistd_64.h"). You don't need to use the macros at all if you don't want to - they just map to the relevant syscall number. For example, on 32-bit x86 "__NR_write" is equivalent to 4, because write is syscall 4 there (on x86-64 it is syscall 1). If you wanted, you could just do:

 
sys_write = (void *) syscall_table[4];
 

Note that if /proc/kallsyms is not available for whatever reason, the symbol table can sometimes be found at '/boot/System.map-$(uname -r)'. Unlike kallsyms, however, there is no easily included header with functions that access the System.map file, so you will need to open the file for reading and parse it manually for the addresses of the symbols you want to hook.

Hooking the syscall

Whichever technique you used to do it, you now have the memory address of the syscall table, and the ability to load any symbol you want from it. Now we can begin the process of hooking the syscall table. We do this in a few steps - for instance, say sys_write is the syscall we have chosen to hook:

  1. As demonstrated above, extract the original syscall (sys_write) from the syscall table
  2. Write a function that does some stuff, then calls the original sys_write to avoid breaking anything.
  3. Overwrite sys_write on the syscall table with our new function.
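
The three steps above can be tried out safely in user space on an ordinary array of function pointers. The sketch below is only a model of the technique: table stands in for the syscall table, real_double() for the original handler, and every name in it is hypothetical.

```c
#include <stddef.h>

typedef long (*call_fn)(long);

static long real_double(long x) { return 2 * x; }

/* Stand-in for the syscall table: an array of function pointers. */
call_fn table[1] = { real_double };

static call_fn original;	/* step 1: saved pointer to the real function */
int hook_hits;			/* counts how often the hook ran */

/* Step 2: do something extra, then defer to the original. */
static long hooked(long x)
{
	hook_hits++;
	return original(x);
}

void install_hook(void)
{
	original = table[0];	/* step 1: remember the original entry */
	table[0] = hooked;	/* step 3: point the table at our function */
}

void remove_hook(void)
{
	table[0] = original;	/* put the saved pointer back */
}
```

Callers that go through the table get the same results before, during and after the hook is installed; the only visible difference is that hook_hits goes up while the hook is in place.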

However, before we can do this, there is one problem we must address: on modern versions of linux, the syscall table is not writable - we will need to change this if we want to be able to overwrite the address of sys_write on the syscall table. The cr0 register is responsible for our lack of access - it is one of the control registers that affects basic CPU functionality. By default, bit 16 of this register (the mask 0x10000) is the "Write Protect" bit, which prevents anyone, even the kernel itself, from writing to read-only memory pages. But since we're the kernel of course, we can just... unset the bit. We do this with the write_cr0() function:

 
write_cr0 (read_cr0() & (~0x10000));
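
The bit arithmetic can be checked in user space by treating the register value as a plain integer. In the sketch below, CR0_WP, wp_clear() and wp_set() are hypothetical names, and nothing here touches the real register.

```c
#include <stdint.h>

#define CR0_WP (UINT64_C(1) << 16)	/* same value as the 0x10000 mask */

/* Clear the Write Protect bit, leaving every other bit alone. */
uint64_t wp_clear(uint64_t cr0) { return cr0 & ~CR0_WP; }

/* Set the Write Protect bit again, leaving every other bit alone. */
uint64_t wp_set(uint64_t cr0)   { return cr0 | CR0_WP; }
```

For instance, taking 0x80050033 as an example register value, clearing the bit gives 0x80040033 and setting it again gives back 0x80050033. Both operations are idempotent, which is why it is harmless to clear the bit again in the cleanup function.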
 

Now that we have the permissions to do everything we need, we can get started. First of all, we extract the original sys_write call from the syscall table:

 
asmlinkage int (*original_write)(unsigned int, const char __user *, size_t); 
//the datatype & params for this function prototype come from fs/read_write.c
original_write = (void *) syscall_table[__NR_write];
 

The next step is to write the function that we will hook sys_write with. This function will do some stuff, then call the original sys_write that we saved:

 
asmlinkage int new_write(unsigned int fd, const char __user *buf, size_t count) 
{
	printk(KERN_INFO "Write successfully hooked!\n");
	return (*original_write)(fd, buf, count);
}

Now that everything is set up, we overwrite sys_write on the syscall table with the address of our new function:

 
syscall_table[__NR_write] = (unsigned long) new_write;
 

As long as everything was done correctly, you should see something sent to dmesg every time sys_write is called, which will be a lot.

Cleaning Up

Hooking syscalls involves some pretty in-depth modification of the kernel, so it's very important to clean up properly in your exit function. If you don't, it's very likely that your module will cause a kernel panic and crash your system when unloaded, at the very least.

Firstly, unset bit 16 (the Write Protect bit) of the cr0 register again, just to make sure we still have full write access to the syscall table:

 
write_cr0 (read_cr0() & (~0x10000));
 

Then we need to write the original address of sys_write back to the syscall table, thereby unhooking it:

 
syscall_table[__NR_write] = (unsigned long) original_write;
 

Then finally, we set bit 16 of the cr0 register again, restoring write protection to prevent any instability:

 
write_cr0 (read_cr0() | (0x10000));
 

An example of an LKM that hooks the sys_write call can be found here for your viewing pleasure.

Further reading

The Linux Kernel Module Programming Guide - an outdated but solid tutorial covering many of the concepts that will help you to understand the linux kernel.

Online reference for <linux/fs.h> - see line 1461 for the definitions of various file operations that can be implemented by a character device.

Searchable Linux syscall table - a good online reference for syscall numbers, which can be used to determine the offset in sys_call_table where a syscall will be located.

Suterusu - an example of a relatively recent LKM rootkit written in C.