lkm: Kernel hacking made easy
The following applies to the Linux x86 2.0.x kernel series. It may also be accurate for previous releases, but has not been tested on them. 2.1.x kernels introduced a bunch of changes, notably in the memory management routines, and are not discussed here.
Thanks to Halflife who first got the idea to use lkm for malicious purposes, and tiepilot, my living hero.
User space vs. Kernel space
Linux is a protected operating system. It is implemented over the protected mode of the i386 series of CPUs.
Memory is divided into roughly two parts: kernel space and user space. Kernel space is where the kernel code lives, and user space is where the user programs live. Of course, a given user program can’t write to kernel memory or to another program’s memory area.
Unfortunately, the restriction also works the other way around: kernel code can’t write to user space directly either. What does this mean? Well, when a given hardware driver wants to write data bytes to a program in user memory, it can’t do it directly, but must use specific kernel functions instead. Likewise, when parameters are passed by address to a kernel function, the kernel function cannot read the parameters directly. It must use other kernel functions to fetch them from user memory.
Here are a few useful functions to use in kernel mode for transferring data bytes to or from user memory.
#include <asm/segment.h>
get_user(ptr)
Gets the given byte, word, or long from user memory. This is a macro, and it relies on the type of its pointer argument to determine the number of bytes to transfer. You therefore have to cast the pointer carefully.
put_user(value, ptr)
This is the counterpart of get_user(): instead of reading, it writes value to user memory at ptr, again sized by the pointer type.
memcpy_fromfs(void *to, const void *from, unsigned long n)
Copies n bytes from from in user memory to to in kernel memory.
memcpy_tofs(void *to, const void *from, unsigned long n)
Copies n bytes from from in kernel memory to to in user memory.
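To make the "size follows the pointer type" rule concrete, here is a small user space model of it. It is only a sketch: model_get_user(), model_put_user() and their helpers are names invented for this illustration, they touch ordinary process memory rather than crossing any kernel boundary, and they assume a 2-byte short and a 4-byte int as on i386.

```c
#include <assert.h>
#include <string.h>

/* Read 1, 2 or 4 bytes at addr, mimicking a size-dispatched fetch. */
static unsigned long model_fetch(const void *addr, size_t size)
{
    unsigned char b;
    unsigned short w;
    unsigned int l;

    assert(size == 1 || size == 2 || size == 4);
    switch (size) {
    case 1:  memcpy(&b, addr, 1); return b;
    case 2:  memcpy(&w, addr, 2); return w;
    default: memcpy(&l, addr, 4); return l;
    }
}

/* Write the low 1, 2 or 4 bytes of value at addr. */
static void model_store(unsigned long value, void *addr, size_t size)
{
    unsigned char b = (unsigned char)value;
    unsigned short w = (unsigned short)value;
    unsigned int l = (unsigned int)value;

    assert(size == 1 || size == 2 || size == 4);
    switch (size) {
    case 1:  memcpy(addr, &b, 1); break;
    case 2:  memcpy(addr, &w, 2); break;
    default: memcpy(addr, &l, 4); break;
    }
}

/* The transfer size is sizeof(*(ptr)), so the cast on ptr decides it. */
#define model_get_user(ptr) \
    ((__typeof__(*(ptr)))model_fetch((ptr), sizeof(*(ptr))))
#define model_put_user(x, ptr) \
    model_store((unsigned long)(x), (ptr), sizeof(*(ptr)))
```

With an unsigned int named word, model_get_user(&word) moves four bytes, while model_get_user((unsigned char *)&word) moves only one. That is why a sloppy cast silently truncates the data.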
System calls
Most libc calls rely on system calls, which are the simplest kernel functions a user program can call. These system calls are implemented in the kernel itself or in loadable kernel modules, which are little chunks of dynamically linkable kernel code.
As in MS-DOS and many other systems, Linux system calls are implemented through a multiplexor reached with a given software interrupt. In Linux, this interrupt is int 0x80. When the ‘int 0x80’ instruction is executed, control is given to the kernel (or, more accurately, to the function _system_call()), and the actual demultiplexing process occurs.
* How does _system_call() work?
First, all registers are saved and the content of the %eax register is checked against the global system calls table, which enumerates all system calls and their addresses.
This table can be accessed with the extern void *sys_call_table[] variable. A given number and memory address in this table corresponds to each system call. System call numbers can be found in /usr/include/sys/syscall.h. They are of the form SYS_systemcallname. If the system call is not implemented, the corresponding cell in the sys_call_table is 0, and an error is returned. Otherwise, the system call exists and the corresponding entry in the table is the memory address of the system call code.
Here is an example of an invalid system call.
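A minimal sketch of such a call on a present-day Linux system, assuming the libc syscall() wrapper in place of a raw int $0x80, and assuming that 100000 is not an assigned system call number. HYPOTHETICAL_UNUSED_NR and both function names are made up for this illustration. The kernel would normally answer with ENOSYS ("Function not implemented"), though a sandboxed environment may report a different error.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Assumption: no system call is assigned this number. */
#define HYPOTHETICAL_UNUSED_NR 100000L

/* Issue the bogus call and return the errno it produced (0 if it succeeded). */
static int invalid_syscall_errno(void)
{
    errno = 0;
    if (syscall(HYPOTHETICAL_UNUSED_NR) == -1)
        return errno;
    return 0;
}

/* Print the error the same way perror() would. */
static void report_invalid_syscall(void)
{
    int err = invalid_syscall_errno();

    printf("test of invalid syscall: %s\n", strerror(err));
}
```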
Control is then transferred to the actual system call, which performs whatever you requested and returns. _system_call() then continues into _ret_from_sys_call(), which takes care of pending work such as signals and rescheduling, and ultimately returns to user mode.
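The dispatch steps described above can be modeled in plain user space C. This is only a toy: the table size, the model_* names and the single registered entry are invented for illustration, and the real _system_call() is i386 assembly working on saved registers, not a C function.

```c
#include <errno.h>
#include <stddef.h>

#define MODEL_NR_SYSCALLS 8   /* invented size, far smaller than the real table */

typedef long (*model_syscall_fn)(long, long, long);

/* A stand-in "system call" that just adds its first two arguments. */
static long model_sys_add(long a, long b, long c)
{
    (void)c;
    return a + b;
}

/* Slots left out of the initializer are NULL, i.e. "not implemented". */
static model_syscall_fn model_sys_call_table[MODEL_NR_SYSCALLS] = {
    [3] = model_sys_add,
};

/* nr plays the role of %eax; a, b and c stand for the argument registers. */
static long model_system_call(long nr, long a, long b, long c)
{
    if (nr < 0 || nr >= MODEL_NR_SYSCALLS)
        return -ENOSYS;                 /* number out of range */
    if (model_sys_call_table[nr] == NULL)
        return -ENOSYS;                 /* empty cell in the table */
    return model_sys_call_table[nr](a, b, c);
}
```

Calling model_system_call(3, 2, 5, 0) reaches model_sys_add(), while any other number comes back as -ENOSYS, the same negative-errno convention the libc wrappers later turn into errno.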
* libc
Programs rarely issue int $0x80 directly; rather, they call libc functions, which are often thin wrappers around interrupt 0x80.
libc generally implements these wrappers with the _syscallX() macros, where X is the number of parameters for the system call.
For example, the libc entry for write(2) would be implemented with a _syscall3 macro, since the actual write(2) prototype requires 3 parameters. Before calling interrupt 0x80, the _syscallX macros are supposed to set up the stack frame and the argument list required for the system call. Finally, when the _system_call() (which is triggered with int $0x80) returns, the _syscallX() macro will check for a negative return value (in %eax) and will set errno accordingly.
Let’s check another example with write(2) and see how it gets preprocessed.
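What follows is a reconstruction of roughly what a _syscall3() expansion for write looks like, not a verbatim copy of any header. The function is named sketch_write() here (an invented name) so it does not clash with libc's own write(). The int $0x80 branch is only compiled on 32-bit x86 Linux; everywhere else the sketch falls back to the libc syscall() wrapper and undoes its errno conversion, so the tail of the function still shows the negative-return check.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

ssize_t sketch_write(int fd, const void *buf, size_t count)
{
    long res;

#if defined(__linux__) && defined(__i386__)
    /* %eax carries the call number in and the result out;
       %ebx, %ecx and %edx carry the three arguments. */
    __asm__ __volatile__("int $0x80"
                         : "=a"(res)
                         : "0"(4), "b"((long)(fd)), "c"((long)(buf)),
                           "d"((long)(count))
                         : "memory");
#else
    /* Fallback for other platforms: recover the kernel-style
       negative errno value that libc has already translated. */
    res = syscall(SYS_write, fd, buf, count);
    if (res < 0)
        res = -errno;
#endif

    if (res >= 0)
        return (ssize_t)res;
    errno = -res;    /* a negative result is -errno */
    return -1;
}
```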
Note that the "0"(4) in the write() function above matches the SYS_write definition in /usr/include/sys/syscall.h.
* Making your own system calls
There are a few ways to make your own system calls.
For example, you could modify the kernel sources and append your own code. A far easier way, however, would be to write a loadable kernel module.
A loadable kernel module is nothing more than an object file containing code that will be dynamically linked into the kernel when it is needed.
The main purposes of this feature are to have a small kernel, and to load a given driver when it is needed with the insmod(1) command.
It’s also easier to write an lkm than to write code in the kernel source tree.
* Writing an lkm
An lkm is easily written in C.
It contains a chunk of #defines, some functions, an initialization function called init_module(), and an unload function called cleanup_module().
Here is a typical lkm source structure.
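A bare-bones skeleton along those lines is sketched below. It is an illustration only: BUILD_AS_LKM is a switch invented here (not a kernel macro), and the header names are assumed for a 2.0.x tree. Built without that switch, printk is mapped to printf so the two entry points can be exercised as an ordinary program.

```c
#ifdef BUILD_AS_LKM            /* hypothetical switch, not a kernel macro */
#define MODULE
#define __KERNEL__
#include <linux/module.h>
#include <linux/kernel.h>
#else
#include <stdio.h>
#define printk printf          /* user space stand-in for the kernel logger */
#endif

/* Called when the module is inserted. */
int init_module(void)
{
    printk("example module loaded\n");
    return 0;                  /* a nonzero value makes the load fail */
}

/* Called when the module is removed. */
void cleanup_module(void)
{
    printk("example module unloaded\n");
}
```

A real module would put its setup work in init_module() and undo every bit of it in cleanup_module().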
Also note that as our lkm will be running in kernel mode, we can’t use libc functions, but we can use system calls with the previously discussed _syscallX() macros.
You would compile this module with ‘gcc -c -O3 module.c’ (optimization must be turned on, since the kernel headers rely on inline functions) and insert it into the kernel with ‘insmod module.o’.
As the title suggests, an lkm can also be used to modify kernel code without having to rebuild the kernel entirely. For example, you could patch the write(2) system call to hide portions of a given file.
Seems like a good place for backdoors, too: what would you do if you couldn’t trust your own kernel?
* Kernel and system calls backdoors
The main idea behind this is pretty simple. We’ll redirect those damn system calls to our own ones in an lkm, which will enable us to force the kernel to react as we want it to.
For example, we could hide a sniffer by patching the ioctl(2) system call and masking the PROMISC bit. Lame but efficient.
To modify a given system call, just add the definition of the extern void *sys_call_table[] in your lkm, and have the init_module() function modify the corresponding entry in the sys_call_table to point to your own code. The modified call can then do whatever you wish it to, call the original system call by modifying sys_call_table once more, and …