A short guide to porting musl to different processor architectures.
Copyright (C) 2011 Nicholas J. Kain

Prerequisites
=============

- Linux kernel source code. Essential for getting things like system call
  numbers and kernel data structures, and for understanding what the
  kernel really does and expects.

- Platform ABI document. It's critical to know the calling conventions
  for both standard function calls and system calls on the target
  platform. Pay close attention to register clobbers for system calls.

- Knowledge of assembly language on the new target platform; it's also
  helpful to know asm for at least one existing platform to use as
  reference.

- Processor reference guide. Useful for writing correct assembly
  language, and as an authoritative reference. Particularly important for
  writing locking primitives.

- Access to a machine (virtual or otherwise) that can run compiled code
  for the target platform. Testing is critical to produce a correct port.

Suggested order of operation
============================

The very first task is to get executables to run and to make printf() work
properly. This is not a trivial task, as printf() actually has lots of
dependencies.

1. arch/$PLAT/bits/*.h and arch/$PLAT/bits/alltypes.h.sh need to be ported
   to the target platform. This task will require examination of the
   architecture-specific Linux kernel source code. I strongly suggest
   breaking the modifications here out into per-file patches. No
   assumptions should be made here. If you are unsure of the type of
   something in alltypes.h.sh, make a note of that, and you will probably
   find that converting other files in arch/$PLAT/bits will make the
   proper value of that type clear.

2. crt.s needs to be written for the target platform. This file produces
   a crt.o object that is statically linked to executables to allow them
   to properly start the libc initialization function that in turn calls
   the program's main() function.
   Its only job is to provide a _start() function that performs this task
   and conforms to the target platform's ABI. Examination of
   src/env/__libc_start_main.c:__libc_start_main() and the target platform
   ABI reference should give a good idea of what this function should do.
   This file should ideally be placed in the public domain, as it is
   statically linked into user executables.

3. System calls need to be made functional. This task will require
   porting arch/$PLAT/syscall.h. The target platform ABI document will be
   critical here, and the Linux kernel source code can also be helpful in
   some instances.

   The most obvious change that must be made is in the system call
   numbers, which vary per platform. Additionally, there will be some
   system calls that exist on one platform but not on another. These
   differences must be abstracted away in a minimally invasive manner.
   The system call numbers can be found in the Linux kernel source code,
   and the differences in behavior are also best found by reading kernel
   source code.

   It is strongly preferable to use gcc inline __asm__ rather than split
   out .s files for system call stubs, as it will allow gcc to better
   optimize the system call stubs (quite possibly eliminating a function
   call). It will also eliminate the need to save and restore register
   values that the kernel may clobber across system calls but that the
   userspace function ABI requires to be preserved.

   Make absolutely sure that the register constraints are correct.
   Missing clobbers will create subtle bugs that will make programs fail
   in mysterious ways, which will make it hard to get pthreads working
   later on!

4. C variable length arguments need to work. You will need to refer to
   the target platform ABI document to see what will be required. For
   some platforms, varargs is rather simple. For others, it is painfully
   complex. In the latter case, it may not be a bad idea to use the gcc
   __builtin_va_* functions.
   Use of these functions will introduce a build dependency on gcc (or
   other compilers that also implement this gcc extension), but may make
   porting easier with little practical cost. After all, the Linux kernel
   itself is dependent on a gcc-compatible compiler to build.

At this point, printf() should work, which will make it possible to run
tests and more easily see what a program is doing without use of strace,
gdb, or examination of disassembly.

5. Signal handling needs to be ported. Most platforms have a function
   that is called after the kernel has delivered a signal and that is
   intended to perform any tasks that may need to be done before control
   is returned to the userspace program that received the signal. This
   function runs on what the kernel calls the signal stack, which exists
   in the task state of the userspace program itself.

   It is important that this function leaves no changes to the stack
   state when it makes the syscall (probably __NR_rt_sigreturn) that
   instructs the kernel to return control to the normal execution of the
   userspace program. Failure to meet this condition will result in
   mysterious problems, most likely SEGVs.

   These functions must be written in assembly language to ensure that no
   trace of the execution of these functions is left on the stack.
   Thankfully, at least on x86 and x86_64, the actual implementation of
   these routines is trivial, doing little more than calling
   __NR_rt_sigreturn.

6. setjmp(), longjmp(), and sigsetjmp() need to be ported. These
   functions must be written in assembly language, as they require direct
   manipulation of the program stack. Obviously, knowledge of the
   platform ABI is important here. The jmp_buf structure will need to be
   sized to contain all of the information that is required for longjmp()
   to reconstruct the snapshotted stack and register state.

   sigsetjmp() can most likely be implemented as a wrapper for setjmp().
   i386 and x86_64 both successfully use this approach.

7. src/math should not need porting work; it's sufficient to just use the
   existing C functions with no assembly language. They should work just
   fine for any sane platform where C double and float are 64-bit and
   32-bit IEEE 754 values.

Now most things except for pthreads should be functional. pthreads varies
more between platforms and may be somewhat more difficult, but I will
outline the method that I used to port from i386 to x86_64.

8. Locking primitives must be ported to the target architecture, which
   must obviously be done in assembly. These live in
   src/internal/atomic.h. You will need to refer to the processor
   reference manual on the target platform to be certain that your
   primitives are correct. The Linux kernel source also has many locking
   primitives that may be useful for ideas, although it is unlikely that
   they will be usable without modification.

9. The thread-local storage must be visible to the userspace program in a
   way that complies with the target platform ABI. Variation is great
   here between platforms. The best way to learn what needs to be done is
   to read the target platform ABI document and the Linux kernel source
   code.

   As an example, on the i386, the TLS structure is pointed to by the %gs
   register, which is set by a simple mov instruction. On x86_64, the TLS
   structure is pointed to by the %fs register, which must be set using
   the __NR_arch_prctl system call.

10. __uniclone() and __unmapself() must be ported to the target
    architecture. These functions must be written in assembly language.

    __uniclone() does the actual work of creating the new thread and
    starting the main function of that new thread. It also supplies the
    correct return value of the __NR_clone system call to the caller in
    the parent task.

    __unmapself() is used when a thread is cancelled. It must cause the
    thread to unmap its stack and exit() the thread.
    Signals are blocked during this process to protect against a possible
    race where a signal could be received after the stack is unmapped but
    before the thread calls exit().

    It is best to reference src/thread/pthread_create.c to understand what
    these functions must perform in proper context. It is also useful to
    look at the implementations for existing architectures.

At this point, the port should either be complete or mostly so. i386 and
x86_64 are similar architectures, and this document was written after I
ported musl from i386 to x86_64, so it is quite possible that there will
be other things that need to be done that this document does not mention,
or that I have simply assumed would be done in the process of handling
these main steps. After all, this document is merely a general guide.

Debugging
=========

A large chunk of the work will be debugging. It is rather hard to debug
programs that don't even have the basic functionality expected of any C
program, or that can only properly perform a subset of POSIX
functionality.

strace is extremely useful for performing ports. Critically, it can show
the named arguments that are being passed to the kernel for system calls,
which can be the easiest way to see if argument order for a system call
varies between platforms. It can also show the output of multiple threads
with the use of the -f flag.

gdb is also handy, both for showing where the failure occurs via bt,
disassemble, and info registers, and for disassembling working code to see
how it may function as an example.

objdump can perform disassembly on executable objects, which is useful.

musl's libc-testsuite and libc-bench are very useful for confirming that a
ported feature mostly works. It's also useful to write your own testcases.
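
As a rough illustration of what such a testcase might look like, the
sketch below exercises three of the ported pieces from the steps above:
varargs together with printf-style formatting (step 4), signal delivery
and return (step 5), and setjmp()/longjmp() (step 6). The function names
(sum_ints, check_varargs, check_signal, check_setjmp) are hypothetical and
are not part of musl or its test suites; the sketch assumes only standard
C and POSIX headers.

```c
#include <setjmp.h>
#include <signal.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: sums `count` int arguments passed through "...". */
static int sum_ints(int count, ...)
{
    va_list ap;
    int total = 0;

    va_start(ap, count);
    for (int i = 0; i < count; i++)
        total += va_arg(ap, int);
    va_end(ap);
    return total;
}

/* Step 4: returns 0 if varargs and snprintf() formatting behave. */
int check_varargs(void)
{
    char buf[32];

    if (sum_ints(4, 1, 2, 3, 4) != 10)
        return 1;
    snprintf(buf, sizeof buf, "%d:%s:%x", 42, "ok", 255);
    return strcmp(buf, "42:ok:ff") != 0;
}

/* Set by the handler so the check can see that the signal arrived. */
static volatile sig_atomic_t got_signal;

static void on_signal(int sig)
{
    got_signal = sig;
}

/* Step 5: returns 0 if a handler runs and control comes back here. */
int check_signal(void)
{
    got_signal = 0;
    if (signal(SIGUSR1, on_signal) == SIG_ERR)
        return 1;
    if (raise(SIGUSR1) != 0)
        return 1;
    signal(SIGUSR1, SIG_DFL);
    return got_signal != SIGUSR1;
}

/* Step 6: returns 0 if longjmp() resumes at setjmp() with its value. */
int check_setjmp(void)
{
    jmp_buf env;
    volatile int passes = 0;

    switch (setjmp(env)) {
    case 0:
        /* First return: record it, then jump back with value 7. */
        passes = 1;
        longjmp(env, 7);
        return 1; /* not reached if longjmp() works */
    case 7:
        /* Second return: the value written before the jump must survive. */
        return passes != 1;
    default:
        return 1;
    }
}
```

Each check returns 0 on success, so a small main() that calls them in
order and exits with the first nonzero result gives a quick pass/fail
status. A failure or crash in one of them points back at the
corresponding porting step.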