Lockless Inc

C ABI Hacks

The C standard describes how a C program will work given an initial set of source files to be compiled into an executable. To do this, it describes an "abstract machine" on which the program can be thought of as running. The success of the C programming language is due to the fact that the abstract machine description is very close to how modern computers actually work. The result is that individual C statements tend to compile down into very few assembly instructions, allowing a huge amount of control for the programmer.

An interesting fact is that real C programs do not run on abstract machines; they run on real machines under real operating systems. This means that if we go beyond the C standard, we can detect and use this fact. In effect, if we invoke "undefined" or "implementation defined" behaviour, the details of the underlying implementation become important. Going beyond the C standard has its drawbacks: where the behaviour is undefined, the resulting program could legally do anything. For example, early versions of the gcc compiler would try to invoke the rogue computer game if they encountered a #pragma statement during compilation.

Fortunately for us, there exist other standards that describe how machines implement the abstract machine that C programs run on. The most important of these is the Application Binary Interface or ABI. This details how arguments are passed to, and received from functions. It allows libraries and programs compiled with different compilers to talk to each other without problems. Thus if we invoke behaviour undefined or implementation defined by the C standard, but defined by the ABI, the results will still be consistent across all compilers for a given machine architecture. The results will actually be quite portable if the architecture chosen is common.

For this article, we chose the System V ABI for AMD64 machines. This is used by the vast majority of unix-based systems in 64bit mode on commodity hardware. Note that the 64bit Microsoft Windows ABI is similar, so the changes required to get the code working in this article should be relatively small on those machines.

The main detail for the 64bit ABI is that most arguments to functions are passed and returned in registers. This allows a carefully crafted C program to actually read/write to individual asm registers! Normally, that trick would require assembly source code.

Reading and Writing Registers from C

The first set of registers we will access are the integer registers used to pass parameters to a function. The ABI specifies that %rdi, %rsi, %rdx, %rcx, %r8 and %r9 are used in turn for the first six integer or pointer arguments. Thus, if we define a function that reads these integers, we can see what these registers contain. The trick here is that when we call the read functions, we need to call them without parameters so that we don't overwrite the register values we want to obtain. This requires compiling the read functions in a separate compilation unit from whatever uses them. (It also requires turning off link-time optimization to prevent problems.)

The result is a set of functions that look like:


typedef unsigned long long u64b;
u64b read_rdi(u64b p1)
{
	return p1;
}

u64b read_rsi(u64b p1, u64b p2)
{
	return p2;
}

u64b read_rdx(u64b p1, u64b p2, u64b p3)
{
	return p3;
}

u64b read_rcx(u64b p1, u64b p2, u64b p3, u64b p4)
{
	return p4;
}

u64b read_r8(u64b p1, u64b p2, u64b p3, u64b p4, u64b p5)
{
	return p5;
}

u64b read_r9(u64b p1, u64b p2, u64b p3, u64b p4, u64b p5, u64b p6)
{
	return p6;
}

The above use a typedef to decrease the amount of typing in the definitions. They are called through the following declarations, which deliberately omit the parameters:


typedef unsigned long long u64b;

u64b read_rax(void);
u64b read_rdi(void);
u64b read_rsi(void);
u64b read_rdx(void);
u64b read_rcx(void);
u64b read_r8(void);
u64b read_r9(void);

Writing values is a little trickier. Here, we need to make sure we don't alter registers that we don't want changed. To do that, we need to read them all, and then just change the ones we want. Thus we need something like:


void write_int_regs(void)
{
}

Called via


void write_int_regs(u64b rdi, u64b rsi, u64b rdx, u64b rcx, u64b r8, u64b r9);

Another integer register we may modify is %rax. This is designated as the standard return register for integer values. Thus to set it, we just need to return something. To read it, we need to read the return value of a function that doesn't actually return anything.


void read_rax(void)
{
}

u64b write_rax(u64b p1)
{
	return p1;
}

Again, the declarations seen by the caller are different from the definitions:


u64b read_rax(void);
void write_rax(u64b);

Another register that is useful to read is the stack pointer. Here, we need to think laterally because the obvious trick of returning an offset from a local variable doesn't quite work. The compiler knows that is undefined behaviour, and doesn't generate the code we want. However, it turns out that long double arguments are always passed as function arguments on the stack. Since the address of the first stack argument is the original value of the stack pointer before the function call, we can just return that.


void *read_rsp(long double x)
{
	return &x;
}

The corresponding declaration uses a void * instead of an unsigned long long since the stack is more useful as a pointer.


void *read_rsp(void);
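
For comparison, GCC and Clang offer a supported way to obtain an address inside the current stack frame, __builtin_frame_address(0). The sketch below is not the ABI trick described above and does not return %rsp exactly. It only shows that the address it yields sits close to the caller's local variables. The helper names current_frame and stack_distance are made up for this illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

/*
 * Return an address inside this function's stack frame.
 * __builtin_frame_address is a GCC/Clang extension, not standard C.
 */
__attribute__((noinline)) void *current_frame(void)
{
	return __builtin_frame_address(0);
}

/* Distance in bytes between a caller local and the callee's frame. */
size_t stack_distance(void)
{
	volatile char local = 0;
	uintptr_t a = (uintptr_t)&local;
	uintptr_t b = (uintptr_t)current_frame();

	/* Compare as integers to avoid subtracting unrelated pointers */
	return (a > b) ? (size_t)(a - b) : (size_t)(b - a);
}
```

On a typical build the distance is a few dozen bytes, which is what we would expect if both addresses live on the same stack.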

Unfortunately, although it is possible to change the value of the stack pointer, it isn't possible to do it cleanly. If the stack pointer is altered in C (say by allocating a variable sized array, or calling a function), the compiler will always generate code to restore the stack once the allocated space is no longer needed. The problem is that the ABI doesn't specify how this restoration procedure is done. Thus overriding it in a portable manner isn't possible.

The remaining alterable register is the instruction pointer, %rip. This can be inspected by looking at the return address of a function on its stack. By changing that address, you can get the function to return anywhere you like. This technique is used by some malware to overcome the limitations of a non-executable stack. However, the problem with using this trick in a normal program is that the ABI specifies that several registers should be unchanged across function calls. Using C alone, it is impossible to read or write to these registers in the required way. (Note that they can be saved in a jmp_buf with setjmp and sigsetjmp, and restored with longjmp and siglongjmp. However, glibc now mangles several of the stored values, such as the stack pointer and return address, for security reasons.)
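
The jmp_buf mechanism mentioned in the parenthetical is plain standard C, so a minimal sketch may help. The function names deep and run_with_escape are invented here. The first setjmp() call records enough state to resume, and longjmp() restores it, so control reappears at the setjmp() call with a nonzero result.

```c
#include <assert.h>
#include <setjmp.h>

static jmp_buf escape;
static volatile int escape_code;

/* Jump straight back to the setjmp() call, skipping normal returns. */
static void deep(int code)
{
	escape_code = code;
	longjmp(escape, 1);
}

/* Returns 0 if deep() was never called, else the code it recorded. */
int run_with_escape(int code)
{
	if (setjmp(escape) != 0)
	{
		/* We arrive here a second time, via longjmp() */
		return escape_code;
	}

	if (code) deep(code);

	/* Reached only when deep() was not called */
	return 0;
}
```

The saved state is opaque to the program, which is why this route does not give us a portable way to inspect individual callee-saved registers.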

The next set of registers we can inspect are the SSE floating point registers. The first eight of these are used to pass float, double and __m128 arguments to functions. By using the same tricks as we used for integer registers, we may thus read and write to them from C code.


#include <xmmintrin.h>
void read_xmm0(void)
{
}

__m128 read_xmm1(__m128 p0, __m128 p1)
{
	return p1;
}

__m128 read_xmm2(__m128 p0, __m128 p1, __m128 p2)
{
	return p2;
}

__m128 read_xmm3(__m128 p0, __m128 p1, __m128 p2, __m128 p3)
{
	return p3;
}

__m128 read_xmm4(__m128 p0, __m128 p1, __m128 p2, __m128 p3, __m128 p4)
{
	return p4;
}

__m128 read_xmm5(__m128 p0, __m128 p1, __m128 p2, __m128 p3, __m128 p4, __m128 p5)
{
	return p5;
}

__m128 read_xmm6(__m128 p0, __m128 p1, __m128 p2, __m128 p3, __m128 p4, __m128 p5,
 __m128 p6)
{
	return p6;
}

__m128 read_xmm7(__m128 p0, __m128 p1, __m128 p2, __m128 p3, __m128 p4, __m128 p5,
 __m128 p6, __m128 p7)
{
	return p7;
}

void write_sse_regs(void)
{
}

Together with the declarations:


#include <xmmintrin.h>

__m128 read_xmm0(void);
__m128 read_xmm1(void);
__m128 read_xmm2(void);
__m128 read_xmm3(void);
__m128 read_xmm4(void);
__m128 read_xmm5(void);
__m128 read_xmm6(void);
__m128 read_xmm7(void);

void write_sse_regs(__m128 xmm0, __m128 xmm1, __m128 xmm2, __m128 xmm3,
 __m128 xmm4, __m128 xmm5, __m128 xmm6, __m128 xmm7);

Finally, it is also possible to inspect the first two registers on the old x87 floating point stack. These are used to return long double and complex long double values from a function. Unfortunately, we can't read the second item on the stack separately, only together with the first. The corresponding code to do this is:


#include <complex.h>

void read_st0(void)
{
}

void read_st01(void)
{
}

long double write_st0(long double p)
{
	return p;
}

complex long double write_st01(complex long double p)
{
	return p;
}

Called via


#include <complex.h>
long double read_st0(void);
complex long double read_st01(void);

void write_st0(long double p);
void write_st01(complex long double p);

The above functions allow us to read and write to many of the registers on a 64bit unix machine directly from C. No assembler source code or inline assembly is required. However, real implementations do have easy access to assemblers, so the above is mostly of academic interest.

C to C Foreign Function Interface

There is one good use for the code in the previous section though. We can use it to do the seemingly impossible: get C to build, dynamically at run time, the arguments to call arbitrary C functions. The reverse task is relatively easy. The C standard describes the va_arg interface in <stdarg.h>, which allows a C function to parse any arguments sent to it. The converse problem is typically solved by passing a va_list to a variant of the function you want to call, which requires the existence of "v" functions like vsprintf, vfprintf etc.
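
The forwarding pattern just described looks like this in standard C. The wrapper name format_into is hypothetical. vsnprintf is the standard library's va_list variant of snprintf.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/*
 * A variadic wrapper cannot call snprintf() directly, because it has
 * no way to re-expand its own arguments. It hands them on as a
 * va_list to the "v" variant instead.
 */
int format_into(char *buf, size_t len, const char *fmt, ...)
{
	va_list ap;
	int n;

	va_start(ap, fmt);
	n = vsnprintf(buf, len, fmt, ap);
	va_end(ap);

	return n;
}
```

This only works because vsnprintf happens to exist. For a function with no such variant, the wrapper has nothing to forward to.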

The problem is that a va_list version of the function you want to dynamically call may not exist. Unfortunately, the C standard provides no way to surmount this limitation. However, nonstandard approaches exist. The best known is libffi, the library of foreign function interfaces which allows different programming languages to call each other. It, however, uses quite a large amount of assembly code to work its magic. Instead, we will provide a pure C version for the limited case of C calling C on the 64bit SysV ABI.

The first problem we need to tackle is the description of user defined types. To do this, we need some sort of opaque structure to store the information required by the ffi library. A user may then call functions to create and destroy such structures, together with definitions for all the default C types. Such an interface may look like:


/* Forward declaration of the opaque type structure */
struct type_struct;

/* Standard types */
extern struct type_struct *type_char;
extern struct type_struct *type_short;
extern struct type_struct *type_int;
extern struct type_struct *type_long;
extern struct type_struct *type_longlong;
extern struct type_struct *type_int128;
extern struct type_struct *type_float;
extern struct type_struct *type_double;
extern struct type_struct *type_longdouble;
extern struct type_struct *type_m128;
extern struct type_struct *type_complexfloat;
extern struct type_struct *type_complexdouble;
extern struct type_struct *type_complexlongdouble;
extern struct type_struct *type_ptr;

/* FFI routines */
void free_type_struct(struct type_struct *ts);
struct type_struct *type_struct_create(int num, struct type_struct **ts,
 const int *count, const int *offset);

The opaque type_struct needs to store the information for the number and type of registers used to pass something of the type it describes. It also needs to have the information to describe new compound types containing itself in further calls to type_struct_create. Some code that does this is:


#include <stdlib.h>

#define C_INTEGER		0
#define C_SSE			1
#define C_X87			2
#define C_COMPLEX_X87	3
#define C_MEMORY		4

struct type_struct
{
	int cnum;
	int class[2];
	int size[2];
	int offset[2];
	size_t struct_size;
};


struct type_struct *type_char =
	&(struct type_struct){1, {C_INTEGER},{1},{0},1};
struct type_struct *type_short =
	&(struct type_struct){1, {C_INTEGER},{2},{0},2};
struct type_struct *type_int =
	&(struct type_struct){1, {C_INTEGER},{4},{0},4};
struct type_struct *type_long =
	&(struct type_struct){1, {C_INTEGER},{8},{0},8};
struct type_struct *type_longlong =
	&(struct type_struct){1, {C_INTEGER},{8},{0},8};
struct type_struct *type_int128 =
	&(struct type_struct){2, {C_INTEGER, C_INTEGER},{8,8},{0,8},16};
struct type_struct *type_float =
	&(struct type_struct){1, {C_SSE},{4},{0},4};
struct type_struct *type_double =
	&(struct type_struct){1, {C_SSE},{8},{0},8};
struct type_struct *type_longdouble =
	&(struct type_struct){1, {C_X87},{16},{0},16};
struct type_struct *type_m128 =
	&(struct type_struct){1, {C_SSE},{16},{0},16};
struct type_struct *type_complexfloat =
	&(struct type_struct){1, {C_SSE},{8},{0},8};
struct type_struct *type_complexdouble =
	&(struct type_struct){2, {C_SSE, C_SSE},{8,8},{0,8},16};
struct type_struct *type_complexlongdouble =
	&(struct type_struct){1, {C_COMPLEX_X87},{32},{0},32};
struct type_struct *type_ptr =
	&(struct type_struct){1, {C_INTEGER},{8},{0},8};

void free_type_struct(struct type_struct *ts)
{
	free(ts);
}

static struct type_struct *init_type_struct(int num)
{
	struct type_struct *ts = malloc(sizeof(struct type_struct));
	
	if (!ts) return NULL;
	ts->cnum = num;
	
	return ts;
}

/* A type that is passed via memory */
static struct type_struct *init_memory_struct(int size)
{
	struct type_struct *nts = init_type_struct(1);
	if (!nts) return NULL;

	/* Round to size in eight-bytes */
	size += 7;
	size &= ~7;

	nts->class[0] = C_MEMORY;
	nts->size[0] = size;
	nts->offset[0] = 0;
	nts->struct_size = size;

	return nts;
}

/* Construct a type_struct pointer for a new data type */
struct type_struct *type_struct_create(int num, struct type_struct **ts,
 const int *count, const int *offset)
{
	struct type_struct *nts;
	
	int i, j;
	
	int misaligned = 0;
	int struct_size = 0;
	
	/* Get size */
	for (i = 0; i < num; i++)
	{
		int size = ts[i]->struct_size * count[i] + offset[i];
		if (size > struct_size) struct_size = size;
		
		/* Is the element misaligned? */
		if (offset[i] & (ts[i]->struct_size - 1)) misaligned = 1;
	}
	
	/* Use memory to pass? */
	if ((struct_size > 16) || misaligned)
	{
		return init_memory_struct(struct_size);
	}
	
	/* Simple case, only one field */
	if ((num == 1) && (count[0] == 1))
	{
		nts = init_type_struct(ts[0]->cnum);
		if (!nts) return NULL;

		/* Just copy from the original */
		for (i = 0; i < ts[0]->cnum; i++)
		{
			nts->class[i] = ts[0]->class[i];
			nts->size[i] = ts[0]->size[i];
			nts->offset[i] = ts[0]->offset[i];
		}
		
		nts->struct_size = struct_size;
		
		return nts;
	}
	
	/* Only possibilities left are C_INTEGER and C_SSE */
	if (struct_size > 8)
	{
		nts = init_type_struct(2);
		if (!nts) return NULL;
		
		/* Default to passing in SSE register */
		nts->class[0] = C_SSE;
		nts->class[1] = C_SSE;
		
		nts->offset[0] = 0;
		nts->offset[1] = 8;
		
		nts->size[0] = 8;
		nts->size[1] = struct_size - 8;
		nts->struct_size = struct_size;
	}
	else
	{
		nts = init_type_struct(1);
		if (!nts) return NULL;
		/* Default to passing in SSE register */
		
		nts->class[0] = C_SSE;
		nts->offset[0] = 0;
		nts->size[0] = struct_size;
		nts->struct_size = struct_size;
	}
	
	/* Convert to integer register if required */
	for (i = 0; i < num; i++)
	{
		for (j = 0; j < ts[i]->cnum; j++)
		{
			if (ts[i]->class[j] == C_INTEGER)
			{
				nts->class[offset[i] / 8] = C_INTEGER;
			}
		}
	}
	
	return nts;
}

Here the "C_XXX" macros describe the different register classes required by the ABI. The class is stored together with the offset and size of each piece. Note that since the maximal size of compound objects passed with registers is sixteen bytes, only a total of two registers are ever needed. One final problem is that since the algorithm in the ABI for describing what is passed where is rather complex, the above may still contain bugs. It has been tested with a few simple cases, but unknown problems may unfortunately still exist for more complex types.
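
The "convert to integer register if required" step can be isolated into a small stand-alone sketch. This is a simplification under stated assumptions, not the full ABI algorithm: every field is assumed to be naturally aligned, to fit inside one eight-byte chunk, and to be either an integer or a floating point scalar. The names eb_class, field and classify_eightbytes are made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>

enum eb_class { EB_NONE, EB_SSE, EB_INTEGER };

struct field { size_t offset; int is_float; };

/*
 * Classify the (at most two) eight-byte chunks of a small aggregate.
 * A chunk is INTEGER if any integer field lands in it, else SSE.
 */
void classify_eightbytes(const struct field *f, int n, enum eb_class out[2])
{
	int i;

	out[0] = out[1] = EB_NONE;

	for (i = 0; i < n; i++)
	{
		size_t eb = f[i].offset / 8;

		/* Beyond 16 bytes the aggregate would be passed in memory */
		if (eb > 1) continue;

		if (!f[i].is_float)
			out[eb] = EB_INTEGER;
		else if (out[eb] == EB_NONE)
			out[eb] = EB_SSE;
	}
}

/* A double followed by two ints: one SSE chunk, one INTEGER chunk */
struct sample { double x; int y; int z; };
```

For struct sample, the double occupies the first chunk on its own, while the two ints share the second chunk and force it into an integer register.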

The next problem is creating a function that uses such type descriptors to call an arbitrary function returning void. (The lack of a return value simplifies things a bit.) However, before we can do this, we need a few more C-assembly interface functions defined. Basically, since we will be setting many registers simultaneously, it is more efficient to do that with one function call, rather than many. In addition, we need to take special care with the %rax register. Calling functions that may use va_start requires storing the number of arguments passed in SSE registers in %rax. The most obvious way of setting %rax may be used if we don't have further items to pass on the stack. If the stack is also used, we need to get the compiler to set %rax for us in another way. The simplest way to do that is to declare variable argument functions with the correct number of SSE arguments. Thus the functions we need to write have the following interfaces:


void write_all_regs(u64b rdi, u64b rsi, u64b rdx, u64b rcx, u64b r8, u64b r9,
 __m128 xmm0, __m128 xmm1, __m128 xmm2, __m128 xmm3, __m128 xmm4, __m128 xmm5,
 __m128 xmm6, __m128 xmm7, u64b rax);

void write_all_regs0(u64b rdi, u64b rsi, u64b rdx, u64b rcx, u64b r8, u64b r9,
 ...);
void write_all_regs1(u64b rdi, u64b rsi, u64b rdx, u64b rcx, u64b r8, u64b r9,
 __m128 xmm0, ...);
void write_all_regs2(u64b rdi, u64b rsi, u64b rdx, u64b rcx, u64b r8, u64b r9,
 __m128 xmm0, __m128 xmm1, ...);
void write_all_regs3(u64b rdi, u64b rsi, u64b rdx, u64b rcx, u64b r8, u64b r9,
 __m128 xmm0, __m128 xmm1, __m128 xmm2, ...);
void write_all_regs4(u64b rdi, u64b rsi, u64b rdx, u64b rcx, u64b r8, u64b r9,
 __m128 xmm0, __m128 xmm1, __m128 xmm2, __m128 xmm3, ...);
void write_all_regs5(u64b rdi, u64b rsi, u64b rdx, u64b rcx, u64b r8, u64b r9,
 __m128 xmm0, __m128 xmm1, __m128 xmm2, __m128 xmm3, __m128 xmm4, ...);
void write_all_regs6(u64b rdi, u64b rsi, u64b rdx, u64b rcx, u64b r8, u64b r9,
 __m128 xmm0, __m128 xmm1, __m128 xmm2, __m128 xmm3, __m128 xmm4, __m128 xmm5,
  ...);
void write_all_regs7(u64b rdi, u64b rsi, u64b rdx, u64b rcx, u64b r8, u64b r9,
 __m128 xmm0, __m128 xmm1, __m128 xmm2, __m128 xmm3, __m128 xmm4, __m128 xmm5,
 __m128 xmm6, ...);
void write_all_regs8(u64b rdi, u64b rsi, u64b rdx, u64b rcx, u64b r8, u64b r9,
 __m128 xmm0, __m128 xmm1, __m128 xmm2, __m128 xmm3, __m128 xmm4, __m128 xmm5,
 __m128 xmm6, __m128 xmm7, ...);

These functions have trivial implementations:


u64b write_all_regs(u64b rdi, u64b rsi, u64b rdx, u64b rcx, u64b r8, u64b r9,
 __m128 xmm0, __m128 xmm1, __m128 xmm2, __m128 xmm3, __m128 xmm4, __m128 xmm5,
 __m128 xmm6, __m128 xmm7, u64b rax)
{
	return rax;
}

void write_all_regs0(void)
{
}

void write_all_regs1(void)
{
}

void write_all_regs2(void)
{
}

void write_all_regs3(void)
{
}

void write_all_regs4(void)
{
}

void write_all_regs5(void)
{
}

void write_all_regs6(void)
{
}

void write_all_regs7(void)
{
}

void write_all_regs8(void)
{
}
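
Seen from the callee's side, the %rax convention is hidden behind an ordinary variadic function. In the sketch below (sum_doubles is a hypothetical name), the compiler at each call site sees the "..." prototype and fills in the register count on this ABI without any help from the programmer, which is the behaviour the write_all_regsN declarations rely on.

```c
#include <assert.h>
#include <stdarg.h>

/*
 * Sum 'count' double arguments. On the SysV AMD64 ABI the doubles
 * arrive in the SSE registers and the caller sets %al to an upper
 * bound on how many are in use; va_start/va_arg hide the details.
 */
double sum_doubles(int count, ...)
{
	va_list ap;
	double total = 0.0;
	int i;

	va_start(ap, count);
	for (i = 0; i < count; i++)
		total += va_arg(ap, double);
	va_end(ap);

	return total;
}
```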

Using the above, it is possible to now construct a function to call any function returning void. We first scan the arguments to see how many registers are required of each type, and how much stack space is required for any remaining arguments. We then allocate the required amount of stack space, and store the stack arguments at the correct offsets mandated by the ABI. Note that since we don't have direct access to the stack pointer in C, we need to use a hack via the alloca() function to make sure enough room exists. Finally, we use the C-asm hacks to store the arguments in the correct registers and call the function.


#include <string.h>
#include <alloca.h>

void call_CABIfuncvoid(void (*func)(void), int argc, void **args, struct type_struct **argt)
{
	unsigned long long int_arg[6];
	__m128 sse_arg[8];
	
	int numint = 0;
	int numsse = 0;
	
	int i, j;
	
	char *p;
	
	size_t stack_size = 0;
	char *stack;
	
	/* Easy case - no arguments at all */
	if (!argc)
	{
		func();
		return;
	}
	
	/* Scan the arguments to initialize register state */
	for (i = 0; i < argc; i++)
	{
		for (j = 0; j < argt[i]->cnum; j++)
		{
			switch (argt[i]->class[j])
			{
				case C_X87:
				case C_COMPLEX_X87:
				case C_MEMORY:
				{
					stack_size += argt[i]->size[j];
					break;
				}
			
				case C_INTEGER:
				{
					if (numint < 6)
					{
						/* Pointer to type */
						p = (char *) args[i] + argt[i]->offset[j];
						
						/* Save for later */
						memcpy(&int_arg[numint], p, argt[i]->size[j]);
					
						numint++;
					}
					else
					{
						/* Always size 8 on the stack */
						stack_size += 8;
					}
					break;
				}
			
				case C_SSE:
				{
					if (numsse < 8)
					{
						/* Pointer to type */
						p = (char *) args[i] + argt[i]->offset[j];
						
						/* Save for later */
						memcpy(&sse_arg[numsse], p, argt[i]->size[j]);
						
						numsse++;
					}
					else
					{
						stack_size += argt[i]->size[j];
					}
					break;
				}
			}
		}
	}
	
	/* Simple case, everything in registers */
	if (!stack_size)
	{
		/*
		 * Set up registers to call function.
		 * Note that numsse is passed on the stack which is why we
		 * can't use this method below.
		 */
		write_all_regs(int_arg[0], int_arg[1], int_arg[2], int_arg[3], int_arg[4],
		int_arg[5], sse_arg[0], sse_arg[1], sse_arg[2], sse_arg[3], sse_arg[4],
		sse_arg[5], sse_arg[6], sse_arg[7], numsse);
		
		/* Call it */
		func();
		
		return;
	}
	
	/* Create stack space */
	alloca(stack_size);
	
	/* Note the above may add padding, so get real value of stack pointer */
	stack = read_rsp();
	
	/* Scan again, and fill in stack */
	numint = 0;
	numsse = 0;
	stack_size = 0;
	for (i = 0; i < argc; i++)
	{
		for (j = 0; j < argt[i]->cnum; j++)
		{
			switch (argt[i]->class[j])
			{
				case C_X87:
				case C_COMPLEX_X87:
				case C_MEMORY:
				{
					/* Pointer to type */
					p = (char *) args[i] + argt[i]->offset[j];
						
					/* Save for later */
					memcpy(&stack[stack_size], p, argt[i]->size[j]);
					
					stack_size += argt[i]->size[j];
					break;
				}
			
				case C_INTEGER:
				{
					if (numint < 6)
					{
						numint++;
					}
					else
					{
						/* Pointer to type */
						p = (char *) args[i] + argt[i]->offset[j];
						
						/* Save for later */
						memcpy(&stack[stack_size], p, argt[i]->size[j]);
						
						/* Always size 8 on the stack */
						stack_size += 8;
					}
					break;
				}
			
				case C_SSE:
				{
					if (numsse < 8)
					{
						numsse++;
					}
					else
					{
						/* Pointer to type */
						p = (char *) args[i] + argt[i]->offset[j];
						
						/* Save for later */
						memcpy(&stack[stack_size], p, argt[i]->size[j]);
						
						if (argt[i]->size[j] == 16)
						{
							stack_size += 16;
						}
						else
						{
							/* Always at least size 8 on the stack */
							stack_size += 8;
						}
					}
					break;
				}
			}
		}
	}
	
	/* Set up registers to call function */
	switch (numsse)
	{
		case 0:
		{
			write_all_regs0(int_arg[0], int_arg[1], int_arg[2], int_arg[3],
			 int_arg[4], int_arg[5]);
			break;
		}
		
		case 1:
		{
			write_all_regs1(int_arg[0], int_arg[1], int_arg[2], int_arg[3],
			 int_arg[4], int_arg[5], sse_arg[0]);
			break;
		}
		
		case 2:
		{
			write_all_regs2(int_arg[0], int_arg[1], int_arg[2], int_arg[3],
			 int_arg[4], int_arg[5], sse_arg[0], sse_arg[1]);
			break;
		}
	
		case 3:
		{
			write_all_regs3(int_arg[0], int_arg[1], int_arg[2], int_arg[3],
			 int_arg[4], int_arg[5], sse_arg[0], sse_arg[1], sse_arg[2]);
			break;
		}
		
		case 4:
		{
			write_all_regs4(int_arg[0], int_arg[1], int_arg[2], int_arg[3],
			 int_arg[4], int_arg[5], sse_arg[0], sse_arg[1], sse_arg[2],
			 sse_arg[3]);
			break;
		}
		
		case 5:
		{
			write_all_regs5(int_arg[0], int_arg[1], int_arg[2], int_arg[3],
			 int_arg[4], int_arg[5], sse_arg[0], sse_arg[1], sse_arg[2],
			 sse_arg[3], sse_arg[4]);
			break;
		}
		
		case 6:
		{
			write_all_regs6(int_arg[0], int_arg[1], int_arg[2], int_arg[3],
			 int_arg[4], int_arg[5], sse_arg[0], sse_arg[1], sse_arg[2],
			 sse_arg[3], sse_arg[4], sse_arg[5]);
			break;
		}
		
		case 7:
		{
			write_all_regs7(int_arg[0], int_arg[1], int_arg[2], int_arg[3],
			 int_arg[4], int_arg[5], sse_arg[0], sse_arg[1], sse_arg[2],
			 sse_arg[3], sse_arg[4], sse_arg[5], sse_arg[6]);
			break;
		}
		
		case 8:
		{
			write_all_regs8(int_arg[0], int_arg[1], int_arg[2], int_arg[3],
			 int_arg[4], int_arg[5], sse_arg[0], sse_arg[1], sse_arg[2],
			 sse_arg[3], sse_arg[4], sse_arg[5], sse_arg[6], sse_arg[7]);
			break;
		}
	}
		
	/* Call it */
	func();
	
	/*
	 * Enforce existence of stack after the function call,
	 * and thus during it as well.
	 */
	((volatile char *)stack)[0] = 1;
}
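
The first scanning pass of call_CABIfuncvoid() can be pulled out into a stand-alone sketch that only counts bytes. It is a simplification: each register-class piece is assumed to take one eight-byte stack slot once the registers run out, and memory-class pieces are rounded up to a multiple of eight. The names arg_class, arg_piece and stack_bytes_needed are made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>

enum arg_class { ARG_INTEGER, ARG_SSE, ARG_MEMORY };

struct arg_piece { enum arg_class cls; size_t size; };

/*
 * Count the bytes that spill to the stack once the six integer and
 * eight SSE argument registers are used up.
 */
size_t stack_bytes_needed(const struct arg_piece *p, int n)
{
	int numint = 0, numsse = 0, i;
	size_t bytes = 0;

	for (i = 0; i < n; i++)
	{
		switch (p[i].cls)
		{
			case ARG_INTEGER:
				if (numint < 6) numint++;
				else bytes += 8;
				break;

			case ARG_SSE:
				if (numsse < 8) numsse++;
				else bytes += 8;
				break;

			case ARG_MEMORY:
				/* Round up to a whole number of eight-byte slots */
				bytes += (p[i].size + 7) & ~(size_t)7;
				break;
		}
	}

	return bytes;
}
```

With seven integer arguments and one double, only the seventh integer spills, so eight bytes of stack are needed.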

Handling a function that returns a value is slightly more complex. For example, if the return value is larger than sixteen bytes, a hidden first parameter is passed that is a pointer to a location to store it. The rest of the logic is the same as the above provided that the argument list is altered to reflect this new pointer.


/* The argument is returned via pointer in rdi */
static void call_CABIfunc_sreturn(void (*func)(void), void *retval, int argc,
 void **args, struct type_struct **argt)
{
	int i;
	void *newargs[argc + 1];
	struct type_struct *newargt[argc + 1];
		
	for (i = 0; i < argc; i++)
	{
		newargs[i + 1] = args[i];
		newargt[i + 1] = argt[i];
	}
		
	/*
	 * The first argument is the struct return pointer. Each args[]
	 * entry points at the value to pass, so store the address of retval.
	 */
	newargs[0] = &retval;
	newargt[0] = type_ptr;
	
	call_CABIfuncvoid(func, argc + 1, newargs, newargt);
}
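
The hidden pointer is easier to picture when written out by hand. The sketch below places a by-value return next to roughly what the ABI turns it into. The type big and both function names are made up for this illustration, and the real calling convention has further details that are ignored here.

```c
#include <assert.h>
#include <string.h>

/* 32 bytes on LP64 systems: too large for a pair of registers */
struct big { long v[4]; };

/* What the programmer writes: return by value. */
struct big make_big(long seed)
{
	struct big b = {{seed, seed + 1, seed + 2, seed + 3}};
	return b;
}

/*
 * Roughly the lowered form: the caller supplies the storage and
 * passes its address as an extra leading argument.
 */
void make_big_explicit(struct big *ret, long seed)
{
	ret->v[0] = seed;
	ret->v[1] = seed + 1;
	ret->v[2] = seed + 2;
	ret->v[3] = seed + 3;
}
```

Both forms fill the caller's object with the same bytes, which is why prepending one pointer argument is enough to reuse the void-returning call path.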

Other cases may be enumerated, where we need to handle each possible combination of integer, SSE and x87 floating point stack registers as a return value. By using memcpy() we can copy into the object pointed to by the return argument pointer. (A simple store will not work because, for example, a char return will appear in %rax, which is eight times as large. We don't want to stomp on other data with a write that is too large.)


/* Same as call_CABIfuncvoid(), but need to handle return parameter */
void call_CABIfunc(void (*func)(void), void *retval,
 struct type_struct *rettype, int argc, void **args, struct type_struct **argt)
{
	int numint = 0;
	int numsse = 0;
	int numst = 0;
	
	int i;
	
	/* Too big to return in a pair of registers? */
	if (rettype->struct_size > 16)
	{
		call_CABIfunc_sreturn(func, retval, argc, args, argt);
		return;
	}
	
	for (i = 0; i < rettype->cnum; i++)
	{
		switch (rettype->class[i])
		{
			case C_MEMORY:
			{
				/* Need to pass a hidden return pointer to the memory */
				call_CABIfunc_sreturn(func, retval, argc, args, argt);
				return;
			}

			case C_X87:
			case C_COMPLEX_X87:
			{
				numst++;
			}

			case C_INTEGER:
			{
				numint++;
				break;
			}

			case C_SSE:
			{
				numsse++;
				break;
			}
		}
	}
	
	/* Complex return */
	if (numst == 2)
	{
		complex long double out;
		call_CABIfuncvoid(func, argc, args, argt);
		out = read_st01();

		memcpy(retval, &out, sizeof(out));
		return;
	}

	/* int128 return */
	if (numint == 2)
	{
		__uint128_t out;
		call_CABIfuncvoid(func, argc, args, argt);
		out = read_rax_rdx();
		memcpy(retval, &out, rettype->struct_size);
		return;
	}

	/* 2xSSE return */
	if (numsse == 2)
	{
		__m128 out[2];
		call_CABIfuncvoid(func, argc, args, argt);
		out[0] = read_xmm0();
		out[1] = read_xmm1();

		memcpy(retval, out, rettype->struct_size);
		return;
	}

	if (numst == 1)
	{
		if (numint == 1)
		{
			/* long double and integer register returned */
			unsigned long long out1;
			long double out2;
			call_CABIfuncvoid(func, argc, args, argt);
			out1 = read_rax();
			out2 = read_st0();

			if (rettype->class[0] == C_INTEGER)
			{
				memcpy(retval, &out1, rettype->size[0]);
				memcpy((char *)retval + rettype->offset[1], &out2,
				 rettype->size[1]);
			}
			else
			{
				memcpy(retval, &out2, rettype->size[0]);
				memcpy((char *)retval + rettype->offset[1], &out1,
				 rettype->size[1]);
			}

			return;
		}

		if (numsse == 1)
		{
			/* long double and sse register returned */
			__m128 out1;
			long double out2;
			call_CABIfuncvoid(func, argc, args, argt);
			out1 = read_xmm0();
			out2 = read_st0();

			if (rettype->class[0] == C_SSE)
			{
				memcpy(retval, &out1, rettype->size[0]);
				memcpy((char *)retval + rettype->offset[1], &out2,
				 rettype->size[1]);
			}
			else
			{
				memcpy(retval, &out2, rettype->size[0]);
				memcpy((char *)retval + rettype->offset[1], &out1,
				 rettype->size[1]);
			}

			return;
		}
		else
		{
			/* Just one long double returned */
			long double out;
			call_CABIfuncvoid(func, argc, args, argt);
			out = read_st0();

			memcpy(retval, &out, sizeof(out));
			return;
		}
	}

	if (numsse == 1)
	{
		if (numint == 1)
		{
			/* sse and integer registers returned */
			unsigned long long out1;
			__m128 out2;

			call_CABIfuncvoid(func, argc, args, argt);
			out1 = read_rax();
			out2 = read_xmm0();

			if (rettype->class[0] == C_INTEGER)
			{
				memcpy(retval, &out1, rettype->size[0]);
				memcpy((char *)retval + rettype->offset[1], &out2,
				 rettype->size[1]);
			}
			else
			{
				memcpy(retval, &out2, rettype->size[0]);
				memcpy((char *)retval + rettype->offset[1], &out1,
				 rettype->size[1]);
			}
		}
		else
		{
			/* Just one sse register returned */
			__m128 out;
			call_CABIfuncvoid(func, argc, args, argt);
			out = read_xmm0();

			memcpy(retval, &out, rettype->struct_size);
		}
	}
	else
	{
		/* Just an integer register */
		unsigned long long out;
		call_CABIfuncvoid(func, argc, args, argt);
		out = read_rax();

		memcpy(retval, &out, rettype->struct_size);
	}
}
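
The reason for using memcpy() with the exact size can be shown in isolation. In the sketch below, store_result is a hypothetical helper that copies only the low bytes of a 64-bit register image, assuming a little-endian machine such as AMD64.

```c
#include <assert.h>
#include <string.h>

/*
 * Copy only the low 'size' bytes of a 64-bit register image into the
 * destination. Assumes little-endian byte order, where the low-order
 * bytes come first in memory.
 */
void store_result(void *dst, unsigned long long reg, size_t size)
{
	if (size > sizeof reg) size = sizeof reg;
	memcpy(dst, &reg, size);
}
```

Storing a one-byte result into the middle of a buffer leaves the neighbouring bytes untouched, whereas writing the whole register through a cast pointer would overwrite up to seven of them.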

Conclusion

The result is an FFI library written purely in C. We've used undefined and implementation defined behaviour to set the required registers to get argument passing to work. Such an implementation isn't as efficient as the assembly based version in libffi, but shows how powerful the C programming language actually is. The fact that it compiles down to a predictable set of assembly instructions allows us to use it in unintended ways.

Finally, here is some example code using the above C to C ffi library. (A .tar.gz of the library code is in the downloads directory.)


#include <stdio.h>
#include <stddef.h>
#include "ffi.h"


long double plus(long double x, long double y)
{
	printf("x is %Lf\n", x);
	printf("y is %Lf\n", y);
	return x + y;
}

struct stest
{
	double x;
	int y;
	int z;
};

int show_struct(struct stest st)
{
	printf("Struct passed as %f, %d, %d\n", st.x, st.y, st.z);
	
	return st.y + st.z;
}


int main(void)
{
	long double x, y, z;
	
	int out;
	int counts[] = {1,1,1};
	int offsets[] = {0, offsetof(struct stest, y), offsetof(struct stest, z)};
	
	
	struct stest st = {8.0, -3, 7};
	
	void *args1[] = {&x, &y};
	void *args2[] = {&st};
	
	struct type_struct *ts1[] = {type_longdouble, type_longdouble};
	struct type_struct *ts2[] = {type_double, type_int, type_int};
	struct type_struct *ts3;
	
	x = 2;
	y = 7;
	
	call_CABIfunc((void (*)(void)) plus, &z, type_longdouble, 2, args1, ts1);
	
	printf("Got %Lf\n", z);
	
	ts3 = type_struct_create(3, ts2, counts, offsets);
	
	call_CABIfunc((void (*)(void)) show_struct, &out, type_int, 1, args2, &ts3);
	
	printf("Returned %d\n", out);
	
	free_type_struct(ts3);
	
	return 0;
}

Copyright © Lockless Inc All Rights Reserved.