Lockless Inc

Pthreads on Microsoft Windows

An extremely common API used for developing parallel programs is the Posix Threads API (pthreads). The API contains many synchronization primitives that allow threaded code to be efficiently written. Unfortunately, Microsoft Windows does not support this interface as-is. Thus if one wishes to port over an application, quite a bit of work may need to be done.

Fortunately, a pthreads library for windows has been written, thus simplifying the porting effort. However, the windows API has progressed significantly with regards to threading since that library was written. Many new functions have been exported that simplify the creation of a pthread library on windows. In fact nearly all of the synchronization primitives now exist, and using them may only require the creation of a few simple macros.

Thus it seems to be time to explore the creation of a new pthreads library for Microsoft windows. To make its use as simple as possible, we will require its entire implementation to be confined to a single header .h file. Thus requiring no explicit library to be linked into an application or dll. This trick can be done if all its global variables are implicitly defined to be zero, and all functions are static.

Finally, we note that whilst many of the synchronization primitives required by the pthreads API are exported by Microsoft windows, not all are. Thus we will need to explore some of the undocumented internals of windows. This is obviously dangerous, as Microsoft may change these undocumented features at any time in the future. However, as a educational learning exercise, we can ignore this unpalatable fact and see exactly how much we can get away with.

Critical Sections for Mutexes

The first part we shall implement are the functions for pthread_mutex_t. This can be done by using a CRITICAL_SECTION object and a typedef. This may not be the most efficient mutex, but it is extremely portable on Microsoft windows. It allows you to use the resulting pthread API on any mutex, even those defined in other libraries. Since the pthread API extends the windows one, this is rather nice.

Most of the mutex functions are simple wrappers around the windows counterparts:


typedef CRITICAL_SECTION pthread_mutex_t;
static int pthread_mutex_lock(pthread_mutex_t *m)
{
	EnterCriticalSection(m);
	return 0;
}

static int pthread_mutex_unlock(pthread_mutex_t *m)
{
	LeaveCriticalSection(m);
	return 0;
}
	
static int pthread_mutex_trylock(pthread_mutex_t *m)
{
	return TryEnterCriticalSection(m) ? 0 : EBUSY; 
}

static int pthread_mutex_init(pthread_mutex_t *m, pthread_mutexattr_t *a)
{
	(void) a;
	InitializeCriticalSection(m);
	
	return 0;
}

static int pthread_mutex_destroy(pthread_mutex_t *m)
{
	DeleteCriticalSection(m);
	return 0;
}

The pthreads API has an initialization macro that has no correspondence to anything in the windows API. By investigating the internal definition of the critical section type, one may work out how to initialize one without calling InitializeCriticalSection(). The trick here is that InitializeCriticalSection() is not allowed to fail. It tries to allocate a critical section debug object, but if no memory is available, it sets the pointer to a specific value. (One would expect that value to be NULL, but it is actually (void *)-1 for some reason.) Thus we can use this special value for that pointer, and the critical section code will work.

The other important part of the critical section type to initialize is the number of waiters. This controls whether or not the mutex is locked. Fortunately, this part of the critical section is unlikely to change. Apparently, many programs already test critical sections to see if they are locked using this value, so Microsoft felt that it was necessary to keep it set at -1 for an unlocked critical section, even when they changed the underlying algorithm to be more scalable. The final parts of the critical section object are unimportant, and can be set to zero for their defaults. This yields an initialization macro:


#define PTHREAD_MUTEX_INITIALIZER {(void*)-1,-1,0,0,0,0}

The next part of the pthread_mutex_t API is the pthread_mutex_timedlock() function. This also has no correspondence to something exported by windows. A second problem is that the Posix function expects a struct timespec object. Unfortunately, no such type is defined in windows headers, so we'll need to define it, and some helper functions that use it.


struct timespec
{
	/* long long in windows is the same as long in unix for 64bit */
	long long tv_sec;
	long long tv_nsec;
};

static unsigned long long _pthread_time_in_ms(void)
{
	struct __timeb64 tb;
	
	_ftime64(&tb);
	
	return tb.time * 1000 + tb.millitm;
}

static unsigned long long _pthread_time_in_ms_from_timespec(const struct timespec *ts)
{
	unsigned long long t = ts->tv_sec * 1000;
	t += ts->tv_nsec / 1000000;

	return t;
}

static unsigned long long _pthread_rel_time_in_ms(const struct timespec *ts)
{
	unsigned long long t1 = _pthread_time_in_ms_from_timespec(ts);
	unsigned long long t2 = _pthread_time_in_ms();
	
	/* Prevent underflow */
	if (t1 < t2) return 1;
	return t1 - t2;
}

Using the above helper functions, we can implement pthread_mutex_timedlock using WaitForSingleObject() and pthread_mutex_trylock()


#define ETIMEDOUT	110
static int pthread_mutex_timedlock(pthread_mutex_t *m, struct timespec *ts)
{
	unsigned long long t, ct;
	
	struct _pthread_crit_t
	{
		void *debug;
		LONG count;
		LONG r_count;
		HANDLE owner;
		HANDLE sem;
		ULONG_PTR spin;
	};
	
	/* Try to lock it without waiting */
	if (!pthread_mutex_trylock(m)) return 0;
	
	ct = _pthread_time_in_ms();
	t = _pthread_time_in_ms_from_timespec(ts);
	
	while (1)
	{
		/* Have we waited long enough? */
		if (ct > t) return ETIMEDOUT;
		
		/* Wait on semaphore within critical section */
		WaitForSingleObject(((struct _pthread_crit_t *)m)->sem, t - ct);
		
		/* Try to grab lock */
		if (!pthread_mutex_trylock(m)) return 0;
		
		/* Get current time */
		ct = _pthread_time_in_ms();
	}
}

This almost completes the mutex API. However, to be fully complete we need to support the pthread_mutexattr_t functions. These affect the type of mutex created by pthread_mutex_init(). However, since Microsoft windows only supports one type of critical section, we will ignore most of this functionality. Instead, these simple wrapper functions will just record the state so that they give consistent results.


#define PTHREAD_MUTEX_NORMAL 0
#define PTHREAD_MUTEX_ERRORCHECK 1
#define PTHREAD_MUTEX_RECURSIVE 2
#define PTHREAD_MUTEX_DEFAULT 3
#define PTHREAD_MUTEX_SHARED 4
#define PTHREAD_MUTEX_PRIVATE 0

#define ENOTSUP		134

#define pthread_mutex_getprioceiling(M, P) ENOTSUP
#define pthread_mutex_setprioceiling(M, P) ENOTSUP

static int pthread_mutexattr_init(pthread_mutexattr_t *a)
{
	*a = 0;
	return 0;
}

static int pthread_mutexattr_destroy(pthread_mutexattr_t *a)
{
	(void) a;
	return 0;
}

static int pthread_mutexattr_gettype(pthread_mutexattr_t *a, int *type)
{
	*type = *a & 3;

	return 0;
}

static int pthread_mutexattr_settype(pthread_mutexattr_t *a, int type)
{
	if ((unsigned) type > 3) return EINVAL;
	*a &= ~3;
	*a |= type;
	
	return 0;
}

static int pthread_mutexattr_getpshared(pthread_mutexattr_t *a, int *type)
{
	*type = *a & 4;
	
	return 0;
}

static int pthread_mutexattr_setpshared(pthread_mutexattr_t * a, int type)
{
	if ((type & 4) != type) return EINVAL;
	
	*a &= ~4;
	*a |= type;
	
	return 0;
}

static int pthread_mutexattr_getprotocol(pthread_mutexattr_t *a, int *type)
{
	*type = *a & (8 + 16);
	
	return 0;
}

static int pthread_mutexattr_setprotocol(pthread_mutexattr_t *a, int type)
{
	if ((type & (8 + 16)) != 8 + 16) return EINVAL;
	
	*a &= ~(8 + 16);
	*a |= type;
	
	return 0;
}

static int pthread_mutexattr_getprioceiling(pthread_mutexattr_t *a, int * prio)
{
	*prio = *a / PTHREAD_PRIO_MULT;
	return 0;
}

static int pthread_mutexattr_setprioceiling(pthread_mutexattr_t *a, int prio)
{
	*a &= (PTHREAD_PRIO_MULT - 1);
	*a += prio * PTHREAD_PRIO_MULT;
	
	return 0;
}

Slim Read Write Locks for rwlocks

Since the earlier pthreads implementation on windows, Microsoft has added Slim Read Write locks (SRWlocks). These allow a simple implementation of much of the pthread_rwlock_t API. However, again the Microsoft API is a subset of the Posix API, so we will again need to explore the undocumented internals to construct the missing functionality.

Using wrappers for the functions that already exist, we can implement:


typedef SRWLOCK pthread_rwlock_t;
static int pthread_rwlock_init(pthread_rwlock_t *l, pthread_rwlockattr_t *a)
{
	(void) a;
	InitializeSRWLock(l);
	
	return 0;
}

static int pthread_rwlock_destroy(pthread_rwlock_t *l)
{
	(void) *l;
	return 0;
}

static int pthread_rwlock_rdlock(pthread_rwlock_t *l)
{
	pthread_testcancel();
	AcquireSRWLockShared(l);
	
	return 0;
}

static int pthread_rwlock_wrlock(pthread_rwlock_t *l)
{
	pthread_testcancel();
	AcquireSRWLockExclusive(l);
	
	return 0;
}

Where we have added explicit calls to pthread_testcancel() mandated by the Posix API. Next, we need to work out an initialization macro, together with a way of implementing the trylock and unlock functionality. The problem here is that the Posix API requires a single unlock function, whereas the Microsoft windows API has separate unlock functions for read and write locks. We'll need to understand the lock internal state to work out whether we are a read or write lock in order to work out which unlock function to call.

By looking at the Microsoft documentation, we notice that the implementation of a SRWLock is a single pointer sized object. The InitializeSRWLock() function simply sets this pointer to zero. Thus, an initialization macro may be written as:


#define PTHREAD_RWLOCK_INITIALIZER {0}

Next, we can construct simple programs to see how the state of this pointer-sized object changes as we read and write lock and unlock it. The first thing that is noticeable is that a SRWLock that is owned exclusively has the value 1, and that if multiple shared owners have taken the lock, then it has value 1+16n, where n is the number of shared owners. If there is contention, then the lock has the value of a pointer with the low bit set.

Thus, if the low bit is set, then the lock is owned by someone. If any other of the bottom three bits are set, then the lock has some internal state. Using this, we can construct the trylock functions which are not implemented by Microsoft.


static int pthread_rwlock_tryrdlock(pthread_rwlock_t *l)
{
	/* Get the current state of the lock */
	void *state = *(void **) l;
	
	if (!state)
	{
		/* Unlocked to locked */
		if (!_InterlockedCompareExchangePointer((void *) l, (void *)0x11, NULL)) return 0;
		return EBUSY;
	}
	
	/* A single writer exists */
	if (state == (void *) 1) return EBUSY;
	
	/* Multiple writers exist? */
	if ((uintptr_t) state & 14) return EBUSY;
	
	if (_InterlockedCompareExchangePointer((void *) l, (void *) ((uintptr_t)state + 16), state) == state) return 0;
	
	return EBUSY;
}

static int pthread_rwlock_trywrlock(pthread_rwlock_t *l)
{
	/* Try to grab lock if it has no users */
	if (!_InterlockedCompareExchangePointer((void *) l, (void *)1, NULL)) return 0;
	
	return EBUSY;
}

Next we need to implement pthread_rwlock_unlock. This is a little tricky. Unfortunately, it doesn't seem that there is an easy way to determine if the lock is owned by a reader or a writer. We have some known special cases. If the lock value is equal to 1, then it is owned by a single writer - so we can write unlock it. If the lock value is equal to 1 + 16n, with n>1 then we must be a shared reader trying to unlock it. If the lock is contended, and there is a list of threads waiting for it, then we need to know who to wake up.

Unfortunately, since the internals of a SRWLock are undocumented we are a little stuck. By exploring the implementation in asm we can see what is going on. Basically, the unlock routines need to check if they are the simple cases described above. If they aren't, and there are waiters, then more complex code is called to wake the required waiter. Fortunately, it seems that the code to wake the next waiter is extremely similar between the shared and exclusive cases. The shared (reader) case appears to be more generic, so we will use that. Testing the resulting function seems to pass. However, this is a little bit of a hack. Unlocking a contended exclusive lock with the shared unlock function may not work in the future. Ignoring that for now, we have:


static int pthread_rwlock_unlock(pthread_rwlock_t *l)
{
	void *state = *(void **)l;
	
	if (state == (void *) 1)
	{
		/* Known to be an exclusive lock */
		ReleaseSRWLockExclusive(l);
	}
	else
	{
		/* A shared unlock will work */
		ReleaseSRWLockShared(l);
	}
	
	return 0;
}

Finally, we need to implement the timedlock functionality. Again, Microsoft windows doesn't have any timeout functions we can use. Even worse, the underlying wait is in an undocumented function NtWaitForKeyedEvent(), so we probably can't do the trick we did with critical sections. Instead, we will implement a busy wait. This isn't optimal, but until Microsoft describes its new interfaces it is free to alter them at will, so depending on them is extremely risky.


static int pthread_rwlock_timedrdlock(pthread_rwlock_t *l, const struct timespec *ts)
{
	unsigned long long ct = _pthread_time_in_ms();
	unsigned long long t = _pthread_time_in_ms_from_timespec(ts);

	pthread_testcancel();
	
	/* Use a busy-loop */
	while (1)
	{
		/* Try to grab lock */
		if (!pthread_rwlock_tryrdlock(l)) return 0;
		
		/* Get current time */
		ct = _pthread_time_in_ms();
		
		/* Have we waited long enough? */
		if (ct > t) return ETIMEDOUT;
	}
}

static int pthread_rwlock_timedwrlock(pthread_rwlock_t *l, const struct timespec *ts)
{
	unsigned long long ct = _pthread_time_in_ms();
	unsigned long long t = _pthread_time_in_ms_from_timespec(ts);

	pthread_testcancel();
	
	/* Use a busy-loop */
	while (1)
	{
		/* Try to grab lock */
		if (!pthread_rwlock_trywrlock(l)) return 0;
		
		/* Get current time */
		ct = _pthread_time_in_ms();
		
		/* Have we waited long enough? */
		if (ct > t) return ETIMEDOUT;
	}
}

The only thing left for a complete implementation of the rwlock API are the attributes for pthread_rwlock_init(). Again, since Microsoft only has one type of read/write lock, these don't need to do anything, and can be implemented as simple wrapper functions.


typedef int pthread_rwlockattr_t;
sta

Comments

sfuerst said...
The get and set concurrency functions had an extra underscore in their name. This has now been fixed in the article, and in the downloadable header file.
Borislav Trifonov said...
The asynchronous cancellation trick is nice, but how do we guarantee C++ destructors are still called for the thread's automatic objects?
said...
Great licencing, for a very useful library! Now it does not compile with vc10 for a C++ project.
sfuerst said...
There was a bug in implementation of pthread_barrier_destroy(). It should wait until all threads have left the barrier before returning. This has been fixed.
thomas said...
I would like to try this in place of the pthread-win32 library, which seems very slow in comparison to pthreads running on other OS.

But including the winpthreads.h in a vs2008 c++ file produces lots of errors.

A small sample:

 error C2440: 'initializing' : cannot convert from 'void (__cdecl *)(pthread_once_t *)' to 'void (__cdecl *)(void *)'
1> None of the functions with this name in scope match the target type
1>c:\blomzlab\zbslib\winpthreads.h(366) : error C3861: 'pthread_rwlock_unlock': identifier not found
1>c:\blomzlab\zbslib\winpthreads.h(381) : error C2440: '=' : cannot convert from 'LPVOID' to 'pthread_t'
1> Conversion from 'void*' to pointer to non-'void' requires an explicit cast
1>c:\blomzlab\zbslib\winpthreads.h(386) : error C2440: '=' : cannot convert from 'void *' to 'pthread_t'
1> Conversion from 'void*' to pointer to non-'void' requires an explicit cast
1>c:\blomzlab\zbslib\winpthreads.h(409) : error C3861: '_endthreadex': identifier not found
1>c:\blomzlab\zbslib\winpthreads.h(443) : error C3861: '_InterlockedCompareExchangePointer': identifier not found


Is this expected to work?
tjm said...
Since keys are reused in a circular fashion, and nothing is done in pthread_key_create() or pthread_key_delete() to clear the TLS slots of existing threads, doesn't this implementation violate the rule that: "Upon key creation, the value NULL shall be associated with the new key in all active threads"?

jsestrad said...
Thank you for this! It helps me get out of a slump of confusion! However, I am getting errors in windows vc++ 2010 as well. Will this code not work because microsfot changed some things? Here are the errors:

1>c:\users\jsestrad\documents\porttest\deleteheaderfiles\winpthread.h(214): error C2440: &apos;initializing&apos; : cannot convert from &apos;void (__cdecl *)(pthread_once_t *)&apos; to &apos;void (__cdecl *)(void *)&apos;
1> None of the functions with this name in scope match the target type
1>c:\users\jsestrad\documents\porttest\deleteheaderfiles\winpthread.h(370): error C3861: &apos;pthread_rwlock_unlock&apos;: identifier not found
1>c:\users\jsestrad\documents\porttest\deleteheaderfiles\winpthread.h(385): error C2440: &apos;=&apos; : cannot convert from &apos;LPVOID&apos; to &apos;pthread_t&apos;
1> Conversion from &apos;void*&apos; to pointer to non-&apos;void&apos; requires an explicit cast
1>c:\users\jsestrad\documents\porttest\deleteheaderfiles\winpthread.h(390): error C2440: &apos;=&apos; : cannot convert from &apos;void *&apos; to &apos;pthread_t&apos;
1> Conversion from &apos;void*&apos; to pointer to non-&apos;void&apos; requires an explicit cast
1>c:\users\jsestrad\documents\porttest\deleteheaderfiles\winpthread.h(413): error C3861: &apos;_endthreadex&apos;: identifier not found
1>c:\users\jsestrad\documents\porttest\deleteheaderfiles\winpthread.h(447): error C3861: &apos;_InterlockedCompareExchangePointer&apos;: identifier not found
1>c:\users\jsestrad\documents\porttest\deleteheaderfiles\winpthread.h(457): error C3861: &apos;_InterlockedCompareExchangePointer&apos;: identifier not found
1>c:\users\jsestrad\documents\porttest\deleteheaderfiles\winpthread.h(465): error C3861: &apos;_InterlockedCompareExchangePointer&apos;: identifier not found
1>c:\users\jsestrad\documents\porttest\deleteheaderfiles\winpthread.h(757): error C2440: &apos;initializing&apos; : cannot convert from &apos;void *&apos; to &apos;_pthread_v *&apos;
1> Conversion from &apos;void*&apos; to pointer to non-&apos;void&apos; requires an explicit cast
1>c:\users\jsestrad\documents\porttest\deleteheaderfiles\winpthread.h(788): error C2440: &apos;initializing&apos; : cannot convert from &apos;void *&apos; to &apos;_pthread_v *&apos;
1> Conversion from &apos;void*&apos; to pointer to non-&apos;void&apos; requires an explicit cast
1>c:\users\jsestrad\documents\porttest\deleteheaderfiles\winpthread.h(814): error C3861: &apos;_beginthreadex&apos;: identifier not found
1>c:\users\jsestrad\documents\porttest\deleteheaderfiles\winpthread.h(964): warning C4244: &apos;argument&apos; : conversion from &apos;unsigned __int64&apos; to &apos;DWORD&apos;, possible loss of data
1>c:\users\jsestrad\documents\porttest\deleteheaderfiles\winpthread.h(1137): error C2440: &apos;=&apos; : cannot convert from &apos;void *&apos; to &apos;void (__cdecl **)(void *)&apos;
1> Conversion from &apos;void*&apos; to pointer to non-&apos;void&apos; requires an explicit cast
1>c:\users\jsestrad\documents\porttest\deleteheaderfiles\winpthread.h(1170): warning C4018: &apos;>&apos; : signed/unsigned mismatch
1>c:\users\jsestrad\documents\porttest\deleteheaderfiles\winpthread.h(1177): warning C4018: &apos;>&apos; : signed/unsigned mismatch
1>c:\users\jsestrad\documents\porttest\deleteheaderfiles\winpthread.h(1188): warning C4018: &apos;>=&apos; : signed/unsigned mismatch
1>c:\users\jsestrad\documents\porttest\deleteheaderfiles\winpthread.h(1198): warning C4018: &apos;>&apos; : signed/unsigned mismatch
1>c:\users\jsestrad\documents\porttest\deleteheaderfiles\winpthread.h(1201): error C2440: &apos;initializing&apos; : cannot convert from &apos;void *&apos; to &apos;void **&apos;
1> Conversion from &apos;void*&apos; to pointer to non-&apos;void&apos; requires an explicit cast
douard BERG said...
using windows mutex is really faster than using critical section, the wrapper is easy as the critical section wrapper. Here is an example:

#define pthread_mutex_t HANDLE
int pthread_mutex_init(pthread_mutex_t *mutex)
{
        char mutex_addr[16+1];
        /* using hex address of mutex as string id */
        sprintf(mutex_addr,"%LX",mutex);
        *mutex=CreateMutex(NULL,FALSE,mutex_addr);
        if (*mutex==NULL) return 1; else return 0;
}
int pthread_mutex_lock(pthread_mutex_t *mutex)
{
        WaitForSingleObject(*mutex,INFINITE);
        return 0;
}
int pthread_mutex_trylock(pthread_mutex_t *mutex)
{
        phtread_mutex_t local_mutex;
        char mutex_addr[16+1];
        sprintf(mutex_addr,"%LX",mutex);
        local_mutex=OpenMutex(MUTEX_ALL_ACCESS,FALSE,mutex_addr);
        if (local_mutex==NULL) return 1; else return 0;
}
int pthread_mutex_unlock(pthread_mutex_t *mutex)
{
        ReleaseMutex(*mutex);
        return 0;
}




Edouard BERGE said...
the lock function was unfinished...

int pthread_mutex_lock(pthread_mutex_t *mutex)
{
        char mutex_addr[16+1];
        sprintf(mutex_addr,"%LX",mutex);
        while (OpenMutex(MUTEX_ALL_ACCESS,FALSE,mutex_addr)==NULL) {
                WaitForSingleObject(*mutex,INFINITE);
        }
        return 0;
}


but instead of Linux pthread usage, you MUST initialise the mutex before locking it, this can be solve with a bigger management...
sfuerst said...
thomas: The casting rules for C++ are different than C. You are trying to compile as C++, and this causes the header to fail. If you replace with reinterpret_cast<>(), it should work. (However, that would then break the C implementation.)

tjm: It is difficult to efficiently fix this, other than leaking old keys. I suppose there could be a list maintained of active threads, whose descriptors could be scanned under some lock... Try using the updated version of the library in mingw64. They may have fixed this.

jsestrad: Yeah, Microsoft broke their headers. They stopped defining the _InterlockedCompareExchangePointer() intrinsic, even though their documentation still mentions it. Try using the version without a leading underscore, or manually using the compare-exchange of the correct size for your pointers.

Frediano: The PTHREAD_MUTEX_INITIALIZER has been tested to work with Windows Vista. Earlier versions won't work. Later ones may not either. Microsoft can change their implementation of the critical section at any time.
said...
Enter your comments here
Alexey Melnichuk said...
pthread_kill(t,0) could be usable to detect working of thread.
may be should implement it as WaitSingleObject with zero timeout?

PS. This implementation works since Windows Vista.
wut said...
@Borislav Trifonov: pthread cancellation and C++ exceptions are entirely unrelated. Some implementations somehow make pthread cancellation run C++ stack cleanup functions, but AFAIK this isn't required.
said...
Enter your comments here
Davy said...
pthread_mutexattr_t is undefined ?

Enter the 10 characters above here


Name
About Us Returns Policy Privacy Policy Send us Feedback
Company Info | Product Index | Category Index | Help | Terms of Use
Copyright © Lockless Inc All Rights Reserved.