This file documents non-portable functions and other issues.

Non-portable functions included in pthreads-win32
-------------------------------------------------

BOOL
pthread_win32_test_features_np(int mask)

	This routine allows an application to check which
	run-time auto-detected features are available within
	the library.

	The possible features are:

		PTW32_SYSTEM_INTERLOCKED_COMPARE_EXCHANGE
			Return TRUE if the native version of
			InterlockedCompareExchange() is being used.
			This feature is not meaningful in recent
			library versions as MSVC builds only support
			system implemented ICE. Note that all MinGW
			builds use inlined asm versions of all the
			Interlocked routines.
		PTW32_ALERTABLE_ASYNC_CANCEL
			Return TRUE if the QueueUserAPCEx package
			QUSEREX.DLL is available and the AlertDrv.sys
			driver is loaded into Windows, providing
			alertable (pre-emptive) asynchronous thread
			cancellation. If this feature returns FALSE
			then the default async cancel scheme is in
			use, which cannot cancel blocked threads.

	Features may be OR'ed into the mask parameter, in which case
	the routine returns TRUE if any of the OR'ed features would
	return TRUE. At this stage it doesn't make sense to OR features
	but it may some day.


void *
pthread_timechange_handler_np(void *)

        This routine improves tolerance of system clock changes
        initiated by an operator or a time service.

        It can be called by an application when it
        receives a WM_TIMECHANGE message from the system. At
        present it broadcasts all condition variables so that
        waiting threads can wake up and re-evaluate their
        conditions and restart their timed waits if required.

        It has the same return type and argument type as a
        thread routine so that it may be called directly
        through pthread_create(), i.e. as a separate thread.

        Parameters

        Although a parameter must be supplied, it is ignored.
        The value NULL can be used.

        Return values

        It returns EAGAIN to indicate that not all condition
        variables were broadcast for some reason.
        Otherwise, 0 is returned.

        If run as a thread, the return value is returned
        through pthread_join().

        The return value should be cast to an integer.
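
        As a minimal sketch of that cast, the code below uses a
        hypothetical stand-in, fake_timechange_handler(), with the same
        signature as pthread_timechange_handler_np() so that it compiles
        without the library. It assumes the int result travels inside the
        void * return value, as is usual for thread routines.
        handler_result() is also a name invented for this sketch.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in with the same signature as
 * pthread_timechange_handler_np(). It pretends that some condition
 * variables could not be broadcast. */
static void *fake_timechange_handler(void *arg)
{
    (void) arg;                         /* the parameter is ignored */
    return (void *) (intptr_t) EAGAIN;
}

/* Convert a thread-routine style return value back to an int result.
 * Going through intptr_t avoids pointer-to-int size warnings. */
static int handler_result(void *ret)
{
    return (int) (intptr_t) ret;
}
```

        The same conversion would apply to the value that pthread_join()
        stores when the handler is run as a separate thread.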


HANDLE
pthread_getw32threadhandle_np(pthread_t thread);

	Returns the win32 thread handle that the POSIX
	thread "thread" is running as.

	Applications can use the win32 handle to set
	win32 specific attributes of the thread.

DWORD
pthread_getw32threadid_np (pthread_t thread)

	Returns the Windows native thread ID that the POSIX
	thread "thread" is running as.

        The result is only valid when the library is built with
        ! (defined(__MINGW64__) || defined(__MINGW32__)) || defined (__MSVCRT__) || defined (__DMC__)
        true; otherwise the routine returns 0.


int
pthread_mutexattr_setkind_np(pthread_mutexattr_t * attr, int kind)

int
pthread_mutexattr_getkind_np(pthread_mutexattr_t * attr, int *kind)

        These two routines are included for Linux compatibility
        and are direct equivalents to the standard routines
                pthread_mutexattr_settype
                pthread_mutexattr_gettype

        pthread_mutexattr_setkind_np accepts the following
        mutex kinds:
                PTHREAD_MUTEX_FAST_NP
                PTHREAD_MUTEX_ERRORCHECK_NP
                PTHREAD_MUTEX_RECURSIVE_NP

        These are equivalent to (respectively):
                PTHREAD_MUTEX_NORMAL
                PTHREAD_MUTEX_ERRORCHECK
                PTHREAD_MUTEX_RECURSIVE
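
        Since the _np routines are described as direct equivalents, a
        portable program can use the standard names. The sketch below
        does so; the helpers init_typed_mutex() and relock_result() are
        hypothetical names used only for this illustration.

```c
#define _XOPEN_SOURCE 700   /* expose the mutex type constants on strict builds */
#include <errno.h>
#include <pthread.h>

/* Create a mutex of the given standard type.
 * Returns 0 on success or an error number. */
static int init_typed_mutex(pthread_mutex_t *m, int type)
{
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);

    if (rc != 0)
        return rc;
    rc = pthread_mutexattr_settype(&attr, type);
    if (rc == 0)
        rc = pthread_mutex_init(m, &attr);
    pthread_mutexattr_destroy(&attr);
    return rc;
}

/* Lock the mutex twice from the same thread and report the result of
 * the second lock: 0 for a recursive mutex, EDEADLK for an
 * error-checking one. Returns -1 if the mutex could not be created. */
static int relock_result(int type)
{
    pthread_mutex_t m;
    int rc;

    if (init_typed_mutex(&m, type) != 0)
        return -1;
    pthread_mutex_lock(&m);
    rc = pthread_mutex_lock(&m);
    if (rc == 0)
        pthread_mutex_unlock(&m);   /* undo the second lock */
    pthread_mutex_unlock(&m);       /* undo the first lock */
    pthread_mutex_destroy(&m);
    return rc;
}
```

        PTHREAD_MUTEX_NORMAL is deliberately not exercised here because
        relocking a normal mutex from the owning thread deadlocks.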

int
pthread_delay_np (const struct timespec *interval);

        This routine causes a thread to delay execution for a specific period of time.
        This period ends at the current time plus the specified interval. The routine
        will not return before the end of the period is reached, but may return an
        arbitrary amount of time after the period has gone by. This can be due to
        system load, thread priorities, and system timer granularity.

        Specifying an interval of zero (0) seconds and zero (0) nanoseconds is
        allowed and can be used to force the thread to give up the processor or to
        deliver a pending cancellation request.

        This routine is a cancellation point.

        The timespec structure contains the following two fields:

                tv_sec is an integer number of seconds.
                tv_nsec is an integer number of nanoseconds.

        Return Values

        This routine returns an integer value indicating success or the
        type of error. Possible return values are as follows:

        0          Successful completion.
        [EINVAL]   The value specified by interval is invalid.
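
        As a rough sketch of preparing the interval argument, the
        hypothetical helpers below build a timespec from milliseconds and
        check it against the usual POSIX rule for timespec values. Which
        rule pthread_delay_np() itself applies before returning EINVAL is
        not stated above, so the check is an assumption.

```c
#include <time.h>

/* Hypothetical helper: convert milliseconds to a timespec.
 * A negative input produces a negative field, which the check
 * below rejects. */
static struct timespec interval_from_ms(long ms)
{
    struct timespec ts;

    ts.tv_sec  = ms / 1000;
    ts.tv_nsec = (ms % 1000) * 1000000L;
    return ts;
}

/* Hypothetical check, assuming the usual POSIX rule:
 * tv_sec >= 0 and 0 <= tv_nsec < 1,000,000,000. */
static int interval_is_valid(struct timespec ts)
{
    return ts.tv_sec >= 0 && ts.tv_nsec >= 0 && ts.tv_nsec < 1000000000L;
}
```

        An interval of zero seconds and zero nanoseconds passes this
        check, matching the text above.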

int
pthread_num_processors_np (void)

        This routine (found on HP-UX systems) returns the number of processors
        in the system. This implementation actually returns the number of
        processors available to the process, which can be a lower number
        than the system's number, depending on the process's affinity mask.

BOOL
pthread_win32_process_attach_np (void);

BOOL
pthread_win32_process_detach_np (void);

BOOL
pthread_win32_thread_attach_np (void);

BOOL
pthread_win32_thread_detach_np (void);

	These functions contain the code normally run via DllMain
	when the library is used as a DLL but which need to be
	called explicitly by an application when the library
	is statically linked. As of version 2.9.0 of the library, static
	builds using either MSC or GCC will call pthread_win32_process_*
	automatically at application startup and exit respectively.

	Otherwise, you will need to call pthread_win32_process_attach_np()
	before you can call any pthread routines when statically linking.
	You should call pthread_win32_process_detach_np() before
	exiting your application to clean up.

	pthread_win32_thread_attach_np() is currently a no-op, but
	pthread_win32_thread_detach_np() is needed to clean up
	the implicit pthread handle that is allocated to a Win32 thread if
	it calls any pthreads routines. Call this routine when the
	Win32 thread exits.

	Threads created through pthread_create() do not need to call
	pthread_win32_thread_detach_np().

	These functions invariably return TRUE except for
	pthread_win32_process_attach_np() which will return FALSE
	if pthreads-win32 initialisation fails.

int
pthreadCancelableWait (HANDLE waitHandle);

int
pthreadCancelableTimedWait (HANDLE waitHandle, DWORD timeout);

	These two functions provide hooks into the pthread_cancel
	mechanism that will allow you to wait on a Windows handle
	and make it a cancellation point. Both functions block
	until either the given w32 handle is signaled, or
	pthread_cancel has been called. They are implemented using
	WaitForMultipleObjects on 'waitHandle' and a manual-reset
	w32 event used to implement pthread_cancel.


Non-portable issues
-------------------

Thread priority

	POSIX defines a single contiguous range of numbers that determine a
	thread's priority. Win32 defines priority classes and priority
	levels relative to these classes. Classes are simply priority base
	levels that the defined priority levels are relative to, such that
	changing a process's priority class will change the priority of all
	of its threads, while the threads retain the same relativity to each
	other.

	A Win32 system defines a single contiguous monotonic range of values
	that define system priority levels, just like POSIX. However, Win32
	restricts individual threads to a subset of this range on a
	per-process basis.

	The following table shows the base priority levels for combinations
	of priority class and priority value in Win32.

	Base  Process Priority Class              Thread Priority Level
	-----------------------------------------------------------------------
	   1  IDLE_PRIORITY_CLASS                 THREAD_PRIORITY_IDLE
	   1  BELOW_NORMAL_PRIORITY_CLASS         THREAD_PRIORITY_IDLE
	   1  NORMAL_PRIORITY_CLASS               THREAD_PRIORITY_IDLE
	   1  ABOVE_NORMAL_PRIORITY_CLASS         THREAD_PRIORITY_IDLE
	   1  HIGH_PRIORITY_CLASS                 THREAD_PRIORITY_IDLE
	   2  IDLE_PRIORITY_CLASS                 THREAD_PRIORITY_LOWEST
	   3  IDLE_PRIORITY_CLASS                 THREAD_PRIORITY_BELOW_NORMAL
	   4  IDLE_PRIORITY_CLASS                 THREAD_PRIORITY_NORMAL
	   4  BELOW_NORMAL_PRIORITY_CLASS         THREAD_PRIORITY_LOWEST
	   5  IDLE_PRIORITY_CLASS                 THREAD_PRIORITY_ABOVE_NORMAL
	   5  BELOW_NORMAL_PRIORITY_CLASS         THREAD_PRIORITY_BELOW_NORMAL
	   5  Background NORMAL_PRIORITY_CLASS    THREAD_PRIORITY_LOWEST
	   6  IDLE_PRIORITY_CLASS                 THREAD_PRIORITY_HIGHEST
	   6  BELOW_NORMAL_PRIORITY_CLASS         THREAD_PRIORITY_NORMAL
	   6  Background NORMAL_PRIORITY_CLASS    THREAD_PRIORITY_BELOW_NORMAL
	   7  BELOW_NORMAL_PRIORITY_CLASS         THREAD_PRIORITY_ABOVE_NORMAL
	   7  Background NORMAL_PRIORITY_CLASS    THREAD_PRIORITY_NORMAL
	   7  Foreground NORMAL_PRIORITY_CLASS    THREAD_PRIORITY_LOWEST
	   8  BELOW_NORMAL_PRIORITY_CLASS         THREAD_PRIORITY_HIGHEST
	   8  NORMAL_PRIORITY_CLASS               THREAD_PRIORITY_ABOVE_NORMAL
	   8  Foreground NORMAL_PRIORITY_CLASS    THREAD_PRIORITY_BELOW_NORMAL
	   8  ABOVE_NORMAL_PRIORITY_CLASS         THREAD_PRIORITY_LOWEST
	   9  NORMAL_PRIORITY_CLASS               THREAD_PRIORITY_HIGHEST
	   9  Foreground NORMAL_PRIORITY_CLASS    THREAD_PRIORITY_NORMAL
	   9  ABOVE_NORMAL_PRIORITY_CLASS         THREAD_PRIORITY_BELOW_NORMAL
	  10  Foreground NORMAL_PRIORITY_CLASS    THREAD_PRIORITY_ABOVE_NORMAL
	  10  ABOVE_NORMAL_PRIORITY_CLASS         THREAD_PRIORITY_NORMAL
	  11  Foreground NORMAL_PRIORITY_CLASS    THREAD_PRIORITY_HIGHEST
	  11  ABOVE_NORMAL_PRIORITY_CLASS         THREAD_PRIORITY_ABOVE_NORMAL
	  11  HIGH_PRIORITY_CLASS                 THREAD_PRIORITY_LOWEST
	  12  ABOVE_NORMAL_PRIORITY_CLASS         THREAD_PRIORITY_HIGHEST
	  12  HIGH_PRIORITY_CLASS                 THREAD_PRIORITY_BELOW_NORMAL
	  13  HIGH_PRIORITY_CLASS                 THREAD_PRIORITY_NORMAL
	  14  HIGH_PRIORITY_CLASS                 THREAD_PRIORITY_ABOVE_NORMAL
	  15  HIGH_PRIORITY_CLASS                 THREAD_PRIORITY_HIGHEST
	  15  HIGH_PRIORITY_CLASS                 THREAD_PRIORITY_TIME_CRITICAL
	  15  IDLE_PRIORITY_CLASS                 THREAD_PRIORITY_TIME_CRITICAL
	  15  BELOW_NORMAL_PRIORITY_CLASS         THREAD_PRIORITY_TIME_CRITICAL
	  15  NORMAL_PRIORITY_CLASS               THREAD_PRIORITY_TIME_CRITICAL
	  15  ABOVE_NORMAL_PRIORITY_CLASS         THREAD_PRIORITY_TIME_CRITICAL
	  16  REALTIME_PRIORITY_CLASS             THREAD_PRIORITY_IDLE
	  17  REALTIME_PRIORITY_CLASS             -7
	  18  REALTIME_PRIORITY_CLASS             -6
	  19  REALTIME_PRIORITY_CLASS             -5
	  20  REALTIME_PRIORITY_CLASS             -4
	  21  REALTIME_PRIORITY_CLASS             -3
	  22  REALTIME_PRIORITY_CLASS             THREAD_PRIORITY_LOWEST
	  23  REALTIME_PRIORITY_CLASS             THREAD_PRIORITY_BELOW_NORMAL
	  24  REALTIME_PRIORITY_CLASS             THREAD_PRIORITY_NORMAL
	  25  REALTIME_PRIORITY_CLASS             THREAD_PRIORITY_ABOVE_NORMAL
	  26  REALTIME_PRIORITY_CLASS             THREAD_PRIORITY_HIGHEST
	  27  REALTIME_PRIORITY_CLASS              3
	  28  REALTIME_PRIORITY_CLASS              4
	  29  REALTIME_PRIORITY_CLASS              5
	  30  REALTIME_PRIORITY_CLASS              6
	  31  REALTIME_PRIORITY_CLASS             THREAD_PRIORITY_TIME_CRITICAL

	Windows NT:  Values -7, -6, -5, -4, -3, 3, 4, 5, and 6 are not supported.


	As you can see, the real priority levels available to any individual
	Win32 thread are non-contiguous.

	An application using pthreads-win32 should not make assumptions about
	the numbers used to represent thread priority levels, except that they
	are monotonic between the values returned by sched_get_priority_min()
	and sched_get_priority_max(). E.g. Windows 95, 98, NT, 2000, XP make
	available a non-contiguous range of numbers between -15 and 15, while
	at least one version of WinCE (3.0) defines the minimum priority
	(THREAD_PRIORITY_LOWEST) as 5, and the maximum priority
	(THREAD_PRIORITY_HIGHEST) as 1.

	Internally, pthreads-win32 maps any priority levels between
	THREAD_PRIORITY_IDLE and THREAD_PRIORITY_LOWEST to THREAD_PRIORITY_LOWEST,
	or between THREAD_PRIORITY_TIME_CRITICAL and THREAD_PRIORITY_HIGHEST to
	THREAD_PRIORITY_HIGHEST. Currently, this also applies to
	REALTIME_PRIORITY_CLASS even if levels -7, -6, -5, -4, -3, 3, 4, 5, and 6
	are supported.
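
	The mapping just described can be sketched as follows. This is an
	illustration only, not the library's code. The DEMO_ constants copy
	the desktop Win32 values of the THREAD_PRIORITY_* macros so that the
	sketch compiles without <windows.h>, and "between" is read as
	exclusive of the end points, which is an assumption.

```c
/* Values mirror the desktop Win32 THREAD_PRIORITY_* macros. The DEMO_
 * names are hypothetical and exist only for this sketch. */
enum {
    DEMO_PRIORITY_IDLE          = -15,
    DEMO_PRIORITY_LOWEST        = -2,
    DEMO_PRIORITY_NORMAL        = 0,
    DEMO_PRIORITY_HIGHEST       = 2,
    DEMO_PRIORITY_TIME_CRITICAL = 15
};

/* Collapse levels strictly between IDLE and LOWEST to LOWEST, and
 * levels strictly between HIGHEST and TIME_CRITICAL to HIGHEST.
 * All other values pass through unchanged. */
static int demo_map_priority(int requested)
{
    if (requested > DEMO_PRIORITY_IDLE && requested < DEMO_PRIORITY_LOWEST)
        return DEMO_PRIORITY_LOWEST;
    if (requested > DEMO_PRIORITY_HIGHEST &&
        requested < DEMO_PRIORITY_TIME_CRITICAL)
        return DEMO_PRIORITY_HIGHEST;
    return requested;
}
```

	Under this reading a request for level -7 or 6 ends up at
	THREAD_PRIORITY_LOWEST or THREAD_PRIORITY_HIGHEST respectively.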

	If it wishes, a Win32 application using pthreads-win32 can use the Win32
	defined priority macros THREAD_PRIORITY_IDLE through
	THREAD_PRIORITY_TIME_CRITICAL.


The opacity of the pthread_t datatype
-------------------------------------
and possible solutions for portable null/compare/hash, etc
----------------------------------------------------------

Because pthread_t is an opaque datatype an implementation is permitted to define
pthread_t in any way it wishes. That includes defining some bits, if it is
scalar, or members, if it is an aggregate, to store information that may be
extra to the unique identifying value of the ID. As a result, pthread_t values
may not be directly comparable.

If you want your code to be portable you must adhere to the following constraints:

1) Don't assume it is a scalar data type, e.g. an integer or pointer value. There
are several other implementations where pthread_t is also a struct. See our FAQ
Question 11 for our reasons for defining pthread_t as a struct.

2) You must not compare them using relational or equality operators. You must use
the API function pthread_equal() to test for equality.

3) Never attempt to reference individual members.
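
A minimal sketch of constraint (2), using only standard calls. The helper
name is_calling_thread() is hypothetical.

```c
#include <pthread.h>

/* Portable identity test: returns 1 when `candidate` names the calling
 * thread. It never applies ==, <, or memcmp to pthread_t values and never
 * looks inside them. */
static int is_calling_thread(pthread_t candidate)
{
    return pthread_equal(candidate, pthread_self()) != 0;
}
```

This compiles unchanged whether pthread_t is a scalar or a struct.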


The problem

Certain applications would like to be able to access only the 'pure' pthread_t
id values, primarily to use as keys into data structures to manage threads or
thread-related data, but this is not possible in a maximally portable and
standards compliant way for current POSIX threads implementations.

For implementations that define pthread_t as a scalar, programmers often employ
direct relational and equality operators on pthread_t. This code will break when
ported to an implementation that defines pthread_t as an aggregate type.

For implementations that define pthread_t as an aggregate, e.g. a struct,
programmers can use memcmp etc., but then face the prospect that the struct may
include alignment padding bytes or bits as well as extra implementation-specific
members that are not part of the unique identifying value.
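
To make the limitation concrete, here is a sketch of what remains portable
today: a table keyed by pthread_t that relies on pthread_equal() alone. With
no ordering or hash available, every lookup is a linear scan. All demo_ names
are hypothetical, and the table is not thread-safe as written.

```c
#include <pthread.h>
#include <stddef.h>

#define DEMO_MAX_THREADS 16

typedef struct {
    pthread_t thread;   /* key, only ever compared with pthread_equal() */
    int       value;    /* per-thread data */
    int       used;     /* non-zero when the slot is occupied */
} demo_slot_t;

static demo_slot_t demo_table[DEMO_MAX_THREADS];

/* Find the slot for `t`, or NULL if it has none. O(n) per lookup. */
static demo_slot_t *demo_find(pthread_t t)
{
    size_t i;

    for (i = 0; i < DEMO_MAX_THREADS; i++) {
        if (demo_table[i].used && pthread_equal(demo_table[i].thread, t))
            return &demo_table[i];
    }
    return NULL;
}

/* Store or update the value for `t`.
 * Returns 0 on success, -1 if the table is full. */
static int demo_put(pthread_t t, int value)
{
    demo_slot_t *slot = demo_find(t);
    size_t i;

    for (i = 0; slot == NULL && i < DEMO_MAX_THREADS; i++) {
        if (!demo_table[i].used)
            slot = &demo_table[i];
    }
    if (slot == NULL)
        return -1;
    slot->thread = t;
    slot->used = 1;
    slot->value = value;
    return 0;
}
```

An ordered tree or a hash table would need a comparison or hash of the key,
which is exactly what pthread_t does not portably offer.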

[While this is not currently the case for pthreads-win32, opacity also
means that an implementation is free to change the definition, which should
generally only require that applications be recompiled and relinked, not
rewritten.]


Doesn't the compiler take care of padding?

The C89 and later standards only effectively guarantee element-by-element
equivalence following an assignment or pass by value of a struct or union,
therefore undefined areas of any two otherwise equivalent pthread_t instances
can still compare differently. For example, comparing two such pthread_t
variables byte-by-byte with memcmp(&t1, &t2, sizeof(pthread_t)) may give an
incorrect result. In practice I'm reasonably confident that compilers routinely
also copy the padding bytes, mainly because assignment of unions would be far
too complicated otherwise. But it just isn't guaranteed by the standard.

Illustration:

We have two thread IDs t1 and t2

pthread_t t1, t2;

In an application we create the threads and intend to store the thread IDs in an
ordered data structure (linked list, tree, etc) so we need to be able to compare
them in order to insert them initially and also to traverse.

Suppose pthread_t contains undefined padding bits and our compiler copies our
pthread_t [struct] element-by-element, then for the assignment:

pthread_t temp = t1;

temp and t1 will be equivalent and correct but a byte-for-byte comparison such as
memcmp(&temp, &t1, sizeof(pthread_t)) == 0 may not return true as we expect because
the undefined bits may not have the same values in the two variable instances.

Similarly if passing by value under the same conditions.

If, on the other hand, the undefined bits are at least constant through every
assignment and pass-by-value then the byte-for-byte comparison
memcmp(&temp, &t1, sizeof(pthread_t)) == 0 will always return the expected result.
How can we force the behaviour we need?
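
The effect can be imitated with a hypothetical struct that stands in for an
aggregate pthread_t. The sketch forces different bytes into the padding of
two values whose members are equal. demo_handle_t and the demo_ functions are
invented for this illustration, and the layout assumes the usual padding
after a leading char member.

```c
#include <string.h>

/* Hypothetical aggregate handle. */
typedef struct {
    char tag;   /* most ABIs insert padding after this member */
    long seq;
} demo_handle_t;

/* Fill every byte, padding included, with `background`, then set the
 * members to fixed values. */
static void demo_fill(demo_handle_t *h, int background)
{
    memset(h, background, sizeof *h);
    h->tag = 'x';
    h->seq = 42;
}

/* Member-wise comparison of two such values: returns 1 (equal). */
static int demo_members_match(void)
{
    demo_handle_t a, b;

    demo_fill(&a, 0x00);
    demo_fill(&b, 0xFF);
    return a.tag == b.tag && a.seq == b.seq;
}

/* Byte-wise comparison of the same two values. Where the struct has
 * padding this typically reports a mismatch, although the standard
 * leaves the padding bytes unspecified after the member stores. */
static int demo_bytes_match(void)
{
    demo_handle_t a, b;

    demo_fill(&a, 0x00);
    demo_fill(&b, 0xFF);
    return memcmp(&a, &b, sizeof a) == 0;
}
```

The two functions look at identical logical values, yet only the member-wise
one is backed by the standard.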


Solutions

Adding new functions to the standard API or as non-portable extensions is
the only reliable and portable way to provide the necessary operations.
Remember also that POSIX is not tied to the C language. The most common
functions that have been suggested are:

pthread_null()
pthread_compare()
pthread_hash()

A single more general purpose function could also be defined as a
basis for at least the last two of the above functions.

First we need to list the freedoms and constraints with respect
to pthread_t so that we can be sure our solution is compatible with the
standard.

What is known or may be deduced from the standard:
1) pthread_t must be able to be passed by value, so it must be a single object.
2) from (1) it must be copyable so cannot embed thread-state information, locks
or other volatile objects required to manage the thread it associates with.
3) pthread_t may carry additional information, e.g. for debugging or to manage
itself.
4) there is an implicit requirement that the size of pthread_t is determinable
at compile-time and size-invariant, because it must be possible to copy the
object (i.e. through assignment and pass-by-value). Such copies must be genuine
duplicates, not merely a copy of a pointer to a common instance such as
would be the case if pthread_t were defined as an array.


Suppose we define the following function:

/* This function shall return its argument */
pthread_t* pthread_normalize(pthread_t* thread);

For scalar or aggregate pthread_t types this function would simply zero any bits
within the pthread_t that don't uniquely identify the thread, including padding,
such that client code can obtain consistent results from operations done on the
result. If the additional bits are a pointer to an associated structure then
this function would ensure that the memory used to store that associated
structure does not leak. After normalization the following compare would be
valid and repeatable:

memcmp(pthread_normalize(&t1),pthread_normalize(&t2),sizeof(pthread_t))

Note 1: such comparisons are intended merely to order and sort pthread_t values
and allow them to index various data structures. They are not intended to reveal
anything about the relationships between threads, like startup order.

Note 2: the normalized pthread_t is also a valid pthread_t that uniquely
identifies the same thread.
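
To show the intended semantics, here is a sketch for a hypothetical handle
type. demo_thread_t, its members and the demo_ functions are all invented for
the illustration. A real pthread_normalize() would have to be supplied by the
implementation, which alone knows which bits identify the thread.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical handle: `id` identifies the thread, `debug` does not. */
typedef struct {
    char debug;  /* extra information, usually followed by padding */
    long id;     /* the identifying value */
} demo_thread_t;

/* Zero every byte that is not part of the id and return the argument.
 * The id is copied into a zeroed byte image so that padding bytes are
 * written explicitly rather than left to member assignment. */
static demo_thread_t *demo_normalize(demo_thread_t *t)
{
    unsigned char bytes[sizeof(demo_thread_t)] = {0};

    memcpy(bytes + offsetof(demo_thread_t, id), &t->id, sizeof t->id);
    memcpy(t, bytes, sizeof bytes);
    return t;
}

/* "Safe" compare: the arguments are copies, so the caller's handles are
 * left untouched. */
static int demo_compare(demo_thread_t a, demo_thread_t b)
{
    return memcmp(demo_normalize(&a), demo_normalize(&b),
                  sizeof(demo_thread_t));
}

/* Build a handle whose non-id bytes, padding included, are all `debug`. */
static demo_thread_t demo_make(long id, char debug)
{
    demo_thread_t t;

    memset(&t, debug, sizeof t);
    t.debug = debug;
    t.id = id;
    return t;
}
```

Two handles built with the same id but different debug bytes compare equal
after normalization, while handles with different ids do not.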

Advantages:
1) In most existing implementations this function would reduce to a no-op that
emits no additional instructions, i.e. after in-lining or optimisation, or if
defined as a macro:
#define pthread_normalize(tptr) (tptr)

2) This single function allows an application to portably derive
application-level versions of any of the other required functions.

3) It is a generic function that could enable unanticipated uses.

Disadvantages:
1) Less efficient than dedicated compare or hash functions for implementations
that include significant extra non-id elements in pthread_t.

2) Still need to be concerned about padding if copying normalized pthread_t.
See the later section on defining pthread_t to neutralise padding issues.

Generally a pthread_t may need to be normalized every time it is used,
which could have a significant impact. However, this is a design decision
for the implementor in a competitive environment. An implementation is free
to define a pthread_t in a way that minimises or eliminates padding or
renders this function a no-op.

Hazards:
1) Pass-by-reference directly modifies 'thread' so the application must
synchronise access or ensure that the pointer refers to a copy. The alternative
of pass-by-value/return-by-value was considered but then this requires two copy
operations, disadvantaging implementations where this function is not a no-op
in terms of speed of execution. This function is intended to be used in high
frequency situations and needs to be efficient, or at least not unnecessarily
inefficient. The alternative also sits awkwardly with functions like memcmp.

2) [Non-compliant] code that uses relational and equality operators on
arithmetic or pointer style pthread_t types would need to be rewritten, but it
should be rewritten anyway.


C implementation of null/compare/hash functions using pthread_normalize():

/* In pthread.h */
pthread_t* pthread_normalize(pthread_t* thread);

/* In user code */
#include <stddef.h>   /* size_t */
#include <string.h>   /* memcmp */

/* User-level bitclear function - clear bits in loc corresponding to mask */
void* bitclear (void* loc, void* mask, size_t count);

typedef unsigned int hash_t;

/* User-level hash function */
hash_t hash(void* ptr, size_t count);

/*
 * User-level pthr_null function - modifies the origin thread handle.
 * The concept of a null pthread_t is highly implementation dependent
 * and this design may be far from the mark. For example, in an
 * implementation "null" may mean setting a special value inside one
 * element of pthread_t to mean "INVALID". However, if that value was zero and
 * formed part of the id component then we may get away with this design.
 */
pthread_t* pthr_null(pthread_t* tp)
{
  /*
   * This should have the same effect as memset(tp, 0, sizeof(pthread_t))
   * We're just showing that we can do it.
   */
  void* p = (void*) pthread_normalize(tp);
  return (pthread_t*) bitclear(p, p, sizeof(pthread_t));
}

/*
 * Safe user-level pthr_compare function - modifies temporary thread handle copies
 */
int pthr_compare_safe(pthread_t thread1, pthread_t thread2)
{
  return memcmp(pthread_normalize(&thread1), pthread_normalize(&thread2), sizeof(pthread_t));
}

/*
 * Fast user-level pthr_compare function - modifies origin thread handles
 */
int pthr_compare_fast(pthread_t* thread1, pthread_t* thread2)
{
  /* The arguments are already pointers, so pass them straight through */
  return memcmp(pthread_normalize(thread1), pthread_normalize(thread2), sizeof(pthread_t));
}

/*
 * Safe user-level pthr_hash function - modifies temporary thread handle copy
 */
hash_t pthr_hash_safe(pthread_t thread)
{
  return hash((void *) pthread_normalize(&thread), sizeof(pthread_t));
}

/*
 * Fast user-level pthr_hash function - modifies origin thread handle
 */
hash_t pthr_hash_fast(pthread_t* thread)
{
  return hash((void *) pthread_normalize(thread), sizeof(pthread_t));
}

/* User-level bitclear function - modifies the origin array */
void* bitclear(void* loc, void* mask, size_t count)
{
  unsigned char* l = (unsigned char*) loc;
  unsigned char* m = (unsigned char*) mask;
  size_t i;
  for (i = 0; i < count; i++) {
    l[i] &= (unsigned char) ~m[i];
  }
  return loc;
}

/* Donald Knuth hash */
hash_t hash(void* ptr, size_t count)
{
   unsigned char* bytes = (unsigned char*) ptr;
   hash_t h = (hash_t) count;
   size_t i;

   for(i = 0; i < count; i++)
   {
      h = ((h << 5) ^ (h >> 27)) ^ bytes[i];
   }
   return h;
}

/* Example of advantage point (3) - split a thread handle into its id and non-id values */
pthread_t id = thread, non_id = thread;
bitclear((void*) &non_id, (void*) pthread_normalize(&id), sizeof(pthread_t));


A pthread_t type change proposal to neutralise the effects of padding

Even if pthread_normalize() is available, padding is still a problem because
the standard only guarantees element-by-element equivalence through
copy operations (assignment and pass-by-value). So padding bit values can
still change randomly after calls to pthread_normalize().

[I suspect that most compilers take the easy path and always byte-copy anyway,
partly because it becomes too complex to do otherwise (e.g. unions that contain
sub-aggregates) but also because programmers can easily design their aggregates
to minimise and often eliminate padding].

How can we eliminate the problem of padding bytes in structs? Could
defining pthread_t as a union rather than a struct provide a solution?

In fact, the Linux pthread.h defines most of its pthread_*_t objects (but not
pthread_t itself) as unions, possibly for this and/or other reasons. We'll
borrow some element naming from there but the ideas themselves are well known
- the __align element used to force alignment of the union comes from K&R's
storage allocator example.

/* Essentially our current pthread_t renamed */
typedef struct {
  struct thread_state_t * __p;
  long __x; /* sequence counter */
} thread_id_t;

Ensuring that the last element in the above struct is a long ensures that the
overall struct size is a multiple of sizeof(long), so there should be no trailing
padding in this struct or the union we define below.
(Later we'll see that we can handle internal but not trailing padding.)

/* New pthread_t */
typedef union {
  char __size[sizeof(thread_id_t)]; /* array as the first element */
  thread_id_t __tid;
  long __align;  /* Ensure that the union starts on long boundary */
} pthread_t;

This guarantees that, during an assignment or pass-by-value, the compiler copies
every byte in our thread_id_t because the compiler guarantees that the __size
array, which we have ensured is the equal-largest element in the union, retains
equivalence.

This means that pthread_t values stored, assigned and passed by value will at least
carry the value of any undefined padding bytes along and therefore ensure that
those values remain consistent. Our comparisons will return consistent results and
our hashes of [zero initialised] pthread_t values will also return consistent
results.

We have also removed the need for a pthread_null() function; we can initialise
at declaration time or easily create our own const pthread_t to use in assignments
later:

const pthread_t null_tid = {0}; /* braces are required */

pthread_t t;
...
t = null_tid;


Note that we don't have to explicitly make use of the __size array at all. It's
there just to force the compiler behaviour we want.
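
The proposal can be tried out in isolation with stand-in names. In the sketch
below demo_id_t and demo_pthread_t play the roles of thread_id_t and the
proposed pthread_t. The demo_ helpers are hypothetical, and the byte-wise
test relies on the copy behaviour argued for above rather than on a guarantee
I can point to in the standard.

```c
#include <string.h>

struct demo_thread_state;            /* opaque, never defined here */

/* Stand-in for thread_id_t */
typedef struct {
    struct demo_thread_state *p;
    long x;                          /* sequence counter */
} demo_id_t;

/* Stand-in for the proposed union pthread_t */
typedef union {
    char size[sizeof(demo_id_t)];    /* array as the first element */
    demo_id_t tid;
    long align;                      /* force alignment to a long boundary */
} demo_pthread_t;

/* The first member is the byte array, so this zeroes every byte of it. */
static const demo_pthread_t demo_null_tid = {{0}};

/* Byte-wise null test on a by-value copy of the handle. */
static int demo_is_null(demo_pthread_t t)
{
    return memcmp(&t, &demo_null_tid, sizeof t) == 0;
}

/* Build a handle that differs from the null value in its counter only. */
static demo_pthread_t demo_from_sequence(long x)
{
    demo_pthread_t t = demo_null_tid;

    t.tid.x = x;
    return t;
}
```

Because the array member is declared with sizeof(demo_id_t), the union and
the struct it wraps have the same size.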


Partial solutions without a pthread_normalize function


An application-level pthread_null and pthread_compare proposal
(and pthread_hash proposal by extension)

In order to deal with the problem of scalar/aggregate pthread_t type disparity in
portable code I suggest using an old-fashioned union, subject to the constraints
below.

Constraints:
- there is no padding, or padding values are preserved through assignment and
  pass-by-value (see above);
- there are no extra non-id values in the pthread_t.


Example 1: A null initialiser for pthread_t variables...

typedef union {
    unsigned char b[sizeof(pthread_t)];
    pthread_t t;
} init_t;

const init_t initial = {0};

pthread_t tid = initial.t; /* init tid to all zeroes */


Example 2: A comparison function for pthread_t values

typedef union {
   unsigned char b[sizeof(pthread_t)];
   pthread_t t;
} pthcmp_t;

int pthcmp(pthread_t left, pthread_t right)
{
  /*
   * Compare two pthread handles in a way that imposes a repeatable but arbitrary
   * ordering on them.
   * I.e. given the same set of pthread_t handles the ordering should be the same
   * each time but the order has no particular meaning other than that. E.g.
   * the ordering does not imply the thread start sequence, or any other
   * relationship between threads.
   *
   * Return values are:
   * 1 : left is greater than right
   * 0 : left is equal to right
   * -1 : left is less than right
   */
  size_t i;
  pthcmp_t L, R;
  L.t = left;
  R.t = right;
  for (i = 0; i < sizeof(pthread_t); i++)
  {
    if (L.b[i] > R.b[i])
      return 1;
    else if (L.b[i] < R.b[i])
      return -1;
  }
  return 0;
}

It has been pointed out that the C99 standard allows for the possibility that
integer types also may include padding bits, which could invalidate the above
method. This addition to C99 was specifically included after it was pointed
out that there was one, presumably not particularly well known, architecture
that included a padding bit in its 32-bit integer type. See section 6.2.6.2
of both the standard and the rationale, specifically the paragraph starting at
line 16 on page 43 of the rationale.


An aside

Certain compilers, e.g. gcc and one of the IBM compilers, include a feature
extension: provided the union contains a member of the same type as the
object then the object may be cast to the union itself.

We could use this feature to speed up the pthcmp() function from example 2
above by casting rather than assigning the pthread_t arguments to the union, e.g.:

int pthcmp(pthread_t left, pthread_t right)
{
  /*
   * Compare two pthread handles in a way that imposes a repeatable but arbitrary
   * ordering on them.
   * I.e. given the same set of pthread_t handles the ordering should be the same
   * each time but the order has no particular meaning other than that. E.g.
   * the ordering does not imply the thread start sequence, or any other
   * relationship between threads.
   *
   * Return values are:
   * 1 : left is greater than right
   * 0 : left is equal to right
   * -1 : left is less than right
   */
  size_t i;
  for (i = 0; i < sizeof(pthread_t); i++)
  {
    if (((pthcmp_t)left).b[i] > ((pthcmp_t)right).b[i])
      return 1;
    else if (((pthcmp_t)left).b[i] < ((pthcmp_t)right).b[i])
      return -1;
  }
  return 0;
}


Result thus far

We can't remove undefined bits if they are there in pthread_t already, but we have
attempted to render them inert for comparison and hashing functions by making them
consistent through assignment, copy and pass-by-value.

Note: Hashing pthread_t values requires that all pthread_t variables be initialised
to the same value (usually all zeros) before being assigned a proper thread ID, i.e.
to ensure that any padding bits are zero, or at least the same value for all
pthread_t. Since all pthread_t values are generated by the library in the first
instance this need not be an application-level operation.


Conclusion

I've attempted to resolve the multiple issues of type opacity and the possible
presence of undefined bits and bytes in pthread_t values, which prevent
applications from comparing or hashing pthread handles.

Two complementary partial solutions have been proposed, one an application-level
scheme to handle both scalar and aggregate pthread_t types equally, plus a
definition of pthread_t itself that neutralises padding bits and bytes by
coercing semantics out of the compiler to eliminate variations in the values of
padding bits.

I have not provided any solution to the problem of handling extra values embedded
in pthread_t, e.g. debugging or trap information that an implementation is entitled
to include. Therefore none of this replaces the portability and flexibility of API
functions, but what functions are needed? The threads standard is unlikely to
include functions that can be implemented by a combination of existing features
and more generic functions (several references in the threads rationale suggest
this). Therefore I propose that the following function could replace the several
functions that have been suggested in conversations:

pthread_t * pthread_normalize(pthread_t * handle);

For most existing pthreads implementations this function, or macro, would reduce to
a no-op with zero call overhead.
