Posts:
16
From:
Registered:
7/7/06
pthreads and store buffer flushing
Posted:
Mar 20, 2009 7:33 AM
To: Communities » tools » compilers
When using multiple threads in userspace together with global data structures (bss, data segment or heap), the following is not clear to me.
Suppose only one thread fills and updates a more complex global data structure (similar to a device's soft state if it were in the kernel), and multiple other threads access the data in this structure. Do I really need a mutex, rw-lock or something of that kind around every single access, be it read or write, to any member of this structure? Or would it be enough to declare the respective structure/members volatile?
What makes me unsure about correct usage is the statement from the "Multithreaded Programming Guide": "The synchronization primitives use special instructions that flush the store buffers to cache...So, using locks around your shared data ensures memory consistency." On the other hand, "Writing Device Drivers" contains an example with a busy flag that is not protected by a lock and uses the volatile keyword instead. Several small code examples in the "Multithreaded Programming Guide" access data that is not on the current thread's stack without using locks.
So, the case described at the beginning does not require a lock for correct data in the structure, since only one thread is doing updates. But when is the data visible to other threads? Or which instruction causes the data to be written to memory so that a consumer using "volatile" sees the data? The same question would occur with a global data buffer being shared between several threads.
In practice everything works fine with volatile, but the manual says (as far as I understand it) that there might be problems. Any hint would be very useful.
Darryl Gove
Darryl.Gove@Sun.COM
Re: [tools-compilers] pthreads and store buffer flushing
Posted:
Mar 20, 2009 10:44 AM
in response to: miebert
Hi,
I can't comment on exactly your issue, but I can give you an outline of the issues around volatile and mutexes.
Volatile:
When you declare a variable to be volatile it ensures that the compiler always loads that variable from memory and immediately stores it back to memory after any operation on it.
For example:
    int flag;
    while (flag) {}
In the absence of the volatile keyword the compiler may optimise this to:
    if (flag) while (1) {}
[If the flag is not zero to start with, the compiler assumes that nothing can make it zero, so there is no exit condition on the loop.]
Not all shared data needs to be declared volatile, only data where you want one thread to see the effect of another thread while both are running.
[For example, if one thread populates a buffer and another thread will later read from that buffer, then you don't need to declare the contents of the buffer as volatile. The important word is "later": if you expect the two threads to access the buffer at the same time, then you would probably need the volatile keyword.]
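The "populate first, read later" case can be sketched with plain pthreads. The sketch below assumes the ordering comes from pthread_create and pthread_join (thread creation and termination are named later in this thread as points that make data visible); buffer, sum_buffer and fill_then_sum are made-up names for illustration only.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define BUF_LEN 4

static int buffer[BUF_LEN];          /* plain, non-volatile shared data */

/* Worker: sums the buffer that was filled before the thread was created. */
static void *sum_buffer(void *arg)
{
    long *sum = arg;

    *sum = 0;
    for (size_t i = 0; i < BUF_LEN; i++)
        *sum += buffer[i];
    return NULL;
}

/* Fill the buffer, then hand it to a worker and wait for the result. */
long fill_then_sum(void)
{
    pthread_t tid;
    long sum = 0;
    int rc;

    for (size_t i = 0; i < BUF_LEN; i++)
        buffer[i] = (int)(i + 1);    /* 1, 2, 3, 4 */

    /* The worker starts only after the buffer is complete. */
    rc = pthread_create(&tid, NULL, sum_buffer, &sum);
    assert(rc == 0);

    /* After the join, the worker's store to sum is visible here. */
    rc = pthread_join(tid, NULL);
    assert(rc == 0);
    return sum;
}
```

Neither thread touches the buffer while the other is writing it, so no volatile qualifier and no mutex appear in this sketch.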
Mutexes are there to ensure exclusive access.
You will typically need to use them if you are updating a variable, or if you are performing a complex operation that should appear 'atomic'
For example:
    volatile int total;
    mutex_lock();
    total += 5;
    mutex_unlock();
You need to do this to avoid a data race, where another thread could also be updating total:
Here's the situation without the mutex:
    Thread 1        Thread 2
    Read total      Read total
    Add 5           Add 5
    Write total     Write total
So total would be incremented by 5 rather than 10.
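A runnable version of the protected update might look like the following. It is a minimal sketch using POSIX mutexes; total_lock, add_five and run_two_adders are illustrative names, not taken from the original post.

```c
#include <assert.h>
#include <pthread.h>

static int total;                                    /* shared counter */
static pthread_mutex_t total_lock = PTHREAD_MUTEX_INITIALIZER;

/* Each thread performs the read, add and write as one locked step. */
static void *add_five(void *unused)
{
    (void)unused;
    pthread_mutex_lock(&total_lock);
    total += 5;
    pthread_mutex_unlock(&total_lock);
    return NULL;
}

/* Run two adders concurrently and return the final total. */
int run_two_adders(void)
{
    pthread_t t1, t2;
    int rc;

    total = 0;                       /* no other thread exists yet */
    rc = pthread_create(&t1, NULL, add_five, NULL);
    assert(rc == 0);
    rc = pthread_create(&t2, NULL, add_five, NULL);
    assert(rc == 0);
    rc = pthread_join(t1, NULL);
    assert(rc == 0);
    rc = pthread_join(t2, NULL);
    assert(rc == 0);
    return total;
}
```

With the lock in place the two increments cannot interleave, so the result is 10 regardless of scheduling.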
An example of a complex operation would be:
    mutex_lock();
    My_account = My_account - bill;
    Their_account = Their_account + bill;
    mutex_unlock();
You could use two separate mutexes, but then there would be a state where the amount bill would have been removed from My_account, but not yet placed into Their_account (this may or may not be a problem).
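The single-mutex variant can be sketched as below, assuming every reader of the balances takes the same lock. The names accounts_lock, pay_bill and combined_balance, and the starting balances, are hypothetical.

```c
#include <assert.h>
#include <pthread.h>

static long my_account = 100;        /* both balances guarded by accounts_lock */
static long their_account = 0;
static pthread_mutex_t accounts_lock = PTHREAD_MUTEX_INITIALIZER;

/* Move bill from my_account to their_account as one indivisible step.
 * Returns what is left in my_account. */
long pay_bill(long bill)
{
    long remaining;

    assert(bill >= 0);
    pthread_mutex_lock(&accounts_lock);
    my_account    -= bill;
    their_account += bill;
    remaining = my_account;
    pthread_mutex_unlock(&accounts_lock);
    return remaining;
}

/* Sum of both balances, read under the same lock so that a transfer
 * is never observed half done. */
long combined_balance(void)
{
    long sum;

    pthread_mutex_lock(&accounts_lock);
    sum = my_account + their_account;
    pthread_mutex_unlock(&accounts_lock);
    return sum;
}
```

Because both functions use one lock, combined_balance can never observe the intermediate state in which the money has left one account but not yet reached the other.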
===================================
In your case I think you are filling a structure, then passing this structure to another thread. It seems that the two (or more) threads are not simultaneously accessing the structure. So the program order guarantees mutually exclusive access. In your case I do not think that you need either mutexes or the volatile keyword. However, you might need something in the mechanism that enables the sharing of the buffer.
For example, if the buffer is shared as a pointer
    buffer * my_buffer;
    Thread 1:  my_buffer = this_buffer;
    Thread 2:  while (my_buffer == 0) {}   /* Wait for next buffer */
Then the pointer needs to be volatile.
Alternatively, if you are adding a new element onto a linked list of buffers, then you will probably need a mutex around the linked list to ensure that only one thread is adding or removing items from the list at a time.
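A mutex-protected list of that kind could be sketched as follows. This is only an illustration: buf_node, list_push and list_pop are invented names, and a real buffer list would carry the buffer itself rather than just an id.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

struct buf_node {
    int id;                          /* stands in for the buffer payload */
    struct buf_node *next;
};

static struct buf_node *list_head;   /* guarded by list_lock */
static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;

/* Add an element at the head. Returns 0 on success, -1 if out of memory. */
int list_push(int id)
{
    struct buf_node *n = malloc(sizeof *n);

    if (n == NULL)
        return -1;
    n->id = id;
    pthread_mutex_lock(&list_lock);
    n->next = list_head;             /* both pointer updates happen under the lock */
    list_head = n;
    pthread_mutex_unlock(&list_lock);
    return 0;
}

/* Remove the head element. Returns 1 and stores its id, or 0 if empty. */
int list_pop(int *id)
{
    struct buf_node *n;

    assert(id != NULL);
    pthread_mutex_lock(&list_lock);
    n = list_head;
    if (n != NULL)
        list_head = n->next;
    pthread_mutex_unlock(&list_lock);

    if (n == NULL)
        return 0;
    *id = n->id;                     /* node is private to this thread now */
    free(n);
    return 1;
}
```

Only the pointer manipulation sits inside the critical section; allocation and freeing happen outside it to keep the lock hold time short.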
BTW, you can use the Thread Analyzer in Sun Studio to detect dataraces in your application. http://developers.sun.com/sunstudio/
HTH,
Regards,
Darryl.
-- Darryl Gove Compiler Performance Engineering Blog : http://blogs.sun.com/d/ Books: http://www.sun.com/books/catalog/solaris_app_programming.xml http://my.safaribooksonline.com/0595352510 _______________________________________________ tools-compilers mailing list tools-compilers at opensolaris dot org
Re: [tools-compilers] pthreads and store buffer flushing
Posted:
Mar 20, 2009 11:30 AM
in response to: Darryl Gove
Hello Darryl,
thanks a lot for your explanations. They match exactly what I was used to. But when working through the multithreading manual I got a little bit nervous about the explanations since what is written there about multiprocessors and relaxed memory order would have a serious effect on the code to write. There are two sources contradicting your/our/my concept:
1.) Multithreaded Programming Guide (Sept. 2008, Page 240) "programmers must be careful to use locks around all global or shared data." This is explained by store buffers of one processor that are not necessarily flushed and hence modified data not being visible to other processors (threads).
2.) The Sun MT-FAQ (95114-001) Does the C keyword "volatile" cause the store buffer to flush ? ... The answer: writing to a volatile-qualified type object does not cause a store buffer flush.
This is exactly contrary to your explanation: "When you declare a variable to be volatile it ensures that the compiler always loads that variable from memory and immediately stores it back to memory after any operation on it."
Since conforming and portable code is very much an issue for our current project we were a little bit concerned. Perhaps things could be made a little bit clearer in the manuals (provided we are not too dumb to understand them at all).
Thanks for your help!
b.m.
Darryl Gove
Darryl.Gove@Sun.COM
Re: [tools-compilers] pthreads and store buffer flushing
Posted:
Mar 20, 2009 12:19 PM
in response to: miebert
Hi,
What you're hitting is the internals of how processors share information.
Bert Miemietz wrote:
> Hello Darryl,
>
> thanks a lot for your explanations. They match exactly what I was used to.
> But when working through the multithreading manual I got a little bit nervous
> about the explanations since what is written there about multiprocessors
> and relaxed memory order would have a serious effect on the code to write.
> There are two sources contradicting your/our/my concept:
>
> 1.) Multithreaded Programming Guide (Sept. 2008, Page 240)
> "programmers must be careful to use locks around all global or shared data."
> This is explained by store buffers of one processor that are not necessarily
> flushed and hence modified data not being visible to other processors (threads).
>
> 2.) The Sun MT-FAQ (95114-001)
> Does the C keyword "volatile" cause the store buffer to flush ?
> ...
> The answer: writing to a volatile-qualified type object does not cause a store buffer flush.
These are correct. When you store data it does not necessarily mean that other processors can see that information.
If processor A writes to variable B, then that store goes through a store queue, then gets written to cache on Processor A, eventually, that cacheline will get flushed to memory.
The volatile keyword ensures that the store occurs, but does not ensure that the data is flushed to memory.
> This is exactly contrary to your explanation
> "When you declare a variable to be volatile it ensures that the compiler
> always loads that variable from memory and immediately stores it back to
> memory after any operation on it."
The volatile keyword ensures that the variable is not held in a register, and is stored back to memory as quickly as possible. However, I'm using "stored to memory" as shorthand for placed into the store queue, then put into cache, then flushed to memory at some point in the future.
> Since conforming and portable code is very much an issue for our current
> project we were a little bit concerned. Perhaps things could be made a
> little bit clearer in the manuals (provided we are not too dumb to understand
> them at all).
The topic you're really interested in is memory barriers (membars) on SPARC or memory fences (mfence) on x86.
Basically on SPARC, which uses total store order (TSO), you need them only in the most exceptional circumstances. On x86 you might need them because it uses a weaker consistency model.
Take a look at Dave Dice's blog post on this topic: http://blogs.sun.com/dave/entry/java_memory_model_concerns_on
I have some more pointers here: http://blogs.sun.com/d/entry/when_to_use_membars
Regards,
Darryl.
Re: [tools-compilers] pthreads and store buffer flushing
Posted:
Mar 21, 2009 12:06 AM
in response to: Darryl Gove
Hello Darryl,
after some more reading (your links were really interesting) I hope you don't mind if I come around with one more question. Perhaps the following code fragments can serve as an example:
    #define USE_MTX
    #define BSIZE 8192

    volatile int error;
    volatile char shared_buf[BSIZE];   /* buffer for data transport */
    int *type_p;                       /* describe type of data in buffer */
    type_p = (int *) shared_buf;
    pthread_mutex_t mtx;

    pthread_create(..., transmit_thread, ...);
    pthread_create(..., receive_thread, ...);
    pthread_create(..., consumer_thread, ...);

    transmit_thread {
        int error1;

        while (1) {
    #ifdef USE_MTX
            pthread_mutex_lock(&mtx);
            error1 = error;
            pthread_mutex_unlock(&mtx);
    #else
            error1 = error;
    #endif
            if (error1 != 0) break;
            if (transmit() != 0) {     /* transmit yields any error */
    #ifdef USE_MTX
                pthread_mutex_lock(&mtx);
                error = 5;
                pthread_mutex_unlock(&mtx);
    #else
                error = 5;
    #endif
            }
        }
    }

    receive_thread {
        int error1;

        while (1) {
    #ifdef USE_MTX
            pthread_mutex_lock(&mtx);
            error1 = error;
            pthread_mutex_unlock(&mtx);
    #else
            error1 = error;
    #endif
            if (error1 != 0) break;
            if (receive_to_buf() != 0) {   /* receive yields any error */
    #ifdef USE_MTX
                pthread_mutex_lock(&mtx);
                error = 5;
                pthread_mutex_unlock(&mtx);
    #else
                error = 5;
    #endif
            }
            else {
                *type_p = 10;
                wakeup_consumer_with_cv();
            }
        }
    }

    consumer_thread {
        int error1;

        waiting_on_cv();
    #ifdef USE_MTX
        pthread_mutex_lock(&mtx);
        error1 = error;
        pthread_mutex_unlock(&mtx);
    #else
        error1 = error;
    #endif
        if (error1 == 0) {
            if (*type_p == 10) {
                process_the_buffer();
            }
        }
    }
I hope I didn't make any simple mistakes in writing this. The following is desired:
- The transmit and receive threads share the same communication and shall work in parallel.
- As soon as any error occurs, this results in interrupting the threads in their transmit/receive functions. The fact that this is due to an error shall be communicated by means of the variable error. A thread interrupted in transmit or receive shall never retry any transmit or receive.
- If the receive thread has filled the buffer, it wakes up the consumer waiting on a cv. The consumer shall then find the correct data in the buffer (as long as error is 0).
It is not worth discussing the purpose of the code. It is only an example to make my issue clear. If I really need a mutex or similar around any access to error (just to ensure flushing of the store buffers after an update), this leads to ugly code instead of simply writing "while (error == 0)". Error is not calculated, it is simply assigned a value. Therefore I guess one could simply use "error = 5" without a mutex, if only this became visible right away to all other threads.
On the other hand: if I use the mutex, would I still need the volatile keyword to ensure the compiler doesn't optimize away the access?
And how can I make sure that data written to the buffer by receive_thread is completely visible to consumer_thread after the latter is woken up with the cv?
Perhaps all these questions can be condensed into the question of whether I need to define USE_MTX in the example to get conforming and portable code that runs on any CPU (at least with Solaris).
I hope very much that these issues are also of interest to other programmers. Thank you in advance for your patience and your assistance!
Darryl Gove
Darryl.Gove@Sun.COM
Re: [tools-compilers] pthreads and store buffer flushing
Posted:
Mar 23, 2009 12:37 PM
in response to: miebert
Hi,
In the example you give, you are basically protecting accesses to the volatile global variable error with mutexes.
I don't believe that these mutexes are necessary. The accesses to the variable error are already atomic, and the volatile keyword will ensure that the compiler does not hold the variable in a register.
They might help your program work on all platforms by ensuring that the appropriate memory barriers are inserted. However, that would be a side-effect.
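If the mutex is kept for portability, one way to avoid scattering lock and unlock calls through the loops is to wrap the flag in two small accessors. The following is a sketch under that assumption; set_error, get_error and error_lock are hypothetical names, and error_code is deliberately not volatile because every access goes through the lock.

```c
#include <assert.h>
#include <pthread.h>

static int error_code;                               /* guarded by error_lock */
static pthread_mutex_t error_lock = PTHREAD_MUTEX_INITIALIZER;

/* Record an error value; the unlock makes the store visible to other threads. */
void set_error(int code)
{
    int rc = pthread_mutex_lock(&error_lock);
    assert(rc == 0);
    error_code = code;
    rc = pthread_mutex_unlock(&error_lock);
    assert(rc == 0);
}

/* Read the current error value under the same lock. */
int get_error(void)
{
    int value;
    int rc = pthread_mutex_lock(&error_lock);
    assert(rc == 0);
    value = error_code;
    rc = pthread_mutex_unlock(&error_lock);
    assert(rc == 0);
    return value;
}
```

A worker loop can then be written as while (get_error() == 0) { ... } and an error is reported with set_error(5), which keeps the calling code close to the unlocked version.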
The problem I have with the code is that there is no synchronisation on the passing of the buffer between the threads. The synchronisation should be something like:
    mutex_lock()
    write to shared_buf
    mutex_unlock()
That's the structure that you don't want another thread to read whilst it is partially complete.
If you get that synchronisation correct, I expect that the synchronisation on the error variable would also be attained.
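The buffer handoff described above can be sketched with a mutex and a condition variable. This assumes a single producer and a single consumer per handoff; buf_ready, buf_lock, buf_cv, receiver and handoff_demo are illustrative names and do not appear in the original example.

```c
#include <assert.h>
#include <pthread.h>
#include <string.h>

#define BSIZE 8192

static char shared_buf[BSIZE];       /* not volatile: only touched under buf_lock */
static int  buf_ready;               /* predicate for the condition variable */
static pthread_mutex_t buf_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  buf_cv   = PTHREAD_COND_INITIALIZER;

/* Receiver: fill the buffer and publish it, all under the lock. */
static void *receiver(void *arg)
{
    const char *msg = arg;

    pthread_mutex_lock(&buf_lock);
    strncpy(shared_buf, msg, BSIZE - 1);
    shared_buf[BSIZE - 1] = '\0';
    buf_ready = 1;
    pthread_cond_signal(&buf_cv);
    pthread_mutex_unlock(&buf_lock);
    return NULL;
}

/* Consumer: start a receiver, wait until the buffer is complete,
 * then copy it to out. Returns 0 on success, -1 on thread creation failure. */
int handoff_demo(const char *msg, char *out, size_t out_len)
{
    pthread_t tid;

    assert(out != NULL && out_len > 0);
    buf_ready = 0;                   /* no receiver is running yet */
    if (pthread_create(&tid, NULL, receiver, (void *)msg) != 0)
        return -1;

    pthread_mutex_lock(&buf_lock);
    while (!buf_ready)               /* loop guards against spurious wakeups */
        pthread_cond_wait(&buf_cv, &buf_lock);
    strncpy(out, shared_buf, out_len - 1);
    out[out_len - 1] = '\0';
    pthread_mutex_unlock(&buf_lock);

    pthread_join(tid, NULL);
    return 0;
}
```

The consumer never inspects shared_buf without holding buf_lock, so it cannot see a partially written buffer, and the same lock would also cover an error flag stored alongside buf_ready.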
Note that your definition of type_p is incorrect: it needs to be defined as a pointer to a volatile variable. The fact that it is made to point to the first elements of a volatile shared_buf does not transfer that volatility onto the pointer. The code would probably work because in the example you immediately make a function call after the write, which would ensure that the compiler writes the variable to memory (in case the function reads it).
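The difference between a pointer to volatile data and a volatile pointer is easy to get wrong, so here is a small sketch of both declarations. The struct layout and the helper names are assumptions for illustration, and a separate int (type_word) is used instead of aliasing the char buffer so that the sketch stays free of alignment concerns.

```c
#include <assert.h>
#include <stddef.h>

struct buffer {                      /* hypothetical layout */
    int type;
    char payload[64];
};

static struct buffer demo_buf;

/* Volatile pointer to a plain struct: the pointer value itself is re-read
 * on every access, which is what a polling loop on the pointer relies on. */
static struct buffer * volatile my_buffer;

/* Pointer to volatile int: every access through *type_p is a real load or
 * store, while the pointer variable itself is an ordinary object. */
static volatile int type_word;
static volatile int *type_p = &type_word;

void set_type(int t)         { *type_p = t; }
int  get_type(void)          { return *type_p; }
void post_demo_buffer(void)  { my_buffer = &demo_buf; }
int  have_buffer(void)       { return my_buffer != NULL; }
```

Reading the declaration right to left helps: "volatile int *p" is a pointer to a volatile int, whereas "int * volatile p" is a volatile pointer to an int.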
HTH,
Darryl.
Re: [tools-compilers] pthreads and store buffer flushing
Posted:
Mar 24, 2009 1:29 AM
in response to: Darryl Gove
Hello,
the basic question after all is the visibility of any change by one thread to other threads. So, what I understood and assume from everything written (your answers, links, documentation) is, from the high-level C perspective:
1) Any pthread_mutex_unlock (mutex_exit) causes any modified data to be written / flushed to memory. Modified data is then visible to any other thread / cpu. If a later read access is under a mutex (after pthread_mutex_lock / mutex_enter) the data is for sure read from memory, so volatile is not needed here, as is illustrated in the programming examples for cv_wait.
2) Because of 1), visibility is not an issue for buffers or data structures that are required to be accessed under a lock for data consistency or program flow control reasons.
3) Updates to data types that can be read / written in a single (atomic) operation and that are not protected by a mutex are not sure to be visible to other threads by simply using the volatile keyword. Instead they become visible after instructions that imply a flush of registers, store buffers, caches to memory. Reading between the lines, such instructions / actions are:
   - mutex_lock / unlock operations
   - spawning a new thread
   - terminating a thread
   - calling a function
   Visibility to a thread reading this data would require the volatile keyword.
-----
Things described under 3) are of course only of limited value, e.g. for a single flag or error value. Any more complex types and operations like "flags |= ERROR_A" might have a more or less unexpected result. When a second thread is doing "flags |= ERROR_B" at the same time, the final result might easily be "flags == ERROR_B", omitting the flag ERROR_A that was intended to be set by the first thread.
Do you think 1) - 3) are correct?
Darryl Gove
Darryl.Gove@Sun.COM
Re: [tools-compilers] pthreads and store buffer flushing
Posted:
Mar 24, 2009 2:08 PM
in response to: miebert
Hi,
Yes, this is pretty much correct. I have some comments.
On 03/24/09 01:29 AM, Bert Miemietz wrote:
> Hello,
>
> the basic question after all is the visibility of any change by one
> thread to other threads. So, what I understood and assume
> from everything written (your answers, links, documentation) is,
> from the high-level C perspective:
>
> 1) Any pthread_mutex_unlock (mutex_exit) causes any modified data to be
> written / flushed to memory. Modified data is then visible to any other
> thread / cpu. If a later read access is under a mutex (after pthread_mutex_lock /
> mutex_enter) the data is for sure read from memory, so volatile is not needed
> here, as is illustrated in the programming examples for cv_wait.
volatile is probably not needed, because the compiler is likely to put the store in because there's a function call (to mutex_unlock) and the compiler doesn't know that the function mutex_unlock won't read the global data.
volatile = tells the compiler to load and store the data from/to memory.
memory barriers = tells the hardware to keep the memory operations in a consistent order.
> 2) Because of 1), visibility is not an issue for buffers or data structures that are
> required to be accessed under a lock for data consistency or program flow
> control reasons.
Yes, the mutex_unlock code should include the appropriate memory barrier operations to ensure that the store that unlocks the mutex is seen after the store to data protected by the mutex.
> 3) Updates to data types that can be read / written in a single (atomic) operation
> and that are not protected by a mutex are not sure to be visible to other
> threads by simply using the volatile keyword.
Right - it tells the compiler to do the store, it doesn't tell the hardware to do anything.
> Instead they become visible
> after instructions that imply a flush of registers, store buffers, caches to memory.
> What I read between the lines such instructions / actions are:
> - mutex_lock / unlock operations
> - spawning a new thread
> - terminating a thread
All those.
> - calling a function
No, a function call does not imply a memory barrier.
I think where you might be being confused is in the following instance:
    void func1()
    {
        global = 1;
        func2();
    }

    void func2()
    {
        a = global;
        printf("a=%i", a);
    }
So the compiler will put the store of global into the code for func1 because it cannot be sure that func2 does not access the variable global.
func2 happens to read global, and since it is executed on the same processor it will always get the correct value (unless the processor runs an incredibly weak memory model).
If another thread were running just func2 while this thread were executing func1, then there is a requirement to pass the value of global to thread 2. On SPARC and x86 the memory model is such that they will see the correct value of global after the store. For some architectures with weaker memory models they might need to place an explicit barrier in to get the data off the chip.
So when do I really need memory barriers?
Here's a bit of pseudo code for setting some data up:
    mutex_lock();
    data1 = 1;
    data2 = 2;
    data3 = 3;
    ...
    mutex_unlock();
Suppose my thread gets the lock, writes the data out, and now wants to release the lock.
A release of the lock is a write of zero to the memory location of the lock:
lock->locked=0;
The issue is that I have a stream of stores pending in my store queue etc. I need those to be visible to other processors before I can do my store to the lock variable. Otherwise another thread might see the release of the lock before it sees the new values for the variables.
So my unlock is really:
    memory_barrier();
    lock->locked = 0;
For example: http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/lib/libc/sparc/gen/lock.s
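The "barrier, then store zero" unlock can be written portably with C11 atomics. Note the assumption: <stdatomic.h> postdates this thread and is used here only as a stand-in for the platform-specific membar instructions in the linked lock.s. The names toy_lock, toy_lock_acquire and toy_lock_release are made up for the sketch, and a real mutex would sleep rather than spin.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct toy_lock {
    atomic_int locked;               /* 0 = free, 1 = held */
};

static struct toy_lock demo_lock;    /* static storage starts out free */
static int data1, data2, data3;      /* plain data protected by demo_lock */

/* Spin until locked changes from 0 to 1; acquire ordering keeps the
 * protected accesses from moving ahead of the lock acquisition. */
static void toy_lock_acquire(struct toy_lock *l)
{
    int expected = 0;

    assert(l != NULL);
    while (!atomic_compare_exchange_weak_explicit(&l->locked, &expected, 1,
                                                  memory_order_acquire,
                                                  memory_order_relaxed))
        expected = 0;                /* a failed exchange overwrote expected */
}

/* Release: barrier first, then the store that frees the lock. */
static void toy_lock_release(struct toy_lock *l)
{
    atomic_thread_fence(memory_order_release);                   /* memory_barrier(); */
    atomic_store_explicit(&l->locked, 0, memory_order_relaxed);  /* lock->locked = 0; */
}

/* Write the protected data; returns the lock word afterwards (0 = free). */
int write_under_lock(void)
{
    toy_lock_acquire(&demo_lock);
    data1 = 1;
    data2 = 2;
    data3 = 3;
    toy_lock_release(&demo_lock);
    return atomic_load(&demo_lock.locked);
}

/* Read the protected data under the same lock. */
int read_sum_under_lock(void)
{
    int sum;

    toy_lock_acquire(&demo_lock);
    sum = data1 + data2 + data3;
    toy_lock_release(&demo_lock);
    return sum;
}
```

The release fence orders the three data stores before the store that clears the lock word, so a thread that subsequently acquires the lock also sees the new data values.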
> Visibility to a thread reading this data would require the volatile keyword.
> -----
> Things described under 3) are of course only of limited value, e.g. for a single flag
> or error value. Any more complex types and operations like "flags |= ERROR_A"
> might have a more or less unexpected result. When a second thread is doing
> "flags |= ERROR_B" at the same time the final result might easily be
> "flags == ERROR_B" omitting the flag ERROR_A that was intended to be set
> by the first thread.
Exactly - these need to be protected by a mutex (or done atomically - man atomic_ops on S10) otherwise there's a data race.
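The atomic alternative for the flags case might look like this. The sketch uses C11 atomic_fetch_or as a portable stand-in for the Solaris atomic_ops family mentioned above, which is an assumption on my part rather than what the original code would have used; set_flag and set_both_flags are illustrative names.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

#define ERROR_A 0x1u
#define ERROR_B 0x2u

static atomic_uint flags;            /* shared error flags */

/* Set one bit with a single indivisible read-modify-write. */
static void *set_flag(void *arg)
{
    unsigned bit = *(unsigned *)arg;

    atomic_fetch_or(&flags, bit);
    return NULL;
}

/* Let two threads set different bits at the same time. */
unsigned set_both_flags(void)
{
    pthread_t t1, t2;
    unsigned a = ERROR_A, b = ERROR_B;
    int rc;

    atomic_store(&flags, 0u);
    rc = pthread_create(&t1, NULL, set_flag, &a);
    assert(rc == 0);
    rc = pthread_create(&t2, NULL, set_flag, &b);
    assert(rc == 0);
    rc = pthread_join(t1, NULL);
    assert(rc == 0);
    rc = pthread_join(t2, NULL);
    assert(rc == 0);
    return atomic_load(&flags);
}
```

Because each OR is a single atomic operation, neither thread can overwrite the other's bit, and both ERROR_A and ERROR_B are set in the result.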
> Do you think 1) - 3) are correct?
About right.
Regards,
Darryl.
Re: [tools-compilers] pthreads and store buffer flushing
Posted:
Apr 3, 2009 9:42 PM
in response to: Darryl Gove
Hello Darryl,
thank you very much for your helpful explanations and the time you spent on showing more details about what's really going on.
My issue was indeed very much about the visibility of environmental global data to threads, about sharing data, buffers and (error) flags, and about passing parameters and return values. Not all code that apparently works is necessarily correct and portable (e.g. the omitted volatile in my example).
Perhaps it might be a good idea to add some more explanations about inter-thread visibility of stored data to the "Multithreaded Programming Guide".
Wrong assumptions about the visibility of data might cause elusive bugs. On the other hand, an unnecessarily excessive use of locks/mutexes might end up in ugly code that is difficult to maintain and carries an increased risk of running into deadlock pitfalls.
Thank you very much for your help!
(Sorry for the late answer - been away from office)