
Searched hist:"3b6a19b2" (Results 1 – 14 of 14) sorted by relevance

/dragonfly/sys/dev/drm/include/linux/
mutex.h 3b6a19b2 Tue Oct 24 01:39:16 GMT 2017 Matthew Dillon <dillon@apollo.backplane.com> kernel - Refactor lockmgr()

* Seriously refactor lockmgr() so we can use atomic_fetchadd_*() for
shared locks and reduce unnecessary atomic ops and atomic op loops.

The main win here is being able to use atomic_fetchadd_*() when
acquiring and releasing shared locks. A simple fstat() loop (which
utilizes a LK_SHARED lockmgr lock on the vnode) improves from 191ns
to around 110ns per loop with 32 concurrent threads (on a 16-core/
32-thread xeon).

* To accomplish this, the 32-bit lk_count field becomes 64 bits. The
shared count is separated into the high 32 bits, allowing it to be
manipulated for both blocking shared requests and the shared lock
count field. The low count bits are used for exclusive locks.
Control bits are adjusted to manage lockmgr features.

LKC_SHARED Indicates shared lock count is active, else excl lock
count. Can predispose the lock when the related count
is 0 (does not have to be cleared, for example).

LKC_UPREQ Queued upgrade request. Automatically granted by
releasing entity (UPREQ -> ~SHARED|1).

LKC_EXREQ Queued exclusive request (only when lock held shared).
Automatically granted by releasing entity
(EXREQ -> ~SHARED|1).

LKC_EXREQ2 Aggregated exclusive request. When EXREQ cannot be
obtained due to the lock being held exclusively or
EXREQ already being queued, EXREQ2 is flagged for
wakeup/retries.

LKC_CANCEL Cancel API support.

LKC_SMASK Shared lock count mask (LKC_SCOUNT increments).

LKC_XMASK Exclusive lock count mask (+1 increments).

The 'no lock' condition occurs when LKC_XMASK is 0 and LKC_SMASK is
0, regardless of the state of LKC_SHARED.

* Lockmgr still supports exclusive priority over shared locks. The
semantics have slightly changed. The priority mechanism only applies
to the EXREQ holder. Once an exclusive lock is obtained, any blocking
shared or exclusive locks will have equal priority until the exclusive
lock is released. Once released, shared locks can squeeze in, but
then the next pending exclusive lock will assert its priority over
any new shared locks when it wakes up and loops.

This isn't quite what I wanted, but it seems to work quite well. I
had to make a trade-off in the EXREQ lock-grant mechanism to improve
performance.

* In addition, we use atomic_fcmpset_long() instead of
atomic_cmpset_long() to reduce cache-line flip-flopping at least
a little.

* Remove lockcount() and lockcountnb(), which tried to count lock refs.
Replace with lockinuse(), which simply tells the caller whether the
lock is referenced or not.

* Expand some of the copyright notices (years and authors) for major
rewrites. Really there are a lot more and I have to pay more attention
to adjustments.
spinlock.h 3b6a19b2 Tue Oct 24 01:39:16 GMT 2017 Matthew Dillon <dillon@apollo.backplane.com> kernel - Refactor lockmgr()
/dragonfly/sys/dev/disk/dm/
dm_table.c 3b6a19b2 Tue Oct 24 01:39:16 GMT 2017 Matthew Dillon <dillon@apollo.backplane.com> kernel - Refactor lockmgr()
/dragonfly/sys/sys/
buf2.h 3b6a19b2 Tue Oct 24 01:39:16 GMT 2017 Matthew Dillon <dillon@apollo.backplane.com> kernel - Refactor lockmgr()
lock.h 3b6a19b2 Tue Oct 24 01:39:16 GMT 2017 Matthew Dillon <dillon@apollo.backplane.com> kernel - Refactor lockmgr()
/dragonfly/sys/vfs/tmpfs/
tmpfs_subr.c 3b6a19b2 Tue Oct 24 01:39:16 GMT 2017 Matthew Dillon <dillon@apollo.backplane.com> kernel - Refactor lockmgr()
/dragonfly/sys/kern/
vfs_lock.c 3b6a19b2 Tue Oct 24 01:39:16 GMT 2017 Matthew Dillon <dillon@apollo.backplane.com> kernel - Refactor lockmgr()
kern_lock.c 3b6a19b2 Tue Oct 24 01:39:16 GMT 2017 Matthew Dillon <dillon@apollo.backplane.com> kernel - Refactor lockmgr()
kern_shutdown.c 3b6a19b2 Tue Oct 24 01:39:16 GMT 2017 Matthew Dillon <dillon@apollo.backplane.com> kernel - Refactor lockmgr()
vfs_bio.c 3b6a19b2 Tue Oct 24 01:39:16 GMT 2017 Matthew Dillon <dillon@apollo.backplane.com> kernel - Refactor lockmgr()
vfs_subr.c 3b6a19b2 Tue Oct 24 01:39:16 GMT 2017 Matthew Dillon <dillon@apollo.backplane.com> kernel - Refactor lockmgr()
/dragonfly/sys/vfs/ufs/
ffs_softdep.c 3b6a19b2 Tue Oct 24 01:39:16 GMT 2017 Matthew Dillon <dillon@apollo.backplane.com> kernel - Refactor lockmgr()
/dragonfly/sys/vfs/nfs/
H A D nfs_subs.c 3b6a19b2 Tue Oct 24 01:39:16 GMT 2017 Matthew Dillon <dillon@apollo.backplane.com> kernel - Refactor lockmgr()
H A D nfs_vnops.c 3b6a19b2 Tue Oct 24 01:39:16 GMT 2017 Matthew Dillon <dillon@apollo.backplane.com> kernel - Refactor lockmgr()