#
1431996b |
| 12-Oct-2023 |
Hugh Dickins <hughd@google.com> |
percpu_counter: extend _limited_add() to negative amounts
Though tmpfs does not need it, percpu_counter_limited_add() can be twice as useful if it works sensibly with negative amounts (subs) - typic
percpu_counter: extend _limited_add() to negative amounts
Though tmpfs does not need it, percpu_counter_limited_add() can be twice as useful if it works sensibly with negative amounts (subs) - typically decrements towards a limit of 0 or nearby: as suggested by Dave Chinner.
And in the course of that reworking, skip the percpu counter sum if it is already obvious that the limit would be passed: as suggested by Tim Chen.
Extend the comment above __percpu_counter_limited_add(), defining the behaviour with positive and negative amounts, allowing negative limits, but not bothering about overflow beyond S64_MAX.
Link: https://lkml.kernel.org/r/8f86083b-c452-95d4-365b-f16a2e4ebcd4@google.com Signed-off-by: Hugh Dickins <hughd@google.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Carlos Maiolino <cem@kernel.org> Cc: Christian Brauner <brauner@kernel.org> Cc: Chuck Lever <chuck.lever@oracle.com> Cc: Darrick J. Wong <djwong@kernel.org> Cc: Dave Chinner <dchinner@redhat.com> Cc: Jan Kara <jack@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Tim Chen <tim.c.chen@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
show more ...
|
#
beb98686 |
| 30-Sep-2023 |
Hugh Dickins <hughd@google.com> |
shmem,percpu_counter: add _limited_add(fbc, limit, amount)
Percpu counter's compare and add are separate functions: without locking around them (which would defeat their purpose), it has been possib
shmem,percpu_counter: add _limited_add(fbc, limit, amount)
Percpu counter's compare and add are separate functions: without locking around them (which would defeat their purpose), it has been possible to overflow the intended limit. Imagine all the other CPUs fallocating tmpfs huge pages to the limit, in between this CPU's compare and its add.
I have not seen reports of that happening; but tmpfs's recent addition of dquot_alloc_block_nodirty() in between the compare and the add makes it even more likely, and I'd be uncomfortable to leave it unfixed.
Introduce percpu_counter_limited_add(fbc, limit, amount) to prevent it.
I believe this implementation is correct, and slightly more efficient than the combination of compare and add (taking the lock once rather than twice when nearing full - the last 128MiB of a tmpfs volume on a machine with 128 CPUs and 4KiB pages); but it does beg for a better design - when nearing full, there is no new batching, but the costly percpu counter sum across CPUs still has to be done, while locked.
Follow __percpu_counter_sum()'s example, including cpu_dying_mask as well as cpu_online_mask: but shouldn't __percpu_counter_compare() and __percpu_counter_limited_add() then be adding a num_dying_cpus() to num_online_cpus(), when they calculate the maximum which could be held across CPUs? But the times when it matters would be vanishingly rare.
Link: https://lkml.kernel.org/r/bb817848-2d19-bcc8-39ca-ea179af0f0b4@google.com Signed-off-by: Hugh Dickins <hughd@google.com> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Tim Chen <tim.c.chen@intel.com> Cc: Dave Chinner <dchinner@redhat.com> Cc: Darrick J. Wong <djwong@kernel.org> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Carlos Maiolino <cem@kernel.org> Cc: Christian Brauner <brauner@kernel.org> Cc: Chuck Lever <chuck.lever@oracle.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
show more ...
|
#
c439d5e8 |
| 23-Aug-2023 |
Mateusz Guzik <mjguzik@gmail.com> |
pcpcntr: add group allocation/free
Allocations and frees are globally serialized on the pcpu lock (and the CPU hotplug lock if enabled, which is the case on Debian).
At least one frequent consumer
pcpcntr: add group allocation/free
Allocations and frees are globally serialized on the pcpu lock (and the CPU hotplug lock if enabled, which is the case on Debian).
At least one frequent consumer allocates 4 back-to-back counters (and frees them in the same manner), exacerbating the problem.
While this does not fully remedy scalability issues, it is a step towards that goal and provides immediate relief.
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Reviewed-by: Dennis Zhou <dennis@kernel.org> Reviewed-by: Vegard Nossum <vegard.nossum@oracle.com> Link: https://lore.kernel.org/r/20230823050609.2228718-2-mjguzik@gmail.com [Dennis: reflowed a few lines] Signed-off-by: Dennis Zhou <dennis@kernel.org>
show more ...
|
#
e9b60c7f |
| 16-Mar-2023 |
Dave Chinner <dchinner@redhat.com> |
pcpcntr: remove percpu_counter_sum_all()
percpu_counter_sum_all() is now redundant as the race condition it was invented to handle is now dealt with by percpu_counter_sum() directly and all users of
pcpcntr: remove percpu_counter_sum_all()
percpu_counter_sum_all() is now redundant as the race condition it was invented to handle is now dealt with by percpu_counter_sum() directly and all users of percpu_counter_sum_all() have been removed.
Remove it.
This effectively reverts the changes made in f689054aace2 ("percpu_counter: add percpu_counter_sum_all interface") except for the cpumask iteration that fixes percpu_counter_sum() made earlier in this series.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
show more ...
|
#
8b57b11c |
| 16-Mar-2023 |
Dave Chinner <dchinner@redhat.com> |
pcpcntrs: fix dying cpu summation race
In commit f689054aace2 ("percpu_counter: add percpu_counter_sum_all interface") a race condition between a cpu dying and percpu_counter_sum() iterating online
pcpcntrs: fix dying cpu summation race
In commit f689054aace2 ("percpu_counter: add percpu_counter_sum_all interface") a race condition between a cpu dying and percpu_counter_sum() iterating online CPUs was identified. The solution was to iterate all possible CPUs for summation via percpu_counter_sum_all().
We recently had a percpu_counter_sum() call in XFS trip over this same race condition and it fired a debug assert because the filesystem was unmounting and the counter *should* be zero just before we destroy it. That was reported here:
https://lore.kernel.org/linux-kernel/20230314090649.326642-1-yebin@huaweicloud.com/
likely as a result of running generic/648 which exercises filesystems in the presence of CPU online/offline events.
The solution to use percpu_counter_sum_all() is an awful one. We use percpu counters and percpu_counter_sum() for accurate and reliable threshold detection for space management, so a summation race condition during these operations can result in overcommit of available space and that may result in filesystem shutdowns.
As percpu_counter_sum_all() iterates all possible CPUs rather than just those online or even those present, the mask can include CPUs that aren't even installed in the machine, or in the case of machines that can hot-plug CPU capable nodes, even have physical sockets present in the machine.
Fundamentally, this race condition is caused by the CPU being offlined being removed from the cpu_online_mask before the notifier that cleans up per-cpu state is run. Hence percpu_counter_sum() will not sum the count for a cpu currently being taken offline, regardless of whether the notifier has run or not. This is the root cause of the bug.
The percpu counter notifier iterates all the registered counters, locks the counter and moves the percpu count to the global sum. This is serialised against other operations that move the percpu counter to the global sum as well as percpu_counter_sum() operations that sum the percpu counts while holding the counter lock.
Hence the notifier is safe to run concurrently with sum operations, and the only thing we actually need to care about is that percpu_counter_sum() iterates dying CPUs. That's trivial to do, and when there are no CPUs dying, it has no addition overhead except for a cpumask_or() operation.
This change makes percpu_counter_sum() always do the right thing in the presence of CPU hot unplug events and makes percpu_counter_sum_all() unnecessary. This, in turn, means that filesystems like XFS, ext4, and btrfs don't have to work out when they should use percpu_counter_sum() vs percpu_counter_sum_all() in their space accounting algorithms
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
show more ...
|
#
805afd83 |
| 16-Dec-2022 |
Manfred Spraul <manfred@colorfullife.com> |
lib/percpu_counter: percpu_counter_add_batch() overflow/underflow
Patch series "various irq handling fixes/docu updates".
If an interrupt happens between __this_cpu_read(*fbc->counters) and this_cp
lib/percpu_counter: percpu_counter_add_batch() overflow/underflow
Patch series "various irq handling fixes/docu updates".
If an interrupt happens between __this_cpu_read(*fbc->counters) and this_cpu_add(*fbc->counters, amount), and that interrupt modifies the per_cpu_counter, then the this_cpu_add() after the interrupt returns may under/overflow.
Link: https://lkml.kernel.org/r/20221216150155.200389-1-manfred@colorfullife.com Link: https://lkml.kernel.org/r/20221216150441.200533-1-manfred@colorfullife.com Signed-off-by: Manfred Spraul <manfred@colorfullife.com> Cc: "Sun, Jiebin" <jiebin.sun@intel.com> Cc: <1vier1@web.de> Cc: Alexander Sverdlin <alexander.sverdlin@siemens.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
show more ...
|
#
f689054a |
| 09-Nov-2022 |
Shakeel Butt <shakeelb@google.com> |
percpu_counter: add percpu_counter_sum_all interface
The percpu_counter is used for scenarios where performance is more important than the accuracy. For percpu_counter users, who want more accurate
percpu_counter: add percpu_counter_sum_all interface
The percpu_counter is used for scenarios where performance is more important than the accuracy. For percpu_counter users, who want more accurate information in their slowpath, percpu_counter_sum is provided which traverses all the online CPUs to accumulate the data. The reason it only needs to traverse online CPUs is because percpu_counter does implement CPU offline callback which syncs the local data of the offlined CPU.
However there is a small race window between the online CPUs traversal of percpu_counter_sum and the CPU offline callback. The offline callback has to traverse all the percpu_counters on the system to flush the CPU local data which can be a lot. During that time, the CPU which is going offline has already been published as offline to all the readers. So, as the offline callback is running, percpu_counter_sum can be called for one counter which has some state on the CPU going offline. Since percpu_counter_sum only traverses online CPUs, it will skip that specific CPU and the offline callback might not have flushed the state for that specific percpu_counter on that offlined CPU.
Normally this is not an issue because percpu_counter users can deal with some inaccuracy for small time window. However a new user i.e. mm_struct on the cleanup path wants to check the exact state of the percpu_counter through check_mm(). For such users, this patch introduces percpu_counter_sum_all() which traverses all possible CPUs and it is used in fork.c:check_mm() to avoid the potential race.
This issue is exposed by the later patch "mm: convert mm's rss stats into percpu_counter".
Link: https://lkml.kernel.org/r/20221109012011.881058-1-shakeelb@google.com Signed-off-by: Shakeel Butt <shakeelb@google.com> Reported-by: Marek Szyprowski <m.szyprowski@samsung.com> Tested-by: Marek Szyprowski <m.szyprowski@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
show more ...
|
#
db65a867 |
| 07-May-2021 |
Alex Shi <alexs@kernel.org> |
lib/percpu_counter: tame kernel-doc compile warning
commit 3e8f399da490 ("writeback: rework wb_[dec|inc]_stat family of functions") add some function description of percpu_counter_add_batch. but the
lib/percpu_counter: tame kernel-doc compile warning
commit 3e8f399da490 ("writeback: rework wb_[dec|inc]_stat family of functions") add some function description of percpu_counter_add_batch. but the double '*' in comments means a kernel-doc format comment which isn't right.
Since the whole file of lib/percpu_counter.c has no any other kernel-doc format comments, we'd better to remove this incomplete one to tame the kernel-doc warning:
lib/percpu_counter.c:83: warning: Function parameter or member 'fbc' not described in 'percpu_counter_add_batch' lib/percpu_counter.c:83: warning: Function parameter or member 'amount' not described in 'percpu_counter_add_batch' lib/percpu_counter.c:83: warning: Function parameter or member 'batch' not described in 'percpu_counter_add_batch'
Link: https://lkml.kernel.org/r/20210405135505.132446-1-alexs@kernel.org Signed-off-by: Alex Shi <alexs@kernel.org> Cc: Nikolay Borisov <nborisov@suse.com> Cc: Stephen Boyd <swboyd@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
show more ...
|
#
1d339638 |
| 16-Oct-2020 |
Miaohe Lin <linmiaohe@huawei.com> |
lib/percpu_counter.c: use helper macro abs()
Use helper macro abs() to simplify the "x >= t || x <= -t" cmp.
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: Andrew Morton <akpm@linu
lib/percpu_counter.c: use helper macro abs()
Use helper macro abs() to simplify the "x >= t || x <= -t" cmp.
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Link: https://lkml.kernel.org/r/20200927122746.5964-1-linmiaohe@huawei.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
show more ...
|
#
f9e62f31 |
| 15-Aug-2020 |
Stephen Boyd <swboyd@chromium.org> |
treewide: Make all debug_obj_descriptors const
This should make it harder for the kernel to corrupt the debug object descriptor, used to call functions to fixup state and track debug objects, by mov
treewide: Make all debug_obj_descriptors const
This should make it harder for the kernel to corrupt the debug object descriptor, used to call functions to fixup state and track debug objects, by moving the structure to read-only memory.
Signed-off-by: Stephen Boyd <swboyd@chromium.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20200815004027.2046113-3-swboyd@chromium.org
show more ...
|
#
0a4954a8 |
| 07-Aug-2020 |
Feng Tang <feng.tang@intel.com> |
percpu_counter: add percpu_counter_sync()
percpu_counter's accuracy is related to its batch size. For a percpu_counter with a big batch, its deviation could be big, so when the counter's batch is r
percpu_counter: add percpu_counter_sync()
percpu_counter's accuracy is related to its batch size. For a percpu_counter with a big batch, its deviation could be big, so when the counter's batch is runtime changed to a smaller value for better accuracy, there could also be requirment to reduce the big deviation.
So add a percpu-counter sync function to be run on each CPU.
Reported-by: kernel test robot <rong.a.chen@intel.com> Signed-off-by: Feng Tang <feng.tang@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Dennis Zhou <dennis@kernel.org> Cc: Tejun Heo <tj@kernel.org> Cc: Christoph Lameter <cl@linux.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Qian Cai <cai@lca.pw> Cc: Andi Kleen <andi.kleen@intel.com> Cc: Huang Ying <ying.huang@intel.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kees Cook <keescook@chromium.org> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Tim Chen <tim.c.chen@intel.com> Link: http://lkml.kernel.org/r/1594389708-60781-4-git-send-email-feng.tang@intel.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
show more ...
|
#
13ba17be |
| 24-Aug-2018 |
Mukesh Ojha <mojha@codeaurora.org> |
notifier: Remove notifier header file wherever not used
The conversion of the hotplug notifiers to a state machine left the notifier.h includes around in some places. Remove them.
Signed-off-by: Mu
notifier: Remove notifier header file wherever not used
The conversion of the hotplug notifiers to a state machine left the notifier.h includes around in some places. Remove them.
Signed-off-by: Mukesh Ojha <mojha@codeaurora.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/1535114033-4605-1-git-send-email-mojha@codeaurora.org
show more ...
|
#
b2441318 |
| 01-Nov-2017 |
Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
License cleanup: add SPDX GPL-2.0 license identifier to files with no license
Many source files in the tree are missing licensing information, which makes it harder for compliance tools to determine
License cleanup: add SPDX GPL-2.0 license identifier to files with no license
Many source files in the tree are missing licensing information, which makes it harder for compliance tools to determine the correct license.
By default all files without license information are under the default license of the kernel, which is GPL version 2.
Update the files which contain no license information with the 'GPL-2.0' SPDX license identifier. The SPDX identifier is a legally binding shorthand, which can be used instead of the full boiler plate text.
This patch is based on work done by Thomas Gleixner and Kate Stewart and Philippe Ombredanne.
How this work was done:
Patches were generated and checked against linux-4.14-rc6 for a subset of the use cases: - file had no licensing information it it. - file was a */uapi/* one with no licensing information in it, - file was a */uapi/* one with existing licensing information,
Further patches will be generated in subsequent months to fix up cases where non-standard license headers were used, and references to license had to be inferred by heuristics based on keywords.
The analysis to determine which SPDX License Identifier to be applied to a file was done in a spreadsheet of side by side results from of the output of two independent scanners (ScanCode & Windriver) producing SPDX tag:value files created by Philippe Ombredanne. Philippe prepared the base worksheet, and did an initial spot review of a few 1000 files.
The 4.13 kernel was the starting point of the analysis with 60,537 files assessed. Kate Stewart did a file by file comparison of the scanner results in the spreadsheet to determine which SPDX license identifier(s) to be applied to the file. She confirmed any determination that was not immediately clear with lawyers working with the Linux Foundation.
Criteria used to select files for SPDX license identifier tagging was: - Files considered eligible had to be source code files. - Make and config files were included as candidates if they contained >5 lines of source - File already had some variant of a license header in it (even if <5 lines).
All documentation files were explicitly excluded.
The following heuristics were used to determine which SPDX license identifiers to apply.
- when both scanners couldn't find any license traces, file was considered to have no license information in it, and the top level COPYING file license applied.
For non */uapi/* files that summary was:
SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 11139
and resulted in the first patch in this series.
If that file was a */uapi/* path one, it was "GPL-2.0 WITH Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was:
SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 WITH Linux-syscall-note 930
and resulted in the second patch in this series.
- if a file had some form of licensing information in it, and was one of the */uapi/* ones, it was denoted with the Linux-syscall-note if any GPL family license was found in the file or had no licensing in it (per prior point). Results summary:
SPDX license identifier # files ---------------------------------------------------|------ GPL-2.0 WITH Linux-syscall-note 270 GPL-2.0+ WITH Linux-syscall-note 169 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17 LGPL-2.1+ WITH Linux-syscall-note 15 GPL-1.0+ WITH Linux-syscall-note 14 ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5 LGPL-2.0+ WITH Linux-syscall-note 4 LGPL-2.1 WITH Linux-syscall-note 3 ((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3 ((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1
and that resulted in the third patch in this series.
- when the two scanners agreed on the detected license(s), that became the concluded license(s).
- when there was disagreement between the two scanners (one detected a license but the other didn't, or they both detected different licenses) a manual inspection of the file occurred.
- In most cases a manual inspection of the information in the file resulted in a clear resolution of the license that should apply (and which scanner probably needed to revisit its heuristics).
- When it was not immediately clear, the license identifier was confirmed with lawyers working with the Linux Foundation.
- If there was any question as to the appropriate license identifier, the file was flagged for further research and to be revisited later in time.
In total, over 70 hours of logged manual review was done on the spreadsheet to determine the SPDX license identifiers to apply to the source files by Kate, Philippe, Thomas and, in some cases, confirmation by lawyers working with the Linux Foundation.
Kate also obtained a third independent scan of the 4.13 code base from FOSSology, and compared selected files where the other two scanners disagreed against that SPDX file, to see if there was new insights. The Windriver scanner is based on an older version of FOSSology in part, so they are related.
Thomas did random spot checks in about 500 files from the spreadsheets for the uapi headers and agreed with SPDX license identifier in the files he inspected. For the non-uapi files Thomas did random spot checks in about 15000 files.
In initial set of patches against 4.14-rc6, 3 files were found to have copy/paste license identifier errors, and have been fixed to reflect the correct identifier.
Additionally Philippe spent 10 hours this week doing a detailed manual inspection and review of the 12,461 patched files from the initial patch version early this week with: - a full scancode scan run, collecting the matched texts, detected license ids and scores - reviewing anything where there was a license detected (about 500+ files) to ensure that the applied SPDX license was correct - reviewing anything where there was no detection but the patch license was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied SPDX license was correct
This produced a worksheet with 20 files needing minor correction. This worksheet was then exported into 3 different .csv files for the different types of files to be modified.
These .csv files were then reviewed by Greg. Thomas wrote a script to parse the csv files and add the proper SPDX tag to the file, in the format that the file expected. This script was further refined by Greg based on the output to detect more types of files automatically and to distinguish between header and source .c files (which need different comment types.) Finally Greg ran the script using the .csv files to generate the patches.
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
show more ...
|
#
3e8f399d |
| 12-Jul-2017 |
Nikolay Borisov <nborisov@suse.com> |
writeback: rework wb_[dec|inc]_stat family of functions
Currently the writeback statistics code uses a percpu counters to hold various statistics. Furthermore we have 2 families of functions - thos
writeback: rework wb_[dec|inc]_stat family of functions
Currently the writeback statistics code uses a percpu counters to hold various statistics. Furthermore we have 2 families of functions - those which disable local irq and those which doesn't and whose names begin with double underscore. However, they both end up calling __add_wb_stats which in turn calls percpu_counter_add_batch which is already irq-safe.
Exploiting this fact allows to eliminated the __wb_* functions since they don't add any further protection than we already have. Furthermore, refactor the wb_* function to call __add_wb_stat directly without the irq-disabling dance. This will likely result in better runtime of code which deals with modifying the stat counters.
While at it also document why percpu_counter_add_batch is in fact preempt and irq-safe since at least 3 people got confused.
Link: http://lkml.kernel.org/r/1498029937-27293-1-git-send-email-nborisov@suse.com Signed-off-by: Nikolay Borisov <nborisov@suse.com> Acked-by: Tejun Heo <tj@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Josef Bacik <jbacik@fb.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Jeff Layton <jlayton@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
show more ...
|
#
104b4e51 |
| 20-Jun-2017 |
Nikolay Borisov <nborisov@suse.com> |
percpu_counter: Rename __percpu_counter_add to percpu_counter_add_batch
Currently, percpu_counter_add is a wrapper around __percpu_counter_add which is preempt safe due to explicit calls to preempt_
percpu_counter: Rename __percpu_counter_add to percpu_counter_add_batch
Currently, percpu_counter_add is a wrapper around __percpu_counter_add which is preempt safe due to explicit calls to preempt_disable. Given how __ prefix is used in percpu related interfaces, the naming unfortunately creates the false sense that __percpu_counter_add is less safe than percpu_counter_add. In terms of context-safety, they're equivalent. The only difference is that the __ version takes a batch parameter.
Make this a bit more explicit by just renaming __percpu_counter_add to percpu_counter_add_batch.
This patch doesn't cause any functional changes.
tj: Minor updates to patch description for clarity. Cosmetic indentation updates.
Signed-off-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Chris Mason <clm@fb.com> Cc: Josef Bacik <jbacik@fb.com> Cc: David Sterba <dsterba@suse.com> Cc: Darrick J. Wong <darrick.wong@oracle.com> Cc: Jan Kara <jack@suse.com> Cc: Jens Axboe <axboe@fb.com> Cc: linux-mm@kvack.org Cc: "David S. Miller" <davem@davemloft.net>
show more ...
|
#
aaf0f2fa |
| 20-Jan-2017 |
Eric Dumazet <edumazet@google.com> |
percpu_counter: percpu_counter_hotcpu_callback() cleanup
In commit ebd8fef304f9 ("percpu_counter: make percpu_counters_lock irq-safe") we disabled irqs in percpu_counter_hotcpu_callback()
We can gr
percpu_counter: percpu_counter_hotcpu_callback() cleanup
In commit ebd8fef304f9 ("percpu_counter: make percpu_counters_lock irq-safe") we disabled irqs in percpu_counter_hotcpu_callback()
We can grab every counter spinlock without having to disable irqs again.
Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Tejun Heo <tj@kernel.org>
show more ...
|
#
5588f5af |
| 03-Nov-2016 |
Sebastian Andrzej Siewior <bigeasy@linutronix.de> |
lib/percpu_counter: Convert to hotplug state machine
Install the callbacks via the state machine and let the core invoke the callbacks on the already online CPUs.
Signed-off-by: Sebastian Andrzej S
lib/percpu_counter: Convert to hotplug state machine
Install the callbacks via the state machine and let the core invoke the callbacks on the already online CPUs.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: rt@linutronix.de Link: http://lkml.kernel.org/r/20161103145021.28528-5-bigeasy@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
show more ...
|
#
d99b1d89 |
| 20-May-2016 |
Du, Changbin <changbin.du@intel.com> |
percpu_counter: update debugobjects fixup callbacks return type
Update the return type to use bool instead of int, corresponding to cheange (debugobjects: make fixup functions return bool instead of
percpu_counter: update debugobjects fixup callbacks return type
Update the return type to use bool instead of int, corresponding to cheange (debugobjects: make fixup functions return bool instead of int).
Signed-off-by: Du, Changbin <changbin.du@intel.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Josh Triplett <josh@kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tejun Heo <tj@kernel.org> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
show more ...
|
#
80188b0d |
| 28-May-2015 |
Dave Chinner <dchinner@redhat.com> |
percpu_counter: batch size aware __percpu_counter_compare()
XFS uses non-stanard batch sizes for avoiding frequent global counter updates on it's allocated inode counters, as they increment or decre
percpu_counter: batch size aware __percpu_counter_compare()
XFS uses non-stanard batch sizes for avoiding frequent global counter updates on it's allocated inode counters, as they increment or decrement in batches of 64 inodes. Hence the standard percpu counter batch of 32 means that the counter is effectively a global counter. Currently Xfs uses a batch size of 128 so that it doesn't take the global lock on every single modification.
However, Xfs also needs to compare accurately against zero, which means we need to use percpu_counter_compare(), and that has a hard-coded batch size of 32, and hence will spuriously fail to detect when it is supposed to use precise comparisons and hence the accounting goes wrong.
Add __percpu_counter_compare() to take a custom batch size so we can use it sanely in XFS and factor percpu_counter_compare() to use it.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Dave Chinner <david@fromorbit.com>
show more ...
|
#
908c7f19 |
| 08-Sep-2014 |
Tejun Heo <tj@kernel.org> |
percpu_counter: add @gfp to percpu_counter_init()
Percpu allocator now supports allocation mask. Add @gfp to percpu_counter_init() so that !GFP_KERNEL allocation masks can be used with percpu_count
percpu_counter: add @gfp to percpu_counter_init()
Percpu allocator now supports allocation mask. Add @gfp to percpu_counter_init() so that !GFP_KERNEL allocation masks can be used with percpu_counters too.
We could have left percpu_counter_init() alone and added percpu_counter_init_gfp(); however, the number of users isn't that high and introducing _gfp variants to all percpu data structures would be quite ugly, so let's just do the conversion. This is the one with the most users. Other percpu data structures are a lot easier to convert.
This patch doesn't make any functional difference.
Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Jan Kara <jack@suse.cz> Acked-by: "David S. Miller" <davem@davemloft.net> Cc: x86@kernel.org Cc: Jens Axboe <axboe@kernel.dk> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Andrew Morton <akpm@linux-foundation.org>
show more ...
|
#
ebd8fef3 |
| 08-Sep-2014 |
Tejun Heo <tj@kernel.org> |
percpu_counter: make percpu_counters_lock irq-safe
percpu_counter is scheduled to grow @gfp support to allow atomic initialization. This patch makes percpu_counters_lock irq-safe so that it can be
percpu_counter: make percpu_counters_lock irq-safe
percpu_counter is scheduled to grow @gfp support to allow atomic initialization. This patch makes percpu_counters_lock irq-safe so that it can be safely used from atomic contexts.
Signed-off-by: Tejun Heo <tj@kernel.org>
show more ...
|
#
e39435ce |
| 08-Apr-2014 |
Jens Axboe <axboe@fb.com> |
lib/percpu_counter.c: fix bad percpu counter state during suspend
I got a bug report yesterday from Laszlo Ersek in which he states that his kvm instance fails to suspend. Laszlo bisected it down t
lib/percpu_counter.c: fix bad percpu counter state during suspend
I got a bug report yesterday from Laszlo Ersek in which he states that his kvm instance fails to suspend. Laszlo bisected it down to this commit 1cf7e9c68fe8 ("virtio_blk: blk-mq support") where virtio-blk is converted to use the blk-mq infrastructure.
After digging a bit, it became clear that the issue was with the queue drain. blk-mq tracks queue usage in a percpu counter, which is incremented on request alloc and decremented when the request is freed. The initial hunt was for an inconsistency in blk-mq, but everything seemed fine. In fact, the counter only returned crazy values when suspend was in progress.
When a CPU is unplugged, the percpu counters merges that CPU state with the general state. blk-mq takes care to register a hotcpu notifier with the appropriate priority, so we know it runs after the percpu counter notifier. However, the percpu counter notifier only merges the state when the CPU is fully gone. This leaves a state transition where the CPU going away is no longer in the online mask, yet it still holds private values. This means that in this state, percpu_counter_sum() returns invalid results, and the suspend then hangs waiting for abs(dead-cpu-value) requests to complete which of course will never happen.
Fix this by clearing the state earlier, so we never have a case where the CPU isn't in online mask but still holds private state. This bug has been there since forever, I guess we don't have a lot of users where percpu counters needs to be reliable during the suspend cycle.
Signed-off-by: Jens Axboe <axboe@fb.com> Reported-by: Laszlo Ersek <lersek@redhat.com> Tested-by: Laszlo Ersek <lersek@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
show more ...
|
#
d1969a84 |
| 16-Jan-2014 |
Hugh Dickins <hughd@google.com> |
percpu_counter: unbreak __percpu_counter_add()
Commit 74e72f894d56 ("lib/percpu_counter.c: fix __percpu_counter_add()") looked very plausible, but its arithmetic was badly wrong: obvious once you se
percpu_counter: unbreak __percpu_counter_add()
Commit 74e72f894d56 ("lib/percpu_counter.c: fix __percpu_counter_add()") looked very plausible, but its arithmetic was badly wrong: obvious once you see the fix, but maddening to get there from the weird tmpfs ENOSPCs
Signed-off-by: Hugh Dickins <hughd@google.com> Cc: Ming Lei <tom.leiming@gmail.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Shaohua Li <shli@fusionio.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Fan Du <fan.du@windriver.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
show more ...
|
#
74e72f89 |
| 15-Jan-2014 |
Ming Lei <tom.leiming@gmail.com> |
lib/percpu_counter.c: fix __percpu_counter_add()
__percpu_counter_add() may be called in softirq/hardirq handler (such as, blk_mq_queue_exit() is typically called in hardirq/softirq handler), so we
lib/percpu_counter.c: fix __percpu_counter_add()
__percpu_counter_add() may be called in softirq/hardirq handler (such as, blk_mq_queue_exit() is typically called in hardirq/softirq handler), so we need to call this_cpu_add()(irq safe helper) to update percpu counter, otherwise counts may be lost.
This fixes the problem that 'rmmod null_blk' hangs in blk_cleanup_queue() because of miscounting of request_queue->mq_usage_counter.
This patch is the v1 of previous one of "lib/percpu_counter.c: disable local irq when updating percpu couter", and takes Andrew's approach which may be more efficient for ARCHs(x86, s390) that have optimized this_cpu_add().
Signed-off-by: Ming Lei <tom.leiming@gmail.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Shaohua Li <shli@fusionio.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Fan Du <fan.du@windriver.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
show more ...
|
#
098faf58 |
| 24-Oct-2013 |
Shaohua Li <shli@fusionio.com> |
percpu_counter: make APIs irq safe
In my usage, sometimes the percpu APIs are called with irq locked, sometimes not. lockdep complains there is potential deadlock. Let's always use percpucounter loc
percpu_counter: make APIs irq safe
In my usage, sometimes the percpu APIs are called with irq locked, sometimes not. lockdep complains there is potential deadlock. Let's always use percpucounter lock in irq safe way. There should be no performance penality, as all those are slow code path.
Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Shaohua Li <shli@fusionio.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|