commit 69e445ab8b66a9f30519842ef18be555d3ee9b51 upstream.
If __device_suspend() runs asynchronously (in which case the device
passed to it is in dpm_suspended_list at that point) and it returns
early on an error or pending wakeup, and the power.direct_complete
flag has been set for the device already, the subsequent
device_resume() will be confused by that and it will call
pm_runtime_enable() incorrectly, as runtime PM has not been
disabled for the device by __device_suspend().
To avoid that, clear power.direct_complete if __device_suspend()
is not going to disable runtime PM for the device before returning.
Fixes: aae4518b3124 (PM / sleep: Mechanism to avoid resuming runtime-suspended devices unnecessarily)
Reported-by: Al Cooper <alcooperx@gmail.com>
Tested-by: Al Cooper <alcooperx@gmail.com>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Cc: 3.16+ <stable@vger.kernel.org> # 3.16+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 3297c8fc65af5d40501ea7cddff1b195cae57e4e ]
There is a race window in device_shutdown(), which may cause
-1. parent device shut down before child or
-2. no shutdown on a new probing device.
For the first, take the following scenario:
       device_shutdown                  new plugin device
  list_del_init(parent_dev);
  spin_unlock(list_lock);
                                        device_add(child)
                                        probe child
  shutdown parent_dev
    --> now child is on the tail of devices_kset
For the second, take the following scenario:
       device_shutdown                  new plugin device
                                        device_add(dev)
  device_lock(dev);
  ...
  device_unlock(dev);
                                        probe dev
    --> now the newly added dev has no opportunity to shut down
To fix this race issue, just prevent the new probing request. With this
logic, device_shutdown() is more similar to dpm_prepare().
Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
Reviewed-by: Rafael J. Wysocki <rafael@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 5e2e2f9f76e157063a656351728703cb02b068f1 upstream.
"count" needs to be signed for the error handling to work. I made "i"
signed as well so they match.
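A minimal sketch of why the signedness matters. The helper names below are hypothetical stand-ins, not the PM clock code itself; the only assumption is a lookup that returns either a count or a negative errno.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for a lookup that returns a count or a negative errno. */
static int count_clocks(int simulate_error)
{
	return simulate_error ? -ENOENT : 3;
}

/* Unsigned "count": the negative errno wraps to a huge value and slips through. */
static int add_clocks_unsigned(int simulate_error)
{
	unsigned int count = count_clocks(simulate_error);

	if (count <= 0)
		return -ENODEV;
	return 0;
}

/* Signed "count": the error code stays negative and is caught. */
static int add_clocks_signed(int simulate_error)
{
	int count = count_clocks(simulate_error);

	if (count <= 0)
		return -ENODEV;
	return 0;
}

/* Usage sketch comparing both variants on a failing lookup. */
void signed_count_demo(void)
{
	assert(add_clocks_unsigned(1) == 0);		/* error silently ignored */
	assert(add_clocks_signed(1) == -ENODEV);	/* error reported */
	assert(add_clocks_signed(0) == 0);
}
```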
Fixes: 02113ba93ea4 (PM / clk: Add support for obtaining clocks from device-tree)
Cc: 4.6+ <stable@vger.kernel.org> # 4.6+
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 17dbca119312b4e8173d4e25ff64262119fcef38 upstream
L1TF core kernel workarounds are cheap and normally always enabled. However,
they still should be reported in sysfs if the system is vulnerable or
mitigated. Add the necessary CPU feature/bug bits.
- Extend the existing checks for Meltdown to determine if the system is
vulnerable. All CPUs which are not vulnerable to Meltdown are also not
vulnerable to L1TF
- Check for 32bit non PAE and emit a warning as there is no practical way
for mitigation due to the limited physical address bits
- If the system has more than MAX_PA/2 physical memory the invert page
workarounds don't protect the system against the L1TF attack anymore,
because an inverted physical address will also point to valid
memory. Print a warning in this case and report that the system is
vulnerable.
Add a function which returns the PFN limit for the L1TF mitigation, which
will be used in follow up patches for sanity and range checks.
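The MAX_PA/2 condition can be sketched as follows. This is an illustration under stated assumptions, not the kernel function: 4 KiB pages are assumed, the number of physical address bits is passed in as a parameter, and all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT_SKETCH 12	/* assumption: 4 KiB pages */

/* Highest PFN that still lies below MAX_PA/2 for the given address width. */
static uint64_t pfn_limit_sketch(unsigned int phys_bits)
{
	return (UINT64_C(1) << (phys_bits - 1 - PAGE_SHIFT_SKETCH)) - 1;
}

/* PTE inversion only helps while no real memory sits at or above MAX_PA/2. */
static bool inversion_sufficient(unsigned int phys_bits, uint64_t max_ram_pfn)
{
	return max_ram_pfn <= pfn_limit_sketch(phys_bits);
}

void l1tf_limit_demo(void)
{
	/* 36 address bits: MAX_PA/2 is 32 GiB, which is 2^23 pages. */
	assert(pfn_limit_sketch(36) == (UINT64_C(1) << 23) - 1);
	assert(inversion_sufficient(36, UINT64_C(1) << 22));	/* RAM ends at 16 GiB */
	assert(!inversion_sufficient(36, UINT64_C(1) << 23));	/* RAM reaches 32 GiB */
}
```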
[ tglx: Renamed the CPU feature bit to L1TF_PTEINV ]
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Acked-by: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 722e5f2b1eec7de61117b7c0a7914761e3da2eda upstream.
Commit 52cdbdd49853 (driver core: correct device's shutdown order)
introduced a regression by breaking device shutdown on some systems.
Namely, the devices_kset_move_last() call in really_probe() added by
that commit is a mistake as it may cause parents to follow children
in the devices_kset list which then causes shutdown to fail. For
example, if a device has children before really_probe() is called
for it (which is not uncommon), that call will cause it to be
reordered after the children in the devices_kset list and the
ordering of that list will not reflect the correct device shutdown
order any more.
Also it causes the devices_kset list to be constantly reordered
until all drivers have been probed which is totally pointless
overhead in the majority of cases and it only covered an issue
with system shutdown, while system-wide suspend/resume potentially
had the same issue on the affected platforms (which was not covered).
Moreover, the shutdown issue originally addressed by the change in
really_probe() made by commit 52cdbdd49853 is not present in 4.18-rc
any more, since dra7 started to use the sdhci-omap driver which
doesn't disable any regulators during shutdown, so the really_probe()
part of commit 52cdbdd49853 can be safely reverted. [The original
issue was related to the omap_hsmmc driver used by dra7 previously.]
For the above reasons, revert the really_probe() modifications made
by commit 52cdbdd49853.
The other code changes made by commit 52cdbdd49853 are useful and
they need not be reverted.
Fixes: 52cdbdd49853 (driver core: correct device's shutdown order)
Link: https://lore.kernel.org/lkml/CAFgQCTt7VfqM=UyCnvNFxrSw8Z6cUtAi3HUwR4_xPAc03SgHjQ@mail.gmail.com/
Reported-by: Pingfan Liu <kernelfans@gmail.com>
Tested-by: Pingfan Liu <kernelfans@gmail.com>
Reviewed-by: Kishon Vijay Abraham I <kishon@ti.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit c5c2a97b3ac7d1ec19e7cff9e38caca6afefc3de upstream.
This commit fixes a rare but possible case when the clk rate is updated
without update of the regulator voltage.
At boot up, CPUfreq checks if the system is running at the right freq. This
is a sanity check in case a bootloader set clk rate that is outside of freq
table present with cpufreq core. In such cases system can be unstable so
better to change it to a freq that is preset in freq-table.
The CPUfreq takes next freq that is >= policy->cur and this is our
target_freq that needs to be set now.
dev_pm_opp_set_rate(dev, target_freq) checks the target_freq and the
old_freq (a current rate). If these are equal it returns early. If not,
it searches for OPP (old_opp) that fits best to old_freq (not listed in
the table) and updates old_freq (!).
Here, we can end up with old_freq = old_opp.rate = target_freq, which
is not handled in _generic_set_opp_regulator(). It's supposed to update
voltage only when freq > old_freq || freq < old_freq.
	if (freq > old_freq) {
		ret = _set_opp_voltage(dev, reg, new_supply);
	[...]
	if (freq < old_freq) {
		ret = _set_opp_voltage(dev, reg, new_supply);
		if (ret)
As a result, the voltage is not updated even though the clk rate is.
Example:
freq-table = {
	1000MHz 1.15V
	 666MHz 1.10V
	 333MHz 1.05V
}
boot-up-freq = 800MHz # not listed in freq-table
freq = target_freq = 1GHz
old_freq = 800MHz
old_opp = _find_freq_ceil(opp_table, &old_freq); # (old_freq is modified!)
old_freq = 1GHz
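The example can be replayed with a small userspace model. This is a sketch only: the table, the ceil lookup and the helper names are simplified stand-ins for the OPP core, with frequencies in MHz taken from the example above.

```c
#include <assert.h>
#include <stddef.h>

struct opp_sketch { unsigned long rate_mhz; unsigned long microvolt; };

/* Frequency table from the example, sorted by ascending rate. */
static const struct opp_sketch table[] = {
	{ 333, 1050000 }, { 666, 1100000 }, { 1000, 1150000 },
};

/* Ceil lookup: returns the first OPP with rate >= *freq and rewrites *freq. */
static const struct opp_sketch *find_freq_ceil_sketch(unsigned long *freq)
{
	for (size_t i = 0; i < sizeof(table) / sizeof(table[0]); i++) {
		if (table[i].rate_mhz >= *freq) {
			*freq = table[i].rate_mhz;
			return &table[i];
		}
	}
	return NULL;
}

/* Returns 1 if either voltage branch would run after old_freq is rounded up. */
static int voltage_would_be_updated(unsigned long freq, unsigned long old_freq)
{
	find_freq_ceil_sketch(&old_freq);	/* old_freq is modified here */
	return freq > old_freq || freq < old_freq;
}

void opp_voltage_demo(void)
{
	/* Boot at 800 MHz, target 1000 MHz: the clk changes, the voltage does not. */
	assert(voltage_would_be_updated(1000, 800) == 0);
	/* Boot at a listed rate: the voltage branch runs as expected. */
	assert(voltage_would_be_updated(1000, 666) == 1);
}
```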
Fixes: 6a0712f6f199 ("PM / OPP: Add dev_pm_opp_set_rate()")
Cc: 4.6+ <stable@vger.kernel.org> # v4.6+
Signed-off-by: Waldemar Rymarkiewicz <waldemar.rymarkiewicz@gmail.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 47e5abfb546a3ace23a77453dc2e9db92704c5ac upstream.
If a device link is added via device_link_add() by the driver of the
link's consumer device, the supplier's runtime PM usage counter is
going to be dropped by the pm_runtime_put_suppliers() call in
driver_probe_device(). However, in that case it is not incremented
unless the supplier driver is already present and the link is not
stateless. That leads to a runtime PM usage counter imbalance for
the supplier device in a few cases.
To prevent that from happening, bump up the supplier runtime
PM usage counter in device_link_add() for all links with the
DL_FLAG_PM_RUNTIME flag set that are added at the consumer probe
time. Use pm_runtime_get_noresume() for that as the callers of
device_link_add() who want the supplier to be resumed by it are
expected to pass DL_FLAG_RPM_ACTIVE in flags to it anyway, but
additionally resume the supplier if the link is added during
consumer driver probe to retain the existing behavior for the
callers depending on it.
Fixes: 21d5c57b3726 (PM / runtime: Use device links)
Reported-by: Ulf Hansson <ulf.hansson@linaro.org>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: 4.10+ <stable@vger.kernel.org> # 4.10+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 72038df3c580c4c326b83c86149d7ac34007532a upstream.
In case the PM domain fails to be powered on in genpd_dev_pm_attach(), it
returns -EPROBE_DEFER, but keeping the device attached to its PM domain.
This leads to problems when the next attempt to attach is re-tried. More
precisely, in that situation an -EEXIST error code is returned, because the
device already has its PM domain pointer assigned, from the first attempt.
Now, because of the sloppy error handling by the existing callers of
dev_pm_domain_attach(), probing is allowed to continue when -EEXIST is
returned. However, in such case there are no guarantees that the PM domain
is powered on by genpd, which may lead to hangs when buses/drivers tried to
access their devices.
Let's fix this behaviour, simply by detaching the device when powering on
fails in genpd_dev_pm_attach().
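A userspace sketch of the retry problem and the fix. The structures and functions are hypothetical simplifications of the attach path; EPROBE_DEFER is defined locally because it is a kernel-internal error code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#ifndef EPROBE_DEFER
#define EPROBE_DEFER 517	/* kernel-internal errno, absent from userspace headers */
#endif

struct pm_domain_sketch { int powered_on; };
struct device_sketch { struct pm_domain_sketch *pm_domain; };

/* Hypothetical power-on hook; "fail" simulates a domain that cannot be powered yet. */
static int power_on_sketch(struct pm_domain_sketch *pd, int fail)
{
	if (fail)
		return -EIO;
	pd->powered_on = 1;
	return 0;
}

static int attach_sketch(struct device_sketch *dev, struct pm_domain_sketch *pd, int fail)
{
	if (dev->pm_domain)
		return -EEXIST;		/* what a retry ran into before the fix */

	dev->pm_domain = pd;
	if (power_on_sketch(pd, fail)) {
		dev->pm_domain = NULL;	/* the fix: detach before deferring */
		return -EPROBE_DEFER;
	}
	return 0;
}

void genpd_attach_demo(void)
{
	struct pm_domain_sketch pd = { 0 };
	struct device_sketch dev = { NULL };

	assert(attach_sketch(&dev, &pd, 1) == -EPROBE_DEFER);
	assert(dev.pm_domain == NULL);
	assert(attach_sketch(&dev, &pd, 0) == 0);	/* retry succeeds, no -EEXIST */
	assert(pd.powered_on == 1);
}
```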
Cc: v4.11+ <stable@vger.kernel.org> # v4.11+
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 84d0c27d6233a9ba0578b20f5a09701eb66cee42 upstream.
syzbot is hitting WARN() at kernfs_add_one() [1].
This is because kernfs_create_link() is confused by previous device_add()
call which continued without setting dev->kobj.parent field when
get_device_parent() failed by memory allocation fault injection.
Fix this by propagating the error from class_dir_create_and_add() to
the callers of get_device_parent().
[1] https://syzkaller.appspot.com/bug?id=fae0fb607989ea744526d1c082a5b8de6529116f
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reported-by: syzbot <syzbot+df47f81c226b31d89fb1@syzkaller.appspotmail.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 71df179363a5a733a8932e9afb869760d7559383 ]
The cache pointer points to the actual memory used by the cache. The
comparison here is looking for the type of the cache, so it should check
against cache_type.
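The distinction can be illustrated with a simplified model. The structure, enum and the "configured but not yet allocated" state below are assumptions made for the sketch, not the regmap definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum cache_type_sketch { CACHE_NONE = 0, CACHE_FLAT, CACHE_RBTREE };

struct map_sketch {
	enum cache_type_sketch cache_type;	/* which kind of cache is configured */
	void *cache;				/* memory backing the cache, if any */
};

/* Checks the pointer: answers "is memory allocated?", not "which type?". */
static bool cached_by_pointer(const struct map_sketch *map)
{
	return map->cache != NULL;
}

/* Checks the type field, which is what the comparison is meant to test. */
static bool cached_by_type(const struct map_sketch *map)
{
	return map->cache_type != CACHE_NONE;
}

void cache_type_demo(void)
{
	/* Hypothetical state: a flat cache is configured, memory not allocated yet. */
	struct map_sketch map = { CACHE_FLAT, NULL };

	assert(!cached_by_pointer(&map));
	assert(cached_by_type(&map));
}
```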
Fixes: 1ea975cf1ef5 ("regmap: Add a function to check if a regmap register is cached")
Signed-off-by: Charles Keepax <ckeepax@opensource.cirrus.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit c456442cd3a59eeb1d60293c26cbe2ff2c4e42cf upstream
Add the sysfs file for the new vulnerability. It does not do much except
show the word 'Vulnerable' for recent x86 cores.
Intel cores prior to family 6 are known not to be vulnerable, and so are
some Atoms and some Xeon Phi.
It assumes that older Cyrix, Centaur, etc. cores are immune.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 69728051f5bf15efaf6edfbcfe1b5a49a2437918 ]
If a device is runtime PM suspended when we enter suspend and has
a dedicated wake IRQ, we can get the following warning:
WARNING: CPU: 0 PID: 108 at kernel/irq/manage.c:526 enable_irq+0x40/0x94
[ 102.087860] Unbalanced enable for IRQ 147
...
(enable_irq) from [<c06117a8>] (dev_pm_arm_wake_irq+0x4c/0x60)
(dev_pm_arm_wake_irq) from [<c0618360>]
(device_wakeup_arm_wake_irqs+0x58/0x9c)
(device_wakeup_arm_wake_irqs) from [<c0615948>]
(dpm_suspend_noirq+0x10/0x48)
(dpm_suspend_noirq) from [<c01ac7ac>]
(suspend_devices_and_enter+0x30c/0xf14)
(suspend_devices_and_enter) from [<c01adf20>]
(enter_state+0xad4/0xbd8)
(enter_state) from [<c01ad3ec>] (pm_suspend+0x38/0x98)
(pm_suspend) from [<c01ab3e8>] (state_store+0x68/0xc8)
This is because the dedicated wake IRQ for the device may have been
already enabled earlier by dev_pm_enable_wake_irq_check(). Fix the
issue by checking for runtime PM suspended status.
This issue can be easily reproduced by setting serial console log level
to zero, letting the serial console idle, and suspend the system from
an ssh terminal. On resume, dmesg will have the warning above.
The reason why I have not run into this issue earlier is that I
typically run my PM test cases from a serial console instead of over ssh.
Fixes: c84345597558 (PM / wakeirq: Enable dedicated wakeirq for suspend)
Signed-off-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit a3381e3a65cbaf612c8f584906c4dba27e84267c ]
Commit b539cc82d493 (PM / Domains: Ignore domain-idle-states that are
not compatible), made it possible to ignore non-compatible
domain-idle-states OF nodes. However, in case that happens while doing
the OF parsing, the number of elements in the allocated array would
exceed the numbers actually needed, thus wasting memory.
Fix this by pre-iterating the genpd OF node and counting the number of
compatible domain-idle-states nodes, before doing the allocation. While
doing this, it makes sense to rework the code a bit to avoid open coding,
of parts responsible for the OF node iteration.
Let's also take the opportunity to clarify the function header for
of_genpd_parse_idle_states(), about what is being returned in case of
errors.
Fixes: b539cc82d493 (PM / Domains: Ignore domain-idle-states that are not compatible)
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Reviewed-by: Lina Iyer <ilina@codeaurora.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit f00e71091ab92eba52122332586c6ecaa9cd1a56 upstream.
We're supposed to be checking that "val_len" is not too large but
instead we check if it is smaller than the max.
The only function affected would be regmap_i2c_smbus_i2c_write() in
drivers/base/regmap/regmap-i2c.c. Strangely that function has its own
limit check which returns an error if (count >= I2C_SMBUS_BLOCK_MAX) so
it doesn't look like it has ever been able to do anything except return
an error.
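The inverted comparison can be shown side by side with the intended one. This is a sketch with hypothetical helper names; it assumes a limit of 0 means "no limit" and that an oversized write is refused with -E2BIG.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Inverted check: refuses writes that are smaller than the limit. */
static int check_len_buggy(size_t max_raw_write, size_t val_len)
{
	if (max_raw_write && max_raw_write > val_len)
		return -E2BIG;
	return 0;
}

/* Intended check: refuses writes that exceed the limit. */
static int check_len_fixed(size_t max_raw_write, size_t val_len)
{
	if (max_raw_write && val_len > max_raw_write)
		return -E2BIG;
	return 0;
}

void raw_write_len_demo(void)
{
	assert(check_len_buggy(32, 8) == -E2BIG);	/* small write wrongly refused */
	assert(check_len_buggy(32, 64) == 0);		/* oversized write let through */
	assert(check_len_fixed(32, 8) == 0);
	assert(check_len_fixed(32, 64) == -E2BIG);
	assert(check_len_fixed(0, 64) == 0);		/* no limit configured */
}
```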
Fixes: c335931ed9d2 ("regmap: Add raw_write/read checks for max_raw_write/read sizes")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 9de9a449482677a75f1edd2049268a7efc40fc96 upstream.
This reverts commit 452562abb5b7 ("base: arch_topology: fix section
mismatch build warnings"). It causes the notifier call to hang in some
use-cases.
In some cases when maxcpus is used, some of the CPUs are booted first and
then the remaining CPUs are booted. As an example, some users who want
to realize fast boot up often use the following procedure.
1) Define all CPUs on device tree (CA57x4 + CA53x4)
2) Add "maxcpus=4" in bootargs
3) Kernel boot up with CA57x4
4) After kernel boot up, CA53x4 is booted from user
When kernel init finished, CPUFREQ_POLICY_NOTIFIER had still not been
unregistered. This means that "__init init_cpu_capacity_callback()"
will be called after the kernel init sequence. To avoid this problem,
the __init{,data} annotations need to be removed by reverting this commit.
Also, this commit was needed to fix kernel compile issue below.
However, this issue was also fixed by another patch: commit 82d8ba717ccb
("arch_topology: Fix section miss match warning due to
free_raw_capacity()") in v4.15 as well.
Whereas commit 452562abb5b7 added all the missing __init annotations,
commit 82d8ba717ccb removed it from free_raw_capacity().
WARNING: vmlinux.o(.text+0x548f24): Section mismatch in reference
from the function init_cpu_capacity_callback() to the variable
.init.text:$x
The function init_cpu_capacity_callback() references
the variable __init $x.
This is often because init_cpu_capacity_callback lacks a __init
annotation or the annotation of $x is wrong.
Fixes: 82d8ba717ccb ("arch_topology: Fix section miss match warning due to free_raw_capacity()")
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Gaku Inami <gaku.inami.xh@renesas.com>
Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Tested-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Acked-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 03e4e0a9e02cf703da331ff6cfd57d0be9bf5692 ]
Ages ago Rob Clark noted,
"Currently with fence-array, we have a potential deadlock situation. If
we fence_add_callback() on an array-fence, the array-fence's lock is
acquired first, and in it's ->enable_signaling() callback, it will install
cbs on it's array-member fences, so the array-member's lock is acquired
second.
But in the signal path, the array-member's lock is acquired first, and
the array-fence's lock acquired second."
Rob proposed either extensive changes to dma-fence to unnest the
fence-array signaling, or to defer the signaling onto a workqueue. This
is a more refined version of the latter, that should keep the latency
of the fence signaling to a minimum by using an irq-work, which is
executed asap.
Reported-by: Rob Clark <robdclark@gmail.com>
Suggested-by: Rob Clark <robdclark@gmail.com>
References: 1476635975-21981-1-git-send-email-robdclark@gmail.com
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Rob Clark <robdclark@gmail.com>
Cc: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: Christian König <christian.koenig@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Sumit Semwal <sumit.semwal@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20171114162719.30958-1-chris@chris-wilson.co.uk
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 31eb7431805493e10f4731f366cf4d4e3e952035 ]
Prevent rpm_get_suppliers() from returning an error code if runtime
PM is disabled for one or more of the supplier devices it wants to
runtime-resume, so as to make runtime PM work for devices with links
to suppliers that don't use runtime PM (such links may be created
during device enumeration even before it is known whether or not
runtime PM will be enabled for the devices in question, for example).
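A sketch of the tolerated-error idea. It assumes that a supplier with runtime PM disabled reports -EACCES on a resume request; the structures and helpers are hypothetical and do not model usage counters.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct supplier_sketch {
	int rpm_disabled;	/* runtime PM is not enabled for this supplier */
	int resume_error;	/* 0, or a negative errno from a real resume failure */
	int active;
};

/* Hypothetical resume request for one supplier. */
static int supplier_resume_sketch(struct supplier_sketch *s)
{
	if (s->rpm_disabled)
		return -EACCES;
	if (s->resume_error)
		return s->resume_error;
	s->active = 1;
	return 0;
}

/* Resume all suppliers, ignoring those that do not use runtime PM. */
static int get_suppliers_sketch(struct supplier_sketch *s, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		int ret = supplier_resume_sketch(&s[i]);

		if (ret < 0 && ret != -EACCES)
			return ret;
	}
	return 0;
}

void get_suppliers_demo(void)
{
	struct supplier_sketch ok[] = { { 1, 0, 0 }, { 0, 0, 0 } };
	struct supplier_sketch bad[] = { { 0, -EIO, 0 } };

	assert(get_suppliers_sketch(ok, 2) == 0);	/* disabled supplier is skipped */
	assert(ok[1].active == 1);
	assert(get_suppliers_sketch(bad, 1) == -EIO);	/* real failures still propagate */
}
```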
Fixes: 21d5c57b3726 (PM / runtime: Use device links)
Reported-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Lukas Wunner <lukas@wunner.de>
Tested-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 433986c2c265d106d6a8e88006e0131fefc92b7b upstream.
Commit baa8809f6097 (PM / runtime: Optimize the use of device links)
added an invocation of pm_runtime_drop_link() to __device_link_del().
However there are two variants of that function, one for CONFIG_SRCU and
another for !CONFIG_SRCU, and the commit only modified the former.
Fixes: baa8809f6097 (PM / runtime: Optimize the use of device links)
Cc: v4.10+ <stable@vger.kernel.org> # v4.10+
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 87590ce6e373d1a5401f6539f0c59ef92dd924a9 upstream.
As the meltdown/spectre problem affects several CPU architectures, it makes
sense to have common way to express whether a system is affected by a
particular vulnerability or not. If affected the way to express the
mitigation should be common as well.
Create /sys/devices/system/cpu/vulnerabilities folder and files for
meltdown, spectre_v1 and spectre_v2.
Allow architectures to override the show function.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Linus Torvalds <torvalds@linuxfoundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Link: https://lkml.kernel.org/r/20180107214913.096657732@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit f57ab9a01a36ef3454333251cc57e3a9948b17bf upstream.
Commit dfea747d2aba ("drivers: base: cacheinfo: support DT overrides for
cache properties") doesn't initialise the cache type if it's present
only in DT and the architecture is not aware of it. Such caches are unified
system-level caches, which are generally transparent.
This patch checks if the cache type is set to NOCACHE but the DT node
indicates that it's a unified cache, and sets the cache type accordingly.
Fixes: dfea747d2aba ("drivers: base: cacheinfo: support DT overrides for cache properties")
Reported-and-tested-by: Tan Xiaojun <tanxiaojun@huawei.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 035ed07208dc501d023873447113f3f178592156 ]
On some i.MX6 platforms which do not have speed grading
check, opp table will not be created in platform code,
so cpufreq driver prints the following error message:
cpu cpu0: dev_pm_opp_get_opp_count: OPP table not found (-19)
However, this is not really an error in this case because the
imx6q-cpufreq driver first calls dev_pm_opp_get_opp_count()
and if it fails, it means that platform code does not provide
OPP and then dev_pm_opp_of_add_table() will be called.
In order to avoid such confusing error message, move it to
debug level.
It is up to the caller of dev_pm_opp_get_opp_count() to check its
return value and decide if it will print an error or not.
Signed-off-by: Fabio Estevam <fabio.estevam@nxp.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 5a244727f428a06634f22bb890e78024ab0c89f3 upstream.
The isa_driver structure for an isa_bus device is stored in the device
platform_data member of the respective device structure. This
platform_data member may be reset to NULL if isa_driver match callback
for the device fails, indicating a device unsupported by the ISA driver.
This patch fixes a possible NULL pointer dereference if one of the
isa_driver callbacks is attempted for an unsupported device. This error
should not occur in practice since ISA devices are typically manually
configured and loaded by the users, but we may as well prevent this
error from popping up for the 0day testers.
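The guard can be sketched as below. The structures are hypothetical simplifications; only the pattern (check the driver pointer stored in platform_data before using its callbacks) is taken from the description above.

```c
#include <assert.h>
#include <stddef.h>

struct device_sketch { void *platform_data; };
struct isa_driver_sketch { int (*remove)(struct device_sketch *dev); };

static int remove_calls;	/* counts how often the driver callback ran */

static int remove_cb(struct device_sketch *dev)
{
	(void)dev;
	remove_calls++;
	return 0;
}

/* Bus-level remove: only call into the driver if one is still attached. */
static int bus_remove_sketch(struct device_sketch *dev)
{
	struct isa_driver_sketch *drv = dev->platform_data;

	if (drv && drv->remove)
		return drv->remove(dev);
	return 0;
}

void isa_null_check_demo(void)
{
	struct isa_driver_sketch drv = { remove_cb };
	struct device_sketch supported = { &drv };
	struct device_sketch unsupported = { NULL };	/* match failed, pointer reset */

	remove_calls = 0;
	assert(bus_remove_sketch(&supported) == 0);
	assert(bus_remove_sketch(&unsupported) == 0);	/* no NULL dereference */
	assert(remove_calls == 1);
}
```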
Fixes: a5117ba7da37 ("[PATCH] Driver model: add ISA bus")
Signed-off-by: William Breathitt Gray <vilhelm.gray@gmail.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 0946b2fb38fdb6585a5ac3ca84ac73924f645952 upstream.
The help for FIRMWARE_IN_KERNEL still references the firmware_install
command that was recently removed by commit 5620a0d1aacd ("firmware:
delete in-kernel firmware").
Clean up the message to direct the user to their distribution's
linux-firmware package, and remove any reference to firmware being
included in the kernel source tree.
Fixes: 5620a0d1aacd ("firmware: delete in-kernel firmware").
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Robin H. Johnson <robbat2@gentoo.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 5241ab40f6e742f8a1631f8826faf6dc6412b3b5 ]
During system-wide PM, genpd relies on its PM callbacks to be invoked for
all its attached devices, as to deal with powering off/on the PM domain. In
other words, genpd is not compatible with the direct_complete path, if
executed by the PM core for any of its attached devices.
However, when genpd's ->prepare() callback invokes pm_generic_prepare(), it
does not take into account that it may return 1. Instead it treats that as
an error internally and expects the PM core to abort the prepare phase and
roll back. This leads to genpd not properly powering on/off the PM domain,
because its internal counters get wrongly balanced.
To fix the behaviour, allow drivers to return 1 from their ->prepare()
callbacks, but let's return 0 from genpd's ->prepare() callback in such
case, as that prevents the PM core from running the direct_complete path
for the device.
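A sketch of the return-value handling described above. The counter and function names are hypothetical; the point is that a positive return from the driver is not treated as an error and is reported upwards as 0.

```c
#include <assert.h>
#include <errno.h>

struct genpd_sketch { int prepared_count; };

static int drv_prepare_direct(void) { return 1; }	/* asks for direct_complete */
static int drv_prepare_busy(void) { return -EBUSY; }	/* a real error */

static int genpd_prepare_sketch(struct genpd_sketch *pd, int (*drv_prepare)(void))
{
	int ret;

	pd->prepared_count++;
	ret = drv_prepare();
	if (ret < 0) {
		pd->prepared_count--;	/* roll back only on a real error */
		return ret;
	}
	return 0;	/* never pass 1 up, so direct_complete is not used */
}

void genpd_prepare_demo(void)
{
	struct genpd_sketch pd = { 0 };

	assert(genpd_prepare_sketch(&pd, drv_prepare_direct) == 0);
	assert(pd.prepared_count == 1);		/* counter stays balanced */
	assert(genpd_prepare_sketch(&pd, drv_prepare_busy) == -EBUSY);
	assert(pd.prepared_count == 1);
}
```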
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 7978db344719dab1e56d05e6fc04aaaddcde0a5e upstream.
The for_each_available_child_of_node() loop in _of_add_opp_table_v2()
doesn't drop the reference to "np" on errors. Fix that.
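The reference leak pattern can be modelled in userspace. This is a sketch with a hypothetical refcount field; it assumes the iterator holds a reference on the current child that must be dropped by hand when the loop is left early.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct node_sketch { int refcount; int valid; };

static int add_opps_sketch(struct node_sketch *children, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		struct node_sketch *np = &children[i];

		np->refcount++;			/* the iterator holds a reference on np */
		if (!np->valid) {
			np->refcount--;		/* the fix: drop it before leaving the loop */
			return -EINVAL;
		}
		np->refcount--;			/* released when the iterator advances */
	}
	return 0;
}

void opp_refcount_demo(void)
{
	struct node_sketch nodes[] = { { 0, 1 }, { 0, 0 }, { 0, 1 } };

	assert(add_opps_sketch(nodes, 3) == -EINVAL);
	assert(nodes[1].refcount == 0);		/* nothing leaked on the error path */
}
```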
Fixes: 274659029c9d (PM / OPP: Add support to parse "operating-points-v2" bindings)
Signed-off-by: Tobias Jordan <Tobias.Jordan@elektrobit.com>
[ VK: Improved commit log. ]
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.
By default all files without license information are under the default
license of the kernel, which is GPL version 2.
Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier. The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.
This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.
How this work was done:
Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
- file had no licensing information in it,
- file was a */uapi/* one with no licensing information in it,
- file was a */uapi/* one with existing licensing information,
Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.
The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side by side results from the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne. Philippe prepared the
base worksheet, and did an initial spot review of a few 1000 files.
The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed. Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.
Criteria used to select files for SPDX license identifier tagging was:
- Files considered eligible had to be source code files.
- Make and config files were included as candidates if they contained >5
lines of source
- File already had some variant of a license header in it (even if <5
lines).
All documentation files were explicitly excluded.
The following heuristics were used to determine which SPDX license
identifiers to apply.
- when both scanners couldn't find any license traces, file was
considered to have no license information in it, and the top level
COPYING file license applied.
For non */uapi/* files that summary was:
   SPDX license identifier                            # files
   ---------------------------------------------------|-------
   GPL-2.0                                              11139
and resulted in the first patch in this series.
If that file was a */uapi/* path one, it was "GPL-2.0 WITH
Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was:
   SPDX license identifier                            # files
   ---------------------------------------------------|-------
   GPL-2.0 WITH Linux-syscall-note                        930
and resulted in the second patch in this series.
- if a file had some form of licensing information in it, and was one
of the */uapi/* ones, it was denoted with the Linux-syscall-note if
any GPL family license was found in the file or had no licensing in
it (per prior point). Results summary:
   SPDX license identifier                            # files
   ---------------------------------------------------|------
   GPL-2.0 WITH Linux-syscall-note                        270
   GPL-2.0+ WITH Linux-syscall-note                       169
   ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause)     21
   ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)     17
   LGPL-2.1+ WITH Linux-syscall-note                       15
   GPL-1.0+ WITH Linux-syscall-note                        14
   ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause)     5
   LGPL-2.0+ WITH Linux-syscall-note                        4
   LGPL-2.1 WITH Linux-syscall-note                         3
   ((GPL-2.0 WITH Linux-syscall-note) OR MIT)               3
   ((GPL-2.0 WITH Linux-syscall-note) AND MIT)              1
and that resulted in the third patch in this series.
- when the two scanners agreed on the detected license(s), that became
the concluded license(s).
- when there was disagreement between the two scanners (one detected a
license but the other didn't, or they both detected different
licenses) a manual inspection of the file occurred.
- In most cases a manual inspection of the information in the file
resulted in a clear resolution of the license that should apply (and
which scanner probably needed to revisit its heuristics).
- When it was not immediately clear, the license identifier was
confirmed with lawyers working with the Linux Foundation.
- If there was any question as to the appropriate license identifier,
the file was flagged for further research and to be revisited later
in time.
In total, over 70 hours of logged manual review was done on the
spreadsheet to determine the SPDX license identifiers to apply to the
source files by Kate, Philippe, Thomas and, in some cases, confirmation
by lawyers working with the Linux Foundation.
Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there was new insights. The
Windriver scanner is based on an older version of FOSSology in part, so
they are related.
Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with SPDX license identifier in the
files he inspected. For the non-uapi files Thomas did random spot checks
in about 15000 files.
In the initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors, and have been fixed to reflect the
correct identifier.
Additionally Philippe spent 10 hours this week doing a detailed manual
inspection and review of the 12,461 patched files from the initial patch
version early this week with:
- a full scancode scan run, collecting the matched texts, detected
license ids and scores
- reviewing anything where there was a license detected (about 500+
files) to ensure that the applied SPDX license was correct
- reviewing anything where there was no detection but the patch license
was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
SPDX license was correct
This produced a worksheet with 20 files needing minor correction. This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.
These .csv files were then reviewed by Greg. Thomas wrote a script to
parse the csv files and add the proper SPDX tag to the file, in the
format that the file expected. This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types.) Finally Greg ran the script using the .csv files to
generate the patches.
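The header/source distinction mentioned above can be sketched as follows. This is a minimal illustration and not the script Thomas and Greg used: the helper name spdx_tag_line() is hypothetical, and it assumes only that headers take a /* */ comment while .c files take a // comment.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: format the SPDX tag line for a file.
 * Assumption: names ending in ".h" get a block comment, everything
 * else handled here gets a line comment.
 * Returns the number of characters snprintf() would have written.
 */
static int spdx_tag_line(const char *path, const char *license,
                         char *out, size_t outlen)
{
    size_t n = strlen(path);
    int is_header = n >= 2 && strcmp(path + n - 2, ".h") == 0;

    if (is_header)
        return snprintf(out, outlen,
                        "/* SPDX-License-Identifier: %s */", license);
    return snprintf(out, outlen, "// SPDX-License-Identifier: %s", license);
}
```

A real tool would also have to handle assembly, Makefiles and scripts, which use yet other comment styles; those are left out of this sketch.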
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
When I execute numactl -H (which reads /sys/devices/system/node/nodeX/cpumap
and displays cpumask_of_node for each node), I get different results
on x86 and arm64. For each NUMA node, the former only displays online
CPUs, while the latter displays all possible CPUs. Unfortunately, neither
the Linux documentation nor the numactl manual describes this clearly.
I sent a mail to ask for help, and Michal Hocko replied that he
preferred to print online cpus because it doesn't really make much sense
to bind anything on offline nodes.
Will said:
"I suspect the vast majority (if not all) code that reads this file was
developed for x86, so having the same behaviour for arm64 sounds like
something we should do ASAP before people try to special case with
things like #ifdef __aarch64__. I'd rather have this in 4.14 if
possible."
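The behaviour preferred in the discussion above amounts to restricting each node's CPU set to the CPUs that are currently online. A minimal sketch, assuming CPU sets fit into a 64-bit mask; the function name is illustrative and stands in for the kernel's cpumask helpers:

```c
#include <stdint.h>

/*
 * Illustrative only: CPU sets are modelled as 64-bit masks, one bit per CPU.
 * node_cpus   - all possible CPUs that belong to the node
 * online_cpus - CPUs that are currently online system-wide
 * Returns the CPUs of the node that are online.
 */
static uint64_t node_cpumap_online(uint64_t node_cpus, uint64_t online_cpus)
{
    return node_cpus & online_cpus;
}
```

With this, a node whose CPUs are all offline reports an empty map on every architecture.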
Link: http://lkml.kernel.org/r/1506678805-15392-2-git-send-email-thunder.leizhen@huawei.com
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Tianhong Ding <dingtianhong@huawei.com>
Cc: Hanjun Guo <guohanjun@huawei.com>
Cc: Libin <huawei.libin@huawei.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
acpi_fwnode_get_reference_args(), the function implementing ACPI
support for fwnode_property_get_reference_args(), directly returns
error codes from __acpi_node_get_property_reference(). The latter
uses different error codes than the OF implementation. In particular,
the OF implementation uses -ENOENT to indicate that the property is
not found, that a reference entry is empty, or that there are no more
references.
Document and align the error codes for
fwnode_property_get_reference_args() so that they match those of
of_parse_phandle_with_args().
Fixes: 3e3119d3088f (device property: Introduce fwnode_property_get_reference_args)
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
Deleting a subdevice removes the device properties associated with its
parent when the two share the same firmware node, since commit 478573c93abd
(driver core: Don't leak secondary fwnode on device removal). This was
observed with a driver that adds a subdevice: the driver was no longer able
to read device properties after an rmmod/modprobe cycle.
Consider the lifecycle of it:
parent device registration
  ACPI_COMPANION_SET()
  device_add_properties()
    pset_copy_set()
    set_secondary_fwnode(dev, &p->fwnode)
  device_add()
parent probe
  read device properties
  ACPI_COMPANION_SET(subdevice, ACPI_COMPANION(parent))
  device_add(subdevice)
parent remove
  device_del(subdevice)
    device_remove_properties()
      set_secondary_fwnode(dev, NULL);
      pset_free()
The parent device has its primary firmware node pointing to an ACPI
node and its secondary firmware node pointing to the device properties.
The ACPI_COMPANION_SET() call in parent probe sets the subdevice's
firmware node to point to the same 'struct fwnode_handle' and the
associated secondary firmware node, i.e. the same device properties as
the parent.
When the subdevice is deleted in parent remove, those device properties
are removed, and the attempt to read device properties in the next
parent probe call fails.
Fix this by tracking the owner device of device properties and delete
them only when owner device is being deleted.
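The ownership rule can be modelled in a few lines. This is a simplified sketch with hypothetical structure and function names, not the kernel's code: a property set remembers which device added it, and removal by any other device leaves the set intact.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel structures; all names are hypothetical. */
struct device;

struct property_set {
    struct device *owner;   /* device the properties were added for */
    bool freed;             /* set when the properties are released */
};

struct device {
    struct property_set *props; /* models the secondary firmware node */
};

/* Model of adding properties: the device that adds them becomes the owner. */
static void add_properties(struct device *dev, struct property_set *p)
{
    p->owner = dev;
    p->freed = false;
    dev->props = p;
}

/* Model of a subdevice sharing its parent's firmware node. */
static void share_fwnode(struct device *child, const struct device *parent)
{
    child->props = parent->props;
}

/* Model of removal: only the owner may release the shared properties. */
static void remove_properties(struct device *dev)
{
    struct property_set *p = dev->props;

    if (!p)
        return;
    dev->props = NULL;      /* drop this device's view of the properties */
    if (p->owner == dev)
        p->freed = true;    /* the owner is going away, release them */
}
```

In this model, deleting the subdevice leaves the parent's properties readable, and they are released only when the parent itself is removed.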
Fixes: 478573c93abd (driver core: Don't leak secondary fwnode on device removal)
Cc: 4.9+ <stable@vger.kernel.org> # 4.9+
Signed-off-by: Jarkko Nikula <jarkko.nikula@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core fixes from Greg KH:
"Here are a few small fixes for 4.14-rc4.
The removal of DRIVER_ATTR() was almost completed by 4.14-rc1, but one
straggler made it in through some other tree (odds are, one of
mine...) So there's a simple removal of the last user, and then
finally the macro is removed from the tree.
There's a fix for old crazy udev instances that insist on reloading a
module when it is removed from the kernel due to the new uevents for
bind/unbind. This fixes the reported regression, hopefully some year
in the future we can drop the workaround, once users update to the
latest version, but I'm not holding my breath.
And then there's a build fix for a linker warning, and a buffer
overflow fix to match the PCI fixes you took through the PCI tree in
the same area.
All of these have been in linux-next for a few weeks while I've been
traveling, sorry for the delay"
* tag 'driver-core-4.14-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
driver core: remove DRIVER_ATTR
fpga: altera-cvp: remove DRIVER_ATTR() usage
driver core: platform: Don't read past the end of "driver_override" buffer
base: arch_topology: fix section mismatch build warnings
driver core: suppress sending MODALIAS in UNBIND uevents
|
|
The notifier callbacks may want to call some OPP helper routines which
may try to take the same opp_table->lock again and cause a deadlock. One
such usecase was reported by Chanwoo Choi, where calling
dev_pm_opp_disable() leads us to the devfreq's OPP notifier handler,
which further calls dev_pm_opp_find_freq_floor() and it deadlocks.
We don't really need the opp_table->lock to be held across the notifier
call though, all we want to make sure is that the 'opp' doesn't get
freed while being used from within the notifier chain. We can do that with
the help of dev_pm_opp_get/put() as well. Let's do it.
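The pattern described here, holding a reference instead of the lock across the notifier call, can be sketched with a toy model. None of the names below are the kernel's; the lock is a plain flag that counts re-entrant acquisitions so that the would-be deadlock is observable.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model; none of these names are the kernel's. */
struct table {
    bool locked;        /* models opp_table->lock (non-recursive) */
    int deadlocks;      /* counts attempts to take the lock while held */
};

struct opp {
    int refcount;       /* models the reference taken with dev_pm_opp_get() */
    bool available;
};

static void table_lock(struct table *t)
{
    if (t->locked)
        t->deadlocks++; /* a real mutex would block forever here */
    t->locked = true;
}

static void table_unlock(struct table *t)
{
    t->locked = false;
}

typedef void (*notifier_fn)(struct table *t, struct opp *opp);

/* A notifier that calls back into a helper which takes the table lock. */
static void notifier_takes_lock(struct table *t, struct opp *opp)
{
    (void)opp;
    table_lock(t);
    table_unlock(t);
}

/* Change availability, then notify with a reference held instead of the lock. */
static void opp_set_available(struct table *t, struct opp *opp,
                              bool available, notifier_fn notify)
{
    table_lock(t);
    opp->available = available;
    opp->refcount++;    /* keep 'opp' alive across the notifier call */
    table_unlock(t);

    notify(t, opp);     /* lock is not held, so callbacks may take it */

    opp->refcount--;    /* drop the temporary reference */
}
```

Calling notify() before table_unlock() in this model would increment the deadlocks counter, which corresponds to the hang reported above.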
Cc: 4.11+ <stable@vger.kernel.org> # 4.11+
Fixes: 5b650b388844 "PM / OPP: Take kref from _find_opp_table()"
Reported-by: Chanwoo Choi <cw00.choi@samsung.com>
Tested-by: Chanwoo Choi <cw00.choi@samsung.com>
Reviewed-by: Stephen Boyd <sboyd@codeaurora.org>
Reviewed-by: Chanwoo Choi <cw00.choi@samsung.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
"These fix a cpufreq regression introduced by recent changes related to
the generic DT driver, an initialization time memory leak in cpuidle
on ARM, a PM core bug that may cause system suspend/resume to fail on
some systems, a request type validation issue in the PM QoS framework
and two documentation-related issues.
Specifics:
- Fix a regression in cpufreq on systems using DT as the source of
CPU configuration information where two different code paths
attempt to create the cpufreq-dt device object (there can be only
one) and fix up the "compatible" matching for some TI platforms on
top of that (Viresh Kumar, Dave Gerlach).
- Fix an initialization time memory leak in cpuidle on ARM which
occurs if the cpuidle driver initialization fails (Stefan Wahren).
- Fix a PM core function that checks whether or not there are any
system suspend/resume callbacks for a device, but forgets to check
legacy callbacks which then may be skipped incorrectly and the
system may crash and/or the device may become unusable after a
suspend-resume cycle (Rafael Wysocki).
- Fix request type validation for latency tolerance PM QoS requests
which may lead to unexpected behavior (Jan Schönherr).
- Fix a broken link to PM documentation from a header file and a typo
in a PM document (Geert Uytterhoeven, Rafael Wysocki)"
* tag 'pm-4.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
cpufreq: ti-cpufreq: Support additional am43xx platforms
ARM: cpuidle: Avoid memleak if init fail
cpufreq: dt-platdev: Add some missing platforms to the blacklist
PM: core: Fix device_pm_check_callbacks()
PM: docs: Drop an excess character from devices.rst
PM / QoS: Use the correct variable to check the QoS request type
driver core: Fix link to device power management documentation
|
|
* pm-core:
PM: core: Fix device_pm_check_callbacks()
* pm-qos:
PM / QoS: Use the correct variable to check the QoS request type
* pm-docs:
PM: docs: Drop an excess character from devices.rst
driver core: Fix link to device power management documentation
|
|
The device_pm_check_callbacks() function doesn't check legacy
->suspend and ->resume callback pointers under the device's
bus type, class and driver, so in some cases it may set the
no_pm_callbacks flag for the device incorrectly and then the
callbacks may be skipped during system suspend/resume, which
shouldn't happen.
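A minimal sketch of the corrected check, using simplified stand-in structures with hypothetical names. It covers only the bus type, class and driver layers named above, and treats a layer as empty only if both its dev_pm_ops-style callbacks and its legacy ->suspend/->resume pointers are absent.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins; the real kernel structures have many more members. */
struct pm_ops_model {
    int (*suspend)(void);
    int (*resume)(void);
};

struct layer_model {                 /* a bus type, class or driver */
    const struct pm_ops_model *pm;   /* dev_pm_ops-style callbacks, may be NULL */
    int (*suspend)(void);            /* legacy ->suspend */
    int (*resume)(void);             /* legacy ->resume */
};

/* Example legacy callback used to exercise the check. */
static int legacy_suspend_stub(void)
{
    return 0;
}

static bool pm_ops_empty(const struct pm_ops_model *ops)
{
    return !ops || (!ops->suspend && !ops->resume);
}

/* A layer is empty only if it has neither new-style nor legacy callbacks. */
static bool layer_is_empty(const struct layer_model *l)
{
    return !l || (pm_ops_empty(l->pm) && !l->suspend && !l->resume);
}

/* The flag may be set only when the bus, class and driver are all empty. */
static bool no_pm_callbacks(const struct layer_model *bus,
                            const struct layer_model *cls,
                            const struct layer_model *drv)
{
    return layer_is_empty(bus) && layer_is_empty(cls) && layer_is_empty(drv);
}
```

A driver that only sets the legacy suspend pointer, for example to legacy_suspend_stub, now prevents the flag from being set, so its callback is not skipped.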
Fixes: aa8e54b55947 (PM / sleep: Go direct_complete if driver has no callbacks)
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: 4.5+ <stable@vger.kernel.org> # 4.5+
|
|
When the driver_override parameter is 4095 or 4094 bytes long, the
printing code would access invalid memory because it needs count+1
bytes for printing.
Reject driver_override values of these lengths in driver_override_store().
This is in close analogy to commit 4efe874aace5 ("PCI: Don't read past the
end of sysfs "driver_override" buffer") from Sasha Levin.
Fixes: 3d713e0e382e ("driver core: platform: add device binding path 'driver_override'")
Cc: stable@vger.kernel.org # v3.17+
Signed-off-by: Nicolai Stange <nstange@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Commit 2ef7a2953c81 ("arm, arm64: factorize common cpu capacity default code")
introduced init_cpu_capacity_callback and init_cpu_capacity_notifier
which are referenced from an initcall and are missing __init{,data}
annotations, resulting in the section mismatch build warnings below.
"WARNING: vmlinux.o(.text+0xbab790): Section mismatch in reference from
the function init_cpu_capacity_callback() to the variable .init.text:$x
The function init_cpu_capacity_callback() references the variable
__init $x. This is often because init_cpu_capacity_callback lacks a
__init annotation or the annotation of $x is wrong."
This patch fixes the above build warnings by adding the required annotations.
Fixes: 2ef7a2953c81 ("arm, arm64: factorize common cpu capacity default code")
Cc: Juri Lelli <juri.lelli@arm.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Use the actual function argument for the validation of the request type,
instead of the type field in a fresh (supposedly zero-initialized)
request structure.
Signed-off-by: Jan H. Schönherr <jschoenh@amazon.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
My recent bug fix introduced another bug, which caused rmem_dma_device_init
to always fail, as rmem->priv is never set to anything.
This restores the previous behavior, calling dma_init_coherent_memory()
whenever ->priv is NULL.
Fixes: d35b0996fef3 ("dma-coherent: fix dma_declare_coherent_memory() logic error")
Reported-by: Roy Pledge <roy.pledge@nxp.com>
Tested-by: Roy Pledge <roy.pledge@nxp.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
Commit 5620a0d1aac ("firmware: delete in-kernel firmware") removed the
entire firmware directory. Unfortunately it thereby also removed the
support for built-in firmware.
This restores the ability to build firmware directly into the kernel by
pruning the original Makefile to the necessary minimum. The default for
EXTRA_FIRMWARE_DIR is now the standard directory /lib/firmware/.
Fixes: 5620a0d1aac ("firmware: delete in-kernel firmware")
Signed-off-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Acked-by: Greg K-H <gregkh@linuxfoundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Pull dma-mapping updates from Christoph Hellwig:
- removal of the old dma_alloc_noncoherent interface
- remove unused flags to dma_declare_coherent_memory
- restrict OF DMA configuration to specific physical busses
- use the iommu mailing list for dma-mapping questions and patches
* tag 'dma-mapping-4.14' of git://git.infradead.org/users/hch/dma-mapping:
dma-coherent: fix dma_declare_coherent_memory() logic error
ARM: imx: mx31moboard: Remove unused 'dma' variable
dma-coherent: remove an unused variable
MAINTAINERS: use the iommu list for the dma-mapping subsystem
dma-coherent: remove the DMA_MEMORY_MAP and DMA_MEMORY_IO flags
dma-coherent: remove the DMA_MEMORY_INCLUDES_CHILDREN flag
of: restrict DMA configuration
dma-mapping: remove dma_alloc_noncoherent and dma_free_noncoherent
i825xx: switch to switch to dma_alloc_attrs
au1000_eth: switch to dma_alloc_attrs
sgiseeq: switch to dma_alloc_attrs
dma-mapping: reduce dma_mapping_error inline bloat
|
|
This reverts commit 81f95076281fdd3bc382e004ba1bce8e82fccbce.
It causes random failures of firmware loading at resume time (well,
random for me, it seems to be more reliable for others) because the
firmware disabling is not actually synchronous with any particular
resume event, and at least the btusb driver that uses a workqueue to
load the firmware at resume seems to occasionally hit the "firmware
loading is disabled" logic because the firmware loader hasn't gotten the
resume event yet.
Some kind of sanity check for not trying to load firmware when it's not
possible might be a good thing, but this commit was not it.
Greg seems to have silently suffered the same issue, and pointed to the
likely culprit, and Gabriel C verified the revert fixed it for him too.
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Pointed-at-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Tested-by: Gabriel C <nix.or.die@gmail.com>
Cc: Luis R. Rodriguez <mcgrof@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
First, the number of CPUs can't be a negative number.
Second, different signedness leads to suboptimal code in the following
cases:
1)
kmalloc(nr_cpu_ids * sizeof(X));
"int" has to be sign extended to size_t.
2)
while (loff_t *pos < nr_cpu_ids)
MOVSXD is 1 byte longer than the same MOV.
Other cases exist as well. Basically, the compiler is told that nr_cpu_ids
can't be negative, which it cannot deduce if the type is "int".
Code savings on allyesconfig kernel: -3KB
add/remove: 0/0 grow/shrink: 25/264 up/down: 261/-3631 (-3370)
function old new delta
coretemp_cpu_online 450 512 +62
rcu_init_one 1234 1272 +38
pci_device_probe 374 399 +25
...
pgdat_reclaimable_pages 628 556 -72
select_fallback_rq 446 369 -77
task_numa_find_cpu 1923 1807 -116
Link: http://lkml.kernel.org/r/20170819114959.GA30580@avx2
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Patch series "Separate NUMA statistics from zone statistics", v2.
Each page allocation updates a set of per-zone statistics with a call to
zone_statistics(). As discussed at the 2017 MM summit, these are a
substantial source of overhead in the page allocator and are very rarely
consumed. This significant overhead comes from cache bouncing caused by
zone counters (NUMA associated counters) being updated in parallel in
multi-threaded page allocation (pointed out by Dave Hansen).
A link to the MM summit slides:
http://people.netfilter.org/hawk/presentations/MM-summit2017/MM-summit2017-JesperBrouer.pdf
To mitigate this overhead, this patchset separates NUMA statistics from
the zone statistics framework, and updates the NUMA counter threshold to a
fixed size of MAX_U16 - 2, as a small threshold greatly increases the update
frequency of the global counter from the local per cpu counter (suggested by
Ying Huang). The rationale is that these statistics counters don't
need to be read often, unlike other VM counters, so it's not a problem
to use a large threshold and make readers more expensive.
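The trade-off between threshold size and global counter traffic can be illustrated with a small model. This is a sketch under simplifying assumptions (a fixed number of CPUs, no atomics, hypothetical names), not the code from the patchset.

```c
#include <stdint.h>

#define NR_CPUS_MODEL 4              /* illustrative CPU count */

struct numa_counter {
    long global;                     /* shared counter, expensive to touch */
    uint16_t local[NR_CPUS_MODEL];   /* per-CPU deltas, cheap to update */
    uint16_t threshold;              /* fold once a delta exceeds this */
    unsigned long folds;             /* how often the global was updated */
};

/* Writer path: bump the local delta and fold only past the threshold. */
static void numa_counter_inc(struct numa_counter *c, int cpu)
{
    uint16_t v = ++c->local[cpu];

    if (v > c->threshold) {
        c->global += v;
        c->local[cpu] = 0;
        c->folds++;
    }
}

/* Reader path: more expensive, must add every per-CPU delta. */
static long numa_counter_read(const struct numa_counter *c)
{
    long sum = c->global;

    for (int cpu = 0; cpu < NR_CPUS_MODEL; cpu++)
        sum += c->local[cpu];
    return sum;
}
```

In this model, 1000 increments on one CPU touch the global counter 7 times with a threshold of 125 and not at all with a threshold of 65533, while numa_counter_read() returns 1000 in both cases.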
With this patchset, we see a 31.3% drop in CPU cycles (537-->369, see
below) per single page allocation and reclaim on Jesper's
page_bench03 benchmark. Meanwhile, this patchset keeps the same style
of virtual memory statistics with little end-user-visible effect (it only
moves the numa stats to show behind zone page stats, see the first patch
for details).
I did an experiment of single page allocation and reclaim concurrently
using Jesper's page_bench03 benchmark on a 2-Socket Broadwell-based
server (88 processors with 126G memory) with different pcp counter
threshold sizes.
Benchmark provided by Jesper D Brouer (loop count increased to 10000000):
https://github.com/netoptimizer/prototype-kernel/tree/master/kernel/mm/bench
Threshold  CPU cycles   Throughput(88 threads)
32         799          241760478
64         640          301628829
125        537          358906028          <==> system by default
256        468          412397590
512        428          450550704
4096       399          482520943
20000      394          489009617
30000      395          488017817
65533      369(-31.3%)  521661345(+45.3%)  <==> with this patchset
N/A        342(-36.3%)  562900157(+56.8%)  <==> disable zone_statistics
This patch (of 3):
In this patch, NUMA statistics are separated from the zone statistics
framework and all the call sites of NUMA stats are changed to use
numa-stats-specific functions. There is no functional change
except that the NUMA stats are shown behind the zone page stats
when users *read* the zone info.
E.g. cat /proc/zoneinfo
***Base***                     ***With this patch***
nr_free_pages          3976    nr_free_pages          3976
nr_zone_inactive_anon  0       nr_zone_inactive_anon  0
nr_zone_active_anon    0       nr_zone_active_anon    0
nr_zone_inactive_file  0       nr_zone_inactive_file  0
nr_zone_active_file    0       nr_zone_active_file    0
nr_zone_unevictable    0       nr_zone_unevictable    0
nr_zone_write_pending  0       nr_zone_write_pending  0
nr_mlock               0       nr_mlock               0
nr_page_table_pages    0       nr_page_table_pages    0
nr_kernel_stack        0       nr_kernel_stack        0
nr_bounce              0       nr_bounce              0
nr_zspages             0       nr_zspages             0
numa_hit               0       *nr_free_cma           0*
numa_miss              0       numa_hit               0
numa_foreign           0       numa_miss              0
numa_interleave        0       numa_foreign           0
numa_local             0       numa_interleave        0
numa_other             0       numa_local             0
*nr_free_cma           0*      numa_other             0
...                            ...
vm stats threshold: 10         vm stats threshold: 10
...                            ...
The next patch updates the numa stats counter size and threshold.
[akpm@linux-foundation.org: coding-style fixes]
Link: http://lkml.kernel.org/r/1503568801-21305-2-git-send-email-kemi.wang@intel.com
Signed-off-by: Kemi Wang <kemi.wang@intel.com>
Reported-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Christopher Lameter <cl@linux.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Ying Huang <ying.huang@intel.com>
Cc: Aaron Lu <aaron.lu@intel.com>
Cc: Tim Chen <tim.c.chen@intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Historically we have enforced that any kernel zone (e.g ZONE_NORMAL) has
to precede the Movable zone in the physical memory range. The purpose
of the movable zone is, however, not bound to any physical memory
restriction. It merely defines a class of migrateable and reclaimable
memory.
There are users (e.g. CMA) who might want to reserve specific physical
memory ranges for their own purpose. Moreover our pfn walkers have to
be prepared for zones overlapping in the physical range already because
we do support interleaving NUMA nodes and therefore zones can interleave
as well. This means we can allow each memory block to be associated
with a different zone.
Loosen the current onlining semantic and allow explicit onlining type on
any memblock. That means that online_{kernel,movable} will be allowed
regardless of the physical address of the memblock, as long as it is
offline of course. This might result in the movable zone overlapping with
other kernel zones. Default onlining then becomes a bit tricky but
still sensible. echo online > memoryXY/state will online the given
block to
1) the default zone if the given range is outside of any zone
2) the enclosing zone if such a zone doesn't interleave with
any other zone
3) the default zone if more zones interleave for this range
where default zone is movable zone only if movable_node is enabled
otherwise it is a kernel zone.
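The three rules above reduce to a small decision function. The sketch below is a model with hypothetical names that works on memory block numbers rather than pfns and treats each zone as a single inclusive span; it is meant to illustrate the semantic, not to reproduce the kernel implementation.

```c
#include <stdbool.h>

enum zone_model { ZONE_KERNEL_MODEL, ZONE_MOVABLE_MODEL };

/* Inclusive span of memory block numbers covered by a zone. */
struct span {
    bool present;       /* false while nothing is onlined into the zone */
    int first, last;
};

static bool span_contains(const struct span *s, int block)
{
    return s->present && block >= s->first && block <= s->last;
}

/*
 * Pick the zone that "echo online" would use for 'block':
 *  - inside exactly one zone's span: that zone
 *  - outside every span, or inside both: the default zone, which is
 *    movable only when movable_node is enabled
 */
static enum zone_model default_zone_for_block(const struct span *kernel,
                                              const struct span *movable,
                                              int block, bool movable_node)
{
    bool in_kernel = span_contains(kernel, block);
    bool in_movable = span_contains(movable, block);

    if (in_kernel && !in_movable)
        return ZONE_KERNEL_MODEL;
    if (in_movable && !in_kernel)
        return ZONE_MOVABLE_MODEL;
    return movable_node ? ZONE_MOVABLE_MODEL : ZONE_KERNEL_MODEL;
}
```

With a Normal span of blocks 34-39 and a Movable span of blocks 37-41 and movable_node disabled, block 38 lies in both spans and defaults to the kernel zone, while block 40 lies only in the Movable span and defaults to Movable.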
Here is an example of the semantic (movable_node is not present, but
it works in an analogous way). We start with the following memblocks, all
of them offline:
memory34/valid_zones:Normal Movable
memory35/valid_zones:Normal Movable
memory36/valid_zones:Normal Movable
memory37/valid_zones:Normal Movable
memory38/valid_zones:Normal Movable
memory39/valid_zones:Normal Movable
memory40/valid_zones:Normal Movable
memory41/valid_zones:Normal Movable
Now, we online block 34 in default mode and block 37 as movable
root@test1:/sys/devices/system/node/node1# echo online > memory34/state
root@test1:/sys/devices/system/node/node1# echo online_movable > memory37/state
memory34/valid_zones:Normal
memory35/valid_zones:Normal Movable
memory36/valid_zones:Normal Movable
memory37/valid_zones:Movable
memory38/valid_zones:Normal Movable
memory39/valid_zones:Normal Movable
memory40/valid_zones:Normal Movable
memory41/valid_zones:Normal Movable
As we can see all other blocks can still be onlined both into Normal and
Movable zones and the Normal is default because the Movable zone spans
only block37 now.
root@test1:/sys/devices/system/node/node1# echo online_movable > memory41/state
memory34/valid_zones:Normal
memory35/valid_zones:Normal Movable
memory36/valid_zones:Normal Movable
memory37/valid_zones:Movable
memory38/valid_zones:Movable Normal
memory39/valid_zones:Movable Normal
memory40/valid_zones:Movable Normal
memory41/valid_zones:Movable
Now the default zone for blocks 37-41 has changed because movable zone
spans that range.
root@test1:/sys/devices/system/node/node1# echo online_kernel > memory39/state
memory34/valid_zones:Normal
memory35/valid_zones:Normal Movable
memory36/valid_zones:Normal Movable
memory37/valid_zones:Movable
memory38/valid_zones:Normal Movable
memory39/valid_zones:Normal
memory40/valid_zones:Movable Normal
memory41/valid_zones:Movable
Note that the block 39 now belongs to the zone Normal and so block38
falls into Normal by default as well.
For completeness:
root@test1:/sys/devices/system/node/node1# for i in memory[34]?
do
echo online > $i/state 2>/dev/null
done
memory34/valid_zones:Normal
memory35/valid_zones:Normal
memory36/valid_zones:Normal
memory37/valid_zones:Movable
memory38/valid_zones:Normal
memory39/valid_zones:Normal
memory40/valid_zones:Movable
memory41/valid_zones:Movable
Implementation wise the change is quite straightforward. We can get rid
of allow_online_pfn_range altogether. online_pages allows only offline
nodes already. The original default_zone_for_pfn will become
default_kernel_zone_for_pfn. New default_zone_for_pfn implements the
above semantic. zone_for_pfn_range is slightly reorganized to implement
kernel and movable online type explicitly and MMOP_ONLINE_KEEP becomes a
catch all default behavior.
Link: http://lkml.kernel.org/r/20170714121233.16861-3-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Reza Arbab <arbab@linux.vnet.ibm.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Yasuaki Ishimatsu <yasu.isimatu@gmail.com>
Cc: Xishi Qiu <qiuxishi@huawei.com>
Cc: Kani Toshimitsu <toshi.kani@hpe.com>
Cc: <slaoub@gmail.com>
Cc: Daniel Kiper <daniel.kiper@oracle.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: <linux-api@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Prior to commit f1dd2cd13c4b ("mm, memory_hotplug: do not associate
hotadded memory to zones until online") we used to allow changing the
valid zone types of a memory block if it was adjacent to a different zone
type.
This fact was reflected in memoryNN/valid_zones by the ordering of
printed zones. The first one was default (echo online > memoryNN/state)
and the other one could be onlined explicitly by online_{movable,kernel}.
This behavior was removed by the said patch and as such the ordering was
not all that important. In most cases a kernel zone would be default
anyway. The only exception is movable_node handled by "mm,
memory_hotplug: support movable_node for hotpluggable nodes".
Let's reintroduce this behavior because a later patch will remove
the zone overlap restriction, so the user will be allowed to online
a kernel resp. movable block regardless of its placement. The original
behavior will then become significant again because it would be
non-trivial for users to see which zone is the default to online into.
The implementation is really simple. Pull zone selection out of
move_pfn_range into a zone_for_pfn_range helper and use it in
show_valid_zones to display the zone for default onlining and then both
kernel and movable if they are allowed. Default online zone is not
duplicated.
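The output ordering can be sketched as below. The function name and the hard-coded zone names "Normal" and "Movable" are assumptions made for the illustration; the point is only that the default zone comes first and is never printed twice.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Format a valid_zones-style line: the default zone first, then the other
 * zone if it is also allowed. The default is never printed twice.
 * Returns the number of characters snprintf() would have written.
 */
static int format_valid_zones(char *out, size_t outlen,
                              const char *default_zone,
                              bool kernel_allowed, bool movable_allowed)
{
    const char *other = NULL;

    if (kernel_allowed && strcmp(default_zone, "Normal") != 0)
        other = "Normal";
    else if (movable_allowed && strcmp(default_zone, "Movable") != 0)
        other = "Movable";

    if (other)
        return snprintf(out, outlen, "%s %s", default_zone, other);
    return snprintf(out, outlen, "%s", default_zone);
}
```

A block that defaults to Movable but may also be onlined as kernel memory is thus shown as "Movable Normal", so the first word always tells the user where a plain "echo online" will put the block.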
Link: http://lkml.kernel.org/r/20170714121233.16861-2-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Reza Arbab <arbab@linux.vnet.ibm.com>
Cc: Yasuaki Ishimatsu <yasu.isimatu@gmail.com>
Cc: Xishi Qiu <qiuxishi@huawei.com>
Cc: Kani Toshimitsu <toshi.kani@hpe.com>
Cc: <slaoub@gmail.com>
Cc: Daniel Kiper <daniel.kiper@oracle.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Wei Yang <richard.weiyang@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull device properties framework updates from Rafael Wysocki:
"These introduce fwnode operations for all of the separate types of
'firmware nodes' that can be handled by the device properties
framework, make the framework use const fwnode arguments all over, add
a helper for the consolidated handling of node references and switch
over the framework to the new UUID API.
Specifics:
- Introduce fwnode operations for all of the separate types of
'firmware nodes' that can be handled by the device properties
framework and drop the type field from struct fwnode_handle (Sakari
Ailus, Arnd Bergmann).
- Make the device properties framework use const fwnode arguments
where possible (Sakari Ailus).
- Add a helper for the consolidated handling of node references to
the device properties framework (Sakari Ailus).
- Switch over the ACPI part of the device properties framework to the
new UUID API (Andy Shevchenko)"
* tag 'devprop-4.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI: device property: Switch to use new generic UUID API
device property: export irqchip_fwnode_ops
device property: Introduce fwnode_property_get_reference_args
device property: Constify fwnode property API
device property: Constify argument to pset fwnode backend
ACPI: Constify internal fwnode arguments
ACPI: Constify acpi_bus helper functions, switch to macros
ACPI: Prepare for constifying acpi_get_next_subnode() fwnode argument
device property: Get rid of struct fwnode_handle type field
ACPI: Use IS_ERR_OR_NULL() instead of non-NULL check in is_acpi_data_node()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management updates from Rafael Wysocki:
"This time (again) cpufreq gets the majority of changes which mostly
are driver updates (including a major consolidation of intel_pstate),
some schedutil governor modifications and core cleanups.
There also are some changes in the system suspend area, mostly related
to diagnostics and debug messages plus some renames of things related
to suspend-to-idle. One major change here is that suspend-to-idle is
now going to be preferred over S3 on systems where the ACPI tables
indicate to do so and provide requisite support (the Low Power Idle S0
_DSM in particular). The system sleep documentation and the tools
related to it are updated too.
The rest is a few cpuidle changes (nothing major), devfreq updates,
generic power domains (genpd) framework updates and a few assorted
modifications elsewhere.
Specifics:
- Drop the P-state selection algorithm based on a PID controller from
intel_pstate and make it use the same P-state selection method
(based on the CPU load) for all types of systems in the active mode
(Rafael Wysocki, Srinivas Pandruvada).
- Rework the cpufreq core and governors to make it possible to take
cross-CPU utilization updates into account and modify the schedutil
governor to actually do so (Viresh Kumar).
- Clean up the handling of transition latency information in the
cpufreq core and untangle it from the information on which drivers
cannot do dynamic frequency switching (Viresh Kumar).
- Add support for new SoCs (MT2701/MT7623 and MT7622) to the mediatek
cpufreq driver and update its DT bindings (Sean Wang).
- Modify the cpufreq dt-platdev driver to automatically create
cpufreq devices for the new (v2) Operating Performance Points (OPP)
DT bindings and update its whitelist of supported systems (Viresh
Kumar, Shubhrajyoti Datta, Marc Gonzalez, Khiem Nguyen, Finley
Xiao).
- Add support for Ux500 to the cpufreq-dt driver and drop the
obsolete dbx500 cpufreq driver (Linus Walleij, Arnd Bergmann).
- Add new SoC (R8A7795) support to the cpufreq rcar driver (Khiem
Nguyen).
- Fix and clean up assorted issues in the cpufreq drivers and core
(Arvind Yadav, Christophe Jaillet, Colin Ian King, Gustavo Silva,
Julia Lawall, Leonard Crestez, Rob Herring, Sudeep Holla).
- Update the IO-wait boost handling in the schedutil governor to make
it less aggressive (Joel Fernandes).
- Rework system suspend diagnostics to make it print fewer messages
to the kernel log by default, add a sysfs knob to allow more
suspend-related messages to be printed and add Low Power S0 Idle
constraints checks to the ACPI suspend-to-idle code (Rafael
Wysocki, Srinivas Pandruvada).
- Prefer suspend-to-idle over S3 on ACPI-based systems with the
ACPI_FADT_LOW_POWER_S0 flag set and the Low Power Idle S0 _DSM
interface present in the ACPI tables (Rafael Wysocki).
- Update documentation related to system sleep and rename a number of
items in the code to make it clearer that they are related to
suspend-to-idle (Rafael Wysocki).
- Export a variable allowing device drivers to check the target
system sleep state from the core system suspend code (Florian
Fainelli).
- Clean up the cpuidle subsystem to handle the polling state on x86
in a more straightforward way and to use %pOF instead of full_name
(Rafael Wysocki, Rob Herring).
- Update the devfreq framework to fix and clean up a few minor issues
(Chanwoo Choi, Rob Herring).
- Extend diagnostics in the generic power domains (genpd) framework
and clean it up slightly (Thara Gopinath, Rob Herring).
- Fix and clean up a couple of issues in the operating performance
points (OPP) framework (Viresh Kumar, Waldemar Rymarkiewicz).
- Add support for RV1108 to the rockchip-io Adaptive Voltage Scaling
(AVS) driver (David Wu).
- Fix the usage of notifiers in CPU power management on some
platforms (Alex Shi).
- Update the pm-graph system suspend/hibernation and boot profiling
utility (Todd Brandt).
- Make it possible to run the cpupower utility without CPU0 (Prarit
Bhargava)"
* tag 'pm-4.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (87 commits)
cpuidle: Make drivers initialize polling state
cpuidle: Move polling state initialization code to separate file
cpuidle: Eliminate the CPUIDLE_DRIVER_STATE_START symbol
cpufreq: imx6q: Fix imx6sx low frequency support
cpufreq: speedstep-lib: make several arrays static, makes code smaller
PM: docs: Delete the obsolete states.txt document
PM: docs: Describe high-level PM strategies and sleep states
PM / devfreq: Fix memory leak when fail to register device
PM / devfreq: Add dependency on PM_OPP
PM / devfreq: Move private devfreq_update_stats() into devfreq
PM / devfreq: Convert to using %pOF instead of full_name
PM / AVS: rockchip-io: add io selectors and supplies for RV1108
cpufreq: ti: Fix 'of_node_put' being called twice in error handling path
cpufreq: dt-platdev: Drop few entries from whitelist
cpufreq: dt-platdev: Automatically create cpufreq device with OPP v2
ARM: ux500: don't select CPUFREQ_DT
cpuidle: Convert to using %pOF instead of full_name
cpufreq: Convert to using %pOF instead of full_name
PM / Domains: Convert to using %pOF instead of full_name
cpufreq: Cap the default transition delay value to 10 ms
...
|
|
A recent change interprets the return code of dma_init_coherent_memory
as an error value, but it is instead a boolean, where 'true' indicates
success. This causes the caller to always do the wrong thing,
and also triggers a compile-time warning about it:
drivers/base/dma-coherent.c: In function 'dma_declare_coherent_memory':
drivers/base/dma-coherent.c:99:15: error: 'mem' may be used uninitialized in this function [-Werror=maybe-uninitialized]
I ended up changing the code a little more, to use the usual
error handling, as this seemed the best way to fix up the warning
and make the code look reasonable at the same time.
Fixes: 2436bdcda53f ("dma-coherent: remove the DMA_MEMORY_MAP and DMA_MEMORY_IO flags")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
|
|
* pm-sleep:
ACPI / PM: Check low power idle constraints for debug only
PM / s2idle: Rename platform operations structure
PM / s2idle: Rename ->enter_freeze to ->enter_s2idle
PM / s2idle: Rename freeze_state enum and related items
PM / s2idle: Rename PM_SUSPEND_FREEZE to PM_SUSPEND_TO_IDLE
ACPI / PM: Prefer suspend-to-idle over S3 on some systems
platform/x86: intel-hid: Wake up Dell Latitude 7275 from suspend-to-idle
PM / suspend: Define pr_fmt() in suspend.c
PM / suspend: Use mem_sleep_labels[] strings in messages
PM / sleep: Put pm_test under CONFIG_PM_SLEEP_DEBUG
PM / sleep: Check pm_wakeup_pending() in __device_suspend_noirq()
PM / core: Add error argument to dpm_show_time()
PM / core: Split dpm_suspend_noirq() and dpm_resume_noirq()
PM / s2idle: Rearrange the main suspend-to-idle loop
PM / timekeeping: Print debug messages when requested
PM / sleep: Mark suspend/hibernation start and finish
PM / sleep: Do not print debug messages by default
PM / suspend: Export pm_suspend_target_state
|