|
This patch adds a buffer synchronization framework based on DMA BUF[1]
and on ww-mutexes[2] for the locking mechanism.
The purpose of this framework is to provide not only buffer access control
for CPU and DMA but also easy-to-use interfaces for device drivers and
user applications. This framework can be used by all DMA devices that use
system memory as a DMA buffer, especially most ARM-based SoCs.
Changelog v5:
- Remove the dependency on reservation_object: the reservation_object is used
to hook up to ttm and dma-buf for easy sharing of reservations across
devices. However, dmabuf sync can be used by all DMA devices, including
v4l2- and drm-based drivers, so it doesn't need the reservation_object anymore.
With regard to this, it adds 'void *sync' to the dma_buf structure.
- All patches are rebased on mainline, Linux v3.10.
Changelog v4:
- Add user side interface for buffer synchronization mechanism and update
descriptions related to the user side interface.
Changelog v3:
- remove cache-operation-related code and update the documentation file.
Changelog v2:
- use atomic_add_unless to avoid a potential bug.
- add a macro for checking valid access types.
- code cleanup.
The mechanism of this framework consists of the following steps:
1. Register dmabufs to a sync object - A task gets a new sync object and
can add one or more dmabufs that the task wants to access.
This registration should be performed when a device context or an event
context such as a page flip event is created, or before the CPU accesses a
shared buffer.
dma_buf_sync_get(a sync object, a dmabuf);
2. Lock a sync object - A task tries to lock all dmabufs added to its own
sync object. The lock mechanism uses ww-mutexes[2] to avoid deadlocks
and to handle race conditions between CPU and CPU, CPU and DMA, and DMA
and DMA. Taking a lock means that others cannot access any of the locked
dmabufs until the task that locked them unlocks them all.
This locking should be performed before DMA or CPU accesses these dmabufs.
dma_buf_sync_lock(a sync object);
3. Unlock a sync object - The task unlocks all dmabufs added to its own sync
object. Unlocking means that the DMA or CPU accesses to the dmabufs have
been completed, so others may now access them.
This unlocking should be performed after DMA or CPU has completed its
accesses to the dmabufs.
dma_buf_sync_unlock(a sync object);
4. Unregister one or all dmabufs from a sync object - A task unregisters
the given dmabufs from the sync object. This means that the task doesn't
want to lock the dmabufs anymore.
The unregistering should be performed after DMA or CPU has completed its
accesses to the dmabufs, or when dma_buf_sync_lock() has failed.
dma_buf_sync_put(a sync object, a dmabuf);
dma_buf_sync_put_all(a sync object);
The described steps may be summarized as:
get -> lock -> CPU or DMA access to a buffer/s -> unlock -> put
This framework includes the following two features.
1. read (shared) and write (exclusive) locks - A task is required to declare
the access type when it tries to register a dmabuf:
READ, WRITE, READ DMA, or WRITE DMA.
The below is example code:
struct dmabuf_sync *sync;
sync = dmabuf_sync_init(NULL, "test sync");
dmabuf_sync_get(sync, dmabuf, DMA_BUF_ACCESS_R);
...
And the below can be used as access types:
DMA_BUF_ACCESS_R - CPU will access a buffer for read.
DMA_BUF_ACCESS_W - CPU will access a buffer for read or write.
DMA_BUF_ACCESS_DMA_R - DMA will access a buffer for read.
DMA_BUF_ACCESS_DMA_W - DMA will access a buffer for read or write.
2. Mandatory resource releasing - a task cannot hold a lock indefinitely.
A task might never unlock a buffer after taking a lock on it.
In this case, a timer handler for the corresponding sync object is called
after five (the default) seconds, and the timed-out buffer is then unlocked
by a work queue handler to avoid lockups and to release the buffer's
resources.
The below is how to use the interfaces from a device driver:
1. Allocate and Initialize a sync object:
struct dmabuf_sync *sync;
sync = dmabuf_sync_init(NULL, "test sync");
...
2. Add a dmabuf to the sync object when setting up DMA buffer relevant
registers:
dmabuf_sync_get(sync, dmabuf, DMA_BUF_ACCESS_R);
...
3. Lock all dmabufs of the sync object before DMA or CPU accesses
the dmabufs:
dmabuf_sync_lock(sync);
...
4. Now CPU or DMA can access all dmabufs locked in step 3.
5. Unlock all dmabufs added in a sync object after DMA or CPU access
to these dmabufs is completed:
dmabuf_sync_unlock(sync);
And call the following functions to release all resources:
dmabuf_sync_put_all(sync);
dmabuf_sync_fini(sync);
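Putting steps 1-5 together, a minimal driver-side sketch looks as below
(the int return values of dmabuf_sync_get() and dmabuf_sync_lock() are
assumptions based on the failure case mentioned above):
	static int access_buffer(struct dma_buf *dmabuf)
	{
		struct dmabuf_sync *sync;
		int ret;
		sync = dmabuf_sync_init(NULL, "test sync");
		ret = dmabuf_sync_get(sync, dmabuf, DMA_BUF_ACCESS_R);
		if (ret)
			goto out_fini;
		ret = dmabuf_sync_lock(sync);
		if (ret)
			goto out_put;
		/* CPU or DMA may now safely access the locked dmabuf. */
		dmabuf_sync_unlock(sync);
	out_put:
		dmabuf_sync_put_all(sync);
	out_fini:
		dmabuf_sync_fini(sync);
		return ret;
	}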
You can refer to actual example code:
"drm/exynos: add dmabuf sync support for g2d driver" and
"drm/exynos: add dmabuf sync support for kms framework" from
https://git.kernel.org/cgit/linux/kernel/git/daeinki/
drm-exynos.git/log/?h=dmabuf-sync
And this framework exposes the fcntl system call[3] as the interface
exported to userspace. As you know, userspace sees a buffer object as a
dma-buf file descriptor, so an fcntl() call on the file descriptor locks a
buffer region managed by the dma-buf object.
The below is how to use the interface from a user application:
struct flock filelock;
1. Lock a dma buf:
filelock.l_type = F_WRLCK or F_RDLCK;
/* lock the entire region of the dma buf. */
filelock.l_whence = SEEK_CUR;
filelock.l_start = 0;
filelock.l_len = 0;
fcntl(dmabuf fd, F_SETLKW or F_SETLK, &filelock);
...
CPU access to the dma buf
2. Unlock a dma buf:
filelock.l_type = F_UNLCK;
fcntl(dmabuf fd, F_SETLKW or F_SETLK, &filelock);
A close(dmabuf fd) call would also unlock the dma buf. For more
detail, please refer to [3].
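Putting it together, a minimal self-contained userspace sketch (assuming
dmabuf_fd is a dma-buf file descriptor obtained from a driver):
	#include <fcntl.h>
	#include <string.h>
	#include <unistd.h>
	static int cpu_access(int dmabuf_fd)
	{
		struct flock filelock;
		memset(&filelock, 0, sizeof(filelock));
		filelock.l_type = F_WRLCK;	/* or F_RDLCK for read access */
		filelock.l_whence = SEEK_CUR;
		filelock.l_start = 0;
		filelock.l_len = 0;	/* 0 means the entire buffer */
		if (fcntl(dmabuf_fd, F_SETLKW, &filelock) < 0)
			return -1;
		/* CPU access to the dma buf */
		filelock.l_type = F_UNLCK;
		return fcntl(dmabuf_fd, F_SETLKW, &filelock);
	}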
References:
[1] http://lwn.net/Articles/470339/
[2] https://patchwork.kernel.org/patch/2625361/
[3] http://linux.die.net/man/2/fcntl
Signed-off-by: Inki Dae <inki.dae@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
|
|
Injects EDEADLK conditions at pseudo-random intervals, with
exponential backoff up to UINT_MAX (to ensure that every lock
operation still completes in a reasonable time).
This way we can test the wound slowpath even for ww mutex users
where contention is never expected, and the ww deadlock
avoidance algorithm is only needed for correctness against
malicious userspace. An example would be protecting kernel
modesetting properties, which thanks to single-threaded X isn't
really expected to contend, ever.
I've looked into using the CONFIG_FAULT_INJECTION
infrastructure, but decided against it for two reasons:
- EDEADLK handling is mandatory for ww mutex users and should
never affect the outcome of a syscall. This is in contrast to -ENOMEM
injection. So fine configurability isn't required.
- The fault injection framework only allows setting a simple
probability for failure. Now the probability that a ww mutex acquire
stage with N locks will never complete (due to too many injected
EDEADLK backoffs) is zero. But the expected number of ww_mutex_lock
operations for the completely uncontended case would be O(exp(N)).
The per-acquire-ctx exponential backoff solution chosen here only
results in O(log N) overhead due to injection and so O(log N * N)
lock operations. This way we can fail with high probability (and so
have good test coverage even for fancy backoff and lock acquisition
paths) without running into pathological cases.
Note that EDEADLK will only ever be injected when we managed to
acquire the lock. This prevents any behaviour changes for users
which rely on the EALREADY semantics.
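In sketch form, the injection works as below; the acquire-context field
names are illustrative:
	#ifdef CONFIG_DEBUG_WW_MUTEX_SLOWPATH
	static int ww_mutex_deadlock_injection(struct ww_mutex *lock,
					       struct ww_acquire_ctx *ctx)
	{
		if (ctx->deadlock_inject_countdown-- == 0) {
			unsigned int tmp = ctx->deadlock_inject_interval;
			/* exponential backoff, capped at UINT_MAX */
			if (tmp > UINT_MAX / 2)
				tmp = UINT_MAX;
			else
				tmp *= 2;
			ctx->deadlock_inject_interval = tmp;
			ctx->deadlock_inject_countdown = tmp;
			/* only inject after the lock was actually acquired,
			 * so the EALREADY semantics stay unchanged */
			ww_mutex_unlock(lock);
			return -EDEADLK;
		}
		return 0;
	}
	#endif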
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: dri-devel@lists.freedesktop.org
Cc: linaro-mm-sig@lists.linaro.org
Cc: rostedt@goodmis.org
Cc: daniel@ffwll.ch
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20130620113117.4001.21681.stgit@patser
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Wound/wait mutexes are used when multiple lock
acquisitions of a similar type can be done in an arbitrary
order. The deadlock handling used here is called wait/wound in
the RDBMS literature: the older task waits until it can acquire
the contended lock. The younger task needs to back off and drop
all the locks it is currently holding, i.e. the younger task is
wounded.
For full documentation please read Documentation/ww-mutex-design.txt.
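A minimal sketch of the resulting API usage with two locks (real code
would loop, since the retried acquisition can be wounded again):
	static DEFINE_WW_CLASS(my_class);
	static void lock_objs(struct ww_mutex *a, struct ww_mutex *b)
	{
		struct ww_acquire_ctx ctx;
		int ret;
		ww_acquire_init(&ctx, &my_class);
		ret = ww_mutex_lock(a, &ctx);	/* first lock: cannot deadlock */
		ret = ww_mutex_lock(b, &ctx);
		if (ret == -EDEADLK) {
			/* wounded: drop what we hold, then wait for the
			 * contended lock without the deadlock check */
			ww_mutex_unlock(a);
			ww_mutex_lock_slow(b, &ctx);
			ww_mutex_lock(a, &ctx);	/* real code loops on -EDEADLK */
		}
		ww_acquire_done(&ctx);
		/* ... access the objects protected by a and b ... */
		ww_mutex_unlock(b);
		ww_mutex_unlock(a);
		ww_acquire_fini(&ctx);
	}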
References: https://lwn.net/Articles/548909/
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Acked-by: Rob Clark <robdclark@gmail.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: dri-devel@lists.freedesktop.org
Cc: linaro-mm-sig@lists.linaro.org
Cc: rostedt@goodmis.org
Cc: daniel@ffwll.ch
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/51C8038C.9000106@canonical.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
This will allow me to call functions that have multiple
arguments if the fastpath fails. This is required to support ticket
mutexes, because they need to be able to pass an extra argument
to the fail function.
Originally I duplicated the functions by adding
__mutex_fastpath_lock_retval_arg. This ended up being just a
duplication of the existing function, so a way to test whether the
fastpath was taken ended up being better.
This also cleaned up the reservation mutex patch somewhat by making it
possible to call an atomic_set instead of an atomic_xchg, and by making
it easier to detect if the wrong unlock function was previously
used.
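As an illustration, the calling convention this enables looks roughly as
below (my_lock_slowpath() and its extra ctx argument are hypothetical
stand-ins for what ticket/ww mutexes need):
	static int my_lock(struct ww_mutex *lock, struct ww_acquire_ctx *ctx)
	{
		int ret;
		/* 0 when the atomic fastpath took the lock, nonzero when
		 * we must fall back to the slowpath */
		ret = __mutex_fastpath_lock_retval(&lock->base.count);
		if (likely(!ret))
			return 0;
		/* the slowpath can now take extra arguments */
		return my_lock_slowpath(lock, ctx);
	}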
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: dri-devel@lists.freedesktop.org
Cc: linaro-mm-sig@lists.linaro.org
Cc: robclark@gmail.com
Cc: rostedt@goodmis.org
Cc: daniel@ffwll.ch
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20130620113105.4001.83929.stgit@patser
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
This file is needed by the modem_if driver.
Signed-off-by: Kamil Debski <k.debski@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
|
|
Use the generic PHY API instead of the platform callback to control
the MIPI DSIM DPHY.
Signed-off-by: Sylwester Nawrocki <s.nawrocki@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
|
|
Use the generic PHY API instead of the platform callback to control
the MIPI CSIS DPHY. The 'phy_label' field is added to the platform
data structure to allow PHY lookup on non-dt platforms.
Signed-off-by: Sylwester Nawrocki <s.nawrocki@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Acked-by: Felipe Balbi <balbi@ti.com>
Acked-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
|
Use the generic PHY API instead of the platform callback to control
the MIPI DSIM DPHY. The 'phy_label' field is added to the platform
data structure to allow PHY lookup on non-dt platforms.
Signed-off-by: Sylwester Nawrocki <s.nawrocki@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Acked-by: Felipe Balbi <balbi@ti.com>
Acked-by: Donghwa Lee <dh09.lee@samsung.com>
|
|
The PHY framework provides a set of APIs for PHY drivers to
create/destroy a PHY and APIs for PHY users to obtain a reference to
a PHY with or without using a phandle. For dt-boot, the PHY drivers should
also register a *PHY provider* with the framework.
PHY drivers should create the PHY by passing an id and ops such as init,
exit, power_on and power_off. This framework is also pm-runtime enabled.
The documentation for the generic PHY framework is added in
Documentation/phy.txt and the documentation for the dt binding can be found
at Documentation/devicetree/bindings/phy/phy-bindings.txt.
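From the consumer side, usage follows the ops named above; a sketch (the
lookup name "dsim-phy" and the error handling are illustrative):
	struct phy *phy;
	phy = phy_get(dev, "dsim-phy");
	if (IS_ERR(phy))
		return PTR_ERR(phy);
	phy_init(phy);
	phy_power_on(phy);
	/* ... the controller drives the PHY ... */
	phy_power_off(phy);
	phy_exit(phy);
	phy_put(phy);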
Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com>
Tested-by: Sylwester Nawrocki <s.nawrocki@samsung.com>
|
|
Ensure the colorspace is properly adjusted by the driver for YUV
and Bayer image formats. The subdev try_fmt helper is simplified.
Signed-off-by: Sylwester Nawrocki <s.nawrocki@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
|
This patch removes the unused gpio irq pin from the driver.
Signed-off-by: Lukasz Czerwinski <l.czerwinski@samsung.com>
|
|
This patch adds device tree support for the nxp pn544-i2c nfc controller.
Signed-off-by: Lukasz Czerwinski <l.czerwinski@samsung.com>
|
|
This patch adds threshold event support to the ST common
library.
|
|
This patch adds handling of multiple interrupts for st_sensors.
Each mapped interrupt can be declared in DT.
Signed-off-by: Lukasz Czerwinski <l.czerwinski@samsung.com>
|
|
Driver for Samsung S5K5BAF UXGA 1/5" 2M CMOS Image Sensor with embedded
SoC ISP. The driver exposes the sensor as two V4L2 subdevices:
- S5K5BAF-CIS - pure CMOS Image Sensor, fixed 1600x1200 format,
no controls.
- S5K5BAF-ISP - Image Signal Processor, formats up to 1600x1200,
pre/post ISP cropping, downscaling via selection API, controls.
The private V4L2_CID_{RED/GREEN/BLUE}_GAIN controls will be replaced
with V4L2_CID_{RED/BLUE}_BALANCE controls in a subsequent patch.
Signed-off-by: Sylwester Nawrocki <s.nawrocki@samsung.com>
Signed-off-by: Andrzej Hajda <a.hajda@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
---
v3:
- narrowed state->error usage to i2c and power errors
v2:
- lower-cased driver name,
- removed underscore from regulator names,
- removed platform data code,
- v4l controls grouped in anonymous structs,
- added s5k5baf_clear_error function,
- private controls definitions moved to uapi header file,
- added v4l2-controls.h reservation for private controls,
- corrected subdev registered/unregistered code,
- .log_status subdev op set to the v4l2 helper,
- moved entity link creation to probe routines,
- added cleanup on error to probe function.
|
|
Currently the media device link_notify callback is invoked before the
actual change of state of a link when the link is being enabled, and
after the actual change of state when the link is being disabled.
This doesn't allow a media device driver to perform any operations
on the full graph before a link is disabled, nor to perform
any tasks on the modified graph right after a link's state is changed.
This patch modifies the signature of the link_notify callback. The
callback is now always called both before and after a link's state change.
To distinguish the notifications, a 'notification' argument is added
to the link_notify callback: MEDIA_DEV_NOTIFY_PRE_LINK_CH indicates
a notification before the link's state change and
MEDIA_DEV_NOTIFY_POST_LINK_CH corresponds to a notification after the
link flags change.
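A driver-side sketch of the new scheme (assuming the callback keeps its
link and flags arguments):
	static int my_link_notify(struct media_link *link, u32 flags,
				  unsigned int notification)
	{
		if (notification == MEDIA_DEV_NOTIFY_PRE_LINK_CH) {
			/* graph still reflects the old state: e.g. stop a
			 * running pipeline before the link goes away */
		} else {	/* MEDIA_DEV_NOTIFY_POST_LINK_CH */
			/* graph reflects the new state: e.g. reconfigure */
		}
		return 0;
	}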
[mchehab@redhat.com: whitespace cleanups]
Signed-off-by: Sylwester Nawrocki <s.nawrocki@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Acked-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Acked-by: Sakari Ailus <sakari.ailus@iki.fi>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
|
This enumeration is now private to exynos4-is, and the exynos5 camera
subsystem driver may have its subdev handling designed differently.
Signed-off-by: Sylwester Nawrocki <s.nawrocki@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
|
This patch introduces a common structure for the video entities
to handle all video nodes and media pipelines associated with
them in a more generic way.
Signed-off-by: Sylwester Nawrocki <s.nawrocki@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
|
This function allows removing all of a media entity's links to other
entities, leaving no references to the media entity's links array
at its remote entities.
Currently, when the driver of some entity is removed, it will free the
entity's links[] array, leaving dangling pointers at other
entities that are part of the same media graph. This is troublesome when
drivers of a media device's entities are in separate kernel modules:
removing only some modules will leave the others in an incorrect state.
This function is intended to be used when an entity is being
unregistered from a media device.
It assumes that media links are normally created between media entities
already registered to a media device, with the graph mutex held.
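A sketch of the intended call site; the helper name
media_entity_remove_links() is an assumption, since the changelog does not
spell it out:
	static void my_entity_unregister(struct media_entity *entity)
	{
		/* drop the links on both ends before links[] is freed */
		media_entity_remove_links(entity);
		media_device_unregister_entity(entity);
	}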
Signed-off-by: Sylwester Nawrocki <s.nawrocki@samsung.com>
Reviewed-by: Andrzej Hajda <a.hajda@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Acked-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Acked-by: Sakari Ailus <sakari.ailus@iki.fi>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
|
The function media_entity_remote_source actually returns the pad remote to
the given one, regardless of whether it is the source or the sink pad.
The name media_entity_remote_pad is more adequate for this function.
Signed-off-by: Andrzej Hajda <a.hajda@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Acked-by: Sakari Ailus <sakari.ailus@iki.fi>
Acked-by: Sylwester Nawrocki <s.nawrocki@samsung.com>
Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
|
Signed-off-by: Jonghwa Lee <jonghwa3.lee@samsung.com>
|
|
This patch adds an initial version of the Yamaha YMU823 (MC1N2) codec
driver, ported to Linux 3.8 and Device Tree.
Signed-off-by: Tomasz Figa <t.figa@samsung.com>
|
|
This patch adds a CDF-compliant panel driver for the S6E8AA0 DSI
LCD panel.
Signed-off-by: Tomasz Figa <t.figa@samsung.com>
|
|
This patch adds a CDF-compliant panel driver for the S6D6AA1 DSI LCD panel.
Signed-off-by: Tomasz Figa <t.figa@samsung.com>
|
|
This patch adds a new driver for the DSI master block available in Samsung
Exynos SoCs. The driver is designed and written specifically for the Common
Display Framework.
Signed-off-by: Tomasz Figa <t.figa@samsung.com>
|
|
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
|
|
Signed-off-by: Tomasz Figa <t.figa@samsung.com>
|
|
This patch adds device tree support for contiguous and reserved memory
regions defined in the device tree.
Large memory blocks can be reliably reserved only during early boot.
This must happen before the whole memory management subsystem is
initialized, because we need to ensure that the given contiguous blocks
are not yet allocated by the kernel. It also must happen before kernel
mappings for the whole low memory are created, to ensure that there will
be no mappings (for reserved blocks) or that mappings with special
properties can be created (for CMA blocks). This all happens before the
device tree structures are unflattened, so we need to get the reserved
memory layout directly from the fdt.
Later, those reserved memory regions are assigned to devices during each
device structure initialization.
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Acked-by: Kyungmin Park <kyungmin.park@samsung.com>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Acked-by: Tomasz Figa <t.figa@samsung.com>
|
|
Add a function to scan the flattened device tree starting from the
node given by a path. It is used to extract information (like reserved
memory) which is required during early boot, before we can unflatten the
tree.
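Illustrative use of the new helper; the name of_scan_flat_dt_by_path()
and its callback shape are assumptions modeled on the existing
of_scan_flat_dt():
	static int __init scan_memory_node(unsigned long node,
					   const char *uname, int depth,
					   void *data)
	{
		/* properties can be read with of_get_flat_dt_prop() */
		return 0;	/* nonzero would terminate the scan */
	}
	void __init early_scan(void)
	{
		of_scan_flat_dt_by_path("/memory", scan_memory_node, NULL);
	}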
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Acked-by: Tomasz Figa <t.figa@samsung.com>
|
|
This patch cleans up the initialization of the dma contiguous framework.
The all-in-one dma_declare_contiguous() function is now separated into
dma_contiguous_reserve_area(), which only steals the memory from the
memblock allocator, and the dma_contiguous_add_device() function, which
assigns the given device to the specified reserved memory area. This
improves the flexibility in defining contiguous memory areas and assigning
devices to them, because now it is possible to assign more than one device
to a given contiguous memory area. Such a split in the initialization
procedure is also required for the upcoming device tree support.
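An illustrative sketch of the split (the exact signatures are assumptions
based on this description):
	phys_addr_t base = 0;
	/* early boot: carve the area out of the memblock allocator */
	dma_contiguous_reserve_area(SZ_16M, &base, 0);
	/* later, per device: several devices may share one area */
	dma_contiguous_add_device(dev_a, base);
	dma_contiguous_add_device(dev_b, base);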
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Acked-by: Kyungmin Park <kyungmin.park@samsung.com>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Acked-by: Tomasz Figa <t.figa@samsung.com>
|
|
Signed-off-by: Jonghwa Lee <jonghwa3.lee@samsung.com>
|
|
Signed-off-by: Jonghwa Lee <jonghwa3.lee@samsung.com>
|
|
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
|
|
commit a87fa1d81a9fb5e9adca9820e16008c40ad09f33 upstream.
The string property read helpers will run off the end of the buffer if
they are handed a malformed string property. Rework the parsers to make
sure that doesn't happen. At the same time, add new test cases to make
sure the functions behave themselves.
The original implementations of of_property_read_string_index() and
of_property_count_strings() both open-coded the same block of parsing
code, each with its own subtly different bugs. The fix here merges the
functions into a single helper and makes the original functions static
inline wrappers around the helper.
One non-bugfix aspect of this patch is the addition of a new wrapper,
of_property_read_string_array(). The new wrapper is needed by the
device_properties feature that Rafael is working on and planning to
merge for v3.19. The implementation is identical both with and without
the new static inline wrapper, so it just got left in to reduce the
churn on the header file.
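Typical use of the hardened helpers, for illustration ("clock-names" is
just an example property):
	const char *name;
	int count;
	count = of_property_count_strings(np, "clock-names");
	if (count > 0 &&
	    !of_property_read_string_index(np, "clock-names", 0, &name))
		pr_info("first of %d clocks: %s\n", count, name);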
Signed-off-by: Grant Likely <grant.likely@linaro.org>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Mika Westerberg <mika.westerberg@linux.intel.com>
Cc: Rob Herring <robh+dt@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Darren Hart <darren.hart@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 8c3e434769b1707fd2d24de5a2eb25fedc634c4a upstream.
0x4c6e is a secondary device id, so it should not be used
by the driver.
Noticed-by: Mark Kettenis <mark.kettenis@xs4all.nl>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 5695be142e203167e3cb515ef86a88424f3524eb upstream.
PM freezer relies on having all tasks frozen by the time devices are
getting frozen so that no task will touch them while they are getting
frozen. But OOM killer is allowed to kill an already frozen task in
order to handle OOM situtation. In order to protect from late wake ups
OOM killer is disabled after all tasks are frozen. This, however, still
keeps a window open when a killed task didn't manage to die by the time
freeze_processes finishes.
Reduce the race window by checking all tasks after OOM killer has been
disabled. This is still not race free completely unfortunately because
oom_killer_disable cannot stop an already ongoing OOM killer so a task
might still wake up from the fridge and get killed without
freeze_processes noticing. Full synchronization of OOM and freezer is,
however, too heavy weight for this highly unlikely case.
Introduce and check oom_kills counter which gets incremented early when
the allocator enters __alloc_pages_may_oom path and only check all the
tasks if the counter changes during the freezing attempt. The counter
is updated so early to reduce the race window since allocator checked
oom_killer_disabled which is set by PM-freezing code. A false positive
will push the PM-freezer into a slow path but that is not a big deal.
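In sketch form (the names follow the changelog; the surrounding plumbing
is illustrative):
	static atomic_t oom_kills = ATOMIC_INIT(0);
	/* called early in the __alloc_pages_may_oom path */
	void note_oom_kill(void)
	{
		atomic_inc(&oom_kills);
	}
	int oom_kills_count(void)
	{
		return atomic_read(&oom_kills);
	}
	/* freeze_processes() then re-checks all tasks only when
	 * oom_kills_count() changed during the freezing attempt. */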
Changes since v1
- push the re-check loop out of freeze_processes into
check_frozen_processes and invert the condition to make the code more
readable as per Rafael
Fixes: f660daac474c6f (oom: thaw threads if oom killed thread is frozen before deferring)
Signed-off-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit b8839b8c55f3fdd60dc36abcda7e0266aff7985c upstream.
The math in both blk_stack_limits() and queue_limit_alignment_offset()
assumes that a block device's io_min (aka minimum_io_size) is always a
power of 2. Fix the math so that it works for a non-power-of-2 io_min.
This issue (of alignment_offset != 0) became apparent when testing
dm-thinp with a thinp blocksize that matches a RAID6 stripesize of
1280K. Commit fdfb4c8c1 ("dm thin: set minimum_io_size to pool's data
block size") unlocked the potential for alignment_offset != 0 due to
the dm-thin-pool's io_min possibly being a non-power-of-2.
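In sketch form, the corrected helper uses an explicit modulo against the
granularity instead of a power-of-2 mask (naming follows the description
above):
	static unsigned int limit_alignment_offset(struct queue_limits *lim,
						   sector_t sector)
	{
		unsigned int granularity = max(lim->physical_block_size,
					       lim->io_min);
		/* explicit remainder instead of "& (granularity - 1)" */
		unsigned int alignment =
			sector_div(sector, granularity >> 9) << 9;
		return (granularity + lim->alignment_offset - alignment)
			% granularity;
	}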
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Acked-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit d4c5efdb97773f59a2b711754ca0953f24516739 upstream.
zatimend has reported that in his environment (3.16/gcc4.8.3/corei7),
memset() calls which clear out sensitive data in extract_{buf,entropy,
entropy_user}() in the random driver are being optimized away by gcc.
Add a helper memzero_explicit() (similar to the explicit_bzero() variants)
that can be used in such cases where a variable with sensitive data is
being cleared out at the end. Other use cases might also be in crypto
code. [ I have put this into lib/string.c though, as it's always built-in
and doesn't need any dependencies then. ]
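Usage is a drop-in replacement for the dead-store-prone memset(); a
sketch (derive_key() is illustrative):
	static void get_key(u8 *out, size_t len)
	{
		u8 tmp[32];
		derive_key(tmp, sizeof(tmp));
		memcpy(out, tmp, min(len, sizeof(tmp)));
		/* a plain memset() here may be elided as a dead store;
		 * memzero_explicit() is guaranteed to be performed */
		memzero_explicit(tmp, sizeof(tmp));
	}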
Fixes kernel bugzilla: 82041
Reported-by: zatimend@hotmail.co.uk
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit fe8c8a126806fea4465c43d62a1f9d273a572bf5 upstream.
[Only use the compiler.h portion of this patch, to get the
OPTIMIZER_HIDE_VAR() macro, which we need for other -stable patches
- gregkh]
Disabling compiler optimizations can be fragile, since a new
optimization could be added to -O0 or -Os that breaks the assumptions
the code is making.
Instead of disabling compiler optimizations, use a dummy inline assembly
(based on RELOC_HIDE) to block the problematic kinds of optimization,
while still allowing other optimizations to be applied to the code.
The dummy inline assembly is added after every OR, and has the
accumulator variable as its input and output. The compiler is forced to
assume that the dummy inline assembly could both depend on the
accumulator variable and change the accumulator variable, so it is
forced to compute the value correctly before the inline assembly, and
cannot assume anything about its value after the inline assembly.
This change should be enough to make crypto_memneq work correctly (with
data-independent timing) even if it is inlined at its call sites. That
can be done later in a followup patch.
Compile-tested on x86_64.
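The macro itself is tiny; applied to a constant-time comparison loop it
looks roughly like this:
	/* based on RELOC_HIDE: an empty asm that takes `var` as both
	 * input and output, hiding its value from the optimizer */
	#define OPTIMIZER_HIDE_VAR(var) __asm__ ("" : "=r" (var) : "0" (var))
	static unsigned long memneq_sketch(const unsigned char *a,
					   const unsigned char *b, size_t n)
	{
		unsigned long neq = 0;
		size_t i;
		for (i = 0; i < n; i++) {
			neq |= a[i] ^ b[i];
			/* block value tracking after each OR */
			OPTIMIZER_HIDE_VAR(neq);
		}
		return neq;
	}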
Signed-off-by: Cesar Eduardo Barros <cesarb@cesarb.eti.br>
Acked-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 90a8020278c1598fafd071736a0846b38510309c upstream.
->page_mkwrite() is used by filesystems to allocate blocks under a page
which is becoming writeably mmapped in some process' address space. This
allows a filesystem to fail the page fault if there is not enough space
available, the user exceeds their quota, or a similar problem happens,
rather than silently discarding data later when writepage is called.
However VFS fails to call ->page_mkwrite() in all the cases where
filesystems need it when blocksize < pagesize. For example when
blocksize = 1024, pagesize = 4096 the following is problematic:
ftruncate(fd, 0);
pwrite(fd, buf, 1024, 0);
map = mmap(NULL, 1024, PROT_WRITE, MAP_SHARED, fd, 0);
map[0] = 'a'; ----> page_mkwrite() for index 0 is called
ftruncate(fd, 10000); /* or even pwrite(fd, buf, 1, 10000) */
mremap(map, 1024, 10000, 0);
map[4095] = 'a'; ----> no page_mkwrite() called
At the moment ->page_mkwrite() is called, the filesystem can allocate only
one block for the page because i_size == 1024. Otherwise it would create
blocks beyond i_size which is generally undesirable. But later at
->writepage() time, we also need to store data at offset 4095 but we
don't have block allocated for it.
This patch introduces a helper function filesystems can use to have
->page_mkwrite() called at all the necessary moments.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 71458cfc782eafe4b27656e078d379a34e472adf upstream.
We're missing include/linux/compiler-gcc5.h, which is required now
because gcc branched off to v5 in trunk.
Just copy the relevant bits out of include/linux/compiler-gcc4.h;
no new code is added as of now.
This fixes a build error when using gcc 5.
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 934f3072c17cc8886f4c043b47eeeb1b12f8de33 upstream.
commit 21caf2fc1931 ("mm: teach mm by current context info to not do I/O
during memory allocation") introduced the PF_MEMALLOC_NOIO flag to avoid
doing I/O inside memory allocation; __GFP_IO is cleared when this flag is
set, but __GFP_FS implies __GFP_IO, so it should also be cleared.
Otherwise the kernel may still run into I/O, like in the superblock
shrinker, and hit the deadlock case described in that commit.
See Dave Chinner's comment about io in superblock shrinker:
Filesystem shrinkers do indeed perform IO from the superblock shrinker and
have for years. Even clean inodes can require IO before they can be freed
- e.g. on an orphan list, need truncation of post-eof blocks, need to
wait for ordered operations to complete before it can be freed, etc.
IOWs, Ext4, btrfs and XFS all can issue and/or block on arbitrary amounts
of IO in the superblock shrinker context. XFS, in particular, has been
doing transactions and IO from the VFS inode cache shrinker since it was
first introduced....
Fix this by clearing __GFP_FS in memalloc_noio_flags(), this function has
masked all the gfp_mask that will be passed into fs for the processes
setting PF_MEMALLOC_NOIO in the direct reclaim path.
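The fix itself is a one-line change to the mask; in sketch form:
	static inline gfp_t memalloc_noio_flags(gfp_t flags)
	{
		if (unlikely(current->flags & PF_MEMALLOC_NOIO))
			flags &= ~(__GFP_IO | __GFP_FS);	/* was only __GFP_IO */
		return flags;
	}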
v1 thread at: https://lkml.org/lkml/2014/9/3/32
Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: joyce.xue <xuejiufei@huawei.com>
Cc: Ming Lei <ming.lei@canonical.com>
Cc: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit ddbe1fca0bcb87ca8c199ea873a456ca8a948567 upstream.
This full-speed USB device generates a spurious remote wakeup event
as soon as the USB_DEVICE_REMOTE_WAKEUP feature is set. As a result,
Linux can't enter system suspend and S0ix power saving modes once
this keyboard is used.
This patch introduces the USB_QUIRK_IGNORE_REMOTE_WAKEUP quirk.
With this quirk set, wakeup capability will be ignored during
device configuration.
This patch could be back-ported to kernels as old as 2.6.39.
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit bdf6fa52f01b941d4a80372d56de465bdbbd1d23 ]
Currently association restarts do not take into consideration the
state of the socket. When a restart happens, the current association
simply transitions into the established state. This creates a condition
where a remote system, through the restart procedure, may create a
local association that is in no way reachable by the user. The conditions
to trigger this are as follows:
1) The remote does not acknowledge some data, causing data to remain
outstanding.
2) The local application calls close() on the socket. Since data
is still outstanding, the association is placed in SHUTDOWN_PENDING
state. However, the socket is closed.
3) The remote tries to create a new association, triggering a restart
on the local system. The association moves from SHUTDOWN_PENDING
to ESTABLISHED. At this point, it is no longer reachable by
any socket on the local system.
This patch addresses the above situation by moving the newly ESTABLISHED
association into the SHUTDOWN-SENT state and bundling a SHUTDOWN after
the COOKIE-ACK chunk. This way, the restarted association immediately
enters the shutdown procedure and forces the termination of the
unreachable association.
Reported-by: David Laight <David.Laight@aculab.com>
Signed-off-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 4fab9071950c2021d846e18351e0f46a1cffd67b ]
Make sure we use the correct address-family-specific function for
handling MTU reductions from within tcp_release_cb().
Previously AF_INET6 sockets were incorrectly always using the IPv6
code path when sometimes they were handling IPv4 traffic and thus had
an IPv4 dst.
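In sketch form, the deferred handler dispatches through the per-family
ops instead of the hard-coded IPv4 function (assuming the mtu_reduced
hook this change adds to the af_ops):
	/* in tcp_release_cb(), for the TCP_MTU_REDUCED_DEFERRED event */
	static void deferred_mtu_reduced(struct sock *sk)
	{
		/* icsk_af_ops points at IPv4 or IPv6 ops as appropriate,
		 * even for IPv4 traffic on an AF_INET6 socket */
		inet_csk(sk)->icsk_af_ops->mtu_reduced(sk);
	}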
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Diagnosed-by: Willem de Bruijn <willemb@google.com>
Fixes: 563d34d057862 ("tcp: dont drop MTU reduction indications")
Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit d78c9300c51d6ceed9f6d078d4e9366f259de28c upstream.
timeval_to_jiffies tried to round a timeval up to an integral number
of jiffies, but the logic for doing so was incorrect: intervals
corresponding to exactly N jiffies would become N+1. This manifested
itself particularly when repeatedly stopping and starting an itimer:
setitimer(ITIMER_PROF, &val, NULL);
setitimer(ITIMER_PROF, NULL, &val);
would add a full tick to val, _even if it was exactly representable in
terms of jiffies_ (say, the result of a previous rounding.) Doing
this repeatedly would cause unbounded growth in val. So fix the math.
Here's what was wrong with the conversion: we essentially computed
(eliding seconds)
jiffies = usec * (NSEC_PER_USEC/TICK_NSEC)
by using scaling arithmetic, which took the best approximation of
NSEC_PER_USEC/TICK_NSEC with denominator 2^USEC_JIFFIE_SC, i.e.
x/(2^USEC_JIFFIE_SC), and computed:
jiffies = (usec * x) >> USEC_JIFFIE_SC
and rounded this calculation up in the intermediate form (since we
can't necessarily exactly represent TICK_NSEC in usec.) But the
scaling arithmetic is a (very slight) *over*approximation of the true
value; that is, instead of dividing by (1 usec/ 1 jiffie), we
effectively divided by (1 usec/1 jiffie)-epsilon (rounding
down). This would normally be fine, but we want to round timeouts up,
and we did so by adding 2^USEC_JIFFIE_SC - 1 before the shift; this
would be fine if our division was exact, but dividing this by the
slightly smaller factor was equivalent to adding just _over_ 1 to the
final result (instead of just _under_ 1, as desired.)
In particular, with HZ=1000, we consistently computed that 10000 usec
was 11 jiffies; the same was true for any exact multiple of
TICK_NSEC.
We could possibly still round in the intermediate form, adding
something less than 2^USEC_JIFFIE_SC - 1, but easier still is to
convert usec->nsec, round in nanoseconds, and then convert using
time*spec*_to_jiffies. This adds one constant multiplication, and is
not observably slower in microbenchmarks on recent x86 hardware.
Tested: the following program:
	#include <stdio.h>
	#include <sys/time.h>
	int main(void) {
		struct itimerval zero = {{0, 0}, {0, 0}};
		/* Initially set to 10 ms. */
		struct itimerval initial = zero;
		initial.it_interval.tv_usec = 10000;
		setitimer(ITIMER_PROF, &initial, NULL);
		/* Save and restore several times. */
		for (size_t i = 0; i < 10; ++i) {
			struct itimerval prev;
			setitimer(ITIMER_PROF, &zero, &prev);
			/* on old kernels, this goes up by TICK_USEC
			 * every iteration */
			printf("previous value: %ld %ld %ld %ld\n",
			       prev.it_interval.tv_sec,
			       prev.it_interval.tv_usec,
			       prev.it_value.tv_sec, prev.it_value.tv_usec);
			setitimer(ITIMER_PROF, &prev, NULL);
		}
		return 0;
	}
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Paul Turner <pjt@google.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Reviewed-by: Paul Turner <pjt@google.com>
Reported-by: Aaron Jacobs <jacobsa@google.com>
Signed-off-by: Andrew Hunter <ahh@google.com>
[jstultz: Tweaked to apply to 3.17-rc]
Signed-off-by: John Stultz <john.stultz@linaro.org>
[bwh: Backported to 3.16: adjust filename]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 58d75f4b1ce26324b4d809b18f94819843a98731 upstream.
The recent conversion of saa7134 to vb2 uncovered a poll() bug that
broke the teletext applications alevt and mtt. These applications
expect that calling poll() without having called VIDIOC_STREAMON will
cause poll() to return POLLERR. That did not happen in vb2.
This patch fixes that behavior. It also fixes what should happen when
poll() is called when STREAMON is called but no buffers have been
queued. In that case poll() will also return POLLERR, but only for
capture queues since output queues will always return POLLOUT
anyway in that situation.
This brings the vb2 behavior in line with the old videobuf behavior.
Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com>
Acked-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 0c740d0afc3bff0a097ad03a1c8df92757516f5c upstream.
while_each_thread() and next_thread() should die, almost every lockless
usage is wrong.
1. Unless g == current, the lockless while_each_thread() is not safe.
while_each_thread(g, t) can loop forever if g exits; next_thread()
can't reach the unhashed thread in this case. Note that this can
happen even if g is the group leader, since it can exec.
2. Even if while_each_thread() itself was correct, people often use
it wrongly.
It was never safe to just take rcu_read_lock() and loop unless
you verify that pid_alive(g) == T; even the first next_thread()
can point to already freed/reused memory.
This patch adds signal_struct->thread_head and task->thread_node to
create the normal rcu-safe list with the stable head. The new
for_each_thread(g, t) helper is always safe under rcu_read_lock() as
long as this task_struct can't go away.
Note: of course it is ugly to have both task_struct->thread_node and the
old task_struct->thread_group, we will kill it later, after we change
the users of while_each_thread() to use for_each_thread().
Perhaps we can kill it even before we convert all users, we can
reimplement next_thread(t) using the new thread_head/thread_node. But
we can't do this right now because this will lead to subtle behavioural
changes. For example, do/while_each_thread() always sees at least one
task, while for_each_thread() can do nothing if the whole thread group
has died. Or take thread_group_empty(): currently its semantics are not
clear unless thread_group_leader(p), and we need to audit the callers
before we can change it.
So this patch adds the new interface which has to coexist with the old
one for some time, hopefully the next changes will be more or less
straightforward and the old one will go away soon.
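Typical use of the new helper, as a sketch (for a pinned task_struct p):
	struct task_struct *t;
	rcu_read_lock();
	for_each_thread(p, t)
		inspect_thread(t);	/* inspect_thread() is illustrative */
	rcu_read_unlock();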
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Sergey Dyasly <dserrg@gmail.com>
Tested-by: Sergey Dyasly <dserrg@gmail.com>
Reviewed-by: Sameer Nanda <snanda@chromium.org>
Acked-by: David Rientjes <rientjes@google.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Mandeep Singh Baines <msb@chromium.org>
Cc: "Ma, Xindong" <xindong.ma@intel.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: "Tu, Xiaobing" <xiaobing.tu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Li Zefan <lizefan@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit e09c2c295468476a239d13324ce9042ec4de05eb upstream.
create_singlethread_workqueue() is a compat interface for single
threaded workqueues, which map to ordered workqueues w/ a rescuer in the
current implementation. create_singlethread_workqueue() is currently
implemented by invoking alloc_workqueue() w/ appropriate parameters.
8719dceae2f9 ("workqueue: reject adjusting max_active or applying
attrs to ordered workqueues") introduced __WQ_ORDERED to protect
ordered workqueues against dynamic attribute changes which can break
ordering guarantees but forgot to apply it to
create_singlethread_workqueue(). This in itself is okay as nobody
currently uses dynamic attribute change on workqueues created with
create_singlethread_workqueue().
However, 4c16bd327c ("workqueue: implement NUMA affinity for unbound
workqueues") broke the single-threaded guarantee for ordered workqueues
by allocating a separate pool_workqueue on each NUMA node by
default. A later change 8a2b75384444 ("workqueue: fix ordered
workqueues in NUMA setups") fixed it by allocating only one global
pool_workqueue if __WQ_ORDERED is set.
Combined, the __WQ_ORDERED omission in create_singlethread_workqueue()
became critical, breaking its single-threadedness and ordering
guarantee.
Let's make create_singlethread_workqueue() wrap
alloc_ordered_workqueue() instead so that it inherits __WQ_ORDERED and
can implicitly track future ordered_workqueue changes.
v2: I missed that __WQ_ORDERED now protects against pwq splitting
across NUMA nodes and incorrectly described the patch as a
nice-to-have fix to protect against future dynamic attribute
usages. Oleg pointed out that this is actually a critical
breakage due to 8a2b75384444 ("workqueue: fix ordered workqueues
in NUMA setups").
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Mike Anderson <mike.anderson@us.ibm.com>
Cc: Oleg Nesterov <onestero@redhat.com>
Cc: Gustavo Luiz Duarte <gduarte@redhat.com>
Cc: Tomas Henzl <thenzl@redhat.com>
Fixes: 4c16bd327c ("workqueue: implement NUMA affinity for unbound workqueues")
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit f153566570fb9e32c2f59182883f4f66048788fb upstream.
Instead of a void function, return the trigger pointer.
While not in and of itself a fix, this makes the following set of
7 fixes cleaner than they would otherwise be.
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Jonathan Cameron <jic23@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|