|
Change-Id: If95b79f560122df83a67cc770be0a4b4522d619f
|
|
Change-Id: Iecd2951a67252f5b411de16e4207d87592079d97
Signed-off-by: Chanho Park <chanho61.park@samsung.com>
|
|
Signed-off-by: Chanho Park <chanho61.park@samsung.com>
|
|
Signed-off-by: Chanho Park <chanho61.park@samsung.com>
|
|
Change-Id: I4389736181c63ae5af670695784cedd21631ba89
Signed-off-by: Chanho Park <chanho61.park@samsung.com>
|
|
Signed-off-by: Chanho Park <chanho61.park@samsung.com>
|
|
Signed-off-by: Chanho Park <chanho61.park@samsung.com>
|
|
PL011 can now use DMA. TX channel is always used, RX usage depends
on the DMA engine's ability to process residual data (absent in
Juno). To cater for the RX channel not being activated we also enable
automatic polling.
Signed-off-by: Liviu Dudau <Liviu.Dudau@arm.com>
|
|
Add equivalent attributes to those provided in the platform data
for use when RX DMA is enabled.
Signed-off-by: Andrew Jackson <Andrew.Jackson@arm.com>
|
|
If the DMA engine doesn't support residue processing then the RX DMA
handling won't work terribly well if polling is enabled. So, disable
RX DMA if residue handling isn't available.
Signed-off-by: Andrew Jackson <Andrew.Jackson@arm.com>
|
|
The DMA engines on some systems require that the dma_length is set
when using scatter gather lists.
Signed-off-by: Andrew Jackson <Andrew.Jackson@arm.com>
|
|
The existing code assumed that all the bytes in the
buffer are consumed in a call. This isn't the case
since the count is adjusted.
Signed-off-by: Andrew Jackson <Andrew.Jackson@arm.com>
|
|
Allocating with __GFP_DMA avoids the need for bounce buffers
Signed-off-by: Andrew Jackson <Andrew.Jackson@arm.com>
|
|
The MFIFO is shared by all channels so restrict each memcpy to its fair
share. This is being overly cautious, but without a global view of DMA
channel usage on a system it's not possible to come up with a better
safe limit.
Signed-off-by: Jon Medhurst <tixy@linaro.org>
|
|
The algorithm used for programming the DMA Controller doesn't take into
consideration the requirements of transfers that are not aligned to the
bus width. Work around this by making sure we pick a burst size and
length which ensures no bursts straddle an MFIFO entry.
See "MFIFO Usage Overview" chapter in the TRM for "CoreLink DMA
Controller DMA-330", Revision r1p1.
Signed-off-by: Jon Medhurst <tixy@linaro.org>
|
|
Neither CMA nor noncoherent allocations support atomic allocations.
Add a dedicated atomic pool to support this.
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Laura Abbott <lauraa@codeaurora.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: David Riley <davidriley@chromium.org>
Cc: Olof Johansson <olof@lixom.net>
Cc: Ritesh Harjain <ritesh.harjani@gmail.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Thierry Reding <thierry.reding@gmail.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
ARM currently uses a bitmap for tracking atomic allocations. genalloc
already handles this type of memory pool allocation so switch to using
that instead.
Signed-off-by: Laura Abbott <lauraa@codeaurora.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: David Riley <davidriley@chromium.org>
Cc: Olof Johansson <olof@lixom.net>
Cc: Ritesh Harjain <ritesh.harjani@gmail.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Thierry Reding <thierry.reding@gmail.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
For architectures without coherent DMA, memory for DMA may need to be
remapped with coherent attributes. Factor out the remapping code from
arm and put it in a common location to reduce code duplication.
As part of this, the arm APIs are now migrated away from
ioremap_page_range to the common APIs which use map_vm_area for remapping.
This should be an equivalent change and using map_vm_area is more correct
as ioremap_page_range is intended to bring in io addresses into the cpu
space and not regular kernel managed memory.
Signed-off-by: Laura Abbott <lauraa@codeaurora.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: David Riley <davidriley@chromium.org>
Cc: Olof Johansson <olof@lixom.net>
Cc: Ritesh Harjain <ritesh.harjani@gmail.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Thierry Reding <thierry.reding@gmail.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: James Hogan <james.hogan@imgtec.com>
Cc: Laura Abbott <lauraa@codeaurora.org>
Cc: Mitchel Humpherys <mitchelh@codeaurora.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
After allocating an address from a particular genpool, there is no good
way to verify if that address actually belongs to a genpool. Introduce
addr_in_gen_pool which will return if an address plus size falls
completely within the genpool range.
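The containment check described above can be sketched in plain C as
follows. This is a userspace illustration only, not the kernel code:
struct chunk and addr_in_chunks() are hypothetical stand-ins, and the
assumption is that a pool is a set of chunks with inclusive end
addresses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one chunk of a pool; not the kernel's struct. */
struct chunk {
	unsigned long start_addr;	/* first byte of the chunk */
	unsigned long end_addr;		/* last byte of the chunk (inclusive) */
};

/* Return true if [start, start + size) lies completely inside one chunk. */
static bool addr_in_chunks(const struct chunk *chunks, size_t nr_chunks,
			   unsigned long start, size_t size)
{
	assert(size > 0);
	unsigned long end = start + size - 1;

	if (end < start)	/* address arithmetic wrapped around */
		return false;

	for (size_t i = 0; i < nr_chunks; i++) {
		if (start >= chunks[i].start_addr && end <= chunks[i].end_addr)
			return true;
	}
	return false;
}
```

Note that a range which starts inside a chunk but runs past its end is
rejected, which is the "falls completely within" part of the description.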
Signed-off-by: Laura Abbott <lauraa@codeaurora.org>
Acked-by: Will Deacon <will.deacon@arm.com>
Reviewed-by: Olof Johansson <olof@lixom.net>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: David Riley <davidriley@chromium.org>
Cc: Ritesh Harjain <ritesh.harjani@gmail.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Thierry Reding <thierry.reding@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
One of the more common algorithms used for allocation is to align the
start address of the allocation to the order of size requested. Add this
as an algorithm option for genalloc.
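As a rough illustration of the idea (not the genalloc implementation),
the sketch below does a first-fit search in which the start index must
be a multiple of the request size rounded up to a power of two. The
names roundup_pow2() and alloc_order_aligned() and the one-bool-per-slot
map are assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Round n up to the next power of two (n >= 1). */
static size_t roundup_pow2(size_t n)
{
	size_t p = 1;

	while (p < n)
		p <<= 1;
	return p;
}

/*
 * First-fit search over a map with one entry per slot (true = taken).
 * Candidate start indices are multiples of the request size rounded up
 * to a power of two. On success the slots are marked used and the start
 * index is returned; -1 means nothing fits.
 */
static long alloc_order_aligned(bool *used, size_t nr_slots, size_t nr)
{
	assert(nr > 0);
	size_t align = roundup_pow2(nr);

	for (size_t start = 0; start + nr <= nr_slots; start += align) {
		size_t i;

		for (i = 0; i < nr && !used[start + i]; i++)
			;
		if (i == nr) {
			for (i = 0; i < nr; i++)
				used[start + i] = true;
			return (long)start;
		}
	}
	return -1;
}
```

With this policy a request for 3 slots is placed on a 4-slot boundary,
so some slots may be skipped in exchange for naturally aligned blocks.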
Signed-off-by: Laura Abbott <lauraa@codeaurora.org>
Acked-by: Will Deacon <will.deacon@arm.com>
Acked-by: Olof Johansson <olof@lixom.net>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: David Riley <davidriley@chromium.org>
Cc: Ritesh Harjain <ritesh.harjani@gmail.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Thierry Reding <thierry.reding@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Currently map_vm_area() takes (struct page *** pages) as third argument,
and after mapping, it moves (*pages) to point to (*pages +
nr_mapped_pages).
It looks like this kind of increment is useless to its caller these
days. The callers don't care about the increments and actually they're
trying to avoid this by passing another copy to map_vm_area().
The caller can always guarantee all the pages can be mapped into vm_area
as specified in first argument and the caller only cares about whether
map_vm_area() fails or not.
This patch cleans up the pointer movement in map_vm_area() and updates
its callers accordingly.
Signed-off-by: WANG Chao <chaowang@redhat.com>
Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
The arm64 architecture has the ability to exclusively load and store
a pair of registers from an address (ldxp/stxp). Also the SLUB can take
advantage of a cmpxchg_double implementation to avoid taking some
locks.
This patch provides an implementation of cmpxchg_double for 64-bit
pairs, and activates the logic required for the SLUB to use these
functions (HAVE_ALIGNED_STRUCT_PAGE and HAVE_CMPXCHG_DOUBLE).
Also definitions of this_cpu_cmpxchg_8 and this_cpu_cmpxchg_double_8
are wired up to cmpxchg_local and cmpxchg_double_local (rather than the
stock implementations that perform non-atomic operations with
interrupts disabled) as they are used by the SLUB.
On a Juno platform running on only the A57s I get quite a noticeable
performance improvement with 5 runs of hackbench on v3.17:
         | Baseline | With Patch
---------+----------+-----------
Mean     | 119.2312 | 106.1782
StdDev   |   0.4919 |   0.4494
(times taken to complete `./hackbench 100 process 1000', in seconds)
Signed-off-by: Steve Capper <steve.capper@linaro.org>
|
|
Steve Capper has posted a v2 of the patch that gives a better path
performance.
This reverts commit 64b57f9219ea2f251c4c3c43c3b36187287d7f9d.
Signed-off-by: Liviu Dudau <Liviu.Dudau@arm.com>
|
|
add SLUB debugging."
This reverts commit 86fa18f9ea6d4b168323d30dd5ee681a4aa7383d.
Performance with Android benchmarks drops to 50% with SLUB debugging ON
by default.
Signed-off-by: Liviu Dudau <Liviu.Dudau@arm.com>
|
|
debugging.
The reduced debug info trips the debugging tools that rely on certain
information being present in the debug build in order to figure out
the OS support structures. Also, SLUB debugging helps trap a corner
case where a race condition exists in the SLUB allocator when it is
updating the sysfs statistics information and it races with a
SLUB deallocator over maintenance of the freelist pointer.
Signed-off-by: Liviu Dudau <Liviu.Dudau@arm.com>
|
|
The hdlcd_show_underrun_count() function took a double pointer as its
second argument where the correct signature takes a single pointer. GCC 4.9
emits a warning for this, which was finally noticed; this is the fix.
Signed-off-by: Liviu Dudau <Liviu.Dudau@arm.com>
|
|
To support both Clang and GCC, use the global stack register variable vs
a local register variable.
Author: Mark Charlebois <charlebm@gmail.com>
Signed-off-by: Mark Charlebois <charlebm@gmail.com>
Signed-off-by: Behan Webster <behanw@converseincode.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
Use the global current_stack_pointer to get the value of the stack pointer.
This change supports being able to compile the kernel with both gcc and clang.
Signed-off-by: Behan Webster <behanw@converseincode.com>
Signed-off-by: Mark Charlebois <charlebm@gmail.com>
Reviewed-by: Jan-Simon Möller <dl9pf@gmx.de>
Reviewed-by: Olof Johansson <olof@lixom.net>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
Define a global named register for current_stack_pointer. The use of this new
variable guarantees that both gcc and clang can access this register in C code.
Signed-off-by: Behan Webster <behanw@converseincode.com>
Reviewed-by: Jan-Simon Möller <dl9pf@gmx.de>
Reviewed-by: Mark Charlebois <charlebm@gmail.com>
Reviewed-by: Olof Johansson <olof@lixom.net>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
Commit b0c29f79ecea (futexes: Avoid taking the hb->lock if there's
nothing to wake up) changes the futex code to avoid taking a lock when
there are no waiters. This code has been subsequently fixed in commit
11d4616bd07f (futex: revert back to the explicit waiter counting code).
Both the original commit and the fix-up rely on get_futex_key_refs() to
always imply a barrier.
However, for private futexes, none of the cases in the switch statement
of get_futex_key_refs() would be hit and the function completes without
a memory barrier as required before checking the "waiters" in
futex_wake() -> hb_waiters_pending(). The consequence is a race with a
thread waiting on a futex on another CPU, allowing the waker thread to
read "waiters == 0" while the waiter thread has read "futex_val ==
locked" (in kernel).
Without this fix, the problem (user space deadlocks) can be seen with
Android bionic's mutex implementation on an arm64 multi-cluster system.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Reported-by: Matteo Franchin <Matteo.Franchin@arm.com>
Fixes: b0c29f79ecea (futexes: Avoid taking the hb->lock if there's nothing to wake up)
Acked-by: Davidlohr Bueso <dave@stgolabs.net>
Tested-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Cc: <stable@vger.kernel.org>
Cc: Darren Hart <dvhart@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
When a kmem_cache is created with a ctor, each object in the kmem_cache
is initialized before it is ready to use. In the SLUB implementation,
however, the first object is initialized twice.
This patch reduces the duplication of initialization of the first
object.
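The intended invariant, one constructor call per object, can be shown
with a small userspace sketch. It does not reproduce the SLUB code:
struct obj, obj_ctor() and setup_slab() are hypothetical, and the
counter exists only to make the number of calls observable.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical object type; the counter records how often the ctor ran. */
struct obj {
	int ctor_calls;
};

static void obj_ctor(struct obj *o)
{
	o->ctor_calls++;
}

/*
 * Set up every object in a freshly allocated slab. Each object is passed
 * to the constructor exactly once inside the loop; there is no separate
 * call for the first object ahead of the loop.
 */
static void setup_slab(struct obj *objs, size_t nr, void (*ctor)(struct obj *))
{
	assert(objs != NULL);
	for (size_t i = 0; i < nr; i++) {
		if (ctor)
			ctor(&objs[i]);
	}
}
```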
Fix commit 7656c72b ("SLUB: add macros for scanning objects in a slab").
Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
There used to be only one path out of __slab_alloc(), and ALLOC_SLOWPATH
got bumped in that exit path. Now there are two, and a bunch of gotos.
ALLOC_SLOWPATH can now get set more than once during a single call to
__slab_alloc() which is pretty bogus. Here's the sequence:
1. Enter __slab_alloc(), fall through all the way to the
stat(s, ALLOC_SLOWPATH);
2. hit 'if (!freelist)', and bump DEACTIVATE_BYPASS, jump to
new_slab (goto #1)
3. Hit 'if (c->partial)', bump CPU_PARTIAL_ALLOC, goto redo
(goto #2)
4. Fall through in the same path we did before all the way to
stat(s, ALLOC_SLOWPATH)
5. bump ALLOC_REFILL stat, then return
Doing this is obviously bogus. It keeps us from being able to
accurately compare ALLOC_SLOWPATH vs. ALLOC_FASTPATH. It also means
that the total number of allocs always exceeds the total number of
frees.
This patch moves stat(s, ALLOC_SLOWPATH) to be called from the same
place that __slab_alloc() is. This makes it much less likely that
ALLOC_SLOWPATH will get botched again in the spaghetti-code inside
__slab_alloc().
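A toy model of the change, under the assumption that the slow path may
jump back to its start once: the counter is bumped by the caller, so
internal gotos cannot count the same allocation twice. bump_stat(),
slow_alloc() and alloc() are illustrative names, not the kernel's.

```c
#include <assert.h>
#include <stdbool.h>

enum stat_item { ALLOC_FASTPATH, ALLOC_SLOWPATH, NR_STATS };

static unsigned long stats[NR_STATS];

static void bump_stat(enum stat_item item)
{
	assert(item < NR_STATS);
	stats[item]++;
}

/*
 * Toy slow path: may loop back to 'redo' once, loosely mirroring the
 * gotos described above. It does not touch ALLOC_SLOWPATH itself.
 */
static int slow_alloc(bool retry_once)
{
redo:
	if (retry_once) {
		retry_once = false;
		goto redo;
	}
	return 1;
}

/* The caller bumps the counter exactly once per slow-path entry. */
static int alloc(bool fast_ok, bool retry_once)
{
	if (fast_ok) {
		bump_stat(ALLOC_FASTPATH);
		return 1;
	}
	bump_stat(ALLOC_SLOWPATH);
	return slow_alloc(retry_once);
}
```

Two slow-path allocations, one of which retries internally, still leave
the slow-path counter at two.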
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Christoph Lameter <cl@linux.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Pekka Enberg <penberg@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
The arm64 architecture has the ability to exclusively load and store
a pair of registers from an address (ldxp/stxp). Also the SLUB can take
advantage of a cmpxchg_double implementation to avoid taking some
locks.
This patch provides an implementation of cmpxchg_double for 64-bit
pairs, and activates the logic required for the SLUB to use these
functions (HAVE_ALIGNED_STRUCT_PAGE and HAVE_CMPXCHG_DOUBLE).
On a Juno platform running on only the A57s I get quite a noticeable
performance improvement with hackbench.
Before patch applied:
$ ./hackbench 100 process 1000
Running with 100*40 (== 4000) tasks.
Time: 206.331
After patch applied:
$ ./hackbench 100 process 1000
Running with 100*40 (== 4000) tasks.
Time: 182.396
Signed-off-by: Steve Capper <steve.capper@linaro.org>
|
|
Replace places where __get_cpu_var() is used for an address calculation
with this_cpu_ptr().
Signed-off-by: Christoph Lameter <cl@linux.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
More default options needed by the GTS test suite.
Signed-off-by: Liviu Dudau <Liviu.Dudau@arm.com>
|
|
These options are needed by the GTS validation suite in order
to confirm the GTS functionality.
Signed-off-by: Liviu Dudau <Liviu.Dudau@arm.com>
|
|
The DT set the upper limit for the pixel clock to 210MHz. While
technically the TDA19988 chip works at that frequency, it is
outside the spec sheet values. Restrict the clock range to the
published values.
Signed-off-by: Liviu Dudau <Liviu.Dudau@arm.com>
|
|
Signed-off-by: Liviu Dudau <Liviu.Dudau@arm.com>
|
|
This patch adds a basic device tree node for the CCI-400
interconnect, so that the system profiler driver can refer to it.
This node is provisional and may change. For now it serves only to
provide the CCI-400 programmer's view register block base address
and size.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
|
|
The ARM System Profiler driver exposes System Profiler registers via
debugfs files. The user space can program and later capture metrics from
System Profiler by accessing these files.
This script stands as an example for the System Profiler's usage.
|
|
This patch adds a device node for the system profiler on Juno.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
|
|
This patch implements a basic driver for the ARM System Profiler.
The System Profiler registers are exposed symbolically via debugfs,
allowing root to manipulate and experiment with the profiler.
If an interrupt is defined for the profiler in the device tree and
ftrace is enabled, then captured data can be streamed out via trace
events.
The current user interface is minimal, with virtually no abstraction.
Future versions of the driver may substantially change the interface,
so it should be considered experimental for now.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
|
|
The I2C bus driver has problems keeping the FIFO filled under heavy
load. Combined with the auto-STOP setting baked into the hardware,
this spells disaster for conversations with the HDMI chip, especially
around boot time when we want to retrieve the EDID information.
Change the bus speed to 100kHz to increase the chance of servicing
the "FIFO empty" interrupt in time.
Signed-off-by: Liviu Dudau <Liviu.Dudau@arm.com>
|
|
The Designware I2C driver does not currently support changing the bus
speed even if the hardware is capable of switching to a slower frequency.
Add support for passing the clock frequency info in the device tree.
Only standard (100kHz) or fast (400kHz) speeds are supported.
Signed-off-by: Andrew Jackson <Andrew.Jackson@arm.com>
Signed-off-by: Liviu Dudau <Liviu.Dudau@arm.com>
|
|
If the FIFOs aren't flushed, the left/right channels may be swapped:
this may occur if the FIFOs are not empty when the streams start.
Signed-off-by: Andrew Jackson <Andrew.Jackson@arm.com>
|
|
Enable PL330 DMA engine and Designware I2S support in the defconfig.
However, DMA with the PL011 UART doesn't work, so comment out the DMA
channels in PL011 to work around a hang during boot.
Signed-off-by: Liviu Dudau <Liviu.Dudau@arm.com>
|
|
Signed-off-by: Andrew Jackson <Andrew.Jackson@arm.com>
|
|
Signed-off-by: Andrew Jackson <Andrew.Jackson@arm.com>
|
|
Convert the driver to use either platform_data or device-tree for configuration
of the device. When using device-tree, the I2S block's configuration is read
from the relevant registers: this reduces the amount of information required in
the device tree.
Signed-off-by: Andrew Jackson <Andrew.Jackson@arm.com>
|
|
On the Designware core, the channels are independent and not combined
in higher registers. So as more channels are added, more registers need
to be updated.
Signed-off-by: Andrew Jackson <Andrew.Jackson@arm.com>
|