summaryrefslogtreecommitdiff
path: root/Documentation
diff options
context:
space:
mode:
Diffstat (limited to 'Documentation')
-rw-r--r--Documentation/ABI/testing/sysfs-bus-rbd7
-rw-r--r--Documentation/DocBook/debugobjects.tmpl50
-rw-r--r--Documentation/RCU/checklist.txt6
-rw-r--r--Documentation/RCU/rcu.txt10
-rw-r--r--Documentation/RCU/stallwarn.txt16
-rw-r--r--Documentation/RCU/torture.txt13
-rw-r--r--Documentation/RCU/trace.txt4
-rw-r--r--Documentation/RCU/whatisRCU.txt19
-rw-r--r--Documentation/atomic_ops.txt87
-rw-r--r--Documentation/cgroups/memory.txt28
-rw-r--r--Documentation/cgroups/net_prio.txt53
-rw-r--r--Documentation/devicetree/bindings/net/calxeda-xgmac.txt15
-rw-r--r--Documentation/devicetree/bindings/net/can/cc770.txt53
-rw-r--r--Documentation/devicetree/bindings/vendor-prefixes.txt1
-rw-r--r--Documentation/feature-removal-schedule.txt3
-rw-r--r--Documentation/filesystems/btrfs.txt4
-rw-r--r--Documentation/kernel-parameters.txt18
-rw-r--r--Documentation/lockdep-design.txt63
-rw-r--r--Documentation/networking/00-INDEX2
-rw-r--r--Documentation/networking/batman-adv.txt7
-rw-r--r--Documentation/networking/bonding.txt17
-rw-r--r--Documentation/networking/ieee802154.txt27
-rw-r--r--Documentation/networking/ifenslave.c2
-rw-r--r--Documentation/networking/ip-sysctl.txt23
-rw-r--r--Documentation/networking/openvswitch.txt195
-rw-r--r--Documentation/networking/packet_mmap.txt2
-rw-r--r--Documentation/networking/scaling.txt8
-rw-r--r--Documentation/networking/stmmac.txt16
-rw-r--r--Documentation/networking/team.txt2
-rw-r--r--Documentation/power/devices.txt111
-rw-r--r--Documentation/power/runtime_pm.txt40
-rw-r--r--Documentation/sound/alsa/soc/machine.txt6
-rw-r--r--Documentation/trace/events.txt2
-rw-r--r--Documentation/usb/linux-cdc-acm.inf4
-rw-r--r--Documentation/virtual/kvm/api.txt16
35 files changed, 804 insertions, 126 deletions
diff --git a/Documentation/ABI/testing/sysfs-bus-rbd b/Documentation/ABI/testing/sysfs-bus-rbd
index fa72ccb2282..dbedafb095e 100644
--- a/Documentation/ABI/testing/sysfs-bus-rbd
+++ b/Documentation/ABI/testing/sysfs-bus-rbd
@@ -57,13 +57,6 @@ create_snap
$ echo <snap-name> > /sys/bus/rbd/devices/<dev-id>/snap_create
-rollback_snap
-
- Rolls back data to the specified snapshot. This goes over the entire
- list of rados blocks and sends a rollback command to each.
-
- $ echo <snap-name> > /sys/bus/rbd/devices/<dev-id>/snap_rollback
-
snap_*
A directory per each snapshot
diff --git a/Documentation/DocBook/debugobjects.tmpl b/Documentation/DocBook/debugobjects.tmpl
index 08ff908aa7a..24979f691e3 100644
--- a/Documentation/DocBook/debugobjects.tmpl
+++ b/Documentation/DocBook/debugobjects.tmpl
@@ -96,6 +96,7 @@
<listitem><para>debug_object_deactivate</para></listitem>
<listitem><para>debug_object_destroy</para></listitem>
<listitem><para>debug_object_free</para></listitem>
+ <listitem><para>debug_object_assert_init</para></listitem>
</itemizedlist>
Each of these functions takes the address of the real object and
a pointer to the object type specific debug description
@@ -273,6 +274,26 @@
debug checks.
</para>
</sect1>
+
+ <sect1 id="debug_object_assert_init">
+ <title>debug_object_assert_init</title>
+ <para>
+ This function is called to assert that an object has been
+ initialized.
+ </para>
+ <para>
+ When the real object is not tracked by debugobjects, it calls
+ fixup_assert_init of the object type description structure
+ provided by the caller, with the hardcoded object state
+ ODEBUG_NOT_AVAILABLE. The fixup function can correct the problem
+ by calling debug_object_init and other specific initializing
+ functions.
+ </para>
+ <para>
+ When the real object is already tracked by debugobjects it is
+ ignored.
+ </para>
+ </sect1>
</chapter>
<chapter id="fixupfunctions">
<title>Fixup functions</title>
@@ -381,6 +402,35 @@
statistics.
</para>
</sect1>
+ <sect1 id="fixup_assert_init">
+ <title>fixup_assert_init</title>
+ <para>
+ This function is called from the debug code whenever a problem
+ in debug_object_assert_init is detected.
+ </para>
+ <para>
+ Called from debug_object_assert_init() with a hardcoded state
+ ODEBUG_STATE_NOTAVAILABLE when the object is not found in the
+ debug bucket.
+ </para>
+ <para>
+ The function returns 1 when the fixup was successful,
+ otherwise 0. The return value is used to update the
+ statistics.
+ </para>
+ <para>
+ Note, this function should make sure debug_object_init() is
+ called before returning.
+ </para>
+ <para>
+ The handling of statically initialized objects is a special
+ case. The fixup function should check if this is a legitimate
+ case of a statically initialized object or not. In this case only
+ debug_object_init() should be called to make the object known to
+ the tracker. Then the function should return 0 because this is not
+ a real fixup.
+ </para>
+ </sect1>
</chapter>
<chapter id="bugs">
<title>Known Bugs And Assumptions</title>
diff --git a/Documentation/RCU/checklist.txt b/Documentation/RCU/checklist.txt
index 0c134f8afc6..bff2d8be1e1 100644
--- a/Documentation/RCU/checklist.txt
+++ b/Documentation/RCU/checklist.txt
@@ -328,6 +328,12 @@ over a rather long period of time, but improvements are always welcome!
RCU rather than SRCU, because RCU is almost always faster and
easier to use than is SRCU.
+ If you need to enter your read-side critical section in a
+ hardirq or exception handler, and then exit that same read-side
+ critical section in the task that was interrupted, then you need
+ to srcu_read_lock_raw() and srcu_read_unlock_raw(), which avoid
+ the lockdep checking that would otherwise this practice illegal.
+
Also unlike other forms of RCU, explicit initialization
and cleanup is required via init_srcu_struct() and
cleanup_srcu_struct(). These are passed a "struct srcu_struct"
diff --git a/Documentation/RCU/rcu.txt b/Documentation/RCU/rcu.txt
index 31852705b58..bf778332a28 100644
--- a/Documentation/RCU/rcu.txt
+++ b/Documentation/RCU/rcu.txt
@@ -38,11 +38,11 @@ o How can the updater tell when a grace period has completed
Preemptible variants of RCU (CONFIG_TREE_PREEMPT_RCU) get the
same effect, but require that the readers manipulate CPU-local
- counters. These counters allow limited types of blocking
- within RCU read-side critical sections. SRCU also uses
- CPU-local counters, and permits general blocking within
- RCU read-side critical sections. These two variants of
- RCU detect grace periods by sampling these counters.
+ counters. These counters allow limited types of blocking within
+ RCU read-side critical sections. SRCU also uses CPU-local
+ counters, and permits general blocking within RCU read-side
+ critical sections. These variants of RCU detect grace periods
+ by sampling these counters.
o If I am running on a uniprocessor kernel, which can only do one
thing at a time, why should I wait for a grace period?
diff --git a/Documentation/RCU/stallwarn.txt b/Documentation/RCU/stallwarn.txt
index 4e959208f73..083d88cbc08 100644
--- a/Documentation/RCU/stallwarn.txt
+++ b/Documentation/RCU/stallwarn.txt
@@ -101,6 +101,11 @@ o A CPU-bound real-time task in a CONFIG_PREEMPT_RT kernel that
CONFIG_TREE_PREEMPT_RCU case, you might see stall-warning
messages.
+o A hardware or software issue shuts off the scheduler-clock
+ interrupt on a CPU that is not in dyntick-idle mode. This
+ problem really has happened, and seems to be most likely to
+ result in RCU CPU stall warnings for CONFIG_NO_HZ=n kernels.
+
o A bug in the RCU implementation.
o A hardware failure. This is quite unlikely, but has occurred
@@ -109,12 +114,11 @@ o A hardware failure. This is quite unlikely, but has occurred
This resulted in a series of RCU CPU stall warnings, eventually
leading the realization that the CPU had failed.
-The RCU, RCU-sched, and RCU-bh implementations have CPU stall
-warning. SRCU does not have its own CPU stall warnings, but its
-calls to synchronize_sched() will result in RCU-sched detecting
-RCU-sched-related CPU stalls. Please note that RCU only detects
-CPU stalls when there is a grace period in progress. No grace period,
-no CPU stall warnings.
+The RCU, RCU-sched, and RCU-bh implementations have CPU stall warning.
+SRCU does not have its own CPU stall warnings, but its calls to
+synchronize_sched() will result in RCU-sched detecting RCU-sched-related
+CPU stalls. Please note that RCU only detects CPU stalls when there is
+a grace period in progress. No grace period, no CPU stall warnings.
To diagnose the cause of the stall, inspect the stack traces.
The offending function will usually be near the top of the stack.
diff --git a/Documentation/RCU/torture.txt b/Documentation/RCU/torture.txt
index 783d6c134d3..d67068d0d2b 100644
--- a/Documentation/RCU/torture.txt
+++ b/Documentation/RCU/torture.txt
@@ -61,11 +61,24 @@ nreaders This is the number of RCU reading threads supported.
To properly exercise RCU implementations with preemptible
read-side critical sections.
+onoff_interval
+ The number of seconds between each attempt to execute a
+ randomly selected CPU-hotplug operation. Defaults to
+ zero, which disables CPU hotplugging. In HOTPLUG_CPU=n
+ kernels, rcutorture will silently refuse to do any
+ CPU-hotplug operations regardless of what value is
+ specified for onoff_interval.
+
shuffle_interval
The number of seconds to keep the test threads affinitied
to a particular subset of the CPUs, defaults to 3 seconds.
Used in conjunction with test_no_idle_hz.
+shutdown_secs The number of seconds to run the test before terminating
+ the test and powering off the system. The default is
+ zero, which disables test termination and system shutdown.
+ This capability is useful for automated testing.
+
stat_interval The number of seconds between output of torture
statistics (via printk()). Regardless of the interval,
statistics are printed when the module is unloaded.
diff --git a/Documentation/RCU/trace.txt b/Documentation/RCU/trace.txt
index aaf65f6c6cd..49587abfc2f 100644
--- a/Documentation/RCU/trace.txt
+++ b/Documentation/RCU/trace.txt
@@ -105,14 +105,10 @@ o "dt" is the current value of the dyntick counter that is incremented
or one greater than the interrupt-nesting depth otherwise.
The number after the second "/" is the NMI nesting depth.
- This field is displayed only for CONFIG_NO_HZ kernels.
-
o "df" is the number of times that some other CPU has forced a
quiescent state on behalf of this CPU due to this CPU being in
dynticks-idle state.
- This field is displayed only for CONFIG_NO_HZ kernels.
-
o "of" is the number of times that some other CPU has forced a
quiescent state on behalf of this CPU due to this CPU being
offline. In a perfect world, this might never happen, but it
diff --git a/Documentation/RCU/whatisRCU.txt b/Documentation/RCU/whatisRCU.txt
index 6ef692667e2..6bbe8dcdc3d 100644
--- a/Documentation/RCU/whatisRCU.txt
+++ b/Documentation/RCU/whatisRCU.txt
@@ -4,6 +4,7 @@ to start learning about RCU:
1. What is RCU, Fundamentally? http://lwn.net/Articles/262464/
2. What is RCU? Part 2: Usage http://lwn.net/Articles/263130/
3. RCU part 3: the RCU API http://lwn.net/Articles/264090/
+4. The RCU API, 2010 Edition http://lwn.net/Articles/418853/
What is RCU?
@@ -834,6 +835,8 @@ SRCU: Critical sections Grace period Barrier
srcu_read_lock synchronize_srcu N/A
srcu_read_unlock synchronize_srcu_expedited
+ srcu_read_lock_raw
+ srcu_read_unlock_raw
srcu_dereference
SRCU: Initialization/cleanup
@@ -855,27 +858,33 @@ list can be helpful:
a. Will readers need to block? If so, you need SRCU.
-b. What about the -rt patchset? If readers would need to block
+b. Is it necessary to start a read-side critical section in a
+ hardirq handler or exception handler, and then to complete
+ this read-side critical section in the task that was
+ interrupted? If so, you need SRCU's srcu_read_lock_raw() and
+ srcu_read_unlock_raw() primitives.
+
+c. What about the -rt patchset? If readers would need to block
in an non-rt kernel, you need SRCU. If readers would block
in a -rt kernel, but not in a non-rt kernel, SRCU is not
necessary.
-c. Do you need to treat NMI handlers, hardirq handlers,
+d. Do you need to treat NMI handlers, hardirq handlers,
and code segments with preemption disabled (whether
via preempt_disable(), local_irq_save(), local_bh_disable(),
or some other mechanism) as if they were explicit RCU readers?
If so, you need RCU-sched.
-d. Do you need RCU grace periods to complete even in the face
+e. Do you need RCU grace periods to complete even in the face
of softirq monopolization of one or more of the CPUs? For
example, is your code subject to network-based denial-of-service
attacks? If so, you need RCU-bh.
-e. Is your workload too update-intensive for normal use of
+f. Is your workload too update-intensive for normal use of
RCU, but inappropriate for other synchronization mechanisms?
If so, consider SLAB_DESTROY_BY_RCU. But please be careful!
-f. Otherwise, use RCU.
+g. Otherwise, use RCU.
Of course, this all assumes that you have determined that RCU is in fact
the right tool for your job.
diff --git a/Documentation/atomic_ops.txt b/Documentation/atomic_ops.txt
index 3bd585b4492..27f2b21a9d5 100644
--- a/Documentation/atomic_ops.txt
+++ b/Documentation/atomic_ops.txt
@@ -84,6 +84,93 @@ compiler optimizes the section accessing atomic_t variables.
*** YOU HAVE BEEN WARNED! ***
+Properly aligned pointers, longs, ints, and chars (and unsigned
+equivalents) may be atomically loaded from and stored to in the same
+sense as described for atomic_read() and atomic_set(). The ACCESS_ONCE()
+macro should be used to prevent the compiler from using optimizations
+that might otherwise optimize accesses out of existence on the one hand,
+or that might create unsolicited accesses on the other.
+
+For example consider the following code:
+
+ while (a > 0)
+ do_something();
+
+If the compiler can prove that do_something() does not store to the
+variable a, then the compiler is within its rights transforming this to
+the following:
+
+ tmp = a;
+ if (a > 0)
+ for (;;)
+ do_something();
+
+If you don't want the compiler to do this (and you probably don't), then
+you should use something like the following:
+
+ while (ACCESS_ONCE(a) < 0)
+ do_something();
+
+Alternatively, you could place a barrier() call in the loop.
+
+For another example, consider the following code:
+
+ tmp_a = a;
+ do_something_with(tmp_a);
+ do_something_else_with(tmp_a);
+
+If the compiler can prove that do_something_with() does not store to the
+variable a, then the compiler is within its rights to manufacture an
+additional load as follows:
+
+ tmp_a = a;
+ do_something_with(tmp_a);
+ tmp_a = a;
+ do_something_else_with(tmp_a);
+
+This could fatally confuse your code if it expected the same value
+to be passed to do_something_with() and do_something_else_with().
+
+The compiler would be likely to manufacture this additional load if
+do_something_with() was an inline function that made very heavy use
+of registers: reloading from variable a could save a flush to the
+stack and later reload. To prevent the compiler from attacking your
+code in this manner, write the following:
+
+ tmp_a = ACCESS_ONCE(a);
+ do_something_with(tmp_a);
+ do_something_else_with(tmp_a);
+
+For a final example, consider the following code, assuming that the
+variable a is set at boot time before the second CPU is brought online
+and never changed later, so that memory barriers are not needed:
+
+ if (a)
+ b = 9;
+ else
+ b = 42;
+
+The compiler is within its rights to manufacture an additional store
+by transforming the above code into the following:
+
+ b = 42;
+ if (a)
+ b = 9;
+
+This could come as a fatal surprise to other code running concurrently
+that expected b to never have the value 42 if a was zero. To prevent
+the compiler from doing this, write something like:
+
+ if (a)
+ ACCESS_ONCE(b) = 9;
+ else
+ ACCESS_ONCE(b) = 42;
+
+Don't even -think- about doing this without proper use of memory barriers,
+locks, or atomic operations if variable a can change at runtime!
+
+*** WARNING: ACCESS_ONCE() DOES NOT IMPLY A BARRIER! ***
+
Now, we move onto the atomic operation interfaces typically implemented with
the help of assembly code.
diff --git a/Documentation/cgroups/memory.txt b/Documentation/cgroups/memory.txt
index cc0ebc5241b..4d8774f6f48 100644
--- a/Documentation/cgroups/memory.txt
+++ b/Documentation/cgroups/memory.txt
@@ -44,8 +44,8 @@ Features:
- oom-killer disable knob and oom-notifier
- Root cgroup has no limit controls.
- Kernel memory and Hugepages are not under control yet. We just manage
- pages on LRU. To add more controls, we have to take care of performance.
+ Kernel memory support is work in progress, and the current version provides
+ basically functionality. (See Section 2.7)
Brief summary of control files.
@@ -72,6 +72,9 @@ Brief summary of control files.
memory.oom_control # set/show oom controls.
memory.numa_stat # show the number of memory usage per numa node
+ memory.kmem.tcp.limit_in_bytes # set/show hard limit for tcp buf memory
+ memory.kmem.tcp.usage_in_bytes # show current tcp buf memory allocation
+
1. History
The memory controller has a long history. A request for comments for the memory
@@ -255,6 +258,27 @@ When oom event notifier is registered, event will be delivered.
per-zone-per-cgroup LRU (cgroup's private LRU) is just guarded by
zone->lru_lock, it has no lock of its own.
+2.7 Kernel Memory Extension (CONFIG_CGROUP_MEM_RES_CTLR_KMEM)
+
+With the Kernel memory extension, the Memory Controller is able to limit
+the amount of kernel memory used by the system. Kernel memory is fundamentally
+different than user memory, since it can't be swapped out, which makes it
+possible to DoS the system by consuming too much of this precious resource.
+
+Kernel memory limits are not imposed for the root cgroup. Usage for the root
+cgroup may or may not be accounted.
+
+Currently no soft limit is implemented for kernel memory. It is future work
+to trigger slab reclaim when those limits are reached.
+
+2.7.1 Current Kernel Memory resources accounted
+
+* sockets memory pressure: some sockets protocols have memory pressure
+thresholds. The Memory Controller allows them to be controlled individually
+per cgroup, instead of globally.
+
+* tcp memory pressure: sockets memory pressure for the tcp protocol.
+
3. User Interface
0. Configuration
diff --git a/Documentation/cgroups/net_prio.txt b/Documentation/cgroups/net_prio.txt
new file mode 100644
index 00000000000..01b32263559
--- /dev/null
+++ b/Documentation/cgroups/net_prio.txt
@@ -0,0 +1,53 @@
+Network priority cgroup
+-------------------------
+
+The Network priority cgroup provides an interface to allow an administrator to
+dynamically set the priority of network traffic generated by various
+applications
+
+Nominally, an application would set the priority of its traffic via the
+SO_PRIORITY socket option. This however, is not always possible because:
+
+1) The application may not have been coded to set this value
+2) The priority of application traffic is often a site-specific administrative
+ decision rather than an application defined one.
+
+This cgroup allows an administrator to assign a process to a group which defines
+the priority of egress traffic on a given interface. Network priority groups can
+be created by first mounting the cgroup filesystem.
+
+# mount -t cgroup -onet_prio none /sys/fs/cgroup/net_prio
+
+With the above step, the initial group acting as the parent accounting group
+becomes visible at '/sys/fs/cgroup/net_prio'. This group includes all tasks in
+the system. '/sys/fs/cgroup/net_prio/tasks' lists the tasks in this cgroup.
+
+Each net_prio cgroup contains two files that are subsystem specific
+
+net_prio.prioidx
+This file is read-only, and is simply informative. It contains a unique integer
+value that the kernel uses as an internal representation of this cgroup.
+
+net_prio.ifpriomap
+This file contains a map of the priorities assigned to traffic originating from
+processes in this group and egressing the system on various interfaces. It
+contains a list of tuples in the form <ifname priority>. Contents of this file
+can be modified by echoing a string into the file using the same tuple format.
+for example:
+
+echo "eth0 5" > /sys/fs/cgroups/net_prio/iscsi/net_prio.ifpriomap
+
+This command would force any traffic originating from processes belonging to the
+iscsi net_prio cgroup and egressing on interface eth0 to have the priority of
+said traffic set to the value 5. The parent accounting group also has a
+writeable 'net_prio.ifpriomap' file that can be used to set a system default
+priority.
+
+Priorities are set immediately prior to queueing a frame to the device
+queueing discipline (qdisc) so priorities will be assigned prior to the hardware
+queue selection being made.
+
+One usage for the net_prio cgroup is with mqprio qdisc allowing application
+traffic to be steered to hardware/driver based traffic classes. These mappings
+can then be managed by administrators or other networking protocols such as
+DCBX.
diff --git a/Documentation/devicetree/bindings/net/calxeda-xgmac.txt b/Documentation/devicetree/bindings/net/calxeda-xgmac.txt
new file mode 100644
index 00000000000..411727a3f82
--- /dev/null
+++ b/Documentation/devicetree/bindings/net/calxeda-xgmac.txt
@@ -0,0 +1,15 @@
+* Calxeda Highbank 10Gb XGMAC Ethernet
+
+Required properties:
+- compatible : Should be "calxeda,hb-xgmac"
+- reg : Address and length of the register set for the device
+- interrupts : Should contain 3 xgmac interrupts. The 1st is main interrupt.
+ The 2nd is pwr mgt interrupt. The 3rd is low power state interrupt.
+
+Example:
+
+ethernet@fff50000 {
+ compatible = "calxeda,hb-xgmac";
+ reg = <0xfff50000 0x1000>;
+ interrupts = <0 77 4 0 78 4 0 79 4>;
+};
diff --git a/Documentation/devicetree/bindings/net/can/cc770.txt b/Documentation/devicetree/bindings/net/can/cc770.txt
new file mode 100644
index 00000000000..77027bf6460
--- /dev/null
+++ b/Documentation/devicetree/bindings/net/can/cc770.txt
@@ -0,0 +1,53 @@
+Memory mapped Bosch CC770 and Intel AN82527 CAN controller
+
+Note: The CC770 is a CAN controller from Bosch, which is 100%
+compatible with the old AN82527 from Intel, but with "bugs" being fixed.
+
+Required properties:
+
+- compatible : should be "bosch,cc770" for the CC770 and "intc,82527"
+ for the AN82527.
+
+- reg : should specify the chip select, address offset and size required
+ to map the registers of the controller. The size is usually 0x80.
+
+- interrupts : property with a value describing the interrupt source
+ (number and sensitivity) required for the controller.
+
+Optional properties:
+
+- bosch,external-clock-frequency : frequency of the external oscillator
+ clock in Hz. Note that the internal clock frequency used by the
+ controller is half of that value. If not specified, a default
+ value of 16000000 (16 MHz) is used.
+
+- bosch,clock-out-frequency : slock frequency in Hz on the CLKOUT pin.
+ If not specified or if the specified value is 0, the CLKOUT pin
+ will be disabled.
+
+- bosch,slew-rate : slew rate of the CLKOUT signal. If not specified,
+ a resonable value will be calculated.
+
+- bosch,disconnect-rx0-input : see data sheet.
+
+- bosch,disconnect-rx1-input : see data sheet.
+
+- bosch,disconnect-tx1-output : see data sheet.
+
+- bosch,polarity-dominant : see data sheet.
+
+- bosch,divide-memory-clock : see data sheet.
+
+- bosch,iso-low-speed-mux : see data sheet.
+
+For further information, please have a look to the CC770 or AN82527.
+
+Examples:
+
+can@3,100 {
+ compatible = "bosch,cc770";
+ reg = <3 0x100 0x80>;
+ interrupts = <2 0>;
+ interrupt-parent = <&mpic>;
+ bosch,external-clock-frequency = <16000000>;
+};
diff --git a/Documentation/devicetree/bindings/vendor-prefixes.txt b/Documentation/devicetree/bindings/vendor-prefixes.txt
index e8552782b44..874921e9780 100644
--- a/Documentation/devicetree/bindings/vendor-prefixes.txt
+++ b/Documentation/devicetree/bindings/vendor-prefixes.txt
@@ -33,6 +33,7 @@ qcom Qualcomm, Inc.
ramtron Ramtron International
samsung Samsung Semiconductor
schindler Schindler
+sil Silicon Image
simtek
sirf SiRF Technology, Inc.
stericsson ST-Ericsson
diff --git a/Documentation/feature-removal-schedule.txt b/Documentation/feature-removal-schedule.txt
index 3d849122b5b..33f7327d045 100644
--- a/Documentation/feature-removal-schedule.txt
+++ b/Documentation/feature-removal-schedule.txt
@@ -263,8 +263,7 @@ Who: Ravikiran Thirumalai <kiran@scalex86.org>
What: Code that is now under CONFIG_WIRELESS_EXT_SYSFS
(in net/core/net-sysfs.c)
-When: After the only user (hal) has seen a release with the patches
- for enough time, probably some time in 2010.
+When: 3.5
Why: Over 1K .text/.data size reduction, data is available in other
ways (ioctls)
Who: Johannes Berg <johannes@sipsolutions.net>
diff --git a/Documentation/filesystems/btrfs.txt b/Documentation/filesystems/btrfs.txt
index 64087c34327..7671352216f 100644
--- a/Documentation/filesystems/btrfs.txt
+++ b/Documentation/filesystems/btrfs.txt
@@ -63,8 +63,8 @@ IRC network.
Userspace tools for creating and manipulating Btrfs file systems are
available from the git repository at the following location:
- http://git.kernel.org/?p=linux/kernel/git/mason/btrfs-progs-unstable.git
- git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-progs-unstable.git
+ http://git.kernel.org/?p=linux/kernel/git/mason/btrfs-progs.git
+ git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-progs.git
These include the following tools:
diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index a0c5c5f4fce..e229769606f 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -315,12 +315,12 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
CPU-intensive style benchmark, and it can vary highly in
a microbenchmark depending on workload and compiler.
- 1: only for 32-bit processes
- 2: only for 64-bit processes
+ 32: only for 32-bit processes
+ 64: only for 64-bit processes
on: enable for both 32- and 64-bit processes
off: disable for both 32- and 64-bit processes
- amd_iommu= [HW,X86-84]
+ amd_iommu= [HW,X86-64]
Pass parameters to the AMD IOMMU driver in the system.
Possible values are:
fullflush - enable flushing of IO/TLB entries when
@@ -1885,6 +1885,11 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
arch_perfmon: [X86] Force use of architectural
perfmon on Intel CPUs instead of the
CPU specific event set.
+ timer: [X86] Force use of architectural NMI
+ timer mode (see also oprofile.timer
+ for generic hr timer mode)
+ [s390] Force legacy basic mode sampling
+ (report cpu_type "timer")
oops=panic Always panic on oopses. Default is to just kill the
process, but there is a small probability of
@@ -2750,11 +2755,10 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
functions are at fixed addresses, they make nice
targets for exploits that can control RIP.
- emulate Vsyscalls turn into traps and are emulated
- reasonably safely.
+ emulate [default] Vsyscalls turn into traps and are
+ emulated reasonably safely.
- native [default] Vsyscalls are native syscall
- instructions.
+ native Vsyscalls are native syscall instructions.
This is a little bit faster than trapping
and makes a few dynamic recompilers work
better than they would in emulation mode.
diff --git a/Documentation/lockdep-design.txt b/Documentation/lockdep-design.txt
index abf768c681e..5dbc99c04f6 100644
--- a/Documentation/lockdep-design.txt
+++ b/Documentation/lockdep-design.txt
@@ -221,3 +221,66 @@ when the chain is validated for the first time, is then put into a hash
table, which hash-table can be checked in a lockfree manner. If the
locking chain occurs again later on, the hash table tells us that we
dont have to validate the chain again.
+
+Troubleshooting:
+----------------
+
+The validator tracks a maximum of MAX_LOCKDEP_KEYS number of lock classes.
+Exceeding this number will trigger the following lockdep warning:
+
+ (DEBUG_LOCKS_WARN_ON(id >= MAX_LOCKDEP_KEYS))
+
+By default, MAX_LOCKDEP_KEYS is currently set to 8191, and typical
+desktop systems have less than 1,000 lock classes, so this warning
+normally results from lock-class leakage or failure to properly
+initialize locks. These two problems are illustrated below:
+
+1. Repeated module loading and unloading while running the validator
+ will result in lock-class leakage. The issue here is that each
+ load of the module will create a new set of lock classes for
+ that module's locks, but module unloading does not remove old
+ classes (see below discussion of reuse of lock classes for why).
+ Therefore, if that module is loaded and unloaded repeatedly,
+ the number of lock classes will eventually reach the maximum.
+
+2. Using structures such as arrays that have large numbers of
+ locks that are not explicitly initialized. For example,
+ a hash table with 8192 buckets where each bucket has its own
+ spinlock_t will consume 8192 lock classes -unless- each spinlock
+ is explicitly initialized at runtime, for example, using the
+ run-time spin_lock_init() as opposed to compile-time initializers
+ such as __SPIN_LOCK_UNLOCKED(). Failure to properly initialize
+ the per-bucket spinlocks would guarantee lock-class overflow.
+ In contrast, a loop that called spin_lock_init() on each lock
+ would place all 8192 locks into a single lock class.
+
+ The moral of this story is that you should always explicitly
+ initialize your locks.
+
+One might argue that the validator should be modified to allow
+lock classes to be reused. However, if you are tempted to make this
+argument, first review the code and think through the changes that would
+be required, keeping in mind that the lock classes to be removed are
+likely to be linked into the lock-dependency graph. This turns out to
+be harder to do than to say.
+
+Of course, if you do run out of lock classes, the next thing to do is
+to find the offending lock classes. First, the following command gives
+you the number of lock classes currently in use along with the maximum:
+
+ grep "lock-classes" /proc/lockdep_stats
+
+This command produces the following output on a modest system:
+
+ lock-classes: 748 [max: 8191]
+
+If the number allocated (748 above) increases continually over time,
+then there is likely a leak. The following command can be used to
+identify the leaking lock classes:
+
+ grep "BD" /proc/lockdep
+
+Run the command and save the output, then compare against the output from
+a later run of this command to identify the leakers. This same output
+can also help you find situations where runtime lock initialization has
+been omitted.
diff --git a/Documentation/networking/00-INDEX b/Documentation/networking/00-INDEX
index bbce1215434..9ad9ddeb384 100644
--- a/Documentation/networking/00-INDEX
+++ b/Documentation/networking/00-INDEX
@@ -144,6 +144,8 @@ nfc.txt
- The Linux Near Field Communication (NFS) subsystem.
olympic.txt
- IBM PCI Pit/Pit-Phy/Olympic Token Ring driver info.
+openvswitch.txt
+ - Open vSwitch developer documentation.
operstates.txt
- Overview of network interface operational states.
packet_mmap.txt
diff --git a/Documentation/networking/batman-adv.txt b/Documentation/networking/batman-adv.txt
index c86d03f18a5..221ad0cdf11 100644
--- a/Documentation/networking/batman-adv.txt
+++ b/Documentation/networking/batman-adv.txt
@@ -200,15 +200,16 @@ abled during run time. Following log_levels are defined:
0 - All debug output disabled
1 - Enable messages related to routing / flooding / broadcasting
-2 - Enable route or tt entry added / changed / deleted
-3 - Enable all messages
+2 - Enable messages related to route added / changed / deleted
+4 - Enable messages related to translation table operations
+7 - Enable all messages
The debug output can be changed at runtime using the file
/sys/class/net/bat0/mesh/log_level. e.g.
# echo 2 > /sys/class/net/bat0/mesh/log_level
-will enable debug messages for when routes or TTs change.
+will enable debug messages for when routes change.
BATCTL
diff --git a/Documentation/networking/bonding.txt b/Documentation/networking/bonding.txt
index 91df678fb7f..080ad26690a 100644
--- a/Documentation/networking/bonding.txt
+++ b/Documentation/networking/bonding.txt
@@ -196,6 +196,23 @@ or, for backwards compatibility, the option value. E.g.,
The parameters are as follows:
+active_slave
+
+ Specifies the new active slave for modes that support it
+ (active-backup, balance-alb and balance-tlb). Possible values
+ are the name of any currently enslaved interface, or an empty
+ string. If a name is given, the slave and its link must be up in order
+ to be selected as the new active slave. If an empty string is
+ specified, the current active slave is cleared, and a new active
+ slave is selected automatically.
+
+ Note that this is only available through the sysfs interface. No module
+ parameter by this name exists.
+
+ The normal value of this option is the name of the currently
+ active slave, or the empty string if there is no active slave or
+ the current mode does not use an active slave.
+
ad_select
Specifies the 802.3ad aggregation selection logic to use. The
diff --git a/Documentation/networking/ieee802154.txt b/Documentation/networking/ieee802154.txt
index f41ea240522..1dc1c24a754 100644
--- a/Documentation/networking/ieee802154.txt
+++ b/Documentation/networking/ieee802154.txt
@@ -78,3 +78,30 @@ in software. This is currently WIP.
See header include/net/mac802154.h and several drivers in drivers/ieee802154/.
+6LoWPAN Linux implementation
+============================
+
+The IEEE 802.15.4 standard specifies an MTU of 128 bytes, yielding about 80
+octets of actual MAC payload once security is turned on, on a wireless link
+with a link throughput of 250 kbps or less. The 6LoWPAN adaptation format
+[RFC4944] was specified to carry IPv6 datagrams over such constrained links,
+taking into account limited bandwidth, memory, or energy resources that are
+expected in applications such as wireless Sensor Networks. [RFC4944] defines
+a Mesh Addressing header to support sub-IP forwarding, a Fragmentation header
+to support the IPv6 minimum MTU requirement [RFC2460], and stateless header
+compression for IPv6 datagrams (LOWPAN_HC1 and LOWPAN_HC2) to reduce the
+relatively large IPv6 and UDP headers down to (in the best case) several bytes.
+
+In Semptember 2011 the standard update was published - [RFC6282].
+It deprecates HC1 and HC2 compression and defines IPHC encoding format which is
+used in this Linux implementation.
+
+All the code related to 6lowpan you may find in files: net/ieee802154/6lowpan.*
+
+To setup 6lowpan interface you need (busybox release > 1.17.0):
+1. Add IEEE802.15.4 interface and initialize PANid;
+2. Add 6lowpan interface by command like:
+ # ip link add link wpan0 name lowpan0 type lowpan
+3. Set MAC (if needs):
+ # ip link set lowpan0 address de:ad:be:ef:ca:fe:ba:be
+4. Bring up 'lowpan0' interface
diff --git a/Documentation/networking/ifenslave.c b/Documentation/networking/ifenslave.c
index 65968fbf1e4..ac5debb2f16 100644
--- a/Documentation/networking/ifenslave.c
+++ b/Documentation/networking/ifenslave.c
@@ -539,12 +539,14 @@ static int if_getconfig(char *ifname)
metric = 0;
} else
metric = ifr.ifr_metric;
+ printf("The result of SIOCGIFMETRIC is %d\n", metric);
strcpy(ifr.ifr_name, ifname);
if (ioctl(skfd, SIOCGIFMTU, &ifr) < 0)
mtu = 0;
else
mtu = ifr.ifr_mtu;
+ printf("The result of SIOCGIFMTU is %d\n", mtu);
strcpy(ifr.ifr_name, ifname);
if (ioctl(skfd, SIOCGIFDSTADDR, &ifr) < 0) {
diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index f049a1ca186..ad3e80e17b4 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -31,6 +31,16 @@ neigh/default/gc_thresh3 - INTEGER
when using large numbers of interfaces and when communicating
with large numbers of directly-connected peers.
+neigh/default/unres_qlen_bytes - INTEGER
+ The maximum number of bytes which may be used by packets
+ queued for each unresolved address by other network layers.
+ (added in linux 3.3)
+
+neigh/default/unres_qlen - INTEGER
+ The maximum number of packets which may be queued for each
+ unresolved address by other network layers.
+ (deprecated in linux 3.3) : use unres_qlen_bytes instead.
+
mtu_expires - INTEGER
Time, in seconds, that cached PMTU information is kept.
@@ -165,6 +175,9 @@ tcp_congestion_control - STRING
connections. The algorithm "reno" is always available, but
additional choices may be available based on kernel configuration.
Default is set as part of kernel configuration.
+ For passive connections, the listener congestion control choice
+ is inherited.
+ [see setsockopt(listenfd, SOL_TCP, TCP_CONGESTION, "name" ...) ]
tcp_cookie_size - INTEGER
Default size of TCP Cookie Transactions (TCPCT) option, that may be
@@ -282,11 +295,11 @@ tcp_max_ssthresh - INTEGER
Default: 0 (off)
tcp_max_syn_backlog - INTEGER
- Maximal number of remembered connection requests, which are
- still did not receive an acknowledgment from connecting client.
- Default value is 1024 for systems with more than 128Mb of memory,
- and 128 for low memory machines. If server suffers of overload,
- try to increase this number.
+ Maximal number of remembered connection requests, which have not
+ received an acknowledgment from connecting client.
+ The minimal value is 128 for low memory machines, and it will
+ increase in proportion to the memory of machine.
+ If server suffers from overload, try increasing this number.
tcp_max_tw_buckets - INTEGER
Maximal number of timewait sockets held by system simultaneously.
diff --git a/Documentation/networking/openvswitch.txt b/Documentation/networking/openvswitch.txt
new file mode 100644
index 00000000000..b8a048b8df3
--- /dev/null
+++ b/Documentation/networking/openvswitch.txt
@@ -0,0 +1,195 @@
+Open vSwitch datapath developer documentation
+=============================================
+
+The Open vSwitch kernel module allows flexible userspace control over
+flow-level packet processing on selected network devices. It can be
+used to implement a plain Ethernet switch, network device bonding,
+VLAN processing, network access control, flow-based network control,
+and so on.
+
+The kernel module implements multiple "datapaths" (analogous to
+bridges), each of which can have multiple "vports" (analogous to ports
+within a bridge). Each datapath also has associated with it a "flow
+table" that userspace populates with "flows" that map from keys based
+on packet headers and metadata to sets of actions. The most common
+action forwards the packet to another vport; other actions are also
+implemented.
+
+When a packet arrives on a vport, the kernel module processes it by
+extracting its flow key and looking it up in the flow table. If there
+is a matching flow, it executes the associated actions. If there is
+no match, it queues the packet to userspace for processing (as part of
+its processing, userspace will likely set up a flow to handle further
+packets of the same type entirely in-kernel).
+
+
+Flow key compatibility
+----------------------
+
+Network protocols evolve over time. New protocols become important
+and existing protocols lose their prominence. For the Open vSwitch
+kernel module to remain relevant, it must be possible for newer
+versions to parse additional protocols as part of the flow key. It
+might even be desirable, someday, to drop support for parsing
+protocols that have become obsolete. Therefore, the Netlink interface
+to Open vSwitch is designed to allow carefully written userspace
+applications to work with any version of the flow key, past or future.
+
+To support this forward and backward compatibility, whenever the
+kernel module passes a packet to userspace, it also passes along the
+flow key that it parsed from the packet. Userspace then extracts its
+own notion of a flow key from the packet and compares it against the
+kernel-provided version:
+
+ - If userspace's notion of the flow key for the packet matches the
+ kernel's, then nothing special is necessary.
+
+ - If the kernel's flow key includes more fields than the userspace
+ version of the flow key, for example if the kernel decoded IPv6
+ headers but userspace stopped at the Ethernet type (because it
+ does not understand IPv6), then again nothing special is
+ necessary. Userspace can still set up a flow in the usual way,
+ as long as it uses the kernel-provided flow key to do it.
+
+ - If the userspace flow key includes more fields than the
+ kernel's, for example if userspace decoded an IPv6 header but
+ the kernel stopped at the Ethernet type, then userspace can
+ forward the packet manually, without setting up a flow in the
+ kernel. This case is bad for performance because every packet
+ that the kernel considers part of the flow must go to userspace,
+ but the forwarding behavior is correct. (If userspace can
+ determine that the values of the extra fields would not affect
+ forwarding behavior, then it could set up a flow anyway.)
+
+How flow keys evolve over time is important to making this work, so
+the following sections go into detail.
+
+
+Flow key format
+---------------
+
+A flow key is passed over a Netlink socket as a sequence of Netlink
+attributes. Some attributes represent packet metadata, defined as any
+information about a packet that cannot be extracted from the packet
+itself, e.g. the vport on which the packet was received. Most
+attributes, however, are extracted from headers within the packet,
+e.g. source and destination addresses from Ethernet, IP, or TCP
+headers.
+
+The <linux/openvswitch.h> header file defines the exact format of the
+flow key attributes. For informal explanatory purposes here, we write
+them as comma-separated strings, with parentheses indicating arguments
+and nesting. For example, the following could represent a flow key
+corresponding to a TCP packet that arrived on vport 1:
+
+ in_port(1), eth(src=e0:91:f5:21:d0:b2, dst=00:02:e3:0f:80:a4),
+ eth_type(0x0800), ipv4(src=172.16.0.20, dst=172.18.0.52, proto=17, tos=0,
+ frag=no), tcp(src=49163, dst=80)
+
+Often we ellipsize arguments not important to the discussion, e.g.:
+
+ in_port(1), eth(...), eth_type(0x0800), ipv4(...), tcp(...)
+
+
+Basic rule for evolving flow keys
+---------------------------------
+
+Some care is needed to really maintain forward and backward
+compatibility for applications that follow the rules listed under
+"Flow key compatibility" above.
+
+The basic rule is obvious:
+
+ ------------------------------------------------------------------
+ New network protocol support must only supplement existing flow
+ key attributes. It must not change the meaning of already defined
+ flow key attributes.
+ ------------------------------------------------------------------
+
+This rule does have less-obvious consequences so it is worth working
+through a few examples. Suppose, for example, that the kernel module
+did not already implement VLAN parsing. Instead, it just interpreted
+the 802.1Q TPID (0x8100) as the Ethertype then stopped parsing the
+packet. The flow key for any packet with an 802.1Q header would look
+essentially like this, ignoring metadata:
+
+ eth(...), eth_type(0x8100)
+
+Naively, to add VLAN support, it makes sense to add a new "vlan" flow
+key attribute to contain the VLAN tag, then continue to decode the
+encapsulated headers beyond the VLAN tag using the existing field
+definitions. With this change, an TCP packet in VLAN 10 would have a
+flow key much like this:
+
+ eth(...), vlan(vid=10, pcp=0), eth_type(0x0800), ip(proto=6, ...), tcp(...)
+
+But this change would negatively affect a userspace application that
+has not been updated to understand the new "vlan" flow key attribute.
+The application could, following the flow compatibility rules above,
+ignore the "vlan" attribute that it does not understand and therefore
+assume that the flow contained IP packets. This is a bad assumption
+(the flow only contains IP packets if one parses and skips over the
+802.1Q header) and it could cause the application's behavior to change
+across kernel versions even though it follows the compatibility rules.
+
+The solution is to use a set of nested attributes. This is, for
+example, why 802.1Q support uses nested attributes. A TCP packet in
+VLAN 10 is actually expressed as:
+
+ eth(...), eth_type(0x8100), vlan(vid=10, pcp=0), encap(eth_type(0x0800),
+ ip(proto=6, ...), tcp(...)))
+
+Notice how the "eth_type", "ip", and "tcp" flow key attributes are
+nested inside the "encap" attribute. Thus, an application that does
+not understand the "vlan" key will not see either of those attributes
+and therefore will not misinterpret them. (Also, the outer eth_type
+is still 0x8100, not changed to 0x0800.)
+
+Handling malformed packets
+--------------------------
+
+Don't drop packets in the kernel for malformed protocol headers, bad
+checksums, etc. This would prevent userspace from implementing a
+simple Ethernet switch that forwards every packet.
+
+Instead, in such a case, include an attribute with "empty" content.
+It doesn't matter if the empty content could be valid protocol values,
+as long as those values are rarely seen in practice, because userspace
+can always forward all packets with those values to userspace and
+handle them individually.
+
+For example, consider a packet that contains an IP header that
+indicates protocol 6 for TCP, but which is truncated just after the IP
+header, so that the TCP header is missing. The flow key for this
+packet would include a tcp attribute with all-zero src and dst, like
+this:
+
+ eth(...), eth_type(0x0800), ip(proto=6, ...), tcp(src=0, dst=0)
+
+As another example, consider a packet with an Ethernet type of 0x8100,
+indicating that a VLAN TCI should follow, but which is truncated just
+after the Ethernet type. The flow key for this packet would include
+an all-zero-bits vlan and an empty encap attribute, like this:
+
+ eth(...), eth_type(0x8100), vlan(0), encap()
+
+Unlike a TCP packet with source and destination ports 0, an
+all-zero-bits VLAN TCI is not that rare, so the CFI bit (aka
+VLAN_TAG_PRESENT inside the kernel) is ordinarily set in a vlan
+attribute expressly to allow this situation to be distinguished.
+Thus, the flow key in this second example unambiguously indicates a
+missing or malformed VLAN TCI.
+
+Other rules
+-----------
+
+The other rules for flow keys are much less subtle:
+
+ - Duplicate attributes are not allowed at a given nesting level.
+
+ - Ordering of attributes is not significant.
+
+ - When the kernel sends a given flow key to userspace, it always
+ composes it the same way. This allows userspace to hash and
+ compare entire flow keys that it may not be able to fully
+ interpret.
diff --git a/Documentation/networking/packet_mmap.txt b/Documentation/networking/packet_mmap.txt
index 4acea660372..1c08a4b0981 100644
--- a/Documentation/networking/packet_mmap.txt
+++ b/Documentation/networking/packet_mmap.txt
@@ -155,7 +155,7 @@ As capture, each frame contains two parts:
/* fill sockaddr_ll struct to prepare binding */
my_addr.sll_family = AF_PACKET;
- my_addr.sll_protocol = ETH_P_ALL;
+ my_addr.sll_protocol = htons(ETH_P_ALL);
my_addr.sll_ifindex = s_ifr.ifr_ifindex;
/* bind socket to eth0 */
diff --git a/Documentation/networking/scaling.txt b/Documentation/networking/scaling.txt
index a177de21d28..579994afbe0 100644
--- a/Documentation/networking/scaling.txt
+++ b/Documentation/networking/scaling.txt
@@ -208,7 +208,7 @@ The counter in rps_dev_flow_table values records the length of the current
CPU's backlog when a packet in this flow was last enqueued. Each backlog
queue has a head counter that is incremented on dequeue. A tail counter
is computed as head counter + queue length. In other words, the counter
-in rps_dev_flow_table[i] records the last element in flow i that has
+in rps_dev_flow[i] records the last element in flow i that has
been enqueued onto the currently designated CPU for flow i (of course,
entry i is actually selected by hash and multiple flows may hash to the
same entry i).
@@ -224,7 +224,7 @@ following is true:
- The current CPU's queue head counter >= the recorded tail counter
value in rps_dev_flow[i]
-- The current CPU is unset (equal to NR_CPUS)
+- The current CPU is unset (equal to RPS_NO_CPU)
- The current CPU is offline
After this check, the packet is sent to the (possibly updated) current
@@ -235,7 +235,7 @@ CPU.
==== RFS Configuration
-RFS is only available if the kconfig symbol CONFIG_RFS is enabled (on
+RFS is only available if the kconfig symbol CONFIG_RPS is enabled (on
by default for SMP). The functionality remains disabled until explicitly
configured. The number of entries in the global flow table is set through:
@@ -258,7 +258,7 @@ For a single queue device, the rps_flow_cnt value for the single queue
would normally be configured to the same value as rps_sock_flow_entries.
For a multi-queue device, the rps_flow_cnt for each queue might be
configured as rps_sock_flow_entries / N, where N is the number of
-queues. So for instance, if rps_flow_entries is set to 32768 and there
+queues. So for instance, if rps_sock_flow_entries is set to 32768 and there
are 16 configured receive queues, rps_flow_cnt for each queue might be
configured as 2048.
diff --git a/Documentation/networking/stmmac.txt b/Documentation/networking/stmmac.txt
index 8d67980fabe..d0aeeadd264 100644
--- a/Documentation/networking/stmmac.txt
+++ b/Documentation/networking/stmmac.txt
@@ -4,14 +4,16 @@ Copyright (C) 2007-2010 STMicroelectronics Ltd
Author: Giuseppe Cavallaro <peppe.cavallaro@st.com>
This is the driver for the MAC 10/100/1000 on-chip Ethernet controllers
-(Synopsys IP blocks); it has been fully tested on STLinux platforms.
+(Synopsys IP blocks).
Currently this network device driver is for all STM embedded MAC/GMAC
-(i.e. 7xxx/5xxx SoCs) and it's known working on other platforms i.e. ARM SPEAr.
+(i.e. 7xxx/5xxx SoCs), SPEAr (arm), Loongson1B (mips) and XLINX XC2V3000
+FF1152AMT0221 D1215994A VIRTEX FPGA board.
-DWC Ether MAC 10/100/1000 Universal version 3.41a and DWC Ether MAC 10/100
-Universal version 4.0 have been used for developing the first code
-implementation.
+DWC Ether MAC 10/100/1000 Universal version 3.60a (and older) and DWC Ether MAC 10/100
+Universal version 4.0 have been used for developing this driver.
+
+This driver supports both the platform bus and PCI.
Please, for more information also visit: www.stlinux.com
@@ -277,5 +279,5 @@ In fact, these can generate an huge amount of debug messages.
6) TODO:
o XGMAC is not supported.
- o Review the timer optimisation code to use an embedded device that will be
- available in new chip generations.
+ o Add the EEE - Energy Efficient Ethernet
+ o Add the PTP - precision time protocol
diff --git a/Documentation/networking/team.txt b/Documentation/networking/team.txt
new file mode 100644
index 00000000000..5a013686b9e
--- /dev/null
+++ b/Documentation/networking/team.txt
@@ -0,0 +1,2 @@
+Team devices are driven from userspace via libteam library which is here:
+ https://github.com/jpirko/libteam
diff --git a/Documentation/power/devices.txt b/Documentation/power/devices.txt
index 646a89e0c07..3139fb505dc 100644
--- a/Documentation/power/devices.txt
+++ b/Documentation/power/devices.txt
@@ -123,9 +123,10 @@ please refer directly to the source code for more information about it.
Subsystem-Level Methods
-----------------------
The core methods to suspend and resume devices reside in struct dev_pm_ops
-pointed to by the pm member of struct bus_type, struct device_type and
-struct class. They are mostly of interest to the people writing infrastructure
-for buses, like PCI or USB, or device type and device class drivers.
+pointed to by the ops member of struct dev_pm_domain, or by the pm member of
+struct bus_type, struct device_type and struct class. They are mostly of
+interest to the people writing infrastructure for platforms and buses, like PCI
+or USB, or device type and device class drivers.
Bus drivers implement these methods as appropriate for the hardware and the
drivers using it; PCI works differently from USB, and so on. Not many people
@@ -139,41 +140,57 @@ sequencing in the driver model tree.
/sys/devices/.../power/wakeup files
-----------------------------------
-All devices in the driver model have two flags to control handling of wakeup
-events (hardware signals that can force the device and/or system out of a low
-power state). These flags are initialized by bus or device driver code using
+All device objects in the driver model contain fields that control the handling
+of system wakeup events (hardware signals that can force the system out of a
+sleep state). These fields are initialized by bus or device driver code using
device_set_wakeup_capable() and device_set_wakeup_enable(), defined in
include/linux/pm_wakeup.h.
-The "can_wakeup" flag just records whether the device (and its driver) can
+The "power.can_wakeup" flag just records whether the device (and its driver) can
physically support wakeup events. The device_set_wakeup_capable() routine
-affects this flag. The "should_wakeup" flag controls whether the device should
-try to use its wakeup mechanism. device_set_wakeup_enable() affects this flag;
-for the most part drivers should not change its value. The initial value of
-should_wakeup is supposed to be false for the majority of devices; the major
-exceptions are power buttons, keyboards, and Ethernet adapters whose WoL
-(wake-on-LAN) feature has been set up with ethtool. It should also default
-to true for devices that don't generate wakeup requests on their own but merely
-forward wakeup requests from one bus to another (like PCI bridges).
+affects this flag. The "power.wakeup" field is a pointer to an object of type
+struct wakeup_source used for controlling whether or not the device should use
+its system wakeup mechanism and for notifying the PM core of system wakeup
+events signaled by the device. This object is only present for wakeup-capable
+devices (i.e. devices whose "can_wakeup" flags are set) and is created (or
+removed) by device_set_wakeup_capable().
Whether or not a device is capable of issuing wakeup events is a hardware
matter, and the kernel is responsible for keeping track of it. By contrast,
whether or not a wakeup-capable device should issue wakeup events is a policy
decision, and it is managed by user space through a sysfs attribute: the
-power/wakeup file. User space can write the strings "enabled" or "disabled" to
-set or clear the "should_wakeup" flag, respectively. This file is only present
-for wakeup-capable devices (i.e. devices whose "can_wakeup" flags are set)
-and is created (or removed) by device_set_wakeup_capable(). Reads from the
-file will return the corresponding string.
-
-The device_may_wakeup() routine returns true only if both flags are set.
+"power/wakeup" file. User space can write the strings "enabled" or "disabled"
+to it to indicate whether or not, respectively, the device is supposed to signal
+system wakeup. This file is only present if the "power.wakeup" object exists
+for the given device and is created (or removed) along with that object, by
+device_set_wakeup_capable(). Reads from the file will return the corresponding
+string.
+
+The "power/wakeup" file is supposed to contain the "disabled" string initially
+for the majority of devices; the major exceptions are power buttons, keyboards,
+and Ethernet adapters whose WoL (wake-on-LAN) feature has been set up with
+ethtool. It should also default to "enabled" for devices that don't generate
+wakeup requests on their own but merely forward wakeup requests from one bus to
+another (like PCI Express ports).
+
+The device_may_wakeup() routine returns true only if the "power.wakeup" object
+exists and the corresponding "power/wakeup" file contains the string "enabled".
This information is used by subsystems, like the PCI bus type code, to see
whether or not to enable the devices' wakeup mechanisms. If device wakeup
mechanisms are enabled or disabled directly by drivers, they also should use
device_may_wakeup() to decide what to do during a system sleep transition.
-However for runtime power management, wakeup events should be enabled whenever
-the device and driver both support them, regardless of the should_wakeup flag.
-
+Device drivers, however, are not supposed to call device_set_wakeup_enable()
+directly in any case.
+
+It ought to be noted that system wakeup is conceptually different from "remote
+wakeup" used by runtime power management, although it may be supported by the
+same physical mechanism. Remote wakeup is a feature allowing devices in
+low-power states to trigger specific interrupts to signal conditions in which
+they should be put into the full-power state. Those interrupts may or may not
+be used to signal system wakeup events, depending on the hardware design. On
+some systems it is impossible to trigger them from system sleep states. In any
+case, remote wakeup should always be enabled for runtime power management for
+all devices and drivers that support it.
/sys/devices/.../power/control files
------------------------------------
@@ -249,20 +266,31 @@ for every device before the next phase begins. Not all busses or classes
support all these callbacks and not all drivers use all the callbacks. The
various phases always run after tasks have been frozen and before they are
unfrozen. Furthermore, the *_noirq phases run at a time when IRQ handlers have
-been disabled (except for those marked with the IRQ_WAKEUP flag).
-
-All phases use bus, type, or class callbacks (that is, methods defined in
-dev->bus->pm, dev->type->pm, or dev->class->pm). These callbacks are mutually
-exclusive, so if the device type provides a struct dev_pm_ops object pointed to
-by its pm field (i.e. both dev->type and dev->type->pm are defined), the
-callbacks included in that object (i.e. dev->type->pm) will be used. Otherwise,
-if the class provides a struct dev_pm_ops object pointed to by its pm field
-(i.e. both dev->class and dev->class->pm are defined), the PM core will use the
-callbacks from that object (i.e. dev->class->pm). Finally, if the pm fields of
-both the device type and class objects are NULL (or those objects do not exist),
-the callbacks provided by the bus (that is, the callbacks from dev->bus->pm)
-will be used (this allows device types to override callbacks provided by bus
-types or classes if necessary).
+been disabled (except for those marked with the IRQF_NO_SUSPEND flag).
+
+All phases use PM domain, bus, type, or class callbacks (that is, methods
+defined in dev->pm_domain->ops, dev->bus->pm, dev->type->pm, or dev->class->pm).
+These callbacks are regarded by the PM core as mutually exclusive. Moreover,
+PM domain callbacks always take precedence over bus, type and class callbacks,
+while type callbacks take precedence over bus and class callbacks, and class
+callbacks take precedence over bus callbacks. To be precise, the following
+rules are used to determine which callback to execute in the given phase:
+
+ 1. If dev->pm_domain is present, the PM core will attempt to execute the
+ callback included in dev->pm_domain->ops. If that callback is not
+ present, no action will be carried out for the given device.
+
+ 2. Otherwise, if both dev->type and dev->type->pm are present, the callback
+ included in dev->type->pm will be executed.
+
+ 3. Otherwise, if both dev->class and dev->class->pm are present, the
+ callback included in dev->class->pm will be executed.
+
+ 4. Otherwise, if both dev->bus and dev->bus->pm are present, the callback
+ included in dev->bus->pm will be executed.
+
+This allows PM domains and device types to override callbacks provided by bus
+types or device classes if necessary.
These callbacks may in turn invoke device- or driver-specific methods stored in
dev->driver->pm, but they don't have to.
@@ -283,9 +311,8 @@ When the system goes into the standby or memory sleep state, the phases are:
After the prepare callback method returns, no new children may be
registered below the device. The method may also prepare the device or
- driver in some way for the upcoming system power transition (for
- example, by allocating additional memory required for this purpose), but
- it should not put the device into a low-power state.
+ driver in some way for the upcoming system power transition, but it
+ should not put the device into a low-power state.
2. The suspend methods should quiesce the device to stop it from performing
I/O. They also may save the device registers and put it into the
diff --git a/Documentation/power/runtime_pm.txt b/Documentation/power/runtime_pm.txt
index 5336149f831..c2ae8bf77d4 100644
--- a/Documentation/power/runtime_pm.txt
+++ b/Documentation/power/runtime_pm.txt
@@ -44,25 +44,33 @@ struct dev_pm_ops {
};
The ->runtime_suspend(), ->runtime_resume() and ->runtime_idle() callbacks
-are executed by the PM core for either the power domain, or the device type
-(if the device power domain's struct dev_pm_ops does not exist), or the class
-(if the device power domain's and type's struct dev_pm_ops object does not
-exist), or the bus type (if the device power domain's, type's and class'
-struct dev_pm_ops objects do not exist) of the given device, so the priority
-order of callbacks from high to low is that power domain callbacks, device
-type callbacks, class callbacks and bus type callbacks, and the high priority
-one will take precedence over low priority one. The bus type, device type and
-class callbacks are referred to as subsystem-level callbacks in what follows,
-and generally speaking, the power domain callbacks are used for representing
-power domains within a SoC.
+are executed by the PM core for the device's subsystem that may be either of
+the following:
+
+ 1. PM domain of the device, if the device's PM domain object, dev->pm_domain,
+ is present.
+
+ 2. Device type of the device, if both dev->type and dev->type->pm are present.
+
+ 3. Device class of the device, if both dev->class and dev->class->pm are
+ present.
+
+ 4. Bus type of the device, if both dev->bus and dev->bus->pm are present.
+
+The PM core always checks which callback to use in the order given above, so the
+priority order of callbacks from high to low is: PM domain, device type, class
+and bus type. Moreover, the high-priority one will always take precedence over
+a low-priority one. The PM domain, bus type, device type and class callbacks
+are referred to as subsystem-level callbacks in what follows.
By default, the callbacks are always invoked in process context with interrupts
enabled. However, subsystems can use the pm_runtime_irq_safe() helper function
-to tell the PM core that a device's ->runtime_suspend() and ->runtime_resume()
-callbacks should be invoked in atomic context with interrupts disabled.
-This implies that these callback routines must not block or sleep, but it also
-means that the synchronous helper functions listed at the end of Section 4 can
-be used within an interrupt handler or in an atomic context.
+to tell the PM core that their ->runtime_suspend(), ->runtime_resume() and
+->runtime_idle() callbacks may be invoked in atomic context with interrupts
+disabled for a given device. This implies that the callback routines in
+question must not block or sleep, but it also means that the synchronous helper
+functions listed at the end of Section 4 may be used for that device within an
+interrupt handler or generally in an atomic context.
The subsystem-level suspend callback is _entirely_ _responsible_ for handling
the suspend of the device as appropriate, which may, but need not include
diff --git a/Documentation/sound/alsa/soc/machine.txt b/Documentation/sound/alsa/soc/machine.txt
index 3e2ec9cbf39..d50c14df341 100644
--- a/Documentation/sound/alsa/soc/machine.txt
+++ b/Documentation/sound/alsa/soc/machine.txt
@@ -50,8 +50,7 @@ Machine DAI Configuration
The machine DAI configuration glues all the codec and CPU DAIs together. It can
also be used to set up the DAI system clock and for any machine related DAI
initialisation e.g. the machine audio map can be connected to the codec audio
-map, unconnected codec pins can be set as such. Please see corgi.c, spitz.c
-for examples.
+map, unconnected codec pins can be set as such.
struct snd_soc_dai_link is used to set up each DAI in your machine. e.g.
@@ -83,8 +82,7 @@ Machine Power Map
The machine driver can optionally extend the codec power map and to become an
audio power map of the audio subsystem. This allows for automatic power up/down
of speaker/HP amplifiers, etc. Codec pins can be connected to the machines jack
-sockets in the machine init function. See soc/pxa/spitz.c and dapm.txt for
-details.
+sockets in the machine init function.
Machine Controls
diff --git a/Documentation/trace/events.txt b/Documentation/trace/events.txt
index b510564aac7..bb24c2a0e87 100644
--- a/Documentation/trace/events.txt
+++ b/Documentation/trace/events.txt
@@ -191,8 +191,6 @@ And for string fields they are:
Currently, only exact string matches are supported.
-Currently, the maximum number of predicates in a filter is 16.
-
5.2 Setting filters
-------------------
diff --git a/Documentation/usb/linux-cdc-acm.inf b/Documentation/usb/linux-cdc-acm.inf
index 37a02ce5484..f0ffc27d4c0 100644
--- a/Documentation/usb/linux-cdc-acm.inf
+++ b/Documentation/usb/linux-cdc-acm.inf
@@ -90,10 +90,10 @@ ServiceBinary=%12%\USBSER.sys
[SourceDisksFiles]
[SourceDisksNames]
[DeviceList]
-%DESCRIPTION%=DriverInstall, USB\VID_0525&PID_A4A7, USB\VID_1D6B&PID_0104&MI_02
+%DESCRIPTION%=DriverInstall, USB\VID_0525&PID_A4A7, USB\VID_1D6B&PID_0104&MI_02, USB\VID_1D6B&PID_0106&MI_00
[DeviceList.NTamd64]
-%DESCRIPTION%=DriverInstall, USB\VID_0525&PID_A4A7, USB\VID_1D6B&PID_0104&MI_02
+%DESCRIPTION%=DriverInstall, USB\VID_0525&PID_A4A7, USB\VID_1D6B&PID_0104&MI_02, USB\VID_1D6B&PID_0106&MI_00
;------------------------------------------------------------------------------
diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index 7945b0bd35e..e2a4b528736 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -1100,6 +1100,15 @@ emulate them efficiently. The fields in each entry are defined as follows:
eax, ebx, ecx, edx: the values returned by the cpuid instruction for
this function/index combination
+The TSC deadline timer feature (CPUID leaf 1, ecx[24]) is always returned
+as false, since the feature depends on KVM_CREATE_IRQCHIP for local APIC
+support. Instead it is reported via
+
+ ioctl(KVM_CHECK_EXTENSION, KVM_CAP_TSC_DEADLINE_TIMER)
+
+if that returns true and you use KVM_CREATE_IRQCHIP, or if you emulate the
+feature in userspace, then you can enable the feature for KVM_SET_CPUID2.
+
4.47 KVM_PPC_GET_PVINFO
Capability: KVM_CAP_PPC_GET_PVINFO
@@ -1151,6 +1160,13 @@ following flags are specified:
/* Depends on KVM_CAP_IOMMU */
#define KVM_DEV_ASSIGN_ENABLE_IOMMU (1 << 0)
+The KVM_DEV_ASSIGN_ENABLE_IOMMU flag is a mandatory option to ensure
+isolation of the device. Usages not specifying this flag are deprecated.
+
+Only PCI header type 0 devices with PCI BAR resources are supported by
+device assignment. The user requesting this ioctl must have read/write
+access to the PCI sysfs resource files associated with the device.
+
4.49 KVM_DEASSIGN_PCI_DEVICE
Capability: KVM_CAP_DEVICE_DEASSIGNMENT