diff options
Diffstat (limited to 'Documentation/admin-guide')
-rw-r--r-- | Documentation/admin-guide/cgroup-v1/memory.rst | 2 | ||||
-rw-r--r-- | Documentation/admin-guide/kdump/vmcoreinfo.rst | 14 | ||||
-rw-r--r-- | Documentation/admin-guide/mm/damon/usage.rst | 76 | ||||
-rw-r--r-- | Documentation/admin-guide/mm/ksm.rst | 27 | ||||
-rw-r--r-- | Documentation/admin-guide/mm/memory-hotplug.rst | 14 | ||||
-rw-r--r-- | Documentation/admin-guide/mm/userfaultfd.rst | 15 | ||||
-rw-r--r-- | Documentation/admin-guide/mm/zswap.rst | 14 |
7 files changed, 109 insertions, 53 deletions
diff --git a/Documentation/admin-guide/cgroup-v1/memory.rst b/Documentation/admin-guide/cgroup-v1/memory.rst index fabaad3fd9c2..8d3afeede10e 100644 --- a/Documentation/admin-guide/cgroup-v1/memory.rst +++ b/Documentation/admin-guide/cgroup-v1/memory.rst @@ -92,8 +92,6 @@ Brief summary of control files. memory.oom_control set/show oom controls. memory.numa_stat show the number of memory usage per numa node - memory.kmem.limit_in_bytes This knob is deprecated and writing to - it will return -ENOTSUPP. memory.kmem.usage_in_bytes show current kernel memory allocation memory.kmem.failcnt show the number of kernel memory usage hits limits diff --git a/Documentation/admin-guide/kdump/vmcoreinfo.rst b/Documentation/admin-guide/kdump/vmcoreinfo.rst index f8ebb63b6c5d..599e8d3bcbc3 100644 --- a/Documentation/admin-guide/kdump/vmcoreinfo.rst +++ b/Documentation/admin-guide/kdump/vmcoreinfo.rst @@ -141,8 +141,8 @@ nodemask_t The size of a nodemask_t type. Used to compute the number of online nodes. -(page, flags|_refcount|mapping|lru|_mapcount|private|compound_dtor|compound_order|compound_head) -------------------------------------------------------------------------------------------------- +(page, flags|_refcount|mapping|lru|_mapcount|private|compound_order|compound_head) +---------------------------------------------------------------------------------- User-space tools compute their values based on the offset of these variables. The variables are used when excluding unnecessary pages. @@ -325,8 +325,8 @@ NR_FREE_PAGES On linux-2.6.21 or later, the number of free pages is in vm_stat[NR_FREE_PAGES]. Used to get the number of free pages. -PG_lru|PG_private|PG_swapcache|PG_swapbacked|PG_slab|PG_hwpoision|PG_head_mask ------------------------------------------------------------------------------- +PG_lru|PG_private|PG_swapcache|PG_swapbacked|PG_slab|PG_hwpoision|PG_head_mask|PG_hugetlb +----------------------------------------------------------------------------------------- Page attributes. These flags are used to filter various unnecessary for dumping pages. @@ -338,12 +338,6 @@ More page attributes. These flags are used to filter various unnecessary for dumping pages. -HUGETLB_PAGE_DTOR ------------------ - -The HUGETLB_PAGE_DTOR flag denotes hugetlbfs pages. Makedumpfile -excludes these pages. - x86_64 ====== diff --git a/Documentation/admin-guide/mm/damon/usage.rst b/Documentation/admin-guide/mm/damon/usage.rst index 2d495fa85a0e..084f0a32b421 100644 --- a/Documentation/admin-guide/mm/damon/usage.rst +++ b/Documentation/admin-guide/mm/damon/usage.rst @@ -87,7 +87,7 @@ comma (","). :: │ │ │ │ │ │ │ filters/nr_filters │ │ │ │ │ │ │ │ 0/type,matching,memcg_id │ │ │ │ │ │ │ stats/nr_tried,sz_tried,nr_applied,sz_applied,qt_exceeds - │ │ │ │ │ │ │ tried_regions/ + │ │ │ │ │ │ │ tried_regions/total_bytes │ │ │ │ │ │ │ │ 0/start,end,nr_accesses,age │ │ │ │ │ │ │ │ ... │ │ │ │ │ │ ... @@ -127,14 +127,18 @@ in the state. Writing ``commit`` to the ``state`` file makes kdamond reads the user inputs in the sysfs files except ``state`` file again. Writing ``update_schemes_stats`` to ``state`` file updates the contents of stats files for each DAMON-based operation scheme of the kdamond. For details of the -stats, please refer to :ref:`stats section <sysfs_schemes_stats>`. Writing -``update_schemes_tried_regions`` to ``state`` file updates the DAMON-based -operation scheme action tried regions directory for each DAMON-based operation -scheme of the kdamond. Writing ``clear_schemes_tried_regions`` to ``state`` -file clears the DAMON-based operating scheme action tried regions directory for -each DAMON-based operation scheme of the kdamond. For details of the -DAMON-based operation scheme action tried regions directory, please refer to -:ref:`tried_regions section <sysfs_schemes_tried_regions>`. +stats, please refer to :ref:`stats section <sysfs_schemes_stats>`. + +Writing ``update_schemes_tried_regions`` to ``state`` file updates the +DAMON-based operation scheme action tried regions directory for each +DAMON-based operation scheme of the kdamond. Writing +``update_schemes_tried_bytes`` to ``state`` file updates only +``.../tried_regions/total_bytes`` files. Writing +``clear_schemes_tried_regions`` to ``state`` file clears the DAMON-based +operating scheme action tried regions directory for each DAMON-based operation +scheme of the kdamond. For details of the DAMON-based operation scheme action +tried regions directory, please refer to :ref:`tried_regions section +<sysfs_schemes_tried_regions>`. If the state is ``on``, reading ``pid`` shows the pid of the kdamond thread. @@ -359,15 +363,21 @@ number (``N``) to the file creates the number of child directories named ``0`` to ``N-1``. Each directory represents each filter. The filters are evaluated in the numeric order. -Each filter directory contains three files, namely ``type``, ``matcing``, and -``memcg_path``. You can write one of two special keywords, ``anon`` for -anonymous pages, or ``memcg`` for specific memory cgroup filtering. In case of -the memory cgroup filtering, you can specify the memory cgroup of the interest -by writing the path of the memory cgroup from the cgroups mount point to -``memcg_path`` file. You can write ``Y`` or ``N`` to ``matching`` file to -filter out pages that does or does not match to the type, respectively. Then, -the scheme's action will not be applied to the pages that specified to be -filtered out. +Each filter directory contains six files, namely ``type``, ``matcing``, +``memcg_path``, ``addr_start``, ``addr_end``, and ``target_idx``. To ``type`` +file, you can write one of four special keywords: ``anon`` for anonymous pages, +``memcg`` for specific memory cgroup, ``addr`` for specific address range (an +open-ended interval), or ``target`` for specific DAMON monitoring target +filtering. In case of the memory cgroup filtering, you can specify the memory +cgroup of the interest by writing the path of the memory cgroup from the +cgroups mount point to ``memcg_path`` file. In case of the address range +filtering, you can specify the start and end address of the range to +``addr_start`` and ``addr_end`` files, respectively. For the DAMON monitoring +target filtering, you can specify the index of the target between the list of +the DAMON context's monitoring targets list to ``target_idx`` file. You can +write ``Y`` or ``N`` to ``matching`` file to filter out pages that does or does +not match to the type, respectively. Then, the scheme's action will not be +applied to the pages that specified to be filtered out. For example, below restricts a DAMOS action to be applied to only non-anonymous pages of all memory cgroups except ``/having_care_already``.:: @@ -381,8 +391,14 @@ pages of all memory cgroups except ``/having_care_already``.:: echo /having_care_already > 1/memcg_path echo N > 1/matching -Note that filters are currently supported only when ``paddr`` -`implementation <sysfs_contexts>` is being used. +Note that ``anon`` and ``memcg`` filters are currently supported only when +``paddr`` `implementation <sysfs_contexts>` is being used. + +Also, memory regions that are filtered out by ``addr`` or ``target`` filters +are not counted as the scheme has tried to those, while regions that filtered +out by other type filters are counted as the scheme has tried to. The +difference is applied to :ref:`stats <damos_stats>` and +:ref:`tried regions <sysfs_schemes_tried_regions>`. .. _sysfs_schemes_stats: @@ -406,13 +422,21 @@ stats by writing a special keyword, ``update_schemes_stats`` to the relevant schemes/<N>/tried_regions/ -------------------------- +This directory initially has one file, ``total_bytes``. + When a special keyword, ``update_schemes_tried_regions``, is written to the -relevant ``kdamonds/<N>/state`` file, DAMON creates directories named integer -starting from ``0`` under this directory. Each directory contains files -exposing detailed information about each of the memory region that the -corresponding scheme's ``action`` has tried to be applied under this directory, -during next :ref:`aggregation interval <sysfs_monitoring_attrs>`. The -information includes address range, ``nr_accesses``, and ``age`` of the region. +relevant ``kdamonds/<N>/state`` file, DAMON updates the ``total_bytes`` file so +that reading it returns the total size of the scheme tried regions, and creates +directories named integer starting from ``0`` under this directory. Each +directory contains files exposing detailed information about each of the memory +region that the corresponding scheme's ``action`` has tried to be applied under +this directory, during next :ref:`aggregation interval +<sysfs_monitoring_attrs>`. The information includes address range, +``nr_accesses``, and ``age`` of the region. + +Writing ``update_schemes_tried_bytes`` to the relevant ``kdamonds/<N>/state`` +file will only update the ``total_bytes`` file, and will not create the +subdirectories. The directories will be removed when another special keyword, ``clear_schemes_tried_regions``, is written to the relevant diff --git a/Documentation/admin-guide/mm/ksm.rst b/Documentation/admin-guide/mm/ksm.rst index 7626392fe82c..776f244bdae4 100644 --- a/Documentation/admin-guide/mm/ksm.rst +++ b/Documentation/admin-guide/mm/ksm.rst @@ -159,6 +159,8 @@ The effectiveness of KSM and MADV_MERGEABLE is shown in ``/sys/kernel/mm/ksm/``: general_profit how effective is KSM. The calculation is explained below. +pages_scanned + how many pages are being scanned for ksm pages_shared how many shared pages are being used pages_sharing @@ -173,6 +175,13 @@ stable_node_chains the number of KSM pages that hit the ``max_page_sharing`` limit stable_node_dups number of duplicated KSM pages +ksm_zero_pages + how many zero pages that are still mapped into processes were mapped by + KSM when deduplicating. + +When ``use_zero_pages`` is/was enabled, the sum of ``pages_sharing`` + +``ksm_zero_pages`` represents the actual number of pages saved by KSM. +if ``use_zero_pages`` has never been enabled, ``ksm_zero_pages`` is 0. A high ratio of ``pages_sharing`` to ``pages_shared`` indicates good sharing, but a high ratio of ``pages_unshared`` to ``pages_sharing`` @@ -196,21 +205,25 @@ several times, which are unprofitable memory consumed. 1) How to determine whether KSM save memory or consume memory in system-wide range? Here is a simple approximate calculation for reference:: - general_profit =~ pages_sharing * sizeof(page) - (all_rmap_items) * + general_profit =~ ksm_saved_pages * sizeof(page) - (all_rmap_items) * sizeof(rmap_item); - where all_rmap_items can be easily obtained by summing ``pages_sharing``, - ``pages_shared``, ``pages_unshared`` and ``pages_volatile``. + where ksm_saved_pages equals to the sum of ``pages_sharing`` + + ``ksm_zero_pages`` of the system, and all_rmap_items can be easily + obtained by summing ``pages_sharing``, ``pages_shared``, ``pages_unshared`` + and ``pages_volatile``. 2) The KSM profit inner a single process can be similarly obtained by the following approximate calculation:: - process_profit =~ ksm_merging_pages * sizeof(page) - + process_profit =~ ksm_saved_pages * sizeof(page) - ksm_rmap_items * sizeof(rmap_item). - where ksm_merging_pages is shown under the directory ``/proc/<pid>/``, - and ksm_rmap_items is shown in ``/proc/<pid>/ksm_stat``. The process profit - is also shown in ``/proc/<pid>/ksm_stat`` as ksm_process_profit. + where ksm_saved_pages equals to the sum of ``ksm_merging_pages`` and + ``ksm_zero_pages``, both of which are shown under the directory + ``/proc/<pid>/ksm_stat``, and ksm_rmap_items is also shown in + ``/proc/<pid>/ksm_stat``. The process profit is also shown in + ``/proc/<pid>/ksm_stat`` as ksm_process_profit. From the perspective of application, a high ratio of ``ksm_rmap_items`` to ``ksm_merging_pages`` means a bad madvise-applied policy, so developers or diff --git a/Documentation/admin-guide/mm/memory-hotplug.rst b/Documentation/admin-guide/mm/memory-hotplug.rst index 1b02fe5807cc..2994958c7ce8 100644 --- a/Documentation/admin-guide/mm/memory-hotplug.rst +++ b/Documentation/admin-guide/mm/memory-hotplug.rst @@ -433,6 +433,18 @@ The following module parameters are currently defined: memory in a way that huge pages in bigger granularity cannot be formed on hotplugged memory. + + With value "force" it could result in memory + wastage due to memmap size limitations. For + example, if the memmap for a memory block + requires 1 MiB, but the pageblock size is 2 + MiB, 1 MiB of hotplugged memory will be wasted. + Note that there are still cases where the + feature cannot be enforced: for example, if the + memmap is smaller than a single page, or if the + architecture does not support the forced mode + in all configurations. + ``online_policy`` read-write: Set the basic policy used for automatic zone selection when onlining memory blocks without specifying a target zone. @@ -669,7 +681,7 @@ when still encountering permanently unmovable pages within ZONE_MOVABLE (-> BUG), memory offlining will keep retrying until it eventually succeeds. When offlining is triggered from user space, the offlining context can be -terminated by sending a fatal signal. A timeout based offlining can easily be +terminated by sending a signal. A timeout based offlining can easily be implemented via:: % timeout $TIMEOUT offline_block | failure_handling diff --git a/Documentation/admin-guide/mm/userfaultfd.rst b/Documentation/admin-guide/mm/userfaultfd.rst index 7c304e432205..4349a8c2b978 100644 --- a/Documentation/admin-guide/mm/userfaultfd.rst +++ b/Documentation/admin-guide/mm/userfaultfd.rst @@ -244,6 +244,21 @@ write-protected (so future writes will also result in a WP fault). These ioctls support a mode flag (``UFFDIO_COPY_MODE_WP`` or ``UFFDIO_CONTINUE_MODE_WP`` respectively) to configure the mapping this way. +Memory Poisioning Emulation +--------------------------- + +In response to a fault (either missing or minor), an action userspace can +take to "resolve" it is to issue a ``UFFDIO_POISON``. This will cause any +future faulters to either get a SIGBUS, or in KVM's case the guest will +receive an MCE as if there were hardware memory poisoning. + +This is used to emulate hardware memory poisoning. Imagine a VM running on a +machine which experiences a real hardware memory error. Later, we live migrate +the VM to another physical machine. Since we want the migration to be +transparent to the guest, we want that same address range to act as if it was +still poisoned, even though it's on a new physical host which ostensibly +doesn't have a memory error in the exact same spot. + QEMU/KVM ======== diff --git a/Documentation/admin-guide/mm/zswap.rst b/Documentation/admin-guide/mm/zswap.rst index c5c2c7dbb155..45b98390e938 100644 --- a/Documentation/admin-guide/mm/zswap.rst +++ b/Documentation/admin-guide/mm/zswap.rst @@ -49,7 +49,7 @@ compressed pool. Design ====== -Zswap receives pages for compression through the Frontswap API and is able to +Zswap receives pages for compression from the swap subsystem and is able to evict pages from its own compressed pool on an LRU basis and write them back to the backing swap device in the case that the compressed pool is full. @@ -70,19 +70,19 @@ means the compression ratio will always be 2:1 or worse (because of half-full zbud pages). The zsmalloc type zpool has a more complex compressed page storage method, and it can achieve greater storage densities. -When a swap page is passed from frontswap to zswap, zswap maintains a mapping +When a swap page is passed from swapout to zswap, zswap maintains a mapping of the swap entry, a combination of the swap type and swap offset, to the zpool handle that references that compressed swap page. This mapping is achieved with a red-black tree per swap type. The swap offset is the search key for the tree nodes. -During a page fault on a PTE that is a swap entry, frontswap calls the zswap -load function to decompress the page into the page allocated by the page fault -handler. +During a page fault on a PTE that is a swap entry, the swapin code calls the +zswap load function to decompress the page into the page allocated by the page +fault handler. Once there are no PTEs referencing a swap page stored in zswap (i.e. the count -in the swap_map goes to 0) the swap code calls the zswap invalidate function, -via frontswap, to free the compressed entry. +in the swap_map goes to 0) the swap code calls the zswap invalidate function +to free the compressed entry. Zswap seeks to be simple in its policies. Sysfs attributes allow for one user controlled policy: |