aboutsummaryrefslogtreecommitdiffstats
path: root/Documentation/sysctl
diff options
context:
space:
mode:
Diffstat (limited to 'Documentation/sysctl')
-rw-r--r--Documentation/sysctl/fs.txt60
-rw-r--r--Documentation/sysctl/kernel.txt34
-rw-r--r--Documentation/sysctl/vm.txt44
3 files changed, 109 insertions, 29 deletions
diff --git a/Documentation/sysctl/fs.txt b/Documentation/sysctl/fs.txt
index 13d6166d7a2..88152f214f4 100644
--- a/Documentation/sysctl/fs.txt
+++ b/Documentation/sysctl/fs.txt
@@ -32,6 +32,8 @@ Currently, these files are in /proc/sys/fs:
- nr_open
- overflowuid
- overflowgid
+- protected_hardlinks
+- protected_symlinks
- suid_dumpable
- super-max
- super-nr
@@ -157,22 +159,68 @@ The default is 65534.
==============================================================
+protected_hardlinks:
+
+A long-standing class of security issues is the hardlink-based
+time-of-check-time-of-use race, most commonly seen in world-writable
+directories like /tmp. The common method of exploitation of this flaw
+is to cross privilege boundaries when following a given hardlink (i.e. a
+root process follows a hardlink created by another user). Additionally,
+on systems without separated partitions, this stops unauthorized users
+from "pinning" vulnerable setuid/setgid files against being upgraded by
+the administrator, or linking to special files.
+
+When set to "0", hardlink creation behavior is unrestricted.
+
+When set to "1" hardlinks cannot be created by users if they do not
+already own the source file, or do not have read/write access to it.
+
+This protection is based on the restrictions in Openwall and grsecurity.
+
+==============================================================
+
+protected_symlinks:
+
+A long-standing class of security issues is the symlink-based
+time-of-check-time-of-use race, most commonly seen in world-writable
+directories like /tmp. The common method of exploitation of this flaw
+is to cross privilege boundaries when following a given symlink (i.e. a
+root process follows a symlink belonging to another user). For a likely
+incomplete list of hundreds of examples across the years, please see:
+http://cve.mitre.org/cgi-bin/cvekey.cgi?keyword=/tmp
+
+When set to "0", symlink following behavior is unrestricted.
+
+When set to "1" symlinks are permitted to be followed only when outside
+a sticky world-writable directory, or when the uid of the symlink and
+follower match, or when the directory owner matches the symlink's owner.
+
+This protection is based on the restrictions in Openwall and grsecurity.
+
+==============================================================
+
suid_dumpable:
This value can be used to query and set the core dump mode for setuid
or otherwise protected/tainted binaries. The modes are
0 - (default) - traditional behaviour. Any process which has changed
- privilege levels or is execute only will not be dumped
+ privilege levels or is execute only will not be dumped.
1 - (debug) - all processes dump core when possible. The core dump is
owned by the current user and no security is applied. This is
intended for system debugging situations only. Ptrace is unchecked.
+ This is insecure as it allows regular users to examine the memory
+ contents of privileged processes.
2 - (suidsafe) - any binary which normally would not be dumped is dumped
- readable by root only. This allows the end user to remove
- such a dump but not access it directly. For security reasons
- core dumps in this mode will not overwrite one another or
- other files. This mode is appropriate when administrators are
- attempting to debug problems in a normal environment.
+ anyway, but only if the "core_pattern" kernel sysctl is set to
+ either a pipe handler or a fully qualified path. (For more details
+ on this limitation, see CVE-2006-2451.) This mode is appropriate
+ when administrators are attempting to debug problems in a normal
+ environment, and either have a core dump pipe handler that knows
+ to treat privileged core dumps with care, or specific directory
+ defined for catching core dumps. If a core dump happens without
+ a pipe handler or fully qualifid path, a message will be emitted
+ to syslog warning about the lack of a correct setting.
==============================================================
diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt
index 6d78841fd41..ccd42589e12 100644
--- a/Documentation/sysctl/kernel.txt
+++ b/Documentation/sysctl/kernel.txt
@@ -38,6 +38,7 @@ show up in /proc/sys/kernel:
- l2cr [ PPC only ]
- modprobe ==> Documentation/debugging-modules.txt
- modules_disabled
+- msg_next_id [ sysv ipc ]
- msgmax
- msgmnb
- msgmni
@@ -62,7 +63,9 @@ show up in /proc/sys/kernel:
- rtsig-max
- rtsig-nr
- sem
+- sem_next_id [ sysv ipc ]
- sg-big-buff [ generic SCSI device (sg) ]
+- shm_next_id [ sysv ipc ]
- shm_rmid_forced
- shmall
- shmmax [ sysv ipc ]
@@ -181,6 +184,8 @@ core_pattern is used to specify a core dumpfile pattern name.
%p pid
%u uid
%g gid
+ %d dump mode, matches PR_SET_DUMPABLE and
+ /proc/sys/fs/suid_dumpable
%s signal number
%t UNIX time of dump
%h hostname
@@ -318,6 +323,22 @@ to false.
==============================================================
+msg_next_id, sem_next_id, and shm_next_id:
+
+These three toggles allows to specify desired id for next allocated IPC
+object: message, semaphore or shared memory respectively.
+
+By default they are equal to -1, which means generic allocation logic.
+Possible values to set are in range {0..INT_MAX}.
+
+Notes:
+1) kernel doesn't guarantee, that new object will have desired id. So,
+it's up to userspace, how to handle an object with "wrong" id.
+2) Toggle with non-default value will be set back to -1 by kernel after
+successful IPC object allocation.
+
+==============================================================
+
nmi_watchdog:
Enables/Disables the NMI watchdog on x86 systems. When the value is
@@ -540,6 +561,19 @@ are doing anyway :)
==============================================================
+shmall:
+
+This parameter sets the total amount of shared memory pages that
+can be used system wide. Hence, SHMALL should always be at least
+ceil(shmmax/PAGE_SIZE).
+
+If you are not sure what the default PAGE_SIZE is on your Linux
+system, you can run the following command:
+
+# getconf PAGE_SIZE
+
+==============================================================
+
shmmax:
This value can be used to query and set the run time limit
diff --git a/Documentation/sysctl/vm.txt b/Documentation/sysctl/vm.txt
index 96f0ee825be..078701fdbd4 100644
--- a/Documentation/sysctl/vm.txt
+++ b/Documentation/sysctl/vm.txt
@@ -42,7 +42,6 @@ Currently, these files are in /proc/sys/vm:
- mmap_min_addr
- nr_hugepages
- nr_overcommit_hugepages
-- nr_pdflush_threads
- nr_trim_pages (only if CONFIG_MMU=n)
- numa_zonelist_order
- oom_dump_tasks
@@ -77,8 +76,8 @@ huge pages although processes will also directly compact memory as required.
dirty_background_bytes
-Contains the amount of dirty memory at which the pdflush background writeback
-daemon will start writeback.
+Contains the amount of dirty memory at which the background kernel
+flusher threads will start writeback.
Note: dirty_background_bytes is the counterpart of dirty_background_ratio. Only
one of them may be specified at a time. When one sysctl is written it is
@@ -90,7 +89,7 @@ other appears as 0 when read.
dirty_background_ratio
Contains, as a percentage of total system memory, the number of pages at which
-the pdflush background writeback daemon will start writing out dirty data.
+the background kernel flusher threads will start writing out dirty data.
==============================================================
@@ -113,9 +112,9 @@ retained.
dirty_expire_centisecs
This tunable is used to define when dirty data is old enough to be eligible
-for writeout by the pdflush daemons. It is expressed in 100'ths of a second.
-Data which has been dirty in-memory for longer than this interval will be
-written out next time a pdflush daemon wakes up.
+for writeout by the kernel flusher threads. It is expressed in 100'ths
+of a second. Data which has been dirty in-memory for longer than this
+interval will be written out next time a flusher thread wakes up.
==============================================================
@@ -129,7 +128,7 @@ data.
dirty_writeback_centisecs
-The pdflush writeback daemons will periodically wake up and write `old' data
+The kernel flusher threads will periodically wake up and write `old' data
out to disk. This tunable expresses the interval between those wakeups, in
100'ths of a second.
@@ -426,16 +425,6 @@ See Documentation/vm/hugetlbpage.txt
==============================================================
-nr_pdflush_threads
-
-The current number of pdflush threads. This value is read-only.
-The value changes according to the number of dirty pages in the system.
-
-When necessary, additional pdflush threads are created, one per second, up to
-nr_pdflush_threads_max.
-
-==============================================================
-
nr_trim_pages
This is available only on NOMMU kernels.
@@ -502,9 +491,10 @@ oom_dump_tasks
Enables a system-wide task dump (excluding kernel threads) to be
produced when the kernel performs an OOM-killing and includes such
-information as pid, uid, tgid, vm size, rss, cpu, oom_adj score, and
-name. This is helpful to determine why the OOM killer was invoked
-and to identify the rogue task that caused it.
+information as pid, uid, tgid, vm size, rss, nr_ptes, swapents,
+oom_score_adj score, and name. This is helpful to determine why the
+OOM killer was invoked, to identify the rogue task that caused it,
+and to determine why the OOM killer chose the task it did to kill.
If this is set to zero, this information is suppressed. On very
large systems with thousands of tasks it may not be feasible to dump
@@ -574,16 +564,24 @@ of physical RAM. See above.
page-cluster
-page-cluster controls the number of pages which are written to swap in
-a single attempt. The swap I/O size.
+page-cluster controls the number of pages up to which consecutive pages
+are read in from swap in a single attempt. This is the swap counterpart
+to page cache readahead.
+The mentioned consecutivity is not in terms of virtual/physical addresses,
+but consecutive on swap space - that means they were swapped out together.
It is a logarithmic value - setting it to zero means "1 page", setting
it to 1 means "2 pages", setting it to 2 means "4 pages", etc.
+Zero disables swap readahead completely.
The default value is three (eight pages at a time). There may be some
small benefits in tuning this to a different value if your workload is
swap-intensive.
+Lower values mean lower latencies for initial faults, but at the same time
+extra faults and I/O delays for following faults if they would have been part of
+that consecutive pages readahead would have brought in.
+
=============================================================
panic_on_oom