Linux Kernel Settings with Examples from Hadoop

Tech Articles
Linux Kernel Article p. 2
This article does not assume you will be using Hadoop
Linux Kernel Settings
with Hadoop as an Example

Introduction
Linux kernel settings are often adjusted when tuning a Linux server. Test any such change before putting it into production. This article:
  • shows how to determine current settings
  • shows how to update the settings
  • gives some examples
    • these examples are settings commonly changed when tuning Hadoop

sysctl command
Linux Kernel settings are usually set with the /sbin/sysctl command.

The current values for the settings that sysctl can change are found in the /proc/sys directory. However, these settings are usually displayed with:
sysctl -a

The names of these settings follow the path under /proc/sys, with a . (period) substituted for each /.

For example:
vm.swappiness = 0

This setting is found under /proc in this file:
/proc/sys/vm/swappiness
That file contains only the value:
0
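
The naming rule above can be sketched as a small shell function. The function name sysctl_to_path is a hypothetical helper chosen for illustration; it is not part of sysctl. It is a minimal sketch that assumes the setting name contains no literal dots of its own (for example, from a VLAN interface name).

```shell
#!/bin/sh
# Convert a sysctl setting name into its path under /proc/sys
# by replacing every "." with "/".
# sysctl_to_path is a hypothetical helper name, not a sysctl feature.
sysctl_to_path() {
    printf '/proc/sys/%s\n' "$(printf '%s' "$1" | tr '.' '/')"
}

sysctl_to_path vm.swappiness        # prints /proc/sys/vm/swappiness
sysctl_to_path net.ipv4.ip_forward  # prints /proc/sys/net/ipv4/ip_forward
```

The resulting path can be passed to cat to read the current value directly.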

This value can be changed by overwriting the file with a new value, although a change made this way is lost at the next reboot:
echo 1 > /proc/sys/vm/swappiness

The preferable method is to update sysctl’s configuration file:
/etc/sysctl.conf

Once you have updated the setting, you can reboot. More likely, you will want to run this command:
sysctl -p

This will update the server’s settings based on the values in /etc/sysctl.conf and then display the values.
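
As an illustration of editing the configuration file safely, the sketch below updates a setting in a sysctl.conf-style file, replacing the line if the setting is already present and appending it otherwise. The function name set_sysctl_conf is hypothetical, and the example deliberately works on a scratch file rather than the live /etc/sysctl.conf. It assumes one "name = value" pair per line and a value without special characters.

```shell
#!/bin/sh
# set_sysctl_conf FILE NAME VALUE
# Replace an existing "NAME = ..." line in FILE, or append one if absent.
# set_sysctl_conf is a hypothetical helper name used for illustration.
set_sysctl_conf() {
    file=$1
    name=$2
    value=$3
    # Escape dots so the name is matched literally in the regular expressions
    pattern=$(printf '%s' "$name" | sed 's/\./\\./g')
    if grep -Eq "^[[:space:]]*${pattern}[[:space:]]*=" "$file" 2>/dev/null; then
        # Rewrite the matching line through a temporary file, then copy back
        tmp=$(mktemp) || return 1
        sed -E "s|^[[:space:]]*${pattern}[[:space:]]*=.*|${name} = ${value}|" "$file" > "$tmp" &&
            cat "$tmp" > "$file"
        rm -f "$tmp"
    else
        printf '%s = %s\n' "$name" "$value" >> "$file"
    fi
}

# Try it on a scratch file rather than the live /etc/sysctl.conf
conf=$(mktemp)
printf 'vm.swappiness = 60\n' > "$conf"
set_sysctl_conf "$conf" vm.swappiness 0          # replaces the existing line
set_sysctl_conf "$conf" vm.overcommit_memory 1   # appends a new line
cat "$conf"
```

Once the real file has been edited, running sysctl -p as root loads the new values.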


Hadoop Kernel settings: VM
The following virtual memory kernel settings come up frequently in articles about tuning Hadoop (including O’Reilly’s Hadoop Operations).

vm.swappiness
This setting controls how readily the kernel swaps application data to disk.

Set this to 0 for Hadoop. This tells the kernel to avoid swapping application data whenever possible.

Explanation: higher values for this kernel setting indicate that the kernel should be more aggressive in swapping application data to disk. Lower values defer swapping of application data to disk at the expense of forcing file system buffers to be discarded. The valid range is 0 to 100, and most Linux kernels default to 60.

If Hadoop data is swapped to disk while other I/O processes are ongoing, Hadoop operations may time out. This in turn may cause Hadoop to appear to have failed. These false alarms will impact fault tolerant configurations.
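
A quick check of a node against the recommended value might look like the sketch below. The function name check_swappiness and its optional second argument (an alternative root directory, useful for trying the function without touching a live system) are assumptions made for this example.

```shell
#!/bin/sh
# check_swappiness WANT [ROOT]
# Compare the current vm.swappiness with the wanted value.
# ROOT defaults to /proc/sys; check_swappiness is a hypothetical helper name.
check_swappiness() {
    want=$1
    root=${2:-/proc/sys}
    current=$(cat "$root/vm/swappiness") || return 2
    if [ "$current" -eq "$want" ]; then
        echo "vm.swappiness is $current (ok)"
    else
        echo "vm.swappiness is $current, expected $want"
        return 1
    fi
}

# Report on this machine; the hint is printed only when the check fails
check_swappiness 0 || echo "consider setting vm.swappiness = 0 in /etc/sysctl.conf"
```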

vm.overcommit_memory
Applications often allocate more memory than they need. This setting defines the conditions that determine whether a large memory request is accepted or denied. There are three possible values for this parameter:
  • 0: The default setting. The kernel uses a heuristic to refuse obviously excessive requests, but this setting can sometimes allow available memory on the system to be overloaded.
  • 1: The kernel performs no memory overcommit handling and grants every request. Under this setting, the potential for memory overload is increased, but so is performance for memory-intensive tasks.
  • 2: The kernel denies requests for memory equal to or larger than the sum of total available swap and the percentage of physical RAM specified in overcommit_ratio. This setting is best if you want a lesser risk of memory overcommitment.
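
To make the limit described for value 2 concrete, the arithmetic can be sketched as follows. The machine sizes are hypothetical, the function name commit_limit_kb is an assumption for this example, and the calculation ignores huge pages, which the kernel subtracts from physical RAM.

```shell
#!/bin/sh
# commit_limit_kb RAM_KB SWAP_KB RATIO
# Approximate commit limit with vm.overcommit_memory = 2:
#   swap + RAM * overcommit_ratio / 100   (huge pages ignored)
# commit_limit_kb is a hypothetical helper name used for illustration.
commit_limit_kb() {
    echo $(( $2 + $1 * $3 / 100 ))
}

# Hypothetical machine: 64 GiB RAM, 8 GiB swap, overcommit_ratio = 50
commit_limit_kb $((64 * 1024 * 1024)) $((8 * 1024 * 1024)) 50   # prints 41943040 (40 GiB in kB)

# On a live system the kernel reports its own figures in /proc/meminfo
grep -E '^(CommitLimit|Committed_AS):' /proc/meminfo 2>/dev/null || true
```

Comparing Committed_AS with CommitLimit shows how close the system is to refusing further allocations under value 2.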

Hadoop runs on Java, and Java is typically configured with a maximum heap size per service (for example, through the JVM’s -Xmx option). This Java setting, rather than swapping, should be used to manage Hadoop’s memory usage.

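When vm.overcommit_memory is set to 1, the kernel grants memory requests without first checking that enough memory is available to back the virtual pages. Note that vm.overcommit_ratio is only consulted when vm.overcommit_memory is 2. To configure these settings at runtime: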
sysctl -w vm.overcommit_memory=1
sysctl -w vm.overcommit_ratio=50


Page 2 covers further Hadoop-related, network-specific kernel settings.

Suggestions for Future Learning

Red Hat:

Documentation about the proc file system is installed on the system by default in /usr/src/linux-/
for example:
/usr/src/linux-2.4/

Some of the most authoritative information on the /proc/ directory can be found by reading the kernel source code. Make sure the kernel-source RPM is installed on the system and look in the
/usr/src/linux-/
directory for the source code.

Some specific documents:
/usr/src/linux-/Documentation/filesystems/proc.txt: Contains assorted, but limited, information about all aspects of the /proc/ directory.

/usr/src/linux-/Documentation/sysrq.txt: An overview of System Request Key options.

/usr/src/linux-/Documentation/sysctl/: A directory containing a variety of sysctl tips, including modifying values that concern the kernel (kernel.txt), accessing file systems (fs.txt), and virtual memory use (vm.txt).

/usr/src/linux-/Documentation/networking/ip-sysctl.txt: A detailed overview of IP networking options.

