Line 1: |
Line 1: |
| == High System Load ==
| | '''For performance problems related to load, see [[High_System_Load_(Ubuntu)|High System Load]]''' |
| The system load is normally represented by the load average over the last 1, 5 and 15 minutes.
| |
| | |
| For example, the <code>uptime</code> command gives a single-line summary of system uptime and recent load.
| |
| | |
| <pre>
| |
| user@server:~$ uptime
| |
| 14:28:49 up 9 days, 22:41, 1 user, load average: 0.34, 0.36, 0.32
| |
| </pre>
| |
| | |
| So in the above, as of 14:28:49 hrs the server has been up for 9 days 22 hours odd, has 1 user logged in, and the system load averages for the past 1, 5, and 15 minutes are shown.
| |
| | |
| The load average for a given period is the average number of processes that were either running (or waiting to run) or in an uninterruptible (typically waiting for IO) state. What counts as bad depends on your system: on a single-CPU system, a load average greater than 1 could be considered bad, as there are more processes wanting to run than CPUs to service them.
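As a rough illustration of the point above, the following sketch divides a load average by the number of CPUs, so that a result above 1.00 suggests more runnable work than the CPUs can service. The function name <code>load_per_cpu</code> is made up for this example, and the live check assumes <code>nproc</code> (GNU coreutils) is installed.

```shell
#!/bin/bash
# Hypothetical helper: print the load average per CPU, to two decimal places.
# $1 = load average, $2 = number of CPUs
load_per_cpu() {
    awk -v load="$1" -v cpus="$2" 'BEGIN { printf "%.2f\n", load / cpus }'
}

# Only query the live system if the required sources are available
if [ -r /proc/loadavg ] && command -v nproc >/dev/null 2>&1; then
    load_per_cpu "$(cut -d ' ' -f 1 /proc/loadavg)" "$(nproc)"
fi
```

For the <code>uptime</code> sample above on a single-CPU machine, <code>load_per_cpu 0.34 1</code> simply returns the load unchanged; on a 4-CPU machine the same load would be spread across four CPUs.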
| |
| | |
| === <code>top</code> ===
| |
| The <code>top</code> command allows some basic insight into the system's performance, and is akin to the Task Manager in Windows.
| |
| | |
| <pre>
| |
| user@server:~$ top
| |
| top - 14:32:09 up 9 days, 22:44, 1 user, load average: 0.70, 0.44, 0.34
| |
| Tasks: 137 total, 1 running, 136 sleeping, 0 stopped, 0 zombie
| |
| Cpu(s): 93.8%us, 6.2%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
| |
| Mem: 1023360k total, 950520k used, 72840k free, 10836k buffers
| |
| Swap: 1757176k total, 1110228k used, 646948k free, 135524k cached
| |
| | |
| PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
| |
| 6608 zimbra 20 0 556m 69m 12m S 69.1 6.9 0:03.26 java
| |
| 17284 zimbra 20 0 649m 101m 3604 S 4.6 10.1 31:34.74 java
| |
| 2610 zimbra 20 0 976m 181m 3700 S 0.7 18.1 133:06.68 java
| |
| 1 root 20 0 23580 1088 732 S 0.0 0.1 0:04.70 init
| |
| 2 root 20 0 0 0 0 S 0.0 0.0 0:00.01 kthreadd
| |
| 3 root RT 0 0 0 0 S 0.0 0.0 0:00.00 migration/0
| |
| ....
| |
| </pre>
| |
| | |
| Note that CPU metrics are with respect to 1 CPU, so on a multiple CPU system, seeing values > 100% is valid.
| |
| | |
| {|class="vwikitable"
| |
| |+ Overview of CPU Metrics, % over time
| |
| ! Code !! Name !! Description
| |
| |-
| |
| | <code>us</code> || User CPU || % of CPU time spent servicing user processes (excluding nice)
| |
| |-
| |
| | <code>sy</code> || System CPU || % of CPU time spent servicing kernel processes
| |
| |-
| |
| | <code>ni</code> || Nice CPU || % of CPU time spent servicing user nice processes (nice reduces the priority of process)
| |
| |-
| |
| | <code>id</code> || Idle CPU || % of CPU time spent idling (doing nothing)
| |
| |-
| |
| | <code>wa</code> || IO Wait || % of CPU time spent waiting for IO (high indicates disk/network bottleneck)
| |
| |-
| |
| <code>hi</code> || Hardware Interrupts || % of CPU time spent servicing hardware interrupts
| |
| |-
| |
| <code>si</code> || Software Interrupts || % of CPU time spent servicing software interrupts
| |
| |-
| |
| <code>st</code> || Steal || % of CPU time stolen from this virtual machine by the hypervisor to service other tasks
| |
| |}
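To pick a single metric out of the <code>Cpu(s)</code> summary line (for example in a monitoring script), something along these lines may help. It is only a sketch: <code>cpu_field</code> is a hypothetical name, and it assumes the line looks either like the sample above (<code>0.0%wa</code>) or like the layout printed by newer versions of <code>top</code> (<code>0.0 wa</code>).

```shell
#!/bin/bash
# Hypothetical helper: read a Cpu(s) summary line on stdin and print the
# value belonging to the given two-letter code (us, sy, ni, id, wa, hi, si, st).
cpu_field() {
    awk -v code="$1" '{
        gsub(/[%,]/, " ")   # turn "0.0%wa," into "0.0 wa "
        for (i = 2; i <= NF; i++)
            if ($i == code) { print $(i - 1); exit }
    }'
}
```

On a live system this could be fed with <code>top -b -n 1 | grep -i 'cpu(s)' | cpu_field wa</code> to show the current IO wait figure.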
| |
| | |
| {|class="vwikitable"
| |
| |+ Task column heading descriptions (to change what columns are shown press <code>f</code>)
| |
| ! Key !! Display !! Name !! Description
| |
| |-
| |
| | <code>a</code> || <code>PID</code> || Process ID || Task/process identifier
| |
| |-
| |
| <code>b</code> || <code>PPID</code> || Parent PID || Task/process identifier of the process's parent (i.e. the process that launched this process)
| |
| |- | |
| | <code>c</code> || <code>RUSER</code> || Real User Name || Real username of task's owner
| |
| |-
| |
| | <code>d</code> || <code>UID</code> || User ID || User ID of task's owner
| |
| |-
| |
| <code>e</code> || <code>USER</code> || User Name || Effective username of task's owner
| |
| |-
| |
| | <code>f</code> || <code>GROUP</code> || Group Name || Group name of task's owner
| |
| |-
| |
| | <code>g</code> || <code>TTY</code> || Controlling TTY || Device that started the process
| |
| |-
| |
| | <code>h</code> || <code>PR</code> || Priority || The task's priority
| |
| |-
| |
| <code>i</code> || <code>NI</code> || Nice value || Adjusted task priority. From -20 meaning high priority, through 0 meaning unadjusted, to 19 meaning low priority
| |
| |-
| |
| | <code>j</code> || <code>P</code> || Last Used CPU || ID of the CPU last used by the task
| |
| |-
| |
| | <code>k</code> || <code>%CPU</code> || CPU Usage || Task's usage of CPU
| |
| |-
| |
| | <code>l</code> || <code>TIME</code> || CPU Time || Total CPU time used by the task
| |
| |-
| |
| | <code>m</code> || <code>TIME+</code> || CPU Time, hundredths || Total CPU time used by the task in sub-second accuracy
| |
| |-
| |
| | <code>n</code> || <code>%MEM</code> || Memory usage (RES) || Task's usage of available physical memory
| |
| |-
| |
| | <code>o</code> || <code>VIRT</code> || Virtual Image (kb) || Task's allocation of virtual memory
| |
| |-
| |
| | <code>p</code> || <code>SWAP</code> || Swapped size (kb) || Task's swapped memory (resident in swap-file)
| |
| |-
| |
| | <code>q</code> || <code>RES</code> || Resident size (kb) || Task's unswapped memory (resident in physical memory)
| |
| |-
| |
| | <code>r</code> || <code>CODE</code> || Code size (kb) || Task's virtual memory used for executable code
| |
| |-
| |
| | <code>s</code> || <code>DATA</code> || Data+Stack size (kb) || Task's virtual memory not used for executable code
| |
| |-
| |
| | <code>t</code> || <code>SHR</code> || Shared Mem size (kb) || Task's shared memory
| |
| |-
| |
| <code>u</code> || <code>nFLT</code> || Page Fault count || Major/Hard page faults that have occurred for the task
| |
| |-
| |
| <code>v</code> || <code>nDRT</code> || Dirty Pages count || Task's memory pages that have been modified since last write to disk, and so must be written to disk before the physical memory can be reused
| |
| |-
| |
| | <code>w</code> || <code>S</code> || Process Status ||
| |
| * D - Uninterruptible sleep
| |
| * R - Running
| |
| * S - Sleeping
| |
| * T - Traced or Stopped
| |
| * Z - Zombie
| |
| |-
| |
| | <code>x</code> || <code>Command</code> || Command Line || Command used to start task
| |
| |-
| |
| | <code>y</code> || <code>WCHAN</code> || Sleeping in Function || Name (or address) of function that the task is sleeping in
| |
| |-
| |
| <code>z</code> || <code>Flags</code> || Task Flags || Task's scheduling flags
| |
| |}
| |
| | |
| | |
| === Identify Process Causing High System Load ===
| |
| If the high load is constant, just fire up <code>top</code> and see if there is a specific process to blame, or if you're stuck waiting for disk or network IO.
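Processes stuck waiting on IO show up with status <code>D</code> (uninterruptible sleep) and count towards the load average even though they use little CPU. One possible way to list them is sketched below; <code>tasks_in_state</code> is a hypothetical helper, and it assumes its input has the process status in the first column.

```shell
#!/bin/bash
# Hypothetical helper: keep only the lines whose first column (process
# status) starts with the given letter, e.g. D, R, S, T or Z.
tasks_in_state() {
    awk -v want="$1" 'substr($1, 1, 1) == want'
}
```

Typical use would be <code>ps -eo stat,pid,comm | tasks_in_state D</code>. Matching on the first letter only means that modified states such as <code>D<</code> are included as well.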
| |
| | |
| If the high load is transient but repetitive, you'll need to capture the output of <code>top</code> at the right time. The following script logs <code>top</code> output during periods of high load.
| |
| | |
| <source lang="bash">#!/bin/bash
| |
| #
| |
| # During high load, write output from top to file.
| |
| #
| |
| # Simon Strutt - July 2012
| |
| | |
| LOGFILE="/home/user/load_log.txt" # Update to a valid folder path
| |
| MAXLOAD=100 # Multiply by 100 as 'if' comparison can only handle integers
| |
| | |
| LOAD=`cut -d ' ' -f 1 /proc/loadavg`
| |
| LOAD=`echo $LOAD '*100' | bc -l | awk -F '.' '{ print $1; exit; }'` # Convert load to x100 integer
| |
| | |
| if [ $LOAD -gt $MAXLOAD ]; then
| |
| echo `date '+%Y-%m-%d %H:%M:%S'`>> ${LOGFILE}
| |
| top -b -n 1 >> ${LOGFILE}
| |
| fi</source>
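The script above depends on <code>bc</code> for the conversion to an integer. Where <code>bc</code> isn't installed, the same conversion can probably be done with <code>awk</code> alone, as in this sketch (the function name <code>load_x100</code> and its optional file argument are assumptions made for the example).

```shell
#!/bin/bash
# Hypothetical helper: print the 1-minute load average multiplied by 100,
# rounded to the nearest integer.
# $1 = file to read (defaults to /proc/loadavg)
load_x100() {
    awk '{ print int($1 * 100 + 0.5); exit }' "${1:-/proc/loadavg}"
}
```

The result can be compared directly against <code>MAXLOAD</code>, for example <code>if [ "$(load_x100)" -gt "$MAXLOAD" ]; then ...</code>.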
| |
| | |
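| Schedule with something like the following (update with the correct path to <code>load_log</code>). Note that <code>1 * * * *</code> runs the script once an hour, at one minute past; use <code>* * * * *</code> to check every minute.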
| |
| <pre>crontab -e
| |
| 1 * * * * /bin/bash /home/user/load_log</pre>
| |
|
| |
|
| == Network == | | == Network == |
Line 153: |
Line 8: |
| # Use <code> dmesg | grep -i eth </code> to ascertain what's been detected at boot time | | # Use <code> dmesg | grep -i eth </code> to ascertain what's been detected at boot time |
| # Assuming it states that say <code>eth0</code> has been changed to <code>eth1</code> then just update the <code>/etc/network/interfaces</code> file | | # Assuming it states that say <code>eth0</code> has been changed to <code>eth1</code> then just update the <code>/etc/network/interfaces</code> file |
| | # Alternatively, force the ''new'' NIC to be <code>eth0</code> by editing the <code>/etc/udev/rules.d/70-persistent-net.rules</code> file |
| | #* You'll need to reboot the server for changes to take effect |
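Before editing either file, it can help to see which interface names are currently configured and compare them with what the kernel has actually detected. A minimal sketch follows; <code>configured_ifaces</code> is a hypothetical helper that assumes the usual <code>iface NAME ...</code> stanza layout.

```shell
#!/bin/bash
# Hypothetical helper: list the interface names that have an "iface" stanza.
# $1 = interfaces file (defaults to /etc/network/interfaces)
configured_ifaces() {
    awk '$1 == "iface" { print $2 }' "${1:-/etc/network/interfaces}"
}
```

Compare its output with <code>ls /sys/class/net</code>, which lists the interfaces the kernel currently knows about. A name that appears in one list but not the other is the likely culprit.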
|
| |
|
| == File System == | | == File System == |
Line 188: |
Line 45: |
| # The arrays should now be being sync'ed, check progress by monitoring <code>/proc/mdstat</code> | | # The arrays should now be being sync'ed, check progress by monitoring <code>/proc/mdstat</code> |
| #* <code> more /proc/mdstat </code> | | #* <code> more /proc/mdstat </code> |
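To show just the progress lines rather than the whole file, a filter like the following may be enough. It is a sketch that assumes the progress lines contain the word <code>resync</code> or <code>recovery</code>; the name <code>sync_progress</code> is invented for this example.

```shell
#!/bin/bash
# Hypothetical helper: print only the resync/recovery progress lines.
# $1 = file to read (defaults to /proc/mdstat)
sync_progress() {
    grep -E 'resync|recovery' "${1:-/proc/mdstat}" \
        || echo "no resync or recovery in progress"
}
```

For a continuously refreshing view, <code>watch -n 5 cat /proc/mdstat</code> re-reads the file every 5 seconds.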
| | |
| | === Recover Deleted Files === |
| | Ideally you should recover files to a separate disk partition from the one you are attempting to recover from. This procedure should help to recover lost or corrupted files from a filesystem using [http://manpages.ubuntu.com/manpages/lucid/man1/scalpel.1.html Scalpel], a data recovery utility built on the foundation of [http://foremost.sourceforge.net/ Foremost] |
| | |
| | # Install Scalpel |
| | #* <code> apt-get install scalpel </code> |
| | # Update the config file to search for the lost files (uncomment/add as necessary) |
| | #* <code> /etc/scalpel/scalpel.conf </code> |
| | #* For PHP files (not embedded in HTML) use <code> php n 50000 <?php ?> </code> |
| | # Create a folder for the recovered files to go to |
| | #* <code> mkdir /tmp/recov </code> |
| | # Launch Scalpel to trawl the disk image (will take ages, and the source disk will be under high load) |
| | #* <code> scalpel /dev/mapper/svr-root -o /tmp/recov/ </code> |
| | # Search through recovered files to find the data of interest |
| | #* <code> grep -R "string you want to find" /tmp/recov/* </code> |
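Scalpel can recover a very large number of files, so it may be more practical to list only the names of the files that contain the string instead of every matching line. A small sketch, using a hypothetical wrapper called <code>recovered_matches</code>:

```shell
#!/bin/bash
# Hypothetical helper: list recovered files that contain a fixed string.
# $1 = string to look for (taken literally, not as a regex)
# $2 = directory holding the recovered files
recovered_matches() {
    grep -RlF -- "$1" "$2"
}
```

For example, <code>recovered_matches "string you want to find" /tmp/recov</code> prints one path per matching file, which can then be inspected or copied elsewhere.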
|
| |
|
| == SSH == | | == SSH == |
Line 200: |
Line 72: |
| * '''The following packages have been kept back''' | | * '''The following packages have been kept back''' |
| ** Package manager can hold back updates because they will cause conflicts, or sometimes because they're major kernel updates. Running <code>aptitude safe-upgrade</code> normally seems to force kernel updates through. | | ** Package manager can hold back updates because they will cause conflicts, or sometimes because they're major kernel updates. Running <code>aptitude safe-upgrade</code> normally seems to force kernel updates through. |
| | |
| | === Add EOL Repository === |
| | Once a version of Ubuntu has gone End Of Life (EOL), you can't install software packages using the normal repository. On trying, you'll get an error similar to |
| | * <code>Failed to fetch http://gb.archive.ubuntu.com/ubuntu/pool/main/s/<package> 404 Not Found</code> |
| | |
| | The repository is still available, but via a different URL - http://old-releases.ubuntu.com |
| | |
| | Edit <code>/etc/apt/sources.list</code> and add the following (replace hardy with the codename of your Ubuntu release). Remove the existing Ubuntu repositories (they'll just cause errors as they're inaccessible) |
| | |
| | <pre> |
| | # Hardy EOL |
| | # Required |
| | deb http://old-releases.ubuntu.com/ubuntu/ hardy main restricted universe multiverse |
| | deb http://old-releases.ubuntu.com/ubuntu/ hardy-updates main restricted universe multiverse |
| | deb http://old-releases.ubuntu.com/ubuntu/ hardy-security main restricted universe multiverse |
| | |
| | # Optional |
| | #deb http://old-releases.ubuntu.com/ubuntu/ hardy-backports main restricted universe multiverse |
| | </pre> |
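Instead of editing by hand, the existing entries could be rewritten in place. The filter below is a sketch only: it assumes the current entries point at <code>archive.ubuntu.com</code> (optionally with a country prefix such as <code>gb.</code>) or <code>security.ubuntu.com</code>, and the name <code>to_old_releases</code> is made up for the example.

```shell
#!/bin/bash
# Hypothetical helper: rewrite standard Ubuntu repository URLs on stdin so
# that they point at old-releases.ubuntu.com, and print the result.
to_old_releases() {
    sed -e 's|http://\([a-z][a-z]*\.\)\{0,1\}archive\.ubuntu\.com|http://old-releases.ubuntu.com|g' \
        -e 's|http://security\.ubuntu\.com|http://old-releases.ubuntu.com|g'
}
```

Take a backup first, for example <code>cp /etc/apt/sources.list /etc/apt/sources.list.bak</code>, then run <code>to_old_releases < /etc/apt/sources.list.bak > /etc/apt/sources.list</code> followed by <code>apt-get update</code>.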
|
| |
|
| == Reboot Required? == | | == Reboot Required? == |
Line 207: |
Line 98: |
| To see which packages caused this to be set, inspect the contents of... | | To see which packages caused this to be set, inspect the contents of... |
| /var/run/reboot-required.pkgs | | /var/run/reboot-required.pkgs |
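The check can be wrapped up for use in a login script or monitoring job. This sketch assumes the flag file is <code>/var/run/reboot-required</code>, sitting alongside the package list; <code>reboot_status</code> and its optional directory argument are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: report whether a reboot is pending and, if known,
# which packages requested it.
# $1 = directory holding the flag files (defaults to /var/run)
reboot_status() {
    local dir="${1:-/var/run}"
    if [ -f "$dir/reboot-required" ]; then
        echo "Reboot required"
        # The package list may be absent even when the flag file exists
        if [ -f "$dir/reboot-required.pkgs" ]; then
            sort -u "$dir/reboot-required.pkgs"
        fi
    else
        echo "No reboot required"
    fi
}
```

Calling <code>reboot_status</code> with no arguments checks the live system.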
| | |
| | == Firewall == |
| | === ERROR: problem running ufw-init === |
| | If on starting or reloading <code>ufw</code> you receive this error, it's likely that you have a configuration problem. This is especially likely if you've needed to edit <code>ufw</code>'s config files directly. |
| | |
| | # Ensure that <code>ufw</code> is running |
| | #* <code> ufw enable </code> |
| | # Force the config to be reloaded |
| | #* <code> /lib/ufw/ufw-init force-reload </code> |
| | # Or if <code>ufw</code> failed to start use |
| | #* <code> /lib/ufw/ufw-init start </code> |
| | |
| | Doing the above should trigger the error, and present a better description of what the problem is |
| | |
| | See http://ubuntuforums.org/showthread.php?t=1660916 for further info |
| | |
|
| |
|
| [[Category:Ubuntu]] | | [[Category:Ubuntu]] |
| [[Category:Troubleshooting]] | | [[Category:Troubleshooting]] |
| [[Category:Bash]] | | [[Category:Bash]] |