Troubleshooting (Ubuntu)
== High System Load ==
The system load is normally represented by the load average over the last 1, 5 and 15 minutes.
For example, the <code>uptime</code> command gives a single-line summary of system uptime and recent load:
<pre>
user@server:~$ uptime
 14:28:49 up 9 days, 22:41,  1 user,  load average: 0.34, 0.36, 0.32
</pre>
So in the above, as of 14:28:49 the server has been up for 9 days and 22-odd hours, has 1 user logged in, and the load averages for the past 1, 5, and 15 minutes are 0.34, 0.36 and 0.32 respectively.
The load average for a given period indicates the average number of processes that were running or in an uninterruptible (waiting for IO) state. What counts as bad depends on your system: on a single-CPU system a load average greater than 1 could be considered bad, as there are more runnable processes than CPUs to service them.
=== <code>top</code> ===
The <code>top</code> command allows some basic insight into the system's performance, and is akin to the Task Manager in Windows.
<pre>
user@server:~$ top
top - 14:32:09 up 9 days, 22:44,  1 user,  load average: 0.70, 0.44, 0.34
Tasks: 137 total,   1 running, 136 sleeping,   0 stopped,   0 zombie
Cpu(s): 93.8%us,  6.2%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   1023360k total,   950520k used,    72840k free,    10836k buffers
Swap:  1757176k total,  1110228k used,   646948k free,   135524k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 6608 zimbra    20   0  556m  69m  12m S 69.1  6.9   0:03.26 java
17284 zimbra    20   0  649m 101m 3604 S  4.6 10.1  31:34.74 java
 2610 zimbra    20   0  976m 181m 3700 S  0.7 18.1 133:06.68 java
    1 root      20   0 23580 1088  732 S  0.0  0.1   0:04.70 init
    2 root      20   0     0    0    0 S  0.0  0.0   0:00.01 kthreadd
    3 root      RT   0     0    0    0 S  0.0  0.0   0:00.00 migration/0
....
</pre>
Note that CPU metrics are with respect to 1 CPU, so on a multi-CPU system seeing values > 100% is valid.
{|class="vwikitable"
|+ Overview of CPU metrics, % over time
! Code !! Name !! Description
|-
| <code>us</code> || User CPU || % of CPU time spent servicing user processes (excluding nice)
|-
| <code>sy</code> || System CPU || % of CPU time spent servicing kernel processes
|-
| <code>ni</code> || Nice CPU || % of CPU time spent servicing user nice processes (nice reduces the priority of a process)
|-
| <code>id</code> || Idle CPU || % of CPU time spent idling (doing nothing)
|-
| <code>wa</code> || IO Wait || % of CPU time spent waiting for IO (a high value indicates a disk/network bottleneck)
|-
| <code>hi</code> || Hardware Interrupts || % of CPU time spent servicing hardware interrupts
|-
| <code>si</code> || Software Interrupts || % of CPU time spent servicing software interrupts
|-
| <code>st</code> || Steal || % of CPU time stolen by the hypervisor to service other virtual machines
|}
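These metrics can also be captured non-interactively for logging or scripting. A small sketch using <code>top</code>'s standard batch-mode flags; note the summary line is labelled <code>Cpu(s):</code> on older versions of top and <code>%Cpu(s):</code> on newer procps-ng, so the pattern below matches both:

```shell
# One batch-mode sample of top: -b (batch output) and -n 1 (a
# single iteration), then keep only the CPU summary line.
top -b -n 1 | grep -m 1 'Cpu(s):'
```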
{|class="vwikitable"
|+ Task column heading descriptions (to change which columns are shown press <code>f</code>)
! Key !! Display !! Name !! Description
|-
| <code>a</code> || <code>PID</code> || Process ID || Task/process identifier
|-
| <code>b</code> || <code>PPID</code> || Parent PID || Task/process identifier of the process's parent (i.e. the process that launched this process)
|-
| <code>c</code> || <code>RUSER</code> || Real User Name || Real username of task's owner
|-
| <code>d</code> || <code>UID</code> || User ID || User ID of task's owner
|-
| <code>e</code> || <code>USER</code> || User Name || Effective username of task's owner
|-
| <code>f</code> || <code>GROUP</code> || Group Name || Group name of task's owner
|-
| <code>g</code> || <code>TTY</code> || Controlling TTY || Device that started the process
|-
| <code>h</code> || <code>PR</code> || Priority || The task's priority
|-
| <code>i</code> || <code>NI</code> || Nice value || Adjusted task priority, from -20 meaning high priority, through 0 meaning unadjusted, to 19 meaning low priority
|-
| <code>j</code> || <code>P</code> || Last Used CPU || ID of the CPU last used by the task
|-
| <code>k</code> || <code>%CPU</code> || CPU Usage || Task's usage of CPU
|-
| <code>l</code> || <code>TIME</code> || CPU Time || Total CPU time used by the task
|-
| <code>m</code> || <code>TIME+</code> || CPU Time, hundredths || Total CPU time used by the task, to hundredth-of-a-second accuracy
|-
| <code>n</code> || <code>%MEM</code> || Memory usage (RES) || Task's usage of available physical memory
|-
| <code>o</code> || <code>VIRT</code> || Virtual Image (kb) || Task's allocation of virtual memory
|-
| <code>p</code> || <code>SWAP</code> || Swapped size (kb) || Task's swapped memory (resident in swap-file)
|-
| <code>q</code> || <code>RES</code> || Resident size (kb) || Task's unswapped memory (resident in physical memory)
|-
| <code>r</code> || <code>CODE</code> || Code size (kb) || Task's virtual memory used for executable code
|-
| <code>s</code> || <code>DATA</code> || Data+Stack size (kb) || Task's virtual memory not used for executable code
|-
| <code>t</code> || <code>SHR</code> || Shared Mem size (kb) || Task's shared memory
|-
| <code>u</code> || <code>nFLT</code> || Page Fault count || Major/hard page faults that have occurred for the task
|-
| <code>v</code> || <code>nDRT</code> || Dirty Pages count || Task's memory pages that have been modified since the last write to disk, and so must be written back before the physical memory can be freed
|-
| <code>w</code> || <code>S</code> || Process Status ||
* D - Uninterruptible sleep
* R - Running
* S - Sleeping
* T - Traced or Stopped
* Z - Zombie
|-
| <code>x</code> || <code>Command</code> || Command Line || Command used to start the task
|-
| <code>y</code> || <code>WCHAN</code> || Sleeping in Function || Name (or address) of the function in which the task is sleeping
|-
| <code>z</code> || <code>Flags</code> || Task Flags || Task's scheduling flags
|}
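Many of the same per-task columns are also available non-interactively via <code>ps</code>, which is handier in scripts than driving <code>top</code> interactively. A sketch using standard Linux procps column keywords (<code>pid,user,%cpu,%mem,rss,comm</code>):

```shell
# Top five tasks by resident memory (RSS), non-interactively.
# --sort=-rss puts the biggest memory consumer first; the header
# line accounts for the sixth line kept by head.
ps -eo pid,user,%cpu,%mem,rss,comm --sort=-rss | head -6
```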
=== Identify Process Causing High System Load ===
If the high load is constant, just fire up <code>top</code> and see if there is a specific process to blame, or whether you're stuck waiting for disk or network IO.
If the high load is transient but repetitive, then you'll need to capture the output of <code>top</code> at the right time. The following script appends the output of <code>top</code> to a log file during periods of high load:
<source lang="bash">#!/bin/bash
#
# During high load, write output from top to file.
#
# Simon Strutt - July 2012

LOGFILE="load_log.txt"
MAXLOAD=100      # Load average x100, as the [ ] comparison can only handle integers

LOAD=`cut -d ' ' -f 1 /proc/loadavg`
LOAD=`echo $LOAD '*100' | bc -l | awk -F '.' '{ print $1; exit; }'`    # Convert load to x100 integer

if [ $LOAD -gt $MAXLOAD ]; then
	echo `date '+%Y-%m-%d %H:%M:%S'` >> ${LOGFILE}
	top -b -n 1 >> ${LOGFILE}
fi</source>
Schedule it to run every minute with cron (running it only once an hour would miss most transient spikes):
<pre>crontab -e
* * * * * /bin/bash /home/simons/load_log</pre>
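Once the log has accumulated, you can pull out just the timestamps at which high load was recorded. A sketch assuming the <code>load_log.txt</code> file written by the script above, where each entry starts with a <code>YYYY-MM-DD HH:MM:SS</code> line from <code>date</code>:

```shell
# List when high load was logged: entry timestamps are the only
# lines in the file that start with a four-digit year and a dash.
grep '^[0-9][0-9][0-9][0-9]-' load_log.txt
```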
== Network ==
=== No NIC ===
[[Category:Ubuntu]]
[[Category:Troubleshooting]]
[[Category:Bash]]