Troubleshooting (Ubuntu)

== High System Load ==
The system load is normally expressed as the load average over the last 1, 5 and 15 minutes.
For example, the <code>uptime</code> command gives a single-line summary of system uptime and recent load:
<pre>
user@server:~$ uptime
14:28:49 up 9 days, 22:41,  1 user,  load average: 0.34, 0.36, 0.32
</pre>
So in the above, as of 14:28:49 the server has been up for 9 days, 22 hours and 41 minutes, has 1 user logged in, and the load averages for the past 1, 5 and 15 minutes are 0.34, 0.36 and 0.32 respectively.
The load average for a given period indicates how many processes, on average, were either running or in an uninterruptible (waiting for IO) state.  What counts as bad depends on your system: on a single-CPU system a load average greater than 1 could be considered bad, as there are more runnable processes than CPUs to service them.
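A quick way to put the load average in context is to compare it against the number of CPUs the system actually has; a rough sketch (the output values below are just illustrative):
<pre>
user@server:~$ nproc
2
user@server:~$ cat /proc/loadavg
0.34 0.36 0.32 1/137 6608
</pre>
If the 1 and 5 minute figures sit persistently above the <code>nproc</code> count, processes are queuing for CPU (or stuck waiting on IO) and it's worth digging further with <code>top</code>.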
=== <code>top</code> ===
The <code>top</code> command gives some basic insight into the system's performance, and is akin to the Task Manager in Windows.
<pre>
user@server:~$ top
top - 14:32:09 up 9 days, 22:44,  1 user,  load average: 0.70, 0.44, 0.34
Tasks: 137 total,  1 running, 136 sleeping,  0 stopped,  0 zombie
Cpu(s): 93.8%us,  6.2%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  1023360k total,  950520k used,    72840k free,    10836k buffers
Swap:  1757176k total,  1110228k used,  646948k free,  135524k cached
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
6608 zimbra    20  0  556m  69m  12m S 69.1  6.9  0:03.26 java
17284 zimbra    20  0  649m 101m 3604 S  4.6 10.1  31:34.74 java
2610 zimbra    20  0  976m 181m 3700 S  0.7 18.1 133:06.68 java
    1 root      20  0 23580 1088  732 S  0.0  0.1  0:04.70 init
    2 root      20  0    0    0    0 S  0.0  0.0  0:00.01 kthreadd
    3 root      RT  0    0    0    0 S  0.0  0.0  0:00.00 migration/0
....
</pre>
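In the sample above the busy processes all belong to the <code>zimbra</code> user; <code>top</code> can be limited to a single user's tasks with <code>-u</code>, which can help narrow things down (a rough example, re-using the user from the output above):
<pre>
user@server:~$ top -b -n 1 -u zimbra
</pre>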
Note that CPU usage figures are with respect to a single CPU, so on a multi-CPU system seeing values > 100% is valid.
{|class="vwikitable"
|+ Overview of CPU Metrics, % over time
! Code  !! Name !! Description
|-
| <code>us</code> || User CPU || % of CPU time spent servicing user processes (excluding nice)
|-
| <code>sy</code> || System CPU || % of CPU time spent servicing kernel processes
|-
| <code>ni</code> || Nice CPU || % of CPU time spent servicing user nice processes (nice reduces the priority of a process)
|-
| <code>id</code> || Idle CPU || % of CPU time spent idling (doing nothing)
|-
| <code>wa</code> || IO Wait || % of CPU time spent waiting for IO (high indicates disk/network bottleneck)
|-
| <code>hi</code> || Hardware Interrupts || % of CPU time spent servicing hardware interrupts
|-
| <code>si</code> || Software Interrupts || % of CPU time spent servicing software interrupts
|-
| <code>st</code> || Steal || % of CPU time stolen by the hypervisor to service other virtual machines
|}
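To keep an eye on just these percentages without the full interactive display, the summary line can be pulled out of a single batch-mode run of <code>top</code> (the figures below are simply those from the earlier sample):
<pre>
user@server:~$ top -b -n 1 | grep '^Cpu(s)'
Cpu(s): 93.8%us,  6.2%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
</pre>
A consistently high <code>wa</code> figure points at a disk or network bottleneck rather than a lack of CPU.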
{|class="vwikitable"
|+ Task column heading descriptions (to change which columns are shown, press <code>f</code>)
! Key !! Display  !! Name !! Description
|-
| <code>a</code> || <code>PID</code> || Process ID || Task/process identifier
|-
| <code>b</code> || <code>PPID</code> || Parent PID || Task/process identifier of the process's parent (ie the process that launched this one)
|-
| <code>c</code> || <code>RUSER</code> || Real User Name || Real username of task's owner
|-
| <code>d</code> || <code>UID</code> || User ID || User ID of task's owner
|-
| <code>e</code> || <code>USER</code> || User Name || Effective username of task's owner
|-
| <code>f</code> || <code>GROUP</code> || Group Name || Group name of task's owner
|-
| <code>g</code> || <code>TTY</code> || Controlling TTY || The controlling terminal (device) the process was started from
|-
| <code>h</code> || <code>PR</code> || Priority || The task's priority
|-
| <code>i</code> || <code>NI</code> || Nice value || Adjusted task priority. From -20 meaning high priority, through 0 meaning unadjusted, to 19 meaning low priority
|-
| <code>j</code> || <code>P</code> || Last Used CPU || ID of the CPU last used by the task
|-
| <code>k</code> || <code>%CPU</code> || CPU Usage || Task's usage of CPU
|-
| <code>l</code> || <code>TIME</code> || CPU Time || Total CPU time used by the task
|-
| <code>m</code> || <code>TIME+</code> || CPU Time, hundredths || Total CPU time used by the task in sub-second accuracy
|-
| <code>n</code> || <code>%MEM</code> || Memory usage (RES) || Task's usage of available physical memory
|-
| <code>o</code> || <code>VIRT</code> || Virtual Image (kb) || Task's allocation of virtual memory
|-
| <code>p</code> || <code>SWAP</code> || Swapped size (kb) || Task's swapped memory (resident in swap-file)
|-
| <code>q</code> || <code>RES</code> || Resident size (kb) || Task's unswapped memory (resident in physical memory)
|-
| <code>r</code> || <code>CODE</code> || Code size (kb) || Task's virtual memory used for executable code
|-
| <code>s</code> || <code>DATA</code> || Data+Stack size (kb) || Task's virtual memory not used for executable code
|-
| <code>t</code> || <code>SHR</code> || Shared Mem size (kb) || Task's shared memory
|-
| <code>u</code> || <code>nFLT</code> || Page Fault count || Major/Hard page faults that have occurred for the task
|-
| <code>v</code> || <code>nDRT</code> || Dirty Pages count || Task's memory pages that have been modified since last written to disk, and so must be written out before that physical memory can be reused
|-
| <code>w</code> || <code>S</code> || Process Status ||
* D - Uninterruptible sleep
* R - Running
* S - Sleeping
* T - Traced or Stopped
* Z - Zombie
|-
| <code>x</code> || <code>Command</code> || Command Line || Command used to start task
|-
| <code>y</code> || <code>WCHAN</code> || Sleeping in Function || Name (or address) of function that the task is sleeping in
|-
| <code>z</code> || <code>Flags</code> || Task Flags || Task's scheduling flags
|}
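Within an interactive <code>top</code> session the process list can be re-sorted to bring the likely culprit to the top: <code>Shift+P</code> sorts by CPU usage and <code>Shift+M</code> by memory usage. Once a suspect PID has been spotted, <code>top</code> can also be pointed at just that task to watch its columns over time (the PID below is simply one from the earlier sample):
<pre>
user@server:~$ top -p 2610
</pre>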
=== Identify Process Causing High System Load ===
If the high load is constant, just fire up <code>top</code> and see if there is a specific process to blame, or if you're stuck waiting for disk or network IO.
If the high load is transient but repetitive, then you'll need to capture the output of <code>top</code> at the right time. The following script will create a log of <code>top</code> output during periods of high load:
<source lang="bash">#!/bin/bash
#
# During high load, write output from top to file.
#
# Simon Strutt - July 2012
LOGFILE="load_log.txt"
MAXLOAD=100                    # Multiplied by 100, as the if comparison can only handle integers
LOAD=`cut -d ' ' -f 1 /proc/loadavg`
LOAD=`echo $LOAD '*100' | bc -l | awk -F '.' '{ print $1; exit; }'`    # Convert load to x100 integer
if [ $LOAD -gt $MAXLOAD ]; then
        echo `date '+%Y-%m-%d %H:%M:%S'`>> ${LOGFILE}
        top -b -n 1 >> ${LOGFILE}
fi</source>
Schedule with something like...
<pre>crontab -e
1 * * * * /bin/bash  /home/simons/load_log</pre>
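Note that <code>LOGFILE</code> is a relative path, so the log ends up in whatever directory the script is run from (normally the user's home directory when launched from cron). To sample every minute rather than once an hour, the schedule can simply be tightened to <code>* * * * *</code>, and the captured output reviewed with something like:
<pre>tail -n 40 load_log.txt</pre>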
== Network ==
=== No NIC ===
[[Category:Ubuntu]]
[[Category:Troubleshooting]]
[[Category:Bash]]
