平台:sunos 5.8 generic_108528-23 sun4u sparc sunw,ultra-enterprise
数据库:8.1.5.0.0
症状:响应缓慢,应用请求已经无法返回
登陆数据库,发现redo日志组除current外都处于active状态
oracle:/oracle/oracle8>sqlplus "/ as sysdba"
sql*plus: release 8.1.5.0.0 - production on thu jun 23 18:56:06 2005
(c) copyright 1999 oracle corporation. all rights reserved.
connected to:
oracle8i enterprise edition release 8.1.5.0.0 - production
with the partitioning and java options
pl/sql release 8.1.5.0.0 - production
sql> select * from v$log;
group# thread# sequence# bytes members arc status first_change# first_tim
---------- ---------- ---------- ---------- ---------- --- ---------------- ------------- ---------
1 1 520403 31457280 1 no active 1.3861e+10 23-jun-05
2 1 520404 31457280 1 no active 1.3861e+10 23-jun-05
3 1 520405 31457280 1 no active 1.3861e+10 23-jun-05
4 1 520406 31457280 1 no current 1.3861e+10 23-jun-05
5 1 520398 31457280 1 no active 1.3860e+10 23-jun-05
6 1 520399 31457280 1 no active 1.3860e+10 23-jun-05
7 1 520400 104857600 1 no active 1.3860e+10 23-jun-05
8 1 520401 104857600 1 no active 1.3860e+10 23-jun-05
9 1 520402 104857600 1 no active 1.3861e+10 23-jun-05
9 rows selected.
sql> /
group# thread# sequence# bytes members arc status first_change# first_tim
---------- ---------- ---------- ---------- ---------- --- ---------------- ------------- ---------
1 1 520403 31457280 1 no active 1.3861e+10 23-jun-05
2 1 520404 31457280 1 no active 1.3861e+10 23-jun-05
3 1 520405 31457280 1 no active 1.3861e+10 23-jun-05
4 1 520406 31457280 1 no current 1.3861e+10 23-jun-05
5 1 520398 31457280 1 no active 1.3860e+10 23-jun-05
6 1 520399 31457280 1 no active 1.3860e+10 23-jun-05
7 1 520400 104857600 1 no active 1.3860e+10 23-jun-05
8 1 520401 104857600 1 no active 1.3860e+10 23-jun-05
9 1 520402 104857600 1 no active 1.3861e+10 23-jun-05
9 rows selected.
如果日志都处于active状态,那么显然dbwr的写已经无法跟上log switch触发的检查点。
接下来让我们检查一下dbwr的繁忙程度:
sql> !
oracle:/oracle/oracle8>ps -efgrep ora_
oracle 2273 1 0 mar 31 ? 57:40 ora_smon_hysms02
oracle 2266 1 0 mar 31 ? 811:42 ora_dbw0_hysms02
oracle 2264 1 16 mar 31 ? 16999:57 ora_pmon_hysms02
oracle 2268 1 0 mar 31 ? 1649:07 ora_lgwr_hysms02
oracle 2279 1 0 mar 31 ? 8:09 ora_snp1_hysms02
oracle 2281 1 0 mar 31 ? 4:22 ora_snp2_hysms02
oracle 2285 1 0 mar 31 ? 9:40 ora_snp4_hysms02
oracle 2271 1 0 mar 31 ? 15:57 ora_ckpt_hysms02
oracle 2283 1 0 mar 31 ? 5:37 ora_snp3_hysms02
oracle 2277 1 0 mar 31 ? 5:58 ora_snp0_hysms02
oracle 2289 1 0 mar 31 ? 0:00 ora_d000_hysms02
oracle 2287 1 0 mar 31 ? 0:00 ora_s000_hysms02
oracle 2275 1 0 mar 31 ? 0:04 ora_reco_hysms02
oracle 21023 21012 0 18:52:59 pts/65 0:00 grep ora_
dbwr的进程号是2266。
使用top命令观察一下:
oracle:/oracle/oracle8>top
last pid: 21145; load averages: 3.38, 3.45, 3.67 18:53:38
725 processes: 711 sleeping, 1 running, 10 zombie, 3 on cpu
cpu states: 35.2% idle, 40.1% user, 9.4% kernel, 15.4% iowait, 0.0% swap
memory: 3072m real, 286m free, 3120m swap in use, 1146m swap free
pid username thr pri nice size res state time cpu command
11855 smspf 1 59 0 1355m 1321m cpu/0 19:32 16.52% oracle
2264 oracle 1 0 0 1358m 1316m run 283.3h 16.36% oracle
11280 oracle 1 13 0 1356m 1321m sleep 79.8h 0.77% oracle
6957 smspf 15 29 10 63m 14m sleep 107.7h 0.76% java
17393 smspf 1 30 0 1356m 1322m cpu/1 833:05 0.58% oracle
29299 smspf 5 58 0 8688k 5088k sleep 18.5h 0.38% fee_ftp_get
21043 oracle 1 43 0 3264k 2056k cpu/9 0:01 0.31% top
20919 smspf 17 29 10 63m 17m sleep 247:02 0.29% java
25124 smspf 1 58 0 16m 4688k sleep 0:35 0.25% smif_status_rec
8086 smspf 5 23 0 21m 13m sleep 41.1h 0.24% fee_file_in
16009 root 1 35 0 4920k 3160k sleep 0:03 0.21% sshd2
25126 smspf 1 58 0 1355m 1321m sleep 0:26 0.20% oracle
2266 oracle 1 60 0 1357m 1317m sleep 811:42 0.18% oracle
11628 smspf 7 59 0 3440k 2088k sleep 0:39 0.16% sgip_client_ltz
26257 smspf 82 59 0 447m 178m sleep 533:04 0.15% java
我们注意到,2266号进程消耗的cpu不过0.18%,显然并不繁忙,那么瓶颈就很可能在io上。
使用iostat工具检查io状况。
gqgai:/home/gqgai>iostat -xn 3
extended device statistics
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
......
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c0t6d0
1.8 38.4 32.4 281.0 0.0 0.7 0.0 16.4 0 29 c0t10d0
1.8 38.4 32.4 281.0 0.0 0.5 0.0 13.5 0 27 c0t11d0
24.8 61.3 1432.4 880.1 0.0 0.5 0.0 5.4 0 26 c1t1d0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 9.1 0 0 hurraysms02:vold(pid238)
extended device statistics
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
........
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c0t6d0
0.3 8.3 0.3 47.0 0.0 0.1 0.0 9.2 0 8 c0t10d0
0.0 8.3 0.0 47.0 0.0 0.1 0.0 8.0 0 7 c0t11d0
11.7 65.3 197.2 522.2 0.0 1.6 0.0 20.5 0 100 c1t1d0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 hurraysms02:vold(pid238)
extended device statistics
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
........
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c0t6d0
0.3 13.7 2.7 68.2 0.0 0.2 0.0 10.9 0 12 c0t10d0
0.0 13.7 0.0 68.2 0.0 0.1 0.0 9.6 0 11 c0t11d0
11.3 65.3 90.7 522.7 0.0 1.5 0.0 19.5 0 99 c1t1d0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 hurraysms02:vold(pid238)
extended device statistics
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
........
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c0t6d0
0.0 8.0 0.0 42.7 0.0 0.1 0.0 9.3 0 7 c0t10d0
0.0 8.0 0.0 42.7 0.0 0.1 0.0 9.1 0 7 c0t11d0
11.0 65.7 978.7 525.3 0.0 1.4 0.0 17.7 0 99 c1t1d0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 hurraysms02:vold(pid238)
extended device statistics
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
........
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c0t6d0
0.3 87.7 2.7 433.7 0.0 2.2 0.0 24.9 0 90 c0t10d0
0.0 88.3 0.0 436.5 0.0 1.8 0.0 19.9 0 81 c0t11d0
89.0 54.0 725.4 432.0 0.0 2.1 0.0 14.8 0 100 c1t1d0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 hurraysms02:vold(pid238)
我们注意到,存放数据库的主要卷c1t1d0的繁忙程度始终处于99~100,而写速度却只有500k/s左右,这个速度是极为缓慢的。
(%b percent of time the disk is busy (transactions in progress)
kw/s kilobytes written per second)
根据我们的常识t3盘阵通常按char写速度可以达到10m/s左右,以前测试过一些tpcc指标,可以参考:use bonnie to test system io speed。
而正常情况下的数据库随机写通常都在1~2m左右,显然此时的磁盘已经处于不正常状态,经过确认的确是硬盘发生了损坏,raid5的group中损坏了一块硬盘。
经过更换以后系统逐渐恢复正常。
posted by eygle at june 26, 2005
Java Asp PHP .Net XML C/C++ CGI VB Jsp J2ee J2se J2me EJB Servlet Tomcat Resin Struts Weblogic Eclipse ANT GUI JMS Web servise IDEA Webphere Hibernate Spring Jboss Applet Swing Socket Javamail Perl Ajax P2P 安全 模式 框架 测试 开源 游戏
Windows XP Windows 2000 Windows 2003 Windows Me Windows 9.x Linux UNIX 注册表 操作系统 服务器 应用服务器