OS환경 : Oracle Linux 6.8 (64bit)
DB 환경 : Oracle Database 11.2.0.4
방법 : 오라클 11g R2 10GB update 중 세션 킬시 발생하는 문제
11g R2 db 운영 중 10gb 의 데이터를 update 하던 중 과도하게 소요되는 시간으로 인해
강제로 kill 하는 시나리오
사전 작업
현재 메모리 값 확인
1 2 3 4 5 | SQL> show parameter memory_target NAME TYPE VALUE ------------------------------------ ----------- ------------------------------ memory_target big integer 2G |
작업 전 undo tablespace 용량 11gb로 증설
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 | SQL> set lines 200 pages 1000 col tablespace_name for a15 col file_name for a50 SQL> select tablespace_name, file_id, file_name, bytes/1024/1024 MB from dba_data_files order by 2; TABLESPACE_NAME FILE_ID FILE_NAME MB --------------- ---------- -------------------------------------------------- ---------- SYSTEM 1 /app/oracle/oradata/orcl/system01.dbf 740 SYSAUX 2 /app/oracle/oradata/orcl/sysaux01.dbf 530 UNDOTBS1 3 /app/oracle/oradata/orcl/undotbs01.dbf 90 USERS 4 /app/oracle/oradata/orcl/users01.dbf 5 SQL> alter database datafile 3 resize 11g; Database altered. |
redo log file size 증가
기존 50MB에서 1GB로 증설
1 2 3 4 5 6 7 8 9 10 11 12 13 14 | SQL> set lines 200 set pages 1000 col member for a60 select l.group#, member, archived, l.status, (bytes/1024/1024) MB from v$log l, v$logfile f where f.group# = l.group# order by 1; GROUP# MEMBER ARC STATUS MB ---------- ------------------------------------------------------------ --- ---------------- ---------- 1 /app/oracle/oradata/orcl/redo01.log YES UNUSED 1024 2 /app/oracle/oradata/orcl/redo02.log NO INACTIVE 1024 3 /app/oracle/oradata/orcl/redo03.log NO CURRENT 1024 |
https://positivemh.tistory.com/284 참조
테스트용 테이블 스페이스 생성
1 2 3 | SQL> create tablespace killtest1 datafile '/app/oracle/oradata/orcl/killtest01.dbf' size 10g; Tablespace created. |
테스트용 유저 생성
1 2 3 4 5 6 7 | SQL> create user killuser identified by killuser account unlock default tablespace killtest1 quota unlimited on killtest1; User created. SQL> grant resource, connect to killuser; Grant succeeded. |
테스트용 테이블 생성 및 데이터 삽입
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 | SQL> conn killuser/killuser Connected. SQL> create table dummy (cola number, colb number, colc number, cold number, cole number) nologging; Table created. SQL> declare type tbl_ins is table of dummy%rowtype index by binary_integer; w_ins tbl_ins; begin for i in 1..10000000 loop w_ins(i).cola :=i; w_ins(i).colb :=10; w_ins(i).colc :=i+1; w_ins(i).cold :=i+3; w_ins(i).cole :=99; end loop; forall i in 1..10000000 insert into dummy values w_ins(i); commit; end; / PL/SQL procedure successfully completed. |
새로운 세션(2번 세션)으로 용량확인
1 2 3 4 5 6 7 8 9 10 11 12 | SQL> col owner for a10 col segment_name for a10 col segment_type for a10 select owner,segment_name,segment_type,sum(bytes)/1024/1024 as MB from dba_segments where owner='KILLUSER' GROUP BY owner,segment_name,segment_type; OWNER SEGMENT_NA SEGMENT_TY MB ---------- ---------- ---------- ---------- KILLUSER DUMMY TABLE 312 |
1번 실행시 312MB가 쌓임
추가 데이터 삽입(insert 5번 실행 후 commit)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 | SQL> insert into dummy select * from dummy; 10000000 rows created. SQL> / 20000000 rows created. SQL> / 40000000 rows created. SQL> / 80000000 rows created. SQL> / 160000000 rows created. SQL> commit; Commit complete. |
2번 세션으로 용량확인
1 2 3 4 5 6 7 8 9 10 11 12 | SQL> col owner for a10 col segment_name for a10 col segment_type for a10 select owner,segment_name,segment_type,sum(bytes)/1024/1024 as MB from dba_segments where owner='KILLUSER' GROUP BY owner,segment_name,segment_type; OWNER SEGMENT_NA SEGMENT_TY MB ---------- ---------- ---------- ---------- KILLUSER DUMMY TABLE 9920 |
약 10GB가 쌓임
데이터 건수 확인
1 2 3 4 5 | SQL> select count(*) from dummy; COUNT(*) ---------- 320000000 |
3억2천개 rows가 생성됨
테이블 nologging 해제
1 2 3 | SQL> alter table dummy logging; Table altered. |
10gb update 작업 진행
현재 시간 확인
1 2 3 4 5 6 7 8 9 | SQL> alter session set nls_date_format = 'YYYY/MM/DD hh24:mi:ss'; Session altered. SQL> select sysdate from dual; SYSDATE ------------------- 2020/12/22 17:43:58 |
dummy 테이블의 데이터 update 진행
set timing on 후 속도 테스트
1 2 3 4 | SQL> update dummy set cola=999, colb=20, colc=1000, cold=0, cole=10; . . (진행중) |
약 00분 소요됨
2번 세션으로 현재 시간 확인
1 2 3 4 5 6 7 8 9 | SQL> alter session set nls_date_format = 'YYYY/MM/DD hh24:mi:ss'; Session altered. SQL> select sysdate from dual; SYSDATE ------------------- 2020/12/22 18:12:58 |
약 30분 정도 지남
(update 계속 진행중)
세션 2로 세션 1 sid, serial# 확인
1 2 3 4 5 | SQL> select username, sid, serial# from v$session where username = 'KILLUSER'; USERNAME SID SERIAL# ------------ ---------- ---------- KILLUSER 21 43 |
세션 2에서 세션 1을 kill 시도
1 2 3 4 5 | SQL> alter system kill session '21, 43'; alter system kill session '21, 43' * ERROR at line 1: ORA-00031: session marked for kill |
ORA-00031 메세지가 발생함
kill immediate 명령 시도
1 2 3 4 5 | SQL> alter system kill session '21, 43' immediate; alter system kill session '21, 43' immediate * ERROR at line 1: ORA-00031: session marked for kill |
ORA-00031 메세지가 발생함
세션 상태 확인
1 2 3 4 5 | SQL> select sid, serial#, username, status from v$session where sid=21; SID SERIAL# USERNAME STATUS ---------- ---------- ------------------------------ -------- 21 43 KILLUSER KILLED |
status가 killed 로 표시됨
세션 3으로 undo 사용량 확인
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 | SQL> set lines 200 pages 1000 col SID_SERIAL for a10 col ORAUSER for a10 col PROGRAM for a30 col UNDOSEG for a30 col undo for a10 SELECT TO_CHAR(s.sid)||',' ||TO_CHAR(s.serial#) sid_serial, NVL(s.username, 'None') orauser, s.program, r.name undoseg, t.used_ublk * TO_NUMBER(x.value)/1024||'K' "Undo" FROM sys.v_$rollname r, sys.v_$session s, sys.v_$transaction t, sys.v_$parameter x WHERE s.taddr = t.addr AND r.usn = t.xidusn(+) AND x.name = 'db_block_size' / SID_SERIAL ORAUSER PROGRAM UNDOSEG Undo ---------- ---------- ------------------------------ ------------------------------ ---------- 21,43 KILLUSER sqlplus@oracle50 (TNS V1-V3) _SYSSMU4_1254879796$ 13499480K |
killuser 세션이 undo 사용중
롤백 진행률 확인
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 | SQL> col OPNAME for a10 col TARGET for a20 col SOFAR for 999999999 col UNITS for a10 select OPNAME,TARGET,SOFAR,TOTALWORK,UNITS,START_TIME,TIME_REMAINING, ELAPSED_SECONDS from v$session_longops where username='KILLUSER' / OPNAME TARGET SOFAR TOTALWORK UNITS START_TIME TIME_REMAINING ELAPSED_SECONDS ---------- -------------------- ---------- ---------- ---------- ------------------- -------------- --------------- Table Scan KILLUSER.DUMMY 79457 79457 Blocks 2020/12/22 17:29:39 0 8 Table Scan KILLUSER.DUMMY 163196 163196 Blocks 2020/12/22 17:29:49 0 15 Table Scan KILLUSER.DUMMY 318692 318692 Blocks 2020/12/22 17:30:09 0 33 Table Scan KILLUSER.DUMMY 637867 637867 Blocks 2020/12/22 17:30:53 0 73 Table Scan KILLUSER.DUMMY 1268034 1268034 Blocks 2020/12/22 17:33:37 0 18 Table Scan KILLUSER.DUMMY 1268034 1268034 Blocks 2020/12/22 17:38:02 0 77 Table Scan KILLUSER.DUMMY 1268034 1268034 Blocks 2020/12/22 17:43:41 0 895 7 rows selected. |
컬럼 설명에 따르면 모두 완료가 되었고 TIME_REMAINING 이 모두 0임
각 컬럼 설명
TIME_REMAINING : 예상된 남겨진 시간을 보여준다. (초단위)
ELAPSED_SECONDS : 이제까지 수행한 시간을 보여준다. (초단위)
TOTALWORK : 수행하는 작업의 전체 양을 보여준다.
SOFAR : 이제까지 수행한 작업의 양을 보여준다.
UNITS : 작업한 양의 단위를 보여준다.
LAST_UPDATE_TIME : view의 내용이 현재것인지를 판단하게 한다.
하지만 alert log에선 계속 log switch 를 하였다는 로그가 발생함
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 | $ cd /app/oracle/diag/rdbms/orcl/orcl/trace $ tail -f alert_orcl.log Tue Dec 22 18:18:11 2020 Thread 1 advanced to log sequence 129 (LGWR switch) Current log# 3 seq# 129 mem# 0: /app/oracle/oradata/orcl/redo03.log Tue Dec 22 18:18:44 2020 Thread 1 advanced to log sequence 130 (LGWR switch) Current log# 1 seq# 130 mem# 0: /app/oracle/oradata/orcl/redo01.log Tue Dec 22 18:19:17 2020 Thread 1 advanced to log sequence 131 (LGWR switch) Current log# 2 seq# 131 mem# 0: /app/oracle/oradata/orcl/redo02.log Tue Dec 22 18:19:50 2020 Thread 1 advanced to log sequence 132 (LGWR switch) Current log# 3 seq# 132 mem# 0: /app/oracle/oradata/orcl/redo03.log . . |
OS Kill 을 위해 os pid 확인
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 | SQL> set lines 200 pages 1000 SELECT X.SID, X.USERNAME, X.OSUSER, X.PROCESS AS FG_PID, Y.SPID BG_PID FROM V$SESSION X, V$PROCESS Y WHERE X.PADDR = Y.ADDR AND X.USERNAME IS NOT NULL / SID USERNAME OSUSER FG_PID BG_PID ---------- ------------------------------ ------------------------------ ------------------------ ------------------------ 36 SYS oracle 10088 10089 21 KILLUSER oracle 9857 9937 32 SYS oracle 10412 10413 |
os kill 명령 실행(kill -9 BG_PID)
1 | $ kill -9 9937 |
세션 1 확인
1 2 3 4 5 6 7 | SQL> update dummy set cola=999, colb=20, colc=1000, cold=0, cole=10; update dummy set cola=999, colb=20, colc=1000, cold=0, cole=10 * ERROR at line 1: ORA-03113: end-of-file on communication channel Process ID: 9937 Session ID: 21 Serial number: 43 |
kill 됨
update는 끝났지만 alert log 확인 시 log switch가 계속 진행됨을 확인할 수 있음(세션2)
1 2 3 4 5 6 7 8 9 10 11 12 13 | $ cd /app/oracle/diag/rdbms/orcl/orcl/trace $ tail -f alert_orcl.log Tue Dec 22 18:26:06 2020 Thread 1 advanced to log sequence 143 (LGWR switch) Current log# 2 seq# 143 mem# 0: /app/oracle/oradata/orcl/redo02.log Tue Dec 22 18:26:39 2020 Thread 1 advanced to log sequence 144 (LGWR switch) Current log# 3 seq# 144 mem# 0: /app/oracle/oradata/orcl/redo03.log Tue Dec 22 18:27:12 2020 Thread 1 advanced to log sequence 145 (LGWR switch) Current log# 1 seq# 145 mem# 0: /app/oracle/oradata/orcl/redo01.log . . |
이 때 shutdown immediate 시도(세션4)
1 2 3 4 | SQL> shutdown immediate . . (반응없음) |
이 때 alert log 확인
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 | $ cd /app/oracle/diag/rdbms/orcl/orcl/trace $ tail -f alert_orcl.log Tue Dec 22 18:27:57 2020 Shutting down instance (immediate) Stopping background process SMCO Shutting down instance: further logons disabled Stopping background process QMNC Tue Dec 22 18:28:02 2020 Stopping background process CJQ0 Stopping background process MMNL Stopping background process MMON License high water mark = 6 All dispatchers and shared servers shutdown Tue Dec 22 18:28:07 2020 ALTER DATABASE CLOSE NORMAL Tue Dec 22 18:28:27 2020 Thread 1 advanced to log sequence 147 (LGWR switch) Current log# 3 seq# 147 mem# 0: /app/oracle/oradata/orcl/redo03.log Tue Dec 22 18:29:06 2020 Thread 1 advanced to log sequence 148 (LGWR switch) Current log# 1 seq# 148 mem# 0: /app/oracle/oradata/orcl/redo01.log |
immediate 로 종료하였지만 내려가는듯 하더니 16번째 줄에서 log switch를 다시 시작함
shutdown immediate 입력한 세션 종료를 위해 Ctrl+C, Ctrl+Z(종료) 를 입력하였지만 꺼지지 않음
1 2 3 4 | SQL> shutdown immediate ^C ^Z ^C |
세션 3으로 shutdown abort 실행
1 2 | SQL> shutdown abort ORACLE instance shut down. |
다시 db 기동
1 2 3 4 5 6 7 8 9 10 | SQL> startup ORACLE instance started. Total System Global Area 2137886720 bytes Fixed Size 2254952 bytes Variable Size 1342179224 bytes Database Buffers 788529152 bytes Redo Buffers 4923392 bytes Database mounted. Database opened. |
정상적으로 open 됨
alert log 확인
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 | $ cd /app/oracle/diag/rdbms/orcl/orcl/trace $ tail -f alert_orcl.log Tue Dec 22 18:32:17 2020 ALTER DATABASE OPEN Beginning crash recovery of 1 threads Started redo scan Completed redo scan read 132698 KB redo, 9848 data blocks need recovery Started redo application at Thread 1: logseq 150, block 8194 Recovery of Online Redo Log: Thread 1 Group 3 Seq 150 Reading mem 0 Mem# 0: /app/oracle/oradata/orcl/redo03.log Completed redo application of 101.72MB Completed crash recovery at Thread 1: logseq 150, block 273591, scn 2649617 9848 data blocks read, 7885 data blocks written, 132698 redo k-bytes read Thread 1 advanced to log sequence 151 (thread open) Thread 1 opened at log sequence 151 Current log# 1 seq# 151 mem# 0: /app/oracle/oradata/orcl/redo01.log Successful open of redo thread 1 MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set SMON: enabling cache recovery [10561] Successfully onlined Undo Tablespace 2. Undo initialization finished serial:0 start:5943974 end:5943994 diff:20 (0 seconds) Verifying file header compatibility for 11g tablespace encryption.. Verifying 11g file header compatibility for tablespace encryption completed SMON: enabling tx recovery |
기동 중 crash recovery 를 하고 redo apply를 101mb 정도 함
기동이 다 되고 난뒤 다시 log switch 를 진행함
1 2 3 4 5 6 7 8 9 10 11 | $ cd /app/oracle/diag/rdbms/orcl/orcl/trace $ tail -f alert_orcl.log Tue Dec 22 18:32:52 2020 Thread 1 advanced to log sequence 152 (LGWR switch) Current log# 2 seq# 152 mem# 0: /app/oracle/oradata/orcl/redo02.log Tue Dec 22 18:33:25 2020 Thread 1 advanced to log sequence 153 (LGWR switch) Current log# 3 seq# 153 mem# 0: /app/oracle/oradata/orcl/redo03.log Tue Dec 22 18:34:01 2020 Thread 1 advanced to log sequence 154 (LGWR switch) Current log# 1 seq# 154 mem# 0: /app/oracle/oradata/orcl/redo01.log |
오라클 프로세스 확인
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 | SQL> set lines 200 pages 1000 select program, sid, serial#, username, status from v$session; PROGRAM SID SERIAL# USERNAME STATUS ------------------------------------------------ ---------- ---------- ------------------------------ -------- sqlplus@oracle50 (TNS V1-V3) 1 5 SYS ACTIVE oracle@oracle50 (PMON) 2 1 ACTIVE oracle@oracle50 (PSP0) 3 1 ACTIVE oracle@oracle50 (VKTM) 4 1 ACTIVE oracle@oracle50 (GEN0) 5 1 ACTIVE oracle@oracle50 (DIAG) 6 1 ACTIVE oracle@oracle50 (DBRM) 7 1 ACTIVE oracle@oracle50 (DIA0) 8 1 ACTIVE oracle@oracle50 (MMAN) 9 1 ACTIVE oracle@oracle50 (DBW0) 10 1 ACTIVE oracle@oracle50 (LGWR) 11 1 ACTIVE oracle@oracle50 (CKPT) 12 1 ACTIVE oracle@oracle50 (SMON) 13 1 ACTIVE oracle@oracle50 (RECO) 14 1 ACTIVE oracle@oracle50 (MMON) 15 1 ACTIVE oracle@oracle50 (MMNL) 16 1 ACTIVE oracle@oracle50 (CJQ0) 17 11 ACTIVE oracle@oracle50 (SMCO) 18 9 ACTIVE oracle@oracle50 (W000) 20 19 ACTIVE oracle@oracle50 (QMNC) 21 3 ACTIVE oracle@oracle50 (J000) 25 33 ACTIVE oracle@oracle50 (Q000) 28 3 ACTIVE oracle@oracle50 (Q001) 31 1 ACTIVE oracle@oracle50 (J001) 32 9 ACTIVE 24 rows selected. |
KILLUSER 세션은 없고 시스템 프로세스만 돌아가고 있음
몇분 뒤 alert log에 아래와 같은 메세지가 발생하고 더이상 log switch가 발생하지 않음
1 2 3 4 5 6 7 | $ cd /app/oracle/diag/rdbms/orcl/orcl/trace $ tail -f alert_orcl.log Tue Dec 22 18:36:16 2020 SMON: Parallel transaction recovery tried Starting background process SMCO Tue Dec 22 18:36:16 2020 SMCO started with pid=22, OS id=10605 |
SMON: Parallel transaction recovery tried 이 발생했다면 확인해봐야 할 뷰
v$fast_start_servers 뷰 또는 v$fast_start_transactions 뷰 확인
1 2 3 4 5 6 7 8 9 10 11 | SQL> select * from v$fast_start_transactions; USN SLT SEQ STATE UNDOBLOCKSDONE UNDOBLOCKSTOTAL PID CPUTIME PARENTUSN PARENTSLT PARENTSEQ XID PXI RCVSERVERS ---------- ---------- ---------- ---------------- -------------- --------------- ---------- ---------- ---------- ---------- ---------- ---------------- ---------------- ---------- 4 15 744 RECOVERED 353217 353217 238 04000F00E8020000 1 SQL> select RCVSERVERS from v$fast_start_transactions; RCVSERVERS ----------- 1 |
RCVSERVERS 컬럼의 값이 1로만 나타나면 트랜잭션이
가장 비효율적인 방식인 SMON 프로세스를 통해 직렬로 롤백되고 있는 것임
이 경우 초기화 매개변수 RECOVERY_PARALLELISM을 0과 1을 제외한 값으로 수정한 다음
병렬 롤백 인스턴스를 다시 시작할 수 있음
그런 다음, ALTER SYSTEM SET FAST_START_PARALLEL_ROLLBACK = HIGH를 실행하면
CPU 수의 네 배나 되는 병렬 서버를 생성할 수 있음
결론 : 오랜 시간동안 update 중인 세션을 kill 했을 경우 ORA-00031 이 발생하고
가장 좋은 방법은 자동으로 kill 이 끝날 때 까지 기다리는 것이고
불가피한 경우 OS kill 명령을 통해 kill 이 가능함
kill을 하더라도 log switch는 한동안 계속됨 이 때 db를 종료할 경우(immediate 또는 abort)
db 재기동 이후에도 내부적으로 계속 rollback이 진행되고 이로 인해 log switch가 계속 발생함
*alter system session kill 은 세션을 직접적으로 kill 하는 명령이 아닌
해당 세션이 자살하게끔 신호(세션 상태 object에 비트를 설정)를 보내는것이기 때문에 ORA-00031 같은 메세지가 발생할 수 있는것임
참조 : https://positivemh.tistory.com/183
https://positivemh.tistory.com/202
https://positivemh.tistory.com/182'ORACLE > Admin' 카테고리의 다른 글
오라클 11g R2 Temp Tablespace 재생성 (2) | 2020.11.03 |
---|---|
오라클 설치 이후 uid, gid 변경 가이드 (재설치) (0) | 2020.10.07 |
오라클 11g R2 /etc/hosts 권한 부족시 발생하는 문제 (0) | 2020.09.18 |
오라클 19c /etc/hosts 권한 부족시 발생하는 문제 (0) | 2020.09.18 |
오라클 19c 리스너 상태 READY와 UNKNOWN 차이 (0) | 2020.09.15 |