Entries tagged as Audit Vault
Related tags
OCT 13: Adding a new Oracle host to Audit Vault
[AV Server] is your Audit Vault Server host - one of these.
[Agent] is the server where your the databases to be audited are hosted - probably quite a few of these.
[Source Database] is one of the many database instances running on the Agent server(s) that you want to collect audit information from.
Where you need to supply your own values, these are shown in italics. I've assumed here that I want to audit a PROD1 database on the prod host.
1. [AV Server] Add the new agent.
2. [Agent] Install the Agent software. You'll need the agent details that you added to the AV Server in step 1.
3. [Source Database] Create an account in the database that you want to collect audit data from and then assign the correct privileges to it. (Note that you only need to run the last command if you'll be using the Redo Collector)
4. [AV Server] Verify that source database configuration is ok.
5. [AV Server] Add Source
6. [AV Server] Add Collectors. There are three collector types you could add. Here I'm just adding the DB_AUD and OS_AUD collectors.
7. [Agent] Complete Source configuration. This sets up tnsnames.ora and Wallet entries on the Agent machine.
8. [AV Server] Start Agent
Thanks again to Tammy for her help.
[Agent] is the server where your the databases to be audited are hosted - probably quite a few of these.
[Source Database] is one of the many database instances running on the Agent server(s) that you want to collect audit information from.
Where you need to supply your own values, these are shown in italics. I've assumed here that I want to audit a PROD1 database on the prod host.
1. [AV Server] Add the new agent.
avca add_agent -agentname agent_prod -agenthost prod AVCA started Adding agent... Enter agent user name: agent_prod Enter agent user password: GHt5_5dhY Re-enter agent user password: GHt5_5dhY Agent added successfully.
2. [Agent] Install the Agent software. You'll need the agent details that you added to the AV Server in step 1.
3. [Source Database] Create an account in the database that you want to collect audit data from and then assign the correct privileges to it. (Note that you only need to run the last command if you'll be using the Redo Collector)
create user avsrc_prod identified by avpwd1; @/oem/oracle/product/av/scripts/streams/source/zarsspriv.sql avsrc_prod setup; @/oem/oracle/product/av/scripts/streams/source/zarsspriv.sql avsrc_prod redo_coll;
4. [AV Server] Verify that source database configuration is ok.
avorcldb verify -src prod:1521:PROD1.WORLD -colltype ALL
5. [AV Server] Add Source
avorcldb add_source -src prod:1521:PROD1.WORLD -desc PROD1 -agentname agent_prod
6. [AV Server] Add Collectors. There are three collector types you could add. Here I'm just adding the DB_AUD and OS_AUD collectors.
avorcldb add_collector -srcname PROD1.WORLD -agentname agent_prod -colltype OSAUD -orclhome /oem/oracle/product/10.2.0.4 -collname PROD1_OSAUD avorcldb add_collector -srcname PROD1.WORLD -agentname agent_prod -colltype DBAUD -collname PROD1_DBAUD
7. [Agent] Complete Source configuration. This sets up tnsnames.ora and Wallet entries on the Agent machine.
avorcldb setup -srcname PROD1.WORLD -srcusr -wpwd GHt5_5dhY
8. [AV Server] Start Agent
avctl start_agent -agentname agent_prod
9. [AV Server] Start Collectors
avctl start_collector -collname PROD1_OSAUD -srcname PROD1.WORLD avctl start_collector -collname PROD1_DBAUD -srcname PROD1.WORLD
Thanks again to Tammy for her help.
OCT 13: Hanging Audit Vault Warehouse Refresh Job
It was just a Pre-Production system that we could play around with but I noticed that there were no recent audit trail entries when I tried some tests. On closer inspection, the most recent audit record was a couple of days old, so it looked like the job which inserts new audit records into the Audit Vault Data Warehouse wasn't running properly. The first step was to connect to that database instance and check the status of the job.
OK, so the Warehouse Refresh job has been running for a few days (2 days, 21 hours and 6 minutes). When I looked at the session for that job, it was waiting on "enq: PS - contention" which is (trumpet salute) a parallel execution wait event.
It's a bit of a red herring because it's really waiting on the PX slave, so the real event should be in that session 169.
So, looking at that made me Google around and come up with this. Which implies that I should look at bug number 5908030 which is stated as a duplicate of 5476091. Both RDBMS bugs and both will be fixed in (gulp) 11.2.
The next job was to kill the currently running refresh job and restart it, particularly as this was just Pre-Prod and the lack of audit records in the Warehouse was holding up testing. I tried various approaches to stopping the job, running from the highest level more elegant approaches, down to brute force.
I waited and tried a few more times, but that slave really didn't want to die ...
So I tried try killing the process
Now it's gone, see if parent session for refresh job is still there
No, so the job should be gone too
Well, it had restarted and been running for a minute or so. The audit trail entries started to appear in the warehouse, anyway. As I won't be working with this again for a while, I'm counting on colleagues to keep me posted about whether this keeps happening!
SQL> select owner, job_name, elapsed_time from all_scheduler_running_jobs; OWNER JOB_NAME ------------------------------ ------------------------------ ELAPSED_TIME --------------------------------------------------------------------------- AVSYS REFRESH_WAREHOUSE_DATA +002 21:06:37.19
OK, so the Warehouse Refresh job has been running for a few days (2 days, 21 hours and 6 minutes). When I looked at the session for that job, it was waiting on "enq: PS - contention" which is (trumpet salute) a parallel execution wait event.
1 select sid, username, last_call_et, event, wait_time, seconds_in_wait, state, blocking_session, blocking_session_status 2 from v$session 3* where sid=201 SQL> / SID USERNAME LAST_CALL_ET ---------- ------------------------------ ------------ EVENT WAIT_TIME ---------------------------------------------------------------- ---------- SECONDS_IN_WAIT STATE BLOCKING_SESSION BLOCKING_SE --------------- ------------------- ---------------- ----------- 201 AVSYS 249575 enq: PS - contention 0 48361 WAITING 169 VALID
It's a bit of a red herring because it's really waiting on the PX slave, so the real event should be in that session 169.
SQL> c/201/169 3* where sid=169 SQL> / SID USERNAME LAST_CALL_ET ---------- ------------------------------ ------------ EVENT WAIT_TIME ---------------------------------------------------------------- ---------- SECONDS_IN_WAIT STATE BLOCKING_SESSION BLOCKING_SE --------------- ------------------- ---------------- ----------- 169 AVSYS 249250 cursor: pin X 0 0 WAITING UNKNOWN
So, looking at that made me Google around and come up with this. Which implies that I should look at bug number 5908030 which is stated as a duplicate of 5476091. Both RDBMS bugs and both will be fixed in (gulp) 11.2.
The next job was to kill the currently running refresh job and restart it, particularly as this was just Pre-Prod and the lack of audit records in the Warehouse was holding up testing. I tried various approaches to stopping the job, running from the highest level more elegant approaches, down to brute force.
SQL> exec dbms_scheduler.stop_job('AVSYS.REFRESH_WAREHOUSE_DATA'); BEGIN dbms_scheduler.stop_job('AVSYS.REFRESH_WAREHOUSE_DATA'); END; * ERROR at line 1: ORA-27365: job has been notified to stop, but failed to do so immediately ORA-06512: at "SYS.DBMS_ISCHED", line 164 ORA-06512: at "SYS.DBMS_SCHEDULER", line 483 ORA-06512: at line 1
Tried that a few times and decided it was probably well and truly stuck, particularly when the bugs were referring to systemstate dumps, so I tried killing the stuck slave, first
1 select sid, username, last_call_et, event, wait_time, seconds_in_wait, state, blocking_session, blocking_session_status 2 from v$session 3* where sid=201 SQL> / SID USERNAME LAST_CALL_ET ---------- ------------------------------ ------------ EVENT WAIT_TIME ---------------------------------------------------------------- ---------- SECONDS_IN_WAIT STATE BLOCKING_SESSION BLOCKING_SE --------------- ------------------- ---------------- ----------- 201 AVSYS 251777 enq: PS - contention 0 50563 WAITING 169 VALID SQL> 3 3* where sid=201 SQL> c/201/169 3* where sid=169 SQL> / SID USERNAME LAST_CALL_ET ---------- ------------------------------ ------------ EVENT WAIT_TIME ---------------------------------------------------------------- ---------- SECONDS_IN_WAIT STATE BLOCKING_SESSION BLOCKING_SE --------------- ------------------- ---------------- ----------- 169 AVSYS 251446 cursor: pin X 0 0 WAITING UNKNOWN SQL> select sid, serial# from v$session where sid=169; SID SERIAL# ---------- ---------- 169 1585 SQL> alter system kill session '169,1585'; alter system kill session '169,1585' * ERROR at line 1: ORA-00031: session marked for kill
I waited and tried a few more times, but that slave really didn't want to die ...
SQL> l 1 select sid, username, last_call_et, event, wait_time, seconds_in_wait, state, blocking_session, blocking_session_status 2 from v$session 3* where sid=169 SQL> / SID USERNAME LAST_CALL_ET ---------- ------------------------------ ------------ EVENT WAIT_TIME ---------------------------------------------------------------- ---------- SECONDS_IN_WAIT STATE BLOCKING_SESSION BLOCKING_SE --------------- ------------------- ---------------- ----------- 169 AVSYS 251725 cursor: pin X 0 0 WAITING UNKNOWN
So I tried try killing the process
SQL> select sid, serial#, paddr, osuser, process, status 2 from v$session where sid=169; SID SERIAL# PADDR OSUSER ---------- ---------- ---------------- ------------------------------ PROCESS STATUS ------------ -------- 169 1585 070000005A383348 oracle 962750 KILLED SQL> !ps -ef|grep 962750 oracle 962750 1 1 Jul 13 - 11:49 ora_p006_AV01TEST oracle 1519634 1757348 2 12:11:27 pts/1 0:00 grep 962750 SQL> !kill -9 962750 SQL> !ps -ef|grep 962750 oracle 1519638 1757348 1 12:11:38 pts/1 0:00 grep 962750
Now it's gone, see if parent session for refresh job is still there
1 select sid, username, last_call_et, event, wait_time, seconds_in_wait, state, blocking_session, blocking_session_status 2 from v$session 3* where sid=201 SQL> / no rows selected
No, so the job should be gone too
SQL> select * from all_scheduler_running_jobs; OWNER JOB_NAME ------------------------------ ------------------------------ JOB_SUBNAME SESSION_ID SLAVE_PROCESS_ID SLAVE_OS_PRO ------------------------------ ---------- ---------------- ------------ RUNNING_INSTANCE RESOURCE_CONSUMER_GROUP ---------------- -------------------------------- ELAPSED_TIME --------------------------------------------------------------------------- CPU_USED --------------------------------------------------------------------------- AVSYS REFRESH_WAREHOUSE_DATA 122 19 692392 1 +000 00:01:25.48 +000 00:00:18.38
Well, it had restarted and been running for a minute or so. The audit trail entries started to appear in the warehouse, anyway. As I won't be working with this again for a while, I'm counting on colleagues to keep me posted about whether this keeps happening!
No comments:
Post a Comment