Monday, March 23, 2015

Service Placement in RAC - Script

services.sh script :


#!/bin/bash
#---------------------------------------------------------------------
# Check services for databases
# Checks whether services are running on their preferred nodes
#---------------------------------------------------------------------
for i in $(crsctl stat res -t | grep svc | grep -v world); do
  lvService=$(echo "$i" | awk 'BEGIN { FS = ".svc" } { print $1 }' | cut -f3,4 -d'.')
  lvDatabase=$(echo "$i" | cut -f2 -d'.')
  lvServiceEnabled=$(srvctl config service -s "$lvService" -d "$lvDatabase" | grep "is enabled")
  lvServicePreferred=$(srvctl config service -s "$lvService" -d "$lvDatabase" | grep "Preferred instances:" | cut -f3 -d' ' | sed 's/,/\n/g' | sort)
  lvServiceAvailable=$(srvctl config service -s "$lvService" -d "$lvDatabase" | grep "Available instances:" | cut -f3 -d' ' | sed 's/,/\n/g' | sort)
  lvInstanceRunning=$(srvctl status service -s "$lvService" -d "$lvDatabase" | grep "is running" | cut -f7 -d' ' | sed 's/,/\n/g' | sort)
  if [ -z "$lvServiceEnabled" ]; then
    echo "INFORMATION. Service ${lvService} is disabled and not running on preferred ${lvServicePreferred}"
  elif [ -z "$lvInstanceRunning" ]; then
    echo "================================================================================================================="
    echo "FAIL. Service ${lvService} is enabled but not running on preferred ${lvServicePreferred}"
    echo "-----------------------------------------------------------------------------------------------------------------"
    echo "Command to run:"
    echo "srvctl start service -d ${lvDatabase} -s ${lvService}"
    echo "-----------------------------------------------------------------------------------------------------------------"
  elif [ "$lvServicePreferred" = "$lvInstanceRunning" ]; then
    echo "SUCCESS. Service ${lvService} is running in preferred location(s) ${lvServicePreferred}"
  else
    echo "==============================================================================================================="
    echo "WARNING. Service ${lvService} is running in location(s) ${lvInstanceRunning}. Should be in ${lvServicePreferred}"
    echo "Test of services failed for database ${lvDatabase}"
    echo "---------------------------------------------------------------------------------------------------------------"
    echo "Command to run:"
    echo "srvctl relocate service -d ${lvDatabase} -s ${lvService} -i ${lvInstanceRunning} -t ${lvServicePreferred}"
    echo "---------------------------------------------------------------------------------------------------------------"
  fi
done
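To see how the awk/cut pipeline splits a resource name into database and service, here is the same parsing applied to a hypothetical service resource (assuming the standard ora.<db>.<service>.svc naming; the name used below is made up):

```shell
# Hypothetical CRS resource name in the standard ora.<db>.<service>.svc pattern
res="ora.PRODDB.oltp.svc"
# Everything before ".svc", then fields 3-4 -> the service name
svc=$(echo "$res" | awk 'BEGIN { FS = ".svc" } { print $1 }' | cut -f3,4 -d'.')
# Second dot-separated field -> the database name
db=$(echo "$res" | cut -f2 -d'.')
echo "database=$db service=$svc"
```

Note that awk treats a multi-character FS as a regular expression, so the leading "." in ".svc" matches any character; with typical resource names this still splits in the right place.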

Wednesday, March 18, 2015

Check Active sessions in Oracle Database

A script to check active (running) sessions in an Oracle database.


col sid for 999999
col serial# for 999999
col spid for a8
col username for a20
col osuser for a20
set lines 300 pagesize 999
col event for a30
col TERMINAL for a15
col SQL_ID for a15
col STATUS for a10

alter session set nls_date_format = 'DD-MON-YYYY HH24:MI:SS';
select s.inst_id, s.sid, s.serial#, s.sql_id, s.event, s.p1, s.p2, s.p3,
       p.spid, p.pid, s.username, s.osuser, s.status, s.terminal,
       s.blocking_session, s.logon_time
from gv$session s, gv$process p
where s.paddr = p.addr
and s.inst_id = p.inst_id  -- join on inst_id too, or sessions cross-match processes from other instances
and s.status = 'ACTIVE'
and s.username is not null
order by 2;

Tuesday, March 17, 2015

Setting up new SCAN IP for SCAN LISTENER


First check the current configuration with "srvctl config scan"; it lists the SCAN IPs. Then execute the statement below and paste in the SCAN IP addresses when prompted:
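The IPs can also be pulled out of the srvctl output with a small pipeline. The $cfg text below is a hypothetical sample of what "srvctl config scan" prints; the exact output format varies by version, so the awk field choice may need adjusting:

```shell
# Hypothetical sample of "srvctl config scan" output (11gR2-style format assumed)
cfg="SCAN name: myscan, Network: 1/10.0.0.0/255.255.255.0/eth0
SCAN VIP name: scan1, IP: /myscan.example.com/10.0.0.11
SCAN VIP name: scan2, IP: /myscan.example.com/10.0.0.12
SCAN VIP name: scan3, IP: /myscan.example.com/10.0.0.13"
# Keep the SCAN VIP lines and take the last /-separated field (the IP)
ips=$(echo "$cfg" | grep "SCAN VIP" | awk -F'/' '{ print $NF }')
echo "$ips"
```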

alter system set remote_listener='(DESCRIPTION=(ADDRESS_LIST=
(ADDRESS=(PROTOCOL=TCPS)(HOST=&scan_ip1)(PORT=1523))
(ADDRESS=(PROTOCOL=TCPS)(HOST=&scan_ip2)(PORT=1523))
(ADDRESS=(PROTOCOL=TCPS)(HOST=&scan_ip3)(PORT=1523))))' scope=both sid='*';

Afterwards, force registration with "alter system register;".

Then check the SCAN listener and the local listener (for example with "srvctl status scan_listener" and "lsnrctl status"),

and you are done.

Monday, March 16, 2015

ORA-12528 error is new in 10g.


It happens when you connect via a TNS entry with username/password and try to start up the database after a shutdown: while the instance is not yet open, dynamic registration reports it to the listener as BLOCKED, so new connections are refused. Solutions:

 1. Add a static registration for the instance in listener.ora (dynamic registration alone shows the instance as BLOCKED in this state).

 2. Or connect without the TNS entry (i.e. locally).
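For reference, a static SID entry in listener.ora looks like the sketch below. All names and paths here are hypothetical; adjust GLOBAL_DBNAME, ORACLE_HOME and SID_NAME to your environment:

```
SID_LIST_LISTENER =
  (SID_LIST =
    (SID_DESC =
      (GLOBAL_DBNAME = orcl.example.com)
      (ORACLE_HOME = /u01/app/oracle/product/10.2.0/db_1)
      (SID_NAME = orcl)
    )
  )
```

Reload the listener (lsnrctl reload) after editing listener.ora.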

Tuesday, March 10, 2015

Find Blocking Sessions

Problem - find blocking sessions

Blocking sessions occur when one session holds an exclusive lock on an object and does not release it before another session wants to update the same data. The second session is blocked until the first one has finished its work.

From the user's point of view, the application appears to hang completely while waiting for the first session to release its lock. You will often have to identify these sessions in order to improve your application and avoid as many blocking sessions as possible.

Recipe #1 - find blocking sessions with v$session

SELECT
   s.blocking_session, 
   s.sid, 
   s.serial#, 
   s.seconds_in_wait
FROM
   v$session s
WHERE
   blocking_session IS NOT NULL;


Recipe #2 - find blocking sessions using v$lock

SELECT 
   l1.sid || ' is blocking ' || l2.sid blocking_sessions
FROM 
   v$lock l1, v$lock l2
WHERE
   l1.block = 1 AND
   l2.request > 0 AND
   l1.id1 = l2.id1 AND
   l1.id2 = l2.id2; 

Recipe #3 - blocking sessions with all available information

The next query prints some more information and lets you quickly see who is blocking whom. Run it and you can immediately call the colleague who is locking your table:

SELECT s1.username || '@' || s1.machine
    || ' ( SID=' || s1.sid || ' )  is blocking '
    || s2.username || '@' || s2.machine || ' ( SID=' || s2.sid || ' ) ' AS blocking_status
    FROM v$lock l1, v$session s1, v$lock l2, v$session s2
    WHERE s1.sid=l1.sid AND s2.sid=l2.sid
    AND l1.BLOCK=1 AND l2.request > 0
    AND l1.id1 = l2.id1
    AND l1.id2 = l2.id2;


Recipe #4 - identifying blocked objects

The view v$lock, which we have already used in the queries above, exposes even more information. There are different kinds of locks - check this page for a complete list: http://download.oracle.com/docs/cd/B13789_01/server.101/b10755/dynviews_1123.htm#sthref3198

If you encounter a TM lock, it means that two sessions are trying to modify some data while blocking each other. Until one session finishes (commit or rollback), the other may wait forever.

The following query shows you all the TM locks:

SELECT sid, id1 FROM v$lock WHERE TYPE='TM';

SID    ID1
 92  20127
 51  20127

The ID you get from this query refers to the actual database object, which helps you identify the problem. Look at the next query:

SELECT object_name FROM dba_objects WHERE object_id=20127;


These queries should help you to identify the cause of your blocking sessions!

Sunday, March 8, 2015

Do You Know If Your Database Is Slow?

Going back to the basics of how database performance can be described, we can simply say that database performance is either the response time of the operations the end-user performs, and/or the amount of work the database instance does in a certain time period (throughput).
We can easily find these metrics in the v$sysmetric dynamic view:

SQL> select to_char(begin_time,'hh24:mi') time, round(value * 10, 2) "Response Time (ms)"
     from v$sysmetric
     where metric_name='SQL Service Response Time'
     /


So this is the last-minute response time for user calls (here in ms). We can check the throughput by looking at the number of logical blocks read per second (which includes the physical blocks), plus the direct reads (the output below covers the last minute, for a database with an 8 KB block size):

SQL> select a.begin_time, a.end_time, round(((a.value + b.value)/131072),2) "GB per sec"
     from v$sysmetric a, v$sysmetric b
     where a.metric_name = 'Logical Reads Per Sec'
     and b.metric_name = 'Physical Reads Direct Per Sec'
     and a.begin_time = b.begin_time
     /
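The division by 131072 in the query above follows directly from the 8 KB block size. A quick sanity check of the arithmetic, using hypothetical per-second figures:

```shell
# Why the query divides by 131072: with an 8 KB block,
# 1 GB = (1024 * 1024) KB / 8 KB = 131072 blocks.
block_kb=8
blocks_per_gb=$(( 1024 * 1024 / block_kb ))
logical_reads=13000000   # hypothetical 'Logical Reads Per Sec' value
direct_reads=107200      # hypothetical 'Physical Reads Direct Per Sec' value
# GB/sec with two decimals; awk does the floating-point division
gb_per_sec=$(awk -v t=$(( logical_reads + direct_reads )) -v b="$blocks_per_gb" \
             'BEGIN { printf "%.2f", t / b }')
echo "$gb_per_sec"
```

For a different block size, only blocks_per_gb changes (e.g. 65536 for 16 KB blocks).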
 

We can check more historical values through v$sysmetric_summary, v$sysmetric_history and dba_hist_sysmetric_summary.

So did these queries answer the basic question "Do we have bad performance?" 100 MB/sec throughput and 0.32 ms per user call? We have seen better performance, but is it bad enough that we should alert the on-call DBA to investigate in more detail and look for the reason behind these values? We cannot say: we need something to compare these values against, so that we can determine whether they are too low or too high.

It is somewhat like being in a train that passes another train moving in the same direction at a different speed. We know neither our own speed nor the other train's, so we cannot answer the question "Are we going very fast?". But if we turn to the other side and see a tree passing by, we can estimate the speed of the train (also drawing on our experience of what counts as very fast for a train), because the tree gives us an absolute reference: we know its speed is 0 (OK, not completely absolute, but let's simplify for now :) ).

So we understand that we need an absolute value, or baseline, which we know represents "bad", "normal", or "good" performance. How do we find these values?

Bad, normal and good

One way to establish these absolutes is simply to experiment: determine when the database instance provides acceptable performance by going to the applications that use the database and checking their response times, or by running the queries the application runs and checking that they complete in an acceptable time (defined by the business requirements). When you reach these results, check the database instance response time and current throughput, and carve them in stone as absolutes against which future measurements can be compared.

This approach may sometimes work, but when you start measuring response time you will notice that it can go up and down wildly. You will need to define bounds around the value you consider a "normal" response time, so that a response time above the upper bound can be called "bad" and we can alert that we have performance degradation.

To define this point more accurately, I would suggest another strategy: make an "educated" guess by analyzing the values historically from the DBA_HIST_SYSMETRIC_SUMMARY view. We just need to have enough history in there.

SQL> with epsilon
as
(select avg(average - STANDARD_DEVIATION ) m1,
        avg(average +  STANDARD_DEVIATION ) m2
from dba_hist_sysmetric_summary
where metric_name='User Calls Per Sec')
select avg(a.average -  a.STANDARD_DEVIATION) "A - Good",
       avg(a.average) "Average",
       avg(a.average + a.STANDARD_DEVIATION)  "B - Bad"
from dba_hist_sysmetric_summary a,
dba_hist_sysmetric_summary b,
epsilon e
where a.metric_name='SQL Service Response Time'
and b.metric_name='User Calls Per Sec'
and a.snap_id = b.snap_id
and b.average between e.m1 and e.m2
/
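The banding this query relies on is plain average-plus/minus-standard-deviation arithmetic. A minimal sketch with hypothetical figures:

```shell
# avg - stddev marks the "good" end of the band, avg + stddev the "bad" end.
# Both input values here are hypothetical.
avg=3.5
sd=1.2
good=$(awk -v a="$avg" -v s="$sd" 'BEGIN { printf "%.1f", a - s }')
bad=$(awk -v a="$avg" -v s="$sd" 'BEGIN { printf "%.1f", a + s }')
echo "good<=$good bad>=$bad"
```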
 

Please note the subquery called epsilon. I have used it to limit the history we are learning from to the subset of AWR snapshots where meaningful work was done on the database. It excludes periods of very low activity and of abnormally high activity, which do not necessarily represent the load from which we should extract our "normal" response time behavior.
So now when we check the current response time:

SQL> select to_char(begin_time,'hh24:mi') time,  value "Response Time"
from v$sysmetric
where metric_name='SQL Service Response Time'
/

Specific Response Time

To compare response time across different load levels, we can also normalize it by the throughput, producing a "specific" response time:

sRT = Response Time (in ms) / Throughput (in GB/sec)
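Using the example figures from earlier (0.32 ms per call at roughly 0.1 GB/sec, both hypothetical), the arithmetic looks like:

```shell
# sRT = response time (ms) / throughput (GB/sec), with hypothetical inputs
rt_ms=0.32
gb_per_sec=0.10
srt=$(awk -v r="$rt_ms" -v t="$gb_per_sec" 'BEGIN { printf "%.1f", r / t }')
echo "$srt"
```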

SQL> with epsilon
as
(select avg(average - STANDARD_DEVIATION) m1,
        avg(average + STANDARD_DEVIATION) m2
 from dba_hist_sysmetric_summary
 where metric_name='User Calls Per Sec')
select avg( ((a.average - a.standard_deviation)*10) /
            (((c.average - c.standard_deviation) + (d.average - d.standard_deviation))/131072) ) A,
       avg( (a.average*10) /
            ((c.average + d.average)/131072) ) "Average",
       avg( ((a.average + a.standard_deviation)*10) /
            (((c.average + c.standard_deviation) + (d.average + d.standard_deviation))/131072) ) B
from dba_hist_sysmetric_summary a,
     dba_hist_sysmetric_summary b,
     dba_hist_sysmetric_summary c,
     dba_hist_sysmetric_summary d,
     epsilon e
where a.metric_name='SQL Service Response Time'
and b.metric_name='User Calls Per Sec'
and c.metric_name='Logical Reads Per Sec'
and d.metric_name='Physical Reads Direct Per Sec'
and a.snap_id = b.snap_id
and a.snap_id = c.snap_id
and a.snap_id = d.snap_id
and b.average between e.m1 and e.m2
order by 1
/

(Note: in SQL*Plus a line containing only "/" runs the buffer, so the division operators must not sit alone on their own lines.)
 

Again, we can establish some fixed values to go by (for example, from a standard efficiency ranking). Or we can create moving baselines, as before, from the usage history of this particular DB instance, using the following query (though with not as many ranks):

with epsilon
as
(select avg(average - STANDARD_DEVIATION ) m1,
avg(average +  STANDARD_DEVIATION ) m2
from dba_hist_sysmetric_summary
where metric_name='User Calls Per Sec')
select avg(round(a.average + a.STANDARD_DEVIATION)) + stddev(round(a.average + a.STANDARD_DEVIATION)) A,
avg(round(a.average + (a.STANDARD_DEVIATION/2))) + stddev(round(a.average + (a.STANDARD_DEVIATION/2))) B,
avg(round(a.average)) C,
avg(round(a.average - (a.STANDARD_DEVIATION/2))) - stddev(round(a.average - (a.STANDARD_DEVIATION/2))) D,
avg(round(a.average - a.STANDARD_DEVIATION)) - stddev(round(a.average - a.STANDARD_DEVIATION)) E
from dba_hist_sysmetric_summary a,
dba_hist_sysmetric_summary b,
epsilon e
where a.metric_name='Database CPU Time Ratio'
and b.metric_name='User Calls Per Sec'
and a.snap_id = b.snap_id
and b.average between e.m1 and e.m2
/