Here is a bit of interesting investigation that I would like to share.
Application received timeouts while trying to connect to the database.
Connection via SQL*PLUS showed same results. Time taken to login was high for either bequeath or non-local connections.
SQL*PLUS connection was traced using strace. Relevant output is shown below.
Please note that elapsed time per call is reported on the far right of every system call.
open("/etc/hostid", O_RDONLY) = -1 ENOENT (No such file or directory) <0.000020>
open("/etc/hostid", O_RDONLY) = -1 ENOENT (No such file or directory) <0.000018>
uname({sys="Linux", node="xxxxx", ...}) = 0 <0.000017>
stat("/etc/resolv.conf", {st_mode=S_IFREG|0644, st_size=0, ...}) = 0 <0.000018>
open("/etc/resolv.conf", O_RDONLY) = 9 <0.000019>
fstat(9, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0 <0.000016>
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2b880d500000 <0.000019>
read(9, "", 4096) = 0 <0.000017>
close(9) = 0 <0.000017>
munmap(0x2b880d500000, 4096) = 0 <0.000019>
uname({sys="Linux", node="xxxxx", ...}) = 0 <0.000015>
write(10, "\4N\0\0\6\0\0\0\0\0\3s\3\0\0\0\0\0\0\0\0\0\0\0\0!\0\0\0"..., 1102) = 1102 <0.000018>
read(11, "\4j\0\0\6\0\0\0\0\0\10\30\0\23\0\0\0\23AUTH_VERSION_S"..., 8208) = 1130 <1.257143>
open("/u01/app/oracle/product/11.2.0/dbhome_1/rdbms/mesg/oraus.msb", O_RDONLY) = 9 <0.000049>
While it is evident that one of the read call took more time than expected (1.25 seconds), it was unclear as to what that read call was waiting on. Following the forked process using strace revealed the following:
[pid 5350] sendto(6, "\2\0\0\0\f\0\0\0\6\0\0\0group\0\0\0", 20, MSG_NOSIGNAL, NULL, 0) = 20
[pid 5350] poll([{fd=6, events=POLLIN|POLLERR|POLLHUP, revents=POLLIN|POLLERR|POLLHUP}], 1, 5000) = 1
[pid 5350] recvmsg(6, {msg_name(0)=NULL, msg_iov(1)=[{"group\0", 6}], msg_controllen=24, {cmsg_len=20, cmsg_level=SOL_SOCKET, cmsg_type=SCM_RIGHTS, {7}}, msg_flags=0}, 0) = 6
[pid 5350] fstat(7, {st_mode=S_IFREG|0600, st_size=217016, ...}) = 0
[pid 5350] pread(7, "\1\0\0\0h\0\0\0D\254\0\0\1\0\0\0\250\305XP\0\0\0\0\323"..., 104, 0) = 104
[pid 5350] mmap(NULL, 217016, PROT_READ, MAP_SHARED, 7, 0) = 0x2b646ca23000[pid 5350] close(7) = 0
[pid 5350] close(6) = 0
[pid 5350] getgid() = 501
[pid 5350] open("/u01/app/oracle/admin/bigb/adump/xxxxx_ora_5350_1.aud", O_RDWR|O_CREAT|O_EXCL, 0660) = -1 EEXIST (File exists)
[pid 5350] open("/u01/app/oracle/admin/bigb/adump/xxxxx_ora_5350_2.aud", O_RDWR|O_CREAT|O_EXCL, 0660) = -1 EEXIST (File exists)
[pid 5350] open("/u01/app/oracle/admin/bigb/adump/xxxxx_ora_5350_3.aud", O_RDWR|O_CREAT|O_EXCL, 0660) = -1 EEXIST (File exists)
[pid 5350] open("/u01/app/oracle/admin/bigb/adump/xxxxx_ora_5350_4.aud", O_RDWR|O_CREAT|O_EXCL, 0660) = -1 EEXIST (File exists)
[pid 5350] open("/u01/app/oracle/admin/bigb/adump/xxxxx_ora_5350_5.aud", O_RDWR|O_CREAT|O_EXCL, 0660) = -1 EEXIST (File exists)
[pid 5350] open("/u01/app/oracle/admin/bigb/adump/xxxxx_ora_5350_6.aud", O_RDWR|O_CREAT|O_EXCL, 0660) = -1 EEXIST (File exists)
[pid 5350] open("/u01/app/oracle/admin/bigb/adump/xxxxx_ora_5350_7.aud", O_RDWR|O_CREAT|O_EXCL, 0660) = -1 EEXIST (File exists)
...
...
... and many more
[pid 5350] open("/u01/app/oracle/admin/bigb/adump/xxxxx_ora_5350_186.aud", O_RDWR|O_CREAT|O_EXCL, 0660) = 6
[pid 5350] write(6, "Audit file ", 11) = 11
[pid 5350] write(6, "/u01/app/oracle/admin/bigb/adump"..., 55) = 55
[pid 5350] write(6, "\n", 1) = 1
...
...
...
[pid 5350] write(6, "Thu Mar 12 11:37:08 2015 +01:00\n", 32) = 32
[pid 5350] write(6, "LENGTH : \'160\'", 14) = 14
[pid 5350] write(6, "\n", 1) = 1
[pid 5350] write(6, "ACTION :[7] \'CONNECT\'\nDATABASE U"..., 159) = 159
[pid 5350] write(6, "\n", 1) = 1
[pid 5350] write(12, "\0\225\0\0\6\0\0\0\0\0\10\0\0\4\1\0\0\0\0\0\1\0\0\0\0\0"..., 149 <unfinished ...>
[pid 5344] <... read resumed> "\0\225\0\0\6\0\0\0\0\0\10\0\0\4\1\0\0\0\0\0\1\0\0\0\0\0"..., 8208) = 149
As shown in above strace output, PID of sqlplus (user) process is 5344. It starts with read system call and calls forked (shadow) process (pid 5350) . The shadow process, before logging into the database has to make entry in the audit logs. For accomplishing this, it starts scanning the audit_trail directory to find the latest file (based on the acquired PID 5350) where the audit entry can be made. This scan is performed in sequential order. See the snippet below extracted from strace output.
xxxxx_ora_5350_1.aud',
xxxxx_ora_5350_2.aud',
xxxxx_ora_5350_3.aud',
xxxxx_ora_5350_4.aud',
xxxxx_ora_5350_5.aud',
xxxxx_ora_5350_6.aud',
xxxxx_ora_5350_7.aud',
xxxxx_ora_5350_8.aud',
xxxxx_ora_5350_9.aud',
xxxxx_ora_5350_a.aud',
xxxxx_ora_5350_b.aud',
xxxxx_ora_5350_c.aud',
xxxxx_ora_5350_d.aud',
xxxxx_ora_5350_e.aud',
xxxxx_ora_5350_f.aud',
xxxxx_ora_5350_10.aud',
xxxxx_ora_5350_11.aud',
xxxxx_ora_5350_12.aud',
xxxxx_ora_5350_13.aud',
xxxxx_ora_5350_14.aud',
xxxxx_ora_5350_15.aud',
xxxxx_ora_5350_16.aud',
xxxxx_ora_5350_17.aud',
xxxxx_ora_5350_18.aud',
xxxxx_ora_5350_19.aud',
xxxxx_ora_5350_1a.aud',
xxxxx_ora_5350_1b.aud',
xxxxx_ora_5350_1c.aud',
xxxxx_ora_5350_1d.aud',
xxxxx_ora_5350_1e.aud',
xxxxx_ora_5350_1f.aud',
...
...
xxxxx_ora_5350_182.aud';
xxxxx_ora_5350_183.aud';
xxxxx_ora_5350_184.aud';
xxxxx_ora_5350_185.aud';
xxxxx_ora_5350_186.aud';
So, the longer this list, more time will it take for session to login.
After sequential scanning of all the audit logs, the shadow process finally opens the latest file and writes an Audit Entry:
[pid 5350] open('/u01/app/oracle/admin/bigb/adump/xxxxx_ora_5350_186.aud', O_RDWR|O_CREAT|O_EXCL, 0660) = 6
[pid 5350] write(6, 'Audit file', 11) = 11
After a successful entry is made, control is given back to user process (pid 5344) which continues with the read operation.
[pid 5344] <... read resumed> "\0\225\0\0\6\0\0\0\0\0\10\0\0\4\1\0\0\0\0\0\1\0\0\0\0\0"..., 8208) = 149
Finally, the control is given back to user console (note the file descripter below):
[pid 5344] read(0, <unfinished ...>
Process 5350 detached
Solution
Audit trail directory reported around 1.2 million audit files.
oracle@xxxxx:/u01/app/oracle/admin/bigb/adump> ls -ltr | wc -l
11840730
To fix this, recent audit logs were copied to another location and the curent location was cleaned up.
oracle@xxxxx:/u01/app/oracle/admin/bigb/adump> ls -ltr | awk '{print $NF}' | xargs rm