Hello Oracle gurus!
I have a question regarding SAP reconnect behavior in conjunction with an Oracle RAC installation.
We have created a demo system for testing purposes on top of RAC 12.1 (SAP NW 7.4 with the latest 7.42 kernel and DBSL patches).
TAF has been configured, and if the main RAC node (the one SAP is connected to) shuts down cleanly (ACPI shutdown), the SAP WPs reconnect without problems a few seconds after the VIP moves, which is the expected behavior.
But if the main node dies without any notification, as in a sudden power-off, the SAP WPs can keep running endlessly (7200 sec by default, I think) doing some internal jobs until max_wp_runtime passes and the WP is restarted completely.
We have found some recommendations to update kernel parameters at the OS level, like this:
To improve fail over performance in a RAC cluster, consider changing the following IP kernel parameters as well:
net.ipv4.tcp_keepalive_time
net.ipv4.tcp_keepalive_intvl
net.ipv4.tcp_retries2
net.ipv4.tcp_syn_retries
For testing, we have changed these parameters on the server where SAP is installed:
net.ipv4.tcp_keepalive_time = 30
net.ipv4.tcp_keepalive_intvl = 10
net.ipv4.tcp_keepalive_probes = 2
In addition, the "enable=broken" definition has been added to tnsnames.
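As far as I understand it, "enable=broken" asks Oracle Net to switch on TCP keepalive (SO_KEEPALIVE) for the client connection, and the kernel keepalive parameters only apply to sockets that have this option set. A minimal Python sketch of the socket-level effect (the helper name is hypothetical and this is not Oracle or SAP code, only an illustration of the option):

```python
import socket

# Hypothetical helper for illustration; this is not Oracle or SAP code.
def open_keepalive_socket():
    """Create a TCP socket with keepalive enabled, as a client library might."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Without SO_KEEPALIVE the kernel never probes an idle connection,
    # so the net.ipv4.tcp_keepalive_* settings have no effect on it.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    return sock

sock = open_keepalive_socket()
keepalive_on = sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0
print("SO_KEEPALIVE enabled:", keepalive_on)
sock.close()
```

The socket is never connected here; the sketch only shows that keepalive is a per-socket setting that the application side has to request.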
So the question is: is this how SAP with Oracle should be configured (in terms of failover), or have we missed something?
Also, these values are really low, so if you have any best-practice values, they are always welcome.
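To put a number on how low these values are, here is a rough sketch assuming the usual Linux keepalive semantics: an idle connection is probed after tcp_keepalive_time seconds, then every tcp_keepalive_intvl seconds, and is dropped after tcp_keepalive_probes unanswered probes. The function name is only illustrative. This covers an idle connection only; a connection with unacknowledged data in flight is governed by retransmission (net.ipv4.tcp_retries2) instead.

```python
def keepalive_detection_seconds(idle_time, interval, probes):
    """Approximate time until an idle connection to a dead peer is dropped."""
    # Idle period before the first probe, plus one interval per unanswered probe.
    return idle_time + interval * probes

# Values we set for testing: 30 / 10 / 2
tested = keepalive_detection_seconds(30, 10, 2)

# Stock Linux defaults: 7200 / 75 / 9
linux_defaults = keepalive_detection_seconds(7200, 75, 9)

print(tested)          # 50
print(linux_defaults)  # 7875
```

So with our test settings a dead node should be noticed on an idle connection after roughly 50 seconds, compared with more than two hours with the stock defaults.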