Monday, December 28, 2009

Why should we avoid distributed transactions?

A distributed transaction is a transaction that spans multiple resources. Distributed transactions are also at times referred to as XA transactions; XA is the X/Open group specification for distributed transactions. An XA transaction is required to guarantee the ACID (Atomicity, Consistency, Isolation, Durability) properties of a transaction when the transaction spans multiple resources - be it multiple databases, a database and a JMS connection, a database and a file system, or any other combination of resources. XA transactions require a transaction manager to coordinate the multiple resource managers involved in the transaction. Transaction managers coordinate the resource managers using the two-phase commit protocol (2PC).
Distributed/XA transactions in turn come with their own issues and complexities. Some of the problems include:
  1. 2PC is a very chatty protocol and does a lot of logging to be able to recover from any failure scenario.
  2. It adds too much overhead to 99.9% of the cases just to handle less than 0.1% of the failure/exception cases.
  3. It increases the amount of time locks are held on the databases, which increases the chances of deadlocks and lowers the overall performance of the system.
  4. Distributed transactions are the bane of scalability; they grind the entire system to a halt by adding overhead to each and every transaction.
  5. Availability of the system goes down. With distributed transactions, the completion of a transaction becomes the product of the availability of two different systems, so the total availability drops (e.g. two systems that are each 99.9% available yield at best about 99.8% for the combined transaction).
  6. XA/distributed transaction configuration is complicated and difficult to test to make sure it is set up correctly. (Many Java developers tend to believe that using a JTA transaction manager takes care of XA transactions and forget to configure the resources as XA resources. Many a time people end up using JTA even while dealing with a single resource.) Many people enlist resources in XA when they don't have to, and many applications would perform better if they weren't unnecessarily using XA.

    Based on the issues mentioned above, it is strongly recommended that the use of XA transactions be avoided as much as possible. So how do we maintain the ACID properties of a transaction that goes against multiple resources? In most cases, intelligent recovery scenarios supported by the system's management interface will eliminate the need for XA. The possible ways to circumvent the need for distributed transactions could be a separate post in itself...stay tuned..:).

Thursday, September 3, 2009

How to generate Events?

Event-Driven Architecture (EDA) is the handling of events as they occur and the transfer of events among systems in a loosely coupled way. The use of events and EDA can replace the need for integrating systems at the database level. With EDA, systems are loosely coupled and information can be passed between them in real time, rather than relying on ETL processes or data transfers between the systems. The hardest part of implementing an EDA is capturing the events.
Events can be generated by tapping into web sites, files, databases, message-oriented middleware, legacy applications, networks and so on. Events are generally triggered from a business transaction. This requires the event generation and the business transaction to happen in the same transaction; moreover, the event generation should not impair the performance of the business transaction. This is a challenge in itself.

Most of the time, events are generated in one of two ways:
  • Application Generated events
  • Database Generated events
Database-generated events are fine grained, while application-generated events are more complex or coarse grained. Various challenges with EDA include:
  • Need to have the business operation and the event generation happen within the same transaction
  • Reliable event delivery
  • Events might need to be delivered in order
  • Events might need exactly-once delivery semantics (or at-least-once, or at-most-once)
  • Highly Scalable
  • Granularity of the events
  • Support for multiple platforms/languages
  • Catalog for events
  • How to fix bad events that might have been created because of bad code or bad data
To make sure we do not lose any events, event generation and the corresponding business transaction need to be part of the same transaction. By using a queue that is backed by the same database on which the business transaction occurs, we make sure the events are generated in the same transaction as the business activity. If the database is Oracle, then Oracle AQ might be a good option. Moreover, by using Oracle AQ, applications can generate events within the same business transaction without having to resort to distributed transactions.
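The same-transaction idea above is often called a "transactional outbox": the business row and the event row are written under one local database transaction, so either both commit or neither does. A minimal in-memory sketch of that guarantee (a real implementation would be a single JDBC/Oracle AQ transaction; the class, table and event names here are hypothetical):

```java
import java.util.ArrayList;
import java.util.List;

// Simulates one database holding both the business table and the event
// (outbox) table, so a single local transaction covers both writes and
// no distributed transaction is needed.
public class OutboxDemo {
    final List<String> orders = new ArrayList<>();   // stands in for the orders table
    final List<String> events = new ArrayList<>();   // stands in for the event/outbox table

    // Both inserts succeed or both are rolled back, because they live
    // in the same resource.
    public synchronized void placeOrder(String orderId, boolean failMidway) {
        List<String> ordersBackup = new ArrayList<>(orders);
        List<String> eventsBackup = new ArrayList<>(events);
        try {
            orders.add(orderId);                      // INSERT INTO orders ...
            if (failMidway) {
                throw new RuntimeException("simulated failure mid-transaction");
            }
            events.add("OrderPlaced:" + orderId);     // INSERT INTO event_outbox ...
        } catch (RuntimeException e) {
            orders.clear(); orders.addAll(ordersBackup);   // ROLLBACK
            events.clear(); events.addAll(eventsBackup);
        }
    }

    public static void main(String[] args) {
        OutboxDemo db = new OutboxDemo();
        db.placeOrder("A-1", false);
        db.placeOrder("A-2", true);  // rolled back: neither the order nor its event survives
        System.out.println(db.orders + " " + db.events); // prints [A-1] [OrderPlaced:A-1]
    }
}
```

A separate relay (or the AQ broker itself) would then deliver the committed event rows to consumers.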
Options for capturing events at the Database end include
  • DBMS data event adapters – Golden Gate
  • Trigger generate events
For data that might be modified by multiple applications, firing events from database triggers might be more reliable than firing them from the applications. On the other hand, database triggers may be too fine grained to capture some business events; applications should fire those. Also, for data with clearly defined application ownership, firing events from the application might be a good option.
CAP theorem: The CAP theorem is applicable to the current scenario of having to make the event generation and the business transaction a single transaction. Goals for a shared data system include:
Strong Consistency: all clients see the same view, even in presence of updates
High Availability: all clients can find some replica of the data, even in the presence of failures
Partition-tolerance: the system properties hold even when the system is partitioned
The theorem states that you can have at most two of the three CAP properties at the same time. The first property, Consistency, corresponds to ACID systems and is usually implemented through the two-phase commit protocol (XA transactions).
Systems that handle an incredibly large number of transactions and amount of data always need some kind of partitioning and must provide high availability. Consider organizations like Amazon or eBay: since the second and third CAP properties (Availability and Partition-tolerance) are fixed requirements, they need to sacrifice strong Consistency. They prefer to compensate for or reconcile inconsistencies instead of sacrificing high availability, because their primary need is to scale well to allow for a smooth user experience.
The consistency of two-phase commit can be approximated, without its performance costs and without distributed transactions, by designing the event consumption operations to be idempotent and putting business logic in place so that the order of events does not matter. By using BASE semantics instead of ACID, highly reliable and scalable systems can be designed.
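The idempotent-consumer idea above can be sketched briefly: with at-least-once delivery the broker may redeliver an event, so the consumer remembers which event ids it has already applied and skips duplicates. A minimal sketch (the event names are hypothetical; a real system would keep the processed-id set in a database table, not in memory):

```java
import java.util.HashSet;
import java.util.Set;

// An idempotent consumer: redelivered events are detected by id and
// skipped, so applying an event twice has the same effect as once.
public class IdempotentConsumer {
    // In a real system this would be a processed_events table updated
    // in the same transaction as the business effect.
    private final Set<String> processed = new HashSet<>();
    private int balance = 0;

    public void onEvent(String eventId, int amount) {
        if (!processed.add(eventId)) {
            return; // duplicate delivery -- already applied, ignore
        }
        balance += amount;
    }

    public int balance() { return balance; }

    public static void main(String[] args) {
        IdempotentConsumer c = new IdempotentConsumer();
        c.onEvent("evt-1", 100);
        c.onEvent("evt-1", 100); // redelivered by the broker -- no double credit
        c.onEvent("evt-2", 50);
        System.out.println(c.balance()); // prints 150
    }
}
```

Because duplicates are harmless, the broker can favor availability (retry freely) without corrupting state.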

BASE – BASICALLY AVAILABLE – SOFT STATE – EVENTUALLY CONSISTENT
References
http://wiki/download/attachments/22454961/IEEE_Software_Design_2PC.pdf
http://camelcase.blogspot.com/2007/08/cap-theorem.html
http://activemq.apache.org/should-i-use-xa.html

Friday, March 20, 2009

Taking Memory and Thread dumps in Java

At times we have to debug issues like memory leaks and deadlocks in Java apps. Most issues related to memory leaks and thread locking can be debugged by taking memory and thread dumps. I would like to document some of the tools and tips that have been quite useful for me and for fellow programmers who might need them.
Memory Dumps:
Most memory leaks result in the app server or the Java process crashing with an out-of-memory error. To be proactive and able to get a memory dump when that happens, it is often a good idea to pass the following arguments to the JVM:

-XX:+HeapDumpOnOutOfMemoryError
-XX:HeapDumpPath=/path/to/heapdump/file

As might be obvious, when these arguments are used, a heap dump is written to the file mentioned in the HeapDumpPath argument whenever a java.lang.OutOfMemoryError occurs. If some user-defined process or command needs to be kicked off when a java.lang.OutOfMemoryError occurs, then the following option is useful:
-XX:OnOutOfMemoryError="<cmd args>"

-XX:+PrintClassHistogram
Prints a histogram of class instances on Ctrl-Break (note the +; the -XX:- form disables the flag)
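Putting the flags above together, a JVM launch might look like this (the jar name and dump path are hypothetical placeholders; HotSpot expands %p to the process id in these options):

```shell
java -XX:+HeapDumpOnOutOfMemoryError \
     -XX:HeapDumpPath=/var/dumps/myapp-%p.hprof \
     -XX:OnOutOfMemoryError="kill -9 %p" \
     -XX:+PrintClassHistogram \
     -jar myapp.jar
```

The OnOutOfMemoryError command here simply kills the sick process after the dump is written; any notification script could be used instead.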
Memory dumps can also be taken from live Java processes using the jmap command. There are subtle variations in the way it is used on 1.5 and 1.6:
  1. Java 1.5: jmap -heap:format=b <pid>
    Works intermittently and definitely doesn't work more than once on the same process, due to problems with jmap in Java 1.5
  2. Java 1.6: jmap -dump:format=b,file=heap.bin <pid> or jmap -dump:live,format=b,file=heap.bin <pid>
    The live option only dumps the objects that remain after running a garbage collection. jmap creates a file called heap.bin in the directory it is run from
    Memory dumps can be taken from JConsole as well from 1.6.

Under certain circumstances, we may not be able to dump the heap to disk. We can instead get a histogram of the heap using jmap -histo <pid>. Note that a memory dump using jmap is only possible if it is executed by the same user that started the Java process.

Memory Dump Analysis:
Once we have the memory dumps, the next step is to analyze the dumps. One tool that was really useful for this purpose is
MAT: Eclipse based tool for analyzing the memory dumps.

http://www.eclipse.org/mat/

Thread Dump:
When applications run into deadlocks, analyzing the threads helps us get to the root cause of the problem. Thread dumps can be taken on Linux using:

1. kill -3 <pid> (must be executed by the same user that started the Java process)
2. For Jboss process, we can take the thread dump from jmx-console as well (jboss.system->ServerInfo->listThreadDump operation)
3. jstack <pid>
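The deadlock information these tools report can also be obtained programmatically through the JDK's ThreadMXBean, which is the same data a thread dump shows in its "Found one Java-level deadlock" section. A small sketch that manufactures a deadlock and then detects it:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

// Creates two threads that deadlock on a pair of monitors, then detects
// the deadlock with ThreadMXBean.findDeadlockedThreads().
public class DeadlockDemo {

    public static long[] findDeadlocks() throws InterruptedException {
        final Object lockA = new Object();
        final Object lockB = new Object();
        // Both threads grab their first lock before either tries the
        // second, guaranteeing the deadlock.
        final CountDownLatch bothHoldFirstLock = new CountDownLatch(2);

        Thread t1 = new Thread(() -> {
            synchronized (lockA) {
                bothHoldFirstLock.countDown();
                try { bothHoldFirstLock.await(); } catch (InterruptedException ignored) { }
                synchronized (lockB) { } // blocks forever
            }
        });
        Thread t2 = new Thread(() -> {
            synchronized (lockB) {
                bothHoldFirstLock.countDown();
                try { bothHoldFirstLock.await(); } catch (InterruptedException ignored) { }
                synchronized (lockA) { } // blocks forever
            }
        });
        t1.setDaemon(true); // daemon threads let the JVM exit despite the deadlock
        t2.setDaemon(true);
        t1.start();
        t2.start();

        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long[] deadlocked = null;
        for (int i = 0; i < 50 && deadlocked == null; i++) { // poll up to ~5 s
            Thread.sleep(100);
            deadlocked = mx.findDeadlockedThreads();
        }
        return deadlocked; // thread ids of the deadlocked threads, or null
    }

    public static void main(String[] args) throws InterruptedException {
        long[] ids = findDeadlocks();
        System.out.println("Deadlocked threads: " + (ids == null ? 0 : ids.length));
    }
}
```

This is handy for a monitoring endpoint that alerts on deadlocks without anyone having to take a dump by hand.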

Analyzing Thread Dumps:
Once the thread dumps are taken, they can be analyzed and checked for deadlocks using Lockness.
Lockness is a plugin for Eclipse to analyze thread dumps.

http://lockness.plugin.free.fr/home.php
-XX:+PrintConcurrentLocks
Prints java.util.concurrent locks in the Ctrl-Break thread dump. The jstack -l command provides equivalent functionality.
-XX:+TraceClassLoading
Traces the loading of classes.
-XX:+TraceClassLoadingPreorder
Traces all loaded classes in the order they are referenced