Thursday, August 30, 2007

ID Consistency Across Data Stores

Just thought I would share an amusing story about ID consistency. More accurately, I suppose, the lack of ID consistency. Without getting too specific, a friend was telling me a story about a process flow he ran into recently. Basically, the system had two places where data was stored. In the first, new items were added with several pieces of data including unique internal and external IDs. Each item would then be copied to a second data store via an automated process. Once in the new system, the item became available to end users so that they might add an additional set of information. Another automated process would see changes the users made in this second system and copy certain pieces of the data back to the corresponding item in the first data store. This was the most important part of the process because the primary goal was that the final data be correct in the first data store. Although a little complicated, this doesn't seem like a bad solution at first glance.

The problem for them started because the first data store used the internal ID as the unique identifier for the items while the second used the external ID. Then, they allowed the users to change the external ID for any item in the first data store. Hopefully you're starting to see where I'm going with this. Imagine an item in the first data store that has been around for a while. It's got all the data that was initially added plus the important data that was copied from the second data store. One day, a user comes and changes the external ID of the item in the first data store. The automated process picks up the change and attempts to copy it to the second data store. However, because the external ID is now different, the item is "new" as far as the second data store is concerned. So, a new row with just the minimal set of data coming from the first data store is created in the second. The automated process that copies changes from the second store to the first runs and finds the "new" item which is lacking any of the important information that users typically add in the second data store (previously added to the item with the original external ID). So, when the data is copied back to the first data store, all the information already there is simply deleted. A little while later, all the users are asking what happened to their records in the first data store.

I hope my story made sense. I found it amusing and thought I would share. Obviously, there are several things one can do to remedy the situation. However, it could have all been easily avoided by simply ensuring that both data stores used the same ID for each item.