How to maintain referential integrity in an rdbms when deleting a node?

Philip Greenspun's Homepage : Philip Greenspun's Homepage Discussion Forums : Ask Philip : One Thread
Notify me of new responses
I have a recent (2010) B.S. in Computer Science, but what we never covered entirely,
and what I am having some issues finding in my books or online, is how to maintain
the referential integrity of an object graph when a node is deleted. I have read that
"soft deleting" is popular but this seems not to be best practice in that it retains
deprecated data in the production database, requires extra logic to skip "deleted" or
"inactive" records, can cause database query time to become slow over time, and can
allow for deprecated data to show up in reports.

What do you suggest to handle the deletion of a record when there are other records
which have a dependency on that record?

-- Travis Johansen, April 2, 2012

Answers

Great question. Thanks for asking. With a full-featured RDBMS you can partition the table on the "deleted_or_not" column and that will result in all of the live production rows to be stored compactly. If you don't want deprecated data to show up in reports, simply give the full table an obscure name, such as customers_including_deleted_rows and create a view "customers" (containing only the live rows) from which most of the application code queries. This assumes, of course, that there is some value to having the old data around.

-- Philip Greenspun, April 5, 2012

Thank you for the response Dr. Greenspun, I appreciate it. What I had envisioned doing to solve this was using a second database as an archive with a junction table between the live vs. archive db for tracking purposes. Partitioning and using views seems like a very viable solution and I will definitely be integrating your suggestion into my solution.

-- Travis Johansen, April 5, 2012