The question:
I’m in the process of installing applications and databases.
In order to check whether a file is correct, I can simply take the checksum.
Is there a way to check if a table is correct? (I realise I can drop the entire table into a file and take the checksum of that file, but maybe there’s a simpler way.)
The Solutions:
Below are the methods you can try. The first solution is probably the best. Try others if the first one doesn’t work. Senior developers aren’t just copying/pasting – they read the methods carefully & apply them wisely to each case.
Method 1
There are multiple ways and tools out there to do a database comparison. My preference is the tool SQL Examiner Suite. It is simple, powerful, relatively cheap, and can you tell you both schema differences and data differences. It’s flexible in the advanced options by allowing you to ignore different things during comparison such as collation differences, defaults, check constraints, etc. And it can compare databases on the same server or across different servers.
Additionally, here’s a comprehensive list of tools for database comparison, from MSSQLTips.
If you mostly only care about data differences, don’t have a lot of tables to compare, and don’t want to spend the money on a tool, you can probably roll your own solution using something like the HASHBYTES()
or CHECKSUM()
functions. I prefer HASHBYTES()
for its higher precision.
Leveraging one of the aforementioned functions, the idea would be to create a single hash per row (of the concatenation of every column in the table), and do a FULL JOIN
on the hashes between the two instances of the table, filtering where either side’s key is null. This would leave you with just the rows that are different.
How you want to implement the hashing function is your call. If it’s a one time comparison, you can do it on the fly (though the join can be slow). If it’s a recurring thing, you can add it as a column to the table itself or create a view on top of the table with it as a column, if you’re able to alter the schema. This would allow you to create an index on the result of the function (HASHBYTES()
is deterministic, I’m not so sure about CHECKSUM()
), for better performance during comparison.
The limitation to this method is that both tables need to be accessible to be joined on, either by having both databases on the same server or by leveraging something like a Linked Server to access the other database’s server, which potentially can be slow for a lot of data.
All methods was sourced from stackoverflow.com or stackexchange.com, is licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0