The question:
I’m a recently recruited Data Engineer, I have made a bunch of ETL pipelines planned to run daily, the destination tables are truncated and charged again. The DBMS is on-prem SQL Server.
When I arrived I found other ETLs that also run in a daily manner, but when I checked the fragmentation percentage on the indexes the rates were too high, so I’m thinking of creating an optimization task for all the indices.
Most of the clustered indices (primary keys) are doing just fine, but the non-clustered indices are suffering from high fragmentation, How and when should I rebuild/reorganize indices? After or before loading the new data?
The Solutions:
Below are the methods you can try. The first solution is probably the best. Try others if the first one doesn’t work. Senior developers aren’t just copying/pasting – they read the methods carefully & apply them wisely to each case.
Method 1
I would do this,
- Truncate table before re-load.
- Script out existing index definitions and drop the existing.
- Run the re-load which should be done much faster than before as there were no pre-existing index’s on the table.
- Then, re-create the indexes using definitions extracted from step (2).
Pros:
- Fast load times
- Controls the fragmentation that you are concerned about.
Cons:
- Every time you have to re-create your indexes.
Note: If the time of completion is a factor then you probably would need to try both approaches(Re-create vs Re-build) and see what works best for you.
All methods was sourced from stackoverflow.com or stackexchange.com, is licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0