Stupid Lessons #1: Deleting SAS datasets

From time to time we all make mistakes, some bigger, some smaller. It is said that making a mistake once is an accident, while making it twice is being an idiot. With the post series "Tough lessons" I hope to prevent myself from being an idiot by exposing these lessons publicly, discussing what I learnt, and trying to understand how the mistakes can be prevented from happening again.

My most recent big mistake was confusing permanent and temporary SAS datasets, which resulted in me deleting half of a client's data. Thankfully, there were backups, so the data was restored fairly easily. Still, it was a nerve-racking experience caused by false assumptions.

Alright, let's dissect the issue by examining the difference between temporary and permanent datasets.

Temporary vs. permanent datasets

Temporary SAS datasets exist only during the current SAS session and are saved in the temporary library "work".

Permanent SAS datasets are saved as files in a physical location on disk. Thus, they still exist after you close SAS and can be opened later for reuse.

The key difference to note is where these datasets are stored: temporary datasets go to "work" by default, while permanent datasets live in any library with a name other than "work".
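To make the distinction concrete, here is a minimal sketch. The libref "mylib", its path, and the dataset names are made up for illustration; only sashelp.class ships with SAS.

```sas
/* Hypothetical libref and path, for illustration only */
libname mylib "/path/to/project/data";

/* Temporary: a one-level name goes to "work" by default
   and disappears when the session ends */
data class_tmp;
   set sashelp.class;
run;

/* Permanent: a two-level name writes a file into the
   directory that "mylib" points to */
data mylib.class_perm;
   set sashelp.class;
run;
```

After the session ends, only class_perm should still be there, as a file in the directory behind "mylib".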

Deleting datasets

Now, here is where the lesson comes in. When you delete datasets from a permanent library, they are deleted from the physical location where they are stored as well! I was under the impression that SAS keeps a copy of a dataset when it opens it via a LIBNAME statement, but that is not the case: the libref simply points at the directory that holds the files. Thus, be very careful when deleting files from permanent libraries, because once they are gone it is hard, if not impossible, to recover them without backups.

On the other hand, datasets in a temporary library such as "work" can be deleted safely, since they are only generated on the fly when the program runs. The program can be rerun and the deleted datasets will reappear.
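As a rough sketch of what this looks like in code, the two deletes below use identical syntax but have very different consequences. The libref "mylib", its path, and the dataset names are hypothetical.

```sas
/* Hypothetical libref and path, for illustration only */
libname mylib "/path/to/project/data";

/* Harmless: removes work.class_tmp, which the program
   can recreate on the next run */
proc datasets library=work nolist;
   delete class_tmp;
quit;

/* Dangerous: removes the actual file behind mylib.class_perm */
proc datasets library=mylib nolist;
   delete class_perm;
quit;

/* Even more dangerous: KILL deletes every member of the library.
   Left commented out on purpose. */
/* proc datasets library=mylib kill nolist; quit; */
```

The only thing that separates the safe case from the destructive one is the library name, which is exactly why it is so easy to get wrong.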

Preventing accidental deletion

libname raw "/path/to/file" access=readonly;

The "access=readonly" option shown above may be used on the LIBNAME statement. It prevents datasets in that library from being modified or deleted through this libref during the SAS session.
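A small sketch of how a read-only library is expected to behave follows. The dataset name "some_dataset" is hypothetical, and I have not reproduced the exact log message here.

```sas
libname raw "/path/to/file" access=readonly;

/* Reading from the library still works */
data work.copy;
   set raw.some_dataset;   /* hypothetical dataset name */
run;

/* This delete should be rejected with an error in the log,
   leaving the file on disk untouched */
proc datasets library=raw nolist;
   delete some_dataset;
quit;
```

Note that the protection belongs to the libref, not to the files: assigning a second libref to the same path without the option would allow deletion again.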

sudo chown -R root:root "directory name" && sudo chmod -R 700 "directory name"  

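If you use Linux and have root access, you may be able to make files undeletable for other users with the command above. This is helpful when you let others work with sensitive datasets and want to make sure they will not get deleted. Note that mode 700 on root-owned files also stops other users from reading them; a mode such as 755 keeps the files readable while still blocking deletion.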

data test(alter=pass);
   set sashelp.class;
run;

To prevent accidental deletion or modification of datasets by code within the SAS program, you may also use the 'alter' dataset option as shown above. A dataset protected this way cannot be replaced or deleted unless the password is supplied, either in the code or, in an interactive session, in a popup window.
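For completeness, here is a sketch of deleting an alter-protected dataset on purpose. It assumes the password "pass" from the example above and the default "work" library.

```sas
/* Create an alter-protected dataset */
data test(alter=pass);
   set sashelp.class;
run;

/* Without the password, this delete should be refused
   (or trigger a prompt in an interactive session) */
proc datasets library=work nolist;
   delete test;
quit;

/* With the password in parentheses after the dataset name,
   the delete goes through */
proc datasets library=work nolist;
   delete test(alter=pass);
quit;
```

The point of the password is that deletion becomes a deliberate act: a stray delete statement without it should not succeed silently.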

Conclusions

First, always make sure that the dataset you are deleting is temporary! If it's a permanent dataset, then make sure you really do not need it anymore.
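One way to check this before deleting anything is to ask SAS where a libref actually points. This is only a sketch; "mylib" and its path are hypothetical.

```sas
/* Hypothetical libref and path, for illustration only */
libname mylib "/path/to/project/data";

/* Print the physical location behind each libref to the log */
%put NOTE: WORK points to %sysfunc(pathname(work));
%put NOTE: MYLIB points to %sysfunc(pathname(mylib));

/* List the members of the library in the log before touching them */
proc datasets library=mylib;
quit;
```

If the path printed is anything other than the session's temporary directory, the datasets are permanent and a delete will remove real files.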

Second, there are many options to prevent accidental deletion or modifications of datasets. Use them wherever appropriate.

Finally, even with these precautions in place I still believe having backups is mandatory if you or your company relies on data heavily. Anything might happen and the only robust way to recover from data loss is via backups.