ISSUES WITH RETAINING THE SAME FILE NAME DURING DISCOVERY

Posted on December 17, 2010

INTRODUCTION

Typically, when data is processed for discovery, the documents are renamed with their control numbers. On rare occasions a client requires us to retain the original document names, and this has pitfalls. We encountered one issue that our quality control process could not easily resolve, and I want to share the experience so that you can quickly identify the cause if you run into something similar.

CLIENT REQUEST

We were given several thousand office documents in folders and subfolders. The request was to print those documents to Adobe PDF and return them to the client with an output folder structure exactly matching the original input folder structure. This seemed a mundane and easy request to fulfill; however, we were also asked to retain the original document names. Padding the document names with a control number was not an option we were allowed to use.

WHAT TRANSPIRED?

After all the documents in the input set had been printed to PDF, we found a few documents missing: the input folder contained more documents than the output folder. Processing reported no errors, and no log indicated the reduction in the number of documents.

HOW WE WENT ABOUT TROUBLESHOOTING THE ISSUE

We wrote a script that returned the path of each folder along with the number of documents in it, and ran it on both the output folders and the input folders. Comparing the document counts of the input folders against those of the output folders allowed us to isolate every subfolder where the counts differed. A closer analysis of the documents in those subfolders gave us the answer to what had occurred.
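
The script we used is not reproduced here, but the comparison can be sketched in Python roughly as follows. The function names count_files_per_folder and folders_with_differences are made up for this illustration, and the sketch assumes the input and output roots share the same relative folder layout.

```python
import os


def count_files_per_folder(root):
    """Map each folder path (relative to root) to the number of files directly inside it."""
    counts = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        counts[os.path.relpath(dirpath, root)] = len(filenames)
    return counts


def folders_with_differences(input_root, output_root):
    """Return {relative_folder: (input_count, output_count)} for folders whose counts differ."""
    in_counts = count_files_per_folder(input_root)
    out_counts = count_files_per_folder(output_root)
    diffs = {}
    # A folder missing on one side is treated as holding zero files.
    for rel in sorted(set(in_counts) | set(out_counts)):
        n_in = in_counts.get(rel, 0)
        n_out = out_counts.get(rel, 0)
        if n_in != n_out:
            diffs[rel] = (n_in, n_out)
    return diffs
```

Calling folders_with_differences on the two root folders returns only the subfolders that need a closer look, each with its input and output count.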

EXPLANATION OF THE ISSUE

We found documents with the same name but different file extensions. For example, ABC.doc and ABC.xls in the same folder both become ABC.pdf when converted to PDF with the folder structure retained. The second conversion overwrites the first, thereby reducing the document count by one. This had occurred several times across many subfolders where documents had the same name but different extensions.

PRECAUTION & CONCLUSION

Retaining the original document names during discovery can cause the issue explained in this article. To avoid it, run an inventory on all folders before conversion and separate the file extension from the file name to check whether any folder contains documents with the same name but different extensions. If such documents exist, renaming them with their extension as a suffix is one possible solution. For example, ABC.doc and ABC.xls found in the same folder could be renamed ABC_doc.doc and ABC_xls.xls to avoid overwriting documents during the image conversion process. Where the client allows it, padding the document names with a control number while retaining the original names may be an alternative.
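
As a rough illustration of the inventory described above, here is a minimal Python sketch. The names find_name_collisions and suffixed_name are hypothetical, and the sketch assumes that names differing only in letter case should count as the same name, as they would on a case-insensitive file system.

```python
import os
from collections import defaultdict


def find_name_collisions(root):
    """Return {(relative_folder, lowercased_name): [files]} for same-name, different-extension files."""
    collisions = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        by_stem = defaultdict(list)
        for name in filenames:
            stem, _ext = os.path.splitext(name)
            # Assumption: compare names case-insensitively.
            by_stem[stem.lower()].append(name)
        for stem, names in by_stem.items():
            if len(names) > 1:
                collisions[(os.path.relpath(dirpath, root), stem)] = sorted(names)
    return collisions


def suffixed_name(filename):
    """Build the suggested new name, e.g. ABC.doc becomes ABC_doc.doc."""
    stem, ext = os.path.splitext(filename)
    if not ext:
        return filename  # nothing to append for files without an extension
    return f"{stem}_{ext[1:]}{ext}"
```

The sketch only reports collisions and proposes names; it does not rename anything. It would still be worth checking that a proposed name does not clash with another file already in the same folder.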
