Originally posted on Cicayda.com:
In the first article, CSI eDiscovery: The Fundamentals of Computer Forensics, we reviewed the growing world of electronically stored information (ESI), why a forensic collection is helpful in litigation, and when you should begin to consider collection.
Now let’s explore the various methods of forensic collection for physical computers. While the method of collection may differ from matter to matter, the forensic purpose remains the same: to preserve electronic evidence.
Personal Computers/Servers
Personal computers have been a staple in the workplace since the 1980s, allowing employees to produce and send correspondence, calculate figures, and perform research more efficiently. Because of this heavy use, along with internal storage such as email and data servers, many initial data collections begin with these devices.
However, one of the biggest questions that arises once a decision has been made to collect data is: “Do we copy or image?”
The answer typically depends on the type of matter and on whether there is concern about evidence tampering, whether accidental or intentional.
Many cases involve simple inquiries, such as what time an email was sent or whether all documents were properly collected based on an agreed-upon set of search terms. These matters do not typically require the additional expense and time needed to forensically image everyone’s personal computers and business servers. A properly structured, targeted forensic copy, along with a forensic report, will suffice and greatly reduce the cost of gathering the material.
However, some matters involve suspicious activity, such as the copying of intellectual property or criminal conduct, or a regulatory body requesting a broad-scope collection. These matters typically require devices to be forensically imaged.
For example, if data is suspected of having been deleted from a laptop, an exact image of its hard drive may be created so that experts can perform technical analysis on the empty (unallocated) sectors.
This process requires onsite experts, hardware, and time, but it allows a far deeper exploration of the collected media’s entire contents than a forensic copy can.
Regardless of whether data is imaged or copied, the forensically captured and preserved information becomes the authoritative source for all searches, further analysis, and productions during legal discovery, eliminating the need to disturb the original source.
Forensic Copying
The basic copy methods that operating systems offer ordinary users include built-in checks, such as a Cyclic Redundancy Check (CRC), to verify that files have been copied when a user performs a copy/paste or transfers a file over a network. However, these methods do not preserve the integrity of hidden data, such as a file’s original file system metadata.
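To illustrate what a CRC check does, the following Python sketch uses the standard library’s `zlib.crc32` to compare a file’s contents before and after a transfer. The byte strings are hypothetical. A CRC is designed to catch accidental corruption; it is not a cryptographic hash, and it says nothing about the file’s metadata.

```python
import zlib

def crc32_of(data: bytes) -> int:
    """Return the CRC-32 checksum of a byte string as an unsigned 32-bit int."""
    return zlib.crc32(data) & 0xFFFFFFFF

original = b"Quarterly figures attached."     # hypothetical file contents
copied = bytes(original)                        # a faithful copy
corrupted = b"Quarterly figures attachee."    # one byte damaged in transit

# A faithful copy produces the same checksum; the damaged transfer does not.
print(crc32_of(original) == crc32_of(copied))     # True
print(crc32_of(original) == crc32_of(corrupted))  # False
```

CRC-32 reliably detects short bursts of accidental errors like the one above, which is why it suits transfer checks, but a deliberate alteration can be crafted to keep the same CRC, so it is not used to authenticate evidence.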
Computer forensic experts have various tools available to authenticate, and report, that a collection of files was copied without alteration. During collection, these tools apply industry-accepted algorithms to validate the data. Two such algorithms for ensuring that data has been replicated exactly are the Message Digest Algorithm 5, commonly called MD5, and the Secure Hash Algorithm, commonly known as SHA.
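A minimal sketch of hash-based copy validation, assuming Python’s standard `hashlib` and `shutil` modules. The file names are hypothetical, and this is not the workflow of any particular forensic tool. It computes both MD5 and SHA-1 for the source and the copy and reports a match only if both algorithms agree.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path

ALGORITHMS = ("md5", "sha1")

def digests(path: Path) -> dict:
    """Return {algorithm: hex digest} computed over the file's contents."""
    data = path.read_bytes()
    return {name: hashlib.new(name, data).hexdigest() for name in ALGORITHMS}

def verify_copy(source: Path, copy: Path) -> bool:
    """True only if every algorithm reports identical digests."""
    return digests(source) == digests(copy)

with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "contract.docx"              # hypothetical custodian file
    dst = Path(tmp) / "collected_contract.docx"    # hypothetical forensic copy
    src.write_bytes(b"example document content")
    shutil.copy2(src, dst)          # copy2 also attempts to preserve timestamps
    source_digests = digests(src)
    copy_ok = verify_copy(src, dst)

print(source_digests)
print(copy_ok)  # True
```

Using two algorithms is a design choice for the sketch: a mismatch under either one flags the copy.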
Richard P. Salgado states, “[h]ash algorithms are used to confirm that when a copy of data is made, the original is unaltered and the copy is identical, bit-for-bit… The hash algorithm has afforded digital media forensic analysis a highly reliable and efficient means to ensure that the integrity of the digital evidence collected remains uncompromised.”
Additionally, forensic collection tools keep a record of file attributes, or metadata, because the hash algorithms are typically computed only over a file’s binary content. This metadata includes items such as the filename, storage path, and file system date-time stamps like the creation date and last modified date. Further information on metadata can be found in the previous article, Metadata Explained.
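The Python sketch below shows why metadata must be recorded separately: two files with identical bytes but different names and modification times produce the same SHA-1 value. The helper names and file names are hypothetical. Creation times are platform-dependent in Python’s `os.stat` results, so this sketch records only the modification time.

```python
import hashlib
import os
import tempfile
import time
from pathlib import Path

def sha1_of(path: Path) -> str:
    """Hash only the file's binary content."""
    return hashlib.sha1(path.read_bytes()).hexdigest()

def capture_metadata(path: Path) -> dict:
    """Record file system attributes that the content hash does not cover."""
    st = path.stat()
    return {
        "name": path.name,
        "folder": str(path.parent),
        "size": st.st_size,
        "modified": st.st_mtime,
    }

with tempfile.TemporaryDirectory() as tmp:
    first = Path(tmp) / "memo_v1.txt"          # hypothetical file names
    second = Path(tmp) / "renamed_memo.txt"
    first.write_bytes(b"same content")
    second.write_bytes(b"same content")
    day_ago = time.time() - 86400
    os.utime(second, (day_ago, day_ago))       # back-date the second file

    same_hash = sha1_of(first) == sha1_of(second)
    meta_first = capture_metadata(first)
    meta_second = capture_metadata(second)

print(same_hash)                                          # True
print(meta_first["name"], meta_second["name"])
print(meta_first["modified"] != meta_second["modified"])  # True
```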
“When a hash algorithm is used, it computes a string of numbers for a digital file. Any change to the data will result in a change to the hash value. Both MD5 and SHA-1 algorithms are commonly used on forensic image files. The hash process is normally used during acquisition of the evidence, during verification of the forensic image (duplicate of the evidence), and again at the end of the examination to ensure the integrity of the data and forensic processing. MD5 and SHA-1 hash values are also currently used to validate the integrity of downloaded files in information technology applications” (Forensic Mag, 2008).
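The three checkpoints in the quotation (acquisition, verification, and the end of the examination) can be sketched in Python as follows. This is an illustrative assumption, not a description of any tool: the image file is a small stand-in with a hypothetical name, and it is hashed in chunks because real images are often too large to read into memory at once.

```python
import hashlib
import tempfile
from pathlib import Path

def sha1_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a (possibly very large) image file without loading it all at once."""
    h = hashlib.sha1()
    with path.open("rb") as fh:
        while chunk := fh.read(chunk_size):
            h.update(chunk)
    return h.hexdigest()

def check_integrity(path: Path, acquisition_hash: str, stage: str) -> bool:
    """Re-hash the image at a later stage and compare with the acquisition value."""
    ok = sha1_file(path) == acquisition_hash
    print(f"{stage}: {'verified' if ok else 'MISMATCH'}")
    return ok

with tempfile.TemporaryDirectory() as tmp:
    image = Path(tmp) / "evidence.img"    # hypothetical image file name
    image.write_bytes(bytes(4096))         # stand-in for acquired data

    acquired = sha1_file(image)            # 1. hash at acquisition
    results = [
        check_integrity(image, acquired, "verification"),        # 2.
        check_integrity(image, acquired, "end of examination"),  # 3.
    ]
    # Simulate an unwanted change to the image, then check again.
    with image.open("r+b") as fh:
        fh.write(b"\x01")
    tampered = check_integrity(image, acquired, "after alteration")
```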
The Validation Algorithms
MD5
According to MD5’s creator, R. L. Rivest, “the MD5 algorithm takes as input a message of arbitrary length and produces as output a 128-bit ‘fingerprint’ or ‘message digest’ of the input message.” In other words, the algorithm creates a fingerprint, known in the industry as a checksum or hash value, that can be compared with other fingerprints. The MD5 fingerprints of two copies of the same file can therefore be compared to confirm that nothing has been altered in either one. Even the smallest change to a file produces a very different fingerprint, so the values no longer match.
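A short Python sketch of the fingerprint idea, using `hashlib.md5`: the digest is 128 bits (32 hexadecimal characters), and changing a single character of the input yields a different digest. The sample strings are hypothetical.

```python
import hashlib

def md5_hex(data: bytes) -> str:
    """Return the MD5 'fingerprint' of data as 32 hexadecimal characters."""
    return hashlib.md5(data).hexdigest()

original = b"Pay the invoice on 1 March."   # hypothetical sample text
altered = b"Pay the invoice on 7 March."    # a single character changed

d1, d2 = md5_hex(original), md5_hex(altered)
print(hashlib.md5().digest_size * 8)   # 128
print(d1)
print(d2)
# Count how many of the 32 hex positions differ between the two fingerprints.
differing = sum(c1 != c2 for c1, c2 in zip(d1, d2))
print(f"{differing} of 32 hex characters differ")
```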
SHA
The US government also created a hashing algorithm, the Secure Hash Algorithm. It has been revised several times, moving from SHA-0 to SHA-1 and beyond, in response to security concerns. Nonetheless, SHA-1 is currently an acceptable hashing algorithm for the purposes of forensic validation in the eDiscovery world.
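For comparison, this Python sketch prints the digest length and value of MD5 and several members of the SHA family available in `hashlib`. SHA-256 and SHA-512 belong to the later SHA-2 family; the choice of algorithms and the sample input are illustrative assumptions.

```python
import hashlib

def digest_bits(name: str) -> int:
    """Digest length in bits for a hashlib algorithm name."""
    return hashlib.new(name).digest_size * 8

evidence = b"abc"  # hypothetical evidence bytes
for name in ("md5", "sha1", "sha256", "sha512"):
    value = hashlib.new(name, evidence).hexdigest()
    print(f"{name:>6} ({digest_bits(name)} bits): {value}")
```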
Authentication, Not Encryption
MD5 was shown in 2004 not to be collision-resistant and was subsequently deemed “cryptographically broken” and unsuitable for security use. SHA-1 was likewise found to be vulnerable in 2005. Nonetheless, these algorithms remain very powerful tools for forensic preservation and collection. Don L. Lewis succinctly states, “we use hash algorithms for known file identification and evidence authentication, which differs from its use in encryption” (Forensic Mag, 2008).
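Known file identification can be sketched as a lookup of each collected file’s hash in a reference set. In practice, examiners use published hash sets such as NIST’s National Software Reference Library; the tiny set below is hypothetical and built locally for illustration.

```python
import hashlib

# Hypothetical reference set: known files to filter out or to flag.
# Values are computed locally for illustration, not taken from a real hash library.
KNOWN_FILES = {
    hashlib.md5(b"standard OS readme").hexdigest(): "known system file",
    hashlib.md5(b"leaked design spec").hexdigest(): "file of interest",
}

def identify(data: bytes) -> str:
    """Look up a collected file's MD5 in the known-file set."""
    return KNOWN_FILES.get(hashlib.md5(data).hexdigest(), "unknown")

collected = [b"standard OS readme", b"leaked design spec", b"new spreadsheet"]
for item in collected:
    print(identify(item))
# known system file
# file of interest
# unknown
```

This use depends only on matching values, not on secrecy, which is why collision weaknesses matter less here than they do in encryption.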
The National Digital Stewardship Alliance (NDSA) is a US-based organization aimed at the “long term preservation of digital information.” It has concluded that both cryptographic hash functions, MD5 and SHA-1, are valid methods for establishing fixity, or “the property of a digital file or object being fixed or unchanged,” due to the “high level of detail” (loc.gov, 2014).
The hash value and other file level metadata such as creation date, file size, and folder path are typically stored in a log file when data is forensically copied. This log creates a defensible record of all the data collected during the copy.
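A minimal Python sketch of such a log, written as CSV. The column names, file names, and layout are assumptions for illustration; forensic tools use their own report formats.

```python
import csv
import hashlib
import io
import tempfile
from datetime import datetime, timezone
from pathlib import Path

FIELDS = ["path", "size_bytes", "modified_utc", "md5", "sha1"]  # hypothetical columns

def log_row(path: Path) -> dict:
    """Collect the hash values and file-level metadata for one file."""
    data = path.read_bytes()
    st = path.stat()
    return {
        "path": str(path),
        "size_bytes": st.st_size,
        "modified_utc": datetime.fromtimestamp(st.st_mtime, tz=timezone.utc).isoformat(),
        "md5": hashlib.md5(data).hexdigest(),
        "sha1": hashlib.sha1(data).hexdigest(),
    }

def write_collection_log(files, out) -> None:
    """Write one CSV row per collected file, sorted by path."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    for p in sorted(files):
        writer.writerow(log_row(p))

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "a.txt").write_text("alpha")   # hypothetical collected files
    (root / "b.txt").write_text("bravo")
    buffer = io.StringIO()
    write_collection_log(root.glob("*.txt"), buffer)
    log_text = buffer.getvalue()

print(log_text)
```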
Forensic Imaging
Forensic imaging is much more comprehensive than forensic copying. Imaging creates an exact replica of the hard drive, including all of its open (unallocated) blocks, which may contain deleted data.
It is the most accurate and complete historical snapshot of the physical data source, but it may be an excessive method of collection.
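The core idea of imaging, reading every block of the source in order (including blocks the file system considers empty) while hashing the stream, can be sketched in Python. Here a small file with a hypothetical name stands in for the physical drive; in practice, examiners use dedicated imaging tools and hardware write blockers, and reading a real device requires administrative access.

```python
import hashlib
import tempfile
from pathlib import Path

BLOCK = 512  # sector-sized reads, an assumption for this sketch

def image_device(source: Path, dest: Path) -> str:
    """Copy every byte of source to dest, hashing as we go; return the SHA-1."""
    h = hashlib.sha1()
    with source.open("rb") as src, dest.open("wb") as out:
        while block := src.read(BLOCK):
            out.write(block)
            h.update(block)
    return h.hexdigest()

with tempfile.TemporaryDirectory() as tmp:
    device = Path(tmp) / "disk.raw"   # stand-in for a physical drive
    live = b"ACTIVE FILE".ljust(BLOCK, b"\x00")
    remnant = b"DELETED BUT NOT OVERWRITTEN".ljust(BLOCK, b"\x00")
    device.write_bytes(live + remnant)

    image = Path(tmp) / "disk.img"
    acquisition_hash = image_device(device, image)
    image_bytes = image.read_bytes()
    verified = hashlib.sha1(image_bytes).hexdigest() == acquisition_hash

print(verified, b"DELETED BUT NOT OVERWRITTEN" in image_bytes)  # True True
```

Unlike a file-by-file copy, the image keeps the leftover bytes in the second block even though no file currently owns them.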
Think of a hard drive as a buffet.
There is a wide assortment of food choices, partitioned into the salad station, a selection of meats, vegetables, fruits, and, my favorite area, desserts. There is a finite number of spaces for the food, and sometimes there is an empty space or two waiting to be refilled. This represents the physical space, occupied or not, available on the hard drive.
So how do you track what areas have food and what areas had food but need something new? The chefs.
The chefs dictate how to set up the buffet; they know which areas of the buffet have an abundance of food and which are empty. They also know what used to be in the empty containers, and there may even be some remaining crumbs on the buffet indicating what was once available. The chefs represent the file system’s allocation records: the master file table (MFT) on NTFS drives, or the File Allocation Table (FAT) on FAT-formatted drives.
Like the chefs managing the buffet, the MFT tracks what is on the drive or has been removed. The crumbs lingering on the buffet are the remaining bits of the deleted file waiting to be wiped clean and replaced with another batch of food.
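The buffet analogy can be made concrete with a toy model in Python. This is a simplified, hypothetical structure, not how NTFS or FAT actually lay out data: deleting a file removes only its table entry, so its contents remain in unallocated space until a new file reuses the block.

```python
from dataclasses import dataclass, field

@dataclass
class ToyVolume:
    """A toy volume: fixed-size block list plus a table of filename -> block index."""
    blocks: list = field(default_factory=lambda: [b""] * 8)
    table: dict = field(default_factory=dict)

    def write(self, name: str, data: bytes) -> None:
        # Use the first block not referenced by the table (the "empty space").
        free = next((i for i in range(len(self.blocks)) if i not in self.table.values()), None)
        if free is None:
            raise RuntimeError("toy volume is full")
        self.blocks[free] = data
        self.table[name] = free

    def delete(self, name: str) -> None:
        # Only the table entry goes; the block's contents (the "crumbs") stay.
        del self.table[name]

    def unallocated_remnants(self) -> list:
        """Non-empty blocks that no table entry points to."""
        used = set(self.table.values())
        return [b for i, b in enumerate(self.blocks) if i not in used and b]

vol = ToyVolume()
vol.write("plan.txt", b"merger plan")       # hypothetical files
vol.write("notes.txt", b"meeting notes")
vol.delete("plan.txt")
after_delete = vol.unallocated_remnants()
print(sorted(vol.table))     # ['notes.txt']
print(after_delete)          # [b'merger plan']
vol.write("new.txt", b"new data")           # reuses the freed block
after_reuse = vol.unallocated_remnants()
print(after_reuse)           # []
```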
The forensic image will capture an exact, bit-for-bit replica of the drive, including cookies, browser history, registry entries, system files, temporary files (for example, memory-resident documents that were never “saved”), instant messaging data, hidden files, files marked as deleted, and even the empty space.
Once collected, the forensic image can be searched for deleted evidence, used to generate reports on internet history, or used to export a set of files for further review.
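One common way to search an image for deleted files is signature searching, the first step of file carving: scanning raw bytes for the header signatures that begin known file types. A minimal Python sketch follows; the simulated image is hypothetical, and real carving tools typically do much more, such as using file footers and validating the recovered data.

```python
# Header signatures ("magic numbers") that begin some common file formats.
SIGNATURES = {
    "PDF": b"%PDF-",
    "JPEG": b"\xff\xd8\xff",
    "ZIP/DOCX": b"PK\x03\x04",   # DOCX files are ZIP containers
}

def find_signatures(image: bytes) -> list:
    """Return sorted (offset, label) pairs for every signature found in the image."""
    hits = []
    for label, magic in SIGNATURES.items():
        start = image.find(magic)
        while start != -1:
            hits.append((start, label))
            start = image.find(magic, start + 1)
    return sorted(hits)

# Simulated image: zero-filled sectors with file headers left in "free" space.
image = bytes(512) + b"%PDF-1.4 remnant" + bytes(100) + b"PK\x03\x04docx remnant"
print(find_signatures(image))   # [(512, 'PDF'), (628, 'ZIP/DOCX')]
```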
An Excessive Method of Collection?
A forensic image of a hard drive may be an excessive method of collection from both a time and a cost perspective. However, imaging may be essential in cases involving criminal activity or suspected evidence tampering.
For example, in US v Diaz, a truck driver was convicted of possession with intent to distribute 1,000 kilograms or more of marijuana. Jesus Manuel Diaz made a routine stop in his tractor-trailer at a New Mexico weigh station. His vehicle was later searched because of suspicious behavior by Mr. Diaz and inconsistencies in his bill of lading and weight scale tickets. Further investigation led to the discovery of a laptop and printer in Diaz’s cab.
The computer was forensically imaged and then searched for evidence of suspicious activity. “[A] computer forensics expert testified that a program used to create bills of lading had been deleted from the computer the day before Diaz was arrested” (US v Diaz, 2009).
Types of Digital Evidence
Yuri Gubanov provides a comprehensive list of ESI that may be found on PCs and servers in his article Retrieving Digital Evidence: Methods, Techniques and Issues, published in Forensic Focus in July 2012.
Types of digital evidence include all of the following, and more:
- Address books and contact lists
- Audio files and voice recordings
- Backups to various programs, including backups to mobile devices
- Bookmarks and favorites
- Browser history
- Calendars
- Compressed archives (ZIP, RAR, etc.) including encrypted archives
- Configuration and .ini files (may contain account information, last access dates etc.)
- Cookies
- Databases
- Documents
- Email messages, attachments and email databases
- Events
- Hidden and system files
- Log files
- Organizer items
- Page files, hibernation files and printer spooler files
- Pictures, images, digital photos
- Videos
- Virtual machines
- System files
- Temporary files
Conclusion
Always discuss an upcoming collection with a forensic consultant first. The consultant will help determine the correct method of collection based on the nature of the matter and will ensure that you approach the collection in the most economical yet defensible manner possible.