As a system administrator, part of your responsibility is to help users manage their data. One of the vital aspects of doing that is to ensure your organization has a good backup plan, and that your users either make their backups regularly, or else don’t have to because you’ve automated the process.
However, sometimes the worst happens. A file gets deleted by mistake, a filesystem becomes corrupt, or a partition gets lost, and for whatever reason, the backups don’t contain what you need.
As we discussed in How to prevent and recover from accidental file deletion in Linux, before trying to recover lost data, you must find out why the data is missing in the first place. It’s possible that a user has simply misplaced the file, or that there is a backup that the user isn’t aware of. But if a user has indeed removed a file with no backups, then you know you need to recover a deleted file. If a partition table has become scrambled, though, then the files aren’t really lost at all, and you might want to consider using TestDisk to recover the partition table, or the partition itself.
What happens if your file or partition recovery isn’t successful, or is only partially successful? Then it’s time for Scalpel. Scalpel performs file carving operations based on patterns describing unique file types. It looks for these patterns based on binary strings and regular expressions, and then extracts the file accordingly.
This tool isn’t currently being maintained, but it’s ever-reliable, compiling and running exactly as expected. If you’re running Red Hat Enterprise Linux (RHEL) 7, RHEL 8, or Fedora, you can download Scalpel’s RPM installers, along with its dependency, libtre, from klaatu.fedorapeople.org.
Starting with Scalpel
Scalpel comes bundled with a comprehensive list of file types and their most unique identifying features. Sometimes, a file can be identified by predictable text at its head and tail:
htm n 50000 <html </html>
While at other times, cryptic-looking hex codes are necessary:
jpg y 200000000 \xff\xd8\xff\xe0\x00\x10 \xff\xd9
Scalpel expects you to duplicate /etc/scalpel.conf and edit your copy to include the file types you hope to recover, and to exclude the file types you know you don’t need. For instance, if you know you don’t have or care about .fws files, then comment that line out of the file. Doing this can speed up the recovery process and reduce false positives.
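That preparation step could be scripted along the following lines. This is only a sketch: the function name disable_type and the file name my-scalpel.conf are illustrative, and it assumes each definition line begins with the file extension, optionally preceded by whitespace.

```shell
# disable_type EXT FILE: comment out every active definition for EXT in FILE.
# Assumption: a definition line starts with the extension, optionally
# preceded by whitespace. Lines that are already commented are left alone.
disable_type() {
    ext=$1
    conf=$2
    tmp="${conf}.tmp"
    # Prefix each matching line with "# " and write the result back.
    sed "s/^\([[:space:]]*\)\(${ext}[[:space:]]\)/\1# \2/" "$conf" > "$tmp" &&
        mv "$tmp" "$conf"
}

# Usage: always work on a private copy, never on /etc/scalpel.conf itself.
#   cp /etc/scalpel.conf my-scalpel.conf
#   disable_type fws my-scalpel.conf
```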
In the configuration file, the format of a file definition is, from left to right:
- The file’s extension.
- Whether the header and footer are case sensitive (y or n).
- The minimum and maximum file size you want Scalpel to find.
- A standard header that identifies the beginning of the file.
- A standard footer that identifies the end of the file.
The footer field is optional. If no footer is provided, then Scalpel extracts the number of bytes you set as the file type’s maximum value.
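To make the field order concrete, here is a small sketch that splits one definition line into named parts. The function and variable names are illustrative only; they are not part of Scalpel.

```shell
# parse_definition LINE: split one definition line into its fields and
# print them with labels. The names used here are hypothetical.
parse_definition() {
    # read -r keeps the backslashes of \x escapes intact.
    read -r ext case_sensitive size header footer <<EOF
$1
EOF
    printf 'extension=%s\ncase_sensitive=%s\nsize=%s\nheader=%s\nfooter=%s\n' \
        "$ext" "$case_sensitive" "$size" "$header" "${footer:-(none)}"
}

# The JPG definition quoted earlier in this article.
parse_definition 'jpg y 200000000 \xff\xd8\xff\xe0\x00\x10 \xff\xd9'
```

A definition without a footer is reported with footer=(none), which corresponds to the case where Scalpel carves up to the maximum size.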
You might find that a recovery effort only rescues part of a file, such as a mostly-recovered JPG whose final portion is missing.
This result means that you probably need to increase the maximum size in the file type’s definition, and then re-scan, so that the end of the file can be recovered, too.
Defining new file types
First, make a copy of the Scalpel configuration file. If all your users generate similar data, then you may only need one config file for your entire organization. Or, you might find it better to have one config file per department.
To add your own file types to a Scalpel config, start with some investigative forensics.
For text files, you ideally have some predictable structure you can anticipate. For instance, an XML file usually starts with an <?xml declaration and ends with the closing tag of its root element. Binary files are similarly predictable. Using the hexdump command, you can view a typical header from the file type you want to define. Here are the results for an XCF, the default layered graphic file from GIMP:
$ head --bytes 8 example.xcf | hexdump --canonical
00000000 67 69 6d 70 20 78 63 66 |gimp xcf|
00000008
This output is from a Red Hat Enterprise Linux 8 system. On older systems, an older syntax may be necessary:
$ head --bytes 8 example.xcf | hexdump -C
00000000 67 69 6d 70 20 78 63 66 |gimp xcf|
00000008
The canonical output of hexdump displays the address in the far left column, and the decoded values on the far right. The center column shows the first 8 bytes of the XCF file in hexadecimal.
Most binary file definitions in /etc/scalpel.conf look pretty similar to that output, except that these values are prefaced with the \x escape sequence to denote that the numbers are actually hexadecimal digits. For instance, a JPG file looks like this in the configuration file:
jpg y 200000000 \xff\xd8\xff\xe0\x00\x10 \xff\xd9
Compare that value with a test hexdump of the first 6 bytes (because that’s how many bytes scalpel.conf contains in its JPG definition) of any JPG file on your system:
$ head --bytes 6 example.jpg | hexdump --canonical
00000000 ff d8 ff e0 00 10 |......|
00000006
Compare the footer with the last 2 bytes to match what the config file shows:
$ tail --bytes -2 example.jpg | hexdump --canonical
00000000 ff d9 |..|
00000002
These values match up, so you can be confident that valid JPG files probably all start and end in a predictable sequence.
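If you want to repeat this comparison across many files, a small helper can do the matching for you. The sketch below is an assumption-laden convenience, not part of Scalpel: the function names are made up, and it uses od instead of hexdump only because od ships with coreutils.

```shell
# hex_bytes: print stdin as one contiguous lowercase hex string (e.g. ffd8ffe0).
hex_bytes() {
    od -An -v -tx1 | tr -d ' \n'
}

# has_signature FILE HEADER_HEX FOOTER_HEX
# Succeeds if FILE starts with HEADER_HEX and ends with FOOTER_HEX.
# Both patterns are plain hex strings without the \x prefix.
has_signature() {
    file=$1
    header=$2
    footer=$3
    # Two hex digits describe one byte.
    head_len=$(( ${#header} / 2 ))
    foot_len=$(( ${#footer} / 2 ))
    [ "$(head -c "$head_len" "$file" | hex_bytes)" = "$header" ] &&
        [ "$(tail -c "$foot_len" "$file" | hex_bytes)" = "$footer" ]
}

# Usage:
#   has_signature example.jpg ffd8ffe00010 ffd9 && echo "matches scalpel.conf"
```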
Note: The Ogg entry in the scalpel.conf file is misleading, as it lacks the \x escape sequence. If you need to recover an Ogg file, fix this, or replace its definition.
Getting to work
Now you need to obtain the same level of confidence for every file type you need to recover (such as XCF, in the previous example). To reiterate, this is your workflow for defining the binary file types common to the victim drive:
- Get the hexadecimal values of the first few bytes of a file type using the head --bytes n command.
- Get the last few bytes using the tail --bytes -n command.
- Repeat this process on several different files of the same type to confirm consistency of this pattern, adjusting the length of your header and footer patterns as required.
- Enter the header and footer values into your custom Scalpel config, using the \x notation to identify each byte as a hexadecimal character.
Follow this sequence for each important binary file type you need to recover.
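The workflow above can be sketched as a pair of shell functions that print a file’s header and footer already in \x notation. The function names and the samples directory are hypothetical, and od stands in for hexdump here purely for portability.

```shell
# to_scalpel_hex: convert raw bytes on stdin to \x notation.
to_scalpel_hex() {
    od -An -v -tx1 | tr -d ' \n' | sed 's/../\\x&/g'
}

# signature FILE HEAD_BYTES TAIL_BYTES
# Prints the header and footer patterns of FILE, separated by a space.
signature() {
    printf '%s %s\n' \
        "$(head -c "$2" "$1" | to_scalpel_hex)" \
        "$(tail -c "$3" "$1" | to_scalpel_hex)"
}

# Usage: one line per sample file. If every sample yields the same line,
# the pattern is probably stable enough to put in your config.
#   for f in samples/*.jpg; do signature "$f" 6 2; done | sort | uniq -c
```

If the samples disagree, try shortening the header or footer length until they agree.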
If a file is plaintext, provide a common header and footer, such as #!/bin/sh for shell scripts, "# " (the space after the # is important) for Markdown files with an h1 level title, <?xml for XML files, and so on.
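As an illustration, the XCF header found earlier could be appended to a custom config like this. Treat it as a sketch: the helper name add_definition is made up, the 50000000-byte maximum is an arbitrary assumption, and no footer is given, so Scalpel would carve up to that many bytes for each match.

```shell
# add_definition CONF EXT CASE SIZE HEADER [FOOTER]
# Appends one tab-separated definition line to CONF.
add_definition() {
    conf=$1
    shift
    # Join the remaining arguments with tabs and append them as one line.
    ( IFS=$(printf '\t'); printf '%s\n' "$*" ) >> "$conf"
}

# Usage (the size value is an assumption; adjust it to your data):
#   add_definition my-scalpel.conf xcf y 50000000 '\x67\x69\x6d\x70\x20\x78\x63\x66'
```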
When you’re ready to run Scalpel, create a directory where it can place your rescued files:
$ mkdir /run/media/seth/rescuer/scalped
Note: Do not create this directory on the same volume that contains the lost data.
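One way to guard against that mistake is to compare the device numbers of the two paths before you start. This sketch assumes GNU stat and mounted paths; the function name same_volume is illustrative.

```shell
# same_volume PATH_A PATH_B: succeed if both paths are on the same filesystem.
# Relies on GNU stat's %d format (the device number of the containing
# filesystem), so it assumes a GNU/Linux system.
same_volume() {
    [ "$(stat -c %d "$1")" = "$(stat -c %d "$2")" ]
}

# Usage:
#   if same_volume /run/media/seth/victim /run/media/seth/rescuer/scalped; then
#       echo "Refusing to write rescued files onto the victim volume" >&2
#   fi
```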
If the victim drive is not yet mounted, mount it, and then run Scalpel:
$ scalpel -c my-scalpel.conf \
-o /run/media/seth/rescuer/scalped \
/run/media/seth/victim
You can also run Scalpel on a disk image:
$ scalpel -c my-scalpel.conf \
-o ~/scalped ~/victim.img
When Scalpel is done, review the files in your designated rescue directory.
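For a quick overview before you inspect files one by one, you could count the recovered files by extension. This is a minimal sketch with an illustrative function name; it makes no assumption about how the rescue directory is laid out.

```shell
# summarize_rescued DIR: count files under DIR per extension, most common first.
summarize_rescued() {
    find "$1" -type f -name '*.*' |
        sed 's/.*\.//' |
        sort | uniq -c | sort -rn
}

# Usage:
#   summarize_rescued /run/media/seth/rescuer/scalped
```

Any log file that Scalpel writes into the same directory is counted as well.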
All in all, it’s best to make backups so you can avoid doing file recovery at all. But, should the worst happen, try Scalpel and carve carefully.
About the author
Seth Kenlon is a Linux geek, open source enthusiast, free culture advocate, and tabletop gamer. Between gigs in the film industry and the tech industry (not necessarily exclusive of one another), he likes to design games and hack on code (also not necessarily exclusive of one another).