diskdump.py

Language: Python 3  ·  Dependencies: none (stdlib + lsblk + dd)
Privilege: sudo  ·  Output: .img / .iso  ·  ~110 lines

A minimal, zero-dependency disk imaging script that wraps dd and lsblk in a clean interactive interface. It lists your drives, names the output file after the disk model and timestamp, and gets out of the way.

What it does

At its core, diskdump.py is a thin Python shell around two Unix primitives: lsblk for drive discovery and dd for the actual byte-for-byte copy. Both are long-established on Linux. The script's value is not in reimplementing either; it is in removing the friction of using them together safely. Typing a raw dd command by hand, under pressure, against a production disk, is exactly the kind of situation where a transposed letter becomes a catastrophe.

The script asks you three questions (which disk, where to save it, and what format), then shows you exactly what it is about to do and waits for explicit confirmation before a single byte is read. The filename is generated automatically from the disk's model name and the current timestamp, so the output is always traceable and will not overwrite a previous image unless two runs for the same model land in the same minute.

The flow, step by step
1. Drive discovery via lsblk
Calls lsblk -J -o NAME,SIZE,TYPE,MODEL, parses the JSON output, and filters for type == "disk", excluding partitions, loop devices, and other block device types. Returns a clean list of physical drives.
2. Interactive disk selection
Prints each drive as a numbered menu entry showing device path, size, and model name. Loops until a valid menu number is entered: non-numeric or out-of-range input prints "Invalid selection." and re-prompts rather than crashing.
3. Name sanitisation
The disk model string (e.g. Samsung SSD 870 EVO 1TB) is lowercased, whitespace is collapsed to underscores, and any character outside [a-z0-9._-] is stripped. Safe to use as a filename on any Linux filesystem.
4. Destination and format selection
Prompts for an output directory (validated with os.path.isdir) and a format extension, either .img or .iso. Rejects anything else and exits cleanly. The final filename is assembled as {model}_{DD-MM-YYYY_HHhMM}.{ext}.
5. Confirmation gate
Prints the full source path and destination path, then requires explicit yes or y input. Anything else cancels. No countdown, no implicit confirmation after a timeout.
6. dd execution
Runs sudo dd if=/dev/sdX of=/path/to/output bs=4M status=progress conv=fsync. Block size is 4 MiB for efficient throughput. status=progress prints live transfer rate and byte count. conv=fsync flushes kernel write buffers before exit, ensuring the image is fully written to disk and not lingering in cache.
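The discovery and menu steps above can be tried against sample data, without touching a real machine. The sketch below feeds a hand-written JSON string, shaped like the output of lsblk -J -o NAME,SIZE,TYPE,MODEL, through the same type == "disk" filter. The device names, sizes, and models are illustrative assumptions, and filter_disks is a hypothetical helper name rather than a function from the script.

```python
import json

# Hand-written sample in the shape `lsblk -J -o NAME,SIZE,TYPE,MODEL` emits.
# Device names, sizes, and models are illustrative, not captured output.
SAMPLE_LSBLK_JSON = """
{
  "blockdevices": [
    {"name": "loop0", "size": "55.7M", "type": "loop", "model": null},
    {"name": "sda", "size": "931.5G", "type": "disk",
     "model": "Samsung SSD 870 EVO 1TB",
     "children": [
       {"name": "sda1", "size": "512M", "type": "part", "model": null},
       {"name": "sda2", "size": "931G", "type": "part", "model": null}
     ]},
    {"name": "sr0", "size": "1024M", "type": "rom", "model": "DVD-RW"}
  ]
}
"""


def filter_disks(lsblk_json):
    """Return only the top-level entries whose type is "disk"."""
    data = json.loads(lsblk_json)
    return [dev for dev in data["blockdevices"] if dev["type"] == "disk"]


disks = filter_disks(SAMPLE_LSBLK_JSON)

# Print the same menu line format the script uses.
for number, dev in enumerate(disks, start=1):
    model = (dev.get("model") or "Unknown").strip()
    print(f"{number}) /dev/{dev['name']} | {dev['size']} | {model}")
```

Partitions appear nested under a disk's "children" key, so a filter over the top-level list never sees them. The loop device and the optical drive are dropped by the type check.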
The code
diskdump.py
Python

#!/usr/bin/env python3

import subprocess
import json
import os
from datetime import datetime
import sys
import re


def sanitize_name(name):
    """Convert disk model to a safe filename."""
    if not name:
        return "diskimage"
    name = name.strip().lower()
    name = re.sub(r'\s+', '_', name)
    name = re.sub(r'[^a-z0-9._-]', '', name)
    return name


def get_disks():
    """Return the lsblk entries whose type is 'disk' (whole drives only)."""
    result = subprocess.run(
        ["lsblk", "-J", "-o", "NAME,SIZE,TYPE,MODEL"],
        capture_output=True,
        text=True,
        check=True
    )

    data = json.loads(result.stdout)

    disks = []
    for device in data["blockdevices"]:
        if device["type"] == "disk":
            disks.append(device)

    return disks


def choose_disk(disks):
    print("\nDetected Drives:\n")

    for i, d in enumerate(disks):
        model = (d.get("model") or "Unknown").strip()
        print(f"{i+1}) /dev/{d['name']} | {d['size']} | {model}")

    while True:
        try:
            choice = int(input("\nChoose a disk number: "))
            if 1 <= choice <= len(disks):
                return disks[choice-1]
        except ValueError:
            pass

        print("Invalid selection.")


def run_dd(source, output_file):
    print("\nStarting backup...\n")

    cmd = [
        "sudo",
        "dd",
        f"if={source}",
        f"of={output_file}",
        "bs=4M",
        "status=progress",
        "conv=fsync"
    ]

    process = subprocess.run(cmd)

    if process.returncode != 0:
        print("\nBackup failed.")
        sys.exit(1)


def main():

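    disks = get_disks()
    if not disks:
        # Without this guard choose_disk() would loop forever on an empty menu.
        print("No disks detected.")
        return
    disk = choose_disk(disks)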

    model = sanitize_name(disk.get("model"))
    timestamp = datetime.now().strftime("%d-%m-%Y_%Hh%M")

    directory = input("\nEnter destination directory: ").strip()
    if not os.path.isdir(directory):
        print("Directory does not exist.")
        return

    fmt = input("\nChoose image format (.img or .iso): ").strip().lower()
    if fmt not in [".img", ".iso"]:
        print("Invalid format.")
        return

    filename = f"{model}_{timestamp}{fmt}"
    output_file = os.path.join(directory, filename)

    source = f"/dev/{disk['name']}"

    print(f"\nSource: {source}")
    print(f"Destination: {output_file}")

    confirm = input("\nProceed with backup? (yes/no): ").strip().lower()
    if confirm not in ["yes", "y"]:
        print("Cancelled.")
        return

    run_dd(source, output_file)

    print("\nBackup completed successfully!")
    print(f"Image saved to: {output_file}")


if __name__ == "__main__":
    main()
Why dd? Why not something else?

dd has been the standard Unix disk copying tool since the 1970s. It operates at the block level: it does not care about filesystems, partition tables, or file structure. It reads raw bytes from a source and writes them identically to a destination. The resulting image is a forensically exact clone: every partition, every sector, every byte of slack space. Mount it as a loop device and it behaves identically to the original drive.

Alternatives exist. ddrescue adds error recovery and is preferable for failing drives. clonezilla offers partition-aware imaging with compression. rsync handles live filesystem copies without the output being a mountable image. Each has its place. But for a straightforward, offline, full-disk image of a healthy drive, dd is still the most direct tool, and this script is essentially a polite wrapper around it.

↳ bs=4M and conv=fsync

The default dd block size of 512 bytes is historical. At 512 bytes per read/write cycle, imaging a 1 TB drive takes roughly two billion reads and as many writes, so most of the time goes to per-call overhead rather than moving data. Setting bs=4M reads and writes in 4 MiB chunks, which is far more efficient on modern hardware. conv=fsync instructs dd to call fsync() before exiting, flushing all pending writes from the kernel page cache to physical storage. Without it, the script can report success while gigabytes of data are still buffered in RAM.
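To make the two options concrete, here is a minimal Python sketch of the same idea: copy in fixed 4 MiB blocks, then force the output to storage before reporting success. It illustrates the technique only and is not how dd is implemented internally. The name copy_with_fsync is a hypothetical helper, and the demonstration runs on an ordinary temporary file rather than a block device.

```python
import os
import tempfile

BLOCK_SIZE = 4 * 1024 * 1024  # what dd's bs=4M means: 4 MiB per read/write


def copy_with_fsync(source_path, dest_path, block_size=BLOCK_SIZE):
    """Copy source to dest in fixed-size blocks, then fsync the output.

    Returns the number of bytes copied.
    """
    copied = 0
    with open(source_path, "rb") as src, open(dest_path, "wb") as dst:
        while True:
            block = src.read(block_size)
            if not block:  # empty read means end of input
                break
            dst.write(block)
            copied += len(block)
        dst.flush()             # push Python's own buffer to the kernel
        os.fsync(dst.fileno())  # ask the kernel to push its cache to storage
    return copied


# Demonstration on a temporary file standing in for a real device.
with tempfile.TemporaryDirectory() as tmp:
    src_path = os.path.join(tmp, "source.bin")
    dst_path = os.path.join(tmp, "copy.img")
    payload = os.urandom(BLOCK_SIZE + 12345)  # slightly more than one block
    with open(src_path, "wb") as f:
        f.write(payload)

    copied_bytes = copy_with_fsync(src_path, dst_path)
    with open(dst_path, "rb") as f:
        copy_matches = f.read() == payload

    print(copied_bytes, copy_matches)
```

The final short block shows why the loop counts len(block) rather than assuming every read returns a full 4 MiB.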

The filename design

The output filename, for example samsung_ssd_870_evo_1tb_23-04-2025_14h30.img, is deliberately verbose. The model name tells you which physical device was imaged. The timestamp tells you when. Together they make a collection of disk images self-documenting without requiring a separate manifest or log file.

The sanitize_name function is a small but necessary piece of defensive programming. Disk model strings returned by lsblk can contain spaces, slashes, parentheses, and other characters that interact badly with shell scripts, file managers, and backup tools. The function strips everything outside the safe set [a-z0-9._-] after lowercasing and collapsing whitespace. The fallback to "diskimage" handles drives that report no model name at all; USB adapters and some virtual block devices commonly return null.

The goal is a filename you can read six months later and immediately know what it contains, with no supporting documentation required.
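The naming rules can be checked in isolation. The sketch below repeats sanitize_name from the script and adds build_filename, a hypothetical helper that mirrors how main() joins the model, timestamp, and extension. The second model string is an invented example of awkward input, not a value reported by any particular device.

```python
import re
from datetime import datetime


def sanitize_name(name):
    """Same logic as the script: lowercase, underscores, strip unsafe chars."""
    if not name:
        return "diskimage"
    name = name.strip().lower()
    name = re.sub(r"\s+", "_", name)
    return re.sub(r"[^a-z0-9._-]", "", name)


def build_filename(model, when, ext):
    """Hypothetical helper: assemble {model}_{DD-MM-YYYY_HHhMM}{ext}."""
    return f"{sanitize_name(model)}_{when.strftime('%d-%m-%Y_%Hh%M')}{ext}"


# A fixed timestamp keeps the example reproducible.
when = datetime(2025, 4, 23, 14, 30)

print(build_filename("Samsung SSD 870 EVO 1TB", when, ".img"))
# samsung_ssd_870_evo_1tb_23-04-2025_14h30.img

print(build_filename("WD My Passport (25E2)/USB", when, ".img"))
# wd_my_passport_25e2usb_23-04-2025_14h30.img

print(build_filename(None, when, ".iso"))
# diskimage_23-04-2025_14h30.iso
```

Note that stripped characters are removed rather than replaced, so the slash in the second example leaves 25e2 and usb joined together.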

Caveats and what it doesn't do

The script is intentionally minimal. There are several things it does not attempt, and being clear about them matters before using it in anger:

  • No compression: The output is a raw byte-for-byte copy, the same size as the source drive. A 1 TB disk produces a 1 TB image file regardless of how much data is actually on it. If storage space is a concern, compress the finished image with gzip or zstd, or pipe a hand-run dd through one of them.
  • No error recovery: If dd hits a bad sector, it exits with a non-zero return code and the script halts. For imaging damaged or failing drives, use ddrescue, which can retry bad blocks, skip them, and resume interrupted sessions.
  • No live drives: Imaging a mounted, running filesystem with dd produces an inconsistent snapshot, because writes happening during the copy may be only partially captured. The script makes no attempt to detect or warn about this. Unmount the source before imaging, or use LVM snapshots for live backups.
  • No verification: The script does not hash the source and destination after imaging to confirm they match. For forensic or archival purposes, run sha256sum against both /dev/sdX and the output file manually after completion.
  • sudo required: Reading raw block devices requires root. The script calls sudo dd directly, which will prompt for a password if the session doesn't have an active sudo token. It does not check in advance whether the user has the necessary privileges. Because dd runs as root, the image file it creates is owned by root.
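Two of these gaps, mount detection and verification, are small enough to sketch. Neither function below is part of the script. mounted_paths assumes lsblk was asked for the MOUNTPOINT column (for example lsblk -J -o NAME,TYPE,MOUNTPOINT), so that each entry carries a "mountpoint" key and partitions sit under "children". sha256_of hashes in chunks and is demonstrated on temporary files, since reading a real /dev/sdX would need root.

```python
import hashlib
import os
import tempfile


def mounted_paths(device):
    """List mountpoints on one lsblk JSON entry or any of its children.

    Assumes entries shaped like `lsblk -J -o NAME,TYPE,MOUNTPOINT` output,
    where unmounted entries carry a null (None) mountpoint.
    """
    found = []
    if device.get("mountpoint"):
        found.append(device["mountpoint"])
    for child in device.get("children", []):
        found.extend(mounted_paths(child))
    return found


def sha256_of(path, chunk_size=4 * 1024 * 1024):
    """Hash a file or block device in chunks so it never has to fit in RAM."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Illustrative lsblk entry (hand-written, not captured output).
sample_disk = {
    "name": "sda", "type": "disk", "mountpoint": None,
    "children": [
        {"name": "sda1", "type": "part", "mountpoint": "/boot"},
        {"name": "sda2", "type": "part", "mountpoint": None},
    ],
}
print(mounted_paths(sample_disk))  # a non-empty list means: unmount first

# Two identical temporary files stand in for /dev/sdX and the image.
with tempfile.TemporaryDirectory() as tmp:
    source = os.path.join(tmp, "source.bin")
    image = os.path.join(tmp, "image.img")
    for path in (source, image):
        with open(path, "wb") as f:
            f.write(b"\x00" * 1024 + b"disk contents")
    hashes_match = sha256_of(source) == sha256_of(image)
    print("match" if hashes_match else "MISMATCH")
```

A check like mounted_paths would sit naturally before the confirmation prompt, and the hash comparison after dd returns. Both are optional additions under the stated assumptions.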

✓ What it is good for

Creating a bootable image of a working system before a major upgrade. Archiving a decommissioned machine's full disk state. Cloning a configured install to identical hardware. Producing a forensic snapshot of a drive that needs to be handed to someone else. Any situation where you want an exact, offline, sector-level copy of a healthy block device and a filename you'll still understand next year.

Running it
terminal
# make executable
chmod +x diskdump.py

# run it
./diskdump.py

# example session
Detected Drives:

1) /dev/sda | 931.5G | Samsung SSD 870 EVO 1TB
2) /dev/sdb | 7.5G | SanDisk Ultra

Choose a disk number: 1

Enter destination directory: /mnt/backup

Choose image format (.img or .iso): .img

Source: /dev/sda
Destination: /mnt/backup/samsung_ssd_870_evo_1tb_23-04-2025_14h30.img

Proceed with backup? (yes/no): yes

Starting backup...

1000215216128 bytes (1.0 TB) copied, 1847 s, 541 MB/s

No installation required. No pip packages. The only dependencies are lsblk and dd, which ship with every mainstream Linux distribution. Drop it in /usr/local/bin, mark it executable, and it works anywhere Python 3.7 or later is installed (the script passes capture_output to subprocess.run, which was added in 3.7).