diskdump.py

Language: Python 3 · Dependencies: none (stdlib + lsblk + dd)
Privilege: sudo · Output: .img / .iso · ~110 lines

A minimal, zero-dependency disk imaging script that wraps `dd` and `lsblk` in a clean interactive interface. It lists your drives, names the output file after the disk model and timestamp, and gets out of the way.

What it does

At its core, `diskdump.py` is a thin Python shell around two Unix primitives: `lsblk` for drive discovery and `dd` for the actual byte-for-byte copy. `dd` dates back to early Unix, and `lsblk` has shipped with util-linux for more than a decade. The script's value is not in reimplementing either; it is in removing the friction of using them together safely. Typing a raw `dd` command by hand, under pressure, against a production disk, is exactly the kind of situation where a transposed letter becomes a catastrophe.

The script asks you three questions (which disk, where to save it, and what format), then shows you exactly what it is about to do and waits for explicit confirmation before a single byte is read. The filename is generated automatically from the disk's model name and the current timestamp, so the output is always traceable and will not overwrite a previous image unless the same model is imaged twice within the same minute.

The flow, step by step

Drive discovery via lsblk

Calls `lsblk -J -o NAME,SIZE,TYPE,MODEL`, parses the JSON output, and filters for `type == "disk"`, excluding partitions, loop devices, and other block device types. Returns a clean list of physical drives.
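The filtering step can be tried without touching real hardware. The sketch below runs the same `type == "disk"` filter over a hand-written sample shaped like `lsblk -J` output. The sample devices are invented for illustration, and `filter_disks` is a hypothetical helper name, not a function from the script.

```python
import json

# Hand-written sample in the shape of `lsblk -J -o NAME,SIZE,TYPE,MODEL` output.
# The devices listed here are invented for illustration.
SAMPLE = """
{
  "blockdevices": [
    {"name": "loop0", "size": "55.7M", "type": "loop", "model": null},
    {"name": "sda", "size": "931.5G", "type": "disk",
     "model": "Samsung SSD 870 EVO 1TB",
     "children": [
       {"name": "sda1", "size": "512M", "type": "part", "model": null}
     ]},
    {"name": "sr0", "size": "1024M", "type": "rom", "model": "DVD-RW"}
  ]
}
"""


def filter_disks(lsblk_json):
    """Return only the top-level entries whose type is "disk"."""
    data = json.loads(lsblk_json)
    return [dev for dev in data["blockdevices"] if dev["type"] == "disk"]


disks = filter_disks(SAMPLE)
print([d["name"] for d in disks])
```

In the JSON form, partitions are nested under their parent's `children` key, so iterating only the top-level list never sees them.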

Interactive disk selection

Prints each drive as a numbered menu entry showing device path, size, and model name. Loops until a valid number is entered: invalid input prints `Invalid selection.` and re-prompts rather than crashing.

Name sanitisation

The disk model string (e.g. `Samsung SSD 870 EVO 1TB`) is lowercased, whitespace is collapsed to underscores, and any character outside `[a-z0-9._-]` is stripped. Safe to use as a filename on any Linux filesystem.

Destination and format selection

Prompts for an output directory (validated with `os.path.isdir`) and a format extension, either `.img` or `.iso`. Rejects anything else and exits cleanly. The extension is only a label: `dd` writes the same raw bytes either way, so an `.iso` produced here is a raw disk image rather than an ISO 9660 filesystem. The final filename is assembled as `{model}_{DD-MM-YYYY_HHhMM}.{ext}`.
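Because the timestamp has minute resolution, two images of the same model taken within the same minute would map to the same path. The sketch below shows one possible guard. `build_filename` mirrors the script's naming pattern, while `unique_path` is a hypothetical helper that the script does not contain.

```python
import os
import tempfile
from datetime import datetime


def build_filename(model, fmt, now):
    """Assemble {model}_{DD-MM-YYYY_HHhMM}{fmt}, where fmt includes the dot."""
    return f"{model}_{now.strftime('%d-%m-%Y_%Hh%M')}{fmt}"


def unique_path(directory, filename):
    """Hypothetical helper: append _1, _2, ... until the path is unused."""
    stem, ext = os.path.splitext(filename)
    candidate = os.path.join(directory, filename)
    counter = 1
    while os.path.exists(candidate):
        candidate = os.path.join(directory, f"{stem}_{counter}{ext}")
        counter += 1
    return candidate


name = build_filename("samsung_ssd_870_evo_1tb", ".img",
                      datetime(2025, 4, 23, 14, 30))
print(name)

with tempfile.TemporaryDirectory() as tmp:
    first = unique_path(tmp, name)
    open(first, "w").close()         # simulate an image that already exists
    second = unique_path(tmp, name)  # same name requested again
    print(os.path.basename(second))
```

The check is not atomic, so it only protects against accidental reuse, not against two processes racing for the same name.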

Confirmation gate

Prints the full source path and destination path, then requires an explicit `yes` (or `y`) before continuing. Anything else cancels. No countdown, no implicit confirmation after a timeout.

dd execution

Runs `sudo dd if=/dev/sdX of=/path/to/output bs=4M status=progress conv=fsync`. Block size is 4 MiB for efficient throughput. `status=progress` prints live transfer rate and byte count. `conv=fsync` flushes kernel write buffers before exit, ensuring the image is fully written to disk and not lingering in cache.
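As a rough illustration of how such a command can be assembled and previewed without running anything, the sketch below builds the argument list and prints a shell-quoted version of it. `build_dd_command` is a hypothetical name, and the device and output paths are placeholders.

```python
import shlex


def build_dd_command(source, output_file, block_size="4M"):
    """Return the dd invocation as an argument list (no shell involved)."""
    return [
        "sudo", "dd",
        f"if={source}",
        f"of={output_file}",
        f"bs={block_size}",
        "status=progress",
        "conv=fsync",
    ]


# Placeholder paths; the space in the file name is deliberate.
cmd = build_dd_command("/dev/sdX", "/mnt/backup/my disk.img")

# shlex.join quotes arguments that need it, so the preview is copy-paste safe.
print(shlex.join(cmd))
```

Passing the list straight to `subprocess.run` means a destination path containing spaces or shell metacharacters reaches `dd` as a single argument, with no shell interpreting it.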

The code

diskdump.py
Python

#!/usr/bin/env python3

import subprocess
import json
import os
from datetime import datetime
import sys
import re


def sanitize_name(name):
    """Convert disk model to a safe filename."""
    if not name:
        return "diskimage"
    name = name.strip().lower()
    name = re.sub(r'\s+', '_', name)
    name = re.sub(r'[^a-z0-9._-]', '', name)
    return name


def get_disks():
    result = subprocess.run(
        ["lsblk", "-J", "-o", "NAME,SIZE,TYPE,MODEL"],
        capture_output=True,
        text=True,
        check=True
    )

    data = json.loads(result.stdout)

    disks = []
    for device in data["blockdevices"]:
        if device["type"] == "disk":
            disks.append(device)

    return disks


def choose_disk(disks):
    print("\nDetected Drives:\n")

    for i, d in enumerate(disks):
        model = (d.get("model") or "Unknown").strip()
        print(f"{i+1}) /dev/{d['name']} | {d['size']} | {model}")

    while True:
        try:
            choice = int(input("\nChoose a disk number: "))
            if 1 <= choice <= len(disks):
                return disks[choice-1]
        except ValueError:
            pass

        print("Invalid selection.")


def run_dd(source, output_file):
    print("\nStarting backup...\n")

    cmd = [
        "sudo",
        "dd",
        f"if={source}",
        f"of={output_file}",
        "bs=4M",
        "status=progress",
        "conv=fsync"
    ]

    process = subprocess.run(cmd)

    if process.returncode != 0:
        print("\nBackup failed.")
        sys.exit(1)


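def main():

    disks = get_disks()
    if not disks:
        # Without this guard, choose_disk() would loop forever on an empty menu.
        print("No disks detected.")
        return
    disk = choose_disk(disks)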

    model = sanitize_name(disk.get("model"))
    timestamp = datetime.now().strftime("%d-%m-%Y_%Hh%M")

    directory = input("\nEnter destination directory: ").strip()
    if not os.path.isdir(directory):
        print("Directory does not exist.")
        return

    fmt = input("\nChoose image format (.img or .iso): ").strip().lower()
    if fmt not in [".img", ".iso"]:
        print("Invalid format.")
        return

    filename = f"{model}_{timestamp}{fmt}"
    output_file = os.path.join(directory, filename)

    source = f"/dev/{disk['name']}"

    print(f"\nSource: {source}")
    print(f"Destination: {output_file}")

    confirm = input("\nProceed with backup? (yes/no): ").strip().lower()
    if confirm not in ["yes", "y"]:
        print("Cancelled.")
        return

    run_dd(source, output_file)

    print("\nBackup completed successfully!")
    print(f"Image saved to: {output_file}")


if __name__ == "__main__":
    main()

Why dd? Why not something else?

`dd` has been the standard Unix disk copying tool since the 1970s. It operates at the block level: it does not care about filesystems, partition tables, or file structure. It reads raw bytes from a source and writes them identically to a destination. The resulting image is a forensically exact clone: every partition, every sector, every byte of slack space. Mount it as a loop device and it behaves identically to the original drive.

Alternatives exist. `ddrescue` adds error recovery and is preferable for failing drives. `clonezilla` offers partition-aware imaging with compression. `rsync` handles live filesystem copies, but its output is not a mountable image. Each has its place. But for a straightforward, offline, full-disk image of a healthy drive, `dd` is still the most direct tool, and this script is essentially a polite wrapper around it.

↳ bs=4M and conv=fsync

The default `dd` block size of 512 bytes is historical. At 512 bytes per read/write cycle, imaging a modern multi-terabyte drive would take an impractical amount of time. Setting `bs=4M` reads and writes in 4 MiB chunks, which is far more efficient on modern hardware.

`conv=fsync` instructs `dd` to call `fsync()` before exiting, flushing all pending writes from the kernel page cache to physical storage. Without it, the script can report success while gigabytes of data are still buffered in RAM.
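The block-size comparison is easy to put numbers on. The sketch below counts how many block-sized reads one full pass over a disk needs at each setting. The 1 TB capacity is an illustrative assumption, and the counts say nothing about actual throughput, which depends on the hardware.

```python
DISK_BYTES = 1_000_000_000_000   # illustrative 1 TB (decimal) disk
SMALL_BLOCK = 512                # dd's default block size in bytes
LARGE_BLOCK = 4 * 1024 * 1024    # bs=4M means 4 MiB = 4,194,304 bytes


def blocks_needed(total_bytes, block_size):
    """Number of block-sized reads needed to cover total_bytes (rounded up)."""
    return (total_bytes + block_size - 1) // block_size


small = blocks_needed(DISK_BYTES, SMALL_BLOCK)
large = blocks_needed(DISK_BYTES, LARGE_BLOCK)

print(f"bs=512: {small:,} reads")
print(f"bs=4M:  {large:,} reads")
print(f"each 4 MiB block replaces {LARGE_BLOCK // SMALL_BLOCK} default-sized blocks")
```

For this example the default needs about 1.95 billion read/write cycles, against roughly 238 thousand at `bs=4M`.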

The filename design

The output filename, for example `samsung_ssd_870_evo_1tb_23-04-2025_14h30.img`, is deliberately verbose. The model name tells you which physical device was imaged. The timestamp tells you when. Together they make a collection of disk images self-documenting without requiring a separate manifest or log file.

The `sanitize_name` function is a small but necessary piece of defensive programming. Disk model strings returned by `lsblk` can contain spaces, slashes, parentheses, and other characters that interact badly with shell scripts, file managers, and backup tools. The function strips everything outside the safe set `[a-z0-9._-]` after lowercasing and collapsing whitespace. The fallback to `"diskimage"` handles drives that report no model name at all; USB adapters and some virtual block devices commonly return `null`.

The goal is a filename you can read six months later and immediately know what it contains, with no supporting documentation required.
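To make the sanitisation rules concrete, the sketch below repeats the script's `sanitize_name` logic so it runs standalone and applies it to a few inputs. Only the Samsung string comes from this article; the other inputs are invented for illustration.

```python
import re


def sanitize_name(name):
    """Same logic as the script's sanitize_name, repeated to run standalone."""
    if not name:
        return "diskimage"
    name = name.strip().lower()
    name = re.sub(r'\s+', '_', name)
    return re.sub(r'[^a-z0-9._-]', '', name)


examples = [
    "Samsung SSD 870 EVO 1TB",
    "  Generic/USB (Flash)  Disk ",  # invented: slash, parentheses, extra spaces
    None,                             # model reported as null
    "???",                            # invented: nothing survives the filter
]

for raw in examples:
    print(repr(raw), "->", repr(sanitize_name(raw)))
```

The last input shows a gap in the fallback: it covers a missing model, but a model made up entirely of stripped characters comes back as an empty string, which would leave the filename starting with an underscore.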

Caveats and what it doesn't do

The script is intentionally minimal. There are several things it does not attempt, and being clear about them matters before using it in anger:

No compression
The output is a raw byte-for-byte copy, the same size as the source drive. A 1 TB disk produces a 1 TB image file regardless of how much data is actually on it. To save space, compress the finished image with `gzip` or `zstd`, or run `dd` by hand and pipe its output through one of them.

No error recovery
If `dd` hits a bad sector, it exits with a non-zero return code and the script halts. For imaging damaged or failing drives, use `ddrescue`, which can retry bad blocks, skip them, and resume interrupted sessions.

No live drives
Imaging a mounted, running filesystem with `dd` produces an inconsistent snapshot, because writes happening during the copy may be only partially captured. The script makes no attempt to detect or warn about this. Unmount the source before imaging, or use LVM snapshots for live backups.

No verification
The script does not hash the source and destination after imaging to confirm they match. For forensic or archival purposes, run `sha256sum` against both `/dev/sdX` and the output file manually after completion.

sudo required
Reading raw block devices requires root. The script calls `sudo dd` directly, which will prompt for a password if the session doesn't have an active `sudo` token. It does not check in advance whether the user has the necessary privileges.
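The verification caveat can also be handled from Python rather than with `sha256sum`. The sketch below is a streaming SHA-256 helper. `sha256_of` is a hypothetical name, not part of the script, and the demonstration hashes two temporary files standing in for the source device and the image, since reading a real `/dev/sdX` would need root.

```python
import hashlib
import os
import tempfile


def sha256_of(path, chunk_size=4 * 1024 * 1024):
    """Hash a file or block device in fixed-size chunks to keep memory use flat."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            digest.update(chunk)
    return digest.hexdigest()


# Demonstration on two temporary files standing in for source and image.
with tempfile.TemporaryDirectory() as tmp:
    source = os.path.join(tmp, "source.bin")
    image = os.path.join(tmp, "image.img")
    payload = os.urandom(1024 * 1024)
    for path in (source, image):
        with open(path, "wb") as handle:
            handle.write(payload)
    match = sha256_of(source) == sha256_of(image)
    print("match" if match else "MISMATCH")
```

The comparison is only meaningful if the source was not written to between imaging and hashing, which is another reason to image unmounted drives.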

✓ What it is good for

Creating a bootable image of a working system before a major upgrade. Archiving a decommissioned machine's full disk state. Cloning a configured install to identical hardware. Producing a forensic snapshot of a drive that needs to be handed to someone else. Any situation where you want an exact, offline, sector-level copy of a healthy block device and a filename you'll still understand next year.

Running it

terminal

# make executable
chmod +x diskdump.py

# run it
./diskdump.py

# example session
Detected Drives:

1) /dev/sda | 931.5G | Samsung SSD 870 EVO 1TB
2) /dev/sdb |   7.5G | SanDisk Ultra

Choose a disk number: 1

Enter destination directory: /mnt/backup

Choose image format (.img or .iso): .img

Source:      /dev/sda
Destination: /mnt/backup/samsung_ssd_870_evo_1tb_23-04-2025_14h30.img

Proceed with backup? (yes/no): yes

Starting backup...

1000215216128 bytes (1.0 TB) copied, 1847 s, 541 MB/s

No installation required. No pip packages. The only dependencies are `lsblk` and `dd`, which ship with every mainstream Linux distribution. Drop it in `/usr/local/bin`, mark it executable, and it works anywhere.