Overview

Advanced filtering uses Python callbacks to give you complete control over the filtering process. This enables complex operations that can’t be achieved with simple command-line options.

Understanding Callbacks

Callbacks are Python functions that filter-repo calls for each git object. You provide the function body as a string.

Basic Callback Structure

For a callback like --name-callback, filter-repo creates:
def name_callback(name):
  YOUR_CODE_HERE
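You provide only the YOUR_CODE_HERE part. filter-repo does not add a return statement, so for the simple callbacks below your code must return the new value itself.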
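To make the wrapping concrete, here is a minimal sketch of how a body string can be turned into a function. `make_callback` is a hypothetical helper written for illustration; it models the idea and is not filter-repo's actual implementation.

```python
import re

def make_callback(argname, body):
    """Wrap a body string in a one-argument function (illustrative only)."""
    # Indent every line of the body so it sits inside the generated def.
    indented = "\n".join("  " + line for line in body.splitlines())
    source = "def callback({}):\n{}".format(argname, indented)
    namespace = {"re": re}  # modules the body is allowed to use
    exec(source, namespace)
    return namespace["callback"]

name_callback = make_callback(
    "name", 'return name.replace(b"Wiliam", b"William")'
)
print(name_callback(b"Wiliam Smith"))
```

Because every line of the body is indented before it is compiled, the body can be written as if it were already inside a function, including `return` statements.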
Bytestrings Required
git-filter-repo uses bytestrings (bytes), not strings:
  • Use b"text" instead of "text"
  • Compare with b"value" not "value"
  • Use .replace(b"old", b"new")
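The following plain-Python sketch shows why the distinction matters. It runs outside filter-repo; the sample name is made up for illustration.

```python
name = b"Wiliam Smith"

# A str never equals a bytes object, so this comparison is always False.
print(name == "Wiliam Smith")    # False
print(name == b"Wiliam Smith")   # True

# Passing str arguments to a bytes method raises TypeError.
try:
    name.replace("Wiliam", "William")
except TypeError:
    print("TypeError raised")

# If str handling is really needed, decode and re-encode explicitly.
fixed = name.decode("utf-8").replace("Wiliam", "William").encode("utf-8")
print(fixed)
```

The silent `False` in the first comparison is the more dangerous case: a callback that compares against a str runs without error and simply never matches.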

Simple Callbacks

Name Callback

Modify author, committer, and tagger names:
git filter-repo --name-callback '
  return name.replace(b"Wiliam", b"William")
'

Email Callback

Fix email addresses:
git filter-repo --email-callback '
  # Fix common typos
  email = email.replace(b".cm", b".com")
  email = email.replace(b"gmial.com", b"gmail.com")
  return email
'

Refname Callback

Modify branch and tag names:
git filter-repo --refname-callback '
  # Add prefix to all branches (refs/heads/main -> refs/heads/v2-main)
  if refname.startswith(b"refs/heads/"):
    branch = refname[11:]  # Remove "refs/heads/"
    return b"refs/heads/v2-" + branch
  return refname
'
Refnames must be fully qualified:
  • Use b"refs/heads/main" not b"main"
  • Use b"refs/tags/v1.0" not b"v1.0"
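As a sketch of the same prefixing logic in a form that can be tested without touching a repository, the function below takes and returns fully qualified refnames. `add_branch_prefix` is a hypothetical helper name; using `len(heads)` avoids the hard-coded offset.

```python
def add_branch_prefix(refname, prefix=b"v2-"):
    """Prefix branch names; leave tags and other refs untouched."""
    heads = b"refs/heads/"
    if refname.startswith(heads):
        return heads + prefix + refname[len(heads):]
    return refname

for ref in (b"refs/heads/main", b"refs/tags/v1.0", b"main"):
    print(ref, b"->", add_branch_prefix(ref))
```

Note that the unqualified name `b"main"` passes through unchanged, which is why the refname must be fully qualified for the rule to apply.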

Filename Callback

Rename or remove files:
git filter-repo --filename-callback '
  # Remove all files in src/ subdirectories (except toplevel src/)
  if b"/src/" in filename:
    return None  # Delete file
  
  # Rename tools/ -> scripts/misc/
  if filename.startswith(b"tools/"):
    return b"scripts/misc/" + filename[6:]
  
  # Keep all other files unchanged
  return filename
'
Return values:
  • filename - Keep file unchanged
  • Modified filename - Rename file
  • None - Remove file from history
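The three outcomes can be checked on sample paths before running a rewrite. This sketch restates the callback above as an ordinary function; the paths are invented for illustration.

```python
def filename_callback(filename):
    # Remove files under a nested src/ directory.
    if b"/src/" in filename:
        return None
    # Rename tools/ -> scripts/misc/
    if filename.startswith(b"tools/"):
        return b"scripts/misc/" + filename[len(b"tools/"):]
    # Keep everything else unchanged.
    return filename

paths = [b"src/main.c", b"lib/src/util.c", b"tools/build.sh", b"README"]
for path in paths:
    print(path, "->", filename_callback(path))
```

The top-level `src/main.c` is kept because its path contains no `/src/` with a leading slash, while `lib/src/util.c` is removed.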

Message Callback

Modify commit and tag messages:
git filter-repo --message-callback '
  # Add Signed-off-by if missing
  if b"Signed-off-by:" not in message:
    message += b"\nSigned-off-by: Me Myself <me@example.com>"
  
  # Normalize "e-mail" spelling variants to "email"
  message = re.sub(b"[Ee]-?[Mm][Aa][Ii][Ll]", b"email", message)
  
  return message
'

Object Callbacks

More powerful callbacks that operate on complete git objects.

Blob Callback

Modify file contents:
git filter-repo --blob-callback '
  # Skip blobs over 25 bytes
  if len(blob.data) > 25:
    blob.skip()
  else:
    blob.data = blob.data.replace(b"Hello", b"Goodbye")
'
Blob properties:
  • blob.data - File contents (bytes)
  • blob.original_id - Original git hash
  • blob.id - New git object ID
  • blob.skip() - Remove this blob

Commit Callback

Modify commits:
git filter-repo --commit-callback '
  # Remove executable files with "666" in their name
  commit.file_changes = [
    change for change in commit.file_changes
    if not (change.mode == b"100755" and b"666" in change.filename)
  ]
  
  # Prevent deletion of specific file
  commit.file_changes = [
    change for change in commit.file_changes
    if not (change.type == b"D" and change.filename == b"important.txt")
  ]
  
  # Make all .sh files executable
  for change in commit.file_changes:
    if change.filename.endswith(b".sh"):
      change.mode = b"100755"
'
Commit properties:
  • commit.branch - Branch name (bytes)
  • commit.original_id - Original commit hash
  • commit.author_name, commit.author_email, commit.author_date
  • commit.committer_name, commit.committer_email, commit.committer_date
  • commit.message - Commit message (bytes)
  • commit.parents - List of parent commit IDs
  • commit.file_changes - List of FileChange objects
  • commit.skip(new_id) - Skip this commit
FileChange properties:
  • change.type - b"M" (modify), b"D" (delete), b"DELETEALL"
  • change.filename - Path (bytes)
  • change.mode - File mode: b"100644", b"100755", b"120000", b"160000"
  • change.blob_id - Git blob ID

Tag Callback

Modify annotated tags:
git filter-repo --tag-callback '
  # Skip tags by specific author
  if tag.tagger_name == b"Jim Williams":
    tag.skip()
  else:
    # Add extra info to tag message
    tag.message += b"\n\nTag of %s by %s on %s" % (
      tag.ref, tag.tagger_email, tag.tagger_date
    )
'
Tag properties:
  • tag.ref - Tag name (without refs/tags/ prefix)
  • tag.from_ref - Commit being tagged
  • tag.original_id - Original tag hash
  • tag.tagger_name, tag.tagger_email, tag.tagger_date
  • tag.message - Tag message
  • tag.skip() - Remove this tag

Reset Callback

Modify reset (branch creation) events:
git filter-repo --reset-callback '
  # Rename master branch to main
  reset.ref = reset.ref.replace(b"master", b"main")
'
Reset properties:
  • reset.ref - Reference name
  • reset.from_ref - Commit hash or mark

Advanced Use Cases

Multi-Line Callbacks

Use multi-line Python code:
git filter-repo --filename-callback '
  # Define a mapping
  renames = {
    b"README": b"README.md",
    b"COPYING": b"LICENSE",
    b"AUTHORS": b"CONTRIBUTORS.md",
  }
  
  # Apply renames
  if filename in renames:
    return renames[filename]
  
  # Remove backup files
  if filename.endswith(b".bak") or filename.endswith(b"~"):
    return None
  
  return filename
'

Using Regular Expressions

The re module is available:
git filter-repo --message-callback '
  # Convert issue references: #123 -> JIRA-123
  message = re.sub(b"#(\\d+)", b"JIRA-\\1", message)
  
  # Remove trailing whitespace from each line
  lines = message.split(b"\n")
  lines = [re.sub(b"\\s+$", b"", line) for line in lines]
  message = b"\n".join(lines)
  
  return message
'
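Regex callbacks are easy to get subtly wrong, so it can help to try the patterns in a standalone Python file first. The sketch below assumes a hypothetical `rewrite_message` function and uses raw bytes literals (`rb"..."`), which avoid doubled backslashes.

```python
import re

def rewrite_message(message):
    # Convert issue references: #123 -> JIRA-123
    message = re.sub(rb"#(\d+)", rb"JIRA-\1", message)
    # Strip trailing whitespace from each line.
    lines = [re.sub(rb"\s+$", b"", line) for line in message.split(b"\n")]
    return b"\n".join(lines)

print(rewrite_message(b"Fix #42   \nSee also #7\t\n"))
```

Both the pattern and the input must be bytes; mixing a str pattern with a bytes message raises TypeError.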

Accessing Metadata

The commit callback receives additional metadata in aux_info:
git filter-repo --commit-callback '
  # aux_info contains:
  # - orig_parents: original parent commit IDs
  # - had_file_changes: whether commit had file changes
  
  # Example: Mark commits that lost all files
  if not commit.file_changes and aux_info["had_file_changes"]:
    commit.message += b"\n\n[Note: All file changes filtered out]"
'

Conditional Processing

git filter-repo --blob-callback '
  # Only process small text files
  if len(blob.data) > 1024 * 1024:  # > 1MB
    return
  
  if b"\0" in blob.data[0:8192]:  # Binary file (contains a NUL byte)
    return
  
  # Safe to process as text
  blob.data = blob.data.upper()
'
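The size and binary checks can be factored into small functions and tested on sample data. This is a sketch under the same heuristic as above (a NUL byte in the first 8192 bytes means binary); `looks_binary` and `process_blob` are hypothetical names.

```python
def looks_binary(data, probe_size=8192):
    """Heuristic: treat data as binary if a NUL byte appears early on."""
    return b"\0" in data[:probe_size]

def process_blob(data, max_size=1024 * 1024):
    """Uppercase small text blobs; return anything else unchanged."""
    if len(data) > max_size or looks_binary(data):
        return data
    return data.upper()

print(process_blob(b"hello"))
print(process_blob(b"\x00\x01binary"))
```

The heuristic can misclassify text encodings that contain NUL bytes, such as UTF-16, so it is worth checking against the kinds of files the repository actually holds.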

Combining Callbacks

Use multiple callbacks together:
git filter-repo \
  --name-callback 'return name.title()' \
  --email-callback 'return email.lower()' \
  --filename-callback '
    if filename.endswith(b".tmp"):
      return None
    return filename
  ' \
  --message-callback '
    return message.replace(b"TODO", b"DONE")
  '

Complex Examples

Enforce File Naming Convention

git filter-repo --filename-callback '
  # Convert to lowercase
  parts = filename.split(b"/")
  parts[-1] = parts[-1].lower()
  filename = b"/".join(parts)
  
  # Replace spaces with hyphens
  filename = filename.replace(b" ", b"-")
  
  # Remove special characters
  filename = re.sub(b"[^a-z0-9/_.-]", b"", filename)
  
  return filename
'

Add File Headers

git filter-repo --blob-callback '
  # Skip binary files
  if b"\0" in blob.data[0:8192]:
    return
  
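  # Add copyright header to every text blob (blob callbacks do not see filenames).
  # Single-line literals keep the header independent of how the body is indented.
  header = (
    b"# Copyright (C) 2024 Example Corp\n"
    b"# Licensed under MIT License\n"
    b"\n"
  )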
  
  if not blob.data.startswith(b"# Copyright"):
    blob.data = header + blob.data
'

Squash Small Commits

This uses commit.skip() to drop commits with very short messages:
git filter-repo --commit-callback '
  # Skip commits with tiny messages
  if len(commit.message) < 10:
    commit.skip(commit.first_parent())
'
commit.skip(new_id) marks the commit as skipped and maps its ID to new_id. Children of this commit will use new_id as their parent.
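The remapping described above can be pictured with a small model. `remap_parents` and the `skipped` dictionary are hypothetical names for illustration; filter-repo's internal bookkeeping may differ.

```python
def remap_parents(parents, skipped):
    """Replace skipped commit IDs with the IDs they were mapped to."""
    resolved = []
    for parent in parents:
        # Follow chains where a replacement was itself skipped.
        while parent in skipped:
            parent = skipped[parent]
        # Drop parents that resolved to nothing, and avoid duplicates.
        if parent is not None and parent not in resolved:
            resolved.append(parent)
    return resolved

# c3 was skipped in favor of c2, which was skipped in favor of c1.
skipped = {b"c3": b"c2", b"c2": b"c1"}
print(remap_parents([b"c3"], skipped))
```

In this model a child of a skipped commit ends up attached to the nearest surviving ancestor, and a merge whose parents resolve to the same commit keeps only one of them.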

Rewrite Dates

git filter-repo --commit-callback '
  # Make all commits appear to be from 2024
  from datetime import datetime
  
  # Parse existing date
  timestamp, timezone = commit.author_date.split()
  dt = datetime.fromtimestamp(int(timestamp))
  
  # Update year
  new_dt = dt.replace(year=2024)
  new_timestamp = int(new_dt.timestamp())
  
  # Update both author and committer dates
  commit.author_date = b"%d %s" % (new_timestamp, timezone)
  commit.committer_date = commit.author_date
'
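The date fields are bytestrings of the form `<epoch seconds> <timezone offset>`, as the parsing above shows. The sketch below is a testable variant of the same idea; `set_year` is a hypothetical helper, and it differs from the callback above in two assumed design choices: it interprets the timestamp in UTC so the result does not depend on the machine's local time zone, and it falls back to February 28 when the target year has no February 29.

```python
from datetime import datetime, timezone

def set_year(date, year):
    """Return a '<epoch> <tz>' bytestring with the year replaced."""
    timestamp, tz = date.split()
    dt = datetime.fromtimestamp(int(timestamp), tz=timezone.utc)
    try:
        new_dt = dt.replace(year=year)
    except ValueError:
        # Feb 29 in a non-leap target year: fall back to Feb 28.
        new_dt = dt.replace(year=year, day=28)
    return b"%d %s" % (int(new_dt.timestamp()), tz)

# 1577836800 is 2020-01-01 00:00:00 UTC.
print(set_year(b"1577836800 +0100", 2024))
```

The timezone offset is carried through unchanged; only the epoch value is recomputed.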

Remove Merge Commits

git filter-repo --commit-callback '
  # Skip merge commits (commits with multiple parents)
  if len(commit.parents) > 1:
    commit.skip(commit.first_parent())
'

Using External Scripts

For very complex logic, use external Python scripts:
git filter-repo --commit-callback "$(cat my_callback.py)"
my_callback.py:
import json

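# Load configuration (this runs for every commit, so keep the file small)
with open('filter-config.json', 'rb') as f:
  config = json.load(f)

# JSON strings are str, but commit.branch is bytes: encode before comparing
protected = {name.encode() for name in config['protected_branches']}

# Complex filtering logic
if commit.branch in protected:
  return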

# ... more logic ...
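One detail worth testing in isolation is the type mismatch between JSON and git data: JSON yields str values, while commit.branch is bytes. A minimal sketch, assuming a config with the `protected_branches` key used above (`load_protected_branches` is a hypothetical helper name):

```python
import json

def load_protected_branches(config_text):
    """Parse the config and return branch names as a set of bytestrings."""
    config = json.loads(config_text)
    return {name.encode("utf-8") for name in config["protected_branches"]}

protected = load_protected_branches(
    '{"protected_branches": ["refs/heads/main"]}'
)
print(b"refs/heads/main" in protected)   # True
print("refs/heads/main" in protected)    # False
```

Using a set also keeps the membership test cheap when the list of branches grows.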

Performance Tips

Optimize Callbacks
  1. Avoid expensive operations in hot paths
  2. Cache results when possible
  3. Short-circuit early if possible
  4. Use bytestring operations (faster than string)
# Good: Short-circuit early
if not filename.endswith(b".py"):
  return filename
# ... expensive processing ...

# Bad: Always processes
# ... expensive processing ...
if filename.endswith(b".py"):
  return modified_filename
return filename
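Caching can be sketched with `functools.lru_cache` from the standard library. In this example the per-directory work is cached, since many files share the same directory; `normalize_directory` stands in for whatever expensive step a real callback performs and is a hypothetical name.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def normalize_directory(directory):
    """Stand-in for an expensive per-directory computation."""
    return directory.lower().replace(b" ", b"-")

def filename_callback(filename):
    # Short-circuit: top-level files have no directory to normalize.
    if b"/" not in filename:
        return filename
    directory, _, basename = filename.rpartition(b"/")
    return normalize_directory(directory) + b"/" + basename

print(filename_callback(b"My Docs/a.txt"))
print(filename_callback(b"My Docs/b.txt"))  # directory result comes from the cache
print(normalize_directory.cache_info())
```

Bytestrings are hashable, so they work directly as cache keys. For the cache to persist across calls, the cached function has to live somewhere that outlives a single callback invocation, which is an argument for moving such logic into an external script or library-style usage.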
Callback Errors
If a callback raises an exception, filter-repo will abort. Test thoroughly:
# Test on a small branch first (substitute the callback you are using)
git filter-repo --refs test-branch --message-callback '...'

Available Modules

These Python modules are available in callbacks:
  • argparse - Argument parsing
  • collections - Container datatypes
  • fnmatch - Filename pattern matching
  • io - I/O operations
  • os - Operating system interface
  • platform - Platform identification
  • re - Regular expressions
  • shutil - High-level file operations
  • subprocess - Subprocess management
  • sys - System-specific parameters
  • time - Time access
  • textwrap - Text wrapping
  • datetime - Date/time handling
Plus all filter-repo classes:
  • Blob, Commit, Tag, Reset, FileChange
  • FilteringOptions, RepoFilter

API Compatibility Warning

API May Change
The callback API is NOT guaranteed to be stable. If you write scripts that use callbacks:
  1. Pin to a specific git-filter-repo version
  2. Test after any upgrades
  3. Contribute test cases for APIs you rely on
See Library Usage for more stable APIs.

Next Steps