FACE SWAP PROBLEMS · JAN 12, 2026

Why Some Faces Can't Be Swapped by AI

Why does AI ignore some faces? It's not a bug—it's a boundary. Discover how side profiles, glasses, and shadows blind the detector, and how to pick footage that works every time.


Key findings

Some faces are effectively "unswappable" for current one-shot face swap systems (like FaceFusion). This is not a random bug, but a predictable limitation of the underlying computer vision pipeline. The process relies on three strictly sequential steps: Face Detection, Landmark Alignment, and Identity Embedding. If the input face falls outside the training distribution of these specific models—such as extreme side profiles, heavy occlusion, non-photorealistic styles (cartoons/statues), or severe motion blur—the pipeline breaks at Step 1 or Step 2, preventing any swap from occurring.

Applicable scope: This explanation applies primarily to one-shot face swappers based on the inswapper_128 model and common detectors like retinaface or yoloface. It may not fully apply to training-based methods like LoRA or Dreambooth, which can learn new face distributions given enough data.


What the phenomenon looks like

  • "Why does the face swap work on one person but ignore the other?"
  • "I tried to swap a cartoon/anime character, but nothing happened."
  • "The face is clearly visible, but the software says 'No Face Detected'."
  • "It works in the preview but fails in the final video for certain angles."
  • "Can I swap a face that is looking away from the camera?"

When this problem appears most often

The failure to swap usually happens under these specific visual conditions:

  1. Extreme Angles (Profile Views): When a face turns beyond ~45-60 degrees (a sharp side profile), most landmark detectors (like 2dfan4) lose track of the necessary 5 key points (eyes, nose, mouth corners).
  2. Non-Photorealistic Inputs: Cartoons, oil paintings, statues, or highly stylized 3D avatars often lack the specific texture gradients that retinaface or yoloface were trained to recognize as "human."
  3. Occlusion: Sunglasses, face masks, hands covering the mouth, or heavy bangs covering eyes can break the contiguous facial structure required for a valid embedding.
  4. Severe Motion Blur: In video, fast movement smears facial features. If the detector cannot find sharp edges, it skips the frame to avoid swapping onto "ghosting" artifacts.
  5. Closed Eyes: While sometimes swappable, closed eyes remove critical landmarks, often yielding a weak identity embedding or a "zombie-like" swap.

Why this happens

To understand why a swap fails, you have to look at the "Gatekeeper Pipeline." A face swap isn't a single magic operation; it's a relay race. If the baton drops at any stage, the race ends instantly.
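
To make the relay concrete, here is a minimal end-to-end sketch using the open-source insightface package (the same model family these tools build on). The image paths are placeholders, and the inswapper_128.onnx file must be obtained separately; this illustrates the gate structure, not FaceFusion's actual code.

```python
import cv2
import insightface
from insightface.app import FaceAnalysis

# Detector + landmarker + ArcFace embedder in one bundle.
app = FaceAnalysis(name='buffalo_l')
app.prepare(ctx_id=0, det_size=(640, 640))

# The generator; assumes the .onnx file is already on disk.
swapper = insightface.model_zoo.get_model('inswapper_128.onnx')

source_faces = app.get(cv2.imread('source.jpg'))   # placeholder paths
target_img = cv2.imread('target.jpg')
target_faces = app.get(target_img)

# The baton drops here: an empty list at either step means the
# relay is over and no swap will ever be attempted.
if not source_faces or not target_faces:
    raise SystemExit('No face detected -- pipeline stopped at step 1.')

result = swapper.get(target_img, target_faces[0], source_faces[0],
                     paste_back=True)
cv2.imwrite('swapped.jpg', result)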

1. The Detector (RetinaFace / YOLOFace)

The first gatekeeper is the Face Detector. Its only job is to draw a box around what it thinks is a human face.

  • The Limit: It is trained on datasets of real human faces. It does not "understand" a face; it statistically recognizes patterns of light and shadow.
  • The Failure: A cartoon line drawing or a blurry smear doesn't match the mathematical pattern of a face, so the detector returns 0 faces.
  • User Action: Users often try to lower the face_detector_score (threshold). This tells the detector to be "less picky," which might catch a blurry face but introduces the risk of swapping onto background objects (like a pattern on a shirt). The sketch after this list shows where that confidence score surfaces.
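
A minimal detection-only sketch with insightface, printing each candidate box and its confidence (the image path is a placeholder):

```python
import cv2
from insightface.app import FaceAnalysis

app = FaceAnalysis(name='buffalo_l')       # RetinaFace-family detector
app.prepare(ctx_id=0, det_thresh=0.5, det_size=(640, 640))

img = cv2.imread('frame.png')              # placeholder path
faces = app.get(img)
print(f'{len(faces)} face(s) detected')

for face in faces:
    # det_score is the detector's confidence that this box is a face.
    # Blur, occlusion, or a hard profile push it toward det_thresh;
    # anything below the threshold is silently discarded.
    print(face.bbox.astype(int), f'score={float(face.det_score):.3f}')
```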

2. The Aligner (Landmark Extraction)

Once a box is found, the system must locate specific "anchors" (landmarks) to warp the face into the standard 128x128 aligned crop.

  • The Limit: Standard aligners need 5 visible points: both eyes, the nose tip, and mouth corners.
  • The Failure: In a side profile, one eye and one mouth corner might be hidden. The aligner cannot guess the geometry accurately, so the pipeline aborts to prevent a twisted, horrific result (see the alignment sketch after this list).
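
A sketch of that alignment step, assuming the widely published ArcFace 5-point template; rescaling the 112-pixel template to a 128 crop is an assumption here, not necessarily the exact template a given tool ships:

```python
import cv2
import numpy as np

# Canonical ArcFace landmark positions for a 112x112 crop:
# left eye, right eye, nose tip, left and right mouth corners.
ARCFACE_112 = np.array([
    [38.2946, 51.6963], [73.5318, 51.5014], [56.0252, 71.7366],
    [41.5493, 92.3655], [70.7299, 92.2041]], dtype=np.float32)

def align_face(image, kps, size=128):
    """Warp the face so its 5 landmarks land on the template.

    kps: (5, 2) detected landmarks. On a hard side profile one eye
    and one mouth corner collapse together or vanish, the similarity
    fit degenerates, and a sane pipeline aborts right here.
    """
    template = ARCFACE_112 * (size / 112.0)          # rescale to crop size
    matrix, _ = cv2.estimateAffinePartial2D(
        np.asarray(kps, dtype=np.float32), template, method=cv2.LMEDS)
    if matrix is None:
        return None                                  # alignment failed
    return cv2.warpAffine(image, matrix, (size, size))
```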

3. The Embedder (ArcFace / InsightFace)

Finally, the system extracts the "Identity" vector.

  • The Limit: The model (often inswapper_128) expects a normalized, front-facing human face.
  • The Failure: Even if detection succeeds (e.g., on a statue), the embedder might extract a "weak" identity vector because the texture is stone, not skin. The swap might technically happen, but the result won't carry the intended identity because the "source identity" signal is garbage (the similarity sketch after this list shows how to measure that signal).
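
You can measure how weak that identity signal is as cosine similarity between insightface embeddings; the image paths below are placeholders:

```python
import cv2
import numpy as np
from insightface.app import FaceAnalysis

app = FaceAnalysis(name='buffalo_l')   # includes an ArcFace recognition model
app.prepare(ctx_id=0, det_size=(640, 640))

def identity(path: str) -> np.ndarray:
    faces = app.get(cv2.imread(path))
    if not faces:
        raise ValueError(f'no detectable face in {path}')
    return faces[0].normed_embedding   # unit-length 512-d identity vector

# For unit vectors, cosine similarity is a plain dot product.
# Rough rule of thumb: well above 0.5 suggests the same identity,
# near 0 suggests the embedder extracted essentially noise.
sim = float(np.dot(identity('photo.jpg'), identity('statue.jpg')))
print(f'identity similarity: {sim:.3f}')
```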

Trade-offs implied

  • Sensitivity vs. Stability: Lowering the detection threshold (face_detector_score) catches more difficult faces (e.g., blurry ones), but drastically increases false positives (swapping faces onto knees, wall outlets, or shirt patterns). The quick threshold sweep after this list makes the trade-off visible.
  • Profile Support vs. Distortion: Using models trained for extreme angles can enable side-profile swaps, but these often look stretched or uncanny because the generative model (inswapper) is optimized for frontal views.
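
The sensitivity trade-off is easy to demonstrate by sweeping the threshold on one difficult frame (the path is a placeholder):

```python
import cv2
from insightface.app import FaceAnalysis

img = cv2.imread('blurry_frame.png')   # placeholder path
app = FaceAnalysis(name='buffalo_l')

# Lower det_thresh = more sensitivity, less stability: borderline
# faces appear first, then face-like wallpaper and shirt patterns.
for thresh in (0.7, 0.5, 0.3, 0.1):
    app.prepare(ctx_id=0, det_thresh=thresh, det_size=(640, 640))
    print(f'det_thresh={thresh}: {len(app.get(img))} detection(s)')
```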

Frequently asked questions

Q: Is my GPU too weak to detect the face?
A: No. Detection failure is almost always a software/model limitation, not a hardware power issue. A stronger GPU just fails faster.

Q: Why can't I swap anime characters?
A: Most generic models (like inswapper_128) are trained exclusively on photorealistic human data, so they are effectively blind to non-human art styles.

Q: Can I train the model to recognize this specific face?
A: Not with standard one-shot swappers like FaceFusion's default mode. That requires training a LoRA or a dedicated checkpoint (Dreambooth), which is a completely different workflow from one-shot swapping.

Q: Why does it work in the preview but not the output?
A: Previews often run at lower resolutions or skip certain filtering steps for speed. The final render applies the full pipeline (including strictly enforced detection thresholds), which can reject a borderline face that the preview let slide.


Final perspective

"Unswappable" faces are rarely a user error; they are the boundaries of the model's training data. Recognizing these limits saves hours of futile parameter tweaking. If the face is blocked, blurry, or non-human, the most efficient fix is usually to change the source footage, not the settings.