JAN 13, 2026

What Are FaceFusion Execution Providers and Why Do They Fail?

How FaceFusion execution providers work and why CUDA, TensorRT, OpenVINO, or CoreML may fail to load.

FaceFusion uses ONNX Runtime to run its machine learning models, and execution providers determine where that actually happens—CPU, NVIDIA GPU, Intel GPU, AMD GPU, or Apple Silicon. The most common ones are CUDA, TensorRT, OpenVINO, CoreML, and CPU.
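
You can check which providers your installed ONNX Runtime build actually exposes with two lines of Python, independent of FaceFusion itself:

    import onnxruntime

    # Providers compiled into the installed ONNX Runtime package.
    print(onnxruntime.get_available_providers())
    # e.g. ['CUDAExecutionProvider', 'CPUExecutionProvider'] with onnxruntime-gpu,
    # or just ['CPUExecutionProvider'] with the plain onnxruntime package.

If a provider is missing from this list, no FaceFusion setting can make it appear.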

When an execution provider fails to load, FaceFusion either falls back to CPU (which is dramatically slower) or throws an error. Most of these failures come down to version mismatches—your CUDA toolkit doesn't match the onnxruntime-gpu package, or you're missing some platform-specific dependencies.


What you might be experiencing

  • "CUDA is not showing as an option in FaceFusion."
  • "Processing is extremely slow even though I have a GPU."
  • "FaceFusion says it's using CPU but I installed CUDA."
  • "TensorRT optimization takes forever and never finishes."
  • "OpenVINO not detected on my Intel laptop."
  • "CoreML execution provider not found on Mac."
  • "Why does CodeFormer keep falling back to CPU?"

If any of these sound familiar, keep reading.


When this happens most often

Execution provider failures usually show up in these scenarios:

  • CUDA version mismatch: You've got CUDA 12.x installed, but the onnxruntime-gpu package was built for CUDA 11.8. The provider fails silently and FaceFusion falls back to CPU without telling you.

  • Missing onnxruntime-gpu: You installed the regular onnxruntime package instead of onnxruntime-gpu, and the regular package doesn't include any GPU execution providers (a quick check for this is sketched after this list).

  • TensorRT first-run optimization: TensorRT compiles optimized graphs on first use. This can take 10-30 minutes or longer, and people often think it's frozen or crashed.

  • Unsupported ONNX operations: Some models use operations that the GPU provider doesn't support. ONNX Runtime offloads these to CPU, causing constant CPU-GPU data transfers and major slowdowns.

  • Apple Silicon configuration: macOS users on M1/M2/M3 chips need CoreML as their execution provider, but it won't load if Xcode command-line tools are missing or if you installed the wrong onnxruntime package.

  • Intel GPU without OpenVINO: If you have an Intel integrated GPU or an Arc GPU, you need OpenVINO as your execution provider. Without it, FaceFusion just defaults to CPU.
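
A quick way to rule out the missing-package scenario is to check what's actually installed. This sketch uses only the Python standard library:

    from importlib import metadata

    # Only one ONNX Runtime variant should normally be installed;
    # having both onnxruntime and onnxruntime-gpu present is a common
    # source of conflicts, since they share the same import name.
    for pkg in ("onnxruntime", "onnxruntime-gpu", "onnxruntime-openvino"):
        try:
            print(pkg, metadata.version(pkg))
        except metadata.PackageNotFoundError:
            print(pkg, "not installed")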


Why this happens

1. Execution providers are not interchangeable

Each execution provider is a separate backend compiled for specific hardware, and each one ships in a specific Python package:

  • CUDAExecutionProvider and TensorrtExecutionProvider ship in onnxruntime-gpu
  • OpenVINOExecutionProvider ships in onnxruntime-openvino
  • CoreMLExecutionProvider ships in the standard onnxruntime wheels for macOS
  • CPUExecutionProvider ships in every package

FaceFusion can't use CUDA if the CUDA provider isn't compiled into your installed ONNX Runtime package. It's that simple.

2. Silent fallback to CPU

Here's the sneaky part: when a GPU execution provider fails to initialize, ONNX Runtime doesn't crash. It just quietly falls back to the CPU provider. You only notice when processing takes 10-50x longer than expected. FaceFusion's logs might show the fallback, but it's easy to miss if you're not looking.
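
The fallback is easy to catch if you ask the session directly. A minimal sketch; the model path is a placeholder for whichever FaceFusion model you want to test:

    import onnxruntime as ort

    session = ort.InferenceSession(
        "inswapper_128.onnx",  # placeholder path
        providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
    )
    # If CUDA failed to initialize, this prints ['CPUExecutionProvider']
    # (depending on the ONNX Runtime version, a failed provider may
    # instead raise an error at session creation).
    print(session.get_providers())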

3. Partial GPU execution

Some models have operations that the GPU provider doesn't implement. When this happens, ONNX Runtime runs those operations on CPU and copies data back and forth. This can actually be slower than running the whole model on CPU because of the data transfer overhead. CodeFormer is a common example—it uses operations that CUDA doesn't fully accelerate, causing CPU fallback for parts of the model.
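
You can surface this partial placement by raising ONNX Runtime's log verbosity. A sketch, assuming direct use of onnxruntime with a placeholder model path:

    import onnxruntime as ort

    options = ort.SessionOptions()
    options.log_severity_level = 1  # 0=VERBOSE, 1=INFO, 2=WARNING (default)
    session = ort.InferenceSession(
        "codeformer.onnx",  # placeholder path
        sess_options=options,
        providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
    )
    # Watch the terminal for messages noting that some nodes were not
    # assigned to the preferred execution provider.

Sessions that log many CPU-assigned nodes are the ones where data-transfer overhead dominates.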

4. TensorRT compilation overhead

TensorRT optimizes models for your specific GPU architecture at runtime. The first time a model runs with TensorRT, it builds an optimized execution plan. This can take a long time, especially for complex models. Once it's compiled, everything is fast—but people often mistake that initial delay for a crash or freeze.
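
FaceFusion drives this through ONNX Runtime, and the underlying TensorRT provider options let you cache compiled engines so the wait happens only once. A sketch assuming direct use of onnxruntime (the model path and cache directory are placeholders):

    import onnxruntime as ort

    providers = [
        ("TensorrtExecutionProvider", {
            # Persist compiled engines so the first-run build is reused.
            "trt_engine_cache_enable": True,
            "trt_engine_cache_path": "./trt_cache",  # placeholder directory
        }),
        "CUDAExecutionProvider",  # fallback for unsupported operations
        "CPUExecutionProvider",
    ]
    session = ort.InferenceSession("model.onnx", providers=providers)

Whether FaceFusion enables engine caching for you depends on the version, so treat this as the mechanism rather than its exact configuration.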

5. Version coupling is strict

ONNX Runtime GPU packages are tightly coupled to specific CUDA versions:

  • onnxruntime-gpu 1.16.x builds target CUDA 11.8
  • onnxruntime-gpu 1.17.x added CUDA 12.x builds alongside the CUDA 11.8 ones

Install the wrong combination and the provider just fails to load without giving you a clear error message.
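
Before chasing version tables, confirm what you actually have:

    import onnxruntime

    print(onnxruntime.__version__)   # e.g. 1.17.1
    print(onnxruntime.get_device())  # 'GPU' for onnxruntime-gpu builds, 'CPU' otherwise

Compare that against your installed CUDA toolkit; if the pair isn't one that ONNX Runtime's documentation lists as compatible, that's usually the whole problem.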


Trade-offs you'll face

  • CUDA vs TensorRT: CUDA is simpler to set up and works right away. TensorRT offers faster inference but requires a long first-run optimization and additional SDK installation.

  • GPU speed vs compatibility: GPU execution is faster but requires precise version matching. CPU execution is universally compatible but 10-50x slower.

  • Automatic selection vs manual override: FaceFusion can auto-select an execution provider, but it might not choose optimally. Manual selection gives you control but requires understanding your hardware stack (see the sketch after this list).

  • Model compatibility vs performance: Some face enhancement models like CodeFormer don't fully accelerate on CUDA due to unsupported operations. You'll have to choose between using these models with slower performance or switching to fully GPU-accelerated alternatives.
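
A minimal sketch of manual selection, assuming direct use of onnxruntime with a placeholder model path; FaceFusion's own selection logic differs, but the mechanism underneath is the same ordered list:

    import onnxruntime as ort

    # Ordered preference: ONNX Runtime tries each provider in turn and
    # assigns operations the first can't run to the next one down.
    preferred = ["TensorrtExecutionProvider", "CUDAExecutionProvider",
                 "CPUExecutionProvider"]

    # Keep only providers this build actually exposes.
    available = ort.get_available_providers()
    providers = [p for p in preferred if p in available]

    session = ort.InferenceSession("model.onnx", providers=providers)
    print(session.get_providers())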


Frequently asked questions

Q: Why is CUDA not showing as an option?

A: The onnxruntime-gpu package might not be installed, or it's compiled for a different CUDA version than what's on your system. FaceFusion only shows providers that ONNX Runtime successfully loads.

Q: Why is FaceFusion so slow even with a GPU?

A: The GPU execution provider might have failed to load, causing a silent fallback to CPU. Check FaceFusion's logs or terminal output to confirm which provider is actually active.

Q: How long should TensorRT optimization take?

A: Initial TensorRT optimization can take 10-30 minutes or longer depending on the model and GPU. This happens once per model and gets cached for future runs.

Q: Why does CodeFormer fall back to CPU?

A: CodeFormer uses some ONNX operations that the CUDA provider doesn't fully support. ONNX Runtime offloads those to CPU. Sometimes forcing CPU execution for the entire model is actually faster than partial GPU execution.
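
If you suspect partial GPU execution is hurting a particular model, benchmarking a pure-CPU session is a cheap experiment. A sketch with a placeholder model path:

    import onnxruntime as ort

    # Pure CPU execution avoids the CPU-GPU copies that partial
    # GPU support can cause; time this against the CUDA session.
    session = ort.InferenceSession(
        "codeformer.onnx",  # placeholder path
        providers=["CPUExecutionProvider"],
    )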

Q: Which execution provider should I use?

A: For NVIDIA GPUs, CUDA is the most reliable choice. TensorRT offers better performance after initial optimization. For Intel hardware, use OpenVINO. For Apple Silicon, use CoreML. CPU is the fallback for all platforms.



Final thoughts

Execution providers are the bridge between FaceFusion's models and your hardware. When that bridge is misconfigured—wrong CUDA version, missing packages, unsupported operations—processing either fails or falls back to CPU. Understanding that FaceFusion depends on ONNX Runtime's provider system helps you diagnose why GPU acceleration might not be working. The key is matching the onnxruntime package to your installed CUDA toolkit and actually verifying that the expected provider is active, not just assuming it is.