[Abstract] Contact-rich manipulation demands human-like integration of perception and force feedback: vision should guide task progress, while high-frequency interaction control must stabilize contact under uncertainty. Existing learning-based policies often entangle these roles in a monolithic network, trading off global generalization against stable local refinement, while control-centric approaches typically assume a known task structure or learn only controller parameters rather than the structure itself. In this paper, we formalize a physically grounded interaction frame, an instantaneous local basis that decouples force regulation from motion execution, and propose a method to recover it from demonstrations. Based on this, we address both issues by proposing Force Policy, a global-local vision-force policy in which a global policy guides free-space actions using vision, and upon contact, a high-frequency local policy with force feedback estimates the interaction frame and executes hybrid force-position control for stable interaction. Real-world experiments across diverse contact-rich tasks show consistent gains over strong baselines, with more robust contact establishment, more accurate force regulation, and reliable generalization to novel objects with varied geometries and physical properties, ultimately improving both contact stability and execution quality.

Interaction Frame: Explicit Contact Structure

We introduce the Interaction Frame (IF) to represent contact structure. The IF is derived purely from the physical response of the contact, using stiffness spectra to infer local geometry. For locally conservative, topologically invariant contacts, stiffness is geometry-induced, so its principal axes align with the surface geometry. This allows us to recover the geometric frame directly from force observations.
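The idea of reading a frame off the stiffness spectrum can be illustrated with a minimal sketch. It is not the recovery procedure used in the paper: it assumes a purely linear, conservative contact where the applied force and the displacement satisfy f ≈ K Δx, fits K by least squares, and takes the eigenvectors of the symmetrized K as the frame axes. The function names `estimate_stiffness` and `interaction_frame` and all numeric values are hypothetical.

```python
import numpy as np

def estimate_stiffness(displacements, forces):
    """Least-squares fit of a 3x3 stiffness K in f ≈ K @ dx (assumed linear contact)."""
    X = np.asarray(displacements, dtype=float)   # shape (N, 3)
    F = np.asarray(forces, dtype=float)          # shape (N, 3)
    # Rows satisfy f_i^T = dx_i^T K^T, so solve X @ K^T ≈ F for K^T.
    K_T, *_ = np.linalg.lstsq(X, F, rcond=None)
    K = K_T.T
    # A conservative contact has a symmetric stiffness; keep the symmetric part.
    return 0.5 * (K + K.T)

def interaction_frame(K):
    """Principal stiffness axes, sorted from stiffest to softest."""
    eigvals, eigvecs = np.linalg.eigh(K)         # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]
    axes = eigvecs[:, order]                     # columns are frame axes
    if np.linalg.det(axes) < 0:                  # enforce a right-handed frame
        axes[:, -1] *= -1
    return eigvals[order], axes

# Synthetic example: a stiff surface with a tilted normal and soft tangent directions.
rng = np.random.default_rng(0)
normal = np.array([0.0, 0.6, 0.8])
K_true = 1000.0 * np.outer(normal, normal) + 5.0 * np.eye(3)
dx = rng.normal(scale=1e-3, size=(50, 3))        # small probing displacements
f = dx @ K_true.T                                # noiseless force response
eigvals, axes = interaction_frame(estimate_stiffness(dx, f))
```

In this synthetic case the stiffest axis `axes[:, 0]` lines up with the surface normal (up to sign), which is the sense in which a geometry-induced stiffness exposes the local frame. Real contacts add friction and structural compliance, which this sketch ignores.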

We present two approximation strategies, one for contacts where the dissipative residual dominates and one for contacts where the structural residual dominates. Given the initial visual context and task description, we prompt Gemini 3 Pro to identify the dominant residual and apply the corresponding reconstruction.

Force Policy: Global-Local Vision-Force Policy

Contact-rich manipulation alternates between vision-guided free-space motion and force-governed contact interaction. We factor the policy accordingly: a vision policy handles global geometry and long-horizon planning, while a high-frequency force policy regulates local contact dynamics using proprioception and force/torque feedback. This decomposition assigns each modality to where it is most informative: vision drives task progress, and force ensures stable, responsive interaction.
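To make the role of the frame in the local policy concrete, here is a minimal sketch of a textbook hybrid force-position command expressed in a given interaction frame. It is an illustration under stated assumptions, not the controller from the paper: the function `hybrid_command`, the gain `kf`, and the convention that forces are those the robot applies to the environment (so moving along a force-controlled axis increases the applied force) are all hypothetical.

```python
import numpy as np

def hybrid_command(R_if, force_mask, v_des, f_des, f_meas, kf=0.002):
    """Translational velocity command in the base frame.

    R_if       : 3x3 rotation whose columns are the frame axes in the base frame
    force_mask : 3 booleans, True where the frame axis is force-controlled
    v_des      : desired velocity (base frame), used on motion-controlled axes
    f_des      : desired applied force (base frame)
    f_meas     : measured applied force (base frame)
    kf         : proportional force gain in (m/s)/N, an assumed value
    """
    S_f = np.diag(np.asarray(force_mask, dtype=float))  # selects force axes
    S_p = np.eye(3) - S_f                               # selects motion axes
    v_local = R_if.T @ np.asarray(v_des, dtype=float)
    f_err_local = R_if.T @ (np.asarray(f_des, dtype=float) - np.asarray(f_meas, dtype=float))
    # Motion axes follow the desired velocity; force axes servo on the force error.
    v_cmd_local = S_p @ v_local + S_f @ (kf * f_err_local)
    return R_if @ v_cmd_local

# Example: slide along x while pressing along z with 10 N; 6 N is currently measured.
v_cmd = hybrid_command(
    R_if=np.eye(3),
    force_mask=[False, False, True],
    v_des=[0.05, 0.0, 0.1],
    f_des=[0.0, 0.0, 10.0],
    f_meas=[0.0, 0.0, 6.0],
)
```

The desired velocity along z is discarded because that axis is force-controlled, so the command along z comes only from the 4 N force error. Orientation and torque channels are omitted to keep the sketch short.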

We also introduce a dual-policy asynchronous scheduler to coordinate the global vision and high-frequency force policies without latency degrading control. It enables smooth switching and latency-aware trajectory alignment, ensuring stable execution and improved force-sensing fidelity.
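One simple way to picture latency-aware alignment is to discard the part of a predicted action chunk that is already stale by the time inference finishes. The sketch below assumes each chunk is timestamped with the capture time of its observation and that actions are evenly spaced; the `Chunk` type and `align_chunk` function are hypothetical and are not taken from the paper's scheduler.

```python
import math
from dataclasses import dataclass

@dataclass
class Chunk:
    t_obs: float    # time at which the observation was captured (seconds)
    dt: float       # spacing between consecutive actions (seconds)
    actions: list   # actions[k] is intended for time t_obs + k * dt

def align_chunk(chunk, t_now):
    """Return only the actions whose intended time is not in the past."""
    elapsed = t_now - chunk.t_obs
    # Index of the first action scheduled at or after t_now; the small epsilon
    # guards against floating-point noise when elapsed is a multiple of dt.
    skip = max(0, math.ceil(elapsed / chunk.dt - 1e-9))
    return chunk.actions[skip:]

# Example: inference took 0.25 s, so the actions meant for 0.0, 0.1 and 0.2 s are dropped.
chunk = Chunk(t_obs=0.0, dt=0.1, actions=["a0", "a1", "a2", "a3", "a4", "a5"])
fresh = align_chunk(chunk, t_now=0.25)
```

A fuller scheduler would also blend the remaining actions with the trajectory currently being executed so that the handover is smooth; that step is left out here.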

Experiments

We design three tasks spanning two categories (polishing and insertion) to evaluate different policies for contact-rich manipulation. The descriptions on the right highlight the key challenges of each task compared to similar tasks in prior literature. All tasks require highly accurate force regulation to complete successfully.

We compare Force Policy with vision-only baselines (RISE-2 and π_0.5) and force-aware baselines (RDP, FoAR, ForceVLA, and TA-VLA). Force Policy outperforms both groups of baselines across the contact-rich tasks evaluated.

Push and Flip

Plug in EV Charger

Scrape Off Sticker (Easy)

Scrape Off Sticker (Hard)

Force Control Evaluation

Force Policy achieves substantially better force control than existing force-aware baselines. It closely tracks the force profile of human demonstrations, maintaining the required force magnitude throughout the critical phases.

Push and Flip

Plug in EV Charger

Scrape Off Sticker (Hard)

Generalization and Robustness

Force Policy generalizes better than the baselines. We evaluate the policies on unseen objects with varying colors, geometries, and stiffnesses: Force Policy consistently achieves high success rates, whereas the baselines frequently fail catastrophically.

Force Policy is also robust to unmodeled disturbances because it explicitly regulates the interaction wrench. When a human operator physically interferes with the robot, the policy adapts the end-effector pose to maintain the target wrench.

BibTeX

@article{fang2026force,
    title   = {Force Policy: Learning Hybrid Force-Position Control Policy under Interaction Frame for Contact-Rich Manipulation},
    author  = {Fang, Hongjie and Tang, Shirun and Mei, Mingyu and Qin, Haoxiang and He, Zihao and Chen, Jingjing and Feng, Ying and Wang, Chenxi and Liu, Wanxi and He, Zaixing and Lu, Cewu and Wang, Shiquan},
    journal = {arXiv preprint arXiv},
    year    = {2026}
}