[CVPR 2025] Open-source, End-to-end, Vision-Language-Action model for GUI Agent & Computer Use.
Updated Jan 20, 2026 - Python
This is the official website for TuriX Computer-use-Agent
An open-source, end-to-end, VLM-based GUI Agent
RLAnything & DemyAgent: General and scalable agentic RL algorithms across terminal, GUI, SWE, and tool-call settings
Official implementation of GUI-R1 : A Generalist R1-Style Vision-Language Action Model For GUI Agents
Official implementation of "SEAgent: Self-Evolving Computer Use Agent with Autonomous Learning from Experience"
[AAAI-2026] Code for "UI-R1: Enhancing Efficient Action Prediction of GUI Agents by Reinforcement Learning"
[AAAI 2026 Oral] Official repository for InfiGUI-G1. We introduce Adaptive Exploration Policy Optimization (AEPO) to overcome semantic alignment bottlenecks in GUI agents through efficient, guided exploration.
Enable AI to control your PC. This repo includes the WorldGUI Benchmark and GUI-Thinker Agent Framework.
Official repo of "MMBench-GUI: Hierarchical Multi-Platform Evaluation Framework for GUI Agents". It can be used to evaluate a GUI agent in a hierarchical manner across multiple platforms, including Windows, Linux, macOS, iOS, Android, and Web.
DART-GUI: Efficient Multi-turn RL for GUI Agents via Decoupled Training and Adaptive Data Curation
The official code for "GUI-ReWalk: Massive Data Generation for GUI Agent via Stochastic Exploration and Intent-Aware Reasoning"
🕵 Code for our EMNLP 2025 Main paper: "FlashAdventure: A Benchmark for GUI Agents Solving Full Story Arcs in Diverse Adventure Games"
Source code of the paper "V-Droid: Advancing Mobile GUI Agent Through Generative Verifiers"
A Practical Zoom-in GUI Grounding and Behavior-Based Evaluation method.
Create your self-hosted, open-source Operator model.
Compress2Focus: Efficient Coordinate Compression for Policy Optimization in Multi-Turn GUI Agents
🛒 An intelligent shopping agent powered by LLMs. Cross-platform search (JD, Taobao, Vipshop), AI-driven product analysis, and smart scoring reports. Web-wide price comparison and an AI-assisted shopping decision helper.
A think-with-image GUI visual grounding model.
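A GUI agent system built on MaaFramework and multimodal large models that understands screen content visually and uses a three-mode Planner-Executor-Verifier architecture to plan and execute tasks automatically.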