LLM Benchmarks
2025
OSWorld-Human: Benchmarking the Efficiency of Computer-Use Agents
July 2, 2025