Terminal-bench: a benchmark for AI agents in terminal environments (tbench.ai)
3 points by cpard 3 hours ago