


Hey HN! My brother Arctic_fly and I spent the last two weeks since the GPT-4 launch building Taxy, an open source Chrome extension that lets you automate arbitrary tasks in your browser using GPT-4. You can see a few demos in the GitHub README, but basically it works like this:

1. You open the extension and write the task you'd like done (e.g. "schedule a meeting with David tomorrow at 2").

2. Taxy pulls the DOM of the current page, puts it through a pipeline to remove all non-semantic information, hidden elements, etc., and sends it to GPT-4 along with your text instructions.

3. GPT-4 tries to figure out what action to take. In our prompt we give it the option to either click an element or set an input's value. We use the ReAct paradigm, so it explains what it's trying to do before taking an action, which both makes it more accurate and helps with debugging.

4. Taxy parses GPT-4's response and performs the requested action on the page. It then goes back to step (2) and asks GPT-4 for the next action to take with the updated page DOM. It also sends the list of actions already taken as part of the current task so GPT-4 can detect if it's getting stuck in a loop and abort. Once GPT-4 has decided the task is done or it can't make any more progress, it responds with a special action indicating it's done.

Right now there are a lot of limitations, and this is more of a "research preview" than a finished product. That said, I've found it surprisingly capable for a number of tasks, and I think it's in a stable enough place that we can share it.
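
For anyone who wants the loop in code form, here is a rough TypeScript sketch of steps 2 through 4. It is not Taxy's actual implementation: every name in it (Action, ModelReply, Deps, parseAction, runTask) and the action string syntax are made up for illustration. The page and the GPT-4 call are passed in as plain functions, and the maxSteps cap is an extra safeguard I'm assuming on top of the history-based loop detection described above.

```typescript
// All names here (Action, ModelReply, Deps, parseAction, runTask) are
// hypothetical and chosen for illustration; they are not from Taxy's source.

type Action =
  | { kind: "click"; elementId: number }
  | { kind: "setValue"; elementId: number; value: string }
  | { kind: "finish" }; // the "special action" that ends the task

interface ModelReply {
  thought: string; // ReAct-style: the model explains its intent first
  action: string;  // assumed syntax: 'click(12)', 'setValue(7, "David")', 'finish()'
}

// Injected dependencies standing in for the real page and the real model call.
interface Deps {
  getSimplifiedDom: () => string;
  askModel: (task: string, dom: string, history: string[]) => Promise<ModelReply>;
  perform: (action: Action) => void;
}

// Parse the action string; returns null when it matches no known action.
function parseAction(raw: string): Action | null {
  const text = raw.trim();
  if (text === "finish()") return { kind: "finish" };
  const click = /^click\((\d+)\)$/.exec(text);
  if (click) return { kind: "click", elementId: Number(click[1]) };
  const setValue = /^setValue\((\d+),\s*"(.*)"\)$/.exec(text);
  if (setValue) {
    return { kind: "setValue", elementId: Number(setValue[1]), value: setValue[2] ?? "" };
  }
  return null;
}

type RunResult = { status: "done" | "unparseable" | "step-limit"; history: string[] };

async function runTask(task: string, deps: Deps, maxSteps = 10): Promise<RunResult> {
  const history: string[] = []; // actions already taken, resent on every turn
  for (let step = 0; step < maxSteps; step++) {
    const dom = deps.getSimplifiedDom();                   // step 2: fresh DOM each turn
    const reply = await deps.askModel(task, dom, history); // step 3: model picks an action
    const action = parseAction(reply.action);              // step 4: parse the response
    if (action === null) return { status: "unparseable", history };
    if (action.kind === "finish") return { status: "done", history };
    deps.perform(action);                                  // step 4: act on the page
    history.push(reply.action);
  }
  return { status: "step-limit", history };                // assumed safety cap
}
```

Passing the page and the model in as functions means the loop can be exercised with a scripted fake model, without a browser or an API key. The history array is what lets the model notice it keeps repeating the same action.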
