Action space
The action space is the set of primitives that the agent can use to interact with the environment. The primitives are grouped into categories by the type of interaction they perform. Below is the list of primitives supported by BrowserGym:

| Category | Primitive | Description |
|---|---|---|
| bid | `fill(bid, text)` | Fill an input field with text. |
| bid | `click(bid, button)` | Click an element. |
| bid | `dblclick(bid, button)` | Double-click an element. |
| bid | `hover(bid)` | Hover the mouse over an element. |
| bid | `press(bid, key_comb)` | Focus an element and press a combination of keys. |
| bid | `focus(bid)` | Focus an element. |
| bid | `clear(bid)` | Clear an input field. |
| bid | `select_option(bid, options)` | Select one or multiple options in a drop-down element. |
| bid | `drag_and_drop(from_bid, to_bid)` | Drag and drop one element onto another. |
| coord | `mouse_move(x, y)` | Move the mouse to a location. |
| coord | `mouse_down(x, y, button)` | Move the mouse to a location, then press and hold a mouse button. |
| coord | `mouse_up(x, y, button)` | Move the mouse to a location, then release a mouse button. |
| coord | `mouse_click(x, y, button)` | Move the mouse to a location and click a mouse button. |
| coord | `mouse_dblclick(x, y, button)` | Move the mouse to a location and double-click a mouse button. |
| coord | `mouse_drag_and_drop(from_x, from_y, to_x, to_y)` | Drag and drop from one location to another. |
| keyboard | `keyboard_down(key)` | Press and hold a keyboard key. |
| keyboard | `keyboard_up(key)` | Release a keyboard key. |
| keyboard | `keyboard_press(key_comb)` | Press a combination of keys. |
| keyboard | `keyboard_type(text)` | Type a string of text through the keyboard. |
| keyboard | `keyboard_insert_text(text)` | Insert a string of text in the currently focused element. |
| tab | `new_tab()` | Open a new tab. |
| tab | `tab_close()` | Close the current tab. |
| tab | `tab_focus(index)` | Bring a tab to the front (activate the tab). |
| nav | `go_back()` | Navigate to the previous page in history. |
| nav | `go_forward()` | Navigate to the next page in history. |
| nav | `goto(url)` | Navigate to a URL. |
| misc | `scroll(dx, dy)` | Scroll the page horizontally and vertically by the given amounts. |
| misc | `send_msg_to_user(text)` | Send a message to the user. |
| misc | `noop()` | Do nothing. |
| python | Any Python code (UNSAFE!) | Execute arbitrary code with `playwright`, the active `page`, and the `send_msg_to_user` primitive available. |

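
Actions are passed to the environment as strings that look like Python calls, for example `fill('a2164', 'Asset tag')`. The sketch below shows one way such a string could be split into a primitive name and its arguments, and how the name could be mapped back to a category from the table above. It is an illustration only, not BrowserGym's own parser: `PRIMITIVES_BY_CATEGORY`, `parse_action`, and `category_of` are hypothetical helper names, and the mapping is a hand-copied partial subset of the table.

```python
import ast

# Hypothetical, partial mapping copied by hand from the table above.
PRIMITIVES_BY_CATEGORY = {
    "bid": {"fill", "click", "dblclick", "hover", "press", "focus", "clear"},
    "tab": {"new_tab", "tab_close", "tab_focus"},
    "nav": {"go_back", "go_forward", "goto"},
    "misc": {"scroll", "send_msg_to_user", "noop"},
}


def parse_action(action: str) -> tuple[str, list]:
    """Split a string such as "click('a46')" into (primitive name, positional args)."""
    call = ast.parse(action.strip(), mode="eval").body
    if not isinstance(call, ast.Call) or not isinstance(call.func, ast.Name):
        raise ValueError(f"not a simple primitive call: {action!r}")
    if call.keywords:
        raise ValueError("keyword arguments are not handled in this sketch")
    # literal_eval only accepts literals, so nothing in the string is executed.
    args = [ast.literal_eval(arg) for arg in call.args]
    return call.func.id, args


def category_of(primitive: str):
    """Return the category of a primitive, or None if it is not in the mapping."""
    for category, names in PRIMITIVES_BY_CATEGORY.items():
        if primitive in names:
            return category
    return None


name, args = parse_action("fill('a2164', 'Asset tag')")
print(name, args, category_of(name))
```

A check like this can be useful for rejecting malformed or out-of-category actions produced by an agent before they reach the environment.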
Example

    import gymnasium as gym
    import browsergym.workarena
    import time

    env = gym.make(
        "browsergym/workarena.servicenow.filter-asset-list",
        headless=False,
    )

    try:
        obs, info = env.reset(seed=10)

        # Perform the following sequence of actions
        actions = ["click('a46')", "click('a2157')", "fill('a2164', 'Asset tag')"]
        for action in actions:
            obs, reward, terminated, truncated, info = env.step(action)
            # Sleep for 3 seconds to see the effect of the action
            time.sleep(3)
    finally:
        env.close()

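
The loop above writes each action string by hand and keeps stepping even if the episode has already ended. As a minimal sketch, assuming action strings are read as Python call expressions (as the example suggests), the two hypothetical helpers below build the strings with `repr` so that quotes inside arguments are escaped, and stop stepping once `terminated` or `truncated` is set. `make_action` and `run_actions` are illustrative names, not part of BrowserGym.

```python
def make_action(primitive: str, *args) -> str:
    """Format a primitive call as a string, e.g. make_action("click", "a46")."""
    # repr() quotes strings and escapes embedded quotes or backslashes.
    return f"{primitive}({', '.join(repr(arg) for arg in args)})"


def run_actions(env, actions):
    """Step through actions in order and stop early once the episode ends.

    `env` is assumed to expose the Gymnasium-style step() used above, which
    returns (obs, reward, terminated, truncated, info).
    Returns the number of steps taken and the summed reward.
    """
    steps = 0
    total_reward = 0.0
    for action in actions:
        obs, reward, terminated, truncated, info = env.step(action)
        steps += 1
        total_reward += reward
        if terminated or truncated:
            break
    return steps, total_reward


actions = [
    make_action("click", "a46"),
    make_action("click", "a2157"),
    make_action("fill", "a2164", "Asset tag"),
]
print(actions)
```

With an environment created as in the example, `run_actions(env, actions)` could replace the `for` loop inside the `try` block.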
For more details please refer to the WorkArena paper.