Action space

The action space is the set of primitives the agent can use to interact with the environment. The primitives are grouped into categories according to the type of interaction they perform. Below is the list of primitives supported by BrowserGym:

| Category | Primitive | Description |
|---|---|---|
| bid | `fill(bid, text)` | Fill an input field with text. |
| | `click(bid, button)` | Click an element. |
| | `dblclick(bid, button)` | Double-click an element. |
| | `hover(bid)` | Hover the mouse over an element. |
| | `press(bid, key_comb)` | Focus an element and press a combination of keys. |
| | `focus(bid)` | Focus an element. |
| | `clear(bid)` | Clear an input field. |
| | `select_option(bid, options)` | Select one or multiple options in a drop-down element. |
| | `drag_and_drop(from_bid, to_bid)` | Drag and drop one element onto another. |
| coord | `mouse_move(x, y)` | Move the mouse to a location. |
| | `mouse_down(x, y, button)` | Move the mouse to a location, then press and hold a mouse button. |
| | `mouse_up(x, y, button)` | Move the mouse to a location, then release a mouse button. |
| | `mouse_click(x, y, button)` | Move the mouse to a location and click a mouse button. |
| | `mouse_dblclick(x, y, button)` | Move the mouse to a location and double-click a mouse button. |
| | `mouse_drag_and_drop(from_x, from_y, to_x, to_y)` | Drag and drop from one location to another. |
| keyboard | `keyboard_down(key)` | Press and hold a keyboard key. |
| | `keyboard_up(key)` | Release a keyboard key. |
| | `keyboard_press(key_comb)` | Press a combination of keys. |
| | `keyboard_type(text)` | Type a string of text through the keyboard. |
| | `keyboard_insert_text(text)` | Insert a string of text into the currently focused element. |
| tab | `new_tab()` | Open a new tab. |
| | `tab_close()` | Close the current tab. |
| | `tab_focus(index)` | Bring a tab to the front (activate it). |
| nav | `go_back()` | Navigate to the previous page in history. |
| | `go_forward()` | Navigate to the next page in history. |
| | `goto(url)` | Navigate to a URL. |
| misc | `scroll(dx, dy)` | Scroll horizontally and vertically. |
| | `send_msg_to_user(text)` | Send a message to the user. |
| | `noop()` | Do nothing. |
| python | Any Python code (UNSAFE!) | Execute arbitrary code with Playwright, the active page, and the `send_msg_to_user` primitive available. |
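Each primitive is passed to `env.step()` as a string holding a Python-style call, such as `"click('a46')"` in the example below. When an agent assembles these strings programmatically, text containing quotes can produce an invalid call. The following is a minimal sketch of one way to avoid that, assuming you build actions yourself. `make_action` is a hypothetical helper and is not part of BrowserGym. It relies on `repr()` to quote each argument.

```python
# Hypothetical helper (not part of BrowserGym): format a primitive call as the
# action string passed to env.step(), e.g. "click('a46')".
def make_action(primitive: str, *args) -> str:
    # repr() quotes strings safely, so text containing quotes stays valid Python.
    return f"{primitive}({', '.join(repr(arg) for arg in args)})"


print(make_action("click", "a46"))                  # click('a46')
print(make_action("fill", "a2164", "Asset tag"))    # fill('a2164', 'Asset tag')
print(make_action("fill", "a12", "O'Brien"))        # fill('a12', "O'Brien")
print(make_action("mouse_click", 120, 45, "left"))  # mouse_click(120, 45, 'left')
print(make_action("go_back"))                       # go_back()
```

Using `repr()` instead of hand-written f-string quoting means a value like `O'Brien` is emitted as `"O'Brien"` and does not break the call.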

Example

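```python
import time

import gymnasium as gym
import browsergym.workarena  # registers the WorkArena tasks as gymnasium environments

env = gym.make(
    "browsergym/workarena.servicenow.filter-asset-list",
    headless=False,  # show the browser window
)

try:
    obs, info = env.reset(seed=10)

    # Perform the following sequence of actions; the element ids (bids) are
    # taken from the observation of this particular task instance
    actions = ["click('a46')", "click('a2157')", "fill('a2164', 'Asset tag')"]

    for action in actions:
        obs, reward, terminated, truncated, info = env.step(action)
        # Sleep for 3 seconds to see the effect of the action
        time.sleep(3)
        # Stop early if the episode has ended
        if terminated or truncated:
            break

finally:
    env.close()
```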

For more details, please refer to the WorkArena paper.
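The `python` category can execute arbitrary code. A harness that forwards model-generated action strings to `env.step()` may therefore want to screen them first. The sketch below is a hypothetical pre-check and is not part of BrowserGym or WorkArena. It parses each action string with Python's `ast` module and accepts only a single call to a primitive from an assumed allow-list. It also rejects any argument that is not a plain literal. The names `ALLOWED` and `check_action` are illustrative.

```python
import ast

# Hypothetical allow-list (not a BrowserGym setting): the bid and nav
# primitives from the action-space table, plus send_msg_to_user and noop.
ALLOWED = {
    "fill", "click", "dblclick", "hover", "press", "focus", "clear",
    "select_option", "drag_and_drop",
    "go_back", "go_forward", "goto",
    "send_msg_to_user", "noop",
}


def check_action(action: str) -> str:
    """Return the primitive name if `action` is one call to an allowed primitive
    with literal arguments; raise ValueError otherwise."""
    try:
        tree = ast.parse(action, mode="eval")  # a single expression, no statements
    except SyntaxError as err:
        raise ValueError(f"not a single Python expression: {action!r}") from err
    call = tree.body
    if not (isinstance(call, ast.Call) and isinstance(call.func, ast.Name)):
        raise ValueError(f"expected a primitive call: {action!r}")
    if call.func.id not in ALLOWED:
        raise ValueError(f"primitive not allowed: {call.func.id!r}")
    # literal_eval raises ValueError for anything that is not a plain literal
    # (string, number, list, ...), e.g. nested calls or variable names.
    for arg in [*call.args, *(kw.value for kw in call.keywords)]:
        ast.literal_eval(arg)
    return call.func.id


actions = ["click('a46')", "click('a2157')", "fill('a2164', 'Asset tag')"]
print([check_action(a) for a in actions])  # ['click', 'click', 'fill']
```

In the example loop, you could call `check_action(action)` before `env.step(action)` to stop at the first rejected action. This is a coarse filter on the action string, not a sandbox for the `python` category.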