New Features (2026-04)
This page documents the April 2026 additions to AutoControl. Every new feature ships with a headless Python API and a GUI affordance, and is wired into the executor so it works from JSON scripts, the socket server, the REST API, and the CLI without any Python glue.
Clipboard
Headless:
import je_auto_control as ac
ac.set_clipboard("hello")
text = ac.get_clipboard()
Action-JSON commands:
[["AC_clipboard_set", {"text": "hello"}]]
[["AC_clipboard_get", {}]]
Backends: Windows (Win32 via ctypes), macOS (pbcopy/pbpaste),
Linux (xclip or xsel). A RuntimeError is raised if no backend
is available.
Dry-run / step-debug executor
Run an action list through the executor without invoking any side effects — useful for validating JSON scripts:
from je_auto_control.utils.executor.action_executor import executor
record = executor.execute_action(actions, dry_run=True)
step_callback lets you observe each action before it runs:
executor.execute_action(actions, step_callback=lambda a: print(a))
From the CLI:
python -m je_auto_control.cli run script.json --dry-run
Global hotkey daemon (Windows)
Bind OS-level hotkeys to action-JSON scripts:
from je_auto_control import default_hotkey_daemon
default_hotkey_daemon.bind("ctrl+alt+1", "scripts/greet.json")
default_hotkey_daemon.start()
Supported modifiers: ctrl, alt, shift, win / super /
meta. Keys: letters, digits, f1 … f12, arrows, space,
enter, tab, escape, home, end, insert, delete,
pageup, pagedown.
macOS and Linux currently raise NotImplementedError on
start() — the Strategy-pattern interface is in place so backends can
be added later.
GUI: Hotkeys tab (bind/unbind, start/stop daemon, live fired count).
Event triggers
Poll-based triggers fire an action script when a screen/state change is detected:
from je_auto_control import default_trigger_engine, ImageAppearsTrigger
default_trigger_engine.add(ImageAppearsTrigger(
trigger_id="", script_path="scripts/click_ok.json",
image_path="templates/ok_button.png", threshold=0.85,
repeat=True,
))
default_trigger_engine.start()
Available trigger types:
ImageAppearsTrigger: template match on the current screen
WindowAppearsTrigger: title substring match
PixelColorTrigger: pixel color within tolerance
FilePathTrigger: mtime change on a path
GUI: Triggers tab (add/remove/start/stop, live fired count).
Cron scheduling
Five-field cron (minute hour day-of-month month day-of-week) with
*, comma-lists, */step, and start-stop ranges:
from je_auto_control import default_scheduler
job = default_scheduler.add_cron_job(
script_path="scripts/daily.json",
cron_expression="0 9 * * 1-5", # 09:00 on weekdays
)
default_scheduler.start()
Interval and cron jobs coexist in the same scheduler; job.is_cron
tells them apart. GUI: the Scheduler tab has a cron/interval radio button.
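The field syntax described above can be illustrated with a small matcher. This is a sketch under stated assumptions, not the scheduler's actual implementation: `expand_field` and `cron_matches` are hypothetical names, day-of-week is assumed to run from 0 (Sunday) to 6, and the five fields are simply ANDed together.

```python
from datetime import datetime


def expand_field(field: str, low: int, high: int) -> set:
    """Expand one cron field (*, lists, */step, a-b ranges) into integers."""
    values = set()
    for part in field.split(","):
        base, _, step_text = part.partition("/")
        step = int(step_text) if step_text else 1
        if base == "*":
            start, stop = low, high
        elif "-" in base:
            start_text, stop_text = base.split("-", 1)
            start, stop = int(start_text), int(stop_text)
        else:
            start = stop = int(base)
        values.update(range(start, stop + 1, step))
    return values


def cron_matches(expression: str, moment: datetime) -> bool:
    """Return True when `moment` satisfies all five fields (assumed AND)."""
    minute, hour, dom, month, dow = expression.split()
    return (
        moment.minute in expand_field(minute, 0, 59)
        and moment.hour in expand_field(hour, 0, 23)
        and moment.day in expand_field(dom, 1, 31)
        and moment.month in expand_field(month, 1, 12)
        # isoweekday() gives Sunday as 7; the sketch maps it to 0
        and moment.isoweekday() % 7 in expand_field(dow, 0, 6)
    )


# 2026-04-06 is a Monday, 2026-04-04 a Saturday
print(cron_matches("0 9 * * 1-5", datetime(2026, 4, 6, 9, 0)))
print(cron_matches("0 9 * * 1-5", datetime(2026, 4, 4, 9, 0)))
```

The first call prints True and the second False, matching the "09:00 on weekdays" reading of the expression.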
Plugin loader
A plugin file is any .py defining top-level callables whose names
start with AC_. Each one becomes a new executor command:
# my_plugins/greeting.py
def AC_greet(args=None):
    return f"hello, {args['name']}"
from je_auto_control import (
load_plugin_directory, register_plugin_commands,
)
commands = load_plugin_directory("my_plugins/")
register_plugin_commands(commands)
# Now usable from JSON:
# [["AC_greet", {"name": "world"}]]
GUI: Plugins tab (browse directory, one-click register).
Warning
Plugin files execute arbitrary Python. Only load from directories under your own control.
REST API server
A stdlib-only HTTP server that exposes the executor and scheduler:
from je_auto_control import start_rest_api_server
server = start_rest_api_server(host="127.0.0.1", port=9939)
Endpoints:
GET /health: liveness probe
GET /jobs: scheduler job list
POST /execute with body {"actions": [...]}: run actions
GUI: Socket Server tab now has a separate REST section with its own
host/port and a 0.0.0.0 opt-in.
Note
Defaults to 127.0.0.1 per CLAUDE.md policy. Bind to 0.0.0.0
only when you have authenticated the network boundary.
CLI runner
A thin subcommand-based CLI over the headless APIs:
python -m je_auto_control.cli run script.json
python -m je_auto_control.cli run script.json --var name=alice --dry-run
python -m je_auto_control.cli list-jobs
python -m je_auto_control.cli start-server --port 9938
python -m je_auto_control.cli start-rest --port 9939
--var name=value is parsed as JSON when possible (so count=10
becomes an int), otherwise treated as a string.
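The parsing rule for --var can be sketched in a few lines. This is a minimal illustration, assuming the value is passed through `json.loads` and kept as a raw string on failure; `parse_var` is a hypothetical name, not the CLI's actual function.

```python
import json


def parse_var(assignment: str):
    """Split 'name=value' and decode the value as JSON when it parses."""
    name, separator, raw = assignment.partition("=")
    if not separator or not name:
        raise ValueError(f"expected name=value, got {assignment!r}")
    try:
        value = json.loads(raw)
    except json.JSONDecodeError:
        value = raw  # not valid JSON, keep the raw string
    return name, value


print(parse_var("count=10"))    # ('count', 10)
print(parse_var("name=alice"))  # ('name', 'alice')
```

Under this rule a value such as `true` or `[1, 2]` also becomes a bool or a list, so a literal string that happens to be valid JSON would need quoting.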
Multi-language GUI (i18n)
Live language switching via the Language menu. Built-in packs:
English
Traditional Chinese (繁體中文)
Simplified Chinese (简体中文)
Japanese (日本語)
Register additional languages at runtime:
from je_auto_control.gui.language_wrapper.multi_language_wrapper import (
language_wrapper,
)
language_wrapper.register_language("French", {"menu_file": "Fichier", ...})
Missing keys fall through to the English default, so a feature ships with usable labels even before its translations land.
OCR (text on screen)
Tesseract-backed text locator. Useful when a button or label has no stable accessibility name and no template image:
import je_auto_control as ac
matches = ac.find_text_matches("Submit")
cx, cy = ac.locate_text_center("Submit")
ac.click_text("Submit")
ac.wait_for_text("Loading complete", timeout=15.0)
If Tesseract isn’t on PATH:
ac.set_tesseract_cmd(r"C:\Program Files\Tesseract-OCR\tesseract.exe")
Action-JSON commands: AC_locate_text, AC_click_text,
AC_wait_text.
Accessibility element finder
Query the OS accessibility tree (Windows UIA via uiautomation,
macOS AX) by name / role / app name:
import je_auto_control as ac
elements = ac.list_accessibility_elements(app_name="Calculator")
ok = ac.find_accessibility_element(name="OK", role="Button")
ac.click_accessibility_element(name="OK", app_name="Calculator")
Raises AccessibilityNotAvailableError on platforms where no backend
is installed. Action-JSON commands: AC_a11y_list, AC_a11y_find,
AC_a11y_click. GUI: Accessibility tab.
VLM (AI) element locator
When neither template matching nor accessibility can find the element, describe it in plain language and let a vision-language model return pixel coordinates:
import je_auto_control as ac
x, y = ac.locate_by_description("the green Submit button")
ac.click_by_description(
"the cookie-banner 'Accept all' button",
screen_region=[0, 800, 1920, 1080], # optional crop
)
Backends (loaded lazily, zero imports at package import time):
Anthropic (anthropic SDK, ANTHROPIC_API_KEY)
OpenAI (openai SDK, OPENAI_API_KEY)
Environment variables (keys are never logged or persisted):
ANTHROPIC_API_KEY / OPENAI_API_KEY
AUTOCONTROL_VLM_BACKEND=anthropic|openai
AUTOCONTROL_VLM_MODEL=<model-id>
Action-JSON commands: AC_vlm_locate, AC_vlm_click. GUI:
AI Locator tab.
Run history + error-snapshot artifacts
Every run from the scheduler, trigger engine, hotkey daemon, REST API,
and manual GUI replay is recorded to ~/.je_auto_control/history.db
(SQLite). When a run finishes with an error, a screenshot is captured
automatically and attached to the row:
from je_auto_control import default_history_store
for run in default_history_store.list_runs(limit=20):
    print(run.id, run.source, run.status, run.artifact_path)
Artifacts are stored under ~/.je_auto_control/artifacts/ and are
removed when the matching run is pruned or the history is cleared. GUI:
Run History tab — double-click the artifact column to open the
screenshot in the OS image viewer.
OCR — region dump and regex search
The OCR module already exposed substring / exact-match helpers. Two new APIs cover scenarios the existing ones could not:
import je_auto_control as ac
# Dump every recognised text record in a region (or full screen)
for match in ac.read_text_in_region(region=[0, 0, 800, 600]):
    print(match.text, match.center, match.confidence)
# Regex search — useful when text varies (order numbers, error codes)
for match in ac.find_text_regex(r"Order#\d+"):
    print(match.text, match.center)
# Compiled patterns and flags work too
import re
ac.find_text_regex(re.compile(r"foo", re.IGNORECASE))
Action-JSON commands:
[["AC_read_text_in_region", {"region": [0, 0, 800, 600]}]]
[["AC_find_text_regex", {"pattern": "Order#\\d+"}]]
GUI: OCR Reader tab. Pick a region with the existing overlay (or leave blank for full screen), set language / minimum confidence, then hit Dump region text or Find by regex. Results are returned as a JSON list with text, bounding box, and confidence per hit.
Runtime variables and data-driven control flow
Pre-execution interpolation in script_vars.interpolate only
substituted ${var} placeholders once against a static mapping;
scripts had no way to mutate state during execution. VariableScope
is a runtime mapping the executor exposes to flow-control commands so
they can read and write the same bag the runtime interpolator consults.
The executor now resolves ${var} per command call (not pre-flattened),
so nested body / then / else lists keep their placeholders
and re-bind each time they execute — letting AC_for_each iterate
over a list while the body sees the current item.
import je_auto_control as ac
from je_auto_control.utils.executor.action_executor import executor
executor.execute_action([
["AC_set_var", {"name": "items", "value": ["alpha", "beta"]}],
["AC_set_var", {"name": "i", "value": 0}],
["AC_for_each", {
"items": "${items}", "as": "name",
"body": [
["AC_inc_var", {"name": "i"}],
["AC_if_var", {
"name": "i", "op": "ge", "value": 2,
"then": [["AC_break"]], "else": [],
}],
],
}],
])
Comparison operators for AC_if_var (and AC_while_var): eq,
ne, lt, le, gt, ge, contains, startswith,
endswith.
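A comparison table like the one listed above could be modelled as a plain dispatch dict. This is only a sketch: `COMPARATORS` and `compare` are hypothetical names, and the exact semantics the executor gives to contains / startswith / endswith (for example, string coercion) are assumptions.

```python
import operator

# Operator name -> callable(left, right); semantics are assumed, not confirmed
COMPARATORS = {
    "eq": operator.eq,
    "ne": operator.ne,
    "lt": operator.lt,
    "le": operator.le,
    "gt": operator.gt,
    "ge": operator.ge,
    "contains": operator.contains,  # right in left
    "startswith": lambda left, right: str(left).startswith(str(right)),
    "endswith": lambda left, right: str(left).endswith(str(right)),
}


def compare(left, op: str, right) -> bool:
    """Evaluate `left <op> right` using the table above."""
    if op not in COMPARATORS:
        raise ValueError(f"unknown operator: {op!r}")
    return bool(COMPARATORS[op](left, right))


print(compare(2, "ge", 2))
print(compare(["alpha", "beta"], "contains", "beta"))
```

Keeping the operators in one table means a new comparison only needs one extra entry, which is one plausible reason for sharing the set between AC_if_var and AC_while_var.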
AC_while_var loops a body while a variable comparison holds. The
condition is re-evaluated against the live scope before every iteration,
so a body that mutates the variable (e.g. AC_inc_var) terminates the
loop; max_iter (default 1000) caps a condition that never turns
false. AC_break / AC_continue work as in any loop:
executor.execute_action([
["AC_set_var", {"name": "i", "value": 0}],
["AC_while_var", {
"name": "i", "op": "lt", "value": 5,
"body": [["AC_inc_var", {"name": "i"}]],
}],
])
AC_try adds try / catch / finally. When body raises, the
catch branch runs instead of aborting the script; finally always
runs (on success, on a caught error, or while a reraise / loop
break/continue propagates). The error text is exposed to error_var
for the catch branch to inspect, and reraise=true re-raises after
cleanup:
executor.execute_action([
["AC_try", {
"body": [["AC_click_image", {"image": "dialog_ok.png"}]],
"catch": [["AC_set_var", {"name": "dismissed", "value": False}]],
"finally": [["AC_screenshot", {"file_path": "after.png"}]],
"error_var": "err",
}],
])
Action-JSON commands: AC_set_var, AC_get_var, AC_inc_var,
AC_if_var, AC_for_each, AC_while_var, AC_try.
GUI: Variables tab — live view of executor.variables with
single-set, JSON seed, and clear-all controls; reflects what
AC_set_var / AC_for_each mutate at runtime.
LLM action planner
Translate a plain-language description into a validated AC_*
action list by asking an LLM (Anthropic Claude by default). Output is
parsed leniently (strips code fences, extracts the first JSON array
from prose) and then validated by the same schema the executor uses,
so the result can be piped straight into execute_action:
import je_auto_control as ac
from je_auto_control.utils.executor.action_executor import executor
actions = ac.plan_actions(
"click the Submit button, then type 'done' and save",
known_commands=executor.known_commands(),
)
executor.execute_action(actions)
# Or in one call:
ac.run_from_description("open Notepad and type hello", executor=executor)
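The lenient parsing step (strip code fences, take the first JSON array found in prose) might look roughly like the following. This is an illustrative sketch, not the planner's code: `extract_first_json_array` is a hypothetical name, and schema validation is left out.

```python
import json
import re

# Three backticks plus an optional language tag such as "json"
FENCE_PATTERN = re.compile("`{3}[A-Za-z0-9_-]*")


def extract_first_json_array(reply: str) -> list:
    """Return the first JSON array embedded in an LLM reply."""
    text = FENCE_PATTERN.sub("", reply)
    decoder = json.JSONDecoder()
    for index, char in enumerate(text):
        if char != "[":
            continue
        try:
            value, _end = decoder.raw_decode(text, index)
        except json.JSONDecodeError:
            continue  # this '[' did not start valid JSON, keep scanning
        if isinstance(value, list):
            return value
    raise ValueError("no JSON array found in reply")


fence = "`" * 3
reply = (
    "Sure, here is the plan:\n"
    + fence + "json\n"
    + '[["AC_click_text", {"text": "Submit"}]]\n'
    + fence
)
print(extract_first_json_array(reply))
```

Note that any bracketed JSON in the prose, such as a citation marker like `[1]`, would be picked up first, which is why validating the result against the executor schema afterwards matters.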
Backend selection mirrors vision.backends:
Anthropic (anthropic SDK, ANTHROPIC_API_KEY), the default
AUTOCONTROL_LLM_BACKEND and AUTOCONTROL_LLM_MODEL for overrides
Action-JSON commands: AC_llm_plan, AC_llm_run.
GUI: LLM Planner tab. Description box, QThread-backed Plan
button, action-list preview, and a Run plan button — long calls run
off the GUI thread so the UI stays responsive.
Remote desktop (host + viewer)
Stream this machine’s screen to another machine, or view and control a remote machine — both directions ship with a headless API and a GUI tab.
The wire format is a length-prefixed framing on raw TCP (no extra deps), starting with an HMAC-SHA256 challenge/response handshake; viewers that fail auth are dropped before they can see a frame. JPEG frames are produced at the configured FPS / quality and broadcast to authenticated viewers via a shared latest-frame slot, so a slow viewer drops frames instead of blocking the rest. Viewer input messages are JSON, validated against an allowlist, and applied through the existing mouse / keyboard wrappers.
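To make the framing and handshake concrete, here is a minimal stdlib sketch. The header layout (4-byte big-endian length) and the 32-byte random nonce are assumptions for illustration, since the text does not specify them, and all function names are hypothetical.

```python
import hashlib
import hmac
import os
import struct


def pack_frame(payload: bytes) -> bytes:
    """Prefix a payload with its length (assumed 4-byte big-endian)."""
    return struct.pack(">I", len(payload)) + payload


def unpack_frames(buffer: bytes):
    """Split a buffer into complete payloads plus the unread tail."""
    frames = []
    offset = 0
    while len(buffer) - offset >= 4:
        (length,) = struct.unpack_from(">I", buffer, offset)
        if len(buffer) - offset - 4 < length:
            break  # frame not fully received yet
        frames.append(buffer[offset + 4:offset + 4 + length])
        offset += 4 + length
    return frames, buffer[offset:]


def sign_challenge(token: str, nonce: bytes) -> bytes:
    """Viewer side: HMAC-SHA256 of the host's nonce, keyed by the token."""
    return hmac.new(token.encode("utf-8"), nonce, hashlib.sha256).digest()


def verify_response(token: str, nonce: bytes, response: bytes) -> bool:
    """Host side: constant-time comparison against the expected digest."""
    return hmac.compare_digest(sign_challenge(token, nonce), response)


nonce = os.urandom(32)  # host sends this as the challenge
print(verify_response("hunter2", nonce, sign_challenge("hunter2", nonce)))
print(verify_response("hunter2", nonce, sign_challenge("wrong", nonce)))
```

Because the token only ever keys the HMAC, it never crosses the wire, and a fresh nonce per connection keeps a captured response from being replayed.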
Headless host (be remoted by someone else):
from je_auto_control import RemoteDesktopHost
host = RemoteDesktopHost(
token="hunter2", # shared secret (HMAC key)
bind="127.0.0.1", # default; expose externally only via
# SSH tunnel or trusted VPN
port=0, # 0 = auto-assigned
fps=10, quality=70,
)
host.start()
print("listening on", host.port, "viewers:", host.connected_clients)
# ...
host.stop()
Headless viewer (control someone else):
from je_auto_control import RemoteDesktopViewer
viewer = RemoteDesktopViewer(
host="10.0.0.5", port=51234, token="hunter2",
on_frame=lambda jpeg_bytes: ..., # render or save
)
viewer.connect()
viewer.send_input({"action": "mouse_move", "x": 100, "y": 200})
viewer.send_input({"action": "type", "text": "hello"})
viewer.disconnect()
Input message allowlist (validated on the host before dispatch):
mouse_move {x, y}
mouse_click {x?, y?, button}
mouse_press / mouse_release {button}
mouse_scroll {x?, y?, amount}
key_press / key_release {keycode}
type {text}
ping
Action-JSON commands (use the singleton in
utils.remote_desktop.registry):
AC_start_remote_host # token, bind, port, fps, quality, region
AC_stop_remote_host
AC_remote_host_status # → {running, port, connected_clients}
AC_remote_connect # host, port, token, timeout
AC_remote_disconnect
AC_remote_viewer_status # → {connected}
AC_remote_send_input # action: {...}
GUI: Remote Desktop tab opens to the Quick Connect screen
(AnyDesk-style) by default — huge Host ID on one side, a single input
that accepts host:port, ws://, wss://, or a 9-digit Host
ID on the other, with Connect and Start hosting as the two primary
buttons. Recent connections are remembered across sessions. Advanced
per-transport sub-tabs (legacy TCP / WS host + viewer, WebRTC host +
viewer with manual SDP / custom codecs / TLS pinning) stay one click
away. WebRTC sub-tabs lazy-load so a stock install without the
[webrtc] extra still opens the tab.
Warning
Anyone with the host:port and token gets full mouse / keyboard
control of the host machine. Defaults bind to 127.0.0.1;
exposing this to untrusted networks should be paired with an SSH
tunnel or TLS front-end. The token is the only line of defence —
treat it like a password.
Remote desktop — Quick Connect + Phase 4/5 hardening
Quick Connect headless API
The transport coordinator that backs the GUI input box is also exported, so scripts can dispatch the same way:
from je_auto_control import parse_remote_desktop_target
parse_remote_desktop_target("192.168.1.10:5555")
# ConnectTarget(kind='tcp', host='192.168.1.10', port=5555, ...)
parse_remote_desktop_target("ws://hub:8765/desk")
# ConnectTarget(kind='ws', host='hub', port=8765, path='/desk')
parse_remote_desktop_target("123-456-789")
# ConnectTarget(kind='webrtc_id', host_id='123456789')
Connection approval + view-only mode
An optional on_pending_viewer callback gates every incoming session,
AnyDesk-style. Returning "view_only" admits the viewer but drops their INPUT
messages; returning a falsy value (or raising) sends AUTH_FAIL
“rejected by host”:
from je_auto_control import RemoteDesktopHost, PendingViewer
def gate(p: PendingViewer) -> str:
    if p.address[0].startswith("10."):
        return "view_only"
    return "full"  # or True
host = RemoteDesktopHost(token="tok", on_pending_viewer=gate)
IP allowlist (CIDR + exact IPs)
Reject peers outside the configured ranges before TLS / auth runs, so attackers can’t probe further:
host = RemoteDesktopHost(
token="tok",
ip_allowlist=["10.0.0.0/8", "192.168.1.100"],
)
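A check of this kind can be expressed with the standard `ipaddress` module. The sketch below only illustrates the matching rule (CIDR ranges plus exact IPs); `build_allowlist` and `is_allowed` are hypothetical names, not part of the host's API.

```python
import ipaddress


def build_allowlist(entries):
    """Parse CIDR ranges and bare IPs into network objects.

    A bare address parses as a single-host network; strict=False also
    tolerates host bits in a CIDR entry such as "10.1.2.3/8".
    """
    return [ipaddress.ip_network(entry, strict=False) for entry in entries]


def is_allowed(peer_ip: str, networks) -> bool:
    """Return True when the peer address falls inside any allowed network."""
    address = ipaddress.ip_address(peer_ip)
    return any(address in network for network in networks)


networks = build_allowlist(["10.0.0.0/8", "192.168.1.100"])
print(is_allowed("10.4.5.6", networks))
print(is_allowed("192.168.1.101", networks))
```

Running the check immediately after `accept()` is what lets a host drop disallowed peers before spending any work on TLS or authentication.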
TOTP 2FA (RFC 6238, stdlib only)
Layer a 6-digit OTP on top of the token; host accepts ±1 step of clock drift:
from je_auto_control.utils.remote_desktop.totp import (
generate_secret, generate_code, provisioning_uri,
)
secret = generate_secret()
# otpauth:// URI for Google Authenticator / Authy / 1Password QR code
print(provisioning_uri(secret, account="alice"))
host = RemoteDesktopHost(token="tok", totp_secret=secret)
viewer = RemoteDesktopViewer(
host=..., token="tok", totp_code=generate_code(secret),
)
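For readers who want to see what the check involves, here is a stdlib-only RFC 6238 sketch with a drift window of one step either side. It is not the package's totp module: `totp` and `verify` are hypothetical names, and the secret is taken as raw bytes rather than whatever encoding generate_secret() returns.

```python
import hashlib
import hmac
import struct
import time


def totp(secret: bytes, at: float, digits: int = 6, step: int = 30) -> str:
    """Compute the HMAC-SHA1 TOTP code for the time step containing `at`."""
    counter = int(at // step)
    digest = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = digest[-1] & 0x0F  # dynamic truncation (RFC 4226)
    number = struct.unpack_from(">I", digest, offset)[0] & 0x7FFFFFFF
    return str(number % 10 ** digits).zfill(digits)


def verify(secret: bytes, code: str, at=None, drift_steps: int = 1,
           step: int = 30) -> bool:
    """Accept codes from the current step and `drift_steps` either side."""
    now = time.time() if at is None else at
    return any(
        hmac.compare_digest(totp(secret, now + delta * step, step=step), code)
        for delta in range(-drift_steps, drift_steps + 1)
    )


# RFC 6238 Appendix B test vector: T=59, SHA-1, 8 digits
print(totp(b"12345678901234567890", 59, digits=8))  # 94287082
```

With the default window, a code generated at T=59 is still accepted 30 seconds later but rejected 90 seconds later.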
Multi-monitor selection
Capture one specific monitor instead of the combined virtual desktop:
from je_auto_control import list_host_monitors, RemoteDesktopHost
print(list_host_monitors())
# [{'index': 0, 'is_combined': True, ...},
# {'index': 1, 'left': 0, 'top': 0, ...},
# {'index': 2, 'left': 1920, ...}]
host = RemoteDesktopHost(token="tok", monitor_index=1)
Remote cursor overlay
Host broadcasts cursor position at 30 Hz (deduped on still desktops);
the viewer’s popup window draws an arrow on top of the JPEG stream so
operators can see exactly where the host’s pointer is. Disable via
enable_cursor_broadcast=False.
Multi-viewer collaborative cursors + chat
Two new message types (CHAT and CURSOR with viewer_id).
Use MultiViewerHost to relay one viewer’s pointer to the others;
pair with the chat channel for ad-hoc text between operators:
host = RemoteDesktopHost(
token="tok",
on_chat=lambda sender, text: print(sender, ":", text),
)
host.broadcast_chat("session starts in 30s")
host.broadcast_viewer_cursor("alice", 200, 300)
viewer = RemoteDesktopViewer(
host=...,
on_chat=lambda s, t: ...,
on_viewer_cursor=lambda vid, x, y: ...,
)
viewer.send_chat("ack")
Relative mouse mode (FPS / CAD)
New input action that sends deltas instead of absolute coordinates:
viewer.send_input(
{"action": "mouse_move_relative", "dx": 5, "dy": -3},
)
Motion-aware capture
The capture loop now hashes each encoded JPEG; identical frames are skipped, so a static desktop produces ~zero bandwidth. New viewers are seeded with the latest frame on auth so they never see a black popup.
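The skip-identical-frames idea can be sketched as a small helper that remembers the last digest. This is an assumption-laden illustration: the class name `FrameDeduper` is hypothetical and SHA-256 is chosen arbitrarily, since the text does not say which hash the capture loop uses.

```python
import hashlib


class FrameDeduper:
    """Remember the last frame's digest and flag repeats."""

    def __init__(self):
        self._last_digest = None
        self.latest_frame = None  # kept so new viewers can be seeded

    def should_send(self, jpeg_bytes: bytes) -> bool:
        """Return False when the frame is byte-identical to the previous one."""
        digest = hashlib.sha256(jpeg_bytes).digest()
        if digest == self._last_digest:
            return False
        self._last_digest = digest
        self.latest_frame = jpeg_bytes
        return True


deduper = FrameDeduper()
for frame in (b"frame-1", b"frame-1", b"frame-2"):
    print(deduper.should_send(frame))
```

Holding on to `latest_frame` is what makes it possible to hand a newly authenticated viewer an image even while the desktop is static and no new frames are being sent.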
Live stats
Rolling 3-second window of FPS / kbps + session totals:
viewer.stats()
# {'fps': 24.3, 'kbps': 4801.2, 'frames': 720.0,
# 'bytes': 1.8e7, 'uptime': 30.2}
JPEG sequence recorder (no PyAV needed)
TCP-path session capture: each frame written to disk plus
manifest.json so it can be replayed at original cadence:
from je_auto_control.utils.remote_desktop.jpeg_recorder import (
JpegSequenceRecorder,
)
rec = JpegSequenceRecorder("~/recordings/2026-05-23")
rec.start()
viewer = RemoteDesktopViewer(host=..., on_frame=rec.record_frame)
# ... session ...
rec.stop() # writes manifest.json next to the .jpg files
TCP relay (WebRTC fallback)
When P2P fails (strict NAT, mobile CGNAT, hotel Wi-Fi), both peers
connect outbound to a relay and exchange a shared 32-byte session ID;
the relay pipes bytes between them. Same module ships an
encode_handshake(role, session_id) helper for clients:
from je_auto_control.utils.remote_desktop.relay import RelayServer
relay = RelayServer(bind="0.0.0.0", port=9000)
relay.start()
Service installer (unattended host)
python -m je_auto_control.utils.remote_desktop.host_service ...
exposes configure / init / run plus per-platform
installers: install-windows-service / uninstall-windows-service
(needs pywin32), generate-launchd / uninstall-launchd,
generate-systemd / uninstall-systemd.
Remote desktop — secure transports, audio, clipboard, file transfer
Host ID handshake
Every host now exposes a stable 9-digit numeric ID, persisted at
~/.je_auto_control/remote_host_id so it stays the same across
restarts. The ID is announced inside AUTH_OK (so only authenticated
viewers see it), and viewers can verify expected_host_id to defend
against a different process listening on the same address:
from je_auto_control import RemoteDesktopHost, RemoteDesktopViewer
host = RemoteDesktopHost(token="tok")
print(host.host_id) # e.g. "123456789"
viewer = RemoteDesktopViewer(
host="10.0.0.5", port=51234, token="tok",
expected_host_id="123456789",
)
viewer.connect() # raises AuthenticationError on mismatch
Helpers format_host_id("123456789") == "123 456 789" and
parse_host_id("123 456 789") == "123456789" are also exported. The
GUI displays the formatted ID with a Copy button, and the viewer
panel accepts any common spacing / dashing.
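The formatting rules are simple enough to sketch independently. The functions below are stand-ins written for illustration (hypothetical names `format_id` and `normalize_id`), assuming only spaces and dashes are tolerated as separators; they are not the exported helpers themselves.

```python
import re


def normalize_id(text: str) -> str:
    """Strip spaces and dashes, then require exactly nine ASCII digits."""
    digits = re.sub(r"[\s-]", "", text)
    if not re.fullmatch(r"[0-9]{9}", digits):
        raise ValueError(f"not a 9-digit host ID: {text!r}")
    return digits


def format_id(host_id: str) -> str:
    """Group a 9-digit ID as three blocks of three for display."""
    digits = normalize_id(host_id)
    return " ".join(digits[i:i + 3] for i in range(0, 9, 3))


print(format_id("123456789"))       # 123 456 789
print(normalize_id("123-456-789"))  # 123456789
```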
TLS
Both RemoteDesktopHost and RemoteDesktopViewer accept an
ssl.SSLContext. When provided, the host wraps each accepted
connection server-side; the viewer wraps the connect socket
client-side. Failed handshakes are logged and silently dropped before
they can register as connected clients:
import ssl
ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
ctx.load_cert_chain("cert.pem", "key.pem")
host = RemoteDesktopHost(token="tok", ssl_context=ctx)
client_ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
client_ctx.load_verify_locations("cert.pem")
viewer = RemoteDesktopViewer(host=..., ssl_context=client_ctx)
For self-signed loopback testing, set
ctx.check_hostname = False and ctx.verify_mode = ssl.CERT_NONE
on the client context. The Remote Desktop GUI host panel has TLS cert
/ key file pickers; the viewer panel has a Skip cert verification
checkbox.
WebSocket transport
A new WebSocketDesktopHost / WebSocketDesktopViewer pair
speaks the same typed-message protocol over RFC 6455 BINARY frames.
The implementation is in-tree (no extra deps); each application
message rides as one full WebSocket frame, so reassembly machinery is
unnecessary. The same ssl_context parameter doubles as the
wss:// switch:
from je_auto_control import (
WebSocketDesktopHost, WebSocketDesktopViewer,
)
host = WebSocketDesktopHost(token="tok", ssl_context=ctx) # wss://
viewer = WebSocketDesktopViewer(
host="example.com", port=443, token="tok",
ssl_context=client_ctx, path="/rd",
)
Why WS: friendly to corporate firewalls and reverse proxies, and compatible with browser viewers. The GUI viewer’s transport dropdown (TCP / WebSocket / TLS / WSS) chooses the right class automatically.
Audio streaming
A new AUDIO message type carries 16-bit signed PCM blocks (default
16 kHz mono, 50 ms / 1600 bytes per block). The optional
sounddevice dependency is loaded lazily — without it, audio is
reported disabled and the host stays up:
from je_auto_control.utils.remote_desktop import AudioCaptureConfig
host = RemoteDesktopHost(
token="tok",
audio_config=AudioCaptureConfig(
enabled=True, device=None, # default mic
sample_rate=16000, channels=1,
),
)
from je_auto_control.utils.remote_desktop import AudioPlayer
player = AudioPlayer(); player.start()
viewer = RemoteDesktopViewer(host=..., on_audio=player.play)
The host fans each captured block out to all authenticated viewers through a bounded per-client deque (~2.5 s of buffering), so a slow viewer drops old audio chunks instead of stalling capture for everyone else. To capture system audio (rather than the mic), pick a loopback / monitor device by index: WASAPI loopback on Windows, the PulseAudio monitor source on Linux, BlackHole on macOS. GUI: Stream system audio on the Host panel, Play received audio on the Viewer panel.
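The block size and buffer depth quoted in this section follow from simple arithmetic, which the sketch below works through. The deque stands in for the per-client buffer; the constant names are illustrative, and the 50-block depth is derived from the ~2.5 s figure rather than read from the implementation.

```python
from collections import deque

SAMPLE_RATE = 16_000     # Hz, default from the text
CHANNELS = 1             # mono
BYTES_PER_SAMPLE = 2     # 16-bit signed PCM
BLOCK_MS = 50

samples_per_block = SAMPLE_RATE * BLOCK_MS // 1000             # 800
block_bytes = samples_per_block * CHANNELS * BYTES_PER_SAMPLE  # 1600
blocks_buffered = int(2.5 * 1000 / BLOCK_MS)                   # 50

# A bounded deque silently discards the oldest block once it is full
client_queue = deque(maxlen=blocks_buffered)
for block_index in range(60):
    client_queue.append(block_index)

print(block_bytes, blocks_buffered, client_queue[0], len(client_queue))
```

After 60 appends the queue holds blocks 10 to 59: the ten oldest blocks were dropped without the producer ever blocking, which is the behaviour a slow viewer experiences.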
Clipboard sync (text + image)
A new CLIPBOARD message type carries a JSON envelope so kinds can
grow without a protocol bump:
{"kind": "text", "text": "..."}
{"kind": "image", "format": "png", "data_b64": "..."}
utils/clipboard/clipboard.py is extended with
get_clipboard_image / set_clipboard_image; Windows uses
CF_DIB via ctypes (Pillow rasterises PNG → BMP → DIB), Linux shells
out to xclip -t image/png, macOS get works via Pillow ImageGrab
and set raises until a PyObjC backend lands. Sync is explicit per
call; there are no auto-poll loops, which avoids paste storms:
# Viewer pushes its local clipboard to the host
viewer.send_clipboard_text("hello")
viewer.send_clipboard_image(open("logo.png", "rb").read())
# Host pushes to all viewers
host.broadcast_clipboard_text("greetings")
host.broadcast_clipboard_image(png_bytes)
# Viewer wires a callback so it can choose when to paste
viewer = RemoteDesktopViewer(
host=..., on_clipboard=lambda kind, data: ...,
)
GUI: Push clipboard text to host button on the Viewer panel; the host applies inbound clipboards via the helpers above.
File transfer with progress
Three new message types form one transfer:
FILE_BEGIN: JSON {transfer_id, dest_path, size}
FILE_CHUNK: 36-byte ASCII transfer id + raw payload
FILE_END: JSON {transfer_id, status, error?}
Transfers are bidirectional, chunked (256 KiB per chunk), and have no aggregate size limit and no path restriction on the destination — token holders are trusted users. Progress is reported locally on both sides without an extra wire message:
from je_auto_control.utils.remote_desktop import (
FileReceiver, RemoteDesktopHost, RemoteDesktopViewer, send_file,
)
# Viewer uploads to host
viewer.send_file("local.bin", "/tmp/uploaded.bin",
on_progress=lambda tid, done, total: print(done, total))
# Host pushes to all viewers (each viewer needs a FileReceiver)
viewer.set_file_receiver(FileReceiver(
on_progress=..., on_complete=...,
))
host.send_file_to_viewers("local.bin", "/tmp/from_host.bin")
GUI: Send file… opens a file picker + destination-path prompt and
runs the upload on a QThread with a QProgressBar bound to the
sender’s progress events. The frame display widget also accepts
dragEnter / drop of local files; each dropped file kicks off the same
upload flow.
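The FILE_CHUNK layout (36-byte ASCII transfer id followed by raw payload) and the 256 KiB chunking can be sketched as follows. Using a canonical UUID string as the transfer id is an assumption made here because it happens to be 36 characters long; the function names are hypothetical.

```python
import uuid

ID_LENGTH = 36             # ASCII transfer id, per the message description
CHUNK_SIZE = 256 * 1024    # 256 KiB per chunk


def encode_chunk(transfer_id: str, payload: bytes) -> bytes:
    """Build a FILE_CHUNK body: fixed-width id, then the raw bytes."""
    header = transfer_id.encode("ascii")
    if len(header) != ID_LENGTH:
        raise ValueError("transfer id must be exactly 36 ASCII bytes")
    return header + payload


def decode_chunk(message: bytes):
    """Split a FILE_CHUNK body back into (transfer_id, payload)."""
    if len(message) < ID_LENGTH:
        raise ValueError("chunk shorter than the id header")
    return message[:ID_LENGTH].decode("ascii"), message[ID_LENGTH:]


def iter_chunks(data: bytes, transfer_id: str):
    """Yield encoded chunks covering `data` in order."""
    for offset in range(0, len(data), CHUNK_SIZE):
        yield encode_chunk(transfer_id, data[offset:offset + CHUNK_SIZE])


transfer_id = str(uuid.uuid4())  # assumed id format
data = bytes(600 * 1024)
chunks = list(iter_chunks(data, transfer_id))
received = b"".join(decode_chunk(chunk)[1] for chunk in chunks)
print(len(chunks), received == data)
```

Since every chunk carries its transfer id, the receiver can count bytes per transfer locally, which is consistent with progress being reported without an extra wire message.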
Warning
The destination path is unrestricted and there is no size cap. Anyone
with the token can write any file to any location, and can fill the disk.
Treat every token holder as a fully trusted user, or wrap
the headless API in your own restricted FileReceiver subclass
that vets the destination path.
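One way to vet a destination is to resolve it against a single allowed directory and reject anything that escapes it. The helper below is a sketch only: `vet_destination` and the `uploads` root are hypothetical, and how such a check would hook into a FileReceiver subclass is not shown because those hooks are not documented here.

```python
from pathlib import Path


def vet_destination(dest_path: str, allowed_root: str) -> Path:
    """Resolve dest_path under allowed_root or raise PermissionError."""
    root = Path(allowed_root).resolve()
    # Joining with an absolute dest_path discards root, so absolute
    # destinations are caught by the containment check below.
    candidate = (root / dest_path).resolve()
    try:
        candidate.relative_to(root)
    except ValueError:
        raise PermissionError(f"{dest_path!r} escapes {root}") from None
    return candidate


print(vet_destination("reports/out.bin", "uploads"))
try:
    vet_destination("../escape.bin", "uploads")
except PermissionError as error:
    print("rejected:", error)
```

Resolving before comparing is the important step: it collapses `..` segments and symlinks, so the containment test runs against the path that would actually be written.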
Remote desktop — AnyDesk-style popout window
The viewer panel no longer renders the live remote screen inline —
when the viewer authenticates, a dedicated top-level
RemoteScreenWindow opens with the remote desktop, and the
panel shrinks back to the connection card + controls. Closing the
popup ✕ disconnects the session, matching AnyDesk’s session-window
ergonomics.
New module: je_auto_control/gui/remote_desktop/remote_screen_window.py
Wraps a _FrameDisplay and re-emits its mouse / keyboard / drag-and-drop / annotation signals so the panel keeps a single signal source after the popout.
Bottom footer carries the optional file-transfer progress label / bar; hidden when no transfer is active.
Both the TCP _ViewerPanel and the WebRTC _WebRTCViewerPanel open the popup on connect / on auth_ok and close it on disconnect / on stop.
Why
The previous layout fought for vertical space: a frame display + connection card + collapsibles + action row + stats + sparklines + transfer progress + status bar all stacked on one tab. Pulling the live screen out into its own window leaves the operator with a real workspace and keeps the control surface uncluttered.
Remote desktop — responsive sub-tab sizing
Every Remote Desktop sub-tab is now wrapped in a QScrollArea
with setWidgetResizable(True). The wrapper lives in
gui/remote_desktop/tab.py (helper _wrap_in_scroll_area).
Small / shrunk window: a vertical scrollbar appears instead of clipping the dense WebRTC panels.
Enlarged / 4K window: the inner panel widget grows horizontally with the viewport, so the connection card and session table stretch edge-to-edge instead of clustering at the top-left.
The bottom addStretch(1) in each panel still pushes content up when there is leftover height, so the layout doesn’t sag.
Heavy / rarely used groups (Manual SDP, Remote Files, Sync) on the
WebRTC viewer tab are also wrapped in collapsed-by-default
_CollapsibleSection shells via the new _wrap_collapsed
helper, halving the panel’s first-paint height.
Removed the previous hard setMaximumHeight(140) on the WebRTC
host’s session table: setMinimumHeight(140) keeps 140 px as a
starting hint without capping the table on large displays.
Remote desktop — MCP tool surface
The MCP server now wraps the same singleton remote-desktop
registry the GUI uses. The tools live under a new
remote_desktop_tools() factory in
je_auto_control/utils/mcp_server/tools/_factories.py:
ac_remote_host_start: Start (or restart) the singleton TCP host with token, bind, port, fps, quality, max_clients, host_id. Returns {running, port, host_id, connected_clients}.
ac_remote_host_stop: Stop the host (no-op when nothing is running).
ac_remote_host_status: Read-only snapshot of the host registry. Survives --readonly mode.
ac_remote_viewer_connect: Open the singleton viewer to a remote host, supporting expected_host_id to verify the 9-digit ID before accepting the session.
ac_remote_viewer_disconnect / ac_remote_viewer_status: Close / observe the active viewer (status is read-only).
ac_remote_viewer_send_input: Forward an input action dict (mouse_move, mouse_press, mouse_release, mouse_scroll, key_press, key_release, type, hotkey) through the connected viewer to the remote host. Destructive: stripped under --readonly.
A model can now drive a complete remote-control flow without clicking through the GUI:
ac_remote_host_start(token="tok", bind="127.0.0.1", port=0)
→ {"running": true, "port": 51234, "host_id": "123456789",
"connected_clients": 0}
# … on a different machine …
ac_remote_viewer_connect(host="10.0.0.5", port=51234, token="tok",
expected_host_id="123456789")
→ {"connected": true, "host_id": "123456789"}
ac_remote_viewer_send_input(action={
"action": "mouse_move", "x": 100, "y": 200,
})
ac_remote_viewer_send_input(action={
"action": "type", "text": "hello",
})
The status / observer tools (ac_remote_host_status,
ac_remote_viewer_status) are read-only and survive the MCP
server’s --readonly filter; everything that mutates state is
correctly tagged destructiveHint: true so MCP clients can
prompt for user confirmation.
Driver-level input backends — drive games that ignore SendInput / XTest
The default Windows (SendInput) and Linux (XTest) input paths sit at
the user-mode / X-server layer. Modern games that read input via
GetRawInputData (Win) or evdev (Linux) skip those layers
entirely and ignore synthetic events. Three optional backends bridge
the gap.
Interception (Windows)
oblitum’s WHQL-signed Interception driver (https://github.com/oblitum/Interception) injects keyboard / mouse events at the HID layer; the OS sees them as real-hardware events.
New sub-package: je_auto_control/windows/interception/ (_dll.py ctypes bindings + keyboard.py + mouse.py).
Same public surface as win32_ctype_keyboard_control / win32_ctype_mouse_control: the platform wrapper just swaps modules, with no caller changes.
Opt-in via JE_AUTOCONTROL_WIN32_BACKEND=interception; the wrapper falls back to SendInput with a warning when the driver is missing, so deployments can roll the driver out lazily.
Override device IDs with JE_AUTOCONTROL_INTERCEPTION_KEYBOARD / JE_AUTOCONTROL_INTERCEPTION_MOUSE (defaults: 1 / 11).
Operator setup:
# 1. Install the driver as Administrator (one-time, requires reboot)
install-interception.exe /install
# 2. Tell AutoControl to route through it
setx JE_AUTOCONTROL_WIN32_BACKEND interception
uinput (Linux)
The kernel’s synthetic-input gateway. Events emitted via
/dev/uinput show up as a brand-new HID device, so anything reading
evdev (most games + SDL2 apps) sees them as real input.
New sub-package: je_auto_control/linux_with_x11/uinput/ (_device.py ctypes wrapper around ioctl + keyboard.py + mouse.py).
No third-party dependency: direct ctypes + ioctl to /dev/uinput.
Opt-in via JE_AUTOCONTROL_LINUX_BACKEND=uinput; falls back to XTest with a warning when /dev/uinput isn’t writable.
Operator setup:
# Load the kernel module if it isn't already.
sudo modprobe uinput
# Grant write access. For one-off testing:
sudo chmod 666 /dev/uinput
# For persistent provisioning, drop a udev rule:
echo 'KERNEL=="uinput", GROUP="input", MODE="0660"' \
| sudo tee /etc/udev/rules.d/99-autocontrol-uinput.rules
sudo udevadm control --reload && sudo udevadm trigger
sudo usermod -aG input $USER # log out / back in to apply
# Then opt in:
export JE_AUTOCONTROL_LINUX_BACKEND=uinput
ViGEm virtual gamepad (Windows)
For games that don’t take keyboard input at all but read controllers,
ViGEmBus exposes a virtual Xbox 360 / DualShock 4 controller that
AutoControl drives through the third-party vgamepad Python
package.
New module: je_auto_control/utils/gamepad/ with a friendly VirtualGamepad API (string-keyed buttons / dpad / sticks / triggers, context manager).
Headless:
from je_auto_control import VirtualGamepad
with VirtualGamepad() as pad:
    pad.click_button("a")          # face button A
    pad.set_left_stick(16000, 0)   # int16 stick offsets
    pad.set_right_trigger(255)     # 0..255 pressure
    pad.set_dpad("up")             # hold dpad up
    pad.update()                   # flush → driver
Executor commands: AC_gamepad_press, AC_gamepad_release, AC_gamepad_click, AC_gamepad_dpad, AC_gamepad_left_stick / _right_stick, AC_gamepad_left_trigger / _right_trigger, and AC_gamepad_reset.
MCP tools: same names with the ac_ prefix (ac_gamepad_press, ac_gamepad_left_stick, …), so a model can play a gamepad-only game over MCP.
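Because sticks take int16 offsets and triggers take 0..255 pressure, callers that think in normalized values may want a small conversion layer. The helpers below are a hypothetical sketch (to_stick and to_trigger are not part of the VirtualGamepad API); they clamp out-of-range input and scale sticks symmetrically to ±32767.

```python
def to_stick(value):
    """Map a normalized axis in [-1.0, 1.0] to a signed 16-bit stick offset."""
    clamped = max(-1.0, min(1.0, float(value)))
    return int(round(clamped * 32767))


def to_trigger(value):
    """Map a normalized pressure in [0.0, 1.0] to the 0..255 trigger range."""
    clamped = max(0.0, min(1.0, float(value)))
    return int(round(clamped * 255))
```

The results could then be passed to calls such as pad.set_left_stick(to_stick(x), to_stick(y)) or pad.set_right_trigger(to_trigger(p)).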
Operator setup:
# 1. Install the ViGEmBus driver (one-time, requires reboot)
# https://github.com/nefarius/ViGEmBus/releases
# 2. Install the Python wrapper:
pip install vgamepad
Anti-cheat caveat (all three)
Driver-level injection is harder to detect than SendInput / XTest, but anti-cheat systems with a kernel-mode driver of their own (Vanguard, Easy Anti-Cheat with kernel module, BattlEye) can still enumerate Interception / ViGEmBus / a freshly-created uinput device and refuse to launch.
These backends target legitimate use cases — accessibility software, GUI testing of games that lock out user-mode input, controlling a remote game-running machine from a headless setup — and aren’t a generic anti-cheat bypass.
Per-action profiler
Records wall-clock duration for every AC_* action so you can answer
“which step is dominating this script’s runtime?” without external
tooling. Profiling is opt-in — when disabled, the executor wrapper has
zero overhead:
import je_auto_control as ac
ac.default_profiler.enable()
ac.execute_action([["AC_locate_image_center", {"image": "btn.png"}],
["AC_click_mouse"]])
for row in ac.default_profiler.hot_spots(limit=5):
print(row.name, row.calls, row.average_seconds)
Action-JSON commands:
[["AC_profiler_enable"]]
[["AC_profiler_stats", {"limit": 10}]]
[["AC_profiler_hot_spots", {"limit": 5}]]
[["AC_profiler_reset"]]
[["AC_profiler_disable"]]
GUI: Profiler tab — live hot-spot table (calls / total / avg / min / max / share) refreshed every second. Toggle recording, reset stats, or export the snapshot through the headless API.
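To make the idea concrete, here is a minimal sketch of how per-action wall-clock aggregation could work. It is not AutoControl's profiler: SketchProfiler, ActionStats, and the timed wrapper are illustrative names, and ranking hot spots by total time is an assumption.

```python
import time
from dataclasses import dataclass


@dataclass
class ActionStats:
    name: str
    calls: int = 0
    total_seconds: float = 0.0

    @property
    def average_seconds(self):
        return self.total_seconds / self.calls if self.calls else 0.0


class SketchProfiler:
    def __init__(self):
        self.enabled = False
        self._stats = {}

    def record(self, name, seconds):
        """Add one observed duration to the running totals for an action."""
        stats = self._stats.setdefault(name, ActionStats(name))
        stats.calls += 1
        stats.total_seconds += seconds

    def timed(self, name, func, *args, **kwargs):
        """Call func, recording its wall-clock duration only when enabled."""
        if not self.enabled:
            return func(*args, **kwargs)  # disabled path: no timing work
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            self.record(name, time.perf_counter() - start)

    def hot_spots(self, limit=5):
        """Return the actions with the largest total time, slowest first."""
        rows = sorted(self._stats.values(), key=lambda s: s.total_seconds, reverse=True)
        return rows[:limit]
```

Recording in a finally block means an action that raises still contributes its duration, which matters when failures are the slow path.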
Run history timeline + failure thumbnails
The Run History tab gains a Gantt-style strip beneath the filter row: each scheduler / trigger / hotkey / webhook / email fire is rendered as a coloured bar on a horizontal time axis (green = ok, red = error, amber = still running). Selecting a bar syncs the table row, and a right-hand preview panel surfaces the failure screenshot already captured by the artifact manager.
Headless callers query the same data through the existing run history store:
import je_auto_control as ac
for row in ac.default_history_store.list_runs(limit=20):
print(row.id, row.status, row.duration_seconds, row.artifact_path)
No new commands — the store API is unchanged. The GUI is purely a thin visualization wrapper over the existing runs table.
Encrypted secret manager
Action scripts that need API tokens, IMAP passwords, etc. should never
embed plaintext. The new vault stores Fernet-encrypted entries under
~/.je_auto_control/secrets/vault.json; a passphrase derives the
key via PBKDF2-HMAC-SHA256 (600,000 iterations, 16-byte salt):
import je_auto_control as ac
ac.default_secret_manager.initialize("my-vault-passphrase")
ac.default_secret_manager.set("github_token", "ghp_xxxxx")
ac.default_secret_manager.lock()
# later — in the same process or a new run:
ac.default_secret_manager.unlock("my-vault-passphrase")
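The key-derivation step can be sketched with the standard library alone. This is an assumption-laden illustration rather than the vault's actual code: derive_fernet_key is a hypothetical name, and the sketch relies on the fact that a Fernet key is 32 bytes encoded as URL-safe base64. The iteration count and salt size are the ones stated above.

```python
import base64
import hashlib
import os

ITERATIONS = 600_000  # iteration count stated in the text
SALT_BYTES = 16       # salt size stated in the text


def derive_fernet_key(passphrase, salt):
    """Derive a 32-byte key with PBKDF2-HMAC-SHA256 and base64-encode it."""
    raw = hashlib.pbkdf2_hmac(
        "sha256", passphrase.encode("utf-8"), salt, ITERATIONS, dklen=32
    )
    return base64.urlsafe_b64encode(raw)


salt = os.urandom(SALT_BYTES)  # a real vault would persist the salt with the ciphertext
key = derive_fernet_key("my-vault-passphrase", salt)
```

The same passphrase and salt always yield the same key, which is why unlocking in a later run works as long as the salt is kept alongside the encrypted entries.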
Action-JSON commands:
[["AC_secret_init", {"passphrase": "..."}]]
[["AC_secret_unlock", {"passphrase": "..."}]]
[["AC_secret_set", {"name": "github_token", "value": "ghp_xxx"}]]
[["AC_secret_list"]]
[["AC_secret_remove", {"name": "github_token"}]]
[["AC_secret_lock"]]
[["AC_secret_status"]]
Action scripts reference vault entries through ${secrets.NAME}
placeholders. The interpolator routes the secrets.* namespace to the
vault rather than the regular variable scope, so plaintext values never
land in the variable bag:
[["AC_shell_command",
{"command": "curl -H \"Authorization: Bearer ${secrets.github_token}\" ..."}]]
GUI: Secrets tab — initialize the vault, unlock it, add / remove entries, change passphrase. The vault file is created with mode 0o600 on POSIX systems; on Windows the default ACL already restricts read access to the owning user.
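The ${secrets.NAME} placeholder routing described in this section can be sketched as a regex substitution that consults only the vault. The function interpolate_secrets, the pattern, and the dict-based vault are assumptions made for illustration; the real interpolator may differ.

```python
import re

# Matches ${secrets.NAME}; other ${...} placeholders are left untouched.
_SECRET_REF = re.compile(r"\$\{secrets\.([A-Za-z_][A-Za-z0-9_]*)\}")


def interpolate_secrets(text, vault):
    """Replace ${secrets.NAME} with vault[NAME]; raise KeyError if missing."""

    def _lookup(match):
        name = match.group(1)
        if name not in vault:
            raise KeyError(f"unknown secret: {name}")
        return vault[name]

    return _SECRET_REF.sub(_lookup, text)
```

Keeping the lookup separate from the ordinary variable scope is what prevents a secret from being copied into variables that might later be logged or exported.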
Webhook (HTTP push) trigger
A bundled http.server dispatcher fires an action script when an
external service POSTs to a registered path. Configure path, allowed
methods, and an optional bearer token; the request method, path, query,
headers, raw body, and parsed JSON are seeded into the variable scope:
import je_auto_control as ac
ac.default_webhook_server.add(
path="/jobs/build", script_path="hooks/on_build.json",
methods=["POST"], token="topsecret",
)
host, port = ac.default_webhook_server.start("127.0.0.1", 0)
print("listening on", host, port)
The bound script reads the request through ${webhook.*} placeholders:
[
["AC_set_var", {"name": "branch", "value": "${webhook.query.ref}"}],
["AC_shell_command",
{"command": "echo received build for ${webhook.body}"}]
]
Action-JSON commands:
[["AC_webhook_start", {"host": "127.0.0.1", "port": 8765}]]
[["AC_webhook_add", {"path": "/jobs", "script_path": "...",
"methods": ["POST"], "token": "..."}]]
[["AC_webhook_list"]]
[["AC_webhook_remove", {"webhook_id": "abcd1234"}]]
[["AC_webhook_status"]]
[["AC_webhook_stop"]]
Each fire is recorded in run history as trigger with source id
webhook:<id> so the dashboard surfaces webhook activity alongside
other triggers. The body is capped at 1 MiB and bearer-token comparison
uses hmac.compare_digest(). Bind to 127.0.0.1 unless the
listener genuinely needs to be reachable from elsewhere on the network.
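The two safeguards mentioned above, constant-time token comparison and the 1 MiB body cap, can be sketched as follows. The helper names are hypothetical, and the assumption that the token arrives in an "Authorization: Bearer ..." header is ours; only hmac.compare_digest() and the 1 MiB limit come from the text.

```python
import hmac

MAX_BODY_BYTES = 1024 * 1024  # 1 MiB cap stated in the text


def bearer_token_matches(authorization_header, expected_token):
    """Compare the supplied bearer token in constant time."""
    prefix = "Bearer "
    if not authorization_header or not authorization_header.startswith(prefix):
        return False
    supplied = authorization_header[len(prefix):]
    return hmac.compare_digest(
        supplied.encode("utf-8"), expected_token.encode("utf-8")
    )


def body_within_limit(content_length):
    """Reject negative or oversized Content-Length values."""
    return 0 <= content_length <= MAX_BODY_BYTES
```

Using compare_digest rather than == avoids leaking how many leading characters of the token matched through response timing.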
GUI: Webhooks tab — start/stop the server, register paths, view the fire counter and auth state per route.
IMAP email trigger
Poll-based watcher that logs into a mailbox on a configurable interval and runs an action script once per matching message:
import je_auto_control as ac
ac.default_email_trigger_watcher.add(
host="imap.gmail.com", username="user@example.com",
password="app-specific-password",
script_path="hooks/on_alert.json",
mailbox="INBOX", search_criteria='UNSEEN FROM "alerts@..."',
poll_seconds=120, mark_seen=True,
)
ac.default_email_trigger_watcher.start()
The bound script sees the message metadata via ${email.*}:
[
["AC_if_var", {
"name": "email.subject", "op": "contains", "value": "CRITICAL",
"then": [["AC_hotkey", {"keys": ["ctrl", "alt", "p"]}]]
}]
]
Variables seeded per fire: email.uid, email.from, email.to,
email.subject, email.message_id, email.date, email.body.
Action-JSON commands:
[["AC_email_trigger_add", {"host": "...", "username": "...",
"password": "${secrets.imap_pw}",
"script_path": "...",
"mailbox": "INBOX",
"search_criteria": "UNSEEN",
"poll_seconds": 120,
"mark_seen": true,
"use_ssl": true}]]
[["AC_email_trigger_start"]]
[["AC_email_trigger_poll_once"]]
[["AC_email_trigger_list"]]
[["AC_email_trigger_remove", {"trigger_id": "abcd1234"}]]
[["AC_email_trigger_stop"]]
The watcher tracks already-fired UIDs in process memory and can
optionally flag messages \Seen, so the same mail isn’t replayed across
restarts. The minimum TLS version is 1.2. Combine AC_email_trigger_add
with ${secrets.NAME} so passwords never appear in the JSON.
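A rough sketch of the two mechanisms above, using only the standard library: a TLS context with a 1.2 floor, and in-memory UID de-duplication. The function names are hypothetical and the watcher's real internals may differ.

```python
import ssl


def make_imap_tls_context():
    """Default-verified TLS context that refuses anything older than TLS 1.2."""
    context = ssl.create_default_context()
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


def unseen_uids(fetched_uids, fired_uids):
    """Return UIDs not fired before and remember them in fired_uids (a set)."""
    fresh = [uid for uid in fetched_uids if uid not in fired_uids]
    fired_uids.update(fresh)
    return fresh
```

Such a context could be passed as ssl_context to imaplib.IMAP4_SSL. Because the fired-UID set lives only in memory, it is the \Seen flag (with a search such as UNSEEN) that keeps messages from firing again after a restart.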
GUI: Email Triggers tab — register IMAP triggers, start/stop the watcher, run a manual poll, inspect last error and fire counter.