Patents Assigned to Anthropic, PBC
-
Patent number: 12619815
Abstract: A system for magnitude-invariant image-text agentic interface automation is disclosed. A bit vectorization logic is configured to convert image patches in a plurality of image patches into magnitude-invariant bit vectors, and generate a plurality of lines of magnitude-invariant bit vectors. A tokenization logic is configured to translate the input text sequence into a sequence of input text tokens, and to translate the successive lines of magnitude-invariant bit vectors interleaved with a newline character into a sequence of input magnitude-invariant bit vector tokens. A linear projection logic is configured to linearly project a single token stream of the sequence of input text tokens and the sequence of input magnitude-invariant bit vector tokens into a decoder-only Transformer logic, wherein the linear projection of the single token stream bypasses any embedding lookup.
Type: Grant
Filed: October 8, 2024
Date of Patent: May 5, 2026
Assignee: Anthropic, PBC
Inventors: Curtis Hawthorne, Erich Elsen, Augustus Odena, Maxwell Nye, Arushi Somani, Kyle Vigen, Rohan Bavishi, Sagnak Tasirlar, Warut Vijitbenjaronk, Ulas Kirazci, Joe Gershenson, Shaya Zarkesh
-
Patent number: 12585862
Abstract: A system for automating software usage includes an agent configured to automate software usage. The agent is trained on one or more training datasets. The one or more training datasets include one or more of a first training dataset including documents containing text interleaved with images, a second training dataset including text embedded in images, a third training dataset including recorded videos of software usage, a fourth training dataset including portable document format (PDF) documents, a fifth training dataset including recorded videos of software tool usage trajectories, a sixth training dataset including images of open-domain web pages, a seventh training dataset including images of specific-domain web pages, and/or an eighth training dataset including images of agentic trajectories of the agent performing interface automation task workflows.
Type: Grant
Filed: October 8, 2024
Date of Patent: March 24, 2026
Assignee: Anthropic, PBC
Inventors: Sagnak Tasirlar, David Abrahams, Lina Lukyantseva, Erich Elsen, Maxwell Nye, Augustus Odena, Rohan Bavishi, Vibhaa Sivaraman, Adam Hoff, Teddy Rothschild, Shaya Zarkesh, Deepak Moparthi, Jacob van Gogh, Claire Pajot, Curtis Hawthorne, Matt Elkherj, Warut Vijitbenjaronk, Arushi Somani, Johnny Lee, Joe Gershenson, Jordyn Shuell, Danielle Perszyk
-
Patent number: 12566913
Abstract: A system for interface automation includes an agent. The agent is configured to process an input that specifies an interface workflow, wherein the interface workflow is otherwise implementable by one or more user-actuated actions directed towards an interface by a user. The agent is also configured to generate an output that specifies a sequence of actuation commands, wherein the sequence of actuation commands triggers one or more machine-actuated actions that replicate the user-actuated actions on the interface and cause automation of the interface workflow.
Type: Grant
Filed: October 8, 2024
Date of Patent: March 3, 2026
Assignee: Anthropic, PBC
Inventors: Rohan Bavishi, Lina Lukyantseva, Shaya Zarkesh, David Luan, Basil Safwat, Amelia Wattenberger, Kadhir Manickam, Inigo Beitia Arevalo, James Lu, Omkar Savant, Zach Brock, Jacob van Gogh, Rick Liu, Deepak Moparthi, Claire Pajot, Joe Gershenson, Arushi Somani, Armaan Goel, Kevin Keller, Erich Elsen, Curtis Hawthorne
-
Patent number: 12437238
Abstract: A system for generating training data to train agents to automate tasks otherwise done by users includes an intermediary disposed between an interface and a user. The intermediary is configured to: intercept one or more user-actuated actions directed towards the interface by the user, wherein the user-actuated actions, if received by the interface, execute a task on the interface; preserve a state of the interface prior to the execution of the task; translate the user-actuated actions into one or more actuation commands, the actuation commands configured to trigger one or more machine-actuated actions that replicate the user-actuated actions on the interface to cause automation of the task; and generate a training dataset to train an agent to automate the task, wherein the training dataset requires the agent to process, as input, the state of the interface prior to the execution of the task, and to generate, as output, the actuation commands.
Type: Grant
Filed: October 7, 2024
Date of Patent: October 7, 2025
Assignee: Anthropic, PBC
Inventors: Shaya Zarkesh, Lina Lukyantseva, Rohan Bavishi, David Luan, John Qian, Claire Pajot, Fred Bertsch, Erich Elsen, Curtis Hawthorne
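Illustrative note: the intercept, preserve, translate, and generate steps named in the abstract of 12437238 can be pictured with a minimal Python sketch. Everything in it is an assumption made for illustration and none of it comes from the patent: the `ToyInterface` class, the `translate` helper, the action dictionaries, and the command strings such as `click(x, y)` are all hypothetical.

```python
from typing import Dict, List


class ToyInterface:
    """Hypothetical stand-in for a real user interface (not from the patent)."""

    def __init__(self) -> None:
        self.state: Dict[str, str] = {"field": ""}

    def snapshot(self) -> Dict[str, str]:
        # Return a copy so later changes do not alter the preserved state.
        return dict(self.state)

    def apply(self, action: Dict[str, object]) -> None:
        # Only "type" actions change this toy state; clicks are no-ops here.
        if action["kind"] == "type":
            self.state["field"] += str(action["text"])


def translate(action: Dict[str, object]) -> str:
    """Turn a user-actuated action into a hypothetical actuation command."""
    if action["kind"] == "click":
        return f"click({action['x']}, {action['y']})"
    if action["kind"] == "type":
        return f"type({action['text']!r})"
    raise ValueError(f"unsupported action kind: {action['kind']}")


class Intermediary:
    """Sits between the user and the interface and records training examples."""

    def __init__(self, interface: ToyInterface) -> None:
        self.interface = interface
        self.dataset: List[Dict[str, object]] = []

    def handle(self, actions: List[Dict[str, object]]) -> List[str]:
        # Preserve the interface state before the task executes.
        state_before = self.interface.snapshot()
        # Translate the intercepted actions into actuation commands.
        commands = [translate(a) for a in actions]
        # One training example: prior state as input, commands as output.
        self.dataset.append({"input": state_before, "output": commands})
        # Forward the actions so the task still runs on the interface.
        for a in actions:
            self.interface.apply(a)
        return commands


ui = ToyInterface()
recorder = Intermediary(ui)
recorder.handle([{"kind": "click", "x": 10, "y": 20}, {"kind": "type", "text": "hi"}])
```

After the call, `recorder.dataset` holds one example pairing the pre-task snapshot with the two translated commands, while `ui.state` reflects the executed task.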
-
Patent number: 12430150
Abstract: A system for client-side implementation of an interface automation language at runtime includes agent specification logic and runtime interpretation logic. The agent specification logic, running on client-side, is configured to construct an agent specification, and to make the agent specification available for server-side translation into an intermediate representation, wherein the agent specification is configured to automate a multimodal interface workflow.
Type: Grant
Filed: October 8, 2024
Date of Patent: September 30, 2025
Assignee: Anthropic, PBC
Inventors: Rohan Bavishi, Lina Lukyantseva, Shaya Zarkesh, Kadhir Manickam, Jacob van Gogh, Frederick Robinson, Rick Liu, Vibhaa Sivaraman, Matthew Elkherj, Billy Wang, Armaan Goel, Bryan Schmidt, Erich Elsen, Curtis Hawthorne
-
Publication number: 20250299023
Abstract: A system for constructing prompts that cause an agent to automate multimodal interface workflows includes agent specification logic and agent calling logic. The agent specification logic is configured to construct agent specifications using prompts and agent functions, wherein the agent specifications are configured to automate a multimodal interface workflow. The agent calling logic is in communication with the agent specification logic and is configured to translate the agent specifications into agent calls that cause an agent to implement the agent functions to produce outputs that are responsive to the prompts.
Type: Application
Filed: October 8, 2024
Publication date: September 25, 2025
Applicant: Anthropic, PBC
Inventors: Lina Lukyantseva, Rohan Bavishi, Shaya Zarkesh, Kadhir Manickam, Jacob van Gogh, Rick Liu, Claire Pajot, Armaan Goel, Erich Elsen, Curtis Hawthorne
-
Data Flow Logic for Providing Artificial Intelligence Agents that Automate Multimodal Software Usage
Publication number: 20250299074
Abstract: A system for providing artificial intelligence agents that automate software usage includes training servers configured to train agents during training, production servers configured to execute the trained agents during inference, a plurality of training datasets, and data flow logic. The data flow logic is configured to provide, during the training, the agents and the plurality of training datasets to the training servers to cause the training servers to train the agents on the plurality of training datasets and thereby produce the trained agents, configure the production servers with the trained agents for use during the inference, provide, during the inference, prompts issued by clients to the production servers to cause the production servers to translate the prompts into agent calls to the trained agents that in turn cause the trained agents to generate outputs that are responsive to the prompts, and make the outputs available to the clients.
Type: Application
Filed: October 8, 2024
Publication date: September 25, 2025
Applicant: Anthropic, PBC
Inventors: Shaya Zarkesh, Lina Lukyantseva, Rohan Bavishi, David Luan, Zach Brock, Yufeng Zhou, Inigo Beitia Arevalo, Kadhir Manickam, Kyle Vigen, James Lu, Bryan Schmidt, Bryan Silverthorn, Armaan Goel, Kavya Ravi Shankar, Phillip Norman, Alexander Jaffe, Bassil Shama, Erich Elsen, Curtis Hawthorne, Sagnak Tasirlar, David Abrahams, Maxwell Nye, Augustus Odena, Vibhaa Sivaraman, Adam Hoff, Teddy Rothschild, Deepak Moparthi, Jacob van Gogh, Claire Pajot, Matt Elkherj, Warut Vijitbenjaronk, Arushi Somani, Johnny Lee, Joe Gershenson, Jordyn Shuell, Danielle Perszyk
-
Publication number: 20250299024
Abstract: A system for magnitude-invariant image-text agentic interface automation is disclosed. A bit vectorization logic is configured to convert image patches in a plurality of image patches into magnitude-invariant bit vectors, and generate a plurality of lines of magnitude-invariant bit vectors. A tokenization logic is configured to translate the input text sequence into a sequence of input text tokens, and to translate the successive lines of magnitude-invariant bit vectors interleaved with a newline character into a sequence of input magnitude-invariant bit vector tokens. A linear projection logic is configured to linearly project a single token stream of the sequence of input text tokens and the sequence of input magnitude-invariant bit vector tokens into a decoder-only Transformer logic, wherein the linear projection of the single token stream bypasses any embedding lookup.
Type: Application
Filed: October 8, 2024
Publication date: September 25, 2025
Applicant: Anthropic, PBC
Inventors: Curtis Hawthorne, Erich Elsen, Augustus Odena, Maxwell Nye, Arushi Somani, Kyle Vigen, Rohan Bavishi, Sagnak Tasirlar, Warut Vijitbenjaronk, Ulas Kirazci, Joe Gershenson, Shaya Zarkesh
-
Publication number: 20250299510
Abstract: A system for automating software usage includes an agent configured to automate software usage. The agent is trained on one or more training datasets. The one or more training datasets include one or more of a first training dataset including documents containing text interleaved with images, a second training dataset including text embedded in images, a third training dataset including recorded videos of software usage, a fourth training dataset including portable document format (PDF) documents, a fifth training dataset including recorded videos of software tool usage trajectories, a sixth training dataset including images of open-domain web pages, a seventh training dataset including images of specific-domain web pages, and/or an eighth training dataset including images of agentic trajectories of the agent performing interface automation task workflows.
Type: Application
Filed: October 8, 2024
Publication date: September 25, 2025
Applicant: Anthropic, PBC
Inventors: Sagnak Tasirlar, David Abrahams, Lina Lukyantseva, Erich Elsen, Maxwell Nye, Augustus Odena, Rohan Bavishi, Vibhaa Sivaraman, Adam Hoff, Teddy Rothschild, Shaya Zarkesh, Deepak Moparthi, Jacob van Gogh, Claire Pajot, Curtis Hawthorne, Matt Elkherj, Warut Vijitbenjaronk, Arushi Somani, Johnny Lee, Joe Gershenson, Jordyn Shuell, Danielle Perszyk
-
Publication number: 20250298641
Abstract: A system for client-side implementation of an interface automation language at runtime includes agent specification logic and runtime interpretation logic. The agent specification logic, running on client-side, is configured to construct an agent specification, and to make the agent specification available for server-side translation into an intermediate representation, wherein the agent specification is configured to automate a multimodal interface workflow.
Type: Application
Filed: October 8, 2024
Publication date: September 25, 2025
Applicant: Anthropic, PBC
Inventors: Rohan Bavishi, Lina Lukyantseva, Shaya Zarkesh, Kadhir Manickam, Jacob van Gogh, Frederick Robinson, Rick Liu, Vibhaa Sivaraman, Matthew Elkherj, Billy Wang, Armaan Goel, Bryan Schmidt, Erich Elsen, Curtis Hawthorne
-
Artificial Intelligence Agents to Automate Multimodal Interface Task Workflows
Publication number: 20250298495
Abstract: A system for interface automation includes an agent. The agent is configured to process an input that specifies an interface workflow, wherein the interface workflow is otherwise implementable by one or more user-actuated actions directed towards an interface by a user. The agent is also configured to generate an output that specifies a sequence of actuation commands, wherein the sequence of actuation commands triggers one or more machine-actuated actions that replicate the user-actuated actions on the interface and cause automation of the interface workflow.
Type: Application
Filed: October 8, 2024
Publication date: September 25, 2025
Applicant: Anthropic, PBC
Inventors: Rohan Bavishi, Lina Lukyantseva, Shaya Zarkesh, David Luan, Basil Safwat, Amelia Wattenberger, Kadhir Manickam, Inigo Beitia Arevalo, James Lu, Omkar Savant, Zach Brock, Jacob van Gogh, Rick Liu, Deepak Moparthi, Claire Pajot, Joe Gershenson, Arushi Somani, Armaan Goel, Kevin Keller, Erich Elsen, Curtis Hawthorne
-
Publication number: 20250299098
Abstract: A system for generating training data to train agents to automate tasks otherwise done by users includes an intermediary disposed between an interface and a user. The intermediary is configured to: intercept one or more user-actuated actions directed towards the interface by the user, wherein the user-actuated actions, if received by the interface, execute a task on the interface; preserve a state of the interface prior to the execution of the task; translate the user-actuated actions into one or more actuation commands, the actuation commands configured to trigger one or more machine-actuated actions that replicate the user-actuated actions on the interface to cause automation of the task; and generate a training dataset to train an agent to automate the task, wherein the training dataset requires the agent to process, as input, the state of the interface prior to the execution of the task, and to generate, as output, the actuation commands.
Type: Application
Filed: October 7, 2024
Publication date: September 25, 2025
Applicant: Anthropic, PBC
Inventors: Shaya Zarkesh, Lina Lukyantseva, Rohan Bavishi, David Luan, John Qian, Claire Pajot, Fred Bertsch, Erich Elsen, Curtis Hawthorne
-
Patent number: 12387036
Abstract: A system for image-text agentic interface automation is disclosed. A multimodal agent is configured to process arbitrary-length text sequences and arbitrary-resolution images. A newline insertion logic is configured to interleave a newline character between successive lines of image patches in a plurality of lines of image patches, wherein the newline character specifies an end of a line in an input image. A tokenization logic is configured to translate the input text sequence into a sequence of input text tokens, and to translate the successive lines of image patches interleaved with the newline character into a sequence of input image tokens. A linear projection logic is configured to linearly project a single token stream of the sequence of input text tokens and the sequence of input image tokens into a decoder-only Transformer logic, wherein the linear projection of the single token stream bypasses any embedding lookup.
Type: Grant
Filed: October 8, 2024
Date of Patent: August 12, 2025
Assignee: Anthropic, PBC
Inventors: Erich Elsen, Curtis Hawthorne, Augustus Odena, Maxwell Nye, Arushi Somani, Kyle Vigen, Rohan Bavishi, Sagnak Tasirlar, Warut Vijitbenjaronk, Ulas Kirazci, Joe Gershenson, Shaya Zarkesh
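Illustrative note: the newline interleaving and linear projection described in the abstract of 12387036 can be pictured with a minimal pure-Python sketch. It is written under stated assumptions and is not the patented implementation: the `NEWLINE` marker name, the grayscale list-of-rows image layout, the patch size, and the weight matrix are all hypothetical.

```python
from typing import List, Union

NEWLINE = "<image-newline>"  # hypothetical marker name, not from the patent

Patch = List[float]


def image_to_patch_stream(image: List[List[float]], patch: int) -> List[Union[Patch, str]]:
    """Split a 2-D image into flattened patch-by-patch tiles, line by line,
    appending NEWLINE after each line of patches to mark the end of that line."""
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("image sides must be multiples of the patch size")
    stream: List[Union[Patch, str]] = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            tile = [
                image[top + dy][left + dx]
                for dy in range(patch)
                for dx in range(patch)
            ]
            stream.append(tile)
        stream.append(NEWLINE)
    return stream


def project(tile: Patch, weights: List[List[float]]) -> List[float]:
    """Linearly project a flattened patch with a weight matrix.
    The patch values are multiplied directly; no embedding table is indexed."""
    return [sum(w * x for w, x in zip(row, tile)) for row in weights]


# A 4x6 toy image whose pixel value encodes its position (row * 6 + column).
image = [[float(r * 6 + c) for c in range(6)] for r in range(4)]
stream = image_to_patch_stream(image, patch=2)
first_vector = project(stream[0], [[1.0, 1.0, 1.0, 1.0], [1.0, 0.0, 0.0, 0.0]])
```

Because the marker follows every line of patches, the image width can be recovered from the stream itself, which is one way a model could handle images of differing resolution without a fixed grid size.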