Patents by Inventor Behnam Robatmili
Behnam Robatmili has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 11194625
Abstract: For one embodiment of the present invention, methods and systems for accelerating data operations with efficient memory management in native code and native dynamic class loading mechanisms are disclosed. In one embodiment, a data processing system comprises memory and a processing unit coupled to the memory. The processing unit is configured to receive input data, to execute a domain specific language (DSL) for a DSL operation with a native implementation, to translate a user defined function (UDF) into the native implementation by translating user defined managed software code into native software code, to execute the native software code in the native implementation, and to utilize a native memory management mechanism for the memory to manage object instances in the native implementation.
Type: Grant
Filed: December 3, 2019
Date of Patent: December 7, 2021
Assignee: BIGSTREAM SOLUTIONS, INC.
Inventors: Weiwei Chen, Behnam Robatmili, Maysam Lavasani, John David Davis
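The native memory management mechanism described above can be pictured as a pooled, region-style allocator that manages object instances for a DSL operation outside the managed (garbage-collected) heap. The sketch below is a minimal toy model under that assumption; the class and method names are hypothetical and do not come from the patent.

```python
class NativeArena:
    """Toy model of a native memory pool managing object instances
    outside a managed heap. All names here are hypothetical."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.used = 0
        self.objects = []

    def allocate(self, name, size):
        # Fail fast when the pool is exhausted instead of triggering
        # a managed-heap garbage collection.
        if self.used + size > self.capacity:
            raise MemoryError("arena exhausted")
        self.used += size
        self.objects.append((name, size))
        return name

    def release_all(self):
        # Region-style deallocation: every instance created for one
        # DSL operation is freed in a single step.
        self.objects.clear()
        self.used = 0

arena = NativeArena(capacity=1024)
arena.allocate("row_buffer", 512)
arena.allocate("udf_scratch", 256)
print(arena.used)   # 768
arena.release_all()
print(arena.used)   # 0
```

Releasing a whole region at once is the usual reason native pools beat per-object managed allocation for short-lived query state.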
-
Publication number: 20200301898
Abstract: Methods and systems are disclosed for accelerating Big Data operations by utilizing subgraph templates for a hardware accelerator of a computational storage device. In one example, a computer-implemented method comprises performing a query with a dataflow compiler; performing a stage acceleration analyzer function, including executing a matching algorithm to determine similarities between sub-graphs of an application program and unique templates from an available library of templates; and selecting at least one template that at least partially matches the sub-graphs, with the at least one template being associated with a linear set of operators to be executed sequentially within a stage of the Big Data operations.
Type: Application
Filed: June 10, 2020
Publication date: September 24, 2020
Applicant: BigStream Solutions, Inc.
Inventors: Balavinayagam Samynathan, Keith Chapman, Mehdi Nik, Behnam Robatmili, Shahrzad Mirkhani, Maysam Lavasani, John David Davis, Danesh Tavana, Weiwei Chen
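The matching step can be sketched as comparing a stage's linear operator chain against a template library and picking the template that covers the longest prefix of the chain. The template names and operator vocabulary below are illustrative assumptions, not taken from the patent.

```python
# Hypothetical accelerator template library: name -> linear operator chain.
TEMPLATE_LIBRARY = {
    "filter_project": ["filter", "project"],
    "filter_project_agg": ["filter", "project", "aggregate"],
    "scan_filter": ["scan", "filter"],
}

def best_template(stage_ops):
    """Return the template covering the longest prefix of the stage's
    linear operator chain (a partial match in the patent's sense)."""
    best_name, best_len = None, 0
    for name, ops in TEMPLATE_LIBRARY.items():
        if stage_ops[:len(ops)] == ops and len(ops) > best_len:
            best_name, best_len = name, len(ops)
    return best_name

stage = ["filter", "project", "aggregate", "sort"]
print(best_template(stage))  # filter_project_agg
```

A longest-prefix criterion is one simple similarity measure; a production matcher would score arbitrary sub-graph overlap rather than only prefixes.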
-
Publication number: 20200183749
Abstract: For one embodiment of the present invention, methods and systems for accelerating data operations with efficient memory management in native code and native dynamic class loading mechanisms are disclosed. In one embodiment, a data processing system comprises memory and a processing unit coupled to the memory. The processing unit is configured to receive input data, to execute a domain specific language (DSL) for a DSL operation with a native implementation, to translate a user defined function (UDF) into the native implementation by translating user defined managed software code into native software code, to execute the native software code in the native implementation, and to utilize a native memory management mechanism for the memory to manage object instances in the native implementation.
Type: Application
Filed: December 3, 2019
Publication date: June 11, 2020
Applicant: BigStream Solutions, Inc.
Inventors: Weiwei Chen, Behnam Robatmili, Maysam Lavasani, John David Davis
-
Publication number: 20190392002
Abstract: Methods and systems are disclosed for accelerating big data operations by utilizing subgraph templates. In one example, a data processing system comprises a hardware processor and a hardware accelerator coupled to the hardware processor. The hardware accelerator is configured with a compiler of an accelerator functionality to generate an execution plan, to generate computations for nodes including subgraphs in a distributed system for an application program based on the execution plan, and to execute a matching algorithm to determine similarities between the subgraphs and unique templates from an available library of templates.
Type: Application
Filed: June 25, 2019
Publication date: December 26, 2019
Applicant: BigStream Solutions, Inc.
Inventors: Maysam Lavasani, John David Davis, Danesh Tavana, Weiwei Chen, Balavinayagam Samynathan, Behnam Robatmili
-
Patent number: 10169105
Abstract: Aspects include computing devices, systems, and methods for implementing scheduling and execution of lightweight kernels as simple tasks directly by a thread without setting up a task structure. A computing device may determine whether a task pointer in a task queue is a simple task pointer for the lightweight kernel. The computing device may schedule a first simple task for the lightweight kernel for execution by the thread. The computing device may retrieve, from an entry of a simple task table, a kernel pointer for the lightweight kernel. The entry in the simple task table may be associated with the simple task pointer. The computing device may directly execute the lightweight kernel as the simple task.
Type: Grant
Filed: January 11, 2016
Date of Patent: January 1, 2019
Assignee: QUALCOMM Incorporated
Inventors: Han Zhao, Pablo Montesinos Ortego, Arun Raman, Behnam Robatmili, Gheorghe Calin Cascaval
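The flow above can be modeled in a few lines: a worker thread pops task pointers from a queue, and when a pointer is tagged as a simple task it looks up the kernel pointer in a simple task table and runs the kernel directly, skipping task-structure setup. The table layout and tags here are hypothetical.

```python
from collections import deque

# Hypothetical simple task table: simple task pointer -> kernel pointer.
simple_task_table = {0: lambda: "kernel_a ran", 1: lambda: "kernel_b ran"}

def worker(queue):
    """Drain a task queue, executing lightweight kernels directly as
    simple tasks; full tasks are modeled as plain callables."""
    results = []
    while queue:
        tag, payload = queue.popleft()
        if tag == "simple":
            # Direct dispatch: no task structure is ever set up.
            results.append(simple_task_table[payload]())
        else:
            results.append(payload())
    return results

q = deque([("simple", 0), ("full", lambda: "heavy task ran"), ("simple", 1)])
print(worker(q))  # ['kernel_a ran', 'heavy task ran', 'kernel_b ran']
```

The saving in the real design is the avoided allocation and initialization of a full task descriptor for kernels that finish in microseconds.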
-
Patent number: 9740504
Abstract: Aspects include apparatuses, systems, and methods for hardware acceleration for inline caches in dynamic languages. An inline cache may be initialized for an instance of a dynamic software operation. A call of an initialized instance of the dynamic software operation may be executed by an inline cache hardware accelerator. The inline cache may be checked to determine that its data is current. When the data is current, the initialized instance of the dynamic software operation may be executed using the related inline cache data. When the data is not current, a new inline cache may be initialized for the instance of the dynamic software operation, including the not current data of a previously initialized instance of the dynamic software operation. The inline cache hardware accelerator may include an inline cache memory, a coprocessor, and/or a functional unit of an inline cache pipeline connected to a processor pipeline.
Type: Grant
Filed: April 28, 2014
Date of Patent: August 22, 2017
Assignee: QUALCOMM Incorporated
Inventors: Behnam Robatmili, Gheorghe Calin Cascaval, Madhukar Nagaraja Kedlaya, Dario Suarez Gracia
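The check-then-dispatch behavior of an inline cache is easy to show in software: a dynamic property lookup caches the object's layout ("hidden class") and the resolved slot, hits when the layout is unchanged, and re-initializes on a miss. This is a generic software sketch of the technique, not the patented hardware design; the hardware accelerator would perform the same check in an inline cache pipeline.

```python
class InlineCache:
    """Per-call-site cache for one instance of a dynamic operation."""
    def __init__(self):
        self.shape = None   # cached object layout
        self.slot = None    # cached lookup result

def get_prop(cache, obj, name):
    shape = tuple(sorted(obj))        # stand-in for a hidden class
    if cache.shape == shape:          # cache data is current: fast path
        return obj[cache.slot]
    # Data is not current: re-initialize the cache for this instance.
    cache.shape, cache.slot = shape, name
    return obj[name]

ic = InlineCache()
print(get_prop(ic, {"x": 1, "y": 2}, "x"))  # 1 (miss, fills the cache)
print(get_prop(ic, {"x": 5, "y": 6}, "x"))  # 5 (hit, same layout)
```

Moving this check into a dedicated functional unit or coprocessor is what removes it from the main processor pipeline's critical path.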
-
Patent number: 9710388
Abstract: Aspects include computing devices, systems, and methods for hardware acceleration for inline caches in dynamic languages. An inline cache may be initialized for an instance of a dynamic software operation. A call of an initialized instance of the dynamic software operation may be executed by an inline cache hardware accelerator. The inline cache may be checked to determine that its data is current. When the data is current, the initialized instance of the dynamic software operation may be executed using the related inline cache data. When the data is not current, a new inline cache may be initialized for the instance of the dynamic software operation, including the not current data of a previously initialized instance of the dynamic software operation. The inline cache hardware accelerator may include an inline cache memory, a coprocessor, and/or a functional unit of an inline cache pipeline connected to a processor pipeline.
Type: Grant
Filed: April 28, 2014
Date of Patent: July 18, 2017
Assignee: QUALCOMM Incorporated
Inventors: Behnam Robatmili, Gheorghe Calin Cascaval, Madhukar Nagaraja Kedlaya, Dario Suarez Gracia
-
Publication number: 20170083827
Abstract: Embodiments include computing devices, apparatus, and methods implemented by the apparatus for accelerating machine learning on a computing device. Raw data may be received in the computing device from a raw data source device. The apparatus may identify key features as two dimensional matrices of the raw data such that the key features are mutually exclusive from each other. The key features may be translated into key feature vectors. The computing device may generate a feature vector from at least one of the key feature vectors. The computing device may receive a first partial output resulting from an execution of a basic linear algebra subprogram (BLAS) operation using the feature vector and a weight factor. The first partial output may be combined with a plurality of partial outputs to produce an output matrix. Receiving the raw data on the computing device may include receiving streaming raw data.
Type: Application
Filed: September 23, 2015
Publication date: March 23, 2017
Inventors: Behnam Robatmili, Matthew Leslie Badin, Dario Suárez Gracia, Gheorghe Calin Cascaval, Nayeem Islam
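The pipeline described above (key features as small 2-D matrices, flattened into a feature vector, fed through a BLAS operation against a weight factor, partial outputs combined into an output matrix) can be sketched end to end. The data and weights below are made up, and the GEMV-style routine is written in plain Python for self-containment.

```python
def matvec(matrix, vec):
    """GEMV-style BLAS Level 2 operation, here in plain Python."""
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]

# Hypothetical key features extracted as mutually exclusive 2-D matrices.
key_features = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]

# Translate them into key feature vectors and concatenate into one
# feature vector.
feature_vector = [x for kf in key_features for row in kf for x in row]

weights = [[1, 0, 0, 0, 0, 0, 0, 1]]       # one output neuron (illustrative)
partial = matvec(weights, feature_vector)  # a first partial output
output_matrix = [partial]                  # combine partial outputs
print(output_matrix)  # [[9]]
```

On real hardware the matvec call would be a tuned BLAS routine (or an accelerator offload), and several such partial outputs, one per weight block, would be stitched into the output matrix.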
-
Publication number: 20170031728
Abstract: Aspects include computing devices, systems, and methods for implementing scheduling and execution of lightweight kernels as simple tasks directly by a thread without setting up a task structure. A computing device may determine whether a task pointer in a task queue is a simple task pointer for the lightweight kernel. The computing device may schedule a first simple task for the lightweight kernel for execution by the thread. The computing device may retrieve, from an entry of a simple task table, a kernel pointer for the lightweight kernel. The entry in the simple task table may be associated with the simple task pointer. The computing device may directly execute the lightweight kernel as the simple task.
Type: Application
Filed: January 11, 2016
Publication date: February 2, 2017
Inventors: Han Zhao, Pablo Montesinos Ortego, Arun Raman, Behnam Robatmili, Gheorghe Calin Cascaval
-
Patent number: 9529643
Abstract: A computing device (e.g., a mobile computing device, etc.) may be configured to better exploit the concurrency and parallelism enabled by modern multiprocessor architectures by identifying a sequence of tasks via a task dependency controller, commencing execution of a first task in the sequence of tasks, and setting a value of a register so that each remaining task in the sequence of tasks executes after its predecessor task finishes execution without transferring control to a runtime system of the computing device. The task dependency controller may be a hardware component that is shared by the processor cores and/or otherwise configured to transfer control between tasks executing on different processor cores independent of the runtime system and/or without performing the relatively slow and memory-based inter-task, inter-thread or inter-process communications required by conventional solutions.
Type: Grant
Filed: January 26, 2015
Date of Patent: December 27, 2016
Assignee: QUALCOMM Incorporated
Inventors: Arun Raman, Behnam Robatmili
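The register-driven hand-off can be modeled as a controller holding one register, the index of the next task, that is advanced when a task completes, so control passes straight to the successor without returning to a runtime scheduler. The class below is a toy single-threaded model with hypothetical names; the patented controller is shared hardware operating across cores.

```python
class TaskDependencyController:
    """Toy model of a shared register that chains a task sequence."""

    def __init__(self, tasks):
        self.tasks = tasks
        self.next_task = 0   # "the register": index of the task to run

    def run(self):
        log = []
        while self.next_task < len(self.tasks):
            log.append(self.tasks[self.next_task]())
            # Advancing the register transfers control directly to the
            # successor; no runtime-system round trip, no IPC.
            self.next_task += 1
        return log

tdc = TaskDependencyController([lambda: "t0", lambda: "t1", lambda: "t2"])
print(tdc.run())  # ['t0', 't1', 't2']
```

The point of doing this in hardware is that the hand-off costs a register update rather than a memory-based inter-thread notification.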
-
Patent number: 9501328
Abstract: Embodiments include computing devices, systems, and methods for task-based handling of repetitive processes in parallel. At least one processor of the computing device, or a specialized hardware controller, may be configured to partition iterations of a repetitive process and assign the partitions to initialized tasks to be executed in parallel by a plurality of processor cores. Upon completing a task, remaining divisible partitions of the repetitive process of ongoing tasks may be subpartitioned and assigned to the ongoing task, and the completed task or a newly initialized task. Information about the iteration space for a repetitive process may be stored in a descriptor table, and status information for all partitions of a repetitive process stored in a status table. Each processor core may have an associated local table that tracks iteration execution of each task, and is synchronized with the status table.
Type: Grant
Filed: March 30, 2015
Date of Patent: November 22, 2016
Assignee: QUALCOMM Incorporated
Inventors: Behnam Robatmili, Shaizeen Dilawarhusen Aga, Dario Suarez Gracia, Arun Raman, Aravind Natarajan, Gheorghe Calin Cascaval, Pablo Montesinos Ortego, Han Zhao
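The subpartitioning step has a work-stealing flavor: when a task finishes, the remaining iteration range of an ongoing task is split, half staying with the ongoing task and half going to the idle one. The status-table layout below (task id mapped to a remaining half-open range) is an illustrative assumption.

```python
def split_range(lo, hi):
    """Subpartition a remaining [lo, hi) iteration range in half."""
    mid = (lo + hi) // 2
    return (lo, mid), (mid, hi)

# Hypothetical status table: task id -> remaining [lo, hi) iterations.
status = {0: (0, 100), 1: (100, 200)}

# Task 1 completes its partition early; the divisible remainder of
# ongoing task 0 is subpartitioned between the two tasks.
keep, stolen = split_range(*status[0])
status[0], status[1] = keep, stolen
print(status)  # {0: (0, 50), 1: (50, 100)}
```

In the patented scheme the split is coordinated through the shared status table, with each core's local table tracking its own iteration progress and synchronizing back.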
-
Publication number: 20160291981
Abstract: Removing invalid literal load values, and related circuits, methods, and computer-readable media are disclosed. In one aspect, an instruction processing circuit provides a literal load table containing one or more entries comprising an address and a cached literal load value. Upon detecting a literal load instruction in an instruction stream, the instruction processing circuit determines whether the literal load table contains an entry having an address of the literal load instruction. If so, the instruction processing circuit removes the literal load instruction from the instruction stream, and provides the cached literal load value stored in the entry to at least one dependent instruction. The instruction processing circuit further determines whether an invalidity indicator for the literal load table has been received. If so, the instruction processing circuit flushes the literal load table. The invalidity indicator may be generated responsive to modification of a constant table.
Type: Application
Filed: April 6, 2015
Publication date: October 6, 2016
Inventors: Behnam Robatmili, Gheorghe Calin Cascaval, Michael William Morrow, Derek Jay Conrod, Bohuslav Rychlik
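The literal load table's behavior can be traced with a small software model: on the first literal load at an address the value is cached, on a repeat the instruction is removed from the stream and its value forwarded to dependents, and modifying the constant table flushes everything. The instruction encoding and table shape here are invented for illustration.

```python
# Literal load table: instruction address -> cached literal value.
llt = {}

def process(stream, constants):
    """Model one pass over an instruction stream of (addr, op, arg)
    tuples, removing literal loads whose value is already cached."""
    out = []
    for addr, op, arg in stream:
        if op == "lit_load":
            if addr in llt:
                # Instruction removed; cached value forwarded instead.
                out.append(("forward", llt[addr]))
                continue
            llt[addr] = constants[arg]
            out.append((op, constants[arg]))
        else:
            out.append((op, arg))
    return out

constants = {"k0": 42}
stream = [(0, "lit_load", "k0"), (4, "add", 1), (0, "lit_load", "k0")]
print(process(stream, constants))

# An invalidity indicator (e.g. the constant table was modified)
# flushes the whole literal load table.
llt.clear()
```

The flush-on-modification rule is what keeps the optimization safe when self-modifying or dynamically patched code rewrites the constant pool.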
-
Publication number: 20160292012
Abstract: Embodiments include computing devices, systems, and methods for task-based handling of repetitive processes in parallel. At least one processor of the computing device, or a specialized hardware controller, may be configured to partition iterations of a repetitive process and assign the partitions to initialized tasks to be executed in parallel by a plurality of processor cores. Upon completing a task, remaining divisible partitions of the repetitive process of ongoing tasks may be subpartitioned and assigned to the ongoing task, and the completed task or a newly initialized task. Information about the iteration space for a repetitive process may be stored in a descriptor table, and status information for all partitions of a repetitive process stored in a status table. Each processor core may have an associated local table that tracks iteration execution of each task, and is synchronized with the status table.
Type: Application
Filed: March 30, 2015
Publication date: October 6, 2016
Inventors: Behnam Robatmili, Shaizeen Dilawarhusen Aga, Dario Suarez Gracia, Arun Raman, Aravind Natarajan, Gheorghe Calin Cascaval, Pablo Montesinos Ortego, Han Zhao
-
Publication number: 20160217016
Abstract: A computing device (e.g., a mobile computing device, etc.) may be configured to better exploit the concurrency and parallelism enabled by modern multiprocessor architectures by identifying a sequence of tasks via a task dependency controller, commencing execution of a first task in the sequence of tasks, and setting a value of a register so that each remaining task in the sequence of tasks executes after its predecessor task finishes execution without transferring control to a runtime system of the computing device. The task dependency controller may be a hardware component that is shared by the processor cores and/or otherwise configured to transfer control between tasks executing on different processor cores independent of the runtime system and/or without performing the relatively slow and memory-based inter-task, inter-thread or inter-process communications required by conventional solutions.
Type: Application
Filed: January 26, 2015
Publication date: July 28, 2016
Inventors: Arun Raman, Behnam Robatmili
-
Publication number: 20160216969
Abstract: Systems and methods for adaptively managing registers in an instruction processor are disclosed. The system identifies one or more registers with inoperable cells. An operand manager identifies a set of operable cells within the one or more registers with inoperable cells and determines if a present instruction will use an operand that can be supported by the set of operable cells. When the set of operable cells can support the operand, the operand manager generates an assignment which is communicated to a register file manager.
Type: Application
Filed: January 28, 2015
Publication date: July 28, 2016
Inventors: Dario Suarez Gracia, Behnam Robatmili
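The operand manager's test reduces to a bit check: an operand is supported by a degraded register if every significant bit of the operand lands in an operable cell. The sketch below models an 8-bit register file with one partially faulty register; the register names and masks are hypothetical.

```python
# Register id -> bitmask of operable cells (1 = usable).
# r1's upper four cells are inoperable; r0 is fully operable.
operable = {"r1": 0b00001111, "r0": 0b11111111}

def assign_register(value):
    """Operand-manager sketch: pick the first register whose operable
    cells can hold every set bit of the operand."""
    for reg, mask in operable.items():
        if value & ~mask == 0:   # no set bit falls in an inoperable cell
            return reg
    return None

print(assign_register(0x0F))  # 'r1': a narrow operand fits the degraded register
print(assign_register(0xF0))  # 'r0': upper bits need the fully operable one
```

Preferring degraded registers for narrow operands, as the iteration order above does, keeps fully operable registers free for wide values, which is the adaptive part of the scheme.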
-
Publication number: 20150205720
Abstract: Aspects include apparatuses, systems, and methods for hardware acceleration for inline caches in dynamic languages. An inline cache may be initialized for an instance of a dynamic software operation. A call of an initialized instance of the dynamic software operation may be executed by an inline cache hardware accelerator. The inline cache may be checked to determine that its data is current. When the data is current, the initialized instance of the dynamic software operation may be executed using the related inline cache data. When the data is not current, a new inline cache may be initialized for the instance of the dynamic software operation, including the not current data of a previously initialized instance of the dynamic software operation. The inline cache hardware accelerator may include an inline cache memory, a coprocessor, and/or a functional unit of an inline cache pipeline connected to a processor pipeline.
Type: Application
Filed: April 28, 2014
Publication date: July 23, 2015
Applicant: QUALCOMM Incorporated
Inventors: Behnam Robatmili, Gheorghe Calin Cascaval, Madhukar Nagaraja Kedlaya, Dario Suarez Gracia
-
Publication number: 20150205726
Abstract: Aspects include computing devices, systems, and methods for hardware acceleration for inline caches in dynamic languages. An inline cache may be initialized for an instance of a dynamic software operation. A call of an initialized instance of the dynamic software operation may be executed by an inline cache hardware accelerator. The inline cache may be checked to determine that its data is current. When the data is current, the initialized instance of the dynamic software operation may be executed using the related inline cache data. When the data is not current, a new inline cache may be initialized for the instance of the dynamic software operation, including the not current data of a previously initialized instance of the dynamic software operation. The inline cache hardware accelerator may include an inline cache memory, a coprocessor, and/or a functional unit of an inline cache pipeline connected to a processor pipeline.
Type: Application
Filed: April 28, 2014
Publication date: July 23, 2015
Applicant: QUALCOMM Incorporated
Inventors: Behnam Robatmili, Gheorghe Calin Cascaval, Madhukar Nagaraja Kedlaya, Dario Suarez Gracia
-
Publication number: 20140173556
Abstract: Systems, methods, and devices for executing a function in a dynamically-typed language are described herein. In one aspect, a method includes generating a function selection decision tree based on one or more specializations of a generic function and one or more function inputs via an electronic device. The method further includes selecting one of the specializations or the generic function based on an input type of at least one function input via the electronic device. The method further includes calling the selected specialization or generic function via the electronic device. Another aspect of the subject matter described in the disclosure provides a method of executing a function in a prototype-based dynamically-typed language. The method includes maintaining a list of calls to one or more specializations of the function via the electronic device. The method further includes creating or destroying a specialization of the function via the electronic device.
Type: Application
Filed: November 18, 2013
Publication date: June 19, 2014
Applicant: QUALCOMM Incorporated
Inventors: Behnam Robatmili, Derek Jay Conrod, Mohammad Hossein Reshadi, Subrato Kumar De, Gheorghe Calin Cascaval
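The selection step described above amounts to branching on the input's type: if a specialization exists for that type, call it; otherwise fall back to the generic function. The sketch below is a one-level "decision tree" over a hypothetical specialization table, not the patented implementation.

```python
def generic_add(a, b):
    """Generic function: handles any type supporting '+'."""
    return a + b

# Hypothetical specialization table: input type -> compiled specialization.
specializations = {
    int: lambda a, b: a + b,   # stands in for a type-specialized body
    str: lambda a, b: a + b,
}

def dispatch(a, b):
    """Function selection: branch on the first input's type, calling a
    specialization when one exists and the generic function otherwise."""
    fn = specializations.get(type(a), generic_add)
    return fn(a, b)

print(dispatch(2, 3))       # 5, via the int specialization
print(dispatch(2.5, 0.5))   # 3.0, via the generic fallback
```

A real system would also track call counts per specialization, as the abstract's second aspect describes, creating hot specializations and destroying cold ones.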