Patents by Inventor Behnam Robatmili

Behnam Robatmili has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11194625
    Abstract: For one embodiment of the present invention, methods and systems for accelerating data operations with efficient memory management in native code and native dynamic class loading mechanisms are disclosed. In one embodiment, a data processing system comprises memory and a processing unit coupled to the memory. The processing unit is configured to receive input data, to execute a domain specific language (DSL) for a DSL operation with a native implementation, to translate a user defined function (UDF) into the native implementation by translating user defined managed software code into native software code, to execute the native software code in the native implementation, and to utilize a native memory management mechanism for the memory to manage object instances in the native implementation.
    Type: Grant
    Filed: December 3, 2019
    Date of Patent: December 7, 2021
    Assignee: BigStream Solutions, Inc.
    Inventors: Weiwei Chen, Behnam Robatmili, Maysam Lavasani, John David Davis
  • Publication number: 20200301898
    Abstract: Methods and systems are disclosed for accelerating Big Data operations by utilizing subgraph templates for a hardware accelerator of a computational storage device. In one example, a computer-implemented method comprises performing a query with a dataflow compiler, performing a stage acceleration analyzer function including executing a matching algorithm to determine similarities between sub-graphs of an application program and unique templates from an available library of templates; and selecting at least one template that at least partially matches the sub-graphs with the at least one template being associated with a linear set of operators to be executed sequentially within a stage of the Big Data operations.
    Type: Application
    Filed: June 10, 2020
    Publication date: September 24, 2020
    Applicant: BigStream Solutions, Inc.
    Inventors: Balavinayagam Samynathan, Keith Chapman, Mehdi Nik, Behnam Robatmili, Shahrzad Mirkhani, Maysam Lavasani, John David Davis, Danesh Tavana, Weiwei Chen
  • Publication number: 20200183749
    Abstract: For one embodiment of the present invention, methods and systems for accelerating data operations with efficient memory management in native code and native dynamic class loading mechanisms are disclosed. In one embodiment, a data processing system comprises memory and a processing unit coupled to the memory. The processing unit is configured to receive input data, to execute a domain specific language (DSL) for a DSL operation with a native implementation, to translate a user defined function (UDF) into the native implementation by translating user defined managed software code into native software code, to execute the native software code in the native implementation, and to utilize a native memory management mechanism for the memory to manage object instances in the native implementation.
    Type: Application
    Filed: December 3, 2019
    Publication date: June 11, 2020
    Applicant: BigStream Solutions, Inc.
    Inventors: Weiwei Chen, Behnam Robatmili, Maysam Lavasani, John David Davis
  • Publication number: 20190392002
    Abstract: Methods and systems are disclosed for accelerating big data operations by utilizing subgraph templates. In one example, a data processing system comprises a hardware processor and a hardware accelerator coupled to the hardware processor. The hardware accelerator is configured with a compiler of an accelerator functionality to generate an execution plan, to generate computations for nodes including subgraphs in a distributed system for an application program based on the execution plan, and to execute a matching algorithm to determine similarities between the subgraphs and unique templates from an available library of templates.
    Type: Application
    Filed: June 25, 2019
    Publication date: December 26, 2019
    Applicant: BigStream Solutions, Inc.
    Inventors: Maysam Lavasani, John David Davis, Danesh Tavana, Weiwei Chen, Balavinayagam Samynathan, Behnam Robatmili
  • Patent number: 10169105
    Abstract: Aspects include computing devices, systems, and methods for implementing scheduling and execution of lightweight kernels as simple tasks directly by a thread without setting up a task structure. A computing device may determine whether a task pointer in a task queue is a simple task pointer for the lightweight kernel. The computing device may schedule a first simple task for the lightweight kernel for execution by the thread. The computing device may retrieve, from an entry of a simple task table, a kernel pointer for the lightweight kernel. The entry in the simple task table may be associated with the simple task pointer. The computing device may directly execute the lightweight kernel as the simple task.
    Type: Grant
    Filed: January 11, 2016
    Date of Patent: January 1, 2019
    Assignee: QUALCOMM Incorporated
    Inventors: Han Zhao, Pablo Montesinos Ortego, Arun Raman, Behnam Robatmili, Gheorghe Calin Cascaval
  • Patent number: 9740504
    Abstract: Aspects include apparatuses, systems, and methods for hardware acceleration for inline caches in dynamic languages. An inline cache may be initialized for an instance of a dynamic software operation. A call of an initialized instance of the dynamic software operation may be executed by an inline cache hardware accelerator. The inline cache may be checked to determine that its data is current. When the data is current, the initialized instance of the dynamic software operation may be executed using the related inline cache data. When the data is not current, a new inline cache may be initialized for the instance of the dynamic software operation, including the not current data of a previously initialized instance of the dynamic software operation. The inline cache hardware accelerator may include an inline cache memory, a coprocessor, and/or a functional unit on an inline cache pipeline connected to a processor pipeline.
    Type: Grant
    Filed: April 28, 2014
    Date of Patent: August 22, 2017
    Assignee: QUALCOMM Incorporated
    Inventors: Behnam Robatmili, Gheorghe Calin Cascaval, Madhukar Nagaraja Kedlaya, Dario Suarez Gracia
  • Patent number: 9710388
    Abstract: Aspects include computing devices, systems, and methods for hardware acceleration for inline caches in dynamic languages. An inline cache may be initialized for an instance of a dynamic software operation. A call of an initialized instance of the dynamic software operation may be executed by an inline cache hardware accelerator. The inline cache may be checked to determine that its data is current. When the data is current, the initialized instance of the dynamic software operation may be executed using the related inline cache data. When the data is not current, a new inline cache may be initialized for the instance of the dynamic software operation, including the not current data of a previously initialized instance of the dynamic software operation. The inline cache hardware accelerator may include an inline cache memory, a coprocessor, and/or a functional unit on an inline cache pipeline connected to a processor pipeline.
    Type: Grant
    Filed: April 28, 2014
    Date of Patent: July 18, 2017
    Assignee: QUALCOMM Incorporated
    Inventors: Behnam Robatmili, Gheorghe Calin Cascaval, Madhukar Nagaraja Kedlaya, Dario Suarez Gracia
  • Publication number: 20170083827
    Abstract: Embodiments include computing devices, apparatus, and methods implemented by the apparatus for accelerating machine learning on a computing device. Raw data may be received in the computing device from a raw data source device. The apparatus may identify key features as two dimensional matrices of the raw data such that the key features are mutually exclusive from each other. The key features may be translated into key feature vectors. The computing device may generate a feature vector from at least one of the key feature vectors. The computing device may receive a first partial output resulting from an execution of a basic linear algebra subprogram (BLAS) operation using the feature vector and a weight factor. The first partial output may be combined with a plurality of partial outputs to produce an output matrix. Receiving the raw data on the computing device may include receiving streaming raw data.
    Type: Application
    Filed: September 23, 2015
    Publication date: March 23, 2017
    Inventors: Behnam Robatmili, Matthew Leslie Badin, Dario Suárez Gracia, Gheorghe Calin Cascaval, Nayeem Islam
  • Publication number: 20170031728
    Abstract: Aspects include computing devices, systems, and methods for implementing scheduling and execution of lightweight kernels as simple tasks directly by a thread without setting up a task structure. A computing device may determine whether a task pointer in a task queue is a simple task pointer for the lightweight kernel. The computing device may schedule a first simple task for the lightweight kernel for execution by the thread. The computing device may retrieve, from an entry of a simple task table, a kernel pointer for the lightweight kernel. The entry in the simple task table may be associated with the simple task pointer. The computing device may directly execute the lightweight kernel as the simple task.
    Type: Application
    Filed: January 11, 2016
    Publication date: February 2, 2017
    Inventors: Han Zhao, Pablo Montesinos Ortego, Arun Raman, Behnam Robatmili, Gheorghe Calin Cascaval
  • Patent number: 9529643
    Abstract: A computing device (e.g., a mobile computing device, etc.) may be configured to better exploit the concurrency and parallelism enabled by modern multiprocessor architectures by identifying a sequence of tasks via a task dependency controller, commencing execution of a first task in the sequence of tasks, and setting a value of a register so that each remaining task in the sequence of tasks executes after its predecessor task finishes execution without transferring control to a runtime system of the computing device. The task dependency controller may be a hardware component that is shared by the processor cores and/or otherwise configured to transfer control between tasks executing on different processor cores independent of the runtime system and/or without performing the relatively slow and memory-based inter-task, inter-thread or inter-process communications required by conventional solutions.
    Type: Grant
    Filed: January 26, 2015
    Date of Patent: December 27, 2016
    Assignee: QUALCOMM Incorporated
    Inventors: Arun Raman, Behnam Robatmili
  • Patent number: 9501328
    Abstract: Embodiments include computing devices, systems, and methods for task-based handling of repetitive processes in parallel. At least one processor of the computing device, or a specialized hardware controller, may be configured to partition iterations of a repetitive process and assign the partitions to initialized tasks to be executed in parallel by a plurality of processor cores. Upon completing a task, remaining divisible partitions of the repetitive process of ongoing tasks may be subpartitioned and assigned to the ongoing task, and the completed task or a newly initialized task. Information about the iteration space for a repetitive process may be stored in a descriptor table, and status information for all partitions of a repetitive process stored in a status table. Each processor core may have an associated local table that tracks iteration execution of each task, and is synchronized with the status table.
    Type: Grant
    Filed: March 30, 2015
    Date of Patent: November 22, 2016
    Assignee: QUALCOMM Incorporated
    Inventors: Behnam Robatmili, Shaizeen Dilawarhusen Aga, Dario Suarez Gracia, Arun Raman, Aravind Natarajan, Gheorghe Calin Cascaval, Pablo Montesinos Ortego, Han Zhao
  • Publication number: 20160291981
    Abstract: Removing invalid literal load values, and related circuits, methods, and computer-readable media are disclosed. In one aspect, an instruction processing circuit provides a literal load table containing one or more entries comprising an address and a cached literal load value. Upon detecting a literal load instruction in an instruction stream, the instruction processing circuit determines whether the literal load table contains an entry having an address of the literal load instruction. If so, the instruction processing circuit removes the literal load instruction from the instruction stream, and provides the cached literal load value stored in the entry to at least one dependent instruction. The instruction processing circuit further determines whether an invalidity indicator for the literal load table has been received. If so, the instruction processing circuit flushes the literal load table. The invalidity indicator may be generated responsive to modification of a constant table.
    Type: Application
    Filed: April 6, 2015
    Publication date: October 6, 2016
    Inventors: Behnam Robatmili, Gheorghe Calin Cascaval, Michael William Morrow, Derek Jay Conrod, Bohuslav Rychlik
  • Publication number: 20160292012
    Abstract: Embodiments include computing devices, systems, and methods for task-based handling of repetitive processes in parallel. At least one processor of the computing device, or a specialized hardware controller, may be configured to partition iterations of a repetitive process and assign the partitions to initialized tasks to be executed in parallel by a plurality of processor cores. Upon completing a task, remaining divisible partitions of the repetitive process of ongoing tasks may be subpartitioned and assigned to the ongoing task, and the completed task or a newly initialized task. Information about the iteration space for a repetitive process may be stored in a descriptor table, and status information for all partitions of a repetitive process stored in a status table. Each processor core may have an associated local table that tracks iteration execution of each task, and is synchronized with the status table.
    Type: Application
    Filed: March 30, 2015
    Publication date: October 6, 2016
    Inventors: Behnam Robatmili, Shaizeen Dilawarhusen Aga, Dario Suarez Gracia, Arun Raman, Aravind Natarajan, Gheorghe Calin Cascaval, Pablo Montesinos Ortego, Han Zhao
  • Publication number: 20160217016
    Abstract: A computing device (e.g., a mobile computing device, etc.) may be configured to better exploit the concurrency and parallelism enabled by modern multiprocessor architectures by identifying a sequence of tasks via a task dependency controller, commencing execution of a first task in the sequence of tasks, and setting a value of a register so that each remaining task in the sequence of tasks executes after its predecessor task finishes execution without transferring control to a runtime system of the computing device. The task dependency controller may be a hardware component that is shared by the processor cores and/or otherwise configured to transfer control between tasks executing on different processor cores independent of the runtime system and/or without performing the relatively slow and memory-based inter-task, inter-thread or inter-process communications required by conventional solutions.
    Type: Application
    Filed: January 26, 2015
    Publication date: July 28, 2016
    Inventors: Arun Raman, Behnam Robatmili
  • Publication number: 20160216969
    Abstract: Systems and methods for adaptively managing registers in an instruction processor are disclosed. The system identifies one or more registers with inoperable cells. An operand manager identifies a set of operable cells within the one or more registers with inoperable cells and determines if a present instruction will use an operand that can be supported by the set of operable cells. When the set of operable cells can support the operand, the operand manager generates an assignment which is communicated to a register file manager.
    Type: Application
    Filed: January 28, 2015
    Publication date: July 28, 2016
    Inventors: Dario Suarez Gracia, Behnam Robatmili
  • Publication number: 20150205720
    Abstract: Aspects include apparatuses, systems, and methods for hardware acceleration for inline caches in dynamic languages. An inline cache may be initialized for an instance of a dynamic software operation. A call of an initialized instance of the dynamic software operation may be executed by an inline cache hardware accelerator. The inline cache may be checked to determine that its data is current. When the data is current, the initialized instance of the dynamic software operation may be executed using the related inline cache data. When the data is not current, a new inline cache may be initialized for the instance of the dynamic software operation, including the not current data of a previously initialized instance of the dynamic software operation. The inline cache hardware accelerator may include an inline cache memory, a coprocessor, and/or a functional unit on an inline cache pipeline connected to a processor pipeline.
    Type: Application
    Filed: April 28, 2014
    Publication date: July 23, 2015
    Applicant: QUALCOMM Incorporated
    Inventors: Behnam Robatmili, Gheorghe Calin Cascaval, Madhukar Nagaraja Kedlaya, Dario Suarez Gracia
  • Publication number: 20150205726
    Abstract: Aspects include computing devices, systems, and methods for hardware acceleration for inline caches in dynamic languages. An inline cache may be initialized for an instance of a dynamic software operation. A call of an initialized instance of the dynamic software operation may be executed by an inline cache hardware accelerator. The inline cache may be checked to determine that its data is current. When the data is current, the initialized instance of the dynamic software operation may be executed using the related inline cache data. When the data is not current, a new inline cache may be initialized for the instance of the dynamic software operation, including the not current data of a previously initialized instance of the dynamic software operation. The inline cache hardware accelerator may include an inline cache memory, a coprocessor, and/or a functional unit on an inline cache pipeline connected to a processor pipeline.
    Type: Application
    Filed: April 28, 2014
    Publication date: July 23, 2015
    Applicant: QUALCOMM Incorporated
    Inventors: Behnam Robatmili, Gheorghe Calin Cascaval, Madhukar Nagaraja Kedlaya, Dario Suarez Gracia
  • Publication number: 20140173556
    Abstract: Systems, methods, and devices for executing a function in a dynamically-typed language are described herein. In one aspect, a method includes generating a function selection decision tree based on one or more specializations of a generic function and one or more function inputs via an electronic device. The method further includes selecting one of the specializations or the generic function based on an input type of at least one function input via the electronic device. The method further includes calling the selected specialization or generic function via the electronic device. Another aspect of the subject matter described in the disclosure provides a method of executing a function in a prototype-based dynamically-typed language. The method includes maintaining a list of calls to one or more specializations of the function via the electronic device. The method further includes creating or destroying a specialization of the function via the electronic device.
    Type: Application
    Filed: November 18, 2013
    Publication date: June 19, 2014
    Applicant: QUALCOMM Incorporated
    Inventors: Behnam Robatmili, Derek Jay Conrod, Mohammad Hossein Reshadi, Subrato Kumar De, Gheorghe Calin Cascaval
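
Illustrative note on publication 20140173556: the abstract describes selecting either a specialization or the generic function based on the input types, and creating or destroying specializations over time. The Python sketch below is only a rough illustration of that idea and is not taken from the patent application. It swaps the function selection decision tree for a plain dictionary keyed by argument types, and every name in it (SpecializedFunction, specialize, destroy, select, generic_add, add_ints) is hypothetical.

```python
from typing import Any, Callable, Dict, Tuple


class SpecializedFunction:
    """Hypothetical dispatcher: picks a type specialization or the generic function."""

    def __init__(self, generic: Callable[..., Any]) -> None:
        self.generic = generic
        # Assumption: a dict keyed by argument types stands in for the
        # decision tree mentioned in the abstract.
        self.specializations: Dict[Tuple[type, ...], Callable[..., Any]] = {}

    def specialize(self, *arg_types: type) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
        """Create a specialization for an exact tuple of argument types."""
        def register(fn: Callable[..., Any]) -> Callable[..., Any]:
            self.specializations[arg_types] = fn
            return fn
        return register

    def destroy(self, *arg_types: type) -> None:
        """Destroy a specialization; later calls fall back to the generic function."""
        self.specializations.pop(arg_types, None)

    def select(self, *args: Any) -> Callable[..., Any]:
        """Select a specialization by the exact input types, else the generic function."""
        key = tuple(type(a) for a in args)
        return self.specializations.get(key, self.generic)

    def __call__(self, *args: Any) -> Any:
        return self.select(*args)(*args)


def generic_add(a: Any, b: Any) -> Any:
    # Generic path: works for any operands that support "+".
    return ("generic", a + b)


add = SpecializedFunction(generic_add)


@add.specialize(int, int)
def add_ints(a: int, b: int) -> Any:
    # Specialized path: chosen only when both inputs are exactly int.
    return ("int", a + b)
```

In this sketch, add(1, 2) takes the int specialization while add("a", "b") falls back to the generic function. After add.destroy(int, int), integer calls use the generic path again.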