Patents by Inventor Behnam Robatmili
Behnam Robatmili has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 11194625
Abstract: For one embodiment of the present invention, methods and systems for accelerating data operations with efficient memory management in native code and native dynamic class loading mechanisms are disclosed. In one embodiment, a data processing system comprises memory and a processing unit coupled to the memory. The processing unit is configured to receive input data, to execute a domain specific language (DSL) for a DSL operation with a native implementation, to translate a user defined function (UDF) into the native implementation by translating user defined managed software code into native software code, to execute the native software code in the native implementation, and to utilize a native memory management mechanism for the memory to manage object instances in the native implementation.
Type: Grant
Filed: December 3, 2019
Date of Patent: December 7, 2021
Assignee: BIGSTREAM SOLUTIONS, INC.
Inventors: Weiwei Chen, Behnam Robatmili, Maysam Lavasani, John David Davis
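The native memory management mechanism described above can be pictured as a pooled, region-style allocator that manages object instances for a DSL operation outside the managed (garbage-collected) heap. The sketch below is a minimal toy model under that assumption; the class and method names are hypothetical and do not come from the patent.

```python
class NativeArena:
    """Toy model of a native memory pool managing object instances
    outside a managed heap. All names here are hypothetical."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.used = 0
        self.objects = []

    def allocate(self, name, size):
        # Fail fast when the pool is exhausted instead of triggering
        # a managed-heap garbage collection.
        if self.used + size > self.capacity:
            raise MemoryError("arena exhausted")
        self.used += size
        self.objects.append((name, size))
        return name

    def release_all(self):
        # Region-style deallocation: every instance created for one
        # DSL operation is freed in a single step.
        self.objects.clear()
        self.used = 0

arena = NativeArena(capacity=1024)
arena.allocate("row_buffer", 512)
arena.allocate("udf_scratch", 256)
print(arena.used)   # 768
arena.release_all()
print(arena.used)   # 0
```

Releasing a whole region at once is the usual reason native pools beat per-object managed allocation for short-lived query state.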
-
Publication number: 20200301898
Abstract: Methods and systems are disclosed for accelerating Big Data operations by utilizing subgraph templates for a hardware accelerator of a computational storage device. In one example, a computer-implemented method comprises performing a query with a dataflow compiler; performing a stage acceleration analyzer function, including executing a matching algorithm to determine similarities between sub-graphs of an application program and unique templates from an available library of templates; and selecting at least one template that at least partially matches the sub-graphs, with the at least one template being associated with a linear set of operators to be executed sequentially within a stage of the Big Data operations.
Type: Application
Filed: June 10, 2020
Publication date: September 24, 2020
Applicant: BigStream Solutions, Inc.
Inventors: Balavinayagam Samynathan, Keith Chapman, Mehdi Nik, Behnam Robatmili, Shahrzad Mirkhani, Maysam Lavasani, John David Davis, Danesh Tavana, Weiwei Chen
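The matching step can be sketched as comparing a stage's linear operator chain against a template library and picking the template that covers the longest prefix of the chain. The template names and operator vocabulary below are illustrative assumptions, not taken from the patent.

```python
# Hypothetical accelerator template library: name -> linear operator chain.
TEMPLATE_LIBRARY = {
    "filter_project": ["filter", "project"],
    "filter_project_agg": ["filter", "project", "aggregate"],
    "scan_filter": ["scan", "filter"],
}

def best_template(stage_ops):
    """Return the template covering the longest prefix of the stage's
    linear operator chain (a partial match in the patent's sense)."""
    best_name, best_len = None, 0
    for name, ops in TEMPLATE_LIBRARY.items():
        if stage_ops[:len(ops)] == ops and len(ops) > best_len:
            best_name, best_len = name, len(ops)
    return best_name

stage = ["filter", "project", "aggregate", "sort"]
print(best_template(stage))  # filter_project_agg
```

A longest-prefix criterion is one simple similarity measure; a production matcher would score arbitrary sub-graph overlap rather than only prefixes.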
-
Publication number: 20200183749
Abstract: For one embodiment of the present invention, methods and systems for accelerating data operations with efficient memory management in native code and native dynamic class loading mechanisms are disclosed. In one embodiment, a data processing system comprises memory and a processing unit coupled to the memory. The processing unit is configured to receive input data, to execute a domain specific language (DSL) for a DSL operation with a native implementation, to translate a user defined function (UDF) into the native implementation by translating user defined managed software code into native software code, to execute the native software code in the native implementation, and to utilize a native memory management mechanism for the memory to manage object instances in the native implementation.
Type: Application
Filed: December 3, 2019
Publication date: June 11, 2020
Applicant: BigStream Solutions, Inc.
Inventors: Weiwei Chen, Behnam Robatmili, Maysam Lavasani, John David Davis
-
Publication number: 20190392002
Abstract: Methods and systems are disclosed for accelerating big data operations by utilizing subgraph templates. In one example, a data processing system comprises a hardware processor and a hardware accelerator coupled to the hardware processor. The hardware accelerator is configured with a compiler of an accelerator functionality to generate an execution plan, to generate computations for nodes including subgraphs in a distributed system for an application program based on the execution plan, and to execute a matching algorithm to determine similarities between the subgraphs and unique templates from an available library of templates.
Type: Application
Filed: June 25, 2019
Publication date: December 26, 2019
Applicant: BigStream Solutions, Inc.
Inventors: Maysam Lavasani, John David Davis, Danesh Tavana, Weiwei Chen, Balavinayagam Samynathan, Behnam Robatmili
-
Patent number: 10169105
Abstract: Aspects include computing devices, systems, and methods for implementing scheduling and execution of lightweight kernels as simple tasks directly by a thread without setting up a task structure. A computing device may determine whether a task pointer in a task queue is a simple task pointer for the lightweight kernel. The computing device may schedule a first simple task for the lightweight kernel for execution by the thread. The computing device may retrieve, from an entry of a simple task table, a kernel pointer for the lightweight kernel. The entry in the simple task table may be associated with the simple task pointer. The computing device may directly execute the lightweight kernel as the simple task.
Type: Grant
Filed: January 11, 2016
Date of Patent: January 1, 2019
Assignee: QUALCOMM Incorporated
Inventors: Han Zhao, Pablo Montesinos Ortego, Arun Raman, Behnam Robatmili, Gheorghe Calin Cascaval
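The flow above can be modeled in a few lines: a worker thread pops task pointers from a queue, and when a pointer is tagged as a simple task it looks up the kernel pointer in a simple task table and runs the kernel directly, skipping task-structure setup. The table layout and tags here are hypothetical.

```python
from collections import deque

# Hypothetical simple task table: simple task pointer -> kernel pointer.
simple_task_table = {0: lambda: "kernel_a ran", 1: lambda: "kernel_b ran"}

def worker(queue):
    """Drain a task queue, executing lightweight kernels directly as
    simple tasks; full tasks are modeled as plain callables."""
    results = []
    while queue:
        tag, payload = queue.popleft()
        if tag == "simple":
            # Direct dispatch: no task structure is ever set up.
            results.append(simple_task_table[payload]())
        else:
            results.append(payload())
    return results

q = deque([("simple", 0), ("full", lambda: "heavy task ran"), ("simple", 1)])
print(worker(q))  # ['kernel_a ran', 'heavy task ran', 'kernel_b ran']
```

The saving in the real design is the avoided allocation and initialization of a full task descriptor for kernels that finish in microseconds.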
-
Patent number: 9740504
Abstract: Aspects include apparatuses, systems, and methods for hardware acceleration for inline caches in dynamic languages. An inline cache may be initialized for an instance of a dynamic software operation. A call of an initialized instance of the dynamic software operation may be executed by an inline cache hardware accelerator. The inline cache may be checked to determine that its data is current. When the data is current, the initialized instance of the dynamic software operation may be executed using the related inline cache data. When the data is not current, a new inline cache may be initialized for the instance of the dynamic software operation, including the not current data of a previously initialized instance of the dynamic software operation. The inline cache hardware accelerator may include an inline cache memory, a coprocessor, and/or a functional unit of an inline cache pipeline connected to a processor pipeline.
Type: Grant
Filed: April 28, 2014
Date of Patent: August 22, 2017
Assignee: QUALCOMM Incorporated
Inventors: Behnam Robatmili, Gheorghe Calin Cascaval, Madhukar Nagaraja Kedlaya, Dario Suarez Gracia
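The check-then-dispatch behavior of an inline cache is easy to show in software: a dynamic property lookup caches the object's layout ("hidden class") and the resolved slot, hits when the layout is unchanged, and re-initializes on a miss. This is a generic software sketch of the technique, not the patented hardware design; the hardware accelerator would perform the same check in an inline cache pipeline.

```python
class InlineCache:
    """Per-call-site cache for one instance of a dynamic operation."""
    def __init__(self):
        self.shape = None   # cached object layout
        self.slot = None    # cached lookup result

def get_prop(cache, obj, name):
    shape = tuple(sorted(obj))        # stand-in for a hidden class
    if cache.shape == shape:          # cache data is current: fast path
        return obj[cache.slot]
    # Data is not current: re-initialize the cache for this instance.
    cache.shape, cache.slot = shape, name
    return obj[name]

ic = InlineCache()
print(get_prop(ic, {"x": 1, "y": 2}, "x"))  # 1 (miss, fills the cache)
print(get_prop(ic, {"x": 5, "y": 6}, "x"))  # 5 (hit, same layout)
```

Moving this check into a dedicated functional unit or coprocessor is what removes it from the main processor pipeline's critical path.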
-
Patent number: 9710388
Abstract: Aspects include computing devices, systems, and methods for hardware acceleration for inline caches in dynamic languages. An inline cache may be initialized for an instance of a dynamic software operation. A call of an initialized instance of the dynamic software operation may be executed by an inline cache hardware accelerator. The inline cache may be checked to determine that its data is current. When the data is current, the initialized instance of the dynamic software operation may be executed using the related inline cache data. When the data is not current, a new inline cache may be initialized for the instance of the dynamic software operation, including the not current data of a previously initialized instance of the dynamic software operation. The inline cache hardware accelerator may include an inline cache memory, a coprocessor, and/or a functional unit of an inline cache pipeline connected to a processor pipeline.
Type: Grant
Filed: April 28, 2014
Date of Patent: July 18, 2017
Assignee: QUALCOMM Incorporated
Inventors: Behnam Robatmili, Gheorghe Calin Cascaval, Madhukar Nagaraja Kedlaya, Dario Suarez Gracia
-
Publication number: 20170083827
Abstract: Embodiments include computing devices, apparatus, and methods implemented by the apparatus for accelerating machine learning on a computing device. Raw data may be received in the computing device from a raw data source device. The apparatus may identify key features as two dimensional matrices of the raw data such that the key features are mutually exclusive from each other. The key features may be translated into key feature vectors. The computing device may generate a feature vector from at least one of the key feature vectors. The computing device may receive a first partial output resulting from an execution of a basic linear algebra subprogram (BLAS) operation using the feature vector and a weight factor. The first partial output may be combined with a plurality of partial outputs to produce an output matrix. Receiving the raw data on the computing device may include receiving streaming raw data.
Type: Application
Filed: September 23, 2015
Publication date: March 23, 2017
Inventors: Behnam Robatmili, Matthew Leslie Badin, Dario Suárez Gracia, Gheorghe Calin Cascaval, Nayeem Islam
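The pipeline described above (key features as small 2-D matrices, flattened into a feature vector, fed through a BLAS operation against a weight factor, partial outputs combined into an output matrix) can be sketched end to end. The data and weights below are made up, and the GEMV-style routine is written in plain Python for self-containment.

```python
def matvec(matrix, vec):
    """GEMV-style BLAS Level 2 operation, here in plain Python."""
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]

# Hypothetical key features extracted as mutually exclusive 2-D matrices.
key_features = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]

# Translate them into key feature vectors and concatenate into one
# feature vector.
feature_vector = [x for kf in key_features for row in kf for x in row]

weights = [[1, 0, 0, 0, 0, 0, 0, 1]]       # one output neuron (illustrative)
partial = matvec(weights, feature_vector)  # a first partial output
output_matrix = [partial]                  # combine partial outputs
print(output_matrix)  # [[9]]
```

On real hardware the matvec call would be a tuned BLAS routine (or an accelerator offload), and several such partial outputs, one per weight block, would be stitched into the output matrix.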
-
Publication number: 20170031728
Abstract: Aspects include computing devices, systems, and methods for implementing scheduling and execution of lightweight kernels as simple tasks directly by a thread without setting up a task structure. A computing device may determine whether a task pointer in a task queue is a simple task pointer for the lightweight kernel. The computing device may schedule a first simple task for the lightweight kernel for execution by the thread. The computing device may retrieve, from an entry of a simple task table, a kernel pointer for the lightweight kernel. The entry in the simple task table may be associated with the simple task pointer. The computing device may directly execute the lightweight kernel as the simple task.
Type: Application
Filed: January 11, 2016
Publication date: February 2, 2017
Inventors: Han Zhao, Pablo Montesinos Ortego, Arun Raman, Behnam Robatmili, Gheorghe Calin Cascaval
-
Patent number: 9529643
Abstract: A computing device (e.g., a mobile computing device, etc.) may be configured to better exploit the concurrency and parallelism enabled by modern multiprocessor architectures by identifying a sequence of tasks via a task dependency controller, commencing execution of a first task in the sequence of tasks, and setting a value of a register so that each remaining task in the sequence of tasks executes after its predecessor task finishes execution without transferring control to a runtime system of the computing device. The task dependency controller may be a hardware component that is shared by the processor cores and/or otherwise configured to transfer control between tasks executing on different processor cores independent of the runtime system and/or without performing the relatively slow and memory-based inter-task, inter-thread or inter-process communications required by conventional solutions.
Type: Grant
Filed: January 26, 2015
Date of Patent: December 27, 2016
Assignee: QUALCOMM Incorporated
Inventors: Arun Raman, Behnam Robatmili
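The register-driven hand-off can be modeled as a controller holding one register, the index of the next task, that is advanced when a task completes, so control passes straight to the successor without returning to a runtime scheduler. The class below is a toy single-threaded model with hypothetical names; the patented controller is shared hardware operating across cores.

```python
class TaskDependencyController:
    """Toy model of a shared register that chains a task sequence."""

    def __init__(self, tasks):
        self.tasks = tasks
        self.next_task = 0   # "the register": index of the task to run

    def run(self):
        log = []
        while self.next_task < len(self.tasks):
            log.append(self.tasks[self.next_task]())
            # Advancing the register transfers control directly to the
            # successor; no runtime-system round trip, no IPC.
            self.next_task += 1
        return log

tdc = TaskDependencyController([lambda: "t0", lambda: "t1", lambda: "t2"])
print(tdc.run())  # ['t0', 't1', 't2']
```

The point of doing this in hardware is that the hand-off costs a register update rather than a memory-based inter-thread notification.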
-
Patent number: 9501328
Abstract: Embodiments include computing devices, systems, and methods for task-based handling of repetitive processes in parallel. At least one processor of the computing device, or a specialized hardware controller, may be configured to partition iterations of a repetitive process and assign the partitions to initialized tasks to be executed in parallel by a plurality of processor cores. Upon completing a task, remaining divisible partitions of the repetitive process of ongoing tasks may be subpartitioned and assigned to the ongoing task, and the completed task or a newly initialized task. Information about the iteration space for a repetitive process may be stored in a descriptor table, and status information for all partitions of a repetitive process stored in a status table. Each processor core may have an associated local table that tracks iteration execution of each task, and is synchronized with the status table.
Type: Grant
Filed: March 30, 2015
Date of Patent: November 22, 2016
Assignee: QUALCOMM Incorporated
Inventors: Behnam Robatmili, Shaizeen Dilawarhusen Aga, Dario Suarez Gracia, Arun Raman, Aravind Natarajan, Gheorghe Calin Cascaval, Pablo Montesinos Ortego, Han Zhao
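The subpartitioning step has a work-stealing flavor: when a task finishes, the remaining iteration range of an ongoing task is split, half staying with the ongoing task and half going to the idle one. The status-table layout below (task id mapped to a remaining half-open range) is an illustrative assumption.

```python
def split_range(lo, hi):
    """Subpartition a remaining [lo, hi) iteration range in half."""
    mid = (lo + hi) // 2
    return (lo, mid), (mid, hi)

# Hypothetical status table: task id -> remaining [lo, hi) iterations.
status = {0: (0, 100), 1: (100, 200)}

# Task 1 completes its partition early; the divisible remainder of
# ongoing task 0 is subpartitioned between the two tasks.
keep, stolen = split_range(*status[0])
status[0], status[1] = keep, stolen
print(status)  # {0: (0, 50), 1: (50, 100)}
```

In the patented scheme the split is coordinated through the shared status table, with each core's local table tracking its own iteration progress and synchronizing back.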
-
Publication number: 20160291981
Abstract: Removing invalid literal load values, and related circuits, methods, and computer-readable media are disclosed. In one aspect, an instruction processing circuit provides a literal load table containing one or more entries comprising an address and a cached literal load value. Upon detecting a literal load instruction in an instruction stream, the instruction processing circuit determines whether the literal load table contains an entry having an address of the literal load instruction. If so, the instruction processing circuit removes the literal load instruction from the instruction stream, and provides the cached literal load value stored in the entry to at least one dependent instruction. The instruction processing circuit further determines whether an invalidity indicator for the literal load table has been received. If so, the instruction processing circuit flushes the literal load table. The invalidity indicator may be generated responsive to modification of a constant table.
Type: Application
Filed: April 6, 2015
Publication date: October 6, 2016
Inventors: Behnam Robatmili, Gheorghe Calin Cascaval, Michael William Morrow, Derek Jay Conrod, Bohuslav Rychlik
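The literal load table's behavior can be traced with a small software model: on the first literal load at an address the value is cached, on a repeat the instruction is removed from the stream and its value forwarded to dependents, and modifying the constant table flushes everything. The instruction encoding and table shape here are invented for illustration.

```python
# Literal load table: instruction address -> cached literal value.
llt = {}

def process(stream, constants):
    """Model one pass over an instruction stream of (addr, op, arg)
    tuples, removing literal loads whose value is already cached."""
    out = []
    for addr, op, arg in stream:
        if op == "lit_load":
            if addr in llt:
                # Instruction removed; cached value forwarded instead.
                out.append(("forward", llt[addr]))
                continue
            llt[addr] = constants[arg]
            out.append((op, constants[arg]))
        else:
            out.append((op, arg))
    return out

constants = {"k0": 42}
stream = [(0, "lit_load", "k0"), (4, "add", 1), (0, "lit_load", "k0")]
print(process(stream, constants))

# An invalidity indicator (e.g. the constant table was modified)
# flushes the whole literal load table.
llt.clear()
```

The flush-on-modification rule is what keeps the optimization safe when self-modifying or dynamically patched code rewrites the constant pool.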
-
Publication number: 20160292012
Abstract: Embodiments include computing devices, systems, and methods for task-based handling of repetitive processes in parallel. At least one processor of the computing device, or a specialized hardware controller, may be configured to partition iterations of a repetitive process and assign the partitions to initialized tasks to be executed in parallel by a plurality of processor cores. Upon completing a task, remaining divisible partitions of the repetitive process of ongoing tasks may be subpartitioned and assigned to the ongoing task, and the completed task or a newly initialized task. Information about the iteration space for a repetitive process may be stored in a descriptor table, and status information for all partitions of a repetitive process stored in a status table. Each processor core may have an associated local table that tracks iteration execution of each task, and is synchronized with the status table.
Type: Application
Filed: March 30, 2015
Publication date: October 6, 2016
Inventors: Behnam Robatmili, Shaizeen Dilawarhusen Aga, Dario Suarez Gracia, Arun Raman, Aravind Natarajan, Gheorghe Calin Cascaval, Pablo Montesinos Ortego, Han Zhao
-
Publication number: 20160217016
Abstract: A computing device (e.g., a mobile computing device, etc.) may be configured to better exploit the concurrency and parallelism enabled by modern multiprocessor architectures by identifying a sequence of tasks via a task dependency controller, commencing execution of a first task in the sequence of tasks, and setting a value of a register so that each remaining task in the sequence of tasks executes after its predecessor task finishes execution without transferring control to a runtime system of the computing device. The task dependency controller may be a hardware component that is shared by the processor cores and/or otherwise configured to transfer control between tasks executing on different processor cores independent of the runtime system and/or without performing the relatively slow and memory-based inter-task, inter-thread or inter-process communications required by conventional solutions.
Type: Application
Filed: January 26, 2015
Publication date: July 28, 2016
Inventors: Arun Raman, Behnam Robatmili
-
Publication number: 20160216969
Abstract: Systems and methods for adaptively managing registers in an instruction processor are disclosed. The system identifies one or more registers with inoperable cells. An operand manager identifies a set of operable cells within the one or more registers with inoperable cells and determines if a present instruction will use an operand that can be supported by the set of operable cells. When the set of operable cells can support the operand, the operand manager generates an assignment which is communicated to a register file manager.
Type: Application
Filed: January 28, 2015
Publication date: July 28, 2016
Inventors: Dario Suarez Gracia, Behnam Robatmili
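The operand manager's test reduces to a bit check: an operand is supported by a degraded register if every significant bit of the operand lands in an operable cell. The sketch below models an 8-bit register file with one partially faulty register; the register names and masks are hypothetical.

```python
# Register id -> bitmask of operable cells (1 = usable).
# r1's upper four cells are inoperable; r0 is fully operable.
operable = {"r1": 0b00001111, "r0": 0b11111111}

def assign_register(value):
    """Operand-manager sketch: pick the first register whose operable
    cells can hold every set bit of the operand."""
    for reg, mask in operable.items():
        if value & ~mask == 0:   # no set bit falls in an inoperable cell
            return reg
    return None

print(assign_register(0x0F))  # 'r1': a narrow operand fits the degraded register
print(assign_register(0xF0))  # 'r0': upper bits need the fully operable one
```

Preferring degraded registers for narrow operands, as the iteration order above does, keeps fully operable registers free for wide values, which is the adaptive part of the scheme.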
-
Publication number: 20150205720
Abstract: Aspects include apparatuses, systems, and methods for hardware acceleration for inline caches in dynamic languages. An inline cache may be initialized for an instance of a dynamic software operation. A call of an initialized instance of the dynamic software operation may be executed by an inline cache hardware accelerator. The inline cache may be checked to determine that its data is current. When the data is current, the initialized instance of the dynamic software operation may be executed using the related inline cache data. When the data is not current, a new inline cache may be initialized for the instance of the dynamic software operation, including the not current data of a previously initialized instance of the dynamic software operation. The inline cache hardware accelerator may include an inline cache memory, a coprocessor, and/or a functional unit of an inline cache pipeline connected to a processor pipeline.
Type: Application
Filed: April 28, 2014
Publication date: July 23, 2015
Applicant: QUALCOMM Incorporated
Inventors: Behnam Robatmili, Gheorghe Calin Cascaval, Madhukar Nagaraja Kedlaya, Dario Suarez Gracia
-
Publication number: 20150205726
Abstract: Aspects include computing devices, systems, and methods for hardware acceleration for inline caches in dynamic languages. An inline cache may be initialized for an instance of a dynamic software operation. A call of an initialized instance of the dynamic software operation may be executed by an inline cache hardware accelerator. The inline cache may be checked to determine that its data is current. When the data is current, the initialized instance of the dynamic software operation may be executed using the related inline cache data. When the data is not current, a new inline cache may be initialized for the instance of the dynamic software operation, including the not current data of a previously initialized instance of the dynamic software operation. The inline cache hardware accelerator may include an inline cache memory, a coprocessor, and/or a functional unit of an inline cache pipeline connected to a processor pipeline.
Type: Application
Filed: April 28, 2014
Publication date: July 23, 2015
Applicant: QUALCOMM Incorporated
Inventors: Behnam Robatmili, Gheorghe Calin Cascaval, Madhukar Nagaraja Kedlaya, Dario Suarez Gracia
-
Publication number: 20140173556
Abstract: Systems, methods, and devices for executing a function in a dynamically-typed language are described herein. In one aspect, a method includes generating a function selection decision tree based on one or more specializations of a generic function and one or more function inputs via an electronic device. The method further includes selecting one of the specializations or the generic function based on an input type of at least one function input via the electronic device. The method further includes calling the selected specialization or generic function via the electronic device. Another aspect of the subject matter described in the disclosure provides a method of executing a function in a prototype-based dynamically-typed language. The method includes maintaining a list of calls to one or more specializations of the function via the electronic device. The method further includes creating or destroying a specialization of the function via the electronic device.
Type: Application
Filed: November 18, 2013
Publication date: June 19, 2014
Applicant: QUALCOMM Incorporated
Inventors: Behnam Robatmili, Derek Jay Conrod, Mohammad Hossein Reshadi, Subrato Kumar De, Gheorghe Calin Cascaval
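The selection step described above amounts to branching on the input's type: if a specialization exists for that type, call it; otherwise fall back to the generic function. The sketch below is a one-level "decision tree" over a hypothetical specialization table, not the patented implementation.

```python
def generic_add(a, b):
    """Generic function: handles any type supporting '+'."""
    return a + b

# Hypothetical specialization table: input type -> compiled specialization.
specializations = {
    int: lambda a, b: a + b,   # stands in for a type-specialized body
    str: lambda a, b: a + b,
}

def dispatch(a, b):
    """Function selection: branch on the first input's type, calling a
    specialization when one exists and the generic function otherwise."""
    fn = specializations.get(type(a), generic_add)
    return fn(a, b)

print(dispatch(2, 3))       # 5, via the int specialization
print(dispatch(2.5, 0.5))   # 3.0, via the generic fallback
```

A real system would also track call counts per specialization, as the abstract's second aspect describes, creating hot specializations and destroying cold ones.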