Patents Examined by Corey S Faherty
  • Patent number: 11907723
    Abstract: A data processing apparatus is provided. Rename circuitry performs a register rename stage of a pipeline by storing, in storage circuitry, mappings between registers. Each of the mappings is associated with an elimination field value. Operation elimination circuitry replaces an operation, which indicates that an action is to be performed on data from a source register and the result stored in a destination register, with a new mapping in the storage circuitry that references the destination register and has the elimination field value set. Operation circuitry responds to a subsequent operation that accesses the destination register while the elimination field value is set by obtaining the contents of the source register, performing the action on those contents to obtain a result, and returning the result. (See the code sketch following this entry.)
    Type: Grant
    Filed: March 21, 2022
    Date of Patent: February 20, 2024
    Assignee: Arm Limited
    Inventors: Nicholas Andrew Plante, Joseph Michael Pusdesris, Jungsoo Kim
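    Code sketch: a minimal Python model of the operation-elimination idea above. The RenameTable class, its method names, and the NOT example are illustrative assumptions, not the patent's circuitry; registers are modeled as a plain dictionary.

      class RenameTable:
          def __init__(self):
              self.regs = {}       # architectural register -> value (stand-in for physical registers)
              self.mappings = {}   # destination register -> (source register, deferred action)

          def eliminate_op(self, dest, src, action):
              """Replace an executed operation with a mapping whose elimination field is set."""
              self.mappings[dest] = (src, action)

          def write(self, reg, value):
              self.mappings.pop(reg, None)   # a real write clears any elimination mapping
              self.regs[reg] = value

          def read(self, reg):
              if reg in self.mappings:               # elimination field set:
                  src, action = self.mappings[reg]   # fetch the source contents,
                  return action(self.regs[src])      # apply the deferred action, return the result
              return self.regs[reg]

      rt = RenameTable()
      rt.write("r1", 0b1010)
      rt.eliminate_op("r2", "r1", lambda x: ~x & 0xF)   # "r2 = NOT r1" never enters the pipeline
      print(rt.read("r2"))                              # 5: the NOT happens only at this access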
  • Patent number: 11907717
    Abstract: A technique for block data transfer is disclosed that reduces data transfer and memory access overheads and significantly reduces multiprocessor activity and energy consumption. Threads executing on a multiprocessor that need data stored in global memory can request and store the needed data in on-chip shared memory, where it can be accessed by the threads multiple times. The data can be loaded from global memory and stored in shared memory using an instruction that directs the data into the shared memory without staging the data in registers or cache memory of the multiprocessor during the transfer. (See the code sketch following this entry.)
    Type: Grant
    Filed: February 8, 2023
    Date of Patent: February 20, 2024
    Assignee: NVIDIA Corporation
    Inventors: Andrew Kerr, Jack Choquette, Xiaogang Qiu, Omkar Paranjape, Poornachandra Rao, Shirish Gadre, Steven J. Heinrich, Manan Patel, Olivier Giroux, Alan Kaatz
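    Code sketch: an illustrative Python comparison of the two data paths described above, counting the register-file writes each would incur. On real hardware the bypass corresponds to asynchronous global-to-shared copies; everything below (function names, counts) is an invented simulation, not NVIDIA's API.

      def staged_copy(global_mem):
          """Classic path: each element is staged through a thread register."""
          shared, register_writes = [], 0
          for value in global_mem:
              register_writes += 1      # load into a register...
              shared.append(value)      # ...then store the register to shared memory
          return shared, register_writes

      def direct_copy(global_mem):
          """Bypass path: data moves global -> shared without touching registers."""
          return list(global_mem), 0

      data = list(range(1024))
      _, staged = staged_copy(data)
      _, direct = direct_copy(data)
      print("register writes: staged=%d, direct=%d" % (staged, direct))   # 1024 vs 0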
  • Patent number: 11907720
    Abstract: There is provided a data processing apparatus comprising a plurality of registers, each of the registers having data bits to store data and metadata bits to store metadata. Each of the registers is adapted to operate in a metadata mode, in which both the metadata bits and the data bits are valid, and a data mode, in which the data bits are valid and the metadata bits are invalid. Mode bit storage circuitry indicates whether each of the registers is in the data mode or the metadata mode. Execution circuitry is responsive to a memory operation that is a store operation on one or more given registers. (See the code sketch following this entry.)
    Type: Grant
    Filed: November 26, 2020
    Date of Patent: February 20, 2024
    Assignee: Arm Limited
    Inventors: Bradley John Smith, Thomas Christopher Grocutt
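    Code sketch: a toy Python model of the dual-mode registers above; the class layout and method names are illustrative assumptions.

      class Register:
          def __init__(self):
              self.data = 0
              self.metadata = 0
              self.metadata_mode = False   # the "mode bit": True => metadata bits valid

          def write_data(self, value):
              self.data = value
              self.metadata_mode = False   # a plain data write invalidates the metadata

          def write_with_metadata(self, value, meta):
              self.data, self.metadata = value, meta
              self.metadata_mode = True

          def store(self):
              """A store sees the metadata only when the register is in metadata mode."""
              return (self.data, self.metadata) if self.metadata_mode else (self.data, None)

      r = Register()
      r.write_with_metadata(0xDEAD, meta=0b1)
      print(r.store())    # (57005, 1): metadata mode, metadata bits valid
      r.write_data(0xBEEF)
      print(r.store())    # (48879, None): data mode, metadata bits invalid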
  • Patent number: 11907160
    Abstract: This disclosure relates to a distributed processing system for configuring multiple processing channels. The distributed processing system includes a main processor, such as an ARM processor, communicatively coupled to a plurality of co-processors, such as stream processors. The co-processors can execute instructions in parallel with each other and interrupt the ARM processor. Longer latency instructions can be executed by the main processor and lower latency instructions can be executed by the co-processors. There are several ways that a stream can be triggered in the distributed processing system. In an embodiment, the distributed processing system is a stream processor system that includes an ARM processor and stream processors configured to access different register sets. The stream processors can include a main stream processor and stream processors in respective transmit and receive channels. The stream processor system can be implemented in a radio system to configure the radio for operation. (See the code sketch following this entry.)
    Type: Grant
    Filed: August 5, 2022
    Date of Patent: February 20, 2024
    Assignee: Analog Devices, Inc.
    Inventors: Manish J. Manglani, Shipra Bhal, Christopher Mayer
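    Code sketch: a hypothetical Python illustration of the latency-based work split above; the 100-cycle cutoff, the instruction tuples, and the channel layout are invented for illustration.

      LATENCY_THRESHOLD = 100   # cycles; illustrative cutoff

      def dispatch(instructions):
          main_queue, stream_queues = [], {0: [], 1: []}   # two co-processor channels
          for name, latency, channel in instructions:
              if latency >= LATENCY_THRESHOLD:
                  main_queue.append(name)              # longer latency: main (e.g. ARM) processor
              else:
                  stream_queues[channel].append(name)  # lower latency: stream processor
          return main_queue, stream_queues

      program = [("calibrate_pll", 500, 0), ("set_gain", 4, 0), ("tune_nco", 6, 1)]
      print(dispatch(program))
      # (['calibrate_pll'], {0: ['set_gain'], 1: ['tune_nco']})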
  • Patent number: 11900174
    Abstract: Techniques are disclosed for processing unit virtualization with scalable over-provisioning in an information processing system. For example, the method accesses a data structure that maps a correspondence between a plurality of virtualized processing units and a plurality of abstracted processing units, wherein the plurality of abstracted processing units are configured to decouple an allocation decision from the plurality of virtualized processing units, and further wherein at least one of the virtualized processing units is mapped to multiple ones of the abstracted processing units. The method allocates one or more virtualized processing units to execute a given application by allocating one or more abstracted processing units identified from the data structure. The method also enables migration of one or more virtualized processing units across the system. (See the code sketch following this entry.)
    Type: Grant
    Filed: June 22, 2022
    Date of Patent: February 13, 2024
    Assignee: Dell Products L.P.
    Inventors: Anzhou Hou, Zhen Jia, Qiang Chen, Victor Fong, Michael Robillard
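    Code sketch: a minimal Python model of the mapping table above, where one virtualized unit maps to multiple abstracted units and an allocation picks abstracted units from the table; the names and sizes are illustrative assumptions.

      mapping = {                       # virtualized unit -> abstracted units (one-to-many)
          "vgpu-0": ["apu-0", "apu-1"],
          "vgpu-1": ["apu-1", "apu-2", "apu-3"],
      }
      free_apus = {"apu-0", "apu-1", "apu-2", "apu-3"}

      def allocate(vgpu, count):
          """Allocate `count` abstracted units backing the given virtualized unit."""
          candidates = [a for a in mapping[vgpu] if a in free_apus]
          if len(candidates) < count:
              raise RuntimeError("not enough abstracted units free")
          chosen = candidates[:count]
          free_apus.difference_update(chosen)   # the allocation decision happens here,
          return chosen                         # decoupled from the virtualized units

      print(allocate("vgpu-1", 2))   # e.g. ['apu-1', 'apu-2']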
  • Patent number: 11900123
    Abstract: A system includes a processing unit such as a GPU that itself includes a command processor configured to receive instructions for execution from a software application. A processor pipeline coupled to the processing unit includes a set of parallel processing units for executing the instructions in sets. A set manager is coupled to one or more of the processor pipeline and the command processor. The set manager includes at least one table for storing a set start time, a set end time, and a set execution time. The set manager determines an execution time for one or more sets of instructions of a first window of sets of instructions submitted to the processor pipeline. Based on the execution time of the one or more sets of instructions, a set limit is determined and applied to one or more sets of instructions of a second window subsequent to the first window. (See the code sketch following this entry.)
    Type: Grant
    Filed: December 13, 2019
    Date of Patent: February 13, 2024
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Alexander Fuad Ashkar, Manu Rastogi, Harry J. Wise
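    Code sketch: an illustrative model of the set manager above: time the instruction sets of a first window, derive a set limit, and apply it to the second window. The policy of capping at the first window's mean execution time is an invented stand-in.

      from statistics import mean

      def derive_set_limit(first_window):
          """first_window: list of (set_start_time, set_end_time) per instruction set."""
          return mean(end - start for start, end in first_window)

      def apply_limit(second_window, limit):
          """True where a set in the next window stays within the derived limit."""
          return [(end - start) <= limit for start, end in second_window]

      window1 = [(0, 4), (5, 11), (12, 20)]   # execution times 4, 6, 8 -> limit 6
      limit = derive_set_limit(window1)
      print(limit, apply_limit([(21, 25), (26, 36)], limit))   # 6, [True, False]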
  • Patent number: 11900107
    Abstract: Disclosed embodiments relate to instructions for fused multiply-add (FMA) operations with variable-precision inputs. In one example, a processor to execute an asymmetric FMA instruction includes fetch circuitry to fetch an FMA instruction having fields to specify an opcode, a destination, and first and second source vectors having first and second widths, respectively, decode circuitry to decode the fetched FMA instruction, and a single instruction multiple data (SIMD) execution circuit to process as many elements of the second source vector as fit into a SIMD lane width by multiplying each element by a corresponding element of the first source vector, and accumulating the resulting product with previous contents of the destination, wherein the SIMD lane width is one of 16 bits, 32 bits, and 64 bits, the first width is one of 4 bits and 8 bits, and the second width is one of 1 bit, 2 bits, and 4 bits. (See the code sketch following this entry.)
    Type: Grant
    Filed: March 25, 2022
    Date of Patent: February 13, 2024
    Assignee: Intel Corporation
    Inventors: Dipankar Das, Naveen K. Mellempudi, Mrinmay Dutta, Arun Kumar, Dheevatsa Mudigere, Abhisek Kundu
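    Code sketch: a scalar Python rendering of the asymmetric FMA above, with SIMD lanes modeled as plain lists; the element widths follow the abstract, everything else is illustrative.

      def asymmetric_fma(dst, src1, src2, w1=8, w2=2):
          """dst[i] += src1[i] * src2[i], src1 elements w1-bit, src2 elements w2-bit."""
          assert all(0 <= x < (1 << w1) for x in src1)   # first source: 4- or 8-bit elements
          assert all(0 <= x < (1 << w2) for x in src2)   # second source: 1-, 2- or 4-bit elements
          return [d + a * b for d, a, b in zip(dst, src1, src2)]

      acc = [0, 0, 0, 0]                                 # previous destination contents
      acc = asymmetric_fma(acc, src1=[200, 17, 3, 255], src2=[3, 1, 0, 2])
      print(acc)   # [600, 17, 0, 510]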
  • Patent number: 11893392
    Abstract: A method for processing floating point operations in a multi-processor system including a plurality of single processor cores is provided. In this method, upon receiving a group setting for performing an operation, the plurality of single processor cores are grouped into at least one group according to the group setting, and a single processor core set as the master of the group loads an instruction for performing the operation from an external memory, and performs parallel operations by utilizing the floating point units (FPUs) of all single processor cores in the group according to the instruction. (See the code sketch following this entry.)
    Type: Grant
    Filed: November 30, 2021
    Date of Patent: February 6, 2024
    Assignee: ELECTRONICS AND TELECOMMUNICATIONS RESEARCH INSTITUTE
    Inventors: Ju-Yeob Kim, Jin Ho Han
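    Code sketch: a hypothetical Python sketch of the grouping scheme above, using threads as stand-ins for the grouped cores' FPUs; the group-setting format is invented.

      from concurrent.futures import ThreadPoolExecutor

      def run_group(group_setting, data):
          cores = list(range(group_setting["size"]))
          assert group_setting["master"] in cores   # the master core fetches the instruction
          op = group_setting["op"]                  # the "instruction" loaded by the master
          chunks = [data[i::len(cores)] for i in cores]   # spread work across every FPU
          with ThreadPoolExecutor(max_workers=len(cores)) as pool:
              results = pool.map(lambda chunk: [op(x) for x in chunk], chunks)
          return [y for chunk in results for y in chunk]

      setting = {"size": 4, "master": 0, "op": lambda x: x * 0.5}
      print(run_group(setting, [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]))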
  • Patent number: 11868804
    Abstract: A processor comprises a computational array of computational elements and an instruction dispatch circuit. The computational elements receive data operands via data lanes extending along a first dimension, and process the operands based upon instructions received from the instruction dispatch circuit via instruction lanes extending along a second dimension. The instruction dispatch circuit receives raw instructions, and comprises an instruction dispatch unit (IDU) processor that processes a set of raw instructions to generate processed instructions for dispatch to the computational elements, where the number of processed instructions is not equal to the number of raw instructions in the set. (See the code sketch following this entry.)
    Type: Grant
    Filed: November 18, 2020
    Date of Patent: January 9, 2024
    Assignee: Groq, Inc.
    Inventors: Brian Lee Kurtz, Dinesh Maheshwari, James David Sprach
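    Code sketch: one illustrative reading of an IDU that emits a different number of processed instructions than it consumes, here by unrolling an invented REPEAT macro; none of this is Groq's actual instruction format.

      def idu_process(raw_instructions):
          processed = []
          for instr in raw_instructions:
              if instr[0] == "REPEAT":                      # invented form: ("REPEAT", n, op)
                  processed.extend([instr[2]] * instr[1])   # one raw -> n processed
              else:
                  processed.append(instr)
          return processed

      raw = [("LOAD", "w0"), ("REPEAT", 3, ("MAC", "w0")), ("STORE", "w1")]
      out = idu_process(raw)
      print(len(raw), "raw ->", len(out), "processed")   # 3 raw -> 5 processed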
  • Patent number: 11868782
    Abstract: Methods and systems are disclosed using an execution pipeline on a multi-processor platform for deep learning network execution. In one example, a network workload analyzer receives a workload, analyzes its computation distribution, and groups the network nodes into groups. A network executor assigns each group to a processing core of the multi-core platform so that each processing core handles the computation tasks of the received workload for its group. (See the code sketch following this entry.)
    Type: Grant
    Filed: August 15, 2022
    Date of Patent: January 9, 2024
    Assignee: Intel Corporation
    Inventors: Liu Yang, Anbang Yao
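    Code sketch: a Python sketch of the analyzer/executor split above; the greedy cost-balancing heuristic and the cost numbers are invented stand-ins for the patent's workload analysis.

      def group_nodes(node_costs, n_cores):
          """node_costs: {node_name: estimated_ops}. Returns core -> list of nodes."""
          groups = {core: [] for core in range(n_cores)}
          load = {core: 0 for core in range(n_cores)}
          for node, cost in sorted(node_costs.items(), key=lambda kv: -kv[1]):
              core = min(load, key=load.get)   # heaviest remaining node -> lightest core
              groups[core].append(node)
              load[core] += cost
          return groups

      net = {"conv1": 90, "conv2": 80, "fc": 30, "relu": 5, "pool": 10}
      print(group_nodes(net, 2))
      # {0: ['conv1', 'pool', 'relu'], 1: ['conv2', 'fc']}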
  • Patent number: 11868769
    Abstract: Deployments of microservices executing in a cloud are automatically managed. Some microservices are deployed on dedicated nodes, others in serverless configurations. Rates of invocation and runtime data of microservices are monitored. Responsive to the monitored rate of invocation of a microservice running serverless exceeding a given threshold, the microservice is automatically redeployed to a dedicated node. A microservice executing on a dedicated node may be redeployed serverless if it is called infrequently enough. Microservices can be automatically redeployed between different dedicated nodes with different capacities based on monitored usage. The underlying cloud service provider may be automatically monitored for changes in serverless support functionality. Responsive to these changes, the thresholds at which microservices are redeployed can be automatically adjusted. (See the code sketch following this entry.)
    Type: Grant
    Filed: July 27, 2022
    Date of Patent: January 9, 2024
    Assignee: PANGEA CYBER CORPORATION, INC.
    Inventors: Akshay Dongaonkar, Prashant Pathak, Sourabh Satish
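    Code sketch: a compact model of the redeployment rule above; the threshold values and the hysteresis gap between them are invented for illustration.

      SERVERLESS_TO_NODE = 50   # calls/sec above which serverless is redeployed to a node
      NODE_TO_SERVERLESS = 5    # calls/sec below which a dedicated node is wasteful

      def next_deployment(current, calls_per_sec):
          if current == "serverless" and calls_per_sec > SERVERLESS_TO_NODE:
              return "dedicated-node"   # rate of invocation exceeded the threshold
          if current == "dedicated-node" and calls_per_sec < NODE_TO_SERVERLESS:
              return "serverless"       # called infrequently enough to go serverless
          return current                # inside the gap: leave the deployment alone

      print(next_deployment("serverless", 120))     # dedicated-node
      print(next_deployment("dedicated-node", 2))   # serverless
      print(next_deployment("dedicated-node", 30))  # dedicated-node (no change)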
  • Patent number: 11861366
    Abstract: Disclosed in some examples are methods, systems, devices, and machine-readable mediums which provide for more efficient coarse-grained reconfigurable array (CGRA) execution by assigning different initiation intervals to different processing elements (PEs) executing a same code base. The initiation intervals may be multiples of each other, and the PE with the lowest initiation interval may be used to execute the instructions of the code that are to be executed at a greater frequency than the instructions assigned to PEs with higher initiation intervals. (See the code sketch following this entry.)
    Type: Grant
    Filed: August 11, 2021
    Date of Patent: January 2, 2024
    Assignee: Micron Technology, Inc.
    Inventors: Douglas Vanesko, Tony M. Brewer
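    Code sketch: an illustrative firing schedule for the initiation-interval (II) assignment above; the PE names, II values, and cycle count are assumptions.

      def firing_schedule(pe_iis, n_cycles):
          """pe_iis: {pe_name: initiation_interval}. Returns the cycles each PE fires on."""
          return {pe: [c for c in range(n_cycles) if c % ii == 0]
                  for pe, ii in pe_iis.items()}

      # II=1 for the hottest instruction; IIs of 2 and 4 (multiples) for cooler ones
      print(firing_schedule({"pe0": 1, "pe1": 2, "pe2": 4}, 8))
      # {'pe0': [0, 1, 2, 3, 4, 5, 6, 7], 'pe1': [0, 2, 4, 6], 'pe2': [0, 4]}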
  • Patent number: 11847457
    Abstract: A master processor is configured to execute a first thread and a second thread designated to run a program in sequence. A slave processor is configured to execute a third thread to run the program in sequence. An instruction fetch compare engine is provided. The first thread initiates a first thread instruction fetch for the program, which is stored in an instruction fetch storage. Retrieved data associated with the fetched first thread instruction is stored in a retrieved data storage. The second thread initiates a second thread instruction fetch for the program. The instruction fetch compare logic compares the second thread instruction fetch for the program with the first thread instruction fetch stored in the instruction fetch storage for a match. When there is a match, the retrieved data associated with the fetched first thread instruction is presented from the retrieved data storage in response to the second thread instruction fetch. (See the code sketch following this entry.)
    Type: Grant
    Filed: May 31, 2022
    Date of Patent: December 19, 2023
    Assignee: Ceremorphic, Inc.
    Inventors: Heonchul Park, Sri Hari Nemani, Patel Urvishkumar Jayrambhai, Dhruv Maheshkumar Patel
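    Code sketch: a toy Python version of the fetch-compare flow above; the class, storage layout, and addresses are illustrative assumptions.

      class FetchCompareEngine:
          def __init__(self):
              self.fetch_storage = {}   # fetch address -> retrieved data

          def first_thread_fetch(self, addr, memory):
              data = memory[addr]               # the actual fetch by the first thread
              self.fetch_storage[addr] = data   # record the fetch and its retrieved data
              return data

          def second_thread_fetch(self, addr, memory):
              if addr in self.fetch_storage:        # compare: a match was found,
                  return self.fetch_storage[addr]   # so present the stored data
              return memory[addr]                   # no match: fetch normally

      mem = {0x100: "add x1, x2", 0x104: "mul x3, x1"}
      engine = FetchCompareEngine()
      engine.first_thread_fetch(0x100, mem)
      print(engine.second_thread_fetch(0x100, mem))   # served without a second fetch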
  • Patent number: 11836497
    Abstract: There is provided an operation module, which includes a memory, a register unit, a dependency relationship processing unit, an operation unit, and a control unit. The memory is configured to store a vector, the register unit is configured to store an extension instruction, and the control unit is configured to acquire and parse the extension instruction so as to obtain a first operation instruction and a second operation instruction. An execution sequence of the first operation instruction and the second operation instruction can be determined, and an input vector of the first operation instruction can be read from the memory. The operation unit is configured to convert the representation of the input data index of the first operation instruction and to screen the data, and to execute the first and second operation instructions according to the execution sequence, so as to obtain the result of the extension instruction. (See the code sketch following this entry.)
    Type: Grant
    Filed: July 23, 2018
    Date of Patent: December 5, 2023
    Assignee: SHANGHAI CAMBRICON INFORMATION TECHNOLOGY CO., LTD
    Inventors: Bingrui Wang, Shengyuan Zhou, Yao Zhang
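    Code sketch: one hypothetical reading of the index handling above, converting an index from a bitmask representation to explicit positions and then screening the input vector; both representations are assumptions.

      def bitmask_to_positions(mask, length):
          """Convert a bitmask index representation to an explicit position list."""
          return [i for i in range(length) if (mask >> i) & 1]

      def screen(vector, positions):
          """Keep only the elements the index selects."""
          return [vector[i] for i in positions]

      vec = [10, 20, 30, 40, 50]
      positions = bitmask_to_positions(0b10101, len(vec))   # -> [0, 2, 4]
      print(screen(vec, positions))                         # [10, 30, 50]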
  • Patent number: 11836495
    Abstract: The present invention provides a method of implementing an ARM64-bit floating point emulator on a Linux system, which includes: running an ARM64-bit instruction on the Linux system; applying an instruction classifier to a first feature code of the machine code indicated by the ARM64-bit instruction to determine whether the ARM64-bit instruction is an ARM64-bit floating point instruction; and, if it is, applying the instruction classifier to a second feature code of the machine code to determine which specific ARM64-bit floating point instruction it is. (See the code sketch following this entry.)
    Type: Grant
    Filed: May 4, 2022
    Date of Patent: December 5, 2023
    Assignee: AIROHA TECHNOLOGY (SUZHOU) LIMITED
    Inventors: Fei Yan, Peng Du
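    Code sketch: a sketch of the two-stage classifier above; the masks, patterns, and mnemonic table are placeholders, not real AArch64 encodings.

      FP_MASK, FP_PATTERN = 0x0F000000, 0x0E000000            # first feature code (invented)
      SPECIFIC = {0x10: "FADD", 0x18: "FMUL", 0x1C: "FDIV"}   # second feature code (invented)

      def classify(machine_code):
          if machine_code & FP_MASK != FP_PATTERN:   # stage 1: floating point at all?
              return None                            # no: execute natively, no emulation
          feature2 = (machine_code >> 8) & 0xFF      # stage 2: which FP instruction?
          return SPECIFIC.get(feature2, "unknown FP instruction")

      print(classify(0x0E001000))   # FADD (under these invented encodings)
      print(classify(0x12345678))   # None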
  • Patent number: 11822959
    Abstract: Methods and systems for processing requests with load-dependent throttling. The system compares a count of active job requests being currently processed for a user associated with a new job request with an active job cap number for that user. When the count of active job requests being currently processed for that user does not exceed the active job cap number specific to that user, the job request is added to an active job queue for processing. However, when the count of active job requests being currently processed for that user exceeds the active job cap number, the job request is placed on a throttled queue to await later processing, when an updated count of active job requests being currently processed for that user is below the active job cap number. Once the count is below the cap, the throttled request is moved to the active job queue for processing. (See the code sketch following this entry.)
    Type: Grant
    Filed: February 18, 2022
    Date of Patent: November 21, 2023
    Assignee: Shopify Inc.
    Inventors: Robert Mic, Aline Fatima Manera, Timothy Willard, Nicole Simone, Scott Weber
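    Code sketch: a compact Python model of the throttling flow above; the per-user caps and queue shapes are illustrative.

      from collections import defaultdict, deque

      CAP = {"alice": 2, "bob": 5}       # per-user active job cap
      active_counts = defaultdict(int)   # user -> count of active jobs
      throttled = deque()                # jobs awaiting capacity

      def submit(user, job):
          if active_counts[user] < CAP[user]:
              active_counts[user] += 1
              return job + ": active"
          throttled.append((user, job))      # over the cap: wait on the throttled queue
          return job + ": throttled"

      def job_finished(user):
          active_counts[user] -= 1
          if throttled and active_counts[throttled[0][0]] < CAP[throttled[0][0]]:
              u, job = throttled.popleft()   # capacity freed: promote the throttled job
              active_counts[u] += 1
              return job + ": promoted to active"

      print(submit("alice", "job1"), submit("alice", "job2"), submit("alice", "job3"))
      print(job_finished("alice"))   # job3: promoted to active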
  • Patent number: 11809867
    Abstract: An apparatus and method for performing dual concurrent multiplications of packed data elements. (See the code sketch following this entry.)
    Type: Grant
    Filed: September 21, 2020
    Date of Patent: November 7, 2023
    Assignee: Intel Corporation
    Inventors: Venkateswara Madduri, Elmoustapha Ould-Ahmed-Vall, Jesus Corbal, Mark Charney, Robert Valentine, Binwei Yang
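    Code sketch: a tiny list-based illustration of dual multiplications over packed elements; the pairing of adjacent lanes is an assumption, since the abstract gives no detail.

      def dual_packed_multiply(a, b):
          """For each pair of adjacent lanes, perform the two multiplications together."""
          out = []
          for i in range(0, len(a), 2):
              out.append(a[i] * b[i])           # first multiplication of the pair
              out.append(a[i + 1] * b[i + 1])   # second, concurrent in hardware
          return out

      print(dual_packed_multiply([1, 2, 3, 4], [10, 20, 30, 40]))   # [10, 40, 90, 160]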
  • Patent number: 11809868
    Abstract: Systems, apparatuses, and methods related to bit string operations using a computing tile are described. An example apparatus includes a computing device (or "tile") that includes a processing unit and a memory resource configured as a cache for the processing unit. A data structure can be coupled to the computing device. The data structure can be configured to receive a bit string that represents the result of an arithmetic operation, a logical operation, or both, and to store that bit string. The bit string can be formatted in a format different from a floating-point format.
    Type: Grant
    Filed: January 21, 2022
    Date of Patent: November 7, 2023
    Assignee: Micron Technology, Inc.
    Inventor: Vijay S. Ramesh
  • Patent number: 11803382
    Abstract: A digital data processor includes a multi-stage butterfly network, which is configured to, in response to a look up table read instruction, receive look up table data from an intermediate register, reorder the look up table data based on control signals comprising look up table configuration register data, and write the reordered look up table data to a destination register specified by the look up table read instruction. (See the code sketch following this entry.)
    Type: Grant
    Filed: September 2, 2022
    Date of Patent: October 31, 2023
    Assignee: Texas Instruments Incorporated
    Inventors: Naveen Bhoria, Duc Bui, Dheera Balasubramanian Samudrala, Rama Venkatasubramanian
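    Code sketch: a software model of a multi-stage butterfly network with log2(N) stages, where stage s conditionally exchanges elements whose indices differ in bit s; the per-stage control bits stand in for the look up table configuration register data.

      def butterfly_reorder(data, stage_controls):
          """stage_controls[s] == 1 makes stage s swap each partner pair."""
          out = list(data)
          for s, ctrl in enumerate(stage_controls):   # one control bit per stage
              if not ctrl:
                  continue
              span = 1 << s
              for i in range(len(out)):
                  if i & span == 0:                   # visit each partner pair once
                      out[i], out[i + span] = out[i + span], out[i]
          return out

      lut = [0, 1, 2, 3, 4, 5, 6, 7]
      print(butterfly_reorder(lut, [1, 0, 0]))   # swap bit-0 partners: [1, 0, 3, 2, 5, 4, 7, 6]
      print(butterfly_reorder(lut, [0, 0, 1]))   # swap bit-2 partners: [4, 5, 6, 7, 0, 1, 2, 3]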
  • Patent number: 11803385
    Abstract: An array processor includes processor element arrays (PEAs) distributed in rows and columns. The PEAs are configured to perform operations on parameter values. A first sequencer receives a first direct memory access (DMA) instruction that includes a request to read data from at least one address in memory. A texture address (TA) engine requests the data from the memory based on the at least one address, and a texture data (TD) engine provides the data to the PEAs. The PEAs provide first synchronization signals to the TD engine to indicate availability of registers for receiving the data. The TD engine provides second synchronization signals to the first sequencer in response to receiving acknowledgments that the PEAs have consumed the data. (See the code sketch following this entry.)
    Type: Grant
    Filed: December 10, 2021
    Date of Patent: October 31, 2023
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Sateesh Lagudu, Arun Vaidyanathan Ananthanarayan, Michael Mantor, Allen H. Rush
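    Code sketch: a thread-and-event model of the two synchronization signals above; the components are illustrative stand-ins, not AMD's hardware interfaces.

      import queue, threading

      reg_available = threading.Event()   # PEA -> TD engine: registers free to receive data
      data_consumed = threading.Event()   # PEA acknowledgment, relayed toward the sequencer
      mailbox = queue.Queue(maxsize=1)

      def td_engine(data):
          reg_available.wait()    # first synchronization signal
          mailbox.put(data)       # deliver the fetched data to the PEA
          data_consumed.wait()    # second synchronization signal
          print("sequencer notified: data consumed")

      def pea():
          reg_available.set()     # advertise register availability
          print("PEA consumed:", mailbox.get())
          data_consumed.set()     # acknowledge consumption

      t1 = threading.Thread(target=td_engine, args=([1.0, 2.0, 3.0],))
      t2 = threading.Thread(target=pea)
      t1.start(); t2.start(); t1.join(); t2.join()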