Patents Examined by Corey S Faherty
-
Patent number: 11907723
Abstract: A data processing apparatus is provided. Rename circuitry performs a register rename stage of a pipeline by storing, in storage circuitry, mappings between registers. Each of the mappings is associated with an elimination field value. Operation elimination circuitry replaces an operation that indicates an action is to be performed on data from a source register and stored in a destination register, with a new mapping in the storage circuitry that references the destination register and has the elimination field value set. Operation circuitry responds to a subsequent operation that accesses the destination register when the elimination field value is set by obtaining the contents of the source register, performing the action on those contents to obtain a result, and returning the result.
Type: Grant
Filed: March 21, 2022
Date of Patent: February 20, 2024
Assignee: Arm Limited
Inventors: Nicholas Andrew Plante, Joseph Michael Pusdesris, Jungsoo Kim
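A minimal Python sketch of the lazy-elimination idea described above (class and method names are hypothetical, not the patented circuitry):

```python
# Hypothetical sketch of operation elimination in a rename table.
# An op "dst = f(src)" is not executed immediately; instead dst is mapped
# to src with an elimination flag and the pending action f. A later read
# of dst performs f on src's contents at that point.

class RenameTable:
    def __init__(self):
        self.regs = {}      # register name -> value
        self.mapping = {}   # dst -> (src, action) for eliminated ops

    def write(self, reg, value):
        self.mapping.pop(reg, None)   # a real write cancels any pending elimination
        self.regs[reg] = value

    def eliminate_op(self, dst, src, action):
        # Record a mapping instead of executing the op now.
        self.mapping[dst] = (src, action)

    def read(self, reg):
        if reg in self.mapping:       # elimination field set: do the action lazily
            src, action = self.mapping[reg]
            return action(self.regs[src])
        return self.regs[reg]
```

For example, `eliminate_op('r2', 'r1', lambda x: x + 1)` records the increment without executing it; a later `read('r2')` applies it to `r1`'s current contents.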
-
Patent number: 11907717
Abstract: A technique for block data transfer is disclosed that reduces data transfer and memory access overheads and significantly reduces multiprocessor activity and energy consumption. Threads executing on a multiprocessor that need data stored in global memory can request and store the needed data in on-chip shared memory, which can be accessed by the threads multiple times. The data can be loaded from global memory and stored in shared memory using an instruction which directs the data into the shared memory without storing the data in registers and/or cache memory of the multiprocessor during the data transfer.
Type: Grant
Filed: February 8, 2023
Date of Patent: February 20, 2024
Assignee: NVIDIA Corporation
Inventors: Andrew Kerr, Jack Choquette, Xiaogang Qiu, Omkar Paranjape, Poornachandra Rao, Shirish Gadre, Steven J. Heinrich, Manan Patel, Olivier Giroux, Alan Kaatz
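The register-bypassing load can be illustrated with a toy Python model (the class and counters are invented for illustration, not NVIDIA's implementation):

```python
# Toy model of block data transfer: one load path stages each element
# through a per-thread register before shared memory; the other moves
# global-memory data straight into shared memory, avoiding that traffic.

class Multiprocessor:
    def __init__(self, global_mem):
        self.global_mem = global_mem
        self.shared = {}
        self.register_writes = 0   # counts per-thread register traffic

    def load_via_registers(self, addr):
        value = self.global_mem[addr]
        self.register_writes += 1            # staged in a register first
        self.shared[addr] = value

    def load_direct_to_shared(self, addr):
        self.shared[addr] = self.global_mem[addr]   # bypasses registers
```

Both paths end with the data in shared memory; the direct path simply never touches the register-traffic counter.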
-
Patent number: 11907720
Abstract: There is provided a data processing apparatus comprising a plurality of registers, each of the registers having data bits to store data and metadata bits to store metadata. Each of the registers is adapted to operate in a metadata mode in which the metadata bits and the data bits are valid, and a data mode in which the data bits are valid and the metadata bits are invalid. Mode bit storage circuitry indicates whether each of the registers is in the data mode or the metadata mode. Execution circuitry is responsive to a memory operation that is a store operation on one or more given registers.
Type: Grant
Filed: November 26, 2020
Date of Patent: February 20, 2024
Assignee: Arm Limited
Inventors: Bradley John Smith, Thomas Christopher Grocutt
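A small Python sketch of the two register modes described above (names are illustrative):

```python
# A register with data bits, metadata bits, and a mode bit saying whether
# the metadata bits are currently valid.

class TaggedRegister:
    def __init__(self):
        self.data = 0
        self.metadata = 0
        self.metadata_mode = False   # False: data mode, metadata invalid

    def store_data(self, value):
        self.data = value
        self.metadata_mode = False   # a plain data write invalidates metadata

    def store_with_metadata(self, value, meta):
        self.data, self.metadata = value, meta
        self.metadata_mode = True

    def load_metadata(self):
        if not self.metadata_mode:
            raise ValueError("metadata bits are invalid in data mode")
        return self.metadata
```

The mode bit here plays the role of the mode bit storage circuitry: it records, per register, which of the two modes is in effect.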
-
Patent number: 11907160
Abstract: This disclosure relates to a distributed processing system for configuring multiple processing channels. The distributed processing system includes a main processor, such as an ARM processor, communicatively coupled to a plurality of co-processors, such as stream processors. The co-processors can execute instructions in parallel with each other and interrupt the ARM processor. Longer latency instructions can be executed by the main processor and lower latency instructions can be executed by the co-processors. There are several ways that a stream can be triggered in the distributed processing system. In an embodiment, the distributed processing system is a stream processor system that includes an ARM processor and stream processors configured to access different register sets. The stream processors can include a main stream processor and stream processors in respective transmit and receive channels. The stream processor system can be implemented in a radio system to configure the radio for operation.
Type: Grant
Filed: August 5, 2022
Date of Patent: February 20, 2024
Assignee: Analog Devices, Inc.
Inventors: Manish J. Manglani, Shipra Bhal, Christopher Mayer
-
Patent number: 11900174
Abstract: Techniques are disclosed for processing unit virtualization with scalable over-provisioning in an information processing system. For example, the method accesses a data structure that maps a correspondence between a plurality of virtualized processing units and a plurality of abstracted processing units, wherein the plurality of abstracted processing units are configured to decouple an allocation decision from the plurality of virtualized processing units, and further wherein at least one of the virtualized processing units is mapped to multiple ones of the abstracted processing units. The method allocates one or more virtualized processing units to execute a given application by allocating one or more abstracted processing units identified from the data structure. The method also enables migration of one or more virtualized processing units across the system.
Type: Grant
Filed: June 22, 2022
Date of Patent: February 13, 2024
Assignee: Dell Products L.P.
Inventors: Anzhou Hou, Zhen Jia, Qiang Chen, Victor Fong, Michael Robillard
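The indirection the abstract describes can be sketched in Python (a simple table-based mapper; the names are made up for illustration):

```python
# Over-provisioning via an indirection table: each virtualized unit maps
# to one or more abstracted units, and allocation hands out abstracted
# units, decoupling the decision from the virtualized layer.

class UnitMapper:
    def __init__(self):
        self.table = {}   # virtualized unit -> abstracted units it maps to

    def map_unit(self, vunit, abstracted):
        self.table[vunit] = list(abstracted)

    def allocate(self, count):
        # Allocation is expressed in abstracted units drawn from the table.
        pool = [a for units in self.table.values() for a in units]
        if len(pool) < count:
            raise RuntimeError("not enough abstracted units")
        return pool[:count]
```

Because one virtualized unit may map to several abstracted units, the pool can be larger than the number of virtualized units, which is the over-provisioning.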
-
Patent number: 11900123
Abstract: A system includes a processing unit such as a GPU that itself includes a command processor configured to receive instructions for execution from a software application. A processor pipeline coupled to the processing unit includes a set of parallel processing units for executing the instructions in sets. A set manager is coupled to one or more of the processor pipeline and the command processor. The set manager includes at least one table for storing a set start time, a set end time, and a set execution time. The set manager determines an execution time for one or more sets of instructions of a first window of sets of instructions submitted to the processor pipeline. Based on the execution time of the one or more sets of instructions, a set limit is determined and applied to one or more sets of instructions of a second window subsequent to the first window.
Type: Grant
Filed: December 13, 2019
Date of Patent: February 13, 2024
Assignee: Advanced Micro Devices, Inc.
Inventors: Alexander Fuad Ashkar, Manu Rastogi, Harry J. Wise
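A toy Python version of the set manager's timing table (the mean-as-limit policy below is an assumption for illustration, not the patented policy):

```python
# Record start/end times for instruction sets in one window, then derive
# a limit to apply to sets in the next window.

class SetManager:
    def __init__(self):
        self.times = {}   # set id -> (start, end)

    def record(self, set_id, start, end):
        self.times[set_id] = (start, end)

    def execution_time(self, set_id):
        start, end = self.times[set_id]
        return end - start

    def set_limit(self):
        # Illustrative policy: cap the next window at the mean set time.
        durations = [end - start for start, end in self.times.values()]
        return sum(durations) / len(durations)
```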
-
Patent number: 11900107
Abstract: Disclosed embodiments relate to instructions for fused multiply-add (FMA) operations with variable-precision inputs. In one example, a processor to execute an asymmetric FMA instruction includes fetch circuitry to fetch an FMA instruction having fields to specify an opcode, a destination, and first and second source vectors having first and second widths, respectively, decode circuitry to decode the fetched FMA instruction, and a single instruction multiple data (SIMD) execution circuit to process as many elements of the second source vector as fit into an SIMD lane width by multiplying each element by a corresponding element of the first source vector, and accumulating a resulting product with previous contents of the destination, wherein the SIMD lane width is one of 16 bits, 32 bits, and 64 bits, the first width is one of 4 bits and 8 bits, and the second width is one of 1 bit, 2 bits, and 4 bits.
Type: Grant
Filed: March 25, 2022
Date of Patent: February 13, 2024
Assignee: Intel Corporation
Inventors: Dipankar Das, Naveen K. Mellempudi, Mrinmay Dutta, Arun Kumar, Dheevatsa Mudigere, Abhisek Kundu
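A scalar Python sketch of the asymmetric FMA loop, with widths as plain integers rather than packed bit fields (the function name and defaults are illustrative):

```python
# Process as many second-source elements as fit in one SIMD lane,
# multiply each by the matching first-source element, and accumulate
# into the destination.

def asymmetric_fma(dest, src1, src2, lane_width=32, width2=2):
    per_lane = lane_width // width2      # elements of src2 per lane
    for i in range(min(per_lane, len(src1), len(src2))):
        dest += src1[i] * src2[i]
    return dest
```

With a 32-bit lane and 2-bit second-source elements, up to 16 products are accumulated per lane; narrower second widths pack more elements per lane.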
-
Patent number: 11893392
Abstract: A method for processing floating point operations in a multi-processor system including a plurality of single processor cores is provided. In this method, upon receiving a group setting for performing an operation, the plurality of single processor cores are grouped into at least one group according to the group setting, and a single processor core set as a master in the group loads an instruction for performing the operation from an external memory, and performs parallel operations by utilizing floating point units (FPUs) of all single processor cores in the group according to the instruction.
Type: Grant
Filed: November 30, 2021
Date of Patent: February 6, 2024
Assignee: ELECTRONICS AND TELECOMMUNICATIONS RESEARCH INSTITUTE
Inventors: Ju-Yeob Kim, Jin Ho Han
-
Patent number: 11868804
Abstract: A processor comprises a computational array of computational elements and an instruction dispatch circuit. The computational elements receive data operands via data lanes extending along a first dimension, and process the operands based upon instructions received from the instruction dispatch circuit via instruction lanes extending along a second dimension. The instruction dispatch circuit receives raw instructions, and comprises an instruction dispatch unit (IDU) processor that processes a set of raw instructions to generate processed instructions for dispatch to the computational elements, where the number of processed instructions is not equal to the number of instructions in the set of raw instructions.
Type: Grant
Filed: November 18, 2020
Date of Patent: January 9, 2024
Assignee: Groq, Inc.
Inventors: Brian Lee Kurtz, Dinesh Maheshwari, James David Sprach
-
Patent number: 11868782
Abstract: Methods and systems are disclosed using an execution pipeline on a multi-processor platform for deep learning network execution. In one example, a network workload analyzer receives a workload, analyzes a computation distribution of the workload, and groups the network nodes into groups. A network executor assigns each group to a processing core of the multi-core platform so that the respective processing core handles the computation tasks of the received workload for the respective group.
Type: Grant
Filed: August 15, 2022
Date of Patent: January 9, 2024
Assignee: Intel Corporation
Inventors: Liu Yang, Anbang Yao
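The grouping step could look like the greedy load balancer below (an illustrative heuristic; the patent does not specify this algorithm):

```python
# Group network nodes into one group per core so that per-core
# computation load is roughly balanced.

def group_nodes(node_costs, num_cores):
    groups = [[] for _ in range(num_cores)]
    loads = [0] * num_cores
    # Greedy: place each node (heaviest first) on the least-loaded core.
    for node, cost in sorted(node_costs.items(), key=lambda kv: -kv[1]):
        i = loads.index(min(loads))
        groups[i].append(node)
        loads[i] += cost
    return groups
```

Each returned group would then be assigned to one processing core by the network executor.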
-
Patent number: 11868769
Abstract: Deployments of microservices executing in a cloud are automatically managed. Some microservices are deployed on dedicated nodes, others in serverless configurations. Rates of invocation and runtime data of microservices are monitored. Responsive to the monitored rate of invocation of a microservice running serverless exceeding a given threshold, the microservice is automatically redeployed to a dedicated node. A microservice executing on a dedicated node may be redeployed serverless if the infrequency with which it is called is sufficient. Microservices can be automatically redeployed between different dedicated nodes with different capacities based on monitored usage. The underlying cloud service provider may be automatically monitored for changes in serverless support functionality. Responsive to these changes, the thresholds at which microservices are redeployed can be automatically adjusted.
Type: Grant
Filed: July 27, 2022
Date of Patent: January 9, 2024
Assignee: PANGEA CYBER CORPORATION, INC.
Inventors: Akshay Dongaonkar, Prashant Pathak, Sourabh Satish
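A minimal Python sketch of the threshold-driven placement decision (the threshold values are invented for illustration):

```python
# Decide whether a microservice should run serverless or on a dedicated
# node, based on its monitored invocation rate and its current placement.

def placement(rate_per_min, to_dedicated=100, to_serverless=10,
              current="serverless"):
    if current == "serverless" and rate_per_min > to_dedicated:
        return "dedicated"       # hot: move to a dedicated node
    if current == "dedicated" and rate_per_min < to_serverless:
        return "serverless"      # cold: move back to serverless
    return current               # otherwise, leave it where it is
```

Using two thresholds with a gap between them avoids flapping between placements when the rate hovers near a single cutoff.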
-
Patent number: 11861366
Abstract: Disclosed in some examples are methods, systems, devices, and machine-readable mediums which provide for more efficient CGRA execution by assigning different initiation intervals to different PEs executing a same code base. The initiation intervals may be a multiple of each other, and the PE with the lowest initiation interval may be used to execute instructions of the code that are to be executed at a greater frequency than other instructions, which may be assigned to PEs with higher initiation intervals.
Type: Grant
Filed: August 11, 2021
Date of Patent: January 2, 2024
Assignee: Micron Technology, Inc.
Inventors: Douglas Vanesko, Tony M. Brewer
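The multiple-of-each-other initiation intervals can be illustrated in Python:

```python
# A PE with initiation interval II starts a new iteration every II cycles,
# so a PE with II=1 fires twice as often as one with II=2. Hot
# instructions go to the lower-II element.

def firings(initiation_interval, cycles):
    return [c for c in range(cycles) if c % initiation_interval == 0]
```

Over the same cycle budget, the II=1 element issues exactly twice as many iterations as the II=2 element, matching the multiple relationship described above.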
-
Patent number: 11847457
Abstract: A master processor is configured to execute a first thread and a second thread designated to run a program in sequence. A slave processor is configured to execute a third thread to run the program in sequence. An instruction fetch compare engine is provided. The first thread initiates a first thread instruction fetch for the program, which is stored in an instruction fetch storage. Retrieved data associated with the fetched first thread instruction is stored in a retrieved data storage. The second thread initiates a second thread instruction fetch for the program. The instruction fetch compare logic compares the second thread instruction fetch for the program with the first thread instruction fetch stored in the instruction fetch storage for a match. When there is a match, the retrieved data associated with the fetched first thread instruction is presented from the retrieved data storage, in response to the second thread instruction fetch.
Type: Grant
Filed: May 31, 2022
Date of Patent: December 19, 2023
Assignee: Ceremorphic, Inc.
Inventors: Heonchul Park, Sri Hari Nemani, Patel Urvishkumar Jayrambhai, Dhruv Maheshkumar Patel
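A toy Python model of the fetch-compare shortcut (names are hypothetical):

```python
# The leading thread's fetch and its data are cached; when the trailing
# thread issues a matching fetch, data is served from the cache instead
# of refetching from memory.

class FetchCompareEngine:
    def __init__(self, memory):
        self.memory = memory
        self.fetch_store = {}    # address -> data from the leading thread
        self.memory_reads = 0

    def fetch(self, addr):
        if addr in self.fetch_store:          # match: reuse stored data
            return self.fetch_store[addr]
        self.memory_reads += 1                # miss: go to memory once
        data = self.memory[addr]
        self.fetch_store[addr] = data
        return data
```

Two threads running the same program in sequence thus cost one memory access per instruction instead of two.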
-
Patent number: 11836497
Abstract: There is provided an operation module, which includes a memory, a register unit, a dependency relationship processing unit, an operation unit, and a control unit. The memory is configured to store a vector, the register unit is configured to store an extension instruction, and the control unit is configured to acquire and parse the extension instruction, so as to obtain a first operation instruction and a second operation instruction. An execution sequence of the first operation instruction and the second operation instruction can be determined, and an input vector of the first operation instruction can be read from the memory. The operation unit is configured to convert an expression mode of the input data index of the first operation instruction and to screen data, and to execute the first and second operation instructions according to the execution sequence, so as to obtain a result of the extension instruction.
Type: Grant
Filed: July 23, 2018
Date of Patent: December 5, 2023
Assignee: SHANGHAI CAMBRICON INFORMATION TECHNOLOGY CO., LTD
Inventors: Bingrui Wang, Shengyuan Zhou, Yao Zhang
-
Patent number: 11836495
Abstract: The present invention provides a method of implementing an ARM64-bit floating point emulator on a Linux system, which includes: running an ARM64-bit instruction on the Linux system; applying an instruction classifier to a first feature code of a machine code indicated by the ARM64-bit instruction to determine whether the ARM64-bit instruction is an ARM64-bit floating point instruction; and, if the ARM64-bit instruction is an ARM64-bit floating point instruction, applying the instruction classifier to a second feature code of the machine code indicated by the ARM64-bit instruction to determine the ARM64-bit floating point instruction to be a specific ARM64-bit floating point instruction.
Type: Grant
Filed: May 4, 2022
Date of Patent: December 5, 2023
Assignee: AIROHA TECHNOLOGY (SUZHOU) LIMITED
Inventors: Fei Yan, Peng Du
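The two-stage feature-code test can be sketched in Python; note the masks and encodings below are invented for illustration, not real AArch64 encodings:

```python
# Stage 1: a mask/value test decides whether a 32-bit word is a
# floating-point instruction at all. Stage 2: a second feature code
# narrows it to a specific instruction.

FP_MASK, FP_VALUE = 0x0F000000, 0x0E000000           # made-up encodings
SPECIFIC = {0x0E201000: "fadd", 0x0E203000: "fmul"}  # made-up encodings

def classify(word):
    if word & FP_MASK != FP_VALUE:                  # first feature code
        return None                                 # not a FP instruction
    return SPECIFIC.get(word & 0x0FFFF000, "unknown-fp")  # second feature code
```

An emulator would then dispatch to a software implementation of the specific instruction that `classify` returns.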
-
Patent number: 11822959
Abstract: Methods and systems for processing requests with load-dependent throttling. The system compares a count of active job requests being currently processed for a user associated with a new job request with an active job cap number for that user. When the count of active job requests being currently processed for that user does not exceed the active job cap number specific to that user, the job request is added to an active job queue for processing. However, when the count of active job requests being currently processed for that user exceeds the active job cap number, the job request is placed on a throttled queue to await later processing when an updated count of active job requests being currently processed for that user is below the active job cap number. Once the count is below the cap, the throttled request is moved to the active job queue for processing.
Type: Grant
Filed: February 18, 2022
Date of Patent: November 21, 2023
Assignee: Shopify Inc.
Inventors: Robert Mic, Aline Fatima Manera, Timothy Willard, Nicole Simone, Scott Weber
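A small Python sketch of the two-queue throttling policy (per-user bookkeeping is omitted to keep the example short):

```python
# New jobs go to the active queue only while the active count is under
# the cap; otherwise they wait on a throttled queue and are promoted as
# active jobs finish.

from collections import deque

class Throttler:
    def __init__(self, cap):
        self.cap = cap
        self.active = []
        self.throttled = deque()

    def submit(self, job):
        if len(self.active) < self.cap:
            self.active.append(job)
        else:
            self.throttled.append(job)       # over the cap: wait

    def finish(self, job):
        self.active.remove(job)
        if self.throttled and len(self.active) < self.cap:
            self.active.append(self.throttled.popleft())   # promote
```

In the patented system the cap and counts are tracked per user; this sketch shows a single user's queues.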
-
Patent number: 11809867
Abstract: An apparatus and method for performing dual concurrent multiplications of packed data elements.
Type: Grant
Filed: September 21, 2020
Date of Patent: November 7, 2023
Assignee: Intel Corporation
Inventors: Venkateswara Madduri, Elmoustapha Ould-Ahmed-Vall, Jesus Corbal, Mark Charney, Robert Valentine, Binwei Yang
-
Patent number: 11809868
Abstract: Systems, apparatuses, and methods related to bit string operations using a computing tile are described. An example apparatus includes a computing device (or "tile") that includes a processing unit and a memory resource configured as a cache for the processing unit. A data structure can be coupled to the computing device. The data structure can be configured to receive a bit string that represents a result of an arithmetic operation, a logical operation, or both, and store the bit string that represents the result of the arithmetic operation, the logical operation, or both. The bit string can be formatted in a format different from a floating-point format.
Type: Grant
Filed: January 21, 2022
Date of Patent: November 7, 2023
Assignee: Micron Technology, Inc.
Inventor: Vijay S. Ramesh
-
Patent number: 11803382
Abstract: A digital data processor includes a multi-stage butterfly network, which is configured to, in response to a look up table read instruction, receive look up table data from an intermediate register, reorder the look up table data based on control signals comprising look up table configuration register data, and write the reordered look up table data to a destination register specified by the look up table read instruction.
Type: Grant
Filed: September 2, 2022
Date of Patent: October 31, 2023
Assignee: Texas Instruments Incorporated
Inventors: Naveen Bhoria, Duc Bui, Dheera Balasubramanian Samudrala, Rama Venkatasubramanian
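A pure-Python sketch of a log2(N)-stage butterfly permutation driven by per-stage control bits (the control-bit layout is an assumption for illustration):

```python
# At stage s, element i may swap with element i XOR (1 << s). One control
# bit per pair per stage decides whether that pair swaps, so the network
# can realize many reorderings of the input data.

def butterfly(data, controls):
    # controls[s][p] == 1 means: at stage s, swap the p-th pair.
    n = len(data)
    out = list(data)
    dist, stage = 1, 0
    while dist < n:
        pair = 0
        for i in range(n):
            j = i ^ dist
            if i < j:                         # visit each pair once
                if controls[stage][pair]:
                    out[i], out[j] = out[j], out[i]
                pair += 1
        dist <<= 1
        stage += 1
    return out
```

In the processor described above, the control bits would come from the look up table configuration register, and the reordered data would be written to the destination register.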
-
Patent number: 11803385
Abstract: An array processor includes processor element arrays (PEAs) distributed in rows and columns. The PEAs are configured to perform operations on parameter values. A first sequencer receives a first direct memory access (DMA) instruction that includes a request to read data from at least one address in memory. A texture address (TA) engine requests the data from the memory based on the at least one address, and a texture data (TD) engine provides the data to the PEAs. The PEAs provide first synchronization signals to the TD engine to indicate availability of registers for receiving the data. The TD engine provides second synchronization signals to the first sequencer in response to receiving acknowledgments that the PEAs have consumed the data.
Type: Grant
Filed: December 10, 2021
Date of Patent: October 31, 2023
Assignee: Advanced Micro Devices, Inc.
Inventors: Sateesh Lagudu, Arun Vaidyanathan Ananthanarayan, Michael Mantor, Allen H. Rush