Single Instruction, Multiple Data (SIMD) Patents (Class 712/22)
  • Patent number: 11327752
    Abstract: A data processing apparatus, a method of operating a data processing apparatus, a non-transitory computer readable storage medium, and an instruction are provided. The instruction specifies a first source register, a second source register, and an index. In response to the instruction, control signals are generated, causing processing circuitry to perform a data processing operation with respect to each data group in the first source register and the second source register to generate respective result data groups forming a result of the data processing operation. Each of the first source register and the second source register has a size which is an integer multiple, at least twice, of a predefined size of the data group, and each data group comprises a plurality of data elements. The operands of the data processing operation for each data group are a data element selected from the data group of the first source register by the index and each data element in the data group of the second source register.
    Type: Grant
    Filed: February 2, 2018
    Date of Patent: May 10, 2022
    Assignee: ARM LIMITED
    Inventors: Grigorios Magklis, Nigel John Stephens, Jacob Eapen, Mbou Eyole, David Hennah Mansell
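A minimal Python model of the indexed-element operation described in the abstract above; the multiply operation, 4-element data groups, and 8-element registers are illustrative assumptions (the patent covers arbitrary group sizes and operations):

```python
import numpy as np

# Illustrative model: each register splits into fixed-size data groups;
# within every group, the element of the first source selected by the
# index is applied against each element of the second source.
def indexed_group_op(src1, src2, index, group_size, op=np.multiply):
    g1 = src1.reshape(-1, group_size)          # one row per data group
    g2 = src2.reshape(-1, group_size)
    selected = g1[:, index][:, np.newaxis]     # broadcast selected element
    return op(selected, g2).reshape(src1.shape)

a = np.arange(8, dtype=np.int32)               # two groups of four
b = np.full(8, 2, dtype=np.int32)
print(indexed_group_op(a, b, index=1, group_size=4))
# [2 2 2 2 10 10 10 10]: group 0 uses a[1]=1, group 1 uses a[5]=5
```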
  • Patent number: 11221851
    Abstract: Embodiments of the present disclosure provide a method, executed by a computing device, for configuring a vector operation, as well as an apparatus, a device, and a storage medium. The method includes obtaining information indicating at least one configurable vector operation parameter, the information indicating a type and a value of the configurable vector operation parameter. The method further includes, based on the type and the value of the configurable vector operation parameter, configuring multiple vector operation circuits to enable each of the vector operation circuits to execute a target vector operation that includes two or more basic vector operations and is defined based on the type and value of the configurable vector operation parameter.
    Type: Grant
    Filed: July 23, 2020
    Date of Patent: January 11, 2022
    Assignees: BEIJING BAIDU NETCOM SCIENCE AND TECHNOLOGY CO., LTD., KUNLUNXIN TECHNOLOGY (BEIJING) COMPANY LIMITED
    Inventors: Huimin Li, Peng Wu, Jian Ouyang
  • Patent number: 11182170
    Abstract: A processor having a SIMD architecture, including an array of elementary processors, each elementary processor being associated with an elementary memory cell, a central controller connected to the elementary processors by an instruction bus and a status bus. The central controller transmits a sequence of instructions in a loop, each instruction including a calculation flow indicator. Each elementary processor has an instruction filter that makes it possible to reject or take into account an instruction depending on the identifier it contains. This operating mode makes it possible to emulate a MIMD processor on a SIMD architecture.
    Type: Grant
    Filed: June 6, 2019
    Date of Patent: November 23, 2021
    Assignee: COMMISSARIAT A L'ENERGIE ATOMIQUE ET AUX ENERGIES ALTERNATIVES
    Inventors: Stéphane Chevobbe, Marc Duranton
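A toy Python sketch of the instruction-filter idea above: one instruction loop is broadcast, and each elementary processor executes only the instructions tagged with its assigned flow identifier. All names and the two-flow structure are assumptions for illustration.

```python
# Each broadcast instruction carries a calculation-flow identifier; a
# processing element's filter accepts only instructions for its own flow,
# so different elements follow different programs on SIMD hardware.
instructions = [
    ("flowA", lambda x: x + 1),
    ("flowB", lambda x: x * 2),
    ("flowA", lambda x: x * 3),
]

class ElementaryProcessor:
    def __init__(self, flow_id, value):
        self.flow_id, self.value = flow_id, value

    def step(self, flow_id, op):
        if flow_id == self.flow_id:            # the instruction filter
            self.value = op(self.value)

pes = [ElementaryProcessor("flowA", 1), ElementaryProcessor("flowB", 1)]
for flow_id, op in instructions:               # controller loop broadcast
    for pe in pes:
        pe.step(flow_id, op)
print([pe.value for pe in pes])                # [6, 2]
```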
  • Patent number: 11126587
    Abstract: Representative apparatus, method, and system embodiments are disclosed for a self-scheduling processor which also provides additional functionality. Representative embodiments include a self-scheduling processor, comprising: a processor core adapted to execute a received instruction; and a core control circuit adapted to automatically schedule an instruction for execution by the processor core in response to a received work descriptor data packet. In another embodiment, the core control circuit is also adapted to schedule a fiber create instruction for execution by the processor core, to reserve a predetermined amount of memory space in a thread control memory to store return arguments, and to generate one or more work descriptor data packets to another processor or hybrid threading fabric circuit for execution of a corresponding plurality of execution threads. Event processing, data path management, system calls, memory requests, and other new instructions are also disclosed.
    Type: Grant
    Filed: April 30, 2019
    Date of Patent: September 21, 2021
    Assignee: Micron Technology, Inc.
    Inventor: Tony M. Brewer
  • Patent number: 11119782
    Abstract: Representative apparatus, method, and system embodiments are disclosed for a self-scheduling processor which also provides additional functionality. Representative embodiments include a self-scheduling processor, comprising: a processor core adapted to execute a received instruction; and a core control circuit adapted to automatically schedule an instruction for execution by the processor core in response to a received work descriptor data packet. In another embodiment, the core control circuit is also adapted to schedule a fiber create instruction for execution by the processor core, to reserve a predetermined amount of memory space in a thread control memory to store return arguments, and to generate one or more work descriptor data packets to another processor or hybrid threading fabric circuit for execution of a corresponding plurality of execution threads. Event processing, data path management, system calls, memory requests, and other new instructions are also disclosed.
    Type: Grant
    Filed: April 30, 2019
    Date of Patent: September 14, 2021
    Assignee: Micron Technology, Inc.
    Inventor: Tony M. Brewer
  • Patent number: 11119972
    Abstract: Representative apparatus, method, and system embodiments are disclosed for a self-scheduling processor which also provides additional functionality. Representative embodiments include a self-scheduling processor, comprising: a processor core adapted to execute a received instruction; and a core control circuit adapted to automatically schedule an instruction for execution by the processor core in response to a received work descriptor data packet. In another embodiment, the core control circuit is also adapted to schedule a fiber create instruction for execution by the processor core, to reserve a predetermined amount of memory space in a thread control memory to store return arguments, and to generate one or more work descriptor data packets to another processor or hybrid threading fabric circuit for execution of a corresponding plurality of execution threads. Event processing, data path management, system calls, memory requests, and other new instructions are also disclosed.
    Type: Grant
    Filed: April 30, 2019
    Date of Patent: September 14, 2021
    Assignee: Micron Technology, Inc.
    Inventor: Tony M. Brewer
  • Patent number: 11113061
    Abstract: Described herein are techniques for saving registers in the event of a function call. The techniques include modifying a program that includes a block of code designated as calling code that calls a function. The modifying includes modifying the calling code to set a register usage mask indicating which registers are in use at the time of the function call. The modifying also includes modifying the function to combine the information of the register usage mask with information indicating the registers used by the function, in order to determine the set of registers to be saved, and to save those registers.
    Type: Grant
    Filed: September 26, 2019
    Date of Patent: September 7, 2021
    Assignee: Advanced Micro Devices, Inc.
    Inventor: Michael John Bedy
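The mask combination described above reduces to a bitwise AND; a hedged sketch, with register numbering and mask values invented for illustration:

```python
# Registers live in the caller AND clobbered by the callee are the only
# ones the callee must spill and restore around the call.
caller_live   = 0b1011_0110   # register usage mask set at the call site
callee_writes = 0b0110_0011   # registers the called function uses
to_save = caller_live & callee_writes
print(f"save/restore mask: {to_save:#010b}")   # 0b00100010
```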
  • Patent number: 11074213
    Abstract: Systems, methods, and apparatuses relating to a vector processor architecture having an array of identical circuit blocks are described.
    Type: Grant
    Filed: June 29, 2019
    Date of Patent: July 27, 2021
    Assignee: Intel Corporation
    Inventors: Joseph Williams, Jay O'Neill, Jeroen Leijten, Harm Peters, Eugene Scuteri
  • Patent number: 11062680
    Abstract: Systems, apparatuses, and methods for implementing raster order view enforcement techniques are disclosed. A processor includes a plurality of compute units coupled to one or more memories. A plurality of waves are launched in parallel for execution on the plurality of compute units, where each wave comprises a plurality of threads. A dependency chain is generated for each wave of the plurality of waves. The compute units wait for all older waves to complete dependency chain generation prior to executing any threads with dependencies. Responsive to all older waves completing dependency chain generation, a given thread with a dependency is executed only if all other threads upon which the given thread is dependent have become inactive. When executed, the plurality of waves generate a plurality of pixels to be driven to a display.
    Type: Grant
    Filed: December 20, 2018
    Date of Patent: July 13, 2021
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Pazhani Pillai, Christopher J. Brennan
  • Patent number: 11061766
    Abstract: Examples disclosed herein relate to a fault-tolerant dot product engine. The fault-tolerant dot product engine has a crossbar array having a number l of row lines and a number n of column lines intersecting the row lines to form l×n memory locations, with each memory location having a programmable memristive element and defining a matrix value. A number l of digital-to-analog converters are coupled to the row lines of the crossbar array to receive an input signal and a number n of analog-to-digital converters are coupled to the column lines of the crossbar array to generate an output signal. The output signal is a dot product of the input signal and the matrix values in the crossbar array, wherein a number m<n of the n column lines in the crossbar array are programmed with matrix values used to detect errors in the output signal.
    Type: Grant
    Filed: December 12, 2019
    Date of Patent: July 13, 2021
    Assignee: Hewlett Packard Enterprise Development LP
    Inventors: Ron M. Roth, Richard H. Henze
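A numerical model of the fault-tolerant scheme above, assuming a single checksum column (m=1) whose values are the row sums of the data columns; a real design would use analog conversion and more general error-detecting codes:

```python
import numpy as np

# The crossbar computes y = x @ G in one step; the extra column stores a
# checksum of the data columns, so output errors can be detected by
# comparing a recomputed checksum with the checksum column's output.
l, n_data = 4, 3
G_data = np.random.randint(0, 4, size=(l, n_data))
G_check = G_data.sum(axis=1, keepdims=True)    # m = 1 checksum column
G = np.hstack([G_data, G_check])               # programmed crossbar

x = np.random.randint(0, 4, size=l)            # input via DACs (modeled)
y = x @ G                                      # dot product via ADCs (modeled)
consistent = y[:n_data].sum() == y[n_data]
print(y, "consistent" if consistent else "error detected")
```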
  • Patent number: 11055097
    Abstract: One embodiment of the present invention includes techniques to decrease power consumption by reducing the number of redundant operations performed. In operation, a streaming multiprocessor (SM) identifies uniform groups of threads that, when executed, apply the same deterministic operation to uniform sets of input operands. Within each uniform group of threads, the SM designates one thread as the anchor thread. The SM disables execution units assigned to all of the threads except the anchor thread. The anchor execution unit, assigned to the anchor thread, executes the operation on the uniform set of input operands. Subsequently, the SM sets the outputs of the non-anchor threads included in the uniform group of threads to equal the value of the anchor execution unit output.
    Type: Grant
    Filed: October 8, 2013
    Date of Patent: July 6, 2021
    Assignee: NVIDIA Corporation
    Inventors: Gary M. Tarolli, John H. Edmondson, John Matthew Burgess, Robert Ohannessian
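A schematic Python version of the anchor-thread optimization above; uniformity detection and result broadcasting are modeled naively, and the function names are assumptions:

```python
# If every lane in a group carries identical operands for a deterministic
# operation, run only the anchor lane and copy its output to the rest.
def execute_group(operands, op):
    if all(o == operands[0] for o in operands):   # uniform group
        anchor_result = op(operands[0])           # anchor lane only
        return [anchor_result] * len(operands)    # non-anchor lanes copy
    return [op(o) for o in operands]              # divergent: all lanes run

print(execute_group([7, 7, 7, 7], lambda x: x * x))   # one multiply, not four
```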
  • Patent number: 11048512
    Abstract: An apparatus and method for processing non-maskable interrupt source information. For example, one embodiment of a processor comprises: a plurality of cores comprising execution circuitry to execute instructions and process data; local interrupt circuitry comprising a plurality of registers to store interrupt-related data including non-maskable interrupt (NMI) data related to a first NMI; and non-maskable interrupt (NMI) processing mode selection circuitry, responsive to a request, to select between at least two NMI processing modes to process the first NMI including: a first NMI processing mode in which the plurality of registers are to store first data related to a first NMI, wherein no NMI source information related to a source of the NMI is included in the first data, and a second NMI processing mode in which the plurality of registers are to store both the first data related to the first NMI and second data comprising NMI source information indicating the NMI source.
    Type: Grant
    Filed: March 28, 2020
    Date of Patent: June 29, 2021
    Assignee: Intel Corporation
    Inventors: Ashok Raj, Andreas Kleen, Gilbert Neiger, Beeman Strong, Jason Brandt, Rupin Vakharwala, Jeff Huxel, Larisa Novakovsky, Ido Ouziel, Sarathy Jayakumar
  • Patent number: 11016706
    Abstract: An example apparatus includes a processing in memory (PIM) capable device having an array of memory cells and sensing circuitry coupled to the array, where the sensing circuitry includes a sense amplifier and a compute component. The PIM capable device includes timing circuitry selectably coupled to the sensing circuitry. The timing circuitry is configured to control timing of performance of compute operations performed using the sensing circuitry. The PIM capable device also includes a sequencer selectably coupled to the timing circuitry. The sequencer is configured to coordinate the compute operations. The apparatus also includes a source external to the PIM capable device. The sequencer is configured to receive a command instruction set from the source to initiate performance of a compute operation.
    Type: Grant
    Filed: June 13, 2019
    Date of Patent: May 25, 2021
    Assignee: Micron Technology, Inc.
    Inventors: Perry V. Lea, Timothy P. Finkbeiner
  • Patent number: 11003455
    Abstract: According to one embodiment, a processor includes an instruction decoder to decode a first instruction to gather data elements from memory, the first instruction having a first operand specifying a first storage location and a second operand specifying a first memory address storing a plurality of data elements. The processor further includes an execution unit coupled to the instruction decoder, in response to the first instruction, to read a first and a second of the data elements contiguously from a memory location based on the first memory address indicated by the second operand, and to store the first data element in a first entry of the first storage location and the second data element in a second entry of a second storage location corresponding to the first entry of the first storage location.
    Type: Grant
    Filed: April 29, 2019
    Date of Patent: May 11, 2021
    Assignee: Intel Corporation
    Inventors: Andrew T. Forsyth, Brian J. Hickmann, Jonathan C. Hall, Christopher J. Hughes
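A NumPy sketch of the behavior above: contiguous element pairs are read from one base address and de-interleaved into two destination registers. Array names are illustrative.

```python
import numpy as np

# Pairs are read contiguously from memory; the first element of each pair
# lands in register 0, the second in the matching entry of register 1.
memory = np.arange(8)                 # [x0, y0, x1, y1, x2, y2, x3, y3]
reg0, reg1 = memory[0::2], memory[1::2]
print(reg0, reg1)                     # [0 2 4 6] [1 3 5 7]
```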
  • Patent number: 11003447
    Abstract: A data processing system (2) supports vector processing operations performed upon vector operands comprising a plurality of vector operand elements. The data processing system includes a processor (4) having an instruction decoder (14) which decodes mixed-element-sized vector arithmetic instructions to generate control signals (16) which control processing circuitry (18) to perform arithmetic operations upon a first vector of first source operand elements ai of a first bit size A, and a second vector of second source operand elements bj of a second bit size B. The second bit size B is greater than the first bit size A.
    Type: Grant
    Filed: June 23, 2016
    Date of Patent: May 11, 2021
    Assignee: ARM Limited
    Inventor: Nigel John Stephens
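One plausible instance of the mixed-element-size arithmetic above, assuming B = 2A and a pairwise widening accumulate; this pairing is an assumption, not the only operation the patent covers:

```python
import numpy as np

# 8-bit elements of the first source are widened and accumulated pairwise
# into 16-bit elements of the second source (B = 16 > A = 8).
a = np.array([1, 2, 3, 4], dtype=np.int8)    # first bit size A
b = np.array([100, 200], dtype=np.int16)     # second bit size B
result = b + a.astype(np.int16).reshape(-1, 2).sum(axis=1)
print(result)                                 # [103 207]
```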
  • Patent number: 10990341
    Abstract: Disclosed are a display apparatus, a method of controlling the same, and a recording medium thereof, the display apparatus including: a display comprising a plurality of light source modules arrayed like tiles and mounted with a plurality of light emitting elements; an image processor configured to output a signal for displaying an image on a predetermined area of the display, the signal comprising image data and identification information about at least one light source module corresponding to the predetermined area; and a driver configured to selectively drive the at least one light source module corresponding to the identification information among the plurality of light source modules, based on the image data.
    Type: Grant
    Filed: September 3, 2019
    Date of Patent: April 27, 2021
    Assignee: SAMSUNG ELECTRONICS CO., LTD.
    Inventors: Sangkyun Im, Joowhan Lee
  • Patent number: 10990394
    Abstract: An integrated circuit may include a mixed instruction multiple data (xIMD) computing system. The xIMD computing system may include a plurality of data processors, each data processor representative of a lane of a single instruction multiple data (SIMD) computing system, wherein the plurality of data processors are configured to use a first dominant lane for instruction execution and to fork a second dominant lane when a data dependency instruction that does not share a taken/not-taken state with the first dominant lane is encountered during execution of a program by the xIMD computing system.
    Type: Grant
    Filed: September 28, 2017
    Date of Patent: April 27, 2021
    Assignee: Intel Corporation
    Inventor: Jeffrey L. Nye
  • Patent number: 10970081
    Abstract: Systems, apparatuses, and methods for implementing a decoupled crossbar for a stream processor are disclosed. In one embodiment, a system includes at least a multi-lane execution pipeline, a vector register file, and a crossbar. The system is configured to determine if a given instruction in an instruction stream requires a permutation on data operands retrieved from the vector register file. The system conveys the data operands to the multi-lane execution pipeline on a first path which includes the crossbar responsive to determining the given instruction requires a permutation on the data operands. The crossbar then performs the necessary permutation to route the data operands to the proper processing lanes. Otherwise, the system conveys the data operands to the multi-lane execution pipeline on a second path which bypasses the crossbar responsive to determining the given instruction does not require a permutation on the input operands.
    Type: Grant
    Filed: June 29, 2017
    Date of Patent: April 6, 2021
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Jiasheng Chen, Bin He, Mohammad Reza Hakami, Timothy Lottes, Justin David Smith, Michael J. Mantor, Derek Carson
  • Patent number: 10929503
    Abstract: An apparatus and method for a masked multiply instruction to support neural network pruning operations. For example, one embodiment of a processor comprises: a decoder to decode a matrix multiplication with masking (GEMM) instruction identifying a destination matrix register to store a result, and source registers storing an A-matrix, a B-matrix, and a matrix mask; execution circuitry to execute the GEMM instruction, the execution circuitry to multiply a plurality of B-matrix elements with a plurality of A-matrix elements, each of the B-matrix elements associated with a mask value in the matrix mask, wherein if the mask value is set to a first value, then the execution circuitry is to multiply the B-matrix element with one or more of the A-matrix elements to generate a first partial result, and if the mask value is set to a second value, then the execution circuitry is to multiply an alternate B-matrix element with one or more of the A-matrix elements to generate a second partial result.
    Type: Grant
    Filed: December 21, 2018
    Date of Patent: February 23, 2021
    Assignee: Intel Corporation
    Inventors: Omid Azizi, Chen Koren, Nitin Garegrat
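A NumPy approximation of the masked multiply above: where the mask clears a B element, an alternate element is substituted before the multiply, which is one way pruned weights can be repacked. The shapes and substitution layout are assumptions.

```python
import numpy as np

# mask == 1: multiply the original B element; mask == 0: multiply an
# alternate B element instead (e.g., a repacked non-pruned weight).
A = np.random.rand(2, 4)
B = np.random.rand(4, 3)
B_alt = np.random.rand(4, 3)                  # alternate elements
mask = np.random.randint(0, 2, size=(4, 3))
C = A @ np.where(mask == 1, B, B_alt)         # partial results accumulate
print(C.shape)                                # (2, 3)
```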
  • Patent number: 10922086
    Abstract: To perform a reduction operation that combines data values for the threads in a thread group, the data processor performs combining steps that each combine the stored combined data value result of a previous combining operation for a thread with the combined data value result of the previous combining operation for a selected other execution lane that has not yet contributed to the stored combined data value result for the thread. As the other execution lane, the data processor selects an execution lane that has a particular relative position within a group of execution lanes whose values were combined in the previous combining step and that have not yet contributed to the combined data value result for the thread.
    Type: Grant
    Filed: June 15, 2019
    Date of Patent: February 16, 2021
    Assignee: Arm Limited
    Inventor: Kevin Petit
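The lane-selection rule above resembles a butterfly tree reduction; a sketch under that reading, where at each step every lane combines with a partner whose values it has not yet absorbed:

```python
# log2(width) combining steps; after the loop every lane holds the full
# combined value, so the reduction result is available in any lane.
def reduce_across_lanes(values, op=lambda a, b: a + b):
    lanes = list(values)
    offset = 1
    while offset < len(lanes):
        lanes = [op(lanes[i], lanes[i ^ offset]) for i in range(len(lanes))]
        offset *= 2
    return lanes

print(reduce_across_lanes([1, 2, 3, 4, 5, 6, 7, 8]))   # every lane: 36
```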
  • Patent number: 10909739
    Abstract: In various embodiments, a parallel processor implements a graphics processing pipeline that generates rendered images. In operation, the parallel processor causes execution threads to execute a task shading program on an input mesh to generate a task shader output specifying a mesh shader count. The parallel processor then generates mesh shader identifiers, where the total number of the mesh shader identifiers equals the mesh shader count. For each mesh shader identifier, the parallel processor invokes a mesh shader based on the mesh shader identifier and the task shader output to generate geometry associated with the mesh shader identifier. Subsequently, the parallel processor performs operations on the geometries associated with the mesh shader identifiers to generate a rendered image. Advantageously, unlike conventional graphics processing pipelines, the performance of the graphics processing pipeline is not limited by a primitive distributor.
    Type: Grant
    Filed: January 26, 2018
    Date of Patent: February 2, 2021
    Assignee: NVIDIA Corporation
    Inventors: Ziyad Hakura, Yury Uralsky, Christoph Kubisch, Pierre Boudier, Henry Moreton
  • Patent number: 10891353
    Abstract: Aspects of matrix multiplication in a neural network are described herein. The aspects may include a controller unit configured to receive a matrix-addition instruction. The aspects may further include a computation module configured to receive a first matrix and a second matrix. The first matrix may include one or more first elements and the second matrix may include one or more second elements. The one or more first elements and the one or more second elements may be arranged in accordance with a two-dimensional data structure. The computation module may be further configured to respectively add each of the first elements to each of the second elements based on a correspondence in the two-dimensional data structure to generate one or more third elements for a third matrix.
    Type: Grant
    Filed: January 17, 2019
    Date of Patent: January 12, 2021
    Assignee: CAMBRICON TECHNOLOGIES CORPORATION LIMITED
    Inventors: Xiao Zhang, Shaoli Liu, Tianshi Chen, Yunji Chen
  • Patent number: 10877766
    Abstract: An integrated circuit (IC) may include a scheduler for hardware acceleration. The scheduler may include a command queue having a plurality of slots and configured to store commands offloaded from a host processor for execution by compute units of the IC. The scheduler may include a status register having bit locations corresponding to the slots of the command queue. The scheduler may also include a controller coupled to the command queue and the status register. The controller may be configured to schedule the compute units of the IC to execute the commands stored in the slots of the command queue and update the bit locations of the status register to indicate which commands from the command queue are finished executing.
    Type: Grant
    Filed: May 24, 2018
    Date of Patent: December 29, 2020
    Assignee: Xilinx, Inc.
    Inventors: Soren T. Soe, Idris I. Tarwala, Umang Parekh, Sonal Santan, Hem C. Neema
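A minimal software model of the slot/status-bit protocol above; the class layout and completion API are invented for illustration:

```python
# Commands occupy queue slots; the status register holds one bit per
# slot, set when the command in that slot has finished executing.
class Scheduler:
    def __init__(self, n_slots):
        self.slots = [None] * n_slots
        self.status = 0                    # bit i set => slot i finished

    def submit(self, cmd):
        i = self.slots.index(None)         # first free slot (raises if full)
        self.slots[i] = cmd
        self.status &= ~(1 << i)           # slot busy again
        return i

    def complete(self, i):
        self.slots[i] = None
        self.status |= 1 << i              # host polls this register

sched = Scheduler(4)
slot = sched.submit("conv2d")              # offloaded command
sched.complete(slot)
print(f"status = {sched.status:#06b}")     # status = 0b0001
```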
  • Patent number: 10860681
    Abstract: Aspects of matrix addition in a neural network are described herein. The aspects may include a controller unit configured to receive a matrix-add-scalar instruction that includes an address of a first matrix and a scalar value. The aspects may further include a computation module configured to receive the first matrix from a storage device based on the address of the first matrix. The first matrix may include one or more first elements. The one or more first elements are arranged in accordance with a two-dimensional data structure. The computation module may be further configured to respectively add the scalar value to each of the one or more first elements of the first matrix in accordance with the matrix-add-scalar instruction to generate one or more second elements for a second matrix.
    Type: Grant
    Filed: October 26, 2018
    Date of Patent: December 8, 2020
    Assignee: CAMBRICON TECHNOLOGIES CORPORATION LIMITED
    Inventors: Xiao Zhang, Shaoli Liu, Tianshi Chen, Yunji Chen
  • Patent number: 10831477
    Abstract: In-lane vector shuffle operations are described. In one embodiment a shuffle instruction specifies a field of per-lane control bits, a source operand and a destination operand, these operands having corresponding lanes, each lane divided into corresponding portions of multiple data elements. Sets of data elements are selected from corresponding portions of every lane of the source operand according to per-lane control bits. Elements of these sets are copied to specified fields in corresponding portions of every lane of the destination operand. Another embodiment of the shuffle instruction also specifies a second source operand, all operands having corresponding lanes divided into multiple data elements. A set selected according to per-lane control bits contains data elements from every lane portion of a first source operand and data elements from every corresponding lane portion of the second source operand. Set elements are copied to specified fields in every lane of the destination operand.
    Type: Grant
    Filed: December 21, 2017
    Date of Patent: November 10, 2020
    Assignee: Intel Corporation
    Inventors: Zeev Sperber, Robert Valentine, Benny Eitan, Doron Orenstein
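A NumPy model of the in-lane behavior shared by this abstract and the four later shuffle patents with the same text (10514917, 10514916, 10514918, 10509652): the same per-lane control reorders elements within each lane, and no element crosses a lane boundary. The lane width of four elements is an assumption.

```python
import numpy as np

# The register splits into lanes; the control indexes elements within a
# lane only, applied identically to every lane.
def in_lane_shuffle(src, control, lane_elems=4):
    lanes = src.reshape(-1, lane_elems)
    return lanes[:, control].reshape(src.shape)

src = np.arange(8)                              # two 4-element lanes
print(in_lane_shuffle(src, [3, 2, 1, 0]))       # [3 2 1 0 7 6 5 4]
```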
  • Patent number: 10824934
    Abstract: Described examples include an integrated circuit including a vector multiply unit including a plurality of multiply/accumulate nodes, in which the vector multiply unit is operable to provide an output from the multiply/accumulate nodes, a first data feeder operable to provide first data to the vector multiply unit in vector format, and a second data feeder operable to provide second data to the vector multiply unit in vector format.
    Type: Grant
    Filed: October 16, 2017
    Date of Patent: November 3, 2020
    Assignee: TEXAS INSTRUMENTS INCORPORATED
    Inventors: Mihir Narendra Mody, Shyam Jagannathan, Manu Mathew, Jason T. Jones
  • Patent number: 10824434
    Abstract: Examples described herein relate to dynamically structured single instruction, multiple data (SIMD) instructions, and systems and circuits implementing such dynamically structured SIMD instructions. An example is a method for processing data. A first SIMD structure is determined by a processor. A characteristic of the first SIMD structure is altered by the processor to obtain a second SIMD structure. An indication of the second SIMD structure is communicated from the processor to a numerical engine. Data is packed by the numerical engine into an SIMD instruction according to the second SIMD structure. The SIMD instruction is transmitted from the numerical engine.
    Type: Grant
    Filed: November 29, 2018
    Date of Patent: November 3, 2020
    Assignee: XILINX, INC.
    Inventors: Sean Settle, Ehsan Ghasemi, Ashish Sirasao, Ralph D. Wittig
  • Patent number: 10795677
    Abstract: Embodiments of systems, apparatuses, and methods for multiplication, negation, and accumulation of data values in a processor are described. For example, execution circuitry executes a decoded instruction to multiply selected data values from a plurality of packed data element positions in first and second packed data source operands to generate a plurality of first result values, sum the plurality of first result values to generate one or more second result values, negate the one or more second result values to generate one or more third result values, accumulate the one or more third result values with one or more data values from the destination operand to generate one or more fourth result values, and store the one or more fourth result values in one or more packed data element positions in the destination operand.
    Type: Grant
    Filed: September 29, 2017
    Date of Patent: October 6, 2020
    Assignee: Intel Corporation
    Inventors: Venkateswara R. Madduri, Elmoustapha Ould-Ahmed-Vall, Robert Valentine, Jesus Corbal, Mark Charney
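Reduced to scalars, the multiply-sum-negate-accumulate sequence above behaves roughly like acc = acc - dot(a, b); a sketch under that reading:

```python
import numpy as np

# Multiply packed pairs, sum the products, negate the sum, then
# accumulate with the destination value.
a = np.array([1, 2, 3, 4], dtype=np.int32)
b = np.array([5, 6, 7, 8], dtype=np.int32)
acc = np.int32(100)
acc = acc + (-(a * b).sum())       # 100 - (5 + 12 + 21 + 32)
print(acc)                         # 30
```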
  • Patent number: 10691464
    Abstract: Systems and methods for virtually partitioning an integrated circuit may include identifying dimensional attributes of a target input dataset and selecting a data partitioning scheme from a plurality of distinct data partitioning schemes for the target input dataset based on the dimensional attributes of the target dataset and architectural attributes of an integrated circuit. The method may include disintegrating the target dataset into a plurality of distinct subsets of data based on the selected data partitioning scheme and identifying a virtual processing core partitioning scheme from a plurality of distinct processing core partitioning schemes for an architecture of the integrated circuit based on the disintegration of the target input dataset.
    Type: Grant
    Filed: January 21, 2020
    Date of Patent: June 23, 2020
    Assignee: quadric.io
    Inventors: Nigel Drego, Aman Sikka, Mrinalini Ravichandran, Robert Daniel Firu, Veerbhan Kheterpal
  • Patent number: 10678662
    Abstract: A computing system includes: a data block including data; a storage engine, coupled to the data block, configured to process the data, as hard information or soft information, through channels including a failed channel and a remaining channel, calculate an aggregated output from a hard decision from the remaining channel, calculate a selected magnitude from a magnitude from the remaining channel with an error detected, calculate extrinsic soft information based on the aggregated output and the selected magnitude, and decode the failed channel with a scaled soft metric based on the extrinsic soft information.
    Type: Grant
    Filed: March 25, 2016
    Date of Patent: June 9, 2020
    Assignee: CNEX LABS, Inc.
    Inventor: Xiaojie Zhang
  • Patent number: 10628155
    Abstract: First and second forms of a complex multiply instruction are provided for operating on first and second operand vectors comprising multiple data elements including at least one real data element for representing the real part of a complex number and at least one imaginary element for representing an imaginary part of the complex number. One of the first and second forms of the instruction targets at least one real element of the destination vector and the other targets at least one imaginary element. By executing one of each instruction, complex multiplications of the form (a+ib)*(c+id) can be calculated using relatively few instructions and with only two vector register read ports, enabling DSP algorithms such as FFTs to be calculated more efficiently using relatively low power hardware implementations.
    Type: Grant
    Filed: February 22, 2017
    Date of Patent: April 21, 2020
    Assignee: ARM Limited
    Inventor: Thomas Christopher Grocutt
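A scalar Python sketch of the two-form scheme above, assuming interleaved [re, im, ...] vector layout: the first form writes only the real result elements, the second only the imaginary ones, so one of each completes a full complex multiply. Function names are illustrative.

```python
def cmul_real(dst, v1, v2):                # first form: real elements
    for i in range(0, len(dst), 2):
        a, b, c, d = v1[i], v1[i + 1], v2[i], v2[i + 1]
        dst[i] = a * c - b * d             # Re[(a+ib)(c+id)]
    return dst

def cmul_imag(dst, v1, v2):                # second form: imaginary elements
    for i in range(0, len(dst), 2):
        a, b, c, d = v1[i], v1[i + 1], v2[i], v2[i + 1]
        dst[i + 1] = a * d + b * c         # Im[(a+ib)(c+id)]
    return dst

v1, v2 = [1, 2, 3, 4], [5, 6, 7, 8]        # (1+2i, 3+4i) * (5+6i, 7+8i)
print(cmul_imag(cmul_real([0] * 4, v1, v2), v1, v2))   # [-7, 16, -11, 52]
```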
  • Patent number: 10623015
    Abstract: An apparatus and method are described for performing vector compression. For example, one embodiment of a processor comprises: vector compression logic to compress a source vector comprising a plurality of valid data elements and invalid data elements to generate a destination vector in which valid data elements are stored contiguously on one side of the destination vector, the vector compression logic to utilize a bit mask associated with the source vector and comprising a plurality of bits, each bit corresponding to one of the plurality of data elements of the source vector and indicating whether the data element comprises a valid data element or an invalid data element, the vector compression logic to utilize indices of the bit mask and associated bit values of the bit mask to generate a control vector; and shuffle logic to shuffle/permute the data elements of the source vector to the destination vector in accordance with the control vector.
    Type: Grant
    Filed: March 15, 2018
    Date of Patent: April 14, 2020
    Assignee: Intel Corporation
    Inventors: Simon Rubanovich, David M. Russinoff, Amit Gradstein, John W. O'Leary, Zeev Sperber
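The compression step above packs mask-valid elements contiguously to one side of the destination; a direct Python rendering, with the control-vector generation and shuffle hardware abstracted away:

```python
# Valid elements (mask bit set) are stored contiguously at the low end;
# remaining destination elements are left as filler here.
def compress(src, mask):
    dst = [0] * len(src)
    j = 0
    for i, bit in enumerate(mask):     # mask indices + bit values
        if bit:
            dst[j] = src[i]
            j += 1
    return dst

print(compress([10, 11, 12, 13], [1, 0, 1, 0]))   # [10, 12, 0, 0]
```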
  • Patent number: 10613863
    Abstract: Techniques and mechanisms described herein include a signal processor implemented as an overlay on a field-programmable gate array (FPGA) device that utilizes special purpose, hardened intellectual property (IP) modules such as memory blocks and digital signal processing (DSP) cores. A Processing Element (PE) is built from one or more DSP cores connected to additional logic. Interconnected as an array, the PEs may operate in a computational model such as Single Instruction-Multiple Thread (SIMT). A software hierarchy is described that transforms the SIMT array into an effective signal processor.
    Type: Grant
    Filed: July 3, 2019
    Date of Patent: April 7, 2020
    Assignee: Nextera Video, Inc.
    Inventors: John E. Deame, Steven Kaufmann, Liviu Voicu
  • Patent number: 10592466
    Abstract: A GPU architecture employs a crossbar switch to preferentially store operand vectors in a compressed form, allowing a reduction in the number of memory circuits that must be activated during an operand fetch and allowing existing execution units to be used for scalar execution. Scalar execution can be performed during branch divergence.
    Type: Grant
    Filed: May 12, 2016
    Date of Patent: March 17, 2020
    Assignee: Wisconsin Alumni Research Foundation
    Inventors: Nam Sung Kim, Zhenhong Liu
  • Patent number: 10514917
    Abstract: In-lane vector shuffle operations are described. In one embodiment a shuffle instruction specifies a field of per-lane control bits, a source operand and a destination operand, these operands having corresponding lanes, each lane divided into corresponding portions of multiple data elements. Sets of data elements are selected from corresponding portions of every lane of the source operand according to per-lane control bits. Elements of these sets are copied to specified fields in corresponding portions of every lane of the destination operand. Another embodiment of the shuffle instruction also specifies a second source operand, all operands having corresponding lanes divided into multiple data elements. A set selected according to per-lane control bits contains data elements from every lane portion of a first source operand and data elements from every corresponding lane portion of the second source operand. Set elements are copied to specified fields in every lane of the destination operand.
    Type: Grant
    Filed: November 2, 2017
    Date of Patent: December 24, 2019
    Assignee: Intel Corporation
    Inventors: Zeev Sperber, Robert Valentine, Benny Eitan, Doron Orenstein
  • Patent number: 10514916
    Abstract: In-lane vector shuffle operations are described. In one embodiment a shuffle instruction specifies a field of per-lane control bits, a source operand and a destination operand, these operands having corresponding lanes, each lane divided into corresponding portions of multiple data elements. Sets of data elements are selected from corresponding portions of every lane of the source operand according to per-lane control bits. Elements of these sets are copied to specified fields in corresponding portions of every lane of the destination operand. Another embodiment of the shuffle instruction also specifies a second source operand, all operands having corresponding lanes divided into multiple data elements. A set selected according to per-lane control bits contains data elements from every lane portion of a first source operand and data elements from every corresponding lane portion of the second source operand. Set elements are copied to specified fields in every lane of the destination operand.
    Type: Grant
    Filed: June 5, 2017
    Date of Patent: December 24, 2019
    Assignee: Intel Corporation
    Inventors: Zeev Sperber, Robert Valentine, Benny Eitan, Doron Orenstein
  • Patent number: 10514918
    Abstract: In-lane vector shuffle operations are described. In one embodiment a shuffle instruction specifies a field of per-lane control bits, a source operand and a destination operand, these operands having corresponding lanes, each lane divided into corresponding portions of multiple data elements. Sets of data elements are selected from corresponding portions of every lane of the source operand according to per-lane control bits. Elements of these sets are copied to specified fields in corresponding portions of every lane of the destination operand. Another embodiment of the shuffle instruction also specifies a second source operand, all operands having corresponding lanes divided into multiple data elements. A set selected according to per-lane control bits contains data elements from every lane portion of a first source operand and data elements from every corresponding lane portion of the second source operand. Set elements are copied to specified fields in every lane of the destination operand.
    Type: Grant
    Filed: December 21, 2017
    Date of Patent: December 24, 2019
    Assignee: Intel Corporation
    Inventors: Zeev Sperber, Robert Valentine, Benny Eitan, Doron Orenstein
  • Patent number: 10509652
    Abstract: In-lane vector shuffle operations are described. In one embodiment a shuffle instruction specifies a field of per-lane control bits, a source operand and a destination operand, these operands having corresponding lanes, each lane divided into corresponding portions of multiple data elements. Sets of data elements are selected from corresponding portions of every lane of the source operand according to per-lane control bits. Elements of these sets are copied to specified fields in corresponding portions of every lane of the destination operand. Another embodiment of the shuffle instruction also specifies a second source operand, all operands having corresponding lanes divided into multiple data elements. A set selected according to per-lane control bits contains data elements from every lane portion of a first source operand and data elements from every corresponding lane portion of the second source operand. Set elements are copied to specified fields in every lane of the destination operand.
    Type: Grant
    Filed: December 21, 2017
    Date of Patent: December 17, 2019
    Assignee: Intel Corporation
    Inventors: Zeev Sperber, Robert Valentine, Benny Eitan, Doron Orenstein
  • Patent number: 10497440
    Abstract: A crossbar array, comprises a plurality of row lines, a plurality of column lines intersecting the plurality of row lines at a plurality of intersections, and a plurality of junctions coupled between the plurality of row lines and the plurality of column lines at a portion of the plurality of intersections. Each junction comprises a resistive memory element, and the junctions are positioned to calculate a matrix multiplication of a first matrix and a second matrix.
    Type: Grant
    Filed: August 7, 2015
    Date of Patent: December 3, 2019
    Assignee: HEWLETT PACKARD ENTERPRISE DEVELOPMENT LP
    Inventors: Miao Hu, John Paul Strachan, Zhiyong Li, R. Stanley Williams
  • Patent number: 10489155
    Abstract: Systems and methods relate to a mixed-width single instruction multiple data (SIMD) instruction which has at least a source vector operand comprising data elements of a first bit-width and a destination vector operand comprising data elements of a second bit-width, wherein the second bit-width is either half of or twice the first bit-width. Correspondingly, one of the source or destination vector operands is expressed as a pair of registers, a first register and a second register. The other vector operand is expressed as a single register. Data elements of the first register correspond to even-numbered data elements of the other vector operand expressed as a single register, and data elements of the second register correspond to data elements of the other vector operand expressed as a single register.
    Type: Grant
    Filed: July 21, 2015
    Date of Patent: November 26, 2019
    Assignee: QUALCOMM Incorporated
    Inventors: Eric Wayne Mahurin, Ajay Anant Ingle
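A sketch of the even/odd register-pair mapping above for a widening operation, with squaring standing in for the actual arithmetic (an assumption; the patent covers the operand mapping, not any specific op):

```python
# A narrow source register widens into a register pair: the first
# register takes results for even-numbered source elements, the second
# for odd-numbered ones.
narrow = [1, 2, 3, 4, 5, 6, 7, 8]
first_reg  = [x * x for x in narrow[0::2]]   # even-numbered elements
second_reg = [x * x for x in narrow[1::2]]   # odd-numbered elements
print(first_reg, second_reg)                 # [1, 9, 25, 49] [4, 16, 36, 64]
```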
  • Patent number: 10437769
    Abstract: A method of transition-minimized low-speed data transfer is described herein. In an embodiment, a data rate of a set of data to be transmitted on a data bus is determined. A one-hot value is encoded on the data bus in response to a low data rate. An XOR operation is performed with a previous state of the data bus and the encoded one-hot value. Additionally, the resulting value of the XOR operation is driven onto the data bus.
    Type: Grant
    Filed: December 26, 2013
    Date of Patent: October 8, 2019
    Assignee: Intel Corporation
    Inventor: Daniel Greenspan
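The low-rate encoding above guarantees exactly one wire toggles per transfer; a bit-level sketch, with the decode helper invented for illustration:

```python
# Encode: XOR the one-hot value with the previous bus state, so exactly
# one line transitions. Decode: the changed bit's position is the value.
def encode(prev_bus, value):
    return prev_bus ^ (1 << value)

def decode(prev_bus, new_bus):
    return (prev_bus ^ new_bus).bit_length() - 1

bus = 0b0000
bus = encode(bus, 2)                           # single transition on the bus
print(f"{bus:#06b} ->", decode(0b0000, bus))   # 0b0100 -> 2
```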
  • Patent number: 10419501
    Abstract: A data streaming unit (DSU) and a method for operating a DSU are disclosed. In an embodiment the DSU includes a memory interface configured to be connected to a storage unit, a compute engine interface configured to be connected to a compute engine (CE) and an address generator configured to manage address data representing address locations in the storage unit. The data streaming unit further includes a data organization unit configured to access data in the storage unit and to reorganize the data to be forwarded to the compute engine, wherein the memory interface is communicatively connected to the address generator and the data organization unit, wherein the address generator is communicatively connected to the data organization unit, and wherein the data organization unit is communicatively connected to the compute engine interface.
    Type: Grant
    Filed: December 3, 2015
    Date of Patent: September 17, 2019
    Assignee: Futurewei Technologies, Inc.
    Inventors: Ashish Rai Shrivastava, Alan Gatherer, Sushma Wokhlu
  • Patent number: 10324515
    Abstract: Approaches are provided for a predictive electrical appliance power-saving management mode. An approach includes ascertaining a location and pace of a mobile device. The approach further includes calculating an amount of time that it will take to enable or start programs and services upon a computing device waking from a sleep mode or hybrid sleep mode. The approach further includes determining a distance threshold to the computing device that allows for the calculated amount of time to pass such that the programs and services are enabled or started prior to a user of the mobile device arriving at the computing device when the user is returning to the computing device at the ascertained pace. The approach further includes sending a signal to awaken the computing device from the sleep mode or hybrid sleep mode when the mobile device is within the distance threshold.
    Type: Grant
    Filed: May 8, 2017
    Date of Patent: June 18, 2019
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: James E. Bostick, John M. Ganci, Jr., Sarbajit K. Rakshit, Kimberly G. Starks
  • Patent number: 10311539
    Abstract: A SIMD processing unit processes a plurality of tasks which each include up to a predetermined maximum number of work items. The work items of a task are arranged for executing a common sequence of instructions on respective data items. The data items are arranged into blocks, with some of the blocks including at least one invalid data item. Work items which relate to invalid data items are invalid work items. The SIMD processing unit comprises a group of processing lanes configured to execute instructions of work items of a particular task over a plurality of processing cycles. A control module assembles work items into the tasks based on the validity of the work items, so that invalid work items of the particular task are temporally aligned across the processing lanes. In this way the number of wasted processing slots due to invalid work items may be reduced.
    Type: Grant
    Filed: November 2, 2016
    Date of Patent: June 4, 2019
    Assignee: Imagination Technologies Limited
    Inventors: John Howson, Jonathan Redshaw, Yoong Chert Foo
  • Patent number: 10291514
    Abstract: Aspects of this disclosure provide techniques for dynamically configuring flow splitting via software defined network (SDN) signaling instructions. An SDN controller may instruct an ingress network node to split a traffic flow between two or more egress paths, and instruct the ingress network node, and perhaps downstream network nodes, to transport portions of the traffic flow in accordance with a forwarding protocol. In one example, the SDN controller instructs the network nodes to transport portions of the traffic flow in accordance with a link-based forwarding protocol. In other examples, the SDN controller instructs the network nodes to transport portions of the traffic flow in accordance with a path-based or source-based transport protocol.
    Type: Grant
    Filed: August 28, 2017
    Date of Patent: May 14, 2019
    Assignee: Huawei Technologies Co., Ltd.
    Inventors: Xu Li, Hang Zhang
  • Patent number: 10241802
    Abstract: A parallel processor for processing a plurality of different processing instruction streams in parallel is described. The processor comprises a plurality of data processing units; and a plurality of SIMD (Single Instruction Multiple Data) controllers, each connectable to a group of data processing units of the plurality of data processing units, and each SIMD controller arranged to handle an individual processing task with a subgroup of actively connected data processing units selected from the group of data processing units. The parallel processor is arranged to vary dynamically the size of the subgroup of data processing units to which each SIMD controller is actively connected under control of received processing instruction streams, thereby permitting each SIMD controller to be actively connected to a different number of processing units for different processing tasks.
    Type: Grant
    Filed: November 20, 2015
    Date of Patent: March 26, 2019
    Assignee: TELEFONAKTIEBOLAGET LM ERICSSON (PUBL)
    Inventors: John Lancaster, Martin Whitaker
  • Patent number: 10235732
    Abstract: A method and system are described herein for an optimization technique covering two aspects of thread scheduling and dispatch when the driver is allowed to pick the scheduling attributes. The present techniques rely on an enhanced GPGPU Walker hardware command and one-dimensional local identification generation to maximize thread residency.
    Type: Grant
    Filed: December 27, 2013
    Date of Patent: March 19, 2019
    Assignee: INTEL CORPORATION
    Inventors: Jayanth N. Rao, Michal Mrozek
  • Patent number: 10228972
    Abstract: In some embodiments, the present invention provides an exemplary computing device, including at least: a scheduler processor; a CPU; and a GPU; where the scheduler processor is configured to: obtain a computing task; divide the computing task into a first set of subtasks and a second set of subtasks; submit the first set to the CPU; submit the second set to the GPU; determine, for a first subtask of the first set, a first execution time, a first execution speed, or both; determine, for a second subtask of the second set, a second execution time, a second execution speed, or both; and dynamically rebalance an allocation of remaining non-executed subtasks of the computing task to be submitted to the CPU and the GPU, based, at least in part, on at least one of: a first comparison of the first execution time to the second execution time, and a second comparison of the first execution speed to the second execution speed.
    Type: Grant
    Filed: June 21, 2018
    Date of Patent: March 12, 2019
    Assignee: Banuba Limited
    Inventor: Yury Hushchyn
  • Patent number: 10229468
    Abstract: Systems, apparatuses and methods may provide for receiving a general purpose graphics processing unit (GPGPU) workload and converting the GPGPU workload to a three-dimensional (3D) workload. Additionally, the 3D workload may be dispatched to a 3D pipeline. In one example, converting the GPGPU workload to the 3D workload includes identifying a plurality of thread groups in the GPGPU workload and mapping the plurality of thread groups to a 3D matrix of cubes.
    Type: Grant
    Filed: June 3, 2015
    Date of Patent: March 12, 2019
    Assignee: Intel Corporation
    Inventors: Robert B. Taylor, Abhishek Venkatesh
  • Patent number: 10223334
    Abstract: A native tensor processor calculates tensor contractions using a sum of outer products. In one implementation, the native tensor processor preferably is implemented as a single integrated circuit and includes an input buffer and a contraction engine. The input buffer buffers tensor elements retrieved from off-chip and transmits the elements to the contraction engine as needed. The contraction engine calculates the tensor contraction by executing calculations from equivalent matrix multiplications, as if the tensors were unfolded into matrices, but avoiding the overhead of expressly unfolding the tensors. The contraction engine includes a plurality of outer product units that calculate matrix multiplications by a sum of outer products. By using outer products, the equivalent matrix multiplications can be partitioned into smaller matrix multiplications, each of which is localized with respect to which tensor elements are required.
    Type: Grant
    Filed: July 20, 2017
    Date of Patent: March 5, 2019
    Assignee: NOVUMIND LIMITED
    Inventors: Chien-Ping Lu, Yu-Shuen Tang
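The core identity behind the contraction engine above, checked numerically: a matrix product equals the sum of outer products of A's columns with B's rows, which is what lets the work be partitioned into small, local outer-product units.

```python
import numpy as np

# C = A @ B as a sum of rank-1 outer products, one per contraction index.
A = np.random.rand(4, 3)
B = np.random.rand(3, 5)
C = sum(np.outer(A[:, k], B[k, :]) for k in range(A.shape[1]))
assert np.allclose(C, A @ B)
print(C.shape)                             # (4, 5)
```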