Single Instruction, Multiple Data (SIMD) Patents (Class 712/22)
-
Patent number: 10613863
Abstract: Techniques and mechanisms described herein include a signal processor implemented as an overlay on a field-programmable gate array (FPGA) device that utilizes special purpose, hardened intellectual property (IP) modules such as memory blocks and digital signal processing (DSP) cores. A Processing Element (PE) is built from one or more DSP cores connected to additional logic. Interconnected as an array, the PEs may operate in a computational model such as Single Instruction-Multiple Thread (SIMT). A software hierarchy is described that transforms the SIMT array into an effective signal processor.
Type: Grant
Filed: July 3, 2019
Date of Patent: April 7, 2020
Assignee: Nextera Video, Inc.
Inventors: John E. Deame, Steven Kaufmann, Liviu Voicu
-
Patent number: 10592466
Abstract: A GPU architecture employs a crossbar switch to preferentially store operand vectors in a compressed form allowing reduction in the number of memory circuits that must be activated during an operand fetch and to allow existing execution units to be used for scalar execution. Scalar execution can be performed during branch divergence.
Type: Grant
Filed: May 12, 2016
Date of Patent: March 17, 2020
Assignee: Wisconsin Alumni Research Foundation
Inventors: Nam Sung Kim, Zhenhong Liu
-
Patent number: 10514917
Abstract: In-lane vector shuffle operations are described. In one embodiment a shuffle instruction specifies a field of per-lane control bits, a source operand and a destination operand, these operands having corresponding lanes, each lane divided into corresponding portions of multiple data elements. Sets of data elements are selected from corresponding portions of every lane of the source operand according to per-lane control bits. Elements of these sets are copied to specified fields in corresponding portions of every lane of the destination operand. Another embodiment of the shuffle instruction also specifies a second source operand, all operands having corresponding lanes divided into multiple data elements. A set selected according to per-lane control bits contains data elements from every lane portion of a first source operand and data elements from every corresponding lane portion of the second source operand. Set elements are copied to specified fields in every lane of the destination operand.
Type: Grant
Filed: November 2, 2017
Date of Patent: December 24, 2019
Assignee: Intel Corporation
Inventors: Zeev Sperber, Robert Valentine, Benny Eitan, Doron Orenstein
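The defining property of an in-lane shuffle is that every destination element is selected from within the same lane of the source, never across lanes. A minimal Python model of that behavior (lane width, element count, and the function name are illustrative, not the instruction's encoding):

```python
def in_lane_shuffle(src, control, lane_width=4):
    """Model of an in-lane shuffle: destination element i is selected from
    element i's own lane of the source, using the per-element control index
    control[i] (0 .. lane_width-1). No element ever crosses a lane boundary."""
    dst = []
    for i, sel in enumerate(control):
        lane_base = (i // lane_width) * lane_width  # start of this element's lane
        dst.append(src[lane_base + sel])
    return dst

# Two 4-element lanes; the same control pattern reverses each lane independently.
src = [10, 11, 12, 13, 20, 21, 22, 23]
ctrl = [3, 2, 1, 0, 3, 2, 1, 0]
print(in_lane_shuffle(src, ctrl))  # [13, 12, 11, 10, 23, 22, 21, 20]
```

Note that even with an "identity-breaking" control like `[3, 2, 1, 0, ...]`, element 4 can only ever receive values 20-23: the second lane is self-contained.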
-
Patent number: 10514918
Abstract: In-lane vector shuffle operations are described. In one embodiment a shuffle instruction specifies a field of per-lane control bits, a source operand and a destination operand, these operands having corresponding lanes, each lane divided into corresponding portions of multiple data elements. Sets of data elements are selected from corresponding portions of every lane of the source operand according to per-lane control bits. Elements of these sets are copied to specified fields in corresponding portions of every lane of the destination operand. Another embodiment of the shuffle instruction also specifies a second source operand, all operands having corresponding lanes divided into multiple data elements. A set selected according to per-lane control bits contains data elements from every lane portion of a first source operand and data elements from every corresponding lane portion of the second source operand. Set elements are copied to specified fields in every lane of the destination operand.
Type: Grant
Filed: December 21, 2017
Date of Patent: December 24, 2019
Assignee: Intel Corporation
Inventors: Zeev Sperber, Robert Valentine, Benny Eitan, Doron Orenstein
-
Patent number: 10514916
Abstract: In-lane vector shuffle operations are described. In one embodiment a shuffle instruction specifies a field of per-lane control bits, a source operand and a destination operand, these operands having corresponding lanes, each lane divided into corresponding portions of multiple data elements. Sets of data elements are selected from corresponding portions of every lane of the source operand according to per-lane control bits. Elements of these sets are copied to specified fields in corresponding portions of every lane of the destination operand. Another embodiment of the shuffle instruction also specifies a second source operand, all operands having corresponding lanes divided into multiple data elements. A set selected according to per-lane control bits contains data elements from every lane portion of a first source operand and data elements from every corresponding lane portion of the second source operand. Set elements are copied to specified fields in every lane of the destination operand.
Type: Grant
Filed: June 5, 2017
Date of Patent: December 24, 2019
Assignee: Intel Corporation
Inventors: Zeev Sperber, Robert Valentine, Benny Eitan, Doron Orenstein
-
Patent number: 10509652
Abstract: In-lane vector shuffle operations are described. In one embodiment a shuffle instruction specifies a field of per-lane control bits, a source operand and a destination operand, these operands having corresponding lanes, each lane divided into corresponding portions of multiple data elements. Sets of data elements are selected from corresponding portions of every lane of the source operand according to per-lane control bits. Elements of these sets are copied to specified fields in corresponding portions of every lane of the destination operand. Another embodiment of the shuffle instruction also specifies a second source operand, all operands having corresponding lanes divided into multiple data elements. A set selected according to per-lane control bits contains data elements from every lane portion of a first source operand and data elements from every corresponding lane portion of the second source operand. Set elements are copied to specified fields in every lane of the destination operand.
Type: Grant
Filed: December 21, 2017
Date of Patent: December 17, 2019
Assignee: Intel Corporation
Inventors: Zeev Sperber, Robert Valentine, Benny Eitan, Doron Orenstein
-
Patent number: 10497440
Abstract: A crossbar array comprises a plurality of row lines, a plurality of column lines intersecting the plurality of row lines at a plurality of intersections, and a plurality of junctions coupled between the plurality of row lines and the plurality of column lines at a portion of the plurality of intersections. Each junction comprises a resistive memory element, and the junctions are positioned to calculate a matrix multiplication of a first matrix and a second matrix.
Type: Grant
Filed: August 7, 2015
Date of Patent: December 3, 2019
Assignee: HEWLETT PACKARD ENTERPRISE DEVELOPMENT LP
Inventors: Miao Hu, John Paul Strachan, Zhiyong Li, R. Stanley Williams
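The arithmetic such a resistive crossbar performs follows directly from circuit laws: driving row voltages through a matrix of junction conductances produces column currents that are a vector-matrix product. A sketch of that relationship in Python (variable names are illustrative):

```python
def crossbar_multiply(voltages, conductances):
    """Model of a resistive crossbar: row voltages V driven through a
    conductance matrix G yield column currents I[j] = sum_i V[i] * G[i][j]
    (Ohm's law per junction, Kirchhoff's current law per column line) --
    i.e. a vector-matrix product computed in a single analog step."""
    n_cols = len(conductances[0])
    return [sum(v * row[j] for v, row in zip(voltages, conductances))
            for j in range(n_cols)]

V = [1.0, 2.0]            # row-line voltages (one row of the first matrix)
G = [[0.5, 1.0],          # junction conductances encoding the second matrix
     [2.0, 0.5]]
print(crossbar_multiply(V, G))  # [4.5, 2.0]
```

Feeding the rows of the first matrix through one at a time yields the full matrix-matrix product.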
-
Patent number: 10489155
Abstract: Systems and methods relate to a mixed-width single instruction multiple data (SIMD) instruction which has at least a source vector operand comprising data elements of a first bit-width and a destination vector operand comprising data elements of a second bit-width, wherein the second bit-width is either half of or twice the first bit-width. Correspondingly, one of the source or destination vector operands is expressed as a pair of registers, a first register and a second register. The other vector operand is expressed as a single register. Data elements of the first register correspond to even-numbered data elements of the other vector operand expressed as a single register, and data elements of the second register correspond to odd-numbered data elements of the other vector operand expressed as a single register.
Type: Grant
Filed: July 21, 2015
Date of Patent: November 26, 2019
Assignee: QUALCOMM Incorporated
Inventors: Eric Wayne Mahurin, Ajay Anant Ingle
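The even/odd pairing described above can be sketched in a few lines: the narrow operand lives in one register, while its even-indexed elements map to the first register of the wide pair and its odd-indexed elements to the second (a simplified model; register names are illustrative, not the instruction's mnemonics):

```python
def widen_even_odd(narrow):
    """Model of expressing a wide operand as a register pair: elements of
    the first register line up with even-numbered elements of the narrow
    single-register operand, elements of the second register with the
    odd-numbered elements."""
    first = narrow[0::2]   # even-indexed elements -> first register of the pair
    second = narrow[1::2]  # odd-indexed elements  -> second register of the pair
    return first, second

print(widen_even_odd([0, 1, 2, 3, 4, 5, 6, 7]))
# ([0, 2, 4, 6], [1, 3, 5, 7])
```

For a narrowing instruction the same mapping runs in reverse: interleaving the two registers reconstructs the single-register operand.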
-
Patent number: 10437769
Abstract: A method of transition minimized low speed data transfer is described herein. In an embodiment, a data rate of a set of data to be transmitted on a data bus is determined. A one hot value is encoded on the data bus in response to a low data rate. An XOR operation is performed with a previous state of the data bus and the encoded one hot value. Additionally, a resulting value of the XOR operation is driven onto the data bus.
Type: Grant
Filed: December 26, 2013
Date of Patent: October 8, 2019
Assignee: Intel Corporation
Inventor: Daniel Greenspan
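The encoding step above is compact enough to model directly: one-hot encode the value, XOR it with the previous bus state, and drive the result, so exactly one wire toggles per transfer. A sketch (bus width and function name are illustrative):

```python
def encode_low_rate(prev_bus, value):
    """Transition-minimized low-rate transfer (sketch): the value is
    one-hot encoded, XORed with the previous bus state, and the result is
    driven onto the bus -- flipping exactly one bit per transfer."""
    one_hot = 1 << value
    return prev_bus ^ one_hot

bus0 = 0b00000000
bus1 = encode_low_rate(bus0, 3)  # 0b00001000 -- one bit flipped
bus2 = encode_low_rate(bus1, 5)  # 0b00101000 -- again, one bit flipped
# The receiver recovers the value by XORing consecutive bus states:
decoded = (bus1 ^ bus2).bit_length() - 1
print(bin(bus1), bin(bus2), decoded)  # 0b1000 0b101000 5
```

Single-bit transitions keep switching energy and simultaneous-switching noise at a minimum when the data rate does not require the full bus bandwidth.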
-
Patent number: 10419501
Abstract: A data streaming unit (DSU) and a method for operating a DSU are disclosed. In an embodiment the DSU includes a memory interface configured to be connected to a storage unit, a compute engine interface configured to be connected to a compute engine (CE) and an address generator configured to manage address data representing address locations in the storage unit. The data streaming unit further includes a data organization unit configured to access data in the storage unit and to reorganize the data to be forwarded to the compute engine, wherein the memory interface is communicatively connected to the address generator and the data organization unit, wherein the address generator is communicatively connected to the data organization unit, and wherein the data organization unit is communicatively connected to the compute engine interface.
Type: Grant
Filed: December 3, 2015
Date of Patent: September 17, 2019
Assignee: Futurewei Technologies, Inc.
Inventors: Ashish Rai Shrivastava, Alan Gatherer, Sushma Wokhlu
-
Patent number: 10324515
Abstract: Approaches are provided for a predictive electrical appliance power-saving management mode. An approach includes ascertaining a location and pace of a mobile device. The approach further includes calculating an amount of time that it will take to enable or start programs and services upon a computing device waking from a sleep mode or hybrid sleep mode. The approach further includes determining a distance threshold to the computing device that allows for the calculated amount of time to pass such that the programs and services are enabled or started prior to a user of the mobile device arriving at the computing device when the user is returning to the computing device at the ascertained pace. The approach further includes sending a signal to awaken the computing device from the sleep mode or hybrid sleep mode when the mobile device is within the distance threshold.
Type: Grant
Filed: May 8, 2017
Date of Patent: June 18, 2019
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: James E. Bostick, John M. Ganci, Jr., Sarbajit K. Rakshit, Kimberly G. Starks
-
Patent number: 10311539
Abstract: A SIMD processing unit processes a plurality of tasks which each include up to a predetermined maximum number of work items. The work items of a task are arranged for executing a common sequence of instructions on respective data items. The data items are arranged into blocks, with some of the blocks including at least one invalid data item. Work items which relate to invalid data items are invalid work items. The SIMD processing unit comprises a group of processing lanes configured to execute instructions of work items of a particular task over a plurality of processing cycles. A control module assembles work items into the tasks based on the validity of the work items, so that invalid work items of the particular task are temporally aligned across the processing lanes. In this way the number of wasted processing slots due to invalid work items may be reduced.
Type: Grant
Filed: November 2, 2016
Date of Patent: June 4, 2019
Assignee: Imagination Technologies Limited
Inventors: John Howson, Jonathan Redshaw, Yoong Chert Foo
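The payoff of temporally aligning invalid work items is that a cycle in which every lane holds an invalid item can be skipped outright. A simplified Python model of that assembly policy (data layout and names are illustrative, not the patent's hardware interfaces):

```python
def assemble_tasks(lane_items):
    """Sketch of validity-aware task assembly: within each lane, valid work
    items are packed first, so invalid items line up in the same (late)
    cycles across lanes, and all-invalid cycles can be skipped entirely."""
    packed = [sorted(items, key=lambda w: not w["valid"]) for items in lane_items]
    depth = max(len(items) for items in packed)
    # Count cycles in which at least one lane does useful work.
    useful_cycles = sum(
        any(c < len(items) and items[c]["valid"] for items in packed)
        for c in range(depth))
    return packed, useful_cycles

lanes = [[{"valid": True},  {"valid": False}],   # lane 0: one valid, one invalid
         [{"valid": False}, {"valid": True}]]    # lane 1: one invalid, one valid
packed, cycles = assemble_tasks(lanes)
print(cycles)  # 1 -- without alignment both cycles would be partially wasted
```

Unaligned, the same items would occupy two cycles each with one idle lane; aligned, one fully valid cycle runs and the all-invalid cycle is dropped.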
-
Patent number: 10291514
Abstract: Aspects of this disclosure provide techniques for dynamically configuring flow splitting via software defined network (SDN) signaling instructions. An SDN controller may instruct an ingress network node to split a traffic flow between two or more egress paths, and instruct the ingress network node, and perhaps downstream network nodes, to transport portions of the traffic flow in accordance with a forwarding protocol. In one example, the SDN controller instructs the network nodes to transport portions of the traffic flow in accordance with a link-based forwarding protocol. In other examples, the SDN controller instructs the network nodes to transport portions of the traffic flow in accordance with a path-based or source-based transport protocol.
Type: Grant
Filed: August 28, 2017
Date of Patent: May 14, 2019
Assignee: Huawei Technologies Co., Ltd.
Inventors: Xu Li, Hang Zhang
-
Patent number: 10241802
Abstract: A parallel processor for processing a plurality of different processing instruction streams in parallel is described. The processor comprises a plurality of data processing units; and a plurality of SIMD (Single Instruction Multiple Data) controllers, each connectable to a group of data processing units of the plurality of data processing units, and each SIMD controller arranged to handle an individual processing task with a subgroup of actively connected data processing units selected from the group of data processing units. The parallel processor is arranged to vary dynamically the size of the subgroup of data processing units to which each SIMD controller is actively connected under control of received processing instruction streams, thereby permitting each SIMD controller to be actively connected to a different number of processing units for different processing tasks.
Type: Grant
Filed: November 20, 2015
Date of Patent: March 26, 2019
Assignee: TELEFONAKTIEBOLAGET LM ERICSSON (PUBL)
Inventors: John Lancaster, Martin Whitaker
-
Patent number: 10235732
Abstract: A method and system are described herein for an optimization technique on two aspects of thread scheduling and dispatch when the driver is allowed to pick the scheduling attributes. The present techniques rely on an enhanced GPGPU Walker hardware command and one dimensional local identification generation to maximize thread residency.
Type: Grant
Filed: December 27, 2013
Date of Patent: March 19, 2019
Assignee: INTEL CORPORATION
Inventors: Jayanth N. Rao, Michal Mrozek
-
Patent number: 10228972
Abstract: In some embodiments, the present invention provides an exemplary computing device, including at least: a scheduler processor; a CPU; a GPU; where the scheduler processor is configured to: obtain a computing task; divide the computing task into: a first set of subtasks and a second set of subtasks; submit the first set to the CPU; submit the second set to the GPU; determine, for a first subtask of the first set, a first execution time, a first execution speed, or both; determine, for a second subtask of the second set, a second execution time, a second execution speed, or both; dynamically rebalance an allocation of remaining non-executed subtasks of the computing task to be submitted to the CPU and the GPU, based, at least in part, on at least one of: a first comparison of the first execution time to the second execution time, and a second comparison of the first execution speed to the second execution speed.
Type: Grant
Filed: June 21, 2018
Date of Patent: March 12, 2019
Assignee: Banuba Limited
Inventor: Yury Hushchyn
-
Patent number: 10229468
Abstract: Systems, apparatuses and methods may provide for receiving a general purpose graphics processing unit (GPGPU) workload and converting the GPGPU workload to a three-dimensional (3D) workload. Additionally, the 3D workload may be dispatched to a 3D pipeline. In one example, converting the GPGPU workload to the 3D workload includes identifying a plurality of thread groups in the GPGPU workload and mapping the plurality of thread groups to a 3D matrix of cubes.
Type: Grant
Filed: June 3, 2015
Date of Patent: March 12, 2019
Assignee: Intel Corporation
Inventors: Robert B. Taylor, Abhishek Venkatesh
-
Patent number: 10223334
Abstract: A native tensor processor calculates tensor contractions using a sum of outer products. In one implementation, the native tensor processor preferably is implemented as a single integrated circuit and includes an input buffer and a contraction engine. The input buffer buffers tensor elements retrieved from off-chip and transmits the elements to the contraction engine as needed. The contraction engine calculates the tensor contraction by executing calculations from equivalent matrix multiplications, as if the tensors were unfolded into matrices, but avoiding the overhead of expressly unfolding the tensors. The contraction engine includes a plurality of outer product units that calculate matrix multiplications by a sum of outer products. By using outer products, the equivalent matrix multiplications can be partitioned into smaller matrix multiplications, each of which is localized with respect to which tensor elements are required.
Type: Grant
Filed: July 20, 2017
Date of Patent: March 5, 2019
Assignee: NOVUMIND LIMITED
Inventors: Chien-Ping Lu, Yu-Shuen Tang
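The outer-product decomposition the abstract relies on is just the identity A·B = Σ_k (column k of A) ⊗ (row k of B): each rank-1 term touches only one column of A and one row of B, which is what makes the work easy to partition and localize. A sketch in plain Python:

```python
def outer(u, v):
    """Rank-1 outer product u v^T."""
    return [[a * b for b in v] for a in u]

def matmul_by_outer_products(A, B):
    """Compute A @ B as a sum of outer products: for each k, add
    (column k of A) outer (row k of B). Each term is localized to one
    column of A and one row of B, so terms can be computed independently."""
    n, k, m = len(A), len(B), len(B[0])
    C = [[0] * m for _ in range(n)]
    for t in range(k):
        col = [A[i][t] for i in range(n)]   # column t of A
        P = outer(col, B[t])                # rank-1 contribution
        for i in range(n):
            for j in range(m):
                C[i][j] += P[i][j]
    return C

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(matmul_by_outer_products(A, B))  # [[19, 22], [43, 50]]
```

Splitting the k-loop across hardware units is exactly the partitioning into smaller, localized matrix multiplications that the contraction engine exploits.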
-
Patent number: 10218635
Abstract: A network interface controller (NC) that can provide a connection for a device to a network. The NC can include a sideband port controller. The sideband port controller can provide a sideband connection between the network and a sideband endpoint circuit that can communicate information with the network via the sideband. The sideband port controller can include a receive data route that has an input for receiving packets of data from the network and an output for passing the packets of data received from the network to the sideband endpoint circuit. The receive data route may include a buffer to receive the packets of data from the network and to pass the packets of data received from the network to the sideband endpoint.
Type: Grant
Filed: September 18, 2015
Date of Patent: February 26, 2019
Assignee: International Business Machines Corporation
Inventors: Jean-Paul Aldebert, Claude Basso, Jean-Luc Frenoy, Fabrice J. Verplanken
-
Patent number: 10216704
Abstract: A native tensor processor calculates tensor contractions using a sum of outer products. In one implementation, the native tensor processor preferably is implemented as a single integrated circuit and includes an input buffer and a contraction engine. The input buffer buffers tensor elements retrieved from off-chip and transmits the elements to the contraction engine as needed. The contraction engine calculates the tensor contraction by executing calculations from equivalent matrix multiplications, as if the tensors were unfolded into matrices, but avoiding the overhead of expressly unfolding the tensors. The contraction engine includes a plurality of outer product units that calculate matrix multiplications by a sum of outer products. By using outer products, the equivalent matrix multiplications can be partitioned into smaller matrix multiplications, each of which is localized with respect to which tensor elements are required.
Type: Grant
Filed: July 20, 2017
Date of Patent: February 26, 2019
Assignee: NOVUMIND LIMITED
Inventors: Chien-Ping Lu, Yu-Shuen Tang
-
Patent number: 10218634
Abstract: A network interface controller for providing a connection for a device to a network. The network interface controller may include a sideband port controller. The sideband port controller may provide a sideband connection between the network and a sideband endpoint circuit that is operative to communicate information with the network via the sideband. The sideband port controller may include a transmit data route having an input for receiving packets from the sideband endpoint circuit and an output for passing packets received from the sideband endpoint to the network. A packet parser is connected to the transmit data route. The packet parser is operative to read data from packets received from the sideband endpoint and is further operative to analyze the data.
Type: Grant
Filed: September 18, 2015
Date of Patent: February 26, 2019
Assignee: International Business Machines Corporation
Inventors: Jean-Paul Aldebert, Claude Basso, Jean-Luc Frenoy, Fabrice J. Verplanken
-
Patent number: 10186069
Abstract: A graphics processing system groups plural initial pilot shader programs into a set of initial pilot shader programs and associates the set of initial pilot shader programs with a set of indexes. The initial pilot shader programs each contain constant program expressions to be executed on behalf of an original shader program. The index for an initial pilot shader program is then used to obtain the instructions contained in the initial pilot shader program for executing the constant program expressions of the initial pilot shader program. The threads for executing a subset of the initial pilot shader programs are also grouped into a thread group and the threads of the thread group are executed in parallel. The graphics processing system provides for efficient preparation and execution of plural initial pilot shader programs.
Type: Grant
Filed: February 15, 2017
Date of Patent: January 22, 2019
Assignee: Arm Limited
Inventors: Alexander Galazin, Jörg Wagner, Andreas Due Engh-Halstvedt
-
Patent number: 10152329
Abstract: One embodiment of the present disclosure sets forth an optimized way to execute pre-scheduled replay operations for divergent operations in a parallel processing subsystem. Specifically, a streaming multiprocessor (SM) includes a multi-stage pipeline configured to insert pre-scheduled replay operations into a multi-stage pipeline. A pre-scheduled replay unit detects whether the operation associated with the current instruction is accessing a common resource. If the threads are accessing data which are distributed across multiple cache lines, then the pre-scheduled replay unit inserts pre-scheduled replay operations behind the current instruction. The multi-stage pipeline executes the instruction and the associated pre-scheduled replay operations sequentially. If additional threads remain unserviced after execution of the instruction and the pre-scheduled replay operations, then additional replay operations are inserted via the replay loop, until all threads are serviced.
Type: Grant
Filed: February 9, 2012
Date of Patent: December 11, 2018
Assignee: NVIDIA CORPORATION
Inventors: Michael Fetterman, Stewart Glenn Carlton, Jack Hilaire Choquette, Shirish Gadre, Olivier Giroux, Douglas J. Hahn, Steven James Heinrich, Eric Lyell Hill, Charles McCarver, Omkar Paranjape, Anjana Rajendran, Rajeshwaran Selvanesan
-
Patent number: 10073816
Abstract: A native tensor processor calculates tensor contractions using a sum of outer products. In one implementation, the native tensor processor preferably is implemented as a single integrated circuit and includes an input buffer and a contraction engine. The input buffer buffers tensor elements retrieved from off-chip and transmits the elements to the contraction engine as needed. The contraction engine calculates the tensor contraction by executing calculations from equivalent matrix multiplications, as if the tensors were unfolded into matrices, but avoiding the overhead of expressly unfolding the tensors. The contraction engine includes a plurality of outer product units that calculate matrix multiplications by a sum of outer products. By using outer products, the equivalent matrix multiplications can be partitioned into smaller matrix multiplications, each of which is localized with respect to which tensor elements are required.
Type: Grant
Filed: July 20, 2017
Date of Patent: September 11, 2018
Assignee: NovuMind Limited
Inventors: Chien-Ping Lu, Yu-Shuen Tang
-
Patent number: 10061591
Abstract: A method for reducing execution of redundant threads in a processing environment. The method includes detecting threads that include redundant work among many different threads. Multiple threads from the detected threads are grouped into one or more thread clusters based on determining same thread computation results. Execution of all but a particular one thread in each of the one or more thread clusters is suppressed. The particular one thread in each of the one or more thread clusters is executed. Results determined from execution of the particular one thread in each of the one or more thread clusters are broadcasted to other threads in each of the one or more thread clusters.
Type: Grant
Filed: February 26, 2015
Date of Patent: August 28, 2018
Assignee: Samsung Electronics Company, Ltd.
Inventors: Boris Beylin, John Brothers, Santosh Abraham, Lingjie Xu, Maxim Lukyanov, Alex Grosul
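The cluster-suppress-broadcast scheme above can be modeled concisely: threads whose inputs imply identical results form a cluster, one representative executes, and its result is broadcast to the rest. A sketch (the input-equality clustering criterion is a simplification of the patent's redundancy detection):

```python
def run_with_dedup(thread_inputs, work_fn):
    """Sketch of redundant-thread suppression: cluster threads by input
    (threads with equal inputs would compute equal results), execute only
    one representative per cluster, broadcast its result to the cluster."""
    clusters = {}                       # input value -> list of thread ids
    for tid, x in enumerate(thread_inputs):
        clusters.setdefault(x, []).append(tid)
    results = [None] * len(thread_inputs)
    executed = 0
    for x, tids in clusters.items():
        r = work_fn(x)                  # only the representative runs
        executed += 1
        for tid in tids:                # broadcast to the whole cluster
            results[tid] = r
    return results, executed

res, n_exec = run_with_dedup([2, 3, 2, 2, 3], lambda x: x * x)
print(res, n_exec)  # [4, 9, 4, 4, 9] 2
```

Five threads' worth of results are produced with only two executions of the work function.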
-
Patent number: 10042641
Abstract: An asynchronous processing system comprising an asynchronous scalar processor and an asynchronous vector processor coupled to the scalar processor. The asynchronous scalar processor is configured to perform processing functions on input data and to output instructions. The asynchronous vector processor is configured to perform processing functions in response to a very long instruction word (VLIW) received from the scalar processor. The VLIW comprises a first portion and a second portion, at least the first portion comprising a vector instruction.
Type: Grant
Filed: September 8, 2014
Date of Patent: August 7, 2018
Assignee: Huawei Technologies Co., Ltd.
Inventors: Qifan Zhang, Wuxian Shi, Yiqun Ge, Tao Huang, Wen Tong
-
Patent number: 10019410
Abstract: An apparatus, computer-readable medium, and computer-implemented method for parallelization of a computer program on a plurality of computing cores includes receiving a computer program comprising a plurality of commands, decomposing the plurality of commands into a plurality of node networks, each node network corresponding to a command in the plurality of commands and including one or more nodes corresponding to execution dependencies of the command, mapping the plurality of node networks to a plurality of systolic arrays, each systolic array comprising a plurality of cells and each non-data node in each node network being mapped to a cell in the plurality of cells, and mapping each cell in each systolic array to a computing core in the plurality of computing cores.
Type: Grant
Filed: April 6, 2017
Date of Patent: July 10, 2018
Assignee: CORNAMI, INC.
Inventors: Solomon Harsha, Paul Master
-
Patent number: 10019264
Abstract: Methods and apparatuses relating to processors that contextually optimize instructions at runtime are disclosed. In one embodiment, a processor includes a fetch circuit to fetch an instruction from an instruction storage, a format of the instruction including an opcode, a first source operand identifier, and a second source operand identifier; wherein the instruction storage includes a sequence of sub-optimal instructions preceded by a start-of-sequence instruction and followed by an end-of-sequence instruction.
Type: Grant
Filed: February 24, 2016
Date of Patent: July 10, 2018
Assignee: Intel Corporation
Inventors: Taylor W. Kidd, Matt S. Walsh
-
Patent number: 10013652
Abstract: Deep Neural Networks (DNNs) with many hidden layers and many units per layer are very flexible models with a very large number of parameters. As such, DNNs are challenging to optimize. To achieve real-time computation, embodiments disclosed herein enable fast DNN feature transformation via optimized memory bandwidth utilization. To optimize memory bandwidth utilization, a rate of accessing memory may be reduced based on a batch setting. A memory, corresponding to a selected given output neuron of a current layer of the DNN, may be updated with an incremental output value computed for the selected given output neuron as a function of input values of a selected few non-zero input neurons of a previous layer of the DNN in combination with weights between the selected few non-zero input neurons and the selected given output neuron, wherein a number of the selected few corresponds to the batch setting.
Type: Grant
Filed: April 29, 2015
Date of Patent: July 3, 2018
Assignee: Nuance Communications, Inc.
Inventors: Jan Vlietinck, Stephan Kanthak, Rudi Vuerinckx, Christophe Ris
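The batched incremental update above amounts to: skip zero input neurons, and fold a batch of non-zero inputs into each output location per memory access rather than touching it once per input. A simplified dense-Python model (layer sizes, names, and the batching interface are illustrative):

```python
def sparse_layer_update(outputs, weights, inputs, batch):
    """Sketch of the batched sparse feature transform: each output neuron
    accumulates contributions only from non-zero input neurons, and one
    read-modify-write of the output covers a whole batch of such inputs,
    reducing the memory access rate by roughly the batch factor."""
    nonzero = [i for i, v in enumerate(inputs) if v != 0.0]
    for start in range(0, len(nonzero), batch):
        group = nonzero[start:start + batch]
        for j in range(len(outputs)):
            # One update to outputs[j] for the entire batch of inputs.
            outputs[j] += sum(inputs[i] * weights[i][j] for i in group)
    return outputs

out = sparse_layer_update(
    outputs=[0.0, 0.0],
    weights=[[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],  # weights[i][j]: input i -> output j
    inputs=[1.0, 0.0, 2.0],                        # input neuron 1 is zero and skipped
    batch=2)
print(out)  # [11.0, 14.0]
```

With ReLU-style activations a large fraction of inputs are exactly zero, which is what makes the non-zero selection worthwhile.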
-
Patent number: 9979904
Abstract: The present invention relates to reading out sensor array pixels. In particular, the present invention provides an approach according to which only a region of interest may be read out from the sensor array, thus leading to substantial time savings. In order to achieve this, a circuitry for configuring a region of interest for the sensor array is provided, as well as a reading-out circuitry for reading out pixels belonging to the region of interest. In addition, the corresponding methods for programming the region of interest and for reading out the region of interest are provided. The circuitry for programming and/or reading out the region of interest includes per-pixel storage elements for storing an indication of whether a pixel belongs to a region of interest (ROI). These are configured by the programming circuitry and used when reading out the ROI so that only the pixels of the ROI are read out.
Type: Grant
Filed: January 24, 2014
Date of Patent: May 22, 2018
Assignee: INNOVACIONES MICROELECTRÓNICAS S.L. (ANAFOCUS)
Inventors: Rafael Dominguez Castro, Sergio Morillas Castillo, Rafael Romay Juárez, Fernando Medeiro Hidalgo
-
Patent number: 9891912
Abstract: An array processor includes a managing element having a load streaming unit coupled to multiple processing elements. The load streaming unit provides input data portions to each of a first subset of processing elements and receives output data from each of a second subset of the processing elements based on a comparatively sorted combination of the input data portions. Each processing element is configurable by the managing element to compare input data portions received from the load streaming unit or two or more of the other processing elements. Each processing unit can further select an input data portion to be output data based on the comparison, and in response to selecting the input data portion, remove a queue entry corresponding to the selected input data portion. Each processing element can provide the selected output data portion to the managing element or as an input to one of the processing elements.
Type: Grant
Filed: October 31, 2014
Date of Patent: February 13, 2018
Assignee: International Business Machines Corporation
Inventors: Ganesh Balakrishnan, Bartholomew Blaner, John J. Reilly, Jeffrey A. Stuecheli
-
Patent number: 9841957
Abstract: An apparatus stores a program including a description of loop processing of iterating a plurality of instructions, and rearranges an execution sequence of the plurality of instructions in the program such that the loop processing is pipelined by software pipelining. The apparatus inserts an instruction to use a register for single instruction multiple data (SIMD) extension instructions into the description of the loop processing in the program.
Type: Grant
Filed: April 19, 2016
Date of Patent: December 12, 2017
Assignee: FUJITSU LIMITED
Inventor: Shun Kamatsuka
-
Patent number: 9830164
Abstract: A system and method for efficiently processing instructions in hardware parallel execution lanes within a processor. In response to a given divergent point within an identified loop, a compiler arranges instructions within the identified loop into very large instruction words (VLIWs). At least one VLIW includes instructions intermingled from different basic blocks between the given divergence point and a corresponding convergence point. The compiler generates code that, when executed, assigns instructions within a given VLIW at runtime to multiple parallel execution lanes within a target processor. The target processor includes a single instruction multiple data (SIMD) micro-architecture. The assignment for a given lane is based on the branch direction found at runtime for the given lane at the given divergent point. The target processor includes a vector register for storing indications indicating which given instruction within a fetched VLIW for an associated lane to execute.
Type: Grant
Filed: January 29, 2013
Date of Patent: November 28, 2017
Assignee: Advanced Micro Devices, Inc.
Inventor: Reza Yazdani
-
Patent number: 9832478
Abstract: One exemplary video encoding method has the following steps: determining a size of a parallel motion estimation region according to encoding-related information; and encoding a plurality of pixels by at least performing motion estimation based on the size of the parallel motion estimation region. One exemplary video decoding method has the following steps: decoding a video parameter stream to obtain a decoded size of a parallel motion estimation region; checking validity of the decoded size of the parallel motion estimation region, and accordingly generating a checking result; when the checking result indicates that the decoded size of the parallel motion estimation region is invalid, entering an error handling process to decide a size of the parallel motion estimation region; and decoding a plurality of pixels by at least performing motion estimation based on the decided size of the parallel motion estimation region.
Type: Grant
Filed: May 6, 2014
Date of Patent: November 28, 2017
Assignee: MEDIATEK INC.
Inventors: Tung-Hsing Wu, Kun-Bin Lee
-
Patent number: 9804848
Abstract: Method, apparatus, and program for performing a string comparison operation. The apparatus includes execution resources to execute a first instruction. In response to the first instruction, the execution resources store a result of a comparison between each data element of a first and second operand corresponding to a first and second text string, respectively.
Type: Grant
Filed: December 5, 2014
Date of Patent: October 31, 2017
Assignee: Intel Corporation
Inventors: Michael A. Julier, Jeffrey D. Gray, Srinivas Chennupaty, Sean P. Mirkes, Mark P. Seconi
-
Patent number: 9804826
Abstract: System and method for pseudo-random number generation based on a recursion with significantly increased multithreaded parallelism. A single pseudo-random generator program is assigned multiple threads to process in parallel. N state elements indexed incrementally are arranged into a matrix comprising x rows, where a respective adjacent pair of state elements in the same column are related by g=(M+j) mod N, wherein j and g represent the indexes of the pair of state elements. x can be determined through a modular multiplicative inverse of M and N. The matrix can be divided into sections, each section having a number of columns, and each thread is assigned a section. In this manner, the majority of the requisite interactions among the state elements occur without expensive inter-thread communications, and each thread may only need to communicate with a single other thread a small number of times.
Type: Grant
Filed: December 5, 2014
Date of Patent: October 31, 2017
Assignee: Nvidia Corporation
Inventors: Przemyslaw Tredak, John Clifton Woolley, Jr.
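A minimal sketch of the state-matrix layout described above, under the assumption that gcd(M, N) = 1 so the recursion chain visits every state index (the function name and the choice of x are illustrative, not from the patent):

```python
from math import gcd

def arrange_state(N, M, x):
    """Lay out state indices 0..N-1 column by column so that each
    vertically adjacent pair (j, g) in a column satisfies
    g == (M + j) % N, as in the abstract's recursion relation."""
    assert gcd(M, N) == 1 and N % x == 0
    # Walking the chain j -> (M + j) % N from 0 enumerates i*M % N.
    order = [(i * M) % N for i in range(N)]
    # Cut the chain into columns of x rows each.
    return [order[c * x:(c + 1) * x] for c in range(N // x)]

cols = arrange_state(12, 5, 3)
print(cols)
```

Assigning contiguous groups of columns to threads then keeps most column-wise interactions thread-local, which is the parallelism benefit the abstract claims.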
-
Patent number: 9772864
Abstract: When an OpenCL kernel is to be executed, a bitfield index representation to be used for the indices of the kernel invocations is determined based on the number of bits needed to represent the maximum value that will be needed for each index dimension for the kernel. A bitfield placement data structure 33 describing how the bitfield index representation is partitioned is then prepared together with a maximum value data structure 32 indicating the maximum index dimension values to be used for the kernel. A processor then executes the kernel invocations 36 across the index space indicated by the maximum value data structure 32. A bitfield index representation 35, 37, 38 configured in accordance with the bitfield placement data structure 33 is associated with each kernel invocation to indicate its index.
Type: Grant
Filed: April 16, 2013
Date of Patent: September 26, 2017
Assignee: ARM LIMITED
Inventor: Jorn Nystad
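As a hypothetical sketch of the idea (function names and the packing order are assumptions, not ARM's implementation): size each bitfield from the maximum value of its index dimension, then pack a multi-dimensional invocation index into one word.

```python
def make_placement(max_dims):
    """Bits needed per index dimension, plus each field's shift
    (the 'bitfield placement' role from the abstract, illustratively)."""
    bits = [max(1, (m - 1).bit_length()) for m in max_dims]
    shifts = [sum(bits[:i]) for i in range(len(bits))]
    return bits, shifts

def pack_index(idx, shifts):
    # pack each dimension's value into its bitfield
    return sum(v << s for v, s in zip(idx, shifts))

def unpack_index(packed, bits, shifts):
    # mask each bitfield back out to recover the per-dimension index
    return tuple((packed >> s) & ((1 << b) - 1)
                 for b, s in zip(bits, shifts))

bits, shifts = make_placement([600, 40, 8])  # e.g. a 600x40x8 index space
packed = pack_index((599, 39, 7), shifts)
print(unpack_index(packed, bits, shifts))
```

The payoff is that a 3-D invocation index fits in one register-sized word whose layout adapts per kernel, instead of three fixed-width counters.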
-
Patent number: 9766888
Abstract: A processor of an aspect includes packed data registers, and a decode unit to decode an instruction. The instruction may indicate a first source packed data to include at least four data elements, indicate a second source packed data to include at least four data elements, and indicate a destination storage location. An execution unit is coupled with the packed data registers and the decode unit. The execution unit, in response to the instruction, is to store a result packed data in the destination storage location. The result packed data may include at least four indexes that may identify corresponding data element positions in the first and second source packed data. The indexes may be stored in positions in the result packed data that are to represent a sorted order of corresponding data elements in the first and second source packed data.
Type: Grant
Filed: March 28, 2014
Date of Patent: September 19, 2017
Assignee: Intel Corporation
Inventors: Shay Gueron, Vlad Krasnov
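A rough functional model of such an instruction (the index encoding — positions into the concatenation of the two sources — is an assumption for illustration): the result holds indexes, not values, in sorted order of the data they point at.

```python
def sort_indexes(a, b):
    """Model of a sorted-index instruction: return indexes into the
    concatenation a + b, ordered so that gathering by them yields
    the elements of both source vectors in sorted order."""
    merged = a + b
    return sorted(range(len(merged)), key=lambda i: merged[i])

idx = sort_indexes([7, 1, 9, 3], [2, 8, 0, 5])
print(idx)
```

Producing indexes rather than values lets software permute several parallel arrays with one computed shuffle pattern.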
-
Patent number: 9760530
Abstract: An apparatus, computer-readable medium, and computer-implemented method for parallelization of a computer program on a plurality of computing cores includes receiving a computer program comprising a plurality of commands, decomposing the plurality of commands into a plurality of node networks, each node network corresponding to a command in the plurality of commands and including one or more nodes corresponding to execution dependencies of the command, mapping the plurality of node networks to a plurality of systolic arrays, each systolic array comprising a plurality of cells and each non-data node in each node network being mapped to a cell in the plurality of cells, and mapping each cell in each systolic array to a computing core in the plurality of computing cores.
Type: Grant
Filed: March 31, 2017
Date of Patent: September 12, 2017
Assignee: CORNAMI, INC.
Inventors: Solomon Harsha, Paul Master
-
Patent number: 9760531
Abstract: An apparatus, computer-readable medium, and computer-implemented method for parallelization of a computer program on a plurality of computing cores includes receiving a computer program comprising a plurality of commands, decomposing the plurality of commands into a plurality of node networks, each node network corresponding to a command in the plurality of commands and including one or more nodes corresponding to execution dependencies of the command, mapping the plurality of node networks to a plurality of systolic arrays, each systolic array comprising a plurality of cells and each non-data node in each node network being mapped to a cell in the plurality of cells, and mapping each cell in each systolic array to a computing core in the plurality of computing cores.
Type: Grant
Filed: April 6, 2017
Date of Patent: September 12, 2017
Assignee: CORNAMI, INC.
Inventors: Solomon Harsha, Paul Master
-
Patent number: 9727380
Abstract: Global register protection in a multi-threaded processor is described. In an embodiment, global resources within a multi-threaded processor are protected by performing checks, before allowing a thread to write to a global resource, to determine whether the thread has write access to the particular global resource. The check involves accessing one or more local control registers or a global control field within the multi-threaded processor; in an example, a local register associated with each other thread in the multi-threaded processor is accessed and checked to see whether it contains an identifier for the particular global resource. Only if none of the accessed local registers contain such an identifier is the instruction issued and the thread allowed to write to the global resource. Otherwise, the instruction is blocked and an exception may be raised to alert the program that issued the instruction that the write failed.
Type: Grant
Filed: February 19, 2015
Date of Patent: August 8, 2017
Assignee: Imagination Technologies Limited
Inventors: Guixin Wang, Hugh Jackson, Robert Graham Isherwood
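The check described in the example can be modeled in a few lines (a sketch, assuming a per-thread set of claimed resource identifiers; names are illustrative, not from the patent):

```python
def may_write(thread_id, resource_id, local_regs):
    """Return True if no *other* thread's local control register
    holds an identifier for resource_id; only then may the write
    instruction issue."""
    return all(resource_id not in claimed
               for tid, claimed in local_regs.items()
               if tid != thread_id)

# Thread 0 has claimed R1, thread 1 has claimed R2.
local_regs = {0: {"R1"}, 1: {"R2"}, 2: set()}
print(may_write(0, "R2", local_regs))  # blocked: another thread claims R2
```

Note that a thread's own local register never blocks it, which is why the scan skips `thread_id` itself.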
-
Patent number: 9710171
Abstract: Aspects include communicating synchronous input/output (I/O) commands between an operating system and a recipient. Communicating synchronous I/O commands includes issuing a first synchronous I/O command with a first initiation bit set, where the first synchronous I/O command causes a first mailbox command to be initiated by the recipient with respect to a first storage control unit. Communicating synchronous I/O commands further includes issuing a second synchronous I/O command with a second initiation bit set, where the second synchronous I/O command causes a second mailbox command to be initiated by the recipient with respect to at least one subsequent storage control unit. Communicating synchronous I/O commands also includes issuing a third synchronous I/O command with a first completion bit set in response to the first mailbox command being initiated, and issuing a fourth synchronous I/O command with a second completion bit set in response to the second mailbox command being initiated.
Type: Grant
Filed: October 1, 2015
Date of Patent: July 18, 2017
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: David F. Craddock, Peter G. Sutton, Harry M. Yudenfriend
-
Patent number: 9710172
Abstract: Aspects include communicating synchronous input/output (I/O) commands between an operating system and a recipient. Communicating synchronous I/O commands includes issuing a first synchronous I/O command with a first initiation bit set, where the first synchronous I/O command causes a first mailbox command to be initiated by the recipient with respect to a first storage control unit. Communicating synchronous I/O commands further includes issuing a second synchronous I/O command with a second initiation bit set, where the second synchronous I/O command causes a second mailbox command to be initiated by the recipient with respect to at least one subsequent storage control unit. Communicating synchronous I/O commands also includes issuing a third synchronous I/O command with a first completion bit set in response to the first mailbox command being initiated, and issuing a fourth synchronous I/O command with a second completion bit set in response to the second mailbox command being initiated.
Type: Grant
Filed: June 14, 2016
Date of Patent: July 18, 2017
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: David F. Craddock, Peter G. Sutton, Harry M. Yudenfriend
-
Patent number: 9703966
Abstract: A data processing system includes a single instruction multiple data register file and single instruction multiple data processing circuitry. The single instruction multiple data processing circuitry supports execution of cryptographic processing instructions for performing parts of a hash algorithm. The operands are stored within the single instruction multiple data register file. The cryptographic support instructions do not follow normal lane-based processing and generate output operands in which different portions of the output operand depend upon multiple different elements within the input operand.
Type: Grant
Filed: July 7, 2015
Date of Patent: July 11, 2017
Assignee: ARM LIMITED
Inventors: Matthew James Horsnell, Richard Roy Grisenthwaite, Stuart David Biles, Daniel Kershaw
-
Patent number: 9652435
Abstract: An apparatus, computer-readable medium, and computer-implemented method for parallelization of a computer program on a plurality of computing cores includes receiving a computer program comprising a plurality of commands, decomposing the plurality of commands into a plurality of node networks, each node network corresponding to a command in the plurality of commands and including one or more nodes corresponding to execution dependencies of the command, mapping the plurality of node networks to a plurality of systolic arrays, each systolic array comprising a plurality of cells and each non-data node in each node network being mapped to a cell in the plurality of cells, and mapping each cell in each systolic array to a computing core in the plurality of computing cores.
Type: Grant
Filed: October 18, 2016
Date of Patent: May 16, 2017
Assignee: CORNAMI, INC.
Inventors: Solomon Harsha, Paul Master
-
Patent number: 9645821
Abstract: A processor includes decoder logic to decode a compare instruction, and an execution unit to execute the compare instruction. The compare instruction is to cause the processor to compare integer data elements of a first 64-bit SIMD integer operand with integer data elements of a second 64-bit SIMD integer operand. The integer data elements of the first 64-bit SIMD integer operand to be compared with the integer data elements of the second 64-bit SIMD integer operand are to be in the same data element positions. The compare instruction is also to cause the processor to store a plurality of indicators of whether the compared integer data elements of the first and second 64-bit SIMD integer operands are equal. The plurality of indicators are expanded data elements, each of a first multi-bit size.
Type: Grant
Filed: December 5, 2014
Date of Patent: May 9, 2017
Assignee: Intel Corporation
Inventors: Michael A. Julier, Jeffrey D. Gray, Srinivas Chennupaty, Sean P. Mirkes, Mark P. Seconi
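The "expanded indicator" result can be modeled simply (a sketch of the general element-wise compare-to-mask pattern, not Intel's microarchitecture; the 16-bit element width is an assumed example): each lane becomes all-ones on equality and all-zeros otherwise.

```python
def simd_compare_eq(a, b, width_bits=16):
    """Element-wise equality compare producing expanded multi-bit
    indicators: an all-ones mask per equal lane, zero otherwise."""
    ones = (1 << width_bits) - 1  # e.g. 0xFFFF for 16-bit elements
    return [ones if x == y else 0 for x, y in zip(a, b)]

print(simd_compare_eq([1, 2, 3, 4], [1, 9, 3, 0]))
```

Full-width masks are convenient because they can feed directly into bitwise select/blend operations without further widening.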
-
Patent number: 9633407
Abstract: A thread on one processor may be used to enable another processor to lock or release a mutex. For example, a central processing unit thread may be used by a graphics processing unit to secure a mutex for a shared memory.
Type: Grant
Filed: July 29, 2011
Date of Patent: April 25, 2017
Assignee: Intel Corporation
Inventors: Boris Ginzburg, Esfirush Natanzon, Ilya Osadchiy, Yoav Zach
-
Patent number: 9626402
Abstract: Techniques for performing database operations using vectorized instructions are provided. In one technique, data compaction is performed using vectorized instructions to identify a shuffle mask based on matching bits and update an output array based on the shuffle mask and an input array. In a related technique, a hash table probe involves using vectorized instructions to determine whether each key in one or more hash buckets matches a particular input key.
Type: Grant
Filed: August 1, 2013
Date of Patent: April 18, 2017
Assignee: Oracle International Corporation
Inventors: Rajkumar Sen, Sam Idicula, Nipun Agarwal
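The shuffle-mask compaction technique can be sketched as follows (a scalar model of the vectorized idea; the 4-element lane width and table layout are illustrative assumptions): a precomputed table maps each match bitmask to the shuffle pattern that gathers the surviving elements.

```python
def build_shuffle_table(width=4):
    """table[mask] lists the positions of the set bits in mask --
    the shuffle pattern that compacts matching elements to the front."""
    return [[i for i in range(width) if (m >> i) & 1]
            for m in range(1 << width)]

def compact(values, match_bits, table, width=4):
    """Per 'vector' of width elements: form the match bitmask, look up
    its shuffle pattern, and append the selected elements to the output."""
    out = []
    for base in range(0, len(values), width):
        mask = 0
        for k in range(width):
            mask |= match_bits[base + k] << k
        out.extend(values[base + p] for p in table[mask])
    return out

table = build_shuffle_table()
print(compact([10, 20, 30, 40], [1, 0, 1, 1], table))
```

The table lookup replaces a data-dependent branch per element with one shuffle per vector, which is what makes the approach SIMD-friendly.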
-
Patent number: 9612838
Abstract: Instructions for generating flags according to operands' data sizes, and instruction sets handled by a RISC data processor including an instruction capable of executing an operation on operands in more than one data size, are disclosed. An identical operation process is conducted on the small-size operand and on the low-order bits of the large-size operand, and flags capable of coping with the respective data sizes are generated regardless of the data size of each operand subjected to the operation. Thus, a reduction in the instruction code space of the RISC data processor can be achieved.
Type: Grant
Filed: May 21, 2014
Date of Patent: April 4, 2017
Assignee: Renesas Electronics Corporation
Inventor: Fumio Arakawa
-
Patent number: 9600852
Abstract: A graphics processing unit having an implementation of a hierarchical hash table thereon, a method of establishing a hierarchical hash table in a graphics processing unit, and a GPU computing system are disclosed herein. In one embodiment, the graphics processing unit includes: (1) a plurality of parallel processors, wherein each of the plurality of parallel processors includes parallel processing cores, a shared memory coupled to each of the parallel processing cores, and registers, wherein each one of the registers is uniquely associated with one of the parallel processing cores, and (2) a controller configured to employ at least one of the registers to establish a hierarchical hash table for a key-value pair of a thread processing on one of the parallel processing cores.
Type: Grant
Filed: May 10, 2013
Date of Patent: March 21, 2017
Assignee: Nvidia Corporation
Inventor: Julien Demouth
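A hierarchy like the one above can be modeled abstractly (a sketch only; the three tiers stand in for per-core registers, block-shared memory, and a global table, and all names are hypothetical): lookups try the fastest, most local tier first and fall through.

```python
def hierarchical_lookup(key, reg_slot, shared_table, global_table):
    """Probe a per-thread register slot (a single cached key-value
    pair), then a block-shared table, then the global table."""
    if reg_slot is not None and reg_slot[0] == key:
        return reg_slot[1]          # hit in the thread's own register
    if key in shared_table:
        return shared_table[key]    # hit in block-shared memory
    return global_table.get(key)    # fall back to the global table

print(hierarchical_lookup("k", ("k", 1), {}, {"k": 3}))
```

The design choice mirrors the GPU memory hierarchy: register hits cost nothing, shared-memory hits avoid global traffic, and only misses pay full latency.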