Vector Processor Operation Patents (Class 712/7)
  • Patent number: 11531549
    Abstract: A system and corresponding method map instructions in an out-of-order (OoO) processor. The system comprises a mapper, integer snapshot circuitry, and floating-point (FP) snapshot circuitry. The mapper maps instructions by mapping integer and FP architectural registers (ARs) of the instructions to integer and FP physical registers of the OoO processor, respectively. The mapper records, via at least one FP present indicator, presence of FP ARs used as destinations in the instructions. The mapper copies, periodically, the integer mapper state to the integer snapshot circuitry and copies, intermittently, based on the at least one FP present indicator, the FP mapper state to the FP snapshot circuitry. Copies of the integer and FP mapper state in the integer and FP snapshot circuitry, respectively, improve performance for instruction unwinding caused, for example, by an exception, branch/jump mispredict, etc. By copying the FP mapper state intermittently, power efficiency of the OoO processor is improved.
    Type: Grant
    Filed: March 31, 2021
    Date of Patent: December 20, 2022
    Assignee: Marvell Asia Pte, Ltd.
    Inventor: David A. Carlson
  • Patent number: 11455171
    Abstract: A fast and frugal item-state tracking scoreboard circuit is disclosed. The scoreboard maintains per-item partial states across multiple memory circuits, enabling multiple lookups per clock cycle and multiple state updates per clock cycle. In an embodiment a scoreboard is used to schedule instructions in an out-of-order processor. Each clock cycle the scoreboard indicates the busy state of an instruction's registers and may update the busy state of the destination registers of issuing instructions and completing instructions. Applications include register tracking, function-unit tracking, and cache-line state tracking, in embodiments including processor cores (including superscalar, superpipelined, and multithreaded processors), accelerators, memory systems, and networks. In an embodiment, a register-busy scoreboard circuit is implemented using FPGA LUT RAM memory.
    Type: Grant
    Filed: May 29, 2020
    Date of Patent: September 27, 2022
    Assignee: Gray Research LLC
    Inventor: Jan Stephen Gray
  • Patent number: 11442713
    Abstract: Methods, apparatus, systems, and articles of manufacture are disclosed to improve loop optimization with predictable recurring memory reads (PRMRs). An example apparatus includes memory, and first processor circuitry to execute first instructions to at least identify one or more optimizations to convert a first loop into a second loop based on converting PRMRs of the first loop into loop-invariant PRMRs, the converting of the PRMRs in response to a quantity of the PRMRs satisfying a threshold, the second loop to execute in a single iteration corresponding to a quantity of iterations of the first loop, determine one or more optimization parameters based on the one or more optimizations, and compile second instructions based on the first processor circuitry processing the first loop based on the one or more optimization parameters associated with the one or more optimizations, the second instructions to be executed by the first or second processor circuitry.
    Type: Grant
    Filed: October 19, 2020
    Date of Patent: September 13, 2022
    Assignee: Intel Corporation
    Inventors: Diego Luis Caballero de Gea, Hideki Ido, Eric N. Garcia
  • Patent number: 11367160
    Abstract: A parallel processing unit (e.g., a GPU), in some examples, includes a hardware scheduler and hardware arbiter that launch graphics and compute work for simultaneous execution on a SIMD/SIMT processing unit. Each processing unit (e.g., a streaming multiprocessor) of the parallel processing unit operates in a graphics-greedy mode or a compute-greedy mode at respective times. The hardware arbiter, in response to a result of a comparison of at least one monitored performance or utilization metric to a user-configured threshold, can selectively cause the processing unit to run one or more compute work items from a compute queue when the processing unit is operating in the graphics-greedy mode, and cause the processing unit to run one or more graphics work items from a graphics queue when the processing unit is operating in the compute-greedy mode. Associated methods and systems are also described.
    Type: Grant
    Filed: August 2, 2018
    Date of Patent: June 21, 2022
    Assignee: NVIDIA CORPORATION
    Inventors: Rajballav Dash, Gregory Palmer, Gentaro Hirota, Lacky Shah, Jack Choquette, Emmett Kilgariff, Sriharsha Niverty, Milton Lei, Shirish Gadre, Omkar Paranjape, Lei Yang, Rouslan Dimitrov
  • Patent number: 11354126
    Abstract: Data processing apparatus comprises vector processing circuitry to selectively apply vector processing operations defined by vector processing instructions to generate one or more data elements of a data vector comprising a plurality of data elements at respective data element positions of the data vector, according to the state of respective predicate flags associated with the positions of the data vector; and generator circuitry to generate instruction sample data indicative of processing activities of the vector processing circuitry for selected ones of the vector processing instructions, instruction sample data indicating at least the state of the predicate flags at execution of the selected vector processing instructions.
    Type: Grant
    Filed: February 15, 2019
    Date of Patent: June 7, 2022
    Assignee: Arm Limited
    Inventors: Michael John Williams, Nigel John Stephens
  • Patent number: 11219426
    Abstract: A method and system for determining an irradiation dose. The method for determining the irradiation dose includes determining a pixel having a biological feature in a region of interest in a radiotherapy simulated locating image by using a retrospective label; extracting a local radiomics feature based on the pixel having the biological feature, in which the local radiomics feature includes a grayscale histogram intensity, a tumor shape feature, a textural feature, a Laplacian of Gaussian filtering feature, and a wavelet feature; acquiring the local radiomics features to be measured; identifying a positive region having the local radiomics features to be measured based on the local radiomics features; performing three-dimensional reconstruction for the peripheral boundary of the positive region to determine a three-dimensional image.
    Type: Grant
    Filed: July 18, 2018
    Date of Patent: January 11, 2022
    Assignees: Tumor Hospital of Shandong First Medical University (Shandong Cancer Hospital and Institute)
    Inventors: Jian Zhu, Zhen Hou, Zhenjiang Li, Haining Yu, Tong Bai, Yong Yin, Baosheng Li
  • Patent number: 11141127
    Abstract: A method and system for determining an irradiation dose. The method for determining the irradiation dose includes determining a pixel having a biological feature in a region of interest in a radiotherapy simulated locating image by using a retrospective label; extracting a local radiomics feature based on the pixel having the biological feature, in which the local radiomics feature includes a grayscale histogram intensity, a tumor shape feature, a textural feature, a Laplacian of Gaussian filtering feature, and a wavelet feature; acquiring the local radiomics features to be measured; identifying a positive region having the local radiomics features to be measured based on the local radiomics features; performing three-dimensional reconstruction for the peripheral boundary of the positive region to determine a three-dimensional image.
    Type: Grant
    Filed: July 18, 2018
    Date of Patent: October 12, 2021
    Assignees: Tumor Hospital of Shandong First Medical University
    Inventors: Jian Zhu, Zhen Hou, Zhenjiang Li, Haining Yu, Tong Bai, Yong Yin, Baosheng Li
  • Patent number: 11126430
    Abstract: A vector processor includes a grouping memory functional unit coupled to grouping memory having multiple bins. The vector processor also includes a bitformatting functional unit that performs bit-level data arrangements using any suitable technique or network, such as a Benes network. The vector processor receives and reads an input vector of data that includes portions (e.g., bits) of multiple data streams, and writes each portion corresponding to a respective data stream to a respective bin in parallel using the bitformatting functional unit to align the data. The vector processor also or alternatively receives and reads multiple outgoing data streams, writes portions of the data streams in respective bins of the grouping memory, and intersperses the portions in an outgoing vector of data in parallel, using the bitformatting functional unit to align the data.
    Type: Grant
    Filed: December 27, 2019
    Date of Patent: September 21, 2021
    Assignee: Intel Corporation
    Inventors: Parakalan Venkataraghavan, Thomas W. Smith, Silpa Naidu Chirumavilla, Ravi Shekhar
  • Patent number: 11042378
    Abstract: Data processing apparatus comprises processing circuitry to selectively apply a vector processing operation to data items at positions within data vectors according to the states of a set of respective predicate flags associated with the positions, the data vectors having a data vector processing order, each data vector comprising a plurality of data items having a data item order, the processing circuitry comprising: instruction decoder circuitry to decode program instructions; and instruction processing circuitry to execute instructions decoded by the instruction decoder circuitry; wherein the instruction decoder circuitry is responsive to a propagation instruction to control the instruction processing circuitry to derive a set of predicate flags applicable to a current data vector in dependence upon a set of predicate flags applicable to a preceding data vector in the data vector processing order, wherein when one or more last-most predicate flags of the set applicable to the preceding data vector are inac
    Type: Grant
    Filed: July 28, 2016
    Date of Patent: June 22, 2021
    Assignee: ARM Limited
    Inventors: Nigel John Stephens, Mbou Eyole, Alejandro Martinez Vicente
  • Patent number: 11003449
    Abstract: A swizzle pattern generator is provided to reduce an overhead due to execution of a swizzle instruction in vector processing. The swizzle pattern generator is configured to provide swizzle patterns with respect to data sets of at least one vector register or vector processing unit. The swizzle pattern generator may be reconfigurable to generate various swizzle patterns for different vector operations.
    Type: Grant
    Filed: January 24, 2019
    Date of Patent: May 11, 2021
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Moo-Kyoung Chung, Woong Seo, Ho-Young Kim, Soo-Jung Ryu, Dong-Hoon Yoo, Jin-Seok Lee, Yeon-Gon Cho, Chang-Moo Kim, Seung-Hun Jin
  • Patent number: 10990396
    Abstract: Disclosed embodiments relate to systems for performing instructions to quickly convert and use matrices (tiles) as one-dimensional vectors. In one example, a processor includes fetch circuitry to fetch an instruction having fields to specify an opcode, locations of a two-dimensional (2D) matrix and a one-dimensional (1D) vector, and a group of elements comprising one of a row, part of a row, multiple rows, a column, part of a column, multiple columns, and a rectangular sub-tile of the specified 2D matrix, and wherein the opcode is to indicate a move of the specified group between the 2D matrix and the 1D vector, decode circuitry to decode the fetched instruction; and execution circuitry, responsive to the decoded instruction, when the opcode specifies a move from 1D, to move contents of the specified 1D vector to the specified group of elements.
    Type: Grant
    Filed: September 27, 2018
    Date of Patent: April 27, 2021
    Assignee: Intel Corporation
    Inventors: Bret Toll, Christopher J. Hughes, Dan Baum, Elmoustapha Ould-Ahmed-Vall, Raanan Sade, Robert Valentine, Mark J. Charney, Alexander F. Heinecke
  • Patent number: 10942847
    Abstract: Technologies for efficiently performing scatter-gather operations include a device with circuitry configured to associate, with a template identifier, a set of non-contiguous memory locations of a memory having a cross point architecture. The circuitry is additionally configured to access, in response to a request that identifies the non-contiguous memory locations by the template identifier, the memory locations.
    Type: Grant
    Filed: December 18, 2018
    Date of Patent: March 9, 2021
    Assignee: Intel Corporation
    Inventors: Jawad B. Khan, Richard Coulson
  • Patent number: 10936315
    Abstract: In a method of operating a computer system, an instruction loop is executed by a processor in which each iteration of the instruction loop accesses a current data vector and an associated current vector predicate. The instruction loop is repeated when the current vector predicate indicates the current data vector contains at least one valid data element and the instruction loop is exited when the current vector predicate indicates the current data vector contains no valid data elements.
    Type: Grant
    Filed: December 31, 2018
    Date of Patent: March 2, 2021
    Assignee: Texas Instruments Incorporated
    Inventors: Duc Quang Bui, Joseph Raymond Michael Zbiciak
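The loop-exit rule in the abstract above (repeat while the current vector predicate marks at least one valid element, exit when it marks none) can be illustrated in software. This is a minimal sketch only, not the patented hardware mechanism; the function name `predicated_sum`, the `lanes` parameter, and the choice of a summation as the loop body are assumptions made for illustration.

```python
def predicated_sum(data, lanes):
    """Sum `data` by consuming `lanes` elements per iteration under a predicate."""
    total = 0
    offset = 0
    while True:
        # Current vector predicate: True for lanes that map to a real element.
        predicate = [offset + i < len(data) for i in range(lanes)]
        # Exit when the predicate indicates no valid data elements.
        if not any(predicate):
            break
        # Current data vector; inactive lanes hold a placeholder value.
        vector = [data[offset + i] if p else 0 for i, p in enumerate(predicate)]
        # Only lanes enabled by the predicate contribute to the result.
        total += sum(v for v, p in zip(vector, predicate) if p)
        offset += lanes
    return total


result = predicated_sum([1, 2, 3, 4, 5], lanes=4)
```

The final iteration runs with a partially set predicate, so no separate scalar tail loop is needed for the leftover element.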
  • Patent number: 10929133
    Abstract: Systems, methods, and apparatuses relating to element sorting of vectors are described. In one embodiment, a processor includes a decoder to decode an instruction into a decoded instruction; and an execution unit to execute the decoded instruction to: provide storage for a comparison matrix to store a comparison value for each element of an input vector compared against the other elements of the input vector, perform a comparison operation on elements of the input vector corresponding to storage of comparison values above a main diagonal of the comparison matrix, perform a different operation on elements of the input vector corresponding to storage of comparison values below the main diagonal of the comparison matrix, and store results of the comparison operation and the different operation in the comparison matrix.
    Type: Grant
    Filed: January 16, 2019
    Date of Patent: February 23, 2021
    Assignee: Intel Corporation
    Inventors: Mikhail Plotnikov, Igor Ermolaev
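One way to read the comparison-matrix idea in the abstract above is as a rank sort: each element is compared against every other element, and the row sums of the matrix give each element's final position. The sketch below is an assumption-laden illustration, not the instruction the patent claims. In particular, the abstract does not name the two operations; using a strict comparison above the main diagonal and a non-strict comparison below it is one plausible choice that keeps ties in a stable order. The name `rank_sort` is hypothetical.

```python
def rank_sort(values):
    """Sort `values` by summing the rows of a pairwise comparison matrix."""
    n = len(values)
    matrix = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if j > i:
                # Above the main diagonal: strict comparison (assumed).
                matrix[i][j] = int(values[i] > values[j])
            elif j < i:
                # Below the main diagonal: non-strict comparison (assumed),
                # so equal elements receive distinct ranks.
                matrix[i][j] = int(values[i] >= values[j])
    result = [None] * n
    for i in range(n):
        # The row sum is the number of elements that must precede element i.
        result[sum(matrix[i])] = values[i]
    return result


sorted_values = rank_sort([3, 1, 2, 3, 1])
```

Because exactly one of `matrix[i][j]` and `matrix[j][i]` is set for every pair of distinct indices, the row sums form a permutation of `0..n-1` and every output slot is filled once.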
  • Patent number: 10896042
    Abstract: Examples of the present disclosure provide apparatuses and methods for determining a vector population count in a memory. An example method comprises determining, using sensing circuitry, a vector population count of a number of fixed length elements of a vector stored in a memory array.
    Type: Grant
    Filed: December 3, 2018
    Date of Patent: January 19, 2021
    Assignee: Micron Technology, Inc.
    Inventor: Sanjay Tiwari
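For readers unfamiliar with the term, a vector population count returns the number of set bits in each fixed-length element of a vector. The patent performs this with sensing circuitry inside a memory array; the sketch below only shows the expected result in software. The function name `vector_popcount` and the `width` parameter are hypothetical.

```python
def vector_popcount(elements, width):
    """Return the number of set bits in each fixed-width element."""
    mask = (1 << width) - 1  # keep only the low `width` bits of each element
    return [bin(e & mask).count("1") for e in elements]


counts = vector_popcount([0b1011, 0b0000, 0b1111], width=4)
```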
  • Patent number: 10877925
    Abstract: A vector processor with a vector-first and multi-lane configuration. A vector operation for a vector processor can include a single vector or multiple vectors as input. Multiple lanes for the input can be used to accelerate the operation in parallel. A vector-first configuration can further enhance the multiple lanes by reducing the number of elements accessed in the lanes to perform the operation in parallel.
    Type: Grant
    Filed: March 18, 2019
    Date of Patent: December 29, 2020
    Assignee: Micron Technology, Inc.
    Inventor: Steven Jeffrey Wallach
  • Patent number: 10853043
    Abstract: Methods, apparatus, systems, and articles of manufacture are disclosed to improve loop optimization with predictable recurring memory reads (PRMRs). An example apparatus includes an optimizer including an optimization scenario manager to generate an optimization plan associated with a loop and corresponding optimization parameters, the optimization plan including a set of one or more optimizations, an optimization scenario analyzer to identify the optimization plan as a candidate optimization plan when a quantity of PRMRs included in the loop is greater than a threshold, and a parameter calculator to determine the optimization parameters based on the candidate optimization plan, and a code generator to generate instructions to be executed by a processor, the instructions based on processing the loop with the one or more optimizations included in the candidate optimization plan.
    Type: Grant
    Filed: September 11, 2018
    Date of Patent: December 1, 2020
    Assignee: INTEL CORPORATION
    Inventors: Diego Luis Caballero de Gea, Hideki Ido, Eric N. Garcia
  • Patent number: 10817291
    Abstract: Systems, methods, and apparatuses relating to swizzle operations and disable operations in a configurable spatial accelerator (CSA) are described. Certain embodiments herein provide for an encoding system for a specific set of swizzle primitives across a plurality of packed data elements in a CSA.
    Type: Grant
    Filed: March 30, 2019
    Date of Patent: October 27, 2020
    Assignee: Intel Corporation
    Inventors: Jesus Corbal, Rohan Sharma, Simon Steely, Jr., Chinmay Ashok, Kent D. Glossop, Dennis Bradford, Paul Caprioli, Louise Huot, Kermin ChoFleming, Barry Tannenbaum
  • Patent number: 10684860
    Abstract: This invention provides a high-performance processor system and method based on a common general-purpose unit that may be configured into a variety of different processor architectures. Before the processor executes instructions, the instructions are filled into an instruction read buffer that is directly accessed by the processor core; the instruction read buffer then actively provides instructions to the processor core for execution, achieving a high cache hit rate.
    Type: Grant
    Filed: July 6, 2018
    Date of Patent: June 16, 2020
    Assignee: SHANGHAI XINHAO MICROELECTRONICS CO. LTD.
    Inventor: Kenneth Chenghao Lin
  • Patent number: 10671583
    Abstract: Techniques for performing database operations using vectorized instructions are provided. In one technique, it is determined whether to perform a database operation using one or more vectorized instructions or without using any vectorized instructions. This determination may comprise estimating a first cost of performing the database operation using one or more vectorized instructions and estimating a second cost of performing the database operation without using any vectorized instructions. Multiple factors may be used to determine which approach to follow, such as the number of data elements that may fit into a SIMD register, a number of vectorized instructions in the vectorized approach, a number of data movement instructions that involve moving data from a SIMD register to a non-SIMD register and/or vice versa, a size of a cache, and a projected size of a hash table.
    Type: Grant
    Filed: August 24, 2017
    Date of Patent: June 2, 2020
    Assignee: Oracle International Corporation
    Inventors: Rajkumar Sen, Sam Idicula, Nipun Agarwal
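The two-cost comparison described in the abstract above can be sketched as a toy cost model. Everything in it is an assumption for illustration: the function name `choose_plan`, the parameter names, the linear cost formulas, and the default `move_cost` weight do not come from the patent, and the cache-size and hash-table-size factors are left out to keep the sketch short.

```python
import math


def choose_plan(n_elements, lanes, vector_instrs_per_batch,
                moves_per_batch, scalar_instrs_per_element, move_cost=2.0):
    """Pick the cheaper of a vectorized and a non-vectorized plan."""
    # Each batch fills one SIMD register with up to `lanes` elements.
    batches = math.ceil(n_elements / lanes)
    # First cost: vectorized instructions plus SIMD <-> non-SIMD register moves.
    vector_cost = batches * (vector_instrs_per_batch + move_cost * moves_per_batch)
    # Second cost: the same operation without any vectorized instructions.
    scalar_cost = n_elements * scalar_instrs_per_element
    return "vectorized" if vector_cost < scalar_cost else "scalar"


large_input = choose_plan(1000, 8, 4, 1, 3)
small_input = choose_plan(4, 8, 10, 5, 1)
```

With these illustrative weights, the large input favors the vectorized plan, while the small input does not amortize the per-batch instruction and data-movement overhead.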
  • Patent number: 10572409
    Abstract: A memory arrangement can store a matrix of matrix data elements specified as index-value pairs that indicate row and column indices and associated values. First split-and-merge circuitry is coupled between the memory arrangement and a first set of FIFO buffers for reading the matrix data elements from the memory arrangement and putting the matrix data elements in the first set of FIFO buffers based on column indices. A pairing circuit is configured to read vector data elements, pair the vector data elements with the matrix data elements, and put the paired matrix and vector data elements in a second set of FIFO buffers based on column indices. Second split-and-merge circuitry is configured to read paired matrix and vector data elements from the second set of FIFO buffers and put the paired matrix and vector data elements in a third set of FIFO buffers based on row indices.
    Type: Grant
    Filed: May 10, 2018
    Date of Patent: February 25, 2020
    Assignee: XILINX, INC.
    Inventors: Jindrich Zejda, Ling Liu, Yifei Zhou, Ashish Sirasao
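The data flow in the abstract above (bucket matrix elements by column, pair them with vector elements, then regroup by row) resembles a sparse matrix-vector multiply over index-value pairs. The sketch below models the FIFO stages with ordinary Python containers. The final multiply-accumulate step is an assumption, since the abstract stops at the third set of FIFO buffers, and the name `spmv_coo` is hypothetical.

```python
from collections import defaultdict


def spmv_coo(entries, x, n_rows):
    """Multiply a sparse matrix, given as (row, col, value) triples, by vector x."""
    # Stage 1: place matrix elements into buckets keyed by column index.
    by_col = defaultdict(list)
    for row, col, value in entries:
        by_col[col].append((row, value))
    # Stage 2: pair each matrix element with the vector element of its column.
    paired = [(row, value, x[col])
              for col, items in by_col.items()
              for row, value in items]
    # Stage 3: regroup by row index and accumulate the products (assumed step).
    y = [0] * n_rows
    for row, value, x_value in paired:
        y[row] += value * x_value
    return y


y = spmv_coo([(0, 0, 2), (0, 2, 1), (1, 1, 3)], x=[1, 2, 3], n_rows=2)
```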
  • Patent number: 10564964
    Abstract: Systems and methods are provided for executing an instruction. The method may include loading a first vector into a first location, the first vector including a plurality of first data elements and loading a second vector into a second location, the second vector including a plurality of second data elements. The method may further include comparing the plurality of first data elements of the first vector to the plurality of data elements of the second vector and performing one or more operations on the plurality of first and second data elements based on at least one vector cross-compare instruction. The one or more operations include counting a number of data elements of the plurality of first and second data elements that satisfy at least one condition, counting a number of times specified values occur in the plurality of first and second data elements, and generating sequence counts for duplicated values.
    Type: Grant
    Filed: August 23, 2016
    Date of Patent: February 18, 2020
    Assignee: International Business Machines Corporation
    Inventors: Jeffrey H. Derby, Robert K. Montoye, Dheeraj Sreedhar
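Two of the operations listed in the abstract above, counting occurrences of values across two vectors and generating sequence counts for duplicated values, can be shown with plain lists. This is a software illustration only; the function names and the exact semantics (equality as the condition, counting earlier occurrences within the same vector) are assumptions rather than the instruction definitions from the patent.

```python
def count_matches(first, second):
    """For each element of `first`, count the elements of `second` equal to it."""
    return [sum(1 for y in second if y == x) for x in first]


def sequence_counts(vector):
    """For each element, count earlier occurrences of the same value."""
    return [sum(1 for j in range(i) if vector[j] == vector[i])
            for i in range(len(vector))]


matches = count_matches([1, 2, 3], [2, 2, 3])
sequences = sequence_counts([7, 5, 7, 7, 5])
```

A sequence count of 0 marks the first occurrence of a value, which is the kind of information that helps resolve conflicting updates when duplicated indices appear in one vector.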
  • Patent number: 10459843
    Abstract: A streaming engine employed in a digital data processor specifies a fixed read-only data stream defined by plural nested loops. An address generator produces addresses of data elements. A stream head register stores data elements next to be supplied to functional units for use as operands. An element duplication unit optionally duplicates each data element an instruction-specified number of times. A vector masking unit limits data elements received from the element duplication unit to least significant bits within an instruction-specified vector length. If the vector length is less than a stream head register size, the vector masking unit stores all 0's in excess lanes of the stream head register (group duplication disabled) or stores duplicate copies of the least significant bits in excess lanes of the stream head register.
    Type: Grant
    Filed: December 30, 2016
    Date of Patent: October 29, 2019
    Assignee: TEXAS INSTRUMENTS INCORPORATED
    Inventor: Joseph Zbiciak
  • Patent number: 10394891
    Abstract: A novel distributed graph database is provided that is designed for efficient graph data storage and processing on modern computing architectures. In particular, a single-node graph database and a runtime and communication layer allow a distributed graph database to be composed from multiple single-node instances.
    Type: Grant
    Filed: August 5, 2016
    Date of Patent: August 27, 2019
    Assignee: International Business Machines Corporation
    Inventors: Chun-Fu Chen, Jason L. Crawford, Ching-Yung Lin, Jie Lu, Mark R. Nutter, Toyotaro Suzumura, Ilie G. Tanase, Danny L. Yeh
  • Patent number: 10235398
    Abstract: An object of the present invention is to efficiently perform a data load process or a data store process between a memory and a storage unit in a processor. The processor includes: a plurality of storage units associated with a plurality of data elements included in a data set; and a control unit that reads the plurality of data elements stored in adjacent storage areas from a memory, in which a plurality of the data sets is stored, collectively for respective data sets, sorts the respective read data elements to a storage unit corresponding to the data element among the plurality of storage units, and writes the data elements to the respective data sets.
    Type: Grant
    Filed: April 23, 2015
    Date of Patent: March 19, 2019
    Assignee: Renesas Electronics Corporation
    Inventor: Masayuki Kimura
  • Patent number: 10026458
    Abstract: Memories and methods for performing an atomic memory operation are disclosed, including a memory having a memory store, operation logic, and a command decoder. Operation logic can be configured to receive data and perform operations thereon in accordance with internal control signals. A command decoder can be configured to receive command packets having at least a memory command portion in which a memory command is provided and a data configuration portion in which configuration information related to data associated with a command packet is provided. The command decoder is further configured to generate a command control signal based at least in part on the memory command and further configured to generate a control signal based at least in part on the configuration information.
    Type: Grant
    Filed: October 21, 2010
    Date of Patent: July 17, 2018
    Assignee: Micron Technology, Inc.
    Inventor: David Resnick
  • Patent number: 9766858
    Abstract: A data processing system supports vector operands with components representing different bit significance portions of an integer number. Processing circuitry performs a processing operation specified by a program instruction in dependence upon a number of components comprising the vector as specified by metadata for the vector.
    Type: Grant
    Filed: December 24, 2014
    Date of Patent: September 19, 2017
    Assignee: ARM Limited
    Inventors: David Raymond Lutz, Neil Burgess, Christopher Neal Hinds
  • Patent number: 9696994
    Abstract: A data processing apparatus includes a comparison unit configured to perform an element comparison process performing a comparison of a first data element at a first index in the first vector with a second data element at a second index in the second vector. A hazard vector generation unit is configured to populate a hazard vector at an index determined by the first index with a value determined by the second index. The comparison unit performs the element comparison process by iteratively comparing data elements of the first vector with each element of a subset of the second vector. It then determines the subset of the second vector as those data elements at indices in the second vector which are less than a current index of the first vector and which are greater than previously determined values of the second index for which the match condition was true.
    Type: Grant
    Filed: December 23, 2011
    Date of Patent: July 4, 2017
    Assignee: ARM LIMITED
    Inventor: Alastair David Reid
  • Patent number: 9411842
    Abstract: Techniques for performing database operations using vectorized instructions are provided. In one technique, it is determined whether to perform a database operation using one or more vectorized instructions or without using any vectorized instructions. This determination may comprise estimating a first cost of performing the database operation using one or more vectorized instructions and estimating a second cost of performing the database operation without using any vectorized instructions. Multiple factors may be used to determine which approach to follow, such as the number of data elements that may fit into a SIMD register, a number of vectorized instructions in the vectorized approach, a number of data movement instructions that involve moving data from a SIMD register to a non-SIMD register and/or vice versa, a size of a cache, and a projected size of a hash table.
    Type: Grant
    Filed: August 1, 2013
    Date of Patent: August 9, 2016
    Assignee: Oracle International Corporation
    Inventors: Rajkumar Sen, Sam Idicula, Nipun Agarwal
  • Patent number: 9335997
    Abstract: Embodiments of a system and a method in which a processor may execute instructions that cause the processor to receive an operand vector, a selection vector, and a control vector are disclosed. The executed instructions may also cause the processor to perform a wrapping rotate previous operation dependent upon the input vectors.
    Type: Grant
    Filed: September 28, 2012
    Date of Patent: May 10, 2016
    Assignee: Apple Inc.
    Inventor: Jeffry E. Gonion
  • Patent number: 9218182
    Abstract: Embodiments of systems, apparatuses, and methods for performing in a computer processor a data element shuffle and an operation on the shuffled data elements in response to a single data element shuffle and an operation instruction that includes a destination vector register operand, a first and second source vector register operands, an immediate value, and an opcode are described.
    Type: Grant
    Filed: June 29, 2012
    Date of Patent: December 22, 2015
    Assignee: Intel Corporation
    Inventors: Igor Ermolaev, Elmoustapha Ould-Ahmed-Vall, Bret Toll, Jesus Corbal, Andrey Naraikin
  • Patent number: 9210480
    Abstract: A video processing system includes a video encoder that encodes a video stream into an independent video layer stream and a first dependent video layer stream based on motion vector data or grayscale and color data.
    Type: Grant
    Filed: December 20, 2007
    Date of Patent: December 8, 2015
    Assignee: BROADCOM CORPORATION
    Inventors: Stephen E. Gordon, Sherman Chen, Michael Dove, David Rosmann, Thomas J. Quigley, Jeyhan Karaoguz
  • Patent number: 9164767
    Abstract: In a vector processing device, a data dependence detecting unit detects a data dependence relation between a preceding instruction and a succeeding instruction which are inputted from an instruction buffer, and an instruction issuance control unit controls issuance of an instruction based on a detection result thereof. When there is a data dependence relation between the preceding instruction and the succeeding instruction, the instruction issuance control unit generates a new instruction equivalent to processing related to a vector register including the data dependence relation with the succeeding instruction in processing executed by the preceding instruction and issues the new instruction between the preceding instruction and the succeeding instruction, and thereby a data hazard can be avoided between the preceding instruction and the succeeding instruction without making a stall occur.
    Type: Grant
    Filed: December 13, 2012
    Date of Patent: October 20, 2015
    Assignee: SOCIONEXT INC.
    Inventor: Takashi Nishikawa
  • Patent number: 9160338
    Abstract: A method for implementing an adaptive interface between at least one FPGA with at least one FPGA application and at least one I/O module, which are designed as the corresponding sender side or receiver side, for connection to the FPGA, whereby a serial interface is formed between the at least one FPGA and the at least one I/O module, comprising the steps of configuring a maximum number of registers to be transmitted for each FPGA application, configuring a shared, fixed register width for all registers, setting an enable signal on the sender side for the registers to be transmitted out of the maximum number of registers to be transmitted, transmitting the enable signal from the sender side to the receiver side, and transmitting the registers, for which the enable signal is set, from the sender side to the receiver side.
    Type: Grant
    Filed: May 12, 2014
    Date of Patent: October 13, 2015
    Assignee: dSPACE digital signal processing and control engineering GmbH
    Inventors: Dirk Hasse, Robert Polnau
  • Patent number: 9092257
    Abstract: A method, circuit arrangement, and program product for executing instructions including denormal values for one or more operands in a vector execution unit. A denormal value operand may be prenormalized by a first processing lane of the vector execution unit upon detecting the denormal value. The prenormalized value and any other operands of the instruction may be communicated to a dot product adder of the vector execution unit. The dot product adder performs at least a portion of the floating point operation with the prenormalized value and any other operands of the instruction.
    Type: Grant
    Filed: March 11, 2013
    Date of Patent: July 28, 2015
    Assignee: International Business Machines Corporation
    Inventors: Adam J. Muff, Paul E. Schardt, Robert A. Shearer, Matthew R. Tubbs
  • Patent number: 9092256
    Abstract: A method, circuit arrangement, and program product for executing instructions including denormal values for one or more operands in a vector execution unit. A denormal value operand may be prenormalized by a first processing lane of the vector execution unit upon detecting the denormal value. The prenormalized value and any other operands of the instruction may be communicated to a dot product adder of the vector execution unit. The dot product adder performs at least a portion of the floating point operation with the prenormalized value and any other operands of the instruction.
    Type: Grant
    Filed: December 6, 2012
    Date of Patent: July 28, 2015
    Assignee: International Business Machines Corporation
    Inventors: Adam J. Muff, Paul E. Schardt, Robert A. Shearer, Matthew R. Tubbs
  • Patent number: 9047094
    Abstract: According to embodiments of the invention, there is disclosed a computer processor architecture; and in particular a computer processor, a method of operating the same, and a computer program product that makes use of an instruction set for the computer. In one embodiment according to the invention, there is provided a computer processor, the processor comprising: a decode unit for decoding instruction packets fetched from a memory holding a sequence of instruction packets; and first and second processing channels, each channel comprising a plurality of functional units, wherein the first processing channel is capable of performing control operations and comprises a control register file having a relatively narrower bit width, and the second processing channel is capable of performing data processing operations at least one input of which is a vector and comprises a data register file having a relatively wider bit width.
    Type: Grant
    Filed: March 31, 2004
    Date of Patent: June 2, 2015
    Assignee: Icera Inc.
    Inventor: Simon Knowles
  • Publication number: 20150149744
    Abstract: A data processing apparatus and method are provided for processing execution threads, where each execution thread specifies at least one instruction. The data processing apparatus has a vector processing unit providing a plurality M of lanes of parallel processing, within each lane the vector processing unit being configured to perform a processing operation on a data element input to that lane for each of one or more input operands. A vector instruction is received that is specified by a group of the execution threads, that vector instruction identifying an associated processing operation and also providing an indication of the data elements of each input operand that are to be subjected to that associated processing operation. Vector merge circuitry then determines, based on that information, a required number of lanes of parallel processing for performing the associated processing operation.
    Type: Application
    Filed: October 2, 2014
    Publication date: May 28, 2015
    Inventor: Ronny PEDERSEN
  • Publication number: 20150143078
    Abstract: Vector processing engines (VPEs) employing a tapped-delay line(s) for providing precision filter vector processing operations with reduced sample re-fetching and power consumption are disclosed. Related vector processor systems and methods are also disclosed. The VPEs are configured to provide filter vector processing operations. To minimize re-fetching of input vector data samples from memory to reduce power consumption, a tapped-delay line(s) is included in the data flow paths between a vector data file and execution units in the VPE. The tapped-delay line(s) is configured to receive and provide input vector data sample sets to execution units for performing filter vector processing operations. The tapped-delay line(s) is also configured to shift the input vector data sample set for filter delay taps and provide the shifted input vector data sample set to execution units, so the shifted input vector data sample set does not have to be re-fetched during filter vector processing operations.
    Type: Application
    Filed: November 15, 2013
    Publication date: May 21, 2015
    Applicant: QUALCOMM Incorporated
    Inventors: Raheel Khan, Fahad Ali Mujahid, Afshin Shiravi
  • Publication number: 20150143077
    Abstract: Vector processing engines (VPEs) employing merging circuitry in data flow paths between execution units and vector data memory to provide in-flight merging of output vector data stored to vector data memory are disclosed. Related vector processing instructions, systems, and methods are also disclosed. Merging circuitry is provided in data flow paths between execution units and vector data memory in the VPE. The merging circuitry is configured to merge an output vector data sample set from execution units as a result of performing vector processing operations in-flight while the output vector data sample set is being provided over the output data flow paths from the execution units to the vector data memory to be stored. The merged output vector data sample set is stored in a merged form in the vector data memory without requiring additional post-processing steps, which may delay subsequent vector processing operations to be performed in execution units.
    Type: Application
    Filed: November 15, 2013
    Publication date: May 21, 2015
    Applicant: QUALCOMM Incorporated
    Inventor: Raheel Khan
  • Publication number: 20150143080
    Abstract: Elements from a second operand are added together one-by-one to obtain a first result. The adding includes performing one or more end around carry add operations. The first result is placed in an element of a first operand of the instruction. After each addition of an element, a carry out of a chosen position of the sum, if any, is added to a selected position in an element of the first operand.
    Type: Application
    Filed: December 9, 2014
    Publication date: May 21, 2015
    Inventors: Jonathan D. Bradbury, Eric M. Schwarz
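    Illustration (not part of the publication): a minimal Python sketch of summing elements with an end-around carry, as the abstract above describes in outline. The function name, the 32-bit default width, and the choice to fold the carry back into bit 0 of the running sum are assumptions made for this example only.

```python
def end_around_carry_sum(elements, width=32):
    """Add elements one by one, folding any carry-out back into bit 0.

    Hypothetical helper: the width and carry position are assumptions,
    not details taken from the publication.
    """
    mask = (1 << width) - 1
    acc = 0
    for value in elements:
        total = acc + (value & mask)
        carry = total >> width        # 1 if the add overflowed `width` bits, else 0
        acc = (total & mask) + carry  # end-around carry: add the carry-out back in
    return acc
```

    For example, with the default width, adding 0xFFFFFFFF and 1 overflows and the carry wraps around, so the sum is 1.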
  • Publication number: 20150143076
    Abstract: Vector processing engines (VPEs) employing merging circuitry in data flow paths between execution units and vector data memory to provide in-flight merging of output vector data stored to vector data memory are disclosed. Related vector processing instructions, systems, and methods are also disclosed. Merging circuitry is provided in data flow paths between execution units and vector data memory in the VPE. The merging circuitry is configured to merge an output vector data sample set from execution units as a result of performing vector processing operations in-flight while the output vector data sample set is being provided over the output data flow paths from the execution units to the vector data memory to be stored. The merged output vector data sample set is stored in a merged form in the vector data memory without requiring additional post-processing steps, which may delay subsequent vector processing operations to be performed in execution units.
    Type: Application
    Filed: November 15, 2013
    Publication date: May 21, 2015
    Applicant: QUALCOMM Incorporated
    Inventor: Raheel Khan
  • Publication number: 20150143079
    Abstract: Vector processing engines (VPEs) employing a tapped-delay line(s) for providing precision correlation/covariance vector processing operations with reduced sample re-fetching and/or power consumption are disclosed. The VPEs disclosed herein are configured to provide correlation/covariance vector processing operations, such as code division multiple access (CDMA) correlation/covariance vector processing operations as a non-limiting example. A tapped-delay line(s) is included in the data flow paths between memory and execution units in the VPE. The tapped-delay line(s) is configured to receive and provide an input vector data sample set to execution units for performing correlation/covariance vector processing operations.
    Type: Application
    Filed: November 15, 2013
    Publication date: May 21, 2015
    Applicant: QUALCOMM Incorporated
    Inventors: Raheel Khan, Fahad Ali Mujahid, Afshin Shiravi
  • Publication number: 20150121036
    Abstract: A data processing system 2 includes a single instruction multiple data register file 12 and single instruction multiple data processing circuitry 14. The single instruction multiple data processing circuitry 14 supports execution of cryptographic processing instructions for performing parts of a hash algorithm. The operands are stored within the single instruction multiple data register file 12. The cryptographic support instructions do not follow normal lane-based processing and generate output operands in which the different portions of the output operand depend upon multiple different elements within the input operand.
    Type: Application
    Filed: December 30, 2014
    Publication date: April 30, 2015
    Inventors: Matthew James HORSNELL, Richard Roy GRISENTHWAITE, Stuart David BILES, Daniel KERSHAW
  • Publication number: 20150113246
    Abstract: Instructions and logic provide conversions between a mask register and a general purpose register or memory. In some embodiments, responsive to an instruction specifying a destination operand, a mask length corresponding to a number of mask data fields, and a source operand, values are read from data fields in the source operand, corresponding to the specified mask length, and stored to corresponding data fields in the destination operand specified by the instruction, wherein one of the source or the destination operands is a mask register. Values indicative of masked vector elements may be stored to any data fields in the destination operand other than the number of data fields corresponding to the specified mask length. For some embodiments, the other one of the source or the destination operands may be a general purpose register or a memory location.
    Type: Application
    Filed: November 25, 2011
    Publication date: April 23, 2015
    Applicant: Intel Corporation
    Inventors: Elmoustapha Ould-Ahmed-Vall, Jesus Corbal, Robert Valentine, Bret L. Toll, Mark J. Charney
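    Illustration (not part of the publication): a small Python sketch of the conversion the abstract above outlines, copying only the data fields covered by the mask length and filling the rest of the destination. It assumes one-bit mask fields, integers standing in for registers, and 0 as the value that marks a masked element; the function name and the 64-bit destination width are hypothetical.

```python
def move_mask(source, mask_length, dest_width=64):
    """Copy the low `mask_length` one-bit fields of `source` into the destination.

    Assumption for this sketch: destination fields beyond `mask_length`
    are set to 0, taken here to mean "element masked".
    """
    if not 0 <= mask_length <= dest_width:
        raise ValueError("mask_length must fit in the destination")
    keep = (1 << mask_length) - 1  # ones in the fields selected by the mask length
    return source & keep           # all other destination fields become 0
```

    With a mask length of 4, a source of 0b11110110 yields 0b0110: the four low fields are copied and the upper fields read as masked.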
  • Patent number: 9009447
    Abstract: A processor, method, and medium for using vector instructions to perform string comparisons. A single instruction compares the elements of two vectors and simultaneously checks for the null character. If an inequality or the null character is found, then the string comparison loop terminates, and a further check is performed to determine if the strings are equal. If all elements are equal and the null character is not found, then another iteration of the string comparison loop is executed. The vectors are loaded with the next portions of the strings, and then the next comparison is performed. The loop continues until either an inequality or the null character is found.
    Type: Grant
    Filed: July 18, 2011
    Date of Patent: April 14, 2015
    Assignee: Oracle International Corporation
    Inventor: Darryl J. Gove
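    Illustration (not part of the patent): a scalar Python model of the string comparison loop described in the abstract above. The generator expression stands in for the single vector instruction that checks every lane for an inequality or a null character. The 16-lane width, the helper names, and the null padding of reads past the end of a buffer are assumptions made so the sketch is self-contained.

```python
def load_chunk(buf, offset, lanes):
    """Model a vector load; bytes past the end of `buf` read as null (assumption)."""
    chunk = buf[offset:offset + lanes]
    return chunk + b"\x00" * (lanes - len(chunk))


def strings_equal(a, b, lanes=16):
    """Compare two null-terminated byte strings one vector-sized chunk at a time."""
    offset = 0
    while True:
        va = load_chunk(a, offset, lanes)
        vb = load_chunk(b, offset, lanes)
        # Stand-in for the single instruction: first lane that differs or holds a null.
        hit = next(
            (i for i in range(lanes) if va[i] != vb[i] or va[i] == 0),
            None,
        )
        if hit is not None:
            # Loop terminates; the further check decides whether the strings are equal.
            return va[hit] == vb[hit]
        offset += lanes  # all lanes equal and no null found: load the next portions
```

    The loop exits on the first chunk that contains a mismatch or a terminator, and the final comparison distinguishes "both strings ended here" from "the strings differ here".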
  • Publication number: 20150100755
    Abstract: A data processing apparatus and a method of controlling performance of speculative vector operations are provided. The apparatus comprises processing circuitry for performing a sequence of speculative vector operations on vector operands, each vector operand comprising a plurality of vector elements, and speculation control circuitry for maintaining a speculation width indication indicating the number of vector elements of each vector operand to be subjected to the speculative vector operations. The speculation width indication is set to an initial value prior to performance of the sequence of speculative vector operations. The processing circuitry generates progress indications during performance of the sequence of speculative vector operations, and the speculation control circuitry detects, with reference to the progress indications and speculation reduction criteria, presence of a speculation reduction condition.
    Type: Application
    Filed: August 18, 2014
    Publication date: April 9, 2015
    Inventors: Alastair David REID, Daniel KERSHAW
  • Patent number: 8996845
    Abstract: A vector compare-and-exchange operation is performed by: decoding, by a decoder in a processing device, a single instruction specifying a vector compare-and-exchange operation for a plurality of data elements between a first storage location, a second storage location, and a third storage location; issuing the single instruction for execution by an execution unit in the processing device; responsive to the execution of the single instruction, comparing data elements from the first storage location to corresponding data elements in the second storage location; and, responsive to determining a match exists, replacing the data elements from the first storage location with corresponding data elements from the third storage location.
    Type: Grant
    Filed: December 22, 2009
    Date of Patent: March 31, 2015
    Assignee: Intel Corporation
    Inventors: Ravi Rajwar, Andrew T. Forsyth
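    Illustration (not part of the patent): a Python sketch of the compare-and-exchange step in the abstract above, with lists standing in for the three storage locations. It assumes the match is evaluated and applied independently per element, which the abstract does not state explicitly; the function name and the returned match flags are hypothetical, and the sketch does not model atomicity.

```python
def vector_compare_exchange(first, second, third):
    """Per-element compare-and-exchange over three equal-length sequences.

    Assumption: each element is handled independently. Where first[i]
    equals second[i], the result takes third[i]; otherwise first[i] is kept.
    """
    if not (len(first) == len(second) == len(third)):
        raise ValueError("all three operands must have the same length")
    matched = [f == s for f, s in zip(first, second)]
    updated = [t if m else f for f, t, m in zip(first, third, matched)]
    return updated, matched
```

    For instance, comparing [1, 2, 3, 4] against [1, 0, 3, 0] with replacement values [9, 9, 9, 9] updates only the first and third elements, giving [9, 2, 9, 4].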
  • Publication number: 20150089190
    Abstract: In an embodiment, a processor includes a register attribute tracker configured to track one or more attributes corresponding to registers. The register attribute tracker may track the attributes associated with the registers when those registers are used as output registers of instructions that explicitly define the attributes and, if the register attribute tracker has a tracked attribute associated with an input register of an instruction that does not explicitly define the attribute, the register attribute tracker may annotate the instruction with an attribute and/or associate an attribute with the output register of the instruction in the register attribute tracker.
    Type: Application
    Filed: September 24, 2013
    Publication date: March 26, 2015
    Applicant: Apple Inc.
    Inventor: Jeffry E. Gonion
  • Publication number: 20150089191
    Abstract: In an embodiment, a processor includes an issue circuit configured to issue instruction operations for execution. The issue circuit may be configured to monitor the source operands of the instruction operations, and to issue instruction operations for which the source operands (including predicate operands, as appropriate) are resolved. Additionally, the issue circuit may be configured to detect a null predicate that indicates that none of the vector elements will be modified by a corresponding instruction operation. The issue circuit may be configured to issue the corresponding instruction operation with the null predicate even if other source operands are not yet resolved.
    Type: Application
    Filed: September 24, 2013
    Publication date: March 26, 2015
    Applicant: Apple Inc.
    Inventor: Jeffry E. Gonion