Scalar/vector Processor Interface Patents (Class 712/3)
  • Patent number: 11756618
    Abstract: A log structure is created in persistent memory using hardware support in memory controller or software supported with additional instructions. Writes to persistent memory locations are streamed to the log and written to their corresponding memory location in cache hierarchy. An added victim cache for persistent memory addresses catches cache evictions, which would corrupt open transactions. On the completion of a group of atomic persistent memory operations, the log is closed and the persistent values in the cache can be copied to their source persistent memory location and the log cleaned.
    Type: Grant
    Filed: June 28, 2021
    Date of Patent: September 12, 2023
    Inventors: Ellis Robinson Giles, Peter Joseph Varman
  • Patent number: 11740904
    Abstract: A vector friendly instruction format and execution thereof. According to one embodiment of the invention, a processor is configured to execute an instruction set. The instruction set includes a vector friendly instruction format. The vector friendly instruction format has a plurality of fields including a base operation field, a modifier field, an augmentation operation field, and a data element width field, wherein the first instruction format supports different versions of base operations and different augmentation operations through placement of different values in the base operation field, the modifier field, the alpha field, the beta field, and the data element width field, and wherein only one of the different values may be placed in each of the base operation field, the modifier field, the alpha field, the beta field, and the data element width field on each occurrence of an instruction in the first instruction format in instruction streams.
    Type: Grant
    Filed: November 11, 2021
    Date of Patent: August 29, 2023
    Assignee: Intel Corporation
    Inventors: Robert C. Valentine, Jesus Corbal San Adrian, Roger Espasa Sans, Robert D. Cavin, Bret L. Toll, Santiago Galan Duran, Jeffrey G. Wiedemeier, Sridhar Samudrala, Milind Baburao Girkar, Edward Thomas Grochowski, Jonathan Cannon Hall, Dennis R. Bradford, Elmoustapha Ould-Ahmed-Vall, James C Abel, Mark Charney, Seth Abraham, Suleyman Sair, Andrew Thomas Forsyth, Lisa Wu, Charles Yount
  • Patent number: 11663118
    Abstract: In some examples, a device includes a set of data storage elements, wherein each data storage element of the set of data storage elements is associated with a respective valid address vector, and wherein a bit flip in any bit of any of the valid address vectors leads to one of a set of invalid address vectors not associated with any of the set of data storage elements. The device also includes a decoder configured to receive a first address vector as part of a request and to check whether the first address vector corresponds to one of the valid address vectors or to one of the invalid address vectors. The decoder is also configured to select an associated data storage element in response to receiving the request and in response to determining that the first address vector corresponds to one of the valid address vectors.
    Type: Grant
    Filed: March 10, 2021
    Date of Patent: May 30, 2023
    Assignee: Infineon Technologies AG
    Inventor: Jens Barrenscheen
  • Patent number: 11494592
    Abstract: Systems, apparatuses, and methods for converting data to a tiling format when implementing convolutional neural networks are disclosed. A system includes at least a memory, a cache, a processor, and a plurality of compute units. The memory stores a first buffer and a second buffer in a linear format, where the first buffer stores convolutional filter data and the second buffer stores image data. The processor converts the first and second buffers from the linear format to third and fourth buffers, respectively, in a tiling format. The plurality of compute units load the tiling-formatted data from the third and fourth buffers in memory to the cache and then perform a convolutional filter operation on the tiling-formatted data. The system generates a classification of a first dataset based on a result of the convolutional filter operation.
    Type: Grant
    Filed: August 28, 2020
    Date of Patent: November 8, 2022
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Song Zhang, Jiantan Liu, Hua Zhang, Min Yu
  • Patent number: 11397624
    Abstract: A data processing system including a data processor which is operable to execute programs to perform data processing operations and in which execution threads executing a program to perform data processing operations may be grouped together into thread groups. The data processor comprises a cross-lane permutation circuit which is operable to perform processing for cross-lane instructions which require data to be permuted (copied or moved) between the threads of a thread group. The cross-lane permutation circuit has plural data lanes between which data may be permuted (moved or copied). The number of data lanes is fewer than the number of threads in a thread group.
    Type: Grant
    Filed: January 22, 2019
    Date of Patent: July 26, 2022
    Assignee: Arm Limited
    Inventors: Luka Dejanovic, Mladen Wilder
  • Patent number: 11340904
    Abstract: Disclosed herein are vector index registers in vector processors that each store multiple addresses for accessing multiple positions in vectors. It is known to use scalar index registers in vector processors to access multiple positions of vectors by changing the scalar index registers in vector operations. By using a vector indexing register for indexing positions of one or more operand vectors, the scalar index register can be replaced and at least the continual changing of the scalar index register can be avoided.
    Type: Grant
    Filed: May 20, 2019
    Date of Patent: May 24, 2022
    Assignee: Micron Technology, Inc.
    Inventor: Steven Jeffrey Wallach
  • Patent number: 11226821
    Abstract: A computer processor is provided that employs a plurality of operand storage elements that store operand data values and associated meta-data as unitary operand data elements as well as at least one functional unit that performs operations that produce and access the unitary operand data elements stored in the plurality of operand storage elements. The meta-data associated with a given operand data value as part of a unitary operand data element can specify type of the unitary operand data element (e.g., vector or scalar), elemental width and floating-point error flags. The meta-data can also be used to define special operand data values (e.g., Not-a-Result and None). The meta-data is useful in optimizing execution, such as in speculation and vectorized SIMD operations. The computer processor can also support a number of particular vector operations that are useful in optimizing execution of vectorized SIMD operations.
    Type: Grant
    Filed: September 10, 2019
    Date of Patent: January 18, 2022
    Assignee: Mill Computing, Inc.
    Inventors: Roger Rawson Godard, Arthur David Kahlich, David Arthur Yost, Sebastien Paul Maurice Mirolo
  • Patent number: 11188330
    Abstract: An apparatus comprises processing circuitry, a number of vector register and a number of scalar registers. An instruction decoder is provided which supports decoding of a vector multiply-add instruction specifying at least one vector register and at least one scalar register. In response to the vector multiply-add instruction, the decoder controls the processing circuitry to perform a vector multiply-add instruction in which each lane of processing generates a respective result data element corresponding to a sum of difference of a product value and an addend value, with the product value comprising the product of a respective data element of a first vector value and a multiplier value. In each lane of processing at least one of the multiplier value and the addend value is specified as a portion of a scalar value stored in a scalar register.
    Type: Grant
    Filed: August 14, 2017
    Date of Patent: November 30, 2021
    Assignee: ARM LIMITED
    Inventors: Thomas Christopher Grocutt, Fran├žois Christopher Jacques Botman
  • Patent number: 10922267
    Abstract: A computer processor is disclosed. The computer processor may comprise a vector unit comprising a vector register file comprising at least one register to hold a varying number of elements. The computer processor may further comprise processing logic configured to operate on the varying number of elements in the vector register file using one or more graphics processing instructions. The computer processor may be implemented as a monolithic integrated circuit.
    Type: Grant
    Filed: May 21, 2015
    Date of Patent: February 16, 2021
    Assignee: Optimum Semiconductor Technologies Inc.
    Inventors: Mayan Moudgill, Gary J. Nacer, C. John Glossner, Arthur Joseph Hoane, Vitaly Kalashnikov, Sitij Agrawal
  • Patent number: 10871549
    Abstract: An apparatus is disclosed for proximity detection using adaptive mutual coupling cancellation. In an example aspect, the apparatus includes at least two antennas, a wireless transceiver connected to the at least two antennas, and a mutual coupling cancellation module. The at least two antennas include a first antenna and a second antenna, which are mutually coupled electromagnetically. The second antenna includes two feed ports. The wireless transceiver is configured to transmit a radar transmit signal via the first antenna and receive two versions of a radar receive signal respectively via the two feed ports of the second antenna. The wireless transceiver is also configured to adjust a transmission parameter based on a decoupled signal. The transmission parameter varies based on a range to the object. The mutual coupling cancellation module is configured to generate the decoupled signal based on the two versions of the radar receive signal.
    Type: Grant
    Filed: May 18, 2018
    Date of Patent: December 22, 2020
    Assignee: QUALCOMM Incorporated
    Inventors: Roberto Rimini, Anant Gupta
  • Patent number: 10846259
    Abstract: A computer processor is disclosed. The computer processor comprise a vector unit comprising a vector register file comprising at least one vector register to hold a varying number of elements. The computer processor further comprises out-of-order issue logic that holds a pool of vector instructions, selects a vector instruction from the pool, and sends the vector instruction for execution. The vector instruction operates on the varying number of elements of the at least one vector register.
    Type: Grant
    Filed: May 21, 2015
    Date of Patent: November 24, 2020
    Assignee: Optimum Semiconductor Technologies Inc.
    Inventors: Mayan Moudgill, Gary J. Nacer, C. John Glossner, Arthur Joseph Hoane, Murugappan Senthilvelan, Pablo Balzola
  • Patent number: 10776124
    Abstract: Processing circuitry supports a first type of vector arithmetic instruction specifying at least a first input vector. When at least one exceptional condition is detected for an arithmetic operation performed for a first active data element of the first input vector in a predetermined sequence, the processing circuitry performs at least one response action. When the at least one exceptional condition is detected for a given active data element other than the first active data element in the predetermined sequence, the processing circuitry suppresses the at least one response action and stores elements identifying information identifying which data element is the given active data element which triggered the exceptional condition. This can be useful for reducing the amount of hardware resource for tracking the occurrence of the exceptional conditions and/or supporting speculative execution of vector instructions.
    Type: Grant
    Filed: September 14, 2016
    Date of Patent: September 15, 2020
    Assignee: ARM Limited
    Inventors: Giacomo Gabrielli, Nigel John Stephens
  • Patent number: 10713042
    Abstract: An arithmetic processing device includes, a memory that stores a first data and a second data, a plurality of arithmetic circuits, a first memory arranged for each of the arithmetic circuits and that stores a first predetermined row having the predetermined number of the first data stored in the memory, a second memory arranged for each of the arithmetic circuits and that stores a second predetermined row having a predetermined number of the second data stored in the memory, and a plurality of multiply-add arithmetic circuits arranged for each of the arithmetic circuits, a number of the multiply-add arithmetic circuits corresponding to the predetermined number, each of the multiply-add arithmetic circuits that obtains a third data by executing the operation using the first data and the second data based on a result of performing a row operation which is an operation of one row of the first data.
    Type: Grant
    Filed: June 22, 2018
    Date of Patent: July 14, 2020
    Assignee: FUJITSU LIMITED
    Inventor: Masahiro Kuramoto
  • Patent number: 10642622
    Abstract: Each of product-sum arithmetic units 501 to 503 acquires, from a register file 410, different pieces of first element data included in a first predetermined row of first data that forms a matrix; acquires, from a register file 420, same pieces of second element data included in a second predetermined row of second data that forms a matrix; performs a row portion operation that is an operation performed on the first data by an amount corresponding to a single row by performing a process of performing an operation using the acquired first element data and the second element data; and performs an operation by using the first data and the second data based on the result of the row portion operation.
    Type: Grant
    Filed: October 13, 2017
    Date of Patent: May 5, 2020
    Assignee: FUJITSU LIMITED
    Inventor: Masahiro Kuramoto
  • Patent number: 10599745
    Abstract: Aspects for vector operations in neural network are described herein. The aspects may include a vector caching unit configured to store a first vector and a second vector, wherein the first vector includes one or more first elements and the second vector includes one or more second elements. The aspects may further include one or more adders and a combiner. The one or more adders may be configured to respectively add each of the first elements to a corresponding one of the second elements to generate one or more addition results. The combiner may be configured to combine a combiner configured to combine the one or more addition results into an output vector.
    Type: Grant
    Filed: October 26, 2018
    Date of Patent: March 24, 2020
    Assignee: CAMBRICON TECHNOLOGIES CORPORATION LIMITED
    Inventors: Jinhua Tao, Tian Zhi, Shaoli Liu, Tianshi Chen, Yunji Chen
  • Patent number: 10592582
    Abstract: Aspects for vector operations in neural network are described herein. The aspects may include a vector caching unit configured to store a first vector and a second vector, wherein the first vector includes one or more first elements and the second vector includes one or more second elements. The aspects may further include one or more adders and a combiner. The one or more adders may be configured to respectively add each of the first elements to a corresponding one of the second elements to generate one or more addition results. The combiner may be configured to combine a combiner configured to combine the one or more addition results into an output vector.
    Type: Grant
    Filed: October 26, 2018
    Date of Patent: March 17, 2020
    Assignee: CAMBRICON TECHNOLOGIES CORPORATION LIMITED
    Inventors: Jinhua Tao, Tian Zhi, Shaoli Liu, Tianshi Chen, Yunji Chen
  • Patent number: 10585973
    Abstract: Aspects for vector operations in neural network are described herein. The aspects may include a vector caching unit configured to store a first vector and a second vector, wherein the first vector includes one or more first elements and the second vector includes one or more second elements. The aspects may further include one or more adders and a combiner. The one or more adders may be configured to respectively add each of the first elements to a corresponding one of the second elements to generate one or more addition results. The combiner may be configured to combine a combiner configured to combine the one or more addition results into an output vector.
    Type: Grant
    Filed: October 26, 2018
    Date of Patent: March 10, 2020
    Assignee: CAMBRICON TECHNOLOGIES CORPORATION LIMITED
    Inventors: Jinhua Tao, Tian Zhi, Shaoli Liu, Tianshi Chen, Yunji Chen
  • Patent number: 10496404
    Abstract: The present disclosure provides a data read-write scheduler and a reservation station for vector operations. The data read-write scheduler suspends the instruction execution by providing a read instruction cache module and a write instruction cache module and detecting conflict instructions based on the two modules. After the time is satisfied, instructions are re-executed, thereby solving the read-after-write conflict and the write-after-read conflict between instructions and guaranteeing that correct data are provided to a vector operations component. Therefore, the subject disclosure has more values for promotion and application.
    Type: Grant
    Filed: November 7, 2018
    Date of Patent: December 3, 2019
    Assignee: CAMBRICON TECHNOLOGIES CORPORATION LIMITED
    Inventors: Dong Han, Shaoli Liu, Yunji Chen, Tianshi Chen
  • Patent number: 10409596
    Abstract: Disclosed is an apparatus comprising: a plurality of memory banks; and a controller for generating a plurality of lookup tables storing data, needed for vector arithmetic operations, copied from data stored in the plurality of memory banks, and generating vector data by reading the data in the generated lookup tables.
    Type: Grant
    Filed: November 17, 2015
    Date of Patent: September 10, 2019
    Assignee: SAMSUNG ELECTRONICS CO., LTD.
    Inventors: Jung-uk Cho, Suk-jin Kim, Dong-kwan Suh
  • Patent number: 10361717
    Abstract: A first error-detecting code (EDC) is computed based on a first segment of a block of information that is to be encoded, and a second EDC is computed based on at least a second segment of the block of information. The first EDC is masked with a first masking segment and the second EDC with a second masking segment to generate a first masked EDC and a second masked EDC. The first masking segment and the second masking segment are associated with a target receiver of the block of information. A codeword is generated based on a code and an input vector that includes the first segment, the first masked EDC, the second segment, and the second masked EDC. This type of coding could be useful to support early termination of blind detection at a decoder, for example.
    Type: Grant
    Filed: June 1, 2017
    Date of Patent: July 23, 2019
    Assignee: HUAWEI TECHNOLOGIES CO., LTD.
    Inventors: Yiqun Ge, Ran Zhang, Nan Cheng, Wuxian Shi
  • Patent number: 10318293
    Abstract: A predication method for vector processors that minimizes the use of embedded predicate fields in most instructions by using separate condition code extensions. Dedicated predicate registers provide fine grain predication of vector instructions where each bit of a predicate register controls 8 bit of the vector data.
    Type: Grant
    Filed: July 9, 2014
    Date of Patent: June 11, 2019
    Assignee: TEXAS INSTRUMENTS INCORPORATED
    Inventors: Timothy Anderson, Duc Quang Bui, Joseph Zbiciak
  • Patent number: 10223002
    Abstract: A compare and swap transaction can be issued by a master device to request a processing unit to select whether to write a swap data value to a storage location corresponding to a target address in dependence on whether a compare data value matches a target data value read from the storage location. The compare and swap data values are transported within a data field of the compare and swap transaction. The compare data value is packed into a first region of the data field in dependence of an offset portion of the target address and having a position within the data field corresponding to the position of the target data value within the storage location. This reduces latency and circuitry required at the processing unit for handling the compare and swap transaction.
    Type: Grant
    Filed: February 8, 2017
    Date of Patent: March 5, 2019
    Assignee: ARM Limited
    Inventors: Phanindra Kumar Mannava, Bruce James Mathewson, Klas Magnus Bruce, Geoffray Matthieu Lacourba
  • Patent number: 10157061
    Abstract: According to one embodiment, an occurrence of an instruction is fetched. The instruction's format specifies its only source operand from a single vector write mask register, and specifies as its destination a single general purpose register. In addition, the instruction's format includes a first field whose contents selects the single vector write mask register, and includes a second field whose contents selects the single general purpose register. The source operand is a write mask including a plurality of one bit vector write mask elements that correspond to different multi-bit data element positions within architectural vector registers. The method also includes, responsive to executing the single occurrence of the single instruction, storing data in the single general purpose register such that its contents represent either a first or second scalar constant based on whether the plurality of one bit vector write mask elements in the source operand are all zero.
    Type: Grant
    Filed: December 22, 2011
    Date of Patent: December 18, 2018
    Assignee: Intel Corporation
    Inventors: Jesus Corbal, Matthew J. Craighead, Bret L. Toll, Andrew T. Forsyth
  • Patent number: 9990966
    Abstract: Examples of the present disclosure provide apparatuses and methods for simulating access lines in a memory. An example method can include receiving a first bit-vector and a second bit-vector in a format associated with storing the first bit-vector in memory cells coupled to a first access line and a first number of sense lines and storing the second bit-vector in memory cells coupled to a second access line and the first number of sense lines. The method can include storing the first bit-vector in a number of memory cells coupled to the first access line and a second number of sense lines and storing the second bit-vector in a number of memory cells coupled to the first access line and a third number of sense lines, wherein a quantity of the first number of sense lines is less than a quantity of the second and third number of sense lines.
    Type: Grant
    Filed: July 10, 2017
    Date of Patent: June 5, 2018
    Assignee: Micron Technology, Inc.
    Inventor: Jeremiah J. Willcock
  • Patent number: 9842046
    Abstract: A method of an aspect includes receiving an instruction indicating a first source packed memory indices, a second source packed data operation mask, and a destination storage location. Memory indices of the packed memory indices are compared with one another. One or more sets of duplicate memory indices are identified. Data corresponding to each set of duplicate memory indices is loaded only once. The loaded data corresponding to each set of duplicate memory indices is replicated for each of the duplicate memory indices in the set. A packed data result in the destination storage location in response to the instruction. The packed data result includes data elements from memory locations that are indicated by corresponding memory indices of the packed memory indices when not blocked by corresponding elements of the packed data operation mask.
    Type: Grant
    Filed: September 28, 2012
    Date of Patent: December 12, 2017
    Assignee: Intel Corporation
    Inventors: Andrew T. Forsyth, Dennis R. Bradford, Jonathan C. Hall
  • Patent number: 9830151
    Abstract: An apparatus and method for performing vector index loads and stores. For example, one embodiment of a processor comprises: a vector index register to store a plurality of index values; a mask register to store a plurality of mask bits; a vector register to store a plurality of vector data elements loaded from memory; and vector index load logic to identify an index stored in the vector index register to be used for a load operation using an immediate value and to responsively combine the index with a base memory address to determine a memory address for the load operation, the vector index load logic to load vector data elements from the memory address to the vector register in accordance with the plurality of mask bits.
    Type: Grant
    Filed: December 23, 2014
    Date of Patent: November 28, 2017
    Assignee: INTEL CORPORATION
    Inventors: Ashish Jha, Robert Valentine, Elmoustapha Ould-Ahmed-Vall
  • Patent number: 9740493
    Abstract: Loop vectorization methods and apparatus are disclosed. An example method includes generating a first control mask for a set of iterations of a loop by evaluating a condition of the loop, wherein generating the first control mask includes setting a bit of the control mask to a first value when the condition indicates that an operation of the loop is to be executed, and setting the bit of the first control mask to a second value when the condition indicates that the operation of the loop is to be bypassed. The example method also includes compressing indexes corresponding to the first set of iterations of the loop according to the first control mask.
    Type: Grant
    Filed: September 28, 2012
    Date of Patent: August 22, 2017
    Assignee: Intel Corporation
    Inventors: Christopher J. Hughes, Mikhail Plotnikov, Andrey Naraikin
  • Patent number: 9552208
    Abstract: A system, method, and computer program product are provided for remapping registers based on a change in execution mode. A sequence of instructions is received for execution by a processor and a change in an execution mode from a first execution mode to a second execution mode within the sequence of instructions is identified, where a first register mapping is associated with the first execution mode and a second register mapping is associated with the second execution mode. Data stored in a set of registers within a processor is reorganized based on the first register mapping and the second register mapping in response to the change in the execution mode.
    Type: Grant
    Filed: December 20, 2013
    Date of Patent: January 24, 2017
    Assignee: NVIDIA Corporation
    Inventors: Ben Hertzberg, Guillermo Juan Rozas, Alexander Christian Klaiber, Nickolas Andrew Fortino
  • Patent number: 9513917
    Abstract: A vector friendly instruction format and execution thereof. According to one embodiment of the invention, a processor is configured to execute an instruction set. The instruction set includes a vector friendly instruction format. The vector friendly instruction format has a plurality of fields including a base operation field, a modifier field, an augmentation operation field, and a data element width field, wherein the first instruction format supports different versions of base operations and different augmentation operations through placement of different values in the base operation field, the modifier field, the alpha field, the beta field, and the data element width field, and wherein only one of the different values may be placed in each of the base operation field, the modifier field, the alpha field, the beta field, and the data element width field on each occurrence of an instruction in the first instruction format in instruction streams.
    Type: Grant
    Filed: January 31, 2014
    Date of Patent: December 6, 2016
    Assignee: Intel Corporation
    Inventors: Robert C. Valentine, Jesus Corbal San Adrian, Roger Espasa Sans, Robert D. Cavin, Bret L. Toll, Santiago Galan Duran, Jeffrey G. Wiedemeier, Sridhar Samudrala, Milind Baburao Girkar, Edward Thomas Grochowski, Jonathan Cannon Hall, Dennis R. Bradford, Elmoustapha Ould-Ahmed-Vall, James C. Abel, Mark Charney, Seth Abraham, Suleyman Sair, Andrew Thomas Forsyth, Lisa Wu, Charles Yount
  • Patent number: 9330057
    Abstract: A reconfigurable processor includes a plurality of mini-cores and an external network to which the mini-cores are connected. Each of the mini-cores includes a first function unit including a first group of operation elements, a second function unit including a second group of operation elements that is different from the first group of operation elements, and an internal network to which the first function unit and the second function unit are connected.
    Type: Grant
    Filed: December 11, 2012
    Date of Patent: May 3, 2016
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Dong-Kwan Suh, Suk-Jin Kim, Hyeong-Seok Yu, Ki-Seok Kwon, Jae-Un Park
  • Patent number: 9124935
    Abstract: A television signals receiver for receives and stores television signals encoded at a variable data rate. Time information is generated based on the time of receipt of the signals that defines the duration of the television signals when output in decompressed form at a substantially constant data rate. The received signals are then written to a file on a hard disk (13) in received order together with the time information. The time information of signals stored in the file is monitored and old signals are deleted from the file such that the file stores signals corresponding to a predetermined period of time.
    Type: Grant
    Filed: November 13, 2002
    Date of Patent: September 1, 2015
    Assignee: BRITISH SKY BROADCASTING LTD.
    Inventors: Xavier Willame, Nigel Bodkin, Nicholas James, Ellen Fiona Collins, Benjamin Johnathan Freeman, Brian Francis Sullivan
  • Patent number: 9081561
    Abstract: The present invention relates to a method for improving execution performance of multiply-add instructions during compiling, comprising the following steps of: compiling a source code by a compiler to acquire internal representation; optimizing; generating a machine code on the basis of a target processor, and allocating a physical register to a pseudo-register in the machine code; and improving results of register allocation to multiply-accumulate instructions. The method for improving execution performance of multiply-add instructions during compiling provided by the present invention has the following advantages: the compiler is allowed to realize procedure optimization by acquiring the optimal MAC (multiply-accumulate) instruction use gain.
    Type: Grant
    Filed: April 21, 2014
    Date of Patent: July 14, 2015
    Assignee: SHENZHEN ZHONGWEIDIAN TECHNOLOGY LIMITED
    Inventor: Fred Chow
  • Patent number: 9081564
    Abstract: A data processing apparatus having processing circuitry, a scalar register bank and a vector register bank, including decoding circuitry arranged to decode a sequence of instructions to generate control signals for the processing circuitry. The decoding circuitry is responsive to a decode modifier instruction within the sequence of instructions to alter decoding of a subsequent scalar instruction in the sequence by mapping at least one scalar operand specified by the subsequent scalar instruction to at least one vector operand in the vector register bank, and, in dependence on the scalar operation specified by the subsequent scalar instruction, determining a vector operation to be performed on at least a subset of the operand elements within the at least the one vector operand. Such an approach enables a wide variety of vector operations to be specified without the need to individually define separate vector instructions for those vector operations.
    Type: Grant
    Filed: April 4, 2012
    Date of Patent: July 14, 2015
    Assignee: ARM Limited
    Inventor: Alastair David Reid
  • Patent number: 9064005
    Abstract: A computer-implemented system and method is disclosed for retrieving documents using context-dependant probabilistic modeling of words and documents. The present invention uses multiple overlapping vectors to represent each document. Each vector is centered on each of the words in the document and includes the local environment. The vectors are used to build probability models that are used for predictions of related documents and related keywords. The results of the statistical analysis are used for retrieving an indexed document, for extracting features from a document, or for finding a word within a document. The statistical evaluation is also used to evaluate the probability of relation between the key words appearing in the document and building a vocabulary of key words that are generally found together. The results of the analysis are stored in a repository. Searches of the data repository produce a list of related documents and a list of related terms.
    Type: Grant
    Filed: July 12, 2007
    Date of Patent: June 23, 2015
    Assignee: Nuance Communications, Inc.
    Inventor: Jan Magnus Stensmo
  • Publication number: 20150143073
    Abstract: A data processing system is described in which a plurality of data processing units 521 . . . 52N cooperate with one another in order to process incoming data packets or an incoming data stream. Tasks are managed using a task list which is accessible and updateable by each data processing unit.
    Type: Application
    Filed: January 20, 2015
    Publication date: May 21, 2015
    Applicant: BLUWIRELESS TECHNOLOGY LIMITED
    Inventors: Paul Winser, RAY MCCONNELL
  • Patent number: 9038073
    Abstract: Efficient data processing apparatus and methods include hardware components which are pre-programmed by software. Each hardware component triggers the other to complete its tasks. After the final pre-programmed hardware task is complete, the hardware component issues a software interrupt.
    Type: Grant
    Filed: August 13, 2009
    Date of Patent: May 19, 2015
    Assignee: QUALCOMM Incorporated
    Inventors: Mathias Kohlenz, Irfan Anwar Khan, Sathyanarayan Madhusudan, Shailesh Maheshwari, Srividhya Krishnamoorthy, Sandeep Urgaonkar, Thomas Klingenbrunn, Tim Tynghuei Liou, Idreas Mir
  • Publication number: 20150089187
    Abstract: A hazard check instruction has operands that specify addresses of vector elements to be read by first and second vector memory operations. The hazard check instruction outputs a dependency vector identifying, for each element position of the first vector corresponding to the first vector memory operation, which element position of the second vector that the element of the first vector depends on (if any). In an embodiment, at least one of the vector memory operations has addresses specified using a scalar address in the operands (and a vector attribute associated with the vector). In an embodiment, the operands may include predicates for one or both of the vector memory operations, indicating which vector elements are active. The dependency vector may be qualified by the predicates, indicating dependencies only for active elements.
    Type: Application
    Filed: September 24, 2013
    Publication date: March 26, 2015
    Applicant: APPLE INC.
    Inventor: Jeffry E. Gonion
  • Publication number: 20150089188
    Abstract: In an embodiment, a processor may implement a vector hazard check instruction to detect dependencies between vector memory operations based on the addresses of the vectors accessed by the vector memory operations. The addresses may be specified via a base address and a vector of indexes for each vector. In an embodiment, one of the base addresses may be an implied (or assumed) zero address, reducing the number of operands of the hazard check instruction.
    Type: Application
    Filed: September 24, 2013
    Publication date: March 26, 2015
    Applicant: Apple Inc.
    Inventors: Jeffry E. Gonion, Alexander C. Klaiber
  • Publication number: 20150067299
    Abstract: A hardware circuit component configured to support vector operations in a scalar data path. The hardware circuit component configured to operate in a vector mode configuration and in a scalar mode configuration. The hardware circuit component configured to split the scalar mode configuration into a left half and a right half of the vector mode configuration. The hardware circuit component configured to perform one or more bit shifts over one or more stages of interconnected multiplexers in the vector mode configuration. The hardware circuit component configured to include duplicated coarse shift multiplexers at bit positions that receive data from both the left half and the right half of the vector mode configuration, resulting in one or more coarse shift multiplexers sharing the bit position.
    Type: Application
    Filed: January 9, 2014
    Publication date: March 5, 2015
    Inventors: Maarten J. Boersma, Markus Kaltenbach, Christophe J. Layer, Silvia M. Mueller
  • Publication number: 20150067298
    Abstract: A hardware circuit component configured to support vector operations in a scalar data path. The hardware circuit component configured to operate in a vector mode configuration and in a scalar mode configuration. The hardware circuit component configured to split the scalar mode configuration into a left half and a right half of the vector mode configuration. The hardware circuit component configured to perform one or more bit shifts over one or more stages of interconnected multiplexers in the vector mode configuration. The hardware circuit component configured to include duplicated coarse shift multiplexers at bit positions that receive data from both the left half and the right half of the vector mode configuration, resulting in one or more coarse shift multiplexers sharing the bit position.
    Type: Application
    Filed: September 3, 2013
    Publication date: March 5, 2015
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Maarten J. Boersma, Markus Kaltenbach, Christophe J. Layer, Silvia M. Mueller
  • Publication number: 20150032990
    Abstract: Systems and methods for implementing a scalable very-large-scale integration (VLSI) architecture to perform compressive sensing (CS) hardware reconstruction for data signals in accordance with embodiments of the invention are disclosed. The VLSI architecture is optimized for CS signal reconstruction by implementing a reformulation of the orthogonal matching pursuit (OMP) process and utilizing architecture resource sharing techniques. Typically, the VLSI architecture is a CS reconstruction engine that includes a vector and scalar computation cores where the cores can be time-multiplexed (via dynamic configuration) to perform each task associated with OMP. The vector core includes configurable processing elements (PEs) connected in parallel. Further, the cores can be linked by data-path memories, where complex data flow of OMP can be customized utilizing local memory controllers synchronized by a top-level finite-state machine.
    Type: Application
    Filed: July 29, 2014
    Publication date: January 29, 2015
    Inventors: Dejan Markovic, Fengbo Ren
  • Publication number: 20150019835
    Abstract: A predication method for vector processors that minimizes the use of embedded predicate fields in most instructions by using separate condition code extensions. Dedicated predicate registers provide fine grain predication of vector instructions where each bit of a predicate register controls 8 bit of the vector data.
    Type: Application
    Filed: July 9, 2014
    Publication date: January 15, 2015
    Inventors: Timothy Anderson, Duc Quang Bui, Joseph Zbiciak
  • Publication number: 20150019836
    Abstract: The number of registers required is reduced by overlapping scalar and vector registers. This also allows increased compiler flexibility when mixing scalar and vector instructions. Local register read ports are minimized by restricting read access. Dedicated predicate registers reduces requirements for general registers, and allows reduction of critical timing paths by allowing the predicate registers to be placed next to the predicate unit.
    Type: Application
    Filed: July 9, 2014
    Publication date: January 15, 2015
    Inventors: Timothy David Anderson, Duc Quang Bui, Mel Alan Phipps, Todd T. Hahn, Joseph Zbiciak
  • Publication number: 20150012723
    Abstract: A mini-core and a processor using such a mini-core are provided in which functional units of the mini-core are divided into a scalar domain processor and a vector domain processor. The processor includes at least one such mini-core, and all or a portion of functional units from among the functional units of the mini-core operate based on an operation mode.
    Type: Application
    Filed: July 7, 2014
    Publication date: January 8, 2015
    Applicant: SAMSUNG ELECTRONICS CO., LTD.
    Inventors: Young Hwan PARK, Keshava PRASAD, Ho YANG, Yeon Bok LEE
  • Publication number: 20140359251
    Abstract: A signal processing device including: one or more vector processors configured to perform vector processing to a signal using a parameter, one or more scalar processors configured to perform scalar processing for generating the parameter, a first circuit coupled to the one or more vector processors and the one or more scalar processors and configured to transfer the parameter from the one or more scalar processors to the one or more vector processors, and a second circuit coupled to the one or more vector processors and another circuit that inputs the signal to the second circuit, and configured to transfer the signal among the one or more vector processors and the other circuit.
    Type: Application
    Filed: May 30, 2014
    Publication date: December 4, 2014
    Applicant: FUJITSU LIMITED
    Inventor: Noboru KOBAYASHI
  • Publication number: 20140359250
    Abstract: Methods and systems are provided for inferring types in a computer program. In one example, a method comprises: identifying a type of at least one expression of the computer program; and annotating the at least one expression in the computer program when the type of the at least one expression is at least one of a varying type and a uniform type.
    Type: Application
    Filed: May 28, 2013
    Publication date: December 4, 2014
    Applicant: ADVANCED MICRO DEVICES, INC.
    Inventor: Benedict R. Gaster
  • Publication number: 20140317376
    Abstract: A digital processor, such as a vector processor or a scalar processor, is provided having an instruction set with a complex angle function. A complex angle is evaluated for an input value, x, by obtaining one or more complex angle software instructions having the input value, x, as an input; in response to at least one of the complex angle software instructions, performing the following steps: invoking at least one complex angle functional unit that implements the one or more complex angle software instructions to apply the complex angle function to the input value, x; and generating an output corresponding to the complex angle of the input value, x, using one or more multipliers of a Multiply Accumulate (MAC) unit of the digital processor, wherein the complex angle software instruction is part of an instruction set of the digital signal processor. Multiplication operations optionally employ one or more multipliers of the MAC unit of the digital processor.
    Type: Application
    Filed: April 17, 2014
    Publication date: October 23, 2014
    Applicant: LSI Corporation
    Inventor: Kameran Azadet
  • Publication number: 20140297992
    Abstract: An apparatus and method for generating vector code are provided. The apparatus and method generate vector code using scalar-type kernel code, without user's changing a code type or modifying data layout, thereby enhancing user's convenience of use and retaining the portability of OpenCL.
    Type: Application
    Filed: March 28, 2014
    Publication date: October 2, 2014
    Applicants: Seoul National University R&DB Foundation, Samsung Electronics Co., Ltd.
    Inventors: Jin-Seok LEE, Seong-Gun KIM, Dong-Hoon YOO, Seok-Joong HWANG, Jeongho NAH, Jaejin LEE, Jun LEE
  • Publication number: 20140297991
    Abstract: According to one embodiment, an occurrence of an instruction is fetched. The instruction's format specifies its only source operand from a single vector write mask register, and specifies as its destination a single general purpose register. In addition, the instruction's format includes a first field whose contents selects the single vector write mask register, and includes a second field whose contents selects the single general purpose register. The source operand is a write mask including a plurality of one bit vector write mask elements that correspond to different multi-bit data element positions within architectural vector registers. The method also includes, responsive to executing the single occurrence of the single instruction, storing data in the single general purpose register such that its contents represent either a first or second scalar constant based on whether the plurality of one bit vector write mask elements in the source operand are all zero.
    Type: Application
    Filed: December 22, 2011
    Publication date: October 2, 2014
    Inventors: Jesus Corbal, Matthew J. Craighead, Bret L. Toll, Andrew T. Forsyth
  • Patent number: 8832412
    Abstract: Various methods and systems are provided for processing units that may be scaled. In one embodiment, a processing unit includes a plurality of scalar processing units and a vector processing unit in communication with each of the plurality of scalar processing units. The vector processing unit is configured to coordinate execution of instructions received from the plurality of scalar processing units. In another embodiment, a scalar instruction packet including a pre-fix instruction and a vector instruction packet including a vector instruction is obtained. Execution of the vector instruction may be modified by the pre-fix instruction in a processing unit including a vector processing unit. In another embodiment, a scalar instruction packet including a plurality of partitions is obtained. The location of the partitions is determined based upon a partition indicator included in the scalar instruction packet and a scalar instruction included in a partition is executed by a processing unit.
    Type: Grant
    Filed: September 20, 2011
    Date of Patent: September 9, 2014
    Assignee: Broadcom Corporation
    Inventors: Neil Bailey, Eben Upton