Matrix Array Patents (Class 708/514)
  • Patent number: 11741345
    Abstract: Provided are systems, methods, and integrated circuits for a neural network processing system. In various implementations, the system can include a first array of processing engines coupled to a first set of memory banks and a second array of processing engines coupled to a second set of memory banks. The first and second sets of memory banks store all the weight values for a neural network, where the weight values are stored before any input data is received. Upon receiving input data, the system performs a task defined for the neural network. Performing the task can include computing an intermediate result using the first array of processing engines, copying the intermediate result to the second set of memory banks, and computing a final result using the second array of processing engines, where the final result corresponds to an outcome of performing the task.
    Type: Grant
    Filed: September 25, 2020
    Date of Patent: August 29, 2023
    Assignee: Amazon Technologies, Inc.
    Inventors: Randy Huang, Ron Diamant
  • Patent number: 11704124
    Abstract: Disclosed embodiments relate to executing a vector multiplication instruction. In one example, a processor includes fetch circuitry to fetch the vector multiplication instruction having fields for an opcode, first and second source identifiers, and a destination identifier, decode circuitry to decode the fetched instruction, execution circuitry to, on each of a plurality of corresponding pairs of fixed-sized elements of the identified first and second sources, execute the decoded instruction to generate a double-sized product of each pair of fixed-sized elements, the double-sized product being represented by at least twice a number of bits of the fixed size, and generate an unsigned fixed-sized result by rounding the most significant fixed-sized portion of the double-sized product to fit into the identified destination.
    Type: Grant
    Filed: January 11, 2022
    Date of Patent: July 18, 2023
    Assignee: Intel Corporation
    Inventors: Venkateswara R. Madduri, Carl Murray, Elmoustapha Ould-Ahmed-Vall, Mark J. Charney, Robert Valentine, Jesus Corbal
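    Example (illustrative, not from the patent): the abstract above describes multiplying pairs of fixed-size elements into a double-width product and keeping a rounded copy of the most significant half. A minimal sketch follows, assuming unsigned 16-bit elements and round-half-up; the function name `mul_high_rounded`, the element width, and the rounding mode are assumptions, since the abstract does not specify them.

```python
def mul_high_rounded(a, b, bits=16):
    """Element-wise multiply two vectors of unsigned `bits`-wide integers and
    return the rounded upper `bits` of each double-width product.

    Assumptions (not from the abstract): unsigned inputs, round-half-up.
    """
    mask = (1 << bits) - 1
    half = 1 << (bits - 1)  # rounding constant: half of the discarded low part
    result = []
    for x, y in zip(a, b):
        product = (x & mask) * (y & mask)  # needs up to 2 * bits bits
        # Adding `half` before the shift rounds on the discarded low half.
        # For unsigned inputs the rounded value always fits in `bits` bits.
        result.append((product + half) >> bits)
    return result


print(mul_high_rounded([0x8000, 0xFFFF, 1], [0x8000, 0xFFFF, 0x8000]))
```

    With 16-bit elements, 0x8000 times 0x8000 gives a 32-bit product whose upper half is 0x4000, which is the value stored for that lane.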
  • Patent number: 11120101
    Abstract: The present disclosure advantageously provides a system and method for efficiently multiplying matrices with elements that have a value of 0. A bitmap is generated for each matrix. Each bitmap includes a bit position for each matrix element. The value of each bit is set to 0 when the value of the corresponding matrix element is 0, and to 1 when the value of the corresponding matrix element is not 0. Each matrix is compressed into a compressed matrix, which will have fewer elements with a value of 0 than the original matrix. Each bitmap is then adjusted based on the corresponding compressed matrix. The compressed matrices are then multiplied to generate an output matrix. For each element i,j in the output matrix, a dot product of the ith row of the first compressed matrix and the jth column of the second compressed matrix is calculated based on the bitmaps.
    Type: Grant
    Filed: September 27, 2019
    Date of Patent: September 14, 2021
    Assignee: Arm Limited
    Inventors: Zhi-Gang Liu, Matthew Mattina, Paul Nicholas Whatmough
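    Example (illustrative, not from the patent): the bitmap idea in the abstract above can be sketched in a simplified form. Here each row of the first matrix and each column of the second is compressed independently by dropping every zero, and a per-vector bitmap records where the non-zero values were. This is a simplification chosen for clarity, not the patent's compression scheme; the names `compress`, `sparse_dot`, and `bitmap_matmul` are hypothetical.

```python
def compress(vec):
    """Return (bitmap, values): bit k of bitmap is 1 iff vec[k] != 0,
    and values holds the non-zero entries in order."""
    bitmap = 0
    values = []
    for k, v in enumerate(vec):
        if v != 0:
            bitmap |= 1 << k
            values.append(v)
    return bitmap, values


def sparse_dot(a, b):
    """Dot product of two compressed vectors, visiting only positions
    where both bitmaps have a 1."""
    bitmap_a, values_a = a
    bitmap_b, values_b = b
    common = bitmap_a & bitmap_b
    total = 0
    while common:
        low = common & -common  # isolate the lowest set bit
        # Index into the compressed values = count of set bits below `low`.
        ia = bin(bitmap_a & (low - 1)).count("1")
        ib = bin(bitmap_b & (low - 1)).count("1")
        total += values_a[ia] * values_b[ib]
        common ^= low  # clear that bit and continue
    return total


def bitmap_matmul(A, B):
    """Multiply dense list-of-lists matrices via compressed rows and columns."""
    rows = [compress(r) for r in A]
    cols = [compress(c) for c in zip(*B)]
    return [[sparse_dot(r, c) for c in cols] for r in rows]


A = [[1, 0, 2], [0, 0, 3]]
B = [[0, 4], [5, 0], [6, 0]]
print(bitmap_matmul(A, B))
```

    The multiply-accumulate work is proportional to the number of positions where both operands are non-zero, which is the saving the abstract targets.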
  • Patent number: 11010130
    Abstract: The present invention discloses a floating point processor prototype of multi-channel data. An architecture comprises the following steps: arranging structural data, semi-structured data and unstructured data into a three-way array; decomposing the three-way array into a matrix pattern of a second-order tensor by using higher-order singular value decomposition; and converting the matrix pattern into a sparse domain to conduct block floating point quantization. A floating point processor prototype of multi-channel data is built.
    Type: Grant
    Filed: August 9, 2017
    Date of Patent: May 18, 2021
    Assignee: Shanghai DataCenter Science Co., LTD
    Inventors: Jun Zhang, Ke Xu, Xiaofeng Chen
  • Patent number: 10970201
    Abstract: A system, apparatus and method for utilizing a transpose function to generate a two-dimensional array from three-dimensional input data. The use of the transpose function reduces redundant elements in the resultant two-dimensional array thereby increasing efficiency and decreasing power consumption.
    Type: Grant
    Filed: October 24, 2018
    Date of Patent: April 6, 2021
    Assignee: Arm Limited
    Inventor: Paul Nicholas Whatmough
  • Patent number: 10831862
    Abstract: Methods, systems, and apparatus for performing a matrix multiplication using a hardware circuit are described. An example method begins by obtaining an input activation value and a weight input value in a first floating point format. The input activation value and the weight input value are multiplied to generate a product value in a second floating point format that has higher precision than the first floating point format. A partial sum value is obtained in a third floating point format that has a higher precision than the first floating point format. The partial sum value and the product value are combined to generate an updated partial sum value that has the third floating point format.
    Type: Grant
    Filed: March 20, 2020
    Date of Patent: November 10, 2020
    Assignee: Google LLC
    Inventors: Andrew Everett Phelps, Norman Paul Jouppi
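    Example (illustrative, not from the patent): the multiply-accumulate step in the abstract above uses a low-precision input format, a higher-precision product, and a higher-precision partial sum. The sketch below assumes IEEE binary16 for the inputs, binary64 (a Python float) for the product, and binary32 for the partial sum. These particular formats and the names `to_f16`, `to_f32`, and `mac` are assumptions; the abstract only states the relative precisions.

```python
import struct


def to_f16(x):
    """Round a Python float to the nearest IEEE binary16 value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_f32(x):
    """Round a Python float to the nearest IEEE binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def mac(partial_sum, activation, weight):
    """One multiply-accumulate step with mixed precision.

    Inputs are rounded to binary16 (first format), the product is held as a
    Python float (second, higher-precision format), and the updated partial
    sum is rounded to binary32 (third format).
    """
    a = to_f16(activation)
    w = to_f16(weight)
    product = a * w  # exact: two 11-bit significands fit in binary64
    return to_f32(partial_sum + product)


acc = 0.0
for activation, weight in [(1.5, 2.0), (0.25, 4.0), (0.1, 0.1)]:
    acc = mac(acc, activation, weight)
print(acc)
```

    Keeping the product and the running sum in wider formats limits the rounding error that would otherwise build up over a long dot product.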
  • Patent number: 10803379
    Abstract: Provided are systems, methods, and integrated circuits for a neural network processing system. In various implementations, the system can include a first array of processing engines coupled to a first set of memory banks and a second array of processing engines coupled to a second set of memory banks. The first and second sets of memory banks store all the weight values for a neural network, where the weight values are stored before any input data is received. Upon receiving input data, the system performs a task defined for the neural network. Performing the task can include computing an intermediate result using the first array of processing engines, copying the intermediate result to the second set of memory banks, and computing a final result using the second array of processing engines, where the final result corresponds to an outcome of performing the task.
    Type: Grant
    Filed: December 12, 2017
    Date of Patent: October 13, 2020
    Assignee: Amazon Technologies, Inc.
    Inventors: Randy Huang, Ron Diamant
  • Patent number: 10768899
    Abstract: A configurable circuit configurable according to the data width of elements of a matrix is described that includes a memory array, logic to write a matrix to the memory array having elements with a data width which can be specified using configuration data, logic for a transpose read of the matrix as-written and logic for normal read of the matrix as-written. The memory array includes first and second read ports operable in parallel. Transpose read logic and normal read logic can be coupled to the first and second read ports, respectively, allowing transpose and normal read of a matrix simultaneously.
    Type: Grant
    Filed: January 29, 2019
    Date of Patent: September 8, 2020
    Assignee: SambaNova Systems, Inc.
    Inventors: David Alan Koeplinger, Raghu Prabhakar, Ram Sivaramakrishnan, David Brian Jackson, Mark Luttrell
  • Patent number: 10769238
    Abstract: Techniques facilitating matrix multiplication on a systolic array are provided. A computer-implemented method can comprise populating, by a system operatively coupled to a processor, respective first registers of one or more processing elements of a systolic array structure with respective input data bits of a first data matrix. The one or more processing elements can comprise a first processing element that comprises a first input data bit of the first data matrix and a first activation bit of a second data matrix. The method can also include determining, by the system, at the first processing element, a first partial sum of a third data matrix. Further, the method can include streaming, by the system, the first partial sum of the third data matrix from the first processing element.
    Type: Grant
    Filed: September 19, 2019
    Date of Patent: September 8, 2020
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Chia-Yu Chen, Jungwook Choi, Kailash Gopalakrishnan, Victor Han, Vijayalakshmi Srinivasan, Jintao Zhang
  • Patent number: 10621269
    Abstract: Methods, systems, and apparatus for performing a matrix multiplication using a hardware circuit are described. An example method begins by obtaining an input activation value and a weight input value in a first floating point format. The input activation value and the weight input value are multiplied to generate a product value in a second floating point format that has higher precision than the first floating point format. A partial sum value is obtained in a third floating point format that has a higher precision than the first floating point format. The partial sum value and the product value are combined to generate an updated partial sum value that has the third floating point format.
    Type: Grant
    Filed: May 17, 2018
    Date of Patent: April 14, 2020
    Assignee: Google LLC
    Inventors: Andrew Everett Phelps, Norman Paul Jouppi
  • Patent number: 10452744
    Abstract: Techniques related to memory management for sparse matrix multiplication are disclosed. Computing device(s) may perform a method for multiplying a row of a first sparse matrix with a second sparse matrix to generate a product matrix row. A compressed representation of the second sparse matrix is stored in main memory. The compressed representation comprises a values array that stores non-zero value(s). Tile(s) corresponding to row(s) of second sparse matrix are loaded into scratchpad memory. The tile(s) comprise set(s) of non-zero value(s) of the values array. A particular partition of an uncompressed representation of the product matrix row is generated in the scratchpad memory. The particular partition corresponds to a partition of the second sparse matrix comprising non-zero value(s) included in the tile(s). When a particular tile is determined to comprise non-zero value(s) that are required to generate the particular partition, the particular tile is loaded into the scratchpad memory.
    Type: Grant
    Filed: March 27, 2017
    Date of Patent: October 22, 2019
    Assignee: Oracle International Corporation
    Inventors: Sandeep R. Agrawal, Sam Idicula, Nipun Agarwal
  • Patent number: 10191739
    Abstract: A microprocessor unit (MPU) connected to external sensors is provided with an interface unit that acquires detection information from the external sensors and a digital signal processor (DSP) that estimates the state of a target object on the basis of the detection information acquired by the interface unit and generates state information. The DSP is provided with SIMD-type arithmetic processing circuitry that processes multiple pieces of information with one command and is provided with single-precision floating point computing units. The interface unit outputs the state information generated by the DSP to an externally provided main processor. Therefore, power consumption can be reduced.
    Type: Grant
    Filed: September 22, 2016
    Date of Patent: January 29, 2019
    Assignee: MEGACHIPS CORPORATION
    Inventors: Mahito Matsumoto, Tomoshige Kato, Takehiro Yoshimura, Takio Yamaoka, Yusuke Sasaki, Shingo Hamaguchi
  • Patent number: 9619152
    Abstract: Techniques for transforming character delimited values are presented herein. An input module may be configured to read a set of character delimited values. A generation module may be configured to generate, in real-time, a synchronization block for the set of values that includes a nibble for each value in the set of values. The nibbles may represent either a byte size of the associated value or may be a flag representing a predetermined value. An output module may be configured to sequentially output the synchronization block and the set of values to a binary data output stream for output in a device dependent byte order according to the respective byte sizes of the values in the set of values.
    Type: Grant
    Filed: December 19, 2014
    Date of Patent: April 11, 2017
    Assignee: eBay Inc.
    Inventors: Gang Ye, Thennarasu Ponnusamy, Belinda Liu, Enlin Wang, Mallikarjun Bhaigond, Amit Desai, Xin Zhuang, Preeta Joshi, Hong-Yen Nguyen
  • Patent number: 9329936
    Abstract: A system, processor and method to increase computational reliability by using underutilized portions of a data path with a SuperFMA ALU. The method allows the reuse of underutilized hardware to implement spatial redundancy by using detection during the dispatch stage to determine if the operation may be executed by redundant hardware in the ALU. During execution, if a determination is made that the correct conditions exist as determined by the redundant execution modes, the SuperFMA ALU performs the operation with redundant execution and compares the results for a match in order to generate a computational result. The method to increase computational reliability by using redundant execution is advantageous because the hardware cost of adding support for redundant execution is low and the complexity of implementation of the disclosed method is minimal due to the reuse of existing hardware.
    Type: Grant
    Filed: December 31, 2012
    Date of Patent: May 3, 2016
    Assignee: Intel Corporation
    Inventor: Brian J. Hickman
  • Patent number: 9098460
    Abstract: A matrix calculation system for calculating funny matrix multiplication (FMM) of a matrix A and a matrix B, including: sequentially calculating a permutation of indices {ai} in which values are arranged in a non-decreasing order with respect to each i-th row where i=1 to the number of rows of the matrix A; storing a value, which is greater than expected as a value of a matrix, for C[i, j] with respect to each j-th column where j=1 to the number of columns of the matrix A in the i-th row; sequentially calculating a permutation of indices {bj} in which values are arranged in a non-decreasing order with respect to each j-th column where j=1 to the number of columns of the matrix B; and setting the values of C[i, j], which are i and j components of the matrix C.
    Type: Grant
    Filed: August 22, 2012
    Date of Patent: August 4, 2015
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventor: Hiroki Yanagisawa
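    Example (illustrative, not from the patent): a baseline sketch, assuming "funny matrix multiplication" denotes the min-plus product C[i, j] = min over k of (A[i, k] + B[k, j]); the abstract does not spell out the definition, so that reading is an assumption. The code is the plain cubic-time version and does not implement the sorted-index permutations the abstract describes. It does show the step of initializing each C[i, j] to a value larger than any expected entry. The name `min_plus_product` is hypothetical.

```python
import math


def min_plus_product(A, B):
    """Naive min-plus product of an n x m matrix A and an m x p matrix B."""
    n, m, p = len(A), len(B), len(B[0])
    # Start every entry at a value greater than any achievable sum.
    C = [[math.inf] * p for _ in range(n)]
    for i in range(n):
        for j in range(p):
            for k in range(m):
                candidate = A[i][k] + B[k][j]
                if candidate < C[i][j]:
                    C[i][j] = candidate
    return C


A = [[0, 3], [2, 0]]
B = [[0, 1], [4, 0]]
print(min_plus_product(A, B))
```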
  • Patent number: 8943119
    Abstract: A system and a method are configured to improve the performance of general-purpose processors by implementing a functional unit that computes the product of a matrix operand with a vector operand, producing a vector result. The functional unit fully utilizes the entire resources of a 128 b by 128 b multiplier regardless of the operand size, as the number of elements of the matrix and vector operands increase as operand size is reduced. The unit performs both fixed-point and floating-point multiplications and additions with the highest-possible intermediate accuracy with modest resources.
    Type: Grant
    Filed: May 2, 2012
    Date of Patent: January 27, 2015
    Assignee: Microunity Systems Engineering, Inc.
    Inventors: Craig Hansen, Bruce Bateman, John Moussouris
  • Patent number: 8782115
    Abstract: A matrix decomposition circuit is described. In one implementation, the matrix decomposition circuit includes a processing element to process a plurality of processing cells and a scheduler coupled to the processing element, where the scheduler instructs the processing element to process only required processing cells of the plurality of processing cells. In one specific implementation, the required processing cells are processing cells with non-zero inputs.
    Type: Grant
    Filed: April 18, 2008
    Date of Patent: July 15, 2014
    Assignee: Altera Corporation
    Inventor: Kulwinder Dhanoa
  • Patent number: 8649508
    Abstract: A system and method for implementing the Elliptic Curve scalar multiplication method in cryptography, where the Double Base Number System is expressed in decreasing order of exponents and further on using it to determine Elliptic curve scalar multiplication over a finite elliptic curve.
    Type: Grant
    Filed: September 29, 2008
    Date of Patent: February 11, 2014
    Assignee: Tata Consultancy Services Ltd.
    Inventor: Natarajan Vijayarangan
  • Patent number: 8620976
    Abstract: A machine-implemented method for computerized digital signal processing including obtaining a digital signal from data storage or from conversion of an analog signal, and determining, from the digital signal, one or more measuring matrices. Each measuring matrix has a plurality of cells, and each cell has an amplitude corresponding to the signal energy in a frequency bin for a time slice. Cells in each measuring matrix having maximum amplitudes along a time slice and/or frequency bin are identified as maximum cells. Maxima that coincide in time and frequency are identified, and a correlated maxima matrix, called a “Precision Measuring Matrix,” is constructed showing the coinciding maxima; the adjacent marked maxima are linked into partial chains.
    Type: Grant
    Filed: May 11, 2011
    Date of Patent: December 31, 2013
    Assignee: Paul Reed Smith Guitars Limited Partnership
    Inventors: Paul Reed Smith, Frederick M. Slay, Ernestine M. Smith
  • Patent number: 8612502
    Abstract: Systems and methodologies are described that facilitate equalization of received signals in a wireless communication environment. Multiple transmit and/or receive antennas can utilize MIMO technology to enhance performance. A single tile of transmitted data, including a set of modulation symbols, can be received at multiple receive antennas, resulting in multiple tiles of received modulation symbols. Corresponding modulation symbols from multiple received tiles can be processed as a function of channel and interference estimates to generate a single equalized modulation symbol. Typically, the equalization process is computationally expensive. However, the channels are highly correlated. This correlation is reflected in the channel estimates and can be utilized to reduce complex equalization operations. In particular, a subset of the equalizers can be generated based upon the equalizer function and the remainder can be generated using interpolation. In addition, the equalizer function itself can be simplified.
    Type: Grant
    Filed: March 20, 2008
    Date of Patent: December 17, 2013
    Assignee: QUALCOMM Incorporated
    Inventors: Petru Cristian Budianu, Hermanth Sampath, Alexei Gorokhov, Dhananjay A. Gore
  • Patent number: 8554820
    Abstract: A block matrix multiplication mechanism is provided for reversing the visitation order of blocks at corner turns when performing a block matrix multiplication operation in a data processing system. By reversing the visitation order, the mechanism eliminates a block load at the corner turns. In accordance with the illustrative embodiment, a corner return is referred to as a “bounce” corner turn and results in a serpentine patterned processing order of the matrix blocks. The mechanism allows the data processing system to perform a block matrix multiplication operation with a maximum of three block transfers per time step. Therefore, the mechanism reduces the maximum throughput required and increases performance. In addition, the mechanism also reduces the number of multi-buffered local store buffers.
    Type: Grant
    Filed: April 20, 2012
    Date of Patent: October 8, 2013
    Assignee: International Business Machines Corporation
    Inventors: Daniel A. Brokenshire, John A. Gunnels, Michael D. Kistler
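    Example (illustrative, not from the patent): the serpentine visitation order in the abstract above can be sketched as a generator of block coordinates that reverses direction at the end of each row. The function name `serpentine_order` and the choice of a row-major grid of blocks are assumptions; the abstract does not give the exact traversal over the operand and result blocks.

```python
def serpentine_order(block_rows, block_cols):
    """Return block coordinates in a serpentine ("bounce") order:
    left to right on even rows, right to left on odd rows."""
    order = []
    for r in range(block_rows):
        if r % 2 == 0:
            cols = range(block_cols)
        else:
            cols = range(block_cols - 1, -1, -1)
        order.extend((r, c) for c in cols)
    return order


for coord in serpentine_order(3, 3):
    print(coord)
```

    At each turn the next block keeps the same column index as the previous one, so any operand block tied to that column is already resident and does not have to be loaded again.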
  • Patent number: 8533251
    Abstract: A block matrix multiplication mechanism is provided for reversing the visitation order of blocks at corner turns when performing a block matrix multiplication operation in a data processing system. By reversing the visitation order, the mechanism eliminates a block load at the corner turns. In accordance with the illustrative embodiment, a corner return is referred to as a “bounce” corner turn and results in a serpentine patterned processing order of the matrix blocks. The mechanism allows the data processing system to perform a block matrix multiplication operation with a maximum of three block transfers per time step. Therefore, the mechanism reduces the maximum throughput required and increases performance. In addition, the mechanism also reduces the number of multi-buffered local store buffers.
    Type: Grant
    Filed: May 23, 2008
    Date of Patent: September 10, 2013
    Assignee: International Business Machines Corporation
    Inventors: Daniel A. Brokenshire, John A. Gunnels, Michael D. Kistler
  • Patent number: 8521799
    Abstract: Disclosed are a row-vector norm comparison method and a row-vector norm comparison apparatus for an inverse matrix. A row-vector norm comparison apparatus includes: an input matrix processing module that receives and combines constituent elements of a matrix; a cofactor operation module that multiplexes the combination result of the constituent elements to calculate factors constituting an adjoint matrix; a square calculation module that squares the calculated factors; a summation module that selects a predetermined number of factors among the squared factors and sums the selected factors to calculate the norms of row vectors in an inverse matrix; and a norm comparison module that outputs a comparison result of the calculated norms of the row vectors.
    Type: Grant
    Filed: June 30, 2008
    Date of Patent: August 27, 2013
    Assignees: Samsung Electronics Co., Ltd., Electronics and Telecommunications Research Institute
    Inventors: Young Ha Lee, Seung Jae Bahng, Youn-Ok Park
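    Example (illustrative, not from the patent): the cofactor, square, sum, and compare stages in the abstract above can be sketched in software. Because the inverse equals the adjoint (adjugate) matrix divided by the determinant, every row of the inverse is scaled by the same factor, so row norms can be compared using the adjugate alone without any division. The function names below are hypothetical, and the recursive determinant is only meant for small matrices (size 2 or larger).

```python
def minor(M, r, c):
    """Matrix M with row r and column c removed."""
    return [row[:c] + row[c + 1:] for k, row in enumerate(M) if k != r]


def det(M):
    """Determinant by cofactor expansion along the first row."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** c * M[0][c] * det(minor(M, 0, c)) for c in range(len(M)))


def adjugate(M):
    """Adjugate of M: entry (i, j) is the cofactor of element (j, i)."""
    n = len(M)
    return [[(-1) ** (i + j) * det(minor(M, j, i)) for j in range(n)]
            for i in range(n)]


def inverse_row_norm_ranking(M):
    """Return (squared_norms, index_of_smallest) for the rows of adj(M).

    The squared norms are those of the inverse's rows times det(M)**2,
    so their ordering matches the ordering of the inverse's row norms.
    """
    squared = [sum(x * x for x in row) for row in adjugate(M)]
    smallest = min(range(len(squared)), key=squared.__getitem__)
    return squared, smallest


print(inverse_row_norm_ranking([[2, 0], [0, 4]]))
```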
  • Patent number: 8473533
    Abstract: A system, computer-readable storage medium, and method directly solves non-linear systems that have the HB Jacobian as the coefficient matrix. The direct solve method can be used to efficiently simulate non-linear circuits in RF or microwave applications. Additionally, the direct solve method can be applied to Fourier envelope applications. Furthermore, the direct solve method can be used together with preconditioners to provide a more efficient iterative solve technique.
    Type: Grant
    Filed: June 17, 2010
    Date of Patent: June 25, 2013
    Assignee: Berkeley Design Automation, Inc.
    Inventors: Amit Mehrotra, Abhishek Somani
  • Patent number: 8412757
    Abstract: Non-negative matrix factorization (NMF) is combined with identification of a maximum margin classifier by minimizing a cost function that contains a generative component and a discriminative component. The relative weighting between the generative component and the discriminative component is adjusted during subsequent iterations such that initially, when confidence is low, the generative model is favored. As the iterations proceed, confidence increases and the weight of the discriminative component is steadily increased until it equals that of the generative model. Preferably, the cost function to be minimized is: $\min_{F,G \ge 0} \|X - FG\|^2 + \gamma \left( \|w\|^2 + C \sum_{i=1}^{n} L(y_i, w \cdot g_i + b) \right)$.
    Type: Grant
    Filed: December 9, 2009
    Date of Patent: April 2, 2013
    Assignee: Seiko Epson Corporation
    Inventors: Mithun Das Gupta, Jing Xiao
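    Example (illustrative, not from the patent): the cost function in the abstract above combines a reconstruction term (generative) with a margin term plus a classification loss (discriminative). The sketch below only evaluates that cost for given factors; it does not perform the minimization. It assumes a hinge loss for L, treats column i of G as the feature vector g_i, and calls the weight on the discriminative term `gamma`. The hinge loss and all names are assumptions, since the abstract leaves L generic.

```python
def nmf_margin_cost(X, F, G, w, b, y, gamma, C):
    """Evaluate ||X - FG||^2 + gamma * (||w||^2 + C * sum_i L(y_i, w.g_i + b)).

    X is m x n, F is m x r, G is r x n (lists of lists); w has length r;
    y holds labels in {-1, +1}. L is assumed to be the hinge loss.
    """
    m, n, r = len(X), len(X[0]), len(G)
    # Generative term: squared Frobenius norm of the reconstruction error.
    reconstruction = sum(
        (X[p][i] - sum(F[p][q] * G[q][i] for q in range(r))) ** 2
        for p in range(m) for i in range(n)
    )
    # Discriminative term: margin regularizer plus per-sample hinge loss.
    margin = sum(wq * wq for wq in w)
    loss = 0.0
    for i in range(n):
        score = sum(w[q] * G[q][i] for q in range(r)) + b
        loss += max(0.0, 1.0 - y[i] * score)
    return reconstruction + gamma * (margin + C * loss)


print(nmf_margin_cost(X=[[1, 2]], F=[[1]], G=[[1, 2]],
                      w=[1], b=0, y=[1, -1], gamma=0.5, C=2))
```

    Raising `gamma` from a small value toward 1 over the iterations corresponds to the schedule described in the abstract, where the discriminative component gains weight as confidence grows.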
  • Publication number: 20130073599
    Abstract: Hardware for performing sequences of arithmetic operations. The hardware comprises a scheduler operable to generate a schedule of instructions from a bitmap denoting whether an entry in a matrix is zero or not. An arithmetic circuit is provided which is configured to perform arithmetic operations on the matrix in accordance with the schedule.
    Type: Application
    Filed: January 7, 2011
    Publication date: March 21, 2013
    Applicant: LINEAR ALGEBRA TECHNOLOGIES, LIMITED
    Inventor: David Maloney
  • Patent number: 8341204
    Abstract: A data processor whose level of operation parallelism is enhanced by composing floating-point inner product execution units to be compatible with single instruction multiple data (SIMD) and thereby enhancing the operation processing capability is made possible. An operating system that can significantly enhance the level of operation parallelism per instruction while maintaining the efficiency of the floating-point length-4 vector inner product execution units is to be implemented. The floating-point length-4 vector inner product execution units are defined in the minimum width (32 bits for single precision) even where an extensive operating system becomes available, and compose the inner product execution units to be compatible with SIMD. The mutually augmenting effects of the inner product execution units and SIMD-compatible composition enhances the level of operation parallelism dramatically.
    Type: Grant
    Filed: July 2, 2009
    Date of Patent: December 25, 2012
    Assignee: Renesas Electronics Corporation
    Inventors: Fumio Arakawa, Tetsuya Yamada
  • Patent number: 8285773
    Abstract: A signal separating device includes an iterative estimator, a repeating calculator, a result output unit, and a repetition controller. The repeating calculator repeatedly causes the iterative estimator to iteratively perform independent component analysis on an observed signal matrix, and to further perform independent component analysis on the source signal matrix obtained as a result. The result output unit outputs the product of the respective mixing matrices obtained during each repetition as a mixing matrix with respect to the observed signal matrix, while also outputting the source signal matrix obtained during the final repetition as a source signal matrix with respect to the observed signal matrix. The repetition controller causes the repeating calculator to repeat the calculation control until all mixing matrices and all source signal matrices satisfy a convergence condition. The iterative estimator may perform a fixed number of iterations, or perform iterations until convergence is obtained.
    Type: Grant
    Filed: April 27, 2007
    Date of Patent: October 9, 2012
    Assignee: Riken
    Inventors: Andrzej Cichocki, Rafal Zdunek, Shunichi Amari, Gen Hori, Ken Umeno
  • Patent number: 8195735
    Abstract: The present invention provides a system and method for improving the performance of general-purpose processors by implementing a functional unit that computes the product of a matrix operand with a vector operand, producing a vector result. The functional unit fully utilizes the entire resources of a 128b by 128b multiplier regardless of the operand size, as the number of elements of the matrix and vector operands increase as operand size is reduced. The unit performs both fixed-point and floating-point multiplications and additions with the highest-possible intermediate accuracy with modest resources.
    Type: Grant
    Filed: December 9, 2008
    Date of Patent: June 5, 2012
    Assignee: Microunity Systems Engineering, Inc.
    Inventors: Craig Hansen, Bruce Bateman, John Moussouris
  • Patent number: 8055607
    Abstract: A system and method for autonomic problem determination. Events and problems associated with the events are received from a computing resource and are expressed as entries in an event-problem matrix. Expert knowledge is expressed as entries in one or more multi-level structure dictionaries. The system and method enables dynamic interaction between the events in the matrix and the current dictionaries with its entries being updated continuously to maximize correlation among the events and problems. The index of each term in the dictionary is used to calculate the weight of each event in the matrix wherein events having frequent association with a specific problem will be given a higher weight in the matrix. Using singular value decomposition (SVD), the weighted events enable an accelerated and accurate convergence to a set of specific associated problems.
    Type: Grant
    Filed: March 3, 2008
    Date of Patent: November 8, 2011
    Assignee: International Business Machines Corporation
    Inventors: Hoi Y. Chan, Thomas Y. Kwok
  • Patent number: 7933404
    Abstract: Techniques are disclosed to enable efficient implementation of secure hash functions and/or stream ciphers. More specifically, a family of graphs is described that has relatively large girth, large claw, and/or rapid mixing properties. The graphs are suitable for construction of cryptographic primitives such as collision resistant hash functions and stream ciphers, which allow efficient software implementation.
    Type: Grant
    Filed: October 16, 2007
    Date of Patent: April 26, 2011
    Assignee: Microsoft Corporation
    Inventors: Ramarathnam Venkatesan, Matthew Cary
  • Publication number: 20100138468
    Abstract: Methods and apparatus are provided for a digital signal processor having an instruction set with one or more non-linear complex functions. A method is provided for a processor. One or more non-linear complex software instructions are obtained from a program. The non-linear complex software instructions have at least one complex number as an input. One or more non-linear complex functions are applied from a predefined instruction set to the at least one complex number. An output is generated comprised of one complex number or two real numbers. A functional unit can implement the one or more non-linear complex functions. In one embodiment, a vector-based digital signal processor is disclosed that processes a complex vector comprised of a plurality of complex numbers. The processor can process the plurality of complex numbers in parallel.
    Type: Application
    Filed: November 28, 2008
    Publication date: June 3, 2010
    Inventors: Kameran Azadet, Jian-Guo Chen, Samer Hijazi, Joseph Williams
  • Patent number: 7567996
    Abstract: A data processor whose level of operation parallelism is enhanced by composing floating-point inner product execution units to be compatible with single instruction multiple data (SIMD) and thereby enhancing the operation processing capability is made possible. An operating system that can significantly enhance the level of operation parallelism per instruction while maintaining the efficiency of the floating-point length-4 vector inner product execution units is to be implemented. The floating-point length-4 vector inner product execution units are defined in the minimum width (32 bits for single precision) even where an extensive operating system becomes available, and compose the inner product execution units to be compatible with SIMD. The mutually augmenting effects of the inner product execution units and SIMD-compatible composition enhances the level of operation parallelism dramatically.
    Type: Grant
    Filed: August 29, 2005
    Date of Patent: July 28, 2009
    Assignee: Renesas Technology Corp.
    Inventors: Fumio Arakawa, Tetsuya Yamada
  • Publication number: 20090094309
    Abstract: The present invention provides a system and method for improving the performance of general-purpose processors by implementing a functional unit that computes the product of a matrix operand with a vector operand, producing a vector result. The functional unit fully utilizes the entire resources of a 128b by 128b multiplier regardless of the operand size, as the number of elements of the matrix and vector operands increase as operand size is reduced. The unit performs both fixed-point and floating-point multiplications and additions with the highest-possible intermediate accuracy with modest resources.
    Type: Application
    Filed: December 9, 2008
    Publication date: April 9, 2009
    Applicant: MICROUNITY SYSTEMS ENGINEERING, INC.
    Inventors: Craig HANSEN, Bruce Bateman, John Moussouris
  • Publication number: 20080104161
    Abstract: An integrated transformation apparatus is provided. The apparatus includes a first multiplexer, a second multiplexer, and a transformation unit. The first multiplexer retrieves point data from columns or rows of a multi-dimensional matrix and input data. The second multiplexer retrieves transformation coefficients corresponding to the point data. The transformation unit transforms data blocks of the multi-dimensional matrix to a plurality of sub data blocks according to the input data, the point data, and the transformation coefficients.
    Type: Application
    Filed: August 24, 2007
    Publication date: May 1, 2008
    Applicant: INDUSTRIAL TECHNOLOGY RESEARCH INSTITUTE
    Inventors: Yi-Jung Wang, Guo-Zua Wu, Chih-Chi Chang, Oscal Tzyh Chiang Chen
  • Patent number: 7363200
    Abstract: A matrix includes samples associated with a first signal and samples associated with a second signal. The second signal includes a first portion associated with the first signal and a second portion associated with at least one disturbance, such as white noise or colored noise. A projection of the matrix is produced using canonical QR-decomposition. Canonical QR-decomposition of the matrix produces an orthogonal matrix and an upper triangular matrix, where each value in the diagonal of the upper triangular matrix is greater than or equal to zero. The projection at least substantially separates the first portion of the second signal from the second portion of the second signal.
    Type: Grant
    Filed: February 5, 2004
    Date of Patent: April 22, 2008
    Assignee: Honeywell International Inc.
    Inventor: Joseph Z. Lu
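    The defining property of the canonical QR-decomposition in the abstract above (every diagonal value of the upper triangular matrix is greater than or equal to zero) can be illustrated with a small sketch. The abstract does not say how the canonical form is obtained; the sign-flipping step below is an assumption, and the names canonicalize_qr and matmul are hypothetical. It relies only on the fact that negating row k of R together with column k of Q leaves the product QR unchanged and keeps Q orthogonal.

```python
def matmul(A, B):
    """Plain nested-list matrix product, used here only for checking."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]


def canonicalize_qr(Q, R):
    """Return (Qc, Rc) with Qc*Rc == Q*R and every diagonal entry of Rc >= 0.

    Assumption: Q and R come from some existing QR routine; this helper
    only normalizes signs and is not the patented method itself.
    """
    Qc = [row[:] for row in Q]
    Rc = [row[:] for row in R]
    for k in range(min(len(Rc), len(Rc[0]))):
        if Rc[k][k] < 0:
            # Negate row k of R ...
            Rc[k] = [-x for x in Rc[k]]
            # ... and column k of Q, so the two sign changes cancel in Q*R.
            for i in range(len(Qc)):
                Qc[i][k] = -Qc[i][k]
    return Qc, Rc


# Hand-made factors with one negative diagonal entry in R.
Q = [[-1.0, 0.0], [0.0, 1.0]]
R = [[-2.0, 1.0], [0.0, 3.0]]
Qc, Rc = canonicalize_qr(Q, R)
```

    After the call, Rc has a non-negative diagonal and matmul(Qc, Rc) equals matmul(Q, R).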
  • Patent number: 7137005
    Abstract: A method of introducing a non-perceptional signal (watermark) into digital media data is disclosed. The method is based on the representation of source digital data using a special matrix, insertion of a digital watermark into the special matrix to obtain the watermarked matrix, and generation of the watermarked data using the source data and the watermarked matrix. In addition, watermark detection of the watermarked data is performed by calculating the special matrix from the watermarked data.
    Type: Grant
    Filed: March 27, 2002
    Date of Patent: November 14, 2006
    Assignee: LG Electronics Inc.
    Inventors: Mikhail Anatolyevich Sall, Alexander Leonidovich Mayboroda, Viktor Vikrorovich Redkov, Anatoly Igorevich Tikhotsky
  • Patent number: 7089159
    Abstract: A matrix reordering method performs reordering of elements of a coefficient matrix created based on coefficients of linear simultaneous equations whose solutions are to be produced by parallel processing of processors of a computer in accordance with Gaussian elimination. Herein, degrees corresponding to numbers of non-zero elements are calculated with respect to all pivots included in the coefficient matrix. Then, a first pivot whose degree is under a threshold (mindeg+α) is selected from among the pivots of the coefficient matrix, while a second pivot whose critical path length is minimum is also selected from among the pivots of the coefficient matrix. Replacement of elements is performed between the first pivot and second pivot to complete reordering with respect to the first pivot. In addition, non-zero elements, which are newly produced by the Gaussian elimination of the first pivot, are added to the coefficient matrix.
    Type: Grant
    Filed: April 2, 2001
    Date of Patent: August 8, 2006
    Assignee: NEC Electronics Corporation
    Inventor: Koutaro Hachiya
  • Patent number: 7028066
    Abstract: A data processor whose level of operation parallelism is enhanced by composing floating-point inner product execution units to be compatible with single instruction multiple data (SIMD) and thereby enhancing the operation processing capability is made possible. An operating system that can significantly enhance the level of operation parallelism per instruction while maintaining the efficiency of the floating-point length-4 vector inner product execution units is to be implemented. The floating-point length-4 vector inner product execution units are defined in the minimum width (32 bits for single precision) even where an extensive operating system becomes available, and compose the inner product execution units to be compatible with SIMD. The mutually augmenting effects of the inner product execution units and SIMD-compatible composition enhance the level of operation parallelism dramatically.
    Type: Grant
    Filed: March 8, 2001
    Date of Patent: April 11, 2006
    Assignee: Renesas Technology Corp.
    Inventors: Fumio Arakawa, Tetsuya Yamada
  • Patent number: 7010760
    Abstract: An autofill algorithm provides tools for defining and automatically executing batch based procedures in an adaptive hierarchical workflow environment, and may be suitable for a large variety of applications including laboratory procedure planning, execution, documentation, as well as driving robotic apparatus.
    Type: Grant
    Filed: March 11, 2004
    Date of Patent: March 7, 2006
    Assignee: Teranode Corporation
    Inventors: Lawrence F. Arnstein, Zheng Li, John M. Hill, Michael R. Kellen, Christophe Poulain, Neil A. Fanger, Kuang Chen
  • Patent number: 6820074
    Abstract: A method and software are disclosed for processing data values of a data array at equally spaced locations in two dimensions where the desired data values are nulls in the data array. The method and software first search for linear ranges of contiguous nulls, and then perform incidental interpolation of all points in each such range.
    Type: Grant
    Filed: July 7, 1999
    Date of Patent: November 16, 2004
    Assignee: Landmark Graphics Corporation
    Inventor: Anne L. Simpson
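    The two steps named in the abstract above (find linear ranges of contiguous nulls, then interpolate every point in each range) can be sketched as follows. The abstract does not define "incidental interpolation", so plain linear interpolation between the known values on either side of a run is swapped in here as an assumption. Nulls are represented by None, and the names fill_null_runs and fill_grid_rows are hypothetical.

```python
def fill_null_runs(row):
    """Fill each run of None that has a known value on both sides.

    Samples are assumed equally spaced, so linear interpolation depends
    only on the index offset within the run. Runs touching either end of
    the row are left as None because they lack a bounding value.
    """
    out = list(row)
    n = len(out)
    i = 0
    while i < n:
        if out[i] is not None:
            i += 1
            continue
        start = i                      # first null in the run
        while i < n and out[i] is None:
            i += 1
        end = i                        # one past the last null in the run
        if start > 0 and end < n:
            left, right = out[start - 1], out[end]
            span = end - (start - 1)   # index distance between known values
            for j in range(start, end):
                out[j] = left + (right - left) * (j - (start - 1)) / span
    return out


def fill_grid_rows(grid):
    """Apply the row-wise fill to every row of a 2-D array (list of lists)."""
    return [fill_null_runs(row) for row in grid]


filled = fill_null_runs([1.0, None, None, 4.0])
```

    This sketch scans along rows only; the same helper could be applied to columns by transposing the grid first.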
  • Patent number: 6662125
    Abstract: An electromagnetic wave analyzer and program which can handle non-uniform cells with smaller computation errors. A given computational domain is divided into a plurality of cells for the purpose of finite difference approximation. For each space point, a cell size identification unit identifies the uniformity of surrounding cells. When the surrounding cells are identified as being uniform in size, a first calculation unit calculates electromagnetic field components at that space point with a first calculation method. When the surrounding cells are identified as being non-uniform in size, a second calculation unit calculates the same with a second calculation method which has smaller computational errors than the first calculation method. A data output unit then outputs the calculated electromagnetic field values.
    Type: Grant
    Filed: December 20, 2001
    Date of Patent: December 9, 2003
    Assignee: Fujitsu Limited
    Inventor: Takefumi Namiki
  • Publication number: 20010051969
    Abstract: A multimedia execution unit configured to perform vectored floating point and integer instructions. The execution unit may include an add/subtract pipeline having far and close data paths. The far path is configured to handle effective addition operations and effective subtraction operations for operands having an absolute exponent difference greater than one. The close path is configured to handle effective subtraction operations for operands having an absolute exponent difference less than or equal to one. The close path is configured to generate two output values, wherein one output value is the first input operand plus an inverted version of the second input operand, while the second output value is equal to the first output value plus one. Selection of the first or second output value in the close path effectuates the round-to-nearest operation for the output of the adder.
    Type: Application
    Filed: February 6, 2001
    Publication date: December 13, 2001
    Inventors: Stuart F. Oberman, Norbert Juffa, Fred Weber, Krishnan Ramani, Ravi Krishna Cherukuri
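    The two close-path outputs described in the abstract above can be illustrated on fixed-width integers. In two's complement arithmetic, the first operand plus the inverted second operand equals A - B - 1, and adding one to that gives A - B. The 8-bit width and the name close_path_candidates below are assumptions for illustration; the abstract does not state the rule used to select between the two outputs, so the sketch only computes both candidates.

```python
WIDTH = 8                      # hypothetical mantissa width for the sketch
MASK = (1 << WIDTH) - 1


def close_path_candidates(a, b):
    """Return (a + ~b, a + ~b + 1), both reduced modulo 2**WIDTH."""
    inv_b = ~b & MASK          # bitwise inversion of the second operand
    first = (a + inv_b) & MASK     # equals a - b - 1 modulo 2**WIDTH
    second = (first + 1) & MASK    # equals a - b modulo 2**WIDTH
    return first, second


first, second = close_path_candidates(0b10110000, 0b01100001)
```

    For these inputs (176 and 97) the two candidates are 78 and 79, the latter being the exact difference.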
  • Publication number: 20010021941
    Abstract: A data processor whose level of operation parallelism is enhanced by composing floating-point inner product execution units to be compatible with SIMD and thereby enhancing the operation processing capability is made possible. An operating system that can significantly enhance the level of operation parallelism per instruction while maintaining the efficiency of the floating-point length-4 vector inner product execution units is to be implemented. The floating-point length-4 vector inner product execution units are defined in the minimum width (32 bits for single precision) even where an extensive operating system becomes available, and compose the inner product execution units to be compatible with SIMD. The mutually augmenting effects of the inner product execution units and SIMD-compatible composition enhance the level of operation parallelism dramatically.
    Type: Application
    Filed: March 8, 2001
    Publication date: September 13, 2001
    Inventors: Fumio Arakawa, Tetsuya Yamada
  • Publication number: 20010011291
    Abstract: A data processor includes an arithmetic portion incorporated in a floating point unit, in which the arithmetic portion includes a plurality of multipliers each supplied with the mantissa part of a floating point number from a respectively different data input signal line group and performing mutual multiplication of the supplied mantissa parts, an aligner receiving outputs of the respective multipliers and performing an alignment shift, an exponent processing portion for generating the number of alignment shifts of the aligner and an exponent before normalization on the basis of generation an exponent part of the floating point number, a multi-input adder and the exponent before normalization, reducing scale of the circuit and performing inner product operation and the like with the floating point numbers in high speed and high accuracy.
    Type: Application
    Filed: March 19, 2001
    Publication date: August 2, 2001
    Inventors: Fumio Arakawa, Norio Nakagawa, Tetsuya Yamada, Yonetaro Totsuka
  • Patent number: 6188240
    Abstract: A programmable function block comprises a core logic circuit having a first argument input group consisting of first through fourth argument input terminals, a second argument input group consisting of first through fourth argument input terminals, first through third configuration input terminals, a core logic carry output terminal, a core logic carry generation output terminal, a core logic carry propagation output terminal, a ripple-core logic carry input terminal, and a sum output terminal. Connected to interconnection wires and the first and the second argument input groups, an input block includes eight input selection units for selecting, as eight input selected signals, eight ones of signals on the interconnection wires, a fixed logic value of “1”, and a fixed logic value of “0”. Connected to the first through the third configuration input terminals, respectively, first through third memory circuits store, as first through third stored logic values, a logic value of one bit.
    Type: Grant
    Filed: June 4, 1999
    Date of Patent: February 13, 2001
    Assignees: NEC Corporation, Real World Computing Partnership
    Inventor: Shogo Nakaya