Multiple Parallel Operations Patents (Class 708/524)
  • Patent number: 11947462
    Abstract: Techniques are disclosed relating to cache footprint management. In some embodiments, execution circuitry is configured to perform operations for instructions from multiple threads in parallel. Cache circuitry may store information operated on by threads executed by the execution circuitry. Scheduling circuitry may arbitrate among threads to schedule threads for execution by the execution circuitry. Tracking circuitry may determine one or more performance metrics for the cache circuitry. Control circuitry may, based on the one or more performance metrics meeting a threshold, reduce a limit on a number of threads considered for arbitration by the scheduling circuitry, to control a footprint of information stored by the cache circuitry. Disclosed techniques may advantageously reduce or avoid cache thrashing for certain processor workloads.
    Type: Grant
    Filed: March 3, 2022
    Date of Patent: April 2, 2024
    Assignee: Apple Inc.
    Inventors: Yoong Chert Foo, Terence M. Potter, Donald R. DeSota, Benjiman L. Goodman, Aroun Demeure, Cheng Li, Winnie W. Yeung
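    A minimal sketch of the control policy described above, in Python. The metric, threshold value, and function name are hypothetical; the abstract only specifies that a cache performance metric meeting a threshold reduces the limit on threads considered for arbitration:

```python
def adjust_thread_limit(cache_miss_rate, thread_limit,
                        threshold=0.3, min_threads=1, max_threads=16):
    """Hypothetical throttling policy: when a cache performance metric (here a
    miss rate) meets the threshold, lower the number of threads the scheduler
    may arbitrate among, shrinking the cache footprint; otherwise relax it."""
    if cache_miss_rate >= threshold:
        return max(min_threads, thread_limit - 1)
    return min(max_threads, thread_limit + 1)

limit = 16
for rate in (0.1, 0.5, 0.6, 0.2):
    limit = adjust_thread_limit(rate, limit)
    print(rate, "->", limit)        # 0.1 -> 16, 0.5 -> 15, 0.6 -> 14, 0.2 -> 15
```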
  • Patent number: 11886534
    Abstract: The present invention discloses a filtering method and system for parallel computing results. The input value at the first valid position (fvp) of each fragment is generated simultaneously, and the output result corresponding to each fragment's first-valid-position input value is computed simultaneously using that fragment's own fvp. Based on the output result at the fvp of the first fragment, the parallel computing results are filtered by selecting the output results of the second through S-th fragments in sequence, finally yielding correct parallel computing results. By adopting this parallel filtering manner, the original serial filtering computation is changed to parallel computation over S fragments, so the computing time is only one S-th of the original, thereby improving computing efficiency and satisfying the timing requirements of parallel computation.
    Type: Grant
    Filed: September 29, 2019
    Date of Patent: January 30, 2024
    Assignee: Inspur Electronic Information Industry Co., Ltd.
    Inventors: Hongzhi Shi, Haiwei Liu, Jian Zhao
  • Patent number: 11782498
    Abstract: An electronic device includes a coding module that determines whether a parameter of an artificial neural network is an outlier, depending on a value of the parameter and compresses the parameter by truncating a first bit of the parameter when the parameter is a non-outlier and truncating a second bit of the parameter when the parameter is the outlier, and a decoding module that decodes a compressed parameter.
    Type: Grant
    Filed: February 17, 2020
    Date of Patent: October 10, 2023
    Assignee: University-Industry Cooperation Group of Kyung Hee University
    Inventors: Ik Joon Chang, Ho Nguyen Dong, Minhson Le
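    A loose illustration of the outlier-aware truncation idea, assuming (purely for illustration) that outliers and non-outliers lose different numbers of low-order bits; the threshold, bit counts, and names below are not from the patent:

```python
def compress_param(value, outlier_threshold=64, inlier_drop=2, outlier_drop=1):
    """Classify the parameter by magnitude, then truncate low-order bits;
    an outlier is truncated less aggressively than a non-outlier (assumed
    interpretation, with made-up threshold and bit counts)."""
    is_outlier = abs(value) >= outlier_threshold
    drop = outlier_drop if is_outlier else inlier_drop
    return value >> drop, drop

def decompress_param(truncated, drop):
    return truncated << drop        # lossy: the truncated low bits stay zero

for v in (7, 300):
    t, d = compress_param(v)
    print(v, "->", decompress_param(t, d))   # 7 -> 4, 300 -> 300
```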
  • Patent number: 11620169
    Abstract: When communicating through shared memory, a producer thread generates a value that is written to a location in a shared memory. The value is read from the shared memory by a consumer thread. The challenge is to ensure that the consumer thread reads the location only after the value is written and is thereby synchronized. When a memory location is written by a producer thread, a flag that is simultaneously stored in the memory location along with the value is toggled. The consumer thread tracks information to determine whether the flag stored in the location indicates whether the producer has written the value to the location. The flag is read and written simultaneously with reading and writing the location in memory, thereby eliminating the need for a memory fence. After all of the consumer threads read the value, the location may be reused to write additional value(s) and simultaneously toggle the flag.
    Type: Grant
    Filed: March 13, 2020
    Date of Patent: April 4, 2023
    Assignee: NVIDIA Corporation
    Inventor: Vasily Volkov
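    A software sketch of the value-plus-flag idea: a value and a 1-bit phase flag are packed into one shared word so that a single write publishes both, and the consumer spins until the flag matches the phase it expects. The acknowledgment counter used here to model slot reuse is an addition of this sketch, not part of the patent:

```python
import threading

SLOT = [0]          # simulated shared word: (value << 1) | phase_flag
ACK  = [0]          # how many values the consumer has finished reading

def producer(values):
    phase = 1                             # toggles on each reuse of the slot
    for i, v in enumerate(values):
        while ACK[0] < i:                 # slot still in use by the consumer
            pass
        SLOT[0] = (v << 1) | phase        # value and flag written together
        phase ^= 1

def consumer(n, out):
    phase = 1
    for i in range(n):
        while (SLOT[0] & 1) != phase:     # wait for the producer's toggle
            pass
        out.append(SLOT[0] >> 1)
        ACK[0] = i + 1
        phase ^= 1

out = []
t = threading.Thread(target=consumer, args=(3, out))
t.start()
producer([10, 20, 30])
t.join()
print(out)   # [10, 20, 30]
```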
  • Patent number: 11539812
    Abstract: Provided are a method and device for generating a notification and a method and device for subscribing a notification message. The method for generating a notification (300) includes: receiving a subscription request (S310); creating a first subscription resource according to the subscription request, the first subscription resource including a plurality of first event notification criteria and a second event notification criterion (S320); receiving a plurality of first events generated according to the plurality of first event notification criteria (S330); determining whether the plurality of first events satisfy the second event notification criterion (S340); and generating a notification in a case where the plurality of first events satisfies the second event notification criterion, the notification indicating a second event (S350).
    Type: Grant
    Filed: November 14, 2017
    Date of Patent: December 27, 2022
    Assignee: BOE TECHNOLOGY GROUP CO., LTD.
    Inventors: Zhenpeng Guo, Junjie Zhao
  • Patent number: 11392725
    Abstract: Provided are a security processor for performing a remainder operation by using a random number and an operating method of the security processor. The security processor includes a random number generator configured to generate a first random number, and a modular calculator configured to generate a first random operand based on first data and the first random number and to generate output data through a remainder operation on the first random operand, wherein a result value of the remainder operation on the first data is identical to a result value of the remainder operation on the first random operand.
    Type: Grant
    Filed: August 9, 2019
    Date of Patent: July 19, 2022
    Assignee: SAMSUNG ELECTRONICS CO., LTD.
    Inventors: Jae-hyeok Kim, Jong-hoon Shin, Ji-su Kang, Hyun-il Kim, Hye-soo Lee, Hong-mook Choi
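    One standard blinding construction with the stated property, shown as a sketch: adding a random multiple of the modulus to the data yields a random operand whose remainder equals that of the original data. The patent's actual random-operand construction may differ:

```python
import secrets

def blinded_mod(x: int, m: int) -> int:
    """Randomized remainder computation: the value actually reduced differs on
    every call, but the result equals x % m (standard blinding sketch)."""
    r = secrets.randbelow(1 << 64)       # first random number
    random_operand = x + r * m           # first random operand
    return random_operand % m            # same remainder as x % m

x, m = 123456789, 97
assert blinded_mod(x, m) == x % m
print(blinded_mod(x, m), x % m)          # 39 39
```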
  • Patent number: 11275559
    Abstract: Certain aspects of the present disclosure are directed to methods and apparatus for circular floating point addition. An example method generally includes obtaining a first floating point number represented by a first significand and a first exponent, obtaining a second floating point number represented by a second significand and second exponent, and adding the first floating point number and the second floating point number using a circular accumulator device.
    Type: Grant
    Filed: July 2, 2020
    Date of Patent: March 15, 2022
    Assignee: Qualcomm Incorporated
    Inventor: Aaron Douglass Lamb
  • Patent number: 11216250
    Abstract: A method includes providing a set of one or more computational units implemented in a set of one or more field programmable gate array (FPGA) devices, where the set of one or more computational units is configured to generate a plurality of output values based on one or more input values. The method further includes, for each computational unit of the set of computational units, performing a first calculation in the computational unit using a first number representation, where a first output of the plurality of output values is based on the first calculation, determining a second number representation based on the first output value, and performing a second calculation in the computational unit using the second number representation, where a second output of the plurality of output values is based on the second calculation.
    Type: Grant
    Filed: December 6, 2017
    Date of Patent: January 4, 2022
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Nicholas P. Malaya, Elliot H. Mednick
  • Patent number: 11157275
    Abstract: The present disclosure relates to systems and methods that provide a reconfigurable cryptographic coprocessor. An example system includes an instruction memory configured to provide ARX instructions and mode control instructions. The system also includes an adjustable-width arithmetic logic unit, an adjustable-width rotator, and a coefficient memory. A bit width of the adjustable-width arithmetic logic unit and a bit width of the adjustable-width rotator are adjusted according to the mode control instructions. The coefficient memory is configured to provide variable-width words to the arithmetic logic unit and the rotator. The arithmetic logic unit and the rotator are configured to carry out the ARX instructions on the provided variable-width words. The systems and methods described herein could accelerate various applications, such as deep learning, by assigning one or more of the disclosed reconfigurable coprocessors to work as a central computation unit in a neural network.
    Type: Grant
    Filed: July 3, 2018
    Date of Patent: October 26, 2021
    Assignees: The Board of Trustees of the University of Illinois, University of Virginia Patent Foundation
    Inventors: Mohamed E Aly, Wen-Mei W. Hwu, Kevin Skadron
  • Patent number: 11145338
    Abstract: A semiconductor memory device includes a storage, a buffer, and a control logic. The storage stores a first algorithm data. The buffer stores a second algorithm data that is at least partially different from the first algorithm data. The control logic is configured to selectively receive the first algorithm data and the second algorithm data.
    Type: Grant
    Filed: April 28, 2020
    Date of Patent: October 12, 2021
    Assignee: SK hynix Inc.
    Inventors: Geonu Kim, Yong Soon Park, Won Sun Park
  • Patent number: 11048661
    Abstract: A dataflow accelerator including a control/command core, a scratchpad and a coarse grain reconfigurable array (CGRA) according to an exemplary embodiment is disclosed. The scratchpad may include a write controller to transmit data to an input vector port interface and to receive data from the input vector port interface. The CGRA may receive data from the input vector port interface and includes a plurality of interconnects and a plurality of functional units.
    Type: Grant
    Filed: April 15, 2019
    Date of Patent: June 29, 2021
    Assignee: SIMPLE MACHINES INC.
    Inventors: Karthikeyan Sankaralingam, Anthony Nowatzki, Vinay Gangadhar, Preyas Shah, Newsha Ardalani
  • Patent number: 10970042
    Abstract: An integrated circuit with specialized processing blocks is provided. A specialized processing block may be optimized for machine learning algorithms and may include a multiplier data path that feeds an adder data path. The multiplier data path may be decomposed into multiple partial product generators, multiple compressors, and multiple carry-propagate adders of a first precision. Results from the carry-propagate adders may be added using a floating-point adder of the first precision. Results from the floating-point adder may be optionally cast to a second precision that is higher or more accurate than the first precision. The adder data path may include an adder of the second precision that combines the results from the floating-point adder with zero, with a general-purpose input, or with other dot product terms. Operated in this way, the specialized processing block provides a technical improvement of greatly increasing the functional density for implementing machine learning algorithms.
    Type: Grant
    Filed: September 27, 2018
    Date of Patent: April 6, 2021
    Assignee: Intel Corporation
    Inventors: Martin Langhammer, Dongdong Chen, Kevin Hurd
  • Patent number: 10915319
    Abstract: An image processor is described. The image processor includes a two dimensional shift register array that couples certain ones of its array locations to support execution of a shift instruction. The shift instruction is to include mask information. The mask information is to specify which of the array locations are to be written to with information being shifted. The two dimensional shift register array includes masking logic circuitry to write the information being shifted into specified ones of the array locations in accordance with the mask information.
    Type: Grant
    Filed: May 15, 2017
    Date of Patent: February 9, 2021
    Assignee: Google LLC
    Inventor: Albert Meixner
  • Patent number: 10705845
    Abstract: A processor includes a core to execute an instruction for conversion between an element array and a packed bit array. The core includes logic to identify one or more bit-field lengths to be used by the packed bit array, identify a width of elements of the element array, and simultaneously for elements of the element array and for bit-fields of the packed bit array, convert between the element array and the packed bit array based upon the bit-field length and the width of elements of the element array.
    Type: Grant
    Filed: June 18, 2018
    Date of Patent: July 7, 2020
    Assignee: Intel IP Corporation
    Inventors: Elmoustapha Ould-Ahmed-Vall, Thomas Willhalm, Robert Valentine
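    A serial software sketch of the element-array/packed-bit-array conversion; the instruction performs this for all elements and bit-fields simultaneously, and the function names and 3-bit field width below are illustrative:

```python
def pack_bits(elements, field_bits):
    """Pack the low `field_bits` bits of each element into one integer."""
    packed, mask = 0, (1 << field_bits) - 1
    for i, e in enumerate(elements):
        packed |= (e & mask) << (i * field_bits)
    return packed

def unpack_bits(packed, field_bits, count):
    """Inverse conversion: extract `count` bit-fields back into elements."""
    mask = (1 << field_bits) - 1
    return [(packed >> (i * field_bits)) & mask for i in range(count)]

vals = [5, 2, 7, 1]                       # e.g. wide elements, 3-bit fields
p = pack_bits(vals, 3)
print(bin(p), unpack_bits(p, 3, 4))       # 0b1111010101 [5, 2, 7, 1]
```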
  • Patent number: 10656951
    Abstract: A processing element is implemented in a stage of a pipeline and configured to execute an instruction. A first array of multiplexers is to provide information associated with the instruction to the processing element in response to the instruction being in a first set of instructions. A second array of multiplexers is to provide information associated with the instruction to the first processing element in response to the instruction being in a second set of instructions. A control unit is to gate at least one of power or a clock signal provided to the first array of multiplexers in response to the instruction being in the second set.
    Type: Grant
    Filed: October 20, 2017
    Date of Patent: May 19, 2020
    Assignees: ADVANCED MICRO DEVICES, INC., ADVANCED MICRO DEVICES (SHANGHAI) CO., LTD.
    Inventors: Jiasheng Chen, YunXiao Zou, Bin He, Angel E. Socarras, QingCheng Wang, Wei Yuan, Michael Mantor
  • Patent number: 10649830
    Abstract: It is determined whether the arithmetic operation function of a device to be inspected is normal. An MCU 13 to be inspected acquires a constant to be used for an arithmetic problem from a power source IC 12 on the inspection side. The MCU 13 sequentially selects a plurality of arithmetic problems and carries out an arithmetic operation using the acquired constant according to the selected arithmetic problem. A monitoring circuit 23 of the power source IC 12 receives the result of the arithmetic operation for the arithmetic problem from the MCU 13. The monitoring circuit 23 compares the received arithmetic operation result with the result for the same arithmetic problem calculated on the monitoring circuit 23 side. Based on the comparison result, the monitoring circuit 23 determines whether the arithmetic operation function of the MCU 13 works normally.
    Type: Grant
    Filed: February 20, 2018
    Date of Patent: May 12, 2020
    Assignee: RENESAS ELECTRONICS CORPORATION
    Inventor: Seiichi Kousokabe
  • Patent number: 10613987
    Abstract: In some embodiments, a system includes an execution unit, a register file, an operand cache, and a predication control circuit. Operands identified by an instruction may be stored in the operand cache. One or more entries of the operand cache that store the operands may be marked as dirty. The predication control circuit may identify an instruction as having an unresolved predication state. Subsequent to initiating execution of the instruction, the predication control circuit may receive results of the at least one unresolved conditional instruction. In response to the results indicating the instruction has a known-to-execute predication state, the predication control circuit may initiate writing, in the operand cache, results of executing the instruction. In response to the results indicating the instruction has a known-not-to-execute predication state, the predication control circuit may prevent the results of executing the instruction from being written in the operand cache.
    Type: Grant
    Filed: September 23, 2016
    Date of Patent: April 7, 2020
    Assignee: Apple Inc.
    Inventors: Andrew M. Havlir, Terence M. Potter
  • Patent number: 10606559
    Abstract: In an example, an apparatus comprises a plurality of execution units and logic, at least partially including hardware logic, to gate at least one of a multiply unit or an accumulate unit in response to an input of value zero. Other embodiments are also disclosed and claimed.
    Type: Grant
    Filed: June 12, 2019
    Date of Patent: March 31, 2020
    Assignee: INTEL CORPORATION
    Inventors: Yaniv Fais, Tomer Bar-On, Jacob Subag, Jeremie Dreyfuss, Lev Faivishevsky, Michael Behar, Amit Bleiweiss, Guy Jacob, Gal Leibovich, Itamar Ben-Ari, Galina Ryvchin, Eyal Yaacoby
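    The gating condition itself is simple; a sketch of the equivalent software behavior (in hardware the multiply and accumulate units would be clock- or power-gated rather than branched around):

```python
def mac(acc, a, b):
    """Skip the multiply/accumulate work when either input is zero, since the
    accumulator value cannot change; models the zero-input gating condition."""
    if a == 0 or b == 0:
        return acc                  # gated: no multiply, no accumulate
    return acc + a * b

print(mac(10, 0, 5), mac(10, 3, 5))   # 10 25
```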
  • Patent number: 10530560
    Abstract: In an embodiment, an integrated circuit (IC) device is disclosed. In the embodiment, the IC device includes an Ethernet frame processor, at least one Ethernet port coupled to the Ethernet frame processor, and a hardware synchronization circuit coupled to the Ethernet frame processor and to the at least one Ethernet port, the hardware synchronization circuit including a controller, a local clock, a media-independent peripheral coupled to the controller, and a media-dependent peripheral coupled to the media-independent peripheral, wherein power can be provided to the hardware synchronization circuit independent of the Ethernet frame processor.
    Type: Grant
    Filed: June 20, 2016
    Date of Patent: January 7, 2020
    Assignee: NXP B.V.
    Inventors: Hubertus Gerardus Hendrikus Vermeulen, Nicola Concer
  • Patent number: 10474458
    Abstract: One embodiment provides for a machine-learning hardware accelerator comprising a compute unit having an adder and a multiplier that are shared between an integer datapath and a floating-point datapath, the upper bits of input operands to the multiplier to be gated during floating-point operation.
    Type: Grant
    Filed: October 18, 2017
    Date of Patent: November 12, 2019
    Assignee: Intel Corporation
    Inventors: Himanshu Kaul, Mark A. Anders, Sanu K. Mathew, Anbang Yao, Joydeep Ray, Ping T. Tang, Michael S. Strickland, Xiaoming Chen, Tatiana Shpeisman, Abhishek R. Appu, Altug Koker, Kamal Sinha, Balaji Vembu, Nicolas C. Galoppo Von Borries, Eriko Nurvitadhi, Rajkishore Barik, Tsung-Han Lin, Vasanth Ranganathan, Sanjeev Jahagirdar
  • Patent number: 10416997
    Abstract: A method and apparatus for including in processor instructions for performing logical-comparison and branch support operations on packed or unpacked data. In one embodiment, instruction decode logic decodes instructions for an execution unit to operate on packed data elements including logical comparisons. A register file including 128-bit packed data registers stores packed single-precision floating point (SPFP) and packed integer data elements. The logical comparisons may include comparison of SPFP data elements and comparison of integer data elements and setting at least one bit to indicate the results. Based on these comparisons, branch support actions are taken. Such branch support actions may include setting the at least one bit, which in turn may be utilized by a branching unit in response to a branch instruction. Alternatively, the branch support actions may include branching to an indicated target code location.
    Type: Grant
    Filed: October 18, 2018
    Date of Patent: September 17, 2019
    Assignee: Intel Corporation
    Inventors: Rajiv Kapoor, Ronen Zohar, Mark J. Buxton, Zeev Sperber, Koby Gottlieb
  • Patent number: 10157164
    Abstract: Aspects disclosed herein relate to aggregating functionality of computer machine instructions to generate additional computer machine instructions and including the additional computer machine instructions in an instruction set architecture (ISA). An exemplary method includes selecting at least first and second computer machine instructions from an instruction set, aggregating functionality of the first and second computer machine instructions to generate a third computer machine instruction, and adding the third computer machine instruction to the instruction set.
    Type: Grant
    Filed: September 20, 2016
    Date of Patent: December 18, 2018
    Assignee: QUALCOMM Incorporated
    Inventors: Sangyeol Kang, Ovidiu Cristian Miclea, Stephen Michael Verrall
  • Patent number: 10127015
    Abstract: An instruction to perform a multiply and shift operation is executed. The executing includes multiplying a first value and a second value obtained by the instruction to obtain a product. The product is shifted in a specified direction by a user-defined selected amount to provide a result, and the result is placed in a selected location. The result is to be used in processing within the computing environment.
    Type: Grant
    Filed: September 30, 2016
    Date of Patent: November 13, 2018
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Jonathan D. Bradbury, Steven R. Carlough, Reid T. Copeland, Silvia Melitta Mueller
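    A sketch of the fused multiply-and-shift semantics, with a fixed-point scaling example; the function name and Q-format usage are illustrative, not from the patent:

```python
def multiply_shift(a, b, shift, direction="right"):
    """Form the full product, then shift it by a user-selected amount in the
    specified direction; commonly used for fixed-point rescaling."""
    product = a * b
    return product >> shift if direction == "right" else product << shift

# Q1.15 fixed-point example: 0.5 * 0.25 = 0.125
half, quarter = 1 << 14, 1 << 13
print(multiply_shift(half, quarter, 15))   # 4096 == 0.125 in Q1.15
```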
  • Patent number: 10025557
    Abstract: An 8×8 binary digital multiplier reduces the height of partial product columns to be no more than 7 bits high. The six 7-bit high middle columns are each input to a (7:3) counter. An ascending triangle compressor operates on the lesser significant bit columns. A descending triangle compressor operates on the greater significant bit columns. The counter and compressor outputs are combined for a final stage of compression, followed by partial product addition.
    Type: Grant
    Filed: December 5, 2015
    Date of Patent: July 17, 2018
    Assignee: Firefly DSP LLC
    Inventors: Craig Franklin, David Cureton Baker
  • Patent number: 10007518
    Abstract: The number of registers required is reduced by overlapping scalar and vector registers. This also allows increased compiler flexibility when mixing scalar and vector instructions. Local register read ports are minimized by restricting read access. Dedicated predicate registers reduce the requirements for general registers, and allow reduction of critical timing paths by allowing the predicate registers to be placed next to the predicate unit.
    Type: Grant
    Filed: July 9, 2014
    Date of Patent: June 26, 2018
    Assignee: TEXAS INSTRUMENTS INCORPORATED
    Inventors: Timothy David Anderson, Duc Quang Bui, Mel Alan Phipps, Todd T. Hahn, Joseph Zbiciak
  • Patent number: 9882739
    Abstract: A method for disabling or removing a Legacy loop detect circuit to prevent the circuit from erroneously detecting a legacy loop during an IEEE-1394 serial bus initialization. The method includes providing a programmable code to the Legacy loop detect circuit for increasing a reset count to a value greater than three (3), thus reducing the probability of an erroneous disconnect of a Beta node connection. This method provides for more robust Beta loop node operation during high-frequency bus resets.
    Type: Grant
    Filed: August 30, 2016
    Date of Patent: January 30, 2018
    Assignee: DAP Holding B.U.
    Inventor: Richard Mourn
  • Patent number: 9846579
    Abstract: Techniques are disclosed relating to comparison circuitry. In some embodiments, compare circuitry is configured to generate comparison results for sets of inputs in both one or more integer formats and one or more floating-point formats. In some embodiments, the compare circuitry includes padding circuitry configured to add one or more bits to each of first and second input values to generate first and second padded values. In some embodiments, the compare circuitry also includes integer subtraction circuitry configured to subtract the first padded value from the second padded value to generate a subtraction result. In some embodiments, the compare circuitry includes output logic configured to generate the comparison result based on the subtraction result. In various embodiments, using at least a portion of the same circuitry (e.g., the subtractor) for both integer and floating-point comparisons may reduce processor area.
    Type: Grant
    Filed: June 13, 2016
    Date of Patent: December 19, 2017
    Assignee: Apple Inc.
    Inventors: Liang-Kai Wang, Terence M. Potter, Andrew M. Havlir
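    Why one integer subtractor can order both formats can be illustrated with the classic sign-flip mapping shown below; this is a well-known software analogue, not the patent's padding scheme:

```python
import struct

def float_key(x: float) -> int:
    """Map a 32-bit float's bit pattern to an unsigned integer whose ordering
    matches the float ordering, so a single integer subtract handles floats."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return bits ^ 0xFFFFFFFF if bits & 0x80000000 else bits | 0x80000000

def compare(a: float, b: float) -> int:
    d = float_key(a) - float_key(b)       # one integer subtraction
    return (d > 0) - (d < 0)              # -1, 0, or 1

print(compare(-1.5, 2.0), compare(2.0, 2.0), compare(3.25, 0.5))   # -1 0 1
```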
  • Patent number: 9710318
    Abstract: An electronic circuit includes a microcontroller processor (410), a peripheral (420) coupled with the processor, an endian circuit (470) coupled with the processor and the peripheral to selectively provide different endianness modes of operation, and a detection circuit (140) to detect a failure to select a given endianness, whereby an inadvertent switch of endianness due to faults is avoided. Other circuits, devices, systems, methods of operation and processes of manufacture are also disclosed.
    Type: Grant
    Filed: January 22, 2015
    Date of Patent: July 18, 2017
    Assignee: Texas Instruments Incorporated
    Inventors: Yanyang Xiao, Alexandre Pierre Palus, Karl Friedrich Greb, Kevin Patrick Lavery, Paul Krause
  • Patent number: 9678716
    Abstract: An apparatus comprises processing circuitry for performing an absolute difference operation that generates an absolute difference value in response to a first operand and a second operand. The processing circuitry supports variable data element sizes for data elements of the first and second operands and the absolute difference value. Each data element of the absolute difference value represents an absolute difference between corresponding data elements of the first and second operands. The processing circuitry has an adding stage for performing at least one addition to generate at least one intermediate value and an inverting stage for inverting selected bits of each intermediate value. Control circuitry generates control information based on the current data element size and status information generated in the adding stage, to identify the selected bits to be inverted in the inverting stage to convert each intermediate value into a corresponding portion of the absolute difference value.
    Type: Grant
    Filed: December 22, 2014
    Date of Patent: June 13, 2017
    Assignee: ARM Limited
    Inventors: Neil Burgess, David Raymond Lutz
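    A sketch of the add-then-invert identity behind this kind of absolute-difference circuit, for one 8-bit element; real hardware folds the "+1" case into its carry handling rather than performing a second addition:

```python
def absdiff(a: int, b: int, bits: int = 8) -> int:
    """Compute t = a + ~b with one addition; the carry out says whether a > b.
    If a > b the absolute difference is t + 1, otherwise it is the bitwise
    inverse of t.  Illustrates the identity only, not the patented circuit."""
    mask = (1 << bits) - 1
    t = (a + ((~b) & mask)) & mask          # t == (a - b - 1) mod 2**bits
    carry = (a + ((~b) & mask)) >> bits     # 1 iff a > b
    return (t + 1) & mask if carry else (~t) & mask

for a, b in [(9, 3), (3, 9), (200, 200), (0, 255)]:
    assert absdiff(a, b) == abs(a - b)
print("identity holds for all tested pairs")
```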
  • Patent number: 9602302
    Abstract: A method for disabling or removing a Legacy loop detect circuit to prevent the circuit from erroneously detecting a legacy loop during an IEEE-1394 serial bus initialization. The method includes providing a programmable code to the Legacy loop detect circuit for increasing a reset count to a value greater than three (3), thus reducing the probability of an erroneous disconnect of a Beta node connection. This method provides for more robust Beta loop node operation during high-frequency bus resets.
    Type: Grant
    Filed: June 17, 2014
    Date of Patent: March 21, 2017
    Assignee: DAP Holding B.U.
    Inventor: Richard Mourn
  • Patent number: 9477628
    Abstract: A collective communication apparatus and method for parallel computing systems. For example, one embodiment of an apparatus comprises a plurality of processor elements (PEs); collective interconnect logic to dynamically form a virtual collective interconnect (VCI) between the PEs at runtime without global communication among all of the PEs, the VCI defining a logical topology between the PEs in which each PE is directly communicatively coupled to only a subset of the remaining PEs; and execution logic to execute collective operations across the PEs, wherein one or more of the PEs receive first results from a first portion of the subset of the remaining PEs, perform a portion of the collective operations, and provide second results to a second portion of the subset of the remaining PEs.
    Type: Grant
    Filed: September 28, 2013
    Date of Patent: October 25, 2016
    Assignee: Intel Corporation
    Inventors: Allan D. Knies, David Pardo Keppel, Dong Hyuk Woo, Joshua B. Fryman
  • Patent number: 9411726
    Abstract: An embodiment includes a system comprising a first memory; a plurality of first circuits, wherein each first circuit is coupled to the first memory and includes a second circuit configured to generate a first output value in response to an input value received from the first memory, and an accumulator configured to receive the first output value and generate a second output value; and a controller coupled to the first memory and the first circuits, and configured to determine the input values to be transmitted from the first memory to the first circuits.
    Type: Grant
    Filed: May 14, 2015
    Date of Patent: August 9, 2016
    Assignee: SAMSUNG ELECTRONICS CO., LTD.
    Inventors: Ilia Ovsiannikov, Zhengping Ji, Yibing Michelle Wang, Hongyu Wang
  • Patent number: 9311273
    Abstract: A segment including a set of blocks necessary to calculate blocks having internal states and blocks having no outputs is extracted by tracing from blocks for use in calculating inputs into the blocks having internal states and from the blocks having no outputs in the reverse direction of dependence. To newly extract segments in which blocks contained in the extracted segments are removed, a set of nodes to be temporarily removed is determined on the basis of parallelism. Segments executable independently of other segments are extracted by tracing from nodes whose child nodes are lost by removal of the nodes in the upstream direction. Segments are divided into upstream segments representing the newly extracted segments and downstream segments representing nodes temporarily removed. Upstream and downstream segments are merged so as to reduce overlapping blocks between segments such that the number of segments is reduced to the number of parallel executions.
    Type: Grant
    Filed: July 26, 2013
    Date of Patent: April 12, 2016
    Assignee: International Business Machines Corporation
    Inventors: Shuhichi Shimizu, Takeo Yoshizawa
  • Patent number: 9218317
    Abstract: A segment including a set of blocks necessary to calculate blocks having internal states and blocks having no outputs is extracted by tracing from blocks for use in calculating inputs into the blocks having internal states and from the blocks having no outputs in the reverse direction of dependence. To newly extract segments in which blocks contained in the extracted segments are removed, a set of nodes to be temporarily removed is determined on the basis of parallelism. Segments executable independently of other segments are extracted by tracing from nodes whose child nodes are lost by removal of the nodes in the upstream direction. Segments are divided into upstream segments representing the newly extracted segments and downstream segments representing nodes temporarily removed. Upstream and downstream segments are merged so as to reduce overlapping blocks between segments such that the number of segments is reduced to the number of parallel executions.
    Type: Grant
    Filed: August 21, 2013
    Date of Patent: December 22, 2015
    Assignee: International Business Machines Corporation
    Inventors: Shuhichi Shimizu, Takeo Yoshizawa
  • Patent number: 9207941
    Abstract: Systems, methods, and apparatuses for calculating a square of a data value of a first source operand, a square of a data value of a second source operand, and a multiplication of the data of the first and second operands only using one multiplication are described.
    Type: Grant
    Filed: March 15, 2013
    Date of Patent: December 8, 2015
    Assignee: Intel Corporation
    Inventors: Ilya Albrekht, Elmoustapha Ould-Ahmed-Vall
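    One way a single multiplication can yield both squares and the cross product is the padding-and-squaring identity sketched below; this is illustrative arithmetic only, not necessarily the patented datapath:

```python
def square_square_product(a: int, b: int, bits: int = 16):
    """Concatenate a and b with enough zero padding and square once:
    (a*2**k + b)**2 = a*a*2**(2k) + 2*a*b*2**k + b*b, and with k = 2*bits + 1
    the three terms land in disjoint bit fields of the single product."""
    k = 2 * bits + 1                       # room for b*b and for the 2*a*b term
    x = (a << k) | b                       # pack the two operands
    sq = x * x                             # the single multiplication
    bb = sq & ((1 << k) - 1)
    ab2 = (sq >> k) & ((1 << k) - 1)
    aa = sq >> (2 * k)
    return aa, ab2 >> 1, bb                # a*a, a*b, b*b

a, b = 1234, 567
assert square_square_product(a, b) == (a * a, a * b, b * b)
print(square_square_product(a, b))         # (1522756, 699678, 321489)
```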
  • Patent number: 9104510
    Abstract: Arithmetic units and methods for floating point processing are provided. In exemplary embodiments, data paths to and from multiple multipliers and adders are flexibly combined through crossbars and alignment units to allow a wide range of mathematical operations, including affine and SIMD operations. The micro-architecture for a high-performance flexible vector floating point arithmetic unit is provided, which can perform a single-cycle throughput complex multiply-and-accumulate operation, as well as a Fast Fourier Transform (radix-2 decimation-in-time) Butterfly operation.
    Type: Grant
    Filed: April 30, 2010
    Date of Patent: August 11, 2015
    Assignee: Audience, Inc.
    Inventors: Leonardo Rub, Dana Massie, Samuel Dicker
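    For reference, the radix-2 decimation-in-time butterfly mentioned in the abstract reduces to one complex multiply feeding a complex add and a complex subtract (software sketch only):

```python
import cmath

def butterfly(a: complex, b: complex, w: complex):
    """Radix-2 DIT FFT butterfly: t = b*w, outputs a + t and a - t."""
    t = b * w
    return a + t, a - t

w = cmath.exp(-2j * cmath.pi / 8)          # twiddle factor W_8^1
print(butterfly(1 + 0j, 0 + 1j, w))
```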
  • Patent number: 8793299
    Abstract: A method and apparatus for including in a processor instructions for performing multiply-add operations on packed data. In one embodiment, a processor is coupled to a memory. The memory has stored therein a first packed data and a second packed data. The processor performs operations on data elements in said first packed data and said second packed data to generate a third packed data in response to receiving an instruction. At least two of the data elements in this third packed data store the result of performing multiply-add operations on data elements in the first and second packed data.
    Type: Grant
    Filed: March 13, 2013
    Date of Patent: July 29, 2014
    Assignee: Intel Corporation
    Inventors: Alexander Peleg, Millind Mittal, Larry M. Mennemeier, Benny Eitan, Carole Dulong, Eiichi Kowashi, Wolf C. Witt
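    A software sketch of packed multiply-add semantics in the style described here (analogous to x86 PMADDWD, used only as a familiar reference): corresponding elements are multiplied and adjacent products are summed into wider result elements:

```python
def packed_multiply_add(src1, src2):
    """Multiply corresponding elements, then add each adjacent pair of
    products to form a wider result element."""
    products = [a * b for a, b in zip(src1, src2)]
    return [products[i] + products[i + 1] for i in range(0, len(products), 2)]

# Four 16-bit elements per operand -> two 32-bit results
print(packed_multiply_add([1, 2, 3, 4], [5, 6, 7, 8]))   # [17, 53]
```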
  • Patent number: 8756267
    Abstract: According to some embodiments, a device is configured to perform a dual multiply-accumulate operation. In one embodiment, the device includes a functional unit configured to calculate, in parallel, a first multiplication product of a first coefficient and a first sample; and a second multiplication product of the first coefficient and a second sample. The first sample is an (n)th sample and the second sample is an (n+2)th sample in a plurality of sequential samples. The functional unit outputs and stores the first multiplication product and the second multiplication product in different storage locations in at least one storage device.
    Type: Grant
    Filed: October 31, 2011
    Date of Patent: June 17, 2014
    Assignee: Marvell International Ltd.
    Inventors: Bradley Aldrich, Nigel C. Paver, William T. Maghielse
  • Patent number: 8745119
    Abstract: A method and apparatus for including in a processor instructions for performing multiply-add operations on packed data. In one embodiment, a processor is coupled to a memory. The memory has stored therein a first packed data and a second packed data. The processor performs operations on data elements in said first packed data and said second packed data to generate a third packed data in response to receiving an instruction. At least two of the data elements in this third packed data store the result of performing multiply-add operations on data elements in the first and second packed data.
    Type: Grant
    Filed: March 13, 2013
    Date of Patent: June 3, 2014
    Assignee: Intel Corporation
    Inventors: Alexander Peleg, Millind Mittal, Larry M. Mennemeier, Benny Eitan, Carole Dulong, Eiichi Kowashi, Wolf C. Witt
  • Patent number: 8725787
    Abstract: A method and apparatus for including in a processor instructions for performing multiply-add operations on packed data. In one embodiment, a processor is coupled to a memory. The memory has stored therein a first packed data and a second packed data. The processor performs operations on data elements in said first packed data and said second packed data to generate a third packed data in response to receiving an instruction. At least two of the data elements in this third packed data store the result of performing multiply-add operations on data elements in the first and second packed data.
    Type: Grant
    Filed: April 26, 2012
    Date of Patent: May 13, 2014
    Assignee: Intel Corporation
    Inventors: Alexander D. Peleg, Millind Mittal, Larry M. Mennemeier, Benny Eitan, Carole Dulong, Eiichi Kowashi, Wolf Witt
  • Publication number: 20140095571
    Abstract: A processor or other device, such as a programmable and/or massively parallel processor or other device, includes processing elements designed to perform arithmetic operations (possibly but not necessarily including, for example, one or more of addition, multiplication, subtraction, and division) on numerical values of low precision but high dynamic range (“LPHDR arithmetic”). Such a processor or other device may, for example, be implemented on a single chip. Whether or not implemented on a single chip, the number of LPHDR arithmetic elements in the processor or other device in certain embodiments of the present invention significantly exceeds (e.g., by at least 20 more than three times) the number of arithmetic elements, if any, in the processor or other device which are designed to perform high dynamic range arithmetic of traditional precision (such as 32 bit or 64 bit floating point arithmetic).
    Type: Application
    Filed: March 25, 2013
    Publication date: April 3, 2014
    Inventor: Joseph Bates
  • Patent number: 8687008
    Abstract: A latency tolerant system for executing video processing operations. The system includes a host interface for implementing communication between the video processor and a host CPU, a scalar execution unit coupled to the host interface and configured to execute scalar video processing operations, and a vector execution unit coupled to the host interface and configured to execute vector video processing operations. A command FIFO is included for enabling the vector execution unit to operate on a demand driven basis by accessing the memory command FIFO. A memory interface is included for implementing communication between the video processor and a frame buffer memory. A DMA engine is built into the memory interface for implementing DMA transfers between a plurality of different memory locations and for loading the command FIFO with data and instructions for the vector execution unit.
    Type: Grant
    Filed: November 4, 2005
    Date of Patent: April 1, 2014
    Assignee: NVIDIA Corporation
    Inventors: Ashish Karandikar, Shirish Gadre, Stephen D. Lew
  • Patent number: 8626814
    Abstract: A method and apparatus for including in a processor instructions for performing multiply-add operations on packed data. In one embodiment, a processor is coupled to a memory. The memory has stored therein a first packed data and a second packed data. The processor performs operations on data elements in said first packed data and said second packed data to generate a third packed data in response to receiving an instruction. At least two of the data elements in this third packed data store the result of performing multiply-add operations on data elements in the first and second packed data.
    Type: Grant
    Filed: July 1, 2011
    Date of Patent: January 7, 2014
    Assignee: Intel Corporation
    Inventors: Alexander Peleg, Millind Mittal, Larry M. Mennemeier, Benny Eitan, Carole Dulong, Eiichi Kowashi, Wolf C. Witt
  • Patent number: 8626815
    Abstract: In a matrix multiplication in which each element of the resultant matrix is the dot product of a row of a first matrix and a column of a second matrix, each row and column can be broken into manageable blocks, with each block loaded in turn to compute a smaller dot product, and then the results can be added together to obtain the desired row-column dot product. The earliest results for each dot product are saved for a number of clock cycles equal to the number of portions into which each row or column is divided. The results are then added to provide an element of the resultant matrix. To avoid repeated loading and unloading of the same data, all multiplications involving a particular row-block can be performed upon loading that row-block, with the results cached until other multiplications for the resultant elements that use the cached results are complete.
    Type: Grant
    Filed: March 3, 2009
    Date of Patent: January 7, 2014
    Assignee: Altera Corporation
    Inventor: Martin Langhammer
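    A sketch of the row/column blocking described above: each row-column dot product is computed as a sum of per-block partial dot products (the block size and names are illustrative):

```python
def blocked_dot(row, col, block):
    """Split the row and column into `block`-sized chunks, compute a partial
    dot product per chunk, then add the partial results together."""
    partials = [
        sum(r * c for r, c in zip(row[i:i + block], col[i:i + block]))
        for i in range(0, len(row), block)
    ]
    return sum(partials)

row = [1, 2, 3, 4, 5, 6]
col = [6, 5, 4, 3, 2, 1]
print(blocked_dot(row, col, block=2))   # 56, same as the unblocked dot product
```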
  • Patent number: 8619312
    Abstract: An image processing apparatus includes: a receiving unit that receives job data of plural pages; plural RIP processors that interpret and expand the job data into raster images; and an allocating unit that allocates the plural pages of the job data to the plural RIP processors for RIP processing. The allocating unit divides the job data based on a predetermined data size regardless of page breaks, allocates the job data that is to be RIP processed to the plural RIP processors, and sends job data corresponding to and after the pages of the data that is to be RIP processed to the plural RIP processors. When the head part of the job data allocated for RIP processing by the allocating unit falls in the middle of a page, the plural RIP processors RIP process the job data from the beginning of the next page.
    Type: Grant
    Filed: September 9, 2010
    Date of Patent: December 31, 2013
    Assignee: Fuji Xerox Co., Ltd.
    Inventor: Takuya Mizuguchi
  • Patent number: 8620985
    Abstract: A method is disclosed that includes computing, using at least one uniformly fine-grain data parallel computing unit, a mean-square error regression within a regression clustering algorithm. The mean-square error regression is represented in the form of at least one summation of a vector-vector multiplication. A computer program product and a computer system are also disclosed.
    Type: Grant
    Filed: October 14, 2010
    Date of Patent: December 31, 2013
    Assignee: Hewlett-Packard Development Company, L.P.
    Inventors: Bin Zhang, Ren Wu, Meichun Hsu
  • Patent number: 8615541
    Abstract: The invention set forth herein describes a mechanism for efficiently performing extended precision operations on multi-word source operands. Corresponding data words of the source operands are processed together via each instruction of a cascading sequence of instructions. State information generated when each instruction is processed is stored in condition code flags. The state information is optionally used in the processing of subsequent instructions in the sequence and/or accumulated with previously set state information.
    Type: Grant
    Filed: September 23, 2010
    Date of Patent: December 24, 2013
    Assignee: NVIDIA Corporation
    Inventors: Richard Craig Johnson, John R. Nickolls
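    The cascading pattern can be sketched as a word-wise add in which each step's carry (the state held in condition-code flags) feeds the next instruction in the sequence:

```python
def extended_add(a_words, b_words, word_bits=32):
    """Process corresponding words least-significant first; the carry produced
    by each step is consumed by the next, as the condition-code flags would be."""
    mask, carry, out = (1 << word_bits) - 1, 0, []
    for a, b in zip(a_words, b_words):      # low word first
        s = a + b + carry
        out.append(s & mask)
        carry = s >> word_bits
    return out, carry

# 64-bit add expressed as two 32-bit words (low word first)
print(extended_add([0xFFFFFFFF, 0x00000001], [0x00000001, 0x00000002]))
# ([0, 4], 0)  ->  0x00000004_00000000
```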
  • Patent number: 8589464
    Abstract: An arithmetic logic unit is provided. The arithmetic logic unit preferably includes a minimum of routing delays. An arithmetic logic unit according to the invention preferably receives a plurality of operands from a plurality of operand registers, performs an arithmetic operation on the operands, obtains a result of the arithmetic operation, and transmits the result to a result register. The arithmetic logic unit includes a signal propagation path that includes no greater than two routing paths that connect non-immediately adjacent logic elements.
    Type: Grant
    Filed: May 8, 2008
    Date of Patent: November 19, 2013
    Assignee: Altera Corporation
    Inventor: Paul J. Metzgen
  • Patent number: 8589465
    Abstract: Digital signal processing (“DSP”) block circuitry on an integrated circuit (“IC”) is adapted for use (e.g., in multiple instances of the DSP block circuitry on the IC) for implementing finite-impulse-response (“FIR”) digital filters in systolic form. Each DSP block may include (1) first and second multiplier circuitry and (2) adder circuitry for adding (a) outputs of the multipliers and (b) signals chained in from a first other instance of the DSP block circuitry. Systolic delay circuitry is provided for either the outputs of the first multiplier (upstream from the adder) or at least one of the sets of inputs to the first multiplier. Additional systolic delay circuitry is provided for outputs of the adder, which are chained out to a second other instance of the DSP block circuitry.
    Type: Grant
    Filed: May 8, 2013
    Date of Patent: November 19, 2013
    Assignee: Altera Corporation
    Inventors: Suleyman Sirri Demirsoy, Hyun Yi
  • Publication number: 20130191431
    Abstract: A processor for calculating a convolution of a first input sequence of numbers with a second input sequence of numbers to generate an output sequence is provided. The processor includes multipliers, each multiplying two real numbers to generate an output; multiplexers to direct the numbers in the first and second input sequences or parts of the numbers to the multipliers; and control circuitry to control the multiplexers to direct the first and second input sequences of numbers to the multipliers dependent on whether the numbers are complex or real. An accumulator adds partial products from multiplications performed by the multipliers to calculate the convolution.
    Type: Application
    Filed: January 19, 2012
    Publication date: July 25, 2013
    Inventors: Srinivasan Iyer, Carsten Aagaard Pedersen
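    A direct-form sketch of the real-valued case of the convolution being computed, showing the partial products that the accumulator sums; the complex/real multiplexing the abstract describes is omitted:

```python
def convolve(x, h):
    """Direct-form convolution: each output sample accumulates the partial
    products x[k] * h[n - k]."""
    y = [0] * (len(x) + len(h) - 1)
    for n in range(len(y)):
        for k in range(len(x)):
            if 0 <= n - k < len(h):
                y[n] += x[k] * h[n - k]
    return y

print(convolve([1, 2, 3], [1, 1]))   # [1, 3, 5, 3]
```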