Multiple Parallel Operations Patents (Class 708/524)
-
Patent number: 12050886
Abstract: A neuromorphic device includes a first resistor line having a plurality of first resistors that are serially connected to each other, a second resistor line having a plurality of second resistors that are serially connected to each other, one or more current sources to control a current flowing in each of the first resistor line and the second resistor line to a respective current value, a first capacitor electrically connectable to the first resistor line, and a second capacitor electrically connectable to the second resistor line.
Type: Grant
Filed: October 21, 2020
Date of Patent: July 30, 2024
Assignee: Samsung Electronics Co., Ltd.
Inventors: Sungmeen Myung, Sangjoon Kim, Seungchul Jung
-
Patent number: 11947462
Abstract: Techniques are disclosed relating to cache footprint management. In some embodiments, execution circuitry is configured to perform operations for instructions from multiple threads in parallel. Cache circuitry may store information operated on by threads executed by the execution circuitry. Scheduling circuitry may arbitrate among threads to schedule threads for execution by the execution circuitry. Tracking circuitry may determine one or more performance metrics for the cache circuitry. Control circuitry may, based on the one or more performance metrics meeting a threshold, reduce a limit on a number of threads considered for arbitration by the scheduling circuitry, to control a footprint of information stored by the cache circuitry. Disclosed techniques may advantageously reduce or avoid cache thrashing for certain processor workloads.
Type: Grant
Filed: March 3, 2022
Date of Patent: April 2, 2024
Assignee: Apple Inc.
Inventors: Yoong Chert Foo, Terence M. Potter, Donald R. DeSota, Benjiman L. Goodman, Aroun Demeure, Cheng Li, Winnie W. Yeung
-
Patent number: 11886534
Abstract: The present invention discloses a method and system for filtering parallel computing results. The input value at the first valid position (fvp) of each fragment is generated simultaneously, and the output result corresponding to each fragment's first-valid-position input is computed simultaneously. Based on the output result at the first valid position of the first fragment, the parallel computing results are filtered by selecting the output results of the second through S-th fragments in sequence, finally yielding correct parallel computing results. By filtering in parallel, the original serial filtering computation is converted into parallel computation over S fragments, so the computing time is only one S-th of the original, thereby improving computing efficiency and satisfying the timing requirements of parallel computation.
Type: Grant
Filed: September 29, 2019
Date of Patent: January 30, 2024
Assignee: Inspur Electronic Information Industry Co., Ltd.
Inventors: Hongzhi Shi, Haiwei Liu, Jian Zhao
-
Patent number: 11782498
Abstract: An electronic device includes a coding module that determines whether a parameter of an artificial neural network is an outlier depending on a value of the parameter, and compresses the parameter by truncating a first bit of the parameter when the parameter is a non-outlier and truncating a second bit of the parameter when the parameter is the outlier, and a decoding module that decodes a compressed parameter.
Type: Grant
Filed: February 17, 2020
Date of Patent: October 10, 2023
Assignee: University-Industry Cooperation Group of Kyung Hee University
Inventors: Ik Joon Chang, Ho Nguyen Dong, Minhson Le
-
Patent number: 11620169
Abstract: When communicating through shared memory, a producer thread generates a value that is written to a location in a shared memory. The value is read from the shared memory by a consumer thread. The challenge is to ensure that the consumer thread reads the location only after the value is written and is thereby synchronized. When a memory location is written by a producer thread, a flag that is simultaneously stored in the memory location along with the value is toggled. The consumer thread tracks information to determine whether the flag stored in the location indicates whether the producer has written the value to the location. The flag is read and written simultaneously with reading and writing the location in memory, thereby eliminating the need for a memory fence. After all of the consumer threads read the value, the location may be reused to write additional value(s) and simultaneously toggle the flag.
Type: Grant
Filed: March 13, 2020
Date of Patent: April 4, 2023
Assignee: NVIDIA Corporation
Inventor: Vasily Volkov
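As an illustration of the general idea only (a toy CPython sketch, not the patented GPU shared-memory mechanism): the payload and a one-bit flag are published in a single write, and the consumer treats a toggled flag as "new data is ready", so no separate fence or flag update is needed. The names `slot`, `producer`, and `consumer` are hypothetical.

```python
import threading
import time

slot = (None, 0)                          # (value, flag) published together

def producer():
    global slot
    time.sleep(0.01)                      # pretend to compute the value
    slot = (42, slot[1] ^ 1)              # one assignment publishes value and toggled flag

def consumer(results):
    while slot[1] != 1:                   # spin until the flag toggles from 0 to 1
        pass
    results.append(slot[0])               # the value was stored together with the flag

results = []
t_cons = threading.Thread(target=consumer, args=(results,))
t_prod = threading.Thread(target=producer)
t_cons.start(); t_prod.start()
t_prod.join(); t_cons.join()
print(results)                            # [42]
```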
-
Patent number: 11539812
Abstract: Provided are a method and device for generating a notification and a method and device for subscribing to a notification message. The method for generating a notification (300) includes: receiving a subscription request (S310); creating a first subscription resource according to the subscription request, the first subscription resource including a plurality of first event notification criteria and a second event notification criterion (S320); receiving a plurality of first events generated according to the plurality of first event notification criteria (S330); determining whether the plurality of first events satisfy the second event notification criterion (S340); and generating a notification in a case where the plurality of first events satisfy the second event notification criterion, the notification indicating a second event (S350).
Type: Grant
Filed: November 14, 2017
Date of Patent: December 27, 2022
Assignee: BOE TECHNOLOGY GROUP CO., LTD.
Inventors: Zhenpeng Guo, Junjie Zhao
-
Patent number: 11392725
Abstract: Provided are a security processor for performing a remainder operation by using a random number and an operating method of the security processor. The security processor includes a random number generator configured to generate a first random number; and a modular calculator configured to generate a first random operand based on first data and the first random number and generate output data through a remainder operation on the first random operand, wherein a result value of the remainder operation on the first data is identical to a result value of the remainder operation on the first random operand.
Type: Grant
Filed: August 9, 2019
Date of Patent: July 19, 2022
Assignee: SAMSUNG ELECTRONICS CO., LTD.
Inventors: Jae-hyeok Kim, Jong-hoon Shin, Ji-su Kang, Hyun-il Kim, Hye-soo Lee, Hong-mook Choi
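A minimal sketch of one standard way to randomize a modular reduction without changing its result (a general blinding idea, not necessarily the patented construction): adding a random multiple of the modulus leaves the remainder unchanged, since (x + r·m) mod m = x mod m. The function name `blinded_mod` is made up for illustration.

```python
import secrets

def blinded_mod(x, m):
    # Add a random multiple of the modulus before reducing; the remainder
    # is unchanged, but the operand actually reduced varies every call.
    r = secrets.randbelow(1 << 32)
    return (x + r * m) % m

assert blinded_mod(1234567, 97) == 1234567 % 97
```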
-
Patent number: 11275559
Abstract: Certain aspects of the present disclosure are directed to methods and apparatus for circular floating point addition. An example method generally includes obtaining a first floating point number represented by a first significand and a first exponent, obtaining a second floating point number represented by a second significand and a second exponent, and adding the first floating point number and the second floating point number using a circular accumulator device.
Type: Grant
Filed: July 2, 2020
Date of Patent: March 15, 2022
Assignee: Qualcomm Incorporated
Inventor: Aaron Douglass Lamb
-
Patent number: 11216250
Abstract: A method includes providing a set of one or more computational units implemented in a set of one or more field programmable gate array (FPGA) devices, where the set of one or more computational units is configured to generate a plurality of output values based on one or more input values. The method further includes, for each computational unit of the set of computational units, performing a first calculation in the computational unit using a first number representation, where a first output of the plurality of output values is based on the first calculation, determining a second number representation based on the first output value, and performing a second calculation in the computational unit using the second number representation, where a second output of the plurality of output values is based on the second calculation.
Type: Grant
Filed: December 6, 2017
Date of Patent: January 4, 2022
Assignee: Advanced Micro Devices, Inc.
Inventors: Nicholas P. Malaya, Elliot H. Mednick
-
Patent number: 11157275
Abstract: The present disclosure relates to systems and methods that provide a reconfigurable cryptographic coprocessor. An example system includes an instruction memory configured to provide ARX instructions and mode control instructions. The system also includes an adjustable-width arithmetic logic unit, an adjustable-width rotator, and a coefficient memory. A bit width of the adjustable-width arithmetic logic unit and a bit width of the adjustable-width rotator are adjusted according to the mode control instructions. The coefficient memory is configured to provide variable-width words to the arithmetic logic unit and the rotator. The arithmetic logic unit and the rotator are configured to carry out the ARX instructions on the provided variable-width words. The systems and methods described herein could accelerate various applications, such as deep learning, by assigning one or more of the disclosed reconfigurable coprocessors to work as a central computation unit in a neural network.
Type: Grant
Filed: July 3, 2018
Date of Patent: October 26, 2021
Assignees: The Board of Trustees of the University of Illinois, University of Virginia Patent Foundation
Inventors: Mohamed E Aly, Wen-Mei W. Hwu, Kevin Skadron
-
Patent number: 11145338
Abstract: A semiconductor memory device includes a storage, a buffer, and a control logic. The storage stores a first algorithm data. The buffer stores a second algorithm data that is at least partially different from the first algorithm data. The control logic is configured to selectively receive the first algorithm data and the second algorithm data.
Type: Grant
Filed: April 28, 2020
Date of Patent: October 12, 2021
Assignee: SK hynix Inc.
Inventors: Geonu Kim, Yong Soon Park, Won Sun Park
-
Patent number: 11048661
Abstract: A dataflow accelerator including a control/command core, a scratchpad and a coarse grain reconfigurable array (CGRA) according to an exemplary embodiment is disclosed. The scratchpad may include a write controller to transmit data to an input vector port interface and to receive data from the input vector port interface. The CGRA may receive data from the input vector port interface and includes a plurality of interconnects and a plurality of functional units.
Type: Grant
Filed: April 15, 2019
Date of Patent: June 29, 2021
Assignee: SIMPLE MACHINES INC.
Inventors: Karthikeyan Sankaralingam, Anthony Nowatzki, Vinay Gangadhar, Preyas Shah, Newsha Ardalani
-
Patent number: 10970042
Abstract: An integrated circuit with specialized processing blocks is provided. A specialized processing block may be optimized for machine learning algorithms and may include a multiplier data path that feeds an adder data path. The multiplier data path may be decomposed into multiple partial product generators, multiple compressors, and multiple carry-propagate adders of a first precision. Results from the carry-propagate adders may be added using a floating-point adder of the first precision. Results from the floating-point adder may be optionally cast to a second precision that is higher or more accurate than the first precision. The adder data path may include an adder of the second precision that combines the results from the floating-point adder with zero, with a general-purpose input, or with other dot product terms. Operated in this way, the specialized processing block provides a technical improvement of greatly increasing the functional density for implementing machine learning algorithms.
Type: Grant
Filed: September 27, 2018
Date of Patent: April 6, 2021
Assignee: Intel Corporation
Inventors: Martin Langhammer, Dongdong Chen, Kevin Hurd
-
Patent number: 10915319
Abstract: An image processor is described. The image processor includes a two dimensional shift register array that couples certain ones of its array locations to support execution of a shift instruction. The shift instruction is to include mask information. The mask information is to specify which of the array locations are to be written to with information being shifted. The two dimensional shift register array includes masking logic circuitry to write the information being shifted into specified ones of the array locations in accordance with the mask information.
Type: Grant
Filed: May 15, 2017
Date of Patent: February 9, 2021
Assignee: Google LLC
Inventor: Albert Meixner
-
Patent number: 10705845
Abstract: A processor includes a core to execute an instruction for conversion between an element array and a packed bit array. The core includes logic to identify one or more bit-field lengths to be used by the packed bit array, identify a width of elements of the element array, and simultaneously for elements of the element array and for bit-fields of the packed bit array, convert between the element array and the packed bit array based upon the bit-field length and the width of elements of the element array.
Type: Grant
Filed: June 18, 2018
Date of Patent: July 7, 2020
Assignee: Intel IP Corporation
Inventors: Elmoustapha Ould-Ahmed-Vall, Thomas Willhalm, Robert Valentine
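A small sketch of the conversion being described, written in scalar Python rather than as a SIMD instruction (the field width and helper names are illustrative assumptions): each element contributes a fixed-width bit-field to the packed array, and unpacking reverses the process.

```python
def pack_bit_array(elements, field_bits):
    # Pack each element into a fixed-width bit-field, least significant field first.
    mask = (1 << field_bits) - 1
    packed = 0
    for i, e in enumerate(elements):
        packed |= (e & mask) << (i * field_bits)
    return packed

def unpack_bit_array(packed, field_bits, count):
    # Recover the element array from the packed bit-fields.
    mask = (1 << field_bits) - 1
    return [(packed >> (i * field_bits)) & mask for i in range(count)]

values = [3, 1, 7, 2]
packed = pack_bit_array(values, field_bits=3)
assert unpack_bit_array(packed, field_bits=3, count=4) == values
```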
-
Patent number: 10656951
Abstract: A processing element is implemented in a stage of a pipeline and configured to execute an instruction. A first array of multiplexers is to provide information associated with the instruction to the processing element in response to the instruction being in a first set of instructions. A second array of multiplexers is to provide information associated with the instruction to the first processing element in response to the instruction being in a second set of instructions. A control unit is to gate at least one of power or a clock signal provided to the first array of multiplexers in response to the instruction being in the second set.
Type: Grant
Filed: October 20, 2017
Date of Patent: May 19, 2020
Assignees: ADVANCED MICRO DEVICES, INC., ADVANCED MICRO DEVICES (SHANGHAI) CO., LTD.
Inventors: Jiasheng Chen, YunXiao Zou, Bin He, Angel E. Socarras, QingCheng Wang, Wei Yuan, Michael Mantor
-
Patent number: 10649830
Abstract: It is determined whether the arithmetic operation function of a device under inspection is normal. An MCU 13 under inspection acquires, from a power source IC 12 on the inspection side, a constant to be used for an arithmetic problem. The MCU 13 sequentially selects from a plurality of arithmetic problems and carries out an arithmetic operation using the acquired constant according to the selected problem. A monitoring circuit 23 of the power source IC 12 receives the result of the arithmetic operation from the MCU 13 and compares it with the result it calculates for the same arithmetic problem on its own side. Based on the comparison result, the monitoring circuit 23 determines whether the arithmetic operation function of the MCU 13 works normally.
Type: Grant
Filed: February 20, 2018
Date of Patent: May 12, 2020
Assignee: RENESAS ELECTRONICS CORPORATION
Inventor: Seiichi Kousokabe
-
Patent number: 10613987
Abstract: In some embodiments, a system includes an execution unit, a register file, an operand cache, and a predication control circuit. Operands identified by an instruction may be stored in the operand cache. One or more entries of the operand cache that store the operands may be marked as dirty. The predication control circuit may identify an instruction as having an unresolved predication state. Subsequent to initiating execution of the instruction, the predication control circuit may receive results of the at least one unresolved conditional instruction. In response to the results indicating the instruction has a known-to-execute predication state, the predication control circuit may initiate writing, in the operand cache, results of executing the instruction. In response to the results indicating the instruction has a known-not-to-execute predication state, the predication control circuit may prevent the results from executing the instruction from being written in the operand cache.
Type: Grant
Filed: September 23, 2016
Date of Patent: April 7, 2020
Assignee: Apple Inc.
Inventors: Andrew M. Havlir, Terence M. Potter
-
Patent number: 10606559
Abstract: In an example, an apparatus comprises a plurality of execution units and logic, at least partially including hardware logic, to gate at least one of a multiply unit or an accumulate unit in response to an input of value zero. Other embodiments are also disclosed and claimed.
Type: Grant
Filed: June 12, 2019
Date of Patent: March 31, 2020
Assignee: INTEL CORPORATION
Inventors: Yaniv Fais, Tomer Bar-On, Jacob Subag, Jeremie Dreyfuss, Lev Faivishevsky, Michael Behar, Amit Bleiweiss, Guy Jacob, Gal Leibovich, Itamar Ben-Ari, Galina Ryvchin, Eyal Yaacoby
-
Patent number: 10530560
Abstract: In an embodiment, an integrated circuit (IC) device is disclosed. In the embodiment, the IC device includes an Ethernet frame processor, at least one Ethernet port coupled to the Ethernet frame processor, and a hardware synchronization circuit coupled to the Ethernet frame processor and to the at least one Ethernet port, the hardware synchronization circuit including a controller, a local clock, a media-independent peripheral coupled to the controller, and a media-dependent peripheral coupled to the media-independent peripheral, wherein power can be provided to the hardware synchronization circuit independent of the Ethernet frame processor.
Type: Grant
Filed: June 20, 2016
Date of Patent: January 7, 2020
Assignee: NXP B.V.
Inventors: Hubertus Gerardus Hendrikus Vermeulen, Nicola Concer
-
Patent number: 10474458
Abstract: One embodiment provides for a machine-learning hardware accelerator comprising a compute unit having an adder and a multiplier that are shared between an integer datapath and a floating-point datapath, the upper bits of input operands to the multiplier to be gated during floating-point operation.
Type: Grant
Filed: October 18, 2017
Date of Patent: November 12, 2019
Assignee: Intel Corporation
Inventors: Himanshu Kaul, Mark A. Anders, Sanu K. Mathew, Anbang Yao, Joydeep Ray, Ping T. Tang, Michael S. Strickland, Xiaoming Chen, Tatiana Shpeisman, Abhishek R. Appu, Altug Koker, Kamal Sinha, Balaji Vembu, Nicolas C. Galoppo Von Borries, Eriko Nurvitadhi, Rajkishore Barik, Tsung-Han Lin, Vasanth Ranganathan, Sanjeev Jahagirdar
-
Patent number: 10416997
Abstract: A method and apparatus for including in processor instructions for performing logical-comparison and branch support operations on packed or unpacked data. In one embodiment, instruction decode logic decodes instructions for an execution unit to operate on packed data elements including logical comparisons. A register file including 128-bit packed data registers stores packed single-precision floating point (SPFP) and packed integer data elements. The logical comparisons may include comparison of SPFP data elements and comparison of integer data elements and setting at least one bit to indicate the results. Based on these comparisons, branch support actions are taken. Such branch support actions may include setting the at least one bit, which in turn may be utilized by a branching unit in response to a branch instruction. Alternatively, the branch support actions may include branching to an indicated target code location.
Type: Grant
Filed: October 18, 2018
Date of Patent: September 17, 2019
Assignee: Intel Corporation
Inventors: Rajiv Kapoor, Ronen Zohar, Mark J. Buxton, Zeev Sperber, Koby Gottlieb
-
Patent number: 10157164
Abstract: Aspects disclosed herein relate to aggregating functionality of computer machine instructions to generate additional computer machine instructions and including the additional computer machine instructions in an instruction set architecture (ISA). An exemplary method includes selecting at least first and second computer machine instructions from an instruction set, aggregating functionality of the first and second computer machine instructions to generate a third computer machine instruction, and adding the third computer machine instruction to the instruction set.
Type: Grant
Filed: September 20, 2016
Date of Patent: December 18, 2018
Assignee: QUALCOMM Incorporated
Inventors: Sangyeol Kang, Ovidiu Cristian Miclea, Stephen Michael Verrall
-
Patent number: 10127015
Abstract: An instruction to perform a multiply and shift operation is executed. The executing includes multiplying a first value and a second value obtained by the instruction to obtain a product. The product is shifted in a specified direction by a user-defined selected amount to provide a result, and the result is placed in a selected location. The result is to be used in processing within the computing environment.
Type: Grant
Filed: September 30, 2016
Date of Patent: November 13, 2018
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Jonathan D. Bradbury, Steven R. Carlough, Reid T. Copeland, Silvia Melitta Mueller
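The operation is simple to state in software; the sketch below is a hedged behavioural illustration (the direction and amount handling are assumptions, not the instruction's exact definition), with a fixed-point rescaling use case.

```python
def multiply_and_shift(a, b, amount, direction="right"):
    # Multiply the two values, then shift the product by the selected
    # amount in the specified direction.
    product = a * b
    return product << amount if direction == "left" else product >> amount

# Fixed-point example: 1.5 * 2.25 with 8 fractional bits, rescaled back to 8 bits.
assert multiply_and_shift(int(1.5 * 256), int(2.25 * 256), 8) == int(1.5 * 2.25 * 256)
```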
-
Patent number: 10025557
Abstract: An 8×8 binary digital multiplier reduces the height of partial product columns to be no more than 7 bits high. The six 7-bit high middle columns are each input to a (7:3) counter. An ascending triangle compressor operates on the lesser significant bit columns. A descending triangle compressor operates on the greater significant bit columns. The counter and compressor outputs are combined for a final stage of compression, followed by partial product addition.
Type: Grant
Filed: December 5, 2015
Date of Patent: July 17, 2018
Assignee: Firefly DSP LLC
Inventors: Craig Franklin, David Cureton Baker
-
Patent number: 10007518
Abstract: The number of registers required is reduced by overlapping scalar and vector registers. This also allows increased compiler flexibility when mixing scalar and vector instructions. Local register read ports are minimized by restricting read access. Dedicated predicate registers reduce requirements for general registers and allow reduction of critical timing paths by allowing the predicate registers to be placed next to the predicate unit.
Type: Grant
Filed: July 9, 2014
Date of Patent: June 26, 2018
Assignee: TEXAS INSTRUMENTS INCORPORATED
Inventors: Timothy David Anderson, Duc Quang Bui, Mel Alan Phipps, Todd T. Hahn, Joseph Zbiciak
-
Patent number: 9882739
Abstract: A method for disabling or removing a Legacy loop detect circuit to eliminate the circuit erroneously detecting a legacy loop during an IEEE-1394 serial bus initialization. The method includes providing a programmable code to the Legacy loop detect circuit for increasing a reset count to a value greater than three (3), thus reducing the probability of an erroneous disconnect of a Beta node connection. This method provides for more robust Beta loop node operation during high frequency bus resets.
Type: Grant
Filed: August 30, 2016
Date of Patent: January 30, 2018
Assignee: DAP Holding B.U.
Inventor: Richard Mourn
-
Patent number: 9846579
Abstract: Techniques are disclosed relating to comparison circuitry. In some embodiments, compare circuitry is configured to generate comparison results for sets of inputs in both one or more integer formats and one or more floating-point formats. In some embodiments, the compare circuitry includes padding circuitry configured to add one or more bits to each of first and second input values to generate first and second padded values. In some embodiments, the compare circuitry also includes integer subtraction circuitry configured to subtract the first padded value from the second padded value to generate a subtraction result. In some embodiments, the compare circuitry includes output logic configured to generate the comparison result based on the subtraction result. In various embodiments, using at least a portion of the same circuitry (e.g., the subtractor) for both integer and floating-point comparisons may reduce processor area.
Type: Grant
Filed: June 13, 2016
Date of Patent: December 19, 2017
Assignee: Apple Inc.
Inventors: Liang-Kai Wang, Terence M. Potter, Andrew M. Havlir
-
Patent number: 9710318
Abstract: An electronic circuit includes a microcontroller processor (410), a peripheral (420) coupled with the processor, an endian circuit (470) coupled with the processor and the peripheral to selectively provide different endianess modes of operation, and a detection circuit (140) to detect a failure to select a given endianess, whereby inadvertent switch of endianess due to faults is avoided. Other circuits, devices, systems, methods of operation and processes of manufacture are also disclosed.
Type: Grant
Filed: January 22, 2015
Date of Patent: July 18, 2017
Assignee: Texas Instruments Incorporated
Inventors: Yanyang Xiao, Alexandre Pierre Palus, Karl Friedrich Greb, Kevin Patrick Lavery, Paul Krause
-
Patent number: 9678716
Abstract: An apparatus comprises processing circuitry for performing an absolute difference operation for generating an absolute difference value in response to a first operand and a second operand. The processing circuitry supports variable data element sizes for data elements of the first and second operands and the absolute difference value. Each data element of the absolute difference value represents an absolute difference between corresponding data elements of the first and second operands. The processing circuitry has an adding stage for performing at least one addition to generate at least one intermediate value and an inverting stage for inverting selected bits of each intermediate value. Control circuitry generates control information based on the current data element size and status information generated in the adding stage, to identify the selected bits to be inverted in the inverting stage to convert each intermediate value into a corresponding portion of the absolute difference value.
Type: Grant
Filed: December 22, 2014
Date of Patent: June 13, 2017
Assignee: ARM Limited
Inventors: Neil Burgess, David Raymond Lutz
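One classic way to get an absolute difference from a single addition plus a conditional bit inversion, offered as a plausible reading of the abstract rather than the exact patented datapath: compute s = a + ~b in n-bit arithmetic; if the addition carries out (a > b), then |a−b| = s + 1, otherwise |a−b| is simply the bitwise inverse of s.

```python
def abs_diff(a, b, bits=8):
    # s = a + ~b equals a - b - 1 modulo 2**bits; the carry-out tells which
    # operand was larger, and the a <= b case needs only a bit inversion of
    # the intermediate value rather than a second full-width addition.
    mask = (1 << bits) - 1
    raw = a + (~b & mask)          # the single addition
    s = raw & mask
    carry = raw >> bits
    return (s + 1) & mask if carry else (~s) & mask

assert all(abs_diff(a, b) == abs(a - b) for a in range(64) for b in range(64))
```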
-
Patent number: 9602302
Abstract: A method for disabling or removing a Legacy loop detect circuit to eliminate the circuit erroneously detecting a legacy loop during an IEEE-1394 serial bus initialization. The method includes providing a programmable code to the Legacy loop detect circuit for increasing a reset count to a value greater than three (3), thus reducing the probability of an erroneous disconnect of a Beta node connection. This method provides for more robust Beta loop node operation during high frequency bus resets.
Type: Grant
Filed: June 17, 2014
Date of Patent: March 21, 2017
Assignee: DAP Holding B.U.
Inventor: Richard Mourn
-
Patent number: 9477628
Abstract: A collective communication apparatus and method for parallel computing systems. For example, one embodiment of an apparatus comprises a plurality of processor elements (PEs); collective interconnect logic to dynamically form a virtual collective interconnect (VCI) between the PEs at runtime without global communication among all of the PEs, the VCI defining a logical topology between the PEs in which each PE is directly communicatively coupled to only a subset of the remaining PEs; and execution logic to execute collective operations across the PEs, wherein one or more of the PEs receive first results from a first portion of the subset of the remaining PEs, perform a portion of the collective operations, and provide second results to a second portion of the subset of the remaining PEs.
Type: Grant
Filed: September 28, 2013
Date of Patent: October 25, 2016
Assignee: Intel Corporation
Inventors: Allan D. Knies, David Pardo Keppel, Dong Hyuk Woo, Joshua B. Fryman
-
Patent number: 9411726
Abstract: An embodiment includes a system, comprising a first memory; a plurality of first circuits, wherein each first circuit is coupled to the memory and includes a second circuit configured to generate a first output value in response to an input value received from the first memory, and an accumulator configured to receive the first output value and generate a second output value; and a controller coupled to the memory and the first circuits, and configured to determine the input values to be transmitted from the memory to the first circuits.
Type: Grant
Filed: May 14, 2015
Date of Patent: August 9, 2016
Assignee: SAMSUNG ELECTRONICS CO., LTD.
Inventors: Ilia Ovsiannikov, Zhengping Ji, Yibing Michelle Wang, Hongyu Wang
-
Patent number: 9311273
Abstract: A segment including a set of blocks necessary to calculate blocks having internal states and blocks having no outputs is extracted by tracing from blocks for use in calculating inputs into the blocks having internal states and from the blocks having no outputs in the reverse direction of dependence. To newly extract segments in which blocks contained in the extracted segments are removed, a set of nodes to be temporarily removed is determined on the basis of parallelism. Segments executable independently of other segments are extracted by tracing from nodes whose child nodes are lost by removal of the nodes in the upstream direction. Segments are divided into upstream segments representing the newly extracted segments and downstream segments representing nodes temporarily removed. Upstream and downstream segments are merged so as to reduce overlapping blocks between segments such that the number of segments is reduced to the number of parallel executions.
Type: Grant
Filed: July 26, 2013
Date of Patent: April 12, 2016
Assignee: International Business Machines Corporation
Inventors: Shuhichi Shimizu, Takeo Yoshizawa
-
Patent number: 9218317
Abstract: A segment including a set of blocks necessary to calculate blocks having internal states and blocks having no outputs is extracted by tracing from blocks for use in calculating inputs into the blocks having internal states and from the blocks having no outputs in the reverse direction of dependence. To newly extract segments in which blocks contained in the extracted segments are removed, a set of nodes to be temporarily removed is determined on the basis of parallelism. Segments executable independently of other segments are extracted by tracing from nodes whose child nodes are lost by removal of the nodes in the upstream direction. Segments are divided into upstream segments representing the newly extracted segments and downstream segments representing nodes temporarily removed. Upstream and downstream segments are merged so as to reduce overlapping blocks between segments such that the number of segments is reduced to the number of parallel executions.
Type: Grant
Filed: August 21, 2013
Date of Patent: December 22, 2015
Assignee: International Business Machines Corporation
Inventors: Shuhichi Shimizu, Takeo Yoshizawa
-
Patent number: 9207941
Abstract: Systems, methods, and apparatuses for calculating a square of a data value of a first source operand, a square of a data value of a second source operand, and a multiplication of the data of the first and second operands only using one multiplication are described.
Type: Grant
Filed: March 15, 2013
Date of Patent: December 8, 2015
Assignee: Intel Corporation
Inventors: Ilya Albrekht, Elmoustapha Ould-Ahmed-Vall
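One well-known way to obtain a², b², and a·b from a single wide multiplication is to pack the operands with enough spacing that the three terms of the square land in disjoint bit-fields. This is offered purely as an illustration of the idea, not as the patent's actual method, and the packing parameters are assumptions.

```python
def squares_and_product(a, b, k=16):
    # Assumes 0 <= a, b < 2**k.  With spacing n = 2k + 1, the square of
    # x = a*2**n + b is a^2 * 2**(2n) + 2ab * 2**n + b^2 with no field overlap.
    n = 2 * k + 1
    x = (a << n) | b
    sq = x * x                         # the single multiplication (a squaring)
    bb = sq & ((1 << n) - 1)           # low field:    b^2
    ab2 = (sq >> n) & ((1 << n) - 1)   # middle field: 2ab
    aa = sq >> (2 * n)                 # high field:   a^2
    return aa, ab2 >> 1, bb

assert squares_and_product(300, 77) == (300 * 300, 300 * 77, 77 * 77)
```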
-
Patent number: 9104510
Abstract: Arithmetic units and methods for floating point processing are provided. In exemplary embodiments, data paths to and from multiple multipliers and adders are flexibly combined through crossbars and alignment units to allow a wide range of mathematical operations, including affine and SIMD operations. The micro-architecture for a high-performance flexible vector floating point arithmetic unit is provided, which can perform a single-cycle throughput complex multiply-and-accumulate operation, as well as a Fast Fourier Transform (radix-2 decimation-in-time) Butterfly operation.
Type: Grant
Filed: April 30, 2010
Date of Patent: August 11, 2015
Assignee: Audience, Inc.
Inventors: Leonardo Rub, Dana Massie, Samuel Dicker
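For reference, the radix-2 decimation-in-time butterfly mentioned in the abstract amounts to one complex multiply and two complex additions; a tiny sketch using Python's built-in complex type (purely illustrative, not the unit's datapath):

```python
def dit_butterfly(a, b, w):
    # Radix-2 decimation-in-time FFT butterfly: one complex multiply by the
    # twiddle factor w, followed by an add and a subtract.
    t = w * b
    return a + t, a - t

top, bottom = dit_butterfly(1 + 2j, 3 - 1j, 0 - 1j)   # w is a twiddle factor
print(top, bottom)
```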
-
Patent number: 8793299
Abstract: A method and apparatus for including in a processor instructions for performing multiply-add operations on packed data. In one embodiment, a processor is coupled to a memory. The memory has stored therein a first packed data and a second packed data. The processor performs operations on data elements in said first packed data and said second packed data to generate a third packed data in response to receiving an instruction. At least two of the data elements in this third packed data storing the result of performing multiply-add operations on data elements in the first and second packed data.
Type: Grant
Filed: March 13, 2013
Date of Patent: July 29, 2014
Assignee: Intel Corporation
Inventors: Alexander Peleg, Millind Mittal, Larry M. Mennemeier, Benny Eitan, Carole Dulong, Eiichi Kowashi, Wolf C. Witt
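The flavour of the packed multiply-add described here and in the related patents below (adjacent products of two packed operands summed pairwise into wider results) can be sketched in plain Python; this is a behavioural illustration, not the instruction's exact element widths or saturation rules.

```python
def packed_multiply_add(a, b):
    # Multiply corresponding elements, then add each adjacent pair of
    # products, producing half as many (wider) results.
    products = [x * y for x, y in zip(a, b)]
    return [products[i] + products[i + 1] for i in range(0, len(products), 2)]

assert packed_multiply_add([1, 2, 3, 4], [5, 6, 7, 8]) == [1*5 + 2*6, 3*7 + 4*8]
```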
-
Patent number: 8756267
Abstract: According to some embodiments, a device is configured to perform a dual multiply-accumulate operation. In one embodiment, the device includes a functional unit configured to calculate, in parallel, a first multiplication product of a first coefficient and a first sample; and a second multiplication product of the first coefficient and a second sample. The first sample is an (n)th sample and the second sample is an (n+2)th sample in a plurality of sequential samples. The functional unit outputs and stores the first multiplication product and the second multiplication product in different storage locations in at least one storage device.
Type: Grant
Filed: October 31, 2011
Date of Patent: June 17, 2014
Assignee: Marvell International Ltd.
Inventors: Bradley Aldrich, Nigel C. Paver, William T. Maghielse
-
Patent number: 8745119
Abstract: A method and apparatus for including in a processor instructions for performing multiply-add operations on packed data. In one embodiment, a processor is coupled to a memory. The memory has stored therein a first packed data and a second packed data. The processor performs operations on data elements in said first packed data and said second packed data to generate a third packed data in response to receiving an instruction. At least two of the data elements in this third packed data storing the result of performing multiply-add operations on data elements in the first and second packed data.
Type: Grant
Filed: March 13, 2013
Date of Patent: June 3, 2014
Assignee: Intel Corporation
Inventors: Alexander Peleg, Millind Mittal, Larry M. Mennemeier, Benny Eitan, Carole Dulong, Eiichi Kowashi, Wolf C. Witt
-
Patent number: 8725787
Abstract: A method and apparatus for including in a processor instructions for performing multiply-add operations on packed data. In one embodiment, a processor is coupled to a memory. The memory has stored therein a first packed data and a second packed data. The processor performs operations on data elements in said first packed data and said second packed data to generate a third packed data in response to receiving an instruction. At least two of the data elements in this third packed data storing the result of performing multiply-add operations on data elements in the first and second packed data.
Type: Grant
Filed: April 26, 2012
Date of Patent: May 13, 2014
Assignee: Intel Corporation
Inventors: Alexander D. Peleg, Millind Mittal, Larry M. Mennemeier, Benny Eitan, Carole Dulong, Eiichi Kowashi, Wolf Witt
-
Publication number: 20140095571
Abstract: A processor or other device, such as a programmable and/or massively parallel processor or other device, includes processing elements designed to perform arithmetic operations (possibly but not necessarily including, for example, one or more of addition, multiplication, subtraction, and division) on numerical values of low precision but high dynamic range (“LPHDR arithmetic”). Such a processor or other device may, for example, be implemented on a single chip. Whether or not implemented on a single chip, the number of LPHDR arithmetic elements in the processor or other device in certain embodiments of the present invention significantly exceeds (e.g., by at least 20 more than three times) the number of arithmetic elements, if any, in the processor or other device which are designed to perform high dynamic range arithmetic of traditional precision (such as 32 bit or 64 bit floating point arithmetic).
Type: Application
Filed: March 25, 2013
Publication date: April 3, 2014
Inventor: Joseph Bates
-
Patent number: 8687008
Abstract: A latency tolerant system for executing video processing operations. The system includes a host interface for implementing communication between the video processor and a host CPU, a scalar execution unit coupled to the host interface and configured to execute scalar video processing operations, and a vector execution unit coupled to the host interface and configured to execute vector video processing operations. A command FIFO is included for enabling the vector execution unit to operate on a demand driven basis by accessing the memory command FIFO. A memory interface is included for implementing communication between the video processor and a frame buffer memory. A DMA engine is built into the memory interface for implementing DMA transfers between a plurality of different memory locations and for loading the command FIFO with data and instructions for the vector execution unit.
Type: Grant
Filed: November 4, 2005
Date of Patent: April 1, 2014
Assignee: NVIDIA Corporation
Inventors: Ashish Karandikar, Shirish Gadre, Stephen D. Lew
-
Patent number: 8626814
Abstract: A method and apparatus for including in a processor instructions for performing multiply-add operations on packed data. In one embodiment, a processor is coupled to a memory. The memory has stored therein a first packed data and a second packed data. The processor performs operations on data elements in said first packed data and said second packed data to generate a third packed data in response to receiving an instruction. At least two of the data elements in this third packed data storing the result of performing multiply-add operations on data elements in the first and second packed data.
Type: Grant
Filed: July 1, 2011
Date of Patent: January 7, 2014
Assignee: Intel Corporation
Inventors: Alexander Peleg, Millind Mittal, Larry M. Mennemeier, Benny Eitan, Carole Dulong, Eiichi Kowashi, Wolf C. Witt
-
Patent number: 8626815
Abstract: In a matrix multiplication in which each element of the resultant matrix is the dot product of a row of a first matrix and a column of a second matrix, each row and column can be broken into manageable blocks, with each block loaded in turn to compute a smaller dot product, and then the results can be added together to obtain the desired row-column dot product. The earliest results for each dot product are saved for a number of clock cycles equal to the number of portions into which each row or column is divided. The results are then added to provide an element of the resultant matrix. To avoid repeated loading and unloading of the same data, all multiplications involving a particular row-block can be performed upon loading that row-block, with the results cached until other multiplications for the resultant elements that use the cached results are complete.
Type: Grant
Filed: March 3, 2009
Date of Patent: January 7, 2014
Assignee: Altera Corporation
Inventor: Martin Langhammer
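The row-column blocking being described can be illustrated in a few lines of NumPy: each dot product over the shared dimension is computed block by block and the partial results are accumulated. The block size and function name are illustrative assumptions, not parameters from the patent.

```python
import numpy as np

def blocked_matmul(A, B, block=4):
    # C[i, j] is the dot product of row i of A and column j of B, built up
    # as a sum of smaller block-sized dot products over the shared dimension.
    m, k = A.shape
    _, n = B.shape
    C = np.zeros((m, n))
    for start in range(0, k, block):
        end = min(start + block, k)
        C += A[:, start:end] @ B[start:end, :]   # accumulate the partial dot products
    return C

A, B = np.random.rand(6, 10), np.random.rand(10, 5)
assert np.allclose(blocked_matmul(A, B), A @ B)
```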
-
Patent number: 8620985
Abstract: A method is disclosed that includes computing, using at least one uniformly fine-grain data parallel computing unit, a mean-square error regression within a regression clustering algorithm. The mean-square error regression is represented in the form of at least one summation of a vector-vector multiplication. A computer program product and a computer system are also disclosed.
Type: Grant
Filed: October 14, 2010
Date of Patent: December 31, 2013
Assignee: Hewlett-Packard Development Company, L.P.
Inventors: Bin Zhang, Ren Wu, Meichun Hsu
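The mean-square error of a linear regression can indeed be written entirely in terms of vector-vector (dot) products, the form that maps naturally onto data-parallel hardware; a short NumPy sketch (names are illustrative, not from the patent):

```python
import numpy as np

def mean_square_error(X, w, y):
    # The residual r = Xw - y is a set of dot products (one per row of X),
    # and the MSE is the dot product of the residual with itself, divided by n.
    r = X @ w - y
    return (r @ r) / len(y)

X, w, y = np.random.rand(100, 3), np.random.rand(3), np.random.rand(100)
assert np.isclose(mean_square_error(X, w, y), np.mean((X @ w - y) ** 2))
```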
-
Patent number: 8619312
Abstract: An image processing apparatus includes: a receiving unit that receives job data of plural pages; plural RIP processors that interpret and expand the job data into raster images; and an allocating unit that allocates the plural pages of the job data to the plural RIP processors for RIP processing, the allocating unit dividing the job data based on a predetermined data size regardless of page breaks, allocating job data that is to be RIP processed to the plural RIP processors, sending job data, corresponding to and after the pages of the data that is to be RIP processed, to the plural RIP processors, and when a head part of the job data that is to be RIP processed allocated by the allocating unit is in the middle of a page, the plural RIP processors RIP processing the job data from the beginning of the next page.
Type: Grant
Filed: September 9, 2010
Date of Patent: December 31, 2013
Assignee: Fuji Xerox Co., Ltd.
Inventor: Takuya Mizuguchi
-
Patent number: 8615541
Abstract: The invention set forth herein describes a mechanism for efficiently performing extended precision operations on multi-word source operands. Corresponding data words of the source operands are processed together via each instruction of a cascading sequence of instructions. State information generated when each instruction is processed is stored in condition code flags. The state information is optionally used in the processing of subsequent instructions in the sequence and/or accumulated with previously set state information.
Type: Grant
Filed: September 23, 2010
Date of Patent: December 24, 2013
Assignee: NVIDIA Corporation
Inventors: Richard Craig Johnson, John R. Nickolls
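The software analogue of such a cascading sequence is a multi-word addition in which the carry produced by each word-sized add feeds the next, mirroring chained add-with-carry instructions driven by condition-code flags. A minimal sketch (word size and names are assumptions for illustration):

```python
def add_multiword(a_words, b_words, word_bits=32):
    # Words are least-significant first; each word-sized addition produces
    # a carry that is folded into the next word's addition.
    mask = (1 << word_bits) - 1
    carry, out = 0, []
    for a, b in zip(a_words, b_words):
        s = a + b + carry
        out.append(s & mask)
        carry = s >> word_bits
    return out, carry

# A 64-bit add done as two chained 32-bit adds:
words, carry = add_multiword([0xFFFFFFFF, 0x1], [0x1, 0x2])
assert words == [0x0, 0x4] and carry == 0
```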
-
Patent number: 8589465
Abstract: Digital signal processing (“DSP”) block circuitry on an integrated circuit (“IC”) is adapted for use (e.g., in multiple instances of the DSP block circuitry on the IC) for implementing finite-impulse-response (“FIR”) digital filters in systolic form. Each DSP block may include (1) first and second multiplier circuitry and (2) adder circuitry for adding (a) outputs of the multipliers and (b) signals chained in from a first other instance of the DSP block circuitry. Systolic delay circuitry is provided for either the outputs of the first multiplier (upstream from the adder) or at least one of the sets of inputs to the first multiplier. Additional systolic delay circuitry is provided for outputs of the adder, which are chained out to a second other instance of the DSP block circuitry.
Type: Grant
Filed: May 8, 2013
Date of Patent: November 19, 2013
Assignee: Altera Corporation
Inventors: Suleyman Sirri Demirsoy, Hyun Yi
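A behavioural sketch of a systolic-form FIR may help fix ideas: each tap registers its data (here two data delays per tap) and its partial sum (one delay per tap), and the chained sums emerge as the ordinary convolution after a pipeline latency. This is a generic software model of the systolic FIR structure, not the DSP-block circuitry itself.

```python
def systolic_fir(x, h):
    # Two data registers per tap and one partial-sum register per tap keep
    # samples and partial sums aligned as they march through the chain; the
    # first len(h)-1 outputs are pipeline fill, then y[n] = sum_k h[k]*x[n-k].
    taps = len(h)
    data_chain = [0] * (2 * taps)               # systolic data delays
    sum_chain = [0] * taps                      # chained adder registers
    out = []
    for sample in list(x) + [0] * (2 * taps):   # trailing zeros flush the pipeline
        data_chain = [sample] + data_chain[:-1]
        sum_chain = [
            (sum_chain[k - 1] if k else 0) + h[k] * data_chain[2 * k]
            for k in range(taps)
        ]
        out.append(sum_chain[-1])
    return out[taps - 1:]                       # drop the pipeline-fill outputs

# Matches a direct convolution of x with h (plus trailing flush values):
print(systolic_fir([1, 2, 3, 4], [1, 1, 1]))    # [1, 3, 6, 9, 7, 4, 0, 0]
```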
-
Patent number: 8589464
Abstract: An arithmetic logic unit is provided. The arithmetic logic unit preferably includes a minimum of routing delays. An arithmetic logic unit according to the invention preferably receives a plurality of operands from a plurality of operand registers, performs an arithmetic operation on the operands, obtains a result of the arithmetic operation, and transmits the result to a result register. The arithmetic logic unit includes a signal propagation path that includes no greater than two routing paths that connect non-immediately adjacent logic elements.
Type: Grant
Filed: May 8, 2008
Date of Patent: November 19, 2013
Assignee: Altera Corporation
Inventor: Paul J. Metzgen