Patents Assigned to KALRAY
  • Patent number: 11604646
    Abstract: A method of processing data by a processor, the method comprising the steps of: receiving, by the processor, an instruction including an operator code associated with three register references designating registers configured to contain pairs of multiplication operands, an addition operand, and a result register configured to receive an operator result, the operator code designating an operator configured to compute products of the pairs of multiplication operands and add the products with the addition operand; decoding the instruction by an instruction decoder of the processor, to determine the operator to be executed, and the registers containing the operands to be supplied to the operator and the result of the operator; actuating the operator by an arithmetic circuit of the processor, consuming the operands in the registers designated by the register references; and storing the result of the operator in the designated result register.
    Type: Grant
    Filed: December 29, 2021
    Date of Patent: March 14, 2023
    Assignee: Kalray
    Inventor: Benoit Dupont de Dinechin
  • Patent number: 11550544
    Abstract: A fused multiply-add hardware operator comprising a multiplier receiving two multiplicands as floating-point numbers encoded in a first precision format; an alignment circuit associated with the multiplier configured to convert the result of the multiplication into a first fixed-point number; and an adder configured to add the first fixed-point number and an addition operand. The addition operand is a floating-point number encoded in a second precision format, and the operator comprises an alignment circuit associated with the addition operand, configured to convert the addition operand into a second fixed-point number of reduced dynamic range relative to the dynamic range of the addition operand, having a number of bits equal to the number of bits of the first fixed-point number, extended on both sides by at least the size of the mantissa of the addition operand; the adder configured to add the first and second fixed-point numbers without loss.
    Type: Grant
    Filed: June 25, 2020
    Date of Patent: January 10, 2023
    Assignee: Kalray
    Inventor: Nicolas Brunie
  • Patent number: 11489544
    Abstract: A circuit for generating an N-bit cyclic redundancy code of a k-bit digit d, the code based on a reconfigurable generator polynomial P of degree N, the circuit including a dynamic table comprising a multiplication sub-table storing products resulting from multiplication by the polynomial P of each element definable over k bits, in the order of the scalar values of the k-bit elements; a division sub-table storing quotients resulting from Euclidean division by the polynomial P of each k-bit element shifted by N bits to the left, in the order of the scalar values of the k-bit elements; and a group of first multiplexers, each multiplexer connected to be indexed by a respective cell of the division table to transmit the contents of a corresponding cell of the multiplication table to an output of the dynamic table, of same rank as the respective cell of the division table.
    Type: Grant
    Filed: March 30, 2021
    Date of Patent: November 1, 2022
    Assignee: Kalray
    Inventor: Nicolas Brunie
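The table construction in the abstract of patent 11489544 can be illustrated with a minimal Python sketch. This is not the patented circuit: polynomials over GF(2) are simply held as integers, and the helper names `poly_mul`, `poly_divmod`, and `build_crc_table`, as well as the example polynomial 0x107, are assumptions chosen for illustration. The multiplication sub-table holds e·P, the division sub-table holds the quotient of (e << N) by P, and the CRC of e is read from the multiplication entry selected by the division entry.

```python
def poly_mul(a: int, b: int) -> int:
    """Carry-less (GF(2)) product of two polynomials held as integers."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def poly_divmod(num: int, den: int) -> tuple[int, int]:
    """Euclidean division over GF(2); returns (quotient, remainder)."""
    quotient = 0
    den_deg = den.bit_length() - 1
    while num.bit_length() - 1 >= den_deg:
        shift = (num.bit_length() - 1) - den_deg
        quotient |= 1 << shift
        num ^= den << shift
    return quotient, num


def build_crc_table(poly: int, n: int, k: int) -> list[int]:
    """CRC of every k-bit element for a degree-n generator polynomial."""
    size = 1 << k
    # Multiplication sub-table: e * P, in order of the scalar value of e.
    mult = [poly_mul(e, poly) for e in range(size)]
    # Division sub-table: quotient of (e << n) divided by P.
    div = [poly_divmod(e << n, poly)[0] for e in range(size)]
    # (e << n) = q*P xor r and the low n bits of (e << n) are zero,
    # so the remainder r is the low n bits of q*P.
    mask = (1 << n) - 1
    return [mult[div[e]] & mask for e in range(size)]
```

For example, `build_crc_table(0x107, 8, 8)` builds a 256-entry table for the polynomial x^8 + x^2 + x + 1. Changing the polynomial only requires recomputing the two sub-tables.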
  • Patent number: 11294627
    Abstract: The disclosure relates to a hardware operator for dot-product computation, comprising a plurality of multipliers each receiving two multiplicands in the form of floating-point numbers encoded in a first precision format; an alignment circuit associated with each multiplier, configured to, based on the exponents of the corresponding multiplicands, convert the result of the multiplication into a respective fixed-point number having a sufficient number of bits to cover the full dynamic range of the multiplication; and a multi-adder configured to add without loss the fixed-point numbers provided by the multipliers, providing a sum in the form of a fixed-point number.
    Type: Grant
    Filed: June 25, 2020
    Date of Patent: April 5, 2022
    Assignee: Kalray
    Inventor: Nicolas Brunie
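The idea of converting each product to a fixed-point number wide enough for the full dynamic range, then adding without loss, can be sketched in Python. This is an assumption-laden software analogy for patent 11294627, not the hardware operator: it uses binary64 inputs and Python's unbounded integers instead of a fixed-width datapath, and the names `split`, `product_fixed`, and `exact_dot` are hypothetical.

```python
from fractions import Fraction

# Every finite binary64 value is an integer multiple of 2**-1074, so every
# product of two such values is an integer multiple of 2**-2148.
LSB_EXP = -2148


def split(x: float) -> tuple[int, int]:
    """Return (m, e) with x == m * 2**e exactly and -1074 <= e <= 0."""
    num, den = x.as_integer_ratio()  # den is always a power of two
    return num, -(den.bit_length() - 1)


def product_fixed(a: float, b: float) -> int:
    """Exact product of a and b, expressed in units of 2**LSB_EXP."""
    ma, ea = split(a)
    mb, eb = split(b)
    # Alignment step: shift the integer product according to the exponents.
    return (ma * mb) << (ea + eb - LSB_EXP)


def exact_dot(xs, ys) -> Fraction:
    """Dot product of finite floats, accumulated with no rounding at all."""
    total = sum(product_fixed(a, b) for a, b in zip(xs, ys))
    return Fraction(total, 1 << (-LSB_EXP))
```

With inputs such as `exact_dot([1e16, 1.0, -1e16], [1.0, 1.0, 1.0])`, the result is exactly 1, because no intermediate sum is ever rounded. Rounding to a floating-point format would happen once, after the final sum.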
  • Patent number: 11169808
    Abstract: The disclosure relates to a processor including an N-bit data bus configured to access a memory; a central processing unit (CPU) connected to the data bus; a coprocessor coupled to the CPU, including a register file with N-bit registers; an instruction processing unit in the CPU, configured to, in response to a load-scatter machine instruction received by the CPU, read-access a memory address and delegate to the coprocessor the processing of the corresponding N-bit word presented on the data bus; and a register control unit in the coprocessor, configured by the CPU in response to the load-scatter instruction, to divide the word presented on the data bus into K segments and write the K segments at the same position in K respective registers, the position and the registers being designated by the load-scatter instruction.
    Type: Grant
    Filed: December 20, 2019
    Date of Patent: November 9, 2021
    Assignee: Kalray
    Inventors: Benoit Dupont de Dinechin, Julien Le Maire, Nicolas Brunie
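The load-scatter behavior described in the abstract of patent 11169808 can be modeled in a few lines of Python. This is only a sketch under assumptions the abstract does not state: registers are modeled as a list of integers, the K target registers are taken to be consecutive starting at `first_reg`, and segment 0 is the least significant part of the word. The function name `load_scatter` and its parameters are hypothetical.

```python
def load_scatter(regs: list[int], word: int, n_bits: int, k: int,
                 first_reg: int, position: int) -> None:
    """Split an n_bits-wide word into k segments and write segment i at the
    same segment position of register first_reg + i (in place)."""
    seg_bits = n_bits // k
    seg_mask = (1 << seg_bits) - 1
    shift = position * seg_bits
    for i in range(k):
        segment = (word >> (i * seg_bits)) & seg_mask
        r = first_reg + i
        # Clear the destination slot, then insert the segment.
        regs[r] = (regs[r] & ~(seg_mask << shift)) | (segment << shift)
```

Issuing the operation K times with positions 0 to K-1 on K successive words leaves the registers holding the transpose of the K×K segment matrix, which is one plausible use of such an instruction.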
  • Patent number: 11144480
    Abstract: The invention relates to a method for updating a variable shared between multiple processor cores. The following steps are implemented during execution in one of the cores of a local scope atomic read-modify-write instruction (AFA), having a memory address (a1) of the shared variable as a parameter: performing operations of the atomic instruction in a cache line (L(a1)) allocated to the memory address; and locally locking the cache line (LCK) while authorizing access to the shared variable by cores connected to another cache memory of same level during execution of the local scope atomic instruction.
    Type: Grant
    Filed: March 7, 2017
    Date of Patent: October 12, 2021
    Assignee: KALRAY
    Inventors: Benoit Dupont De Dinechin, Marta Rybczynska, Vincent Ray
  • Patent number: 10915488
    Abstract: An inter-processor synchronization method using point-to-point links comprises the steps of defining a point-to-point synchronization channel between a source processor and a target processor; executing in the source processor a wait command expecting a notification associated with the synchronization channel, wherein the wait command is designed to stop the source processor until the notification is received; executing in the target processor a notification command designed to transmit through the point-to-point link the notification expected by the source processor; executing in the target processor a wait command expecting a notification associated with the synchronization channel, wherein the wait command is designed to stop the target processor until the notification is received; and executing in the source processor a notification command designed to transmit through the point-to-point link the notification expected by the target processor.
    Type: Grant
    Filed: May 19, 2015
    Date of Patent: February 9, 2021
    Assignee: KALRAY
    Inventors: Benoît Dupont De Dinechin, Vincent Ray
  • Patent number: 10484514
    Abstract: The invention relates to a method of processing data frames arriving on a network interface, comprising the following steps implemented in the network interface: storing a set of target positions (tgtPOS), namely the positions in a frame at which at least one parameter characterizing a subframe (ETH_TYPE) and parameters (SRC_IP, DST_IP) characterizing a client-server session are expected; storing an expected value (xpVAL) for the subframe parameter; receiving a current frame and comparing the value (xtVAL) received at the position of the subframe parameter to the expected value; if equal, calculating an index (IDX) from the values received at the positions of the session parameters; and routing the current frame to a processing resource associated with the index.
    Type: Grant
    Filed: November 4, 2015
    Date of Patent: November 19, 2019
    Assignee: KALRAY
    Inventors: Patrice Couvert, Marta Rybczynska, Siméon Marijon, Yann Kalemkarian, Benoît Ganne, Alexandre Blampey
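The compare-then-index steps in the abstract of patent 10484514 can be sketched as follows. The concrete choices here are assumptions, not taken from the patent: the target positions are the usual offsets of an untagged Ethernet II frame carrying IPv4, the expected value is the IPv4 EtherType 0x0800, and the index is an XOR of the two addresses reduced modulo the number of processing resources. The function name `dispatch_index` is hypothetical.

```python
ETH_TYPE_POS = 12           # EtherType offset in an untagged Ethernet II frame
SRC_IP_POS = 26             # 14-byte Ethernet header + 12 bytes into IPv4 header
DST_IP_POS = 30             # 14-byte Ethernet header + 16 bytes into IPv4 header
EXPECTED_ETH_TYPE = 0x0800  # IPv4


def dispatch_index(frame: bytes, n_resources: int):
    """Return the index of the processing resource for this frame, or None
    when the frame is too short or the subframe parameter does not match."""
    if len(frame) < DST_IP_POS + 4:
        return None
    received = int.from_bytes(frame[ETH_TYPE_POS:ETH_TYPE_POS + 2], "big")
    if received != EXPECTED_ETH_TYPE:
        return None
    src = int.from_bytes(frame[SRC_IP_POS:SRC_IP_POS + 4], "big")
    dst = int.from_bytes(frame[DST_IP_POS:DST_IP_POS + 4], "big")
    # Assumed index function: symmetric, so both directions of a session
    # land on the same processing resource.
    return (src ^ dst) % n_resources
```

The XOR is only one possible index function. It was picked for this sketch because swapping source and destination yields the same index.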
  • Patent number: 10250697
    Abstract: A token bucket flow rate limiter is provided for a data transmission, comprising a token counter configured to be incremented at a rate determining the average flow rate of the transmission; a frequency divider connected to control incrementing of the token counter from a clock, the divider having an integer division factor; and a modulator configured to alternate the division factor between two different integers so as to make the resulting average flow rate tend to a programmed flow rate comprised between two boundary flow rates respectively corresponding to the two integers.
    Type: Grant
    Filed: November 18, 2016
    Date of Patent: April 2, 2019
    Assignee: KALRAY
    Inventors: Duco Van Amstel, Alexandre Blampey, Benoit Dupont De Dinechin
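The effect of alternating an integer division factor between two values, as described in the abstract of patent 10250697, can be sketched in Python. The modulation scheme below (a fractional accumulator that selects the larger factor on overflow) is an assumption made for illustration; the abstract only states that the factor alternates between two integers. The names `division_factors` and `tokens_after` are hypothetical, and the target period is assumed to be at least one clock cycle per token.

```python
import math
from fractions import Fraction


def division_factors(period: Fraction):
    """Yield integer division factors floor(period) or floor(period) + 1
    whose long-run average equals `period` (clock cycles per token)."""
    low = math.floor(period)
    frac = period - low          # fractional part, in [0, 1)
    acc = Fraction(0)
    while True:
        acc += frac
        if acc >= 1:
            acc -= 1
            yield low + 1        # occasionally stretch the interval
        else:
            yield low


def tokens_after(period: Fraction, cycles: int) -> int:
    """Number of token-counter increments within `cycles` clock cycles."""
    tokens = 0
    elapsed = 0
    for factor in division_factors(period):
        elapsed += factor
        if elapsed > cycles:
            return tokens
        tokens += 1
```

With a target period of 5/2 cycles per token, the factors alternate 2, 3, 2, 3, so the average rate sits between the two boundary rates of 1/2 and 1/3 token per cycle.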
  • Patent number: 10175989
    Abstract: A processor including multiple processing units for processing multiple elementary instructions in parallel, the elementary instructions including one or more syllables, each having a rank in the elementary instruction, and an input circuit configured to receive an instruction bundle including multiple elementary instructions, and to transmit to the processing units all syllables of first rank of the elementary instructions of the instruction bundle before syllables of second rank of the elementary instructions of the instruction bundle, the syllables of same rank being ordered according to the target processing unit of each syllable.
    Type: Grant
    Filed: April 27, 2015
    Date of Patent: January 8, 2019
    Assignee: KALRAY
    Inventors: Renaud Ayrignac, Vincent Ray, Benoît Dupont De Dinechin
  • Patent number: 9898251
    Abstract: The invention relates to a processor comprising, in its instruction set, a bit matrix multiplication instruction (sbmm) having a first double precision operand (A) representing a first matrix to multiply, a second operand (B) explicitly designating any two single precision registers whose joint contents represent a second matrix to multiply, and a destination parameter (C) explicitly designating any two single precision registers for jointly containing a matrix representing the result of the multiplication.
    Type: Grant
    Filed: May 19, 2015
    Date of Patent: February 20, 2018
    Assignee: KALRAY
    Inventors: Benoît Dupont De Dinechin, Marta Rybczynska
  • Patent number: 9851977
    Abstract: A method for executing instructions on a single-program, multiple-data processor system having a fixed number of execution lanes, including: scheduling a primary instruction for execution with a first wave of multiple data; assigning the first wave to a corresponding primary subset of the execution lanes; scheduling a secondary instruction having a second wave of multiple data, such that the second wave fits in lanes that are unused by the primary subset of lanes; assigning the second wave to a corresponding secondary subset of the lanes; fetching the primary and secondary instructions; configuring the execution lanes such that the primary subset is responsive to the primary instruction and the secondary subset is simultaneously responsive to the secondary instruction; and simultaneously executing the primary and secondary instructions in the execution lanes.
    Type: Grant
    Filed: December 6, 2012
    Date of Patent: December 26, 2017
    Assignee: KALRAY
    Inventors: Nicolas Brunie, Sylvain Collange
  • Patent number: 9813348
    Abstract: A system for transmitting concurrent data flows on a network includes a memory containing data of data flows; a plurality of queues assigned respectively to the data flows, organized to receive the data as atomic transmission units; a flow regulator to poll the queues in sequence and, if the polled queue contains a full transmission unit, transmit the unit on the network at a nominal flow-rate of the network; a sequencer to poll the queues in a round-robin manner and enable a data request signal when the filling level of the polled queue is below a threshold common to all queues, which threshold is greater than the size of the largest transmission unit; and a direct memory access configured to receive the data request signal and respond thereto by transferring data from the memory to the corresponding queue at a nominal speed of the system, up to the common threshold.
    Type: Grant
    Filed: December 19, 2012
    Date of Patent: November 7, 2017
    Assignee: KALRAY
    Inventors: Yves Durand, Alexandre Blampey
  • Patent number: 9766951
    Abstract: A method for synchronizing multiple processing units comprises the steps of configuring a synchronization register in a target processing unit so that its content is overwritten only by bits that are set in words written in the synchronization register; assigning a distinct bit position of the synchronization register to each processing unit; and executing a program thread in each processing unit. When the program thread of a current processing unit reaches a synchronization point, the method comprises writing in the synchronization register of the target processing unit a word in which the bit position assigned to the current processing unit is set, and suspending the program thread. When all the bits assigned to the processing units are set in the synchronization register, the suspended program threads are resumed.
    Type: Grant
    Filed: May 26, 2015
    Date of Patent: September 19, 2017
    Assignee: KALRAY
    Inventors: Thomas Champseix, Benoît Dupont De Dinechin, Pierre Guironnet De Massas
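The synchronization scheme in the abstract of patent 9766951 can be imitated in software with Python threads. This is a sketch only: the register is an integer updated with a bitwise OR, thread suspension is modeled with `threading.Condition`, and the barrier is single-use (the register is never cleared). The names `SyncRegister`, `arrive`, and `run_demo` are hypothetical.

```python
import threading


class SyncRegister:
    """Register whose content is only changed by bits set in written words."""

    def __init__(self, n_units: int):
        self._full = (1 << n_units) - 1
        self._value = 0
        self._cond = threading.Condition()

    def arrive(self, unit: int) -> None:
        """Set this unit's bit, then block until every unit's bit is set."""
        with self._cond:
            self._value |= 1 << unit          # OR-write: set bits only
            if self._value == self._full:
                self._cond.notify_all()       # resume the suspended threads
            else:
                self._cond.wait_for(lambda: self._value == self._full)


def run_demo(n_units: int = 4) -> list:
    """Run n_units threads through one synchronization point; return a log."""
    reg = SyncRegister(n_units)
    log = []

    def thread_body(unit: int) -> None:
        log.append(("before", unit))
        reg.arrive(unit)
        log.append(("after", unit))

    threads = [threading.Thread(target=thread_body, args=(u,))
               for u in range(n_units)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log
```

In the log returned by `run_demo`, every "before" entry precedes every "after" entry, since no thread passes the synchronization point until all bits are set.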
  • Publication number: 20170255571
    Abstract: The invention relates to a method for updating a variable shared between multiple processor cores. The following steps are implemented during execution in one of the cores of a local scope atomic read-modify-write instruction (AFA), having a memory address (a1) of the shared variable as a parameter: performing operations of the atomic instruction in a cache line (L(a1)) allocated to the memory address; and locally locking the cache line (LCK) while authorizing access to the shared variable by cores connected to another cache memory of same level during execution of the local scope atomic instruction.
    Type: Application
    Filed: March 7, 2017
    Publication date: September 7, 2017
    Applicant: KALRAY
    Inventors: Benoit DUPONT DE DINECHIN, Marta RYBCZYNSKA, Vincent RAY
  • Publication number: 20170192792
    Abstract: A processor including multiple processing units for processing multiple elementary instructions in parallel, the elementary instructions including one or more syllables, each having a rank in the elementary instruction, and an input circuit configured to receive an instruction bundle including multiple elementary instructions, and to transmit to the processing units all syllables of first rank of the elementary instructions of the instruction bundle before syllables of second rank of the elementary instructions of the instruction bundle, the syllables of same rank being ordered according to the target processing unit of each syllable.
    Type: Application
    Filed: April 27, 2015
    Publication date: July 6, 2017
    Applicant: KALRAY
    Inventors: Renaud AYRIGNAC, Vincent RAY, Benoît DUPONT DE DINECHIN
  • Patent number: 9565122
    Abstract: A credit-based data flow control method between a consumer device and a producer device. The method includes the steps of decrementing a credit counter for each transmission of a sequence of data by the producer device, arresting data transmission when the credit counter reaches zero, sending a credit each time the consumer device has consumed a data sequence and incrementing the credit counter upon receipt of each credit.
    Type: Grant
    Filed: October 9, 2012
    Date of Patent: February 7, 2017
    Assignee: KALRAY
    Inventors: Michel Harrand, Yves Durand, Patrice Couvert, Thomas Champseix, Benoît Dupont De Dinechin
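The credit mechanism in the abstract of patent 9565122 lends itself to a short simulation. The Python sketch below is an illustration under assumptions: time advances in discrete steps, the consumer takes one data sequence every `consume_every` steps, and credits return instantly. The function name `simulate` and its parameters are hypothetical.

```python
from collections import deque


def simulate(items, initial_credits: int, consume_every: int = 2):
    """Run a producer and a consumer under credit-based flow control.

    Returns (delivered items in order, peak occupancy of the consumer buffer).
    """
    credits = initial_credits        # credit counter held by the producer
    pending = deque(items)           # data the producer still has to send
    buffer = deque()                 # consumer-side receive buffer
    delivered = []
    max_occupancy = 0
    step = 0
    while pending or buffer:
        step += 1
        # Producer: transmit only while the credit counter is non-zero.
        if pending and credits > 0:
            buffer.append(pending.popleft())
            credits -= 1
        max_occupancy = max(max_occupancy, len(buffer))
        # Consumer: consume a sequence and send one credit back.
        if buffer and step % consume_every == 0:
            delivered.append(buffer.popleft())
            credits += 1
    return delivered, max_occupancy
```

Because data in flight plus free credits always equals `initial_credits`, the consumer buffer can never hold more than that many sequences, whatever the relative speeds of the two devices.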
  • Patent number: 9367287
    Abstract: A circuit for calculating the fused sum of an addend and product of two multiplication operands, the addend and multiplication operands being binary floating-point numbers represented in a standardized format as a mantissa and an exponent is provided. The multiplication operands are in a lower precision format than the addend, with q>2p, where p and q are the mantissa size of the multiplication operand and addend precision formats. The circuit includes a p-bit multiplier receiving the mantissas of the multiplication operands; a shift circuit aligning the mantissa of the addend with the product output by the multiplier based on the exponent values of the addend and multiplication operands; and an adder processing q-bit mantissas, receiving the aligned mantissa of the addend and the product, the input lines of the adder corresponding to the product being completed to the right by lines at 0 to form a q-bit mantissa.
    Type: Grant
    Filed: April 19, 2012
    Date of Patent: June 14, 2016
    Assignee: KALRAY
    Inventors: Florent Dupont De Dinechin, Nicolas Brunie, Benoit Dupont De Dinechin
  • Patent number: 9064092
    Abstract: An integrated circuit comprises compute nodes arranged in an array; a torus topology network-on-chip interconnecting the compute nodes; and a network extension unit at each end of each row or column of the array, inserted in a network link between two compute nodes. The extension unit has a normal mode establishing the continuity of the network link between the two corresponding compute nodes, and an extension mode dividing the network link in two independent segments that are accessible from outside the integrated circuit.
    Type: Grant
    Filed: August 10, 2012
    Date of Patent: June 23, 2015
    Assignee: KALRAY
    Inventor: Michel Harrand
  • Publication number: 20140164737
    Abstract: A method for executing instructions on a single-program, multiple-data processor system having a fixed number of execution lanes, including: scheduling a primary instruction for execution with a first wave of multiple data; assigning the first wave to a corresponding primary subset of the execution lanes; scheduling a secondary instruction having a second wave of multiple data, such that the second wave fits in lanes that are unused by the primary subset of lanes; assigning the second wave to a corresponding secondary subset of the lanes; fetching the primary and secondary instructions; configuring the execution lanes such that the primary subset is responsive to the primary instruction and the secondary subset is simultaneously responsive to the secondary instruction; and simultaneously executing the primary and secondary instructions in the execution lanes.
    Type: Application
    Filed: December 6, 2012
    Publication date: June 12, 2014
    Applicant: KALRAY
    Inventors: Sylvain COLLANGE, Nicolas BRUNIE