Patents by Inventor Georgios Tournavitis

Georgios Tournavitis has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20230020571
    Abstract: An apparatus and method are described for distributed and cooperative computation in artificial neural networks. For example, one embodiment of an apparatus comprises: an input/output (I/O) interface; a plurality of processing units communicatively coupled to the I/O interface to receive data for input neurons and synaptic weights associated with each of the input neurons, each of the plurality of processing units to process at least a portion of the data for the input neurons and synaptic weights to generate partial results; and an interconnect communicatively coupling the plurality of processing units, each of the processing units to share the partial results with one or more other processing units over the interconnect, the other processing units using the partial results to generate additional partial results or final results. The processing units may share data including input neurons and weights over the shared input bus.
    Type: Application
    Filed: September 20, 2022
    Publication date: January 19, 2023
    Inventors: Frederico C. PRATAS, Ayose J. FALCON, Marc LUPON, Fernando LATORRE, Pedro LOPEZ, Enric HERRERO ABELLANAS, Georgios TOURNAVITIS
  • Publication number: 20210326405
    Abstract: An apparatus and method are described for distributed and cooperative computation in artificial neural networks. For example, one embodiment of an apparatus comprises: an input/output (I/O) interface; a plurality of processing units communicatively coupled to the I/O interface to receive data for input neurons and synaptic weights associated with each of the input neurons, each of the plurality of processing units to process at least a portion of the data for the input neurons and synaptic weights to generate partial results; and an interconnect communicatively coupling the plurality of processing units, each of the processing units to share the partial results with one or more other processing units over the interconnect, the other processing units using the partial results to generate additional partial results or final results. The processing units may share data including input neurons and weights over the shared input bus.
    Type: Application
    Filed: May 3, 2021
    Publication date: October 21, 2021
    Inventors: Frederico C. PRATAS, Ayose J. FALCON, Marc LUPON, Fernando LATORRE, Pedro LOPEZ, Enric HERRERO ABELLANAS, Georgios TOURNAVITIS
  • Patent number: 10997273
    Abstract: An apparatus and method are described for distributed and cooperative computation in artificial neural networks. For example, one embodiment of an apparatus comprises: an input/output (I/O) interface; a plurality of processing units communicatively coupled to the I/O interface to receive data for input neurons and synaptic weights associated with each of the input neurons, each of the plurality of processing units to process at least a portion of the data for the input neurons and synaptic weights to generate partial results; and an interconnect communicatively coupling the plurality of processing units, each of the processing units to share the partial results with one or more other processing units over the interconnect, the other processing units using the partial results to generate additional partial results or final results. The processing units may share data including input neurons and weights over the shared input bus.
    Type: Grant
    Filed: November 19, 2015
    Date of Patent: May 4, 2021
    Assignee: Intel Corporation
    Inventors: Frederico C. Pratas, Ayose J. Falcon, Marc Lupon, Fernando Latorre, Pedro Lopez, Enric Herrero Abellanas, Georgios Tournavitis
  • Publication number: 20190004916
    Abstract: A combination of hardware and software collect profile data for asynchronous events, at code region granularity. An exemplary embodiment is directed to collecting metrics for prefetching events, which are asynchronous in nature. Instructions that belong to a code region are identified using one of several alternative techniques, causing a profile bit to be set for the instruction, as a marker. Each line of a data block that is prefetched is similarly marked. Events corresponding to the profile data being collected and resulting from instructions within the code region are then identified. Each time that one of the different types of events is identified, a corresponding counter is incremented. Following execution of the instructions within the code region, the profile data accumulated in the counters are collected, and the counters are reset for use with a new code region.
    Type: Application
    Filed: July 3, 2018
    Publication date: January 3, 2019
    Inventors: Raul Martinez, Enric Gibert Codina, Pedro Lopez, Marti Torrents Lapuerta, Polychronis Xekalakis, Georgios Tournavitis, Kyriakos A. Stavrou, Demos Pavlou, Daniel Ortega, Alejandro Martinez Vicente, Pedro Marcuello, Grigorios Magklis, Josep M. Codina, Crispin Gomez Requena, Antonio Gonzalez, Mirem Hyuseinova, Christos Kotselidis, Fernando Latorre, Marc Lupon, Carlos Madriles
  • Patent number: 10157063
    Abstract: A computer-readable storage medium, method and system for optimization-level aware branch prediction is described. A gear level is assigned to a set of application instructions that have been optimized. The gear level is also stored in a register of a branch prediction unit of a processor. Branch prediction is then performed by the processor based upon the gear level.
    Type: Grant
    Filed: September 28, 2012
    Date of Patent: December 18, 2018
    Assignee: Intel Corporation
    Inventors: Polychronis Xekalakis, Pedro Marcuello, Alejandro Vicente Martinez, Christos E. Kotselidis, Grigorios Magklis, Fernando Latorre, Raul Martinez, Josep M. Codina, Enric Gibert Codina, Crispin Gomez Requena, Antonio Gonzelez, Mirem Hyuseinova, Pedro Lopez, Marc Lupon, Carlos Madriles, Daniel Ortega, Demos Pavlou, Kyriakos A. Stavrou, Georgios Tournavitis
  • Patent number: 10061587
    Abstract: A processor includes a front end, a decoder, an allocator, and a retirement unit. The decoder includes logic to identify an end-of-live-range (EOLR) indicator. The EOLR indicator specifies an architectural register and a location in code for which the architectural register is unused. The allocator includes logic to scan for a mapping of the architectural register to a physical register, based upon the EOLR indicator. The allocator also includes logic to generate a request to disassociate the architectural register from the physical register. The retirement unit includes logic to disassociate the architectural register from the physical register.
    Type: Grant
    Filed: September 25, 2014
    Date of Patent: August 28, 2018
    Assignee: Intel Corporation
    Inventors: David Pardo Keppel, Denis M. Khartikov, Fernando LaTorre, Marc Lupon, Grigorios Magklis, Naveen Neelakantam, Georgios Tournavitis, Polychronis Xekalakis
  • Patent number: 10013326
    Abstract: A combination of hardware and software collect profile data for asynchronous events, at code region granularity. An exemplary embodiment is directed to collecting metrics for prefetching events, which are asynchronous in nature. Instructions that belong to a code region are identified using one of several alternative techniques, causing a profile bit to be set for the instruction, as a marker. Each line of a data block that is prefetched is similarly marked. Events corresponding to the profile data being collected and resulting from instructions within the code region are then identified. Each time that one of the different types of events is identified, a corresponding counter is incremented. Following execution of the instructions within the code region, the profile data accumulated in the counters are collected, and the counters are reset for use with a new code region.
    Type: Grant
    Filed: December 29, 2011
    Date of Patent: July 3, 2018
    Assignee: Intel Corporation
    Inventors: Raul Martinez, Enric Gibert Codina, Pedro Lopez, Marti Torrents Lapuerta, Polychronis Xekalakis, Georgios Tournavitis, Kyriakos A. Stavrou, Demos Pavlou, Daniel Ortega, Alejandro Martinez Vicente, Pedro Marcuello, Grigorios Magklis, Josep M. Codina, Crispin Gomez Requena, Antonio Gonzalez, Mirem Hyuseinova, Christos Kotselidis, Fernando Latorre, Marc Lupon, Carlos Madriles
  • Patent number: 9971540
    Abstract: A storage device and method are described for performing convolution operations. For example, one embodiment of an apparatus to perform convolution operations comprises a plurality of processing units to execute convolution operations on input data and partial results; a unified scratchpad memory comprising a plurality of memory banks communicatively coupled to the plurality of processing units through a plurality of read/write ports, each of the plurality of memory banks partitioned to store both the input data and partial results; a control unit to allocate the input data and partial results to the memory banks to ensure a minimum quality of service in accordance with the specified number of read/write ports and the specified convolution operation to be performed.
    Type: Grant
    Filed: September 22, 2015
    Date of Patent: May 15, 2018
    Assignee: INTEL CORPORATION
    Inventors: Enric Herrero Abellanas, Georgios Tournavitis, Frederico C. Pratas, Marc Lupon, Fernando Latorre, Pedro Lopez, Ayose J. Falcon
  • Patent number: 9811341
    Abstract: Disclosed is an apparatus and method to manage instruction cache prefetching from an instruction cache. A processor may comprise: a prefetch engine; a branch prediction engine to predict the outcome of a branch; and dynamic optimizer. The dynamic optimizer may be used to control: identifying common instruction cache misses and inserting a prefetch instruction from the prefetch engine to the instruction cache.
    Type: Grant
    Filed: December 29, 2011
    Date of Patent: November 7, 2017
    Assignee: Intel Corporation
    Inventors: Kyriakos A. Stavrou, Enric Gibert Codina, Josep M. Codina, Crispin Gomez Requena, Antonio Gonzalez, Mirem Hyuseinova, Christos E. Kotselidis, Fernando Latorre, Pedro Lopez, Marc Lupon, Carlos Madriles Gimeno, Grigorios Magklis, Pedro Marcuello, Alejandro Martinez Vicente, Raul Martinez, Daniel Ortega, Demos Pavlou, Georgios Tournavitis, Polychronis Xekalakis
  • Publication number: 20170277658
    Abstract: An apparatus and method are described for distributed and cooperative computation in artificial neural networks. For example, one embodiment of an apparatus comprises: an input/output (I/O) interface; a plurality of processing units communicatively coupled to the I/O interface to receive data for input neurons and synaptic weights associated with each of the input neurons, each of the plurality of processing units to process at least a portion of the data for the input neurons and synaptic weights to generate partial results; and an interconnect communicatively coupling the plurality of processing units, each of the processing units to share the partial results with one or more other processing units over the interconnect, the other processing units using the partial results to generate additional partial results or final results. The processing units may share data including input neurons and weights over the shared input bus.
    Type: Application
    Filed: November 19, 2015
    Publication date: September 28, 2017
    Inventors: Frederico C. PRATAS, Ayose J. FALCON, Marc LUPON, Fernando LATORRE, Pedro LOPEZ, Enric HERRERO ABELLANAS, Georgios TOURNAVITIS
  • Publication number: 20160179434
    Abstract: A storage device and method are described for performing convolution operations. For example, one embodiment of an apparatus to perform convolution operations comprises a plurality of processing units to execute convolution operations on input data and partial results; a unified scratchpad memory comprising a plurality of memory banks communicatively coupled to the plurality of processing units through a plurality of read/write ports, each of the plurality of memory banks partitioned to store both the input data and partial results; a control unit to allocate the input data and partial results to the memory banks to ensure a minimum quality of service in accordance with the specified number of read/write ports and the specified convolution operation to be performed.
    Type: Application
    Filed: September 22, 2015
    Publication date: June 23, 2016
    Inventors: ENRIC HERRERO ABELLANAS, GEORGIOS TOURNAVITIS, FREDERICO C. PRATAS, MARC LUPON, FERNANDO LATORRE, PEDRO LOPEZ, AYOSE J. FALCON
  • Publication number: 20160092222
    Abstract: A processor includes a front end, a decoder, an allocator, and a retirement unit. The decoder includes logic to identify an end-of-live-range (EOLR) indicator. The EOLR indicator specifies an architectural register and a location in code for which the architectural register is unused. The allocator includes logic to scan for a mapping of the architectural register to a physical register, based upon the EOLR indicator. The allocator also includes logic to generate a request to disassociate the architectural register from the physical register. The retirement unit includes logic to disassociate the architectural register from the physical register.
    Type: Application
    Filed: September 25, 2014
    Publication date: March 31, 2016
    Inventors: David Pardo Keppel, Denis M. Khartikov, Fernando LaTorre, Marc Lupon, Grigorios Magklis, Naveen Neelakantam, Georgios Tournavitis, Polychronis Xekalakis
  • Publication number: 20160026912
    Abstract: A processor includes a processor core and a calculation circuit. The processor core includes logic determine a set of weights for use in a convolutional neural network (CNN) calculation and scale up the weights using a scale value. The calculation circuit includes logic to receive the scale value, the set of weights, and a set of input values, wherein each input value and associated weight of a same fixed size. The calculation circuit also includes logic to determine results from convolutional neural network (CNN) calculations based upon the set of weights applied to the set of input values, scale down the results using the scale value, truncate the scaled down results to the fixed size, and communicatively couple the truncated results to an output for a layer of the CNN.
    Type: Application
    Filed: July 22, 2014
    Publication date: January 28, 2016
    Inventors: Ayose J. Falcon, Marc Lupon, Enric Herrero Abellanas, Pedro Lopez, Fernando Latorre, Frederico C. Pratas, Georgios Tournavitis
  • Publication number: 20140156976
    Abstract: Techniques and mechanisms for a processor to determine whether a commit action is to be performed. In an embodiment, a processor performs operations to determine whether a commit instruction is for contingent performance of a commit action. In another embodiment, one or more conditions of processor state are evaluated in response to determining that the commit instruction is for contingent performance of the commit action, where the evaluation is performed to determine whether the commit action indicated by the commit instruction is to be performed.
    Type: Application
    Filed: December 22, 2011
    Publication date: June 5, 2014
    Inventors: Enric Gibert Codina, Josep M. Codina, Fernando Latorre, Pedro Marcuello, Pedro Lopez, Crispin Gomez Requena, Antonio Gonzalez, Mirem Hyuseinova, Christos E. Kotselidis, Marc Lupon, Carlos Madriles Gimeno, Grigorios Magklis, Alejandro Martinez Vicente, Raul Martinez, Daniel Ortega, Demos Pavlou, Kyriakos A. Stavrou, Georgios Tournavitis, Polychronis Xekalakis
  • Publication number: 20140095849
    Abstract: A computer-readable storage medium, method and system for optimization-level aware branch prediction is described. A gear level is assigned to a set of application instructions that have been optimized. The gear level is also stored in a register of a branch prediction unit of a processor. Branch prediction is then performed by the processor based upon the gear level.
    Type: Application
    Filed: September 28, 2012
    Publication date: April 3, 2014
    Inventors: Polychronis Xekalakis, Pedro Marcuello, Alejandro Vicente Martinez, Christos E. Kotselidis, Grigorios Magklis, Fernando Latorre, Raul Martinez, Josep M. Codina, Enric Gibert Codina, Crispin Gomez Requena, Antonio Gonzalez, Mirem Hyuseinova, Pedro Lopez, Marc Lupon, Carlos Madriles, Daniel Ortega, Demos Pavlou, Kyriakos A. Stavrou, Georgios Tournavitis
  • Publication number: 20140019721
    Abstract: Disclosed is an apparatus and method to manage instruction cache prefetching from an instruction cache. A processor may comprise: a prefetch engine; a branch prediction engine to predict the outcome of a branch; and dynamic optimizer. The dynamic optimizer may be used to control: indentifying common instruction cache misses and inserting a prefetch instruction from the prefetch engine to the instruction cache.
    Type: Application
    Filed: December 29, 2011
    Publication date: January 16, 2014
    Inventors: Kyriakos A. Stavrou, Enric Gibert Codina, Josep M. Codina, Crispin Gomez Requena, Antonio Gonzalez, Mirem Hyuseinova, Christos E. Kotselidis, Fernando Latorre, Pedro Lopez, Marc Lupon, Carlos Madriles gimeno, Grigorios Magklis, Pedro Marcuello, Alejandro Martinez Vicente, Raul Martinez, Daniel Ortega, Demos Pavlou, Georgios Tournavitis, Polychronis Xekalakis
  • Publication number: 20130332705
    Abstract: A combination of hardware and software collect profile data for asynchronous events, at code region granularity. An exemplary embodiment is directed to collecting metrics for prefetching events, which are asynchronous in nature. Instructions that belong to a code region are identified using one of several alternative techniques, causing a profile bit to be set for the instruction, as a marker. Each line of a data block that is prefetched is similarly marked. Events corresponding to the profile data being collected and resulting from instructions within the code region are then identified. Each time that one of the different types of events is identified, a corresponding counter is incremented. Following execution of the instructions within the code region, the profile data accumulated in the counters are collected, and the counters are reset for use with a new code region.
    Type: Application
    Filed: December 29, 2011
    Publication date: December 12, 2013
    Inventors: Raul Martinez, Enric Gibert Codina, Pedro Lopez, Marti Torrents Lapuerta, Polychronis Xekalakis, Georgios Tournavitis, Kyriakos A. Stavrou, Demos Pavlou, Daniel Ortega, Alejandro Martinez Vicente, Pedro Marcuello, Grigorios Magklis, Josep M. Codina, Crispin Gomez Requena, Antonio Gonzalez, Mirem Hyuseinova, Christos Kotselidis, Fernando Latorre, Marc Lupon, Carlos Madriles
  • Publication number: 20130326199
    Abstract: Disclosed is an apparatus and method generally related to controlling a multimedia extension control and status register (MXCSR). A processor core may include a floating point unit (FPU) to perform arithmetic functions; and a multimedia extension control register (MXCR) to provide control bits to the FPU. Further an optimizer may be used to select a speculative multimedia extension status register (SPEC_MXSR) from a plurality of SPEC_MXSRs to update a multimedia extension status register (MXSR) based upon an instruction.
    Type: Application
    Filed: December 29, 2011
    Publication date: December 5, 2013
    Inventors: Grigorios Magklis, Josep M. Codina, Craig B. Zilles, Michael Neilly, Sridhar Samudrala, Alejandro Martinez Vicente, Polychronis Xekalakis, F. Jesus Sanchez, Marc Lupon, Georgios Tournavitis, Enric Gibert Codina, Crispin Gomez Requena, Antonio Gonzalez, Mirem Hyuseinova, Christos E. Kotselidis, Fernando Latorre, Pedro Lopez, Carlos Madriles Gimeno, Pedro Marcuello, Raul Martinez, Daniel Ortega, Demos Pavlou, Kyriakos A. Stavrou
  • Publication number: 20130268735
    Abstract: Techniques are described for providing an enhanced cache coherency protocol for a multi-core processor that includes a Speculative Request For Ownership Without Data (SRFOWD) for a portion of cache memory. With a SRFOWD, only an acknowledgement message may be provided as an answer to a requesting core. The contents of the affected cache line are not required to be a part of the answer. The enhanced cache coherency protocol may assure that a valid copy of the current cache line exists in case of misspeculation by the requesting core. Thus, an owner of the current copy of the cache line may maintain a copy of the old contents of the cache line. The old contents of the cache line may be discarded if speculation by the requesting core turns out to be correct. Otherwise, in case of misspeculation by the requesting core, the old contents of the cache line may be set back to a valid state.
    Type: Application
    Filed: December 29, 2011
    Publication date: October 10, 2013
    Inventors: Enric Gibert Codina, Fernando Latorre, Josep M. Codina, Crispin Gomez Requena, Antonio Gonzalez, Meyrem Hyuseinova, Christos E. Kotselidis, Pedro Lopez, Marc Lupon, Carlos Madriles, Grigorios Magklis, Pedro Marcuello, Alejandro Martinez Vicente, Raul Martinez, Daniel Ortega, Demos Pavlou, Kyriakos A. Stavrou, Georgios Tournavitis, Polychronis Xekalakis