Patents by Inventor Sameh Asaad
Sameh Asaad has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 10884949
Abstract: Embodiments of the invention are directed to a computer-implemented method of memory acceleration. The computer-implemented method includes mapping, by a processor, an array of logic blocks in system memory to an array of logic blocks stored in level 1 (L1) memory on an accelerator chip, wherein each logic block stores a respective lookup table for a function, and each function row of a respective lookup table stores an output function value and a combination of inputs to the function. The processor determines that the number of requests for the output function value from a logic block is less than a first threshold and then evicts the function row to a higher-level memory.
Type: Grant
Filed: April 5, 2019
Date of Patent: January 5, 2021
Assignee: International Business Machines Corporation
Inventors: Bulent Abali, Sameh Asaad
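
The eviction policy described in this abstract can be pictured with a small software model. The sketch below is a rough analogy rather than the patented hardware design: it keeps a per-row request count and moves rows whose count falls below a first threshold out of a simulated on-chip L1 into a simulated higher-level memory. The class name, the dictionary-based tables, and the XOR example are illustrative assumptions.

    # Illustrative software model (not the patented hardware design): each logic block
    # holds a lookup table mapping input combinations to an output value, and rows whose
    # request count falls below a threshold are evicted from the simulated on-chip L1
    # to a simulated higher-level memory.
    class LogicBlockLUT:
        def __init__(self, rows):
            self.l1 = dict(rows)            # input tuple -> output value (on-chip L1)
            self.higher_level = {}          # evicted rows (system-memory stand-in)
            self.hits = {k: 0 for k in self.l1}

        def lookup(self, inputs):
            if inputs in self.l1:
                self.hits[inputs] += 1
                return self.l1[inputs]
            return self.higher_level.get(inputs)    # slower path after eviction

        def evict_cold_rows(self, first_threshold):
            for inputs, count in list(self.hits.items()):
                if count < first_threshold:
                    self.higher_level[inputs] = self.l1.pop(inputs)
                    del self.hits[inputs]

    lut = LogicBlockLUT({(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0})   # XOR table
    for _ in range(5):
        lut.lookup((0, 1))
    lut.evict_cold_rows(first_threshold=3)          # cold rows leave the simulated L1
    print(sorted(lut.l1), sorted(lut.higher_level))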
-
Publication number: 20200320018
Abstract: Embodiments of the invention are directed to a computer-implemented method of memory acceleration. The computer-implemented method includes mapping, by a processor, an array of logic blocks in system memory to an array of logic blocks stored in level 1 (L1) memory on an accelerator chip, wherein each logic block stores a respective lookup table for a function, and each function row of a respective lookup table stores an output function value and a combination of inputs to the function. The processor determines that the number of requests for the output function value from a logic block is less than a first threshold and then evicts the function row to a higher-level memory.
Type: Application
Filed: April 5, 2019
Publication date: October 8, 2020
Inventors: Bulent Abali, Sameh Asaad
-
Publication number: 20190266149
Abstract: Techniques are provided for data filtering using hardware accelerators. An apparatus comprises a processor, a memory and a plurality of hardware accelerators. The processor is configured to stream data from the memory to a first one of the hardware accelerators and to receive filtered data from a second one of the hardware accelerators. The plurality of hardware accelerators are configured to filter the streamed data utilizing at least one bit vector partitioned across the plurality of hardware accelerators. The hardware accelerators may be field-programmable gate arrays.
Type: Application
Filed: May 15, 2019
Publication date: August 29, 2019
Inventors: Sameh Asaad, Robert J. Halstead, Bharat Sukhwani
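
As a rough illustration of the partitioned bit-vector filtering described above, the sketch below splits a bit vector across several software "accelerator" objects and routes each streamed key to the partition that owns its bit. The range-based partitioning scheme and all names are assumptions for the example, not details taken from the publication.

    # Each AcceleratorPartition stands in for one hardware accelerator holding a slice
    # of the overall bit vector; streamed keys pass the filter only if their bit is set.
    class AcceleratorPartition:
        def __init__(self, lo, hi):
            self.lo = lo
            self.bits = bytearray((hi - lo + 7) // 8)   # this partition's bit-vector slice

        def set_bit(self, key):
            i = key - self.lo
            self.bits[i // 8] |= 1 << (i % 8)

        def test(self, key):
            i = key - self.lo
            return bool(self.bits[i // 8] & (1 << (i % 8)))

    def build_partitions(universe, n_parts):
        step = (universe + n_parts - 1) // n_parts
        parts = [AcceleratorPartition(p * step, min(universe, (p + 1) * step))
                 for p in range(n_parts)]
        return parts, step

    parts, step = build_partitions(universe=1000, n_parts=4)
    for k in (3, 250, 999):                              # keys that should pass the filter
        parts[k // step].set_bit(k)
    stream = [3, 4, 250, 500, 999]
    print([k for k in stream if parts[k // step].test(k)])   # -> [3, 250, 999]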
-
Patent number: 10127275
Abstract: Methods and arrangements for mapping a query operation to an accelerator are provided. The method includes receiving, by a processor, a query operation and determining the design logic of the query operation; receiving a configuration of one or more available accelerators and the design logic of each of the available accelerators; and determining whether the query operation can be offloaded to one or more of the available accelerators. Based on a determination that the query operation can be offloaded, the method also includes creating software structures to interface with a selected accelerator and executing the query operation on the selected accelerator. Based on a determination that the query operation cannot be offloaded, the method further includes executing the query operation in software.
Type: Grant
Filed: July 11, 2014
Date of Patent: November 13, 2018
Assignee: International Business Machines Corporation
Inventors: Sameh Asaad, Parijat Dube, Balakrishna R. Iyer, Hong Min, Bharat Sukhwani, Mathew S. Thoennes
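
The offload decision in this abstract can be sketched as a simple matching step followed by a software fallback. The data structures, the set-based comparison of "design logic", and the function names below are illustrative assumptions, not the patent's actual implementation.

    # Compare the operation's required logic against what each accelerator reports it
    # implements; run on the first match, otherwise fall back to software execution.
    def choose_accelerator(required_logic, accelerators):
        for acc in accelerators:
            if required_logic <= acc["design_logic"]:    # accelerator covers all needed ops
                return acc
        return None

    def run_query(operation, accelerators, run_in_software, run_on_accelerator):
        acc = choose_accelerator(operation["logic"], accelerators)
        if acc is None:
            return run_in_software(operation)
        interface = {"accelerator": acc["name"], "operation": operation["name"]}  # software structures
        return run_on_accelerator(interface, operation)

    accelerators = [{"name": "fpga0", "design_logic": {"predicate_eval", "projection"}}]
    op = {"name": "scan_filter", "logic": {"predicate_eval"}}
    result = run_query(op, accelerators,
                       run_in_software=lambda o: f"software:{o['name']}",
                       run_on_accelerator=lambda i, o: f"{i['accelerator']}:{o['name']}")
    print(result)   # -> fpga0:scan_filter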
-
Patent number: 9971713
Abstract: A Multi-Petascale Highly Efficient Parallel Supercomputer of 100 petaflop scale includes node architectures based upon System-on-a-Chip technology, where each processing node comprises a single Application Specific Integrated Circuit (ASIC). The ASIC nodes are interconnected by a five-dimensional torus network that maximizes the throughput of packet communications between nodes and minimizes latency. The network implements a collective network and a global asynchronous network that provide global barrier and notification functions. Integrated into the node design is a list-based prefetcher. The memory system implements transactional memory, thread-level speculation, and a multiversioning cache that improves the soft error rate, and it supports DMA functionality allowing for parallel message passing.
Type: Grant
Filed: April 30, 2015
Date of Patent: May 15, 2018
Assignee: GLOBALFOUNDRIES Inc.
Inventors: Sameh Asaad, Ralph E. Bellofatto, Michael A. Blocksome, Matthias A. Blumrich, Peter Boyle, Jose R. Brunheroto, Dong Chen, Chen-Yong Cher, George L. Chiu, Norman Christ, Paul W. Coteus, Kristan D. Davis, Gabor J. Dozsa, Alexandre E. Eichenberger, Noel A. Eisley, Matthew R. Ellavsky, Kahn C. Evans, Bruce M. Fleischer, Thomas W. Fox, Alan Gara, Mark E. Giampapa, Thomas M. Gooding, Michael K. Gschwind, John A. Gunnels, Shawn A. Hall, Rudolf A. Haring, Philip Heidelberger, Todd A. Inglett, Brant L. Knudson, Gerard V. Kopcsay, Sameer Kumar, Amith R. Mamidala, James A. Marcella, Mark G. Megerian, Douglas R. Miller, Samuel J. Miller, Adam J. Muff, Michael B. Mundy, John K. O'Brien, Kathryn M. O'Brien, Martin Ohmacht, Jeffrey J. Parker, Ruth J. Poole, Joseph D. Ratterman, Valentina Salapura, David L. Satterfield, Robert M. Senger, Burkhard Steinmacher-Burow, William M. Stockdell, Craig B. Stunkel, Krishnan Sugavanam, Yutaka Sugawara, Todd E. Takken, Barry M. Trager, James L. Van Oosten, Charles D. Wait, Robert E. Walkup, Alfred T. Watson, Robert W. Wisniewski, Peng Wu
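
To make the five-dimensional torus interconnect named in this abstract concrete, the short sketch below computes a node's ten wrap-around neighbors on a 5D torus. The example dimensions are arbitrary assumptions, and the machine's actual routing and network functions are not modeled.

    # Nearest-neighbor addressing on a torus: one neighbor in each direction per axis,
    # with coordinates wrapping around at the edge of each dimension.
    def torus_neighbors(coord, dims):
        """Return the 2*len(dims) wrap-around neighbors of `coord` on a torus of size `dims`."""
        neighbors = []
        for axis in range(len(dims)):
            for step in (-1, 1):
                n = list(coord)
                n[axis] = (n[axis] + step) % dims[axis]
                neighbors.append(tuple(n))
        return neighbors

    dims = (4, 4, 4, 4, 2)                              # hypothetical 5D torus dimensions
    print(torus_neighbors((0, 0, 0, 0, 0), dims))       # ten neighbors, with wrap-around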
-
Patent number: 9449134
Abstract: A method for dynamically reconfiguring logic circuits on an FPGA includes the steps of: classifying a general function into sets of static functions and modal functions to be implemented on the FPGA; for each of the modal functions, generating a list of one-active actions; devising a circuit topology including at least a subset of look-up tables (LUTs) such that any one of the modal functions can be implemented at a time on the devised circuit topology; for each modal function, associating the devised circuit topology with a controller adapted to load a LUT configuration corresponding to a prescribed one of the one-active actions; implementing a single fixed circuit on the FPGA including devised circuit topologies for each of the modal functions; and updating contents of LUTs corresponding to the LUT configuration in the devised circuit topology when a change in modal function to be implemented on the FPGA is required.
Type: Grant
Filed: June 25, 2015
Date of Patent: September 20, 2016
Assignee: International Business Machines Corporation
Inventors: Roger Moussalli, Sameh Asaad
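
The core idea, changing behavior by rewriting LUT contents on a fixed circuit topology rather than recompiling the FPGA, can be modeled in a few lines. Everything below (the class name, the truth-table encodings, the controller-as-method) is a software stand-in for illustration, not the patented circuit.

    # One fixed 2-input "LUT circuit" implements whichever modal function the controller
    # has most recently loaded; switching functions only updates the table contents.
    MODAL_FUNCTIONS = {                                 # truth tables indexed by (a, b)
        "AND": {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 1},
        "OR":  {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 1},
        "XOR": {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0},
    }

    class FixedLUTCircuit:
        def __init__(self):
            self.table = MODAL_FUNCTIONS["AND"]         # initial configuration

        def load(self, name):                           # controller updates LUT contents only
            self.table = MODAL_FUNCTIONS[name]

        def evaluate(self, a, b):
            return self.table[(a, b)]

    circuit = FixedLUTCircuit()
    print(circuit.evaluate(1, 0))   # 0 while configured as AND
    circuit.load("XOR")             # switch modal function without changing the topology
    print(circuit.evaluate(1, 0))   # 1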
-
Publication number: 20160012107
Abstract: Methods and arrangements for mapping a query operation to an accelerator are provided. The method includes receiving, by a processor, a query operation and determining the design logic of the query operation; receiving a configuration of one or more available accelerators and the design logic of each of the available accelerators; and determining whether the query operation can be offloaded to one or more of the available accelerators. Based on a determination that the query operation can be offloaded, the method also includes creating software structures to interface with a selected accelerator and executing the query operation on the selected accelerator. Based on a determination that the query operation cannot be offloaded, the method further includes executing the query operation in software.
Type: Application
Filed: July 11, 2014
Publication date: January 14, 2016
Inventors: Sameh Asaad, Parijat Dube, Balakrishna R. Iyer, Hong Min, Bharat Sukhwani, Mathew S. Thoennes
-
Publication number: 20160011996
Abstract: A Multi-Petascale Highly Efficient Parallel Supercomputer of 100 petaflop scale includes node architectures based upon System-on-a-Chip technology, where each processing node comprises a single Application Specific Integrated Circuit (ASIC). The ASIC nodes are interconnected by a five-dimensional torus network that maximizes the throughput of packet communications between nodes and minimizes latency. The network implements a collective network and a global asynchronous network that provide global barrier and notification functions. Integrated into the node design is a list-based prefetcher. The memory system implements transactional memory, thread-level speculation, and a multiversioning cache that improves the soft error rate, and it supports DMA functionality allowing for parallel message passing.
Type: Application
Filed: April 30, 2015
Publication date: January 14, 2016
Inventors: Sameh Asaad, Ralph E. Bellofatto, Michael A. Blocksome, Matthias A. Blumrich, Peter Boyle, Jose R. Brunheroto, Dong Chen, Chen-Yong Cher, George L. Chiu, Norman Christ, Paul W. Coteus, Kristan D. Davis, Gabor J. Dozsa, Alexandre E. Eichenberger, Noel A. Eisley, Matthew R. Ellavsky, Kahn C. Evans, Bruce M. Fleischer, Thomas W. Fox, Alan Gara, Mark E. Giampapa, Thomas M. Gooding, Michael K. Gschwind, John A. Gunnels, Shawn A. Hall, Rudolf A. Haring, Philip Heidelberger, Todd A. Inglett, Brant L. Knudson, Gerard V. Kopcsay, Sameer Kumar, Amith R. Mamidala, James A. Marcella, Mark G. Megerian, Douglas R. Miller, Samuel J. Miller, Adam J. Muff, Michael B. Mundy, John K. O'Brien, Kathryn M. O'Brien, Martin Ohmacht, Jeffrey J. Parker, Ruth J. Poole, Joseph D. Ratterman, Valentina Salapura, David L. Satterfield, Robert M. Senger, Burkhard Steinmacher-Burow, William M. Stockdell, Craig B. Stunkel, Krishnan Sugavanam, Yutaka Sugawara, Todd E. Takken, Barry M. Trager, James L. Van Oosten, Charles D. Wait, Robert E. Walkup, Alfred T. Watson, Robert W. Wisniewski, Peng Wu
-
Patent number: 9081501
Abstract: A Multi-Petascale Highly Efficient Parallel Supercomputer delivering 100 petaOPS-scale computing at decreased cost, power, and footprint, while allowing for maximum packaging density of processing nodes from an interconnect point of view. The Supercomputer exploits technological advances in VLSI that enable a computing model in which many processors can be integrated into a single Application Specific Integrated Circuit (ASIC).
Type: Grant
Filed: January 10, 2011
Date of Patent: July 14, 2015
Assignee: International Business Machines Corporation
Inventors: Sameh Asaad, Ralph E. Bellofatto, Michael A. Blocksome, Matthias A. Blumrich, Peter Boyle, Jose R. Brunheroto, Dong Chen, Chen-Yong Cher, George L. Chiu, Norman Christ, Paul W. Coteus, Kristan D. Davis, Gabor J. Dozsa, Alexandre E. Eichenberger, Noel A. Eisley, Matthew R. Ellavsky, Kahn C. Evans, Bruce M. Fleischer, Thomas W. Fox, Alan Gara, Mark E. Giampapa, Thomas M. Gooding, Michael K. Gschwind, John A. Gunnels, Shawn A. Hall, Rudolf A. Haring, Philip Heidelberger, Todd A. Inglett, Brant L. Knudson, Gerard V. Kopcsay, Sameer Kumar, Amith R. Mamidala, James A. Marcella, Mark G. Megerian, Douglas R. Miller, Samuel J. Miller, Adam J. Muff, Michael B. Mundy, John K. O'Brien, Kathryn M. O'Brien, Martin Ohmacht, Jeffrey J. Parker, Ruth J. Poole, Joseph D. Ratterman, Valentina Salapura, David L. Satterfield, Robert M. Senger, Brian Smith, Burkhard Steinmacher-Burow, William M. Stockdell, Craig B. Stunkel, Krishnan Sugavanam, Yutaka Sugawara, Todd E. Takken, Barry M. Trager, James L. Van Oosten, Charles D. Wait, Robert E. Walkup, Alfred T. Watson, Robert W. Wisniewski, Peng Wu
-
Patent number: 9002693
Abstract: First and second field programmable gate arrays are provided which implement first and second blocks of a circuit design to be simulated. The field programmable gate arrays are operated at a first clock frequency, and a wire-like link is provided to send a plurality of signals between them. The wire-like link includes a serializer on the first field programmable gate array to serialize the plurality of signals; a deserializer on the second field programmable gate array to deserialize the plurality of signals; and a connection between the serializer and the deserializer. The serializer and the deserializer are operated at a second clock frequency, greater than the first clock frequency, and the second clock frequency is selected such that the latency of transmission and reception of the plurality of signals is less than the period corresponding to the first clock frequency.
Type: Grant
Filed: January 2, 2012
Date of Patent: April 7, 2015
Assignee: International Business Machines Corporation
Inventors: Sameh Asaad, Mohit Kapur, Benjamin D. Parker
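
The clocking constraint in this abstract lends itself to a back-of-the-envelope check: the serializer/deserializer clock must be fast enough that transferring all multiplexed signals completes within one period of the design clock. The one-cycle-per-signal model, the overhead term, and the example numbers below are assumptions for illustration only.

    # Pick a serdes clock f2 such that (num_signals + overhead) / f2 <= 1 / f1,
    # i.e. the serialized transfer fits inside one period of the design clock f1.
    def min_serdes_frequency(design_clock_hz, num_signals, overhead_cycles=2):
        cycles_needed = num_signals + overhead_cycles
        return cycles_needed * design_clock_hz

    f1 = 10e6                                       # 10 MHz emulation clock (hypothetical)
    f2 = min_serdes_frequency(f1, num_signals=64)
    print(f"serdes clock must be at least {f2 / 1e6:.0f} MHz")   # 660 MHz under these assumptions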
-
Patent number: 8983992
Abstract: Methods and arrangements for facilitating acceleration of database functions. A field programmable gate array is incorporated. At least one query control block is incorporated in the field programmable gate array, and database management system operations are accelerated via the field programmable gate array. The accelerating includes employing the at least one query control block to execute a query without reconfiguring the field programmable gate array.
Type: Grant
Filed: September 14, 2012
Date of Patent: March 17, 2015
Assignee: International Business Machines Corporation
Inventors: Sameh Asaad, Bernard V. Brezzo, Donna N Eng Dillenberger, Parijat Dube, Balakrishna Raghavendra Iyer, Hong Min, Bharat Sukhwani, Mathew S. Thoennes
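
A software analogy for the query control block idea: the accelerator logic stays fixed, and each new query is expressed as a parameter block that the fixed logic interprets, so no reconfiguration is needed per query. The field layout and the tiny predicate language below are illustrative assumptions, not the patented format.

    # The "engine" is fixed; only the query control block (a parameter dict) changes per query.
    def build_query_control_block(column, op, value, project):
        return {"column": column, "op": op, "value": value, "project": project}

    def run_on_fixed_engine(qcb, rows):
        ops = {"<": lambda a, b: a < b, "=": lambda a, b: a == b, ">": lambda a, b: a > b}
        test = ops[qcb["op"]]
        return [{c: r[c] for c in qcb["project"]}
                for r in rows if test(r[qcb["column"]], qcb["value"])]

    rows = [{"id": 1, "price": 5}, {"id": 2, "price": 12}, {"id": 3, "price": 9}]
    qcb = build_query_control_block("price", ">", 8, project=["id"])   # new query, same engine
    print(run_on_fixed_engine(qcb, rows))    # -> [{'id': 2}, {'id': 3}]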
-
Patent number: 8977637
Abstract: Methods and arrangements for facilitating acceleration of database functions. A field programmable gate array is incorporated. At least one query control block is incorporated in the field programmable gate array, and database management system operations are accelerated via the field programmable gate array. The accelerating includes employing the at least one query control block to execute a query without reconfiguring the field programmable gate array.
Type: Grant
Filed: August 30, 2012
Date of Patent: March 10, 2015
Assignee: International Business Machines Corporation
Inventors: Sameh Asaad, Bernard V. Brezzo, Donna N Eng Dillenberger, Parijat Dube, Balakrishna Raghavendra Iyer, Hong Min, Bharat Sukhwani, Mathew S. Thoennes
-
Patent number: 8838577
Abstract: An apparatus comprises a hardware accelerator coupled to a memory. The hardware accelerator comprises one or more decompression units. The one or more decompression units are reconfigurable. The hardware accelerator may be a field-programmable gate array. The hardware accelerator may also comprise one or more reconfigurable scanner units. The one or more decompression units, in the aggregate, are operative to decompress one or more rows of a database at a bus speed of the coupling between the hardware accelerator and the memory. Two or more decompression units are operative to decompress two or more rows of a database in parallel. The apparatus allows for hardware accelerated row decompression.
Type: Grant
Filed: July 24, 2012
Date of Patent: September 16, 2014
Assignee: International Business Machines Corporation
Inventors: Bharat Sukhwani, Sameh Asaad, Balakrishna Raghavendra Iyer, Hong Min, Mathew S. Thoennes
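
A minimal host-side sketch of decompressing many database rows in parallel across several "decompression units", modeled here as a thread pool using zlib. The row format and the use of zlib are assumptions for illustration; the patent describes reconfigurable hardware units, not software threads.

    # Multiple workers stand in for the parallel decompression units in the accelerator.
    import zlib
    from concurrent.futures import ThreadPoolExecutor

    def decompress_rows_in_parallel(compressed_rows, num_units=2):
        with ThreadPoolExecutor(max_workers=num_units) as units:
            return list(units.map(zlib.decompress, compressed_rows))

    rows = [b"row-1: alpha", b"row-2: beta", b"row-3: gamma"]
    compressed = [zlib.compress(r) for r in rows]
    print(decompress_rows_in_parallel(compressed))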
-
Patent number: 8737233
Abstract: Techniques are disclosed for increasing the throughput of a multiplexed electrical bus by exploiting available pipeline stages of a computer or other system. For example, a method for increasing a throughput of an electrical bus that connects at least two devices in a system comprises introducing at least one signal hold stage in a signal-receiving one of the two devices, such that a maximum frequency at which the two devices are operated is not limited by a number of cycles of an operating frequency of the electrical bus needed for a signal to propagate from a signal-transmitting one of the two devices to the signal-receiving one of the two devices. Preferably, the signal hold stage introduced in the signal-receiving one of the two devices is a pipeline stage re-allocated from the signal-transmitting one of the two devices.
Type: Grant
Filed: September 19, 2011
Date of Patent: May 27, 2014
Assignee: International Business Machines Corporation
Inventors: Sameh Asaad, Bernard V. Brezzo, Mohit Kapur
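
A toy timing model can show why the hold stage helps: without a receive-side hold stage the device clock period must absorb the bus propagation, while adding a hold (pipeline) stage at the receiver turns that propagation into latency instead of a frequency limit. The delay numbers and the model itself are illustrative assumptions, not figures from the patent.

    # Compare the achievable device clock with and without a hold stage at the receiver.
    def max_device_frequency_mhz(logic_delay_ns, bus_propagation_ns, hold_stage_at_receiver):
        if hold_stage_at_receiver:
            period_ns = logic_delay_ns                        # propagation becomes extra latency cycles
        else:
            period_ns = logic_delay_ns + bus_propagation_ns   # propagation sits on the critical path
        return 1000.0 / period_ns

    print(max_device_frequency_mhz(2.0, 6.0, hold_stage_at_receiver=False))  # 125 MHz
    print(max_device_frequency_mhz(2.0, 6.0, hold_stage_at_receiver=True))   # 500 MHz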
-
Publication number: 20140067845
Abstract: Methods and arrangements for facilitating acceleration of database functions. A field programmable gate array is incorporated. At least one query control block is incorporated in the field programmable gate array, and database management system operations are accelerated via the field programmable gate array. The accelerating includes employing the at least one query control block to execute a query without reconfiguring the field programmable gate array.
Type: Application
Filed: August 30, 2012
Publication date: March 6, 2014
Applicant: International Business Machines Corporation
Inventors: Sameh Asaad, Bernard V. Brezzo, Donna N. Eng Dillenberger, Parijat Dube, Balakrishna Raghavendra Iyer, Hong Min, Bharat Sukhwani, Mathew S. Thoennes
-
Publication number: 20140067851
Abstract: Methods and arrangements for facilitating acceleration of database functions. A field programmable gate array is incorporated. At least one query control block is incorporated in the field programmable gate array, and database management system operations are accelerated via the field programmable gate array. The accelerating includes employing the at least one query control block to execute a query without reconfiguring the field programmable gate array.
Type: Application
Filed: September 14, 2012
Publication date: March 6, 2014
Applicant: International Business Machines Corporation
Inventors: Sameh Asaad, Bernard V. Brezzo, Donna N. Eng Dillenberger, Parijat Dube, Balakrishna Raghavendra Iyer, Hong Min, Bharat Sukhwani, Mathew S. Thoennes
-
Publication number: 20140032516
Abstract: An apparatus comprises a hardware accelerator coupled to a memory. The hardware accelerator comprises one or more decompression units. The one or more decompression units are reconfigurable. The hardware accelerator may be a field-programmable gate array. The hardware accelerator may also comprise one or more reconfigurable scanner units. The one or more decompression units, in the aggregate, are operative to decompress one or more rows of a database at a bus speed of the coupling between the hardware accelerator and the memory. Two or more decompression units are operative to decompress two or more rows of a database in parallel. The apparatus allows for hardware accelerated row decompression.
Type: Application
Filed: July 24, 2012
Publication date: January 30, 2014
Applicant: International Business Machines Corporation
Inventors: Bharat Sukhwani, Sameh Asaad, Balakrishna R. Iyer, Hong Min, Mathew S. Thoennes
-
Publication number: 20140032509
Abstract: A method comprises streaming one or more pages of a database to a hardware accelerator, extracting one or more rows from each of the one or more pages of the database, determining whether a given one of the extracted rows is compressed, decompressing the given one of the extracted rows responsive to the determination and outputting the decompressed row. The decompressing step is performed in the hardware accelerator. The hardware accelerator may be a field-programmable gate array. The method allows for hardware accelerated row decompression.
Type: Application
Filed: August 24, 2012
Publication date: January 30, 2014
Applicant: International Business Machines Corporation
Inventors: Bharat Sukhwani, Sameh Asaad, Balakrishna R. Iyer, Hong Min, Mathew S. Thoennes
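
The method's steps translate naturally into a short host-side sketch: stream pages, extract rows, check a per-row compression flag, and decompress only the compressed rows. The page and row encoding and the use of zlib stand in for the real database format, which the publication does not spell out at this level; in the actual method the decompression step runs in the hardware accelerator.

    # Stream pages one at a time; decompress only rows flagged as compressed.
    import zlib

    def extract_rows(page):
        return page["rows"]                      # each row: (is_compressed, payload)

    def process_pages(pages):
        out = []
        for page in pages:                       # "streaming" the pages one at a time
            for is_compressed, payload in extract_rows(page):
                out.append(zlib.decompress(payload) if is_compressed else payload)
        return out

    pages = [{"rows": [(False, b"plain row"), (True, zlib.compress(b"compressed row"))]}]
    print(process_pages(pages))                  # -> [b'plain row', b'compressed row']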
-
Publication number: 20130318107
Abstract: Generating a data feed specific parser circuit is provided. An input is received specifying a number of bytes of feed data, associated with a particular data feed, that the data feed specific parser circuit is to process. A feed format specification file that describes the data format of the particular data feed is parsed to generate an internal data structure of the feed format specification file. A minimum number of parallel pipeline stages in the data feed specific parser circuit to process the number of bytes of feed data associated with the particular data feed is determined based on the generated internal data structure of the feed format specification file. Then, a description of the data feed specific parser circuit with the determined number of parallel pipeline stages is generated.
Type: Application
Filed: May 23, 2012
Publication date: November 28, 2013
Applicant: International Business Machines Corporation
Inventors: Sameh Asaad, Roger Moussalli, Bharat Sukhwani
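
One plausible sizing rule behind "determine the minimum number of parallel pipeline stages" is sketched below: if the circuit consumes a fixed number of feed bytes per cycle, a message of the longest format must be spread over enough stages to be fully consumed. The spec layout and the ceiling-division rule are assumptions, not the publication's actual algorithm.

    # Derive a stage count from the widest message in a (hypothetical) feed format spec.
    import math

    def min_pipeline_stages(bytes_per_cycle, feed_format_spec):
        longest_message = max(sum(field["width"] for field in msg["fields"])
                              for msg in feed_format_spec["messages"])
        return math.ceil(longest_message / bytes_per_cycle)

    spec = {"messages": [
        {"name": "add_order", "fields": [{"width": 8}, {"width": 4}, {"width": 4}, {"width": 8}]},
        {"name": "trade",     "fields": [{"width": 8}, {"width": 4}, {"width": 8}]},
    ]}
    print(min_pipeline_stages(bytes_per_cycle=8, feed_format_spec=spec))   # -> 3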
-
Publication number: 20130170525
Abstract: First and second field programmable gate arrays are provided which implement first and second blocks of a circuit design to be simulated. The field programmable gate arrays are operated at a first clock frequency, and a wire-like link is provided to send a plurality of signals between them. The wire-like link includes a serializer on the first field programmable gate array to serialize the plurality of signals; a deserializer on the second field programmable gate array to deserialize the plurality of signals; and a connection between the serializer and the deserializer. The serializer and the deserializer are operated at a second clock frequency, greater than the first clock frequency, and the second clock frequency is selected such that the latency of transmission and reception of the plurality of signals is less than the period corresponding to the first clock frequency.
Type: Application
Filed: January 2, 2012
Publication date: July 4, 2013
Applicant: International Business Machines Corporation
Inventors: Sameh Asaad, Mohit Kapur, Benjamin D. Parker