Patents by Inventor Thomas W. Fox
Thomas W. Fox has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 10963380
Abstract: A simultaneous multithread (SMT) processor having a shared dispatch pipeline includes a first circuit that detects a cache miss thread. A second circuit determines a first cache hierarchy level at which the detected cache miss occurred. A third circuit determines a Next To Complete (NTC) group in the thread and a plurality of additional groups (X) in the thread. The additional groups (X) are dynamically configured based on the detected cache miss. A fourth circuit determines whether any groups in the thread are younger than the determined NTC group and the plurality of additional groups (X), and flushes all the determined younger groups from the cache miss thread.
Type: Grant
Filed: April 2, 2019
Date of Patent: March 30, 2021
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Gregory W. Alexander, Brian D. Barrick, Thomas W. Fox, Christian Jacobi, Anthony Saporito, Somin Song, Aaron Tsai
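The flush rule in this abstract can be illustrated with a minimal sketch. It is not the patented circuit: the table mapping a cache level to the number of additional groups (X) and the function name `flush_younger_groups` are hypothetical, and the X values are invented purely for illustration.

```python
# Hypothetical policy table: how many groups beyond the NTC group survive
# a flush, keyed by the cache level where the miss occurred. The abstract
# only says X is configured from the detected miss; these values are made up.
EXTRA_GROUPS_BY_MISS_LEVEL = {"L1": 4, "L2": 2, "L3": 0}


def flush_younger_groups(groups, miss_level):
    """Split a thread's in-flight groups into (kept, flushed).

    groups is ordered oldest first, so groups[0] is the Next To Complete
    (NTC) group. The NTC group and the next X groups are kept; every
    younger group is flushed.
    """
    x = EXTRA_GROUPS_BY_MISS_LEVEL[miss_level]
    keep = 1 + x  # the NTC group plus X additional groups
    return groups[:keep], groups[keep:]


kept, flushed = flush_younger_groups([10, 11, 12, 13, 14, 15], "L2")
print(kept, flushed)
```

In this sketch a miss deeper in the hierarchy keeps fewer groups, on the assumption that a longer stall makes it more useful to free shared dispatch resources for the other threads.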
-
Publication number: 20190227932
Abstract: A simultaneous multithread (SMT) processor having a shared dispatch pipeline includes a first circuit that detects a cache miss thread. A second circuit determines a first cache hierarchy level at which the detected cache miss occurred. A third circuit determines a Next To Complete (NTC) group in the thread and a plurality of additional groups (X) in the thread. The additional groups (X) are dynamically configured based on the detected cache miss. A fourth circuit determines whether any groups in the thread are younger than the determined NTC group and the plurality of additional groups (X), and flushes all the determined younger groups from the cache miss thread.
Type: Application
Filed: April 2, 2019
Publication date: July 25, 2019
Inventors: Gregory W. Alexander, Brian D. Barrick, Thomas W. Fox, Christian Jacobi, Anthony Saporito, Somin Song, Aaron Tsai
-
Patent number: 10353817
Abstract: A simultaneous multithread (SMT) processor having a shared dispatch pipeline includes a first circuit that detects a cache miss thread. A second circuit determines a first cache hierarchy level at which the detected cache miss occurred. A third circuit determines a Next To Complete (NTC) group in the thread and a plurality of additional groups (X) in the thread. The additional groups (X) are dynamically configured based on the detected cache miss. A fourth circuit determines whether any groups in the thread are younger than the determined NTC group and the plurality of additional groups (X), and flushes all the determined younger groups from the cache miss thread.
Type: Grant
Filed: March 7, 2017
Date of Patent: July 16, 2019
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Gregory W. Alexander, Brian D. Barrick, Thomas W. Fox, Christian Jacobi, Anthony Saporito, Somin Song, Aaron Tsai
-
Publication number: 20180260326
Abstract: A simultaneous multithread (SMT) processor having a shared dispatch pipeline includes a first circuit that detects a cache miss thread. A second circuit determines a first cache hierarchy level at which the detected cache miss occurred. A third circuit determines a Next To Complete (NTC) group in the thread and a plurality of additional groups (X) in the thread. The additional groups (X) are dynamically configured based on the detected cache miss. A fourth circuit determines whether any groups in the thread are younger than the determined NTC group and the plurality of additional groups (X), and flushes all the determined younger groups from the cache miss thread.
Type: Application
Filed: March 7, 2017
Publication date: September 13, 2018
Inventors: Gregory W. Alexander, Brian D. Barrick, Thomas W. Fox, Christian Jacobi, Anthony Saporito, Somin Song, Aaron Tsai
-
Patent number: 10049061
Abstract: Embodiments relate to loading and storing of data. An aspect includes a method for transferring data in an active memory device that includes memory and a processing element. An instruction is fetched and decoded for execution by the processing element. Based on determining that the instruction is a gather instruction, the processing element determines a plurality of source addresses in the memory from which to gather data elements and a destination address in the memory. One or more gathered data elements are transferred from the source addresses to contiguous locations in the memory starting at the destination address. Based on determining that the instruction is a scatter instruction, a source address in the memory from which to read data elements at contiguous locations and one or more destination addresses in the memory to store the data elements at non-contiguous locations are determined, and the data elements are transferred.
Type: Grant
Filed: November 12, 2012
Date of Patent: August 14, 2018
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Bruce M. Fleischer, Thomas W. Fox, Hans M. Jacobson, James A. Kahle, Jaime H. Moreno, Ravi Nair
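The memory-to-memory gather and scatter transfers described in this abstract can be sketched in a few lines. This is an illustration only: memory is modeled as a Python list of words, and the function names `gather` and `scatter` are assumptions rather than anything taken from the patent.

```python
def gather(memory, source_addresses, destination):
    """Copy elements from scattered source addresses into contiguous
    locations starting at `destination` (all within the same memory)."""
    for offset, src in enumerate(source_addresses):
        memory[destination + offset] = memory[src]


def scatter(memory, source, destination_addresses):
    """Copy elements from contiguous locations starting at `source`
    out to the given non-contiguous destination addresses."""
    for offset, dst in enumerate(destination_addresses):
        memory[dst] = memory[source + offset]


memory = list(range(16))          # word i initially holds the value i
gather(memory, [3, 7, 11], 12)    # memory[12:15] becomes [3, 7, 11]
scatter(memory, 12, [0, 5, 9])    # those three words land at 0, 5 and 9
print(memory)
```

The sketch copies one element at a time and does not address overlapping source and destination ranges, which a real device would have to define.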
-
Patent number: 10007242
Abstract: A computer detects a request by a process for access to a shadow control page, wherein the shadow control page allows the process access to one or more devices. The computer assigns the shadow control page and a key to the process associated with the request. The computer detects a request by the process via the assigned shadow control page for creation of a subset of devices from the one or more devices. The computer inputs information detailing an association between the subset of devices and the assigned key into a subset definition table, wherein the subset definition table includes one or more keys and one or more corresponding subsets.
Type: Grant
Filed: June 11, 2015
Date of Patent: June 26, 2018
Assignee: International Business Machines Corporation
Inventors: Thomas W. Fox, Hans M. Jacobson, Ravi Nair, Bryan S. Rosenburg
-
Patent number: 9971713
Abstract: A Multi-Petascale Highly Efficient Parallel Supercomputer of 100 petaflop-scale includes node architectures based upon System-On-a-Chip technology, where each processing node comprises a single Application Specific Integrated Circuit (ASIC). The ASIC nodes are interconnected by a five dimensional torus network that optimally maximize the throughput of packet communications between nodes and minimize latency. The network implements collective network and a global asynchronous network that provides global barrier and notification functions. Integrated in the node design include a list-based prefetcher. The memory system implements transaction memory, thread level speculation, and multiversioning cache that improves soft error rate at the same time and supports DMA functionality allowing for parallel processing message-passing.
Type: Grant
Filed: April 30, 2015
Date of Patent: May 15, 2018
Assignee: GLOBALFOUNDRIES INC.
Inventors: Sameh Asaad, Ralph E. Bellofatto, Michael A. Blocksome, Matthias A. Blumrich, Peter Boyle, Jose R. Brunheroto, Dong Chen, Chen-Yong Cher, George L. Chiu, Norman Christ, Paul W. Coteus, Kristan D. Davis, Gabor J. Dozsa, Alexandre E. Eichenberger, Noel A. Eisley, Matthew R. Ellavsky, Kahn C. Evans, Bruce M. Fleischer, Thomas W. Fox, Alan Gara, Mark E. Giampapa, Thomas M. Gooding, Michael K. Gschwind, John A. Gunnels, Shawn A. Hall, Rudolf A. Haring, Philip Heidelberger, Todd A. Inglett, Brant L. Knudson, Gerard V. Kopcsay, Sameer Kumar, Amith R. Mamidala, James A. Marcella, Mark G. Megerian, Douglas R. Miller, Samuel J. Miller, Adam J. Muff, Michael B. Mundy, John K. O'Brien, Kathryn M. O'Brien, Martin Ohmacht, Jeffrey J. Parker, Ruth J. Poole, Joseph D. Ratterman, Valentina Salapura, David L. Satterfield, Robert M. Senger, Burkhard Steinmacher-Burow, William M. Stockdell, Craig B. Stunkel, Krishnan Sugavanam, Yutaka Sugawara, Todd E. Takken, Barry M. Trager, James L. Van Oosten, Charles D. Wait, Robert E. Walkup, Alfred T. Watson, Robert W. Wisniewski, Peng Wu
-
Patent number: 9928190
Abstract: Direct communication of data between processing elements is provided. An aspect includes sending, by a first processing element, data over an inter-processing element chaining bus. The data is destined for another processing element via a data exchange component that is coupled between the first processing element and a second processing element via a communication line disposed between corresponding multiplexors of the first processing element and the second processing element. A further aspect includes determining, by the data exchange component, whether the data has been received at the data exchange element. If so, an indicator is set in a register of the data exchange component and the data is forwarded to the other processing element. Setting the indicator causes the first processing element to stall. If the data has not been received, the other processing element is stalled while the data exchange component awaits receipt of the data.
Type: Grant
Filed: June 15, 2015
Date of Patent: March 27, 2018
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Bruce M. Fleischer, Thomas W. Fox, Hans M. Jacobson, Ravi Nair
-
Patent number: 9910802
Abstract: Direct communication of data between processing elements is provided. An aspect includes sending, by a first processing element, data over an inter-processing element chaining bus. The data is destined for another processing element via a data exchange component that is coupled between the first processing element and a second processing element via a communication line disposed between corresponding multiplexors of the first processing element and the second processing element. A further aspect includes determining, by the data exchange component, whether the data has been received at the data exchange element. If so, an indicator is set in a register of the data exchange component and the data is forwarded to the other processing element. Setting the indicator causes the first processing element to stall. If the data has not been received, the other processing element is stalled while the data exchange component awaits receipt of the data.
Type: Grant
Filed: November 23, 2015
Date of Patent: March 6, 2018
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Bruce M. Fleischer, Thomas W. Fox, Hans M. Jacobson, Ravi Nair
-
Patent number: 9841926
Abstract: According to one embodiment, a method for traffic prioritization in a memory device includes sending a memory access request including a priority value from a processing element in the memory device to a crossbar interconnect in the memory device. The memory access request is routed through the crossbar interconnect to a memory controller in the memory device associated with the memory access request. The memory access request is received at the memory controller. The priority value of the memory access request is compared to priority values of a plurality of memory access requests stored in a queue of the memory controller to determine a highest priority memory access request. A next memory access request is performed by the memory controller based on the highest priority memory access request.
Type: Grant
Filed: June 30, 2016
Date of Patent: December 12, 2017
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Bruce M. Fleischer, Thomas W. Fox, Hans M. Jacobson, Ravi Nair
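The queue comparison at the memory controller might look roughly like the sketch below. Several details are assumptions that the abstract does not state: a larger number means higher priority, ties go to the oldest queued request, and the names `MemoryRequest` and `MemoryControllerQueue` are hypothetical. The crossbar routing step is left out.

```python
from dataclasses import dataclass


@dataclass
class MemoryRequest:
    priority: int   # assumption: larger value means more urgent
    address: int


class MemoryControllerQueue:
    """Toy model of one memory controller's request queue."""

    def __init__(self):
        self._queue = []  # kept in arrival order

    def receive(self, request):
        self._queue.append(request)

    def perform_next(self):
        # Compare priority values across the queued requests. max() returns
        # the first maximal item, so ties resolve to the oldest request
        # (a tie-break chosen for this sketch).
        best = max(self._queue, key=lambda r: r.priority)
        self._queue.remove(best)
        return best


q = MemoryControllerQueue()
for prio, addr in [(1, 0x100), (3, 0x200), (3, 0x300), (2, 0x400)]:
    q.receive(MemoryRequest(prio, addr))
print([hex(q.perform_next().address) for _ in range(4)])
```

A linear scan keeps the comparison explicit; a hardware queue or a heap would make the same selection with different cost trade-offs.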
-
Patent number: 9632778
Abstract: Embodiments relate to packed loading and storing of data. An aspect includes a system for packed loading and storing of distributed data. The system includes memory and a processing element configured to communicate with the memory. The processing element is configured to perform a method including fetching and decoding an instruction for execution by the processing element. A plurality of individually addressable data elements is gathered from non-contiguous locations in the memory which are narrower than a nominal width of register file elements in the processing element based on the instruction. The processing element packs and loads the data elements into register file elements of a register file entry based on the instruction, such that at least two of the data elements gathered from the non-contiguous locations in the memory are packed and loaded into a single register file element of the register file entry.
Type: Grant
Filed: August 8, 2012
Date of Patent: April 25, 2017
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Bruce M. Fleischer, Thomas W. Fox, Hans M. Jacobson, Jaime H. Moreno, Ravi Nair, Daniel A. Prener
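To make the packing step concrete, the sketch below gathers narrow elements from non-contiguous addresses and packs several of them into each wider register file element. The widths (8-bit data elements, 32-bit register elements), the low-lane-first packing order, and the name `gather_packed` are all assumptions chosen for illustration.

```python
def gather_packed(memory, addresses, element_bits=8, register_bits=32):
    """Gather narrow elements and pack them into wider register elements.

    memory maps an address to an integer value. Returns the register file
    entry as a list of packed integers, one per register file element.
    """
    per_register = register_bits // element_bits
    mask = (1 << element_bits) - 1
    entry = []
    for start in range(0, len(addresses), per_register):
        packed = 0
        chunk = addresses[start:start + per_register]
        for lane, addr in enumerate(chunk):
            # Place each gathered element in its own lane, lowest lane first.
            packed |= (memory[addr] & mask) << (lane * element_bits)
        entry.append(packed)
    return entry


memory = {0x10: 0xAA, 0x40: 0xBB, 0x80: 0xCC, 0x24: 0xDD, 0x99: 0x01}
entry = gather_packed(memory, [0x10, 0x40, 0x80, 0x24, 0x99])
print([hex(e) for e in entry])
```

With these widths, four gathered bytes share one register element, so five addresses fill one element completely and a second one partially.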
-
Gather/scatter of multiple data elements with packed loading/storing into/from a register file entry
Patent number: 9632777
Abstract: Embodiments relate to packed loading and storing of data. An aspect includes a method for packed loading and storing of data distributed in a system that includes memory and a processing element. The method includes fetching and decoding an instruction for execution by the processing element. The processing element gathers a plurality of individually addressable data elements from non-contiguous locations in the memory which are narrower than a nominal width of register file elements in the processing element based on the instruction. The data elements are packed and loaded into register file elements of a register file entry by the processing element based on the instruction, such that at least two of the data elements gathered from the non-contiguous locations in the memory are packed and loaded into a single register file element of the register file entry.
Type: Grant
Filed: August 3, 2012
Date of Patent: April 25, 2017
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Bruce M. Fleischer, Thomas W. Fox, Hans M. Jacobson, Jaime H. Moreno, Ravi Nair, Daniel A. Prener
-
Patent number: 9594724
Abstract: An aspect includes accessing a vector register in a vector register file. The vector register file includes a plurality of vector registers and each vector register includes a plurality of elements. A read command is received at a read port of the vector register file. The read command specifies a vector register address. The vector register address is decoded by an address decoder to determine a selected vector register of the vector register file. An element address is determined for one of the plurality of elements associated with the selected vector register based on a read element counter of the selected vector register. A word is selected in a memory array of the selected vector register as read data based on the element address. The read data is output from the selected vector register based on the decoding of the vector register address by the address decoder.
Type: Grant
Filed: August 9, 2012
Date of Patent: March 14, 2017
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Bruce M. Fleischer, Thomas W. Fox, Hans M. Jacobson, Ravi Nair
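The read path in this abstract, where the read command names only a vector register and a per-register counter supplies the element address, can be modeled as below. The class names are hypothetical, and the counter behavior (advance by one after each read, wrap at the end of the register) is an assumption; the abstract does not say how the counter is updated.

```python
class VectorRegister:
    """One vector register: a small word array plus a read element counter."""

    def __init__(self, num_elements):
        self.words = [0] * num_elements
        self.read_counter = 0

    def read_next(self):
        # The element address comes from the register's own counter,
        # not from the read command.
        word = self.words[self.read_counter]
        # Assumed update rule: step to the next element and wrap around.
        self.read_counter = (self.read_counter + 1) % len(self.words)
        return word


class VectorRegisterFile:
    def __init__(self, num_registers, num_elements):
        self.registers = [VectorRegister(num_elements) for _ in range(num_registers)]

    def read(self, register_address):
        # "Address decode": pick the selected register, then output its word.
        return self.registers[register_address].read_next()


vrf = VectorRegisterFile(num_registers=2, num_elements=4)
vrf.registers[1].words = [5, 6, 7, 8]
print([vrf.read(1) for _ in range(5)])
```

Because the counter lives in the register, successive reads of the same register address step through its elements without the reader supplying an element index.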
-
Patent number: 9582466
Abstract: An aspect includes accessing a vector register in a vector register file. The vector register file includes a plurality of vector registers and each vector register includes a plurality of elements. A read command is received at a read port of the vector register file. The read command specifies a vector register address. The vector register address is decoded by an address decoder to determine a selected vector register of the vector register file. An element address is determined for one of the plurality of elements associated with the selected vector register based on a read element counter of the selected vector register. A word is selected in a memory array of the selected vector register as read data based on the element address. The read data is output from the selected vector register based on the decoding of the vector register address by the address decoder.
Type: Grant
Filed: August 13, 2012
Date of Patent: February 28, 2017
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Bruce M. Fleischer, Thomas W. Fox, Hans M. Jacobson, Ravi Nair
-
Patent number: 9575755
Abstract: Embodiments relate to vector processing in an active memory device. An aspect includes a method for vector processing in an active memory device that includes memory and a processing element. The method includes decoding, in the processing element, an instruction including a plurality of sub-instructions to execute in parallel. An iteration count to repeat execution of the sub-instructions in parallel is determined. Based on the iteration count, execution of the sub-instructions in parallel is repeated for multiple iterations by the processing element. Multiple locations in the memory are accessed in parallel based on the execution of the sub-instructions.
Type: Grant
Filed: August 3, 2012
Date of Patent: February 21, 2017
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Bruce M. Fleischer, Thomas W. Fox, Hans M. Jacobson, Ravi Nair, Daniel A. Prener
-
Patent number: 9575756
Abstract: Embodiments relate to vector processor predication in an active memory device. An aspect includes a system for vector processor predication in an active memory device. The system includes memory in the active memory device and a processing element in the active memory device. The processing element is configured to perform a method including decoding an instruction with a plurality of sub-instructions to execute in parallel. One or more mask bits are accessed from a vector mask register in the processing element. The one or more mask bits are applied by the processing element to predicate operation of a unit in the processing element associated with at least one of the sub-instructions.
Type: Grant
Filed: August 8, 2012
Date of Patent: February 21, 2017
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Bruce M. Fleischer, Thomas W. Fox, Hans M. Jacobson, Ravi Nair
-
Patent number: 9569211
Abstract: Embodiments relate to vector processor predication in an active memory device. An aspect includes a method for vector processor predication in an active memory device that includes memory and a processing element. The method includes decoding, in the processing element, an instruction including a plurality of sub-instructions to execute in parallel. One or more mask bits are accessed from a vector mask register in the processing element. The one or more mask bits are applied by the processing element to predicate operation of a unit in the processing element associated with at least one of the sub-instructions.
Type: Grant
Filed: August 3, 2012
Date of Patent: February 14, 2017
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Bruce M. Fleischer, Thomas W. Fox, Hans M. Jacobson, Ravi Nair
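Mask-based predication of one sub-instruction can be illustrated with a small sketch. It assumes one mask bit per vector lane and assumes that a masked-off lane leaves its destination value unchanged; both are common conventions rather than details stated in the abstract, and `predicated_add` is a hypothetical name for an arithmetic sub-instruction.

```python
def predicated_add(dest, a, b, mask_bits):
    """Lane-wise add gated by mask bits.

    Where a mask bit is 1 the lane computes a + b; where it is 0 the unit
    is predicated off and the destination lane keeps its old value.
    """
    return [x + y if m else d for d, x, y, m in zip(dest, a, b, mask_bits)]


vector_mask_register = [1, 0, 1, 0]
result = predicated_add([0, 0, 0, 0], [1, 2, 3, 4], [10, 20, 30, 40],
                        vector_mask_register)
print(result)
```

Predication of this kind lets a single instruction stream handle per-lane conditionals without branching.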
-
Patent number: 9535694
Abstract: Embodiments relate to vector processing in an active memory device. An aspect includes a system for vector processing in an active memory device. The system includes memory in the active memory device and a processing element in the active memory device. The processing element is configured to perform a method including decoding an instruction with a plurality of sub-instructions to execute in parallel. An iteration count to repeat execution of the sub-instructions in parallel is determined. Execution of the sub-instructions is repeated in parallel for multiple iterations, by the processing element, based on the iteration count. Multiple locations in the memory are accessed in parallel based on the execution of the sub-instructions.
Type: Grant
Filed: August 8, 2012
Date of Patent: January 3, 2017
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Bruce M. Fleischer, Thomas W. Fox, Hans M. Jacobson, Ravi Nair, Daniel A. Prener
-
Publication number: 20160364364
Abstract: Direct communication of data between processing elements is provided. An aspect includes sending, by a first processing element, data over an inter-processing element chaining bus. The data is destined for another processing element via a data exchange component that is coupled between the first processing element and a second processing element via a communication line disposed between corresponding multiplexors of the first processing element and the second processing element. A further aspect includes determining, by the data exchange component, whether the data has been received at the data exchange element. If so, an indicator is set in a register of the data exchange component and the data is forwarded to the other processing element. Setting the indicator causes the first processing element to stall. If the data has not been received, the other processing element is stalled while the data exchange component awaits receipt of the data.
Type: Application
Filed: November 23, 2015
Publication date: December 15, 2016
Inventors: Bruce M. Fleischer, Thomas W. Fox, Hans M. Jacobson, Ravi Nair
-
Publication number: 20160364352
Abstract: Direct communication of data between processing elements is provided. An aspect includes sending, by a first processing element, data over an inter-processing element chaining bus. The data is destined for another processing element via a data exchange component that is coupled between the first processing element and a second processing element via a communication line disposed between corresponding multiplexors of the first processing element and the second processing element. A further aspect includes determining, by the data exchange component, whether the data has been received at the data exchange element. If so, an indicator is set in a register of the data exchange component and the data is forwarded to the other processing element. Setting the indicator causes the first processing element to stall. If the data has not been received, the other processing element is stalled while the data exchange component awaits receipt of the data.
Type: Application
Filed: June 15, 2015
Publication date: December 15, 2016
Inventors: Bruce M. Fleischer, Thomas W. Fox, Hans M. Jacobson, Ravi Nair