Patents by Inventor John R. Nickolls
John R. Nickolls has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 7343472Abstract: A processor includes an instruction memory, arithmetic logic unit, finite field arithmetic unit, at least one digital storage device, and an instruction decoder. The instruction memory temporarily stores an instruction that includes at least one of: an operational code, destination information, and source information. The instruction decoder is operably coupled to interpret the instruction to identify the arithmetic logic unit and/or the finite field arithmetic unit to perform the operational code of the corresponding instruction. The instruction decoder then identifies at least one destination location within the digital storage device based on the destination information contained within the corresponding instruction. The instruction decoder then identifies at least one source location within the digital storage device based on the source information of the corresponding instruction.Type: GrantFiled: June 11, 2003Date of Patent: March 11, 2008Assignee: Broadcom CorporationInventors: Joshua Porten, Won Kim, Scott D. Johnson, John R. Nickolls
-
Patent number: 7339592Abstract: An apparatus and method for simulating a multiported memory using lower port count memories as banks. A portion of memory is allocated for storing data associated with a thread. The portion of memory allocated to a thread may be stored in a single bank or in multiple banks. A collector unit coupled to each bank gathers source operands needed to process a program instruction as the source operands output from one or more banks. The collector unit outputs the source operands to an execution unit when all of the source operands needed to process the program instruction have been gathered.Type: GrantFiled: July 13, 2004Date of Patent: March 4, 2008Assignee: NVIDIA CorporationInventors: John Erik Lindholm, Ming Y. Siu, Simon S. Moy, Samuel Liu, John R. Nickolls
-
Patent number: 7313583Abstract: A Galois field arithmetic unit includes a Galois field multiplier section and a Galois field adder section. The Galois field multiplier section includes a plurality of Galois field multiplier arrays that perform a Galois field multiplication by multiplying, in accordance with a generating polynomial, a 1st operand and a 2nd operand. The bit size of the 1st and 2nd operands correspond to the bit size of a processor data path, where each of the Galois field multiplier arrays performs a portion of the Galois field multiplication by multiplying, in accordance with a corresponding portion of the generating polynomial, corresponding portions of the 1st and 2nd operands. The bit size of the corresponding portions of the 1st and 2nd operands corresponds to a symbol size of symbols of a coding scheme being implemented by the corresponding processor.Type: GrantFiled: June 12, 2003Date of Patent: December 25, 2007Assignee: Broadcom CorporationInventors: Joshua Porten, Won Kim, Scott D. Johnson, John R. Nickolls
-
Patent number: 7027062Abstract: A graphics processing unit can queue a large number of texture requests to balance out the variability of texture requests without the need for a large texture request buffer. A dedicated texture request buffer queues the relatively small texture commands and parameters. Additionally, for each queued texture command, an associated set of texture arguments, which are typically much larger than the texture command, are stored in a general purpose register. The texture unit retrieves texture commands from the texture request buffer and then fetches the associated texture arguments from the appropriate general purpose register. The texture arguments may be stored in the general purpose register designated as the destination of the final texture value computed by the texture unit. Because the destination register must be allocated for the final texture value as texture commands are queued, storing the texture arguments in this register does not consume any additional registers.Type: GrantFiled: February 27, 2004Date of Patent: April 11, 2006Assignee: NVIDIA CorporationInventors: John Erik Lindholm, John R. Nickolls, Simon S. Moy, Brett W. Coon
-
Patent number: 6976141Abstract: A memory management system provides the ability for multiple requesters to access blocks of memory in a pipelined manner. During a first clock, requests for one or more of the memory blocks are received by the system. A determination is made of whether one of the memory blocks is requested by one or more requests. If the same memory block is requested by two or more requests, the system performs a further determination of which of the requests will be provided to the memory block. The determined request is provided to the memory block on the first clock. During a second clock, the data of the determined request is latched to the memory block and a memory access is initiated. If the request is a write request, the data is written to the memory block. If the request is a read request, then the requested data is retrieved and, on a third clock, the data is driven onto a bus, routed to the determined requester, and available to be latched into the requester on the fourth clock.Type: GrantFiled: November 2, 2001Date of Patent: December 13, 2005Assignee: Broadcom CorporationInventors: Lawrence J. Madar, III, John R. Nickolls, Ethan Mirsky
-
Patent number: 6959378Abstract: A reconfigurable processing system executes instructions and configurations in parallel. Initially, a first instruction loads configurations into configuration registers. The configuration field of a subsequently fetched instruction selects a configuration register. The instruction controls and controls of the configuration in the selected configuration register are decoded and modified as specified by the instruction. The controls provide data operands to the execution units which process the operands and generate results. Scalar data, vector data, or a combination of scalar and vector data can be processed. The processing is controlled by instructions executed in parallel with configurations invoked by configuration fields within the instructions. Vectors are processed using a vector register file which stores vectors. A vector address unit identifies addresses of vector elements in the vector register file to be processed.Type: GrantFiled: November 2, 2001Date of Patent: October 25, 2005Assignee: Broadcom CorporationInventors: John R. Nickolls, Scott D. Johnson, Mark Williams, Ethan Mirsky, Kambdur Kirthiranjan, Amrit Raj Pant, Lawrence J. Madar, III
-
Patent number: 6879207Abstract: Circuits, methods, and apparatus for using redundant circuitry on integrated circuits in order to increase manufacturing yields. One exemplary embodiment of the present invention provides a circuit configuration wherein functional circuit blocks in a group of circuit blocks are selected by multiplexers. Multiplexers at the input and output of the group of circuit blocks steer input and output signals to and from functional circuit blocks, avoiding circuit blocks found to be defective or nonfunctional. Multiple groups of these circuit blocks may be arranged in series and in parallel. Alternate multiplexer configurations may be used in order to provide a higher level of redundancy. Other embodiments use all functional circuit blocks and sort integrated circuits based on the level of functionality or performance. Other embodiments provide methods of testing integrated circuits having one or more of these circuit configurations.Type: GrantFiled: December 18, 2003Date of Patent: April 12, 2005Assignee: NVIDIA CorporationInventor: John R. Nickolls
-
Publication number: 20040078411Abstract: A Galois field arithmetic unit includes a Galois field multiplier section and a Galois field adder section. The Galois field multiplier section includes a plurality of Galois field multiplier arrays that perform a Galois field multiplication by multiplying, in accordance with a generating polynomial, a 1st operand and a 2nd operand. The bit size of the 1st and 2nd operands correspond to the bit size of a processor data path, where each of the Galois field multiplier arrays performs a portion of the Galois field multiplication by multiplying, in accordance with a corresponding portion of the generating polynomial, corresponding portions of the 1st and 2nd operands. The bit size of the corresponding portions of the 1st and 2nd operands corresponds to a symbol size of symbols of a coding scheme being implemented by the corresponding processor.Type: ApplicationFiled: June 12, 2003Publication date: April 22, 2004Inventors: Joshua Porten, Won Kim, Scott D. Johnson, John R. Nickolls
-
Publication number: 20040078410Abstract: A Galois field multiplier array includes a 1st register, a 2nd register, a 3rd register, and a plurality of multiplier cells. The 1st register stores bits of a 1st operand. The 2nd register stores bits of a 2nd operand. The 3rd register stores bits of a generating polynomial that corresponds to one of a plurality of applications (e.g., FEC, CRC, Reed Solomon, et cetera). The plurality of multiplier cells is arranged in rows and columns. Each of the multiplier cells outputs a sum and a product and each cell includes five inputs. The 1st input receives a preceding cell's multiply output, the 2nd input receives at least one bit of the 2nd operand, the 3rd input receives a preceding cell's sum output, a 4th input receives at least one bit of the generating polynomial, and the 5th input receives a feedback term from a preceding cell in a preceding row. The multiplier cells in the 1st row have the 1st input, 3rd input, and 5th input set to corresponding initialization values in accordance with the 2nd operand.Type: ApplicationFiled: June 12, 2003Publication date: April 22, 2004Inventors: Joshua Porten, Won Kim, Scott D. Johnson, John R. Nickolls
-
Publication number: 20040078555Abstract: A processor includes an instruction memory, arithmetic logic unit, finite field arithmetic unit, at least one digital storage device, and an instruction decoder. The instruction memory temporarily stores an instruction that includes at least one of: an operational code, destination information, and source information. The instruction decoder is operably coupled to interpret the instruction to identify the arithmetic logic unit and/or the finite field arithmetic unit to perform the operational code of the corresponding instruction. The instruction decoder then identifies at least one destination location within the digital storage device based on the destination information contained within the corresponding instruction. The instruction decoder then identifies at least one source location within the digital storage device based on the source information of the corresponding instruction.Type: ApplicationFiled: June 11, 2003Publication date: April 22, 2004Inventors: Joshua Porten, Won Kim, Scott D. Johnson, John R. Nickolls
-
Publication number: 20020087846Abstract: A reconfigurable processing system executes instructions and configurations in parallel. Initially, a first instruction loads configurations into configuration registers. The configuration field of a subsequently fetched instruction selects a configuration register. The instruction controls and controls of the configuration in the selected configuration register are decoded and modified as specified by the instruction. The controls provide data operands to the execution units which process the operands and generate results. Scalar data, vector data, or a combination of scalar and vector data can be processed. The processing is controlled by instructions executed in parallel with configurations invoked by configuration fields within the instructions. Vectors are processed using a vector register file which stores vectors. A vector address unit identifies addresses of vector elements in the vector register file to be processed.Type: ApplicationFiled: November 2, 2001Publication date: July 4, 2002Inventors: John R. Nickolls, Scott D. Johnson, Mark Williams, Ethan Mirsky, Kambdur Kirthiranjan, Amrit Raj Pant, Lawrence J. Madar
-
Publication number: 20020056032Abstract: A memory management system provides the ability for multiple requesters to access blocks of memory in a pipelined manner. During a first clock, requests for one or more of the memory blocks are received by the system. A determination is made of whether one of the memory blocks is requested by one or more requests. If the same memory block is requested by two or more requests, the system performs a further determination of which of the requests will be provided to the memory block. The determined request is provided to the memory block on the first clock. During a second clock, the data of the determined request is latched to the memory block and a memory access is initiated. If the request is a write request, the data is written to the memory block. If the request is a read request, then the requested data is retrieved and, on a third clock, the data is driven onto a bus, routed to the determined requester, and available to be latched into the requester on the fourth clock.Type: ApplicationFiled: November 2, 2001Publication date: May 9, 2002Inventors: Lawrence J. Madar, John R. Nickolls, Ethan Mirsky
-
Patent number: 5598408Abstract: A massively parallel computer system is disclosed having a global router network in which pipeline registers are spatially distributed to increase the messaging speed of the global router network. The global router network includes an expansion tap for processor to I/O messaging so that I/O messaging bandwidth matches interprocessor messaging bandwidth. A route-opening message packet includes protocol bits which are treated homogeneously with steering bits. The route-opening packet further includes redundant address bits for imparting a multiple-crossbars personality to router chips within the global router network. A structure and method for spatially supporting the processors of the massively parallel system and the global router network are also disclosed.Type: GrantFiled: January 14, 1994Date of Patent: January 28, 1997Assignee: MasPar Computer CorporationInventors: John R. Nickolls, John Zapisek, Won S. Kim, Jeffrey C. Kalb, W. Thomas Blank, Eliot Wegbreit, Kevin Van Horn
-
Patent number: 5581777Abstract: A massively parallel processor is provided with a plurality of clusters. Each cluster includes a plurality of processor elements ("PEs") and a cluster memory. Each PE of the cluster has associated with it an address register, a stage register, an error register, a PE enable flag, a memory flag, and a grant request flag. A cluster data bus and an error bus connects each of the stage registers and error registers of the cluster to the memory. The grant request flags of the cluster are interconnected by a polling network, which polls only one of the grant request flags at a time. In response to a signal on the polling network and the state of the associated memory flag, the grant request flag determines an I/O operation between the associated data register and the cluster memory over the cluster data bus. Both data and error bits are associated with respective processor elements. The sequential memory operations proceed in parallel with parallel processor operations.Type: GrantFiled: March 3, 1995Date of Patent: December 3, 1996Assignee: MasPar Computer CorporationInventors: Won S. Kim, David M. Bulfer, John R. Nickolls, W. Thomas Blank, Hannes Figel
-
Patent number: 5542074Abstract: A parallel processor system which operates in a single-instruction multiple-data mode has a highly flexible local control capability for enabling the system to operate fast. The system contains an array of processing elements or PEs (12.sub.1 -12.sub.N) that process respective sets of data according to instructions supplied from a global control unit (20). Each instruction is furnished simultaneously to all the PEs. One local control feature (52) entails selectively inverting certain instruction signals according to a data-dependent signal. Another local control feature (48) involves controlling the amount of a bit shift in a barrel shifter (98) according to a data-dependent signal.Type: GrantFiled: October 22, 1992Date of Patent: July 30, 1996Assignee: MasPar Computer CorporationInventors: Won S. Kim, John R. Nickolls
-
Patent number: 5488694Abstract: To effect a block data transfer between a plurality of physical I/O devices coupled through interfaces to an I/O channel ("IOC") bus, a source logical device is established by programmably assigning to each of the physical device interfaces a logical device identifier, a leaf identifier determining when the physical device participates relative to the first data transfer in the block data transfer, a burst count specifying the number of consecutive transfers for which the physical device is responsible when its interleave period arrives, and an interleave factor identifying how often the physical device participates in the block data transfer. A destination logical device is similarly established. The source and logical devices are then activated to accomplish a block transfer of data between them.Type: GrantFiled: August 28, 1992Date of Patent: January 30, 1996Assignee: MasPar Computer CompanyInventors: Mark P. McKee, John Zapisek, David M. Bulfer, John M. Long, John R. Nickolls, William T. Blank
-
Patent number: 5280474Abstract: A massively parallel computer system is disclosed having a global router network in which pipeline registers are spatially distributed to increase the messaging speed of the global router network. The global router network includes an expansion tap for processor to I/O messaging so that I/O messaging bandwidth matches interprocessor messaging bandwidth. A route-opening message packet includes protocol bits which are treated homogeneously with steering bits. The route-opening packet further includes redundant address bits for imparting a multiple-crossbars personality to router chips within the global router network. A structure and method for spatially supporting the processors of the massively parallel system and the global router network are also disclosed.Type: GrantFiled: January 5, 1990Date of Patent: January 18, 1994Assignee: Maspar Computer CorporationInventors: John R. Nickolls, John Zapisek, Won S. Kim, Jeffery C. Kalb, W. Thomas Blank, Eliot Wegbreit, Kevin Van Horn
-
Patent number: 5243699Abstract: A massively parallel processor includes an array of processor elements (20), of PEs, and a multi-stage router interconnection network (30), which is used both for I/O communications and for PE to PE communications. The I/O system (10) for the massively parallel processor is based on a globally shared addressable I/O RAM buffer memory (50) that has address and data buses (52) to the I/O devices (80, 82) and other address and data buses (42) which are coupled to a router I/O element array (40). The router I/O element array is in turn coupled to the router ports (e.g. S2.sub.-- 0.sub.-- X0) of the second stage (430) of the router interconnection network. The router I/O array provides the corner turn conversion between the massive array of router lines (32) and the relatively few buses (52) to the I/O devices.Type: GrantFiled: December 6, 1991Date of Patent: September 7, 1993Assignee: MasPar Computer CorporationInventors: John R. Nickolls, Won S. Kim, John Zapisek, William T. Blank