Patents by Inventor John R. Nickolls

John R. Nickolls has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 7343472
    Abstract: A processor includes an instruction memory, arithmetic logic unit, finite field arithmetic unit, at least one digital storage device, and an instruction decoder. The instruction memory temporarily stores an instruction that includes at least one of: an operational code, destination information, and source information. The instruction decoder is operably coupled to interpret the instruction to identify the arithmetic logic unit and/or the finite field arithmetic unit to perform the operational code of the corresponding instruction. The instruction decoder then identifies at least one destination location within the digital storage device based on the destination information contained within the corresponding instruction. The instruction decoder then identifies at least one source location within the digital storage device based on the source information of the corresponding instruction.
    Type: Grant
    Filed: June 11, 2003
    Date of Patent: March 11, 2008
    Assignee: Broadcom Corporation
    Inventors: Joshua Porten, Won Kim, Scott D. Johnson, John R. Nickolls
  • Patent number: 7339592
    Abstract: An apparatus and method for simulating a multiported memory using lower port count memories as banks. A portion of memory is allocated for storing data associated with a thread. The portion of memory allocated to a thread may be stored in a single bank or in multiple banks. A collector unit coupled to each bank gathers source operands needed to process a program instruction as the source operands output from one or more banks. The collector unit outputs the source operands to an execution unit when all of the source operands needed to process the program instruction have been gathered.
    Type: Grant
    Filed: July 13, 2004
    Date of Patent: March 4, 2008
    Assignee: NVIDIA Corporation
    Inventors: John Erik Lindholm, Ming Y. Siu, Simon S. Moy, Samuel Liu, John R. Nickolls
  • Patent number: 7313583
    Abstract: A Galois field arithmetic unit includes a Galois field multiplier section and a Galois field adder section. The Galois field multiplier section includes a plurality of Galois field multiplier arrays that perform a Galois field multiplication by multiplying, in accordance with a generating polynomial, a 1st operand and a 2nd operand. The bit size of the 1st and 2nd operands corresponds to the bit size of a processor data path, where each of the Galois field multiplier arrays performs a portion of the Galois field multiplication by multiplying, in accordance with a corresponding portion of the generating polynomial, corresponding portions of the 1st and 2nd operands. The bit size of the corresponding portions of the 1st and 2nd operands corresponds to a symbol size of symbols of a coding scheme being implemented by the corresponding processor.
    Type: Grant
    Filed: June 12, 2003
    Date of Patent: December 25, 2007
    Assignee: Broadcom Corporation
    Inventors: Joshua Porten, Won Kim, Scott D. Johnson, John R. Nickolls
  • Patent number: 7027062
    Abstract: A graphics processing unit can queue a large number of texture requests to balance out the variability of texture requests without the need for a large texture request buffer. A dedicated texture request buffer queues the relatively small texture commands and parameters. Additionally, for each queued texture command, an associated set of texture arguments, which are typically much larger than the texture command, are stored in a general purpose register. The texture unit retrieves texture commands from the texture request buffer and then fetches the associated texture arguments from the appropriate general purpose register. The texture arguments may be stored in the general purpose register designated as the destination of the final texture value computed by the texture unit. Because the destination register must be allocated for the final texture value as texture commands are queued, storing the texture arguments in this register does not consume any additional registers.
    Type: Grant
    Filed: February 27, 2004
    Date of Patent: April 11, 2006
    Assignee: NVIDIA Corporation
    Inventors: John Erik Lindholm, John R. Nickolls, Simon S. Moy, Brett W. Coon
  • Patent number: 6976141
    Abstract: A memory management system provides the ability for multiple requesters to access blocks of memory in a pipelined manner. During a first clock, requests for one or more of the memory blocks are received by the system. A determination is made of whether one of the memory blocks is requested by one or more requests. If the same memory block is requested by two or more requests, the system performs a further determination of which of the requests will be provided to the memory block. The determined request is provided to the memory block on the first clock. During a second clock, the data of the determined request is latched to the memory block and a memory access is initiated. If the request is a write request, the data is written to the memory block. If the request is a read request, then the requested data is retrieved and, on a third clock, the data is driven onto a bus, routed to the determined requester, and available to be latched into the requester on the fourth clock.
    Type: Grant
    Filed: November 2, 2001
    Date of Patent: December 13, 2005
    Assignee: Broadcom Corporation
    Inventors: Lawrence J. Madar, III, John R. Nickolls, Ethan Mirsky
  • Patent number: 6959378
    Abstract: A reconfigurable processing system executes instructions and configurations in parallel. Initially, a first instruction loads configurations into configuration registers. The configuration field of a subsequently fetched instruction selects a configuration register. The instruction controls and controls of the configuration in the selected configuration register are decoded and modified as specified by the instruction. The controls provide data operands to the execution units which process the operands and generate results. Scalar data, vector data, or a combination of scalar and vector data can be processed. The processing is controlled by instructions executed in parallel with configurations invoked by configuration fields within the instructions. Vectors are processed using a vector register file which stores vectors. A vector address unit identifies addresses of vector elements in the vector register file to be processed.
    Type: Grant
    Filed: November 2, 2001
    Date of Patent: October 25, 2005
    Assignee: Broadcom Corporation
    Inventors: John R. Nickolls, Scott D. Johnson, Mark Williams, Ethan Mirsky, Kambdur Kirthiranjan, Amrit Raj Pant, Lawrence J. Madar, III
  • Patent number: 6879207
    Abstract: Circuits, methods, and apparatus for using redundant circuitry on integrated circuits in order to increase manufacturing yields. One exemplary embodiment of the present invention provides a circuit configuration wherein functional circuit blocks in a group of circuit blocks are selected by multiplexers. Multiplexers at the input and output of the group of circuit blocks steer input and output signals to and from functional circuit blocks, avoiding circuit blocks found to be defective or nonfunctional. Multiple groups of these circuit blocks may be arranged in series and in parallel. Alternate multiplexer configurations may be used in order to provide a higher level of redundancy. Other embodiments use all functional circuit blocks and sort integrated circuits based on the level of functionality or performance. Other embodiments provide methods of testing integrated circuits having one or more of these circuit configurations.
    Type: Grant
    Filed: December 18, 2003
    Date of Patent: April 12, 2005
    Assignee: NVIDIA Corporation
    Inventor: John R. Nickolls
  • Publication number: 20040078411
    Abstract: A Galois field arithmetic unit includes a Galois field multiplier section and a Galois field adder section. The Galois field multiplier section includes a plurality of Galois field multiplier arrays that perform a Galois field multiplication by multiplying, in accordance with a generating polynomial, a 1st operand and a 2nd operand. The bit size of the 1st and 2nd operands corresponds to the bit size of a processor data path, where each of the Galois field multiplier arrays performs a portion of the Galois field multiplication by multiplying, in accordance with a corresponding portion of the generating polynomial, corresponding portions of the 1st and 2nd operands. The bit size of the corresponding portions of the 1st and 2nd operands corresponds to a symbol size of symbols of a coding scheme being implemented by the corresponding processor.
    Type: Application
    Filed: June 12, 2003
    Publication date: April 22, 2004
    Inventors: Joshua Porten, Won Kim, Scott D. Johnson, John R. Nickolls
  • Publication number: 20040078410
    Abstract: A Galois field multiplier array includes a 1st register, a 2nd register, a 3rd register, and a plurality of multiplier cells. The 1st register stores bits of a 1st operand. The 2nd register stores bits of a 2nd operand. The 3rd register stores bits of a generating polynomial that corresponds to one of a plurality of applications (e.g., FEC, CRC, Reed Solomon, et cetera). The plurality of multiplier cells is arranged in rows and columns. Each of the multiplier cells outputs a sum and a product and each cell includes five inputs. The 1st input receives a preceding cell's multiply output, the 2nd input receives at least one bit of the 2nd operand, the 3rd input receives a preceding cell's sum output, a 4th input receives at least one bit of the generating polynomial, and the 5th input receives a feedback term from a preceding cell in a preceding row. The multiplier cells in the 1st row have the 1st input, 3rd input, and 5th input set to corresponding initialization values in accordance with the 2nd operand.
    Type: Application
    Filed: June 12, 2003
    Publication date: April 22, 2004
    Inventors: Joshua Porten, Won Kim, Scott D. Johnson, John R. Nickolls
  • Publication number: 20040078555
    Abstract: A processor includes an instruction memory, arithmetic logic unit, finite field arithmetic unit, at least one digital storage device, and an instruction decoder. The instruction memory temporarily stores an instruction that includes at least one of: an operational code, destination information, and source information. The instruction decoder is operably coupled to interpret the instruction to identify the arithmetic logic unit and/or the finite field arithmetic unit to perform the operational code of the corresponding instruction. The instruction decoder then identifies at least one destination location within the digital storage device based on the destination information contained within the corresponding instruction. The instruction decoder then identifies at least one source location within the digital storage device based on the source information of the corresponding instruction.
    Type: Application
    Filed: June 11, 2003
    Publication date: April 22, 2004
    Inventors: Joshua Porten, Won Kim, Scott D. Johnson, John R. Nickolls
  • Publication number: 20020087846
    Abstract: A reconfigurable processing system executes instructions and configurations in parallel. Initially, a first instruction loads configurations into configuration registers. The configuration field of a subsequently fetched instruction selects a configuration register. The instruction controls and controls of the configuration in the selected configuration register are decoded and modified as specified by the instruction. The controls provide data operands to the execution units which process the operands and generate results. Scalar data, vector data, or a combination of scalar and vector data can be processed. The processing is controlled by instructions executed in parallel with configurations invoked by configuration fields within the instructions. Vectors are processed using a vector register file which stores vectors. A vector address unit identifies addresses of vector elements in the vector register file to be processed.
    Type: Application
    Filed: November 2, 2001
    Publication date: July 4, 2002
    Inventors: John R. Nickolls, Scott D. Johnson, Mark Williams, Ethan Mirsky, Kambdur Kirthiranjan, Amrit Raj Pant, Lawrence J. Madar
  • Publication number: 20020056032
    Abstract: A memory management system provides the ability for multiple requesters to access blocks of memory in a pipelined manner. During a first clock, requests for one or more of the memory blocks are received by the system. A determination is made of whether one of the memory blocks is requested by one or more requests. If the same memory block is requested by two or more requests, the system performs a further determination of which of the requests will be provided to the memory block. The determined request is provided to the memory block on the first clock. During a second clock, the data of the determined request is latched to the memory block and a memory access is initiated. If the request is a write request, the data is written to the memory block. If the request is a read request, then the requested data is retrieved and, on a third clock, the data is driven onto a bus, routed to the determined requester, and available to be latched into the requester on the fourth clock.
    Type: Application
    Filed: November 2, 2001
    Publication date: May 9, 2002
    Inventors: Lawrence J. Madar, John R. Nickolls, Ethan Mirsky
  • Patent number: 5598408
    Abstract: A massively parallel computer system is disclosed having a global router network in which pipeline registers are spatially distributed to increase the messaging speed of the global router network. The global router network includes an expansion tap for processor to I/O messaging so that I/O messaging bandwidth matches interprocessor messaging bandwidth. A route-opening message packet includes protocol bits which are treated homogeneously with steering bits. The route-opening packet further includes redundant address bits for imparting a multiple-crossbars personality to router chips within the global router network. A structure and method for spatially supporting the processors of the massively parallel system and the global router network are also disclosed.
    Type: Grant
    Filed: January 14, 1994
    Date of Patent: January 28, 1997
    Assignee: MasPar Computer Corporation
    Inventors: John R. Nickolls, John Zapisek, Won S. Kim, Jeffrey C. Kalb, W. Thomas Blank, Eliot Wegbreit, Kevin Van Horn
  • Patent number: 5581777
    Abstract: A massively parallel processor is provided with a plurality of clusters. Each cluster includes a plurality of processor elements ("PEs") and a cluster memory. Each PE of the cluster has associated with it an address register, a stage register, an error register, a PE enable flag, a memory flag, and a grant request flag. A cluster data bus and an error bus connect each of the stage registers and error registers of the cluster to the memory. The grant request flags of the cluster are interconnected by a polling network, which polls only one of the grant request flags at a time. In response to a signal on the polling network and the state of the associated memory flag, the grant request flag determines an I/O operation between the associated data register and the cluster memory over the cluster data bus. Both data and error bits are associated with respective processor elements. The sequential memory operations proceed in parallel with parallel processor operations.
    Type: Grant
    Filed: March 3, 1995
    Date of Patent: December 3, 1996
    Assignee: MasPar Computer Corporation
    Inventors: Won S. Kim, David M. Bulfer, John R. Nickolls, W. Thomas Blank, Hannes Figel
  • Patent number: 5542074
    Abstract: A parallel processor system which operates in a single-instruction multiple-data mode has a highly flexible local control capability for enabling the system to operate fast. The system contains an array of processing elements or PEs (12_1 to 12_N) that process respective sets of data according to instructions supplied from a global control unit (20). Each instruction is furnished simultaneously to all the PEs. One local control feature (52) entails selectively inverting certain instruction signals according to a data-dependent signal. Another local control feature (48) involves controlling the amount of a bit shift in a barrel shifter (98) according to a data-dependent signal.
    Type: Grant
    Filed: October 22, 1992
    Date of Patent: July 30, 1996
    Assignee: MasPar Computer Corporation
    Inventors: Won S. Kim, John R. Nickolls
  • Patent number: 5488694
    Abstract: To effect a block data transfer between a plurality of physical I/O devices coupled through interfaces to an I/O channel ("IOC") bus, a source logical device is established by programmably assigning to each of the physical device interfaces a logical device identifier, a leaf identifier determining when the physical device participates relative to the first data transfer in the block data transfer, a burst count specifying the number of consecutive transfers for which the physical device is responsible when its interleave period arrives, and an interleave factor identifying how often the physical device participates in the block data transfer. A destination logical device is similarly established. The source and destination logical devices are then activated to accomplish a block transfer of data between them.
    Type: Grant
    Filed: August 28, 1992
    Date of Patent: January 30, 1996
    Assignee: MasPar Computer Company
    Inventors: Mark P. McKee, John Zapisek, David M. Bulfer, John M. Long, John R. Nickolls, William T. Blank
  • Patent number: 5280474
    Abstract: A massively parallel computer system is disclosed having a global router network in which pipeline registers are spatially distributed to increase the messaging speed of the global router network. The global router network includes an expansion tap for processor to I/O messaging so that I/O messaging bandwidth matches interprocessor messaging bandwidth. A route-opening message packet includes protocol bits which are treated homogeneously with steering bits. The route-opening packet further includes redundant address bits for imparting a multiple-crossbars personality to router chips within the global router network. A structure and method for spatially supporting the processors of the massively parallel system and the global router network are also disclosed.
    Type: Grant
    Filed: January 5, 1990
    Date of Patent: January 18, 1994
    Assignee: MasPar Computer Corporation
    Inventors: John R. Nickolls, John Zapisek, Won S. Kim, Jeffery C. Kalb, W. Thomas Blank, Eliot Wegbreit, Kevin Van Horn
  • Patent number: 5243699
    Abstract: A massively parallel processor includes an array of processor elements (20), or PEs, and a multi-stage router interconnection network (30), which is used both for I/O communications and for PE to PE communications. The I/O system (10) for the massively parallel processor is based on a globally shared addressable I/O RAM buffer memory (50) that has address and data buses (52) to the I/O devices (80, 82) and other address and data buses (42) which are coupled to a router I/O element array (40). The router I/O element array is in turn coupled to the router ports (e.g. S2_0_X0) of the second stage (430) of the router interconnection network. The router I/O array provides the corner turn conversion between the massive array of router lines (32) and the relatively few buses (52) to the I/O devices.
    Type: Grant
    Filed: December 6, 1991
    Date of Patent: September 7, 1993
    Assignee: MasPar Computer Corporation
    Inventors: John R. Nickolls, Won S. Kim, John Zapisek, William T. Blank
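
Illustrative note on patent 7313583 (Galois field arithmetic unit): the abstract describes multiplying two operands in accordance with a generating polynomial, with the data-path-wide operands split into symbol-sized portions that separate multiplier arrays handle. The Python sketch below shows the general idea in software only. It is not the patented circuit: the function names, the 8-bit symbol size, the 32-bit data path, and the use of one generating polynomial for every lane are all assumptions chosen for illustration.

```python
def gf_multiply(a: int, b: int, poly: int, m: int = 8) -> int:
    """Multiply a and b in GF(2^m) by shift-and-add.

    `poly` is the generating polynomial including its x^m term
    (for example 0x11B for m = 8). Hypothetical helper, not from the patent.
    """
    result = 0
    for _ in range(m):
        if b & 1:
            result ^= a      # addition in GF(2^m) is bitwise XOR
        b >>= 1
        a <<= 1              # multiply a by x
        if a & (1 << m):
            a ^= poly        # reduce modulo the generating polynomial
    return result


def gf_multiply_packed(x: int, y: int, poly: int,
                       m: int = 8, width: int = 32) -> int:
    """Multiply symbol-sized lanes of two data-path-wide words independently.

    Assumes `width` is a multiple of `m` and that every lane uses the
    same generating polynomial.
    """
    mask = (1 << m) - 1
    out = 0
    for shift in range(0, width, m):
        lane = gf_multiply((x >> shift) & mask, (y >> shift) & mask, poly, m)
        out |= lane << shift
    return out
```

For example, with the polynomial 0x11B, gf_multiply(0x57, 0x83, 0x11B) gives 0xC1, and gf_multiply_packed applies the same operation to each of the four 8-bit lanes of a 32-bit word. A hardware array would compute the lanes concurrently; the loop here only models the result.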