Method and system for efficiently loading primitives into processors of a graphics system
A method and system for more efficiently loading a plurality of primitives for a scene into processors of a computer graphics system is disclosed. Each primitive has a top and a bottom. The primitives are ordered based on the top of each primitive. The system and method include providing at least one input, a merge circuit, a distributor, a feedback circuit and a controller. The at least one input receives data relating to each primitive. The merge circuit is coupled with the input(s) and adds the data for a primitive having a top not lower than a current line. The distributor is coupled with the feedback circuit, eliminates an expired primitive and outputs the data for remaining primitives after the expired primitive has been removed. The expired primitive has a bottom above the current line. The feedback circuit is coupled to the merge circuit and the distributor and re-inputs to the merge circuit the data for the remaining primitives. The controller controls the feedback circuit, the distributor and the merge circuit.
The present invention relates to computer graphics systems, and more particularly to a method and system for more efficiently loading primitives into processors of a computer graphics system.
BACKGROUND OF THE INVENTION
A conventional computer graphics system can display graphical images of objects on a display. The display includes a plurality of display elements, known as pixels, typically arranged in a grid. In order to display objects, the conventional computer graphics system typically breaks each object into a plurality of polygons, termed primitives. A conventional system then renders the primitives in a particular order.
Some computer graphics systems are capable of rendering the primitives in raster order. Such a system is described in U.S. Pat. No. ______, entitled “______” and assigned to the assignee of the present application. In such a system, all of the primitives intersecting a particular pixel are rendered for that pixel. The primitives intersecting the next pixel in the line are then rendered. Typically, this process proceeds from left to right until the line has been rendered, then recommences on the next line. The frame is rendered line by line until it has been completed.
In order to render the frame, the primitives are loaded into processors. Typically, all of the primitives starting at a particular line are loaded into the processors at the start of that line. After the line has been processed, primitives that have expired are ejected. An expired primitive is one that cannot be present on the next line. In other words, an expired primitive has a bottom that is no lower than the line that was just processed. Any new primitives for the next line are loaded at the start of the next line, and that line is then processed as described above. This procedure continues until the frame is rendered.
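The conventional per-line procedure can be modeled in a few lines of Python. This is a hypothetical sketch, not the referenced system: the `Primitive` record, `conventional_schedule` and its parameters are illustrative names, lines are assumed to be numbered downward from 0, every `y_top` is assumed to lie in `0..num_lines-1`, and `y_bot` is taken to be the last line a primitive covers. The `num_processors` check reflects that each loaded primitive occupies a processor.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Primitive:
    index: int  # identifier of the primitive
    y_top: int  # first line covered (lines numbered downward from 0)
    y_bot: int  # last line covered


def conventional_schedule(primitives, num_lines, num_processors):
    """Hypothetical model of the conventional scheme: returns the loaded indices per line."""
    pending = sorted(primitives, key=lambda p: p.y_top)
    loaded = []  # primitives currently held by processors
    per_line = []
    for line in range(num_lines):
        # Start of line: load every primitive whose top is on this line.
        while pending and pending[0].y_top == line:
            if len(loaded) == num_processors:
                raise RuntimeError(f"line {line}: more overlapping primitives than processors")
            loaded.append(pending.pop(0))
        per_line.append([p.index for p in loaded])
        # End of line: eject expired primitives (bottom no lower than this line).
        loaded = [p for p in loaded if p.y_bot > line]
    return per_line
```

With the three primitives `(0, 0, 1)`, `(1, 1, 2)` and `(2, 1, 1)` and four processors, the sketch returns `[[0], [0, 1, 2], [1]]`; with only two processors it raises an error on line 1, where three primitives overlap.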
Although this system is capable of rendering primitives in raster order, one of ordinary skill in the art will readily recognize that loading primitives and ejecting expired primitives each consume time and resources. In addition, in a complex scene, many primitives might expire at the end of a particular line and a large number of primitives might start at the next line. Ejecting the expired primitives and loading the new primitives might then cause a significant delay in the pipeline. Furthermore, the primitives are all loaded into and processed by the processors. Thus, the number of primitives that can be processed at a particular pixel is limited by the number of processors in the system. Typically, the number of processors is on the order of sixteen or thirty-two, so the number of overlapping primitives that can be processed at a particular pixel is limited to sixteen or thirty-two. The complexity of the frame is thereby limited. This limitation can be eased by increasing the number of processors. However, additional processors increase the space consumed by the graphics system, which is undesirable.
Accordingly, what is needed is a system and method for more efficiently loading primitives into the processors. The present invention addresses such a need.
SUMMARY OF THE INVENTION
The present invention provides a method and system for more efficiently loading a plurality of primitives for a scene into a plurality of processors of a computer graphics system. Each of the plurality of primitives has a top and a bottom. The plurality of primitives is ordered based on the top of each of the plurality of primitives. The system and method comprise providing at least one input, a merge circuit, a distributor, a feedback circuit and a controller. The at least one input is for receiving data relating to each of the plurality of primitives. The merge circuit is coupled with the at least one input and is for adding the data for a primitive having a top that is not lower than a current line. The distributor is coupled with the feedback circuit. The distributor eliminates an expired primitive and outputs the data for a remaining portion of the primitives after the expired primitive has been removed. The expired primitive has a bottom that is above the current line. The feedback circuit is coupled to the merge circuit and the distributor and re-inputs to the merge circuit the data for the remaining portion of the plurality of primitives. The controller controls the feedback circuit, the distributor and the merge circuit.
According to the system and method disclosed herein, the present invention provides a more efficient mechanism for loading primitives.
BRIEF DESCRIPTION OF THE DRAWINGS
The present invention relates to an improvement in computer graphics systems. The following description is presented to enable one of ordinary skill in the art to make and use the invention and is provided in the context of a patent application and its requirements. Various modifications to the preferred embodiment will be readily apparent to those skilled in the art and the generic principles herein may be applied to other embodiments. Thus, the present invention is not intended to be limited to the embodiment shown, but is to be accorded the widest scope consistent with the principles and features described herein.
Referring to
Although the method and system shown in
The present invention will be described in terms of a particular computer system, a particular computer graphics system and a particular set of processors. However, one of ordinary skill in the art will readily recognize that this method and system will operate effectively for other computer systems, other computer graphics systems, and other numbers of processors.
To more particularly illustrate the method and system in accordance with the present invention, refer now to
The y-loop merge 140 is used to merge data for new primitives with data for primitives that have been fed back through the feedback FIFO 170, as discussed below. The y-loop merge includes a compare block 142 and a merge block 144. The primitives input to the y-loop merge 140 are ordered based on y-values, or height in the frame. Thus, the primitives input to the y-loop merge 140 are preferably ordered based on the position of their top, shown as y-top 132 in
The distributor 150 is coupled to and receives data from the y-loop merge 140. The data received includes the y-bot 134, the index 136, the primitive type 137 and the top-bot-is-left 138. The distributor 150 includes a compare block 152 and a distribute block 154. The distributor 150 evicts primitives that have expired and distributes data for primitives that have not expired. To do so, the distributor 150 uses the compare block 152 to compare the bottom of each primitive with the current line and provides an evict signal 156 to the distribute block 154. The evict signal 156 indicates whether a particular primitive is to be evicted: if the bottom of the primitive, the y-bot 134, is less than the current line, then the primitive is evicted.
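The compare-and-evict test performed by the distributor 150 can be sketched as follows. The `LoopEntry` fields mirror the data listed above, but the field names, the integer line numbering (increasing downward) and the `distribute` helper are assumptions made for illustration; the primitive type and the top-bot-is-left flag are simply carried through.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoopEntry:
    """Data the distributor receives per primitive (hypothetical field names)."""
    y_bot: int             # bottom line of the primitive (y-bot 134)
    index: int             # identifier sent to the processors (index 136)
    prim_type: int         # primitive type (137); opaque in this sketch
    top_bot_is_left: bool  # top-bot-is-left (138); passed through unchanged


def distribute(entries, current_line):
    """Split entries into (kept, evicted) using the test y_bot < current_line."""
    kept, evicted = [], []
    for e in entries:
        evict = e.y_bot < current_line  # plays the role of the evict signal 156
        (evicted if evict else kept).append(e)
    return kept, evicted
```

Only the kept entries would be output and fed back; the order of the input list is preserved in both result lists.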
The distribute block 154 provides data for primitives that are not evicted to two components. First, the distribute block 154 outputs the index 136 to the processors 120 (not shown in
The distribute block 154 also feeds data back to the feedback FIFO 170, as well as providing the data to the control 160 through lines 182. The feedback FIFO 170 is thus coupled both to the output of the distributor 150 and to the input of the y-loop merge 140. Because it preserves the order of the data provided to it, the feedback FIFO 170 retains the top-to-bottom ordering of the primitives. In addition, the feedback FIFO 170 feeds data for primitives that have not expired back to the input of the y-loop merge 140.
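Since the method description refers to read and write addresses for the feedback FIFO 170, one plausible model is a fixed-size circular buffer. The sketch below assumes that structure; the `FeedbackFifo` class, its capacity and its method names are hypothetical. It shows why order is preserved: entries leave from the read address in exactly the order they were written at the write address, including after the addresses wrap around.

```python
class FeedbackFifo:
    """Fixed-capacity circular buffer with separate read and write addresses (hypothetical)."""

    def __init__(self, capacity):
        self._buf = [None] * capacity
        self._read = 0   # address of the oldest entry
        self._write = 0  # address where the next entry is stored
        self._count = 0

    def __len__(self):
        return self._count

    def push(self, item):
        if self._count == len(self._buf):
            raise OverflowError("feedback FIFO full")
        self._buf[self._write] = item
        self._write = (self._write + 1) % len(self._buf)
        self._count += 1

    def pop(self):
        if self._count == 0:
            raise IndexError("feedback FIFO empty")
        item = self._buf[self._read]
        self._buf[self._read] = None
        self._read = (self._read + 1) % len(self._buf)
        self._count -= 1
        return item

    def reset(self):
        """Point both addresses at the start of the buffer, as in step 222."""
        self._buf = [None] * len(self._buf)
        self._read = self._write = 0
        self._count = 0
```

The capacity of such a buffer, rather than the number of processors, bounds how many live primitives can circulate.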
Using the y-loop merge 140, it is determined whether any new primitives start on the current line, via step 202. Step 202 is preferably performed by determining whether the top of the primitive, as given by the y-top 132, is less than or equal to the current line 168. If any new primitives commence on the current line, then their data are merged with the data for primitives fed back through the feedback FIFO 170 using the merge block 144, via step 204. Using the distributor 150, it is determined whether any of the primitives have expired, via step 206. Step 206 is preferably performed by determining whether the bottom of the primitive, as given by the y-bot 134, is less than the current line 168. If so, then the expired primitives are ejected, via step 208. If no primitives have expired, or once the expired primitives have been ejected, the remaining primitives are output to the processors 120 and fed back to the feedback FIFO 170, via step 210. The method 200 may then repeat until the frame has been rendered. Thus, a primitive that contributes to the frame for a particular number of lines is looped through the feedback FIFO 170 that number of times, and need not be reloaded into the y-loop 130.
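A line-at-a-time software model of the method 200 is sketched below. It is an illustration under stated assumptions, not the hardware: a `collections.deque` stands in for the feedback FIFO 170, `y_loop` and `Primitive` are hypothetical names, lines are numbered downward from 0, and fed-back primitives are placed ahead of new ones so the top-to-bottom order is kept.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Primitive:
    index: int  # identifier output to the processors
    y_top: int  # first line covered
    y_bot: int  # last line covered


def y_loop(primitives, num_lines):
    """Sketch of method 200: returns the primitive indices output for each line."""
    new_prims = deque(sorted(primitives, key=lambda p: p.y_top))  # input ordered by top
    feedback = deque()  # stands in for the feedback FIFO 170
    output = []
    for line in range(num_lines):
        # Steps 202/204: merge fed-back primitives with new ones whose top is at or above this line.
        merged = list(feedback)
        feedback.clear()
        while new_prims and new_prims[0].y_top <= line:
            merged.append(new_prims.popleft())
        # Steps 206/208: eject primitives whose bottom is above the current line.
        remaining = [p for p in merged if p.y_bot >= line]
        # Step 210: output the remaining primitives and feed them back for the next line.
        output.append([p.index for p in remaining])
        feedback.extend(remaining)
    return output
```

Because the comparison is made against the current line, an expired primitive is dropped on the pass after its last line. No processor count appears anywhere in the loop; for example, 100 primitives covering lines 0 and 1 are all output on both lines.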
Using the y-loop 130 and the method 200, primitives can be continuously loaded and ejected. As a result, any delays at the end of a line due to ejecting and loading of primitives can be reduced or eliminated. Thus, loading of primitives to the processors 120 in the graphics system 100 can be made more efficient. Furthermore, because the feedback FIFO 170 can hold data for a large number of primitives, the y-loop 130 can be used with a large number of virtual (or actual) processors. This feature allows more primitives to overlap a single pixel. Consequently, limitations in the complexity of the scene are reduced.
The method 220 commences by setting the current line value 168 to the top of the frame and setting the read and write addresses for the feedback FIFO 170 to the start of the feedback FIFO 170, via step 222. Step 222 is performed once per frame. It is then determined whether a new primitive in a FIFO (not shown) connected to the y-loop merge 140 is ready, via step 224. This FIFO holds the primitives to be rendered in order from lowest to highest y-top 132. If a new primitive is ready, then it is determined whether its y-bot 134 is less than the current line 168, via step 226. If so, the primitive ends above the current line and is therefore rejected, via step 228, and the method 220 returns to step 224. If the y-bot 134 is not less than the current line 168, then the compare block 142 is used to determine whether the y-top 132 is less than or equal to the current line 168, via step 232. If so, the new primitive starts at or above the current line, so it is read into the y-loop merge 140 from the FIFO connected to the y-loop merge 140, via step 234. The new primitive is then output to the processors using the distribute block 154 of the distributor 150, via step 242.
If it is determined in step 232 that the y-top 132 is not less than or equal to the current line 168, then it is determined whether the feedback FIFO 170 is empty, via step 236. If not, then a primitive is read from the feedback FIFO 170 into the y-loop merge 140, via step 238. The primitive is then output by the distributor 150, via step 242.
If it is determined in step 236 that the feedback FIFO 170 is empty, then it is determined whether any primitives were processed for the current line 168, via step 240. If not, then an empty line is output, via step 244. If primitives were processed during the current line 168, or once the empty line has been output, it is determined whether the current line 168 is the bottom line, via step 246. If so, then the input stream is flushed and the current line is set to the top line, via step 248, and the method 220 can start again for the next frame. If the current line 168 is not the bottom of the frame, then the current line is incremented and the address in the feedback FIFO 170 from which data are read is incremented, via step 250. The method 220 then returns to step 224.
If it is determined in step 224 that a new primitive is not ready to be loaded, then it is determined whether the feedback FIFO 170 has more than one entry, via step 230. If not, then the method returns to step 224. If so, then the primitive(s) from the feedback FIFO 170 are read, via step 238. The primitive would then be output in step 242.
Once a primitive, either a new primitive or one read from the feedback FIFO 170, has been output in step 242, the compare block 152 is used to determine whether the line after the current line is below the bottom line of the primitive, via step 252. If so, then the primitive is evicted, via step 254. Otherwise, the primitive is provided to the feedback FIFO 170 using the distribute block 154.
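The per-primitive flow of the method 220 can be approximated by the simplified model below. Several assumptions are made and should be read as such: `render_frame` and `Primitive` are hypothetical names; the feedback FIFO 170 is split into two deques, one holding the entries to be read on the current line and one collecting entries written for the next line, which are swapped when the read address advances at step 250; the feedback FIFO is read whenever it holds entries for the current line; and when no new primitive qualifies and those entries are exhausted, the sketch ends the line instead of waiting at step 224. An empty list in the result corresponds to an empty line, and `num_lines` must be at least 1.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Primitive:
    index: int  # identifier output to the processors
    y_top: int  # first line covered (lines numbered downward from 0)
    y_bot: int  # last line covered


def render_frame(primitives, num_lines):
    """Simplified model of method 220 for one frame; returns indices output per line."""
    incoming = deque(sorted(primitives, key=lambda p: p.y_top))  # input FIFO, ordered by y-top
    read_q, write_q = deque(), deque()  # feedback entries for this line / for the next line
    lines = [[] for _ in range(num_lines)]
    current = 0  # step 222: start at the top of the frame
    while True:
        if incoming and incoming[0].y_bot < current:
            incoming.popleft()  # step 228: reject a primitive that ends above the current line
            continue
        if incoming and incoming[0].y_top <= current:
            prim = incoming.popleft()  # step 234: read the new primitive
        elif read_q:
            prim = read_q.popleft()  # step 238: read from the feedback FIFO
        else:
            # End of the current line; an empty entry in `lines` is an empty line.
            if current == num_lines - 1:  # step 246: bottom line reached
                return lines  # remaining input is discarded (flushed)
            current += 1  # step 250: next line, advance the read side
            read_q, write_q = write_q, deque()
            continue
        lines[current].append(prim.index)  # step 242: output to the processors
        if current + 1 > prim.y_bot:  # step 252: next line is below the primitive
            continue  # step 254: evict
        write_q.append(prim)  # feed back for the next line
```

Unlike the line-at-a-time model of the method 200, the test at step 252 evicts a primitive immediately after its last line, so it never occupies a slot in the feedback FIFO for a line it cannot touch.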
Thus, the primitives are provided to the processors 120 through the y-loop 130. Using the y-loop 130 and the method 220, primitives can be continuously loaded and ejected. Primitives that contribute to multiple lines of a scene are looped through the y-loop 130 using the feedback FIFO 170, while primitives that have expired are evicted using the distributor 150. As a result, delays at the end of a line due to ejecting and loading of primitives can be reduced or eliminated. Thus, loading of primitives to the processors 120 in the graphics system 100 can be made more efficient. Furthermore, because the feedback FIFO 170 can hold data for a large number of primitives, the y-loop 130 can be used with a large number of virtual (or actual) processors. This feature allows more primitives to overlap a single pixel. Consequently, limitations in the complexity of the scene are reduced.
A method and system have been disclosed for more efficiently loading primitives into processors of a graphics system. Software written according to the present invention is to be stored in some form of computer-readable medium, such as memory or CD-ROM, or transmitted over a network, and executed by a processor. Consequently, a computer-readable medium is intended to include a computer readable signal which, for example, may be transmitted over a network. Although the present invention has been described in accordance with the embodiments shown, one of ordinary skill in the art will readily recognize that there could be variations to the embodiments and those variations would be within the spirit and scope of the present invention. Accordingly, many modifications may be made by one of ordinary skill in the art without departing from the spirit and scope of the appended claims.
Claims
1. A system for more efficiently loading a plurality of primitives for a scene into a plurality of processors of a computer graphics system, each of the plurality of primitives having a top and a bottom, the plurality of primitives being ordered based on the top of each of the plurality of primitives, the system comprising:
- a merge circuit for receiving data relating to each of the plurality of primitives and for adding the data for a primitive having a top that is not lower than a current line;
- a distributor, coupled with the feedback circuit, for eliminating an expired primitive, the expired primitive having a bottom that is above the current line and for outputting at least a portion of the data for a remaining portion of the primitives after the expired primitive has been removed, the at least a portion of the data output by the distributor controlling loading of the plurality of primitives by the plurality of processors;
- a feedback circuit, coupled to the merge circuit and the distributor, for re-inputting to the merge circuit the data for the remaining portion of the plurality of primitives; and
- a controller for controlling the feedback circuit, the distributor and the merge circuit.
2. The system of claim 1 wherein the feedback circuit further includes a first in, first out (“FIFO”) buffer.
3. The system of claim 1 wherein each of the plurality of primitives includes a y-top that marks the top of each of the plurality of primitives and wherein the merge circuit compares the y-top for a primitive of the plurality of primitives to a current y-value for the current line and merges the primitive if the y-top is not greater than the current line.
4. The system of claim 1 wherein each of the plurality of primitives includes a y-bottom that marks at a particular line the bottom of each of the plurality of primitives and wherein the distributor circuit compares the y-bottom for a primitive of the plurality of primitives to a next line y-value for the current line and discards the primitive if the y-bottom is not greater than the next line y-value.
5. The system of claim 1 wherein the at least a portion of the data for each of the plurality of primitives is an identifier for each of the plurality of primitives.
6. A method for more efficiently loading a plurality of primitives for a scene into a plurality of processors of a computer graphics system, each of the plurality of primitives having a top and a bottom, the plurality of primitives being ordered based on the top of each of the plurality of primitives, the method comprising the steps of:
- (a) determining whether the top of at least one new primitive of the plurality of primitives is not lower than a current line;
- (b) merging data for the at least one new primitive if the top is not lower than the current line;
- (c) eliminating an expired primitive and outputting at least a portion of data for a remaining portion of the primitives after the expired primitive has been removed, the expired primitive having a bottom that is above the current line, the data output by the distributor controlling loading of the plurality of primitives by the plurality of processors;
- (d) re-inputting to the merge circuit the data for the remaining portion of the plurality of primitives.
7. The method of claim 6 wherein each of the plurality of primitives includes a y-top that marks a top of each of the plurality of primitives and wherein the determining step (a) further includes the step of:
- (a1) comparing the y-top for a primitive of the plurality of primitives to a current y-value for the current line and wherein the merging step (b) further includes the step of
- (b1) merging the primitive if the y-top is not greater than the current line.
8. The method of claim 6 wherein each of the plurality of primitives includes a y-bottom that marks at a particular line the bottom of each of the plurality of primitives and wherein the eliminating step (c) further includes the steps of:
- (c1) comparing the y-bottom for a primitive of the plurality of primitives to a next line y-value for the current line; and
- (c2) discarding the primitive if the y-bottom is not greater than the next line y-value.
9. The method of claim 6 wherein the at least a portion of the data for each of the plurality of primitives is an identifier for each of the plurality of primitives.
10. The method of claim 6 wherein the computer graphics system further includes an internal memory, and wherein the method further includes the steps of:
- (e) continuously loading the plurality of primitives into the internal memory; and
- (f) providing a primitive of the plurality of primitives to a processor of the plurality of processors only if the distributor outputs the data for the primitive.
Type: Application
Filed: Nov 16, 2004
Publication Date: Mar 31, 2005
Inventors: Aleksandr Movshovich (Santa Clara, CA), Brad Delanghe (San Jose, CA), David Baer (San Jose, CA)
Application Number: 10/990,838