Abstract: A system and method for efficiently determining the number of distinct values in a column of source data is disclosed. The source data (e.g., a source table) may be organized as rows and columns that represent information. A count distinct function may be performed on the source table to determine the number of distinct values in one or more of its columns. Results from an in-memory count distinct function performed by a plurality of parallel query processors may be placed into a results grid. Another aspect of the invention relates to determining how many distinct values fall into each cell of the results grid.
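The idea of parallel processors feeding a results grid can be illustrated with a minimal Python sketch. It is not the disclosed implementation: the function names (`partial_distinct`, `count_distinct_grid`), the use of threads as stand-ins for query processors, the round-robin split of rows, and the sample table are all assumptions made for illustration. Each worker collects the distinct values per grid cell for its slice of rows, and the per-cell sets are then unioned so that a value seen by two workers is counted only once.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def partial_distinct(rows, group_cols, target_col):
    """One hypothetical 'query processor': distinct target values per grid cell."""
    cells = defaultdict(set)
    for row in rows:
        key = tuple(row[c] for c in group_cols)  # grid cell coordinates
        cells[key].add(row[target_col])
    return cells


def count_distinct_grid(rows, group_cols, target_col, workers=4):
    """Count distinct target values per grid cell using parallel workers."""
    # Round-robin split of the source rows, one slice per worker (an assumption).
    chunks = [rows[i::workers] for i in range(workers)]
    merged = defaultdict(set)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(
            lambda chunk: partial_distinct(chunk, group_cols, target_col), chunks
        )
        for partial in partials:
            for key, values in partial.items():
                # Union the sets so duplicates across workers are not double counted.
                merged[key] |= values
    return {key: len(values) for key, values in merged.items()}


# Hypothetical source table: rows and columns as a list of dicts.
source_table = [
    {"region": "east", "year": 2020, "customer": "a"},
    {"region": "east", "year": 2020, "customer": "b"},
    {"region": "east", "year": 2020, "customer": "a"},
    {"region": "west", "year": 2020, "customer": "a"},
    {"region": "west", "year": 2021, "customer": "c"},
]

grid = count_distinct_grid(source_table, ["region", "year"], "customer")
```

Here `grid` maps each (region, year) cell to its number of distinct customers. Merging sets rather than counts is the design point: partial counts cannot simply be added, because the same value may appear in more than one worker's slice.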
Abstract: An analytical database system provides access to all of the data collected by an entity in interactive time. The analytical database system transforms relational database data: the relational database is denormalized and inverted such that the data fields of its tables are stored in separate files, each containing a row number field and a single data field. At least one of the files that contains repeating data values stored in successive rows is compressed. The files include files with partition values and files with analytical data. Processing of the files is distributed by creating sub-rowsets of the files and assigning the sub-rowsets to servers. The partial result sets produced by the servers are merged into a complete result set.
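The pipeline described here (inversion into per-field files, compression of repeated values, distribution over sub-rowsets, and merging) can be sketched in Python under several assumptions. In-memory lists stand in for files, run-length encoding is assumed as the compression scheme since the abstract does not name one, a per-partition sum is assumed as the analytical query, and every function name and the sample table are hypothetical.

```python
from collections import Counter
from itertools import groupby


def invert(table):
    """Store each field separately as a list of (row_number, value) pairs."""
    columns = {}
    for row_number, row in enumerate(table):
        for field, value in row.items():
            columns.setdefault(field, []).append((row_number, value))
    return columns


def rle_compress(column):
    """Collapse runs of equal values in successive rows to (start_row, run_length, value)."""
    runs = []
    for value, group in groupby(column, key=lambda pair: pair[1]):
        group = list(group)
        runs.append((group[0][0], len(group), value))
    return runs


def sub_rowsets(n_rows, n_servers):
    """Split row numbers into contiguous ranges, one per server (assumes n_rows > 0)."""
    size = -(-n_rows // n_servers)  # ceiling division
    return [range(start, min(start + size, n_rows)) for start in range(0, n_rows, size)]


def partial_sum(rowset, partition_col, data_col):
    """One server's work: sum the analytical field per partition value over its rows."""
    totals = Counter()
    for r in rowset:
        totals[partition_col[r][1]] += data_col[r][1]
    return totals


# Hypothetical denormalized table: "state" is a partition field, "sales" is analytical data.
table = [
    {"state": "CA", "sales": 10},
    {"state": "CA", "sales": 5},
    {"state": "CA", "sales": 7},
    {"state": "NY", "sales": 3},
    {"state": "NY", "sales": 4},
]

columns = invert(table)
compressed_state = rle_compress(columns["state"])

# Each sub-rowset would go to a separate server; here they are processed in turn.
partials = [
    partial_sum(rowset, columns["state"], columns["sales"])
    for rowset in sub_rowsets(len(table), 2)
]

# Merge the partial result sets into the complete result set.
result = sum(partials, Counter())
```

With this sample data the `state` column compresses to two runs, and the two partial result sets merge into one total per state. The query reads the uncompressed column for simplicity; a fuller version could evaluate it directly on the run-length form.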