Abstract: A method and a system are disclosed for generating a global identifier for linking or unifying a plurality of de-identified customer data received from multiple source environments. The customer data is de-identified based on a master salt, generating a master token. The master token is encrypted using a source-encryption key to generate a source token. The source token is in turn encrypted using a target-encryption key to generate a transfer token. At a central environment or a central storage unit, the transfer token is decrypted to obtain the source token. Thereafter, the source token is decrypted to obtain the master token. At the central storage unit, the master token is hashed with a target salt to generate the global identifier, which is subsequently used to unify the plurality of de-identified customer data.
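The token flow above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: the salted hash is HMAC-SHA256 keyed with the salt, both encryption layers use symmetric keys, and the cipher is a toy HMAC-based stream cipher built from the standard library only so that the example runs without dependencies. A real deployment would use a vetted authenticated cipher such as AES-GCM. The function names, the normalization step, and the sample record are hypothetical.

```python
import hashlib
import hmac
import secrets

NONCE_LEN = 16

def master_token(value: str, master_salt: bytes) -> bytes:
    """De-identify a value with a salted hash (HMAC-SHA256 keyed by the master salt)."""
    normalized = value.strip().lower()  # assumption: simple normalization before hashing
    return hmac.new(master_salt, normalized.encode("utf-8"), hashlib.sha256).digest()

def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """HMAC-SHA256 in counter mode. Toy stream cipher, for illustration only."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hmac.new(key, nonce + counter.to_bytes(8, "big"), hashlib.sha256).digest()
        counter += 1
    return bytes(out[:length])

def encrypt(key: bytes, plaintext: bytes) -> bytes:
    """Prepend a random nonce and XOR the plaintext with the keystream (no authentication)."""
    nonce = secrets.token_bytes(NONCE_LEN)
    stream = _keystream(key, nonce, len(plaintext))
    return nonce + bytes(p ^ s for p, s in zip(plaintext, stream))

def decrypt(key: bytes, blob: bytes) -> bytes:
    """Reverse encrypt(): split off the nonce and XOR with the same keystream."""
    nonce, body = blob[:NONCE_LEN], blob[NONCE_LEN:]
    stream = _keystream(key, nonce, len(body))
    return bytes(c ^ s for c, s in zip(body, stream))

def make_transfer_token(value: str, master_salt: bytes, source_key: bytes, target_key: bytes) -> bytes:
    """Source environment: master token -> source token -> transfer token."""
    source_token = encrypt(source_key, master_token(value, master_salt))
    return encrypt(target_key, source_token)

def global_identifier(transfer_token: bytes, source_key: bytes, target_key: bytes, target_salt: bytes) -> str:
    """Central environment: unwrap both layers, then hash the master token with the target salt."""
    source_token = decrypt(target_key, transfer_token)
    master = decrypt(source_key, source_token)
    return hmac.new(target_salt, master, hashlib.sha256).hexdigest()

# Two hypothetical source environments sharing the master salt but holding different source keys.
master_salt, target_salt, target_key = (secrets.token_bytes(32) for _ in range(3))
key_a, key_b = secrets.token_bytes(32), secrets.token_bytes(32)
token_a = make_transfer_token("Jane Doe|1980-01-02", master_salt, key_a, target_key)
token_b = make_transfer_token("jane doe|1980-01-02 ", master_salt, key_b, target_key)
gid_a = global_identifier(token_a, key_a, target_key, target_salt)
gid_b = global_identifier(token_b, key_b, target_key, target_salt)
print(token_a != token_b, gid_a == gid_b)  # True True
```

Because each source wraps the same master token under its own source key, the transfer tokens differ across sources, while the global identifiers computed centrally still match and can be used to link the records.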
Abstract: Systems and methods for automatic visual display overlays of contextually related data from multiple applications are provided. The method includes: capturing an image of at least a portion of a graphical user interface (GUI) of a first application visually displayed on a computerized display device; identifying at least one primary contextual data point within the captured image; searching for at least one secondary data point in at least a second application, wherein the at least one secondary data point is contextually relevant to the primary contextual data point; fetching the at least one secondary data point from the second application; and visually displaying a panel on the computerized display device concurrently with at least a portion of the GUI of the first application, wherein the panel includes the at least one secondary data point.
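The identify, search, and fetch steps above can be sketched as below. The sketch assumes the captured image has already been converted to text by an OCR engine (not shown), that the primary contextual data point is an identifier matching a hypothetical `MBR-` pattern, and that the second application is modeled as an in-memory lookup. Rendering the panel on the display is left out; the hypothetical `overlay_for` function only returns the lines such a panel would show.

```python
import re
from typing import Optional

# Hypothetical pattern for the primary contextual data point, e.g. "MBR-000123".
MEMBER_ID = re.compile(r"\bMBR-\d{6}\b")

# Hypothetical second application, modeled as an in-memory lookup keyed by member ID.
SECOND_APP_RECORDS: dict[str, dict[str, object]] = {
    "MBR-000123": {"open_claims": 2, "care_manager": "J. Smith"},
}

def identify_primary(screen_text: str) -> Optional[str]:
    """Return the first member ID found in the OCR text of the captured GUI region."""
    match = MEMBER_ID.search(screen_text)
    return match.group(0) if match else None

def fetch_secondary(primary: str) -> dict[str, object]:
    """Search the second application for data contextually related to the primary data point."""
    return SECOND_APP_RECORDS.get(primary, {})

def overlay_for(screen_text: str) -> list[str]:
    """Return the overlay panel lines, or an empty list if nothing relevant is found."""
    primary = identify_primary(screen_text)
    if primary is None:
        return []
    secondary = fetch_secondary(primary)
    if not secondary:
        return []
    return [f"Context: {primary}"] + [f"{key}: {value}" for key, value in sorted(secondary.items())]

print(overlay_for("Member: MBR-000123   Status: Active"))
```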
Abstract: A system and method for training a computerized data model for the algorithmic detection of non-linearity in a data set includes providing two master data sets corresponding to two discrete time periods, respectively, and a third data set for a third discrete time period. The two master data sets are mapped to at least one code model. A stacking average model is trained with the two master data sets by using a stacked regression algorithm. A Box-Cox transformation function is applied to the models to provide a predicted value for the third data set of the third discrete time period. An ensemble is created using the predicted value for the third data set and the first, second, and third models of the trained stacking average model to identify a non-linearity in the third data set.
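The Box-Cox transform used in the step above maps a positive value y to (y^λ - 1)/λ when λ is nonzero and to ln y when λ = 0. A minimal Python sketch of the transform, its inverse, and an average of base-model predictions taken in the transformed space follows. The plain average is an assumed stand-in for the weights a stacked regression would learn, λ is supplied by the caller rather than estimated, the sample predictions are hypothetical, and the ensemble step that flags non-linearity is not reproduced because the abstract does not specify it.

```python
import math
from typing import Sequence

def box_cox(y: float, lam: float) -> float:
    """Box-Cox transform, defined for strictly positive y."""
    if y <= 0:
        raise ValueError("Box-Cox requires y > 0")
    if lam == 0:
        return math.log(y)
    return (y ** lam - 1.0) / lam

def inv_box_cox(z: float, lam: float) -> float:
    """Inverse Box-Cox transform (assumes lam * z + 1 > 0 when lam != 0)."""
    if lam == 0:
        return math.exp(z)
    return (lam * z + 1.0) ** (1.0 / lam)

def stacked_average(predictions: Sequence[float], lam: float) -> float:
    """Average base-model predictions in Box-Cox space, then map back to the original scale.

    The plain mean is an assumed stand-in for learned stacked-regression weights.
    """
    transformed = [box_cox(p, lam) for p in predictions]
    return inv_box_cox(sum(transformed) / len(transformed), lam)

# Hypothetical predictions for the third time period from three base models.
base_predictions = [110.0, 125.0, 160.0]
print(stacked_average(base_predictions, lam=1.0))  # arithmetic mean of the predictions
print(stacked_average(base_predictions, lam=0.0))  # geometric mean of the predictions
```

With λ = 1 the averaged prediction reduces to the arithmetic mean of the base predictions, and with λ = 0 it becomes their geometric mean, so the choice of λ controls how strongly large predictions pull the combined value.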
Abstract: A system and method for extracting relevant data elements from a file for conversion to a tabular format includes a computing device receiving an XML format file having a loop with nested blocks. Each of the blocks has at least one data element. Features are extracted from each data element. These extracted features are processed using a machine learning algorithm to estimate a column header value for the data elements relative to a data schema. With the data elements classified, a configuration file is generated to map the column header values to the data elements of the XML file. The configuration file is used to extract the data elements from the XML file to a tabular format. In the healthcare industry, the system and method may be used to extract relevant health information from a clinical document for conversion to a tabular format.
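A sketch of this extraction pipeline using Python's standard `xml.etree.ElementTree`, `json`, and `csv` modules is shown below. The XML sample, the target schema, and the keyword rules in `TAG_RULES` are hypothetical, and the rule-based `estimate_header` stands in for the trained machine learning model, which would take the same feature dictionary as input. The configuration is produced as a JSON string mapping element paths to column headers and is then applied to every iteration of the `record` loop.

```python
import csv
import io
import json
import re
import xml.etree.ElementTree as ET

SAMPLE_XML = """<records>
  <record>
    <patient><name>Ann Lee</name><dob>1980-01-02</dob></patient>
    <vitals><systolic>120</systolic></vitals>
  </record>
  <record>
    <patient><name>Bo Chan</name><dob>1975-07-30</dob></patient>
    <vitals><systolic>135</systolic></vitals>
  </record>
</records>"""

# Hypothetical schema keywords; a rule-based stand-in for the trained classifier.
TAG_RULES = {"name": "PatientName", "dob": "BirthDate", "systolic": "SystolicBP"}

def leaf_elements(elem, prefix=""):
    """Yield (relative path, text) for every leaf element nested under elem."""
    for child in elem:
        path = f"{prefix}/{child.tag}" if prefix else child.tag
        if len(child):
            yield from leaf_elements(child, path)
        else:
            yield path, (child.text or "").strip()

def extract_features(path: str, text: str) -> dict:
    """A few simple features of one data element."""
    return {
        "tag": path.rsplit("/", 1)[-1],
        "is_date": bool(re.fullmatch(r"\d{4}-\d{2}-\d{2}", text)),
        "is_numeric": bool(re.fullmatch(r"-?\d+(\.\d+)?", text)),
    }

def estimate_header(features: dict) -> str:
    """Stand-in for the model: tag keyword first, value shape as a fallback."""
    if features["tag"] in TAG_RULES:
        return TAG_RULES[features["tag"]]
    if features["is_date"]:
        return "Date"
    if features["is_numeric"]:
        return "Number"
    return "Text"

def build_config(root, loop_tag: str) -> dict:
    """Map each leaf path in the first loop iteration to an estimated column header."""
    first = root.find(loop_tag)
    return {path: estimate_header(extract_features(path, text))
            for path, text in leaf_elements(first)}

def to_rows(root, loop_tag: str, config: dict) -> list:
    """Apply the configuration to every loop iteration to produce table rows."""
    return [{header: (rec.findtext(path) or "").strip() for path, header in config.items()}
            for rec in root.findall(loop_tag)]

root = ET.fromstring(SAMPLE_XML)
config = build_config(root, "record")
config_text = json.dumps(config, indent=2)  # the generated configuration "file"
rows = to_rows(root, "record", config)

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(config.values()))
writer.writeheader()
writer.writerows(rows)
print(buffer.getvalue())
```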
Abstract: A method of querying a data lake using natural language includes: receiving a natural language query directed to an electronic data lake; parsing the natural language query to determine a plurality of entities within the natural language query; identifying the plurality of entities using at least one contextual knowledge base, wherein the plurality of entities are compared against at least one entry in the at least one contextual knowledge base; mapping a dependency of the plurality of identified entities based on the parsed natural language query; constructing a structured data query based on the plurality of identified entities and the mapped dependency; and automatically generating a visual output of a result of the structured data query based on at least one characteristic from the set of: a data type, a data format, and a data size of the result of the structured data query.
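The steps above can be sketched in Python with a dictionary standing in for the contextual knowledge base. The knowledge-base entries, the table and column names, the rule that every recognized dimension becomes a GROUP BY key, and the chart-selection thresholds are all assumptions for illustration; a production parser would use a proper NLP toolkit for entity and dependency extraction.

```python
import re
from dataclasses import dataclass

# Hypothetical contextual knowledge base: phrase -> (entity kind, physical name in the lake).
KNOWLEDGE_BASE = {
    "claims": ("table", "claims"),
    "total paid": ("measure", "SUM(paid_amount)"),
    "claim count": ("measure", "COUNT(*)"),
    "state": ("dimension", "member_state"),
    "year": ("dimension", "service_year"),
}

@dataclass
class Entity:
    phrase: str
    kind: str
    name: str
    position: int

def identify_entities(query: str) -> list[Entity]:
    """Compare the query against knowledge-base entries, longest phrase first."""
    text = query.lower()
    found = []
    for phrase in sorted(KNOWLEDGE_BASE, key=len, reverse=True):
        match = re.search(r"\b" + re.escape(phrase) + r"\b", text)
        if match:
            kind, name = KNOWLEDGE_BASE[phrase]
            found.append(Entity(phrase, kind, name, match.start()))
            # Blank out the match so a shorter phrase cannot re-match inside it.
            text = text[:match.start()] + " " * len(phrase) + text[match.end():]
    return sorted(found, key=lambda e: e.position)

def map_dependencies(entities: list[Entity]) -> dict:
    """Naive rule: measures are aggregated over the table and grouped by every dimension."""
    tables = [e.name for e in entities if e.kind == "table"]
    if not tables:
        raise ValueError("no table entity recognized in the query")
    return {
        "from": tables[0],
        "select": [e.name for e in entities if e.kind == "measure"],
        "group_by": [e.name for e in entities if e.kind == "dimension"],
    }

def build_sql(plan: dict) -> str:
    """Construct a structured SQL query from the dependency plan."""
    columns = plan["group_by"] + plan["select"]
    sql = f"SELECT {', '.join(columns)} FROM {plan['from']}"
    if plan["group_by"]:
        sql += f" GROUP BY {', '.join(plan['group_by'])}"
    return sql

def choose_visual(rows: list[tuple], group_by: list[str]) -> str:
    """Pick an output form from the result's size and the type of its last column."""
    if len(rows) == 1 and len(rows[0]) == 1:
        return "single-value card"
    numeric = all(isinstance(row[-1], (int, float)) for row in rows)
    if len(group_by) == 1 and numeric:
        if group_by[0].endswith("year"):
            return "line chart"
        return "bar chart" if len(rows) <= 25 else "table"
    return "table"

plan = map_dependencies(identify_entities("Total paid claims by state"))
print(build_sql(plan))  # SELECT member_state, SUM(paid_amount) FROM claims GROUP BY member_state
print(choose_visual([("CA", 1200.0), ("NY", 950.5)], plan["group_by"]))  # bar chart
```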