Abstract: The present invention relates to systems and methods for statistical caching. Inputs are captured via an appropriate network protocol. The input includes statistical data and a corresponding cache key. The values for each cache key within a cache are compacted using the input. The compacting involves determining if the corresponding cache key is already set within the cache, and if the cache key is present, aggregating the statistical data with the value stored within the cache to generate an updated value. The updated cache may be periodically synchronized with a final data store. Additionally, each operation performed by the statistical cache may be recorded in a transaction log for fault tolerance.
Abstract: The present invention relates to systems and methods for the tokenization of user-generated content in order to prevent attacks on the user-generated content. The systems and methods initially pre-process the user-generated content string utilizing a secondary input of target language. Pre-processing may also include initialization of finite state machines, token markers and string buffers (text, HTML tag name, HTML attribute name, HTML attribute value, CSS selector, CSS property name, and CSS property value). The user-generated content string is scanned by rune, and the system sends each rune to a specific buffer based upon signaling by individual finite state machine states. Buffers are then converted to token stream nodes to be inserted into the token stream. The tokens represent a string of characters and are symbolically categorized according to activated finite state machine states.
Abstract: The present invention relates to systems and methods for parsing of a token stream for user generated content in order to prevent attacks on the user generated content. The systems and methods include a database which stores one or more whitelists, and a parser. The parser removes tokens from the token stream by comparing the tokens against the whitelist. Next, the parser validates CSS property values, encodes data within attribute values and text nodes, reconciles closing HTML tags, and coerces media tags into safe variants. The tokens removed may be any of HTML tags, HTML attributes, HTML protocols, CSS selectors and CSS properties.