Relative Entropy & WinZip

Data compression:
Low entropy: a string of 1000 repeated ‘0’s carries little information and can be compressed to “1000 x 0”.

High entropy: a string of 1000 random ‘0’s and ‘1’s cannot be compressed.
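The contrast can be checked with a small sketch. It uses Python's zlib module (the DEFLATE method found in ordinary .zip files) as a stand-in for WinZip, which is an assumption of this example and not something the note above specifies. The helper name compressed_size is made up for illustration.

```python
import random
import zlib


def compressed_size(data: bytes) -> int:
    """Return the size in bytes after DEFLATE compression at the highest level."""
    return len(zlib.compress(data, 9))


# Low entropy: 1000 repeated '0' characters.
low = b"0" * 1000

# High entropy: 1000 random bits packed into 125 bytes, so the input has no
# slack to remove. (Writing each bit as a '0' or '1' text character would
# waste 7 of every 8 bits, and the compressor would win that back.)
rng = random.Random(0)  # fixed seed, only so that the run is repeatable
high = bytes(rng.getrandbits(8) for _ in range(125))

print("low entropy: ", len(low), "->", compressed_size(low), "bytes")
print("high entropy:", len(high), "->", compressed_size(high), "bytes")
```

The repeated string should shrink to a few dozen bytes at most, while the random bytes should come out no smaller than they went in, because the compressor adds a small header and finds nothing to remove.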

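Relative entropy: the extra cost, in bits, of compressing data from one source with a code optimized for a different source. It is zero only when the code is already the best one for the data.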
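For reference, the standard information-theoretic definition makes this compression reading precise. It is not given in the note itself, and the symbols p, q and x are introduced here only for illustration: if the data really follow distribution p but the code was designed for distribution q, the relative entropy is the expected number of wasted bits per symbol.

```latex
D(p \,\|\, q) \;=\; \sum_{x} p(x)\,\log_2 \frac{p(x)}{q(x)}
\;=\; \underbrace{-\sum_{x} p(x)\,\log_2 q(x)}_{\text{bits per symbol with the code built for } q}
\;-\; \underbrace{\Bigl(-\sum_{x} p(x)\,\log_2 p(x)\Bigr)}_{\text{entropy } H(p)\text{: the best achievable}}
```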

Example:
Morse code lets a single dot ‘.’ represent the most commonly used letter, ‘e’, while a less commonly used letter such as ‘q’ gets the longer code ‘--.-’.

Applications (using the WinZip tool):
1. Analyze whether two articles were written by the same author or by two different authors: the latter case has higher relative entropy, so the file needs more disk space after compression. If the compressed file is smaller, the articles are more likely to be from the same author.

2. Analyze 52 European languages: French and Italian have low relative entropy because they belong to the same language family (the Romance languages, descended from Latin); Swedish and Croatian have high relative entropy because they come from different families.

3. If WinZip compresses your article to only 1/3 of its original size, about 2/3 of its content is most likely redundant.

4. WinZip could also be used to analyze the information in data strings such as DNA sequences or stock market movements.
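The same-author and same-language comparisons can be sketched in a few lines. This is a rough illustration and not the procedure behind the results above: it assumes zlib in place of WinZip, and the three sample strings are invented toy data. The idea is to compress a reference text alone, then the reference with a candidate appended, and to read the growth in compressed size as a crude stand-in for relative entropy. The names csize and extra_cost are hypothetical helpers.

```python
import zlib


def csize(text: str) -> int:
    """Compressed size in bytes of the UTF-8 encoding of text."""
    return len(zlib.compress(text.encode("utf-8"), 9))


def extra_cost(reference: str, candidate: str) -> int:
    """Bytes added to the compressed reference when candidate is appended.

    A small value means the compressor could reuse patterns it had already
    seen in the reference, i.e. the two texts look alike.
    """
    return csize(reference + candidate) - csize(reference)


# Toy samples, invented for this sketch.
reference = ("the quick brown fox jumps over the lazy dog "
             "while the cat sleeps by the warm fire ")
similar = ("the lazy dog sleeps by the warm fire "
           "while the quick brown fox jumps over the cat ")
different = ("stock prices fell sharply on monday "
             "as investors awaited quarterly earnings reports ")

print("similar text adds:  ", extra_cost(reference, similar), "bytes")
print("different text adds:", extra_cost(reference, different), "bytes")
```

Real comparisons need far longer samples than these. Note also that DEFLATE only looks back about 32 KB, so patterns earlier in a long reference text are invisible to it.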
