Short:        file compressor (100% bugfixed)
Author:       Wildfire of Decay
Uploader:     aminet aminet net
Type:         util/pack
Version:      1.2
Architecture: m68k-amigaos

MaxPacker version 1.20 final release (100% version for DECAY!)

INTRODUCTION:

Well, this is it, the final 100% version release. Only one major bugfix! It
occurred when users tried packing already-compressed files. Now, when this
happens and nothing can be made smaller within a 16 KB distance, you will see
a message telling you of the problem. This should rarely happen, however,
since gains can often be made even on files compressed with other types of
compression.

This program is written in assembler for use on the Amiga only! The current
version of MaxPacker is fashioned after LHARC, although it is designed to
operate more like TetraPacker. Testing has revealed that this version packs
40% faster than TetraPacker, and depacking is also over 200% faster!!! The
average rates are as follows:

  Pack:   22 kilobytes per minute
  DePack: 67 kilobytes per second

PLEASE NOTE: The following information is derived in part from an early ARJ
document. I have included the applicable information here to enlighten users
a bit more on how file compression has evolved over the years. The following,
then, is a brief synopsis of the state of the art in data compression
research. MaxPacker does not use any of the proprietary algorithms quoted in
the text below. In fact, its algorithm is not patented but, like ARJ, LHARC
and the rest, is based on both Lempel-Ziv and Huffman encoding to achieve
optimum results.

TERMINOLOGY:

COMPRESSION - The process of encoding redundant information into data
requiring less storage space.

COMPRESSION PERCENTAGE/RATIO - The percentage compression reported by
MaxPacker is one of the two standard ways of expressing a compression ratio
in the technical literature. MaxPacker uses the compressed size / original
size ratio; the other method is the inverse ratio. When MaxPacker reports 95%
as the compression ratio, it means the compressed file is 95 percent of the
original size (very little compression). Other archivers use their own
methods; LHARC v1.30, for example, reports the inverse ratio, the opposite of
MaxPacker.

RECENT DATA COMPRESSION RESEARCH:

The following information is explained in more detail in the book TEXT
COMPRESSION by Timothy Bell, John Cleary and Ian Witten, published by
Prentice Hall, 1990, ISBN 0-13-911991-4.

Three of the most useful types of data compression involve context modeling,
Lempel-Ziv 1977 and Lempel-Ziv 1978.

Context models use the preceding few characters to PREDICT (estimate the
probability of) the next character. For example, what is the next character
after "tabl"? There is a high probability that it is "e". Much of today's
research is focused on these compressors because they provide the best TEXT
compression results. The disadvantage of this family is the amount of memory
required: three practical models need from 500,000 to 5,000,000 bytes of
memory.

More familiar is the Lempel-Ziv 1978 (LZ78) family of data compressors. This
family, like Lempel-Ziv 1977, is classified as an adaptive dictionary
encoder. In these methods, text strings are replaced by pointers to previous
occurrences of duplicate strings; for example, the words in this document
could be represented by dictionary page and line numbers. The significant
characteristic of LZ78 algorithms is the parsing of previous text data into
PHRASES that are stored in a dictionary. Pointers are allowed to reference
such phrases, but not substrings of them. Lempel-Ziv-Welch, 1984 (ARC
shrinking, crushing, UNIX COMPRESS) is the most familiar member of this
family, and numerous articles have been published about it. Unknown to many
is the fact that this algorithm has a US patent owned by UNISYS. These
algorithms are FAST. Their disadvantages include the problem of handling a
FULL dictionary once the input text no longer matches its contents, and poor
compression of non-text data.
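To make the idea of a phrase dictionary concrete, here is a minimal LZW-style
encoder sketch in C. It is illustration only: it is not MaxPacker's
algorithm, a real LZW implementation would use variable-width output codes
and an indexed data structure rather than this linear dictionary scan, and
the dictionary and phrase-length limits are arbitrary assumptions for the
example. The point it shows is that each emitted code is a pointer to a whole
stored phrase, never to a substring of one, and that the matched phrase plus
its following character becomes a new dictionary phrase.

/* Minimal LZW-style encoder sketch (illustration only). */
#include <stdio.h>
#include <string.h>

#define MAX_PHRASES    4096   /* illustrative dictionary capacity          */
#define MAX_PHRASE_LEN 64     /* illustrative limit on stored phrase size  */

static unsigned char dict[MAX_PHRASES][MAX_PHRASE_LEN];
static int dict_len[MAX_PHRASES];
static int dict_size;

/* Return the dictionary code of a phrase, or -1 if it is not stored. */
static int find_phrase(const unsigned char *p, int len)
{
    for (int i = 0; i < dict_size; i++)
        if (dict_len[i] == len && memcmp(dict[i], p, len) == 0)
            return i;
    return -1;
}

static void lzw_encode(const unsigned char *in, int n)
{
    /* Seed the dictionary with all single-byte phrases. */
    dict_size = 0;
    for (int c = 0; c < 256; c++) {
        dict[dict_size][0] = (unsigned char)c;
        dict_len[dict_size++] = 1;
    }

    int start = 0;
    while (start < n) {
        /* Greedily extend the current phrase while it stays in the dictionary. */
        int len = 1, code = find_phrase(in + start, 1);
        while (start + len < n && len + 1 <= MAX_PHRASE_LEN) {
            int longer = find_phrase(in + start, len + 1);
            if (longer < 0)
                break;
            code = longer;
            len++;
        }
        printf("code %d (phrase length %d)\n", code, len);

        /* Store the phrase plus its next character as a new dictionary phrase.
         * Handling a FULL dictionary -- the weakness noted above -- is ignored. */
        if (dict_size < MAX_PHRASES && start + len < n && len + 1 <= MAX_PHRASE_LEN) {
            memcpy(dict[dict_size], in + start, len);
            dict[dict_size][len] = in[start + len];
            dict_len[dict_size++] = len + 1;
        }
        start += len;
    }
}

int main(void)
{
    const char *s = "abababababab";
    lzw_encode((const unsigned char *)s, (int)strlen(s));
    return 0;
}

On the string "abababababab" this emits six codes for twelve input
characters; what matters here is the parsing into dictionary phrases, not the
coding efficiency.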
Lempel-Ziv 1977 (LZ77) algorithms have recently come into popular practical
use (LHARC, PKZIP 1.10, ARJ, PAK). In the original LZ77 scheme, pointers are
allowed to reference any phrase in a fixed-size window that precedes the
current phrase. A matched current phrase is replaced by a pointer to the
previously occurring duplicate phrase; in LZ77 this pointer consists of an
offset into the window and the length of the phrase (a small illustrative
sketch of such an encoder appears below). The original implementation was
considered impractical because it used a brute-force string search to find
duplicates, requiring up to N character comparisons in an N-sized window for
each phrase looked for. In 1987, however, Timothy Bell proposed using a
binary tree to index into the window and speed up the lookup of phrases,
which provided an order-of-magnitude speed increase. In the same year, Brent
published an algorithm that used a secondary compression method (Huffman
encoding, 1952) to further encode the output of an LZ77 encoder. Huffman
encoding is the substitution of variable bit-length codes for standard codes
(8-bit ASCII) based upon frequency of occurrence: the most frequent codes get
the shortest bit-lengths.

LHARC is a combination of these two ideas. It uses a binary tree LZ77 encoder
with dynamic Huffman encoding (1978) of the output. The advantage of dynamic
Huffman encoding is adaptivity to rapidly changing data; the disadvantage is
slow decoding. PKZIP 1.0, I believe, uses a combination of binary tree
encoding with static SHANNON-FANO (1949) encoding. Static SHANNON-FANO
encoding has the advantage of fast decoding and the disadvantage of less
optimal compression, especially on rapidly changing data. PKZIP will compress
better than LHARC on many large files because PKZIP uses a larger window (8K)
than LHARC (4K). LHARC will perform best on binary data with rapidly changing
characteristics, such as executables that include a lot of text data.

LH (LHarc 2.0), I believe, uses a digital TRIE LZ77 encoder with a static
Huffman encoder and an 8K window. LH uses a particularly efficient Huffman
encoder which works on small data sets; many static encoders suffer from the
cost of including the table of codes with the encoded data. The digital TRIE
has the advantage of always producing the most recent match, unlike binary
trees, and this recency of match is significant when secondary compression is
used. Much of this work has been derived and improved upon from literature
published in the last two years. It will be interesting to see whether
developers can significantly improve upon LH, since it is based upon the most
recent published research in data compression.
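As a concrete picture of the LZ77 scheme described above, here is a minimal
brute-force encoder sketch in C. It is illustration only, not MaxPacker's or
LHARC's code: it prints (offset, length) pointers and literal bytes instead
of writing a real bitstream, it performs exactly the slow window scan that
made the original LZ77 scheme impractical (a binary tree or trie index is
what makes it fast), and the minimum match length is an assumption chosen for
the example.

/* Minimal LZ77-style encoder sketch (illustration only). */
#include <stdio.h>
#include <string.h>

#define WINDOW_SIZE 4096   /* e.g. LHARC uses a 4K window                    */
#define MIN_MATCH   3      /* assumed: shorter matches are cheaper as literals */

static void lz77_encode(const unsigned char *in, int n)
{
    int pos = 0;
    while (pos < n) {
        int best_len = 0, best_off = 0;
        int start = pos > WINDOW_SIZE ? pos - WINDOW_SIZE : 0;

        /* Brute-force scan of the preceding window: up to WINDOW_SIZE
         * candidate positions are compared against the current phrase. */
        for (int cand = start; cand < pos; cand++) {
            int len = 0;
            while (pos + len < n && in[cand + len] == in[pos + len])
                len++;
            if (len > best_len) {
                best_len = len;
                best_off = pos - cand;   /* offset back into the window */
            }
        }

        if (best_len >= MIN_MATCH) {
            printf("pointer (offset %d, length %d)\n", best_off, best_len);
            pos += best_len;
        } else {
            printf("literal '%c'\n", in[pos]);
            pos++;
        }
    }
}

int main(void)
{
    const char *s = "the table, the table, the table.";
    lz77_encode((const unsigned char *)s, (int)strlen(s));
    return 0;
}

Running it on the short repetitive string above gives a run of literals
followed by one long pointer back into the window; that mix of literals and
pointers is exactly what a secondary Huffman stage then squeezes further.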
One of the most significant new encoders published is that of Fiala and
Greene, 1989. This method uses pointers that point to nodes in the TREE data
structure rather than to phrases in the window, which bypasses the problem of
redundancy of phrases within any given window. Since each node is unique,
fewer pointers are needed, and they are thus more easily compressed by a
secondary compressor. This technique is currently patent pending.

I hope that this synopsis of compression research has been enlightening.

End of document