Small Polygon Compression for Integer Coordinates

Jauhri, Abhinav; Jauhri, Abhinav

Geographical targeting is widely used to provide relevant information to the public, particularly during emergencies and disasters. People in an affected area need to be informed and guided throughout the duration of an emergency or disaster. Sometimes, however, internet and network congestion occurs, or voice landline telephones are down, and the public needs to be alerted in other ways. Smartphone capabilities with GPS and maps allow the public to be informed about emergencies in a particular area. The Wireless Emergency Alerts (WEA) service, part of FEMA IPAWS, went live in April 2012, and provides a nation-wide service for broadcasting 90-character short messages via cell broadcast. WEA-enabled wireless devices in a designated geographic area receive the alerts, thus ensuring that the subscribers get the message and have enough time to take action.

The targeted area is typically identified by a polygon specified by alert-originators, although currently many use rather imprecise targeting (such as to a whole county via FIPS code). This leads to issues of false positive and false negative alerts being received on mobile phones, and thereby diminishing the credibility and adoption of the system, and impacting the relevance of the messages to the targeted population.

In this work, we describe several polygon compression techniques to enable efficient transmission of polygons representing geographical targets within the short WEA message size, yet still leaving some room for an associated text message. Such embedded polygons can be used to provide additional information on a map, indicate directions for evacuation, or support enhanced precision in message filtering. Compression of these polygons presents different challenges and opportunities from the compression of the large number of 2D and 3D polygons associated with maps and images. These WEA polygons have only 4-24 points/vertices, and cover relatively small geographic regions with closely clustered coordinates.

The best techniques transform the original decimal polygon coordinates to remove redundant information, convert decimals to integers, replace these absolute integer coordinates with relative "deltas" from neighboring coordinates, followed by a conversion of the consequent base-10 small integers to higher bases (such as base 70 by using all allowable characters, not just 0-9, to represent coordinate values in fewer characters). The techniques we developed are respectful of computation and storage constraints typical of mobile phones. Two of the best techniques include a bignum quadratic combination of integer coordinates ("BIG") and a variable length encoding ("VAR"), which takes advantage of a strongly skewed polygon coordinate distribution, and tighter encoding in the selected character base. Compared to standard compression techniques such as Huffmann or LZW reveal that our techniques are equal or better for all test cases.

We tested our techniques with a corpus of some 11,370 messages orginated by the National Weather Service (NWS) of the National Oceanic and Atmospheric Adminstration (NOAA), and were able to compress all polygons to between 9.7% and 23.6% of their original length, depending on characteristics of the specific polygon, reducing original polygon lengths from 43-331 characters to 8-55 characters.

In the figure, we summarize compression results of the 11370 polygons collected from the NWS online portal for 2012, 2013, and through October, 2014. The original and compressed polygons are output as character strings. The figure shows the length of these various strings as the number of base-70 characters on the vertical axis, and number of points/vertices of a polygon on the horizontal axis. Different compression techniques considered in our work guarantee good compression in comparison to the original length of the polygon (shown as O), with the best techniques shown by VAR and BIG.

All techniques maintain good compression with almost linear dependence on the number of polygon points. Higher base encodings (>70) using more of the allowable characters can give better compression results. We can improve our two best techniques somewhat by generating a repeated substring dictionary (RSD) of the most frequent substrings from all of the compressed polygon strings, and replacing those substrings in each polygon by a short index. We have also explored a merger of our best techniques into a single "polyalgorithm" for additional improvements in compression.

We compared our results to several standard compression methods. Huffman encoding is an optimal prefix encoding technique, such that the Huffman encoding length is never longer than the entropy of the given distribution. We observed that BIG and VAR are close to the almost optimal Huffman encoding. A related variable length bit encoding method, Golomb, is similar to VAR and also gave good compression results. None of the other methods like 7zip, LZW, and gzip produced compressions as good on the NWS corpus as our techniques.

It is believed that by compressing these polygons to fit within current and future WEA message lengths, we can greatly improve the adoption of the WEA system and improve the relevance of the associated warnings.

Work supported in part under a contract from the U.S. Department of Homeland Security, Science & Technology.

5.4 Small Polygon Compression for Integer Coordinates