Data Crunching: Solve Everyday Problems Using Java, Python, and More
review by Eric Walstad, June 2005

Data Crunching is a short book with great how-to-style code examples
of very common data parsing and manipulation techniques. The examples
are easy to follow and clearly demonstrate the author's points. None of
the topics is covered in great depth, but each contains enough to whet
the reader's appetite for more. The text and examples are
thought-provoking, leading the reader to ask the right kinds of
questions when detailed information is needed.

The book covers the most common aspects of data crunching, including
text files, regular expressions, XML, binary files, relational
databases and unit testing. It dedicates a chapter to each of the first
five, with testing covered in a final chapter of miscellaneous topics.
Each chapter has one or more sample problems to solve, and I found them
to be well thought out. Even when a problem was not exactly the same as
a real-life data crunching problem I've had to solve in the past, it
was close enough that I could easily apply the principles (and sample
code) to my own problem. I thought the regular expressions chapter was
an excellent, succinct (re)introduction to regular expressions. Wilson
starts with basic patterns and quickly and clearly works up to common
complex ones. The chapter also includes a nice bit of Python code that
generates a table of patterns, test strings, and which patterns match
which strings.
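To give a feel for that kind of pattern table, here is a minimal sketch of my own, not the book's code. The patterns, test strings and the `match_table` helper are hypothetical, chosen only for illustration; the sketch uses `re.search`, so anchors such as `^` and `$` have to be written into the patterns themselves.

```python
import re

# Hypothetical patterns and test strings for illustration; not from the book.
patterns = [r"^a", r"b+", r"\d{3}", r"^[a-z]+$"]
strings = ["abc", "bbb", "a123", "xyz"]


def match_table(patterns, strings):
    """Return a list of (pattern, [test strings that pattern matches])."""
    rows = []
    for pat in patterns:
        regex = re.compile(pat)
        # re.search looks for a match anywhere in the string, so a pattern
        # only matches at the start or end if it says so with ^ or $.
        hits = [s for s in strings if regex.search(s)]
        rows.append((pat, hits))
    return rows


if __name__ == "__main__":
    # Print one row per pattern, followed by the strings it matches.
    for pat, hits in match_table(patterns, strings):
        print(f"{pat:<12} {', '.join(hits) or '(none)'}")
```

Laying the results out this way makes it easy to see at a glance how a small change to a pattern (adding an anchor, say) changes which strings it accepts.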

I liked the chapter on XML but noticed that, although it includes a
good example of an XSLT template, there is no code showing how to apply
that template to a document. The chapter on relational databases covers
the SQL most commonly needed for daily use (think the 10% of SQL that
solves 90% of the problems), including sub-selects, negation,
aggregation and views. The last chapter, "Horseshoe Nails", covers
miscellaneous topics, including testing. The author covers unit
testing, of course, but also simpler ways of testing for when
full-blown unit testing is overkill. The last chapter also has sections
on character encodings, floating-point numbers, and dates and times,
including how to format them with strftime. I was impressed by the
author's ability to cull such important techniques and idioms and
organize them into a small yet incredibly useful text.
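For readers wondering what those four SQL techniques look like in practice, here is a rough sketch of my own using SQLite through Python's built-in `sqlite3` module. The `person` and `experiment` tables, their columns and the `busy` view are all hypothetical; the book's own schema and examples may well differ.

```python
import sqlite3

# Hypothetical schema and data for illustration only; not from the book.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE experiment (id INTEGER PRIMARY KEY, person_id INTEGER, hours REAL);
INSERT INTO person VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
INSERT INTO experiment VALUES (1, 1, 2.5), (2, 1, 1.5), (3, 2, 4.0);
""")

# Aggregation: total hours per person who has run at least one experiment.
totals = cur.execute("""
    SELECT person.name, SUM(experiment.hours)
    FROM person JOIN experiment ON person.id = experiment.person_id
    GROUP BY person.name
    ORDER BY person.name
""").fetchall()

# Negation via a sub-select: people who have never run an experiment.
idle = cur.execute("""
    SELECT name FROM person
    WHERE id NOT IN (SELECT person_id FROM experiment)
""").fetchall()

# A view saves a query under a name so it can be queried like a table.
cur.execute("""
    CREATE VIEW busy AS
    SELECT person_id, COUNT(*) AS runs FROM experiment GROUP BY person_id
""")
busy = cur.execute("SELECT person_id, runs FROM busy ORDER BY person_id").fetchall()

print(totals)  # per-person totals
print(idle)    # people with no experiments
print(busy)    # rows from the view
conn.close()
```

One caveat with the negation query: if the sub-select can return NULL, `NOT IN` returns no rows at all, so a `NOT EXISTS` sub-select is the safer form when the column allows NULLs.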

Data Crunching covers real-life data parsing and manipulation concepts
without tangential journeys into other areas of programming. Each of
the five main topics includes simple code examples, usually in Python,
Java or both, that clearly demonstrate the topic. The author does an
impressive job of squeezing in almost all of the issues that come up in
the daily work of data crunching. The reader can expect to come away
with something of value on each topic covered, especially the newbie or
occasional script writer.