Zip entries that are zero length but stored with deflate. This
is arguably a silly thing to do (deflating a zero-length file actually
makes it bigger) but apparently quite a few Zip writers do this.
This was broken in two places: archive_write_disk disliked being asked
to write data to zero-length files (even if the write was zero-length)
and zip_read_file_header tripped over itself when non-regular files
had compressed bodies.
This is an attempt to eliminate a lot of redundant
code from the read ("decompression") filters by
changing them to juggle arbitrary-sized blocks
and consolidate reblocking code at a single point
in archive_read.c.
Along the way, I've changed the internal read/consume
API used by the format handlers to a slightly
different style originally suggested by des@. It
does seem to simplify a lot of common cases.
The most dramatic change is, of course, to
archive_read_support_compression_none(), which
has just evaporated into a no-op as the blocking
code this used to hold has all been moved up
a level.
There's at least one more big round of refactoring
yet to come before the individual filters are as
straightforward as I think they should be...
If it's not a regular file, don't return any data, even if the size is unknown.
Update the Zip test with a hand-tweaked Zip archive that has a
directory (with length-at-end set), a regular file without
length-at-end set, and a regular file with length-at-end set and a bad
CRC. Update the test code to verify that the file size is unset
for the regular file with length-at-end.
MFC after: 7 days
feedback, but the 2.5 branch is shaping up nicely.)
In addition to many small bug fixes and code improvements:
* Another iteration of versioning; I think I've got it right now.
* Portability: A lot of progress on Windows support (though I'm
not committing all of the Windows support files to FreeBSD CVS)
* Explicit tracking of MBS, WCS, and UTF-8 versions of strings
in archive_entry; the archive_entry routines now correctly return
NULL only when something is unset, setting NULL properly clears
string values. Most charset conversions have been pushed down to
archive_string.
* Better handling of charset conversion failure when writing or
reading UTF-8 headers in pax archives
* archive_entry_linkify() provides multiple strategies for
hardlink matching to suit different format expectations
* More accurate bzip2 format detection
* Joerg Sonnenberger's extensive improvements to mtree support
* Rough support for self-extracting ZIP archives. Not an ideal
approach, but it works for the archives I've tried.
* New "sparsify" option in archive_write_disk converts blocks of nulls
into seeks.
* Better default behavior for the test harness; it now reports
all failures by default instead of coredumping at the first one.
one part" by simply ignoring the marker at the beginning
of the file. (Zip archivers reserve four bytes at the beginning
of each part of a multi-part archive, if it happens to only
require one part, those four bytes get filled with a placeholder
that can be ignored.)
Thanks to: Marius Nuennerich,
for pointing me to a Zip archive that libarchive couldn't handle
MFC after: 7 days
the number of bytes read is actually not important as long as we have at
least what we ask for. Illustrate its benefits by using it throughout
the ZIP support code, except for the few cases where it doesn't apply.
Approved by: kientzle
that I've been working on but put off committing until after the
RELENG_7 branch, including:
* New manpages: cpio.5 mtree.5
* New archive_entry_strmode()
* New archive_entry_link_resolver()
* New read support: mtree format
* Internal API change: read format auction only runs once
* Running the auction only once allowed simplifying a lot of bid logic.
* Cpio robustness: search for next header after a sync error
* Support device nodes on ISO9660 images
* Eliminate a lot of unnecessary copies for uncompressed archives
* Corrected handling of new GNU --sparse --posix formats
* Correctly handle a zero-byte write to a compressed archive
* Fixed memory leaks
Many of these improvements were motivated by the upcoming bsdcpio
front-end.
There have also been extensive improvements to the libarchive_test
test harness, which I'll commit separately.
a length field of zero; it does not mean the body is empty.
Thanks to: Lapo Luchini for sending me a JAR archive that demonstrated this bug
MFC after: 3 days
Return EOF immediately if an entry in a ZIP archive has no body.
In particular, the latter issue was causing bsdtar to emit spurious
warnings when extracting directory entries from ZIP archives.
MFC after: 3 days
couldn't allocate more memory for a string. Change
this so it returns NULL in that case, and update
all of its callers to handle the error. Some of
those callers can now return errors back to the
client instead of calling exit(3).
Approved by: re (bmah)
* "compression_program" support uses an external program
* Portability: no longer uses "struct stat" as a primary
data interchange structure internally
* Part of the above: refactor archive_entry to separate
out copy_stat() and stat() functions
* More complete tests for archive_entry
* Finish archive_entry_clone()
* Isolate major()/minor()/makedev() in archive_entry; remove
these from everywhere else.
* Bug fix: properly handle decompression look-ahead at end-of-data
* Bug fixes to 'ar' support
* Fix memory leak in ZIP reader
* Portability: better timegm() emulation in iso9660 reader
* New write_disk flags to suppress auto dir creation and not
overwrite newer files (for future cpio front-end)
* Simplify trailing-'/' fixup when writing tar and pax
* Test enhancements: fix various compiler warnings, improve
portability, add lots of new tests.
* Documentation: document new functions, first draft of
libarchive_internals.3
MFC after: 14 days
Thanks to: Joerg Sonnenberger (compression_program)
Thanks to: Kai Wang (ar)
Thanks to: Colin Percival (many small fixes)
Thanks to: Many others who sent me various patches and problem reports.
* libarchive_test program exercises many of the core features
* Refactored old "read_extract" into new "archive_write_disk", which
uses archive_write methods to put entries onto disk. In particular,
you can now use archive_write_disk to create objects on disk
without having an archive available.
* Pushed some security checks from bsdtar down into libarchive, where
they can be better optimized.
* Rearchitected the logic for creating objects on disk to reduce
the number of system calls. Several common cases now use a
minimum number of system calls.
* Virtualized some internal interfaces to provide a clearer separation
of read and write handling and make it simpler to override key
methods.
* New "empty" format reader.
* Corrected return types (this ABI breakage required the "2.0" version bump)
* Many bug fixes.
a vanilla 2-clause BSD license, but somehow some confusing
extra verbage get copied from somewhere.
Also, update the copyright dates to 2007 for all of the files.
Prompted by: several questions about what those extra words really mean
* Actually use the HAVE_<header>_H macros to conditionally include
system headers. They've been defined for a long time, but only
used in a few places. Now they're used pretty consistently
throughout.
* Fill in a lot of missing casts for conversions from void*.
Although Standard C doesn't require this, some people have been
trying to use C++ compilers with this code, and they do require it.
Bit-for-bit, the compiled object files are identical, except for
one assert() whose line number changed, so I'm pretty confident I
didn't break anything. ;-)
* Handles entries with compressed size >2GB (signed/unsigned cleanup)
* Handles entries with compressed size >4GB ("ZIP64" extension)
* Handles Unix extensions (ctime, atime, mtime, mode, uid, etc)
* Format-specific "skip data" override allows ZIP reader to skip
entries without decompressing them, which makes "tar -t"
a lot faster.
* Handles "length-at-end" entries generated by, e.g., "zip -r - foo"
Many thanks to: Dan Nelson, who contributed the code and test files for
the first three items above and suggested the fourth.
When reading the bodies of Zip archive entries, request a minimum of 1
byte, rather than a minimum of the full entry size. This is faster
(since it does not force the decompression layer to combine reads) and
works around a bug in the "none" decompression handler (which I'm
testing a separate fix for now). I've also renamed "bytes_read" to
"bytes_avail" in several places to more accurately reflect that the
value returned from (a->compression_read_ahead) is the number of bytes
available, not necessarily the number of bytes requested.
Only supports "deflate" and "none" compression for now.
Also, add a few clarifications to the archive_read.3 manpage as
requested by William Dean DeVries.