1 files changed, 178 insertions, 0 deletions
diff --git a/TODO b/TODO
new file mode 100644
index 0000000..ff3ecc8
--- /dev/null
+++ b/TODO
@@ -0,0 +1,178 @@
+General
+=======
+
+ * See if I can support the encryption format used with /R 5 /V 5,
+   even though a qpdf-announce subscriber with an adobe.com email
+   address mentioned that this is deprecated.  There is also a new
+   encryption format coming in a future release, which may be better
+   to support.  As of the qpdf 3.0 release, the specification was not
+   publicly available yet.
+
+ * Consider the possibility of doing something locale-aware to support
+   non-ASCII passwords.  Update documentation if this is done.
+
+ * Look for %PDF header somewhere within the first 1024 bytes of the
+   file.  Also accept headers of the form "%!PS−Adobe−N.n PDF−M.m".
+   See Implementation notes 13 and 14 in appendix H of the PDF 1.7
+   specification.  This is bug 3267974.
+
+ * Consider impact of article threads on page splitting/merging.
+   Subramanyam provided a test file; see ../misc/article-threads.pdf.
+   Email Q-Count: 431864 from 2009-11-03.  Other things to consider:
+   outlines, page labels, thumbnails, zones.  There are probably
+   others.
+
+ * See whether it's possible to remove the call to
+   flattenScalarReferences.  I can't easily figure out why I do it,
+   but removing it causes strange test failures in linearization.  I
+   would have to study the optimization and linearization code to
+   figure out why I added this to begin with and what in the code
+   assumes it's the case.  For enqueueObject and unparseChild in
+   QPDFWriter, simply removing the checks for indirect scalars seems
+   sufficient.  Looking back at the branch in the apex epub
+   repository, before flattening scalar references, there was special
+   case code in QPDFWriter to avoid writing out indirect nulls.  It's
+   still not obvious to me why I did it though.
+
+   To pursue this, remove the call to flattenScalarReferences in
+   QPDFWriter.cc and disable the logic_error exceptions for indirect
+   scalars.  Just search for flattenScalarReferences in QPDFWriter.cc
+   since the logic errors have comments that mention
+   flattenScalarReferences.  Then run the test suite.  Several files
+   that explicitly test flattening of scalar references fail, but the
+   indirect scalars are properly preserved and written.  But then
+   there are some linearized files that have a bunch of unreferenced
+   objects that contain scalars.  Need to figure out what these are
+   and why they're there.  Maybe they're objects that used to be
+   stream lengths.  Probably we just need to make sure don't traverse
+   through a stream's /Length stream when enqueueing stream
+   dictionaries.  This could potentially happen with any object that
+   QPDFWriter replaces when writing out files.  Such objects would be
+   orphaned in the newly written file.  This could be fixed, but it
+   may not be worth fixing.
+
+   If flattenScalarReferences is removed, a new method will be needed
+   for checking PDF files.
+
+ * See if we can avoid preserving unreferenced objects in object
+   streams even when preserving the object streams.
+
+ * For debugging linearization bugs, consider adding an option to save
+   pass 1 of linearization.  This code is sufficient.  Change the
+   interface to allow specification of a pass1 file, which would
+   change the behavior as in this patch.
+
+------------------------------
+Index: QPDFWriter.cc
+===================================================================
+--- QPDFWriter.cc	(revision 932)
++++ QPDFWriter.cc	(working copy)
+@@ -1965,11 +1965,15 @@
+ 
+     // Write file in two passes.  Part numbers refer to PDF spec 1.4.
+ 
++    FILE* XXX = 0;
+     for (int pass = 1; pass <= 2; ++pass)
+     {
+ 	if (pass == 1)
+ 	{
+-	    pushDiscardFilter();
++//	    pushDiscardFilter();
++	    XXX = fopen("/tmp/pass1.pdf", "w");
++	    pushPipeline(new Pl_StdioFile("pass1", XXX));
++	    activatePipelineStack();
+ 	}
+ 
+ 	// Part 1: header
+@@ -2204,6 +2208,8 @@
+ 
+ 	    // Restore hint offset
+ 	    this->xref[hint_id] = QPDFXRefEntry(1, hint_offset, 0);
++	    fclose(XXX);
++	    XXX = 0;
+ 	}
+     }
+ }
+------------------------------
+
+ * Handle embedded files.  PDF Reference 1.7 section 3.10, "File
+   Specifications", discusses this.  Once we can definitely recognize
+   all embedded files in a document, we can update the encryption
+   code to handle it properly.  In QPDF_encryption.cc, search for
+   cf_file.  Remove exception thrown if cf_file is different from
+   cf_stream, and write code in the stream decryption section to use
+   cf_file instead of cf_stream.  In general, add interfaces to get
+   the list of embedded files and to extract them.  To handle general
+   embedded files associated with the whole document, follow root ->
+   /Names -> /EmbeddedFiles -> /Names to get to the file specification
+   dictionaries.  Then, in each file specification dictionary, follow
+   /EF -> /F to the actual stream.  There may be other places file
+   specification dictionaries may appear, and there are also /RF keys
+   with related files, so reread section 3.10 carefully.
+
+ * The description of Crypt filters is unclear with respect to how to
+   use them to override /StmF for specific streams.  I'm not sure
+   whether qpdf will do the right thing for any specific individual
+   streams that might have crypt filters.  The specification seems to
+   imply that only embedded file streams and metadata streams can have
+   crypt filters, and there are already special cases in the code to
+   handle those.  Most likely, it won't be a problem, but someday
+   someone may find a file that qpdf doesn't work on because of crypt
+   filters.  There is an example in the spec of using a crypt filter
+   on a metadata stream.
+
+   For now, we notice /Crypt filters and decode parameters consistent
+   with the example in the PDF specification, and the right thing
+   happens for metadata filters that happen to be uncompressed or
+   otherwise compressed in a way we can filter.  This should handle
+   all normal cases, but it's more or less just a guess since I don't
+   have any test files that actually use stream-specific crypt filters
+   in them.
+
+ * The second xref stream for linearized files has to be padded only
+   because we need file_size as computed in pass 1 to be accurate.  If
+   we were not allowing writing to a pipe, we could seek back to the
+   beginning and fill in the value of /L in the linearization
+   dictionary as an optimization to alleviate the need for this
+   padding.  Doing so would require us to pad the /L value
+   individually and also to save the file descriptor and determine
+   whether it's seekable.  This is probably not worth bothering with.
+
+ * The whole xref handling code in the QPDF object allows the same
+   object with more than one generation to coexist, but a lot of logic
+   assumes this isn't the case.  Anything that creates mappings only
+   with the object number and not the generation is this way,
+   including most of the interaction between QPDFWriter and QPDF.  If
+   we wanted to allow the same object with more than one generation to
+   coexist, which I'm not sure is allowed, we could fix this by
+   changing xref_table.  Alternatively, we could detect and disallow
+   that case.  In fact, it appears that Adobe reader and other PDF
+   viewing software silently ignores objects of this type, so this is
+   probably not a big deal.
+
+ * Pl_PNGFilter is only partially implemented.  If we ever decoded
+   images, we'd have to finish implementing it along with the other
+   filter decode parameters and types.  For just handling xref
+   streams, there's really no need as it wouldn't make sense to use
+   any kind of predictor other than 12 (PNG UP filter).
+
+ * If we ever want to have check mode check the integrity of the free
+   list, this can be done by looking at the code from prior to the
+   object stream support of 4/5/2008.  It's in an if (0) block and
+   there's a comment about it.  There's also something about it in
+   qpdf.test -- search for "free table".  On the other hand, the value
+   of doing this seems very low since no viewer seems to care, so it's
+   probably not worth it.
+
+ * QPDFObjectHandle::getPageImages() doesn't notice images in
+   inherited resource dictionaries.  See comments in that function.
+
+ * Based on an idea suggested by user "Atom Smasher", consider
+   providing some mechanism to recover earlier versions of a file
+   embedded prior to appended sections.
+
+ * From a suggestion in bug 3152169, consider having an option to
+   re-encode inline images with an ASCII encoding.
+
+ * From github issue 2, provide more in-depth output for examining
+   hint stream contents.