Diffstat (limited to 'TODO')
-rw-r--r-- TODO 178
1 files changed, 178 insertions, 0 deletions
diff --git a/TODO b/TODO
new file mode 100644
index 0000000..ff3ecc8
--- /dev/null
+++ b/TODO
@@ -0,0 +1,178 @@
+General
+=======
+
+ * See if I can support the encryption format used with /R 5 /V 5,
+ even though a qpdf-announce subscriber with an adobe.com email
+ address mentioned that this is deprecated. There is also a new
+ encryption format coming in a future release, which may be better
+ to support. As of the qpdf 3.0 release, the specification was not
+ publicly available yet.
+
+ * Consider the possibility of doing something locale-aware to support
+ non-ASCII passwords. Update documentation if this is done.
+
+ * Look for %PDF header somewhere within the first 1024 bytes of the
+ file. Also accept headers of the form "%!PS-Adobe-N.n PDF-M.m".
+ See Implementation notes 13 and 14 in appendix H of the PDF 1.7
+ specification. This is bug 3267974.
+
+ * Consider impact of article threads on page splitting/merging.
+ Subramanyam provided a test file; see ../misc/article-threads.pdf.
+ Email Q-Count: 431864 from 2009-11-03. Other things to consider:
+ outlines, page labels, thumbnails, zones. There are probably
+ others.
+
+ * See whether it's possible to remove the call to
+ flattenScalarReferences. I can't easily figure out why I do it,
+ but removing it causes strange test failures in linearization. I
+ would have to study the optimization and linearization code to
+ figure out why I added this to begin with and what in the code
+ assumes it's the case. For enqueueObject and unparseChild in
+ QPDFWriter, simply removing the checks for indirect scalars seems
+ sufficient. Looking back at the branch in the apex epub
+ repository, before flattening scalar references, there was special
+ case code in QPDFWriter to avoid writing out indirect nulls. It's
+ still not obvious to me why I did it though.
+
+ To pursue this, remove the call to flattenScalarReferences in
+ QPDFWriter.cc and disable the logic_error exceptions for indirect
+ scalars. Just search for flattenScalarReferences in QPDFWriter.cc
+ since the logic errors have comments that mention
+ flattenScalarReferences. Then run the test suite. Several files
+ that explicitly test flattening of scalar references fail, but the
+ indirect scalars are properly preserved and written. But then
+ there are some linearized files that have a bunch of unreferenced
+ objects that contain scalars. Need to figure out what these are
+ and why they're there. Maybe they're objects that used to be
+ stream lengths. Probably we just need to make sure we don't traverse
+ through a stream's /Length stream when enqueueing stream
+ dictionaries. This could potentially happen with any object that
+ QPDFWriter replaces when writing out files. Such objects would be
+ orphaned in the newly written file. This could be fixed, but it
+ may not be worth fixing.
+
+ If flattenScalarReferences is removed, a new method will be needed
+ for checking PDF files.
+
+ * See if we can avoid preserving unreferenced objects in object
+ streams even when preserving the object streams.
+
+ * For debugging linearization bugs, consider adding an option to save
+ pass 1 of linearization. The patch below is sufficient. Change the
+ interface to allow specification of a pass1 file, which would
+ change the behavior as in this patch.
+
+------------------------------
+Index: QPDFWriter.cc
+===================================================================
+--- QPDFWriter.cc (revision 932)
++++ QPDFWriter.cc (working copy)
+@@ -1965,11 +1965,15 @@
+
+ // Write file in two passes. Part numbers refer to PDF spec 1.4.
+
++ FILE* XXX = 0;
+ for (int pass = 1; pass <= 2; ++pass)
+ {
+ if (pass == 1)
+ {
+- pushDiscardFilter();
++// pushDiscardFilter();
++ XXX = fopen("/tmp/pass1.pdf", "wb");
++ pushPipeline(new Pl_StdioFile("pass1", XXX));
++ activatePipelineStack();
+ }
+
+ // Part 1: header
+@@ -2204,6 +2208,8 @@
+
+ // Restore hint offset
+ this->xref[hint_id] = QPDFXRefEntry(1, hint_offset, 0);
++ fclose(XXX);
++ XXX = 0;
+ }
+ }
+ }
+------------------------------
+
+ * Handle embedded files. PDF Reference 1.7 section 3.10, "File
+ Specifications", discusses this. Once we can definitely recognize
+ all embedded files in a document, we can update the encryption
+ code to handle it properly. In QPDF_encryption.cc, search for
+ cf_file. Remove the exception thrown if cf_file is different from
+ cf_stream, and write code in the stream decryption section to use
+ cf_file instead of cf_stream. In general, add interfaces to get
+ the list of embedded files and to extract them. To handle general
+ embedded files associated with the whole document, follow root ->
+ /Names -> /EmbeddedFiles -> /Names to get to the file specification
+ dictionaries. Then, in each file specification dictionary, follow
+ /EF -> /F to the actual stream. File specification dictionaries may
+ appear in other places too, and there are also /RF keys
+ with related files, so reread section 3.10 carefully.
+
+ * The description of Crypt filters is unclear with respect to how to
+ use them to override /StmF for specific streams. I'm not sure
+ whether qpdf will do the right thing for any specific individual
+ streams that might have crypt filters. The specification seems to
+ imply that only embedded file streams and metadata streams can have
+ crypt filters, and there are already special cases in the code to
+ handle those. Most likely, it won't be a problem, but someday
+ someone may find a file that qpdf doesn't work on because of crypt
+ filters. There is an example in the spec of using a crypt filter
+ on a metadata stream.
+
+ For now, we notice /Crypt filters and decode parameters consistent
+ with the example in the PDF specification, and the right thing
+ happens for metadata streams that happen to be uncompressed or
+ otherwise compressed in a way we can filter. This should handle
+ all normal cases, but it's more or less just a guess since I don't
+ have any test files that actually use stream-specific crypt filters
+ in them.
+
+ * The second xref stream for linearized files has to be padded only
+ because we need file_size as computed in pass 1 to be accurate. If
+ we were not allowing writing to a pipe, we could seek back to the
+ beginning and fill in the value of /L in the linearization
+ dictionary as an optimization to alleviate the need for this
+ padding. Doing so would require us to pad the /L value
+ individually and also to save the file descriptor and determine
+ whether it's seekable. This is probably not worth bothering with.
+
+ * The whole xref handling code in the QPDF object allows the same
+ object with more than one generation to coexist, but a lot of logic
+ assumes this isn't the case. Anything that keys its mappings on the
+ object number alone, ignoring the generation, makes this assumption,
+ including most of the interaction between QPDFWriter and QPDF. If
+ we wanted to allow the same object with more than one generation to
+ coexist, which I'm not sure is allowed, we could fix this by
+ changing xref_table. Alternatively, we could detect and disallow
+ that case. In fact, it appears that Adobe Reader and other PDF
+ viewing software silently ignore objects of this type, so this is
+ probably not a big deal.
+
+ * Pl_PNGFilter is only partially implemented. If we ever decoded
+ images, we'd have to finish implementing it along with the other
+ filter decode parameters and types. For just handling xref
+ streams, there's really no need as it wouldn't make sense to use
+ any kind of predictor other than 12 (PNG UP filter).
+
+ * If we ever want to have check mode check the integrity of the free
+ list, this can be done by looking at the code from prior to the
+ object stream support of 4/5/2008. It's in an if (0) block and
+ there's a comment about it. There's also something about it in
+ qpdf.test -- search for "free table". On the other hand, the value
+ of doing this seems very low since no viewer seems to care, so it's
+ probably not worth it.
+
+ * QPDFObjectHandle::getPageImages() doesn't notice images in
+ inherited resource dictionaries. See comments in that function.
+
+ * Based on an idea suggested by user "Atom Smasher", consider
+ providing some mechanism to recover earlier versions of a file
+ embedded prior to appended sections.
+
+ * From a suggestion in bug 3152169, consider having an option to
+ re-encode inline images with an ASCII encoding.
+
+ * From GitHub issue 2, provide more in-depth output for examining
+ hint stream contents.
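
The TODO item about looking for the %PDF header within the first 1024
bytes, and about accepting the "%!PS-Adobe-N.n PDF-M.m" form, could be
approached roughly as in the sketch below. This is only an illustration
and not qpdf code: the names find_pdf_version and read_version are
hypothetical, the input is assumed to be already read into a
std::string, and the sketch assumes the header must lie entirely within
the first 1024 bytes, which is one possible reading of the
implementation notes.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper (not a qpdf API): read a version of the form
// digits '.' digits starting at pos. Returns "" if the text at pos
// does not have that shape.
static std::string read_version(std::string const& s, std::size_t pos)
{
    std::size_t p = pos;
    while (p < s.size() && std::isdigit(static_cast<unsigned char>(s[p])))
    {
        ++p;
    }
    if (p == pos || p >= s.size() || s[p] != '.')
    {
        return "";
    }
    std::size_t frac = ++p;
    while (p < s.size() && std::isdigit(static_cast<unsigned char>(s[p])))
    {
        ++p;
    }
    if (p == frac)
    {
        return "";
    }
    return s.substr(pos, p - pos);
}

// Hypothetical helper (not a qpdf API): return the PDF version named by
// a header found in the first `limit` bytes of buf, or "" if no usable
// header is found. Both "%PDF-M.m" and "%!PS-Adobe-N.n PDF-M.m" are
// recognized. Assumption: anything past `limit` bytes is ignored.
std::string find_pdf_version(std::string const& buf, std::size_t limit = 1024)
{
    assert(limit > 0);
    std::string const window = buf.substr(0, limit);

    // Ordinary header, possibly preceded by junk.
    std::string const pdf = "%PDF-";
    std::size_t pos = window.find(pdf);
    if (pos != std::string::npos)
    {
        // A malformed version after the first "%PDF-" yields "";
        // this sketch does not keep searching.
        return read_version(window, pos + pdf.size());
    }

    // PostScript-style header; the PDF version follows " PDF-" on the
    // same line.
    std::string const ps = "%!PS-Adobe-";
    pos = window.find(ps);
    if (pos != std::string::npos)
    {
        std::size_t eol = window.find_first_of("\r\n", pos);
        std::string const line = window.substr(
            pos, eol == std::string::npos ? std::string::npos : eol - pos);
        std::string const tag = " PDF-";
        std::size_t tag_pos = line.find(tag);
        if (tag_pos != std::string::npos)
        {
            return read_version(line, tag_pos + tag.size());
        }
    }
    return "";
}
```

A caller would pass in the first chunk of the file and treat an empty
result as "no header found". Whether to reject such files or fall back
to recovery is left open here, as it is in the TODO item itself.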