diff options
Diffstat (limited to 'docs/INTERNALS')
-rw-r--r-- | docs/INTERNALS | 488 |
1 files changed, 488 insertions, 0 deletions
diff --git a/docs/INTERNALS b/docs/INTERNALS new file mode 100644 index 000000000..9d0bdbaa1 --- /dev/null +++ b/docs/INTERNALS @@ -0,0 +1,488 @@ + _ _ ____ _ + ___| | | | _ \| | + / __| | | | |_) | | + | (__| |_| | _ <| |___ + \___|\___/|_| \_\_____| + +INTERNALS + + The project is split in two. The library and the client. The client part uses + the library, but the library is designed to allow other applications to use + it. + + The largest amount of code and complexity is in the library part. + +GIT +=== + All changes to the sources are committed to the git repository as soon as + they're somewhat verified to work. Changes shall be commited as independently + as possible so that individual changes can be easier spotted and tracked + afterwards. + + Tagging shall be used extensively, and by the time we release new archives we + should tag the sources with a name similar to the released version number. + +Portability +=========== + + We write curl and libcurl to compile with C89 compilers. On 32bit and up + machines. Most of libcurl assumes more or less POSIX compliance but that's + not a requirement. + + We write libcurl to build and work with lots of third party tools, and we + want it to remain functional and buildable with these and later versions + (older versions may still work but is not what we work hard to maintain): + + OpenSSL 0.9.6 + GnuTLS 1.2 + zlib 1.1.4 + libssh2 0.16 + c-ares 1.6.0 + libidn 0.4.1 + *yassl 1.4.0 (http://curl.haxx.se/mail/lib-2008-02/0093.html) + openldap 2.0 + MIT krb5 lib 1.2.4 + qsossl V5R2M0 + NSS 3.11.x + Heimdal ? + + * = only partly functional, but that's due to bugs in the third party lib, not + because of libcurl code + + On systems where configure runs, we aim at working on them all - if they have + a suitable C compiler. On systems that don't run configure, we strive to keep + curl running fine on: + + Windows 98 + AS/400 V5R2M0 + Symbian 9.1 + Windows CE ? + TPF ? + + When writing code (mostly for generating stuff included in release tarballs) + we use a few "build tools" and we make sure that we remain functional with + these versions: + + GNU Libtool 1.4.2 + GNU Autoconf 2.57 + GNU Automake 1.7 (we currently avoid 1.10 due to Solaris-related bugs) + GNU M4 1.4 + perl 4 + roffit 0.5 + groff ? (any version that supports "groff -Tps -man [in] [out]") + ps2pdf (gs) ? + +Windows vs Unix +=============== + + There are a few differences in how to program curl the unix way compared to + the Windows way. The four perhaps most notable details are: + + 1. Different function names for socket operations. + + In curl, this is solved with defines and macros, so that the source looks + the same at all places except for the header file that defines them. The + macros in use are sclose(), sread() and swrite(). + + 2. Windows requires a couple of init calls for the socket stuff. + + That's taken care of by the curl_global_init() call, but if other libs also + do it etc there might be reasons for applications to alter that behaviour. + + 3. The file descriptors for network communication and file operations are + not easily interchangable as in unix. + + We avoid this by not trying any funny tricks on file descriptors. + + 4. When writing data to stdout, Windows makes end-of-lines the DOS way, thus + destroying binary data, although you do want that conversion if it is + text coming through... (sigh) + + We set stdout to binary under windows + + Inside the source code, We make an effort to avoid '#ifdef [Your OS]'. All + conditionals that deal with features *should* instead be in the format + '#ifdef HAVE_THAT_WEIRD_FUNCTION'. Since Windows can't run configure scripts, + we maintain two curl_config-win32.h files (one in lib/ and one in src/) that + are supposed to look exactly as a curl_config.h file would have looked like on + a Windows machine! + + Generally speaking: always remember that this will be compiled on dozens of + operating systems. Don't walk on the edge. + +Library +======= + + There are plenty of entry points to the library, namely each publicly defined + function that libcurl offers to applications. All of those functions are + rather small and easy-to-follow. All the ones prefixed with 'curl_easy' are + put in the lib/easy.c file. + + curl_global_init_() and curl_global_cleanup() should be called by the + application to initialize and clean up global stuff in the library. As of + today, it can handle the global SSL initing if SSL is enabled and it can init + the socket layer on windows machines. libcurl itself has no "global" scope. + + All printf()-style functions use the supplied clones in lib/mprintf.c. This + makes sure we stay absolutely platform independent. + + curl_easy_init() allocates an internal struct and makes some initializations. + The returned handle does not reveal internals. This is the 'SessionHandle' + struct which works as an "anchor" struct for all curl_easy functions. All + connections performed will get connect-specific data allocated that should be + used for things related to particular connections/requests. + + curl_easy_setopt() takes three arguments, where the option stuff must be + passed in pairs: the parameter-ID and the parameter-value. The list of + options is documented in the man page. This function mainly sets things in + the 'SessionHandle' struct. + + curl_easy_perform() does a whole lot of things: + + It starts off in the lib/easy.c file by calling Curl_perform() and the main + work then continues in lib/url.c. The flow continues with a call to + Curl_connect() to connect to the remote site. + + o Curl_connect() + + ... analyzes the URL, it separates the different components and connects to + the remote host. This may involve using a proxy and/or using SSL. The + Curl_resolv() function in lib/hostip.c is used for looking up host names + (it does then use the proper underlying method, which may vary between + platforms and builds). + + When Curl_connect is done, we are connected to the remote site. Then it is + time to tell the server to get a document/file. Curl_do() arranges this. + + This function makes sure there's an allocated and initiated 'connectdata' + struct that is used for this particular connection only (although there may + be several requests performed on the same connect). A bunch of things are + inited/inherited from the SessionHandle struct. + + o Curl_do() + + Curl_do() makes sure the proper protocol-specific function is called. The + functions are named after the protocols they handle. Curl_ftp(), + Curl_http(), Curl_dict(), etc. They all reside in their respective files + (ftp.c, http.c and dict.c). HTTPS is handled by Curl_http() and FTPS by + Curl_ftp(). + + The protocol-specific functions of course deal with protocol-specific + negotiations and setup. They have access to the Curl_sendf() (from + lib/sendf.c) function to send printf-style formatted data to the remote + host and when they're ready to make the actual file transfer they call the + Curl_Transfer() function (in lib/transfer.c) to setup the transfer and + returns. + + If this DO function fails and the connection is being re-used, libcurl will + then close this connection, setup a new connection and re-issue the DO + request on that. This is because there is no way to be perfectly sure that + we have discovered a dead connection before the DO function and thus we + might wrongly be re-using a connection that was closed by the remote peer. + + Some time during the DO function, the Curl_setup_transfer() function must + be called with some basic info about the upcoming transfer: what socket(s) + to read/write and the expected file tranfer sizes (if known). + + o Transfer() + + Curl_perform() then calls Transfer() in lib/transfer.c that performs the + entire file transfer. + + During transfer, the progress functions in lib/progress.c are called at a + frequent interval (or at the user's choice, a specified callback might get + called). The speedcheck functions in lib/speedcheck.c are also used to + verify that the transfer is as fast as required. + + o Curl_done() + + Called after a transfer is done. This function takes care of everything + that has to be done after a transfer. This function attempts to leave + matters in a state so that Curl_do() should be possible to call again on + the same connection (in a persistent connection case). It might also soon + be closed with Curl_disconnect(). + + o Curl_disconnect() + + When doing normal connections and transfers, no one ever tries to close any + connections so this is not normally called when curl_easy_perform() is + used. This function is only used when we are certain that no more transfers + is going to be made on the connection. It can be also closed by force, or + it can be called to make sure that libcurl doesn't keep too many + connections alive at the same time (there's a default amount of 5 but that + can be changed with the CURLOPT_MAXCONNECTS option). + + This function cleans up all resources that are associated with a single + connection. + + Curl_perform() is the function that does the main "connect - do - transfer - + done" loop. It loops if there's a Location: to follow. + + When completed, the curl_easy_cleanup() should be called to free up used + resources. It runs Curl_disconnect() on all open connectons. + + A quick roundup on internal function sequences (many of these call + protocol-specific function-pointers): + + curl_connect - connects to a remote site and does initial connect fluff + This also checks for an existing connection to the requested site and uses + that one if it is possible. + + curl_do - starts a transfer + curl_transfer() - transfers data + curl_done - ends a transfer + + curl_disconnect - disconnects from a remote site. This is called when the + disconnect is really requested, which doesn't necessarily have to be + exactly after curl_done in case we want to keep the connection open for + a while. + + HTTP(S) + + HTTP offers a lot and is the protocol in curl that uses the most lines of + code. There is a special file (lib/formdata.c) that offers all the multipart + post functions. + + base64-functions for user+password stuff (and more) is in (lib/base64.c) and + all functions for parsing and sending cookies are found in (lib/cookie.c). + + HTTPS uses in almost every means the same procedure as HTTP, with only two + exceptions: the connect procedure is different and the function used to read + or write from the socket is different, although the latter fact is hidden in + the source by the use of curl_read() for reading and curl_write() for writing + data to the remote server. + + http_chunks.c contains functions that understands HTTP 1.1 chunked transfer + encoding. + + An interesting detail with the HTTP(S) request, is the add_buffer() series of + functions we use. They append data to one single buffer, and when the + building is done the entire request is sent off in one single write. This is + done this way to overcome problems with flawed firewalls and lame servers. + + FTP + + The Curl_if2ip() function can be used for getting the IP number of a + specified network interface, and it resides in lib/if2ip.c. + + Curl_ftpsendf() is used for sending FTP commands to the remote server. It was + made a separate function to prevent us programmers from forgetting that they + must be CRLF terminated. They must also be sent in one single write() to make + firewalls and similar happy. + + Kerberos + + The kerberos support is mainly in lib/krb4.c and lib/security.c. + + TELNET + + Telnet is implemented in lib/telnet.c. + + FILE + + The file:// protocol is dealt with in lib/file.c. + + LDAP + + Everything LDAP is in lib/ldap.c. + + GENERAL + + URL encoding and decoding, called escaping and unescaping in the source code, + is found in lib/escape.c. + + While transfering data in Transfer() a few functions might get used. + curl_getdate() in lib/parsedate.c is for HTTP date comparisons (and more). + + lib/getenv.c offers curl_getenv() which is for reading environment variables + in a neat platform independent way. That's used in the client, but also in + lib/url.c when checking the proxy environment variables. Note that contrary + to the normal unix getenv(), this returns an allocated buffer that must be + free()ed after use. + + lib/netrc.c holds the .netrc parser + + lib/timeval.c features replacement functions for systems that don't have + gettimeofday() and a few support functions for timeval convertions. + + A function named curl_version() that returns the full curl version string is + found in lib/version.c. + +Persistent Connections +====================== + + The persistent connection support in libcurl requires some considerations on + how to do things inside of the library. + + o The 'SessionHandle' struct returned in the curl_easy_init() call must never + hold connection-oriented data. It is meant to hold the root data as well as + all the options etc that the library-user may choose. + o The 'SessionHandle' struct holds the "connection cache" (an array of + pointers to 'connectdata' structs). There's one connectdata struct + allocated for each connection that libcurl knows about. Note that when you + use the multi interface, the multi handle will hold the connection cache + and not the particular easy handle. This of course to allow all easy handles + in a multi stack to be able to share and re-use connections. + o This enables the 'curl handle' to be reused on subsequent transfers. + o When we are about to perform a transfer with curl_easy_perform(), we first + check for an already existing connection in the cache that we can use, + otherwise we create a new one and add to the cache. If the cache is full + already when we add a new connection, we close one of the present ones. We + select which one to close dependent on the close policy that may have been + previously set. + o When the transfer operation is complete, we try to leave the connection + open. Particular options may tell us not to, and protocols may signal + closure on connections and then we don't keep it open of course. + o When curl_easy_cleanup() is called, we close all still opened connections, + unless of course the multi interface "owns" the connections. + + You do realize that the curl handle must be re-used in order for the + persistent connections to work. + +multi interface/non-blocking +============================ + + We make an effort to provide a non-blocking interface to the library, the + multi interface. To make that interface work as good as possible, no + low-level functions within libcurl must be written to work in a blocking + manner. + + One of the primary reasons we introduced c-ares support was to allow the name + resolve phase to be perfectly non-blocking as well. + + The ultimate goal is to provide the easy interface simply by wrapping the + multi interface functions and thus treat everything internally as the multi + interface is the single interface we have. + + The FTP and the SFTP/SCP protocols are thus perfect examples of how we adapt + and adjust the code to allow non-blocking operations even on multi-stage + protocols. The DICT, LDAP and TELNET are crappy examples and they are subject + for rewrite in the future to better fit the libcurl protocol family. + +SSL libraries +============= + + Originally libcurl supported SSLeay for SSL/TLS transports, but that was then + extended to its successor OpenSSL but has since also been extended to several + other SSL/TLS libraries and we expect and hope to further extend the support + in future libcurl versions. + + To deal with this internally in the best way possible, we have a generic SSL + function API as provided by the sslgen.[ch] system, and they are the only SSL + functions we must use from within libcurl. sslgen is then crafted to use the + appropriate lower-level function calls to whatever SSL library that is in + use. + +Library Symbols +=============== + + All symbols used internally in libcurl must use a 'Curl_' prefix if they're + used in more than a single file. Single-file symbols must be made static. + Public ("exported") symbols must use a 'curl_' prefix. (There are exceptions, + but they are to be changed to follow this pattern in future versions.) + +Return Codes and Informationals +=============================== + + I've made things simple. Almost every function in libcurl returns a CURLcode, + that must be CURLE_OK if everything is OK or otherwise a suitable error code + as the curl/curl.h include file defines. The very spot that detects an error + must use the Curl_failf() function to set the human-readable error + description. + + In aiding the user to understand what's happening and to debug curl usage, we + must supply a fair amount of informational messages by using the Curl_infof() + function. Those messages are only displayed when the user explicitly asks for + them. They are best used when revealing information that isn't otherwise + obvious. + +API/ABI +======= + + We make an effort to not export or show internals or how internals work, as + that makes it easier to keep a solid API/ABI over time. See docs/libcurl/ABI + for our promise to users. + +Client +====== + + main() resides in src/main.c together with most of the client code. + + src/hugehelp.c is automatically generated by the mkhelp.pl perl script to + display the complete "manual" and the src/urlglob.c file holds the functions + used for the URL-"globbing" support. Globbing in the sense that the {} and [] + expansion stuff is there. + + The client mostly messes around to setup its 'config' struct properly, then + it calls the curl_easy_*() functions of the library and when it gets back + control after the curl_easy_perform() it cleans up the library, checks status + and exits. + + When the operation is done, the ourWriteOut() function in src/writeout.c may + be called to report about the operation. That function is using the + curl_easy_getinfo() function to extract useful information from the curl + session. + + Recent versions may loop and do all this several times if many URLs were + specified on the command line or config file. + +Memory Debugging +================ + + The file lib/memdebug.c contains debug-versions of a few functions. Functions + such as malloc, free, fopen, fclose, etc that somehow deal with resources + that might give us problems if we "leak" them. The functions in the memdebug + system do nothing fancy, they do their normal function and then log + information about what they just did. The logged data can then be analyzed + after a complete session, + + memanalyze.pl is the perl script present in tests/ that analyzes a log file + generated by the memory tracking system. It detects if resources are + allocated but never freed and other kinds of errors related to resource + management. + + Internally, definition of preprocessor symbol DEBUGBUILD restricts code which + is only compiled for debug enabled builds. And symbol CURLDEBUG is used to + differentiate code which is _only_ used for memory tracking/debugging. + + Use -DCURLDEBUG when compiling to enable memory debugging, this is also + switched on by running configure with --enable-curldebug. Use -DDEBUGBUILD + when compiling to enable a debug build or run configure with --enable-debug. + + curl --version will list 'Debug' feature for debug enabled builds, and + will list 'TrackMemory' feature for curl debug memory tracking capable + builds. These features are independent and can be controlled when running + the configure script. When --enable-debug is given both features will be + enabled, unless some restriction prevents memory tracking from being used. + +Test Suite +========== + + Since November 2000, a test suite has evolved. It is placed in its own + subdirectory directly off the root in the curl archive tree, and it contains + a bunch of scripts and a lot of test case data. + + The main test script is runtests.pl that will invoke the two servers + httpserver.pl and ftpserver.pl before all the test cases are performed. The + test suite currently only runs on unix-like platforms. + + You'll find a complete description of the test case data files in the + tests/README file. + + The test suite automatically detects if curl was built with the memory + debugging enabled, and if it was it will detect memory leaks too. + +Building Releases +================= + + There's no magic to this. When you consider everything stable enough to be + released, run the 'maketgz' script (using 'make distcheck' will give you a + pretty good view on the status of the current sources). maketgz prompts for + version number of the client and the library before it creates a release + archive. maketgz uses 'make dist' for the actual archive building, why you + need to fill in the Makefile.am files properly for which files that should + be included in the release archives. + + NOTE: you need to have curl checked out from git to be able to do a proper + release build. The release tarballs do not have everything setup in order to + do releases properly. |