Timeline
It's not the size of the dog in the fight, it's the size of the fight in the dog.
To Upgrade G-WAN: (a) overwrite your ./include files and the gwan executable with archive files and then (b) run G-WAN once without -d (daemon mode) to make sure that all your servlets and handlers compile without modifications.
get_env(FRAGMENT_ID), Accept-Encoding (Firefox), extended 'trace'
G-WAN v2.10.15/Linux: Development Release
- fixed the "Accept-Encoding:" HTTP header which failed on Firefox and worked on Chrome and others
- added parsing of the SDCH encoding type, see served_from.c (G-WAN does not implement the encoding)
- added the '-t' command-line option to log all client requests in the 'trace' file (when access.log is not enough)
- added parsing of URL fragment (the last '#') that can now be queried with get_env(FRAGMENT_ID).
HTTP fragments use the final '#' character. Note that for /csp? requests a fragment will be part of the last query parameter (and not available with a get_env(FRAGMENT_ID) call).
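The "last '#'" rule can be pictured with a few lines of plain C. This is only a sketch: url_fragment() is a hypothetical helper, not part of the G-WAN API, and it does not model the /csp? special case described above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not a G-WAN call): return a pointer to the text
 * that follows the last '#' of a URL, or NULL when there is no fragment. */
const char *url_fragment(const char *url)
{
    assert(url != NULL);
    const char *hash = strrchr(url, '#');   /* the final '#' wins */
    return hash ? hash + 1 : NULL;
}
```

With this helper, "/a#b#c" yields "c" and "/index.html" yields NULL.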
The new -t command-line switch has an enormous educational value. Some clients are not short of imagination. Just in case, you will have a time stamp, their IP address and the whole request.
csp script improperly updated when invoked for first time
G-WAN v2.10.13/Linux: Development Release
- fixed a pointless (timeout calculation error) update of all C scripts when called the first time
- enhanced the adaptive Lorenz-Waterwheel for even more speed and lower CPU usage.
Cache miss fix
G-WAN v2.10.12/Linux: Development Release
- fixed the "Accept-Encoding: gzip" parsing (gzip was sometimes ignored, thanks to Aris and Eric)
- fixed a v2.10.11 timer issue making cached entries expire at an earlier time than the expected time.
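Header values such as "gzip, deflate" and "gzip,deflate,sdch" differ only in spacing, which is the kind of detail a hand-written parser can trip on. The sketch below shows one tolerant way to look for the gzip token. It is an assumption about how such a check could be written, not G-WAN's parser; accepts_gzip() is a hypothetical name and q-values are ignored for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <strings.h>   /* strncasecmp() (POSIX) */

/* Hypothetical helper: returns 1 when the "Accept-Encoding:" value lists
 * "gzip" as a whole token, 0 otherwise. Parameters such as ";q=0.5" are
 * skipped, so "gzip;q=0" still counts as accepted in this sketch. */
int accepts_gzip(const char *value)
{
    assert(value != NULL);
    const char *p = value;
    while (*p) {
        while (*p == ' ' || *p == '\t' || *p == ',') p++;  /* separators */
        const char *start = p;
        while (*p && *p != ',' && *p != ';' && *p != ' ' && *p != '\t') p++;
        if (p - start == 4 && strncasecmp(start, "gzip", 4) == 0)
            return 1;
        while (*p && *p != ',') p++;                       /* parameters */
    }
    return 0;
}
```

Comparing whole tokens (rather than searching for the substring "gzip") keeps "x-gzip" from being mistaken for "gzip".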
Pipelined/URI requests sanity checks, better cache management
G-WAN v2.10.11/Linux: Development Release
- added memory checks to limit memory consumption of huge loads of huge data sets (thanks Jacques)
- Fredrik's "infinite loop" was merely HTTP pipelining consuming looong "/csp/aaa..." junk (no longer)
- fixed Fredrik's "/.." unsuccessful directory traversal attempt (stopped at level 1) – the test was wrong by 1 character
- from the same report, escaping "/%31+%32=%33 HTTP/1.0" was checked but seems to work as expected.
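For readers wondering what a "/.." test looks like, here is a minimal sketch that counts directory depth and rejects any path climbing above the root. It is not G-WAN's code: path_escapes_root() is a hypothetical name, and the path is assumed to be already percent-decoded.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if 'path' would climb above the document
 * root through ".." segments, 0 otherwise. */
int path_escapes_root(const char *path)
{
    assert(path != NULL);
    int depth = 0;
    const char *p = path;
    while (*p) {
        while (*p == '/') p++;                 /* skip separators */
        const char *seg = p;
        while (*p && *p != '/') p++;
        size_t len = (size_t)(p - seg);
        if (len == 0) break;                   /* trailing slash(es) */
        if (len == 2 && seg[0] == '.' && seg[1] == '.') {
            if (--depth < 0) return 1;         /* went above the root */
        } else if (!(len == 1 && seg[0] == '.')) {
            depth++;                           /* a regular segment */
        }
    }
    return 0;
}
```

Comparing the segment length before comparing its characters avoids the off-by-one class of mistakes: "..b" and "..." are ordinary names, only ".." goes up.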
Fast startup with huge /www folders, JSON empty array bug fix
G-WAN v2.10.10/Linux: Development Release
- G-WAN no longer conducts /www analysis at startup, leading to instant availability with huge data sets
- fixed a JSON renderer bug (missing '[' with empty arrays), thank you Griffin for the relevant report.
New serve cache.c servlet example, graceful servlet crash reports
G-WAN v2.10.8/Linux: Development Release
- G-WAN v2.10.7 disabled graceful servlet crash reports (for debugging) – this is enabled again in v2.10.8
- added a cache.c servlet example to show how to serve existing cache entries without data copy from servlets
Update: on Oct. 10th, Fredrik Widlund's blog published inaccurate information that the text below attempts to correct (Fredrik rejected all suggestions to correct the technical "errors" that he has published):
Here are the most serious claims, made for v2.10.7:

"A) A buffer overflow issue exists in the routine handling URL encoding for the "csp" (so called G-WAN servlets) sub-directory."

"B) SIGPIPE signals were not handled correctly. Exploiting the vulnerability resulted in denial of service."

What is going on here – and why did v2.10.8 fix it by merely re-enabling signal handlers? Even more oddly, why were previous versions of G-WAN not affected?

When G-WAN receives a dynamic request, before running user-defined code (C servlets), it installs a signal handler to catch faults in order to produce "graceful crash reports" instead of having the crash stop the server. Without this handler the server crashes because critical structures are initialized there. And this is precisely what Fredrik was doing: triggering a segfault that was not handled by v2.10.7 (which had a lifespan of less than 24 hours) but which is properly handled by v2.10.8+ – and by all the versions that preceded 2.10.6/7.

So, let's expose the value of Fredrik's "research":

---------------------------------------------------
"Vulnerability" A:
==================
v2.10.7 crashes because it lacks its (accidentally disabled) signal handler to handle crashes gracefully instead of stopping the server.

---------------------------------------------------
"Vulnerability" B:
==================
v2.10.7 stops with a SIGPIPE because it lacks its (accidentally disabled) signal handler to handle, ahem, signals.

---------------------------------------------------
Conclusion:
===========
Those facts were documented and fixed on Oct. 8th by v2.10.8 (see the text above), that is, two days before Fredrik Widlund wrote his "exploit".

Fredrik denies having found inspiration in v2.10.8's timeline despite having been caught reading it on Oct. 8th (in gwan.ch logs); that was two days before he wrote his "exploit".

Fredrik's "advisory" wording suggests that AFTER this "exploit" people should upgrade G-WAN to get the "fix". This incorrectly implies that G-WAN was corrected because of what Fredrik insists on calling "research": on gwan.ch, the flaw was publicly documented and a fix was posted two days before Fredrik Widlund wrote his v2.10.7 "exploit".

Regarding the quality of Fredrik's other "findings", he skips "sudo ./gwan -d:www-data" (documented in the PDF manual and in "./gwan -h") to incorrectly claim that:

"The daemon does not limit privileges and actually runs all routines as 'root'"

In the same vein, he wrote what he knew was plain lies since he had read v2.10.8's timeline two days before he wrote his v2.10.7 "exploit":

"The latest version has been silently updated on the site without even increasing the version number"

Before concluding:

"The dishonesty is remarkable."

Indeed. But on which side remains disputable:

The attentive reader will notice that Fredrik Widlund's blog, created on October 3 2011 and never updated after October 19 2011, is dedicated to trashing G-WAN. No other subject is ever mentioned.

Also, this "exploit" has been written by a self-proclaimed "security expert" who – in the course of his whole life – has never published any other "research" about any other product. That did not prevent Fredrik from posting his "advisory" on hundreds of "security" sites (none of which even tried to check the bogus claim before relaying it).
The less naive among us will remark that v2.10.6/7 (posted the same day) were severely humbling Nginx, Varnish and all others once again. Each time G-WAN gets a boost in performance or scalability, the F.U.D. and censorship machine flies to new stratospheric highs to rescue the less gifted. No wonder why their servers are weak in the tech area: doing the right thing requires other skills:
G-WAN has this very nice key value storage feature using wait-free algorithms. I would LOVE to see that implementation, this is top-notch stuff. Very few people are capable of correctly coding such a thing.
Not all Web server authors are crooks. How refreshing. It reminds me of the good old days of the C coders' attitude: aiming for higher goals, and respecting those pioneers who inspire us with great new ideas and code.
worker threads, weighttp, xbuf_vcat(), AFTER_READ handler
G-WAN v2.10.7/Linux: Development Release
- G-WAN now always uses all its worker threads for both static and dynamic contents (see weighttp's tests)
- rewrote the ab.c ApacheBench wrapper to collect CPU/RAM statistics and add support for weighttp
- the gwan executable renamed 'gwan_1' will run with one worker ('gwan_4' for 4 workers, and so on)
- optimized further memory management to use even less memory during huge loads (side effect: speed)
- added a few MIME types including *.c, *.h, *.php, *.py, *.jsp, *.aspx, *.fcs, *.amf (JohnnyOpCode, Arek)
- added xbuf_vcat() to add an array of buffers to an xbuffer, just like the writev() call (thanks NilssonRio)
- optimized (2x) the HTTP errors path which was significantly slowed-down in the recent past versions
- prevented POST / PUT requests from trying to use pipelined content, clearing some client troubles
- moved an HTTP conformance test after the AFTER_READ handler is triggered for TCP-servers (thanks Progamer)
- fixed broken lingering close in recent versions due to a typo (symptom: incomplete downloads for slow clients)
- optimized the search for system C headers on the Linux distributions/configurations which have trouble finding files quickly.
The choice to use all worker threads will make G-WAN look slower on (single-threaded) AB tests, but G-WAN will be much faster when facing SMP client test tools like Lighty's weighttp. With only multi-Core CPUs in production, the choice was easy (the Pentium 4 is 11 years old and its successor, in 2005, was a dual-Core).
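About xbuf_vcat(): the idea of appending an array of buffers in one call, as writev() does for file descriptors, can be sketched in standalone C. The buf_t type and buf_vcat() function below are hypothetical stand-ins (G-WAN's xbuf_t and the real xbuf_vcat() signature are not shown in this timeline); only struct iovec comes from the system headers.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/uio.h>   /* struct iovec, the segment type used by writev() */

/* Hypothetical growable buffer standing in for G-WAN's xbuf_t. */
typedef struct { char *ptr; size_t len; } buf_t;

/* Append 'count' segments to 'dst' with a single reallocation.
 * Returns 0 on success, -1 if out of memory (dst is left unchanged). */
int buf_vcat(buf_t *dst, const struct iovec *iov, int count)
{
    assert(dst != NULL && (iov != NULL || count == 0));
    size_t extra = 0;
    for (int i = 0; i < count; i++)
        extra += iov[i].iov_len;

    char *p = realloc(dst->ptr, dst->len + extra + 1);  /* +1 for '\0' */
    if (!p) return -1;
    dst->ptr = p;

    for (int i = 0; i < count; i++) {
        if (iov[i].iov_len) {                           /* skip empty segments */
            memcpy(dst->ptr + dst->len, iov[i].iov_base, iov[i].iov_len);
            dst->len += iov[i].iov_len;
        }
    }
    dst->ptr[dst->len] = '\0';
    return 0;
}
```

Summing the segment lengths first means the buffer grows once, however many segments are appended.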
Bug fix, clean up, KV flag, Hexadecimal Dumps
G-WAN v2.9.16/Linux: Development Release
- fixed a double-free issue causing crashes after huge memory loads (like loan benchmarks)
- removed the malloc debugging wrapper which reduced performance in the recent versions
- added http_t.h_do_not_track for the new (optional) W3 Consortium "DNT:" HTTP header
- added a KV_NO_UPDATE flag to make kv_add() fail to update an existing KV store entry (thanks Ersun):
- added a "%v" format to s_snprintf() and xbuf_xcat() to dump data in an "hexdump -C" like format:
    kv_t store;
    kv_init(&store, "users", 0, 0, 0, 0);

    kv_item item;
    item.key = "pierre";
    item.klen = sizeof("pierre") - 1;
    item.val = "pierre@example.com";
    item.flags = KV_NO_UPDATE; // do not update an existing entry
    kv_add(&store, &item); // return old/new entry, or NULL:out of memory
Note that if you are not using the new KV_NO_UPDATE flag then you MUST setup item.flags = 0; the kv.c and kv_bench.c examples have been updated to reflect this new policy.
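To make the "do not update" policy concrete without the G-WAN headers, here is a toy store in plain C. Everything in it is a stand-in: NO_UPDATE mimics KV_NO_UPDATE, store_add() mimics the add-or-return-existing behaviour suggested by the comment in the snippet above, and the linear search has nothing in common with G-WAN's wait-free implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { NO_UPDATE = 1 };                 /* stand-in for KV_NO_UPDATE */
enum { STORE_MAX = 8 };                 /* arbitrary toy capacity */

typedef struct { const char *key, *val; } entry_t;
typedef struct { entry_t items[STORE_MAX]; size_t count; } store_t;

/* Insert or update an entry and return the entry now stored under 'key',
 * or NULL when the table is full. With NO_UPDATE, an existing entry is
 * left untouched and returned as-is. */
entry_t *store_add(store_t *s, const char *key, const char *val, int flags)
{
    assert(s != NULL && key != NULL);
    for (size_t i = 0; i < s->count; i++) {
        if (strcmp(s->items[i].key, key) == 0) {
            if (!(flags & NO_UPDATE))
                s->items[i].val = val;  /* default policy: overwrite */
            return &s->items[i];
        }
    }
    if (s->count == STORE_MAX) return NULL;
    s->items[s->count].key = key;
    s->items[s->count].val = val;
    return &s->items[s->count++];
}
```

Because the flags are always tested, a caller that leaves them uninitialized gets an unpredictable policy, which is why they must be set explicitly (to 0 or to the flag).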
     0: 5B 44 65 73 6B 74 6F 70 20 45 6E 74 72 79 5D 0A | [Desktop Entry].
    16: 56 65 72 73 69 6F 6E 3D 31 2E 30 0A 54 79 70 65 | Version=1.0.Type
    32: 3D 4C 69 6E 6B 0A 4E 61 6D 65 3D 45 78 61 6D 70 | =Link.Name=Examp
Instead of 0-padded hexadecimal offsets I used aligned decimal offsets with pretty thousands, and a more compact format (handy for frame dumps in log files or during debugging). The output buffer must be at least 5x larger than the input buffer. If the output buffer is too small then no output is returned and s_snprintf() returns the required output buffer length.
You can dump 16 bytes of binary data this way: "%16v" (if "%v" is used then a zero-terminated string is expected).
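For those who want a similar dump outside of G-WAN, the layout (decimal offset, 16 hexadecimal bytes, then the printable characters) can be approximated with standard C. This is a re-creation under assumptions, not the "%v" implementation: hexdump_line() is a hypothetical name, the 6-column offset width is my choice, and thousands separators are left out.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: format up to 16 bytes as one dump line,
 * "offset: HH HH ... | ascii". Returns the line length, or -1 when
 * 'cap' is below the 96 bytes that cover the worst case. */
int hexdump_line(char *dst, size_t cap, size_t offset,
                 const unsigned char *src, size_t n)
{
    assert(dst != NULL && src != NULL && n <= 16);
    if (cap < 96) return -1;

    int k = snprintf(dst, cap, "%6zu: ", offset);
    for (size_t i = 0; i < 16; i++) {
        if (i < n) {
            k += snprintf(dst + k, cap - (size_t)k, "%02X ", (unsigned)src[i]);
        } else {
            memset(dst + k, ' ', 3);          /* pad short final lines */
            k += 3;
        }
    }
    dst[k++] = '|';
    dst[k++] = ' ';
    for (size_t i = 0; i < n; i++)            /* non-printables become '.' */
        dst[k++] = isprint(src[i]) ? (char)src[i] : '.';
    dst[k] = '\0';
    return k;
}
```

Calling it in a loop, 16 bytes at a time with the running offset, produces a multi-line dump in the same spirit as the sample above.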
It's good to be back to the office: the last month was spent abroad, with only a day during week-ends available for coding - and the quality suffered. Focus is everything.
HTTP Pipelining, rotated log files date
G-WAN v2.9.4/Linux: Development Release
- fixed the rotated log file dates (since last version it used a boolean variable)
- fixed the HTTP pipelining support (we should expect more requests per connections from new browsers).
Inline ASM, get_env(ROOT_PATH)
G-WAN v2.8.28/Linux: Development Release
- fixed the timestamp variable used to rotate the gwan.log file
- added support for inline ASM in C scripts (see the asm.c example)
- made WWW/CSP/LOGS/HLD_ROOT get_env() values use different pointers.
Bug Fixes
G-WAN v2.8.21/Linux: Development Release
- fixed a time/date stamps glitch in log files
- fixed the directory listings mess (one line of code)
- fixed the cache1,2,3.c examples and the cacheget() call
- restored "application/octet-stream" as the default MIME type
- added the "Accept-Language:" HTTP header to the http_t structure (see served_from.c).
Those little glitches were due to the whole v2.8 rewrite which has broken many minor things that worked fine in v2.1. Why rewrite the newborn G-WAN while veterans like Apache or Nginx don't bother? Well, G-WAN's design and implementation seek to be optimal – and this is what makes all the difference with the incumbents.
Bug Fixes
G-WAN v2.8.14/Linux: Development Release
- fixed the http->h_cookies issue (the HTTP header was incorrectly parsed)
- modified the get.c servlet example to read as much data as made available
- fixed the If-Modified-Since glitch (time comparisons were using different units).
The no-keep-alives problem was a by-product of the above time format issue. Like, probably, the abrupt connection cuts for visitors in Asia or Latin America (high latency).
Someone reported a problem with G-WAN's persistent pointers. I tested them all without a glitch in the persistent.c example. Note that handlers may access a connection state at which time the persistent pointer is not available YET (after accept(), the server does not know yet which virtual host is involved). As a result, you should always test the returned value before using it, or use the global G-WAN persistent pointer to avoid these cases.
I did not address the symbolic links reported issue, that will be for later. I believe that the fixes above are already worth having.
Thank you for the prompt feedback. This release candidate still needs a bit more love but we are on the right path!
More examples
G-WAN v2.8.13/Linux: Development Release
- added an email.c servlet example to illustrate the sendemail() call
- added a persistence.c servlet example to illustrate G-WAN pointers
- added a CLIENT_SOCKET value for get_env() to interact with clients
- modified the post.c servlet example to read more data for long entities
- modified the kv_bench.c servlet example to let it work without Tokyo Cabinet.
The gwan.ch web site has been updated with a more modern design, and with more features: you can now support G-WAN financially to make it evolve faster and stay alive in the long term!
"G-WAN v2.1 is a wimp" says G-WAN v2.8
G-WAN v2.8.8/Linux: Development Release
- a bunch of new optimizations lead to a nice 5-10x speedup
- added an HTTP Headers struct for get_env(HTTP_HEADERS)
- added the "%n" format to the s_snprintf() / xbuf_xcat() calls
- added US_HANDLER_STATES to get_env() (thanks Atmo)
- added SCRIPT_TMO/KALIVE_TMO to get_env() for timeouts
- quoted ETag headers (thanks Marc Lehmann, libev's author)
- added the cacheget() call for C scripts to find cached entries
- replaced the cacheadd() data structure by a new faster beast
- added the key-value store kv_add/_get/_del/_free/_do() calls
- added the strerror() "%m" format in s_snprintf() / xbuf_xcat()
- used a RDTSC fallback, tell me: I don't have the HW to test it
- added listener/hosts details in the daily HTML server reports
- added Garbage Collection: gc_malloc()/gc_free() for C scripts
- added HDL_HTTP_ERRORS to let handlers redirect 404 errors
- added HDL_AFTER_WRITE to let handlers do all (thanks Chang)
- fixed non-existent paths invoked by handlers before parsing
- allowed handlers to query the handler-folder before parsing
- fixed split POST issue with IE and Chrome (Firefox was fine)
- modified get_reply() so it no longer resets reply->len (Atmo)
- the get.c example uses connect(), read(), write(), and close()
- made Linux DNS lookup work with asynchronous socket calls
- G-WAN/Linux now uses as many threads as CPUs/CPU Cores
and full 64-bit timers (thank you Chang for the pertinent report).
By comparison, past versions were mere experimentation. This version pushes the multi-Core design to its long-term form (1 to 10,000-Core ready), brings the (wait-free) G-WAN Key-Value Store and a total rewrite of the Memory Manager (reducing the load, using garbage collection, getting better locality) and of the Lorenz Waterwheel (server and worker threads and traffic flow scheduler). This allows G-WAN to run faster in every possible case (single-Core and multi-Core CPUs, small and large payloads, low and high concurrencies) and even to outdo Nginx's exemplary memory usage (for static and dynamic contents).
OK, it took me 4 months but this new G-WAN puts previous versions to shame:
- Old G-WAN loan(100 years): 8,000 requests per second;
- New G-WAN loan(100 years): 70,000 requests per second.
Proof that incremental enhancements and users' feedback have their value: nothing better kills a project than the belief that what has been done in the past should not be broken (as many times as needed) to reach the next step.
C scripts parsing error reporting has also been enhanced: Terminal ANSI codes effects make it clearer where the error is located in the C source code, and C scripts are renamed only when G-WAN runs as a daemon (without a programmer trying to fix the script). It was suggested a long time ago to get rid of the script ("script.c.bug") renaming feature, and this modification will make it less intrusive.
Tomek Gryczka also signaled that the error.css file was not used for resources located outside of the root folder. This is now fixed, like the large downloads (the variable size, a bit-field, was just too small) and the index.html file used in folders to prevent directory listings. Per the request of two G-WAN users, I have also added three MIME types "rdf+xml", "text/turtle" and "application/x-encrypted-gwan".
Also, you can now define which notifications a Handler's main() will receive:
    int init(int argc, char *argv[])
    {
       // get a pointer on the Handler states
       u32 *states = (u32*)get_env(argv, US_HANDLER_STATES, 0);

       // setup the Handler states that we want
       *states = (1 << HDL_AFTER_ACCEPT)
               | (1 << HDL_BEFORE_PARSE)
               | (1 << HDL_HTTP_ERRORS); // only those states

       return 0; // >= 0:success
    }
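The states value is a plain bitmask with one bit per notification. A standalone sketch of how such a mask can be tested follows. The numeric values of the HDL_* constants below are placeholders chosen for the example (the real ones come from G-WAN's headers), and wants_state() is a hypothetical helper, not a G-WAN call.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder values: the real HDL_* constants are defined by G-WAN's
 * headers and their numeric values are not given in this timeline. */
enum {
    HDL_AFTER_ACCEPT = 1,
    HDL_AFTER_READ,
    HDL_BEFORE_PARSE,
    HDL_HTTP_ERRORS
};

/* Returns 1 when the bit for 'state' is set in the 'states' mask. */
int wants_state(uint32_t states, int state)
{
    assert(state >= 0 && state < 32);
    return (states & (UINT32_C(1) << state)) != 0;
}
```

With the mask built in init() above, wants_state() would report HDL_AFTER_ACCEPT, HDL_BEFORE_PARSE and HDL_HTTP_ERRORS as requested, and any other state as not requested.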
Per the demand of Chang, I have added a server_report() call which lists the current state of G-WAN either in HTML or text (like in the daily HTML report), see the report.c example for details.
Linking libraries is now much easier since G-WAN will try to find the relevant path when not provided (you can still provide a full path if G-WAN does not find the library - in which case an error message is printed on the terminal):
    // what you had to do in the past:
    #ifdef LP64
    # pragma link "/usr/lib32/libsqlite3.so.0" // 64-bit
    #else
    # pragma link "/usr/lib/libsqlite3.so.0" // 32-bit
    #endif

    // what you can do from now:
    #pragma link "sqlite3"
Finally, removing the G-WAN version number from the "Server:" HTTP response header, as suggested by Thomas Meitz, can only contribute to make G-WAN safer. I also documented /include/xbuffer.h for every call (thanks sNielsson) as is the case for /include/gwan.h.
Thank you for all the great feedback on the G-WAN forum (which listed 1,500+ new registered users in only 6 months) - special thanks to Paco for having made it possible in the first place by graciously setting up the forum and hosting it!
Formatting, HTML redux
G-WAN v2.1.20/Linux: Development Release
- fixed the recently broken formatting of crash reports
- fixed missing char when HTML files started with blanks
(this was due to the on-the-fly HTML 'redux' feature).
Your feedback is improving G-WAN, thanks to 'Ascent' for the HTML bug. Still have to address the xbuf_frurl() client connection cut for slow servers.
Out of EPOLLution
G-WAN v2.1.19/Linux: Development Release
- finally found my way out of the (nasty) EPOLLHUP trap.
The CPU usage is now clean from erratic kernel panics, making G-WAN as fast as expected. Asynchronous BSD socket calls have now as little overhead as synchronous calls, making them suitable for all tasks (like G-WAN C script handlers - more on this soon).
csp scheduler, Chrome
G-WAN v2.1.18/Linux: Development Release
- added data URI padding for Google Chrome (IE/Firefox were fine)
- fixed asynchronous calls calling the G-WAN server from C scripts
- rewrote the C scripts scheduler completely (no more 'ab' errors).
Thank you all for the great feedback!
Auth, JSON, Ranges, POST
G-WAN v2.1.16/Linux: Development Release
- added binary application/octet-stream POST support (see post.c)
- added an extensive JSON example to illustrate the 9 new functions
- added support for the "If-Range:" HTTP header (byte ranges only)
- added BASIC and DIGEST HTTP authorization schemes support
- added a Data URI (icon) workaround for MS Internet Explorer 6/7
- simplified the get_env() call while keeping backward compatibility*
- fixed the conflicts between asynchronous blocking calls and epoll
- fixed HTML redux: it turned "<p style" into "<pstyle" (thanks Mike)
- fixed URL parsing that was eating encoded spaces (thanks Paco)
- fixed the escape_html() function call broken unicode processing
- fixed cases where the returned HTTP status code wasn't set (zero)
- removed the duplicated entry at the bottom of the directory listings
- added an HTTP code parameter to cacheadd(), see cache0.c and
the manual (cache redirections: ret = 301, JSON payloads: ret = 1)
MAKE SURE TO UPDATE SCRIPTS USING cacheadd().
Starting with 2011, G-WAN version numbers will use the YY.MM.DD format.
This version is much faster because it reduces the number of connections needed to serve the SAME contents (see 'Data URIs' in the manual). For example, the gwan.ch/index.html page is served with 4 times fewer requests, leading to lower latency and (4 times) better scalability.
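A Data URI embeds a resource (an icon, say) directly in the page as "data:<mime>;base64,<payload>", so the browser does not need a separate request for it. The sketch below builds such a URI in plain C, including the '=' padding of the Base64 payload. make_data_uri() is a hypothetical helper written for this note; it is not how G-WAN generates its Data URIs.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: build "data:<mime>;base64,<payload>" from raw bytes.
 * Returns a malloc()ed string (the caller frees it) or NULL if out of memory. */
char *make_data_uri(const char *mime, const unsigned char *src, size_t len)
{
    static const char tbl[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    assert(mime != NULL && (src != NULL || len == 0));

    size_t head = strlen("data:") + strlen(mime) + strlen(";base64,");
    size_t body = 4 * ((len + 2) / 3);        /* Base64 grows 3 bytes into 4 */
    char *uri = malloc(head + body + 1);
    if (!uri) return NULL;

    char *out = uri + sprintf(uri, "data:%s;base64,", mime);
    for (size_t i = 0; i < len; i += 3) {
        unsigned v = (unsigned)src[i] << 16;
        if (i + 1 < len) v |= (unsigned)src[i + 1] << 8;
        if (i + 2 < len) v |= (unsigned)src[i + 2];
        *out++ = tbl[(v >> 18) & 63];
        *out++ = tbl[(v >> 12) & 63];
        *out++ = (i + 1 < len) ? tbl[(v >> 6) & 63] : '=';  /* padding */
        *out++ = (i + 2 < len) ? tbl[v & 63] : '=';         /* padding */
    }
    *out = '\0';
    return uri;
}
```

The payload is about a third larger than the raw data, so this trade only pays off for small resources where the saved round trip matters more than the extra bytes.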
The json.c example illustrates serialization and deserialization, how to search items by name, by value or by index, how to delete them, update them or add them, how to create arrays, etc. JSON RPC looks good. Correctly implemented, it will fly.
A note about BSD socket calls made asynchronous by G-WAN in C scripts: they work fine (see attack.c, post.c, or request.c, getheaders.c), but there are still problems when you connect() to G-WAN from within a G-WAN C script (as opposed to reaching another server from a G-WAN script). Then, there are cases that trigger an EPOLLHUP tempest by the kernel. This is happening with relatively high concurrencies and I have not yet found how to get rid of this.
[*] char *entity = get_env(argv, ENTITY, 0); can now be used just like for integers: u32 port = get_env(argv, SERVER_PORT, 0); limiting the use of the last parameter (here zero) to the rare cases where we need to change a value (like DOWNLOAD_SPEED or HTTP_CODE). All the servlet examples have been updated with the new, simpler, get_env() possible usage.