• 201406.19

    Windows GUI apps: Bad file descriptor. (or how to convert a GUI app into a console app for easy debugging)

    Lately I've been neck-deep in embedding. Currently, I'm building a portable (hopefully) version of Turtl's core features in ECL.

    Problem is, when embedding turtl-core into Node-webkit or Firefox, any output that ECL writes to STDOUT triggers:

    C operation (write) signaled an error. C library explanation: Bad file descriptor.
    

    Well, it turns out Windows doesn't let you write to STDOUT unless a console is attached, and GUI apps don't get one; even running them from msys doesn't create a console. So here's a tool (in Lisp, of course) that converts an executable between GUI and console.

    Seems to work great. Special thanks to death.
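
    The link to the tool didn't survive. To show what such a conversion involves, here is a Python sketch (not the Lisp tool itself). It assumes the conversion works the way `editbin /SUBSYSTEM:CONSOLE` does: it flips the 2-byte Subsystem field in the executable's PE optional header between 2 (Windows GUI) and 3 (Windows console). The function names are made up for illustration. The PE checksum is left untouched, which I'm assuming is fine for an ordinary (non-driver) executable. The file is edited in place, so back it up first.

```python
import struct

# Values of the PE optional header's Subsystem field.
SUBSYSTEM_GUI = 2      # IMAGE_SUBSYSTEM_WINDOWS_GUI
SUBSYSTEM_CONSOLE = 3  # IMAGE_SUBSYSTEM_WINDOWS_CUI


def subsystem_offset(image):
    """Return the file offset of the 2-byte Subsystem field."""
    if image[:2] != b"MZ":
        raise ValueError("not an MZ/PE executable")
    # e_lfanew, at offset 0x3C of the DOS header, points at the "PE\0\0" signature.
    pe_offset = struct.unpack_from("<I", image, 0x3C)[0]
    if image[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    # 4-byte signature + 20-byte COFF file header = start of the optional header.
    # Subsystem is 68 bytes into the optional header for both PE32 and PE32+.
    offset = pe_offset + 4 + 20 + 68
    if offset + 2 > len(image):
        raise ValueError("truncated optional header")
    return offset


def set_subsystem(image, subsystem):
    """Patch a writable bytes-like image in place; return the old subsystem value."""
    offset = subsystem_offset(image)
    old = struct.unpack_from("<H", image, offset)[0]
    struct.pack_into("<H", image, offset, subsystem)
    return old


def convert_file(path, subsystem):
    """Rewrite an executable on disk in place (back it up first!)."""
    with open(path, "r+b") as f:
        image = bytearray(f.read())
        old = set_subsystem(image, subsystem)
        f.seek(0)
        f.write(image)
    return old
```

    To debug, you would call `convert_file(...)` with `SUBSYSTEM_CONSOLE` so the app gets a console (and a working STDOUT), then call it again with `SUBSYSTEM_GUI` to switch back.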

  • 201005.03

    Using gzip_static in nginx to cache gzip files

    Recently I've been working on speeding up the homepage of beeets.com. Most speed tests say it takes 4 to 6 seconds to load. Obviously, all of them are somehow fatally flawed. I digress, though.

    Everyone (who's anyone) knows that gzipping your content is a great way to reduce download time for your users. It can cut the size of HTML, CSS, and JavaScript by about 60-90%. Everyone also knows that gzipping can be very CPU-intensive. Not anymore.

    I just installed nginx's Gzip Static Module (compile nginx with --with-http_gzip_static_module) on beeets.com. It lets you serve pre-compressed gzip files. What?

    Let's say you have the file /css/beeets.css. When a request for beeets.css comes through, the static gzip module will look for /css/beeets.css.gz. If it finds it (and the client sent Accept-Encoding: gzip), it serves that file as gzipped content. This lets you gzip your static files at the highest compression level (gzip -9) when deploying your site. Nginx then has absolutely no work to do besides serving the static gzip file (it's very good at serving static content).
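
    Here's a simplified model of that lookup as a Python sketch. It is not nginx's actual code, since the real module also honors gzip_http_version, gzip_proxied, gzip_disable, and the `always` setting. `pick_static_file` and `accepts_gzip` are hypothetical names.

```python
import os


def accepts_gzip(accept_encoding):
    """True if an Accept-Encoding value lists gzip with q > 0 (simplified parsing)."""
    for part in accept_encoding.split(","):
        coding, _, params = part.partition(";")
        if coding.strip().lower() != "gzip":
            continue
        q = 1.0
        for param in params.split(";"):
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                q = float(value)
        return q > 0
    return False


def pick_static_file(root, uri, accept_encoding):
    """Prefer '<file>.gz' when it exists and the client takes gzip; else the original."""
    path = os.path.join(root, uri.lstrip("/"))
    if accepts_gzip(accept_encoding) and os.path.isfile(path + ".gz"):
        return path + ".gz", {"Content-Encoding": "gzip"}
    return path, {}
```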

    Wherever you have a gzip section in your nginx config (the directive works in http, server, and location blocks), you can do:

    gzip_static on;

    That's it. Note that you have to create the .gz versions of the files yourself, and the docs recommend that the original and the .gz file have the same modification time, so it's a good idea to "touch" them after both are created (touch -r file.css file.css.gz copies the first file's timestamp onto the second). It's also a good idea to turn the regular gzip compression down (gzip_comp_level 1..3). Dynamic content still gets compressed, just less aggressively, without putting too much strain on the server.
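
    Creating the .gz files can be part of your deploy script. Here is a minimal Python sketch. It assumes your static files live under one directory and that .css, .js, and .html are the ones worth compressing. It writes each `.gz` at level 9 and copies the original's modification time onto it, which takes care of the "touch" step. `precompress` and `EXTENSIONS` are hypothetical names.

```python
import gzip
import os
import shutil

EXTENSIONS = (".css", ".js", ".html")  # assumption: which static files to precompress


def precompress(root, level=9):
    """Write '<file>.gz' next to every matching file under root; return the paths written."""
    written = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(EXTENSIONS):
                continue
            src = os.path.join(dirpath, name)
            dst = src + ".gz"
            with open(src, "rb") as f_in, gzip.open(dst, "wb", compresslevel=level) as f_out:
                shutil.copyfileobj(f_in, f_out)
            # Give the .gz the same timestamps as the original, as the nginx docs recommend.
            st = os.stat(src)
            os.utime(dst, ns=(st.st_atime_ns, st.st_mtime_ns))
            written.append(dst)
    return written
```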

    This is a great way to get the best of both worlds: gzipping (faster downloads) without the extra load on the server. Once again, nginx pulls through as the best thing since multi-cellular life. Keep in mind that this only works on static content (CSS, JavaScript, etc.). Dynamic pages can and should be gzipped, but at a lower compression level to keep load off the server.
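
    To see the level trade-off on your own files, here is a quick Python sketch that compresses the same bytes at several levels and reports size and time. The sample data is made up. Results vary a lot with the input, so point it at your real HTML, CSS, and JavaScript.

```python
import gzip
import time


def compare_levels(data, levels=(1, 3, 6, 9)):
    """Compress `data` at several gzip levels; return {level: (size, seconds)}."""
    results = {}
    for level in levels:
        start = time.perf_counter()
        compressed = gzip.compress(data, compresslevel=level)
        results[level] = (len(compressed), time.perf_counter() - start)
    return results


if __name__ == "__main__":
    # Hypothetical sample: repetitive CSS-like text standing in for a real file.
    sample = b".event { margin: 0 auto; color: #333; }\n" * 5000
    for level, (size, seconds) in compare_levels(sample).items():
        print(f"level {level}: {size} bytes ({size / len(sample):.1%}), {seconds * 1000:.2f} ms")
```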

  • 200912.01

    SSH Agent on Cygwin

    There are probably a billion guides for this already, but whatever. If you DON'T have a ~/.bash_profile (the file bash reads every time you start a Cygwin login shell), create one:

    touch ~/.bash_profile
    chmod a+x ~/.bash_profile
    

    Now that you have the file, add this to it:

    SSHAGENT=/usr/bin/ssh-agent
    SSHAGENTARGS="-s"
    if [ -z "$SSH_AUTH_SOCK" -a -x "$SSHAGENT" ]; then
    	eval `$SSHAGENT $SSHAGENTARGS`
    	trap "kill $SSH_AGENT_PID" 0
    fi
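
    If the `eval` line looks like magic: `ssh-agent -s` prints a few shell assignments (SSH_AUTH_SOCK and SSH_AGENT_PID), and `eval` runs them so they're exported in your shell. The Python sketch below does the same parsing by hand, assuming that output format. `start_agent` and `parse_agent_output` are hypothetical helpers. The environment they set only reaches their own process and its children, which is exactly why the shell version has to use `eval` rather than a subprocess.

```python
import atexit
import os
import re
import signal
import subprocess

# `ssh-agent -s` prints Bourne-shell lines such as:
#   SSH_AUTH_SOCK=/tmp/ssh-XXXXXX/agent.1234; export SSH_AUTH_SOCK;
_ASSIGNMENT = re.compile(r"^(SSH_[A-Z_]+)=([^;]*);", re.MULTILINE)


def parse_agent_output(output):
    """Return the variables that `eval` would have exported."""
    return dict(_ASSIGNMENT.findall(output))


def start_agent():
    """Mirror the .bash_profile snippet: start an agent unless one is already known."""
    if os.environ.get("SSH_AUTH_SOCK"):
        return None
    out = subprocess.run(["ssh-agent", "-s"], capture_output=True, text=True, check=True).stdout
    env = parse_agent_output(out)
    os.environ.update(env)
    # Equivalent of `trap "kill $SSH_AGENT_PID" 0`: kill the agent when we exit.
    atexit.register(os.kill, int(env["SSH_AGENT_PID"]), signal.SIGTERM)
    return env
```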
    

    This starts a separate ssh-agent for every Cygwin shell you open (the trap kills it when that shell exits). Close your Cygwin shell (if one is open) and open a new one. Now type:

    ssh-add ~/.ssh/id_rsa
    [enter your password]
    

    Voila! No more typing your stupid password every time you need to ssh somewhere. Note that if you close the Cygwin window, you'll have to ssh-add your key again! This is good security...you can close the window when you're done, and someone who happens upon your computer won't have password-less access to any of your secure logins.

  • 200912.01

    A simple (but long-winded) guide to REST web services

    After all my research on what it means for a service to be RESTful, I think I've finally got a very good understanding. Once you've absorbed a critical mass of information on the subject, something clicks and the first thing that comes into your head is "Oh yeah! That makes sense!"

    It's important to think of a REST web service as a web site. How does a website work?

    • A website works using HTTP. If you need to fetch something on a website, you use the HTTP verb "GET." If you need to change something, you use "POST." A RESTful web service uses other HTTP verbs as well, namely PUT and DELETE, and can also implement OPTIONS to show which methods are appropriate for a resource.
    • A website has resources. A resource can be information, images, flash, etc. These resources can have different representations: HTML, a jpeg, an embedded video. REST is the same way. It is resource-centric. Want a list of users? GET /users. Want an event? GET /events/5. Want to edit that event? PUT /events/5. Every resource has a unique URL to identify it!
    • Resources are not dealt with directly. Instead, representations of resources are used. This can be a bit hard to grasp. What is a user? It's a nebulous object somewhere that I cannot interact with. It is an idea, an entity. A representation is a form of the user resource I can interact with. A representation can be a comma-delimited list, JSON, XML...anything the client and server both understand. How do we know what we're interacting with? Media types.
    • Just as a website tells you what kind of image you're requesting, a REST service tells you what kind of resource representation you are receiving. This is done using media types. For instance, if I do a GET /events/7, the Content-Type may be "application/vnd.beeets.event+json", which tells us this is a vendor-specific media type (the "vnd") and that it's an event in JSON format. You can pass these media types in your Accept header to specify which representation you would like. These media types are documented somewhere so that clients know exactly what to expect when consuming them.
    • If you request a page that doesn't exist or that you aren't authorized to view, a website will tell you. This is done using HTTP status codes, and a good REST service uses them the same way: 200 OK, 404 Not Found, 500 Internal Server Error, etc. These have already been defined and refined over many, many years by people who have been doing this a lot longer than you (probably)...use them.
    • A website has links from one page to another. This is one of the main points of a REST service, and is also widely forgotten or misunderstood (it took me a while to figure it out, even after intense research). Resources in a REST service link to each other, letting a client know what resources can be found where, and how they relate to each other. An HTML page has links in it; so does a REST resource. Links can be structured however you like, but some good things to include are the URI of the linked resource, the relationship it has with the current resource, and the media type. This creates what's known as "loose coupling" between client and server. A client can crawl the server and figure out, knowing only a pre-defined set of media types, what resources are where and how to find them. This principle is known as HATEOAS ("Hypermedia as the Engine of Application State").
    • REST is stateless. This means that the server does not track any sort of client state. There are no session tokens the client uses to identify itself. There are no cookies set. Every request to the REST service must contain all the information needed to fulfill it. Need to access a restricted resource? Send your authentication info with each request. It's that simple. Isn't it easier to track sessions? Not really. Maybe it's easier on a small scale, but once you need to scale, you will wish you'd gone stateless. Using a combination of HTTP basic authentication and API key/secret request signing, you don't have to send plaintext passwords at all. Hell, even throw in a timestamp with each request to minimize replay attacks. You can get as crazy as you'd like with security. Or, for those who prefer security over performance, use SSL.
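
    Here's a minimal Python sketch of API key/secret request signing with a timestamp, using HMAC-SHA256. Everything about the wire format is an assumption for illustration: the canonical string (method, path, timestamp, and body joined by newlines), the 300-second replay window, and the function names. The client would send its API key, the timestamp, and the signature with the request, in headers of your choosing. The server would look up the secret for that key and call `verify`.

```python
import hashlib
import hmac
import time

MAX_SKEW_SECONDS = 300  # assumed replay window


def canonical_string(method, path, timestamp, body):
    """Assumed canonical form: the parts the signature must cover, one per line."""
    return "\n".join([method.upper(), path, str(timestamp), body])


def sign(secret, method, path, timestamp, body=""):
    """Client side: HMAC-SHA256 of the canonical string, keyed with the client's secret."""
    message = canonical_string(method, path, timestamp, body).encode("utf-8")
    return hmac.new(secret.encode("utf-8"), message, hashlib.sha256).hexdigest()


def verify(secret, method, path, timestamp, body, signature, now=None):
    """Server side: reject stale timestamps, then recompute and compare the signature."""
    now = time.time() if now is None else now
    if abs(now - int(timestamp)) > MAX_SKEW_SECONDS:
        return False  # too old (or too far in the future): possible replay
    expected = sign(secret, method, path, timestamp, body)
    # compare_digest avoids leaking, via timing, how much of the signature matched.
    return hmac.compare_digest(expected, signature)
```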

    Now for some examples. Because I'm currently working on an event application, we'll use that for most of the examples.

    Let's get a list of events from our server:

    GET /events?page=1&per_page=10 HTTP/1.1
    Host: api.beeets.com
    Accept: application/vnd.beeets.events+json
    -----------------------------------------
    HTTP/1.1 200 OK
    Date: Tue, 01 Dec 2009 04:12:48 GMT
    Content-Length: 1430
    Content-Type: application/vnd.beeets.events+json
    {
    	"total":81,
    	"events":
    	[
    		{
    			"links":
    			[
    				{
    					"uri":"/events/6",
    					"rel":"/rel/event self edit",
    					"type":"application/vnd.beeets.event"
    				},
    				{
    					"uri":"/locations/121",
    					"rel":"/rel/location",
    					"type":"application/vnd.beeets.location"
    				}
    			],
    			"id":6,
    			"title":"Paris Hilton naked onstage",
    			...
    		},
    		...
    	]
    }

    What do we have? A list of events, with links to the resource representations of those events. Notice we also have links to another resource: the location. We can leave that for now, but let's pull up an event:

    GET /events/6 HTTP/1.1
    Host: api.beeets.com
    Accept: application/vnd.beeets.event+json
    -----------------------------------------
    HTTP/1.1 200 OK
    Date: Tue, 01 Dec 2009 04:12:48 GMT
    Content-Length: 666
    Content-Type: application/vnd.beeets.event+json
    {
    	"links":
    	[
    		{
    			"uri":"/events/6",
    			"rel":"/rel/event self edit",
    			"type":"application/vnd.beeets.event"
    		},
    		{
    			"uri":"/locations/121",
    			"rel":"/rel/location",
    			"type":"application/vnd.beeets.location"
    		}
    	],
    	"id":6,
    	"title":"Paris Hilton naked onstage",
    	"date":"2009-12-05T04:00:00Z"
    }
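
    Knowing the media type is what tells a client how to parse a response. Here is a Python sketch of that dispatch. It assumes the client keeps a hypothetical registry (`KNOWN_TYPES`) of the vendor media types it understands and that it only handles the +json suffix. The function names are made up.

```python
import json

# Hypothetical registry of the vendor media types this client understands.
KNOWN_TYPES = {"vnd.beeets.event", "vnd.beeets.events", "vnd.beeets.location"}


def parse_media_type(content_type):
    """Split e.g. 'application/vnd.beeets.event+json' into its parts."""
    mime = content_type.split(";", 1)[0].strip().lower()  # drop "; charset=..." etc.
    top, _, subtype = mime.partition("/")
    suffix = None
    if "+" in subtype:
        subtype, suffix = subtype.rsplit("+", 1)
    return top, subtype, suffix, subtype.startswith("vnd.")


def decode(content_type, body):
    """Return (media type, decoded document), refusing anything unrecognized."""
    top, subtype, suffix, _vendor = parse_media_type(content_type)
    if top != "application" or subtype not in KNOWN_TYPES:
        raise ValueError("unknown media type: " + content_type)
    if suffix != "json":
        raise ValueError("no parser for format: " + str(suffix))
    return subtype, json.loads(body)
```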

    Using the link provided in the event listing, we managed to pull up an individual event, which we know how to parse because we know the media type...but wait, what's this? OMG, someone is trying to smear Paris!! She's on at 8:30!!! NOT 8!!! Let's edit...if we do a PUT with new information, we'll be able to save Paris' good name:

    PUT /events/6 HTTP/1.1
    Host: api.beeets.com
    Accept: application/vnd.beeets.event+json
    Content-Type: application/vnd.beeets.event+json
    Authorization: Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==
    {"title":"Paris Hilton naked onstage (yuck)","date":"2009-12-05T04:30:00Z"}
    -----------------------------------------
    HTTP/1.1 200 OK
    Date: Tue, 01 Dec 2009 04:12:48 GMT
    Content-Length: 666
    Content-Type: application/vnd.beeets.event+json
    {
    	"links":
    	[
    		{
    			"uri":"/events/6",
    			"rel":"/rel/event self edit",
    			"type":"application/vnd.beeets.event"
    		},
    		{
    			"uri":"/locations/121",
    			"rel":"/rel/location",
    			"type":"application/vnd.beeets.location"
    		}
    	],
    	"id":6,
    	"title":"Paris Hilton naked onstage (yuck)",
    	"date":"2009-12-05T04:30:00Z"
    }
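
    For reference, here is how a client might build that PUT in Python with only the standard library. It's a sketch: `build_event_update` is a hypothetical helper, and the request is only constructed, not sent. I've used https because Basic credentials are just base64, not encryption. "Aladdin" and "open sesame" are the credentials encoded in the Authorization header above.

```python
import base64
import json
import urllib.request


def build_event_update(event_id, fields, user, password):
    """Build (but don't send) a PUT for one event, with HTTP Basic credentials."""
    # Basic auth is base64("user:password"): an encoding, not encryption.
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url=f"https://api.beeets.com/events/{event_id}",
        data=json.dumps(fields).encode("utf-8"),
        method="PUT",
        headers={
            "Accept": "application/vnd.beeets.event+json",
            "Content-Type": "application/vnd.beeets.event+json",
            "Authorization": "Basic " + token,
        },
    )
```

    Passing the result to `urllib.request.urlopen` would actually send it.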

    What have we learned? Given one URL (/events), we have discovered two more (/locations/[id] and /events/[id]). We've also seen the media types in the responses that allow the client to know what kind of resource it's dealing with and how to consume it.
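
    That discovery process is easy to automate. Here is a Python sketch of a breadth-first crawler that starts from one entry point and follows the `uri` values it finds under `links` keys. `FAKE_API` is a hypothetical in-memory stand-in for the server, modeled on the responses above, so the sketch runs offline.

```python
import json
from collections import deque

EVENT_LINKS = [
    {"uri": "/events/6", "rel": "/rel/event self edit", "type": "application/vnd.beeets.event"},
    {"uri": "/locations/121", "rel": "/rel/location", "type": "application/vnd.beeets.location"},
]

# Hypothetical stand-in for api.beeets.com: URI -> JSON body.
FAKE_API = {
    "/events": json.dumps({"total": 1, "events": [{"links": EVENT_LINKS, "id": 6}]}),
    "/events/6": json.dumps({"links": EVENT_LINKS, "id": 6}),
    "/locations/121": json.dumps({"links": [], "id": 121}),
}


def find_links(node):
    """Yield every link object found under a "links" key anywhere in the document."""
    if isinstance(node, dict):
        yield from node.get("links", [])
        for key, value in node.items():
            if key != "links":
                yield from find_links(value)
    elif isinstance(node, list):
        for item in node:
            yield from find_links(item)


def crawl(entry_point, fetch):
    """Breadth-first discovery of every URI reachable from one entry point."""
    seen = {entry_point}
    queue = deque([entry_point])
    while queue:
        document = json.loads(fetch(queue.popleft()))
        for link in find_links(document):
            if link["uri"] not in seen:
                seen.add(link["uri"])
                queue.append(link["uri"])
    return seen
```

    Against a real service, `fetch` would do a GET with the appropriate Accept header instead of a dict lookup.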

    Hopefully this drives home two really important points: media types and HATEOAS. Without them, it's not REST. You can't just return application/xml or application/json for every response. Sure, maybe the client can decode it, but it doesn't know what it is, and without links to other resources, it doesn't know how to find anything...unless you want to document everything and never change your service.

    Some other tips/points:

    • Give yourself a few initial entry points to your REST service. You should be able to discover all of the resources in it just by crawling. If you can't, you haven't done HATEOAS correctly. This is a lot harder than it sounds, but it pays off later on. Think of your REST service like a website with good navigation.
    • Remember to implement the OPTIONS verb for your resources. It tells the client which verbs can be used on which resources (typically via the Allow response header). With some decent routing built into your application, this should be a cakewalk.
    • As mentioned, you can use HTTP basic authentication for your requests. If the client is anything but a web browser, you won't have to serve up an ugly popup login box; you can just do all that shit transparently. If you don't want to send a cleartext password (please don't!), you can salt and hash the password on the client side and send the hash instead. Hash it again with the client's secret for added security. Crackers will be amazed at your 1337 computer hacking skillz. You can then verify the salted, hashed value on the server side. Add client-secret request signing with a timestamp for uber security.
    • Read a lot more info on REST. It seems that SO many "RESTful" services out there are half-baked and made by people who researched the topic for half a day. Some good ones to take points from are the Sun Cloud API and the Netflix API. Notice the documentation of media types and LACK of documentation on every single URL you can request. This is that loose-coupling stuff I was talking about.
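
    On the OPTIONS point: if your router already knows which verbs each URI pattern supports, the OPTIONS handler is mostly a lookup. Here is a Python sketch with a hypothetical routing table (`ROUTES`). It answers with an Allow header, or with a 404 for unknown resources.

```python
import re

# Hypothetical routing table: URI pattern -> verbs implemented for it.
ROUTES = [
    (re.compile(r"/events"), ("GET", "POST")),
    (re.compile(r"/events/\d+"), ("GET", "PUT", "DELETE")),
    (re.compile(r"/locations/\d+"), ("GET",)),
]


def allowed_methods(path):
    """Verbs allowed on `path`, or None if no route matches."""
    for pattern, methods in ROUTES:
        if pattern.fullmatch(path):
            # OPTIONS itself is always allowed on an existing resource.
            return sorted(set(methods) | {"OPTIONS"})
    return None


def handle_options(path):
    """Return (status, headers) for an OPTIONS request."""
    methods = allowed_methods(path)
    if methods is None:
        return 404, {}
    return 200, {"Allow": ", ".join(methods)}
```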

    That's it for now! I wrote this as a summary of the last week or so of research I've done...please let me know if any information is missing or incorrect and I'll make updates. Hope it was helpful!

  • 200911.03

    How to convert HTML named entities to numbered entities in PHP

    I recently (read: today) had an obnoxious problem: I'm writing some code for creating an ATOM feed, and kept getting errors about entity-escaped values. Namely, things like &rsquo;, &bull;, etc. Even written as entities, Opera and IE7 did not recognize them. I read somewhere that it was necessary to convert the named entities to numbered entities. Great. (It makes sense in hindsight: Atom is XML, and XML only predefines five named entities: &amp;, &lt;, &gt;, &quot;, and &apos;.)

    Well, PHP doesn't have a native function for this. Why, I do not know...there seem to be functions for many other things, and adding an argument to htmlentities that returns numbered entities would seem easy enough. Either way, I wrote a quick function that takes the htmlentities translation table, adds the values that are missing from it, and runs the conversion to numbered entities. Check it:

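    function htmlentities_numbered($string)
    {
    	// Use the ISO-8859-1 table: its keys are single bytes, so ord() gives the
    	// real code point (Latin-1 maps 1:1 onto Unicode 0-255). With PHP 5.4+'s
    	// UTF-8 default the keys are multi-byte and ord() would only see one byte.
    	$table	=	get_html_translation_table(HTML_ENTITIES, ENT_COMPAT, 'ISO-8859-1');
    	$trans	=	array();
    	foreach($table as $char => $ent)
    	{
    		$trans[$ent]	=	'&#'. ord($char) .';';
    	}
    	// Named entities for characters outside Latin-1 (mostly the Windows-1252
    	// "smart" punctuation), mapped to their Unicode code points.
    	$trans['&euro;']	=	'&#8364;';
    	$trans['&sbquo;']	=	'&#8218;';
    	$trans['&fnof;']	=	'&#402;';
    	$trans['&bdquo;']	=	'&#8222;';
    	$trans['&hellip;']	=	'&#8230;';
    	$trans['&dagger;']	=	'&#8224;';
    	$trans['&Dagger;']	=	'&#8225;';
    	$trans['&circ;']	=	'&#710;';
    	$trans['&permil;']	=	'&#8240;';
    	$trans['&Scaron;']	=	'&#352;';
    	$trans['&lsaquo;']	=	'&#8249;';
    	$trans['&OElig;']	=	'&#338;';
    	$trans['&lsquo;']	=	'&#8216;';
    	$trans['&rsquo;']	=	'&#8217;';
    	$trans['&ldquo;']	=	'&#8220;';
    	$trans['&rdquo;']	=	'&#8221;';
    	$trans['&bull;']	=	'&#8226;';
    	$trans['&ndash;']	=	'&#8211;';
    	$trans['&mdash;']	=	'&#8212;';
    	$trans['&tilde;']	=	'&#732;';
    	$trans['&trade;']	=	'&#8482;';
    	$trans['&scaron;']	=	'&#353;';
    	$trans['&rsaquo;']	=	'&#8250;';
    	$trans['&oelig;']	=	'&#339;';
    	$trans['&Yuml;']	=	'&#376;';
    	$string	=	strtr($string, $trans);
    	return $string;
    }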
    

    Hope it's helpful.

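    UPDATE - apparently, even the numbered entities weren't good enough: &#128; through &#159; refer to Unicode's C1 control characters (0x80 - 0x9F), not the Windows-1252 punctuation they're meant to stand for. Fair enough; I've converted them all to their real Unicode code points. All my ATOM feeds validate now (through feedvalidator.org).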
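
    For comparison, Python's standard library ships the HTML 4 entity table with Unicode code points (`html.entities.name2codepoint`), so the same conversion is short there. Here is a sketch. `entities_to_numbered` is a hypothetical name, and unknown entity names are left alone rather than guessed.

```python
import re
from html.entities import name2codepoint

_NAMED_ENTITY = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def entities_to_numbered(text):
    """Replace each known named entity (&rsquo;) with its numeric form (&#8217;)."""
    def replace(match):
        codepoint = name2codepoint.get(match.group(1))
        # Unknown names are returned unchanged.
        return match.group(0) if codepoint is None else "&#%d;" % codepoint
    return _NAMED_ENTITY.sub(replace, text)
```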

  • 200910.26

    VirtualBox: how to "clone" your VMs

    Notice the "clone" in quotes. Why? Technically, you can't actually clone a VM. We can work around that, though. Keep in mind, this guide is for VirtualBox <= 3.0.8 (later versions may have a clone button or something).

    One thing you CAN do is export to OVF and re-import, but I've found that OVF loses many settings (like video RAM, network settings, whether you use SATA or not, etc.). I prefer not to even bother with this method.

    The next thing you can do is just clone your VM's hard disk(s):

    1. Go into the ~/.VirtualBox/HardDisks/ folder. Copy and paste (Windows) or cp (Linux/Unix) db_master1.vdi to db_master2.vdi. If you try to import the copy into the Virtual Media Manager, it will piss and moan about a duplicate UUID or some shit.
    2. VBoxManage internalcommands setvdiuuid db_master2.vdi - this is the magic command that gives the copy a fresh UUID, so you can import the new HD.
    3. Create a new VM, and set db_master2.vdi as the primary drive.
    4. Configure your new VM to have the same settings. (this is a pain, but there really aren't that many settings).
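
    Steps 1 and 2 are easy to script. Here is a Python sketch. It assumes the source VM is powered off and that VBoxManage is on your PATH. `clone_disk` is a hypothetical helper, and `dry_run=True` just returns the command instead of running it. Steps 3 and 4 are still up to you.

```python
import shutil
import subprocess


def clone_disk(src_vdi, dst_vdi, dry_run=False):
    """Copy a .vdi and give the copy a new UUID; return the VBoxManage command used."""
    # Step 1: plain file copy (assumes the source VM is powered off).
    # The copy still carries the original's UUID.
    shutil.copyfile(src_vdi, dst_vdi)
    # Step 2: assign a fresh UUID so the Virtual Media Manager will accept the copy.
    cmd = ["VBoxManage", "internalcommands", "setvdiuuid", dst_vdi]
    if not dry_run:
        subprocess.run(cmd, check=True)
    return cmd
```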

    There are a few things you'll have to dick with once your VM is cloned. If you're into networking/cluster/HA crap like me, you'll probably have a static IP. This obviously needs to be changed. It's different for every distro, but it's in /etc/rc.d/rc.inet1.conf for Slackware, and /etc/network/interfaces for Debian (any other distro can go to hell).

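    Your old network interfaces, eth[0...n], will now be eth[(n+1)...(2n+1)]: e.g., if you had eth0 and eth1 before, they will now be eth2 and eth3. That's because the new VM's virtual NICs get new MAC addresses, and udev still has eth0 and eth1 reserved for the old ones. To reset this (in Slackware):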

    1. Open /etc/udev/rules.d/75-network-devices.rules in your favorite editor.
    2. Remove all the entries.
    3. Restart. (note - someone please correct me if you don't need a restart... perhaps /sbin/udevtrigger will fix this?)
    4. You will now have eth0 and eth1 again. Hoo-fucking-ray.

    The process is the same for Debian, but the rules file will most likely have a different name (typically 70-persistent-net.rules).
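
    If you'd rather not hand-edit, here's a Python sketch of step 2. It backs the rules file up to `.bak`, keeps comments and blank lines, and drops every actual rule so udev can regenerate them on the next boot. `clear_udev_net_rules` is a hypothetical helper. Run it as root against the rules file for your distro.

```python
import shutil


def clear_udev_net_rules(path):
    """Drop every rule line (keeping comments/blank lines); return how many were removed."""
    shutil.copy2(path, path + ".bak")  # keep a backup of the original file
    with open(path) as f:
        lines = f.readlines()
    kept = [line for line in lines if not line.strip() or line.lstrip().startswith("#")]
    with open(path, "w") as f:
        f.writelines(kept)
    return len(lines) - len(kept)
```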

    Good luck.
