• 201111.16

    Rekon - a simple Riak GUI

    I was looking around for Riak information when I stumbled (not via stumble, but actually doing my own blundering) across a blog post that mentioned a Riak GUI. I checked it out. Install is simple, and oddly enough the tool uses only javascript and Riak (no web server needed). I have to say I'm thoroughly impressed by it. Currently the tool doesn't do a ton besides listing buckets, keys, and stats, but you can edit your data inline and delete objects. It also supports Luwak, which I have no first-hand experience with and was unable to try out.

    One thing I thought was missing was a way to run a map-reduce on the cluster via text input boxes for the functions. It would make writing and testing them a bit simpler, I think, but then again it would be easy enough to write this myself in PHP or even JS, so maybe I'll add it in. Search integration would be nice too, although going to 127.0.0.1:8098/solr/[bucket]/select?... is pretty stupid easy.
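
    If I ever do add it, the box would mostly just be wrapping a POST to Riak's /mapred endpoint anyway. Here's a rough sketch of that call in PHP (the bucket name and map function are made up, and there's no error handling):

    // run a simple javascript map job against a bucket over Riak's HTTP interface
    $job = json_encode(array(
        'inputs' => 'users', // hypothetical bucket
        'query'  => array(
            array('map' => array(
                'language' => 'javascript',
                'source'   => 'function(v) { return [v.key]; }',
            )),
        ),
    ));

    $ch = curl_init('http://127.0.0.1:8098/mapred');
    curl_setopt($ch, CURLOPT_POST, true);
    curl_setopt($ch, CURLOPT_POSTFIELDS, $job);
    curl_setopt($ch, CURLOPT_HTTPHEADER, array('Content-Type: application/json'));
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    echo curl_exec($ch);
    curl_close($ch);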

    All in all, a great tool.

    Comments
  • 201109.14

    Mono, C# for a large backend system

    I just did a writeup about MongoDB's performance in the last big app we did. Now it's time to rip Mono a new one.

    Mono has been great. It's .NET for linux. We originally implemented it because it's noted for being a fast, robust compiled language. I didn't know C# before starting the project, but afterwards I feel I have a fairly good grasp on it (10 months of using it constantly will do that). I have to say I like it. Coming from a C++ background, I find C# very similar, except the biggest draw is that you don't separate out your definitions from your code. Your code is your definition. No header files. I understand separate headers are a requirement if you're going to link C/C++ code against other C/C++ code, but I hate doing it.

    Back to the point, Mono is great in many ways. It is fast, compiles from source fairly easily (although libgdiplus is another story, if you want to do image processing), and is easy to program in.

    We built out a large queuing system with C#. You enter jobs into a queue collection in MongoDB, and they get processed based on priority/time entered (more or less) by C#. Jobs can be anything from gathering information from third parties to generating images and layering them all together (I actually learned first-hand how some of those Photoshop filters work). The P/Invoke system allowed us to integrate with third-party libraries where the language failed us (simple web requests with timeouts or loading custom fonts, for instance).

    As with any project, it started off great. Small is good. Once we started processing large numbers of items in parallel, we'd get horrible crashes with native stacktraces. At first glance, it looked like problems with the Boehm garbage collector. We recompiled Mono with --enable-big-arrays and --with-large-heap. No luck. We contacted the Mono guys and, probably in light of all the political shenanigans happening with Mono at the moment, they didn't really have a good response for us. Any time the memory footprint got greater than 3.5G, it would crash. It didn't happen immediately though; it seemed random. Keep in mind Mono and the machines running it were 64-bit...4G is not the limit!

    Our solution was twofold:

    • Put crash-prone code into separate binaries and call them via shell (see the sketch after this list). If the process crashes, oh well, try again. The entire queue doesn't crash with it, though. This is especially handy with the image libraries, which seem to have really nasty crashes every once in a while (not related to the garbage collection).
    • Make sure Monit is watching at all times.
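
    The first bullet is C# in our case, but the pattern itself is language-agnostic. Here's the shape of it as a quick PHP sketch (the binary path and retry count are made up):

    // run a crash-prone worker as its own process; if it dies, the queue
    // survives and we just retry a few times
    function run_isolated($cmd, $max_attempts = 3)
    {
        for ($attempt = 1; $attempt <= $max_attempts; $attempt++) {
            $output = array();
            exec($cmd, $output, $exit_code);
            if ($exit_code === 0) {
                return $output; // worker finished cleanly
            }
            // native crash, GC blowup, whatever...the parent never feels it
        }
        return false; // give up and leave the job for a later pass
    }

    run_isolated('/usr/local/bin/generate-image --job-id 42');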

    We also gave the new sgen GC a try, but it was much too slow to even compare to the Boehm. It's supposed to be faster, but pitting the two against each other in a highly concurrent setting crowned Boehm the clear winner.

    All in all, I like C# the language and Mono seemed very well put together at a small to medium scale. The garbage collector shits out at a high memory/concurrency level. I wouldn't put Mono in a server again until the GC stuff gets fixed, which seems low priority from my dealings with the devs. Still better than Java though.

    Comments
  • 201109.12

    MongoDB for a large queuing system

    Let me set the background by saying that I currently (until the end of the week anyway) work for a large tech company. We recently launched a reader app for iPad. On the backend we have a thin layer of PHP, and behind that a lot of processing via C# with Mono. I, along with my brother Jeff, wrote most of the backend (PHP and C#). The C# side is mainly a queuing system driven off of MongoDB.

    Our queuing system is different from others in that it supports dependencies. For instance, a job can't finish until its four children have completed. This allows us to create jobs that are actually trees of items all processing in parallel.
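
    That's not our actual schema, but the idea is easy to sketch with the old PECL Mongo driver: a parent tracks how many children it's still waiting on, and workers only ever grab jobs that aren't waiting on anything.

    $mongo = new Mongo();
    $jobs  = $mongo->selectDB('queue')->selectCollection('jobs');

    // a parent job that can't run until its four children are done
    $jobs->insert(array('_id' => 'parent-1', 'status' => 'pending', 'waiting_on' => 4,
                        'priority' => 1, 'created' => new MongoDate()));
    for ($i = 1; $i <= 4; $i++) {
        $jobs->insert(array('_id' => "child-$i", 'status' => 'pending', 'waiting_on' => 0,
                            'parent' => 'parent-1', 'priority' => 1, 'created' => new MongoDate()));
    }

    // when a child finishes, atomically knock one off the parent's counter
    $jobs->update(array('_id' => 'parent-1'), array('$inc' => array('waiting_on' => -1)));

    // workers only look at runnable jobs
    $next = $jobs->find(array('status' => 'pending', 'waiting_on' => 0))
                 ->sort(array('priority' => -1, 'created' => 1))
                 ->limit(1);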

    On a small scale, things went fairly well. We built the entire system out, and tested and built onto it over the period of a few months. Then came time for production testing. The nice thing about this app was that most of it could be tested via fake users and batch processing. We loaded up a few hundred thousand fake users and went to town. What did we find?

    Without a doubt, MongoDB was the biggest bottleneck. What we really needed was a ton of write throughput. What did we do? Shard, of course. The problem was that we needed even distribution on insert...which would give us near-perfect balance for insert/update throughput. From what we found, there's only one way to do this: give each queue item a randomly assigned "bucket" and shard based on that bucket value. In other words, do your own sharding manually, for the most part.
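
    Roughly, the do-it-yourself version looks like this: every item gets a random bucket at insert time and the collection is sharded on that bucket, so writes spray evenly across the shards (64 buckets here is an arbitrary number, not what we actually used):

    $item = array(
        'bucket'   => mt_rand(0, 63), // the only thing the shard key ever looks at
        'status'   => 'pending',
        'priority' => 1,
        'created'  => new MongoDate(),
        'payload'  => array('type' => 'fetch_feed', 'url' => 'http://example.com/feed'),
    );

    $mongo = new Mongo();
    $mongo->selectDB('queue')->selectCollection('jobs')->insert($item);

    // the shard key itself gets set up once from the mongo shell, something like:
    //   db.adminCommand({ shardCollection: "queue.jobs", key: { bucket: 1 } })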

    This was pretty disappointing. One of the main reasons for going with Mongo is that it's fast and scales easily. It really wasn't as painless as everyone led us to believe. If I could do it all over again, I'd say screw dependencies and put everything into Redis; as it was, the dependencies required more advanced queries than any key-value store could handle. I'm also convinced a single MySQL instance could have easily handled what four MongoDB shards could barely keep up with...but at this point, that's just speculation.

    So there's my advice: don't use MongoDB for evenly-distributed high-write applications. One of the biggest problems is that there is a global write lock on the database. Yes, the database...not the record, not the collection. You cannot write to MongoDB while another write is happening anywhere. Bad news bears.

    On a more positive note, for everything BUT the queuing system (which we did get working GREAT after throwing enough servers at it, by the way) MongoDB has worked flawlessly. The schemaless design has cut development time in half AT LEAST, and replica sets really do work insanely well. After all's said and done, I would use MongoDB again, but for read-mostly data. Anything that's high-write, I'd go Redis (w/client key-hash sharding, like most memcached clients) or Riak (which I have zero experience in but sounds very promising).
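
    For the curious, client-side key hashing is dead simple; something along these lines (server list made up) is the whole trick most memcached clients use:

    $servers = array('10.0.0.1:6379', '10.0.0.2:6379', '10.0.0.3:6379');

    // the key alone decides which server gets the read/write, so every client
    // agrees on where a key lives without any coordination
    function pick_server($key, $servers)
    {
        $hash = crc32($key) & 0x7fffffff; // keep it positive on 32-bit builds
        return $servers[$hash % count($servers)];
    }

    echo pick_server('queue:job:12345', $servers); // always the same node for this key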

    TL;DR: MongoDB is awesome. I recommend it for most usages. We happened to pick one of the few things it's not good at and ended up wasting a lot of time trying to patch it together. This could have been avoided if we had picked something built for high write throughput, or dropped our application's "queue dependency" requirements early on. I would like it if MongoDB advertised the global write lock a bit more prominently, because I felt gypped when one of their devs mentioned it in passing months after we'd started. I do have a few other projects in the pipeline and plan on using MongoDB for them.

    Comments
  • 201104.15

    PHP finally has anonymous functions??

    Wow, I can't believe I missed this...nobody seems to be talking about it at all. Ever since PHP 5.3, I can finally do non-generic callbacks.

    UPDATE: Check out this description of PHP lambdas (much better than what I've done in the following).

    function do_something($value)
    {
        // used >= 2 times, but only in this function, so no need for a global
        $local_function = function($value) { ... };

        // use our wonderful anonymous function
        $result = $local_function($value);
        ...
        // and again
        $result = $local_function($result);
        return $result;
    }
    

    There's also some other great stuff you can do:

    $favorite_songs = array(
        array('name' => 'hit me baby one more time', 'artist' => 'britney'),
        array('name' => 'genie in a bottle', 'artist' => 'xtina'),
        array('name' => 'last resort', 'artist' => 'papa roach')
    );
    $song_names = array_map(function($item) { return $item['name']; }, $favorite_songs);
    

    GnArLy bra. If PHP was 20 miles behind Lisp, it just caught up by about 30 feet. This has wonderful implications because there are a lot of functions that take a callback, and the only way to use them was to define a global function and send in an array() callback. Terrible. Inexcusable. Vomit-inducing.

    Not only can you now use anonymous functions for things like array_map() and preg_replace_callback(), you can define your own functions that take functions as arguments:

    function do_something_binary($fn_success, $fn_failed)
    {
        $success = ...
        if($success)
        {
            return $fn_success();
        }
        return $fn_failed();
    }

    do_something_binary(
        function() { echo "I successfully fucked a goat!"; },
        function() { echo "The goat got away..."; }
    );
    

    Sure, you could just return $success and call whichever function you need after that, but this is just a simple example. It can be very useful to encapsulate code and send it somewhere; this is just a demonstration of the beautiful new world that just opened up for PHP.
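
    And since I mentioned preg_replace_callback(), here's the kind of thing that used to require a named global function:

    $text    = 'I have 3 apples and 12 oranges.';
    $doubled = preg_replace_callback(
        '/\d+/',
        function($matches) { return $matches[0] * 2; },
        $text
    );
    echo $doubled; // I have 6 apples and 24 oranges.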

    So drop your crap shared host (unless it has >= 5.3.0), get a VPS, and start using this wonderful new feature.

    Comments
  • 201011.21

    Vim: Cursor at beginning of tab in normal mode

    One thing that annoys me in Vim is that in normal mode, the cursor defaults to being at the end of a tab character. When I hit "Home" I expect the cursor to go all the way to the left, but instead it hovers 4 spaces to the right of where I expect it to. I stumbled across the answer after reading a mailing list thread for vim.

    set list lcs=tab:\ \ 
    " Note the extra space after the second \

    You can put this in your .vimrc to automatically set this behavior. Very useful.

    Comments
  • 201011.16

    Vim: I can't believe I ignored you all these years

    All these years, since the day I first turned on a linux distribution, I've ignored vi/vim. Sure, there are swarms of geeks covering you with saliva as they spew fact after fact about how superior vim is to everything else, but to me it's always been "that editor that is on every system that I eventually replace with pico anyway."

    Not anymore. Starting a few years back, I've done all of my development in Eclipse. It has wonderful plugins for PHP, C++, Javascript, etc. The past week or so I've been weaning myself off of it and diving into vim. What actually got me started is I bought a Droid 2 off ebay for various hacking projects (I'm planning on reviewing it soon). Well, it was really easy to get vim working on it (sorry, lost the link already). I thought, well, shit, I've got vim, what the hell can I do with it? First things first, let's get a plugin for syntax coloring/indentation for a few of my favorite languages. What?! It has all of them already.

    Ok, now I'm interested. I installed vim for Windows (gvim), which was followed by a slow-but-steady growing period of "well, how do I do this" and "HA...I bet vim can't do THI...oh, it can." There are "marks" for saving your place in code, you can open the same file in multiple views (aka "windows"), you can bind just about any key combination to run any command or set of commands, etc. I even discovered tonight there's a "windows" mode for vim that mimics how any normal editor works. I hate to admit it, but I'll be using that a lot. One feature that blew my mind is the undo tree. Not stack, tree. Make a change, undo, make a new change, and the first change you did before your undo is still accessible (:undolist)!

    The nice thing about vim is that it saves none of its settings. Every change you make to it while inside the editor is lost after a restart. This sounds aggravating, but it actually makes playing with the editor really fun and easy. If I open 30 windows and don't know how to close them, just restart the editor. There were literally hundreds of trillions of instances where I was like "oh, shit" *restart*.

    Once you have a good idea of what you want your environment to be like, you put all your startup commands in .vimrc (_vimrc on Windows) and vim runs it before it loads. Your settings file uses the same syntax as the commands you run inline in the editor, which is awesome and makes it easy to remember how to actually use vim.

    So far I'm extremely impressed. The makers of vim have literally thought of everything you could possibly want to do when coding. And if they haven't thought of it, someone else has and has written a plugin you can drop into your plugins directory and it "just works." Speaking of plugins, vim.org's plugin list seems never-ending. I was half expecting to see most plugins have a final mod date of 2002 or something, but a good portion have new versions released within the past two weeks. It seems the ones that are from 2002 never get updated because they're mostly perfect. Excellent.

    I do miss a few things though. First off, the project file list most editors have on the left side. I installed NERDTree to alleviate that pain, but honestly it's not the same as having my right-click menus and pretty icons. I'm slowly getting used to it though. The nice thing about a text-only file tree is that in those instances where you only have shell access and need to do some coding, there isn't a dependency on a GUI.

    Tabs are another thing I miss. Gvim has tabs, but they aren't one tab == one file (aka "buffer") like most editors. You can hack it to do this, sort of, but it's really janky. Instead I'm using MiniBufExplorer, which takes away some of the pain. I actually hacked it a bit because I didn't like the way it displays the tabs, which gave me a chance to look at some real vim script. It's mostly readable to someone who's never touched it before.

    That about does it for my rant. Vim is fast, free, customizable, extendable, scriptable, portable, wonderful, etc...and I've barely scratched the surface.

    Comments
  • 201011.06

    Alleged sexual assault at a tech conference

    Let me preface this by saying I know neither of the two people involved in this situation nor have any connection to them other than the fact that I use both Google and Twitter.

    A Google tech writer recently accused a Twitter engineer of sexual assault on her blog, and given the responses shot at both sides (Noirin Shirley, the accuser, and Florian Leibert, the accused) I thought I'd inject my personal thoughts on both the actual report given by Noirin and the responses to the incident.

    First off, it's a big deal to make an accusation like this. Careers hang in the balance, and blah blah blah, we've all heard this already. That said, a lot of women are sexually assaulted and never mention it. A lot tell a few people and it never goes anywhere. A lot try to get help but it never comes.

    I think it's not only amazing, but brave that Noirin had the guts to stand up to her assaulter and accuse him in public. It takes brass balls to do this. It also takes brass balls to do this knowing full well the responses you're going to get because of it. I'm not one to shy away from taking a stand, so I will say I think she's awesome. I'm sick of women getting pushed around and there being no consequences for the men doing it.

    I also know that women sometimes make false accusations, but in my experience the ones who do have a history of doing so; they don't suddenly start later in life.

    Now, at least one publication is saying that although it's great to be public about this matter, it's not ok to be public about the assailant's name. I have to disagree. So many assaults go unresolved because it's hard to prove unless you have a police officer right there watching, or at least 10 witnesses. Something like this wouldn't hold up in court. It's important that the person who did it be publicly recognized for his actions, because otherwise there very well may be no consequences, ever.

    A lot of people are saying that she should say absolutely nothing until the police investigate and the courts make a decision. I have to wonder if they are batshit insane. First off, the police generally have "more important" things to worry about than "hey sum guy jus touched my privatz," unfortunately. And without any material evidence, it will never hold up in court. What I'm getting at is that even though I love our justice system here in good old USA, there are many things that will fall through the cracks. Does Noirin really need the police or court system to validate what she experienced that night? That's fucking insane! She knows what happened better than the police or courts, and has every right to talk about it. Plus, she's opening herself up to a world of legal trouble by doing this, which is just one more reason she's brave for doing it (and one more incentive to NOT do it falsely).

    Let me put it this way: If somebody assaults you, you have the right to fucking let the world know who did it and what happened! Just because it won't hold up in court (and believe me, it won't) doesn't mean it didn't happen, and doesn't mean the assailant shouldn't suffer the social consequences. If a rape happens in the woods and nobody is there to witness it, did it happen? The courts, rightfully so, say "No." But it still happened, and the aggressor needs to pay for it in some way.

    If she lied about it, then that's another issue entirely. If it did happen, as she said it did, then good for her for letting the world know and making the world that much safer for women.

    Either way, there are some very good counter-arguments and discussion on the reddit comments page for the post, which I spent a good amount of time reading before making this post.

    Comments
  • 201008.17

    PHP's preg functions don't release memory??

    We were writing some parsing code for a client today. It takes a long string (html) and parses it out into array items. It loops over the string recursively, running a few preg_replaces on it each pass. We got "out of memory" errors when running it. After putting in some general stats, we found that memory usage was climbing 400k after each block of preg_replaces, which added up on each loop (there were around 600 loops). This memory just grew and grew, even though the recursion at most got 6 levels deep. It was never being released.
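
    The "general stats" were nothing fancy; roughly this kind of instrumentation (the HTML and regex here are placeholders, not the client's code) is enough to see how much memory piles up as the passes go by:

    $long_string = str_repeat('<div class="item">some html</div>', 5000);

    $start = memory_get_usage();
    for ($i = 1; $i <= 600; $i++) {
        $val = preg_replace('/^<div[^>]*>/', '', $long_string);
        if ($i % 100 === 0) {
            printf("pass %d: %dK over baseline\n", $i, (memory_get_usage() - $start) / 1024);
        }
    }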

    I did some reading and found that the preg* functions cache up to 4096 regex results in a request. This is the problem...a pretty stupid one too. It would be nice if they made this a configurable option, or at least let you turn it off when, say, you are running a regex on a different string every time (why the hell would I run the same regex on the same string twice...isn't that what variables are for?). Unless I'm misunderstanding and PHP caches the compiled regex (but not its values)...but either way, memory was climbing based on the length of the string.

    Since the regex was only looking at the beginning of the string and disregarding the rest (thank god), the fix was easy (although a bit of a hack):

    $val = preg_replace('/.../', '', $long_string);

    Becomes:

    $short_string = substr($long_string, 0, 128);
    $val = preg_replace('/.../', '', $short_string);

    PHP guys: how about an option to make preg* NOT have memory leaks =).

    Comments
  • 201006.07

    Monit, how did I ever live without you?

    In my latest frenzy, which was focused on HA more than performance, I installed some new servers and some new services on them, and the general complexity of the entire setup for beeets.com doubled. I was trying to remember a utility that I saw a while back that would restart services if they failed. I checked my delicious account, praying that I had thought of my future self when I originally saw it. Luckily, I had saved it under my "linux" tag. Thanks, Andrew from the past.

    The tool is called monit, and I'm surprised I ever lived without it. Not only does it monitor your services and keep them running, it can also restart them if they fail, use too much memory/CPU, stop responding on a certain port, etc. Not only that, but it will email you every time something happens.

    While perusing monit's site, I saw M/Monit, which essentially allows you to monitor monit over the web. The only thing I scratched my head about was that M/Monit uses port 8080 (which is fine), but NginX already uses port 8080 and I wasn't about to change that, so I opened conf/server.xml, looked for 8080, and replaced it with 8082 (monit itself runs on 8081 =)). Then I reconfigured monit to communicate with M/Monit and vice versa, and now I have a kickass process monitor that alerts me when things go wrong, and also sends updates to a service that allows me to monitor the monitor.

    I can't look at things like queries/sec as I can with Cacti (which is awesome but a little clunky), but I can see which important services are running on each of my servers, and even restart them if I need to straight from M/Monit. The free download license allows you to use M/Monit on one server, which is all I need anyway.

    Great job monit team, you have gone above and beyond.

    Comments
  • 201006.07

    Strange problems with hosts resolving in PHP (and some other linux weirdness)

    This weekend I went on a frenzy. I turned beeets.com from a single-VPS operation into 4 VPSs: 2 web (haproxy, nginx, php-fpm, sphinx, memcached, ndb_mgmd) and 2 database servers (ndbmtd). There's still some work to do, but the entire setup seems to be functioning well.

    I had a few problems though. In PHP (just PHP, and nothing else) hosts were not resolving. The linux OS was resolving hosts just fine, but PHP couldn't. It was frustrating. Also, I was unable to sudo. I kept checking permissions on all my files in /etc, rebooting, checking again, etc.

    The fix

    Then I looked again. /etc itself was owned by andrew:users. Huh? I changed the ownership back to root:root, chmod 755. Everything works. Now some background.

    A while back, I wrote some software (bash + php) that makes it insanely easy to install software to several servers at once, and sync configurations for different sets of servers. It's called "ssync." It's not ready for release yet, but I can say that without it, I'd have maybe 10% of the work done that I've finished already. Ssync is a command-line utility that lets you set up servers (host, internal ip, external ip) and create groups. Each group has a set of install scripts and configuration files that can be synced to /etc. The configuration files are PHP scriptable, so instead of, say, adding all my hosts by hand to the /etc/hosts file, I can just loop over all servers in the group and add them automatically. Same with my www group: I can add a server to the "www" group in ssync, and all of a sudden the HAProxy config knows about it.
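
    Since ssync isn't released, here's only a guess at what one of those PHP-scriptable templates could look like: loop over a (made up) group definition and spit out /etc/hosts lines instead of maintaining them by hand.

    // hypothetical group definition that ssync would hand to the template
    $servers = array(
        array('host' => 'www1', 'internal_ip' => '10.0.0.11'),
        array('host' => 'www2', 'internal_ip' => '10.0.0.12'),
        array('host' => 'db1',  'internal_ip' => '10.0.0.21'),
    );

    $hosts = "127.0.0.1\tlocalhost\n";
    foreach ($servers as $server) {
        $hosts .= "{$server['internal_ip']}\t{$server['host']}\n";
    }
    echo $hosts; // ssync would sync the rendered output out to /etc/hosts on each box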

    Here's the problem. When ssync was sending configuration files to /etc on remote servers, it was also setting permissions on those files (and folders) by default. This was because I was using -vaz, which attempts to preserve ownership, group, and permissions from the source (not good). I added some new params (so now it's "-vaz --no-p --no-g --no-o"). Completely fixed it.

    Comments
  • 201005.10

    HAProxy's keep-alive functionality (and how it can speed up your site)

    A while back I wrote a post about using NginX as a reverse-proxy cache for PHP (or whatever your backend is) and mentioned how I was using HAProxy to load balance. The main author of HAProxy wrote a comment about keep-alive support and how it would make things faster.

    At the time, I thought "What's the point of keep-alive for the front-end? By the time the user navigates to the next page of your site, the timeout has expired, meaning a connection was left open for nothing." This assumed that a user downloads the HTML for a site and doesn't download anything else until their next page request. I forgot about how some websites actually have things other than HTML, namely images, CSS, javascript, etc.

    Well in a recent "omg I want everything 2x faster" frenzy, I decided for once to focus on the front-end. On beeets, we're already using S3 with CloudFront (a CDN), aggressive HTTP caching, etc. I decided to try the latest HAProxy (1.4.4) with keep-alive.

    I got it, compiled it, reconfigured:

    defaults
    	...
    	option httpclose

    became:

    defaults
    	...
    	timeout client  5000
    	option http-server-close

    Easy enough...that tells HAProxy to close the server-side connection, but leave the client connection open for 5 seconds.

    Well, a quick test and site load times were down by a little less than half...from about 1.1s client load time (empty cache) to 0.6s. An almost instant benefit. How does this work?

    Normally, your browser hits the site. It requests /page.html, and the server says "here u go, lol" and closes the connection. Your browser reads page.html and says "hay wait, I need site.css too." It opens a new connection and the web server hands the browser site.css and closes the connection. The browser then says "darn, I need omfg.js." It opens another connection, and the server rolls its eyes, sighs, and hands it omfg.js.

    That's three connections your browser made to the server, each with its own latency. Connection latency is something that, no matter how hard you try, you cannot control...and there is a certain amount of latency for each of the connections your browser opens. Let's say you have a connection latency of 200ms (not uncommon)...that's 600ms you just waited to load a very minimal HTML page.

    There is hope though...instead of trying to lower latency, you can open fewer connections. This is where keep-alive comes in.

    With the new version of HAProxy, your browser says "hai, give me /page.html, but keep the connection open plz!" The web server hands over page.html and holds the connection open. The browser reads all the files it needs from page.html (site.css and omfg.js) and requests them over the connection that's already open. The server keeps this connection open until the client closes it or until the timeout is reached (5 seconds, using the above config). In this case, the latency cost is a little over 200ms, and the total time to load the page is 200ms plus the download time of the files (usually less than the latency).

    So with keep-alive, you just turned a 650ms page-load time into a 250ms page-load time... a much larger margin than any sort of back-end tweaking you can do. Keep in mind most servers already support keep-alive...but I'm compelled to write about it because I use HAProxy and it's now fully implemented.

    Also keep in mind that the above scenario isn't necessarily correct. Most browsers will open up to 6 concurrent connections to a single domain when loading a page, but you also have to factor in the fact that the browser blocks downloads when it encounters a javascript include, and then attempts to download and run the javascript before continuing the page load.

    So although your connection latency with multiple requests goes down with keep-alive, you won't get a 300% speed boost, more likely a 100% speed boost depending on how many scripts are loading in your page along with any other elements...100% is a LOT though.

    So for most of us webmasters, keep-alive is a wonderful thing (assuming it has sane limits and timeouts). It can really save a lot of page load time on the front-end, which is where users spend the most of their time waiting. But if you happen to have a website that's only HTML, keep-alive won't do you much good =).

    Comments
  • 201005.03

    Using gzip_static in nginx to cache gzip files

    Recently I've been working on speeding up the homepage of beeets.com. Most speed tests say it takes between 4-6 seconds. Obviously, all of them are somehow fatally flawed. I digress, though.

    Everyone (who's anyone) knows that gzipping your content is a great way to reduce download time for your users. It can cut the size of html, css, and javascript by about 60-90%. Everyone also knows that gzipping can be very cpu intensive. Not anymore.

    I just installed nginx's Gzip Static Module (compile nginx with --with-http_gzip_static_module) on beeets.com. It allows you to pre-cache your gzip files. What?

    Let's say you have the file /css/beeets.css. When a request for beeets.css comes through, the static gzip module will look for /css/beeets.css.gz. If it finds it, it will serve that file as gzipped content. This allows you to gzip your static files using the highest compression ratio (gzip -9) when deploying your site. Nginx then has absolutely no work to do besides serving the static gzip file (it's very good at serving static content).

    Wherever you have a gzip section in your nginx config, you can do:

    gzip_static on;

    That's it. Note that you will have to create the .gz versions of the files yourself, and it's mentioned in the docs that it's better if the original and the .gz files have the same timestamp, so it may be a good idea to "touch" the files after both are created. It's also a good idea to turn the gzip compression down (gzip_comp_level 1..3). This will minimally compress dynamic content without putting too much strain on the server.
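
    Creating the .gz files is easy to bolt onto a deploy script. Here's a rough PHP version (paths are just examples) that compresses at level 9 and keeps the timestamps in sync:

    foreach (glob('/srv/www/static/*.{css,js}', GLOB_BRACE) as $file) {
        $gz = $file . '.gz';
        file_put_contents($gz, gzencode(file_get_contents($file), 9));
        touch($gz, filemtime($file)); // same timestamp as the original, per the docs
    }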

    This is a great way to get the best of both worlds: gzipping (faster downloads) without the extra load on the server. Once again, nginx pulls through as the best thing since multi-cellular life. Keep in mind that this only works on static content (css, javascript, etc etc). Dynamic pages can and should be gzipped, but with a lower compression ratio to keep load off the server.

    Comments
  • 201005.03

    Javascript minification with JSMin and gzip

    Here's a good tip I just found. Note that this may not be for all cases. In fact, I may have stumbled on a freak coincidence. Here's the story:

    I hate java. I hate having java on a server, but hate it even more if it's only for running one small script. For as long as I can remember, beeets.com has used the YUI compressor to shrink its javascript before deployment. Well, YUI won't run without java, so for the longest time the JRE has sat installed, collecting dust, only to be brushed off and used once in a while during a deployment. This seems like a huge waste of space and resources.

    Well, first I tried gcj. Compiling gcj was fairly straightforward, thankfully. After installing, I realized I needed to know a lot more about java in order to compile the YUI compressor with it. I needed knowledge I had no long-term need for, nor the will to learn in the first place. Although I regard myself as extremely tenacious, I gave up.

    I decided to try JSMin. This nifty program is simple, elegant, and it works well. It also has a much worse compression ratio than YUI. However, I trust any site that hosts C code and has no real layout whatsoever. Knowing the compression wasn't as good, I still wanted to see what kind of difference gzipping the files would have.

    I recorded the size of the GZipped JS files that used YUI. I then reconfigured the deployment script to use JSMin instead of YUI. I looked at the JS files with JSMin compression:

    YUI:
    mootools.js     88.7K (29.6K gz)
    beeets.js       61.5K (20.5K gz)
    JSMin:
    mootools.js    106.1K (29.5K gz)
    beeets.js       71.0K (17.7K gz)

    Huh? GZip is actually more effective on the JSMin output than on the YUI output! The end result is LESS download time for users.
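
    Reproducing the comparison is a one-off script away (the filenames are examples; point it at your own minified builds):

    foreach (array('mootools.yui.js', 'mootools.jsmin.js') as $file) {
        $raw = file_get_contents($file);
        printf("%-18s %6.1fK (%5.1fK gz)\n",
            $file, strlen($raw) / 1024, strlen(gzencode($raw, 9)) / 1024);
    }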

    I don't know if this is a special case, but I was able to derive a somewhat complex formula:

    YUI > JSMin
    YUI + GZip < JSMin + GZip

    Who would have thought. See you in hell, java.

    Comments
  • 201003.30

    Marijuana reform (or how I learned to stop worrying and love the revenue from taxing it)

    Having been a marijuana user, sometimes heavy and sometimes casual, for almost 10 years, and knowing many others who also are/were, I think I have a pretty good understanding of its effects, both positive and negative. I'd like to dispel some myths.

    First off, you always hear that marijuana is a gateway drug. I respond: being a teenager is a gateway drug. The emotions, the hormones, the internal and external influences pulling you in a thousand directions every second of your life...it's a wonder most of us make it through. That alone is enough to make most people want to try just about every drug out there. Another reason marijuana is a "gateway drug" is that kids are always taught how terrible and addictive it is. So what's the next thing they do? They try it. After finding that they were lied to and misled, they learn to mistrust those telling them that "all drugs are bad." So now heroin or cocaine doesn't seem so bad either, even though they have much more far-reaching effects than marijuana. The point is, the only real cause of marijuana being a "gateway drug" is the fact that kids are constantly being told lies about it. The fix? Honesty.

    Secondly, marijuana in moderation has no permanent effects. You can smoke till yer stupid for a few months, but take a week off and you bounce back completely. Its tar is more harmful than that of tobacco, but who aside from the most extreme users smokes a cigarette-pack's worth of joints every day? The only way to get cancer from marijuana is to pump the smoke into a ventilator and breathe it in 24/7. With cutting-edge advances in technology, there are now vaporizers, which remove the tar from smoking. It's safer than ever.

    Thirdly, smoking marijuana is a personal choice. Here we are, in the "land of the free," restricted from doing things that even if they do have some negative effect, only affect us personally. It's not illegal to saw off my arm. It's not illegal to use a pogo stick next to the grand canyon. Why can't I take a puff on a joint? Who am I harming?

    Now to my main point. We're in an economic crisis. We're spending a lot of money on battling imports of drugs (including marijuana), and also spending a lot of money keeping potheads in prison (thanks, prison lobby). That's two very large drains on our economy to:

    1. Fund a losing battle. I can go anywhere in almost any town in the US and within an hour, even not knowing anyone, get an eighth of weed. Good job drug war, money well spent. It's good to know that the taxes I just filed will go to "stopping" me from buying marijuana.
    2. Keep pot offenders in prison. Yeah, these people are really dangerous. They are on the edge of the law...sitting on the couch eating chips and giggling. The more money I can spend to keep them locked up, the better. Oh sure, most of them are dealers, but our culture is founded on the principles of capitalism: if a market exists, fill the void and capitalize. Makes sense to me. Nobody would sell pot if nobody wanted to smoke it. Yes it's illegal, but once again let's ask ourselves why instead of pointing to a law.

    Now imagine a world where the government grew, cultivated, sold & taxed pot. That's a lot of money we'd make back. Hell even if they raised the price on it, it'd be worth it to just be able to walk into a store and buy it. They could use the revenue from pot to plug the holes caused by battling all the other drugs.

    Maybe it's time to really start thinking about this. If you are against legalization of marijuana, ask yourself why. Anyone who wants to smoke it already does. Show me a person who wants to smoke pot but doesn't because it's illegal, and I'll show you the portal that takes you out of Neverland and back to reality.

    Conservative America: you want a smaller government with fewer services and less control over the population in general. Why not start with drug reform?

    Comments
  • 201003.21

    How to get the TOP2004 programmer running under Windows 7 64-bit

    UPDATE - Apparently closing the VM, unplugging the programmer, unselecting the programmer from the USB device menu, or pausing the VM after the programmer has been loaded by the VM makes Windows 7 bluescreen. So far, I have not found a way around this, and as such the TOP2004 is effectively useless again. At least it's able to program chips and stuff, but once loaded, the VM has to stay open and running. Pretty lame. I'll try to find a fix and update (BTW I'm using the latest VirtualBox as of this writing). Any ideas?


    I love electronics. Building basic circuits, programming microcontrollers, making malicious self-replicating robots programmed to hate humans, and even so much as wiring up complete motherboards with old processors and LCDs. I had to find a USB flash/eeprom programmer that fit my hardcore lifestyle. On ebay a few years back, I bought the TOP2004. This wondrous piece of Chinese equipment is cheap, cheap, and USB. I needed USB because in the process of making my own flash programmer a while back, I destroyed half the pins on my parallel port. The programmer worked great, but only worked for one chip. I needed something a bit more versatile. The top2004 isn't a bad piece of equipment. The manual was translated poorly from Chinese, as is the software that comes with it.

    Well, for the longest time, I was a Windows XP guy. Nowadays it's all about Windows 7. Don't get me wrong, I'm Slackware through and through, but I need my gaming. So I installed 64-bit Windows and love it, but my programmer no longer works.

    Requirements: a 64-bit OS that doesn't let you use 32-bit drivers (namely Windows 7 x64), a 32-bit version of Windows lying around, virtualization software (check out VirtualBox) which is running your 32-bit version of Windows, a Top2004 programmer, DSEO, and the infwizard utility with libusb drivers (virus free, I promise).

    Here's the fix:

    1. I remembered when jailbreaking my iPod a while back with Quickfreedom that there was a utility used to sniff out USB devices called infwizard, which I believe is part of the libusb package. I never liked libusb because I remember it royally messing up my computer, but the infwizard program was dandy. It can write very simple drivers for USB devices without any prior knowledge of what they are. I used this with the programmer plugged in to create a makeshift driver. Note: Make sure the libusb* files in the provided zip are in the same directory as the .INF file you create for the programmer.
    2. 64-bit Windows doesn't like you to load unsigned drivers. In fact, it doesn't allow it at all. You have to download a utility called DSEO (Driver Signature Enforcement Override) to convince Windows that it should let you load the driver you just created.
    3. Once you turn driver enforcement off and load up the driver, you should now be able to see your TOP programmer in the device list. Boot your VM, which previously couldn't use the programmer (because it had no driver), and install v2.52 of the TopWin software. Once installed, you should be able to select the TOP2004 from the USB device list, and voilà...your programmer works.

    Obviously running it in a VM is less than ideal, but it's better than dropping $200 on a real programmer that might actually have 64-bit support. The great part about this version (2.52) of the TopWin software is that it supports the atmega168, which is almost exactly the same as the atmega328...meaning arduino fans new and old can use it. I'm not an arduino guy and use the chip just by itself with avr-gcc, but you can do whatever the hell you want once you get the TOP programmer working.

    Comments
  • 201002.23

    California Proposition 16 - "California Taxpayers Right to Vote Act"

    With a name like "California Taxpayers Right to Vote Act," you know there is an ulterior motive. We already have the right to vote, right? In fact, we do. So what is Prop 16, really?

    Prop 16 is designed such that before a city or state entity buys a section of power grid and resells that power to its residents, it must hold an election and get a 2/3 majority vote. While it may seem nice to have the voters decide on whether or not a government entity should be spending their money, it actually doesn't make sense. The reason is the cost of NOT allowing these entities to do this.

    Think of it this way. A government agency spends some of your tax dollars buying up sections of the power grid. Money lost, right? Not necessarily. After they own that part of the grid, they start charging you for the power they give you. Great, so they spend your money to charge you money...but wait, there's more. Because the government entity is essentially a business at this point which provides a service and charges for that service, it's making money back. On top of this, the residents now have a choice of who they get their power from. This is known as "competition" and is the leading force against monopolization in any industry. If a market segment is profitable, doesn't it make sense for the government to capitalize on that market segment?

    Now let's look from another angle. You are a taxpayer (I'm assuming) and you want to make the final decision about whether or not a section of power grid is bought. This is great, but your local government spends a lot (I mean, a LOT) of money without your express permission because we as a city/state/country give our government that power. We elect people to handle this in our stead because we are busy and don't have the time to make every decision collectively. That's how a representative republic works (no, the U.S.A. is not a democracy, sorry!!)

    So why bother holding an (expensive) election so the govt. entity can spend more money petitioning and explaining to you why it's good that they actually make money? Especially when they're spending your money on lots of other things, all the time. Holding an election to give a local government entity the right to actually turn your tax dollars into profit (or at least offer you lower prices on energy) seems like a waste of time, no?

    So where did this bill come from? If you read the Wikipedia page, it's obvious: PG&E. Now, I have nothing against these guys. They do a great job, and obviously they're just protecting their interests. They do not want the government competing with them, which is why thus far they have donated $6.5 million to the campaign, and have stated they plan to donate up to $35 million total. They obviously have a vested interest in forcing local governments to get 2/3 support in elections (which is very, very hard to do).

    By voting "Yes!" on prop 16, you gain absolutely no more rights than you had before, you only make it harder for local and state governments to turn your tax dollars into something useful: cheap power for you. The name "California Taxpayers Right to Vote Act" is a misleading name designed to dupe the voters (that's you!) into voting for higher energy prices and less competition in the energy market.

    It's important that our local governments are accountable for the money they spend, but passing highly targeted, specific bills that force them to ask, nay, beg, the voters for approval on everything they spend money on slows (if not stops) progress and makes our government much less useful...after all, we're already electing them and paying them to decide where our money goes. Doesn't voting on every single issue defeat the purpose of appointing representation?

    Also, if the residents of a city really do not want the government spending their money on buying areas of power grid, they can get a ballot initiative going (which takes a handful of signatures) and vote on it themselves.

    Comments
  • 201002.22

    NearlyFreeSpeech.net downtime

    I've never formally written about NearlyFreeSpeech.net (https://www.nearlyfreespeech.net/) until now, even though I've always had good things to say. Even after some extended downtime I'm excited to say I'm still on the bandwagon. The reason is their transparency. I've worked with many hosts before, and none are as honest or transparent. Even the Rackspace Cloud gave glossed-over responses to their problems. A few of my NFS sites went down just about all of yesterday because of a server failure. I logged into the control panel (proprietary, but I actually prefer it over cPanel or hsphere) and looked at the sticky support note left for all customers.

    I half-expected to see "teh service is dwn!! were trying to fix it! sry lol!" as with most hosts. Instead, there was page after page of updates with details and explanation. After this, I was able to rest easy, because I had a good idea of how long it would take to get everything back up. Whenever it didn't go as planned, they'd post another update.

    I cannot stress how awesome this was. Yes, they made my downtime awesome by treating me and the other customers as if we were techs in the server room. I didn't really care about my sites being down, because I knew they were working really hard on it and probably wouldn't go to bed until it was fixed.

    This brings up another point though: transparency makes your customers wet. I know it's been discussed time after time, but it really is true. People don't like "Our apologies, our service went down," as much as they like "Our service went down from 6:30-8:30 UTC today when lightning struck our main Big-IP load balancer, and our failover didn't switch the backup on."

    What's a Big-IP? What's "failover?" It doesn't matter...treating your customers as equals and letting them decide if information is relevant or not will make them wet.

    In my three years of experience with NFS, this is the first downtime I've experienced. Their support was amazing enough to update every customer with detailed information about the problems they were experiencing and how they were fixing them. I cannot recommend them more. For larger sites that require custom services running, you're out of luck. For blogs, informational sites, paypal-driven shopping carts (no SSL, yet), etc, this is the best shared host I've dealt with, ever. They're dirt cheap, and the only host I know of who won't disable your site without a court order or copyright violation.

    Comments
  • 201002.16

    Is Open Source too open?

    I recently read a post on a web development firm's blog (anonymous to protect them and myself). It was talking about how open-source web software is inferior to closed-source. The main reasoning was that open-source allows attackers to find vulnerabilities just by sifting through the code. The company touts their proprietary CMS as better than Drupal or Wordpress because only they (and their customers, heh) see the source code. Therefore it's rock solid.

    I was kind of blown away by this. Obviously it's a marketing ploy to scare unknowing customers into using them instead of doing a simple Wordpress install, but it's blatantly wrong and I feel the need to respond. Oddly enough, their blog is in Wordpress. Hmm.

    First off, all software has vulnerabilities. All servers have vulnerabilities. Yes, it's easier to find them if you know the setup or know the code, but what I've seen in my lifetime of computer work is this: if someone wants to hack your site, they will. If there is a vulnerability, they will find it. And as I just said, all software has vulnerabilities. It's stupid to assume that no vulnerabilities will ever be found just because the source is only available to people who pay you money (and whoever works on the site after you). They will be found. Look at Google. They were just hacked by China. Does Google open source their Gmail app? No, completely closed-source. But someone wanted to hack them, so they got hacked. That's what happens. Also, if your proprietary CMS is written in PHP, Python, Ruby, Perl, etc etc...you're still using open source. Someone could attack the site at the language level. Does it make sense to now develop your own closed-source programming language so nobody will ever be able to hack it?

    Secondly, most well-known open-source software has been around a very long time and has had hundreds of thousands (if not millions) of people using it. This means that over time, it gets battle-hardened. The common and not-so-common vulnerabilities are found, leaving users of the latest versions with a rock-solid code base that has gone through thousands of revisions to be extremely secure. With open-source, you've got hundreds of eyes looking over everything that's added/changed/removed at all times. With proprietary code, you get a few pairs of eyes at best, with far fewer installs and far fewer revisions to harden and secure.

    Is open-source better than proprietary? If you're poor, most likely, but otherwise they both have their good and bad points. The main point of this article isn't to bash proprietary software at all, it's to refute the claim that because the source is open the product is less secure. I believe the exact opposite, in fact. If your code is open for everyone to look at, you damn well better be good at seeing vulnerabilities before they even get deployed...and if you don't catch it, someone else developing the project probably will.

    Is open source too open? Hell no.

    Comments
  • 201002.04

    NginX as a caching reverse proxy for PHP

    So I got to thinking. There are some good caching reverse proxies out there, maybe it's time to check one out for beeets. Not that we get a ton of traffic or we really need one, but hey what if we get digged or something? Anyway, the setup now is not really what I call simple. HAProxy sits in front of NginX, which serves static content and sends PHP requests back to PHP-FPM. That's three steps to load a fucking page. Most sites use apache + mod_php (one step)! But I like to tinker, and I like to see requests/second double when I'm running ab on beeets.

    So, I'd like to try something like Varnish (sorry, Squid) but that's adding one more step in between my requests and my content. Sure it would add a great speed boost, but it's another layer of complexity. Plus it's a whole nother service to ramp up on, which is fun but these days my time is limited. I did some research and found what I was looking for.

    NginX has made me cream my pants every time I log onto the server since the day I installed it. It's fast, stable, fast, and amazing. Wow, I love it. Now I read that NginX can cache FastCGI requests based on response caching headers. So I set it up, modified the beeets api to send back some Cache-Control junk, and voilà...a 2800% speed boost on some of the more complicated functions in the API.
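
    The "Cache-Control junk" is nothing fancy; the gist of it on the PHP side is just a couple of headers like these (the 5-second TTL is only an example, not what the API actually uses everywhere):

    // tell NginX how long it's allowed to serve this response from its cache
    header('Cache-Control: public, max-age=5');
    header('Expires: ' . gmdate('D, d M Y H:i:s', time() + 5) . ' GMT');

    $response_data = array('events' => array()); // placeholder payload
    echo json_encode($response_data);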

    Here's the config I used:

    # in http {}
    fastcgi_cache_path /srv/tmp/cache/fastcgi_cache levels=1:2
                               keys_zone=php:16m
                               inactive=5m max_size=500m;
    # after our normal fastcgi_* stuff in server {}
    fastcgi_cache php;
    fastcgi_cache_key $request_uri$request_body;
    fastcgi_cache_valid any 1s;
    fastcgi_pass_header Set-Cookie;
    fastcgi_buffers 64 4k;

    So we're giving it a 500MB cache. It says that any valid cache entry is saved for 1 second, but this gets overridden by the Cache-Control headers sent by PHP. I'm using $request_body in the cache key because in our API, the actual request is sent through like:

    GET /events/tags/1 HTTP/1.1
    Host: ...
    {"page":1,"per_page":10}

    The params are sent through the HTTP body even in a GET. Why? I spent a good amount of time trying to get the API to accept the params through the query string, but decided that adding $request_body to one line in an NginX config was easier than re-working the structure of the API. So far so good.

    That's NginX's FastCGI cache acting as a reverse proxy cache. Ideally in our setup, HAProxy would be replaced by a reverse proxy cache like Varnish, and NginX would just stupidly forward requests to PHP like it was earlier today...but I like HAProxy. Having a health-checking load-balancer on every web server affords some interesting failover opportunities.

    Anyway, hope this helps someone. NginX can be a caching reverse proxy. Maybe not the best, but sometimes, just sometimes, simple > faster.

    Comments
  • 201002.01

    Mosso (The Rackspace Cloud)

    After being a customer for the Rackspace Cloud (formerly Mosso) for quite some time, I'm happy to say that my business and anyone who listens to our advice will never be using this hosting service, ever again.

    Rackspace is an amazing company. They are known for having great servers, great support, great everything. You can't beat them. Mosso was a side project that was swallowed up by them, which aims to run websites in a real, actual cloud. This is a valiant cause. To be able to upload a site to one server and have it scale infinitely over however many servers their datacenter has without ever having to touch it...that's a miracle. It's a great idea that unfortunately just doesn't work.

    Mosso has repeatedly let us down, again and again. Their service is always going down. It's hard to find a month where one of our sites hosted on the "cloud" hasn't seen at least an hour of downtime. I'd expect this from a shoddy "HOST 100 SITES FOR $2.99/mo!!" host, but not from someone charging a base rate of $100/mo. Here's what it boils down to: you're paying Mosso a lot of money for the privilege of beta testing their cloud architecture. Great business model.

    And while Rackspace is known for fanatical support, the Rackspace Cloud is known by us for support that is fanatical about ignoring or avoiding the issues plaguing them on a week-to-week basis. Questions go unanswered, support requests ignored, etc etc.

    So all in all, it's been a terrible experience. And yes, we have been using them for more than a month...a little over a year now. Yes, we stuck it out and paid outlandish hosting rates for horrible service. Why? Because I really do wish it worked. I wish I could put a site on it and have it be up 100% of the time. That's the point of a cloud, no? To have >= 99.999% uptime? I really wish I could put a site on there and let it scale with demand as it grew without ever having to touch it - and I can do this - but the price is my site goes down for long periods of time at short intervals (oh, plus the $100/mo). We tried to give them the benefit of the doubt, and tried to believe them every time they told us that this was the last downtime they'd be having (yes, we heard it a lot). I just can't lie to myself any more though. Mosso sucks.

    So please save yourself some time and realize that it's too good to be true. The Rackspace Cloud is the most real and cool cloud hosting you'll ever see, but as far as I'm concerned they are still alpha-testing it, and your site WILL go down. Want hosting that scales automatically, is zero customer maintenance, always up, and has amazing support? You won't find it anywhere.

    Mosso comes close, but they just can't get it right. Save your money and learn how to scale on a good VPS provider.

    Comments
