There have been some great discussions about the state of programming. Confession: I’m much more of a sysadmin and architecture guy than anything else at this point. If it doesn’t have a quick configuration file or a GUI, I don’t do much with it because I don’t have the time to learn everything. That’s even after focusing our core web environment on two technologies (PHP/Python) and doing our best to reject anything that doesn’t fit into them.
Here’s the first one: Whatever Happened to Programming @ The Reinvigorated Programmer, and here’s its second part: It May Not Be As Bad as All That.
Pay special attention to the addendum in that second article. The money quote for me was in the big pull from a comment by jdeitrich on HackerNews:
We talk about ‘flow’ quite a lot in software and I just have to wonder what’s happening to us all in that respect. Just like a conversation becomes stilted if the speakers keep having to refer to their phrasebooks and dictionaries, I wonder how much longer it will be possible to retain any sort of flowful state when writing software. Might the idea of mastery disappear forever under a constant torrent of new tools and technologies?
It’s the death of the hobbyist programmer. There’s a new framework release in Symfony or Zend Framework every time I re-surface a week or two later. Even with 10 years’ experience with programming, unit tests, and a decent level of comfort from the experience with 0.x versions and up of these frameworks, I spend all the time I *should* be coding with my nose in the docs updating code that’s been deprecated or migrated. Just keeping up in one framework can be a full-time job.
How can anything get done like this?
Wordpress 2.9 changed the permission structure away from the permission-based ACL, which confused many users, and created a role-based ACL where roles have permissions. This has royally fubared a few of my sites, which used extensive ACL settings with some custom plugins to enable fine-grained permissions. On the other hand, few people understood the old permission format, things were complicated enough that a user could trip over themselves and inadvertently grant multiple contradictory permissions to someone, and it was difficult to teach and explain the administrative interfaces.
The first step towards straightening out the new permissions structure is creating and/or changing the existing roles. Steph over at SillyBean has a good article on creating roles in the PHP code, and also mentions Justin Tadlock’s Members Plugin, which automates a bunch of the things that she explains. Of course, there’s always the Wordpress Documentation, and the reasoning behind the changes is in this Wordpress Trac ticket.
One method of backup or recovery isn’t enough. Period. No matter what anyone tells you, what the book says, what your boss says, or what you think you need, you need to be backing things up in many ways.
Here are a few examples.
MySQL
Theoretically, you could recover anything you needed from the binary log, as long as you’ve got a good starting point and a good ending point. (This, by the way, is a good reason to flush the binary logs and take a backup on a regular basis.) What if your binary log’s corrupted, though? You need to fall back to a full SQL backup … which you’re doing regularly, right?
If your binary log is corrupted, any mirrors you are using that are based on that binary log are corrupted as well.
Case in point: I had a client with a very active, very large database… north of 15GB in InnoDB. The binary log hit a bug and corrupted itself. The backups were being done from a mirror so that they didn’t interrupt the main machine’s processing, but they only kept a few days’ worth, so we couldn’t use those backups to restore. The most recent un-corrupted dump from the main machine had been taken three months before. Luckily, the client had done some application-level backups to an XML format, and we were able to (laboriously) restore from that. It cost about $3,000 because they didn’t want to degrade their forum’s performance for a half hour every night and pay for an extra TB of storage or so to keep more than a few days’ worth of backups.
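The "flush the binary logs and take a backup" routine is easy to script. Here is a minimal sketch in Python, assuming mysqldump is on the path and credentials come from an option file. The function name and backup directory are hypothetical, and --single-transaction only gives a consistent snapshot for InnoDB tables.

```python
from datetime import date
from pathlib import Path


def build_full_backup_command(backup_dir, today):
    """Build a mysqldump command that flushes the binary logs and writes a full dump.

    backup_dir is a hypothetical location; keep it off the database server's own disks.
    """
    target = Path(backup_dir) / f"full-{today.isoformat()}.sql"
    return [
        "mysqldump",
        "--all-databases",
        "--single-transaction",  # consistent snapshot without long locks (InnoDB only)
        "--flush-logs",          # start a fresh binary log at the backup point
        "--master-data=2",       # record the binary log position as a comment in the dump
        f"--result-file={target}",
    ]
```

Hand the returned list to subprocess.run(cmd, check=True) from a nightly job, then copy the dump to a second location. A restore is then the newest good dump plus whatever binary logs were written after it.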
Servers
Scenario: Hard drive gets corrupted or dies. You need to get the machine back up quickly. You have a snapshot of the machine … but your snapshot is on the same storage as that machine unless you back it up somewhere else.
On top of that, storage requirements have been growing rapidly for servers. Where a Linux server can take less than 1GB, Windows 2008R2 can take up 20GB with system files alone. (In fact, if you plan to have any data on that server, or keep any logs, we’d recommend going with 40GB minimum for your C: drive.) It’s important to back that up to something that’s not on the same system disks.
Better yet, take a hint from the application-level backups — and back up your registry, configuration files, and data separately from the snapshot. We tend to use rsync for this role and put it in a rolling-backup mode with the --link-dest option to ease recovery.
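For anyone who hasn't used rolling backups with --link-dest: each run writes a new dated directory, and files that haven't changed since the previous run are hard-linked instead of copied. Every directory looks like a full backup but only costs the space of what changed. A minimal sketch in Python; the directory layout (one folder per date under a backup root) and the function name are my own assumptions, not a description of any particular production setup.

```python
from datetime import date
from pathlib import Path


def build_rsync_command(source, backup_root, today, previous=None):
    """Build the rsync command for one dated backup run.

    backup_root should be an absolute path: a relative --link-dest is
    interpreted relative to the destination directory, which is easy to get wrong.
    """
    dest = Path(backup_root) / today.isoformat()
    cmd = ["rsync", "-a", "--delete"]
    if previous is not None:
        # Hard-link files that are unchanged since the previous run.
        cmd.append(f"--link-dest={Path(backup_root) / previous.isoformat()}")
    # Trailing slash on the source: copy its contents, not the directory itself.
    cmd += [str(source).rstrip("/") + "/", str(dest)]
    return cmd
```

Run the result with subprocess.run(cmd, check=True). Recovery is just copying files back out of whichever dated directory you want.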
VMWare
Same principle as above. Snapshots are usually stored in the same datastore. Datastore goes bye-bye, so do your snapshots.
There are some great products out there that can really help with this issue. The one we use is VEEAM Replication and Backup. It can be used to replicate a snapshot to another VMWare cluster, or back up the datastore files at a consistent snapshot point and then copy them elsewhere all in one step. We use a two-step process — we keep them locally on the backup server and also transmit them to another datacenter across campus.
When using VEEAM with Windows, make sure that VMWare Tools is installed and that you enable the VSS integration. (You’ll also need to make sure that the administrative share option on the system drives is enabled, and that the appropriate firewall ports are opened.) This ensures that you’ve got a transactionally consistent backup snapshot.
Practice, practice, practice
The only way to make sure that you can recover from a disaster is to test recovering from a disaster. At least once a year, we practice recovering from a worst-case scenario. That means bringing up a new machine from scratch, re-implementing all of the options and configurations, and then restoring the data. Despite that kind of restoration being something that should never happen, it does — and practice gives you insights into how to improve the processes and turns a recovery operation from an expensive nightmare that sets back all of your other processes into something that you can execute quickly and professionally.
Besides the obvious, the thing that pisses me off most about Google Buzz is having to mark things read twice — once in Google Reader, once in Buzz. Still experimenting to see if I can hide/unfollow people in Google Reader and not have them unfollowed in Buzz, or vice versa.
I’m happy to see it; I’m happy to be involved in it.
Sun has some of the best ideas in the world. From a creativity point of view, they’re pretty amazing. From an implementation point of view, with some notable exceptions (ex: Fishworks), they’re pitiful. Sun couldn’t get laid in a whorehouse wearing a suit made of hundred dollar bills.
Half of Sun’s ideas were half-baked. (Either go fully baked, à la Steve Jobs, or lay off whatever makes you write bad haiku, mmkay?) The x45xx line of servers is a wonderful idea and a wonderful form factor, and Sun overcame significant engineering challenges to develop it. Unfortunately, the first gen fell down hard under load and was practically unusable. The second gen is still suffering from some high replacement part and add-on costs that don’t justify the price in many cases. The integration of ZFS and SSDs as ZIL/L2ARC is wonderful, but there are a ton of technical problems that customers keep running into and Sun keeps refusing to acknowledge. It took three months to solve the problem I was having with SSDs and ZIL. I place the blame for the former on poor management controls, and the latter on excessive outsourcing of core competencies. Both are failures of management to execute the brilliant ideas that engineers come up with.
It’s nice to see a company with a reputation for being able to execute and capitalize on new ideas come in. Oracle’s already started to cut, and all of the cuts I know of so far in my various interactions with the company have been well-justified. I’m really excited, from the point of view of someone with several relationships with the company, to see what comes of this merger.
Check this out:

RHN Fail
Yeah, that’s what you see when you visit rhn.redhat.com — which you need to use to administer redhat subscriptions. I can’t get my servers to subscribe while the site’s down, and I can’t manage my entitlements or buy new ones.
One of my consulting projects has been on hold for days while RHN sorts itself out. Worse, you can’t even log in to report the problem. If you click on the “contacting us” link, you get taken to a page with a couple of mailing lists. Well, why join a mailing list? I know the site’s down. I want to file an engineering report. I click the last option, which is supposed to allow me to file such a report. It says I need to log in to file a report. FAIL.
It does seem that there’s some awareness of the problem. Poking around in the rest of the redhat.com domain, I got messages like this:

Why’s Redhat losing market share? They can’t even run a website well. Who’s going to trust their server distro when they can’t get a website right?
If you’re using raw access to storage LUNs with VMWare, and you’re using Windows, you can use the LSI Logic SAS virtual SCSI adapter option and create virtual drives. This is better than using the Microsoft iSCSI initiator because you can edit the drive mappings with the machine powered off and you can clone the machine and easily redirect all of your storage before powering the machine on.
The correct driver to be using is the LSI SAS 1068 driver. You’ll need to make a floppy image using an image tool — if you have access to a Linux box, just use dd to create the image and then mount it and write files to it. If you’re on Windows, the venerable WinImage and other utilities exist. Either way, you’ll need to rename the file with a .flp extension and mount it on boot in your Windows VM in order to load the driver to see the drives.
I’ve been getting some pretty darned good performance with that and iSCSI LUNs on my Solaris server. I haven’t (yet) put together a decent test and some metrics to back it, but the machines on raw device LUNs feel a *lot* snappier than the machines that are on a 400GB VMFS. A good basic tutorial with iSCSI and ZFS is here: Running ZFS Over iSCSI as a vmware VMFS store — but note that I’m using raw LUNs after not being happy with the VMFS performance with a half-dozen hosts doing heavy I/O.
GhettoVCB – VCB for free. Doesn’t get better than that.
Understanding VMWare Snapshots – Also, it’s probably a good idea to learn this stuff.
Software problems are the #1 thing that will keep an Airbus A380 on the ground. Yes, airplanes are complicated things … but at the same time, not much is required to keep most of them in the air.
A few key quotes speak volumes to me about these problems.
Clark says that the problem with the nuisance warnings has been their diverse nature, but “the common thread” is the software. He says Airbus executive vice-president programmes Tom Williams and his team “have sat in my office many times and said they can’t identify trends, which is the worst possible thing”.
Clark blames the software’s design. “There was a philosophy of utopia – I suspect that Airbus was blessed with some boffins who said ‘we’ve got to make this absolutely perfect – no flexibility’. The slightest surge causes one [sensor] to trip and then six more as they’re all linked,” he says.
Anyone willing to take guesses about the type of architecture and software developers at Airbus?
In the last issue of my current consulting saga, Detecting and Resolving LAMP Stack Performance Problems, we talked about a Drupal site that was being brought offline every few hours due to poor tuning of the LAMP stack. With the default settings, a site isn’t going to take much before it just falls flat on its face.
After triaging and addressing the main issues based on the logs, we were left with two more issues. The first was the inability of Drupal to perform well in an environment where it had to rebuild every page from source for every page view. This is well documented in the Drupal community; there are many pages in the documentation area of Drupal that deal with caching and performance optimization. The second issue was MySQL performance and the long table lock/scan times we were seeing on some queries that could not be further optimized.
We scheduled a 2 hour downtime with the customer to install some tools. Our checklist was installing memcached and PHP-APC. I also wanted to take the time to back up the MySQL database and run a good check_table on each of the MyISAM tables. (Yes, I know. MyISAM. More on that later.)
Side note: I would typically prefer xcache, which in my mind is superior to APC because I have an easier time working with it and prefer its management interface and tuning parameters. However, APC was available as a binary package for the platform we were on, and xcache was not. To make things faster and easier, we chose APC. Despite the endless debate about which is superior, both are usable and work. I have not run into problems using APC on an 8-core system, despite oft-reported-but-never-proven flock() issues.
APC was fast to install and required minimal tuning. It produced a noticeable performance improvement. However, the number of deadlocked Apache threads (and total number of Apache threads) went up, and the other Apache errors that dealt with clients timing out did not cease.
We installed the Drupal Memcache implementation along with the appropriate PECL module. We configured two pools, both using up to 1 GB of RAM (which we had to spare on the web server). The ‘hot’ pool would mostly handle cached pages for non-logged-in users, and the other one would handle some higher volume caching for users that are logged in, as well as some internal/custom functionality to go along with specialized RSS feed parsing. (Side note: We found that the Cache and Cacherouter plugins did not work as expected. Rather than waste downtime troubleshooting them, we used what worked.)
Again, we saw a huge performance boost. We needed to do some tuning (changing certain cache settings and analyzing performance), but that was essentially everything that we could find to do from a single-server web server side of things.
While we’re on the topic of Drupal: Don’t forget that Drupal has a ‘cron’ program that should be getting called remotely. It’s sort of a poor man’s cron solution, but it works. It was causing our load to spike every 20 minutes. We occasionally disabled it during testing to be sure we understood its effects.
The next beast to tackle was the database. As previously mentioned, it was on MyISAM tables. Obviously, this isn’t ideal. We found that node lookups, statistics lookups, and searches were taking up a disproportionate amount of server time because they were both […]. The weirdest part was that we were seeing some full table scans in the slow query log (i.e. 3 million rows scanned) but a later ‘explain’ statement couldn’t replicate the performance recorded in the slow query log.
We batted around adding indexes. The issue was that Drupal’s search and nodes tables are frequently altered, which means the indexes become scrambled quickly. And really, what was taking time was the size of the table we were dealing with — the table wouldn’t fit in memory, so it was copying it to a disk temporary table and then doing a filesort.
Running check_table did the trick to re-sort the indexes and ‘defrag’ the files, but the benefits only lasted so long.
What we ended up doing was taking the database down, dumping everything out to a SQL file, and re-importing everything to InnoDB. Make sure that innodb_file_per_table is enabled, or you might end up with some unexpectedly big files — this depends on your architecture and filesystem. Remember that InnoDB files cannot currently shrink. (Also: You can do the table changes online, but it’s really not recommended. It takes a long time, especially when some of your tables are larger than 1GB.) Don’t forget to set innodb_buffer_pool_size appropriately.
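One way to handle the "re-import everything to InnoDB" step is to rewrite the storage engine in the dump file before loading it. This is a rough sketch in Python, not a record of the exact commands used on this gig. It assumes a mysqldump-style file where each table definition ends with a line beginning ") ENGINE=MyISAM", and the function name is made up for illustration.

```python
import re

# Only touch the closing line of a CREATE TABLE statement, so row data that
# happens to contain the text "ENGINE=MyISAM" is left alone.
_TABLE_ENGINE = re.compile(r"^(\)\s*ENGINE=)MyISAM\b", re.MULTILINE)


def convert_dump_to_innodb(dump_sql):
    """Return the dump text with MyISAM table definitions switched to InnoDB."""
    return _TABLE_ENGINE.sub(r"\g<1>InnoDB", dump_sql)
```

For a multi-gigabyte dump you would stream the file line by line rather than hold it all in memory, but the substitution is the same. Check the rewritten file against a scratch database before the real downtime window.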
The change to InnoDB, the implementation of caching for both PHP engine-level opcode and fully built pages, and the careful tuning of Apache and MySQL parameters led to stability for this client.
There were some further problems, but they were with an unrelated product that causes a nightly load spike on the database machine. Tomorrow night I’ll be covering the cleanup work: NFS IOPS vs. local disk, binary logging and the lack of backups in the original configuration, and building some redundancy into the system so that it can tolerate faults more smoothly.
As sysadmins, we sometimes run into performance problems with multiple angles and portions. It’s sometimes not particularly obvious where the actual performance problem is, and resolving one problem that you can see might bring another couple of problems to the surface.
The below comes from a consulting gig that I’ve been working on recently. The parties will remain nameless. I’m going to break this into several parts, since it took over three weeks to resolve all of the immediate problems with the site, and we’re still not all the way done with the task list.
Going in, I knew that we were dealing with a heavily loaded Drupal site that shared a MySQL database with a wiki and a forum. The site would go down at random times — sometimes multiple times per hour. Upon logging into the server the first time, it seemed slow — so I immediately called ‘uptime’ and the answer came back with all three time period load averages over 90 on an 8-core server. There were 125 Apache processes running, but most of them were in Deadlocked state. The very second command I ran on the server was killall -9 httpd, which is never the way you want to start out a consulting gig…
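To put that load average in perspective, divide it by the number of cores. A quick sketch in Python; the function names are hypothetical, and the threshold of 1.0 runnable process per core is a common rule of thumb rather than a hard limit.

```python
import os


def load_per_core(load_average, cores):
    """Load average divided by core count; values well above 1.0 mean a backlog."""
    return load_average / cores


def is_overloaded(load_average, cores, threshold=1.0):
    """True when the per-core load exceeds the (assumed) rule-of-thumb threshold."""
    return load_per_core(load_average, cores) > threshold


def current_load_per_core():
    """Per-core figures for the 1, 5, and 15 minute load averages (Unix only)."""
    cores = os.cpu_count() or 1
    return tuple(load_per_core(value, cores) for value in os.getloadavg())
```

A load of 90 on 8 cores works out to 11.25 per core, which is why everything felt slow before I had even looked at Apache.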
While that was busy killing off processes, I checked the Apache configuration. Sure enough, it was still at the stock settings. I immediately cranked up the requests per process to 20,000 and upped the server limit to 300. (Remember, we’re dealing with prefork here.) I restarted Apache and watched it churn. It handled the load far more gracefully with some room to move around, and I quickly saw the number of Apache processes spike, and then sink down to about 80 and stay there.
The next step was looking through the logs. A quick aside about logs: I like my logs to be clean. I don’t like debug messages, I don’t like status messages, and I don’t want to see either of them. If I have a lot of a certain type of status message that I *do* want to trap, I make sure that syslog puts it into its own file or I handle the problem that’s causing it. In this case, /var/log/messages had a bunch of SNMP messages logging each get, and some messages about martian packets. The martian packets issue could be (and was) resolved with a quick firewall tweak to reject packets from an illegal source. The snmp issue was resolved by editing snmpd’s startup configuration to log to local1 instead of the default (check your man file for snmpd to make sure you get the right flags, it’s changed…), and then editing syslog’s configuration to log everything on local1 to /var/log/snmpd — and don’t forget to add it to logrotate!
Now we were down to two classes of errors. The first was obvious and sort of easy to troubleshoot: “MySQL server has gone away.” Log into the MySQL server. See if there are slow-running queries. Nope? Well, double check the timeout that’s set in /etc/my.cnf — on this server, slow-query-time was set to twenty seconds, but timeout was set to ten seconds. Well, that’s not very useful. Also, check your caches and table types. In this case, everything was MyISAM. More on that later — for now, just make sure you’re using the right kind of caching strategy for your table type and system specs, which in this case is MyISAM key cache (and lots of it!). Try to fit all of your most-used tables in memory.
On this gig, we got the site back on its feet with these things. Downtime went from multiple events an hour down to one or two events per six-hour period. Unfortunately, we were also out of easy things to change. Next time I post, we’ll start to get into fixes that will cause downtime.
At the Sun OpenWorld conference keynote today, there were a few new products listed in the Flash storage arena — most notably the F5100 that everyone’s jibber-jabbering about.
As a smaller customer, I’m far more interested in the SunFlash F20 PCIe card — which I don’t see many people blogging about. Looks like I could add that to not only my existing systems, but non-Sun systems that can make use of that sort of storage. That, ladies and germs, is something worth the name “OpenWorld” — as in, a world of open wallets.
- vSphere, SAN or iSCSI-related:
- Using iSCSI with vSphere – Pretty much the bible, they covered it all.
- 2TB drives are here, but Stephen Foskett identifies the issues with bringing them to the enterprise. In the same vein, he covers the death of RAID as a storage technology, and what lies beyond.
- I need to research if our iSCSI TOE cards are supported by vSphere…
- Ubiquitous Talk might be my new favorite high-quality techie blog.
- Other:
- Streaming live webcams to your iPhone
- This has been linked all over, but the Ghost Fleet of the Recession is anchored just off of Singapore, and it doesn’t look like it’s going anywhere soon. Sis wondered why she hadn’t seen the Florida in port recently; she ships a lot of containers with Maersk.
Despite the recent pot-banging around the Sun/Oracle merger and the allegations that Sun’s getting its customer base stolen out from under it, I just pushed the button on a fairly large cluster with Sun as the hardware vendor.
Simply put, I couldn’t find machines with better stats for the money. Even with the academic matching grant program tabled for now, we STILL got amazing promotional pricing on the x4150. I can’t even find anything that can compare to an x4250 for on-board storage — 16 on-board drives. Dell’s MD1000 chassis supports only .. 15 drives. There’s no better hardware to run Solaris on. The Sun ILOM support is leagues better than Dell’s DRAC or even HP’s ILO. All the machines come with at least four on-board ethernet ports. The storage array options are also superior. No one else sells a 24 slot SATA chassis with hot-swap drives backed with three controllers.
Simply put, the Sun option was the fastest, most scalable option. The hardware is put together well, with the same sort of build quality you’ve come to expect from HP… far superior to Dell or IBM. And the management and tuning options are awesome. I’m really, really excited to see the hardware racked in a few weeks. They also maintain and stock a parts “locker”/cache on our campus so that a technician has access to all the parts they might need for our systems without having to courier them or drive for them.
Am I concerned about Sun going away? Not now that they’ve been bought by Oracle. They’ve got so many compelling offerings, and I hope that IW and other tech rags stop trashing Sun — I’m a fanboy from here on out.
It’s Monday morning. Your boss strolls into your office. You just finished with the trouble tickets from the weekend, and this is his favorite time to ruin your entire week. He says, “I have a project for you. I need a cluster with a primary and backup SAN that is going to store about 8TB of infrequently accessed images and it will also need to host virtual machines and an Oracle database. You’ll have to fit a budget for two sites in there, but the second site is a cold, hourly-synch backup. And it has to scale. And we’d prefer if you used a vendor solution and didn’t homebrew things.”
Talk about a list of contradictory feature requests! You’ve got a limited budget, and it’s hard to squeak 8 usable TB out of your average entry-level 12-disk arrays (e.g. HP MSA60 or Sun J4200 disk array, with Dell’s MD1000 at 15 disks and AC&NC’s 516-series at 16 disks being notable exceptions) when you factor in a double parity stripe and a couple of hot spares.
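The arithmetic behind that is simple enough to sketch in Python. The assumptions here are mine: 1TB disks, one double-parity stripe across the whole chassis, and "a couple" of hot spares taken to mean two. The function name is hypothetical.

```python
def usable_terabytes(slots, disk_tb, parity_disks=2, hot_spares=2):
    """Raw data capacity of one chassis after parity disks and hot spares are set aside."""
    data_disks = slots - parity_disks - hot_spares
    if data_disks <= 0:
        raise ValueError("no disks left for data")
    return data_disks * disk_tb


# 12-slot, 15-slot, and 16-slot chassis with 1TB disks
capacities = {slots: usable_terabytes(slots, 1.0) for slots in (12, 15, 16)}
```

A 12-slot array comes out at exactly 8TB of data disks, and that is before filesystem overhead and the gap between marketing terabytes and what the OS reports. The 15 and 16 slot chassis leave 11TB and 12TB respectively, which is the headroom the 12-disk units lack.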
In most cases, you’ll do just fine. What happens when the load on the infrequently accessed (slow) portion of the array is ‘peaky’ though? During one of those peaks, you’ll max out a gig per second line in – depending on what you’ve got driving the array, that might be your entire bandwidth budget. What’s Oracle, which is also running in one of the VMs, going to do then? It doesn’t like having slow access to its log files, which means it’ll be consuming RAM and swapping heavily on its VM, which means the VM image will also be trying to write to disk. Triple-whammy until something gives — either load decreases or something fails.
The obvious choice is ZFS and Solaris. And the obvious choice for hardware is also Sun; you get four NICs by default on Sun hardware, with management ports and ILO ports out of band on their own interfaces. (Side note: When you have 7 Cat5e cables, a KVM dongle, and 2 power cords running to a 1u chassis, yes, you really do want the cable management arm.) ZFS support with Sun is excellent. Their storage products are also excellent.
By the time you get done buying storage, you’re through most of your budget — those 1TB disks aren’t cheap, and neither are the arrays themselves. Your maximum speed across the SAS backplane for the J4200 or J4400 series is going to be 3 or 6 Gb/s, and your input is only going to be 1 Gb/s actual even with bonded ports, but you’d probably rather not skip all over the place on the array as you try to write 3,000 10GB (compressed) images and then try to write to the Oracle logs. The question still remains: how do you squeeze in a budget for some faster storage for the VM images and database storage while still paying for the bulk storage you need and some room to grow?
Answer: What are you using to drive that array? Buy a bigger chassis, and put it inboard. The 2.5″ 10k SAS drives aren’t hideously expensive, and the additional grand for a larger chassis beats the hell out of buying an entire extra J4200. Note that you can’t mix the 10k SAS disks in an array with the 7200 SATA disks… on any vendor that I know of, at least. But inboard on the system’s backplane, you can run SAS and then run SATA on the outside.
Bonus points: This may not last, and it might just be the academic pricing that we get at work, but right now I can buy a half-full J4400 (24 disks) for less than I can buy a fully-loaded J4200. Guess which we’re getting? It’ll be half full of blanks, but those are free. As our 8TB grows over the next year, we’re going to just slot additional disks in. ZFS’s ability to add disks to pools relatively painlessly has made this a realistic goal. ZFS also has a built-in management server (which we’ll restrict to our private network and people will have to VPN in, but that’s trivial…) which makes management’s acceptance of the technology dead simple.
Also, don’t forget that if you can acquire some SSDs, you’ll be able to drive your storage even faster by offloading the ZFS log writes to the much-faster SSD. They have a limited lifespan, though, so consider if it’s really worth it to you and make sure that you plan for their obsolescence and replacement considering that a log buffer is a r/w-intensive application.
Our total server budget for this project (a compute-/storage-intensive academic project where data loss is not acceptable) was only $70k. We managed to squeeze an insanely fast cluster out of a paltry budget.
I usually wait a few weeks, until the first .1 patch is released, to update to a new version of OS X. That’s been my procedure for a while, and it seems that there are the usual growing pains with 10.6. I don’t know if I could keep myself from giving in to 10.7’s charms right away…
With Ana, Bill, and Claudette having popped up in the last 48 hours, it seems that the tropical weather season has finally started in the Atlantic. A bunch of people have been asking for the tropical weather resources that I use to look up information.
My primary source for information is HardcoreWeather’s tropical forum, which has amateur meteorologists punditing and posting links to different models on a constant basis. It’s of the same quality as Jeff Masters’ Blog, but updated more frequently. They also have a twitter account, which is good if you’re trying to follow things but don’t want to do the work to follow them yourself.
Besides that, here’s a bunch of links that usually have good information:
- SkeetoBite Weather
- CIMSS Tropical Cyclones
- Experimental forecast Tropical Cyclone Genesis Potential Fields – Individual model runs in animation showing where lows are and will be.
- Tropical RAMSDIS Online – Satellite info
I may add more links later. I’m missing a few resources that I usually check…
Yesterday, a small disaster struck the community I live in. 72,000 people were asked to evacuate their homes and shelter south of the city because officials felt they were at risk of inhaling toxic fumes from an accidental chemical warehouse fire.
Briefly, chatter about the evacuation caused the word “Bryan” to trend into the top ten topics on Twitter.
As an observer safely on the edges of the affected area, I saw several interesting points arise out of the incident. In particular, traditional media took a backseat for many during the event compared to internet “crowdsourced” channels like Twitter, Facebook, and a local internet forum.
Commercial media sources were slow to jump on the story. The warehouse was destroyed at approximately 12:05pm, possibly in an explosion that was heard throughout the area. (I heard an unusual loud noise at that time, 20 miles away, but no one has officially stated that there was an explosion and there were thunderstorms in the area.) Within 30 minutes, there was a post on TexAgs.com’s Aggieland forum, which locals use to discuss current events and gossip. Before the Bryan, TX government had even managed a press release, the community forum had correctly identified the location of the fire from scanner traffic and had looked up what the facility supplied. There was a chemistry grad student talking about the toxicity of the chemicals involved and other materials and chemicals that may also be stored in the facility.
Throughout the first six hours of the incident, a community of casual onlookers with no particular expertise reported the news faster, more accurately, and in greater detail than traditional media sources.
When news and television sources did get on the story (at about 1:40 to 2pm, 2 hours after the event), they initially reported some wildly inaccurate information — locating the source of the blaze on the incorrect side of town (2 miles to the east of the freeway as opposed to 2 miles west) and that the evacuation was within a 3 mile radius instead of a 1/2 to 3/4 mile radius. The online community noted and mocked those errors. In a more serious incident, these errors could have panicked or killed thousands.
With all of this information at the community’s disposal, few people were tuning in to major news stations. The internet was talking about the exact source and likely makeup of the smoke, and had identified the streets within the evacuation zone. The television news was still broadcasting ‘Ellen’ without so much as a scrolling banner. Which would you listen to? In short, “crowdsourced” or “new media” carried the day.
A feather in the cap of Twitter: Several authoritative groups were using Twitter to disseminate information to great effect. Bryan Fire Department, City of College Station, The Eagle, and KBTX News all were tweeting regularly. The City of College Station also updated their Facebook page regularly with news and updates. These sources were essential to some of the ‘refugees’, especially people who had the foresight to route those source updates to their mobile phones as SMS messages. The local cell tower infrastructure did fine on my Sprint service, although data was quite slow.
—
After things had quieted down for the night, I got an email from KBTX’s station manager. Someone had cut and pasted a few of my comments from the forum thread to him and, knowing my ‘true identity’, provided my email address. He asked me if I could describe how the station didn’t live up to my expectations, and how they could improve. He explained that they are tied to official information sources that aren’t at all interested in talking to reporters and have to wait for information to be released to them. How can I explain? It’s not KBTX, or any other one news source that fell short. It’s the entire industry that’s failing to deliver.
As the KBTX manager pointed out for me, news reporters have forgotten how to investigate and interpret what they’re reporting. Smile pretty, fluff your hair, and parrot what the press release says! Investigation and the input of subject matter experts is an opportunity to provide added value that can’t be gleaned from a stream of twitter posts or a web forum. This opportunity is frequently being missed.
Let’s back up and examine “free” for a moment, because part of this is a money thing. The attitude of many content providers is that they’re now providing their product for free, or at a lower price (and lower profit) than they used to. Apparently, none have realized their service has become a commodity and is no longer valuable. Wire service news, television, and newspapers were novel and valuable in the days when information traveled slowly and expertise or interpretative ability was hard to come by. Now that traditional media is slower than the internet and no longer provides unique points of view or in-depth reporting, where is the value?
On the other hand, The New Yorker has gained readership by raising its rates (as have The Economist and many others), and David Simon has pointed out in the Columbia Journalism Review that “giving things away” isn’t a good business plan. Professional news sources “of record” do have one valuable nature over ‘crowdsourced’ news: legitimacy. Compare it to selling bottled water. Bottled water is supposed to be pure. Tap water is questionable and tastes bad; anything less is sewage. But extending our examples above, magazines like The Economist provide a level of in-depth reporting that goes beyond water. So does the fact checking staff behind The New Yorker. Champagne will always cost more than water. What your average local newspaper, wire service, or news television provides is just plain ol’ tap water… with a “local color” spot of some puppies before they fade to the advertisements. Who would pay for that?
(Humorously enough, The Associated Press is still trying to shut the barn door long after the horses have been turned into Salami de Cavallo. Wire services are just water delivered quickly. There’s no value to be had there anymore. It’s a commodity with universal distribution.)
Beyond a misstated value, traditional media has missed the boat in a number of ways. This has been tackled (to death, ad nauseam) by several notables over the past several months: Steven Johnson, “Old Growth Media And The Future Of News”; Clay Shirky, “Newspapers and Thinking the Unthinkable”; and, in the form of a compiled series of tweets, Dan Baum, “The Following Account of My Short Career at The New Yorker”. Long story short: The end is nigh for many traditional media organizations, and by and large, not many are adjusting gracefully.
The amount of information that’s available via the internet is immense. Steven Johnson, in the above linked article, puts it well when he walks readers through the amount of information he had available 20 years ago compared to today. His information lead time went from months to weeks to minutes within a decade. At the same time, the amount of information and interpretive commentary on any one topic increased exponentially, from a couple thousand words per month to tens of thousands per day.
It doesn’t matter how much video modern television stations, radio stations, and newspapers stream. It doesn’t matter how much they use Twitter or Facebook, or how many blog posts or podcasts they publish. They’re just not getting it in their hearts. Like a cargo cult, most are aping the actions of Gawker and MacRumors in the hopes that the rewards will be delivered from on high. There is absolutely no comprehension of the depth of the change. There are excuses about how much content they’re “giving away” online, how many new technologies they’ve adopted or tried, and strategies they’re now trying to attract or retain “market share.” But they still have the same damned Netflix pop-unders on their website. Is the picture clear yet? No? Try adjusting the rabbit ears.
Oh, there are solutions. Gawker is a great example of a flat, accurate, profitable, fast-moving internet investigative media source that has broad viewership. Look at the much-maligned Drudge Report. Even CNN does an excellent job with their websites, despite the insufferable cheesiness of iReport. Can your local news do all those things? You bet. It’s even possible without tearing the entire organization apart! Unfortunately, it’s not likely to happen that way.
Newspapers, radio, and television stations, and traditional commercial media sources in general, are still operating at the bandwidth of a telephone and are ‘reporting’ by talking to people one at a time. Despite the amazing array of sources, stories, and information floating around that’s accessible without leaving one’s home, these highly trained journalists are failing to get basic facts right with millions of dollars in equipment and support staff. Trying the same old things over isn’t working. From the outside, it’s obvious that they’re just providing a dribble of plain old tap water and trying to sell it. The internet is a raging torrent where anyone with a decent filter can make as much of their own potable water as they want.
The next step gets pretty obvious when you put it that way, doesn’t it?
Addendum:
- The White House has issued press passes to at least one blogger, and the city of New York may soon do so as well.
- Some of the above mentioned commentary about the Bryan plant blaze was lost in the swirling mass of sewage that is the General Board. I won’t link there, you don’t want to go.
- You really need to read the piece by Steven Johnson. Here’s the link again. He says what I said, but in far more illustrative and elegant words.
- 1 August 2009: Slight correction: Apparently the fire began around 11am and first responders were on the scene within five minutes. Hazmat teams were notified by 11:30 that they were going to be called up; there was a TEEX exercise happening at Kyle Field.
- 1 August 2009: A firsthand account from a firefighter: “About the time I pulled up, the smoke changed from orange to green briefly. The wind picked up and blew some of the smoke at a tree, and you could just see birds plummeting to the ground. That’s when I knew we had a serious problem on our hands.”
We’re trying not to use ‘old stuff’ as we’re building out our new cluster, but we have a big need for nfs or some other ad-hoc shared filesystem designed for high i/o on content servers. We’d been using ocfs2, but it’s slower than molasses and doesn’t scale n-ward as you increase the number of systems attached to a filesystem (due to the need for a journal for each node), whereas we can mount as many as we can support if our nfs server’s hardware will tolerate it.
Anyway, so the preference against ‘old stuff’ means that we’d shy away from the nfs v2 and v3 that are well-documented and stable on linux, and towards the hairy, thorny wilds of nfs 4. There’s a multitude of websites about nfs4, but they all seem to be incomplete or to apply to Solaris’s implementation, which is thorough and well-documented.
And don’t mistake me, nfs4 does run. It runs with TCP, it runs quickly, and we haven’t run into any issues save one — mapping users between servers. With nfsv3, between two servers it just works. With nfs4, you have to have a shared user authentication system and idmapd has to be running and configured correctly.
Idmapd is essentially undocumented on linux. Or, if there is documentation, I have not been able to find it. There is a man page giving basic options for the daemon. It *seems* that a configuration file syntax guide is living in /usr/share/doc/packages/nfsidmap/README, but I can’t verify that. The configuration file man page states that only Nobody-User and Nobody-Group are permitted in the [Mapping] area.
For what it’s worth, the following configuration is working for me on SLES11.
- Get some sort of shared authentication working between the servers. Since we’re a Novell shop, we’re doing it on eDirectory with the ‘linux user’ option enabled on the accounts, which assigns a uid and gid to the user.
- Set the Domain in idmapd.conf to the same value on each server. (Ours, predictably, is ‘tamu.edu’.)
- Add the mount point to /etc/exports on the server. Don’t forget that if you’re using nfs4 you need to bind the mount point on the server inside of the pseudofilesystem, and then set the pseudofs in the /etc/exports file as fsid=0. Start the server, and make sure that idmapd runs.
- On the client, set the domain in idmapd.conf to your domain, add the mount point to the fstab, and then start the nfs service. Double check to make sure that idmapd is running.
- I have root squashing enabled, but I haven’t set the nobody-user or nobody-group on the server. I don’t know what effect setting them would have… haven’t tried. Need to move on.
- SLES10sp2 doesn’t start idmapd when you start the nfs service; you need to set it to start manually.
- You could probably manually create users and manually set their uid/gid specifically. Again, I did not try this since we already had a solution in place to manage it. We just run our web servers and other clients as specific users that are defined in our ldap tree but disallowed logins. As a side bonus, our Novell logging infrastructure logs attempted logins/accesses for those user IDs.
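The "same domain on each server" step is the one that is easiest to get subtly wrong, so here is a minimal sketch of a check for it, in Python since that is one of our two house languages. It assumes idmapd.conf is the usual INI-style file with a Domain key under [General]. The function names and the sample file contents below are made up for illustration and are not part of any NFS tooling.

```python
import configparser


def idmap_domain(conf_text):
    """Return the Domain value from the [General] section, or None if absent."""
    # Assumption: idmapd.conf parses as plain INI. strict=False tolerates
    # duplicate keys, interpolation=None leaves '%' characters alone.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(conf_text)
    # configparser lowercases option names, so "Domain" is looked up as "domain".
    return parser.get("General", "domain", fallback=None)


def domains_match(server_conf, client_conf):
    """True only when both files set a Domain and the two values agree."""
    server = idmap_domain(server_conf)
    client = idmap_domain(client_conf)
    if server is None or client is None:
        return False
    # DNS-style domain names are compared case-insensitively here.
    return server.lower() == client.lower()


# Hypothetical file contents for illustration; read the real ones from
# /etc/idmapd.conf on each machine.
SERVER_CONF = """
[General]
Domain = tamu.edu
"""

CLIENT_CONF = """
# client copy
[General]
Domain = TAMU.EDU
"""
```

Calling domains_match(SERVER_CONF, CLIENT_CONF) on the two samples reports a match; a client file with a different or missing Domain does not.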
Notes:
Write a file as a user to the nfs mount on the client, and then check it on the server (and vice versa). It should show up as the same uid and gid — if you see an exceptionally long one, or get ‘nobody’, it’s not working for you. I’m sorry, but I don’t have time right now to hack around and try different ways to get it to work!
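If you would rather script that check than eyeball ls output, something like the following sketch would do it. The suspect IDs are my assumptions, not something from the idmapd docs: 65534 is the conventional ‘nobody’ ID on many Linux systems, and 4294967294 is the sort of exceptionally long ID that tends to show up when mapping fails. The function name is hypothetical.

```python
import os

# Assumed values: 65534 is the conventional "nobody" ID on many Linux systems,
# and 4294967294 (2**32 - 2) is a typical "exceptionally long" ID seen when a
# mapping fails. Adjust both for your own systems.
SUSPECT_IDS = frozenset({65534, 4294967294})


def ownership_looks_mapped(path, expected_uid, expected_gid, suspect_ids=SUSPECT_IDS):
    """Return True if path has the expected uid/gid and neither is a suspect ID."""
    st = os.stat(path)
    # A 'nobody' or overflow-looking ID means the mapping fell through.
    if st.st_uid in suspect_ids or st.st_gid in suspect_ids:
        return False
    return st.st_uid == expected_uid and st.st_gid == expected_gid
```

Write the test file as your service user on the client, then run the function against the same file on the server (and vice versa) with the uid and gid that your directory assigns to that user.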








