Midsummer’s day

June 24th, 2009 by admin

Today-ish is Midsummer’s day. Many people in the Americas and Europe gather around, lighting big bonfires and getting drunk together, possibly burning effigies of witches. In the UK, we used to follow this habit, until the church found it too much fun in the fifteenth century. The bonfire habit (from bonnefyre – a fire made only of clean bones) petered out during the nineteenth century too, though of course fun/backward places like Scotland and Cornwall still have a bit of fun.

England has, in the absence of any kind of celebration, chosen the traditional pastime of invoicing for this most festive solstice; June 24th is a Quarter Day.

Projection onto moving surfaces of any dimension and orientation

June 3rd, 2008 by admin

A few guys over at Carnegie Mellon have exploited projectors and light sensors to develop a binary sensing mechanism allowing sensors to detect their own location on a grid, based upon a regular coded grid transmitted from the projector. Once light sensors know their own location, multiple sensors can be used to describe the boundaries of a surface, allowing subsequently projected images to be scaled and cropped from the projector’s entire field down to the area containing a target surface.
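As a rough illustration of how a sensor might recover its position from a sequence of projected light/dark patterns, here is a minimal Python sketch. It assumes the grid is coded with a reflected binary (Gray) code, one pattern per bit, which is a common choice for this kind of structured light but is an assumption here, not a detail taken from the original work. The function names are made up for the example.

```python
def gray_encode(n: int) -> int:
    """Convert a plain binary number to its Gray-code equivalent."""
    return n ^ (n >> 1)


def gray_decode(g: int) -> int:
    """Convert a Gray-code value back to a plain binary number."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n


def readings_for_column(column: int, bits: int) -> list[int]:
    """Simulate what a sensor in `column` sees: one light (1) or dark (0)
    reading per projected pattern, most significant bit first."""
    g = gray_encode(column)
    return [(g >> i) & 1 for i in reversed(range(bits))]


def column_from_readings(readings: list[int]) -> int:
    """Recover the column index from the sensor's sequence of readings."""
    g = 0
    for bit in readings:
        g = (g << 1) | bit
    return gray_decode(g)
```

Ten patterns are enough to tell 1024 columns apart; running the same scheme again with horizontal stripes would give the row. With a Gray code, neighbouring columns differ in only one pattern, so a sensor sitting on a stripe boundary is off by at most one column.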

The idea can be extended – instead of using the presence or absence of light to describe sensor positions, varying frequencies of light can be used. This can be done in a manner invisible to the human eye. The result is the ability to project images onto moving surfaces without constant visible recalibration.

It all gets even more clever after that – see the full details at Johnny Chung Lee’s page.

Usage of XML considered wasteful

May 12th, 2008 by admin

Ever felt that XML seemed like a few too many bytes and a little bloated for the task at hand? Jeff Atwood has a great post on how XML has been misappropriated and suggests some alternatives. YAML seems particularly attractive, though the repetition of labels in regular structures could perhaps be abstracted a bit further away.

IR4QA Paper submitted

May 7th, 2008 by admin

Phew! Finally, our work’s been submitted for the IR4QA 2008 Workshop. I’ll get up some excerpts from our paper once it’s completed blind review. Currently, there’s also piles more material to cover on spoken language processing (interesting stuff) and mainframe computing (a little dry). Once the current glut of workload has been dealt with, content will resume!

Data mining blog

April 24th, 2008 by admin

Matthew Hurst runs a great data mining blog, regularly posting good content about the use and practice of data mining, taking consideration of usability as well as access to data. If you’re considering doing anything along the lines of a (much as I hate the word) mash-up, you could do a lot worse than to check out the developments and commentary on his blog. Aside from data mining, other topics covered include information retrieval and NLP (natural language processing, not that neuro-linguistic claptrap!).

Data analysis like you’ve never seen

April 14th, 2008 by admin

Hans Rosling delivers a fantastic talk dispelling myths about third-world and developing country debt and lifestyles over the past 60 years. The guy is really easy to listen to, and the entertaining talk moves at a great pace, covering lots of good material.

The really interesting point here is how someone outside of governments has managed to found an organisation aimed at pulling together all kinds of disparate data, and succeeded in producing something not only useful but compelling. You can see the work on this project at Gapminder.org.

Hans’ talk on TED (and his profile) - minimise everything, turn off your music, and watch for a little while:

Hans Rosling delivering a talk on poverty

» Link: Hans Rosling: New insights on poverty and life around the world

Meta Keywords plugin for Wordpress

April 12th, 2008 by admin

After setting up Wordpress yesterday and failing to find a decent meta keywords plugin, I decided to write my own. I found the following problems:

Quick Meta Keywords – produced no keywords tag with this version of Wordpress (2.5), just a meta keywords tag with an empty content parameter.

Jerome’s Keywords – same problem; wouldn’t work – outdated, perhaps?

Simple Tagging – only puts in keywords related to your tags, which might not match perfectly; doesn’t make any attempt to guess what words to use in the case of there being no tags, or only a few.

It seemed to me that a basic meta keywords tag should be trivial to include. According to word of mouth from Google, meta keyword tags (apart from being mainly unimportant) shouldn’t contain more than six or seven keyphrases (not keywords, keyphrases – more than one word can go there) separated by commas.

Sounds simple, right? This plugin reads your document’s content and title, strips out crap (formatting, useless words such as “the”, invisible characters) and then performs n-gram analysis to get an idea of term frequency. That’s a fancy way of saying that it finds the most oft-used words and phrases in your post. The top six (or seven, or even twenty – you can edit a constant at the top of the plugin to alter this) phrases are then plonked out into a nice meta keywords tag in your HTML header. Easy as pie!
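The steps above can be sketched roughly as follows. This is not the plugin's code (the plugin is a Wordpress plugin, so it is not Python at all); it is a minimal illustration assuming unigrams and bigrams only, a tiny stand-in stopword set, and hypothetical names such as `extract_keyphrases` and `MAX_KEYPHRASES`.

```python
import re
from collections import Counter

# Assumption: a tiny stand-in stopword list; a real one would be much longer.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it",
             "that", "for", "on", "with"}

# Hypothetical constant controlling how many keyphrases are emitted.
MAX_KEYPHRASES = 6


def extract_keyphrases(title, content, limit=MAX_KEYPHRASES):
    """Return the most frequent words and two-word phrases in a post."""
    # Strip anything that looks like an HTML tag, then lowercase.
    text = re.sub(r"<[^>]+>", " ", f"{title} {content}").lower()
    # English-only tokenising: accented characters will split words.
    words = re.findall(r"[a-z]+", text)

    counts = Counter()
    # Unigrams: every word that is not a stopword.
    counts.update(w for w in words if w not in STOPWORDS)
    # Bigrams: adjacent word pairs where neither word is a stopword.
    for first, second in zip(words, words[1:]):
        if first not in STOPWORDS and second not in STOPWORDS:
            counts[f"{first} {second}"] += 1

    return [phrase for phrase, _ in counts.most_common(limit)]


def meta_keywords_tag(phrases):
    """Format keyphrases as a comma-separated meta keywords tag."""
    return '<meta name="keywords" content="{}" />'.format(", ".join(phrases))
```

Calling `meta_keywords_tag(extract_keyphrases(title, content))` gives a single tag ready for the HTML header. Raw counts favour single words over phrases, so a real implementation would probably want to weight longer phrases somehow.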

Download Keywords by datadump

This plugin’s set up for English only; it’s kinda mean and won’t think twice before breaking up words including accented characters, for example. The included stopword file is based on that used in the CACM corpus by SMART, with some tweaks; this is an English-language stopword file. Drop me a comment if you want another language and I’ll gladly add it, especially Chinese!

There’s a conflict with Simple Tags at this version (0.2 – 0.1 was broken horribly).

In the future, I’d like to:

  • Add extra languages
  • Improve the stopword file
  • Include trigram support
  • Move configuration setup to a Wordpress admin panel
  • Add an option for default keywords to fall back to
  • Add an option for always-included keywords
  • Add an option for “boosted” keywords (easy to implement!)

Feedback appreciated – it’s probably broken somewhere ;)

Infochimps

April 11th, 2008 by admin

Free redistributable rich data sets at your ready, sir!

Infochimps.org

Infochimps houses one huge linked dataset, built from many disparate sources of data. You can browse individual tables of data by tag, date, their fields, or plain simple text search. The ability to find any kind of reference data containing a field of your specification should make it really easy to locate data relevant to any piece of work – your website, a survey or projection, or a private study. You can mine to your heart’s content here; there’s historical population data, organ transplant records, YouTube statistics, word lists – everything. Happy hunting. Props to flip.

Pipe viewer

April 11th, 2008 by admin

Monitor progress of previously opaque unix commands! If that didn’t get you filled with bile and steam, wait ’til you see this! “Additional support is available for multiple instances working in tandem, to given a visual indicator of relative throughput in a complex pipeline:”

Pipe viewer in action

Yep. You know what to do. Download pipe viewer.

pv – Pipe Viewer – is a terminal-based tool for monitoring the progress of data through a pipeline. It can be inserted into any normal pipeline between two processes to give a visual indication of how quickly data is passing through, how long it has taken, how near to completion it is, and an estimate of how long it will be until completion.

Of course, if you don’t feel like doing it that way, you can just script the function in your favourite shell.
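For a feel of what a home-made version involves, here is a minimal sketch in Python rather than shell. It only copies bytes from one stream to another and reports a running total and average rate; it makes no attempt at pv's percentage or time-remaining estimates. The function name and parameters are invented for the example.

```python
import sys
import time


def copy_with_progress(src, dst, report=sys.stderr, chunk_size=64 * 1024):
    """Copy bytes from src to dst, writing a running byte count and
    average throughput to `report`. Returns the total bytes copied."""
    total = 0
    start = time.monotonic()
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(chunk)
        total += len(chunk)
        # Guard against a zero elapsed time on very fast first reads.
        elapsed = max(time.monotonic() - start, 1e-9)
        # Carriage return keeps the status on a single terminal line.
        report.write(f"\r{total} bytes, {total / elapsed:.0f} B/s")
    dst.flush()
    report.write("\n")
    return total
```

Saved as a script that calls `copy_with_progress(sys.stdin.buffer, sys.stdout.buffer)`, this could sit in the middle of a pipeline: the data passes through on stdout while the status line goes to stderr.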

Methods for getting through large projects

April 11th, 2008 by admin

Every large project becomes painful sooner or later, no matter how well managed.

The Martini Method

Shane Lindsay outlines the Martini Method for getting through things, and it certainly sounds less painful than most, though I’d say it pretty much relies on just a carrot. One particularly effective technique I’ve tried and tested many times is to just spend 10 minutes getting a feel for a daunting task; the 10 often turns into more, and even if it doesn’t, you’ll come away with a better idea of what needs to be done, without spending more time on an excruciatingly unpleasant project.
