Archive for the 'Data Superabundance' category

Feldian Dark Matter and Superabundant data…

NickN| November 29, 2007 5:16 pm

Back in July, Brad Feld wrote a post titled "The Dark Matter of the Blogosphere".  I’m not sure if he coined the term or not, but I like its meaning.

For those of you that are less of a physics nerd than me, dark matter is something astrophysicists have been struggling with for a while.  Simply put, the Universe doesn’t have enough visible stuff in it to work the way it does.  The most viable explanation is that there is a _lot_ of stuff we can’t see or detect easily, a.k.a. Dark Matter.

In the case of the blogosphere, Brad was referring specifically to reader comments.  There’s a huge volume of user generated content out there in the form of blog comments, and for the most part it is unsearchable and effectively invisible.  Folks like Disqus and Intense Debate are working hard to resolve this.

But I think the concept of Dark Matter is very applicable to data in general. 

Think about all of the data in your life.  How much useful information do you have that is effectively hidden and invisible?  This is as true for an individual as it is for a corporation.  Some of this information is hidden by virtue of being hard to search or hard to access… and some is hidden because it isn’t explicit — it’s "implied" by the way things have been collected, organized, or used.

So let's take a quick look at each case…

Hard to search:

The original idea for disruptorMonkey stemmed from a personal problem…  Like many of you, I have the "big box of crap" that I’ve accumulated from many different jobs.  It includes CD-ROMs of data, printed stuff, handwritten notes and numerous other treasures.  About 18 months ago, I needed to put together some sales training materials for someone.  I dug into the big box and it took me 4+ days to organize, recreate and assemble what I needed.  It was a nightmare.  Incensed at the stupidity of the process, I started looking for a better way, which quickly led me to set up a wiki.  Wikis can be great, but they’re mostly hopeless with existing data unless you reformat it for the wiki… which is a huge pain.

The underlying issue was the fact that the data was hard to search, which made it difficult to organize and repurpose.

Hard to access:

Last week I was talking to a banker, who happened to have majored in IT systems.  I was explaining some of what we do, and he started telling me about some of his data woes.  The biggest one stemmed from the fact that some banking systems are built on fairly old databases.  You’ve probably seen the horrible green-screen terminal-window interfaces in use at your local bank.  These UIs have zero flexibility and are the result of many years of development, much of it seemingly without input from the people using the product.

Even though the whole thing is just a database, he has no way whatsoever to run unique queries.  For example, he would love to be able to search for customers with a $5,000-$10,000 personal line of credit.  The data he needs is in the database, but he has no way to access it, so from a practical perspective it doesn’t exist in any meaningful way.
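To show how small the missing piece is, here is a minimal sketch of the query he described, using Python's built-in `sqlite3` module. The post says nothing about the bank's real schema, so the `customers` table, its `credit_line` column, the helper `customers_with_credit_line` and the sample rows are all hypothetical.

```python
import sqlite3

# Hypothetical schema: the bank's actual tables and columns are unknown,
# so "customers" and "credit_line" are illustrative assumptions.
def customers_with_credit_line(conn, low=5_000, high=10_000):
    """Return (name, credit_line) rows whose line of credit is in [low, high]."""
    cur = conn.execute(
        "SELECT name, credit_line FROM customers "
        "WHERE credit_line BETWEEN ? AND ? ORDER BY credit_line",
        (low, high),
    )
    return cur.fetchall()

# Usage with a throwaway in-memory database and made-up sample rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, credit_line INTEGER)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [("Alice", 2_500), ("Bob", 5_000), ("Carol", 7_500), ("Dave", 12_000)],
)
print(customers_with_credit_line(conn))  # [('Bob', 5000), ('Carol', 7500)]
```

SQL's `BETWEEN` is inclusive at both ends, so a customer with exactly $5,000 is included.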

Implied Data:

The discussion I had with Brad before Thanksgiving was about how Exchange Server contains a lot of interesting "implied" data, above and beyond the obvious email & social network info.  Your Outlook/Exchange account says an awful lot about you and the things you're interested in… along with who you talk to and what you talk about.

That’s not data that is readily exposed in any useful way, although companies like Xobni are making some headway on that front.

All three of these scenarios are about "dark matter" data.  There’s a lot of incredibly important information out there, waiting to be mined, but today’s tools mostly can’t see or use it.

One of our longer-term goals at disruptorMonkey is to build a tool that not only captures all that dark matter but also puts it to work and makes it useful.

There’s much to do, but we’re excited about the progress we’ve made so far…

 

Information R/evolution…

NickN| October 18, 2007 12:14 pm

I guess I’m late to the blog-party on this one, so you may already have seen it.  But this video is a fantastic backgrounder on much of how we see the world of data and information at disruptorMonkey.

The video was created by Michael Wesch, an Assistant Professor of Cultural Anthropology at Kansas State University.  You can find his spot on the web here.

He also created the excellent "The Machine is Us/ing Us" which you can see here, along with a bunch of other thought provoking videos about the impact of information on our lives.

One of Prof. Wesch’s other videos has a great quote by Marshall McLuhan from 1967:

"Today’s child is bewildered when he enters the 19th century environment that still characterizes the educational establishment where information is scarce but ordered and structured by fragmented, classified patterns, subjects and schedules."

1967!!!  And look at us now.

Thanks to Zack for the heads up — I’m behind on my blog reading and had not seen this yet.

Where’d my Secretary go?

NickN| July 16, 2007 10:25 am

Like any entrepreneur, I am often far too close to what I’m working on.  The most common result is the glazed/confused expression on people’s faces when I explain what I’m up to.  Fortunately (IMO), my co-founders seem to share my taste in Kool-Aid, so within the company at least we all more or less make sense to each other.  But third parties take a bit more work…

Our rather excellent Advisory Board has consistently (and appropriately) given me a hard time about this as I write and re-write our business plan and pitch.  Steve and Sam have both been especially merciless — thank you gentlemen!

I feel as though we’ve finally reached a point where we can explain ourselves to normal folks without inducing too much glazing, and the current incarnation of the business plan finally reflects that.

Having written a fair volume of copy, plans and other documents in my time, I’m well aware that heavily revised documents tend to suffer from typo creep… Things slip in and don’t get caught by the author.  On Friday, it was time to call in the ultimate proofreader, my Mum…  She’s reasonably computer literate and uses email, eBay and a variety of Microsoft Office products, but she’s also been a lifelong admin person and is excellent with spelling and grammar.

Apart from some funny Americanisms that are now firmly part of my vocabulary, the document passed with mostly flying colors.  But what was interesting was the conversation that ensued afterwards.

First of all, the plan made sense to her.  Now of course I’ve been on about this for a year or so now, so she had some background, but it was good to hear that the plan seemed to be in readable English.

But then she shared some recent experience that made a lot of sense within the context of what we do.  The gist of it was this (Mum: please excuse the paraphrasing).  A non-profit she had worked for had gained a computer for every member of staff.  While everyone was glad to have a computer, they had an organizational nightmare on their hands because everything was stored locally and everyone had their own way of organizing their data, ranging from a fairly sophisticated hierarchy of folders to "saving everything in C".  This led to some real problems when they needed to collaborate, or when they needed to access data without the author being present.

Now in the not-too-distant past, administrators tended to be the creators of documents within companies, and they certainly had control of most if not all of the information within a company.  Managers would dictate letters etc. to their Secretaries and Administrators, who would then create the documents, go through an approval cycle, send the document to its destination and file a copy.

So the group that created the documents for the whole company also filed and organized them.  But they did so as a unified team with a common system and set of goals.

Since administrators had experience with this sort of thing, they could quickly create efficient filing systems for all the documents.  Once a document was created and stored, they’d act as gatekeepers.  Anyone who needed the document would just ask an administrator.  And it was usually in one of just a few filing cabinets.

Since the authors controlled the organizational system, it was a nice closed loop.  The volume of data wasn’t that large and nobody really minded having to work with an admin, because it sure beat learning to type and figuring out how to file things…

Fast forward to today.  Almost no-one has a Secretary (in the traditional sense) any more and everyone is an author.  The volume of data has exploded, but company-wide, there are no gatekeepers to turn to.  There is no common system based on common goals (unless you’ve spent a fortune on a Content Management System).  And a handful of filing cabinets just isn’t going to get the job done anymore.

So one of the side-effects of the computer is that we’ve all become our own admins. 

If you’re like me, you have a way you like to organize your files, and that method works for you.  But I can state with some certainty that my system would not work for 90% of the folks out there.  So if I want to collaborate with someone else with today’s tools, I have four choices:

  1. Force everyone else to use my system
  2. Suck it up and use someone else’s system
  3. Agree on a compromise system that all parties hate to some degree
  4. Give up on organization and do it all by email — let the inbox sort it all out

Now (1) is great for the ego — flex those CEO muscles!  But it leads to a less than ideal solution for everyone involved except me.  (2) is bad for me, and bad for anyone else that wasn’t the creator of the system.  (3) sucks for everyone equally — this is a win because no-one is happy and we have a common drop in productivity :-).  Then there’s (4), and plenty of companies work that way.

We think there’s a much better way to do this: let everyone organize their data however they want, and when two or more folks have to collaborate, don’t force them all onto the same system.

And it’s nice to know that even your Mum can find a use for the product you’re building…

Data Superabundance part 2: The Long Tail of Data

NickN| May 5, 2007 10:02 pm

Time for a longer discussion of our thinking about data superabundance, data management and what we’re up to…

Findability is driven by the frequency with which data gets used.  The more you use something, the easier it becomes to find.  Even sophisticated search engines like Google follow this model.  The legendary PageRank algorithm primarily looks at who links to you.  The more popular a site is based on links, the more findable Google makes it.  So over time, findable data becomes more findable (or, if you prefer, "the rich get richer"; thanks Todd!).
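The link-counting idea can be illustrated with a textbook, simplified form of PageRank computed by power iteration with the commonly cited damping factor of 0.85. This is a sketch, not Google's production ranking; the function name `pagerank` and the four-page "web" are hypothetical examples.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration (textbook form, not Google's).

    links maps each page to the list of pages it links to. Pages with no
    outgoing links spread their rank evenly across all pages.
    """
    pages = sorted(set(links) | {p for targets in links.values() for p in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform distribution
    for _ in range(iterations):
        # Every page gets a small baseline share (the "random jump" term).
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank, split evenly, to the pages it links to.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling page: distribute its rank uniformly.
                for t in pages:
                    new_rank[t] += damping * rank[page] / n
        rank = new_rank
    return rank

# Hypothetical four-page web: "hub" is linked to by every other page.
web = {"a": ["hub"], "b": ["hub", "a"], "c": ["hub"], "hub": ["a"]}
ranks = pagerank(web)
for page, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(f"{page}: {score:.3f}")
```

The most-linked page ends up ranked highest, and because rank flows along links, pages that are already well linked keep gathering more of it.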

And don’t get me wrong, Google works great.

But here’s the thing.  Thanks to the crazy year-over-year increase in data (zettabytes by 2010), more and more data is being used less and less.  "Huh?" I hear you cry…

Look at it this way.  The amount of data that any individual can use frequently is pretty fixed — there are only so many hours in the day.  Time for a flashback to High School with a scary Venn diagram:

[Figure: Venn diagram (past). Green dot: data you use frequently; red circle: all the data you ever use.]
So the green dot represents the amount of data you use frequently.  The red circle is all the data you ever use.  Now let the calendar roll forward a bit.  The overall amount of data has increased significantly, but the amount of data you can use frequently is about the same.  And that picture looks like this:

[Figure: Venn diagram (future)]
So as a percentage of all the data, the stuff you use frequently is now a much tinier piece.  In other words, more data is now used less.  As data superabundance continues its merry march, frequently used data will continue to be an ever smaller piece of all the data that exists.  And that’s going to cause all kinds of problems…
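The arithmetic behind the two diagrams is a fixed numerator over a growing denominator. Here is a minimal sketch; the post gives no actual volumes, so the 1 GB of frequently used data, the 10x growth steps and the helper name `frequent_share` are hypothetical numbers chosen only to show the trend.

```python
# Illustrative numbers only: the post gives no figures for data volumes.
FREQUENT_GB = 1.0  # data one person can use frequently; assumed fixed over time

def frequent_share(total_gb, frequent_gb=FREQUENT_GB):
    """Fraction of all data that falls in the frequently used set."""
    return frequent_gb / total_gb

# Assume, hypothetically, that total data grows 10x between snapshots.
for label, total in [("then", 10.0), ("later", 100.0), ("later still", 1000.0)]:
    print(f"{label:>11}: {frequent_share(total):.1%} of all data is used frequently")
```

With these assumptions the frequently used share drops from 10% to 1% to 0.1%, even though the amount you actually use never changes.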

And now for part 2 of PTOTD (pet theory of the day)…  The long tail.

If you took an inventory of all the data within a company, measured how frequently each piece is used, ranked the pieces in order and plotted the result, I think you’d see some kind of power curve, also known as a "Long Tail" graph.  If you haven’t read Chris Anderson’s excellent book The Long Tail, you really should.

So what?  Well here’s a graph to stare at:

[Figure: long tail graph of data usage frequency by rank]

Chris’s book focuses on the long tail for goods.  Most retailers focus on the head of the curve: they stock only the most popular items and they sell a lot of each one.  He makes a great argument for stocking a huge number of less popular items and selling just a few of each one.

The big guns in data management (ECM, CMS, data warehousing, etc.) are like those retailers.  They focus on the head of the graph, the top 5% or so of a company’s data.  But 80-85% of a corporation’s data is in the "Long Tail" part of the chart, and that data is hard to search and mostly unmanaged.
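To get a feel for the shape of such a curve, here is a sketch that assumes usage follows a Zipf-style power law (usage of the item at rank r proportional to 1/r). The exponent, the 10,000-item inventory and the helper names `zipf_usage` and `head_tail_split` are hypothetical; the post only says "some kind of" power curve, and this sketch is not meant to reproduce the 80-85% figure above.

```python
def zipf_usage(n_items, s=1.0):
    """Usage proportional to 1 / rank**s for ranks 1..n_items (a power curve)."""
    return [1.0 / rank**s for rank in range(1, n_items + 1)]

def head_tail_split(usage, head_fraction=0.05):
    """Split ranked usage at head_fraction of items; return (head, tail) usage shares."""
    usage = sorted(usage, reverse=True)
    cutoff = max(1, int(len(usage) * head_fraction))  # number of items in the head
    total = sum(usage)
    head_share = sum(usage[:cutoff]) / total
    return head_share, 1.0 - head_share

usage = zipf_usage(10_000)  # hypothetical: 10,000 pieces of data, exponent s = 1
head, tail = head_tail_split(usage)
print(f"top 5% of items: {head:.0%} of uses; remaining 95% of items: {tail:.0%} of uses")
```

Under these assumptions the head soaks up most of the usage, while the remaining 95% of items still account for a substantial share of it, spread thinly across thousands of rarely touched pieces.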

Now maybe your inner skeptic is thinking that less frequently used data probably doesn’t have much value.  But there’s not much correlation between usage and value.  Less frequently used data just tends to have variable value driven by circumstance — it’s data that may not be needed today but will be vital tomorrow.

disruptorMonkey is building tools to manage the "Long Tail" of data. 

Based on the response from those that get what we’re doing, this is going to be an interesting ride!

Considering Consumer, SMB & Enterprise markets??? Start thinking about the Miniprise…

NickN| April 4, 2007 2:09 pm

I had a meeting this morning with the wonderful folks at Square 1 Bank.  We’re truly fortunate to have a group like this right in our backyard, but that’s another topic entirely.  On the drive home, I had a number of thoughts that were a direct result of the conversation with Peter and Adam (thanks guys!!).

So as every start-up guy knows, there are three markets for software: Consumer, SMB and Enterprise.  The consumer market is pretty obvious and clearly defined.  Enterprise is too.  It has always seemed to me that the SMB market is most clearly defined as everything in between, i.e., neither consumer nor enterprise.

Each market has its pros and cons.  As we build our company and pitch potential investors, we are inevitably asked whether we’ll be chasing the enterprise software market.  So far, I’ve said no.  But I think my reasons for saying no have more to do with the negative connotations of the term “enterprise” than with the target audience and type of application.

Here are some of the things I associate with the concept of “enterprise software”:

  • Large companies
  • Server-based or SaaS solutions
  • Extensive customization
  • Big scale software that handles lots and lots of data

And more negatively:

  • Huge company with a large and defensive IT department
  • Long sales cycle that requires an active sales force
  • Huge prior investment in entrenched players (Oracle, SAP etc)

What occurred to me today is that many small companies now deal with huge volumes of information and data.

Well Duh!!

But bear with me… Think of this volume of data in terms of the concept of enterprise software.  The volume of data that a small company deals with today is broadly comparable with the volume of data that an “enterprise” was dealing with ten years ago. 

In other words, Data Superabundance has spawned the Miniprise.

A Miniprise is a non-enterprise company that has enterprise-like needs.

This collection of small companies typically has little investment in any of the entrenched players.  Their IT departments are usually running a patchwork of odds and ends that were built on the fly – these systems get things done but there is plenty of room for improvement.  What’s more, small company IT departments are far more open to new solutions than a traditional enterprise IT department.  Last but not least, these are agile businesses that make decisions quickly, so the sales cycle ought to be shorter.

Of course, software for the Miniprise won’t command the same kind of price as traditional enterprise software.  But thanks to SaaS, enterprise software doesn’t usually command that kind of price these days either.

Providing tools with some enterprise-like traits to non-enterprise (or mini-enterprise) customers is a big opportunity.  And that’s the market I’m calling the Miniprise.