The third post in the series on Technology Trends discusses how data volumes will explode and organizations will not only need to store and classify data, but also extract meaning from it.
- As systems are automated, two things happen: more objects take on an existence in a database, and more electronic transactions involving those objects take place. The result is simply more data.
- User Generated Content is creating more data. Social interactions that once took place with spoken words are now taking place online, and a post made online will be backed up, duplicated, re-quoted or repeated in a bidirectional communication.
- A certain proportion of this data is worthless: idle chat, astro-turfing, shilling and spam. Perhaps the larger proportion will be valuable to somebody, and a smaller proportion will be valuable to everybody. However, this value judgment will be difficult to make without the context of the consumer of the data.
- The data is disposable, volatile and constantly changing direction, value and intensity
- New technologies will emerge to map rather than index this data, and will do so dynamically. They will discern value and allow the true knowledge to be sifted out and retained or summarized.
- Search will take on a whole new meaning, no longer based on keywords and predictions. It will rely on virtual local and peer-to-peer models where there is no central index, simply a federation of indexes.
- The semantic web, where all data carries metadata to define it, will allow the web to self-index.
- The opportunity is to help reduce the amount of data that needs to be processed, to retain and make accessible the data that is there, to sift the knowledge from the noise.
- Data put into context is knowledge. Systems that can create an ontology by which data can be classified will be sought after. The classic challenge with ontologies is sharing them: an ontology is just a way of carving up reality into chunks (“concepts”) that one deems to be distinct, and someone else may carve up the world differently (the oft-cited example being the many Inuit words for snow). Maybe our challenge is to be flexible: to work with whatever ontology a customer gives us, to “merge” two similar ontologies, and so on. Coming up with one uber-ontology (à la DMOZ) that keeps everyone happy may just be a pleasant fantasy.
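To make the idea of "merging" two similar ontologies a little more concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions, not a description of any real product: an ontology is modeled as a plain dictionary from a concept to its broader concepts, and the `merge_ontologies` function, the `same_as` mapping and the snow-related concept names are all hypothetical. In practice the hard part is deciding which concepts are equivalent; this sketch assumes someone has already supplied that mapping.

```python
from typing import Dict, Set

# Assumption: an ontology is just a mapping from a concept name
# to the set of broader (parent) concepts it falls under.
Ontology = Dict[str, Set[str]]


def merge_ontologies(a: Ontology, b: Ontology, same_as: Dict[str, str]) -> Ontology:
    """Return a new ontology containing a plus b.

    same_as (hypothetical) maps a concept name used in b to the
    equivalent concept name used in a. Neither input is modified.
    """
    def canon(name: str) -> str:
        # Rename b's concept to a's vocabulary when an equivalence is known.
        return same_as.get(name, name)

    # Copy a so the caller's ontology is left untouched.
    merged: Ontology = {concept: set(parents) for concept, parents in a.items()}
    for concept, parents in b.items():
        merged.setdefault(canon(concept), set()).update(canon(p) for p in parents)
    return merged


# Illustrative data only: our ontology has one concept for snow,
# while the customer's ontology carves it up more finely.
ours: Ontology = {"snow": {"precipitation"}, "precipitation": set()}
customer: Ontology = {
    "powder": {"snowfall"},
    "slush": {"snowfall"},
    "snowfall": {"weather"},
}

merged = merge_ontologies(ours, customer, same_as={"snowfall": "snow"})
for concept in sorted(merged):
    print(concept, "->", sorted(merged[concept]))
```

The customer's finer-grained concepts survive the merge and attach to our existing "snow" concept, which is the kind of flexibility the point above argues for, as opposed to forcing everyone into a single uber-ontology.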
Making Sense of Hadoop - Its fit with Data Warehousing Solutions, a PowerPoint presentation by Colin White of BI Research and Shawn Kung of Aster Data
Time Challenges – Challenging Times for Future Information Search a paper by Thomas Mestl, Olga Cerrato, Jon Ølnes, Per Myrseth, Inger-Mette Gustavsen of the DNV, Research & Innovation, Norway
The Social Data Revolution(s) by Andreas Weigend in The Harvard Business Blog (May 2009)
The Exploding Digital Universe by William M. Bulkeley at WSJ Blogs (May 2009)
Linked Open Data by Tim Berners-Lee at W3.org (2008)
Analysis of Large Data Volumes: Challenges and Solutions by Joseph Rozenfeld at Information Management Direct (Oct 2007)
Gam Dias is VP of Product Management and Research at Overtone Inc. and was previously Director of Product Management at Hyperion Solutions and Brio Software. Co-authors of this article are Simon Handley, Tim Mueller, Eric Scott and Kryztof Urban.