New interview: "Improvements in hardware and networking technology facilitate the software solutions to the Big Data challenges but they are merely a convenience rather than the driving force"

Wednesday, October 14, 2015

RETHINK big partners Gina Alioto and Adrián Cristal from BSC, Stefan Manegold from CWI and Dimitra Tsaoussis from EPFL had a chance to interview Simon Riggs, founder and CTO of 2ndQuadrant, Álvaro Agea Herradón, CTO of Novelti, and Anastasia Ailamaki, CEO and Co-Founder of RAW Labs, respectively. These companies all work with Big Data but offer very different services.

2ndQuadrant offers deployment and support of PostgreSQL databases, along with extensions that optimize power usage, emulate other systems, increase usability, etc., aimed at a broader, mass market. Their clients have a “mid-size data problem - in other words, we target companies with Terabytes of data and not Petabytes”. As a result, their biggest Big Data management issue is performing some kind of analytics inside the database. Another way to describe this is as “competing requirements; model discovery or ‘learning’ and applying that model - model deployment - or applying those learnings”.

2ndQuadrant’s current scalability roadmap is hardware-independent; however, if particular hardware offered superior performance, they would like to take advantage of it. This could mean using non-volatile RAM, whereas the current product works best on solid state drives. In fact, Mr. Riggs mentioned that “in the ENVELOPE Project Proposal 2ndQuadrant proposes eliminating I/O and accessing memory directly with the possibility of eventually eliminating layers of abstraction”. When asked about the company’s hardware and network optimization strategy, Mr. Riggs said they just adapt as required. As for what he believes could directly increase 2ndQuadrant’s competitiveness in the next few years, he “would like to see application requests with precise metadata, e.g. I want this quickly, accuracy is not as important, I will ask this 100 times a day, I will run this every night at 12:00”. He would also like to see non-volatile RAM access, as a secondary storage type, elevated to a first-class interface feature of the microprocessor, believing that “non-volatile memory direct connection to the processors or memory channel is a great thing”.

Novelti, on the other hand, is a Big Data service / SaaS provider focused on stream data processing and analytics, particularly sensor data in the context of the Internet of Things. Their product is available on the cloud (Amazon) and as Software as a Service. Mr. Agea said that “for the stream processing challenges addressed, Hadoop or Spark are not suitable as their architecture and functionality focus on batch processing of historic / stored data rather than online processing of streaming sensor data. Novelti's solution focuses on exploiting main memory and in-memory technologies for efficient window (-query) processing, as well as process scalability using scale-out supported by virtualization”. He also mentioned that their core unique selling point is “the ability to automatically extract signal characteristics and patterns without requiring any a-priori knowledge about the data. This makes it considerably different from alternative techniques that are largely based in machine learning”. The main hardware requirement for their product is a fast network.
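To illustrate what in-memory window processing over a sensor stream can look like, here is a minimal Python sketch of a time-based sliding-window mean. It is not Novelti's implementation: the function name `sliding_window_mean`, the `(timestamp, value)` reading format and the sample data are all assumptions made for this example.

```python
from collections import deque
from typing import Iterable, Iterator, Tuple


def sliding_window_mean(
    readings: Iterable[Tuple[float, float]], window_seconds: float
) -> Iterator[Tuple[float, float]]:
    """Yield (timestamp, mean over the trailing window) for each reading.

    Hypothetical helper: assumes readings arrive in non-decreasing
    timestamp order and that window_seconds is positive.
    """
    window = deque()  # (timestamp, value) pairs currently inside the window
    total = 0.0       # running sum, so each update is O(1) amortized
    for ts, value in readings:
        window.append((ts, value))
        total += value
        # Evict readings that fell out of the interval (ts - window, ts].
        while window[0][0] <= ts - window_seconds:
            _, old_value = window.popleft()
            total -= old_value
        yield ts, total / len(window)


# Made-up sample stream: (seconds, sensor value)
stream = [(0.0, 10.0), (1.0, 20.0), (2.0, 30.0), (5.0, 40.0)]
results = list(sliding_window_mean(stream, window_seconds=3.0))
```

Only the readings inside the current window are kept in memory, and each reading is processed as it arrives, which is the contrast with batch processing of stored data drawn in the quote above.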

Mr. Riggs and Mr. Agea both gave the same answer when asked whether they thought hardware or network optimization could solve the majority of their Big Data problems: while it could help, it wouldn’t be enough. Mr. Agea pointed out that their “major challenges, streaming data integration, stream data analysis, and end-user friendly software to perform these, require novel algorithmic and software solutions in the first place. While improvements in hardware and networking technology facilitate the software solutions to these challenges, they are merely a convenience rather than the driving force”.

Finally, RAW Labs, an EPFL spin-off company, offers RAW, open source software that facilitates ad-hoc database queries on top of heterogeneous raw data. In Ms. Ailamaki’s words, “through the same SQL dashboard you can query CSV, JSON, XML, and other types of data and view results in many different ways — hierarchical, tabular, graphical, etc.”. The advantage of this approach is that you do not need to load or transform the data, or use different products for different types of data: RAW queries it in situ and as is. When asked about scalability of their solution, Ms. Ailamaki said “RAW leverages innovative technology by enabling access, exploration, manipulation, presentation, and otherwise management of any dataset in its original raw format and location (in-situ). […] As RAW db only uses data the user needs, it is fully scalable to data sizes and, more importantly, adapts to user queries”.
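The idea of querying files in their original format, without a separate load step, can be sketched in a few lines of Python. This is only an illustration of the general in-situ approach and says nothing about how RAW itself works: the functions `scan_csv`, `scan_json` and `query`, the file names and the sample data are all hypothetical.

```python
import csv
import json
import tempfile
from pathlib import Path
from typing import Callable, Dict, Iterator, List


def scan_csv(path: Path) -> Iterator[Dict[str, str]]:
    """Read rows straight from the CSV file at query time."""
    with path.open(newline="") as f:
        yield from csv.DictReader(f)


def scan_json(path: Path) -> Iterator[dict]:
    """Read records straight from a JSON file holding a list of objects."""
    with path.open() as f:
        yield from json.load(f)


# Hypothetical registry: one scanner per raw format, chosen by file suffix.
SCANNERS = {".csv": scan_csv, ".json": scan_json}


def query(
    path: Path, predicate: Callable[[dict], bool], columns: List[str]
) -> Iterator[dict]:
    """Filter and project rows of a raw file without loading it into a database."""
    for row in SCANNERS[path.suffix](path):
        if predicate(row):
            yield {c: row[c] for c in columns}


# Made-up sample data in two different raw formats.
with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "sensors.csv"
    csv_path.write_text("id,temp\na,21.5\nb,30.1\n")
    json_path = Path(tmp) / "sensors.json"
    json_path.write_text(json.dumps([{"id": "c", "temp": 35.0}, {"id": "d", "temp": 18.2}]))

    # The same predicate and projection run over both files as they are.
    hot = [
        row["id"]
        for path in (csv_path, json_path)
        for row in query(path, lambda r: float(r["temp"]) > 25, ["id"])
    ]
```

Here one query interface covers both formats, and only the requested column of the matching rows is materialized. A real system would add a query language, an optimizer and far more careful parsing.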

 
