A conversation with Orri Erling about Big Data

Thursday, July 9, 2015

Our partner Prof. Dr. Stefan Manegold, Senior Researcher and Database Architectures Group Leader at CWI, interviewed Orri Erling, Virtuoso Program Manager and Lead Developer at OpenLink Software, about how he sees the Big Data landscape from the perspective of his expertise in databases, algorithms, and distributed computing, and about where European research should go.

Erling answered questions about user challenges; which technology he would like to have (network or power efficiency); whether there are pressures to adapt business models based on Big Data; and what Europe should do to have a tangible impact in the next 5 years.

The conversation, which Orri Erling wrote up on his weblog, sheds some light on the European Big Data landscape. For example, Erling said:

“The transition from capex to opex may be approaching maturity, as there have been workable cloud configurations for the past couple of years. The EC2 from way back, with at best a 4 core 16G VM and a horrible network for $2/hr, is long gone. It remains the case that 4 months of 24x7 rent in the cloud equals the purchase price of physical hardware. So, for this to be economical long-term at scale, the average utilization should be about 10% of the peak, and peaks should not be on for more than 10% of the time.

So, database software should be rented by the hour. A 100-150% markup for the $2.80 a large EC2 instance costs would be reasonable. Consider that 70% of the cost in TPC benchmarks is database software.

There will be different pricing models combining different up-front and per-usage costs, just as there are for clouds now. If the platform business goes that way and the market accepts this, then systems software will follow. Price/performance quotes should probably be expressed as speed/price/hour instead of speed/price.

The above is rather uncontroversial but there is no harm restating these facts. Reinforce often.”
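
To make the arithmetic in this passage concrete, here is a minimal Python sketch. It uses only the figures Erling quotes: $2.80 per hour for a large EC2 instance, a 100-150% markup, and four months of 24x7 rent matching the purchase price of hardware. The 730-hour month, the 36-month hardware lifetime, and all function names are our own assumptions for illustration and are not part of the interview.

```python
# Assumption: an average month has 365 * 24 / 12 = 730 hours.
HOURS_PER_MONTH = 730


def software_rate_range(instance_rate, low_markup=1.00, high_markup=1.50):
    """Hourly software price for a 100-150% markup on the instance rate."""
    return instance_rate * low_markup, instance_rate * high_markup


def implied_hardware_price(instance_rate, breakeven_months=4):
    """Purchase price implied by 'N months of 24x7 rent equals the hardware'."""
    return instance_rate * HOURS_PER_MONTH * breakeven_months


def breakeven_utilization(breakeven_months=4, hardware_lifetime_months=36):
    """Fraction of the hardware lifetime below which renting is cheaper.

    Assumption: the hardware is written off after `hardware_lifetime_months`,
    and power, hosting, and staff costs are ignored.
    """
    return breakeven_months / hardware_lifetime_months


if __name__ == "__main__":
    rate = 2.80  # USD per hour, the figure quoted in the interview
    low, high = software_rate_range(rate)
    print(f"software rent: ${low:.2f} to ${high:.2f} per hour")
    print(f"implied hardware price: ${implied_hardware_price(rate):,.0f}")
    print(f"break-even utilization: {breakeven_utilization():.0%}")
```

Under these assumptions, the database software would rent for $2.80 to $4.20 per hour on top of the instance price, and renting would beat buying only when the machine is needed less than roughly 11% of the time, which is in line with the "about 10%" figure in the quote.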

Asked whether network or power efficiency is the technology he would like to have, Erling stated the following:

“There are no easy solutions. We have built scale-out conscious, vectorized extensions to SQL procedures where one can express complex parallel, distributed flows, but people do not use or understand these. These are very useful, even indispensable, but only on the inside, not as a programmer-facing construct. MapReduce and BSP are the limit of what a development culture will absorb. MapReduce and BSP do not hide the fact of distributed processing. What about things that do? Parallel, partitioned extensions to Fortran arrays? Functional languages? I think that all the obvious aids to parallel/distributed programming have been conceived of. No silver bullet; just hard work. And above all the discernment of what paradigm fits what problem. Since these are always changing, there is no finite set of rules, and no substitute for understanding and insight, and the latter are vanishingly scarce. "Paradigmatism," i.e., the belief that one particular programming model is a panacea outside of its original niche, is a common source of complexity and inefficiency. This is a common form of enthusiastic naïveté”.

This interview is one of several that the RETHINK big partners are conducting with the main stakeholders in order to identify the industry coordination points that will maximize European competitiveness in the processing and analysis of Big Data over the next 10 years.

The complete interview is available at this link: http://www.openlinksw.com/dataspace/doc/oerling/weblog/Orri%20Erling%27s%20Blog/1854