Big Time Series Analysis with JuliaDB

Dr. Josh Day of Julia Computing takes a look into the multi-indexed database of the future

The next generation of data analysis requires the next generation of tools. The most popular open-source packages for data analysis (Python’s pandas and various R packages) are designed to work with small files of basic data types, but ‘small’ and ‘basic’ do not describe the data landscape of the future. The amount of data in the world is growing exponentially, and as The Economist observes, it’s changing as it grows:

“The quality of data has changed, too. They are no longer mainly stocks of digital information – databases of names and other well-defined personal data, such as age, sex, and income. The new economy is more about analyzing rapid realtime flows of often unstructured data: the streams of photos and videos generated by users of social networks, the reams of information produced by commuters on their way to work, the flood of data from hundreds of sensors in a jet engine.”

Current-generation tools therefore face several difficulties in analyzing next-generation data. The first is scale. Distributed computing systems such as Hadoop and Spark can provide it, but they sacrifice the ease of use that makes Python and R tools attractive.

Scaling an analysis also adds costs in the form of gluing together tools that may not support the same data types or operations (e.g., Spark DataFrame to pandas DataFrame to NumPy array to scikit-learn model). Another issue for current databases is storing nonstandard data types. A database can sometimes work around unsupported types (e.g., units and currencies) by attaching metadata to a field, but the same approach is harder to apply to more complicated data such as images and video. The next-generation database should therefore offer the features that the current generation lacks:

  • Scalability (works equally well on small and big data)
  • Ease of use (no need to glue together different formats)
  • Flexibility (stores data types that may not exist yet)

Introducing JuliaDB

JuliaDB aims to be the analytics database of the future. It is implemented entirely in Julia, a high-performance language for technical computing designed around modern technologies such as just-in-time compilation, type inference, and parallelism.
