April 16, 2020

Analyzing CSV files in Python and Tableau

In this article, we will create a graphical presentation of the “List of countries by wealth per adult” data. The source is a table containing the country name and the mean and median wealth for each country. The data will be displayed in a bar chart. The first step is to convert the HTML table to a CSV file. It can be achieved in […]
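The excerpt is cut off before it names the conversion method, so the following is only a minimal sketch of one possible way to turn an HTML table into CSV with the Python standard library. The `TableParser` class, the `html_table_to_csv` function and the sample rows (made-up countries and numbers) are all hypothetical and are not taken from the article.

```python
import csv
import io
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, row by row (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # row being built, or None outside <tr>
        self._cell = None   # text fragments of the current cell, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text outside a cell (e.g. whitespace between tags) is ignored.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def html_table_to_csv(html):
    """Return the table rows found in `html` as CSV text."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(parser.rows)
    return out.getvalue()


# Made-up sample data, for illustration only.
SAMPLE_HTML = """
<table>
  <tr><th>Country</th><th>Mean</th><th>Median</th></tr>
  <tr><td>Aland</td><td>1000</td><td>500</td></tr>
  <tr><td>Borduria</td><td>2000</td><td>750</td></tr>
</table>
"""

csv_text = html_table_to_csv(SAMPLE_HTML)
print(csv_text)
```

The sketch assumes a single, simple table without nested tables or merged cells; a real page would likely need extra handling for those.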

April 1, 2020

Talend jobs in PHP

Talend Open Studio is an open-source solution for data integration. Its graphical interface allows the user to easily create a procedure for data import, for instance from a CSV file into a MySQL database table. The user can run a job directly from Talend Open Studio. Another option is to build the job as a standalone procedure which […]

March 9, 2020

ClickHouse vs MySQL

In this article, we are going to benchmark the ClickHouse and MySQL databases. The sample database table contains over 10,000,000 records. It has a composite primary key (as_on_date, customer_number, collector_number, business_unit_id and country). Besides the composite primary key, there is a trxn_amount field, which contains the transaction amount. Total number of records: 11,091,713. Average records per month: 231,077. Number of months: 48 […]

January 6, 2018

MySQL – Get top N rows for each group

The following table contains results from an athletic 10K race: It contains the following fields: id – autoincrement; full_name – name of participant; category – can be “Junior”, “Senior” or “35+”; result – finish time in seconds. The goal is to extract the top 3 participants for each category using an SQL query. Microsoft SQL Server has an OVER(PARTITION BY fieldname) clause […]
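The excerpt stops before showing the article's own query, so what follows is only a sketch of one common way to pick the top N rows per group without window functions: a correlated subquery that counts how many faster results exist in the same category. It runs against an in-memory SQLite database from Python rather than MySQL. The table name `results` and all sample rows are assumptions made for illustration; only the column names come from the excerpt.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Assumed table name "results"; columns follow the field list in the excerpt.
conn.execute(
    """
    CREATE TABLE results (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        full_name TEXT,
        category TEXT,
        result INTEGER  -- finish time in seconds
    )
    """
)

# Made-up sample rows, for illustration only.
sample_rows = [
    ("Runner A", "Junior", 2400),
    ("Runner B", "Junior", 2550),
    ("Runner C", "Junior", 2480),
    ("Runner D", "Junior", 2700),
    ("Runner E", "Senior", 2300),
    ("Runner F", "Senior", 2350),
    ("Runner G", "35+", 2600),
    ("Runner H", "35+", 2650),
    ("Runner I", "35+", 2900),
    ("Runner J", "35+", 2620),
]
conn.executemany(
    "INSERT INTO results (full_name, category, result) VALUES (?, ?, ?)",
    sample_rows,
)

# A row is kept when fewer than N rows in its category have a faster time.
TOP_N_SQL = """
SELECT r.category, r.full_name, r.result
FROM results AS r
WHERE (
    SELECT COUNT(*)
    FROM results AS faster
    WHERE faster.category = r.category
      AND faster.result < r.result
) < ?
ORDER BY r.category, r.result
"""

top = conn.execute(TOP_N_SQL, (3,)).fetchall()
for category, full_name, result in top:
    print(category, full_name, result)
```

Note that tied finish times can make this query return more than N rows for a category, and that the correlated subquery may be slow on large tables.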

December 6, 2017

R Studio – Marathon Stats

I got an idea during the “R Programming” course on Coursera. Apart from my interest in programming, I am also an avid marathon runner, so I decided to analyze marathon results and generate various interesting graphs. I downloaded the Ljubljana Volkswagen Half Marathon 2017 results in PDF format and converted them to the following CSV file: […]