Analyzing CSV files in Python and Tableau

mmedojevic~April 16, 2020 /Python/Tableau

In this article we will be creating graphical presentation of “List of countries by wealth per adult” data. There is a table which contains country name, mean and median wealth by country.

Data will be displayed in the bar chart. First step is to convert HTML table to CSV file. It can be archived in many ways (e.g. using online tool, copy/paste in Excel). This is a resulting file:

There are 171 countries in data file so chart might look cumbersome if all data is displayed at once. It will be nice if data can be filtered by continents.
We need additional file with country-continent mapping in order to meet this requirement.

However, we cannot just drag these two files into Tableau and generate chart. Inaccurate results might be produced without prior data validation and data cleansing.
First step is to make sure that each country from wealth-per-country.csv file can be mapped to continent from countries-continents.csv file.
Let us create Python script for this validation:

Output:

According to the output 14 countries aren’t matched. This is caused by country name difference across two files. For example, first CSV file contains “South Korea” while second contains “Korea, South”. Although vast majority of countries are matched data should be cleansed in order to generate 100% accurate chart. Let us create unmatched country name map and fill missing continents:

Full Python script for matching country and continents:

Python script does following:

Result is saved to new CSV file which has same structure as wealth-per-country.csv with additional continent column.
Data is now ready for Tableau. We are going to add generated CSV file as a data source. Chart configuration is self explanatory:

At this point, report is ready for publishing to Tableau Public service. Embed code is available after publish so that report can be embedded in external web application.
This is the resulting embedded report:

Please follow and like us: