How-to Setup Your Data Analytic Environment for Data Science — Part 1.3: Add Metadata Document to Your Dataset Using Apache Parquet

Prerequisites

import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq
import platform
print('Python: ', platform.python_version())
print('pandas: ', pd.__version__)
print('pyarrow: ', pa.__version__)
Python: 3.7.11
pandas: 1.1.5
pyarrow: 3.0.0

Sample Data

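# Mount Google Drive (this import is only available inside Google Colab)
from google.colab import drive
drive.mount('/content/gdrive')
# Load the tract-level Social Vulnerability Index 2018 CSV; the first column becomes the index
social_index = pd.read_csv('/content/gdrive/My Drive/Colab Notebooks/Analytic Environment/data/Social_Vulnerability_Index_2018_-_United_States__tract.csv', index_col=0)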

Exploratory Data Analysis
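
A first pass over a freshly loaded DataFrame often looks something like the sketch below. The small DataFrame and its column names are hypothetical stand-ins; in the notebook the same calls would presumably run against `social_index`. The specific checks are an assumption about what a quick exploration might include, not the article's own analysis.

```python
import pandas as pd

# Tiny stand-in frame (hypothetical columns and values).
df = pd.DataFrame({"tract": ["A", "B", "C"], "score": [0.25, None, 0.75]})

n_rows, n_cols = df.shape   # size of the dataset
dtypes = df.dtypes          # column types as pandas inferred them
missing = df.isna().sum()   # number of missing values per column
summary = df.describe()     # summary statistics for numeric columns

print(n_rows, n_cols)
print(dtypes)
print(missing)
print(summary)
```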

Continue reading: https://towardsdatascience.com/add-metadata-to-your-dataset-using-apache-parquet-75360d2073bd

Source: towardsdatascience.com