For adhoc data analysis via Jupyter Notebook you often need external data via CSV, Excel or database to process them with Python. Reading the data is relatively easy thanks to the handy Pandas library. I have summarized the standard procedures for you in this article. Of course, you can also use the same errors for later standardized and integrated data integration processes, such as an ETL for your Data Warehouse.
Main Use Cases
Here is my selection of commands and sources that I use most often. If you want to try it out as well, you can simply use Jupyter Notebook Online — Click here .
Read from a CSV File
A very typical first use case is of course the famous CSV file, which was either provided to you as a dump of a source system, and can be retrieved online or was automatically stored on a file folder.
import pandas as pdpd.read_csv(‘yourfilename.csv’, header=None, nrows=5)
After importing Pandas and importing the file via read_csv you will already see success:
Read Data from Google Sheet
Same like above but with a Google Sheets Link — be aware that you published the Sheet to the web before. (Go to file →publish to web )
import pandas as pdnew_google_sheet_url = ‘https://docs.google.com/spreadsheets/d/e/2PACX-1vQ8lv36OQPiqgfFjdgJF9clvwhhh2Ao2xDzkIzYp2g1CqBbpLsnW5nefPsPVOKKBX2PdFVX6Mweyyut/pub?gid=1696994381&single=true&output=csv'df=pd.read_csv(new_google_sheet_url)
Read Data from a Database
Beside CSV and other similar files you will often connect data to a database — here we need beside Pandas also the sqlite3 module.
import pandas as pd
After the successful import of the data, we now can easily query tables via SQL String:
# Read via SQLite databases
con = sqlite3.connect(“your.database.link”)
#Read table via Select Statement
player = pd.read_sql_query(“SELECT * from Table”, con)
#close the connection
Other possible Data Sources
In addition to the standard data sources mentioned above, there are of course many other possible data sources, such as data warehouse technologies like Google Big Query and Amazon’s Redshift or even NoSQL databases. Here, you will often find downloadable Pythons libraries .
Whether for an adhoc data analysis via e.g. Jupyter Notebook or later for standardized and automated data integration processes, external data sources like files or…
Continue reading: https://towardsdatascience.com/data-integration-with-pandas-b7deaa4f0f31?source=rss—-7f60cf5620c9—4