🐼 Pandas DataFrames: The Swiss Army Knife of Data!
What is a DataFrame?
Think of a pandas DataFrame as a spreadsheet on steroids. It's a two-dimensional data structure where:
Rows represent individual data entries.
Columns hold different types of information about those entries.
It's like an Excel sheet, but faster, smarter, and Pythonic!
Creating a DataFrame
Imagine you're running a coffee shop and tracking orders. You can create a DataFrame like this:
import pandas as pd
data = {
    "Customer": ["Alice", "Bob", "Charlie"],
    "Order": ["Latte", "Espresso", "Mocha"],
    "Price ($)": [4.5, 3.0, 5.0],
}
df = pd.DataFrame(data)
print(df)
Customer Order Price ($)
0 Alice Latte 4.5
1 Bob Espresso 3.0
2 Charlie Mocha 5.0
Boom! We've got a DataFrame!
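To check that the result really has the rows-and-columns shape described earlier, here is a small sketch that rebuilds the same coffee-shop data and inspects it. The variable names (n_rows, column_names, price_dtype) are just illustrative choices, not part of pandas.

```python
import pandas as pd

# Same coffee-shop data as in the tutorial, rebuilt so this snippet runs on its own.
df = pd.DataFrame({
    "Customer": ["Alice", "Bob", "Charlie"],
    "Order": ["Latte", "Espresso", "Mocha"],
    "Price ($)": [4.5, 3.0, 5.0],
})

n_rows, n_cols = df.shape            # shape is a (rows, columns) tuple
column_names = list(df.columns)      # column labels, in order
price_dtype = df["Price ($)"].dtype  # each column carries its own data type

print(n_rows, n_cols)   # 3 3
print(column_names)     # ['Customer', 'Order', 'Price ($)']
print(price_dtype)      # float64
```

Three rows for three orders, three columns for three kinds of information: exactly the spreadsheet picture, just in Python.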
Accessing Data
Want to peek at the first few rows? Use .head(). It's a method you call on the DataFrame itself, and it returns the first five rows by default:
df.head()
| | Customer | Order | Price ($) |
|---|---|---|---|
| 0 | Alice | Latte | 4.5 |
| 1 | Bob | Espresso | 3.0 |
| 2 | Charlie | Mocha | 5.0 |
Need just one column? Use square brackets:
df["Order"]
0 Latte
1 Espresso
2 Mocha
Name: Order, dtype: object
Wanna grab a single order? Use .loc or .iloc:
df.loc[1]   # Fetches row with index 1 (Bob's order)
df.iloc[2]  # Fetches row at position 2 (Charlie's order)
Customer Charlie
Order Mocha
Price ($) 5.0
Name: 2, dtype: object
Remember: .loc[] is for labels, .iloc[] is for positions!
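With the default 0, 1, 2 index, labels and positions happen to line up, so the difference is easy to miss. The sketch below makes it visible by using customer names as the index. That index is an assumption made for illustration; the tutorial's DataFrame keeps the default one.

```python
import pandas as pd

df = pd.DataFrame(
    {"Order": ["Latte", "Espresso", "Mocha"], "Price ($)": [4.5, 3.0, 5.0]},
    index=["Alice", "Bob", "Charlie"],  # hypothetical: customer names as row labels
)

by_label = df.loc["Bob"]    # the row labelled "Bob"
by_position = df.iloc[1]    # the second row, whatever its label is

print(by_label["Order"])     # Espresso
print(by_position["Order"])  # Espresso

# After sorting by price, positions change but labels stay attached to their rows.
sorted_df = df.sort_values("Price ($)", ascending=False)
label_after_sort = sorted_df.loc["Bob", "Order"]
position_after_sort = sorted_df.iloc[1]["Order"]

print(label_after_sort)     # Espresso
print(position_after_sort)  # Latte
```

Once the rows are reordered, .loc["Bob"] still finds Bob, while .iloc[1] simply returns whoever is now second.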
Filtering Data
Let's say we only want orders above $4:
expensive_orders = df[df["Price ($)"] > 4]
print(expensive_orders)
Customer Order Price ($)
0 Alice Latte 4.5
2 Charlie Mocha 5.0
Pandas filters like a supercharged search engine!
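Filters can also be combined. Here is a minimal sketch, assuming we want pricey orders that are not Mochas (a made-up rule, purely for illustration). Each condition goes in its own parentheses, joined with & for "and" or | for "or".

```python
import pandas as pd

df = pd.DataFrame({
    "Customer": ["Alice", "Bob", "Charlie"],
    "Order": ["Latte", "Espresso", "Mocha"],
    "Price ($)": [4.5, 3.0, 5.0],
})

# "and": both conditions must hold for a row to be kept.
pricey_non_mocha = df[(df["Price ($)"] > 4) & (df["Order"] != "Mocha")]

# "or": either condition is enough.
cheap_or_mocha = df[(df["Price ($)"] < 4) | (df["Order"] == "Mocha")]

print(pricey_non_mocha)
print(cheap_or_mocha)
```

Python's plain `and` / `or` keywords do not work on whole columns, which is why the element-wise & and | operators are used here.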
Modifying Data
Oops! It was Happy Hour, so let's apply a 10% discount:
df["Price ($)"] = df["Price ($)"] * 0.9
Pandas lets you modify data like a boss. Note that this overwrites the column, so the summaries that follow use the discounted prices.
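Sometimes only some rows should change. The sketch below assumes a hypothetical rule that the discount applies only to orders above $4, and uses .loc with a boolean mask to update just those rows.

```python
import pandas as pd

df = pd.DataFrame({
    "Customer": ["Alice", "Bob", "Charlie"],
    "Order": ["Latte", "Espresso", "Mocha"],
    "Price ($)": [4.5, 3.0, 5.0],
})

# Hypothetical rule: only orders above $4 get the 10% discount.
is_pricey = df["Price ($)"] > 4

# .loc[row_mask, column] selects exactly the cells to overwrite.
df.loc[is_pricey, "Price ($)"] = df.loc[is_pricey, "Price ($)"] * 0.9

# Round to cents to hide floating-point noise.
discounted = df["Price ($)"].round(2).tolist()
print(discounted)  # [4.05, 3.0, 4.5]
```

Bob's Espresso keeps its original price because his row is False in the mask.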
Summarizing Data
Need a quick summary? Try .describe():
df.describe()
| | Price ($) |
|---|---|
| count | 3.00000 |
| mean | 3.75000 |
| std | 0.93675 |
| min | 2.70000 |
| 25% | 3.37500 |
| 50% | 4.05000 |
| 75% | 4.27500 |
| max | 4.50000 |
Want to know how many times each drink was ordered?
df["Order"].value_counts()
Order
Latte 1
Espresso 1
Mocha 1
Name: count, dtype: int64
Pandas gives instant insights.
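When you need a single number rather than the whole summary table, the individual aggregations can be called directly. This is a small sketch that rebuilds the data with the original, pre-discount prices so it runs on its own. The variable names are illustrative.

```python
import pandas as pd

# Original (pre-discount) prices, rebuilt so this snippet is self-contained.
df = pd.DataFrame({
    "Customer": ["Alice", "Bob", "Charlie"],
    "Order": ["Latte", "Espresso", "Mocha"],
    "Price ($)": [4.5, 3.0, 5.0],
})

prices = df["Price ($)"]
total_revenue = prices.sum()    # sum of all prices
average_price = prices.mean()   # arithmetic mean

# idxmax() returns the index label of the largest value; .loc then looks up that row.
most_expensive = df.loc[prices.idxmax(), "Order"]

print(total_revenue)            # 12.5
print(round(average_price, 2))  # 4.17
print(most_expensive)           # Mocha
```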
Speed Boost: Vectorized Operations
Instead of looping through rows (which is slow), use pandas' fast operations:
❌ Slow way:
df["Price with Tax"] = [price * 1.07 for price in df["Price ($)"]]
✅ Fast way (vectorized):
df["Price with Tax"] = df["Price ($)"] * 1.07
Pandas handles operations at warp speed!
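Three rows are too few to feel any difference, so here is a rough timing sketch on a larger, made-up column of 100,000 prices. The data, the function names, and the repeat count are all assumptions for illustration, and the exact timings will vary from machine to machine.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical data: 100,000 evenly spaced prices between $1 and $10.
prices = pd.Series(np.linspace(1.0, 10.0, 100_000), name="Price ($)")

def with_loop():
    # Python-level loop: every value is multiplied one at a time.
    return pd.Series([price * 1.07 for price in prices])

def vectorized():
    # A single pandas operation applied to the whole column at once.
    return prices * 1.07

loop_seconds = timeit.timeit(with_loop, number=5)
vector_seconds = timeit.timeit(vectorized, number=5)

print(f"loop:       {loop_seconds:.4f} s")
print(f"vectorized: {vector_seconds:.4f} s")
```

Both functions produce the same numbers. On typical hardware the vectorized version should finish noticeably sooner, because the arithmetic runs in compiled code instead of the Python interpreter.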
Final Act: Exporting Data
Want to save your hard work? Pandas supports:
CSV: df.to_csv("orders.csv", index=False)
Excel: df.to_excel("orders.xlsx", index=False)
JSON: df.to_json("orders.json")
Boom! Your data is ready to travel!
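To see what an export actually contains, this sketch writes the CSV into an in-memory buffer instead of a file and then reads it back. Using io.StringIO rather than "orders.csv" is a choice made here so that nothing is written to disk.

```python
import io

import pandas as pd

df = pd.DataFrame({
    "Customer": ["Alice", "Bob", "Charlie"],
    "Order": ["Latte", "Espresso", "Mocha"],
    "Price ($)": [4.5, 3.0, 5.0],
})

# Write CSV text into an in-memory buffer; index=False leaves out the 0, 1, 2 column.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)

# Rewind the buffer and load the data back into a new DataFrame.
buffer.seek(0)
restored = pd.read_csv(buffer)
print(restored)
```

The same round trip works with a real file path. Note that writing .xlsx files with to_excel also needs an Excel engine such as openpyxl to be installed.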
Final Thoughts
Pandas is like a data superhero. It can:
✅ Read and write data
✅ Slice and dice information
✅ Analyze and visualize
✅ Handle large in-memory datasets quickly
So, whether you're a data scientist, analyst, or just curious, pandas is your best friend! 🐼
Want to go deeper? Explore:
Official Docs: https://pandas.pydata.org/