Pandas Data Manipulation
This note covers data manipulation using the Pandas library in Python.
Key functionalities:
- Data Selection:
  - `.loc` (label-based indexing):

    ```python
    import pandas as pd

    data = {'col1': [1, 2, 3], 'col2': [4, 5, 6]}
    df = pd.DataFrame(data)
    print(df.loc[0, 'col1'])  # Accesses the value at row label 0, column 'col1'
    print(df.loc[:, 'col2'])  # Accesses all rows of column 'col2'
    ```

  - `.iloc` (integer-based indexing):

    ```python
    print(df.iloc[1, 0])  # Accesses the value at row position 1, column position 0
    ```

  - Boolean indexing:

    ```python
    print(df[df['col1'] > 1])  # Selects rows where 'col1' > 1
    ```
- Data Cleaning:
  - Handling missing values: `.dropna()`, `.fillna()`

    ```python
    import pandas as pd

    df_with_nan = pd.DataFrame({'A': [1, 2, None], 'B': [4, None, 6]})
    print(df_with_nan.dropna())   # Drops rows that contain any NaN value
    print(df_with_nan.fillna(0))  # Replaces NaN values with 0
    ```

  - Removing duplicates: `.drop_duplicates()`

    ```python
    df_with_duplicates = pd.DataFrame({'A': [1, 1, 2, 2], 'B': [4, 4, 5, 5]})
    print(df_with_duplicates.drop_duplicates())  # Keeps the first occurrence of each duplicated row
    ```
- Data Transformation:
  - Applying functions: `.apply()`, `.map()`

    ```python
    # df is the DataFrame with 'col1' and 'col2' from the Data Selection example
    df['col1_squared'] = df['col1'].apply(lambda x: x**2)  # Applies a lambda function to each value
    ```

  - Grouping and aggregation: `.groupby()`, `.agg()`

    ```python
    data = {'group': ['A', 'A', 'B', 'B'], 'value': [1, 2, 3, 4]}
    df = pd.DataFrame(data)
    print(df.groupby('group')['value'].agg(['sum', 'mean']))  # Groups by 'group' and computes the sum and mean of 'value'
    ```

  - Pivoting and melting: `.pivot()`, `.melt()`

    An example requires more elaborate data; see [Data Reshaping with Pandas](./../data-reshaping-with-pandas/).
- Data Joining/Merging: `pd.merge()` (supports different join types through its `how` parameter)

  An example requires creating two DataFrames first; see [Data Merging Techniques](./../data-merging-techniques/).
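- Data Visualization (brief): the `.plot` accessor of a DataFrame uses Matplotlib as its default plotting backend, so basic charts take a single call.

  ```python
  # Assumes df has 'col1' and 'col2' columns, as in the Data Selection example
  df.plot.bar(x='col1', y='col2')  # Requires matplotlib to be installed
  ```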
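To make the label-versus-position distinction in the Data Selection bullet concrete, here is a minimal sketch on a small hypothetical DataFrame whose index labels (10, 20, 30) differ from its row positions. It also shows how boolean conditions are combined: each comparison is wrapped in parentheses and joined with `&` or `|`, because Python's `and`/`or` do not work element-wise on a Series.

```python
import pandas as pd

# Hypothetical data with a non-default index, so labels and positions differ
df = pd.DataFrame({'col1': [1, 2, 3], 'col2': [4, 5, 6]}, index=[10, 20, 30])

# .loc works with index labels; a label slice INCLUDES its end label
by_label = df.loc[10:20, 'col1']      # rows labelled 10 and 20

# .iloc works with integer positions; a position slice EXCLUDES its end position
by_position = df.iloc[0:2, 0]         # rows at positions 0 and 1

# Combine boolean conditions with & (and) / | (or), each in parentheses
mask = (df['col1'] > 1) & (df['col2'] < 6)
selected = df.loc[mask, 'col2']       # 'col2' values of rows where both conditions hold

print(by_label)
print(by_position)
print(selected)
```

Both slices return the first two rows here, but `.loc` needs the end label `20` while `.iloc` needs the end position `2`.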
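The Data Cleaning methods accept parameters that give finer control than their defaults. The sketch below uses hypothetical frames modelled on the ones in that bullet (the duplicate frame is altered so rows differ in `'B'`): `isna().sum()` counts missing values per column, `dropna(subset=...)` only checks the listed columns, `fillna` accepts a dict of per-column fill values, and `drop_duplicates(subset=..., keep=...)` chooses which columns define a duplicate and which occurrence is kept.

```python
import pandas as pd

# Hypothetical data with one missing value per column
df_with_nan = pd.DataFrame({'A': [1, 2, None], 'B': [4, None, 6]})

# Count missing values per column before deciding how to handle them
missing_per_column = df_with_nan.isna().sum()

# Drop only the rows where column 'A' is missing
dropped = df_with_nan.dropna(subset=['A'])

# Fill each column with its own value: a constant for 'A', the column mean for 'B'
filled = df_with_nan.fillna({'A': 0, 'B': df_with_nan['B'].mean()})

# Hypothetical data: rows share 'A' values but not always 'B' values
df_with_duplicates = pd.DataFrame({'A': [1, 1, 2, 2], 'B': [4, 5, 5, 5]})

# Treat rows as duplicates when 'A' matches, keeping the last occurrence
deduped = df_with_duplicates.drop_duplicates(subset=['A'], keep='last')

print(missing_per_column)
print(filled)
print(deduped)
```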
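The Applying functions bullet lists `.map()` next to `.apply()` but only shows `.apply()`. A minimal sketch on hypothetical data contrasting the two: `Series.map` transforms each value through a dict or a function (values not found in a dict become NaN), while `DataFrame.apply(..., axis=1)` passes each row to the function, which helps when a result depends on several columns.

```python
import pandas as pd

# Hypothetical data
df = pd.DataFrame({'col1': [1, 2, 3], 'col2': [4, 5, 6]})

# Series.map with a dict: each value is looked up; unmatched values become NaN
labels = df['col1'].map({1: 'one', 2: 'two'})

# Series.map with a function: applied to each value individually
doubled = df['col1'].map(lambda x: x * 2)

# DataFrame.apply with axis=1: the function receives one row (a Series) at a time
row_sums = df.apply(lambda row: row['col1'] + row['col2'], axis=1)

print(labels)
print(doubled)
print(row_sums)
```

For plain arithmetic like `row_sums`, the vectorized `df['col1'] + df['col2']` gives the same result and is usually faster; `apply` pays off when the per-row logic cannot be vectorized.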
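Besides a list of function names, as in the Grouping and aggregation example, `.agg()` also accepts named aggregation: each keyword argument becomes an output column defined by a `(column, function)` pair. A minimal sketch reusing that example's data; the output column names `total`, `average` and `n_rows` are arbitrary choices for illustration.

```python
import pandas as pd

# Same data as the Grouping and aggregation example
df = pd.DataFrame({'group': ['A', 'A', 'B', 'B'], 'value': [1, 2, 3, 4]})

# Named aggregation: output_name=(input_column, aggregation_function)
summary = df.groupby('group').agg(
    total=('value', 'sum'),
    average=('value', 'mean'),
    n_rows=('value', 'count'),
)

print(summary)
```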
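The Pivoting and melting bullet defers its example to another note. As a small stand-in, here is a sketch with hypothetical temperature readings: `pivot` reshapes long data (one row per observation) into wide data (one column per category), and `melt` reverses it. Note that `pivot` raises an error if an index/column pair occurs more than once; `pivot_table` aggregates such duplicates instead.

```python
import pandas as pd

# Hypothetical long-format data: one row per (date, city) reading
long_df = pd.DataFrame({
    'date': ['2024-01-01', '2024-01-01', '2024-01-02', '2024-01-02'],
    'city': ['Oslo', 'Rome', 'Oslo', 'Rome'],
    'temp': [-3, 12, -5, 10],
})

# pivot: long -> wide; 'date' becomes the index, each city becomes a column
wide_df = long_df.pivot(index='date', columns='city', values='temp')

# melt: wide -> long; the city columns are unpivoted into (city, temp) pairs
back_to_long = wide_df.reset_index().melt(
    id_vars='date', var_name='city', value_name='temp'
)

print(wide_df)
print(back_to_long)
```

`back_to_long` contains the same rows as `long_df`, though possibly in a different order, since `melt` stacks one column at a time.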
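For the Data Joining/Merging bullet, a minimal sketch of the main join types in `pd.merge()`, using two hypothetical DataFrames that share a `key` column. The `how` parameter selects the join: `'inner'` keeps keys present in both frames, `'left'` keeps every row of the left frame, and `'outer'` keeps keys from either frame (`'right'` mirrors `'left'`). Columns from the other frame are filled with NaN where no match exists.

```python
import pandas as pd

# Two hypothetical tables sharing a 'key' column
left = pd.DataFrame({'key': [1, 2, 3], 'left_val': ['a', 'b', 'c']})
right = pd.DataFrame({'key': [2, 3, 4], 'right_val': ['x', 'y', 'z']})

inner = pd.merge(left, right, on='key', how='inner')     # keys in both: 2, 3
left_join = pd.merge(left, right, on='key', how='left')  # all left keys: 1, 2, 3
outer = pd.merge(left, right, on='key', how='outer')     # keys in either: 1, 2, 3, 4

print(inner)
print(left_join)
print(outer)
```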
This is a high-level overview. Each bullet point above could be expanded into a more detailed note. Refer to the linked notes for more in-depth explanations.