Here’s a list of common methods in pandas that you can use for various data manipulation tasks. I've categorized them based on functionality:
Basic Methods:
- pd.read_csv(): Load a CSV file into a DataFrame.
- df.head(): View the first 5 rows of a DataFrame (5 is the default; pass n to change it).
- df.tail(): View the last 5 rows of a DataFrame (same default).
- df.info(): Print a summary of the DataFrame with column types, non-null counts, etc.
- df.describe(): Get a statistical summary of the DataFrame (numeric columns by default).
- df.shape: Get the number of rows and columns as a tuple.
- df.columns: Get the column labels of a DataFrame.
- df.dtypes: Check the data types of columns.
- df.set_index(): Set one or more columns as the index.
- df.reset_index(): Reset the index to the default integer index.
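As a minimal sketch of the basic methods above, the following loads a small made-up CSV from memory and inspects it. The column names (id, name, score) and the data are purely illustrative assumptions.

```python
import io
import pandas as pd

# Hypothetical CSV content, kept in memory so no file on disk is needed.
csv_text = "id,name,score\n1,Ann,90.5\n2,Bob,72.0\n3,Cy,\n"

df = pd.read_csv(io.StringIO(csv_text))

print(df.head(2))   # first two rows
print(df.shape)     # (rows, columns)
print(df.dtypes)    # per-column data types
df.info()           # prints types and non-null counts

indexed = df.set_index("id")      # "id" becomes the index
restored = indexed.reset_index()  # back to the default integer index
```

Note that df.shape, df.columns, and df.dtypes are attributes, so they are read without parentheses.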
Accessing and Selecting Data:
- df.loc[]: Access rows and columns by label (label slices include both endpoints).
- df.iloc[]: Access rows and columns by integer position.
- df.at[]: Access a single value by row and column label.
- df.iat[]: Access a single value by integer row and column position.
- df['column_name']: Access a single column as a Series.
- df[['col1', 'col2']]: Access multiple columns as a DataFrame.
- df.loc[condition]: Access rows based on a boolean condition (e.g., df.loc[df['col'] > 10]). Use df.loc here rather than df.iloc, which does not accept a boolean Series as a mask.
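The sketch below contrasts label-based and position-based access on a tiny made-up DataFrame (the index labels x, y, z and the columns are assumptions for illustration). Condition-based selection is done with df.loc and a boolean mask, since df.iloc does not accept a boolean Series.

```python
import pandas as pd

df = pd.DataFrame(
    {"col": [5, 12, 20], "name": ["a", "b", "c"]},
    index=["x", "y", "z"],
)

by_label = df.loc["y", "col"]       # label-based: 12
by_position = df.iloc[1, 0]         # position-based: 12
single_label = df.at["z", "name"]   # fast scalar access by label: "c"
single_pos = df.iat[2, 1]           # fast scalar access by position: "c"

one_col = df["col"]                 # a Series
two_cols = df[["col", "name"]]      # a DataFrame

label_slice = df.loc["x":"y"]       # label slices include both endpoints
pos_slice = df.iloc[0:1]            # position slices exclude the end

filtered = df.loc[df["col"] > 10]   # boolean mask: rows y and z
```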
Manipulating Data:
- df.drop(): Drop rows or columns from the DataFrame by label.
- df.dropna(): Drop rows (or columns) that contain missing values.
- df.fillna(): Fill missing values with a specific value; use df.ffill() or df.bfill() to propagate neighboring values instead.
- df.rename(): Rename columns or index labels.
- df.assign(): Return a new DataFrame with added or overwritten columns.
- df.insert(): Insert a new column at a specific position (modifies the DataFrame in place).
- df.append(): Append rows to a DataFrame. Deprecated in pandas 1.4 and removed in pandas 2.0; use pd.concat() instead.
- pd.concat(): Concatenate multiple DataFrames along rows or columns. This is a top-level pandas function, not a DataFrame method.
- df.merge(): Merge two DataFrames based on common columns (SQL-style join).
- df.join(): Join two DataFrames based on the index.
- df.replace(): Replace values in a DataFrame.
- df.apply(): Apply a function along an axis (to each column or each row).
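A short sketch of a few of these operations chained together, using two made-up frames (left and right, with a shared key column; all names are illustrative). Rows are combined with the top-level pd.concat(), which is the supported replacement for DataFrame.append in pandas 2.x.

```python
import pandas as pd

left = pd.DataFrame({"key": [1, 2, 3], "a": [10.0, None, 30.0]})
right = pd.DataFrame({"key": [2, 3, 4], "b": ["p", "q", "r"]})

# Fill the missing value, then rename the column.
cleaned = left.fillna({"a": 0.0}).rename(columns={"a": "amount"})

# Add a derived column without mutating `cleaned`.
with_doubled = cleaned.assign(doubled=lambda d: d["amount"] * 2)

# Stack rows from several frames (replacement for df.append).
stacked = pd.concat([left, left], ignore_index=True)

# SQL-style inner join on the shared "key" column.
merged = left.merge(right, on="key", how="inner")

# Drop rows that contain any missing value.
dropped = left.dropna()
```

Most of these methods return a new DataFrame rather than modifying the original, so results need to be assigned to a variable.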
Filtering and Querying:
- df.query(): Query the DataFrame using a string expression.
- df[condition]: Filter rows based on conditions (e.g., df[df['age'] > 30]).
- df['col'].str.contains(): Test whether each string in a column contains a pattern. The .str accessor belongs to a Series (a single column), and the resulting boolean mask is used to filter rows.
- df['col'].str.match(): Test whether each string in a column matches a regex pattern from its start.
- df.isna(): Detect missing values.
- df.notna(): Detect non-missing values.
- df.isnull(): Alias of df.isna().
- df.notnull(): Alias of df.notna().
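The following sketch shows these filters on a small made-up frame (the name and age columns are assumptions). The string methods are called on a single column through the .str accessor, and na=False is passed so that missing names count as non-matches.

```python
import pandas as pd

df = pd.DataFrame(
    {"name": ["Alice", "Bob", None, "Alan"], "age": [34, 28, 41, 30]}
)

over_30 = df[df["age"] > 30]        # boolean-mask filter
same_rows = df.query("age > 30")    # equivalent string expression

# .str methods live on a Series; the boolean result is then used as a mask.
has_al = df[df["name"].str.contains("Al", na=False)]
starts_b = df[df["name"].str.match(r"B", na=False)]

missing_name = df["name"].isna()    # True where the name is missing
```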
Aggregation and Transformation:
- df.groupby(): Group rows by one or more columns, then aggregate or transform each group.
- df.aggregate(): Apply one or more aggregate functions to the data (alias: df.agg()).
- df.mean(): Calculate the mean of each column (use df['col'].mean() for a single column).
- df.sum(): Calculate the sum of each column.
- df.min(): Find the minimum value in each column.
- df.max(): Find the maximum value in each column.
- df.count(): Count non-null values in each column.
- df.cumsum(): Compute the cumulative sum.
- df.cumprod(): Compute the cumulative product.
- df.transform(): Apply a function that returns a result with the same shape as its input; after groupby(), the per-group result is broadcast back to the original rows.
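To illustrate the difference between aggregating (one row per group) and transforming (one row per original row), here is a minimal sketch on made-up data; the team and points columns are illustrative assumptions.

```python
import pandas as pd

df = pd.DataFrame(
    {"team": ["a", "a", "b", "b"], "points": [1, 3, 10, 20]}
)

# Aggregation: one result per group (a -> 4, b -> 30).
totals = df.groupby("team")["points"].sum()

# Several aggregates at once: one column per function.
stats = df.groupby("team")["points"].agg(["mean", "max"])

# Cumulative sum over the whole column: 1, 4, 14, 34.
running = df["points"].cumsum()

# Transform: the group mean is broadcast back to every original row.
team_mean = df.groupby("team")["points"].transform("mean")
```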
Date/Time Operations:
- pd.to_datetime(): Convert to datetime type.
- df['date'].dt.year: Extract year from a datetime column.
- df['date'].dt.month: Extract month from a datetime column.
- df['date'].dt.day: Extract day from a datetime column.
- df['date'].dt.weekday: Extract weekday (0 = Monday, 6 = Sunday).
- df['date'].dt.date: Extract date (without time).
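A brief sketch of the datetime accessors, assuming a column of timestamp strings named date (the values are made up). The column must be converted with pd.to_datetime() before the .dt accessor is available.

```python
import pandas as pd

df = pd.DataFrame({"date": ["2024-01-15 08:30", "2024-03-02 17:45"]})

# Strings must become datetimes before .dt can be used.
df["date"] = pd.to_datetime(df["date"])

df["year"] = df["date"].dt.year
df["month"] = df["date"].dt.month
df["day"] = df["date"].dt.day
df["weekday"] = df["date"].dt.weekday   # 0 = Monday, 6 = Sunday
df["date_only"] = df["date"].dt.date    # Python date objects, time dropped
```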
Sorting and Ranking:
- df.sort_values(): Sort DataFrame by values in one or more columns.
- df.sort_index(): Sort DataFrame by index.
- df.rank(): Rank the values in a column.
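As a small sketch of sorting and ranking, assuming a made-up frame with name and score columns: the sort uses two keys with different directions, and the ranking uses method="min" so that tied scores share the lowest rank.

```python
import pandas as pd

df = pd.DataFrame(
    {"name": ["c", "a", "b"], "score": [70, 90, 70]},
    index=[2, 0, 1],
)

# Highest score first; ties broken alphabetically by name.
by_score = df.sort_values(["score", "name"], ascending=[False, True])

# Restore index order 0, 1, 2.
by_index = df.sort_index()

# Rank with the highest score as rank 1; tied values share the minimum rank.
ranks = df["score"].rank(ascending=False, method="min")
```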
Pivoting and Reshaping Data:
- df.pivot(): Pivot the DataFrame (reshaping data).
- df.pivot_table(): Create a pivot table.
- df.melt(): Unpivot the DataFrame from wide to long format.
- df.stack(): Stack the columns into rows (reshape).
- df.unstack(): Unstack rows into columns.
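The round trip below sketches wide-to-long and long-to-wide reshaping on a made-up table. The city column and the year columns "2023" and "2024" are illustrative assumptions.

```python
import pandas as pd

wide = pd.DataFrame({"city": ["X", "Y"], "2023": [1, 2], "2024": [3, 4]})

# Wide -> long: one row per (city, year) pair.
long = wide.melt(id_vars="city", var_name="year", value_name="value")

# Long -> wide again: years become columns, cities become the index.
back = long.pivot(index="city", columns="year", values="value")

# pivot_table aggregates while reshaping, so duplicate keys are allowed.
summary = long.pivot_table(index="city", values="value", aggfunc="sum")
```

df.pivot() raises an error if an index/column pair occurs more than once; df.pivot_table() handles such duplicates by aggregating them.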
Data Type Conversion:
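- df.astype(): Convert data types of columns.
- pd.to_numeric(): Convert a column to numeric values. This is a top-level pandas function, e.g., pd.to_numeric(df['col']).
- pd.to_datetime(): Convert a column to datetime. This is also a top-level function, e.g., pd.to_datetime(df['col']).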
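A minimal sketch of the three conversions, on made-up columns (k, n, and when are illustrative names). pd.to_numeric() and pd.to_datetime() are called as top-level pandas functions with a column passed in; errors="coerce" turns unparseable values into NaN instead of raising.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "k": [1.0, 2.0, 3.0],
        "n": ["1", "2", "oops"],
        "when": ["2024-01-01", "2024-02-01", "2024-03-01"],
    }
)

df["k"] = df["k"].astype("int64")                   # float -> int
df["n"] = pd.to_numeric(df["n"], errors="coerce")   # "oops" becomes NaN
df["when"] = pd.to_datetime(df["when"])             # strings -> datetime64
```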
Rolling and Window Functions:
- df.rolling(): Create a rolling view of a DataFrame for window-based operations.
- df.rolling(window=5).mean(): Calculate the rolling mean over a window of size 5.
- df.expanding(): Apply expanding window operations (i.e., cumulative).
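To show how the two window types differ, here is a short sketch on a made-up Series. A window of 3 is used instead of 5 only to keep the output small.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5])

# Fixed-size window: the first two results are NaN because the
# window is not yet full, then 2.0, 3.0, 4.0.
rolling_mean = s.rolling(window=3).mean()

# Expanding window: grows from the start, so sum() is cumulative:
# 1.0, 3.0, 6.0, 10.0, 15.0.
expanding_sum = s.expanding().sum()
```

Passing min_periods=1 to rolling() produces a value for partially filled windows instead of NaN.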
Other Useful Methods:
- df.pivot_table(): Create a pivot table for summarizing data.
- df.to_csv(): Save DataFrame to a CSV file.
- df.to_excel(): Save DataFrame to an Excel file (requires an Excel engine such as openpyxl).
- df.to_json(): Serialize DataFrame to JSON, either to a file or as a string.
- df.memory_usage(): Get memory usage of DataFrame columns in bytes.
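The export methods can be sketched without touching the filesystem: when no path is given, df.to_csv() and df.to_json() return strings. The frame below is made up for illustration.

```python
import io
import json
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": ["x", "y"]})

# With no path argument, the serialized text is returned as a string.
csv_text = df.to_csv(index=False)
json_text = df.to_json(orient="records")

# Round-trip checks: read the CSV back, parse the JSON.
round_trip = pd.read_csv(io.StringIO(csv_text))
records = json.loads(json_text)

# Per-column memory in bytes; deep=True also counts string contents.
mem = df.memory_usage(deep=True)
```

Passing a filename, for example df.to_csv("out.csv", index=False), writes to disk instead; the filename here is only a placeholder.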
Lambda Functions:
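- df.apply(lambda x: x * 2): Apply a lambda function to each column of the DataFrame. The lambda receives a whole column (a Series) at a time; because x * 2 is vectorized, every element ends up doubled.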
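A small sketch of how df.apply() passes data to a lambda, on a made-up two-column frame: by default the function receives one column at a time, and with axis=1 it receives one row at a time.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# Default axis=0: the lambda is called once per column (a Series).
doubled = df.apply(lambda col: col * 2)

# axis=1: the lambda is called once per row.
row_sum = df.apply(lambda row: row["a"] + row["b"], axis=1)

# For simple arithmetic, the vectorized form gives the same result
# without calling a Python function per column.
vectorized = df * 2
```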
This list should give you a wide variety of methods to handle different tasks in pandas.