Here’s a list of common methods in pandas that you can use for various data manipulation tasks. I've categorized them based on functionality:
Basic Methods:
- pd.read_csv(): Load CSV files into a DataFrame.
- df.head(): View the first 5 rows of a DataFrame.
- df.tail(): View the last 5 rows of a DataFrame.
- df.info(): Summary of DataFrame with types, non-null counts, etc.
- df.describe(): Get a statistical summary of the DataFrame.
- df.shape: Get the number of rows and columns.
- df.columns: Get the column labels of a DataFrame.
- df.dtypes: Check the data types of columns.
- df.set_index(): Set a column as the index.
- df.reset_index(): Reset the index to default.
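To see a few of the basic methods together, here is a minimal sketch. The CSV content and the column names (name, age, city) are made up for illustration, and the data is read from an in-memory string so no file is needed.

```python
import io

import pandas as pd

# Hypothetical CSV content, held in memory so the example needs no file on disk.
csv_text = """name,age,city
Ana,34,Lisbon
Ben,28,Oslo
Cara,41,Lima
"""

df = pd.read_csv(io.StringIO(csv_text))

print(df.head(2))        # first two rows
print(df.shape)          # (number of rows, number of columns)
print(list(df.columns))  # column labels
print(df.dtypes)         # one dtype per column

# set_index returns a new DataFrame; the original df is left unchanged.
indexed = df.set_index("name")

# reset_index moves the index back into a regular column.
restored = indexed.reset_index()
```

Both set_index() and reset_index() return new DataFrames by default, so you need to keep the result in a variable.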
Accessing and Selecting Data:
- df.loc[]: Access rows and columns by label (supports slices and boolean masks).
- df.iloc[]: Access rows and columns by integer position.
- df.at[]: Access a single value by row and column label.
- df.iat[]: Access a single value by integer row and column position.
- df['column_name']: Access a single column.
- df[['col1', 'col2']]: Access multiple columns.
- df.loc[condition]: Access rows based on a condition (e.g., df.loc[df['col'] > 10]). Note that df.iloc[] does not accept a boolean Series as a mask.
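The difference between label-based and position-based access is easier to see on a small example. This is only a sketch; the DataFrame, its index labels, and the column names are invented for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {"col": [5, 12, 20], "other": ["a", "b", "c"]},
    index=["x", "y", "z"],
)

by_label = df.loc["y", "col"]       # row label "y", column label "col"
by_position = df.iloc[1, 0]         # second row, first column
single_label = df.at["z", "other"]  # fast scalar access by label
single_pos = df.iat[2, 1]           # fast scalar access by position

# Boolean filtering: build a mask, then pass it to loc.
mask = df["col"] > 10
filtered = df.loc[mask]

# iloc rejects a boolean Series, but accepts a plain boolean array.
filtered_iloc = df.iloc[mask.to_numpy()]
```

In everyday code, df.loc[mask] or simply df[mask] is the usual way to filter rows.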
Manipulating Data:
- df.drop(): Drop rows or columns from the DataFrame.
- df.dropna(): Drop missing values.
- df.fillna(): Fill missing values with a specific value or method.
- df.rename(): Rename columns or index labels.
- df.assign(): Assign new columns to the DataFrame.
- df.insert(): Insert a new column at a specific position (modifies the DataFrame in place).
- df.append(): Append rows to a DataFrame. Deprecated in pandas 1.4 and removed in pandas 2.0; use pd.concat() instead.
- pd.concat(): Concatenate multiple DataFrames along rows or columns (a top-level pandas function, not a DataFrame method).
- df.merge(): Merge two DataFrames based on a common column.
- df.join(): Join two DataFrames based on the index.
- df.replace(): Replace values in a DataFrame.
- df.apply(): Apply a function along an axis (rows/columns).
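Here is a rough sketch that chains several of these methods. The three small DataFrames and their column names (id, score, name) are hypothetical.

```python
import pandas as pd

left = pd.DataFrame({"id": [1, 2], "score": [10.0, None]})
extra = pd.DataFrame({"id": [3], "score": [7.5]})
names = pd.DataFrame({"id": [1, 2, 3], "name": ["Ana", "Ben", "Cara"]})

# pd.concat adds rows here, replacing the removed DataFrame.append.
combined = pd.concat([left, extra], ignore_index=True)

# Fill the missing score, then derive a new column without mutating `combined`.
cleaned = combined.fillna({"score": 0.0}).assign(
    doubled=lambda d: d["score"] * 2
)

# merge joins on a shared column; rename and drop return new DataFrames.
merged = cleaned.merge(names, on="id", how="left")
result = merged.rename(columns={"score": "points"}).drop(columns=["doubled"])
```

Most of these methods return a new DataFrame rather than changing the original, which is why they chain well.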
Filtering and Querying:
- df.query(): Query the DataFrame using a string expression.
- df[condition]: Filter rows based on conditions (e.g., df[df['age'] > 30]).
- df['col'].str.contains(): Test whether each string in a column contains a pattern (the .str accessor works on a Series, not on a whole DataFrame).
- df['col'].str.match(): Test whether each string in a column matches a regex pattern, anchored at the start of the string.
- df.isna(): Detect missing values.
- df.notna(): Detect non-missing values.
- df.isnull(): Alias for df.isna().
- df.notnull(): Alias for df.notna().
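A short sketch of the filtering options above, using an invented DataFrame with name and age columns. The na=False argument is one way to handle missing strings; it treats them as non-matches.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "name": ["Ana", "Ben", "Cara", None],
        "age": [34, 28, 41, 55],
    }
)

# Two equivalent ways to keep rows where age is above 30.
over_30 = df[df["age"] > 30]
queried = df.query("age > 30")

# .str methods live on a single column (a Series) and return a boolean mask.
has_a = df[df["name"].str.contains("a", na=False)]
starts_b = df[df["name"].str.match(r"B", na=False)]

# isna and isnull are the same check under two names.
missing = df["name"].isna()
```

The string methods are case-sensitive by default, so "a" matches the lowercase letter only.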
Aggregation and Transformation:
- df.groupby(): Group data by one or more columns so that an aggregation can be applied per group.
- df.aggregate(): Apply one or more aggregate functions to data (alias: df.agg()).
- df.mean(): Calculate the mean of each column (use df['col'].mean() for a single column).
- df.sum(): Calculate the sum of each column.
- df.min(): Find the minimum value in each column.
- df.max(): Find the maximum value in each column.
- df.count(): Count non-null values in each column.
- df.cumsum(): Compute the cumulative sum.
- df.cumprod(): Compute the cumulative product.
- df.transform(): Apply a function that returns a result with the same shape as the input; after groupby(), each group's result is broadcast back to the original rows.
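The difference between aggregating and transforming is the part that tends to confuse people, so here is a small sketch. The team and points columns are hypothetical.

```python
import pandas as pd

df = pd.DataFrame(
    {"team": ["a", "a", "b", "b"], "points": [1, 2, 3, 4]}
)

# Aggregation: one result per group.
totals = df.groupby("team")["points"].sum()
summary = df.groupby("team")["points"].agg(["min", "max", "mean"])

# Transformation: one result per original row, aligned to df's index.
df["team_mean"] = df.groupby("team")["points"].transform("mean")

# Cumulative sum over the whole column.
df["running_total"] = df["points"].cumsum()
```

Use transform() when you want a group statistic as a new column next to the original data.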
Date/Time Operations:
- pd.to_datetime(): Convert to datetime type.
- df['date'].dt.year: Extract year from a datetime column.
- df['date'].dt.month: Extract month from a datetime column.
- df['date'].dt.day: Extract day from a datetime column.
- df['date'].dt.weekday: Extract weekday (0 = Monday, 6 = Sunday).
- df['date'].dt.date: Extract date (without time).
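As a quick illustration of the .dt accessor, here is a sketch with two made-up timestamps. The column must be converted with pd.to_datetime() first, since .dt only works on datetime data.

```python
import pandas as pd

df = pd.DataFrame({"date": ["2024-01-15 08:30", "2024-03-02 17:45"]})

# Strings must be converted before the .dt accessor is available.
df["date"] = pd.to_datetime(df["date"])

years = df["date"].dt.year
months = df["date"].dt.month
days = df["date"].dt.day
weekdays = df["date"].dt.weekday  # 0 = Monday, 6 = Sunday
dates_only = df["date"].dt.date   # Python date objects, time dropped
```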
Sorting and Ranking:
- df.sort_values(): Sort DataFrame by values in one or more columns.
- df.sort_index(): Sort DataFrame by index.
- df.rank(): Rank the values in a column.
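A minimal sketch of sorting and ranking, assuming a hypothetical DataFrame with a deliberately unordered index:

```python
import pandas as pd

df = pd.DataFrame(
    {"name": ["Ana", "Ben", "Cara"], "score": [70, 90, 80]},
    index=[1, 0, 2],
)

# Highest score first.
by_score = df.sort_values("score", ascending=False)

# Order rows by their index labels.
by_index = df.sort_index()

# Rank 1.0 goes to the highest score because ascending=False.
df["rank"] = df["score"].rank(ascending=False)
```

rank() returns floats because tied values share an averaged rank by default.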
Pivoting and Reshaping Data:
- df.pivot(): Pivot the DataFrame (reshaping data).
- df.pivot_table(): Create a pivot table.
- df.melt(): Unpivot the DataFrame from wide to long format.
- df.stack(): Stack the columns into rows (reshape).
- df.unstack(): Unstack rows into columns.
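Reshaping is easiest to follow as a round trip from long to wide and back. The sketch below uses invented city, year, and sales columns.

```python
import pandas as pd

long_df = pd.DataFrame(
    {
        "city": ["Oslo", "Oslo", "Lima", "Lima"],
        "year": [2023, 2024, 2023, 2024],
        "sales": [10, 12, 7, 9],
    }
)

# Long to wide: one row per city, one column per year.
wide = long_df.pivot(index="city", columns="year", values="sales")

# Wide back to long: melt turns the year columns into rows again.
back_to_long = wide.reset_index().melt(
    id_vars=["city"], var_name="year", value_name="sales"
)

# stack moves the column labels into an inner index level (a Series here);
# unstack moves that level back out into columns.
stacked = wide.stack()
unstacked = stacked.unstack()
```

pivot() raises an error if an index/column pair appears more than once; pivot_table() handles that case by aggregating the duplicates.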
Data Type Conversion:
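- df.astype(): Convert data types of columns.
- pd.to_numeric(): Convert a column to numeric values (a top-level pandas function, e.g., pd.to_numeric(df['col'])).
- pd.to_datetime(): Convert a column to datetime (also a top-level function, e.g., pd.to_datetime(df['col'])).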
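Here is a small sketch of the three conversions. The column names and values are hypothetical, and the "x" entry is there to show how errors="coerce" deals with bad input.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "qty": ["1", "2", "x"],
        "price": ["1.5", "2.0", "3.25"],
        "when": ["2024-01-01", "2024-02-01", "2024-03-01"],
    }
)

# astype raises if any value cannot be converted, so use it on clean data.
df["price"] = df["price"].astype(float)

# errors="coerce" turns unparseable values such as "x" into NaN.
df["qty"] = pd.to_numeric(df["qty"], errors="coerce")

df["when"] = pd.to_datetime(df["when"])
```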
Rolling and Window Functions:
- df.rolling(): Create a rolling view of a DataFrame for window-based operations.
- df.rolling(window=5).mean(): Calculate the rolling mean over a window of size 5.
- df.expanding(): Apply expanding window operations (i.e., cumulative).
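To make the window behaviour concrete, here is a sketch on a short, made-up Series. A window of 3 is used instead of 5 so that the result fits on a few rows.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5, 6])

# The first window - 1 results are NaN because the window is not yet full.
rolling_mean = s.rolling(window=3).mean()

# An expanding window grows from the start, so its sum matches cumsum.
expanding_sum = s.expanding().sum()
```

Passing min_periods=1 to rolling() is one way to get values for the incomplete windows at the start.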
Other Useful Methods:
- df.to_csv(): Save DataFrame to a CSV file.
- df.to_excel(): Save DataFrame to an Excel file.
- df.to_json(): Save DataFrame to a JSON file.
- df.memory_usage(): Get memory usage of DataFrame columns.
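The export methods can also return a string when no path is given, which makes them easy to try out. A minimal sketch with a hypothetical two-column DataFrame:

```python
import io
import json

import pandas as pd

df = pd.DataFrame({"id": [1, 2], "name": ["Ana", "Ben"]})

# With no path argument, to_csv and to_json return the text instead of writing a file.
csv_text = df.to_csv(index=False)
json_text = df.to_json(orient="records")

# Read the output back to check that it round-trips.
round_trip = pd.read_csv(io.StringIO(csv_text))
records = json.loads(json_text)

# Bytes used per column, plus the index; deep=True also measures string contents.
mem = df.memory_usage(deep=True)
```

To write to disk, pass a file path as the first argument, e.g. df.to_csv("out.csv", index=False), where the file name is just a placeholder. Note that df.to_excel() needs an Excel engine such as openpyxl installed.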
Lambda Functions:
- df.apply(lambda x: x * 2): Apply a lambda function to each column of the DataFrame (each column is passed in as a Series; here every value is doubled).
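A short sketch of how apply() passes data to a lambda, using a made-up DataFrame with columns a and b:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# Default axis=0: the lambda receives one whole column (a Series) at a time.
doubled = df.apply(lambda col: col * 2)

# axis=1: the lambda receives one row at a time.
row_sums = df.apply(lambda row: row["a"] + row["b"], axis=1)

# For simple arithmetic, the vectorized form gives the same result.
vectorized = df * 2
```

For simple arithmetic like this, the vectorized form is usually shorter and faster than apply().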
This list should give you a wide variety of methods to handle different tasks in pandas.
