ValueError: Can only compare identically-labeled Series objects
While comparing the last 100 rows of 2 same-named columns from 2 separate dataframes, I encountered this error: "ValueError: Can only compare identically-labeled Series objects".
My first reaction was to compare the column names, which were the same. Upon further research I saw that the index labels of the 2 columns were different: I was using "iloc" to select the rows for comparison, and the selected rows keep their original index labels.
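A minimal sketch of how the error can arise. The dataframes, the column name "score" and the row counts are made up for illustration and are not my actual data.

```python
import pandas as pd

# Hypothetical data: same column name, different number of rows.
df1 = pd.DataFrame({"score": range(150)})
df2 = pd.DataFrame({"score": range(200)})

# iloc selects by position, but the result keeps the original index labels.
last1 = df1["score"].iloc[-100:]   # labels 50..149
last2 = df2["score"].iloc[-100:]   # labels 100..199

try:
    last1 == last2                 # element-wise comparison needs identical labels
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```

Both slices hold 100 rows, yet the comparison fails because their index labels do not match.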
One possible solution is to use reset_index when subsetting the dataframes. This still did not help in my case, since my dataframes have different numbers of rows and the row indices remained dissimilar.
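A rough sketch of how the outcome of reset_index can depend on where it is applied, again using made-up dataframes. This is an illustration under those assumptions, not a reconstruction of my exact code.

```python
import pandas as pd

# Hypothetical data with different row counts.
df1 = pd.DataFrame({"score": range(150)})
df2 = pd.DataFrame({"score": range(200)})

# Resetting the full dataframes first changes nothing here:
# the last 100 rows still carry labels 50..149 and 100..199.
before1 = df1.reset_index(drop=True)["score"].iloc[-100:]
before2 = df2.reset_index(drop=True)["score"].iloc[-100:]
labels_match_before = before1.index.equals(before2.index)

# Resetting each slice relabels both as 0..99, which only lines up
# when the two slices have the same length.
after1 = df1["score"].iloc[-100:].reset_index(drop=True)
after2 = df2["score"].iloc[-100:].reset_index(drop=True)
labels_match_after = after1.index.equals(after2.index)

print(labels_match_before, labels_match_after)
```

Even when the labels do line up, the comparison is purely positional: row 0 against row 0, row 1 against row 1, and so on.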
The solution that worked for me was to use the "isin" method in pandas. This has 3 benefits:
1. It avoids errors due to index mismatch and can be called in 1 line.
2. It takes into account scenarios where a value is present in both sets of rows being compared but not at the same index, i.e. a value may be at rank 20 in the first DF but at rank 25 in the second DF. E.g. checking whether the top 10 students in year 1 are also in the top 10 in year 2.
3. It allows you to compare columns in dataframes of different lengths.
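A small sketch of the "isin" approach, following the student example above. The names, the column name "student" and the top-3 cutoff are invented for illustration.

```python
import pandas as pd

# Hypothetical rankings for two years; the dataframes have different lengths.
year1 = pd.DataFrame({"student": ["Ann", "Bob", "Cid", "Dee", "Eve"]})
year2 = pd.DataFrame({"student": ["Bob", "Ann", "Fay", "Cid", "Gus", "Dee", "Hal"]})

top1 = year1["student"].iloc[:3]   # Ann, Bob, Cid
top2 = year2["student"].iloc[:3]   # Bob, Ann, Fay

# Membership test: index labels and row order in top2 are ignored.
in_both = top1.isin(top2)
repeated = top1[in_both].tolist()

# Works against the full column of the longer dataframe as well.
in_year2 = top1.isin(year2["student"])

print(repeated)
print(in_year2.tolist())
```

Note that "isin" answers a membership question (is this value anywhere in the other column?), not a row-by-row equality question, which is what made it suitable for my case.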
