ValueError: Can only compare identically-labeled Series objects

When comparing the last 100 rows of two columns with the same name from 2 separate dataframes, I encountered this error: "ValueError: Can only compare identically-labeled Series objects"

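My first reaction was to compare the column names, which were the same. Upon further research I saw that the index labels of the 2 columns were different. I was using "iloc" to select the rows, which keeps each row's original index label, and pandas only compares two Series element-wise when their indexes are identical.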
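A minimal sketch that reproduces the error. The dataframes, the column name "name" and the slice size of 3 are made up for illustration; they are not my actual data.

```python
import pandas as pd

# Hypothetical data: two dataframes with a column of the same name
df1 = pd.DataFrame({"name": ["a", "b", "c", "d"]})
df2 = pd.DataFrame({"name": ["a", "b", "c", "d", "e", "f"]})

# Last 3 rows of each column; iloc keeps the original index labels
s1 = df1["name"].iloc[-3:]  # index labels 1, 2, 3
s2 = df2["name"].iloc[-3:]  # index labels 3, 4, 5

try:
    s1 == s2
    error_message = None
except ValueError as exc:
    # The labels differ, so pandas refuses to compare
    error_message = str(exc)

print(error_message)
```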

One possible solution is to call reset_index when subsetting the dataframe. That still did not help in my case, since my dataframes have different numbers of rows and the row indices were still dissimilar.
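A rough sketch of what reset_index does and does not fix, again with made-up data. It assumes drop=True is passed so the old labels are discarded. Under that assumption, slices of equal length compare position by position, while slices of different lengths still raise the error.

```python
import pandas as pd

# Hypothetical data, same as in the sketch of the error
df1 = pd.DataFrame({"name": ["a", "b", "c", "d"]})
df2 = pd.DataFrame({"name": ["a", "b", "c", "d", "e", "f"]})

# Equal-length slices: after reset_index(drop=True) both are labeled 0..2
s1 = df1["name"].iloc[-3:].reset_index(drop=True)  # b, c, d
s2 = df2["name"].iloc[-3:].reset_index(drop=True)  # d, e, f
same_position = (s1 == s2).tolist()  # compares row 0 with row 0, and so on

# Different-length slices: labels 0..2 versus 0..4, so the indexes differ
s3 = df2["name"].iloc[-5:].reset_index(drop=True)
try:
    s1 == s3
    still_fails = False
except ValueError:
    still_fails = True
```

Note that "d" appears in both s1 and s2 but at different positions, so a position-by-position comparison does not report it as a match.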

The solution that worked for me was to use the "isin" method in pandas. This has 3 benefits:

1. It eliminates errors due to index mismatch and can be called in 1 line

2. It handles cases where a value is present in both sets of rows but not at the same index, e.g. the value may be at rank 20 in the first DF but at rank 25 in the second DF. This is useful for checking, for example, whether the top 10 students in year 1 are also among the top 10 in year 2

3. It allows you to compare columns in dataframes of different lengths
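A short sketch of the isin approach. The student names, the column name "student" and the slice sizes are hypothetical and chosen only to show the behaviour.

```python
import pandas as pd

# Hypothetical rankings for two years; the dataframes have different lengths
year1 = pd.DataFrame({"student": ["Ann", "Bob", "Cat", "Dan", "Eve"]})
year2 = pd.DataFrame({"student": ["Zed", "Cat", "Ann", "Fay", "Bob", "Eve", "Dan"]})

top_year1 = year1["student"].iloc[:3]  # Ann, Bob, Cat
top_year2 = year2["student"].iloc[:4]  # Zed, Cat, Ann, Fay

# True where a year 1 value appears anywhere in the year 2 slice
mask = top_year1.isin(top_year2)

# Students who are in the top group in both years
in_both = top_year1[mask].tolist()
print(in_both)
```

Keep in mind that isin tests membership rather than position: it tells you whether each value from the first column occurs somewhere in the second, not whether the two columns hold the same value in the same row.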


