Posts

ValueError: Can only compare identically-labeled Series objects

While comparing the last 100 rows of two same-named columns from two separate dataframes, I hit this error: "ValueError: Can only compare identically-labeled Series objects". My first reaction was to check the column names, which were the same. On further research I saw that the index labels of the two columns were different: selecting rows with "iloc" keeps each row's original index label, and pandas compares Series by label. One possible solution is to call reset_index when subsetting the dataframe, but that did not help in my case since my dataframes have different numbers of rows and the row indices would still be dissimilar. The solution that worked for me was the "isin" method in pandas. This has 3 benefits: 1. It eliminates any errors due to index mismatch and can be called in one line. 2. It takes into account scenarios where a value is present in the rows being compared but not at the same index, i.e. the value may be at rank 20 in the first DF but at rank 25 in the second DF. E.g. If top 10 students in year 1 ar...
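A minimal sketch of the situation described above, assuming two small made-up dataframes (df1, df2 and the "name" column are hypothetical, not the data from the post). It shows the label mismatch that triggers the error and how isin avoids it:

```python
import pandas as pd

# Hypothetical data: two frames of different length, so their index labels differ.
df1 = pd.DataFrame({"name": ["ann", "bob", "cat", "dan", "eve"]})
df2 = pd.DataFrame({"name": ["zed", "bob", "eve", "ann", "cat", "dan", "fay"]})

last1 = df1["name"].iloc[-3:]   # index labels 2, 3, 4
last2 = df2["name"].iloc[-3:]   # index labels 4, 5, 6

# Element-wise comparison needs identical labels, so this raises ValueError.
try:
    last1 == last2
    raised = False
except ValueError:
    raised = True

# isin ignores index labels and positions: it only asks whether each value
# of last1 appears anywhere in last2.
common_mask = last1.isin(last2)
common = last1[common_mask].tolist()
print(raised, common)
```

Because isin matches on values only, it answers "which values are shared?" rather than "are the rows equal position by position?", which is the question the excerpt is asking.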

check if 2 columns are equal - Pandas

After joining 2 dataframes you may have what appear to be redundant columns. It is tempting to be satisfied with a scan of the first few or last few rows, but for very large datasets you don't know what lies beneath. The pandas equals function is here to help. See code snippet below.
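The original snippet is not reproduced on this page, so here is a rough sketch of such a check, assuming two made-up frames (left, right and their columns are hypothetical) joined on a shared "id" key:

```python
import pandas as pd

# Hypothetical frames sharing a key and two same-named columns.
left = pd.DataFrame({"id": [1, 2, 3], "qty": [1, 2, 3], "price": [10.0, 20.0, 30.0]})
right = pd.DataFrame({"id": [1, 2, 3], "qty": [1, 2, 3], "price": [10.0, 20.0, 30.5]})

merged = left.merge(right, on="id", suffixes=("_x", "_y"))

# Series.equals returns a single bool: True only if every element matches
# (NaNs in the same position count as equal).
qty_same = merged["qty_x"].equals(merged["qty_y"])
price_same = merged["price_x"].equals(merged["price_y"])

# When the columns differ, a boolean mask shows which rows disagree.
diff_rows = merged[merged["price_x"] != merged["price_y"]]
print(qty_same, price_same, diff_rows["id"].tolist())
```

If equals returns True, one of the two columns can be dropped safely; if not, the mask points at the rows worth inspecting.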

Convert dataframe values to a list

<class 'pandas.core.frame.DataFrame'>
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-5-18d68697e7bd> in <module>
      1 a = df.quantile([0.25, 0.50, 0.75]);
      2 print(type(a))
----> 3 a = a.tolist()
      4 print(type(a))

~\.conda\envs\pybase\lib\site-packages\pandas\core\generic.py in __getattr__(self, name)
   5137             if self._info_axis._can_hold_identifiers_and_holds_name(name):
   5138                 return self[name]
-> 5139             return object.__getattribute__(self, name)
   5140
   5141     def __setattr__(self, name: str, value) -> None:

AttributeError: 'DataFrame' object has no attribute 'tolist'

Oftentimes we want to convert a dataframe or a column of a dataframe to a list. The pandas function tol...
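One way around the AttributeError above, sketched with a made-up numeric dataframe (df and its columns "a" and "b" are hypothetical): tolist() exists on a Series, so either pick a column first or go through the frame's underlying NumPy array.

```python
import pandas as pd

# Hypothetical numeric data standing in for the post's df.
df = pd.DataFrame({"a": [1, 2, 3, 4], "b": [10, 20, 30, 40]})

q = df.quantile([0.25, 0.50, 0.75])   # a DataFrame, so q.tolist() fails

# A single column is a Series, which does have tolist().
col_list = q["a"].tolist()

# For the whole frame, convert via the NumPy array: one inner list per row.
rows = q.to_numpy().tolist()
print(col_list)
print(rows)
```

The nested list keeps row order but drops the index and column labels, so it is worth holding on to the dataframe if those are needed later.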

TypeError: only list-like objects are allowed to be passed to isin(), you passed a [int]

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-31-29405b792f60> in <module>
      2 for yr in yr_list[:2]:
      3     print(type(yr))
----> 4     temp_df = nifty_data[nifty_data['year'].isin(yr)]
      5     print(temp_df.head())

~\.conda\envs\py36\lib\site-packages\pandas\core\series.py in isin(self, values)
   4683         Name: animal, dtype: bool
   4684         """
-> 4685         result = algorithms.isin(self, values)
   4686         return self._constructor(result, index=self.index).__finalize__(
   4687             self, method="isin"

~\.conda\envs\py36\lib\site-packages\pandas\core\algorithms.py in isin(comps, values)
    416     if not is_list_like(values):
    417         raise TypeError(
--> 418             "onl...
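The traceback shows a bare int being passed to isin inside the loop. A minimal sketch of one likely fix, using made-up stand-ins for nifty_data and yr_list (the values and the "close" column are hypothetical): wrap the scalar in a list, or use a plain equality test.

```python
import pandas as pd

# Hypothetical stand-ins for the post's nifty_data and yr_list.
nifty_data = pd.DataFrame(
    {"year": [2018, 2019, 2019, 2020], "close": [10.8, 12.1, 12.3, 13.9]}
)
yr_list = [2018, 2019, 2020]

# Passing a bare int reproduces the TypeError from the traceback.
try:
    nifty_data["year"].isin(2018)
    raised = False
except TypeError:
    raised = True

frames = {}
for yr in yr_list[:2]:
    # yr is a plain int; isin needs a list-like, so wrap it in a list.
    frames[yr] = nifty_data[nifty_data["year"].isin([yr])]

# For a single value, an equality test selects the same rows.
same_2019 = nifty_data[nifty_data["year"] == 2019]
print(raised, len(frames[2018]), len(frames[2019]))
```

isin pays off when filtering on several values at once, for example isin(yr_list[:2]); for one value the equality test is the simpler choice.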

Record of error messages when writing ML code

I'm an MBA turned data scientist who enjoys applying data science to real-world problems. Of particular interest to me are capital markets and the telecom industry (my current employer). While working on machine learning problems, I've often been stuck trying to decipher the long and scary error messages which make you wonder if something really bad has happened. Oftentimes the error is the result of an incorrect datatype or a missed comma or bracket. Google and Stack Overflow are usually helpful, but it still takes a while to figure out the solution that really works for you. So I've decided to keep a blog of the error messages that I face when writing code and how I managed to solve them. I will be writing predominantly in Python at this time. My other blogs are http://realworldml.substack.com for overall ML work, including some general principles when working on machine learning projects as well as the application of data science skills to specific domains. My other blog http:/...