The import of Prosper XML data into Nickel Steamroller is complete. The data will be updated daily at around 7 a.m. CST. Prosper offers two options for data export: full and differential. The full public download is over 500 megabytes (include the private data and it is over 1 gigabyte)! Doing daily full imports would be pretty intense, so luckily Prosper provides daily differentials, which are much smaller and much more manageable. Both the Lending Club and Prosper imports on Nickel Steamroller complete in under a minute, so the last-run timestamp on the status page indicates that the system data is current as of that time.
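To make the full-versus-differential distinction concrete, here is a minimal sketch of applying a daily differential as an upsert. Everything named here is an assumption for illustration: `apply_differential`, the `loan_id` key, and the in-memory dictionary store are hypothetical, and this is not how Prosper names its fields or how the Nickel Steamroller importer is actually written.

```python
from typing import Dict, Iterable, Tuple


def apply_differential(
    store: Dict[str, dict],
    changes: Iterable[dict],
    key: str = "loan_id",  # hypothetical identifier field, not a Prosper name
) -> Tuple[int, int]:
    """Upsert changed records into the store; return (inserted, updated)."""
    inserted = 0
    updated = 0
    for record in changes:
        record_id = record[key]
        if record_id in store:
            # Existing record: overwrite only the fields present in the change.
            store[record_id].update(record)
            updated += 1
        else:
            # New record: copy it in so later edits to `record` don't leak.
            store[record_id] = dict(record)
            inserted += 1
    return inserted, updated


# Hypothetical data: one loan already imported, two records in today's file.
store = {"A1": {"loan_id": "A1", "status": "Current"}}
daily_changes = [
    {"loan_id": "A1", "status": "Late"},
    {"loan_id": "B2", "status": "Current"},
]
print(apply_differential(store, daily_changes))  # prints (1, 1)
```

The work done is proportional to the size of the day's change file rather than the whole export, which is why a differential of tens of megabytes is so much easier to live with than a full download of a gigabyte.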
Lending Club’s export is a simple CSV file, so it’s easy for anyone to dive into. It might not be the most glamorous export, but it works well and it’s simple to read. CSV files can’t really represent relational data, though, so this is one place where Prosper’s XML export provides much more richness, and more data! Prosper also provides very detailed information on the data type of every element in their API documentation. This was very helpful when designing a data schema to hold the Prosper export, as it took the guesswork out of the data types. This is one area where Lending Club is lacking; they need more developer support documentation.
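As a rough illustration of what "relational" means here, the sketch below splits a nested XML document into two flat tables linked by an ID, something a single CSV file cannot express. The XML shape and every name in it (`Listings`, `Listing`, `Bids`, `Bid`, `Amount`, the `id` and `member` attributes) are invented for the example and are not Prosper's actual schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML; element and attribute names are made up for illustration.
SAMPLE_XML = """
<Listings>
  <Listing id="L1">
    <Amount>5000</Amount>
    <Bids>
      <Bid member="m1">50</Bid>
      <Bid member="m2">75</Bid>
    </Bids>
  </Listing>
  <Listing id="L2">
    <Amount>3000</Amount>
    <Bids>
      <Bid member="m1">25</Bid>
    </Bids>
  </Listing>
</Listings>
"""


def split_into_tables(xml_text):
    """Return (listings, bids): a parent table and a child table of rows."""
    root = ET.fromstring(xml_text.strip())
    listings = []
    bids = []
    for listing in root.findall("Listing"):
        listing_id = listing.get("id")
        listings.append(
            {"listing_id": listing_id, "amount": float(listing.findtext("Amount"))}
        )
        # Each child row carries the parent's ID as a foreign key.
        for bid in listing.findall("Bids/Bid"):
            bids.append(
                {
                    "listing_id": listing_id,
                    "member": bid.get("member"),
                    "amount": float(bid.text),
                }
            )
    return listings, bids


listings, bids = split_into_tables(SAMPLE_XML)
print(len(listings), len(bids))  # prints 2 3
```

Knowing the documented data type of each element is what lets you write conversions like `float(...)` with confidence instead of guessing from sample values.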
Prosper offers what they call a private export. Being private means you must authenticate to access the data. This private data is what you need for analysis, since it includes the loan state and the payment history. Lending Club makes this data completely open to the public, so in terms of transparency they win. Lending Club also offers a very helpful data point: the total interest paid. Prosper.com does not offer this data point, so calculating ROI can be a bit of a chore. In short, you cannot beat Prosper’s export for richness, but it’s not completely open to the public, while Lending Club’s export is open to the public and includes interest paid to date. Anyone can open Lending Club’s data in Excel, but they might have trouble with Prosper’s counterpart, as the data is all XML and also contains relations.
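When an export has payment history but no total-interest field, the total has to be rebuilt by summing the interest portion of each payment. The sketch below shows the idea under loud assumptions: the `interest_paid` and `principal_paid` field names are hypothetical, and `simple_roi` uses one deliberately naive definition (interest received divided by amount invested) that ignores defaults, fees, and time.

```python
def total_interest_paid(payments):
    """Sum the interest portion of every payment record."""
    return sum(payment["interest_paid"] for payment in payments)


def simple_roi(amount_invested, payments):
    """Naive ROI: interest received / amount invested (assumed definition)."""
    if amount_invested <= 0:
        raise ValueError("amount_invested must be positive")
    return total_interest_paid(payments) / amount_invested


# Hypothetical payment history for a single 1,000-dollar investment.
payments = [
    {"principal_paid": 80.0, "interest_paid": 12.5},
    {"principal_paid": 81.0, "interest_paid": 11.5},
]
print(total_interest_paid(payments))  # prints 24.0
print(simple_roi(1000.0, payments))
```

With a ready-made total-interest column, the first function disappears entirely, which is the convenience Lending Club’s export provides.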
Downloading the private data is really not that tricky, but as mentioned you must have an account with Prosper. Another option is to use their API, which really puts them out in front of Lending Club in terms of web service access (although they restrict what you can search; that’s another blog post). I’ve created a PHP library you can use to get an authenticated token and download data without having to log into the web site. You can find it on the Labs page. The API still requires an account on Prosper.
The Prosper daily differential and Lending Club’s export are both about 30 megabytes in size. This will grow. Lending Club will eventually need to introduce a daily differential, just like Prosper, to help third-party access scale.
One critical data point missing from Lending Club’s data is “loan re-listed”. It’s my opinion that re-listed loans are at a higher risk of default, but there is no way to determine this as the data stands today. I would like to see the loan re-list status in Lending Club’s export.
For Prosper, it’s massively important that the total interest paid be included. Since the entire point of p2p lending is to earn interest on your investment, this is a data point that must be included in all platform exports.