Helps improve the profiles of Crowdfight volunteers and classify incoming requests by scientific domain.
Crowdfight is a non-profit organization dedicated to facilitating scientific collaborations, especially those that don't emerge naturally, by finding the experts needed to complement research projects and documenting the credit due to every participant.
Its mission is to promote the advancement of science, its contribution to society, and the well-being of scientists. The NGO started at the beginning of the COVID-19 pandemic and grew to over 45,000 volunteers in just two years, so its team knows how powerful collaboration can be and what it takes to make it work on a global scale.
The purpose of this project was to improve the profiles of Crowdfight volunteers and to classify incoming requests by scientific domain, so that volunteers can be better matched with the requests from those seeking help.
Over a six-month period, we set up an automated data pipeline running in Google Cloud to replace a process previously run via Google Sheets. The pipeline pre-processes all incoming Crowdfight data and saves it to Google BigQuery. The setup also includes a data analysis notebook for assessing matching performance and an ML-ready training data set.
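To give a feel for what a pre-processing step of this kind can look like, here is a minimal sketch in Python. It is not the project's actual code: the field names (`volunteer_id`, `skills`, `submitted_at`), the timestamp format, and the assumption that timestamps are in UTC are all hypothetical, chosen only for illustration. The sketch cleans a spreadsheet-style row and serializes records as newline-delimited JSON, which is one of the formats BigQuery accepts for load jobs.

```python
import json
from datetime import datetime, timezone


def preprocess_row(raw: dict) -> dict:
    """Normalize one spreadsheet-style row into a clean, JSON-serializable record.

    All field names here are hypothetical; the real schema is not described
    in this write-up.
    """
    # Strip stray whitespace from every string cell.
    cleaned = {k: v.strip() if isinstance(v, str) else v for k, v in raw.items()}

    # Split the free-text skills cell into a de-duplicated, lowercase list.
    skills = []
    for skill in (cleaned.get("skills") or "").split(","):
        skill = skill.strip().lower()
        if skill and skill not in skills:
            skills.append(skill)

    # Parse the submission time (assumed format and assumed UTC) and
    # store it as an ISO 8601 string.
    submitted = datetime.strptime(cleaned["submitted_at"], "%d/%m/%Y %H:%M:%S")
    submitted = submitted.replace(tzinfo=timezone.utc)

    return {
        "volunteer_id": cleaned["volunteer_id"],
        "skills": skills,
        "submitted_at": submitted.isoformat(),
    }


def to_ndjson(rows: list) -> str:
    """Serialize pre-processed rows as newline-delimited JSON, one record per line."""
    return "\n".join(json.dumps(preprocess_row(row)) for row in rows)
```

Keeping the cleaning logic in a plain function like this makes it easy to test locally before wiring it to any cloud service; the actual loading into BigQuery is deliberately left out of the sketch.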
The project was completed in two phases, with our project teams consisting of the following people:
On Crowdfight’s side, we were working with Alfonso Pérez Escudero and Francisco Romero Ferrero.
To find out more details about this project, please see our blog post written by some of the project participants.