In the middle of May, we packed up our bags, our conference booth, and one USB stick with a presentation about the future of Data-Centric AI and hit the road to Boston for the 2023 BioIT World Conference & Expo. Now that they are back home, our DrugBank attendees have had some time to reflect on their experience and share with us everything they soaked up over the course of the conference.
Check out what Chris Klinger, our Scientific Support lead, and Alex Wilson, our Knowledge & Insights lead, had to say.
Chris Klinger, Scientific Support Lead, Bioinformatics
It sounds like there was a lot going on at BioIT. What were your highlights?
There were several excellent presentations on how to use AI/ML/data science to advance drug discovery efforts. The most exciting of these, in my opinion, was by Mark Brenckle from Generate Biomedicines. They use a combination of generative AI, large-scale protein purification and crystallography, and a robust tech stack to build proteins that fold into predetermined shapes.
What makes this even more interesting is that the ability to deduce a protein structure from its sequence was only really achieved in 2021. That means this group was able to build out their protocols in only about two years.
I also had a great conversation with someone who worked for an analytics software company. They emphasized the value of groupings or cohorts in data for analysis, the importance of sharing the data together with the analysis, and the need to cater to scientists with differing levels of informatics skills. It made me think about how diverse the people working in drug discovery and repurposing are, and how we must serve all end users.
Did you learn anything that surprised you?
Despite the rapid increase in the sophistication of the AI/ML tools available to the field, there was still a large focus on data availability. Talk of data silos, difficulties in finding or sharing data (due to both technical and regulatory/privacy concerns), and data accuracy were common themes throughout the entire conference.
It left me with the impression that we have been operating in a very tool- and model-centric paradigm and have finally outstripped our ability to make the most of what we have. I think that moving forward, the next big challenge to be tackled will be more practical concerns around data stewardship.
In terms of the industry’s future, patient data is both a huge challenge and a huge opportunity. On the one hand, EMRs and other kinds of patient data (history, clinical presentation, genetic background, etc.) are a potential wealth of data to understand conditions and treatment responses. On the other hand, there are (rightfully) strict legal and regulatory guidelines around how this data can be stored, shared, and used.
Some newer approaches (like federated learning) can overcome some of these challenges, but they likely aren’t enough. Even with distributed analytics approaches, the source data can still be messy and full of errors (some cannot be “corrected” because they were entered correctly but based on incorrect information from the patient themselves). It will be interesting to see how this evolves.
Did BioIT change any aspect of how you will approach your work?
BioIT really stressed to me the importance of data formats and connections. One of the biggest things top companies look for in acquisitions is interoperability, or, that is to say, how nicely your stuff plays with all of their stuff. Also, the term “multi-modal data” was thrown around a lot, which is basically a shorthand for the difficulty of analyzing different kinds of data within a single model or tool/platform. I think it will be important for us to continue to focus on building out the number of “entry points” we have to our database and the number of connections within it.
Leaving BioIT, what is the number one thing on your mind?
I keep thinking about where the industry is putting its efforts. Most of the predictive modeling is still focused on the earlier stages of drug discovery (target identification, lead compound generation/optimization). There seems to be less focus on predicting whether a given lead will make it through clinical trials. We can have the most promising lead compound in the world, but if it has hidden toxicities, poor ADMET properties, or some other unperceived issues, it still won’t result in a new drug. I wonder if we need a shift in perspective.
Alex Wilson, Team Lead, Knowledge & Insights
What were you most excited about at BioIT?
Heading to BioIT, I was looking forward to meeting our customers, discovering new ideas, and making connections with people working across the industry, and I was not disappointed!
I also had the opportunity to deliver a presentation, and I was really pleased by the reception. I was able to expand awareness of both DrugBank and data-centric methods in AI, and it was really rewarding to see my talk land.
Was there anything you found particularly surprising?
Before attending, I understood the concepts behind FAIR data, but I didn’t yet know how people were working with and experiencing this concept in their day-to-day lives. The panel discussion “The Future of Data Science in Biomedicine: New Approaches to Make FAIR a Reality” was really enlightening in this regard. I was surprised by some of the aspects of FAIR that people found challenging, and it gave me a new appreciation for the work we do at DrugBank.
It made me realize that there are some parts of our products and data that I take for granted, where we have solved problems that are still challenging some parts of the industry, and where we could help.
Were there any conversations that really left an impression on you?
Although I had so many great conversations, my favourites are always those with our customers. It’s invigorating to hear how they’re using DrugBank, to see what they’re excited about, and to discuss what can and should come next! These are the conversations that really energize me and make me excited about the impact we have.
Did BioIT change any aspect of how you will approach your work? If so, how?
One of the talks that I most enjoyed was called “Self-Driving Chemical Optimization,” and was given by Cihan Soylu of Novartis. My team works on building and refining Natural Language Processing tools to extract and structure knowledge from unstructured text resources, and even though the work we do is different, it was really interesting to see the parallels in the talk! I gained some new insights and ideas that I’m excited to share with my team so that we can do our work better.
Leaving BioIT, what is the number one thing on your mind?
The biggest thing I’m thinking about is how we can join the conversation around FAIR data.
The other thing that I’m walking away with is a deep appreciation for my colleagues. Although I work remotely full time and love it, I also really enjoyed the chance to see my fellow DrugBankers in person and to work and socialize with them. Everyone at DrugBank is so awesome! ♥️