Good morning!
Welcome to the twenty-second issue of Monday Morning Data Science from the Fred Hutch Data Science Laboratory. We are excited to show you what we have been working on (Fresh from the Lab), plus links that we think you would be interested in (Our Weekly Bookmarks Bar). Part of the purpose of this newsletter is to start conversations, so if you have a question or there is something you would like to share with us, please let us know by responding directly to this email.
Fresh from the Lab
[Welcome to Spring 2023!!!] We made it through the Dark and we're back to start fresh. Now is a great time to think about spring cleaning for your data too. Also, despite Marie Kondo having a little real-life adjustment lately (the parent in me is very smug about that), the sentiment of assessing whether a dataset still "sparks joy" and deleting it if it doesn't is still valid. Oftentimes we move on to the next big project and don't take the time to think about the ghosts of data past. Given recent NIH policy changes on data sharing and new requirements from publications for data sharing, reproducibility, and documentation of analyses, it's a great time to start thinking differently about your data stewardship skills so your future self has an easier time of it.
Luckily for you, the Data Science Lab is also ramping up and available to help you think about how you might leverage all the resources you have here at FHCC to manage your research data in a way that:
protects it from loss or corruption (all those months of work and all those reagents wasted, ack!!!),
is cost-effective (yes, we have subsidized storage, but not all storage locations cost the same!) and, most importantly,
helps you and yours do the best quality research you can as efficiently and reproducibly as possible so you can focus on the science, not the logistics of your work.
We're beginning to develop some guidance around data management and stewardship in the Data Science Lab portion of SciWiki. You can read more about where to store what and what tools we have to move data around in the Scientific Computing Data Storage section of SciWiki. Also, at any time you can schedule a Data House Call to talk about:
best ways to plan your lab's data management scheme,
ways to move data to where they should be, and
ways to adjust how you do computing and access data that will inherently set up good data practices with the side effect of helping you do more reproducible analyses!!
Remember that you also always have support from Scientific Computing if you need to phone a friend to help you move and verify data, or to set up credentials for the cloud if your lab has not yet fully shifted to storing your larger (and all genomics) data in AWS S3, where IT supports PI buckets. They have office hours every Wednesday from 10 a.m. to 12 p.m. on Teams, or you can describe your needs in an email to scicomp@fredhutch.org.
[New Regime for Data House Calls] Given the diversity of topics folks have brought to Data House Calls, and DaSL's desire to meet people where they are (in a hybrid and time-aware sort of way), we've shifted the format to ensure that everyone has some focused time to discuss their specific questions. If you have questions about anything data-related, from here on out you can, at any time, schedule a Data House Call via the link above.
Our Weekly Bookmarks Bar




As always, you can contact us by replying directly to this email, by emailing Jeff Leek, Amy Paguirigan, and Sean Kross at data@fredhutch.org, or by joining us on the Fred Hutch Data Slack Workspace. For more information about the Fred Hutch Data Science Lab, visit our website: https://hutchdatascience.org/. See you next week!
- The Fred Hutch Data Science Laboratory