3 Comments
en zyme

The myriad ways of organizing data are confusing. The articles you've linked to are good reads. However, designing the data structures for a given industry is more than an afternoon's work, and it feels far removed from the day-to-day of biotech. Any strategies to avoid "paralysis by analysis"?

https://miro.medium.com/max/720/0*h60AcWEOy-5Qdmr2

https://www.sqlshack.com/wp-content/uploads/2018/05/word-image-281.png

Jesse Johnson

That's a great point: schema/ontology design is often much more complex and difficult than infrastructure, and it frequently runs into issues with politics and personal preferences. I have some thoughts on this that I might write about in the future, but they're more than I want to put in the comments section :)

en zyme

For storage options, I like to consider capacity, cost, convenience, and latency. Over the years there have been many expensive high-tech solutions, such as tape libraries and data closets.

The ETL vs. ELT analysis you mentioned is a good place to start. Understanding scale and scope in advance is hard, so it's important to leverage lessons learned. Data lakes and graph databases require an understanding of the broader objectives, significant planning, and a commitment of resources. Biologists grapple with layers spanning the biochemical, cellular, organ, system, and behavioural levels. A haphazard storage strategy will be as temperamental as a hyena and as sluggish as, well, as sluggish as a slug.

https://media.sciencephoto.com/image/c0049078/400wm/C0049078-Computer_Tape_Library.jpg

https://images.computerhistory.org/revonline/images/500004392-03-01.jpg
