The myriad ways of organizing data are confusing. The articles you've linked to are good reads. However, designing the data structures for a given industry is more than an afternoon's work. It feels far removed from the day-to-day of biotech. Any strategies to avoid "paralysis by analysis"?
That's a great point - schema/ontology design is often much more complex and difficult than infrastructure, and it often runs into issues with politics and personal preferences. I have some thoughts on this that I might write about in the future, but they're more than I want to put in the comments section :)
For storage options, I like to consider capacity, cost, convenience, and latency. Over the years there have been many expensive high-tech solutions, such as tape libraries and data closets.
The ETL vs ELT analysis you mentioned is a good place to start. Understanding scale and scope is hard to do in advance, so it's important to leverage lessons learned. Data lakes and graph databases require an understanding of the broader objectives, significant planning, and a commitment of resources. Biologists grapple with the layering of biochemical, cellular, organ, system, and behavioural levels. A haphazard storage strategy will be as temperamental as a hyena and as sluggish as, well, as sluggish as a slug.
https://miro.medium.com/max/720/0*h60AcWEOy-5Qdmr2
https://www.sqlshack.com/wp-content/uploads/2018/05/word-image-281.png
https://media.sciencephoto.com/image/c0049078/400wm/C0049078-Computer_Tape_Library.jpg
https://images.computerhistory.org/revonline/images/500004392-03-01.jpg