Database vs Data Warehouse vs Data Lake
Last updated on March 13, 2021
Database
Stores relational tables, typically with a defined schema (e.g. a table storing employee info: its fields won't typically change, because we always define employees the same way)
Optimized to deal with transactions, for example editing a table to add a new user.
Is not optimized to perform data analytics
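As a rough illustration of the points above, here is a minimal sketch using Python's built-in `sqlite3` module. The `employees` table, its columns, and the `add_employee` helper are assumptions made up for this example; the point is only that the schema is fixed up front and that adding a row is a small transaction.

```python
import sqlite3

# In-memory database; the table and column names are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees ("
    "id INTEGER PRIMARY KEY, name TEXT NOT NULL, title TEXT NOT NULL)"
)


def add_employee(conn, name, title):
    """Insert one employee inside a transaction and return the new row id."""
    # "with conn" commits if the block succeeds and rolls back if it raises.
    with conn:
        cur = conn.execute(
            "INSERT INTO employees (name, title) VALUES (?, ?)", (name, title)
        )
    return cur.lastrowid


new_id = add_employee(conn, "Ada", "Engineer")
row = conn.execute(
    "SELECT id, name, title FROM employees WHERE id = ?", (new_id,)
).fetchone()
```

A row that does not fit the schema (for example, a missing name) is rejected and the table is left unchanged, which is the kind of guarantee a transactional database is built around.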
Data Warehouse
Sits on top of several databases, consumes data from them, and provides a single interface for querying that data
Is optimized for data analytics
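The idea can be sketched in a few lines, assuming two small source databases with an identical `orders` table. The names `store_a`, `store_b`, `all_orders`, and `make_source` are hypothetical, and SQLite only stands in for a warehouse here; real warehouse engines differ considerably.

```python
import sqlite3


def make_source(orders):
    """Build a stand-in operational database (schema is an assumption)."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
    db.executemany("INSERT INTO orders VALUES (?, ?)", orders)
    db.commit()
    return db


# Two separate databases, e.g. one per store.
sources = {
    "store_a": make_source([("ann", 10.0), ("bob", 5.0)]),
    "store_b": make_source([("ann", 7.5)]),
}

# The "warehouse": one place that holds data pulled from every source.
warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE all_orders (source TEXT, customer TEXT, amount REAL)"
)

for name, db in sources.items():
    rows = db.execute("SELECT customer, amount FROM orders").fetchall()
    with warehouse:  # commit each source's load as one transaction
        warehouse.executemany(
            "INSERT INTO all_orders VALUES (?, ?, ?)",
            [(name, customer, amount) for customer, amount in rows],
        )

# An analytical query that spans all sources through a single interface.
totals = warehouse.execute(
    "SELECT customer, SUM(amount) FROM all_orders "
    "GROUP BY customer ORDER BY customer"
).fetchall()
```

Here `totals` holds one row per customer with spend summed across both stores, a question that neither source database could answer on its own.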
Data Lake
A centralized repository that stores any kind of structured or unstructured data without imposing any structure (schema) on it
Basically you throw whatever you want into the data lake and just leave it there for safekeeping. Because you don't have any schemas, you can't really extract, transform, load (ETL) the data for analysis. What you do instead is export the data in its raw form and define its structure afterwards (often called schema-on-read).
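A minimal sketch of that workflow, assuming a temporary directory stands in for the lake's storage. The file names, the record fields, and the `read_clicks` helper are all hypothetical; the only point is that nothing is validated on write and the structure is chosen on read.

```python
import json
import tempfile
from pathlib import Path

# A temporary directory stands in for the lake's storage (an assumption).
lake = Path(tempfile.mkdtemp())

# Write side: store whatever arrives, untouched, with no schema check.
(lake / "events.jsonl").write_text(
    '{"user": "ann", "clicks": "3"}\n{"user": "bob", "page": "/home"}\n'
)
(lake / "notes.txt").write_text("free-form text is fine too\n")


def read_clicks(path):
    """Read raw JSON lines and impose a structure only now."""
    records = []
    for line in path.read_text().splitlines():
        raw = json.loads(line)
        records.append(
            {
                "user": str(raw["user"]),
                # Missing or string-typed values are handled at read time.
                "clicks": int(raw.get("clicks", 0)),
            }
        )
    return records


clicks = read_clicks(lake / "events.jsonl")
```

The two raw records do not share the same fields, and the lake accepted both. A different reader could later pull `page` out of the same file without anything having to be reloaded.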