If you’re a newbie in the analytics world, you can easily get overwhelmed by the choices you have in front of you as you try to get up and running. This is particularly true when you talk about choosing a technology that works best for your needs. For example, the importance of choosing the right database cannot be overemphasized. And there are four questions you need to answer before you decide on any option.
What results do you want?
Unpredictable queries are the most common cause of database issues. With no queries on your database, your performance metrics would be excellent. Making queries predictable means avoiding as much on-the-spot computation as possible. You do this by studying what your users are likely to ask for, then computing and storing those answers before anyone needs them. If you can consistently have the answers on hand, a database with simple lookup semantics, such as a key-value store, will be a good fit.
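To make the idea concrete, here is a minimal sketch of answering a question ahead of time so that the query itself becomes a plain lookup. The event data and the function names (precompute_page_views, views_for) are made up for illustration, and an ordinary Python dictionary stands in for the database.

```python
from collections import defaultdict

# Hypothetical raw events: (page, visitor_id) pairs.
events = [
    ("home", "u1"),
    ("home", "u2"),
    ("pricing", "u1"),
    ("home", "u1"),
]

def precompute_page_views(events):
    """Build the answers ahead of time: page -> total view count."""
    counts = defaultdict(int)
    for page, _visitor in events:
        counts[page] += 1
    return dict(counts)

# Computed once, when the data is loaded, not when a user asks.
page_views = precompute_page_views(events)

def views_for(page):
    """Query time is a single key lookup, with no scan over raw events."""
    return page_views.get(page, 0)
```

The cost of counting is paid once at load time, so every later call to views_for does the same small amount of work no matter how many raw events exist.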
What data model works best for your needs?
Databases are built for specific purposes. Each one will always be great at some things and not so great at others. Once you know the results you want and how your data will be accessed, it becomes easier to choose the right database for you.
What are your memory requirements to maintain optimal performance?
Keep in mind that while disks are fast, they’re no match for memory. Networks are gaining speed, too. When disk access is involved, the secret to great performance is making disk reads predictable. Take note, though, that memory is not only faster than disk but also more expensive. If you’re dealing with a small data set (or if you don’t mind paying more for memory that can accommodate bigger data sets), then skipping the disk and serving data straight from memory can boost performance.
How well can you balance reads and writes?
Finally, it’s worth noting that engineers tend to forget about the data loading process. This typically causes bottlenecks: the system becomes so busy handling writes that reads suffer. The good news is that writes are more predictable than reads. It’s often better to issue large but occasional sets of writes than small and frequent ones.
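Here is one way batched writes might look, using Python's built-in sqlite3 module purely as an example database. The events table, the load_in_batches helper, and the batch size of 1000 are assumptions made for this sketch, not recommendations for any particular system.

```python
import sqlite3

def load_in_batches(conn, rows, batch_size=1000):
    """Insert rows in large, occasional batches instead of one at a time."""
    batches = 0
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        with conn:  # one transaction, and so one commit, per batch
            conn.executemany(
                "INSERT INTO events (name, value) VALUES (?, ?)", batch
            )
        batches += 1
    return batches

# In-memory database and made-up rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (name TEXT, value INTEGER)")
rows = [(f"event-{i}", i) for i in range(2500)]
batches = load_in_batches(conn, rows)
```

With 2500 rows and a batch size of 1000, the database sees three commits rather than 2500, which leaves it more time to serve reads between loads.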
Another method that works is using a queue to buffer writes as a way to throttle throughput. In extreme cases, having two database clusters, one for the writes and the other for the reads, would be smart. The built-in replication between the clusters can serve as a buffer or throttling system.
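The queue approach can be sketched with Python's standard queue and threading modules. Everything here is illustrative: the stored list stands in for the database, and the limit of 100 pending writes is an arbitrary assumption.

```python
import queue
import threading

write_queue = queue.Queue(maxsize=100)  # bounded: producers block when it is full
stored = []  # hypothetical stand-in for the database

def writer():
    """Drain the queue at the database's own pace."""
    while True:
        item = write_queue.get()
        if item is None:  # sentinel value: stop the worker
            write_queue.task_done()
            break
        stored.append(item)  # a real system would issue the write here
        write_queue.task_done()

worker = threading.Thread(target=writer, daemon=True)
worker.start()

for i in range(500):
    write_queue.put(i)  # blocks if the writer falls 100 items behind

write_queue.put(None)  # tell the worker to stop
write_queue.join()     # wait until every queued write has been applied
worker.join()
```

Because the queue is bounded, a burst of incoming writes slows the producers down instead of overwhelming the database, which is the throttling effect described above.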
Visit http://www.youtube.com/watch?v=8L1OVkw2ZQ8 for more tips.