Search Results


37 matches found for 'indexing'

Local Secondary Index vs. Global Secondary Index

Secondary Index A secondary index is used in databases to help speed up queries when we want to grab data from popular columns or if we want to do some type of key range lookup efficiently. Secondary indices are used in relational databases (e.


RDBMS Indexing

Introduction As illustrated in this article, indexing is one of the easiest and most effective tweaks you can add to your SQL database. However, indexing might seem like magic, and you might also not be too sure which field to index in the first place.


Atomic operations with Elasticsearch

... of aliases here is that you can insert the documents into a separate index, and when the indexing operation is complete, you can simply map the alias to that new index, and it will near-instantly point to that index.


RDBMS Optimization

Indexing Probably the easiest tweak to implement. It can usually be done with one SQL command. However, an index should be made based on a good column. For example, if you are frequently querying your rows by timestamp, then the timestamp can be chosen for an index.
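
As a rough illustration of the timestamp example above, the sketch below uses Python's built-in sqlite3 module. The table name `events`, the column `ts`, and the index name `idx_events_ts` are made up for this example and are not taken from the article.

```python
import sqlite3

# In-memory database with a hypothetical "events" table (names are assumptions).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, ts INTEGER, payload TEXT)")
conn.executemany(
    "INSERT INTO events (ts, payload) VALUES (?, ?)",
    [(t, f"event-{t}") for t in range(1000)],
)

# The single SQL command that adds the index on the frequently queried column.
conn.execute("CREATE INDEX idx_events_ts ON events (ts)")

# Ask SQLite how it plans to run a range query on the timestamp.
query = "SELECT * FROM events WHERE ts >= ? AND ts < ?"
plan = conn.execute("EXPLAIN QUERY PLAN " + query, (100, 200)).fetchall()
plan_text = " ".join(row[-1] for row in plan)  # last column holds the plan detail
rows = conn.execute(query, (100, 200)).fetchall()

print(plan_text)
print(len(rows))
```

The printed plan should mention the index by name when the planner decides to use it; the exact wording of the plan differs between SQLite versions.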


Phone Number Mnemonics in Python

Recursion is a great way to come up with permutations of strings or elements. Not necessarily because of the performance (iterative styles are usually faster) but because of simplicity and organization - like many algorithms with functional programming languages like Scala, recursive functions in general look neater, nicer, and easier to categorize.


Queue Reconstruction by Height

Problem Suppose you have a random list of people standing in a queue. Each person is described by a pair of integers (h, k), where h is the height of the person and k is the number of people in front of this person who have a height greater than or equal to h.


Random Permutation

Problem Given an array of integers, and an integer index \(k\), randomly permute the array up to index \(k\). Input \(A\) - Array of integers \(k\) - integer index representing the indices to permute up to Approach Key insights into random permutation with arrays: If only k elements are permuted in an array of size n, the algorithm for random permutation should only cost k in time complexity.
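
One way to meet the "cost k" insight is a partial Fisher-Yates shuffle. The sketch below assumes that "permute up to index k" means the first k positions should hold a uniformly random selection and ordering of the array's elements; the function name `permute_prefix` and the `rng` parameter are illustrative, not from the article.

```python
import random


def permute_prefix(values, k, rng=random):
    """Shuffle in place so that values[:k] is a uniform random k-arrangement.

    Only k swaps are performed, so the cost is O(k) regardless of len(values).
    """
    n = len(values)
    if not 0 <= k <= n:
        raise ValueError("k must be between 0 and len(values)")
    for i in range(k):
        # Pick any not-yet-fixed position (i included) and move it into slot i.
        j = rng.randrange(i, n)
        values[i], values[j] = values[j], values[i]
    return values


data = list(range(10))
permute_prefix(data, 3, random.Random(42))
print(data[:3])
```

Passing a seeded `random.Random` instance makes the result repeatable, which is convenient for testing.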


Cyclic Permutation

This is an array/list trick that saves about O(n) in space complexity by taking advantage of a given array with integers representing indices. For this example problem, suppose that we want to apply some permutation P to an array A.
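
A possible sketch of the trick follows, under the assumption that P[i] gives the destination index of A[i]; the excerpt does not state the convention, so treat that as a guess. Note that this version consumes P, leaving it as the identity permutation, which is what lets it avoid an O(n) scratch array.

```python
def apply_permutation(a, perm):
    """Move a[i] to index perm[i] for every i, using O(1) extra space.

    Assumption: perm is a valid permutation of range(len(a)).
    Side effect: perm is rewritten into the identity permutation.
    """
    for i in range(len(a)):
        # Keep swapping until the element that belongs at i has arrived.
        while perm[i] != i:
            target = perm[i]
            a[i], a[target] = a[target], a[i]
            perm[i], perm[target] = perm[target], perm[i]
    return a


letters = ["a", "b", "c"]
apply_permutation(letters, [2, 0, 1])
print(letters)  # ['b', 'c', 'a']
```

Each swap places at least one element in its final position, so the total work stays linear in the array size.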


Merge k sorted linked lists

Problem Merge \(k\) sorted linked lists and return it as one sorted list. Analyze and describe its complexity. Example: Input: [ 1->4->5, 1->3->4, 2->6 ] Output: 1->1->2->3->4->4->5->6 Input lists - an array of linked list pointers # Definition for singly-linked list.


Reversing sublists of singly linked lists

Singly linked lists can be tricky in their own right. Without the convenience of a pointer to the previous node, which doubly linked lists provide, it is easy to waste performance when working with singly linked lists.


Design Concepts

In this article, I want to go over some fundamental design concepts that are useful for coming up with system design. Requirements Functional Requirements Describes specific behaviors i.e. If a URL is generated, it is composed of a Base64 encoded alias Non-functional Requirements Describes architectural requirements i.


Algorithm Handbook

... heap based sorting algorithm which is typically implemented with arrays and indexing. This has potential issues if the dataset is gigantic and the array is allocated statically - priority queues are used as an alternative.


Find the duplicate number

Problem Given an array nums containing \(n + 1\) integers where each integer is between \(1\) and \(n\) (inclusive), prove that at least one duplicate number must exist. Assume that there is only one duplicate number, find the duplicate one.


Search in a rotated sorted array

Problem Suppose an array sorted in ascending order is rotated at some pivot unknown to you beforehand.


B-Trees vs. LSM Trees

... data structures that are popular in database usage, most notably SQL databases. With a B-Tree indexing structure, data is written onto the disk in fixed size page segments. These page segments are often about 4 KB in size, and have key value pairs sorted by the key.


Buy and Sell Two Stocks

Problem Given an array of integers representing stock prices, with each index representing a different day, find the maximum profit by selling at most two stocks. The buy/sell dates must not overlap.


Python String Tricks - Performance considerations

If a string is changed to become bigger, consider trying to find the maximum size of the final string early on in a quick pass (strive for \(O(n)\)) Consider working from the tail of the string and iterating backwards if, say, 1 character needs to be replaced with 2.


String Manipulation in Python 3+

There are a few ways to search for a string in Python (3+): String operations str.find(sub[, start[, end]]) "shadow walker".find('w') => 5 This gives you the index of the first match. While the index might be nice to have, do remember that strings are immutable.
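
A small sketch of `str.find` in use, including the optional start argument and the not-found case. The variable names are illustrative only.

```python
text = "shadow walker"

first_w = text.find("w")                # 5: index of the first match
second_w = text.find("w", first_w + 1)  # 7: search resumes after the first match
missing = text.find("z")                # -1: find() signals "not found" with -1

# Strings are immutable, so "changing" a character means building a new string.
capitalized = text[:first_w] + "W" + text[first_w + 1:]

print(first_w, second_w, missing, capitalized)
```

Because `find` returns -1 rather than raising, the result should be checked before it is used as a slice index.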


Data stores in Software Architectures

... the few challenges we would have with scaling. We could scale the SQL using techniques such as indexing, federation, data denormalization (to have less join queries), and moving some data out to another data source (i.


Bloom Filter

A set data structure uses a hashing function to store values and to verify if a value exists. Bloom filters are similar in that they use multiple hashing functions to store values and to verify if a value exists.
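
To make the "multiple hashing functions" idea concrete, here is a minimal sketch. It derives several hash positions by salting SHA-256 with a seed; that choice, the class name, and the default sizes are assumptions for illustration, not the article's implementation.

```python
import hashlib


class BloomFilter:
    def __init__(self, size_bits=1024, num_hashes=3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        # One byte per "bit" keeps the sketch simple; a real filter would pack bits.
        self.bits = bytearray(size_bits)

    def _positions(self, value):
        # Simulate several hash functions by salting one hash with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{value}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, value):
        for pos in self._positions(value):
            self.bits[pos] = 1

    def might_contain(self, value):
        # False means definitely absent; True means possibly present.
        return all(self.bits[pos] for pos in self._positions(value))


bloom = BloomFilter()
bloom.add("alice")
print(bloom.might_contain("alice"))
```

A value that was added is always reported as possibly present; a value that was never added is usually, but not always, reported as absent.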


Coin Change Denominations

Problem Given an integer amount of money and an array of integer coins, calculate the total number of ways to make change. Approach Suppose that we want to find the total number of denominations of 5 cents, using any U.


Replace all occurrences of a space inside an array

Problem Given an array of characters, replace all occurrences of a space with another string. Assume that the array has enough space to contain all of the replacements. Input A - An array of characters which may or may not have spaces word - The string to replace spaces with.


Data Sharding: Twitter Posts

Scenario Let's begin with a Twitter-like service that allows you to tweet new posts. The service has very high read and write traffic; we'll say ~10k read TPS (transactions per second) for starters.


NumPy vs. Pandas, and other flavors (Dask, Modin, Ray)

NumPy NumPy is a Python library for numerical computing that offers multi-dimensional arrays and indices as data structures and additional high-level math utilities. ndarray The unique offering of NumPy is the ndarray data structure, which stands for n-dimensional array.


Big Data Processing: Batching vs. Streaming

... data analysts may not pull every column in a query, these databases are also read-optimized by indexing singular columns, rather than rows with every single column. Data Warehouses and SQL databases are similar in that they use the same query interface (SQL).


Next Permutation

Problem Implement next permutation, which rearranges numbers into the lexicographically next greater permutation of numbers. If such arrangement is not possible, it must rearrange it as the lowest possible order (ie, sorted in ascending order).


Rotate a 2D Matrix



DataFrames (a software engineer's perspective)

What is a DataFrame? A DataFrame is a special data structure used primarily by data scientists and machine learning algorithms. It contains row and column data in tabular fashion by storing metadata about each column and row.


Distributed scaling with Relational Databases

... optimizations we talked about. For general optimizations: Apply techniques such as secondary indexing, federation, data denormalization. Cache as much as you can - it's not just there for fast lookup times, but also to reduce the number of read requests going to your database For higher read throughput: Use RDBMS replications (single-leader or multi-leader) Most RDBMS replications are done asynchronously, which means reads will be eventually consistent Use single-leader for replications within the same data center Use multi-leader for replications across different data centers For higher read throughput with strong consistency reads: Use single-leader replication with consensus algorithms to vote on the total ordering of operations i.


Build a Trie in Python

Problem In computer science, a trie (pronounced "try"), also called digital tree, radix tree or prefix tree, is a kind of search tree: an ordered tree data structure used to store a dynamic set or associative array where the keys are usually strings.


Javascript Essentials

Hoisting Hoisting is JavaScript's default behavior of moving declarations to the top. Given the following JavaScript code, what is the expected output, and why? fcn2(); fcn1(); function fcn2(){ alert("hi"); } var fcn1 = function(){ alert("hey") } The expected output is a pop-up alert that says "hi", followed by a TypeError: only the declaration of fcn1 is hoisted, so it is still undefined when it is called.


Kefir.js - Reactive Javascript

Background Kefir.js is a Reactive Programming library for JavaScript inspired by Bacon.js and RxJS, with focus on high performance and low memory usage. Kefir works with objects called observables. observables could be two things; a stream, or a property (not to be confused with a Javascript object property) Streams A stream is a sequence of events made available over time.


Scaling Instagram Infrastructure

Notes Sending notifications to a person whose photo you liked: RabbitMQ -> Celery Django / Python for web server / application PostgreSQL to store users, medias, friendships, etc. Master with multiple replicas, where reads happen on replicas (Master-Slave Replication) To deal with increased latency with writes, by batching requests wherever possible Replication lag from Master to slave replicas was not a big issue (for them) Cassandra NoSQL (wide column store) to store user feeds, activities, etc.


Compute the max. water trapped by a pair of vertical lines

Problem An array of integers naturally defines a set of lines parallel to the Y-axis, starting from x = 0 as illustrated in the figure above. The goal of this problem is to find the pair of lines that together with the X-axis "trap" the most water.


Summation with rationals

Given some lower bound a = na/da and upper bound b = nb/db, write a function that calculates the summation of f(nk/dk), where k is the index, n represents numerator, and d represents denominator. An example: if a = 1/2 and b = 4/5, then you should calculate f(1/2)+f(2/3)+f(2/4)+f(3/4)+f(3/5)+f(4/5) def sum (f: Rational => Rational)(a: Rational, b: Rational) : Rational = { val k = a val total = new Rational(0, 1) def innerSum (f: Rational => Rational) ( k: Rational, total : Rational) : Rational = { if (a <= k && k <= b && k.


Decode number of ways

Problem A message containing letters from A-Z can be encoded into numbers using the following mapping: 'A' -> "1" 'B' -> "2" ... 'Z' -> "26" To decode an encoded message, all the digits must be grouped then mapped back into letters using the reverse of the mapping above (there may be multiple ways).


Build a Binary Tree with Pre-order and In-order traversal data

Problem Given a pre-order traversal array of integers and an in-order traversal array of integers, construct a binary tree. Input preorder - array of integers inorder - array of integers # Definition for a binary tree node.