Sections:
Several of the sections discuss functions and operators that work differently in different programming languages: e.g. '/' (division), '%' (modulo), '..' and 'range', 'round', 'reduce right', 'atan2', 'log', 'is upper', and 'truthy'/'falsy'. One motivation for the programming language cheat sheets on this website was to comprehensively cover these distinctions, and to use warning labels to help prevent misuse.
The confusion: '/' works differently in different programming languages.
The confusion: in some languages, '/' works differently based on whether the input values are ints or floats.
The confusion: 'integer division' is ambiguous: is it truncated division or floor division?
In programming there are 3 common forms of division:
true division e.g.:
9 / 2 = 4.5
-9 / 2 = -4.5
9 / -2 = -4.5
-9 / -2 = 4.5
truncated division (true divide, then round towards zero) e.g.:
9 / 2 = 4
-9 / 2 = -4
9 / -2 = -4
-9 / -2 = 4
floor division (true divide, then round down towards -infinity) e.g.:
9 / 2 = 4
-9 / 2 = -5
9 / -2 = -5
-9 / -2 = 4
Let's test the division operator / in some common programming languages:
C++:
-9 / 2 = -4 //truncated division
-9.0 / 2.0 = -4.5 //true division
JavaScript:
-9 / 2 = -4.5 //true division
-9.0 / 2.0 = -4.5 //true division
Python:
-9 / 2 = -4.5 #true division (Python 3 onwards)
-9.0 / 2.0 = -4.5 #true division
Let's show how to achieve each type of division in some common programming languages:
true division:
C++: n1 / (double)n2
JavaScript: n1 / n2
Python: n1 / n2 #Python 3 onwards
truncated division:
C++: trunc(n1 / (double)n2)
JavaScript: Math.trunc(n1 / n2)
Python: int(n1 / n2) #Python 3 onwards
floor division:
C++: floor(n1 / (double)n2)
JavaScript: Math.floor(n1 / n2)
Python: n1 // n2
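The three forms can be demonstrated in a few lines of Python. This is a minimal sketch; the function names true_div, trunc_div and floor_div are illustrative, not standard library names.

```python
import math

def true_div(n1, n2):
    # true division: the exact quotient, as a float
    return n1 / n2

def trunc_div(n1, n2):
    # truncated division: true divide, then round towards zero
    return math.trunc(n1 / n2)

def floor_div(n1, n2):
    # floor division: round towards -infinity (Python's // operator)
    return n1 // n2

for a, n in [(9, 2), (-9, 2), (9, -2), (-9, -2)]:
    print(a, n, true_div(a, n), trunc_div(a, n), floor_div(a, n))
```

Note that trunc_div and floor_div only differ when the true quotient is negative.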
Author's view I would recommend that programming languages have one operator for each type of division: Perhaps something like this:
true division:
n1 / n2
truncated division (symbol borrowed from Dart):
n1 ~/ n2
floor division (symbol borrowed from Python):
n1 // n2
Notes on the / operator symbol
WARNING: In many languages, / does truncated division if both values are ints, and true division if at least one value is a float.
WARNING: In everyday use, and in many common programming languages, / is true division. Thus it is easy to forget that in many common programming languages, / applied to ints is truncated division. For example, this array: [1/2, 1/3, 1/4] is equivalent to [0, 0, 0] in many languages.
Notes on truncated division versus floor division and 'negative / positive'
WARNING: In truncated division, positive / positive gives positive (or zero), and negative / positive gives negative (or zero). E.g. -1 / 2 = 0.
In floor division, positive / positive gives positive (or zero), but negative / positive gives negative, and never gives zero! E.g. -1 / 2 = -1.
So, truncated division has the potential disadvantage that a negative number can become zero.
E.g. I pass n/2 to a function, to do n/2 iterations. The function throws if it receives a negative value. If n is -1...
With floor division, n/2 remains negative, causing the function to throw. Good, we get an explicit error.
With truncated division, n/2 is zero, causing the function to operate as though n was 0, even though n was negative: an invalid input evaded detection. A guard clause is needed.
Notes on the utility of truncated versus floor division
In my personal use, I generally find truncated division more useful. E.g. if I have found the difference between 2 dates to be 90 seconds, or -90 seconds... I then divide by 60 to get the difference in complete minutes. That gives 1 and -1 minutes respectively. I.e. truncated division is useful because it rounds towards zero. In this case floor division would give a non-useful answer (1 and -2 minutes respectively).
Links:
Operators and Symbols - Programming Language Cheat Sheets - Bazzle
Mathematics - Programming Language Cheat Sheets - Bazzle
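The date-difference example can be sketched in Python; complete_minutes is a hypothetical helper name for illustration.

```python
import math

def complete_minutes(seconds):
    # truncated division rounds towards zero:
    # 90 s -> 1 complete minute, -90 s -> -1 complete minute
    return math.trunc(seconds / 60)

print(complete_minutes(90))    # 1
print(complete_minutes(-90))   # -1
print(-90 // 60)               # floor division gives -2 instead
```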
The confusion: % works differently in different programming languages.
The confusion: truncated/floor modulo handle negative input values differently.
The confusion: when dividing, what should the remainder be when using negative numbers?
A modulo operator (or function) returns the remainder of a division:
E.g. 10 / 6 = 1 rem 4
Thus 10 % 6 = 4
I.e. dividend / divisor = quotient rem remainder
I.e. dividend % divisor = remainder
Formulas for calculating the remainder can vary. In particular, how negative input values are handled can vary. In this section we cover truncated modulo and floor modulo.
In programming there are 2 common forms of modulo:
truncated modulo e.g. (sign of output depends on 1st input value):
10 % 6 = 4
-10 % 6 = -4
10 % -6 = 4
-10 % -6 = -4
4 % 3 = 1
-4 % 3 = -1
4 % -3 = 1
-4 % -3 = -1
floor modulo e.g. (sign of output depends on 2nd input value):
10 % 6 = 4
-10 % 6 = 2
10 % -6 = -2
-10 % -6 = -4
4 % 3 = 1
-4 % 3 = 2
4 % -3 = -2
-4 % -3 = -1
Let's test the modulo operator % in some common programming languages:
C++: -10 % 6 = -4 //truncated modulo
JavaScript: -10 % 6 = -4 //truncated modulo
Python: -10 % 6 = 2 #floor modulo
Let's show how to achieve each type of modulo calculation in some common programming languages:
truncated modulo:
C++: n1 % n2
JavaScript: n1 % n2
Python: math.fmod(n1, n2)
floor modulo:
C++: n1 - n2*floor(n1 / (double)n2)
JavaScript: n1 - n2*Math.floor(n1 / n2)
Python: n1 % n2
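Both forms can be written from the general identity r = a - n*q, with q truncated or floored respectively. A minimal Python sketch (trunc_mod and floor_mod are illustrative names):

```python
import math

def trunc_mod(a, n):
    # remainder whose sign follows the dividend a
    return a - n * math.trunc(a / n)

def floor_mod(a, n):
    # remainder whose sign follows the divisor n (Python's % for ints)
    return a - n * math.floor(a / n)

print(trunc_mod(-10, 6))   # -4
print(floor_mod(-10, 6))   # 2
print(math.fmod(-10, 6))   # -4.0 (fmod is truncated modulo, float result)
```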
Author's view I would recommend that programming languages have one operator for each type of modulo operation: Perhaps something like this:
truncated modulo:
n1 % n2
floor modulo (symbol borrowed from R):
n1 %% n2
I would also recommend that programming languages have one method/function for each type of modulo operation: Perhaps something like this:
truncated modulo:
TruncMod(n1, n2) //or perhaps just Mod(n1, n2)
floor modulo:
FloorMod(n1, n2)
When programming languages have operators/functions in common, this makes it easier to translate code, without having to modify the structure of formulas. I.e. rearranging formulas risks introducing serious bugs that would have been easily preventable.
Notes on the signs of output values
For truncated modulo (the sign is based on the dividend, i.e. the 1st value):
pos % pos = pos (or zero)
neg % pos = neg (or zero)
pos % neg = pos (or zero)
neg % neg = neg (or zero)
For floor modulo (the sign is based on the divisor, i.e. the 2nd value; this is typically more useful):
pos % pos = pos (or zero)
neg % pos = pos (or zero) [differs from truncated modulo]
pos % neg = neg (or zero) [differs from truncated modulo]
neg % neg = neg (or zero)
Notes on the utility of truncated versus floor modulo
In my personal use, I generally find floor modulo much more useful. E.g. if I have an array of items, e.g. the letters A-Z at indexes 0-25... If I am at item 0 (A), and want the indexes of the items 3 earlier and 3 later... I can calculate these via floor mod:
left = floormod(i-3, 26) = 23
right = floormod(i+3, 26) = 3
This is more complicated via truncated mod:
left = truncmod(i+26-3, 26) = 23
right = truncmod(i+3, 26) = 3
With truncated mod, I have to ensure that the dividend is positive.
Almost always, I am working with 'dividend % divisor'... where the dividend could be positive or negative, the divisor is positive, and the desired result is 0 or positive.
Note: an alternative is Euclidean modulo. This always gives a positive (or zero) result, whatever the input values.
There is a use case where truncated modulo is preferable: retrieving the fractional part of a floating-point number.
E.g. retrieve fractional part via truncated modulo:
5.9 % 1 = 0.9
-5.9 % 1 = -0.9
Note: using floor modulo would give this:
5.9 % 1 = 0.9
-5.9 % 1 = 0.1
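The circular-array example can be sketched in Python, where % is floor modulo; neighbours is a hypothetical helper name.

```python
def neighbours(i, offset, size=26):
    # floor modulo keeps indexes in 0..size-1,
    # even when i - offset goes negative
    return (i - offset) % size, (i + offset) % size

# at item 0 (A), the items 3 earlier and 3 later:
print(neighbours(0, 3))   # (23, 3)
# wrapping past Z also works:
print(neighbours(25, 3))  # (22, 2)
```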
The confusion: what are the most common modulo definitions, and how do they define the quotient and the remainder?
The confusion: different modulo definitions: it is rarely stated that there are simple formulas from input values to remainder.
The confusion: different modulo definitions: how the signs of the input values affect the sign of the remainder (these can be inferred from the 'remainder = ...' formulas).
Note: whilst the other sections could be considered essential reading, this section is optional, intended for reference, for people wanting to implement/understand one or more of these modulo algorithms.
We can perform a division operation:
E.g. 70 / 6 = 11 rem 4
I.e. dividend / divisor = quotient rem remainder
And then display the original number as a product plus a remainder:
E.g. 70 = 6*11 + 4
I.e. dividend = divisor*quotient + remainder
There are various definitions for division and modulus... these will result in a slightly bigger/smaller quotient value, and a corresponding slightly different remainder value. Five common forms of division/modulo operation are: truncated, floored, Euclidean, rounded, ceiling. We will derive formulas for calculating the remainder in each case. And we will explain how the sign of the output is determined by the signs of the 2 input values.
formulas for quotient (truncated, floored, Euclidean, rounded-to-even, ceiling):
where a is the dividend and n is the divisor:
q = truncdiv(a,n) = trunc(a/n)
q = floordiv(a,n) = floor(a/n)
q = eucliddiv(a,n) = sgn(n)*floordiv(a,abs(n)) = sgn(n)*floor(a/abs(n))
q = roundevendiv(a,n) = roundeven(a/n)
q = ceildiv(a,n) = ceil(a/n)
note: round to even: round to the nearest integer; for tiebreakers e.g. -0.5/0.5/1.5, round to the nearest even number
formulas for remainder (truncated, floored, Euclidean, rounded-to-even, ceiling):
where a is the dividend and n is the divisor:
r = a - n*truncdiv(a,n) = a - n*trunc(a/n)
r = a - n*floordiv(a,n) = a - n*floor(a/n)
r = a - n*eucliddiv(a,n) = a - n*sgn(n)*floordiv(a,abs(n)) = a - n*sgn(n)*floor(a/abs(n))
r = a - n*roundevendiv(a,n) = a - n*roundeven(a/n)
r = a - n*ceildiv(a,n) = a - n*ceil(a/n)
general:
In general, for a/n: a = n*q + r, so r = a - n*q.
I.e. for dividend / divisor: dividend = divisor*quotient + remainder, so remainder = dividend - divisor*quotient.
truncated modulo (formula for remainder):
q = truncdiv(a,n)
substitute q into a = n*q + r:
a = n*truncdiv(a,n) + r
r = a - n*truncdiv(a,n)
truncated modulo (signs):
note: comparing a versus n*truncdiv(a,n) will determine the sign of the remainder
note: trunc(value) decreases/maintains the absolute value
note: |a| >= |n*truncdiv(a,n)|
note: |x| is notation for abs(x)
i.e. whatever a is, n*truncdiv(a,n) will give a smaller/equal absolute value
e.g. 70 versus 6*truncdiv(70,6), is 70 versus 6*11, 70 - 66 gives remainder 4
e.g. -70 versus 6*truncdiv(-70,6), is -70 versus 6*-11, -70 - -66 gives remainder -4
e.g. 70 versus -6*truncdiv(70,-6), is 70 versus -6*-11, 70 - 66 gives remainder 4
e.g. -70 versus -6*truncdiv(-70,-6), is -70 versus -6*11, -70 - -66 gives remainder -4
+a +n: r = +NUM - small = positive or 0
-a +n: r = -NUM - -small = -NUM + small = negative or 0
+a -n: r = +NUM - small = positive or 0
-a -n: r = -NUM - -small = -NUM + small = negative or 0
i.e. 
'+a +n' means for positive a, and positive n
i.e. '+NUM' means positive a, '-NUM' means negative a
i.e. 'small' means the product (e.g. 'n*truncdiv(a,n)') is smaller than (or equal to) a in absolute value
i.e. 'BIGGER' means the product is bigger than (or equal to) a in absolute value
so, if a is positive, the remainder is positive or 0
so, if a is negative, the remainder is negative or 0
floor modulo (floored modulo) (formula):
q = floordiv(a,n)
substitute q into a = n*q + r:
a = n*floordiv(a,n) + r
r = a - n*floordiv(a,n)
floor modulo (floored modulo) (signs):
note: floor(positive) decreases/maintains the absolute value
note: floor(negative) increases/maintains the absolute value
note: for a and n of the same sign: |a| >= |n*floordiv(a,n)|
note: for a and n of different sign: |a| <= |n*floordiv(a,n)|
+a +n: r = +NUM - small = positive or 0
-a +n: r = -NUM - -BIGGER = -NUM + BIGGER = positive or 0
+a -n: r = +NUM - BIGGER = negative or 0
-a -n: r = -NUM - -small = -NUM + small = negative or 0
so, if n is positive, the remainder will be positive or 0
so, if n is negative, the remainder will be negative or 0
Euclidean division (formula):
q = sgn(n)*floordiv(a,abs(n))
substitute q into a = n*q + r:
a = n*sgn(n)*floordiv(a,abs(n)) + r
r = a - n*sgn(n)*floordiv(a,abs(n))
Euclidean division (signs):
note: floor(positive) decreases/maintains the absolute value
note: floor(negative) increases/maintains the absolute value
note: for positive a: |a| >= |n*floordiv(a,abs(n))|
note: for negative a: |a| <= |n*floordiv(a,abs(n))|
+a +n: r = +NUM - small = positive or 0
-a +n: r = -NUM - -BIGGER = -NUM + BIGGER = positive or 0
+a -n: r = +NUM - small = positive or 0
-a -n: r = -NUM - -BIGGER = -NUM + BIGGER = positive or 0
so, the remainder will be positive or 0
rounded division (round to even) (formula):
q = roundeven(a/n)
substitute q into a = n*q + r:
a = n*roundeven(a/n) + r
r = a - n*roundeven(a/n)
rounded division (round to even) (signs):
note: the sign of r cannot be inferred from the signs of a and n alone
e.g. 9 % 2 = 1
e.g. 
11 % 2 = -1
ceiling division (formula):
q = ceildiv(a,n)
substitute q into a = n*q + r:
a = n*ceildiv(a,n) + r
r = a - n*ceildiv(a,n)
ceiling division (signs):
note: ceil(positive) increases/maintains the absolute value
note: ceil(negative) decreases/maintains the absolute value
note: for a and n of the same sign: |a| <= |n*ceildiv(a,n)|
note: for a and n of different sign: |a| >= |n*ceildiv(a,n)|
+a +n: r = +NUM - BIGGER = negative or 0
-a +n: r = -NUM - -small = -NUM + small = negative or 0
+a -n: r = +NUM - small = positive or 0
-a -n: r = -NUM - -BIGGER = -NUM + BIGGER = positive or 0
so, if n is positive, the remainder will be negative or 0
so, if n is negative, the remainder will be positive or 0
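The five definitions can all be checked numerically from the identity a = n*q + r. A minimal Python sketch (the names sgn, quotients and remainders are illustrative; Python's built-in round() happens to be round-half-to-even):

```python
import math

def sgn(x):
    # signum: 1, 0 or -1
    return (x > 0) - (x < 0)

def quotients(a, n):
    # the five quotient definitions: q = f(a/n)
    return {
        "trunc": math.trunc(a / n),
        "floor": math.floor(a / n),
        "euclid": sgn(n) * math.floor(a / abs(n)),
        "roundeven": round(a / n),  # Python's round() rounds half to even
        "ceil": math.ceil(a / n),
    }

def remainders(a, n):
    # r = a - n*q in every case, so a == n*q + r always holds
    return {name: a - n * q for name, q in quotients(a, n).items()}

print(remainders(-70, 6))
```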
Links:
Operators and Symbols - Programming Language Cheat Sheets - Bazzle
Mathematics - Programming Language Cheat Sheets - Bazzle
Modulo - Wikipedia
The confusion: 'round' works differently in different programming languages.
The confusion: different rounding methods can be fiddly to describe.
Five common methods of rounding are:
ceil: round towards +infinity, to the nearest integer
floor: round towards -infinity, to the nearest integer
trunc (aka int): round towards zero, to the nearest integer (commonly used to convert floats to ints)
round: round towards the nearest integer (tiebreak: round away from zero)
round to even (aka bankers' rounding): round towards the nearest integer (tiebreak: round to the nearest even integer)
Another common method:
round (ties towards +infinity): round towards the nearest integer (tiebreak: round towards +infinity)
Tiebreak rules are used when rounding numbers whose fractional part is 0.5 (or -0.5). E.g. 4.5 is equally near 4 and 5, so we use a tiebreak rule to determine which integer to round to.
Rounding v. bankers' rounding: the 2 are identical except that, in bankers' rounding:
0.5, 2.5, 4.5 ... round towards 0 (subtract 0.5)
-0.5, -2.5, -4.5 ... round towards 0 (add 0.5)
To test what type of rounding a programming language uses, two useful test values are 2.5 and -2.5.
E.g. a table demonstrating rounding methods:
n | ceil | round | floor | trunc | bankers' |
---|---|---|---|---|---|
3 | 3 | 3 | 3 | 3 | 3 |
2.75 | 3 | 3 | 2 | 2 | 3 |
2.5 | 3 | 3 | 2 | 2 | 2 |
2.25 | 3 | 2 | 2 | 2 | 2 |
2 | 2 | 2 | 2 | 2 | 2 |
1.75 | 2 | 2 | 1 | 1 | 2 |
1.5 | 2 | 2 | 1 | 1 | 2 |
1.25 | 2 | 1 | 1 | 1 | 1 |
1 | 1 | 1 | 1 | 1 | 1 |
0.75 | 1 | 1 | 0 | 0 | 1 |
0.5 | 1 | 1 | 0 | 0 | 0 |
0.25 | 1 | 0 | 0 | 0 | 0 |
0 | 0 | 0 | 0 | 0 | 0 |
-0.25 | 0 | 0 | -1 | 0 | 0 |
-0.5 | 0 | -1 | -1 | 0 | 0 |
-0.75 | 0 | -1 | -1 | 0 | -1 |
-1 | -1 | -1 | -1 | -1 | -1 |
-1.25 | -1 | -1 | -2 | -1 | -1 |
-1.5 | -1 | -2 | -2 | -1 | -2 |
-1.75 | -1 | -2 | -2 | -1 | -2 |
-2 | -2 | -2 | -2 | -2 | -2 |
-2.25 | -2 | -2 | -3 | -2 | -2 |
-2.5 | -2 | -3 | -3 | -2 | -2 |
-2.75 | -2 | -3 | -3 | -2 | -3 |
-3 | -3 | -3 | -3 | -3 | -3 |
Note: round via floor and ceil:
for positive x: round(x) = floor(x + 0.5)
for negative x: round(x) = ceil(x - 0.5)
Note: mapping floats to ints
WARNING: Trunc maps floats unevenly to ints. The middle set (-1<n<1) is roughly twice the size of the others. E.g. sets: -3<n<=-2, -2<n<=-1, -1<n<1, 1<=n<2, 2<=n<3.
WARNING: Round maps floats unevenly to ints. The middle set (-0.5<n<0.5) is slightly smaller than the others. E.g. sets: -2.5<n<=-1.5, -1.5<n<=-0.5, -0.5<n<0.5, 0.5<=n<1.5, 1.5<=n<2.5.
Floor maps floats evenly to ints. E.g. sets: -3<=n<-2, -2<=n<-1, -1<=n<0, 0<=n<1, 1<=n<2, 2<=n<3.
Ceil maps floats evenly to ints. E.g. sets: -3<n<=-2, -2<n<=-1, -1<n<=0, 0<n<=1, 1<n<=2, 2<n<=3.
E.g. drawing sets on a number line:
-|-----|-----|-----|-----|-----|-----|-----|-----|-
-4    -3    -2    -1     0     1     2     3     4
 >< -3 >< -2 >< -1 ><    0    >< 1 >< 2 >< 3 ><      trunc
-><----><----><----><---------><----><----><----><-  trunc
>< -4 >< -3 >< -2 >< -1 >< 0 >< 1 >< 2 >< 3 ><       floor
><----><----><----><----><----><----><----><----><-  floor
>< -3 >< -2 >< -1 >< 0 >< 1 >< 2 >< 3 >< 4 ><        ceil
-><----><----><----><----><----><----><----><---->< ceil
-4 >< -3 >< -2 >< -1 >< 0 >< 1 >< 2 >< 3 >< 4        round
----><----><----><----><---><----><----><----><----  round
-4 ><-3 >< -2 ><-1 >< 0 >< 1 >< 2 >< 3 >< 4          bankers'
----><---><-----><---><-----><---><-----><---><----  bankers'
Note: IEEE 754 floating-point arithmetic uses bankers' rounding (round half to even) by default. E.g. converting a 64-bit float to a 32-bit float involves bankers' rounding.
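The tiebreak differences can be demonstrated in Python, whose built-in round() is bankers' rounding; round_half_away is a hypothetical helper built from the floor/ceil formulas in the note above.

```python
import math

def round_half_away(x):
    # classic rounding, ties away from zero:
    # floor(x + 0.5) for positive x, ceil(x - 0.5) for negative x
    return math.floor(x + 0.5) if x >= 0 else math.ceil(x - 0.5)

# 2.5 and -2.5 expose the tiebreak rule:
print(math.ceil(2.5), math.ceil(-2.5))            # 3 -2
print(math.floor(2.5), math.floor(-2.5))          # 2 -3
print(math.trunc(2.5), math.trunc(-2.5))          # 2 -2
print(round_half_away(2.5), round_half_away(-2.5))  # 3 -3
print(round(2.5), round(-2.5))                    # 2 -2 (bankers')
```

One caveat: the floor(x + 0.5) trick itself suffers float rounding for some inputs (the addition x + 0.5 is rounded before the floor), so it is a sketch rather than a robust implementation.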
The confusion: throughout programming, be careful to avoid off-by-one errors.
In programming, there are many, often very simple, situations where you could easily make an error in your calculation, returning an answer 1 too big or 1 too small. Or, for example, you might compare two values, where one value is off by one. Furthermore, when iterating, you may need special handling for the first and last items. In general, these are off-by-one errors.
E.g. line v. circle posts
If you have a 100m line, and you want to put posts every 10m... How many posts do you need? You need 11 posts. I.e. 1 at the start of every 10m stretch, and one at the end. Consider dividing a line into 4: 2 posts at the ends, 3 in the middle. If you'd guessed 10, that would be an off-by-one error!
If you have a 100m circle, and you want to put posts every 10m... How many posts do you need? You need 10 posts. I.e. 1 at the start of every 10m stretch. Consider dividing a circle into 4: it's like one post for each of the North/East/South/West points on a compass.
E.g. page numbers
If you read from the start of page 44, to the end of page 67... How many pages have you read? You might guess 67-44=23. The correct answer is 67-44+1=24. Another way to think about it: consider pages 1-67, now remove pages 1-43, so 67-43=24.
E.g. remove 5 items from an array
Say you have a 26-item array, one for each letter. A is 1, Z is 26. To remove the 5 items fghij... You must remove items F (6) to J (10). Note: that's 5 items, but the difference is 10-6=4.
(In a 1-based array, F to J is 6 to 10.)
(In a 0-based array, F to J is 5 to 9.)
E.g. is an array index valid
In a 1-based array, an index is valid if: 1 <= i <= array.length.
In a 0-based array, an index is valid if: 0 <= i < array.length.
Notice the difference: <= versus <.
Some other examples that appear in this document: XYRB Coordinates. Bitshifting v. Powers of 2. Inclusive Range v. Exclusive Range.
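The fence-post examples above can be expressed as one-line formulas; the function names here are illustrative.

```python
def posts_on_line(length, gap):
    # one post at the start of each gap, plus one at the far end
    return length // gap + 1

def posts_on_circle(length, gap):
    # the circle's "last" post coincides with the first, so no +1
    return length // gap

print(posts_on_line(100, 10))    # 11
print(posts_on_circle(100, 10))  # 10

# pages read from the start of page 44 to the end of page 67 (both inclusive):
print(67 - 44 + 1)               # 24
```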
The confusion: signum returns 1/0/-1 (where negative gives -1), signbit returns 0/1 (where negative gives 1).
The confusion: signbit distinguishes between positive zero and negative zero.
Some common 'sign' function definitions:
Sign(n) or signum(n) or n <=> 0: returns 1 if the input is positive, 0 if zero, -1 if negative.
Copysign(1, n): returns 1 if the input is positive or positive zero, -1 if negative or negative zero.
Signbit(n): 'is negative': returns 1 if the input is negative or negative zero, 0 if positive or positive zero.
Note: <=> is called the spaceship operator.
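A Python sketch of the three definitions; sign and signbit are hypothetical helper names (Python has math.copysign built in, but no built-in signum for floats).

```python
import math

def sign(n):
    # signum: 1/0/-1; note -0.0 compares equal to 0.0, so it returns 0
    return (n > 0) - (n < 0)

def signbit(n):
    # 'is negative': 1 for negative values and -0.0, else 0
    return 1 if math.copysign(1, n) < 0 else 0

print(sign(7), sign(0), sign(-7))        # 1 0 -1
print(sign(-0.0))                        # 0  (signum can't see the sign of -0.0)
print(math.copysign(1, -0.0))            # -1.0 (copysign can)
print(signbit(-0.0), signbit(0.0))       # 1 0
```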
The confusion: clamping via min/max is confusing/counterintuitive.
A clamp function restricts a value between 2 limits.
E.g. clamp(x, 0, 100) to restrict the volume (amplitude) between 0 and 100.
E.g. clamp(x, 0, 255) to restrict an RGB value between 0 and 255.
If the value is between the limits, it is unchanged. If the value is below the lower limit, it is set to the lower limit. If the value is above the upper limit, it is set to the upper limit.
E.g. clamping with an upper limit:
Longhand: if (x > max) x = max
Via ternary: x = (x <= max) ? x : max
Shorthand: x = min(x, max)
The last is counterintuitive, because to set a max limit, you use min.
E.g. clamping with a lower limit:
Longhand: if (x < min) x = min
Via ternary: x = (x >= min) ? x : min
Shorthand: x = max(x, min)
The last is counterintuitive, because to set a min limit, you use max.
E.g. clamping with both an upper and a lower limit:
Longhand: if (x < min) return min else if (x > max) return max else return x
Via ternary: x = (x < min) ? min : (x > max) ? max : x
Shorthand: x = max(min, min(x, max))
Via clamp: x = clamp(x, min, max)
The max/min approach is difficult to comprehend, hence the benefit of a clamp function.
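Python has no built-in clamp, so the max/min shorthand is a common way to define one; this is a minimal sketch.

```python
def clamp(x, lo, hi):
    # min applies the upper limit, max applies the lower limit
    return max(lo, min(x, hi))

print(clamp(150, 0, 100))  # 100 (above the upper limit)
print(clamp(-20, 0, 100))  # 0   (below the lower limit)
print(clamp(42, 0, 100))   # 42  (between the limits: unchanged)
```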
The confusion: 'range' works differently in different programming languages.
The confusion: '..' works differently in different programming languages.
The confusion: if a language lacks an inclusive range function, it can be difficult to reverse the direction of iteration.
Range functions or operators typically start inclusive, but the endpoint may be inclusive or exclusive, depending on the language.
E.g. for 1 to 10 inclusive:
PHP: range(1, 10)
Python: range(1, 11)
Kotlin: 1..10
Ruby: 1..10
Rust: 1..11 (or the inclusive form: 1..=10)
Author's view
I would recommend that programming languages have one operator for each type of range: Perhaps something like this:
range: start inclusive to end inclusive (borrowed from Kotlin):
n1 .. n2
range: start inclusive to end exclusive (borrowed from Kotlin):
n1 ..< n2
I would recommend that programming languages have one function for each type of range: Perhaps something like this:
range: start inclusive to end inclusive:
rangeTo(n1, n2) //or as in Kotlin: n1.rangeTo(n2)
range: start inclusive to end exclusive:
rangeUntil(n1, n2) //or as in Kotlin: n1.rangeUntil(n2)
In my experience, an inclusive end is far more useful than an exclusive end.
E.g. if I want 1 to 10 (both inclusive) in PHP:
range(1, 10)
E.g. if I want 10 to 1 (both inclusive) in PHP:
range(10, 1, -1)
range(10, 1) //also works
But Python lacks an 'inclusive range' function:
E.g. if I want 1 to 10 (both inclusive) in Python:
range(1, 10+1)
E.g. if I want 10 to 1 (both inclusive) in Python:
range(10, 1-1, -1)
So, for the lack of an inclusive range function, we have a basic use case where the code is pretty unreadable, and bugs are likely. For more complicated use cases, bugs are very likely.
One use case for an inclusive-to-exclusive range is doing something every n hours, every day. E.g. you want 0 inclusive to 24 exclusive, with gaps of say 1, 3 or 6 hours. 24 is the natural endpoint, an exclusive endpoint... An inclusive endpoint would need to be 23, 21 or 18 respectively. However, you could simply use an inclusive endpoint of '24 - gap'. And furthermore, if you wanted to iterate in reverse with an inclusive-to-inclusive range, you could just swap the values, e.g. '24 - gap' down to 0; you couldn't simply swap the values if using an inclusive-to-exclusive range.
One very good case for an inclusive-to-exclusive range is not iteration, but a 'between' function. E.g. if you want words beginning with a/b/c/d/e, then a (inclusive) to f (exclusive) is a useful range.
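An inclusive range helper is easy to write in Python; range_to is a hypothetical name (echoing Kotlin's rangeTo), not a standard function.

```python
def range_to(start, end):
    # inclusive-to-inclusive range; iterates downwards if start > end
    step = 1 if start <= end else -1
    return range(start, end + step, step)

print(list(range_to(1, 10)))   # [1, 2, ..., 10]
print(list(range_to(10, 1)))   # [10, 9, ..., 1] - just swap the endpoints
```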
The confusion: jumping to page n, when the real and PDF page numbers differ.
E.g. the real page 1 is at PDF page 20. So to print the real pages 1-10, you need to print PDF pages 20-29.
In this case:
p = r + 19
r = p - 19
You can use the formulas to translate. E.g. the index says see p.40, so p = 40 + 19 = 59, and I go to PDF page 59.
The confusion: in JavaScript, why is Number.MAX_SAFE_INTEGER equal to 2**53-1, and not 2**53?
The smallest positive integer that can't be represented exactly as a 64-bit float:
2**53+1 = 9007199254740993
It gets rounded down to 2**53 = 9007199254740992.
Note: In JavaScript, Number.MAX_SAFE_INTEGER = 2**53-1. Why not 2**53? There is an ambiguity with 2**53. If you see 2**53, it may have been entered as 2**53, or it may have been entered as 2**53+1 and rounded down. I.e. 2**53 corresponds to 2 possible integer values. Whereas 2**53-1 is safe.
The largest integer that can be represented exactly as a 64-bit float:
2**1024 - 2**971 = approx. 1.797693 * 10**308
In full, the 309-digit integer:
179769313486231570814527423731704356798070567525844996598917476803157260780028538760589558632766878171540458953514382464234321326889464182768467546703537516986049910576551282076245490090389328944075868508455133942304583236903222948165808559332123348274797826204144723168738177180919299881250404026184124858368
Note: it is also the largest *number* that can be represented as a 64-bit float (excluding positive infinity).
Here are some calculations:
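Both limits can be verified in Python, whose floats are the same 64-bit IEEE 754 doubles as JavaScript's numbers.

```python
import sys

# 2**53+1 is the smallest positive integer with no exact 64-bit float:
print(float(2**53 - 1))              # exact: 9007199254740991.0
print(float(2**53 + 1) == float(2**53))  # True: 2**53+1 rounds down to 2**53
print(int(float(2**53 + 1)))         # 9007199254740992

# the largest finite 64-bit float is the integer 2**1024 - 2**971:
print(int(sys.float_info.max) == 2**1024 - 2**971)  # True
```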
Bits per component in 64-bit floats: 1+11+52=64: sign/exponent/significand respectively.
Formulas for storing 64-bit floats:
pow bits all 0: 0.0, -0.0 or denormal numbers: (-1)**signbit * 2**(-1022) * 0.fraction
pow bits all 1: +infinity, -infinity or NaN (not a number)
in general: (-1)**signbit * 2**(pow-1023) * 1.fraction
Floats are stored in a way similar to scientific notation:
E.g. 123 in scientific notation: 1.23 * 10**2
I.e. scientific notation (base 10): (1 or -1) * (x, where 1 <= x < 10) * 10**(p, an integer).
I.e. scientific notation (base 2): (1 or -1) * (x, where 1 <= x < 2) * 2**(p, an integer).
Note: in scientific notation (base 2), every significand starts with '1'. Thus for 64-bit floats, since the leading bit is always 1, there is no need to store it.
For a 64-bit float, the range for p is: -1022 <= p <= 1023.
In scientific notation, a value of 0 is a special case.
Here's some JavaScript code for 64-bit float to bit representation:
vNum = 0.3;
oBuf = new Uint8Array(new Float64Array([vNum]).buffer, 0, 8);
vBin = Array.from(oBuf).reverse().map(v=>("0000000"+v.toString(2)).slice(-8)).join("");
vBinSplit = vBin[0] + " " + vBin.slice(1, 12) + " " + vBin.slice(12); //1+11+52 = 64
console.log(vBinSplit);
==============================
bit representation:
0            0 00000000000 0000000000000000000000000000000000000000000000000000
...
min denormal 0 00000000000 0000000000000000000000000000000000000000000000000001 ;approx. 4.940656 * 10**-324
max denormal 0 00000000000 1111111111111111111111111111111111111111111111111111 ;approx. 2.225074 * 10**-308 (i.e. min normal - 1/2**1074)
min normal   0 00000000001 0000000000000000000000000000000000000000000000000000 ;approx. 2.225074 * 10**-308
...
1            0 01111111111 0000000000000000000000000000000000000000000000000000
2            0 10000000000 0000000000000000000000000000000000000000000000000000
3            0 10000000000 1000000000000000000000000000000000000000000000000000
4            0 10000000001 0000000000000000000000000000000000000000000000000000
5            0 10000000001 0100000000000000000000000000000000000000000000000000
6            0 10000000001 1000000000000000000000000000000000000000000000000000
7            0 10000000001 1100000000000000000000000000000000000000000000000000
...
2**53-1      0 10000110011 1111111111111111111111111111111111111111111111111111 ;9007199254740991
2**53        0 10000110100 0000000000000000000000000000000000000000000000000000 ;9007199254740992
2**53+1      # ########### #################################################### ;9007199254740993 (gets rounded to 9007199254740992)
2**53+2      0 10000110100 0000000000000000000000000000000000000000000000000001 ;9007199254740994
...
2**62-512    0 10000111100 1111111111111111111111111111111111111111111111111111 ;4611686018427387392
2**62        0 10000111101 0000000000000000000000000000000000000000000000000000 ;4611686018427387904
2**63-1024   0 10000111101 1111111111111111111111111111111111111111111111111111 ;9223372036854774784
2**63        0 10000111110 0000000000000000000000000000000000000000000000000000 ;9223372036854775808
2**64-2048   0 10000111110 1111111111111111111111111111111111111111111111111111 ;18446744073709549568
2**64        0 10000111111 0000000000000000000000000000000000000000000000000000 ;18446744073709551616
...
max normal   0 11111111110 1111111111111111111111111111111111111111111111111111 ;approx. 1.797693 * 10**308 (note: an integer)
calculating values:
0
...
min denormal = (-1)**0 * 2**(-1022) * (2**-52) = 1 * 2**-1074 * 1 = 1/2**1074 = approx. 4.940656 * 10**-324
max denormal = (-1)**0 * 2**(-1022) * (1-2**-52) = 1 * 2**-1022 * (1-2**-52) = 1/2**1022 - 1/2**1074 = approx. 2.225074 * 10**-308 (i.e. min normal - 1/2**1074)
min normal = (-1)**0 * 2**(1-1023) * (1) = 1 * 2**-1022 * 1 = 1/2**1022 = approx. 
2.225074 * 10**-308
...
1 = (-1)**0 * 2**(1023-1023) * (1) = 1 * 1 * 1 = 1
2 = (-1)**0 * 2**(1024-1023) * (1) = 1 * 2 * 1 = 2
3 = (-1)**0 * 2**(1024-1023) * (1+1/2) = 1 * 2 * 1.5 = 3
4 = (-1)**0 * 2**(1025-1023) * (1) = 1 * 4 * 1 = 4
5 = (-1)**0 * 2**(1025-1023) * (1+1/4) = 1 * 4 * 1.25 = 5
6 = (-1)**0 * 2**(1025-1023) * (1+1/2) = 1 * 4 * 1.5 = 6
7 = (-1)**0 * 2**(1025-1023) * (1+1/2+1/4) = 1 * 4 * 1.75 = 7
...
2**53-1 = (-1)**0 * 2**(1075-1023) * (2-2**-52) = 1 * 4503599627370496 * (2-1/4503599627370496) = 9007199254740991
2**53 = (-1)**0 * 2**(1076-1023) * (1) = 1 * 9007199254740992 * 1 = 9007199254740992
2**53+1 = N/A
2**53+2 = (-1)**0 * 2**(1076-1023) * (1+2**-52) = 1 * 9007199254740992 * (1+1/4503599627370496) = 9007199254740994
...
2**62-512 = (-1)**0 * 2**(1084-1023) * (2-2**-52) = 1 * 2305843009213693952 * (2-1/4503599627370496) = 4611686018427387392
2**62 = (-1)**0 * 2**(1085-1023) * (1) = 1 * 4611686018427387904 * 1 = 4611686018427387904
2**63-1024 = (-1)**0 * 2**(1085-1023) * (2-2**-52) = 1 * 4611686018427387904 * (2-1/4503599627370496) = 9223372036854774784
2**63 = (-1)**0 * 2**(1086-1023) * (1) = 1 * 9223372036854775808 * 1 = 9223372036854775808
2**64-2048 = (-1)**0 * 2**(1086-1023) * (2-2**-52) = 1 * 9223372036854775808 * (2-1/4503599627370496) = 18446744073709549568
2**64 = (-1)**0 * 2**(1087-1023) * (1) = 1 * 18446744073709551616 * 1 = 18446744073709551616
...
max normal = (-1)**0 * 2**(2046-1023) * (2-2**-52) = 1 * 2**1023 * (2-2**-52) = 2**1024 - 2**971 = approx. 
1.797693 * 10**308 (note: an integer)
==============================
Some examples of binary fractions:
0b0.1 = 1 / 2**1 = 1/2
0b0.0000000000000000000000000000000000000000000000000001 = 1 / 2**52 = 2**-52 = 1/4503599627370496
0b1.1111111111111111111111111111111111111111111111111111 = 2 - 0b0.0000000000000000000000000000000000000000000000000001 = 2 - 1/2**52 = 2 - 2**-52 = 2 - 1/4503599627370496
==============================
The integers that can be represented exactly as 64-bit floats:
0 to 2**53 inclusive. No gaps.
The gap is 2**1=2 after: 2**53. I.e. you can represent 2**53, 2**53+2, 2**53+4, ..., 2**54.
The gap is 2**2=4 after: 2**54. I.e. you can represent 2**54, 2**54+4, 2**54+8, ..., 2**55.
The gap is 2**3=8 after: 2**55.
...
The gap is 2**11=2048 after: 2**63.
The gap is 2**12=4096 after: 2**64.
...
In general, the gap is 2**(n-52) after: 2**n (for n>=52).
==============================
Listing powers of 10 and the nearest available integers in 64-bit floats:
JavaScript code to list powers of 10 and their 64-bit float approximations:
vOutput = "";
for (var i=0; i<=308; i++) {
	//i: how many trailing 0's
	//vBits: number of bits (floored)
	//vNum1: 10**i
	//vNum2: vNum1 as a 64-bit float (accuracy may be lost)
	//vNum3: vNum2 as a string
	vNum1 = "1" + "0".repeat(i);
	vNum2 = parseFloat(vNum1);
	vNum3 = BigInt(vNum2).toString();
	vBits = Math.floor(Math.log2(vNum2));
	vOutput += [i, vBits, vNum1, vNum3].join("\t") + "\n";
}
console.log(vOutput);
The smallest power of 10 that can't be represented exactly as a 64-bit float:
10**23 = 100000000000000000000000, is rounded to 99999999999999991611392.
(No powers of 10 greater than that can be represented exactly as a 64-bit float either.)
Note: 100000000000000000000000 falls between these two values:
2**76 = 75557863725914323419136
2**77 = 151115727451828646838272
Between 2**76 and 2**77 inclusive, the gap between representable integers is 2**(76-52) = 16777216.
So the next integer that can be represented exactly as a 64-bit float, after 99999999999999991611392, is: 99999999999999991611392 + 16777216 = 100000000000000008388608 JavaScript code to confirm this: vNum = "1" + "0".repeat(23); vBigInt = BigInt(vNum); vTemp = parseFloat(vBigInt); vNeedle = BigInt(vTemp).toString(); console.log([vBigInt, vNeedle]); while (true) { vBigInt++; vTemp = parseFloat(vBigInt); vHaystack = BigInt(vTemp).toString(); if (vNeedle != vHaystack) { console.log([vBigInt, vHaystack]); break; } } The binary representation of 99999999999999991611392: 0 10001001011 0101001011010000001011000111111000010100101011110110 99999999999999991611392 = (-1)**0 * 2**(1099-1023) * (0b1.0101001011010000001011000111111000010100101011110110) = 2**76 * (1+1/4+1/16+1/128+1/512+1/1024+1/4096+1/524288+1/2097152+1/4194304+1/67108864+1/134217728+1/268435456+1/536870912+1/1073741824+1/2147483648+1/68719476736+1/274877906944+1/2199023255552+1/8796093022208+1/35184372088832+1/70368744177664+1/140737488355328+1/281474976710656+1/1125899906842624+1/2251799813685248) = 99999999999999991611392
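These float-spacing facts can be checked directly; a minimal Python sketch (math.ulp and math.nextafter require Python 3.9+):

```python
import math

# math.ulp gives the gap from a float to the next representable float up.
assert math.ulp(float(2**52)) == 1.0      # up to 2**53, every integer is exact
assert math.ulp(float(2**53)) == 2.0      # after 2**53, the gap is 2
assert math.ulp(float(2**63)) == 2048.0   # after 2**63, the gap is 2**11

# 2**53 + 1 is not representable; it rounds to 2**53:
assert float(2**53 + 1) == float(2**53)
assert float(2**53 + 2) == 2**53 + 2

# 10**23 is not representable; it rounds down to 99999999999999991611392,
# and the next representable integer is 2**(76-52) = 16777216 higher:
f = float(10**23)
assert int(f) == 99999999999999991611392
assert int(math.nextafter(f, math.inf)) == 99999999999999991611392 + 16777216
assert 99999999999999991611392 + 16777216 == 100000000000000008388608
print("all gap checks passed")
```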
The confusion: '0.1 + 0.2 == 0.3' usually returns false when programming. When using 64-bit floats, '0.1 + 0.2 == 0.3' returns false. Doing some calculations by hand can suggest the cause. A decimal analogy: We do 1/3, and round to 6 decimal places: 0.333333. We do 2/3, and round to 6 decimal places: 0.666667. We do 1/3 + 1/3 and get 0.666666. The 2 values (0.666667 and 0.666666) do not match, due to rounding. Below, we will calculate and compare the underlying bit representations for: 0.1, 0.2, 0.3 and 0.1+0.2. Here are the rules for 64-bit float calculations: The IEEE standard requires that the result of addition, subtraction, multiplication and division be exactly rounded. That is, the result must be computed exactly and then rounded to the nearest floating-point number (using round to even). Source: What Every Computer Scientist Should Know About Floating-Point Arithmetic
Bit representations as 64-bit floats: 0.1 0 01111111011 1001100110011001100110011001100110011001100110011010 0.2 0 01111111100 1001100110011001100110011001100110011001100110011010 0.3 0 01111111101 0011001100110011001100110011001100110011001100110011 0.1+0.2 0 01111111101 0011001100110011001100110011001100110011001100110100 Note: in decimal, 0.1, 0.2 and 0.3 can be represented simply. Note: in binary, 0.1, 0.2 and 0.3 all give recurring bits: 0.1: 1/10 = 0b0.0 0011 0011... 0.2: 2/10 = 0b0.0011 0011... 0.3: 3/10 = 0b0.01 0011 0011... If we write these numbers to 54 significant figures, we get: 0.1 = 0b0.000110011001100110011001100110011001100110011001100110011 0.2 = 0b0.00110011001100110011001100110011001100110011001100110011 0.3 = 0b0.0100110011001100110011001100110011001100110011001100110 If we use scientific notation we get: 0.1 = 2**-4 * 0b1.10011001100110011001100110011001100110011001100110011 0.2 = 2**-3 * 0b1.10011001100110011001100110011001100110011001100110011 0.3 = 2**-2 * 0b1.00110011001100110011001100110011001100110011001100110 To get the bit representations, we round these numbers to 53 significant figures, using round-to-even rules. If the last bit is even, round towards 0, i.e. do nothing. (Discard the last 0.) If the last bit is odd, round away from 0. (Discard the last 1, and add 1.) This gives: 0.1 = 2**-4 * 0b1.1001100110011001100110011001100110011001100110011010 0.2 = 2**-3 * 0b1.1001100110011001100110011001100110011001100110011010 0.3 = 2**-2 * 0b1.0011001100110011001100110011001100110011001100110011 And so the bit representations: Signbit: 0 since the numbers are positive. Exponent bits: add 1023. E.g. -4 + 1023 = 1019. Significand bits: discard the '0b1.' The values match those at the top. To add 0.1 and 0.2. 
We start with the values from above, for 0.1 and 0.2, that match the bit representations: 0.1 = 2**-4 * 0b1.1001100110011001100110011001100110011001100110011010 0.2 = 2**-3 * 0b1.1001100110011001100110011001100110011001100110011010 Equivalent to: 0.1 = 0b0.00011001100110011001100110011001100110011001100110011010 0.2 = 0b0.0011001100110011001100110011001100110011001100110011010 We sum them: 0.1+0.2 = 0b0.01001100110011001100110011001100110011001100110011001110 We use scientific notation: 0.1+0.2 = 2**-2 * 0b1.001100110011001100110011001100110011001100110011001110 To get the bit representation, we round to 53 significant figures: 0.1+0.2 = 2**-2 * 0b1.0011001100110011001100110011001100110011001100110100 And so the bit representation: Signbit: 0 since the numbers are positive. Exponent bits: add 1023. I.e. -2 + 1023 = 1021. Significand bits: discard the '0b1.' The values for 0.1+0.2 match those at the top. ============================== Here's an example of binary division (using the same principles as decimal division): 1/10 = 0b1/0b1010 = 0b0.0 0011 0011...

      0.000110011001...
      --------------
1010 |1.0000
       1010      0b10000-0b1010 = 16-10 = 6 = 0b110
       1100
       1010      0b1100-0b1010 = 12-10 = 2 = 0b10
        10000
        1010
         1100
         1010
          10000
          1010
          100...

It becomes evident that the calculation is recurring.
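The bit patterns above, and the recurring expansion of 1/10, can be reproduced in a few lines of Python; a sketch using struct to view a float's raw 64 bits (the helper names are illustrative):

```python
import struct

def float_bits(x):
    """Return the 64 raw bits of a float, as sign | exponent | significand."""
    b = format(int.from_bytes(struct.pack(">d", x), "big"), "064b")
    return b[0] + " " + b[1:12] + " " + b[12:]

assert float_bits(0.1) == "0 01111111011 1001100110011001100110011001100110011001100110011010"
assert float_bits(0.2) == "0 01111111100 1001100110011001100110011001100110011001100110011010"
assert float_bits(0.3) == "0 01111111101 0011001100110011001100110011001100110011001100110011"
assert float_bits(0.1 + 0.2) == "0 01111111101 0011001100110011001100110011001100110011001100110100"
assert 0.1 + 0.2 != 0.3  # the two values differ in the last significand bit

def bin_fraction(num, den, digits):
    """First `digits` binary digits of num/den (0 < num/den < 1), by long division."""
    out = []
    for _ in range(digits):
        num *= 2
        out.append(str(num // den))
        num %= den
    return "".join(out)

assert bin_fraction(1, 10, 12) == "000110011001"  # 1/10 = 0b0.000110011001...
```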
The confusion: there are 2 kinds of zero. The confusion: how to test for negative zero. Most programming languages use the IEEE 754 standard for floating-point numbers. Internally, the numbers are stored as a sign bit, an exponent (power), and a significand (absolute value). This results in the existence of both 0.0 and -0.0, depending on whether the sign bit is 0 or 1 respectively. In some situations, they both behave the same: E.g. 0.0 == -0.0 returns true. E.g. n == 0.0 and n == -0.0 are equivalent. E.g. n > 0.0 and n > -0.0 are equivalent. In some situations they behave differently: E.g. 1/0.0 = +infinity, 1/-0.0 = -infinity. (E.g. in C++/JavaScript.) E.g. atan2(0, 0.0) = 0, atan2(0, -0.0) = pi. (E.g. in C++/JavaScript/Python.) (If desired, a custom function can treat positive/negative zero differently. As if passing 2 pieces of information, the value 0 and a sign.)
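A sketch of testing for negative zero in Python (where 1/-0.0 raises an exception rather than returning -infinity, so copysign or atan2 is the usual test; the function name is illustrative):

```python
import math

def is_negative_zero(x):
    """True only for -0.0: plain equality cannot tell the two zeros apart."""
    return x == 0.0 and math.copysign(1.0, x) == -1.0

assert -0.0 == 0.0                  # the two zeros compare equal
assert is_negative_zero(-0.0)
assert not is_negative_zero(0.0)
assert not is_negative_zero(-1.0)

# atan2 also distinguishes them, as described above:
assert math.atan2(0.0, 0.0) == 0.0
assert math.atan2(0.0, -0.0) == math.pi
```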
The confusion: how are 0.0 and -0.0 different? The confusion: negative zero gotchas. For ints, there is one type of 0: 0. For floats, there are two types of 0: 0.0 (positive zero) and -0.0 (negative zero). Most of the time, 0.0 and -0.0 behave equivalently, so obtaining -0.0 when 0.0 was intended will not cause a bug. What follows are details on when 0.0 and -0.0 behave equivalently/differently, and how to avoid obtaining -0.0 unintentionally. Literal ints that are in fact literal floats In most programming languages, if you type something that looks like an int, an int is returned. 2 programming languages where this is not true are JavaScript and R. So if you type -0, you will get -0.0, not 0. Print, Format, ToString Some languages print 0.0 and -0.0 identically. Changing global settings, or using a 'format' function, could reveal them. E.g. JavaScript, some functions show the negative sign, some don't:
console.log((-0.0).toFixed()); //0
console.log((-0.0).toFixed(1)); //0.0
console.log((-0.0).toLocaleString()); //-0
console.log((-0.0).toLocaleString("en-US")); //-0
E.g. R:
print(0.0) #0
print(-0.0) #0
sprintf("%f", 0.0) #0.000000
sprintf("%f", -0.0) #-0.000000
Equality Most programming languages use the IEEE 754 standard for floats; it requires that '-0.0 == 0.0' return true. Truncate Most languages have a truncate function. It truncates floats: it rounds the value to the nearest integer, towards zero, i.e. it removes the fractional part. Truncate functions typically truncate -1.0 < x < 0.0 (e.g. -0.1, -0.9) to -0.0, not 0.0. Custom comparison formulas A comparison function typically compares 2 values, and returns positive/0/negative depending on which value is considered 'larger'. E.g. a custom formula for comparing numbers in JavaScript: vCmp = +(vNum1>vNum2)||-(vNum1<vNum2) Unfortunately, if the values were considered equal, the formula above would return -0.0, not 0.0. The formula can be fixed by adding '0' or swapping the clauses. E.g.
two working alternatives: vCmp = +(vNum1>vNum2)||0-(vNum1<vNum2) vCmp = -(vNum1<vNum2)||+(vNum1>vNum2) Where positive/negative zero behave differently: Sort In most programming languages, since -0.0 and 0.0 are considered equal: for a stable sort, any instances of 0.0 and -0.0 would appear together, but remain in their original order, for an unstable sort, any instances of 0.0 and -0.0 would appear together, but in a 'random' order. Unusually, Java's sort() distinguishes between -0.0 and 0.0: all the -0.0 instances appear first, then the 0.0 instances. Languages based on Java, e.g. Kotlin and Scala, also exhibit this behaviour. Author's view The Java behaviour is beneficial and useful: it makes it easier to inspect and understand the sorted values. (It simultaneously treats -0.0 and 0.0 as different, i.e. separated, and as equal, i.e. they appear alongside each other in the same location.) Coercing negative zero to positive zero To force -0.0 to 0.0, but keep other numbers unchanged: vNum = 0 + vNum To force 0.0 to -0.0, but keep other numbers unchanged: vNum = -(0-vNum) I.e. 0-vNum makes all zeros into positive zeros, and inverts the sign of other numbers. Then -(...) makes all positive zeros into negative zeros, and inverts the sign of other numbers (back to their original sign).
//E.g. JavaScript, examples of flipping the sign:
oArray = [3.0, 0.0, -0.0, -3.0];
oArrayMM = oArray.slice(); //maintain 0s, maintain rest
oArrayFM = oArray.map(v=>v?v:-v); //flip 0s, maintain rest
oArrayPM = oArray.map(v=>0+v); //0s to +0, maintain rest
oArrayNM = oArray.map(v=>-(0-v)); //0s to -0, maintain rest
oArrayMF = oArray.map(v=>v?-v:v); //maintain 0s, flip rest
oArrayFF = oArray.map(v=>-v); //flip 0s, flip rest
oArrayPF = oArray.map(v=>0-v); //0s to +0, flip rest
oArrayNF = oArray.map(v=>-(0+v)); //0s to -0, flip rest
console.log(oArrayMM); //[ 3, 0, -0, -3]
console.log(oArrayFM); //[ 3, -0, 0, -3]
console.log(oArrayPM); //[ 3, 0, 0, -3]
console.log(oArrayNM); //[ 3, -0, -0, -3]
console.log(oArrayMF); //[-3, 0, -0, 3]
console.log(oArrayFF); //[-3, -0, 0, 3]
console.log(oArrayPF); //[-3, 0, 0, 3]
console.log(oArrayNF); //[-3, -0, -0, 3]
Other functions: Various lines of code demonstrating functionality where -0.0 and 0.0 might behave differently.
E.g. JavaScript:
console.log(1/-0.0); //-Infinity
console.log(1/0.0); //Infinity
console.log(Math.sign(-0.0)); //-0
console.log(Math.sign(0.0)); //0
console.log((-0.0).toLocaleString()); //-0
console.log((-0.0).toLocaleString("en-US")); //-0
console.log((-0.0) % 1); //-0
console.log(Math.sqrt(-0.0)); //-0
console.log(Math.cbrt(-0.0)); //-0
E.g. JavaScript (rounding):
console.log((-0.0).toFixed()); //0
console.log((-0.0).toFixed(1)); //0.0
console.log(Math.round(-0.0)); //-0
console.log(Math.ceil(-0.0)); //-0
console.log(Math.floor(-0.0)); //-0
E.g. JavaScript (results where using -0.0 and 0.0 are equivalent):
console.log([0.0,-0.0].indexOf(-0.0)); //0
console.log([-0.0,0.0].indexOf(0.0)); //0
console.log(Math.hypot(-0.0,-0.0)); //0
console.log(Math.exp(-0.0)); //1
console.log(Math.log(-0.0)); //-Infinity //i.e. ln(-0.0)
console.log(Math.log10(-0.0)); //-Infinity
E.g. JavaScript (trigonometric functions):
console.log(Math.sin(-0.0)); //-0
console.log(Math.sin(0.0)); //0
console.log(Math.cos(-0.0)); //1
console.log(Math.cos(0.0)); //1
console.log(Math.tan(-0.0)); //-0
console.log(Math.tan(0.0)); //0
console.log(Math.asin(-0.0)); //-0
console.log(Math.asin(0.0)); //0
console.log(Math.acos(-0.0)); //1.5707963267948966
console.log(Math.acos(0.0)); //1.5707963267948966
console.log(Math.atan(-0.0)); //-0
console.log(Math.atan(0.0)); //0
console.log(Math.atan2(0, -0.0)); //3.141592653589793 //pi
console.log(Math.atan2(0, 0.0)); //0
E.g. Ruby:
p (-50.0).clamp(-0.0, 100.0) #-0.0
E.g.
Python:
import math
import statistics
print(math.copysign(1, -0.0)) #-1.0
print(math.copysign(1, 0.0)) #1.0
print(math.radians(-0.0)) #-0.0
print(math.degrees(-0.0)) #-0.0
print(sum([-0.0])) #0.0 #for most languages, [-0.0].sum() returns 0.0
print(math.prod([-0.0])) #-0.0
print(statistics.mean([-0.0])) #0.0
print(statistics.median([-0.0])) #-0.0
print(statistics.multimode([-0.0])) #[-0.0]
print(statistics.multimode([-0.0, 0.0, 1, 1])) #[-0.0, 1]
print(statistics.multimode([0.0, -0.0, 1, 1])) #[0.0, 1]
E.g. Kotlin:
println(listOf(-0.0, 0.0).groupingBy{it}.eachCount()) //{-0.0=1, 0.0=1}
E.g. Java:
double[] oArray = {-1.0, -0.0, 0.0, -0.0, 0.0, -0.0, 0.0, 1.0};
java.util.Arrays.sort(oArray);
System.out.println(java.util.Arrays.toString(oArray)); //[-1.0, -0.0, -0.0, -0.0, 0.0, 0.0, 0.0, 1.0]
The confusion: ascending versus descending. The confusion: older is 'smaller' than newer. Typically, sort is ascending, the 'smallest' value first. Values increase (ascend) as you go from left-to-right. E.g. 1, 2, 3. E.g. a, b, c. E.g. -3, -2, -1, 0, 1, 2, 3. E.g. 2008, 2009, 2010, 2011, 2012. E.g. 1 is smaller than 10. E.g. A is smaller than Z. E.g. the year 2010 is smaller than the year 2020. So counterintuitively: older dates are smaller than newer dates. E.g. -10 is smaller than 10. Descending order is often useful for viewing the most recently modified files... The newer files have a 'larger' date modified (e.g. the year is bigger), and go at the top.
The confusion: in a sort comparator function, if item 1 is 'bigger' than item 2, should positive or negative be returned. Sort functions typically let you specify a custom sort comparator function. You take 2 input values, and you return one of positive/0/negative to indicate which item is bigger/smaller. But which way round? Treat it as a difference function. I.e. return A - B. E.g. cmp(10,7). 10 is bigger than 7, diff(10,7) = 10-7 = 3, a positive number. E.g. cmp(7,10). 7 is smaller than 10, diff(7,10) = 7-10 = -3, a negative number. E.g. cmp(10,10). 10 is equal to 10, diff(10,10) = 10-10 = 0. E.g. cmp("J","G"). J is bigger than G, so return a positive number. E.g. cmp("G","J"). G is smaller than J, so return a negative number. E.g. cmp("J","J"). J is equal to J, so return 0. So, if item 1 is 'bigger', and item 2 is 'smaller', the output should be positive. So, if item 1 is 'smaller', and item 2 is 'bigger', the output should be negative. So, if item 1 is equal to item 2, the output should be 0.
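A sketch of such a comparator in Python, where functools.cmp_to_key adapts a two-argument positive/0/negative comparator for use with sorted() (the name cmp is illustrative):

```python
from functools import cmp_to_key

def cmp(a, b):
    """Return positive if a is 'bigger', negative if 'smaller', 0 if equal.
    (a > b) - (a < b) behaves like a - b, but avoids overflow concerns
    and also works for strings."""
    return (a > b) - (a < b)

assert cmp(10, 7) > 0
assert cmp(7, 10) < 0
assert cmp(10, 10) == 0
assert cmp("J", "G") > 0

assert sorted([3, 1, 2], key=cmp_to_key(cmp)) == [1, 2, 3]
```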
The confusion: an equals function returns truthy if two values match. The confusion: a sort comparator function returns 0 (which looks falsy) if two values match. This is counterintuitive. E.g. I could have a 'cmpi' sort comparator function that does a case-insensitive comparison, making it possible to do a case-insensitive sort on an array of strings. But if just comparing two values directly, the code would look like this: are_equal = !cmpi("a", "A") //true are_equal = !cmpi("a", "B") //false It would be more intuitive and readable, to create an 'equalsi' function, like so: are_equal = equalsi("a", "A") //true are_equal = equalsi("a", "B") //false Note: 'equals' is more intuitive than '!cmp'. It also reduces the likelihood of bugs, e.g. if the programmer forgets to use '!'.
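A minimal equalsi in Python (casefold is Python's preferred case-insensitive normalisation; lower would also work for ASCII; the function name is illustrative):

```python
def equalsi(s1, s2):
    """Case-insensitive string equality: more readable than !cmpi(s1, s2)."""
    return s1.casefold() == s2.casefold()

assert equalsi("a", "A")
assert not equalsi("a", "B")
```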
The confusion: some 'sort comparator functions' take 1 value, rather than the usual 2 values. The confusion: no-one explains how these functions work; they miss out a crucial step: you supply a value-to-intermediate-value function, then the intermediate values are sorted by Python's default sort rules. Python-style sort functions take a 'key parameter'. This style of sorting is known as 'SortBy' in various languages. From the Python documentation: Both list.sort() and sorted() have a key parameter to specify a function (or other callable) to be called on each list element prior to making comparisons. Source: Sorting Techniques — Python 3.12.5 documentation The idea is to modify the value into an intermediate value, something Python knows how to sort. I.e. consider if you created a list of the intermediate values, how would Python sort that list. E.g. to sort case-insensitive, provide a function that converts one value to either upper case or lower case. Python can then sort those values using its case-sensitive string sort. E.g. to sort strings by length, provide a function that returns the length of a string. Python can then sort those values using its integer sort. E.g. to sort integers alphabetically, provide a function that converts an int to a string. Python can then sort those values using its case-sensitive string sort. Example Python code:
print(sorted(["Q","w","E","r","T","y"], key=str.lower)) #['E', 'Q', 'r', 'T', 'w', 'y']
print(sorted(["Q","w","E","r","T","y"], key=lambda s: s.lower())) #equivalent to line above
print(sorted(["aaa","b","cc"], key=len)) #['b', 'cc', 'aaa']
print(sorted([1,2,3,11,22,33], key=str)) #[1, 11, 2, 22, 3, 33]
Related terms: decorate-sort-undecorate, strxfrm ('string transform'), Schwartzian transform.
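Decorate-sort-undecorate is what the key parameter automates; a sketch of doing it manually:

```python
data = ["aaa", "b", "cc"]

# Decorate: pair each value with its intermediate (sort) value.
# The index i breaks ties, keeping equal-keyed items in their original order.
decorated = [(len(s), i, s) for i, s in enumerate(data)]
# Sort: tuples sort by the intermediate value first.
decorated.sort()
# Undecorate: discard the intermediate values.
result = [s for (_, _, s) in decorated]

assert result == ["b", "cc", "aaa"]
assert result == sorted(data, key=len)  # same as using the key parameter
```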
The confusion: stable sort is fundamentally important, but people may be unaware of it. The confusion: people may intuitively assume stable sort is the default, but it is not always the default. In a stable sort, when 2 values are considered equal, the order is maintained. In an unstable sort, when 2 values are considered equal, the order is not guaranteed to be maintained. E.g. in an unstable case-insensitive sort, these values could end up in any order: ["A", "a"]. E.g. in an unstable sort, where strings are compared numerically, these values could end up in any order: ["0", "00", "0x0"]. People unaware of the stable/unstable distinction might assume that stable is the default. A further benefit of stable sort is that it makes multi-stage sorts possible. E.g. to achieve ["A", "a", "B", "b"], sort case-sensitive, then sort case-insensitive. E.g. to achieve ["a", "A", "b", "B"], sort case-sensitive, reverse the array, then sort case-insensitive. In a case-sensitive sort on strings, stable/unstable sort should be equivalent. In an integer sort on ints, stable/unstable sort should be equivalent. When it is safe to use it, unstable sort may mean faster execution.
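Python's built-in sort is guaranteed stable, so the multi-stage sorts described above can be checked directly:

```python
data = ["b", "A", "B", "a"]

# Stage 1: case-sensitive sort; stage 2: stable case-insensitive sort.
x = sorted(data)               # ["A", "B", "a", "b"]
x = sorted(x, key=str.lower)   # stable: "A" stays before "a", "B" before "b"
assert x == ["A", "a", "B", "b"]

# Same, but reverse between the stages to get lower case first:
y = sorted(data)
y.reverse()                    # ["b", "a", "B", "A"]
y = sorted(y, key=str.lower)
assert y == ["a", "A", "b", "B"]
```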
The confusion: sorting numbers alphabetically. E.g. in some situations in Excel, you can't 'just sort', you have to make a choice:
Sort Warning
The following sort key may not sort as expected because it contains some numbers formatted as text: Column A
What would you like to do?
Sort anything that looks like a number, as a number
Sort numbers and numbers stored as text separately
E.g. sort an int array numerically: [1,2,3,11,22,33]. E.g. sort an int array, as strings, alphabetically: [1,11,2,22,3,33].
The confusion: compare 'as if lower case' versus compare 'as if upper case'. The location of chars such as [\]^_` varies in different implementations of case-insensitive sorts. In most (almost all?) programming languages, case-sensitive comparison is done like so: Each character is compared according to its Unicode codepoint. A case-insensitive comparison is usually done by converting a string to lower case (or upper case), and then comparing each character according to its Unicode codepoint. Using ASCII characters 32 to 126 as examples: If you sort every char, case sensitive, you get chars sorted by codepoint order: !"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~ If you sort every char, case insensitive 'as if upper case', you get: !"#$%&'()*+,-./0123456789:;<=>?@AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz[\]^_`{|}~ If you sort every char, case insensitive 'as if lower case', you get: !"#$%&'()*+,-./0123456789:;<=>?@[\]^_`AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz{|}~ These 6 chars appear in a different location depending on whether the sort is as upper case or as lower case: [\]^_`
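The two orderings can be generated in Python; a sketch (here the tie between the two cases of a letter is broken by codepoint, so 'A' precedes 'a'):

```python
chars = [chr(i) for i in range(33, 127)]  # printable ASCII, space omitted

as_upper = "".join(sorted(chars, key=lambda c: (c.upper(), c)))
as_lower = "".join(sorted(chars, key=lambda c: (c.lower(), c)))

# 'As if upper case': the letters sort before [\]^_`
assert as_upper.endswith("Zz[\\]^_`{|}~")
# 'As if lower case': [\]^_` sort before the letters
assert "@[\\]^_`Aa" in as_lower
```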
The confusion: how does natural sort work. From Rosetta Code: Some of the key points: There is no "one true way" to do this, but for the purpose of this task 'natural' orderings might include: 1. Ignore leading, trailing and multiple adjacent spaces 2. Make all whitespace characters equivalent. 3. Sorting without regard to case. 4. Sorting numeric portions of strings in numeric order. That is split the string into fields on numeric boundaries, then sort on each field, with the rightmost fields being the most significant, and numeric fields of integers treated as numbers. foo9.txt before foo10.txt As well as ... x9y99 before x9y100, before x10y0 ... (for any number of groups of integers in a string). ... 6. Sort letters without regard to accents. Source: Natural sorting - Rosetta Code
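A common sketch of point 4 (numeric portions compared as numbers) in Python; the other points (whitespace, accents) are omitted, and the key-function name is illustrative:

```python
import re

def natural_key(s):
    """Split on runs of digits; compare text fields case-insensitively
    and digit fields numerically."""
    return [int(part) if part.isdigit() else part.lower()
            for part in re.split(r"(\d+)", s)]

files = ["foo10.txt", "foo9.txt", "x10y0", "x9y100", "x9y99"]
assert sorted(files, key=natural_key) == \
    ["foo9.txt", "foo10.txt", "x9y99", "x9y100", "x10y0"]
```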
The confusion: base 2 is not like a tally system, it's like on/off switches, and position matters. E.g. with a tally system, and 4 sticks, you can count between 0 and 4 inclusive. Double the sticks to 8, you can count between 0 and 8 inclusive. E.g. binary (base 2), 4 slots is like 4 on/off switches, which gives 2**4 = 16 permutations, 0 to 15 inclusive. Add one more switch, this doubles the range, with the new switch off, 0 to 15 like before, with the new switch on, 16 to 31 inclusive. E.g. decimal (base 10), 4 slots is like 4 slots with 10 possibilities, which gives 10**4 = 10000 permutations, 0 to 9999 inclusive. Add one more digit, this multiplies the range by 10, giving 100000 permutations, 0 to 99999 inclusive. Note: tallies versus base 2. In a tally system, 1 stick equals 1. In base 2, 1 switch set to on, with all others off, could mean: 1, 2, 4, 8, or 16 etc. Position matters. In base 2, if you have n bits, and are using unsigned integers, the range is 0 to 2**n-1 inclusive. E.g. 1-bit: 0 to 1. E.g. 2-bit: 0 to 3. (E.g. '00, 01, 10, 11'.) E.g. 4-bit: 0 to 15. E.g. 8-bit: 0 to 255. E.g. 16-bit: 0 to 65535. E.g. 32-bit: 0 to 4,294,967,295. E.g. 64-bit: 0 to 18,446,744,073,709,551,615. In base 10, if you have n digits, and are using unsigned integers, the range is 0 to 10**n-1 inclusive. E.g. 1-digit: 0 to 9. E.g. 2-digit: 0 to 99. E.g. 3-digit: 0 to 999.
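The doubling argument, checked in Python:

```python
# n on/off switches give 2**n permutations; one more switch doubles the count.
for n in range(1, 16):
    assert 2 ** (n + 1) == 2 * 2 ** n

# With n bits, unsigned integers range from 0 to 2**n - 1 inclusive:
assert 2 ** 8 - 1 == 255
assert 2 ** 16 - 1 == 65535
assert 2 ** 32 - 1 == 4294967295
assert 2 ** 64 - 1 == 18446744073709551615

# With n decimal digits, the range is 0 to 10**n - 1 inclusive:
assert 10 ** 4 - 1 == 9999
```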
The confusion: converting between signed/unsigned numbers. The confusion: why is the largest unsigned number always equivalent to -1 (as a signed number). The confusion: there are more negative integers available than positive integers (1 more). The confusion: the key difference between signed and unsigned integers, is that with signed integers, you treat the first bit as negative. Using the ubiquitous two's complement system to represent integers, if working with 8-bit integers (bytes), we get: Unsigned: 0 to 127, 128 to 255. Signed: 0 to 127, -128 to -1. Since the first half will contain 0 and the positives, and the other half will contain the negatives, the count of negatives will always be one higher than the count of positives. For unsigned integers, you calculate the number by adding the bits. For signed integers, you calculate the number by adding the bits, however, you treat the first bit (most significant bit) as negative if present. For 8-bit unsigned/signed integers, the most significant bit has value 128. E.g. 0b10101010. (The '0b' prefix indicates a binary integer.) Unsigned: 128 + 0*64 + 32 + 0*16 + 8 + 0*4 + 2 + 0*1 = 170. Signed: -128 + 0*64 + 32 + 0*16 + 8 + 0*4 + 2 + 0*1 = -86. For 8-bit integers, and indeed any b-bit integers: The largest unsigned integer is always equivalent to -1 (as a signed integer). This is because the signed integers will range from 0 to 2**(b-1)-1, then -2**(b-1) to -1. Where b is the number of bits. The last integer in binary will always be repunit(b). For 8-bit integers this is repunit(8) = 0b11111111. As a signed integer you get: -128 + 64 + 32 + 16 + 8 + 4 + 2 + 1 = -1 I.e. you treat the first bit as negative, and so you start off with a big negative integer, which is almost, but not quite, cancelled out by the positive bits. Converting between signed/unsigned integers For 8-bit integers: s = (u < 128) ? u : u - 256 u = (s >= 0) ?
s : s + 256 Note: the difference between negative (signed) 8-bit integers and their unsigned equivalents, is always 256.
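The two formulas, as Python functions, generalised to any bit width b (the function names are illustrative):

```python
def to_signed(u, b=8):
    """Interpret the unsigned b-bit integer u as a two's-complement signed integer."""
    return u if u < 2 ** (b - 1) else u - 2 ** b

def to_unsigned(s, b=8):
    """Interpret the signed b-bit integer s as an unsigned integer."""
    return s if s >= 0 else s + 2 ** b

assert to_signed(0b10101010) == -86
assert to_unsigned(-86) == 0b10101010   # 170
assert to_signed(255) == -1             # largest unsigned == -1 signed
assert to_signed(127) == 127
assert to_unsigned(-1, 64) == 18446744073709551615
```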
The confusion: 2**n = 1<<n. I.e. to achieve n to the power of 2, you use 1 << n, not 2 << n. These: 2**0 = 1 2**1 = 2 2**2 = 4 2**3 = 8 Give the same values as these: 1<<0 = 1 1<<1 = 2 1<<2 = 4 1<<3 = 8 But don't confuse 2**n with 2<<n: 2<<0 = 2 2<<1 = 4 2<<2 = 8 2<<3 = 16 In summary: 2**n: multiply n 2's together e.g. n=3 gives 2*2*2 = 8 1<<n: doubling 1 n times (equivalent to the line above) e.g. n=3 gives 1*2*2*2 = 8 2<<n: doubling 2 n times e.g. n=3 gives 2*2*2*2 = 16
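The same comparison in Python:

```python
for n in range(10):
    assert 2 ** n == 1 << n        # 1 doubled n times: 2 to the power n
    assert 2 << n == 2 ** (n + 1)  # 2 doubled n times: one power too high

assert 1 << 3 == 8
assert 2 << 3 == 16
```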
The confusion: ambiguous terminology re. 2's complement (base 2), and 2s' complement (base 3). The confusion: ambiguous terminology re. 10's complement (base 10), and 10s' complement (base 11). The confusion: apostrophes: in base 2, we usually say ones' complement and two's complement. The confusion: apostrophes: in base 10, we usually say nines' complement and ten's complement. Complements for individual digits In base 2, the ones' complement for 0 is 1. In base 2, the ones' complement for 1 is 0. In base 10, the nines' complement for 0 is 9. In base 10, the nines' complement for 1 is 8. The other pairs are: 2 and 7, 3 and 6, 4 and 5. Complements for integers For positive integers (and zero), you concatenate the complements for each digit: E.g. in base 2, the ones' complement for 0b1011 (11) is 0b0100 (4). Note: the bits are flipped: 0's become 1's, 1's become 0's. Note: 0b1011 + 0b0100 = 0b1111. All 1's. (The '0b' prefix indicates a binary integer.) E.g. in base 10, the nines' complement for 1290 is 8709. Note: 1290 + 8709 = 9999. All 9's. Complements in different bases In general: In base n, numbers have: (n-1)s' complement (base n) (diminished radix complement) n's complement (base n) (radix complement) (n-1)s' complement: just involves performing n-1-d on each digit (see above) n's complement: do the same, but then add 1 to the total adding 1 gives a system with useful properties (see lower down) In programming, the most well-known complements are: 1s' complement (base 2) (diminished radix complement) 2's complement (base 2) (radix complement) where: 1s' complement = repunit(len) - n = 2**len - 1 - n where: 2's complement = 1s' complement + 1 = 2**len - n where: len is the maximum number of bits allowed for our binary numbers e.g.
repunit(8) = 0b11111111 (255) In programming, also sometimes mentioned are: 9s' complement (base 10) (diminished radix complement) 10's complement (base 10) (radix complement) where: 9s' complement = repdigit(9, len) - n = 10**len - 1 - n where: 10's complement = 9s' complement + 1 = 10**len - n where: len is the maximum number of digits allowed for our decimal numbers e.g. repdigit(9, 4) = 9999 From Stack Overflow: This comment clears up some key terminology confusion: Binary has only 1's-complement and 2's-complement; base-10 has only 9's-complement and 10's-complement. "53's-complement" only makes sense if you are working in base-53 or base-54 Source: algorithm - How to find N's complement of a number? - Stack Overflow Ones' complement (base 2) Ones' complement is useful as a system for representing signed integers. If working with 8-bit integers (bytes), we get: Unsigned: 0 to 127, 128 to 255. Signed: 0 to 127, -127 to -0. In ones' complement, there is positive zero, and negative zero. E.g. 5 is stored as 0b00000101. To get the binary representation for -5, we flip the bits: -5 is stored as 0b11111010. And the inverse, -5 to 5: 0b11111010 (-5), flip bits to get 0b00000101 (5). Two's complement (base 2) If working with 8-bit integers (bytes), we get: Unsigned: 0 to 127, 128 to 255. Signed: 0 to 127, -128 to -1. In two's complement, there is no negative zero. E.g. 5 is stored as 0b00000101. To get the binary representation for -5, we flip the bits, and add 1, ignoring any overflow beyond the 8 bits. -5 is stored as 0b11111010 + 0b1 = 0b11111011. We can check our calculations. For a signed 8-bit number: The first bit, if present, is treated as negative, equivalent to -128. So, we can sum as follows: -128 + 64 + 32 + 16 + 8 + 0*4 + 2 + 1 = -5, as expected. For an unsigned 8-bit number, we would get: 128 + 64 + 32 + 16 + 8 + 0*4 + 2 + 1 = 251. Note: for unsigned integers, you calculate the number by adding the bits.
Note: for signed integers, you calculate the number by adding the bits, however, you subtract the first bit (most significant bit, equal to 128 for 8-bit integers) if present. And the inverse, -5 to 5: 0b11111011 (-5), flip bits to get 0b00000100 (4), add 1 to get 0b00000101 (5). Competing systems For unsigned integers, 0 to 255 is the obvious first choice. Sacrificing half of the integers, to allow negative numbers, some first ideas could be: Unsigned: 0 to 127, 128 to 255. Signed: 0 to 127, -0 to -127. (Use the first bit to indicate the others are negative.) Signed: 0 to 127, -127 to -0. Ones' complement. Signed: 0 to 127, -128 to -1. Two's complement. (Remove the negative zero, add -128.) (Use the first bit as a negative bit.) Two's complement is not the most obvious system to use... but two's complement has become the dominant system for storing integers, probably because of these advantages: From Wikipedia: Compared to other systems for representing signed numbers (e.g., ones' complement), the two's complement has the advantage that the fundamental arithmetic operations of addition, subtraction, and multiplication are identical to those for unsigned binary numbers (as long as the inputs are represented in the same number of bits as the output, and any overflow beyond those bits is discarded from the result). Source: Two's complement - Wikipedia Further advantages: Simplicity. Two's complement only has positive zero, no negative zero. An increased range of 1, e.g. -127 to 127, versus -128 to 127. E.g. AutoHotkey uses signed 64-bit ints. I wanted modulo arithmetic for unsigned 64-bit ints. I.e. return a signed int, where the value is really meant to be an unsigned int. The availability of -9223372036854775808 (-0x8000000000000000) eased calculations, and reduced the number of special cases. Author's view Two's complement is something I essentially never need to think about. It is important if considering how signed integers are stored in memory. E.g. 
if writing code relating to bit-shifting. The main facts I keep in my head relating to bit representations, are listed below... Some key facts re. signed integers and two's complement (base 2) Bit representation to integer: Treat the first bit as negative, the others as positive, and sum them. The bit representation for -1 will be all 1's. E.g. a byte: -0b10000000 + 0b01111111 = -128 + 127 = -1. Mapping between signed and unsigned integers: 8-bit integers. Signed: 0 to 127, -128 to -1. Unsigned: 0 to 255. 16-bit integers. Signed: 0 to 32767, -32768 to -1. Unsigned: 0 to 65535. 32-bit integers. Signed: 0 to 2147483647, -2147483648 to -1. Unsigned: 0 to 4294967295. 64-bit integers. Signed: 0 to 9223372036854775807, -9223372036854775808 to -1. Unsigned: 0 to 18446744073709551615. Note: the first half are (zero and) positive, the second half are negative. The last value is -1. There is positive zero, but no negative zero. There will be one more negative integer, than there are positive integers. Links: Two's complement - Wikipedia Method of complements - Wikipedia
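The flip-the-bits-and-add-1 rule, as a Python sketch (masking with 2**b - 1 discards overflow beyond b bits; the function name is illustrative):

```python
def twos_complement(n, b=8):
    """b-bit two's complement of n: flip the bits, add 1, discard any overflow."""
    mask = 2 ** b - 1          # e.g. 0b11111111 for 8 bits
    return ((n ^ mask) + 1) & mask

assert twos_complement(5) == 0b11111011    # the bit pattern of -5
assert twos_complement(0b11111011) == 5    # and back again
assert twos_complement(0) == 0             # no negative zero
assert twos_complement(1) == 0b11111111    # the bit pattern of -1: all 1's
```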
The confusion: '>>' works differently in different programming languages. The confusion: how does bitshift right work on negative integers? How does adding one-bits halve a negative integer? In simple terms, bitshift right halves a number, and discards the remainder. More specifically, integers are stored as bits, we 'shift' bits, and this changes the integer's value. Bitshifting positive integers E.g. if we use the 8-bit unsigned integer 128, and bitshift right by 1: 0b10000000 (128) >> 1 = 0b01000000 (64) the bits are shifted 1 place right the rightmost bit is discarded the left side is zero-filled with 1 0-bit E.g. if we use the 8-bit unsigned integer 128, and bitshift right by 4: 0b10000000 (128) >> 4 = 0b00001000 (8) the bits are shifted 4 places right the 4 rightmost bits are discarded the left side is zero-filled with 4 0-bits When a bit is shifted, its value is halved, or it is discarded if it goes beyond the end of the integer. Bitshifting negative integers E.g. if we use the 8-bit signed integer -128, and bitshift right by 1: 0b10000000 (-128) >> 1 = 0b11000000 (-64) = -128 + 64 the bits are shifted 1 place right the rightmost bit is discarded the left side is one-filled with 1 1-bit E.g. if we use the 8-bit signed integer -128, and bitshift right by 4: 0b10000000 (-128) >> 4 = 0b11111000 (-8) = -128 + 64 + 32 + 16 + 8 the bits are shifted 4 places right the 4 rightmost bits are discarded the left side is one-filled with 4 1-bits We typically think of 8-bit signed positive integers as follows: the default integer is 0, all 0-bits, the first bit is always 0 for a positive integer, we can ignore it, in the other 7 bits, any 1-bits increase the total. We can think of 8-bit signed negative integers as follows: the default integer is -1, all 1-bits, the first bit is always 1 for a negative integer, we can ignore it, in the other 7 bits, any 0-bits decrease the total (make it more negative). E.g.
E.g. 0b11111111 = -1 + 0*128 - 0*64 - 0*32 - 0*16 - 0*8 - 0*4 - 0*2 - 0*1 = -1
E.g. 0b11111101 = -1 + 0*128 - 0*64 - 0*32 - 0*16 - 0*8 - 0*4 - 2 - 0*1 = -3
I.e. we replace a 1-bit with a 0-bit above, and the 0-bit acts as a -2.
Note: 1-bits are ignored, 0-bits are treated as present (with a negative value). As bits are shifted right, those negative values are halved (and sometimes discarded, as they go off the end).
E.g. 0b10001000 = -1 + 0*128 - 64 - 32 - 16 - 0*8 - 4 - 2 - 1 = -120
E.g. 0b11000100 = -1 + 0*128 - 0*64 - 32 - 16 - 8 - 0*4 - 2 - 1 = -60
E.g. 0b11100010 = -1 + 0*128 - 0*64 - 0*32 - 16 - 8 - 4 - 0*2 - 1 = -30
E.g. 0b11110001 = -1 + 0*128 - 0*64 - 0*32 - 0*16 - 8 - 4 - 2 - 0*1 = -15
E.g. 0b11111000 = -1 + 0*128 - 0*64 - 0*32 - 0*16 - 0*8 - 4 - 2 - 1 = -8
E.g. 0b11111100 = -1 + 0*128 - 0*64 - 0*32 - 0*16 - 0*8 - 0*4 - 2 - 1 = -4
E.g. 0b11111110 = -1 + 0*128 - 0*64 - 0*32 - 0*16 - 0*8 - 0*4 - 0*2 - 1 = -2
E.g. 0b11111111 = -1 + 0*128 - 0*64 - 0*32 - 0*16 - 0*8 - 0*4 - 0*2 - 0*1 = -1
Note: -1 >> 1 = -1. I.e. applying a bitshift right to -1 returns -1.
Note:
If you repeatedly do a signed bitshift right, starting with a positive int, you get 0.
If you repeatedly do a signed bitshift right, starting with a negative int, you get -1.
If you repeatedly do an unsigned bitshift right, you get 0.
Bitshifting right in different programming languages
In JavaScript:
'>>' converts a number to a signed 32-bit integer, then does a signed right shift.
'>>>' converts a number to an unsigned 32-bit integer, then does an unsigned right shift.
I.e. the operator determines the output. (Whether one-fill may be used. Whether the output may be negative.)
console.log(0x80000000 >> 0); //-2147483648 (-0x80000000)
console.log(0x80000000 >>> 0); //2147483648 (0x80000000)
console.log(0x80000000 >> 1); //-1073741824 (-0x40000000)
console.log(0x80000000 >>> 1); //1073741824 (0x40000000)
console.log();
console.log(-1 >> 0); //-1
console.log(-1 >>> 0); //4294967295 (0xFFFFFFFF)
console.log(-1 >> 1); //-1
console.log(-1 >>> 1); //2147483647 (0x7FFFFFFF)
In C++:
'>>' on a signed 32-bit integer (int), does a signed right shift.
'>>' on an unsigned 32-bit integer (unsigned int), does an unsigned right shift.
I.e. the integer type determines the output. (Whether one-fill may be used. Whether the output may be negative.)
std::cout << ((int)0x80000000 >> 0) << "\n"; //-2147483648 (-0x80000000)
std::cout << ((unsigned int)0x80000000 >> 0) << "\n"; //2147483648 (0x80000000)
std::cout << ((int)0x80000000 >> 1) << "\n"; //-1073741824 (-0x40000000)
std::cout << ((unsigned int)0x80000000 >> 1) << "\n"; //1073741824 (0x40000000)
std::cout << "\n";
std::cout << ((int)-1 >> 0) << "\n"; //-1
std::cout << ((unsigned int)-1 >> 0) << "\n"; //4294967295 (0xFFFFFFFF)
std::cout << ((int)-1 >> 1) << "\n"; //-1
std::cout << ((unsigned int)-1 >> 1) << "\n"; //2147483647 (0x7FFFFFFF)
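The 'repeatedly shift' notes above can be checked with a short JavaScript sketch; the sequence it prints matches the halving table earlier in this section:

```javascript
//repeated signed right-shift: a negative int converges to -1 (a positive int would converge to 0):
const steps = [];
let n = -120;
while (true) {
	steps.push(n);
	if (n === 0 || n === -1) break;
	n = n >> 1;
}
console.log(steps.join(" ")); //-120 -60 -30 -15 -8 -4 -2 -1
console.log(-1 >>> 1); //2147483647 (an unsigned shift of -1 gives a large positive int)
```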
The confusion: how can a XOR swap, swap 2 variables, without using a temporary variable.
Some notes about XOR
XOR means exclusive-or, and is often denoted by '^'. (Note: '^' is sometimes used to mean exponent.)
If given 2 bits, a and b, it returns 1 if exactly one of a and b is 1:
1 ^ 1 = 0
1 ^ 0 = 1
0 ^ 1 = 1
0 ^ 0 = 0
If you XOR 2 numbers, you treat the numbers as streams of bits, and XOR each pair of corresponding bits.
E.g. 12 ^ 10:
a = 12 = 0b1100
b = 10 = 0b1010
t = 12^10 = 0b0110 = 6
Note: the example above demonstrates all the possible pairings: 1^1, 1^0, 0^1, 0^0.
Like + and *, XOR is commutative: a^b = b^a E.g. 12^10 = 10^12
The XOR swap
Given 2 of a, b and t=a^b, you can calculate the third. This is the key to the XOR swap.
e.g. t = a^b = b^a
e.g. a = b^t = t^b
e.g. b = a^t = t^a
This is demonstrated in the example above, where 12^10 = 6.
The XOR swap, in 3 assignments, swaps the values of 2 variables, e.g.:
a=a^b
b=a^b
a=a^b
e.g.
a = 12
b = 10
a = a^b = 12^10 = 6 [we lose the original a, but it can be regained]
b = a^b = 6^10 = 12 [we regain the original a, we lose the original b, but it can be regained]
a = a^b = 6^12 = 10 [we regain the original b]
in general:
a = A
b = B
a = a^b = A^B = T [we lose A, but it can be regained]
b = a^b = T^B = A [we regain A, we lose B, but it can be regained]
a = a^b = T^A = B [we regain B]
note: (A^B)^B = A
note: (A^B)^A = B
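The 3 assignments above, as runnable JavaScript:

```javascript
//XOR swap: swap 2 variables without a temporary variable:
let a = 12, b = 10;
a = a ^ b; //a is now 6 (T = A^B) [we lose the original a, but it can be regained]
b = a ^ b; //b is now 12 (T^B = A) [we regain the original a]
a = a ^ b; //a is now 10 (T^A = B) [we regain the original b]
console.log(a, b); //10 12
```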
The confusion: the pros and cons of zero-based and one-based indexes. The confusion: why do arrays start at zero in many programming languages?
zero-based versus one-based arrays
[1-based wins] First element index:
0-based: 0
1-based: 1 [1-based is common in maths, the alphabet (1-26), everyday life]
[1-based wins] Last element index:
0-based: array.length - 1 [more verbose]
1-based: array.length [simpler]
[1-based wins] IndexOf not found indicator:
0-based: -1 [since -1 is truthy, requires separate IndexOf/Contains methods] [since -1 is negative, a 0-based IndexOf requires a signed integer (with half of the number range wastefully unused)]
1-based: 0 [since 0 is falsy, IndexOf/Contains can be the same method]
[draw] Is key valid?
0-based: (n >= 0) && (n < array.length)
1-based: (n > 0) && (n <= array.length)
[it's arguable which of these is faster, or if there is a speed difference]
[0-based wins] Modulo:
0-based: start at 0 [modulo values start at 0]
1-based: start at 1 [however, 1-based modulo arithmetic is so useful, a separate function may be in order]
Author's view
I prefer 1-based for almost everything. I'm used to it from mathematics, and it's more natural... An unfortunate consequence of the prevalence of 0-based indexing: referring to the 'first parameter' in a function call, or the 'first bit' in a byte, can be ambiguous. Tuples often use 1-based indexing, an admission of their utility. That isn't to say 1 or 0 is better per se, just that our whole societal notion of indexing starts at 1, and that any deviation from this is likely to cause ambiguity/confusion/bugs. Perhaps it's true that many programmers *never* fully adjust to using 0-based indexes, and have to actively/deliberately translate indexes in their head.
Look at these columns:
A D G
B E H
C F I
In which column are D, E and F? Did you say column 2? (1-based.) Surely you didn't say column 1 (0-based).
0-based arrays are sometimes argued for in terms of offsets. E.g.
for an array of 4-byte items: Item 0 starts at 4*0 bytes. Item 1 starts at 4*1 bytes. Etc. But equally, you could say that between 4*0 and 4*1, is item 1. And that between 4*1 and 4*2, is item 2. Neither convention has a particular advantage here.
0-based arrays are useful for modulo arithmetic. However, I have thought it worthwhile to have a 1-based modulo function.
0-based / -1-based / -2-based arrays etc. are useful in order to have spare columns/rows, in 2D arrays, for use in algorithms.
Comments on the Dijkstra article: 'Why numbering should start at zero'
Dijkstra considers how best to express '2, 3, ..., 12' as a range. Which of these 4 conventions:
a) 2 ≤ i < 13
b) 1 < i ≤ 12
c) 2 ≤ i ≤ 12
d) 1 < i < 13
He mentions that for conventions a and b: the length (11) equals the bound difference, 13-2 and 12-1. (A valid but perhaps relatively minor argument.)
He also mentions that for conventions a and b, inclusive start to exclusive end gives you arrays that can be neatly concatenated, and that the same is true for exclusive start to inclusive end. (This is an interesting mathematical observation, but it's probably fair to say that it never, or almost never, arises in programming, simple or complex.)
He dislikes the initial <, of conventions b and d, on aesthetic grounds. (Aesthetics can be a valid argument, even if it is difficult to pin down logically, what is or isn't aesthetically pleasing.) (Although 'aesthetic' justifications can often be pinned down, e.g. the bouba-kiki effect.)
He mentions that Mesa has special notations for all 4 conventions, stating 3 of them have caused mistakes. (I would counter that bad notations can ruin *any* concept, and have often done so, and that *until* there is a good notation, fields and concepts often remain a relative failure. Some examples might include: numbers before zero, mathematics before operator symbols, algebra before using letters as unknowns, and current calculus notations.
The many symbols of the International Phonetic Alphabet, and of the APL programming language, make them difficult to work with. A quote, by Laplace, I often consider: 'A good notation is half the battle in mathematics.')
The Mesa programming language makes use of the classical mathematical notations of round brackets for exclusive, and square brackets for inclusive, e.g.:
a) 2 ≤ i < 13 equivalent to [2..13)
b) 1 < i ≤ 12 equivalent to (1..12]
c) 2 ≤ i ≤ 12 equivalent to [2..12]
d) 1 < i < 13 equivalent to (1..13)
Although the manual uses '..', whereas mathematics uses a comma. If this was what Dijkstra was referring to, indeed, that *is* hard to use, even when you know which way round is correct. But a superior notation may be easier to use, e.g. this ('..' and '..<' from Kotlin, I added '<..' and '<..<'):
a) 2 ≤ i < 13 equivalent to 2..<13
b) 1 < i ≤ 12 equivalent to 1<..12
c) 2 ≤ i ≤ 12 equivalent to 2..12
d) 1 < i < 13 equivalent to 1<..<13
He states that if convention a is preferred, an N-length array has:
1-based indexes: 1 ≤ i < N+1
0-based indexes: 0 ≤ i < N
And that the latter is more pleasing. (But of course, if convention c is preferred, an N-length array has:
1-based indexes: 1 ≤ i ≤ N
0-based indexes: 0 ≤ i ≤ N-1
So the argument only works within a certain context.)
The article mainly considers range boundaries, and doesn't really address 0-based indexes.
Source: E.W. Dijkstra Archive: Why numbering should start at zero (EWD 831)
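Dijkstra's concatenation observation can be tried in JavaScript, whose slice(start, end) uses convention a (inclusive start, exclusive end): splitting an array at any index k and re-concatenating restores it, with no +1 adjustments:

```javascript
//half-open [start, end) ranges concatenate without gaps or overlaps:
const arr = ["a", "b", "c", "d", "e", "f"];
const allOk = [0, 3, 6].every(k =>
	[...arr.slice(0, k), ...arr.slice(k, arr.length)].join("") === arr.join(""));
console.log(allOk); //true
```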
The confusion: giving each character an index, versus giving the imaginary line between two characters an index. E.g. for text in an Edit control, on Windows PCs, we refer not to characters, but to the thin spaces in-between them. I.e. when you use a text editor, the flashing line (caret), goes between characters. If you hold shift, you can add characters to a selection, between an anchor (starting point) and active point (caret). E.g. character indexing:
 a b c d e
0 1 2 3 4 5  the thin spaces between chars, where the caret goes
 0 1 2 3 4   char index 0-based
 1 2 3 4 5   char index 1-based
To refer to the spaces between characters, starting at 0, before the first character, seems sensible.
To refer to the individual characters themselves: starting at 1 (1-based indexes) is more common in everyday life, starting at 0 (0-based indexes) is more common in programming.
Author's view
I personally use 1-based indexes wherever possible, and write functions to do so, where programming languages lack them. It's more intuitive, and prevents bugs.
The confusion: substring functions work differently in different programming languages. It's worth being aware that substr/substring functions may work differently across programming languages. Some use the convention 'start, end', i.e. index of first char inclusive, to index of last char exclusive. Some use the convention 'start, length', i.e. index of first char inclusive, and count of chars.
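E.g. JavaScript has both conventions: substring(start, end), and the deprecated substr(start, length):

```javascript
const s = "abcdef";
console.log(s.substring(1, 4)); //"bcd"  ('start, end': chars at indexes 1 to 3)
console.log(s.substr(1, 4));    //"bcde" ('start, length': 4 chars from index 1; substr is deprecated)
```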
The confusion: XYRB coordinates are simpler to use, if you adjust them by one.
If I consider the top-left 10x10 pixels of the screen: the top-left pixel is at (0,0), the bottom-right pixel is at (9,9). The width is 10, and the height is 10.
Thus the XYWH coordinates are: 0, 0, 10, 10. (Left, Top, Width, Height.)
You might think the XYRB coordinates are: 0, 0, 9, 9. (Left, Top, Right, Bottom.)
But in fact, the XYRB coordinates are: 0, 0, 10, 10 as well. By convention, in XYRB coordinates we add 1 to both the R and B values.
If we consider the square to the right of our initial square:
XYWH: 10, 0, 10, 10.
XYRB: 10, 0, 20, 10.
The real RB is (19,9), not (20,10).
This adjustment of RB, by (1,1), makes it much easier to work with values. Some of the reasons for this are outlined in a The Old New Thing article:
From The Old New Thing:
Endpoint-exclusive RECTs and lines are much easier to work with. For example, the width of a rectangle is right - left, and its height is bottom - top. If rectangles were endpoint-inclusive, then there would be annoying +1's everywhere.
End-point exclusive rectangles also scale properly. For example, suppose you have two rectangles (0,0)-(100,100) and (100,100)-(200,200). These two rectangles barely touch at the corner. Now suppose you magnify these rectangles by 2, so they are now (0,0)-(200,200) and (200,200)-(400,400). Notice that they still barely touch at the corner. Also the length of each edge doubled from 100 to 200.
Now suppose endpoints were inclusive, so the two rectangles would be (0,0)-(99,99) and (100,100)-(199,199). Now when you double them, you get (0,0)-(198,198) and (200,200)-(398,398).
Source: Why are RECTs endpoint-exclusive? - The Old New Thing
This also means you have to be careful to avoid an off-by-one error, when considering whether two areas overlap.
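E.g. an overlap test for endpoint-exclusive rectangles, as a JavaScript sketch (rectsOverlap is a hypothetical helper name): two rectangles that merely touch do not overlap:

```javascript
//endpoint-exclusive rectangles: left/top inclusive, right/bottom exclusive:
function rectsOverlap(r1, r2) {
	return r1.left < r2.right && r2.left < r1.right
		&& r1.top < r2.bottom && r2.top < r1.bottom;
}
const rcA = {left: 0, top: 0, right: 100, bottom: 100};
const rcB = {left: 100, top: 100, right: 200, bottom: 200};
const rcC = {left: 99, top: 99, right: 200, bottom: 200};
console.log(rectsOverlap(rcA, rcB)); //false (they only touch at a corner)
console.log(rectsOverlap(rcA, rcC)); //true (they share pixel (99,99))
```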
The confusion: the A-Z, AA-ZZ, AAA-ZZZ naming system, is not base 26, and is not base 27. What is the logic for converting column name letters to/from integers? An initial guess An initial guess that the A-Z system may be base 26 or base 27, is wrong. We can use A-D, AA-DD, AAA-DDD, to consider the logic: If we say A-D represents 0-3. Base 4.
A 0
B 1
C 2
D 3
AA 00
AB 01
AC 02
AD 03 [we're getting the repetition of 0/1/2/3]
BA 10
BB 11
BC 12
BD 13
Base 4 generates too few numbers, so we get repetitions. If we say A-D represents 1-4. Base 5.
A 1
B 2
C 3
D 4
AA 11
AB 12
AC 13
AD 14
BA 21
BB 22
BC 23
BD 24 [where is 10? 20?]
Base 5 generates too many numbers, so we get missing numbers. Summary for A-D system
column names (A-D):
A-D 4 = 4
AA-DD 4*4 = 16
AAA-DDD 4*4*4 = 64
base 4:
1-3 4**1 - 4**0 = 3
10-33 4**2 - 4**1 = 12
100-333 4**3 - 4**2 = 48
base 5:
1-4 5**1 - 5**0 = 4
10-44 5**2 - 5**1 = 20
100-444 5**3 - 5**2 = 100
Summary for A-Z system Note: base 26 uses digits 0-9, then A-P. P is the 16th letter. Note: base 27 uses digits 0-9, then A-Q. Q is the 17th letter.
column names (A-Z):
A-Z 26 = 26
AA-ZZ 26*26 = 676
AAA-ZZZ 26*26*26 = 17576
base 26:
1-P 26**1 - 26**0 = 25
10-PP 26**2 - 26**1 = 650
100-PPP 26**3 - 26**2 = 16900
base 27:
1-Q 27**1 - 27**0 = 26
10-QQ 27**2 - 27**1 = 702
100-QQQ 27**3 - 27**2 = 18954
Note: here's a table for base 10, which should be more intuitive to understand:
base 10:
1-9 10**1 - 10**0 = 9 [9 1-digit numbers]
10-99 10**2 - 10**1 = 90 [90 2-digit numbers]
100-999 10**3 - 10**2 = 900 [900 3-digit numbers]
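The counts in the tables above can be reproduced in a few lines of JavaScript:

```javascript
//counts of 1-/2-/3-digit numbers in a base, versus counts of 1-/2-/3-letter column names:
for (const base of [4, 5, 10, 26, 27])
	console.log("base " + base + ":", [1, 2, 3].map(d => base ** d - base ** (d - 1)).join(" "));
for (const letters of [4, 26])
	console.log(letters + " letters:", [1, 2, 3].map(d => letters ** d).join(" "));
```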
Some code Here's some AutoHotkey v2 code for column name letters to integers: For each letter, get the 1-based index (1-26), and multiply it by a power of 26. Sum those values.
;note: A_Index starts at 1, it increments each iteration:
vNum := 0
vLen := StrLen(vLet)
Loop vLen
    vNum += (Ord(SubStr(vLet, vLen-A_Index+1, 1))-64) * (26**(A_Index-1))
Here's some AutoHotkey v2 code for integers to column name letters: Keep applying mod 26 to the number, to generate the next letter to the left. Stop when the number is 0.
vLet := ""
while vNum
{
    vLet := Chr(65+Mod(vNum-1, 26)) . vLet
    vNum := (vNum-1) // 26
}
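For comparison, here is a JavaScript sketch of the same two algorithms (colToNum and numToCol are hypothetical function names):

```javascript
//column name letters to integer: for each letter, multiply the running total by 26, and add the letter's 1-based index:
function colToNum(vLet) {
	let vNum = 0;
	for (const ch of vLet)
		vNum = vNum * 26 + (ch.charCodeAt(0) - 64); //A=1 ... Z=26
	return vNum;
}
//integer to column name letters: keep applying (n-1) mod 26, to generate the next letter to the left:
function numToCol(vNum) {
	let vLet = "";
	while (vNum > 0) {
		vLet = String.fromCharCode(65 + (vNum - 1) % 26) + vLet;
		vNum = Math.floor((vNum - 1) / 26);
	}
	return vLet;
}
console.log(colToNum("A"), colToNum("Z"), colToNum("AA"), colToNum("ZZ"), colToNum("AAA")); //1 26 27 702 703
console.log(numToCol(1), numToCol(26), numToCol(27), numToCol(702), numToCol(703)); //A Z AA ZZ AAA
```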
The confusion: describing the path, visiting each cell in a rectangle. The confusion: useful terminology that is not that well-known, 'by columns', 'by rows'.
If you visit each cell in a range, in this order, that is 'by columns':
ADG
BEH
CFI
If you visit each cell in a range, in this order, that is 'by rows':
ABC
DEF
GHI
See also (for a code example):
Section: Traversal: XY Coordinates v. RC Coordinates
The confusion: nested loop, which way round, iterating through each cell in a rectangle.
In programming, data is often stored as rows, and within each row, cells. E.g. data:
ABC
DEF
GHI
Using a 1-based 2D array: We can refer to cells like so: 'myarray[R, C]'. I.e. '[row number, column number]'.
E.g. myarray[1, 1] is A.
E.g. myarray[1, 3] is C.
E.g. myarray[3, 1] is G.
And when iterating through a 2D array, we normally iterate through the rows, and within that, through the cells, e.g. JavaScript code:
//iterating through a 2D array:
oArray = [["A","B","C"],["D","E","F"],["G","H","I"]];
for (const oRow of oArray) {
	for (const vCell of oRow) {
		console.log(vCell); //prints A to I in alphabetical order
	}
}
console.log(oArray[0][0]); //A
console.log(oArray[0][2]); //C
console.log(oArray[2][0]); //G
console.log(oArray[2][2]); //I
Note: (R,C): R indicates vertical position, C indicates horizontal position. This is in contrast to (X,Y) where X indicates horizontal position, Y indicates vertical position. Note: in C++ for example, 2D arrays are often stored in memory like so: ABC|DEF|GHI
The confusion: the order for iterating through each node in a tree, what are the rules, which is which. (BFS is generations, DFS is line of succession.)
The confusion: DFS is often done via recursion. How to do it iteratively/recursively?
The confusion: BFS is typically not done via recursion. How to do it iteratively?
The confusion: a common variant of DFS exists, that uses DFS, but lists results more like BFS.
The confusion: writing nested array one-liners, and doing 'flattening', is akin to DFS.
Two common ways to iterate through a tree are breadth-first search and depth-first search.
Breadth-first search:
Breadth-first, i.e. elements are checked row-by-row.
E.g. algorithm: search the elements of a tree for a needle:
Create a queue (e.g. a deque, double-ended queue) containing the first element in the tree.
Loop {
If the queue is empty, exit the loop.
Remove the first item from the queue.
(Check if the element's value matches the needle. If it matches, exit the loop.)
Add all of its children to the end of the queue.
}
Depth-first search:
Depth-first, i.e. elements are checked as deeply as possible straightaway.
E.g. algorithm: search the elements of a tree for a needle:
Visit the first element.
Loop {
(Check if the element's value matches the needle. If it matches, exit the loop.)
If the element has a child, that hasn't been visited yet, visit it.
Else, if the current element has no parent, exit the loop.
Else, backtrack, return to the parent element.
}
If looking at a family tree:
Breadth-first search is like scanning left-to-right, line-by-line, where each line is a generation.
Depth-first search is equivalent to the line of succession in a monarchy.
Here are two diagrams comparing the two approaches. The increasing indexes represent the order in which each node is first visited.
Breadth-first search:
Depth-first search:
If we create a node tree like so, we get the following results:
  A
  B C D
  E F G H I J K L M
NOP QRS TUV WXY Z-- --- --- --- ---
node paths (BFS order):
A
A/B
A/C
A/D
A/B/E
A/B/F
A/B/G
A/C/H
A/C/I
A/C/J
A/D/K
A/D/L
A/D/M
A/B/E/N
A/B/E/O
A/B/E/P
A/B/F/Q
A/B/F/R
A/B/F/S
A/B/G/T
A/B/G/U
A/B/G/V
A/C/H/W
A/C/H/X
A/C/H/Y
A/C/I/Z
node paths (DFS variant order): e.g. AutoHotkey Loop Files file search on Windows PCs (using FindFirstFile/FindNextFile internally):
A
A/B
A/C
A/D
A/B/E
A/B/F
A/B/G
A/B/E/N
A/B/E/O
A/B/E/P
A/B/F/Q
A/B/F/R
A/B/F/S
A/B/G/T
A/B/G/U
A/B/G/V
A/C/H
A/C/I
A/C/J
A/C/H/W
A/C/H/X
A/C/H/Y
A/C/I/Z
A/D/K
A/D/L
A/D/M
node paths (DFS order): e.g. RegEdit, export reg file, on Windows PCs:
A
A/B
A/B/E
A/B/E/N
A/B/E/O
A/B/E/P
A/B/F
A/B/F/Q
A/B/F/R
A/B/F/S
A/B/G
A/B/G/T
A/B/G/U
A/B/G/V
A/C
A/C/H
A/C/H/W
A/C/H/X
A/C/H/Y
A/C/I
A/C/I/Z
A/C/J
A/D
A/D/K
A/D/L
A/D/M
Note: if the 4-level entries are removed, then the BFS (yes, B) and DFS variant orders above are identical. Whereas the DFS and DFS variant orders remain distinct.
Another set of examples:
Breadth-first search: (which in this case matches the depth-first search variant:)
A1
A1\B1
A1\B2
A1\B3
A1\B1\C1
A1\B1\C2
A1\B1\C3
A1\B2\C1
A1\B2\C2
A1\B2\C3
A1\B3\C1
A1\B3\C2
A1\B3\C3
Depth-first search:
A1
A1\B1
A1\B1\C1
A1\B1\C2
A1\B1\C3
A1\B2
A1\B2\C1
A1\B2\C2
A1\B2\C3
A1\B3
A1\B3\C1
A1\B3\C2
A1\B3\C3
Below is JavaScript code that demonstrates BFS, DFS and DFS variant:
//==============================
//JavaScript code:
//breadth-first search / depth-first search:
//BFS / DFS / DFS variant algorithm demos:
console.log("BFS / DFS / DFS variant algorithm demos:");
console.log("note: nodes in parentheses are visited but not logged:");
console.log();

//node tree:
//  A
//  B C D
//  E F G H I J K L M
//NOP QRS TUV WXY Z-- --- --- --- ---

//summary (3 results):
//BFS: ABCDEFGHIJKLMNOPQRSTUVWXYZ
//DFS: ABENOPFQRSGTUVCHWXYIZJDKLM
//DFS variant: ABCDEFGNOPQRSTUVHIJWXYZKLM

//summary (6 algorithms):
//breadth-first search (BFS) iterative:
//depth-first search (DFS) iterative:
//depth-first search (DFS) recursive:
//depth-first search (DFS) *variant* recursive:
//... note: uses DFS, but returns items in a slightly different order:
//depth-first search (DFS) iterative (using indexes):
//depth-first search (DFS) *variant* iterative (using indexes):
//... note: uses DFS, but returns items in a slightly different order:

//==============================
//preparing nodes A-Z:
A={};B={};C={};D={};E={};F={};G={};H={};I={};J={};
K={};L={};M={};N={};O={};P={};Q={};R={};S={};T={};
U={};V={};W={};X={};Y={};Z={};
vOrd = 64;
for (const oNode of [A,B,C,D,E,F,G,H,I,J,K,L,M,N,O,P,Q,R,S,T,U,V,W,X,Y,Z]) {
	oNode.name = String.fromCharCode(++vOrd);
	oNode.children = [];
}
A.children = [B,C,D]; B.children = [E,F,G]; C.children = [H,I,J]; D.children = [K,L,M];
E.children = [N,O,P]; F.children = [Q,R,S]; G.children = [T,U,V]; H.children = [W,X,Y];
I.children = [Z];
A.parent = null;
for (const oNode of [A,B,C,D,E,F,G,H,I]) {
	for (const oChild of oNode.children)
		oChild.parent = oNode;
}
vSep = "/";
for (const oNode of [A,B,C,D,E,F,G,H,I,J,K,L,M,N,O,P,Q,R,S,T,U,V,W,X,Y,Z])
	oNode.path = NodeGetPath(oNode, vSep);
function NodeGetPath(oNode, vSep="/") {
	vPath = oNode.name;
	while (oNode.parent !== null) {
		vPath = oNode.parent.name + vSep + vPath;
		oNode = oNode.parent;
	}
	return vPath;
}

//==============================
console.log("breadth-first search (BFS) iterative:");
oRet = [];
oStack = [A];
while (oStack.length) {
	oNode = oStack.shift(); //removes first value from array
	oRet.push(oNode.name);
	oStack.push(...oNode.children);
}
console.log(oRet.join(""));
console.log();

//==============================
console.log("depth-first search (DFS) iterative:");
oRet = [];
oStack = [A];
while (oStack.length) {
	oNode = oStack.shift(); //removes first value from array
	oRet.push(oNode.name);
	oStack.unshift(...oNode.children); //insert values at start of array
}
console.log(oRet.join(""));
console.log();

//==============================
console.log("depth-first search (DFS) recursive:");
oRet = [];
DFS(A);
console.log(oRet.join(""));
console.log();
function DFS(oNode) {
	oRet.push(oNode.name);
	for (const oChild of oNode.children)
		DFS(oChild);
}

//==============================
console.log("depth-first search (DFS) *variant* recursive:");
console.log("note: uses DFS, but returns items in a slightly different order:");
//e.g. used by AutoHotkey Loop Files file search:
oRet = [];
oRetX = [];
oRet.push(A.name);
oRetX.push(A.name);
DFSVariant(A);
console.log(oRet.join(""));
console.log(oRetX.join(""));
console.log();
function DFSVariant(oNode) {
	oRetX.push("(" + oNode.name + ")");
	for (const oChild of oNode.children) {
		oRet.push(oChild.name);
		oRetX.push(oChild.name);
	}
	for (const oChild of oNode.children)
		DFSVariant(oChild);
}

//==============================
//DFS iterative, using 1 index per level rather than pushing all children:
console.log("depth-first search (DFS) iterative (using indexes):");
oRet = [];
oRetX = [];
oStack = [[A, 0]];
vLevel = 0;
vDrn = 1;
//i = 0;
while (oStack.length) {
	//console.log([++i, vLevel]);
	oNode = oStack[vLevel][0];
	vIndex = oStack[vLevel][1];
	if (vDrn == 1) {
		oRet.push(oNode.name);
		oRetX.push(oNode.name);
	}
	else
		oRetX.push("(" + oNode.name + ")");
	if (vIndex < oNode.children.length) {
		oStack[vLevel][1]++;
		vDrn = 1;
		vLevel++;
		oStack.push([oNode.children[vIndex], 0]);
	}
	else {
		oStack.pop();
		vDrn = -1;
		vLevel--;
	}
}
console.log(oRet.join(""));
console.log(oRetX.join(""));
console.log();

//==============================
console.log("depth-first search (DFS) *variant* iterative (using indexes):");
console.log("note: uses DFS, but returns items in a slightly different order:");
//e.g. used by AutoHotkey Loop Files file search:
oRet = [];
oRetX = [];
oStack = [[A, 0]];
vLevel = 0;
vDrn = 1;
//i = 0;
oRet.push(oStack[0][0].name);
oRetX.push(oStack[0][0].name);
while (oStack.length) {
	//console.log([++i, vLevel]);
	oNode = oStack[vLevel][0];
	vIndex = oStack[vLevel][1];
	oRetX.push("(" + oNode.name + ")");
	if (vDrn == 1) {
		for (const oChild of oNode.children) {
			oRet.push(oChild.name);
			oRetX.push(oChild.name);
		}
	}
	if (vIndex < oNode.children.length) {
		oStack[vLevel][1]++;
		vDrn = 1;
		vLevel++;
		oStack.push([oNode.children[vIndex], 0]);
	}
	else {
		oStack.pop();
		vDrn = -1;
		vLevel--;
	}
}
console.log(oRet.join(""));
console.log(oRetX.join(""));
console.log();

//==================================================
console.log("node paths (BFS order):");
console.log([A,B,C,D,E,F,G,H,I,J,K,L,M,N,O,P,Q,R,S,T,U,V,W,X,Y,Z].map(v=>v.path).join("\n"));
console.log();
console.log("node paths (DFS variant order):");
console.log([A,B,C,D,E,F,G,N,O,P,Q,R,S,T,U,V,H,I,J,W,X,Y,Z,K,L,M].map(v=>v.path).join("\n"));
console.log();
console.log("node paths (DFS order):");
console.log([A,B,E,N,O,P,F,Q,R,S,G,T,U,V,C,H,W,X,Y,I,Z,J,D,K,L,M].map(v=>v.path).join("\n"));
console.log();

//==============================
console.log("THE END");
//==============================
Flattening arrays, e.g. via JavaScript's flat method: When you write a nested array one-liner, and then 'flatten' it, it's akin to DFS: I.e. it's as if you traversed the elements in DFS order, and collected them as a 1-dimensional array. E.g. ["A", ["B", ["E", "F", "G"], "C", ["H", "I", "J"], "D", ["K", "L", "M"]]].flat(2) Becomes: ["A", "B", "E", "F", "G", "C", "H", "I", "J", "D", "K", "L", "M"]
The confusion: the difference between reduce left and reduce right. The confusion: how reduce right can be done via reduce left.
Reduce functions take an array, iterate through the values, and return a value. Classic use cases for reduce functions are sum and product.
Reduce left
Reduce left is equivalent to applying a left-associative operator to an array:
E.g. apply + to [1, 2, 3, 4]:
This gives: (((1 + 2) + 3) + 4)
This is equivalent to:
acc = 1
acc = acc + 2
acc = acc + 3
acc = acc + 4
Note: where acc is the accumulator.
E.g. apply add to [1, 2, 3, 4]:
This gives: add(add(add(1, 2), 3), 4)
This is equivalent to:
acc = 1
acc = add(acc, 2)
acc = add(acc, 3)
acc = add(acc, 4)
Reduce right
Reduce right is equivalent to applying a right-associative operator to an array:
E.g. apply ** to [1, 2, 3, 4]:
This gives: (1 ** (2 ** (3 ** 4)))
This is equivalent to:
acc = 4
acc = 3 ** acc
acc = 2 ** acc
acc = 1 ** acc
E.g. apply pow to [1, 2, 3, 4]:
This gives: pow(1, pow(2, pow(3, 4)))
This is equivalent to:
acc = 4
acc = pow(3, acc)
acc = pow(2, acc)
acc = pow(1, acc)
Reduce right via reduce left
To perform reduce right, when only a reduce left function is available... Reverse the array. Reverse the parameter order in the function.
E.g. apply div to [1, 2, 3, 4]:
where div = (a, b) => a / b
where rdiv = (a, b) => b / a
Reduce left: div(div(div(1, 2), 3), 4)
Equivalent to: (((1 / 2) / 3) / 4) = 1/24 = 0.041667 (6dp)
Reduce right: div(1, div(2, div(3, 4)))
Reduce left with reversed array and reversed param order (reduce right): rdiv(rdiv(rdiv(4, 3), 2), 1)
Both equivalent to: (1 / (2 / (3 / 4))) = 3/8 = 0.375
And for completeness...
Reduce left with reversed array: div(div(div(4, 3), 2), 1)
Equivalent to: (((4 / 3) / 2) / 1) = 2/3 = 0.666667 (6dp)
Reduce left with reversed param order: rdiv(rdiv(rdiv(1, 2), 3), 4)
Equivalent to: div(4, div(3, div(2, 1)))
Equivalent to: (4 / (3 / (2 / 1))) = 8/3 = 2.666667 (6dp)
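E.g. JavaScript: note that reduceRight also passes the accumulator first (like reduce, but scanning right-to-left), so to get a true reduce right we pass it the parameter-reversed function; and reduce right via reduce left works as described above:

```javascript
const div = (a, b) => a / b;
const rdiv = (a, b) => b / a; //reversed parameter order
const arr = [1, 2, 3, 4];
const left = arr.reduce(div);                         //(((1/2)/3)/4) = 1/24
const right = arr.reduceRight(rdiv);                  //(1/(2/(3/4))) = 3/8
const rightViaLeft = [...arr].reverse().reduce(rdiv); //reversed array, reversed params: also 3/8
console.log(left, right, rightViaLeft);
```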
The confusion: the difference between reduce without seed and reduce with seed. Reduce-without-seed functions reduce the values in an array. Reduce-with-seed functions reduce the values in an array, and a seed value. Reduce-without-seed functions receive 2 values of the same type. Reduce-with-seed functions can receive 2 values of different type. (In some languages, reduce-without-seed functions can also receive 2 values of different type.) Reduce-with-seed functions take an array, iterate through the values, and return a value. The key benefit of having the seed is to allow the return value to be of a different type from the array values. Also, having a seed allows you to specify an initial accumulator value, that is independent of the array values e.g. 0 or "". Possible use cases for reduce-with-seed functions are: count ints/floats/strings that match certain criteria, concatenate integers as a string, get the length of strings as an int. Some example code in JavaScript:
//E.g. count ints divisible by 3 in [1, 2, 3, 4], with seed 0:
myfunc = (acc, n) => acc + (n%3==0?1:0);
acc = 0;
acc = myfunc(acc, 1);
acc = myfunc(acc, 2);
acc = myfunc(acc, 3);
acc = myfunc(acc, 4);
console.log(acc);
console.log([1,2,3,4].reduce(myfunc,0)); //1

//E.g. concatenate [1, 2, 3, 4], with seed "" (no pad char):
myfunc = (acc, n) => "" + acc + n;
acc = "";
acc = myfunc(acc, 1);
acc = myfunc(acc, 2);
acc = myfunc(acc, 3);
acc = myfunc(acc, 4);
console.log(acc);
console.log([1,2,3,4].reduce(myfunc,"")); //1234

//E.g. concatenate [1, 2, 3, 4], with seed "" (with pad char comma):
myfunc = (acc, n) => "" + acc + (acc==""?"":",") + n;
acc = "";
acc = myfunc(acc, 1);
acc = myfunc(acc, 2);
acc = myfunc(acc, 3);
acc = myfunc(acc, 4);
console.log(acc);
console.log([1,2,3,4].reduce(myfunc,"")); //1,2,3,4

//E.g. get the combined string length of ["a", "bb", "ccc", "dddd"], with seed 0:
myfunc = (acc, n) => acc + ("" + n).length;
acc = 0;
acc = myfunc(acc, "a");
acc = myfunc(acc, "bb");
acc = myfunc(acc, "ccc");
acc = myfunc(acc, "dddd");
console.log(acc);
console.log(["a","bb","ccc","dddd"].reduce(myfunc,0)); //10
The confusion: no-one ever outlines, in one place, the differences between unary/binary/ternary operators. Here is a summary of unary / binary / ternary operators: Note: some languages lack these operators, or use them for different roles: Note: un-/bi-/ter- indicating 1/2/3.
Unary prefix: op var [1 prefix operator]
E.g. ++var [pre-increment] [increment, then return value]
E.g. --var [pre-decrement] [decrement, then return value]
E.g. +var [unary plus] [in some languages this does bool to int]
E.g. -var [unary minus]
E.g. !var [logical not]
Unary postfix: var op [1 postfix operator]
E.g. var++ [post-increment] [return value, then increment]
E.g. var-- [post-decrement] [return value, then decrement]
Binary: var1 op var2 [1 infix operator]
E.g. var1 + var2 [add / append]
E.g. var1 || var2 [logical or]
E.g. var1 && var2 [logical and]
E.g. var1 = var2 [assignment]
E.g. var1 += var2 [compound assignment: increment / append] [similar to pre-increment]
Ternary: var1 op1 var2 op2 var3 [2 infix operators]
E.g. var1 ? var2 : var3 [if var1 is truthy, return var2, else var3]
E.g. var1 <= var2 <= var3 [if var2 between var1 inclusive and var3 inclusive]
Note: 'var1 ? var2 : var3' is a ternary operator, sometimes called the ternary conditional operator, or ternary if. It is often referred to as 'the' ternary operator, since it is the only ternary operator most programming languages have.
Author's view
While constructs such as 'var1 <= var2 <= var3', chained comparisons, may appear convenient, there are many serious disadvantages to them. Thus I recommend not adding them to programming languages. And to prefer functions for common use cases such as 'Between' and 'BetweenUntil'.
Here is a video with some examples of the dangers of chained comparisons in Python:
From mCoding (duration 11:26):
Source: Why I don't like Python's chained comparisons - YouTube
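E.g. JavaScript happily evaluates a chained comparison, but not with the mathematical meaning, which is one of the dangers; a Between function (a hypothetical helper, along the lines suggested above) avoids this:

```javascript
//'Between' is a hypothetical helper name:
const Between = (lo, x, hi) => (lo <= x) && (x <= hi);
//chained comparison, evaluated left-to-right, with bool-to-number coercion:
console.log(2 <= 3 <= 1);      //true (!): (2 <= 3) is true, then true <= 1, i.e. 1 <= 1
console.log(Between(2, 3, 1)); //false (3 is not between 2 and 1)
console.log(Between(1, 2, 3)); //true (2 is between 1 and 3)
```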
The confusion: '+' has 2 roles in most programming languages, this causes bugs. E.g. JavaScript:
n = 1;
s = "1";
console.log(n + s); //11 (not 2)
console.log(s + n); //11 (not 2)
console.log(n + n); //2
console.log(s + s); //11
Author's view
Making + only do addition, and never string concatenation, increases safety. In PHP and AutoHotkey, + does addition, . does string concatenation. And likewise for the += and .= compound assignment operators.
The confusion: the ternary operator compares truthy/falsy, the null-coalescing operator compares non-null/null. A summary:
a ? b : c [if a is truthy, return b, else c] [ternary operator]
a ?: b [if a is truthy, return a, else b] [equivalent to: a ? a : b] [short ternary operator] ['falsy-coalescing operator']
a ?? b [if a is non-null, return a, else b] [null-coalescing operator]
E.g. 'var := a ? b : c' and 'a ? (var := b) : (var := c)' are both equivalent to:
if (a)
    var := b
else
    var := c
E.g. var := "The count is " . count . " item" . (count==1?"":"s") . "." is equivalent to:
if (count == 1)
    var := "The count is " . count . " item."
else
    var := "The count is " . count . " items."
The short-ternary operator is commonly used to replace 0, or perhaps a blank string, with an alternative value. The null-coalescing operator is used to replace null, with an alternative value. Note: languages differ regarding what is regarded as truthy. See below: 'Truthy v. Falsy'. Note: in many languages 'a ? a : b' and 'a ?: b' have one key difference: in 'a ?: b', 'a' can take any value, whereas in 'a ? a : b', many languages require the first 'a', the condition of the ternary operator, to be a bool, i.e. true or false.
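The distinction shows up directly in JavaScript, which has both || (which, used this way, acts as a falsy-coalescing operator) and ?? (the null-coalescing operator):

```javascript
// '||' replaces any falsy value; '??' replaces only null/undefined.
const count = 0;
const viaOr = count || 10;      // 10: 0 is falsy, so it is replaced
const viaNullish = count ?? 10; // 0: 0 is non-null, so it is kept

const missing = null;
const viaOr2 = missing || 10;      // 10
const viaNullish2 = missing ?? 10; // 10: null is replaced by both operators
```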
The confusion: low-precedence operators have high staying power. High-precedence operators are high priority: evaluate them first. Low-precedence operators have high durability: they persist the longest. High-precedence operators happen first, like a weak barrier. Low-precedence operators happen last, like a brick wall, a strong barrier. E.g. + has higher precedence, it happens faster; || is more durable, more of a wall:
console.log(10 + 20 || 30 + 40); //30
console.log((10 + 20) || (30 + 40)); //30
console.log(10 + (20 || 30) + 40); //70
E.g. && versus ||: && has higher precedence, happens faster: || is more durable, more of a wall:
console.log(10 || 20 && 30); //10
console.log(10 || (20 && 30)); //10
console.log((10 || 20) && 30); //30
The confusion: in what order to perform a calculation. The confusion: how to add parentheses to a calculation. The confusion: unary minus appears to contradict BODMAS.
In mathematics, the acronym BODMAS is taught:
BODMAS: Brackets, Order, Division/Multiplication, Addition/Subtraction
(Note: in UK English, brackets typically means round brackets.)
(Note: in US English, brackets typically means square brackets.)
It means do any calculations in brackets first, then orders (powers), then division/multiplication, then addition/subtraction.
Note: to translate this into programming, one way to think of it is like this: apply 'Order, Division/Multiplication, Addition/Subtraction' to any parentheses that contain no nested parentheses, and repeat.
How to add parentheses to a calculation
E.g. 1 + 2 - 3 * 4 / 5 ** 6
A rule we might apply is: for each operator, go left-to-right (or right-to-left for right-associative operators), and add parentheses.
Orders first (powers are right-associative, so go right-to-left):
E.g. **: 1 + 2 - 3 * 4 / (5 ** 6)
Then division/multiplication (go left-to-right):
E.g. *: 1 + 2 - (3 * 4) / (5 ** 6)
E.g. /: 1 + 2 - ((3 * 4) / (5 ** 6))
Then addition/subtraction (go left-to-right):
E.g. +: (1 + 2) - ((3 * 4) / (5 ** 6))
Everything is now accounted for. Although, to keep the algorithm simple, we could add parentheses to the entire expression:
E.g. -: ((1 + 2) - ((3 * 4) / (5 ** 6)))
E.g. JavaScript:
console.log(1 + 2 - 3 * 4 / 5 ** 6); //2.999232
console.log((1 + 2) - ((3 * 4) / (5 ** 6))); //2.999232
See below for details on right-associativity: Operator Precedence: Right-Associative Operators. Unary minus appears to contradict BODMAS In Excel: '=-3^2' returns 9. '=0-3^2' returns -9. Note: where ^ means power of. (Note: ** is 'power of' in many programming languages.) Does '=-3^2' contradict BODMAS? It's doing subtraction, then order. The only thing that matters is Excel's operator precedence table, and if Excel is obeying that. Note: Excel may make a distinction between unary minus, e.g. '-3', and binary minus, e.g. '0-3'. The simplest answer may be this: It does not contradict BODMAS. If you see '-b' by itself, not part of 'a-b', then that should be thought of as '(-b)'... and then apply the BODMAS rules as per usual.
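Interestingly, JavaScript sidesteps the Excel ambiguity at the syntax level: combining unary minus directly with ** is a SyntaxError, so the programmer is forced to parenthesize one reading or the other:

```javascript
// const x = -3 ** 2; // SyntaxError in JavaScript: parentheses are required
const excelStyle = (-3) ** 2;  // 9: negate first, then square (Excel's reading)
const bodmasStyle = -(3 ** 2); // -9: square first, then negate
```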
The confusion: how to add parentheses to a right-associative binary operator. E.g. division is left-associative: a/b/c = (a/b)/c. E.g. power is right-associative: a**b**c = a**(b**c). E.g. multiplication is both left-associative and right-associative: a*b*c = (a*b)*c = a*(b*c). E.g. assignment is right-associative: a=b=c = a=(b=c). I.e. a and b will be set to the same value as c. Since almost all operators are left-associative, or both left-associative and right-associative, the key thing to be aware of is which operators are right-associative only. Right-associative operators include: ** (power), and ? : (the ternary operator). E.g. JavaScript:
console.log(2 ** 2.5 ** 3); //50535.16432696741
console.log((2 ** 2.5) ** 3); //181.01933598375612
console.log(2 ** (2.5 ** 3)); //50535.16432696741
How to add parentheses to a right-associative binary operator As mentioned in 'Operator Precedence: BODMAS v. Unary Minus'. An approach to parsing expressions, and adding parentheses, is for each operator, in turn, by its precedence, to go left-to-right or right-to-left, depending on its associativity, and add parentheses.
The confusion: how to add parentheses to the ternary operator. The ternary conditional operator is right-associative. We go right-to-left and add parentheses to the first unaccounted for '?', and the first unaccounted for ':' to its right: For reference, various ternary operator one-liners with parentheses added:
the number of possible ternary operator lines, per operator pair count, matches the Catalan numbers: 1, 1, 2, 5, 14, 42, 132, 429, ...
e.g.:
a
a ? b : c
a ? b : (c ? d : e)
a ? (b ? c : d) : e
a ? b : (c ? d : (e ? f : g))
a ? b : (c ? (d ? e : f) : g)
a ? (b ? c : d) : (e ? f : g)
a ? (b ? c : (d ? e : f)) : g
a ? (b ? (c ? d : e) : f) : g
a ? b : (c ? d : (e ? f : (g ? h : i)))
a ? b : (c ? d : (e ? (f ? g : h) : i))
a ? b : (c ? (d ? e : f) : (g ? h : i))
a ? b : (c ? (d ? e : (f ? g : h)) : i)
a ? b : (c ? (d ? (e ? f : g) : h) : i)
a ? (b ? c : d) : (e ? f : (g ? h : i))
a ? (b ? c : d) : (e ? (f ? g : h) : i)
a ? (b ? c : (d ? e : f)) : (g ? h : i)
a ? (b ? c : (d ? e : (f ? g : h))) : i
a ? (b ? c : (d ? (e ? f : g) : h)) : i
a ? (b ? (c ? d : e) : f) : (g ? h : i)
a ? (b ? (c ? d : e) : (f ? g : h)) : i
a ? (b ? (c ? d : (e ? f : g)) : h) : i
a ? (b ? (c ? (d ? e : f) : g) : h) : i
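A sketch generating those counts, using the standard Catalan recurrence C(n) = C(n-1) * 2 * (2n - 1) / (n + 1), with C(0) = 1:

```javascript
// Catalan numbers: 1, 1, 2, 5, 14, 42, 132, 429, ...
function catalan(count) {
  const result = [1];
  for (let n = 1; n < count; n++) {
    // each step is an exact integer: C(n) = C(n-1) * 2 * (2n - 1) / (n + 1)
    result.push(result[n - 1] * 2 * (2 * n - 1) / (n + 1));
  }
  return result;
}
const cats = catalan(8); // [1, 1, 2, 5, 14, 42, 132, 429]
```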
Some examples of adding parentheses to ternary operator one-liners, step-by-step:
a ? b : c ? d : e
a ? b : (c ? d : e)

a ? b ? c : d : e
a ? (b ? c : d) : e

a ? b : c ? d : e ? f : g
a ? b : c ? d : (e ? f : g)
a ? b : (c ? d : (e ? f : g))

a ? b : c ? d ? e : f : g
a ? b : c ? (d ? e : f) : g
a ? b : (c ? (d ? e : f) : g)

a ? b ? c : d : e ? f : g
a ? b ? c : d : (e ? f : g)
a ? (b ? c : d) : (e ? f : g)

a ? b ? c : d ? e : f : g
a ? b ? c : (d ? e : f) : g
a ? (b ? c : (d ? e : f)) : g

a ? b ? c ? d : e : f : g
a ? b ? (c ? d : e) : f : g
a ? (b ? (c ? d : e) : f) : g
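A quick JavaScript check that the unparenthesized and parenthesized forms agree (the ternary operator is right-associative in JavaScript too):

```javascript
// right-associative: the unparenthesized form groups from the right
const plain = false ? 1 : true ? 2 : 3;    // parsed as: false ? 1 : (true ? 2 : 3)
const parens = false ? 1 : (true ? 2 : 3); // 2
```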
The confusion: no-one ever outlines simply, these 4: C without R, C with R, P without R, P with R. Combinations is 'who's in the photo'. I.e. presence matters, but not order. Permutations is 'who's in the photo, and how are they arranged'. I.e. presence matters, and order matters. Repetitions means 'people can appear more than once'. Below are some examples using elements: a, b, c. Note: '(R)' is a custom notation I invented to denote 'with repetitions'. In column 2, we state the number of combinations/permutations, in column 3, we list them. E.g. '3C2', '3 choose 2', there are 3 elements, how many ways can we choose 2 of them, 'ab' is considered identical to 'ba'. E.g. '3P2', '3 permute 2', there are 3 elements, how many ways can we choose 2 of them, 'ab' is considered different to 'ba'.
3C1 3 a,b,c
3C1(R) 3 a,b,c
3P1 3 a,b,c
3P1(R) 3 a,b,c
3C2 3 ab,ac,bc
3C2(R) 6 aa,ab,ac,bb,bc,cc
3P2 6 ab,ac,ba,bc,ca,cb
3P2(R) 9 aa,ab,ac,ba,bb,bc,ca,cb,cc
3C3 1 abc
3C3(R) 10 aaa,aab,aac,abb,abc,acc,bbb,bbc,bcc,ccc
3P3 6 abc,acb,bac,bca,cab,cba
3P3(R) 27 aaa,aab,aac,aba,abb,abc,aca,acb,acc,baa,bab,bac,bba,bbb,bbc,bca,bcb,bcc,caa,cab,cac,cba,cbb,cbc,cca,ccb,ccc
3C4 0 (none)
3C4(R) 15 aaaa,aaab,aaac,aabb,aabc,aacc,abbb,abbc,abcc,accc,bbbb,bbbc,bbcc,bccc,cccc
3P4 0 (none)
3P4(R) 81 aaaa,aaab,aaac,aaba,aabb,aabc,aaca,aacb,aacc,abaa,abab,abac,abba,abbb,abbc,abca,abcb,abcc,acaa,acab,acac,acba,acbb,acbc,acca,accb,accc,baaa,baab,baac,baba,babb,babc,baca,bacb,bacc,bbaa,bbab,bbac,bbba,bbbb,bbbc,bbca,bbcb,bbcc,bcaa,bcab,bcac,bcba,bcbb,bcbc,bcca,bccb,bccc,caaa,caab,caac,caba,cabb,cabc,caca,cacb,cacc,cbaa,cbab,cbac,cbba,cbbb,cbbc,cbca,cbcb,cbcc,ccaa,ccab,ccac,ccba,ccbb,ccbc,ccca,cccb,cccc
Formulas (mathematical style):
nCk: n! / (k! * (n-k)!)
nCk(R): (k+n-1)! / (k! * (n-1)!)
nPk: n! / (n-k)!
nPk(R): n**k
Formulas (programming style):
nCk: fact(n) / (fact(k)*fact(n-k))
nCk(R): fact(k+n-1) / (fact(k)*fact(n-1))
nPk: fact(n) / fact(n-k)
nPk(R): n**k
Note: can get nCk by taking the items from nPk where the chars are in alphabetical order. Note: can get nCk(R) by taking the items from nPk(R) where the chars are in alphabetical order. Note: can get nPk by taking the items from nPk(R) that don't contain duplicate chars. Note: can get nP[k+1](R) by duplicating nPk(R) for each of a/b/c, prepending one of a/b/c to every item.
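The four formulas from the table, sketched in JavaScript. Note: the k > n guards are additions of mine for the non-repetition cases, where the raw factorial formulas would otherwise misbehave:

```javascript
// fact(n) = n!, then the 4 formulas in programming style:
function fact(n) {
  let r = 1;
  for (let i = 2; i <= n; i++) r *= i;
  return r;
}
const nCk = (n, k) => (k > n ? 0 : fact(n) / (fact(k) * fact(n - k)));
const nCkR = (n, k) => fact(k + n - 1) / (fact(k) * fact(n - 1));
const nPk = (n, k) => (k > n ? 0 : fact(n) / fact(n - k));
const nPkR = (n, k) => n ** k;
// e.g. nCk(3,2) = 3, nCkR(3,2) = 6, nPk(3,2) = 6, nPkR(3,2) = 9
```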
The confusion: ATan2 typically uses the parameter order 'y, x' (not 'x, y'). Note: the (y,x) parameter order matches the common calculation: atan(y/x). ATan versus ATan2 ATan2 can be used to calculate the angle between the x axis (horizontal axis), and (a line joining the origin to) a point. Where the angle returned is in the range: -pi < theta <= pi. Equivalent to the range: -180° < theta <= 180°. An ATan2 formula can be derived by adapting the ATan formula. Given 2 side lengths, one can calculate an angle size: theta = asin(opp/hyp) theta = acos(adj/hyp) theta = atan(opp/adj) Quadrants on a graph are typically labelled like so: 2 1 3 4 Quadrant 1: x>0, y>0. E.g. (1,1) appears in quadrant 1. Quadrant 2: x<0, y>0. E.g. (-1,1) appears in quadrant 2. Quadrant 3: x<0, y<0. E.g. (-1,-1) appears in quadrant 3. Quadrant 4: x>0, y<0. E.g. (1,-1) appears in quadrant 4.
Obtaining ATan2 via ATan:
At the origin: atan2(y,x) = undefined
For (x=0,y>0): atan2(y,x) = 90°
For (x=0,y<0): atan2(y,x) = -90°
For Q1, Q4, (x>0,y=0): atan2(y,x) = atan(y/x)
For Q2, (x<0,y=0): atan2(y,x) = atan(y/x) + 180°
For Q3: atan2(y,x) = atan(y/x) - 180°
Note:
For (x>0,y=0): atan2(y,x) = 0°
For (x<0,y=0): atan2(y,x) = 180°
Note:
90° = pi/2 radians
180° = pi radians
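The case analysis above, sketched as a JavaScript function ('myAtan2' is a hypothetical name; real code should just use the built-in Math.atan2, which takes the same (y, x) parameter order):

```javascript
// atan2 built from atan, in radians, following the quadrant cases:
function myAtan2(y, x) {
  if (x > 0) return Math.atan(y / x);            // Q1, Q4, positive x axis
  if (x === 0 && y > 0) return Math.PI / 2;      // straight up: 90°
  if (x === 0 && y < 0) return -Math.PI / 2;     // straight down: -90°
  if (x === 0 && y === 0) return 0;              // origin (Math.atan2 returns 0 here)
  if (y >= 0) return Math.atan(y / x) + Math.PI; // Q2, negative x axis
  return Math.atan(y / x) - Math.PI;             // Q3
}
```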
The confusion: what values are considered 'true' and 'false'. Note: languages differ regarding what is regarded as truthy. These values are often but not always regarded as falsy: false, 0, 0.0, -0.0, "", null. Another value to consider is NaN (not a number). In many languages '!!a' or '!(!a)' (logical not used twice), and 'a ? true : false' (ternary operator), can be used to determine whether a value is truthy or falsy. In many languages, e.g. Java and C#, where logical not, the ternary operator, and if-statements only accept booleans, it is not actually defined which values are considered truthy or falsy other than true and false. In some languages where arrays have default values, the default values may happen to correspond with falsy values. Note: SQL determines whether a string is truthy or not, by first checking if the start of the string looks numerical or not, and then checking the number for truthiness. Note: Ruby and Crystal regard false and nil as falsy, and everything else as truthy. E.g. 0 and "" are regarded as truthy. Note: The programming language cheat sheets provide code for an approximate 'IsTruthy' function. Mathematics - Programming Language Cheat Sheets - Bazzle
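In JavaScript, '!!a' makes the truthy/falsy rules easy to check directly:

```javascript
// Commonly falsy values in JavaScript:
const falsyResults = [false, 0, -0, NaN, "", null, undefined].map(v => !!v);
// Values that surprise people by being truthy:
const truthyResults = ["0", " ", [], {}].map(v => !!v);
// (In Ruby and Crystal, by contrast, 0 and "" would be truthy.)
```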
The confusion: 'log' works differently in different programming languages. For some languages, the log function uses base e (e=2.7182818284...), the natural logarithm. For some languages, the log function uses base 10. So in languages that have them, it is recommended to use ln (log base e), or log10 (log base 10) for safety. Another issue for log functions, is functions that accept a base, sometimes base is the 1st argument, sometimes the 2nd argument.
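JavaScript illustrates the safe naming: Math.log is base e, and the explicitly named variants remove the ambiguity; an arbitrary base can be had via the change-of-base formula:

```javascript
const ln = Math.log(Math.E);  // 1: Math.log is the natural log (base e)
const l10 = Math.log10(1000); // 3
const l2 = Math.log2(8);      // 3
// change-of-base formula: log_base(x) = ln(x) / ln(base)
const logBase = (x, base) => Math.log(x) / Math.log(base);
```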
The confusion: in regular English, splicing is synonymous with joining. Not so in programming. The confusion: no-one ever explains that array splicing is: delete (0 or more values) at an index, then insert (0 or more values) at that index. In regular English, splicing is synonymous with joining. In programming, array splicing is typically: delete (0 or more values) at an index, then insert (0 or more values) at that index.
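JavaScript's Array.prototype.splice shows the 'delete then insert' definition directly:

```javascript
// splice(start, deleteCount, ...items): delete deleteCount values at start,
// then insert items at that index; returns the deleted values.
const arr = ["a", "b", "c", "d", "e"];
const removed = arr.splice(1, 2, "X", "Y", "Z");
// removed: ["b", "c"]; arr: ["a", "X", "Y", "Z", "d", "e"]
```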
The confusion: in an array, if you copy an element to the desired index, and then remove the original, you may have an off-by-one error (i.e. the new element at the wrong place, or the wrong element removed). E.g. a 1-based array: ["A","B","C","D","E","F","G","H","I"]. 9 elements, the middle element is at index 5. To put B in the middle, would be: oArray.Move(2,5). To put H in the middle, would be: oArray.Move(8,5). One example use case is moving a video in a YouTube playlist. The YouTube API uses an ID to identify which video to move, and a 0-based index to specify the new location. Some example code using AutoHotkey: A possible implementation: remove the original, insert it back again This is a really simple approach.
E.g. put B in the middle:
vValue := oArray.RemoveAt(2)
(RemoveAt reindexes the array so there are no gaps.)
oArray.InsertAt(5, vValue)
E.g. put H in the middle:
vValue := oArray.RemoveAt(8)
(RemoveAt reindexes the array so there are no gaps.)
oArray.InsertAt(5, vValue)
A possible implementation: create a copy, remove the original This approach has potential off-by-one errors.
E.g. put B in the middle:
The source index is before the destination index. I.e. 2 is before 5.
So if we inserted at 5, and then removed at 2, the copied element would end up at 4, not 5.
I.e. ABCDEFGHI to ABCD*B*EFGHI (new B at 5) to ACD*B*EFGHI (new B at 4).
So we insert not at 5, but at 6.
I.e. ABCDEFGHI to ABCDE*B*FGHI (new B at 6) to ACDE*B*FGHI (new B at 5).
oArray.InsertAt(6, oArray[2])
oArray.RemoveAt(2)
(RemoveAt reindexes the array so there are no gaps.)
In general:
if (vKeyNew > vKeyOld)
{
    oArray.InsertAt(vKeyNew+1, oArray[vKeyOld])
    oArray.RemoveAt(vKeyOld)
}
E.g. put H in the middle:
The source index is after the destination index. I.e. 8 is after 5.
So if we inserted at 5, and then removed at 8, we would remove the wrong element. The original element would have moved from position 8 to 9, after inserting the copy at 5.
I.e. ABCDEFGHI to ABCD*H*EFGHI (G at 8, old H at 9) to ABCDHEFHI (G at 8 removed).
So we remove at 9, not at 8.
I.e. ABCDEFGHI to ABCD*H*EFGHI (G at 8, old H at 9) to ABCDHEFGI (H at 9 removed).
oArray.InsertAt(5, oArray[8])
oArray.RemoveAt(9)
(RemoveAt reindexes the array so there are no gaps.)
In general:
if (vKeyNew <= vKeyOld)
{
    oArray.InsertAt(vKeyNew, oArray[vKeyOld])
    oArray.RemoveAt(vKeyOld+1)
}
Note: <= used rather than <. Explained below.
E.g. put E in the middle:
I.e. ABCDEFGHI to ABCD*E*EFGHI (new E at 5, old E at 6) to ABCDEFGHI (old E at 6 removed).
This is equivalent to the 2nd example, hence 'if (vKeyNew <= vKeyOld)', not 'if (vKeyNew < vKeyOld)'.
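The simple approach, sketched in JavaScript (0-based indices; 'moveElement' is a hypothetical helper name); it has no off-by-one hazards because the removal happens before the insertion:

```javascript
// 'remove the original, insert it back again':
function moveElement(arr, from, to) {
  const [value] = arr.splice(from, 1); // remove (splice reindexes, no gaps)
  arr.splice(to, 0, value);            // insert at the desired index
}
const letters = ["A", "B", "C", "D", "E", "F", "G", "H", "I"];
moveElement(letters, 1, 4); // put B in the middle (0-based 4 = 1-based 5)
// letters: ["A", "C", "D", "E", "B", "F", "G", "H", "I"]
```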
The confusion: the logic of object reference counting. The principle of reference counting is essentially: when you start using an object, increment the reference count, when you don't need to use the object any more, decrement the reference count. Thus when the object is no longer in use, the reference count can be checked, and the object discarded.
The confusion: when backslash does/doesn't need escaping. In the vast majority of programming languages, '\' is used as an escape character. E.g. '\\' for a literal backslash, e.g. '\t' for tab, '\r' for carriage return, '\n' for linefeed. E.g. Windows filenames use '\', so will need '\\'. In RegEx, '\' is used for escaping characters. So if storing a RegEx needle in a string, in a language that escapes '\', you may need to double escape. E.g. a literal string 'a\b', in a RegEx needle, would be 'a\\\\b'. WARNING: Some RegEx escape sequences are also built into programming languages. So '\t' for tab. RegEx receives a tab. And '\\t' for tab. RegEx receives a literal backslash and a t, which it interprets as a tab. These would both work in a string passed to RegEx. Note, in some languages, a special RegEx syntax is available, and '\' can be used, where '\\' would be needed in a string. MAJOR WARNING: Java replaces escaped characters, including in comments, before parsing the source code. E.g. myvar = \u0022hello\u0022 resolves to myvar = "hello". E.g. myvar = "a\u005C\u0022b" resolves to myvar = "a\"b". E.g. myvar = "a\u0022b" resolves to myvar = "a"b", which is invalid.
The confusion: 'is a string upper case', 2 definitions. 'IsUpper' commonly returns true if each char in a string is an upper-case letter. E.g. AutoHotkey: vIsMatch := RegExMatch(vText, "^[A-Z]*$") A different definition returns true if 'ToUpperCase' applied to a string leaves the string unmodified. E.g. AutoHotkey: vIsMatch := (vText == StrUpper(vText))
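The two definitions, sketched in JavaScript; they agree on all-letter strings but disagree on strings containing non-letters:

```javascript
// Definition 1: every char is an upper-case letter (A-Z only):
const isUpperStrict = s => /^[A-Z]*$/.test(s);
// Definition 2: applying toUpperCase leaves the string unmodified:
const isUpperLoose = s => s === s.toUpperCase();

// isUpperStrict("ABC1") is false; isUpperLoose("ABC1") is true
```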
The confusion: 'whole', 2 definitions. 1) The entire haystack matches a RegEx needle. E.g. Swift's wholeMatch. 2) The haystack contains a literal needle, but it must be a whole word match, not a substring of a word. E.g. WordPad's Find dialog has 'Match whole word only'. E.g. a RegEx search with '\b' on both sides of a needle. E.g. 'abcde fghij klmno', 'abcde'/'fghij'/'klmno' are whole word matches, 'fgh'/'ghi'/'hij' are matches but not whole word matches. A simple definition would be: look for a literal needle string, if the string is found, check if the chars to the left and right of the potential match are non-letters, if both are non-letters, it is a whole word match. I.e. a search for '[NOT LETTER]needle[NOT LETTER]'.
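The '\b' approach, sketched in JavaScript. Note: '\b' treats digits and underscore as word characters, so this is only an approximation of the '[NOT LETTER]needle[NOT LETTER]' definition; the needle is also assumed to contain no RegEx metacharacters:

```javascript
const hay = "abcde fghij klmno";
// '\\b' in a string literal reaches the RegExp engine as '\b' (word boundary):
const isWholeWordMatch = needle => new RegExp("\\b" + needle + "\\b").test(hay);
// isWholeWordMatch("fghij") is true; isWholeWordMatch("fgh") is false
const isSubstringMatch = hay.includes("fgh"); // true: a match, but not whole-word
```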
The confusion: which century years are leap years. Century examples: Are leap: 1600, 2000, 2400. Aren't leap: 1700, 1800, 1900, 2100, 2200, 2300. To be a leap year, it must pass '4Y', and pass either '100N' or '400Y'. Where 4Y: means is divisible by 4. Where 100N: means is not divisible by 100. Where 400Y: means is divisible by 400. E.g. 1900 fails '100N', and fails '400Y', so is not leap. E.g. 2000 fails '100N', but passes '400Y', so is leap. E.g. 2004 passes '100N', so is leap.
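The rule as a function; a minimal sketch in JavaScript:

```javascript
// leap year: passes '4Y', and passes either '100N' or '400Y':
function isLeapYear(y) {
  return y % 4 === 0 && (y % 100 !== 0 || y % 400 === 0);
}
// isLeapYear(2000) is true; isLeapYear(1900) is false; isLeapYear(2004) is true
```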
The confusion: different date systems e.g. UK/US. The UK uses dd/MM/yyyy. Small to big. The US uses MM/dd/yyyy. 2nd biggest, then 3rd biggest, then biggest. It's recommended to use yyyy-MM-dd, big to small, to be understood internationally. Note: HH:mm:ss uses big to small also. Note: yyyy-MM-dd HH:mm:ss, where everything is big to small, sorts correctly using an alphabetical string sort.
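A quick demonstration that big-to-small datestamps sort chronologically under a plain string sort:

```javascript
const stamps = [
  "2024-01-02 09:00:00",
  "2023-12-31 23:59:59",
  "2024-01-02 08:30:00",
];
const sorted = [...stamps].sort(); // alphabetical string sort, yet chronological
// dd/MM/yyyy or MM/dd/yyyy strings would NOT sort chronologically this way
```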
The confusion: when you 'lose an hour', what happens to that hour. E.g. in the UK and the US, 'spring forward, fall back' apply. In the spring, you lose an hour. E.g. in the UK, 0am (GMT) happens, then 2am (BST), 1am is skipped. In the fall (autumn), you gain an hour. E.g. in the UK, 1am (BST) happens, then another 1am (GMT) happens. One way of thinking about it, is two lines representing two different time zones, and you jump from one to the other.
The confusion: different approaches to adding/subtracting 'months' to a datestamp. If adding years: A simple approach is to simply add or subtract years. If the date is invalid, i.e. the day is 29/30/31, too many for the month, then subtract days until the date is valid. If adding months: If the number to add/subtract has an absolute value of 12 or more, then add/subtract years first. Then if the number of months would overflow into the next/previous year, then add/subtract a year, and adjust the month count. Add/subtract the remaining months. If the date is invalid, i.e. the day is 29/30/31, too many for the month, then subtract days until the date is valid. I.e. at most you subtract 3 days from the date, after having added/subtracted months and years. This algorithm is equivalent to the following:
● Microsoft Excel's EDATE function.
● Microsoft Excel VBA's DateAdd function (with 'm'/'yyyy' options).
● Microsoft Windows's SysDateTimePick32 controls.
This algorithm differs from:
● JavaScript's setUTCMonth/setMonth methods.
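A sketch of the algorithm in JavaScript, using plain {y, m, d} objects (1-based months) rather than Date, since JavaScript's setMonth follows the overflow behaviour instead; clamping the day is equivalent to 'subtract days until the date is valid':

```javascript
function daysInMonth(y, m) {
  const days = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31];
  const leap = y % 4 === 0 && (y % 100 !== 0 || y % 400 === 0);
  return m === 2 && leap ? 29 : days[m - 1];
}
function addMonths({ y, m, d }, n) {
  const total = (m - 1) + n;              // months since year start, may be negative
  const yy = y + Math.floor(total / 12);  // carry whole years
  const mm = ((total % 12) + 12) % 12 + 1;
  return { y: yy, m: mm, d: Math.min(d, daysInMonth(yy, mm)) }; // clamp invalid days
}
// addMonths({y: 2023, m: 1, d: 31}, 1) gives {y: 2023, m: 2, d: 28}
```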
The confusion: different approaches to counting the 'months' between 2 datestamps. A good approach to counting years, is to start off by subtracting the earlier year from the later year. Then check smaller date units as a tiebreaker. If the later date falls earlier in the year, then subtract 1 from the year count. Similar logic applies when counting months.
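A sketch of the year-counting logic with the tiebreaker, in JavaScript ('yearsBetween' is a hypothetical helper; similar logic extends to months):

```javascript
// whole years from a to b (a is the earlier date), as {y, m, d} objects:
function yearsBetween(a, b) {
  let years = b.y - a.y;
  // tiebreaker: if the later date falls earlier in the year, a full year
  // has not yet elapsed, so subtract 1 (like an age/birthday calculation):
  if (b.m < a.m || (b.m === a.m && b.d < a.d)) years--;
  return years;
}
```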
The confusion: which way round, e.g. whether to use + or -. The confusion: given the time in UTC, what is the local time. The confusion: given the local time, what is the UTC time. Consider New Year's Day. Different time zones move to New Year's Day, one by one. E.g. Sydney, then London, then New York. So Sydney is ahead of UTC: UTC+11:00 (+660 min, AEDT, DST) on Jan 1st, UTC+10:00 (+600 min, AEST, non-DST) on Jul 1st. So New York is behind UTC: UTC-05:00 (-300 min, EST, non-DST) on Jan 1st, UTC-04:00 (-240 min, EDT, DST) on Jul 1st. I.e. the time zones that experience New Year's Day earliest, have the largest positive offsets. Compared to UTC, Sydney is 'in the future', and New York is 'in the past'. The UK uses UTC+00:00 (+0 min, GMT, non-DST), and UTC+01:00 (+60 min, BST, DST). When London goes to BST, in March, DST starts, it is in the UTC+1 time zone. Which means, compared to UTC, it is 'in the future'. So during BST, if you have a UTC time, the UK time will be 1 hour greater, because the UK is 'in the future'. So during BST, if you have a UK time, the UTC time will be 1 hour lesser, because UTC is 'in the past'. E.g. 2006-05-04 03:02:01+00:00 UTC, is 04:02:01 in the UK. E.g. 2006-05-04 03:02:01+01:00 BST, is 02:02:01 UTC.
The confusion: multiple join types, some joins have multiple names. The confusion: which joins match which common set methods (i.e. intersection, union, difference, symmetric difference). The confusion: there is no simple join for difference or symmetric difference. If we consider a Venn diagram, with sets A and B, and intersection I:
AIB [where A = items unique to A, I = intersection (items present in both), B = items unique to B]
000 null set [empty set]
001 B difference A [unique to B]
010 A intersection B [present in both] [INNER JOIN] [JOIN]
011 B [RIGHT JOIN] [RIGHT OUTER JOIN]
100 A difference B [unique to A]
101 A symmetric difference B [unique to A, or unique to B, not present in both] [union subtract intersection]
110 A [LEFT JOIN] [LEFT OUTER JOIN]
111 A union B [FULL OUTER JOIN] [FULL JOIN]
Keywords Arguably, the 4 main JOIN keywords to know are: INNER: i.e. INNER JOIN [aka JOIN] [info from items that appear in both tables] LEFT: i.e. LEFT JOIN [aka LEFT OUTER JOIN] [left table plus added columns from right table] RIGHT: i.e. RIGHT JOIN [aka RIGHT OUTER JOIN] [right table plus added columns from left table] FULL: i.e. FULL OUTER JOIN [aka FULL JOIN] [info from items that appear in at least one table] SQLite/MySQL/PostgreSQL: Note: there is no such thing as 'OUTER JOIN', attempting to use it would trigger an error. SQLite/PostgreSQL: Note: 'FULL OUTER JOIN' and 'FULL JOIN' are equivalent, but the former appears more frequently in texts. Clauses SQLite/MySQL/PostgreSQL: These joins are typically performed with clauses. E.g. USING, ON, NATURAL. Note: 'USING (name)' and 'ON MyTable1.name = MyTable2.name' are equivalent. SQLite/MySQL/PostgreSQL: Cartesian products: if table 1 has n rows, and table 2 has m rows, the Cartesian product will have n x m rows. SQLite: If no clauses are used, then a Cartesian product is performed. Further keywords SQLite/MySQL/PostgreSQL: CROSS JOIN creates a Cartesian product. SQLite/MySQL/PostgreSQL: For creating unions, UNION, and UNION ALL are also available. UNION removes any duplicates. (It deduplicates.) UNION ALL keeps any duplicates. (It doesn't do a deduplicate.) SQLite/MySQL/PostgreSQL: A difference can be obtained via LEFT JOIN and ON/WHERE. SQLite/PostgreSQL: A symmetric difference can be obtained via a FULL OUTER JOIN and ON/WHERE. Links (SQLite): SQLite Syntax: join-operator SELECT Links (SQLite/MySQL/PostgreSQL prettify options): Objects - Programming Language Cheat Sheets - Bazzle Demo code Some demo code for JOINs:
-- ================================================== -- SQLite(/MySQL/PostgreSQL) demo code for JOINs: -- note: requires SQLite v3.39+ for RIGHT JOIN and FULL OUTER JOIN: -- note: see 'fix for MySQL' comments to run this in MySQL: -- note: see 'fix for PostgreSQL' comment to run this in PostgreSQL: -- ================================================== -- some notes on SQLite printing settings: -- note: .headers on: show header rows: -- note: (when header rows are visible:) SQLite prints values, e.g. ints/strings, as though they were a one-cell table, where the header is equal to the value: -- note: SQLite prints true/false/null as 1/0/(blank) respectively: -- note: .nullvalue NULL: sets null to be printed as 'NULL': -- note: mimic MySQL: .mode table, .nullvalue NULL [e.g. differences: spaces per column, left/centre/right column alignment, column names for values]: -- note: mimic PostgreSQL: .mode column [e.g. differences: left/centre/right column alignment, column names for values, no pipes between columns, no blank lines between items, true/false values]: -- ================================================== -- some notes on MySQL printing settings: -- note: AFAIK there is no way to change printing (prettify) options in-script: -- note: MySQL prints values, e.g. ints/strings, as though they were a one-cell table, where the header is equal to the value: -- note: MySQL prints true/false/null as 1/0/NULL respectively: -- ================================================== -- some notes on PostgreSQL printing settings: -- note: \set QUIET 1: removes notifications e.g. CREATE/INSERT: -- note: \pset tuples_only: removes header rows and footers: -- note: \pset footer 0: removes footers (row counts e.g. '(3 rows)'): -- note: \a: removes spaces, aka alignment, in tables (but also removes blank lines): -- note: AFAIK there is no way to remove header rows, but keep footers (row counts): -- note: PostgreSQL prints values, e.g. 
ints/strings, as though they were a one-cell table, where the header is equal to '?column?': -- note: PostgreSQL prints true/false/null as t/f/(blank) respectively: -- note: \pset null NULL: sets null to be printed as 'NULL': -- note: mimic MySQL: \set QUIET 1, \pset footer 0, \pset border 2, \pset null NULL [e.g. differences: spaces per column, left/centre/right column alignment, column names for values, has blank lines between items, true/false values]: -- note: mimic SQLite: \set QUIET 1, \pset tuples_only, \pset border 0, \a [e.g. differences: true/false values]: -- ================================================== SELECT '=================================================='; SELECT ''; SELECT 'RAW TABLES:'; SELECT ''; CREATE TABLE MyTable1 (name TEXT, number INTEGER); CREATE TABLE MyTable2 (name TEXT, colour TEXT); INSERT INTO MyTable1 VALUES ('Edward', 2), ('Gordon', 4), ('Percy', 6), ('Duck', 8), ('Douglas', 10); -- even-numbered engines (from Thomas the Tank Engine) INSERT INTO MyTable2 VALUES ('Thomas', 'blue'), ('Edward', 'blue'), ('Gordon', 'blue'); -- blue engines SELECT * FROM MyTable1; SELECT ''; SELECT * FROM MyTable2; SELECT ''; SELECT '=================================================='; SELECT ''; SELECT 'INNER/LEFT/RIGHT/FULL JOINS, USING (name):'; SELECT ''; SELECT 'I'; SELECT * FROM MyTable1 INNER JOIN MyTable2 USING (name); SELECT ''; -- same as JOIN [show info for items in intersection] SELECT 'L'; SELECT * FROM MyTable1 LEFT JOIN MyTable2 USING (name); SELECT ''; -- same as LEFT OUTER JOIN [show info for items in A] -- WARNING: RIGHT JOIN: MySQL lists 'name colour number' (SQLite/PostgreSQL list 'name number colour'): SELECT 'R'; SELECT * FROM MyTable1 RIGHT JOIN MyTable2 USING (name); SELECT ''; -- same as RIGHT OUTER JOIN [show info for items in B] -- fix for MySQL: comment out line below (MySQL lacks FULL OUTER JOIN): SELECT 'F'; SELECT * FROM MyTable1 FULL OUTER JOIN MyTable2 USING (name); SELECT ''; -- same as FULL JOIN [show info for 
items in A or B or both] -- FULL OUTER JOIN alternative, if FULL OUTER JOIN syntax not available: -- MAJOR WARNING: RIGHT JOIN: MySQL lists 'name colour number', this results in values in incorrect columns when applying UNION ALL, specify 'SELECT name,number,colour' to avoid this (SQLite/PostgreSQL list 'name number colour'): SELECT 'F alt'; SELECT * FROM MyTable1 LEFT JOIN MyTable2 USING (name) UNION ALL SELECT * FROM MyTable1 RIGHT JOIN MyTable2 USING (name) WHERE MyTable1.name IS NULL; SELECT ''; SELECT '=================================================='; SELECT ''; SELECT 'DIFFS/SYM DIFFS:'; SELECT ''; SELECT 'A - B'; SELECT * FROM MyTable1 t1 LEFT JOIN MyTable2 t2 ON t1.name = t2.name WHERE t2.name IS NULL; SELECT ''; -- A diff B [difference] SELECT 'B - A'; SELECT * FROM MyTable2 t2 LEFT JOIN MyTable1 t1 ON t1.name = t2.name WHERE t1.name IS NULL; SELECT ''; -- B diff A [difference] -- fix for MySQL: comment out line below (MySQL lacks FULL OUTER JOIN): SELECT 'A sym diff B'; SELECT * FROM MyTable1 t1 FULL OUTER JOIN MyTable2 t2 ON t1.name = t2.name WHERE t1.name IS NULL OR t2.name IS NULL; SELECT ''; -- A sym diff B [symmetric difference] -- sym diff alternative, if FULL OUTER JOIN syntax not available: -- MAJOR WARNING: RIGHT JOIN: MySQL lists 'name colour number', this results in values in incorrect columns when applying UNION ALL, specify 'SELECT name,number,colour' to avoid this (SQLite/PostgreSQL list 'name number colour'): SELECT 'A sym diff B alt'; SELECT * FROM MyTable1 LEFT JOIN MyTable2 USING (name) WHERE MyTable2.name IS NULL UNION ALL SELECT * FROM MyTable1 RIGHT JOIN MyTable2 USING (name) WHERE MyTable1.name IS NULL; SELECT ''; SELECT '=================================================='; SELECT ''; SELECT 'UNION/UNION ALL:'; SELECT ''; SELECT 'U'; SELECT * FROM MyTable1 UNION SELECT * FROM MyTable2; SELECT ''; SELECT 'UA'; SELECT * FROM MyTable2 UNION ALL SELECT * FROM MyTable1; SELECT ''; -- fix for PostgreSQL: comment out 2 lines above, 
uncomment 2 lines below: -- SELECT 'U'; SELECT name,cast(number as TEXT) FROM MyTable1 UNION SELECT * FROM MyTable2; SELECT ''; -- SELECT 'UA'; SELECT * FROM MyTable2 UNION ALL SELECT name,cast(number as TEXT) FROM MyTable1; SELECT ''; SELECT '=================================================='; SELECT ''; SELECT 'CROSS JOINS:'; SELECT ''; SELECT 'C'; SELECT * FROM MyTable1 CROSS JOIN MyTable2; SELECT ''; -- Cartesian cross product SELECT 'C'; SELECT * FROM MyTable2 CROSS JOIN MyTable1; SELECT ''; -- Cartesian cross product SELECT '=================================================='; SELECT ''; SELECT 'THE END'; SELECT ''; SELECT '=================================================='; SELECT ''; -- ==================================================
Some demo code for FULL OUTER JOIN via LEFT/RIGHT JOIN:
-- ================================================== -- SQLite(/MySQL/PostgreSQL) demo code for FULL OUTER JOIN via LEFT/RIGHT JOIN: -- note: requires SQLite v3.39+ for RIGHT JOIN and FULL OUTER JOIN: -- note: see 'fix for MySQL' comments to run this in MySQL: -- ================================================== CREATE TABLE MyTable1 (v TEXT); INSERT INTO MyTable1 VALUES ('a'); INSERT INTO MyTable1 VALUES ('b'); INSERT INTO MyTable1 VALUES ('c'); INSERT INTO MyTable1 VALUES ('d'); INSERT INTO MyTable1 VALUES (NULL); INSERT INTO MyTable1 VALUES (NULL); CREATE TABLE MyTable2 (v TEXT); INSERT INTO MyTable2 VALUES ('a'); INSERT INTO MyTable2 VALUES ('b'); INSERT INTO MyTable2 VALUES ('e'); INSERT INTO MyTable2 VALUES ('f'); INSERT INTO MyTable2 VALUES (NULL); INSERT INTO MyTable2 VALUES (NULL); -- fix for MySQL: comment out line below (MySQL lacks FULL OUTER JOIN): SELECT t1.v,t2.v FROM MyTable1 t1 FULL OUTER JOIN MyTable2 t2 ON t1.v = t2.v; SELECT t1.v,t2.v FROM MyTable1 t1 LEFT JOIN MyTable2 t2 ON t1.v = t2.v UNION ALL SELECT t1.v,t2.v FROM MyTable1 t1 RIGHT JOIN MyTable2 t2 ON t1.v = t2.v WHERE t1.v IS NULL; -- ==================================================