Marius Bancila [Bancila 2018](Chapter 8, Problem 69) describes a checksum computation problem which can be paraphrased as follows:
Let X = x_{1}x_{2}…x_{N}
where x_{i} is a decimal digit.
X is a valid number if
(Σ_{i=1}^{N} (N-i+1)*x_{i}) mod 10 = 0
A function to compute the checksum is shown below in C++ syntax
int checksum(const string& number)
{
  auto index = number.length();
  auto sum = 0;
  auto K = -1;
  for (auto c : number) {
    ++K;
    sum += index * (c - '0');
    --index;
  }
  return sum % 10;
}
Loop invariant: sum = Σ_{i=1}^{K+1} (N-i+1)*x_{i} and index = N-K-1.
Here K is the index of the loop and N is the length of the string. Notice that K is redundant and is included only for the purpose of the proof.
However one could avoid the multiplication by using the following routine:
int my_checksum(const string& number)
{
  auto s0 = 0;
  auto sum = 0;
  auto K = 0;
  for (auto c : number) {
    s0 += (c - '0');
    sum += s0;
    ++K;
  }
  return sum % 10;
}
Loop invariant: s0 = Σ_{i=1}^{K} x_{i} and sum = Σ_{i=1}^{K} (K-i+1)*x_{i}.
Once again N is the length of the string.
In case the correctness of the solution is not obvious, notice that the sum can be computed as follows:
x_{1} +
x_{1} + x_{2} +
...
x_{1} + x_{2} + ... + x_{N}
Notice that x_{1} is added N times, x_{2} is added N-1 times, and so on. Thus in my_checksum, s0 computes a single row while sum accumulates all the rows computed so far.
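To check that the two routines agree, here is a self-contained sketch of both (the variable K is dropped since it serves only the proof):

```cpp
#include <string>
using std::string;

// Multiplicative version: digit i (1-based) is weighted by N-i+1.
int checksum(const string& number) {
    int sum = 0;
    auto index = number.length();
    for (auto c : number) {
        sum += static_cast<int>(index) * (c - '0');
        --index;
    }
    return sum % 10;
}

// Addition-only version: s0 holds the current row of the triangle,
// sum accumulates all rows seen so far.
int my_checksum(const string& number) {
    int s0 = 0, sum = 0;
    for (auto c : number) {
        s0 += (c - '0');
        sum += s0;
    }
    return sum % 10;
}
```

For "12345" both return 5: the weighted sum is 5*1 + 4*2 + 3*3 + 2*4 + 1*5 = 35.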
Reference
[Bancila 2018] Marius Bancila “The Modern C++ Challenge” Packt Publishing Birmingham-Mumbai 2018.
The benefits of SQL-like declarative syntax for imperative programming languages like C# are [well documented](www.tutorialsteacher.com/linq/why.linq). Linq, as it is known in C#, has been implemented by more than one team in C++.
There are at least four implementations of Linq for C++:
Codeplex in turn lists other attempts:
Here I consider Berrysoft, mostly because the library implementation was easy to understand.
Consider the following code:
read_lines(cin)
  >> where([](string const& str) { return str.find("include") != string::npos; })
  >> write_lines(cout);
The code filters out all lines not containing the string “include”. Grep or Awk can do the same thing in one line of code. But the advantage of this code is that the data can be shaped as shown below:
struct trade
{
  string symbol;
  bool is_buy;
  date purchase_date;
  double cost;
};

auto trades{
  read_lines(cin)
  >> select([](string const& str) {
       auto e{str >> split(',') >> to_vector()};
       trade t;
       t.symbol = e[5];
       t.is_buy = e[4] == "B";
       istringstream{e[3]} >> parse("%d/%m/%Y", t.purchase_date);
       t.cost = strtod(e[8].c_str(), nullptr);
       return t;
     })
  >> to_vector()};
At this point we have all the trades stored in a container, and from here on we get the full power of C++. As an aside, [Howard Hinnant’s](https://github.com/HowardHinnant/date) date library is used in the above code.
Abstract
In considering how to print a multiway tree in hierarchical fashion, this blog shows that there are many ways to represent a multiway tree in a data structure. A node with a pointer to a list of child nodes is not always the best.
Source Code: https://github.com/theSundayProgrammer/heirarchy_CPP
Problem
Consider a table of the following form:
________________________
|parent_id|item_id|info|
________________________
where item_id is unique to every row and parent_id matches the item_id of some row, except for a special row called the root, whose parent_id is a special value, say zero, that is not present as any item_id. It is required to print this table in a hierarchical fashion, for example:
Root
  Child1
    Subchild1
    Subchild2
    ...
    SubchildN
  Child2
    Subchild21
    Subchild22
The first solution is to convert the given linear data into a multiway tree and traverse the data in pre-order fashion. This can be done by picking the root node and traversing the data to pick all its immediate children, and then recursively the children of the children, using a breadth-first or depth-first algorithm. Another way to do it would be to use a map (known as the intermediate map henceforth) from the parent_id to an array of children. This can then be used to generate a multiway tree.
using content = std::string;

struct raw_child
{
  int id;
  content name;
  int parent;
};

struct Data
{
  int id;
  content name;
};

struct Child
{
  Child(content const& name_) : name(name_) {}
  content name;
  std::vector<Child> children;
};

std::map<int, std::vector<Data>> list_children;

void generate_map(raw_child const& child)
{
  list_children[child.parent].push_back({child.id, child.name});
}

Data& get_root(int k)
{
  return list_children[k][0];
}

Child generate_tree(Data const& root)
{
  auto item = list_children.find(root.id);
  Child parent(root.name);
  if (item != list_children.end()) {
    auto& subs = item->second;
    for (auto& ch : subs)
      parent.children.push_back(generate_tree(ch));
  }
  return parent;
}
Another way to represent the intermediate map is using a multi-map as in C++ multimap or Ocaml Hashtbl.
Notice that since the sole purpose of the tree is to print the data in a hierarchical fashion, it is possible to print the data using the same traversal as in generating the tree. Hence there is no need to generate the tree at all.
void print_tree(Data const& root, int tab_count)
{
  for (int i = 0; i < tab_count; ++i)
    std::cout << "  ";
  std::cout << root.name << std::endl;
  auto item = list_children.find(root.id);
  if (item != list_children.end()) {
    auto& subs = item->second;
    for (auto& sub : subs)
      print_tree(sub, tab_count + 1);
  }
}
Taking a cue from Chartier as quoted above, I considered other ways to address the main problem, which is to print the array data in a hierarchical fashion. The brute-force solution would be to start with the root and then traverse the array to print all its children and all their children recursively. Notice from the code below that if n is the number of rows, the time complexity is O(n^{2}) in terms of comparisons, because each non-leaf item (one which is the parent of at least one other item) takes n comparisons to find all its immediate children, and each leaf node (one that is not a parent of any other) takes n comparisons to determine that it is a leaf.
void print_tree(raw_child const& root, int tab_count)
{
  for (int i = 0; i < tab_count; ++i)
    std::cout << "  ";
  std::cout << root.name << std::endl;
  auto child = std::find_if(raw_children.begin(), raw_children.end(),
    [&](raw_child const& r) { return r.parent == root.id; });
  while (child != raw_children.end()) {
    print_tree(*child, tab_count + 1);
    child = std::find_if(child + 1, raw_children.end(),
      [&](raw_child const& r) { return r.parent == root.id; });
  }
}
To reduce the time complexity we could sort the array on parent_id. Each lookup of a node's children then takes at most O(log(n)) comparisons using binary search. Including the sort, the whole procedure takes O(n.log(n)). In other words, we have used a sorted array to represent a multiway tree, as shown below.
void print_tree(raw_child const& root, int tab_count)
{
  for (int i = 0; i < tab_count; ++i)
    std::cout << "  ";
  std::cout << root.name << std::endl;
  auto child = std::lower_bound(std::begin(raw_children), std::end(raw_children), root,
    [](raw_child const& a, raw_child const& val) { return a.parent < val.id; });
  while (child != std::end(raw_children) && child->parent == root.id) {
    print_tree(*child, tab_count + 1);
    ++child;
  }
}
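To see the sorted-array representation at work, here is a self-contained sketch; the output-stream parameter and the sample rows are mine, added so the routine can be exercised easily:

```cpp
#include <algorithm>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

using content = std::string;  // assumed alias for the info column
struct raw_child { int id; content name; int parent; };

std::vector<raw_child> raw_children;

// Precondition: raw_children is sorted on parent_id.
void print_tree(std::ostream& os, raw_child const& root, int tab_count) {
    for (int i = 0; i < tab_count; ++i) os << "  ";
    os << root.name << '\n';
    // Binary search for the first row whose parent_id equals root.id.
    auto child = std::lower_bound(
        raw_children.begin(), raw_children.end(), root,
        [](raw_child const& a, raw_child const& val) { return a.parent < val.id; });
    while (child != raw_children.end() && child->parent == root.id) {
        print_tree(os, *child, tab_count + 1);
        ++child;
    }
}
```

Sorting a few sample rows on parent_id and calling print_tree on the row with parent_id zero prints the names at increasing indentation.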
Conclusion
There is more than one way to accomplish a programming task. Thinking of different ways helps in improving an existing solution.
Write a function that prints on the console all the possible permutations of a given string.
and provides a recursive version of the solution as follows:
void next_permutation(std::string str, std::string perm)
{
  if (str.empty())
    std::cout << perm << std::endl;
  else {
    for (size_t i = 0; i < str.size(); ++i) {
      next_permutation(str.substr(1), perm + str[0]);
      std::rotate(std::begin(str), std::begin(str) + 1, std::end(str));
    }
  }
}

void print_permutations_recursive(std::string str)
{
  next_permutation(str, "");
}
A better solution would be as follows
void next_permutation(std::string& str, size_t n)
{
  if (n == 1)
    std::cout << str << std::endl;
  else {
    for (size_t i = 0; i < n; ++i) {
      next_permutation(str, n - 1);
      std::rotate(std::end(str) - n, std::end(str) - n + 1, std::end(str));
    }
  }
}

void print_permutations_recursive(std::string str)
{
  if (!str.empty())
    next_permutation(str, str.length());
}
This avoids copying strings across recursive calls. More importantly, the second parameter n has a meaning: it is the length of the suffix that needs to be permuted. The other thing to notice is that rotate is O(n), which can be reduced to O(1) by swapping, as in:
for (size_t i = 0; i < n; ++i) {
  std::swap(*(str.end() - n), *(str.end() - n + i));
  next_permutation(str, n - 1);
  std::swap(*(str.end() - n), *(str.end() - n + i));
}
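Putting the swap variant together: a self-contained sketch that collects the permutations in a vector instead of printing (the collecting parameter is my addition, for ease of testing). Each swap is undone after the recursive call, so the string is unchanged when the loop advances.

```cpp
#include <string>
#include <utility>
#include <vector>

// Permutes the last n characters of str in place, recording every
// full permutation; str is restored before each loop step advances.
void next_permutation_swap(std::string& str, size_t n, std::vector<std::string>& out) {
    if (n == 1) { out.push_back(str); return; }
    for (size_t i = 0; i < n; ++i) {
        std::swap(*(str.end() - n), *(str.end() - n + i));
        next_permutation_swap(str, n - 1, out);
        std::swap(*(str.end() - n), *(str.end() - n + i));
    }
}

std::vector<std::string> permutations(std::string str) {
    std::vector<std::string> out;
    if (!str.empty()) next_permutation_swap(str, str.length(), out);
    return out;
}
```

For example, permutations("abc") yields the six strings abc, acb, bac, bca, cba, cab, in that order.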
Reference
[Bancila 2018] Marius Bancila "The Modern C++ Challenge" Packt Publishing Birmingham-Mumbai 2018.
So why would one create a text file and compile it rather than use Word?
Just like Word, there are templates for most common purposes such as resumes, invoices, APA-style academic documents, etc.
There are other options besides LaTeX which are also free.
Given two positive integers A and B, compute the quotient and remainder without using multiplication. This is an old problem, except that in optimising the algorithm we derive the long division method taught in primary school.
Problem: Given A and B, where A > 0 and B > 0, compute q and r such that
A = B*q + r
where q ≥ 0 and 0 ≤ r < B, using only addition and subtraction. This is a fairly simple algorithm, as shown below:
def divisor(A, B):
    q = 0
    r = A
    # loop invariant: A = q*B + r
    while r >= B:
        q, r = q + 1, r - B
    return q, r
The time complexity is O(A/B). If A is, say, one million and B is 3, then the loop will be executed 333,333 times. How can we reduce the number of iterations? We could cut it in half by deducting B twice at a time, as shown below:
def divisor(A, B):
    q = 0
    r = A
    # loop invariant: A = q*B + r
    twiceB = B + B
    while r >= B:
        if r >= twiceB:
            q, r = q + 2, r - twiceB
        else:
            q, r = q + 1, r - B
    return q, r
Now, why stop at deducting twice? Why not four times, or eight, or more? If we allow multiplication and division by the radix, then we can use the following procedure. Multiplying or dividing a number by the radix is trivial because it means moving the digits of the number left or right.
def divisor(A, B):
    q, r = 0, A
    q0, r0 = 1, B
    radix = 2
    # loop invariant: r0 = B * q0
    while r0 * radix <= A:
        q0, r0 = radix * q0, radix * r0
    # loop invariant: A = q*B + r
    while r >= B:
        # loop invariant: A = q*B + r
        while r0 <= r:
            q, r = q + q0, r - r0
        q0, r0 = q0 // radix, r0 // radix
    return q, r
Notice that the above algorithm works even if the radix is ten, or in other words if all numbers are represented in decimal. The first while loop pushes B to the left until shifting it once more would exceed A; q0 keeps track of how far B has been shifted left. Notice that if the radix were two, the innermost while loop would execute at most once, i.e. it acts as a conditional statement. To prove that the loop invariant holds, note that
r0 = B * q0
Hence
(q + q0)*B + (r - r0) = q*B + r + (q0*B - r0) = q*B + r = A (QED)
Thus dear reader you now have the algorithm that was taught to you in primary school in Python-esque code.
Time Complexity
The outer while loop computes the most significant digit (or bit) of the quotient at the first iteration, the next most significant at the next iteration, and so on. Hence the number of iterations can be at most log(A/B). The innermost loop is executed at most radix times per iteration. Hence, for a fixed radix, the time complexity is O(log(A/B)).
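For radix two, multiplication and division by the radix are single-bit shifts. A C++ sketch of the same procedure (names are mine):

```cpp
#include <cstdint>
#include <utility>

// Restoring division using only shifts, additions and subtractions.
// Requires B > 0. Returns {quotient, remainder}.
std::pair<uint64_t, uint64_t> divide(uint64_t A, uint64_t B) {
    uint64_t q = 0, r = A;
    uint64_t q0 = 1, r0 = B;   // invariant: r0 == B * q0
    while ((r0 << 1) <= A) {   // shift B left while it still fits in A
        q0 <<= 1;
        r0 <<= 1;
    }
    while (r >= B) {           // invariant: A == q*B + r
        if (r0 <= r) {         // radix 2: the inner loop runs at most once
            q += q0;
            r -= r0;
        }
        q0 >>= 1;
        r0 >>= 1;
    }
    return {q, r};
}
```

With A one million and B equal to 3, the loop runs about 19 times rather than 333,333.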
Alexander Stepanov and Paul McJones, in their book “Elements of Programming” (Chapter 11.1), describe a stable partition algorithm that appears to be more complex than necessary. Here I present an improvement.
Definition: A partition of a sequence with respect to a predicate P is a permutation of the sequence such that all elements satisfying the predicate appear before any element that does not, or vice versa.
If x and y are two elements in the original sequence, then a stable partition is one where P(x) = P(y) implies that x appears before y in the partitioned sequence if and only if x appears before y in the original sequence.
A semi stable partition is a partition that is stable only for P(x) = P(y)= false.
The following C++ function generates an in-place semi-stable partition. Notice that, following the C++ convention, last points to one past the last element of the array, while first points to the first element.
template<class Iter, class Pred>
Iter stable_part(Iter first, Iter last, Pred P)
{
  Iter i = first;
  Iter p = first;
  // let [first',last') refer to the unmodified data
  // Invariant:
  //   [first,i) is semi-stable partitioned at p.
  //   In other words
  //   (1) x in [first,p) => !P(x)
  //   (2) x in [p,i)     => P(x)
  //   (3) [first,i) is a permutation of [first', first'+(i-first))
  //   (4) x,y in [first,p) and x precedes y in [first,p)
  //       => x precedes y in [first', first'+(i-first))
  while (i != last) {
    if (P(*i))
      ++i;
    else {
      swap(*i, *p);
      ++i;
      ++p;
    }
  }
  return p;
}
Notice that the invariant is trivially established at the beginning of the loop. The if statement extends the sequence that is semi-stable partitioned, so the loop invariant holds at the end of each iteration. Hence on termination of the loop, p is the point of semi-stable partition of the whole sequence.
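The same routine can be written a little more compactly; a sketch with a worked example (this condensed form behaves identically to the listing above):

```cpp
#include <utility>
#include <vector>

// Semi-stable partition: elements failing P come first, in their
// original order; elements satisfying P follow, order not preserved.
template <class Iter, class Pred>
Iter stable_part(Iter first, Iter last, Pred P) {
    Iter p = first;
    for (Iter i = first; i != last; ++i) {
        if (!P(*i)) {
            std::swap(*i, *p);
            ++p;
        }
    }
    return p;
}
```

For example, partitioning {3, 8, 1, 6, 4, 9} on is-even returns an iterator three positions in: the odd elements 3, 1, 9 keep their relative order, while the evens end up as 6, 4, 8; order among them is not preserved, which is what makes the partition only semi-stable.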
String matching is one of the most studied problems in Computer Science. There are three well-known algorithms by
It is very unlikely that a developer these days will be asked to write one, let alone develop a new one. However, one reason for a refresher is that some problems may be modelled as string matching problems. Besides, and this is the actual reason for the refresher, they are interesting.
The classic problem is described as: given a string Y and a search string x, is there a substring of Y that matches exactly with x? In C syntax, a brute-force algorithm is shown below:
int strSearch(const char *Y, int N, const char *x, int n)
// N is the length of string Y
// n is the length of string x
{
  for (int i = 0; i + n <= N; ++i) {
    bool found = true;
    for (int j = 0; found && j < n; ++j)
      found = (Y[i + j] == x[j]);
    if (found)
      return i;
  }
  return -1;
}
An interesting variation is to find a substring of Y that is a permutation of the string x. Let’s try test-driven development: given a solution, let us write code to verify it. In other words, for some k such that 0 ≤ k ≤ N-n, verify that the substring Y[k, k+n) is a permutation of x. There are n! (! denotes factorial) permutations of the substring Y[k, k+n), so comparing every permutation is not viable. One solution is to reduce both strings to a canonical form that can then be compared; one such form is the sorted string. Thus to find a permuted substring, just slide a window of size n across Y and check if the sorted window matches the sorted x.
int strPermuteSearch(const char *Y, int N, char *x, int n)
// N is the length of string Y
// n is the length of string x
{
  char y[n];
  sort(x, x + n);
  for (int i = 0; i + n <= N; ++i) {
    strncpy(y, Y + i, n);
    sort(y, y + n);
    if (strncmp(x, y, n) == 0)
      return i;
  }
  return -1;
}
The time complexity of this algorithm is O(N*n*log(n)); the n*log(n) factor is due to the sorting within the loop. This can be reduced to O(N*n) by an incremental update: since y is sorted, all that is needed when i is incremented is to remove Y[i-1] from y and insert Y[i+n-1], which can be done in O(n) while keeping y sorted.
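A sketch of this incremental scheme, using std::string for the window so the standard insert and erase do the shifting (the function name permuted_search is mine):

```cpp
#include <algorithm>
#include <string>

// Returns the index of the first substring of Y that is a permutation
// of x, or -1. The window y is kept sorted: each slide removes the
// outgoing character and inserts the incoming one, O(n) per step.
int permuted_search(const std::string& Y, std::string x) {
    const size_t N = Y.size(), n = x.size();
    if (n == 0 || n > N) return -1;
    std::sort(x.begin(), x.end());
    std::string y = Y.substr(0, n);
    std::sort(y.begin(), y.end());
    for (size_t i = 0;; ++i) {
        if (y == x) return static_cast<int>(i);
        if (i + n == N) return -1;
        // drop the outgoing character Y[i], add the incoming Y[i+n]
        y.erase(std::lower_bound(y.begin(), y.end(), Y[i]));
        y.insert(std::upper_bound(y.begin(), y.end(), Y[i + n]), Y[i + n]);
    }
}
```

For example, permuted_search("abcdef", "cb") returns 1, the position of "bc".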
This leads to another problem: is there a substring of Y such that some combination of n characters of the substring is a permutation of x? Again consider a verification algorithm. Given a string Z of length M, where M ≥ n, is there some collection of n characters of Z which can be permuted to match x? There are C(M,n) combinations, so trying all of them is not viable. Consider Z[0]. If Z[0] is present in x, our solution space is reduced to C(M-1,n-1). If Z[0] is not present, then our solution space is reduced to C(M-1,n). The algorithm terminates when n == 0, which means that a result has been found, or when n > M, which means that Z does not contain n characters which can be permuted to x.
bool Verify(const char* Z, int M, char *x, int n)
{
  if (n == 0) return true;
  if (n > M) return false;
  int i = 0;
  for (; i < n; ++i)
    if (*Z == x[i])
      break;
  if (i < n) {
    // swap x[i] with x[n-1]
    char t = x[i]; x[i] = x[n-1]; x[n-1] = t;
    return Verify(Z + 1, M - 1, x, n - 1);
  }
  else
    return Verify(Z + 1, M - 1, x, n);
}
Notice that the linear search over x pushes the time complexity to O(n*M). If the two strings were sorted, then we could reduce it to O(M), as shown below:
bool VerifySorted(const char* Z, int M, const char *x, int n)
{
  if (n == 0) return true;
  else if (n > M) return false;
  else if (*Z < *x) return VerifySorted(Z + 1, M - 1, x, n);
  else if (*Z == *x) return VerifySorted(Z + 1, M - 1, x + 1, n - 1);
  else // *Z > *x
    return false;
}
This solution can be adapted to checking whether B is a subset of A, where A and B are sequences. Now let us restate the problem with a simple restriction: find the smallest substring of Y such that there exist n characters in it which can be permuted to match x. This can be done by sliding a window of size M, where n ≤ M ≤ N, and calling VerifySorted on that window, as shown below:
struct position { int start; int length; };

position CombinationSubset(const char* Y, int N, char *x, int n)
{
  position result = {-1, -1};
  char Z[N];
  sort(x, x + n);
  for (int M = n; M <= N; ++M) {
    for (int i = 0; i + M <= N; ++i) {
      strncpy(Z, Y + i, M);
      sort(Z, Z + M);
      if (VerifySorted(Z, M, x, n)) {
        result.start = i;
        result.length = M;
        return result;
      }
    }
  }
  return result;
}
Notice that we could optimise by checking that the first and last characters of the window are both in x before calling VerifySorted. As before, the sort(Z, Z+M) inside the inner loop can be reduced to an insertion taking O(M) time.
Problem: List all substrings of length n in a string Y such that every character in the substring is unique.
Hint 1: Find the number of duplicate characters in every substring of length n.
Hint 2: use C++ sort and unique and find the difference in length
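Following the hints, one possible sketch (function names are mine): the duplicate count of a window is its length minus the number of distinct characters left after sort and unique.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Number of duplicated characters in s (hint 2: sort, unique, compare lengths).
int duplicate_count(std::string s) {
    std::sort(s.begin(), s.end());
    auto last = std::unique(s.begin(), s.end());
    return static_cast<int>(s.end() - last);
}

// Starting positions of the length-n substrings of Y whose characters
// are all unique, i.e. whose duplicate count is zero.
std::vector<int> unique_substrings(const std::string& Y, size_t n) {
    std::vector<int> result;
    for (size_t i = 0; i + n <= Y.size(); ++i)
        if (duplicate_count(Y.substr(i, n)) == 0)
            result.push_back(static_cast<int>(i));
    return result;
}
```

For example, in "aabcb" the only window of length 3 with all-unique characters is "abc", starting at position 1.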
Problem: List all substrings with exactly one duplicate.
Hint: Modify the previous solution.
The rapid spread of “fake news” and “alternative facts” on the internet is a threat to freedom as such data can be used to manipulate people into doing things that are not in their best interests. In a recent article Vinton Cerf claims that the only antidote to such an eventuality is eternal vigilance and critical thinking on the part of the general public.
Before the internet era we would rely on the editorial integrity of established media to filter out rumours and falsehoods. Now the mainstream media itself is accused of supplying “fake news”. Vinton Cerf (2017) points out that free societies supporting a high level of tolerance would accept or reject ideas based on accepted social norms, although slavery, the Holocaust, and Apartheid were accepted social norms at some points in time and space. Thus social norms are not sufficient to ensure a society that is free for all. Nor can we depend on the state or church to decide what is socially acceptable: left unchecked, such institutions can manipulate the media to suit the ends of a small elite. Edward Herman and Noam Chomsky (2002) would claim that it is happening already. “Weapons of mass destruction,” …anyone? Cerf (2017) claims that liberty as we know it may not survive the digital age if the general public does not apply critical thinking to identify the truth. To paraphrase a popular saying: there are none so stupid as those who refuse to think.
Sadly, the short-sightedness of the electorate in some developed countries in recent months does not bode well for the future of a free society as we know it.
References
Vinton Cerf (2017), Can Liberty Survive the Digital Age?, Communications of the ACM, 60(5), p. 7.
Edward S. Herman & Noam Chomsky (2002), Manufacturing Consent: The Political Economy of the Mass Media, Pantheon, Reprint edition, New York, 2002.
Those who would give up essential Liberty, to purchase a little temporary Safety, deserve neither Liberty nor Safety. — Benjamin Franklin
There has been a recent spate of cybersecurity breaches, such as the one at Yahoo! and, more recently, the leak of emails from the Democratic National Committee. In a recent editorial, Moshe Vardi (2017) claims that despite decades of research into information security, new vulnerabilities are being created faster than old ones are being eliminated. The risk, as he points out, is not limited to data breaches, bad as they may be. We now have to worry about infrastructure such as the electric power grid, the telecommunication system, the financial system, and the transportation system.
Vardi appears to suggest that we are not investing enough in cybersecurity. As Acquisti, Friedman, and Telang (2006) point out, companies don’t have an incentive to take cybersecurity seriously. In fact, even wide data breaches generate enough publicity that the share price rebounds higher after a small decline. Vardi argues that, similar to the National Transportation Safety Board in the US, whose origins go back to 1926, there is a need for a cybersecurity board.
Cyber-libertarians, on the other hand, claim that “regulation stifles innovation.” Vardi counters that “regulation and innovation can co-exist”, that self-regulation by the tech community has failed, and that cybersecurity cannot be resolved by market mechanisms.
References
Moshe Y. Vardi (2017), Cyber Insecurity and Cyber Libertarianism, Communications of the ACM, 60(5), p. 5.
Acquisti, A., Friedman, A., & Telang, R. (2006), Is There a Cost to Privacy Breaches? An Event Study, Twenty-Seventh International Conference on Information Systems, pp. 1563-1580, Milwaukee.