Émile-Auguste Chartier

**Abstract**

*In considering how to print a multiway tree in hierarchical fashion, this blog post shows that there are many ways to represent a multiway tree in a data structure. A node with a pointer to a list of child nodes is not always the best.*

**Source Code**: https://github.com/theSundayProgrammer/heirarchy_CPP

**Problem**

Consider a table with columns `item_id`, `parent_id` and `name`, where `item_id` is unique to every row and `parent_id` matches the `item_id` of some other row, except for a special row called the root, whose `parent_id` is a special value, say zero, that is not present as an `item_id`. It is required to print this table in a hierarchical fashion, for example:

```
Root
  Child1
    Subchild1
    Subchild2
    ...
    SubchildN
  Child2
    Subchild21
    Subchild22
```

The first solution is to convert the given linear data into a multiway tree and traverse it in pre-order fashion. This can be done by picking the root node, finding all of its immediate children, and then recursively the children of the children, using a breadth-first or depth-first algorithm. Another way is to build a map (called the intermediate map henceforth) from each `parent_id` to an array of its children. This map can then be used to generate a multiway tree.

```cpp
struct raw_child { int id; content name; int parent; };

struct Data { int id; content name; };

struct Child {
  Child(content const& name_) : name(name_) {}
  content name;
  std::vector<Child> children;
};

std::map<int, std::vector<Data>> list_children;

void generate_map(raw_child const& child) {
  list_children[child.parent].push_back({child.id, child.name});
}

Data& get_root(int k) { return list_children[k][0]; }

Child generate_tree(Data const& root) {
  auto item = list_children.find(root.id);
  Child parent(root.name);
  if (item != list_children.end()) {
    auto& subs = item->second;
    for (auto& ch : subs)
      parent.children.push_back(generate_tree(ch));
  }
  return parent;
}
```

Another way to represent the intermediate map is with a multi-map, such as C++ `std::multimap` or OCaml's `Hashtbl`.

Notice that since the sole purpose of the tree is to print the data in a hierarchical fashion, it is possible to print the data using the same traversal used to generate the tree. Hence there is no need to generate the tree at all.

```cpp
void print_tree(Data const& root, int tab_count) {
  for (int i = 0; i < tab_count; ++i)
    std::cout << "  ";
  std::cout << root.name << std::endl;
  auto item = list_children.find(root.id);
  if (item != list_children.end()) {
    auto& subs = item->second;
    for (auto& sub : subs)
      print_tree(sub, tab_count + 1);
  }
}
```
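The same pipeline can be sketched end to end in Python (the function name and the `(item_id, parent_id, name)` row format are mine, mirroring the `raw_child` struct): build the parent-to-children map, then print directly from it without generating a tree.

```python
from collections import defaultdict

def print_hierarchy(rows, root_parent=0, indent="  "):
    """rows: iterable of (item_id, parent_id, name) tuples."""
    children = defaultdict(list)          # parent_id -> [(item_id, name), ...]
    for item_id, parent_id, name in rows:
        children[parent_id].append((item_id, name))

    lines = []
    def walk(item_id, name, depth):
        lines.append(indent * depth + name)
        for cid, cname in children[item_id]:
            walk(cid, cname, depth + 1)

    # the root is the child of the special parent_id
    root_id, root_name = children[root_parent][0]
    walk(root_id, root_name, 0)
    return "\n".join(lines)

print(print_hierarchy([(1, 0, "Root"), (2, 1, "Child1"),
                       (3, 1, "Child2"), (4, 2, "Subchild1")]))
```

The recursion is exactly the pre-order traversal used by `generate_tree` above, except that it emits lines instead of nodes.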

Taking a cue from Chartier as quoted above, I considered other ways to address the main problem, which is to print the array data in a hierarchical fashion. The brute-force solution would be to start with the root, then traverse the array to print all its children and all their children recursively. Notice from the code below that if n is the number of rows, the time complexity is O(n^2) in terms of comparisons, because each non-leaf item (one which is the parent of at least one other item) takes n comparisons to find all its immediate children, and each leaf node (one that is not a parent of any other) takes n comparisons to determine that it is a leaf.

```cpp
void print_tree(raw_child const& root, int tab_count) {
  for (int i = 0; i < tab_count; ++i)
    std::cout << "  ";
  std::cout << root.name << std::endl;
  auto child = std::find_if(raw_children.begin(), raw_children.end(),
                            [&](raw_child const& r) { return r.parent == root.id; });
  while (child != raw_children.end()) {
    print_tree(*child, tab_count + 1);
    child = std::find_if(child + 1, raw_children.end(),
                         [&](raw_child const& r) { return r.parent == root.id; });
  }
}
```

To reduce that time complexity we could sort the array by `parent_id`. Locating the children of a node then takes at most O(log n) comparisons via binary search, so including the sort the whole procedure takes O(n log n). In other words, we have used a sorted array to represent a multiway tree, as shown below.

```cpp
void print_tree(raw_child const& root, int tab_count) {
  for (int i = 0; i < tab_count; ++i)
    std::cout << "  ";
  std::cout << root.name << std::endl;
  auto child = std::lower_bound(std::begin(raw_children), std::end(raw_children), root,
                                [](raw_child const& a, raw_child const& val) {
                                  return a.parent < val.id;
                                });
  while (child != std::end(raw_children) && child->parent == root.id) {
    print_tree(*child, tab_count + 1);
    ++child;
  }
}
```

**Conclusion**

There is more than one way to accomplish a programming task. Thinking of different ways helps in improving an existing solution.

[Bancila 2018] poses the problem: "Write a function that prints on the console all the possible permutations of a given string," and provides a recursive version of the solution as follows:

```cpp
void next_permutation(std::string str, std::string perm) {
  if (str.empty())
    std::cout << perm << std::endl;
  else {
    for (size_t i = 0; i < str.size(); ++i) {
      next_permutation(str.substr(1), perm + str[0]);
      std::rotate(std::begin(str), std::begin(str) + 1, std::end(str));
    }
  }
}

void print_permutations_recursive(std::string str) {
  next_permutation(str, "");
}
```

A better solution would be as follows

```cpp
void next_permutation(std::string& str, size_t n) {
  if (n == 1)
    std::cout << str << std::endl;
  else {
    for (size_t i = 0; i < n; ++i) {
      next_permutation(str, n - 1);
      std::rotate(std::end(str) - n, std::end(str) - n + 1, std::end(str));
    }
  }
}

void print_permutations_recursive(std::string str) {
  if (!str.empty())
    next_permutation(str, str.length());
}
```

This avoids copying the string across recursive calls. More importantly, the second parameter has a clear meaning: it is the length of the suffix that remains to be permuted.

The other thing to notice is that `rotate` is O(n), which can be reduced to O(1) by swapping, as in:

```cpp
for (size_t i = 0; i < n; ++i) {
  std::swap(*(str.end() - n), *(str.end() - n + i));
  next_permutation(str, n - 1);
  std::swap(*(str.end() - n), *(str.end() - n + i));
}
```
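For comparison, here is a complete transcription of the swap-based scheme in Python (the function name is mine); the two swaps bracket the recursive call exactly as in the C++ fragment above, so each level permutes only the last n characters in place.

```python
def permutations_by_swap(s):
    """Collect all permutations of s via the swap-recurse-swap scheme."""
    chars = list(s)
    out = []
    def permute(n):
        if n == 1:
            out.append("".join(chars))
            return
        base = len(chars) - n             # start of the suffix being permuted
        for i in range(n):
            chars[base], chars[base + i] = chars[base + i], chars[base]
            permute(n - 1)
            chars[base], chars[base + i] = chars[base + i], chars[base]
    if chars:
        permute(len(chars))
    return out

print(permutations_by_swap("abc"))
```

Because the second swap undoes the first, the string is restored after each iteration, which is what lets the loop pick every character for the leading position exactly once.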

Reference

[Bancila 2018] Marius Bancila, *The Modern C++ Challenge*, Packt Publishing, Birmingham-Mumbai, 2018.

So why would one create a text file and get it compiled rather than use Word?

- It is free, in the sense that the user does not explicitly pay for the software.
- In raw form it is a text file and occupies much less space.
- Since it is a text file, it is more easily searchable.
- The *modus operandi* is in the text. For example, I often look at a Word document I wrote a while back and scratch my head as to how I got something done, say subscript or superscript. In LaTeX, if you do something once, that notation is already written down, and repeating it elsewhere is a matter of cut and paste.
- There are some things LaTeX can't do, like embed an Excel spreadsheet or run macros, although running macros is not a recommended thing to do anyway.

Just like Word, there are templates for most common purposes such as resumes, invoices, APA-style academic documents and so on.

There are other options besides LaTeX which are also free.

- Pandoc converts from one set of markup notations to another. It can generate PDF or HTML from Markdown, hence the "Pan" prefix.
- R Markdown (popularised by RStudio) is essentially a markup notation that can have embedded R language scripts. R also supports nice graphics.

Given two positive integers A and B, compute the quotient and remainder without using multiplication. This is an old problem, except that in optimising the algorithm we derive the long division method taught in primary school.

**Problem:** Given `A > 0` and `B > 0`, compute q and r such that

`A = B*q + r`

where `q ≥ 0` and `0 ≤ r < B`, using only addition and subtraction. This is a fairly simple algorithm, as shown below:

```python
def divisor(A, B):
    q, r = 0, A
    # loop invariant: A == q*B + r
    while r >= B:
        q, r = q + 1, r - B
    return q, r
```

The time complexity is O(A/B). If A is, say, one million and B is 3, then the loop will execute 333,333 times. How can we reduce the number of iterations? We could cut it roughly in half by deducting B twice at a time, as shown below.

```python
def divisor(A, B):
    q, r = 0, A
    # loop invariant: A == q*B + r
    twiceB = B + B
    while r >= B:
        if r >= twiceB:
            q, r = q + 2, r - twiceB
        else:
            q, r = q + 1, r - B
    return q, r
```

Now, why deduct only twice at a time? Why not four times, or eight, or more? If we allow multiplication and division by the radix, then we can use the following procedure. Multiplying a number by the radix is trivial because it just means shifting the digits of the number left or right.

```python
def divisor(A, B):
    q, r = 0, A
    q0, r0 = 1, B
    radix = 2
    # loop invariant: r0 == B * q0
    while r0 * radix <= A:
        q0, r0 = radix * q0, radix * r0
    # loop invariant: A == q*B + r
    while r >= B:
        # loop invariant: A == q*B + r
        while r0 <= r:
            q, r = q + q0, r - r0
        q0, r0 = q0 // radix, r0 // radix
    return q, r
```

Notice that the above algorithm works even if the radix is ten, in other words when all numbers are represented in decimal. The first `while` loop pushes B to the left until shifting it once more would exceed A; `q0` keeps track of how far B has been shifted. Notice that if the radix were two, the innermost `while` loop would reduce to a conditional statement. To prove that the loop invariant holds, note that

r0 = B * q0

Hence

`(q + q0)*B + (r - r0) = q*B + r + (q0*B - r0) = q*B + r = A` (QED)

Thus, dear reader, you now have the algorithm that was taught to you in primary school, in Python-esque code.

**Time Complexity**

The outer `while` loop computes the most significant digit (or bit) of the quotient at its first iteration, the next most significant digit at the next iteration, and so on. Hence the number of iterations is at most log(A/B). The innermost loop executes at most `radix` times per iteration. Hence the time complexity is O(log(A/B)).
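To see the reduction concretely, here is an instrumented comparison in Python; the step counters are my addition, counting one step per subtraction.

```python
def divisor_naive(A, B):
    q, r, steps = 0, A, 0
    while r >= B:
        q, r, steps = q + 1, r - B, steps + 1
    return q, r, steps

def divisor_radix(A, B, radix=10):
    q, r = 0, A
    q0, r0 = 1, B
    while r0 * radix <= A:            # shift B left while it still fits
        q0, r0 = radix * q0, radix * r0
    steps = 0
    while r >= B:
        while r0 <= r:                # one step per subtraction
            q, r = q + q0, r - r0
            steps += 1
        q0, r0 = q0 // radix, r0 // radix
    return q, r, steps

print(divisor_naive(1000000, 3))   # 333,333 subtractions
print(divisor_radix(1000000, 3))   # 18 subtractions: the digit sum of 333333
```

With radix ten, the subtraction count is exactly the sum of the decimal digits of the quotient, which is what makes long division fast on paper.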


Alexander Stepanov and Paul McJones, in their book *Elements of Programming* (Chapter 11.1), describe a stable partition algorithm that appears to be more complex than necessary. Here I present an improvement.

*Definition*: A partition of a sequence with respect to a predicate P is a permutation of the sequence such that all elements satisfying the predicate appear before any element that does not, or vice versa.

If x and y are two elements in the original sequence, then a **stable partition** is one where `P(x) = P(y)` implies that x appears before y in the partitioned sequence if and only if x appears before y in the original sequence.

A **semi-stable partition** is a partition that is stable only for the case `P(x) = P(y) = false`.

The following C++ function generates an in-place semi-stable partition. Notice that, following the C++ convention, `last` points one past the last element of the array while `first` points to the first element.

```cpp
template <class Iter, class Pred>
Iter stable_part(Iter first, Iter last, Pred P) {
  Iter i = first;
  Iter p = first;
  // let [first', last') refer to the unmodified data
  // Invariant:
  //   [first, i) is semi-stable partitioned at p.
  // In other words:
  //   (1) x in [first, p) => !P(x)
  //   (2) x in [p, i)     => P(x)
  //   (3) [first, i) is a permutation of
  //       [first', first' + (i - first))
  //   (4) x, y in [first, p) and
  //       x precedes y in [first, p) =>
  //       x precedes y in [first', first' + (i - first))
  while (i != last) {
    if (P(*i))
      ++i;
    else {
      swap(*i, *p);
      ++i;
      ++p;
    }
  }
  return p;
}
```

Notice that the invariant is trivially established at the beginning of the loop. The `if` statement extends the sequence that is semi-stable partitioned, so the loop invariant holds at the end of each iteration. Hence, on termination of the loop, `p` is a point of semi-stable partition of the whole sequence.
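The same loop transcribed into Python for easy experimentation (the function and variable names are mine); the returned index plays the role of the iterator `p`.

```python
def semi_stable_partition(seq, pred):
    """In-place semi-stable partition of a list; returns the partition
    point p.  After the call, seq[:p] holds the elements for which pred
    is false, in their original order; seq[p:] holds the rest."""
    p = 0
    for i in range(len(seq)):
        if not pred(seq[i]):
            seq[i], seq[p] = seq[p], seq[i]
            p += 1
    return p

data = [3, 8, 1, 6, 4]
p = semi_stable_partition(data, lambda v: v % 2 == 0)
print(p, data)
```

Note that only the elements failing the predicate are guaranteed to keep their relative order, which is exactly the semi-stability defined above.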

String matching is one of the most studied problems in Computer Science, and there are three well-known algorithms for it. It is very unlikely that a developer these days will be asked to write one, let alone develop a new one. However, one reason for a refresher is that some problems can be modelled as string matching problems. Besides, and this is the actual reason for the refresher, they are interesting.

The classic problem is described as follows: given a string Y and a search string x, is there a substring of Y that matches x exactly? In C syntax, a brute-force algorithm is shown below.

```c
int strSearch(const char *Y, int N, const char *x, int n)
// N is the length of string Y
// n is the length of string x
{
  for (int i = 0; i + n <= N; ++i) {
    bool found = true;
    for (int j = 0; found && j < n; ++j) {
      found = (Y[i + j] == x[j]);
    }
    if (found)
      return i;
  }
  return -1;
}
```

An interesting variation is to find a substring of Y that is a permutation of the string x. Let's try test-driven development: given a solution, let us write code to verify it. In other words, for some k such that 0 <= k < N-n, verify that the substring Y[k, k+n) is a permutation of x. There are n! (! denotes factorial) permutations of the substring Y[k, k+n), so comparing every permutation is not viable. One solution is to reduce both strings to a canonical form that can then be compared; one such form is the sorted string. Thus, to find a permuted substring, just slide a window of size n across Y and check whether the sorted window equals the sorted x.

```cpp
int strPermuteSearch(const char *Y, int N, char *x, int n)
// N is the length of string Y
// n is the length of string x; x is sorted in place
{
  char y[n];
  sort(x, x + n);
  for (int i = 0; i + n <= N; ++i) {
    strncpy(y, Y + i, n);
    sort(y, y + n);
    if (strncmp(x, y, n) == 0)
      return i;
  }
  return -1;
}
```

The time complexity of this algorithm is O(N·n·log(n)); the n·log(n) factor comes from the sorting within the loop. This can be reduced to O(N·n) by using insertion sort: since *y* is already sorted, all that is needed when i is incremented is to remove *Y[i-1]* from y and insert *Y[i+n-1]*, which can be done in O(n) while keeping y sorted.
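The sliding-window idea can be sketched in Python (the function name is mine); `bisect.insort` plays the role of the insertion-sort step, so each slide costs O(n) instead of a full re-sort.

```python
from bisect import insort, bisect_left

def find_permuted_substring(Y, x):
    """First index i such that Y[i:i+n] is a permutation of x, else -1.
    The sorted window is updated in O(n) per slide instead of re-sorting."""
    n, N = len(x), len(Y)
    if n == 0 or n > N:
        return -1
    target = sorted(x)
    window = sorted(Y[:n])
    i = 0
    while True:
        if window == target:
            return i
        if i + n >= N:
            return -1
        # slide the window: remove Y[i], insert Y[i+n], keep it sorted
        window.pop(bisect_left(window, Y[i]))
        insort(window, Y[i + n])
        i += 1
```

The comparison `window == target` is the canonical-form check described above.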

This leads to another problem: is there a substring of Y such that some combination of n of its characters is a permutation of x? Again, consider a verification algorithm. Given a string Z of length M, where M >= n, is there some collection of n characters in Z which can be permuted to match x? There are C(M, n) combinations, so trying all of them is not viable. Consider Z[0]. If Z[0] is present in x, our solution space is reduced to C(M-1, n-1). If Z[0] is not present, our solution space is reduced to C(M-1, n). The algorithm terminates when n == 0, which means a result has been found, or when n > M, which means that Z does not contain n characters that can be permuted to x.

```cpp
bool Verify(const char *Z, int M, char *x, int n) {
  if (n == 0)
    return true;
  if (n > M)
    return false;
  int i = 0;
  for (; i < n; ++i)
    if (*Z == x[i])
      break;
  if (i < n) {
    // swap x[i] with x[n-1]
    char t = x[i];
    x[i] = x[n - 1];
    x[n - 1] = t;
    return Verify(Z + 1, M - 1, x, n - 1);
  } else
    return Verify(Z + 1, M - 1, x, n);
}
```

Notice that the linear search over x pushes the time complexity to O(n·M). If the two strings were sorted, we could reduce it to O(M), as shown below.

```cpp
bool VerifySorted(const char *Z, int M, char *x, int n) {
  if (n == 0)
    return true;
  else if (n > M)
    return false;
  else if (*Z < *x)
    return VerifySorted(Z + 1, M - 1, x, n);
  else if (*Z == *x)
    return VerifySorted(Z + 1, M - 1, x + 1, n - 1);
  else  // *Z > *x
    return false;
}
```

This solution can be adapted to checking whether B is a subset of A, where A and B are sequences. Now let us restate the problem with a simple restriction: find the smallest substring of Y such that there exist n characters in it which can be permuted to match x. This can be done by sliding a window of size M, where n <= M <= N, and calling VerifySorted on that window, as shown below.

```cpp
struct position { int start; int length; };

position CombinationSubset(const char *Y, int N, char *x, int n) {
  position result = {-1, -1};
  char Z[N];
  sort(x, x + n);
  for (int M = n; M <= N; ++M) {
    for (int i = 0; i + M <= N; ++i) {
      strncpy(Z, Y + i, M);
      sort(Z, Z + M);
      if (VerifySorted(Z, M, x, n)) {
        result.start = i;
        result.length = M;
        return result;
      }
    }
  }
  return result;
}
```

Notice that we could optimise a little by checking that Y[i] and Y[i+M-1] are both present in x before calling VerifySorted. As before, the `sort(Z, Z + M)` inside the inner loop can be replaced by an insertion taking O(M) time.
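A Python transcription of the combined search (the function names are mine); the helper mirrors `VerifySorted` as a merge-style scan over two sorted lists.

```python
def is_subsequence_sorted(small, big):
    """VerifySorted in Python: both lists are sorted; return True when
    big contains small as a multiset."""
    j = 0
    for c in big:
        if j < len(small) and c == small[j]:
            j += 1
    return j == len(small)

def combination_subset(Y, x):
    """Return (start, length) of the smallest substring of Y containing
    some len(x) characters that can be permuted to x, else (-1, -1)."""
    n, N = len(x), len(Y)
    tx = sorted(x)
    for M in range(n, N + 1):          # try window sizes, smallest first
        for i in range(N - M + 1):
            if is_subsequence_sorted(tx, sorted(Y[i:i + M])):
                return (i, M)
    return (-1, -1)

print(combination_subset("azzbc", "cab"))
```

Trying window sizes from n upward guarantees the first hit is the smallest substring.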

**Problem:** List all substrings of length n in a string Y such that every character in the substring is unique.

Hint 1: Find the number of duplicate characters in every substring of length n.

Hint 2: Use C++ `sort` and `unique` and find the difference in length.

**Problem:** List all substrings with exactly one duplicate.

Hint: Modify the previous solution.
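Here is one way the hints might be realised in Python (the function names are mine; a `set` plays the role of `sort` followed by `unique`, and "exactly one duplicate" is read as n-1 distinct characters):

```python
def substrings_all_unique(Y, n):
    """Start indices of length-n substrings with no repeated character."""
    return [i for i in range(len(Y) - n + 1)
            if len(set(Y[i:i + n])) == n]

def substrings_one_duplicate(Y, n):
    """Start indices of length-n substrings with exactly n-1 distinct
    characters, i.e. a single character appearing twice."""
    return [i for i in range(len(Y) - n + 1)
            if len(set(Y[i:i + n])) == n - 1]

print(substrings_all_unique("abcabc", 3))
print(substrings_one_duplicate("aabc", 2))
```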

The rapid spread of “fake news” and “alternative facts” on the internet is a threat to freedom as such data can be used to manipulate people into doing things that are not in their best interests. In a recent article Vinton Cerf claims that the only antidote to such an eventuality is eternal vigilance and critical thinking on the part of the general public.

Before the internet era we would rely on the editorial integrity of established media to filter out rumours and falsehoods. Now the mainstream media itself is accused of supplying "fake news". Vinton Cerf (2017) points out that free societies supporting a high level of tolerance would accept or reject ideas based on accepted social norms, although slavery, the Holocaust, and Apartheid were accepted social norms at some point in time and space. Thus social norms are not sufficient to ensure a society that is free for all. We cannot depend on the state or church to decide what is socially acceptable either: left unchecked, such institutions can manipulate the media to suit the ends of a small elite. Edward Herman and Noam Chomsky (2002) would claim that it is happening already. "Weapons of mass destruction," ...anyone? Cerf (2017) claims that liberty as we know it may not survive the digital age if the general public does not apply critical thinking to identify the truth. To paraphrase a popular saying: there are none so stupid as those who refuse to think.

Sadly the short sightedness of the electorate in some developed countries in recent months does not bode well for the future of a free society as we know it.

References

Vinton Cerf (2017), *Can Liberty Survive the Digital Age?*, Communications of the ACM, May 2017, 60(5), p. 7.

Edward S. Herman & Noam Chomsky (2002), *Manufacturing Consent: The Political Economy of the Mass Media* Pantheon Reprint edition, New York, January 15, 2002

Those who would give up essential Liberty, to purchase a little temporary Safety, deserve neither Liberty nor Safety. — Benjamin Franklin

There has been a recent spate of cybersecurity breaches, such as the one at Yahoo! and, more recently, the leak of emails from the Democratic National Committee. In a recent editorial, Moshe Vardi (2017) claims that despite decades of research into information security, new vulnerabilities are being created faster than old vulnerabilities are eliminated. The risk, as he points out, is not limited to data breaches, bad as they may be. We now have to worry about infrastructure such as the electric power grid, the telecommunication system, the financial system and the transportation system.

Vardi appears to suggest that we are not investing enough in cybersecurity. As Acquisti, Friedman, and Telang (2006) point out, companies don't have an incentive to take cybersecurity seriously. In fact, wide data breaches generate enough publicity that the share price rebounds higher after a small decline. Vardi argues that, similar to the National Transportation Safety Board in the US, there is a need for a cybersecurity board.

Cyber-libertarians, on the other hand, claim that "regulation stifles innovation." Vardi counters by stating that "regulation and innovation can co-exist," that self-regulation by the tech community has failed, and that cybersecurity cannot be resolved by market mechanisms.

**References**

Moshe Y. Vardi (May 2017) Cyber Insecurity and Cyber Libertarianism, *Communications of the ACM*, 60(5), 5

Acquisti, A., Friedman, A., & Telang, R. (2006). Is There a Cost to Privacy Breaches? An Event Study. *Twenty-Seventh International Conference on Information Systems* (pp. 1563-1580), Milwaukee.

Part-time jobs are slowly replacing full-time jobs (Australian Bureau of Statistics, 2017). Individual workers bidding for jobs on Uber, Lyft, Freelancer.com, and serviceseeking.com.au are becoming common. In a recent Crossroads magazine article, Paolo Parigi and Xiao Ma (2017) refer to this working arrangement as a "Gig Economy", characterised by trust and short duration.

**Trust** comes in two forms. *Personal trust* between the provider and the consumer of a service, as in an Uber driver and a customer trusting each other. *Trust in the platform* essentially means that providers and customers use a platform to coordinate their interaction, and both trust the platform to create a temporary contract that the two parties tacitly agree to. It also refers to the fairness of the rating system, the screening out of malicious users, and the like.

Service providers in a gig economy have a **short term commitment** to the employer. This paradoxically is advantageous to the customer as the gig economy commoditizes the gig. More than one plumber may bid for a job thus reducing the cost to the customer.

The authors argue that the challenge is not to convert short-term gigs into stable employment, but to work towards a regular stream of gigs. The gig economy is inevitable, claim the authors. However, left to itself, the market will tend to exploit the vulnerable; free markets would have 14-year-olds working 12 hours in coal mines. Hence it is necessary for the state to intervene to distribute the fruits of the economy and its labour more evenly. Parigi and Ma suggest three things that workers in the gig economy be afforded:

- Training
- Benefits
- Free markets

In all good organisations, every employee undergoes some training, usually amounting to at least a week or two of working hours. Policy makers can make this kind of training mandatory for all service providers in the gig economy. In addition, the state can regulate the training market and provide tax concessions for undergoing such training.

Except in the US, in most developed countries basic health care is provided by the state; in the US, health insurance is often subsidised by the employer. In addition, private organisations provide other benefits such as easier loans, tie-ups with hotels and car rentals, subsidised private education for children, extended maternity and paternity leave and so on. The "lone ranger" in the gig economy has to pay full price for everything. This is where the state can intervene: it can create institutions, or assist existing institutions like trade associations, to help their members avail themselves of such benefits.

The third suggestion is to stop protecting the interests of special interest groups. Here in New South Wales, Australia, the taxi industry forced the government to limit the number of taxi licences. The medical, accounting and legal professions are also guilty of this type of behaviour, where people from "outside" the system have a huge entry barrier to climb. Allowing a free market to operate will reduce the cost to consumers and help the best in the trade to rise. The authors may need to be reminded that this is more easily said than done: vested interests control most aspects of policy making, including major decisions like going to war.

Given that the gig economy is inevitable, Parigi and Ma make a few suggestions as to how to make it work effectively. They suggest the signalling theory framework to deal with two types of signals. **Assessment signals** are those that reflect the quality or characteristics of the underlying service or product. **Conventional signals**, on the other hand, reflect quality through conventional means such as promises and qualitative assessments.

Assessment signals take time to build, while conventional signals are not quite reliable. The platforms used for the sharing economy should distinguish between the two types of signals. The authors also suggest that such platforms should hide bias and prejudice. Unfortunately, that is a people problem that cannot be solved with technology unless the state intervenes.

The authors address two objections to their proposals, namely that (a) the gig economy entrenches people in poorly paid jobs, and (b) this results in a large disparity between the gig economy and the conventional economy of stable employment. They point to the industrial revolution, which did displace many professions, but in the long run people were better off. The usual response to the issue of disparity is that the size of the pie can be increased, and hence despite the disparity most people will have a larger slice than before.

There are two main issues with this type of reasoning. In the first place, workers who are displaced must be provided for; otherwise, as the number of people who are unsure of their prospects increases, people will not make long-term commitments, be it in housing or training. Secondly, the argument ignores externalities. The market economy encourages waste. If people don't reduce consumption, the planet is heading towards climate catastrophe. Resources like clean air and water, which we now take for granted, will dwindle, and large-scale wars over these resources could bring human civilisation to an end before global warming can unleash its weapons of mass destruction.

Parigi and Ma identify trust and short-term commitment as the main characteristics of the gig economy. They suggest training, access to benefits and protection from vested interests as means of reducing its exploitative nature. Left to its own course, the free market will exploit labour and harm the environment, as the cost of such externalities is borne either by the general public or by future generations who have no say in the matter. The authors welcome the gig economy, which they claim is inevitable, and address some issues that help reduce its exploitative nature. However, they barely address the more serious problems. A better solution would be a sharing economy that shares not just natural resources but also employment opportunities.

*“The world has enough for everyone’s need, but not enough for everyone’s greed.”* Mahatma Gandhi

Australian Bureau of Statistics, *6202.0 – Labour Force, Australia, Jan 2017*, http://www.abs.gov.au/ausstats/abs@.nsf/mf/6202.0, retrieved on 22/02/2017.

Paolo Parigi and Xiao Ma, *The Gig Economy*, Crossroads: The ACM Magazine for Students, Winter 2016, vol. 23, no. 2.

"No Silver Bullet" is an oft-cited paper in the annals of Software Engineering. While technology has changed significantly since the paper was first published, there are still some pearls of wisdom worth reflecting upon.

Tracing back to Aristotle, Brooks (1986) breaks down the difficulties in solving a complex problem into two categories. **Essential difficulties** are those inherent to the problem. **Accidental difficulties** arise out of the inadequacy of the tools we employ to deal with it. The paper states that the four distinguishing features of a software engineering problem are:

- Complexity,
- Conformity,
- Changeability and
- Invisibility.

Complexity and conformity are inherent to any engineering problem, but they are more pronounced in software engineering. Conformity means that software has to adapt to existing hardware and infrastructure. Changeability refers to the fact that software is almost always modified after it is delivered, and invisibility refers to the lack of diagramming tools: software is abstract, and there is no diagramming standard like a floor plan or circuit diagram to document its design. Documenting and communicating software design remains a challenge, and Brooks claims that advances in this area will play a significant role in addressing the essential complexity. (Brooks often uses the term complexity to mean difficulty.)

Brooks claims that most of the tools are geared towards addressing the accidental complexities. Some of these tools include better programming languages, better development environments, and better hardware and storage technologies. For example, syntax highlighting is now the norm in almost all Integrated Development Environments and in editors like Vim and Emacs that are mainly used for programming. He claims that Object Oriented Programming could help if languages could infer types. He dismisses Artificial Intelligence as a non-solution, although Expert Systems can be of use in that expert developers can help build code generators or code checkers to improve the productivity of the average programmer.

There have been many attempts to deal with the essential difficulties, such as designing new languages, Ada being one. However, as he points out, the long-lasting benefit of the Ada programming language is not so much the language itself as the many ideas it spawned, such as "modularization, abstract data types, and hierarchical structuring." Many of these ideas have since been incorporated into other languages like C++, Java and C#. Brooks likes some of the practices that Object Oriented Programming encourages, such as data hiding and inheritance, and points out that the two concepts are independent. For example, 'C' always had FILE*: the exact type definition of FILE was hidden from the user of FILE*, so data hiding can be done in pure C. Brooks claims that a programmer spends a lot of time keeping track of types, and that productivity can be improved if types are deduced. Functional programming languages like Lisp and ML allowed the programmer, by and large, to avoid declaring the type of a variable. In recent times, C++ and C# go a long way, through the use of *auto* and *var*, towards reducing the need to declare the types of variables, by using the type of the value to which the variable is initialised.

In **conclusion**, some of Brooks's ideas are relevant to this day. Reducing accidental complexity is easier, and hence a lot of effort has been expended in that direction. Addressing the essential complexity is difficult, partly because it requires a new way of thinking, and any significant improvement is likely to be ignored without sustained effort.

**References**

Brooks, Fred P. (1986). *No Silver Bullet — Essence and Accident in Software Engineering.* Proceedings of the IFIP Tenth World Computing Conference: 1069–1076.