The after-long-term

Replying to Clochix is becoming a habit.

One passage in particular struck me:

Of course, some of us set up digital wills, entrusting our keys to someone close so they can put our affairs in order after we are gone. But in my opinion that remains a fragile solution, unreliable over the long term. We would need a public digital cemetery service, with “perpetual” concessions. A place to freeze and host, for some time, all of our digital contributions. Which brings me to the permanence of domain names. You never own a domain name, you only rent it for a few years. Would it be legitimate to keep these names after our death? I don’t think so; domain names are a common resource that we privatize temporarily. The heteronym Clochix doesn’t belong to me, and I don’t see in whose name I would forbid others, for all eternity, from using the address clochix.net. And yet I’d like not to break the Web, so that after my death the links to my posts don’t suddenly end up broken. So that redirections take visitors to the cemetery of posts.

When a friend recently said that “the web is still immature” or that “this web disappointed him”, I replied that the web is merely a reflection of humanity. Maybe reflection isn’t the right word. Maybe it’s more of a projection.
The web is such a faithful projection of humanity that the NSA thinks it can play Person of Interest and predict crimes in advance with sufficient accuracy.

We would like to be immortal, to enjoy the good things a little longer. We would like to watch the next generations succeed where we failed. But we are doomed. After we are gone, all that remains are memories in those we knew, a trace we left on them. Maybe various media traces too. But those traces will grow scarce and disappear. Not necessarily disappear in absolute terms, but at least be drowned in the noise of the survivors’ exchanges.
Apart from die-hard romantics like Karl, who reads the W3C archives from the 90s rather than the latest article on Grunt vs Gulp?

If we are bound to disappear, maybe it is futile to try to preserve what is doomed, because we won’t be there to maintain it. Maybe breaking a link is only the projection of the end of our projection. Maybe we must accept breaking the web in the after-long-term.

Real tools

So, I was discussing with @Clochix the future of mobile/apps/the web, the costs for developers today of targeting as many platforms as possible, etc.
Then someone popped up in the discussion (and that’s cool, that’s the reason we’re having these discussions on Twitter) to comment on HTML5 tooling (the translation is mine):

devs prefer coding with real tools and users prefer real apps.

Clochix answered constructively, asking for a reference for this claim as well as definitions of “real tools” and “real apps” (“Facebook apps, LinkedIn apps and the iOS ecosystem”). That intervention pissed me off (“real”), and since I’m a human being who makes mistakes, I answered unconstructively that as a real dev, I only work with real butterflies.

I feel this topic deserves broader coverage, so I’d like to talk about my favorite tools for building web applications and software in general. Here is my list of real tools I use as a real developer, in no particular order.


My brain

So I like this tool that my brain is. It’s an amazing tool for thinking abstractly, modeling data, combining ideas. Our brains develop habits which usually make us faster and better at what we do over time. We have to be careful to unlearn those habits when they get in the way of moving forward, though.

With my brain, I can project myself into the future, imagine the maintainability problems in advance and find ways to avoid them in the present.

With my brain, I’m capable of cutting a problem into subproblems that can each be solved independently.

With my brain, I can bridge my understanding of a problem with the vocabulary the web platform provides to produce a solution to the problem.
My brain helps me remember this vocabulary. Since my brain is very limited in capacity, I occasionally dump its content to MDN so I can refer to it later. The MDN team is indeed making sure the website is up as much as possible.

This sounds like an obvious one, but too often I see devs rushing to their computers when a walk would probably be more effective.


Voice, email and patience

The vast majorit…. well… all the time, I build software for someone else’s needs. I must translate their intent into code. My main task is to get their intent out of their brain. If I fail at this task, the code I write is most likely a waste of time.
Among other difficulties, most of the time people don’t know what they want. Even if they do, they most often lack the proper vocabulary to express it. It’s a frustrating situation, but one we have to deal with all the time.

It takes discussing, exchanging, expanding people’s vocabulary. It takes building something, gathering feedback, iterating. Not doing so results in endless cycles of dissatisfaction about what’s being built… well… or you can screw your client over, but I always aim for success…

It’s a lot of work. Communication. Among other things, it involves cross-cultural concerns (not culture as in nationality, but culture in a broader sense), making sure the tone is right, seduction sometimes, manipulation on rarer occasions.
It helps a lot with not wasting time, though.

I quite intentionally plan to master this art. Much more seriously than I plan to master the command line. We’re building software for people.


Whiteboard

Aw maaaan, this is my favorite actually. In my second year of software engineering school, I got involved in a project with a group of students and we had a room assigned to us. It had a whiteboard. Team design discussions in front of a whiteboard are just the best!
I miss whiteboards… I miss a team too, but that’s another topic, which leads me to…


A team

Not sure how ethical it is to consider a team a tool, but moving on…
Other people are amazing at seeing the errors you make because they most often think from a different perspective. Just a couple of weeks ago, I thought of a project, and simply discussing it over lunch with Thomas Parisot made me realize it was much easier to do it as a Chrome extension than as something with a server (Clochix would be proud). Bim! Hours saved messing around with server settings and going back and forth between browser APIs and Node APIs, just like that, over lunch.

A team is good for code review, for keeping things moving when you’re not at your best, for telling you to write tests when you’d be lazy about it, for telling you how to promote your projects.

Teams are good.


Concluding note

Tools like SDKs and frameworks, I can Google them and, given a few hours, I’ll know what to pick. In a few hours, I can find an excellent way to customize my CLI, and I bet the code to do it is even open source.

As far as I’m concerned, the tools I listed in this post are the ones that make a massive difference in my productivity. The rest of my tooling is a replaceable distraction. Coding itself sometimes feels like an annoying burden after the work of understanding the problem is done.

Please wake me up when the non-debates about Grunt vs Gulp or Ember vs Angular or MVC vs MV*MCC|PVM² are over. I’m taking a nap so my brain can rest ;-)

Without a Server

A reply to some parts of this marvelous piece by Clochix.

I was wrong about the Web. I love HTTP, but it remains a client-server protocol, and therefore fundamentally centralizing. Any solution that involves installing a server is reserved for an elite, and therefore cannot be emancipatory. We need to push intelligence back to the edge, into people’s hands, into their smartphones rather than onto servers, even Web ones.

And instantly things click into place, with “but of course!” moments. The wording is a bit blunt, and some aspects deserve clarification and maybe discussion.

Poche is a PHP application you install on a server. I no longer want to install PHP applications on my servers. Nor Ruby or Python ones, for that matter.

Unfortunately, some applications require a server, because a trusted source is needed and each client could “cheat”. Multiplayer games are one example, but there are others. Some applications also require content to be available around the clock, which requires a server in one form or another.
I readily concede, though, that many applications don’t require a server at all.

Because hosting such an application demands daily vigilance. Being offline or busy elsewhere for a few days is enough to miss the announcement of a critical vulnerability in the software or the underlying stack, and to lose control of your server and your data. Vulnerabilities aside, platform evolutions can change default settings and require reconfiguring the applications. My current priorities leave me little time to devote to that.

Bim. Elite. It’s written right there. To set up your own server, you need time and skills to care about security that go beyond what can reasonably be expected of ordinary mortals.
And what’s the risk if you don’t have those resources? “Losing control of your server and your data”, in other words losing the whole point of having a computer :-s

Platform evolution is a somewhat separate problem, but as far as security is concerned, it’s a shame we are in this state. Security vulnerabilities are always the same and, worse, they almost systematically cost you control of your entire machine. WTF!

And yet we know how to build systems that, even in the presence of a vulnerability, don’t hand over control of the entire machine (the content is of much higher quality than the video stream) (another video on the topic). The very short version is that modularity also works for security. But we have to unlearn our habits from WordPress plugins that are allowed to butcher any part of the software or the data under the pretext of adding features.

We are also working on languages that don’t have the default security issues of past languages. Why the planet isn’t rushing to invest in Rust remains a mystery. Why Linux and all the infrastructure libraries aren’t being gradually rewritten in Rust at full speed remains a mystery.

In practice, we accept the risk that comes from open source projects’ nonchalance about security. Yes, the risk is smaller than with closed source, but it remains far higher than what we know how to do in 2013. We know how to build secure systems. We know how to build systems that don’t give away control of the machine because the JSON parsing module has a bug, or crash the machine because of a text rendering problem. We know how to do it, so let’s do it. Lack of security should not be a reason not to have your own server.

Alternatively, it can also be synchronized via the brilliant remoteStorage protocol (…) Synchronization in that case requires going through a server (…)

… which might look like a regression, since you need a server that can be hit over HTTP from anywhere on the planet. We are saved by the fact that this server component is normally small, which reduces the opportunities for security vulnerabilities. Using a standardized API guarantees continuity between the server and the clients.
What remoteStorage may be missing is the replication + encryption side of a Tahoe-LAFS, which would let us use Dropbox or Google Drive to store our data even if we don’t trust them!

But installing a server is not an answer that withstands the democratization of technology. With this democratization came the use of third-party servers and, quite logically it seems to me, today’s silos.

This idea is fascinating. The cost imposed on each person to be autonomous on the web is too high, so silos emerged in response to that difficulty, delivering the social benefits of the web (instant sharing, collaboration, etc.) by lowering the costs.
We supposedly lost the decentralized web by failing to recognize the barrier to entry that running your own server imposed. The challenge in reclaiming the web would be to lower that barrier.

Saying that to escape the silos it is “enough” to have your own server is as unrealistic as advising people to grow their own vegetable garden to escape the big agro-industrial chains. Fine on paper, very hard at scale in reality.

Instead of everyone having their own garden, we can also develop a more direct relationship with the producer, as is done with AMAPs (community-supported agriculture).

Incidentally, maybe many of our contemporaries don’t want to be autonomous, but feel perfectly fine in fully secured, sanitized, comfortable walled gardens. And maybe, with our servers-at-home stories, we are not trying to meet a need but to promote our vision of what would be good for the collected humanity.

This paragraph would deserve a book in reply :-) A first problem is that it is hard to judge whether our contemporaries want to be autonomous, notably because of a lack of web education. When everyone is educated about the web and still keeps choosing the silos, maybe then we will be able to conclude that we are trying to impose our vision rather than the common good of a commons that couldn’t care less.

But the food example seems to hold a lesson. We hear about several mass-retail scandals a year in the centralized media. Yet mass retail isn’t dying; its loyal customer base keeps it strong.
The linked article explains it all. It explains how a human limitation/weakness (the tendency not to change our habits), a neurological, biological tendency, is abused (hacked?) for profit. Abusing human neurology doesn’t seem like ground I want to venture onto. Wrongly perhaps, thereby guaranteeing a sure defeat?
Is humanity doomed to be devoured by those who legally exploit a neurological limitation?

But I digress too.

Even if all of humanity doesn’t want an a-centered internet, is that a sufficient reason not to work on it?
Let’s keep going. Even if it is only for the minority for whom it matters.

Building in Bordeaux

I spent about two years with my belongings in a bag, moving from place to place, never unpacking it fully.
I took some time to notice what other people were doing (building companies, projects, etc.) and realized that the somewhat nomadic lifestyle I had was getting in the way of achieving the same thing. To build stuff, one needs to build relationships, build trust, build friendships, to become part of a community or several; to take a place in a social graph. Building, I believe, is a collective process.

I guess the building started when Liam Boogar and I were working in a café, annoyed that it wasn’t really appropriate for work, yet the best option available. In a way, through our discussions, we sort of re-invented the idea of a co-working space. We tweeted about our thoughts, got contacted by Alexis Monville, and he told us the story of coworking in Bordeaux: at some point there were two associations, “Bordeaux Coworking” and “Coworking Bordeaux”… Just from the names, you can imagine the rest. They attempted to merge, but humans happened and all positive energy on the topic went away for some time.
Alexis also told us about another association called Aquinum (digital professionals, about 200 people). He also told us that the City of Bordeaux was planning to open a coworking space and had called for proposals from companies and associations, and that Aquinum was planning to send one. We got in touch with them. Liam moved to Paris, but I kept in touch with Aquinum. I worked on the proposal with them, we won, and I also became a member. That’s how Le Node started (though it opened about a year later for various technical reasons). Benjamin Rosoor, Aquinum’s president back then, tried to get the no.de domain for Le Node, but soon enough found out it was already taken :-) bxno.de ended up being the domain.

Wait, no. It also started at the second barcamp in Bordeaux, where I got to meet a good number of people. I remember the event and the people quite clearly. One person I met was a man slightly older than me, with somewhat long blond hair. We talked for a moment; among other things, I told him I was just starting a PhD. Later, when he was about to leave the event, he handed me his card and told me that if I was looking for work, I should contact him. He then remembered the PhD and added something like “…after your PhD… in three years”. There is something odd and far-sighted about offering work three years in advance, but so is Thomas Parisot.

As irony would have it, I left the PhD in late October and he posted a job offer in December. I answered, he and the other founders interviewed me, and I joined Dijiwan. It was a promising startup. It raised half a million to begin with, which is quite unusual in Bordeaux. I got to meet and work with Amar Lakel, Guillaume Marty and Nicolas Chambrier (who worked remotely from Lyon). When it came to building a product, to writing code, we felt invincible. We were a team.

In parallel, at one Happynum (an Aquinum gathering), I got to meet Suzanne Galy, with whom I became friends. Among her many talents, she is a journalist. We have had good discussions about education, and education in code and data (a topic dear to my heart).
One day, she told me that with the school of journalism in Bordeaux she had started the Data Journalisme Lab, a project in which student journalists produced pieces involving visualization of open data. I volunteered as a developer on two projects.
Not-so-secretly, the productions were a mere excuse to get journalists in touch with a different culture (design and code). They learned. So did I from interacting with them; probably more.

Dijiwan crashed in bad circumstances. Money ran out, and the CEO hid the situation from everyone, even the other co-founders, until I discovered it via a bounced check… followed by all the terrible decisions that come from someone who can’t admit to himself he is wrong. But all the employees stuck together; it certainly helped bring us closer.
Le Node opened at that time. People talk in the Aquinum/Le Node community. People knew what was happening at Dijiwan. I remember the solidarity and support. They couldn’t act, but they were supportive. We never really heard back from the CEO. I think he understood, in a way, that what he had done was not acceptable to this community, and he intuitively kicked himself out of it, and probably out of Bordeaux as well. Indirectly, I see a community able to organically push out those with unethical behavior. Maybe I’m over-interpreting. Maybe not.

Back from an event, Thomas decided to start BordeauxJS, a monthly meetup at Le Node dedicated to discussing JavaScript, front-end stuff, Node.js and web technologies in general. After the Dijiwan debacle, he left for London to work at the BBC and handed off the BordeauxJS organization to me. It is some work to organize, find speakers and communicate about the event, but it is always an enjoyable moment (damn! Need to find speakers for December and January!!).

Education in technology is dear to my heart… When I came back to Bordeaux, I tried to submit a project to the City of Bordeaux where I would go into schools to teach kids about computers. I expected the project to be judged on the idea and that they would open the schools’ doors for me. The first question I was asked was whether I already had a school director backing me… It didn’t happen.
Fast forward to Le Node’s first anniversary: I organized a Coding Goûter, an event where kids and parents come to learn the basics of programming. Following this event, Aquinum and I were contacted by a City of Bordeaux representative to see if we could work together to teach programming to kids in school… It won’t happen this time because there was too little time to prepare anything serious, but we have the contacts, we know the people and the intentions. Next time, most certainly!
But something more important happened at the first Coding Goûter. Hélène Desliens’ daughter kept programming afterwards. We had offered t-shirts (very good call, Chloe!) after the event and she wears hers proudly at school. Her parents told me she had found something she loved doing, a means of expression, something her sister doesn’t do; her own activity. They told me her self-confidence had grown… Shit… Something I did can have this sort of impact on people?

And many other events, many other people.

There has been a second Coding Goûter, and there will be others. I hope we will eventually work with the City too. It didn’t work out when I was on my own because I was on my own, part of nothing; a nobody. A graph with one node. Being part of Aquinum and Le Node, making the first Coding Goûter happen, were all part of creating the circumstances that made the same idea suddenly become something that has to happen. An idea is never good in itself. It is good only if it matches the right circumstances.
There will be other events, other projects like the Data Journalisme Lab. There will be other people, or other circumstances with the same people. Maybe other people will be inspired. And I can be part of it only because of the inertia created from being here, from staying, from building. From building trust, relationships, friendships, successes, and also from failures like Dijiwan.

I could start over somewhere else, but I would have to throw away all that inertia, the ability to organize a Coding Goûter at the snap of a finger. I could start over somewhere else, but I would have to give up what I have stayed here to build.
I’m not afraid of being unable to do it all over again somewhere else. I just don’t want to. I want to keep building on the existing foundations and see how far things go.

Not-so-coincidentally written while listening to these songs

The W3C is a restaurant

A fantasy lives in the heads of a large number of people about the role of the W3C in the web ecosystem: that the W3C decides what gets into the web platform and what doesn’t, or even that the W3C sets a vision for the web. This is completely false.

History of an API

XMLHttpRequest, am I right? If you think it is an innovation from the W3C, you are wrong and should read up on its history. XMLHttpRequest shipped in IE5 in March 1999 (accessible via ActiveX, but whatever). Mozilla mimicked the interface via a global XMLHttpRequest object and worked on it between December 2000 and June 2002.
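For reference, the era-appropriate feature-detection dance looked roughly like this (a minimal sketch, not taken from any spec):

    // Cross-browser XMLHttpRequest creation as it looked back then:
    // Mozilla-style global object first, IE5/IE6 ActiveX fallback.
    function createXHR() {
      if (typeof XMLHttpRequest !== 'undefined') {
        return new XMLHttpRequest();
      }
      return new ActiveXObject('Microsoft.XMLHTTP');
    }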

The W3C came into play by publishing a first draft as late as April 5th, 2006. The W3C played its one and only role, which is setting up a table for implementors to discuss what goes in the spec based on what has been implemented and what websites rely on to work. This is the history of the vast majority of the APIs we now have on the web (and one of the reasons they suck).

Necessary side note

Having had this discussion dozens of times, at this point I know what a lot of readers are thinking: “this is dirty! The web would be much better if the W3C decided on the APIs upfront and implementors arrived later in the game.” But this is not how things work. This is not how the web works. Get over it; otherwise we are all wasting time discussing a fantasy world.

The reason the W3C has no decision power is that it has no mechanism to prevent implementors from doing whatever they want. Implementors decide; the W3C watches from the sidelines.

A bunch of tables

The W3C does not make a single decision when it comes to the web. It tried to push XML, implementors had no interest, so the W3C failed and XML use on the web is now largely anecdotal and absent from modern front-end developer practice.

The W3C is a restaurant. It has a lot of tables, and a bunch of people paying to be there and discuss at the tables. The W3C occasionally moderates discussions, but that is its only impact on the web platform. The people around the tables make the decisions, not the restaurant.

Sentence wording

“The W3C supports DRMs and is WRONG!!!!1!!1!§”. If a restaurant happens to have racist customers, does it support racism? Not really. Should a table be refused to those whose opinions differ from ours? History has shown that discriminatory policies on whom to accept in restaurants never ended well.
The W3C agreed to set a table for people who want to talk about protected content, but never forget that the implementors around the table make the decisions, not the W3C. If the W3C didn’t set the table, chances are the implementors would happily find another restaurant…

A post you wish to read before considering using MongoDB for your next app

This post details lessons learned from an eight-month experience with MongoDB. Although I’m writing this article alone, my experience with MongoDB was in the context of teamwork, so I’ll often be using “we” in this article.

In this blog post, I’ll be sharing experience with MongoDB that you can only get from working with it for quite some time. It will also contain some understanding of upcoming MongoDB improvements, as well as general thoughts on data modeling.

I’ll mostly be focusing on downsides, but keep in mind that I’m only sharing my own experience, which may not apply to your case. Also, I haven’t worked with all of MongoDB’s features. In particular, I have never used sharding or the other scalability-related features, so I have no opinion whatsoever on those.

Quick intro to MongoDB

Core concepts

In MongoDB, a database is composed of collections (roughly the equivalent of tables for those used to SQL). A collection contains documents, which are JSON objects (with some additions). This means that a document can represent objects in the naive JavaScript sense (a mapping between strings and values) and even arrays, but also nested objects (when values are themselves objects or arrays).

Among the document values, you have the classic strings, numbers and booleans, but also ObjectIDs. Each document in a collection is uniquely identified by an ObjectID, and you can store a reference to a document as a value using this ObjectID, which is nothing more than an object wrapper around a unique string.

That’s about it for the core concepts. In a nutshell, it means that when you model your data, you have to think in terms of tree-shaped entities that you put in collections. Cross-collection references are possible using ObjectIDs (there is a DBRef thing which isn’t strictly necessary and is just a built-in structure bundling an ObjectID and a collection name).
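To make this concrete, here is what a document might look like in the mongo shell (a minimal sketch; the collection and field names are made up for illustration):

    // A document in a hypothetical "posts" collection: nested objects,
    // arrays, and an ObjectID reference pointing into a "users" collection.
    db.posts.insert({
      title: "Hello",
      author: ObjectId("5099803df3f4948bd2f98391"),  // reference to a db.users document
      tags: ["mongodb", "blog"],
      stats: { views: 42, likes: { total: 3 } }      // nested objects
    });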

Queries

MongoDB has its own query system. It does the job for the most part, with a couple of limitations. One good point is that you can query deeply nested data and retrieve just the deep parts you need without pulling the entire document.

One minor limitation is very specific (it occurs when you have nested arrays) and is planned to be fixed. Another minor limitation is that since the query language relies on dots, dots can’t be used in object keys. This is usually not a problem, except when you want to use URLs as keys, as was my case. It should be noted that, apparently, when using the update primitives (like $push), dot-keyed objects can be inserted anyway, putting your database in a rather inconsistent state (which is how I discovered the dot restriction :-s Needless to say, the error message was rather cryptic).

The second limitation is more fundamental: any query you run is performed on a single collection. In a nutshell, there is no equivalent to the SQL JOIN. This has a very important consequence for any application written on top: whenever you need data from different collections, using the ID values in one to reach objects in another, you need to make several round trips between your application and your database. That’s fine-ish if they’re on the same server; it’s an outrageous constraint when they are on different servers, because it means network latency is imposed by a database design choice.
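Two shell sketches to illustrate (same hypothetical collections as above): dot notation reaches into nested documents, while the missing JOIN becomes an application-level lookup costing an extra round trip.

    // Deep query with dot notation: no need to pull whole documents apart client-side.
    db.posts.find({ "stats.likes.total": { $gt: 0 } });

    // No JOIN: resolving a post's author takes a second query,
    // i.e. a second application/database round trip.
    var post = db.posts.findOne({ title: "Hello" });
    var author = db.users.findOne({ _id: post.author });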

MongoDB’s justification is that there is no need for joins because data is denormalized in MongoDB. I have to disagree. First off, data modeling is hard and people make mistakes. Sometimes, what should have been one collection is two. Changing a data model is a perilous task, and having to pay a systematic performance cost for it is unacceptable. Then, from experience, not every application’s data model can be represented as trees. Some are graphs, true graphs. And when that’s the case, I don’t see why Mongo should impose a performance cost (either in round trips or in data duplication); that’s just ridiculous.

That reason alone is enough for me not to recommend Mongo. Although sold as flexible, the inability to make multi-collection queries makes Mongo quite inflexible. This got me thinking that, from now on, I will be looking at databases that make it possible to run any query in one round trip.

The no-schema fantasy and reality

Data in the database

MongoDB, as one champion of the NoSQL movement, claims that you don’t need a schema for your data. It is true, from a technical standpoint, that MongoDB doesn’t enforce a model. In practice, usage and maintainability mean that within the same collection you tend to structure objects roughly the same way. Mongo allows optional fields where SQL doesn’t, but that’s pretty much where the difference stops in practice.

I have to share that in one case there was one field on which we had decided to do no validation at all, because we knew that the data meant to be stored there would change over time. It did change, and having the freedom to store whatever we needed in this field at “no cost” really was a life-saving feature in our context.

Storing objects from code without a model quickly leads to a maintenance issue: since you don’t have a model, you start wondering “do we already have a field X in collection Y?” or “how is field Z structured, b.t.dubs?”, which inevitably forces you to write schema documentation to compensate for the lack of an enforced schema.

Once you have acknowledged that you do need a form of schema anyhow, an interesting question is where to set the bar of strictness. Imposing a schema at the database level is a development burden that devs using SQL experience the hard way. The reality is that the data model of an application changes. Maybe because the needs change, maybe because the current design is imperfect. The data model does change, and the cost of this reality is too high in SQL. But no schema at all, or a documentation-only schema, puts too high a cognitive burden on the developer.

The middle ground we found was to use Mongoose, which lets you programmatically define a model in a fairly readable and somewhat declarative way. Documentation can easily be added as comments to the schema declaration, and data integrity checks can be performed at runtime. I have my share of rants against Mongoose specifically, but for the most part it does the 80% of the job you need to move on safely and conveniently in your communication with the database.
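For illustration, a minimal Mongoose schema sketch (the model and field names are hypothetical); the schema doubles as documentation and runs integrity checks at write time:

    var mongoose = require('mongoose');

    // Field types, defaults and references are declared (and documented) in one place.
    var postSchema = new mongoose.Schema({
      title:  { type: String, required: true },
      author: { type: mongoose.Schema.Types.ObjectId, ref: 'User' }, // cross-collection reference
      tags:   [String],
      views:  { type: Number, default: 0 }
    });

    var Post = mongoose.model('Post', postSchema);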

When the world around the database changes

So you have an application and a database. Your application necessarily relies on assumptions about your database: collection names, field names, value types, value ranges, the shape of nested structures, etc. At some point, the application written on top of the database will need the database model to change. In SQL, that systematically means an “ALTER TABLE” or equivalent operation. Interestingly, in Mongo, some classes of changes require no change to the database whatsoever. For example, if you add a field to new documents in a collection, you don’t need to add it to all existing documents. Depending on the case, you can ignore the missing field or deal with it at the application level.

But of course, for other classes of changes, you’ll need to write a script to reorganize the data. To the best of my knowledge, there is no way to send a script to the database so that it reorganizes itself in situ. You need to pull the data, remodel it and push it back. Needless to say, it feels really dirty. I hope I have missed something.
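In practice, the pull/remodel/push dance looks something like this mongo shell sketch (field names are hypothetical); each document travels to the client and is written back individually:

    // Move a flat "likes" counter into a nested "stats" sub-document.
    db.posts.find({ likes: { $exists: true } }).forEach(function (doc) {
      db.posts.update(
        { _id: doc._id },
        { $set: { "stats.likes.total": doc.likes }, $unset: { likes: "" } }
      );
    });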

Regardless, the problem seems roughly equivalent in SQL. Data model changes are part of the development flow, and I wish they were treated as a first-class use case in database designs. I’m not an expert in this field; if I’m plain wrong, feel free to share your knowledge in the comments.

The Map-Reduce Eldorado

MapReduce promises huge performance gains on the condition of writing code in a certain way (I’m giving a practical definition, not a formal one here). Be aware it is not a silver bullet applicable to every problem.
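For the curious, here is roughly what the API looks like (a mongo shell sketch; the collection and field are hypothetical):

    // Count posts per author: map emits key/value pairs, reduce folds
    // the values emitted for a single key.
    var map = function () { emit(this.author, 1); };
    var reduce = function (key, values) {
      var total = 0;
      values.forEach(function (v) { total += v; });
      return total;
    };
    db.posts.mapReduce(map, reduce, { out: { inline: 1 } });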

I have found the API more complicated than necessary, but once you understand it, you’re good to go; I’ll let you judge for yourself. At least the map, reduce and finalize functions can be written in JS… I mean… sort of… which brings me to:

Third-world JavaScript

The JS engine is SpiderMonkey 1.7 (the one shipped with Firefox 2, for the curious). It means you have no Object.keys, only Array#forEach as an array extra, etc. It feels like writing grandma’s JS when you actually know the language and the goodies added in ES5.
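For instance, anything ES5 gives you for free has to be rewritten the old way (a tiny sketch):

    // Pre-ES5 substitute for Object.keys(obj).
    function keys(obj) {
      var result = [];
      for (var key in obj) {
        if (Object.prototype.hasOwnProperty.call(obj, key)) {
          result.push(key);
        }
      }
      return result;
    }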

In theory, all map and reduce calls could be performed independently in parallel; that’s why you need to write your code with some discipline. In practice, on a single MongoDB install, all operations occur in sequence because of SpiderMonkey, as I have read somewhere. That, and SpiderMonkey 1.7’s raw performance, of course…

Good news: there is a plan to switch to V8, which should solve all the JS-related problems. Bad news: it is unclear when the switch will actually happen.

Debugging

The debugging experience is very painful. There is no proper debugger and no console.log. The best tool you have is conditional throwing, because the only thing you can get back from a MapReduce operation is the final data or the first uncaught error (which stops the operation). That’s annoying when your map, reduce or finalize function grows beyond 30 lines. I got to the point where I wrote a small browser-based emulator and pulled data out of the database to test my MapReduce code in an environment where I could debug it (I should probably open source that, actually…).
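The conditional-throw trick looks something like this sketch (the key is hypothetical); the thrown string is the only channel back besides the final output:

    var reduce = function (key, values) {
      var total = 0;
      values.forEach(function (v) { total += v; });
      // Poor man's breakpoint: surface intermediate state as the command's error.
      if (key === "some-suspicious-key") {
        throw "DEBUG " + key + " -> " + values.join(",");
      }
      return total;
    };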

… and a couple of noobish design mistakes

The points here are not major drawbacks, just some WTFs I came across.

Can’t rename a database

Self-explanatory title. No need to explain why this feels like such a basic feature that should have been implemented as part of Mongo 0.0.1. Are we really at Mongo 2.2 and still can’t rename a database? To track the bug.

Global lock

Up to and including Mongo 2.0, there is a global MongoDB lock. It means that write (or write+read) operations on completely unrelated databases on the same install can’t be performed concurrently. I’m sorry, but this is plain stupid.

As of Mongo 2.2, there is a per-database lock, which means that operations on independent collections within the same database can’t be done concurrently. Same issue as before, at a different level (the one that matters when you have only one database). They are working on locks for finer-grained entities. I have no information on when this will happen, but they seem committed to fixing the problem.

Safe off by default

I have read in some blog posts that safe mode is off by default. I haven’t experienced data loss, so I have no clue what the danger of being in “unsafe mode” actually is. It can be changed anyway, but it doesn’t seem like a good default value.

Additional readings

Some insightful reading. The information in these posts partially overlaps with one another and with what I have said already.

Conclusion

MongoDB has some major design mistakes. Its developer experience is overall much better than SQL’s, but I still feel dissatisfied with both and wouldn’t recommend either for now. Sadly, I don’t know of any database software I would recommend. I’ll likely be interested in exploring Neo4j next. I’m open to suggestions, of course.

The web performance story

This blog post is mostly a reaction to Daniel Clifford’s Google I/O 2012 talk, “Breaking the JavaScript Speed Limit with V8”. I’m reacting only to this talk. I don’t know Daniel Clifford and I have nothing against him personally whatsoever. I think V8 is an excellent product; that’s not what I’m talking about. I would have things to say about the “optimize your code for engine X” trend, but I won’t even go there.

I’d like to take a step back to talk about performance in general, as well as criticize a bit how things are presented in the talk. Once again, although I’ll be mentioning Daniel Clifford in this post, I’m criticizing the talk and not the person.

Why does performance matter?

Daniel Clifford’s answer is that better performance allows doing things that weren’t possible before, because bad performance was getting in the way of a good experience. I agree with this analysis.

One consequence is that you should care about performance only if it currently degrades the experience of the product you’re building or gets in the way of an improvement. If that’s not the case, move on. I’m serious about that point. Too often in technical arguments, people talk about performance when it actually doesn’t matter, since improving it wouldn’t noticeably improve the user experience.

Performance in software

If we want to talk about web performance, let’s start from the beginning, because improving your JavaScript code isn’t the first thing to care about.

Software architecture

In my opinion, major performance improvements are achieved with good software architecture (this is also true outside the web context). This field is a complicated art, very specific to your application, your needs and your constraints. No one will ever cover it in a talk, because it is too specific. An architecture consists of deciding which components make up your system, which role each plays and how they communicate. This is complicated.

Algorithms

This one is also independent of the context of the web. Know your classic algorithms. Know when to use them. Also know when not to use them. For short lists, a bubble sort can be faster than a quick sort.

Under “algorithms”, I also include knowing when to use parallelism and knowing the different sorts of parallelism. This isn’t easy; distributed algorithms are not the easiest thing, even in a master/workers paradigm.

I’ll get back to this point about algorithms :-)

Web performance

The web has some well-known constraints, which imply things you need to know in order to improve performance.

Network

As said elsewhere, no matter how much faster computers (CPUs) get, the speed of light won’t increase, and the distance between Bordeaux and San Diego is going to stay the same (continental drift aside). This creates an unbreakable physical bound on the speed at which you can transmit a web page. We are not even close to reaching that limit, since the transmitted information doesn’t take the shortest path and rarely travels at light speed.

As a consequence, reduce the amount of communication and round trips to the minimum. This will be your first major win. Among the practical tips, use HTTP caching and reduce asset sizes.
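As a sketch of what this means in practice (Node.js, with hypothetical URLs): serving fingerprinted assets with a far-future Cache-Control header lets repeat visits skip the round trip entirely.

    var http = require('http');

    http.createServer(function (req, res) {
      if (req.url.indexOf('/assets/') === 0) {
        // Fingerprinted assets never change, so the browser may cache them for a year.
        res.setHeader('Cache-Control', 'public, max-age=31536000');
      }
      res.end('...');
    }).listen(8080);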

The DOM and graphics

Since JavaScript has gotten fast, the bottleneck of web scripting is the DOM. DOM objects are weird beasts whose access isn’t as efficiently optimized as that of ECMAScript objects. Also, manipulating DOM objects often triggers graphics operations, which are costly. Change things on the screen only when necessary.

Touch the DOM only when needed. Specifically, don’t use the DOM to store information (in data-* attributes). Use your own objects.
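A small sketch of both points (the element id and data are made up): state lives in plain JavaScript objects, and DOM writes are batched into a single insertion.

    // Source of truth stays in JS objects, not in data-* attributes.
    var items = [{ id: 1, label: 'one' }, { id: 2, label: 'two' }];

    var fragment = document.createDocumentFragment();
    items.forEach(function (item) {
      var li = document.createElement('li');
      li.textContent = item.label;
      fragment.appendChild(li);
    });
    // One DOM mutation instead of one per item.
    document.getElementById('list').appendChild(fragment);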

Now, let’s talk about JavaScript

The first piece of advice is to know the language. It takes time, but it’s worth it, to avoid inefficiently reimplementing what the language can do efficiently for you.

Then, write clean, modular code (a sort of follow-up to the “good architecture” advice above); it will do most of the job of avoiding useless computation.

If you have followed all the advice above in the order given (and that’s a LOT of work!) and your application still has a performance issue, then you can start considering Daniel Clifford’s advice.

Partial conclusion

Above, I’ve tried to step back from performance to explain that engine-specific optimizations are the last thing to take care of (because they yield the smallest benefit compared to everything else), since the talk was a bit elusive on that point.

It doesn’t mean that knowing JS engines and how to write efficiently for them is a stupid idea. It just means it’s not the first thing you should care about; as I said above, in my opinion, it’s the last.

Critique of the rest of the talk

The performance problem to be solved

Compute the 25000th prime number. Quote from the talk (emphasis added):
“I have put together a sample problem that I’d like to talk about throughout the course of this talk. It is a toy problem that I come up with but I think it’s representative of some of the things of some of the performance problem that you might face in your own code”

In a very American-bullshit-communication style, we have here a sentence that is purposefully vague and says exactly nothing, but uses a lot of well-chosen, confusing words to do so (from what I know, it’s very much a cultural thing). Remove the sources of vagueness (“I think”, “some” and “might”) and you have a sentence that’s wrong. The vast majority of web developers do not have this problem, or any problem related to this sort of heavy computation. For most, the biggest client-side computation issue is probably sorting a bunch of table rows.

I have to admit I’m annoyed by this attempt at a false justification. Acknowledge that JavaScript performance isn’t an issue for the majority of webdevs and move on! Anyway…

The benchmarked algorithm and the talk conclusion (WARNING: SPOILER!)

After the description of the problem comes the description of an algorithm to solve it. The algorithm is naive, but the talk isn’t about algorithms, so that doesn’t matter at all.

The talk gives tips for improving JavaScript code performance. Most of them are legitimate and fortunately align with good practices. One that brings a major improvement is fixing a bug in the code: the code was reading out of array bounds. Hmm… The example is interesting, but that’s not a performance improvement, that’s just fixing a bug. V8 hasn’t made your code faster; you just made your code correct. It’s dishonest to compare things that don’t do the same work.

And the conclusion. One of the biggest scams of all time in the history of tech talks. The algorithm is changed from a linear one to a square-root complexity one. Guess what! A 350x speedup! No kidding!
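For context, the algorithmic change boils down to testing divisors only up to the square root of the candidate, something like this sketch (not the talk’s exact code):

    // O(sqrt(n)) primality test instead of trying every smaller number.
    function isPrime(n) {
      for (var d = 2; d * d <= n; d++) {
        if (n % d === 0) return false;
      }
      return n > 1;
    }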

I guess it’s a Google I/O talk, so you have to put on a show and deliver takeaway numbers so that people can leave the talk saying “with V8 you can achieve a 350x speedup!”, but once again, that’s just dishonest. If your product is really good, why do you need to lie to or confuse people? My question is genuine.

Since changing the algorithm is fair game to solve the problem faster, I’d like to propose another algorithm which is simple and which I’m convinced will be even faster: function answer(){return 287107;}. But I feel too lazy to benchmark this one…

Conclusion

The talk has valuable advice and I recommend it for that content. However, I’m really unsatisfied with how the talk is constructed and “packaged”. An explanation of web performance priorities at the beginning would have been good. Also, the couple of scams the talk contains are sad, mostly because they are unnecessary.