Distributed companies vs time zones

Time zones are the arch-rival of distributed companies (or rather, the Earth being round is, but I digress into the meaning of time zones).

When you run a distributed company and you hire people, you might be tempted to hire from all over the world, but there’s a problem. If you hire someone who lives 12 time zones away from you, you’ll barely interact with that person, and suddenly the company moves at a one-day cadence instead of an immediate one.

The second problem is that those people on the other side of the world will develop their own culture, their own style, and an us-vs-them mentality in which you, the company, are them. All unwanted traits.

My advice, when getting started, is not to hire anyone more than 4 time zones away, so that you still end up with half a working day of overlap. Only after you have very trusted, senior managers 4 time zones away can you hire people 8 time zones away, who will have a 4-hour overlap with those managers. Repeat until you have all time zones covered by trusted senior managers who can carry the culture of your company.

This means that you are unlikely to go fully global until you are at least 50 people, and even then that’s pushing it. I’d be uncomfortable with a fully global company that’s smaller than 200 people.
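The overlap arithmetic behind the 4-time-zone rule can be sketched in a few lines of Python. This is only an illustration: the function name `overlap_hours` is made up, and it assumes everyone keeps the same 8-hour local schedule (say 9:00 to 17:00), which real teams rarely do exactly.

```python
def overlap_hours(offset_hours, workday_hours=8):
    """Hours per day that two people's working days overlap.

    Assumes both keep the same local schedule (say 9:00 to 17:00) and
    that the only difference between them is the time zone offset.
    """
    # Offsets wrap around the globe: 20 zones east is 4 zones west.
    distance = abs(offset_hours) % 24
    distance = min(distance, 24 - distance)
    return max(0, workday_hours - distance)


# Headquarters versus hires at increasing distances.
for offset in (0, 4, 8, 12):
    print(f"{offset:>2} time zones away: {overlap_hours(offset)}h of overlap")

# Someone 8 zones from headquarters is only 4 zones from a manager
# who sits 4 zones out, so they still share half a day with that manager.
print(overlap_hours(8 - 4))
```

Under these assumptions, 4 time zones leaves half a working day in common, while 8 or 12 leave none with headquarters, which is why the intermediate managers matter.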

How I’m testing seed data generation

When I create a new Rails project, I like to have a robust set of seeds that can be used to quickly bootstrap development, testing and staging environments so that people can interact with the application. I think this is critical for development speed.

If a developer creates a feature to, for example, connect two records together, you just want them to fire up the application and connect two records to see it work. You don’t want them spending time creating the records because that’s a waste of time, but also, because all developers end up having different subsets of testing data and generally ignoring everything that’s not included in them. It’s better to grow a set of testing data that’s good and complete.

One of the problems I run into is that generating testing data, or sample data, doesn’t happen often enough, and changes in the application often break it. Because of that, I wrote this simple test:

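require "rake"

RSpec.describe "rake db:seed" do
  it "runs" do
    # Load the app's Rake tasks unless an earlier spec already did.
    Imok::Application.load_tasks if Rake::Task.tasks.none? { |t| t.name == "db:seed" }
    # Keep the seeds quiet; this assumes they check ENV["verbose"].
    ENV["verbose"] = "false"
    # No assertions: the example fails if seeding raises.
    Rake::Task["db:seed"].invoke
  end
end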

It doesn’t have any assertions, but with just a few lines of code it probably covers 90% of seed data creation problems, which generally result in a crash. I would advise against having assertions here: they may cost more time than they save, because sample data evolves a lot and it’s not production critical.

The startup CTO dilemma

About 10 years ago I took my first job as CTO but I wasn’t a CTO, I just had the title. I was a developer with ambition. I made mistakes, very expensive mistakes, mistakes that contributed to the failure of the startup. Since then I have learned and grown a lot and although there’s still a lot for me to learn, there are some things I understand reasonably well. One of those is how to be the CTO of an early and not so early stage startup.

With this experience, though, my salary went up. I’m more expensive now than I was 10 years ago, when I didn’t know what I was doing. Because of this, I tend to evaluate working for a startup not on day 1, but on day 700 or later, when they have some traction, revenue, etc. The problem is that a lot of those startups are deep in problems that are very hard or impossible to fix by that point. It’s very painful for me to see expenses that cost hundreds of thousands of dollars because someone didn’t do 30 minutes of work 5 years ago (this is a real example).

So, the dilemma is this:

  • If a startup hires an experienced CTO from day 1, they are wasting money, because that person might only spend 5% or 10% of their time CTOing and the rest coding, doing IT, etc., which can be done by a less experienced developer.
  • If a startup doesn’t hire an experienced CTO from day 1, they are likely to make very expensive mistakes that may literally kill the startup in year 3, or slow it down a lot.

I’ve been thinking about this for a while: how can this be solved?

One of my ideas was being a sort of CTO enhancer, to be the voice of experience for a less experienced co-founding CTO, helping them a few hours a week for a few months up to a couple of years. What do you think? Does this sound valuable? Useful?

I might be thinking a lot about this lately since I’m leaving my current job and searching for the next thing to do.

Nicer printing of Rails models

I like my models to be printed nicely, showing the class of the model as well as the id and other data, so that when they end up in a log or console, I know exactly what they are. I’ve been doing this since before Rails 3, and since Rails projects now have an ApplicationRecord class, it’s even easier.

On my global parent for all model classes, ApplicationRecord, I add this:

def to_s(extra = nil)
  if extra
    extra = ":#{extra}"
  end
  "#<#{self.class.name}:#{id}#{extra}>"
end

That makes all records automatically print as:

#<ModelName:id>

For example:

#<User:123>

which makes the record findable.

It also allows subclasses to add a bit of extra information without having to re-specify the whole format. For example, for the User class, it might be:

  def to_s
    super(email)
  end

so that when the user gets printed, it ends up being:

#<User:123:sam@example.com>

which I found helps a lot with quickly debugging issues, in production as well as development environments.

Editing Rails 6.0 credentials on Windows

Rails 6 shipped with a very nice feature to keep encrypted credentials on the repo but separate them by environment, so you can have the credentials for development, staging and production, encrypted with different keys, that you keep safe at different levels.

For example, you might give the development key to all developers, but the production keys are kept very secret and only accessible to a small set of devops people.

You edit these credentials by running:

bundle exec rails credentials:edit --environment development

for the development credentials, or

bundle exec rails credentials:edit --environment production

for production ones. You get the idea.

When you run it, if the credentials don’t exist, it generates a key. If they exist, you need to have the keys. After decrypting, it runs your default editor and on Windows, this is the error I was getting:

No $EDITOR to open file in. Assign one like this:

EDITOR="mate --wait" bin/rails credentials:edit

For editors that fork and exit immediately, it's important to pass a wait flag,
otherwise the credentials will be saved immediately with no chance to edit.

It took me a surprisingly long time to figure out how to set the editor on Windows, so, for me and others, I’m documenting it in this post:

$env:EDITOR="notepad"

After that, running the credentials:edit command works and opens Notepad. Not the best editor by far, but for these quick changes, it works. Note that this only sets the variable for the current session. Oh, and I’m using PowerShell. I haven’t run cmd in ages.

Converting Python data into a ReStructured Text table

This probably exists, but I couldn’t find it. I wanted to export a bunch of data from a Python/Django application into something a non-coder could understand. The data was not going to be a plain CSV, but a document, with various tables and explanations of what each table is. Because ReStructured Text seems to be the winning format in the Python world, I decided to go with that.

Generating the text part was easy and straightforward. The question was how to export tables. I decided to represent tables as lists of dicts and thus, I ended up building this little module:

from collections import defaultdict
from io import StringIO


def dict_to_rst_table(data):
    field_names, column_widths = _get_fields(data)
    with StringIO() as output:
        output.write(_generate_header(field_names, column_widths))
        for row in data:
            output.write(_generate_row(row, field_names, column_widths))
        return output.getvalue()


def _generate_header(field_names, column_widths):
    with StringIO() as output:
        for field_name in field_names:
            output.write(f"+-{'-' * column_widths[field_name]}-")
        output.write("+\n")
        for field_name in field_names:
            output.write(
                f"| {field_name} {' ' * (column_widths[field_name] - len(field_name))}"
            )
        output.write("|\n")
        for field_name in field_names:
            output.write(f"+={'=' * column_widths[field_name]}=")
        output.write("+\n")
        return output.getvalue()


def _generate_row(row, field_names, column_widths):
    with StringIO() as output:
        for field_name in field_names:
            output.write(
                f"| {row[field_name]}{' ' * (column_widths[field_name] - len(str(row[field_name])))} "
            )
        output.write("|\n")
        for field_name in field_names:
            output.write(f"+-{'-' * column_widths[field_name]}-")
        output.write("+\n")
        return output.getvalue()


def _get_fields(data):
    field_names = []
    column_widths = defaultdict(lambda: 0)
    for row in data:
        for field_name in row:
            if field_name not in field_names:
                field_names.append(field_name)
            column_widths[field_name] = max(
                column_widths[field_name], len(field_name), len(str(row[field_name]))
            )
    return field_names, column_widths

It’s straightforward and simple. It currently cannot deal very well with cases in which the dicts have different sets of columns: a row that lacks one of the keys raises a KeyError.
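One possible way around the different-columns limitation is to look up each cell with `dict.get` and a filler value. The following is a compact sketch, not the module above: the name `dicts_to_rst_table` and the `missing` parameter are made up for illustration, and it assumes every cell fits on a single line.

```python
def dicts_to_rst_table(rows, missing=""):
    """Render a list of dicts as an RST grid table, tolerating missing keys."""
    # Collect column names in first-seen order across all rows.
    field_names = []
    for row in rows:
        for name in row:
            if name not in field_names:
                field_names.append(name)

    # Normalise every row to the full set of columns, as strings.
    cells = [[str(row.get(name, missing)) for name in field_names] for row in rows]

    # Each column is as wide as its header or its widest cell.
    widths = [
        max([len(name)] + [len(cell_row[i]) for cell_row in cells])
        for i, name in enumerate(field_names)
    ]

    def border(char):
        return "+" + "+".join(char * (w + 2) for w in widths) + "+\n"

    def line(values):
        return "|" + "|".join(f" {v.ljust(w)} " for v, w in zip(values, widths)) + "|\n"

    out = border("-") + line(field_names) + border("=")
    for cell_row in cells:
        out += line(cell_row) + border("-")
    return out


print(dicts_to_rst_table([
    {"name": "Alan Kay", "language": "Smalltalk"},
    {"name": "John McCarthy"},  # no "language" key: rendered as an empty cell
]))
```

Filling the gap with an empty string keeps the grid rectangular; whether an empty cell or a marker such as "n/a" is the better filler depends on who reads the report.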

Should this be turned into a reusable library?

WordPress.com new editor handles titles right

I like my text properly formatted, and with pretty much every CMS editor out there I always have some confusion when it comes to titles. The post or page has a title, and then the sections inside it also have titles, which are second-level titles. In most CMSs you have the option of titles or headers from level 1 through 6 (which map to h1, h2, through h6 in HTML).

For example, this is Confluence, a tool that I really like:

The confusion that I get here is whether a subsection of this page should have Heading 1 or Heading 2. Sometimes Heading 1 will be displayed with the same font, size, etc. as the title of the page, so by using Heading 1 you are almost creating two pages in one. But in other CMSs, Heading 1 is the top-level section heading and the title of the page is a special title that will always sit above it.

The new WordPress.com editor does this correctly by being very clear that your first option for section headers, after setting the title of the page or post, is h2:

I think that was a very neat solution to the problem. Bravo WordPress.

Turning a list of dicts into a ReStructured Text table

I recently found myself having to prepare a report of some mortgage calculations so that non-technical domain experts could read it, evaluate it, and tell me whether my math and the way I was using certain APIs were correct.

Since I’m using Python, I decided to go as native as possible and make my little script generate a ReStructured Text file that I would then convert into HTML, PDFs, whatever. The result of certain calculations ended up looking like a data table expressed as a list of dicts, all with the same keys. I wrote a function that would turn that list of dicts into the appropriately formatted ReStructured Text.

For example, given this data:

creators = [{"name": "Guido van Rossum", "language": "Python"}, 
            {"name": "Alan Kay", "language": "Smalltalk"},
            {"name": "John McCarthy", "language": "Lisp"}]

when you call it with:

dict_to_rst_table(creators)

it produces:

+------------------+-----------+
| name             | language  |
+==================+===========+
| Guido van Rossum | Python    |
+------------------+-----------+
| Alan Kay         | Smalltalk |
+------------------+-----------+
| John McCarthy    | Lisp      |
+------------------+-----------+

The full code for this is:

from collections import defaultdict

from io import StringIO


def dict_to_rst_table(data):
    field_names, column_widths = _get_fields(data)
    with StringIO() as output:
        output.write(_generate_header(field_names, column_widths))
        for row in data:
            output.write(_generate_row(row, field_names, column_widths))
        return output.getvalue()


def _generate_header(field_names, column_widths):
    with StringIO() as output:
        for field_name in field_names:
            output.write(f"+-{'-' * column_widths[field_name]}-")
        output.write("+\n")
        for field_name in field_names:
            output.write(f"| {field_name} {' ' * (column_widths[field_name] - len(field_name))}")
        output.write("|\n")
        for field_name in field_names:
            output.write(f"+={'=' * column_widths[field_name]}=")
        output.write("+\n")
        return output.getvalue()


def _generate_row(row, field_names, column_widths):
    with StringIO() as output:
        for field_name in field_names:
            output.write(f"| {row[field_name]}{' ' * (column_widths[field_name] - len(str(row[field_name])))} ")
        output.write("|\n")
        for field_name in field_names:
            output.write(f"+-{'-' * column_widths[field_name]}-")
        output.write("+\n")
        return output.getvalue()


def _get_fields(data):
    field_names = []
    column_widths = defaultdict(lambda: 0)
    for row in data:
        for field_name in row:
            if field_name not in field_names:
                field_names.append(field_name)
            column_widths[field_name] = max(column_widths[field_name], len(field_name), len(str(row[field_name])))
    return field_names, column_widths

Feel free to use it as you see fit, and if you’d like this to be a nicely tested, reusable pip package, let me know and I’ll turn it into one. One thing that I would need to add is making it more robust to malformed data and able to handle more cases of data that looks different.

If I turn it into a pip package, it would be released from Eligible, as I wrote this code while working there and we are happy to contribute to open source.
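As a rough idea of what "more robust to malformed data" could mean, here is a small pre-processing sketch that could run before the rows reach a function like `dict_to_rst_table`. The helper name `clean_rows` and its behaviour (rejecting non-dict rows, blanking `None`, collapsing line breaks) are assumptions chosen for illustration, not part of the code above.

```python
def clean_rows(rows, missing=""):
    """Return copies of rows that are safer to render as single-line table cells."""
    cleaned = []
    for index, row in enumerate(rows):
        # Fail early with a readable message instead of deep inside the renderer.
        if not isinstance(row, dict):
            raise TypeError(f"row {index} is {type(row).__name__}, expected dict")
        cleaned.append({
            # None becomes the filler; any run of whitespace, including line
            # breaks, collapses to one space so a cell never spans lines.
            str(key): missing if value is None else " ".join(str(value).split())
            for key, value in row.items()
        })
    return cleaned


print(clean_rows([{"name": "Alan\nKay", "language": None}]))
```

Raising on a non-dict row rather than skipping it is a deliberate choice here: in a report meant for domain experts, a silently missing row seems worse than a loud failure.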

Book Review: Scrum: The Art of Doing Twice the Work in Half the Time by Jeff Sutherland

I first came in contact with Scrum when I was working at Google and since then, I’ve been applying it to the startups I co-founded with good outcomes. Since I was searching for a job, I kept seeing “Scrum Master” come up over and over and I thought it was about time that I learned all the details of Scrum to be able to be a proper Scrum Master. Well, in only a couple of days I finished the book and discovered I was already a proper Scrum Master, having learned all the details about it from my time at Google and blog posts.

About the book itself, it’s short and entertaining, with enough storytelling to keep you engaged even if you only have a passing interest in Scrum. The system is rather simple, with only a few moving pieces, and I’m glad of that. Simple systems tend to work better. The testimonials of how much more productive a team is with Scrum feel exaggerated completely out of proportion, but then again, some companies are so terrible at producing anything at all, being the cradle of dysfunction, that it’s no surprise their productivity can be doubled or quadrupled.

★★★★☆

Weird interaction with Google, was it Duplex?

For a few weeks I’ve been receiving this email from Google:

[Image: "birth control"]

My first question is why birth control is an issue. Is Google limiting the advertisement of birth control? Why? Did we somehow slip into the 19th century and nobody told me?

My second question was… how could they think I have anything to do with birth control? I guess that was nothing more than AI failing miserably, so I decided to go and fix it. The important caveat here is that I haven’t run any ads in months, maybe more than a year.

I clicked the Fix button and it took me to a black page with the “Unknown Business” title. There was no way to text support, so, when the emails got annoying enough, I called them.

The support guy was nice, but he couldn’t do much about it. I explained that I wasn’t running any ads and I didn’t plan on running any for now, but maybe in the future. He told me to ignore the emails. I asked if there was a way to stop them and he told me to search for an unsubscribe link at the bottom of the email. That’s not what I meant; I wanted to fix the problem. All right, that was enough time on the phone, I’ll just ignore the emails.

Now is when it got weirder. The support guy said that his supervisor was there and wanted to talk to me. Ok… Click! Someone else starts speaking:

Supervisor: Hello Hos (their way of pronouncing José), did so-and-so answer your query today?

I’m always careful here. He didn’t solve the problem, but I’m sure it wasn’t his fault. Most of the time I have an issue, it’s their system being broken, and a support specialist shouldn’t be punished for that.

Me: Sort-of…

And as I was trying to explain the situation, the supervisor interrupted me:

Supervisor: Ok then. If you have any other questions, feel free to call us between 9 am and 5 pm.

Click! Hung up. Wow… that was rude… and odd. And now I’m thinking, did I just talk to Duplex and it failed at managing my answer?

When it comes to technology such as Duplex, my take is this: it’s going to happen no matter what, fighting it is futile, so let’s try to figure out how to make the most out of it. But I have to admit that having interacted with what I suspect was Duplex gives me an odd feeling (even if it wasn’t it). It makes me want to rebel; it makes me want to test it the next time I call, to try to figure out if it’s a human or not. This is obviously useless; the only thing that matters is getting my issue resolved. What concerns me here is that if a technology-loving person such as myself is having this strong a reaction, how will the general population react?

I think we are going to have some interesting growing pains in the next couple of decades.