A major hurdle for novices who want to start programming is the time it takes to learn versus simply doing the task manually. I wanted to share a few things that will reduce the time it takes to write a program. Numbers 1 to 3 here require no installation. Numbers 4 and 5 require more time and effort but are things I wish I had known about earlier. To get the most gratification from the following, you will need an objective. So please bookmark this page, and revisit it the next time you have a problem you need to solve.

  1. Screen scraping using Google Sheets. Use the importxml() function. This allows people with zero experience to create their first screen scrape in less than a minute. Try putting the following into adjacent cells:
    =importxml("https://randhub.com","//span[@class='post-meta']")
    =importxml("https://randhub.com","//a[@class='post-link']")
    

    Think of Google Sheets as a gateway drug to coding, because after you scrape a website once, you will never want to enter anything manually again.

  2. Google Colab. Now we graduate from using a built-in function to actual coding. Google Colab lets you write your first line of code without installing anything, and it runs on Google’s servers. That’s ideal if you want to do some simple coding at work and don’t want to deal with compliance. Try importing an Excel sheet into the program as your first exercise. The file you want to import should be in the root folder of your Google Drive.
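    import pandas as pd
    from google.colab import drive
    # Mount Google Drive so its files are visible to the notebook.
    drive.mount('/content/drive')
    # Keep the result in a variable (df) so you can work with it afterwards.
    # Note: on newer Colab runtimes the folder may appear as "MyDrive".
    df = pd.read_excel("drive/My Drive/[filename].xlsx")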
    

    Now you can start messing around with your Excel sheet using Python. But you don’t know any Python! That’s fine, just google it; usually the first hit will be from Stack Overflow, and it’ll explain what to do. When you’re done, you can export the data back to Google Drive using:

    df.to_excel("drive/My Drive/[filename].xlsx")
    

    Obviously, learning this takes some commitment, but it will certainly save time over the long run.

  3. APIs. Another program that requires no installation is Postman. There’s a lot of information that can be accessed through APIs (think Yelp and Google Custom Search). Postman allows you to try getting that data without writing a program. I am particularly impressed with Google Custom Search. It recognizes the format of a website, extracts much of the most important information from it, and lets you download that information without ever visiting the website.
  4. Notebooks: after Google Colab, the next step is to finally install Python, along with Jupyter Notebook or Atom + Hydrogen. These are offline versions of Google Colab. Jupyter Notebook is hosted on a local server and uses a web browser as a client. It is not customizable. Install it using Anaconda. Atom is a more traditional text editor made by GitHub, and Hydrogen turns it into a notebook like Google Colab. It is highly customizable. A notebook is something that lets you run your program line by line, and not necessarily in the order it was written. It’s hard to explain why this matters if you have never experienced what programming was like before these inventions.
  5. MongoDB: if you have large Excel sheets, it might make sense to use MongoDB to store your data. There are two great courses on it on Coursera: an intro course and an intermediate course. MongoDB is particularly interesting to me because it stores data the same way it comes in through an API. A very simple program can send a GET request to an API and upload the response to MongoDB. One issue with writing a program is the question of how to store data. In Excel, this seems second nature. But it’s less obvious with Python. Every time you start the program, you need to re-import the data. When you want to save it, you need to do so explicitly. Exporting to Excel is often not acceptable because the data formats may not be preserved. Using pickle to store data (the classic method with Python) is often suboptimal because you need to save the entire file even if you made only a few changes, which hurts performance.
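
To make the pickle point in item 5 concrete, here is a minimal sketch using only Python's standard library. The dataset, its field names, and the file name are made up for illustration. It changes one value in one record and then has to write the whole dictionary back to disk, because pickle offers no way to update a single record in place.

```python
import os
import pickle
import tempfile

# Hypothetical dataset: 10,000 small records keyed by an integer id.
records = {i: {"name": f"item-{i}", "price": i * 1.5} for i in range(10_000)}

# Hypothetical file location inside a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "records.pkl")

# First save: the whole dictionary is serialized to disk.
with open(path, "wb") as f:
    pickle.dump(records, f)
size_before = os.path.getsize(path)

# Change a single field in a single record.
records[42]["price"] = 99.0

# pickle has no "update one record" operation, so persisting the change
# means serializing and writing the entire dictionary again.
with open(path, "wb") as f:
    pickle.dump(records, f)
size_after = os.path.getsize(path)

# Reload to confirm the change survived the round trip.
with open(path, "rb") as f:
    restored = pickle.load(f)

print("price of record 42 after reload:", restored[42]["price"])
print("bytes written for each save:", size_before, size_after)
```

Both saves write a file of roughly the same size, even though the second one changed a single number. A database such as MongoDB can instead update just the record that changed.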
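
If you are curious what the two XPath expressions in item 1 actually select, the sketch below runs the same kind of query in Python with the standard library. The page markup is a made-up, simplified stand-in and not the real site. Note also that `xml.etree.ElementTree` only accepts well-formed XML, so a real web page would usually need a dedicated HTML parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified page markup; a real page is larger and messier.
page = """
<html>
  <body>
    <div class="post">
      <a class="post-link" href="/first">First post</a>
      <span class="post-meta">Jan 1, 2020</span>
    </div>
    <div class="post">
      <a class="post-link" href="/second">Second post</a>
      <span class="post-meta">Feb 2, 2020</span>
    </div>
    <a class="nav" href="/about">About</a>
  </body>
</html>
""".strip()

root = ET.fromstring(page)

# "//span[@class='post-meta']" means: every <span> anywhere in the page
# whose class attribute is exactly "post-meta". ElementTree wants a
# leading "." to express "anywhere below the root element".
dates = [el.text for el in root.findall(".//span[@class='post-meta']")]

# Same idea for links: every <a> whose class is "post-link".
titles = [el.text for el in root.findall(".//a[@class='post-link']")]

# Pair them up, like two adjacent columns in a spreadsheet.
for title, date in zip(titles, dates):
    print(title, "|", date)
```

The link with class `nav` is skipped because its class does not match, which is the same filtering importxml() does for you in Sheets.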
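
Once a request works in Postman (item 3), moving it into Python is a small step. The sketch below builds a request URL for the Google Custom Search JSON API and pulls titles and links out of a response. The helper names, the placeholder credentials, and the search query are all made up for illustration; you would need your own API key and search engine ID before calling `fetch_results`.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint of the Google Custom Search JSON API.
ENDPOINT = "https://www.googleapis.com/customsearch/v1"


def build_search_url(api_key, engine_id, query):
    """Build the GET URL; the same URL can be pasted into Postman."""
    params = {"key": api_key, "cx": engine_id, "q": query}
    return ENDPOINT + "?" + urlencode(params)


def fetch_results(url):
    """Send the GET request and decode the JSON body (needs real credentials)."""
    with urlopen(url) as response:
        return json.load(response)


def summarize(payload):
    """Return (title, link) pairs from the 'items' list of a response."""
    return [(item["title"], item["link"]) for item in payload.get("items", [])]


# Placeholder credentials: replace them before actually sending the request.
url = build_search_url("YOUR_API_KEY", "YOUR_ENGINE_ID", "best tacos")
print(url)

# Hypothetical, trimmed-down response, used here instead of a live call.
sample = {"items": [{"title": "Taco guide", "link": "https://example.com/tacos"}]}
print(summarize(sample))
```

The decoded response is an ordinary nested dictionary, which is also the shape in which a document database like MongoDB stores records (item 5).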