Difference between revisions of "Software Development"

From The Thinkulum
m (Andy Culbertson moved page From Private to Public Coding to Software Development: A more generic title fits the scope of the project better.)
(Moved the code examples and bibliography to a separate Sources article.)
== Introduction ==
=== Purpose ===
This project is a growing set of notes on various aspects of software development. Its purpose is to define a set of standard operating procedures for my programming projects.
=== Background ===


For most of my coding life, I have been the only user of my programs. I wrote them to aid my work or personal life, and only rarely did anyone else even see my code or watch it run. This wasn't on purpose. It just turned out that way. In my current job I do belong to a small development team, but our programs are still only for in-house use, so the only needs we have to consider are our own.

I haven't completely ignored these areas, though. I pick up on them here and there as I read about how other people code or as I see them in the software I use. Knowing that I'll probably need to know about them someday, I've kept a list. And now the time has come for me to fill in the details.


I'm filling in the details here in this project for two reasons. First, as I learn, writing helps me clarify my thoughts and keep them flowing. And second, posting my writing might help other people. So if you're like me and you've programmed mainly for yourself--what I call private coding--and you're starting to share your work with others--public coding--then maybe what I've learned so far can grease some of your own planning.
 
=== Method ===


This project consists of my reading other developers' advice, examining their code, deciding what to adopt and how to adapt it to my needs and preferences, and creating templates, procedures, reference materials, and anything else that might help me code well for a public user base.


To show how I'm implementing these ideas in my own code, I'll refer to [https://github.com/thinkulum my GitHub repositories] throughout the project, mainly my templates for [https://github.com/audreyr/cookiecutter cookiecutter], a Python app that creates project skeletons for any language, and my snippets for [https://www.sublimetext.com/ Sublime Text 2], the text editor I use. Snippets are templates Sublime inserts wherever you type the corresponding trigger strings.


This guide will be oriented toward Python for now, but I imagine a lot of its suggestions apply to other languages too. At this point it also centers around desktop command-line programs, but I'm hoping to expand it into other interfaces and environments.
=== Limitations ===


Since this guide is a place for me to learn, I welcome feedback, and I'm always open to arguments that my choices could be improved or suggestions for things I've missed.
To give you an idea of my background, I've worked for the past 15 or so years in the publishing industry doing mostly text processing and some web development. The languages I'm most familiar with are Perl, Python, XSLT, and some JavaScript, with a smattering of PHP and VBA. I've spent almost all my time in Windows. I hope to branch out into at least Linux in the future, if only to expand my mind.
 
This is an early iteration of the guide, so it's incomplete, and a lot of it is only sketched out.
 
== General coding guides ==
 
These are resources with advice on many aspects of writing good code. The ones I haven't read are on my to-read list. I'll probably be adding others.
 
* [https://en.wikipedia.org/wiki/Code_Complete Code Complete - Wikipedia]
* [https://en.wikipedia.org/wiki/The_Art_of_Unix_Programming The Art of Unix Programming - Wikipedia]
* [https://en.wikipedia.org/wiki/Perl_Best_Practices Perl Best Practices - Wikipedia]
 
Another resource that overlaps with this guide but covers more on the project management end of OSS is [http://producingoss.com/ Producing Open Source Software].
 
== Code examples ==
 
 
The projects below show up in lists of high-quality code, so I'll examine some of them and include any relevant discoveries in the guide. All the projects I'm listing for now are in Python.
 
* [https://launchpad.net/bzr Bazaar]
* [https://github.com/boto/boto3 boto3]
* [https://github.com/django/django Django]
* [https://github.com/pallets/flask Flask]
* [https://github.com/gevent/gevent gevent]
* [https://www.mercurial-scm.org/downloads Mercurial]
* [https://github.com/nltk/nltk NLTK]
* [https://github.com/Pylons/pyramid Pyramid]
* [https://github.com/reddit/reddit reddit]
* [https://github.com/kennethreitz/requests Requests]
* [https://github.com/zzzeek/sqlalchemy SQLAlchemy]
* [https://github.com/kennethreitz/tablib Tablib]
* [https://github.com/tornadoweb/tornado Tornado]
* [http://trac.edgewall.org/browser/trunk/trac trac]
* [https://github.com/twisted/twisted Twisted]
* [https://github.com/rg3/youtube-dl youtube-dl]
 
In addition to these, I keep a mental list of software with features I might want to emulate, even if I don't plan to study its code very thoroughly:
 
* [https://developer.mozilla.org/en-US/docs/Mozilla/Developer_guide/Source_Code Firefox] - automatic updating; possibly its add-on architecture, if it ever settles down
* [https://wordpress.org/download/source/ WordPress] - plugin hooks
 
== Software development methodology ==
 
I'll start with the general principles that direct my thinking about software development. These are the ideas behind agile development practices. [https://www.agilealliance.org/agile101/ Agile software development] is a paradigm that was created to answer waterfall development, the approach of planning all the features of a piece of software in advance. That planning effort often ends up wasted, because the customer's needs change in the middle of the project and the software has to be redesigned.
 
So agile development was intended to prevent overplanning. But using any development methodology will help prevent the opposite problem, which is a complete ''lack'' of structure and discipline. This results in a code organization scheme called a [http://www.laputan.org/mud/ big ball of mud]. I don't think my programs were complete mud, but the fact that I was the only one seeing or using my code let me get by with a lot of slacking. Slacking does keep you from working too hard, but it fails to prevent other kinds of problems. Agile software development seems like a good middle ground.
 
Agile development encompasses a lot of principles and practices, and it has a lot of varieties, but for me at this point, its most important practices are incremental development, test-driven development, and refactoring.
 
=== Incremental development ===
 
[https://en.wikipedia.org/wiki/Iterative_and_incremental_development Incremental development] is the approach of adding a few features per release rather than developing the whole program at once. It lets users begin using the software earlier, and it lets the feature roadmap change without wasting much development effort on the original design. It's different from iterative development, but they normally go together, so I'm using incremental as shorthand for both.
 
If you're a developer working on your own, the habit of small, frequent releases also can add accountability to the development process. It'll at least hold you accountable to work on the project, since other people will notice when the releases stop coming. It might even encourage you to code each feature well. A short release cycle does mean there's less time to get the code right, but there's also less time to procrastinate on getting it to work, and if you implement good programming practices, you'll waste less of that time debugging.
 
Small, frequent releases can also help the developer avoid procrastination, since the tasks for a small release feel more achievable, similar to writing a [http://volokh.com/posts/1182282037.shtml zeroth draft] of a manuscript. There's little concern for polish, so a zeroth draft has fewer requirements to worry about, and therefore it's more likely to get written.
 
A related practice is to deliberately trim your feature list for each release, captured in the [http://wiki.c2.com/?ExtremeProgramming Extreme Programming] practice [http://wiki.c2.com/?YouArentGonnaNeedIt You Aren't Gonna Need It].
 
One practice that's helping me think in terms of incremental development is using [http://scottchacon.com/2011/08/31/github-flow.html GitHub flow] as my version control workflow. In GitHub flow the master branch is reserved for deployable code, so anything I'm actively developing goes into a topically named branch off of master. Once the code related to that branch is ready for deployment, I merge that branch back into master. This procedure gets me to plan in terms of small clusters of features. I'll talk more about version control in the next main section.
 
=== Test-driven development ===
 
* [https://martinfowler.com/bliki/TestDrivenDevelopment.html TestDrivenDevelopment - Martin Fowler]
* [http://agiledata.org/essays/tdd.html Introduction to Test Driven Development (TDD) - Agile Data]
* [https://www.facebook.com/notes/kent-beck/rip-tdd/750840194948847/ RIP TDD - Kent Beck - Facebook]
 
To get myself into the habit of writing tests first, I'm including it as a subtask for each function I note in my task manager (I use [https://nirvanahq.com/ Nirvana], an app designed with [https://en.wikipedia.org/wiki/Getting_Things_Done GTD] in mind). I have a template for these subtasks, which I copy into the description of the tasks. At the moment the template looks like this:
 
<pre>
- Document
- Test
- Code
- Commit
</pre>
 
Really I'll move back and forth between documenting, testing, and coding, but I'll check them off when I've completed the first instance of each of those subtasks for that function.
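
As a minimal sketch of that Document, Test, Code order, here's what one function might look like once all three subtasks have had their first pass. The function <code>count_words</code> and its tests are hypothetical examples, not code from any of my projects. The docstring states the intent, the test class was conceptually written next, and the one-line body came last.

```python
import unittest


def count_words(text):
    """Return the number of whitespace-separated words in text."""
    # Written last: the simplest body that satisfies the tests below.
    return len(text.split())


class CountWordsTest(unittest.TestCase):
    """Tests that pin down the behavior described in the docstring."""

    def test_empty_string_has_no_words(self):
        self.assertEqual(count_words(""), 0)

    def test_extra_whitespace_is_ignored(self):
        self.assertEqual(count_words("  two   words "), 2)
```

Saved in a file under <code>tests/</code>, a test case like this can be run with <code>python -m unittest</code>. nose2 also collects <code>unittest.TestCase</code> classes, provided the file name matches its test file pattern.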
 
A practice I'm starting to explore that takes TDD a step further is behavior-driven development. It allows you to derive tests from documentation that's both human readable and executable.
 
* [https://en.wikipedia.org/wiki/Behavior-driven_development Behavior-driven development - Wikipedia]
* [https://github.com/cucumber/cucumber/wiki/Python Python - cucumber Wiki] - A short list of Cucumber alternatives for Python.
 
=== Refactoring ===
 
* [https://martinfowler.com/books/refactoring.html Refactoring - Martin Fowler]
 
I don't have a specific refactoring procedure in place. I just do it when I notice the need. I'm hoping to study Fowler's book and learn to think in terms of the structures that refactoring creates. That way I can create them as I code and reduce the need to refactor.
 
Some IDEs have tools to aid refactoring. I'm planning to test using the Sublime Text plugin [https://packagecontrol.io/packages/PyRefactor PyRefactor] for refactoring Python.
 
== Version control ==
 
* [http://guides.beanstalkapp.com/version-control/intro-to-version-control.html An introduction to version control - Beanstalk Guides]
 
I'm hosting my projects on GitHub (see the Distribution section), which means I'm using Git as my version control system. (Before that I was using Subversion via TortoiseSVN, which I still use for some non-GitHub projects.) I learned how to use Git from ''Git Essentials'' by Ferdinando Santacroce, and I try to follow its advice. For a reference I use ''Git Pocket Guide'' by Richard Silverman. Rather than typing all the Git commands myself, I'm lazy and use the GUI tool [https://www.gitkraken.com/ GitKraken]. As I said above, I use the GitHub flow workflow to organize my commits. GitHub has a [https://guides.github.com/introduction/flow/ guide] for it.
 
== Coding style ==
 
* [https://www.python.org/dev/peps/pep-0008/ PEP 8 -- Style Guide for Python Code - Python.org]
 
I've picked up my coding style from various sources, including ''Perl Best Practices'', but PEP 8 is a good starting point that's specifically geared toward Python.
 
== Project structure ==
 
* [http://docs.python-guide.org/en/latest/writing/structure/ Structuring Your Project - The Hitchhiker's Guide to Python]
* [https://learnpythonthehardway.org/book/ex46.html Exercise 46: A Project Skeleton - Learn Python the Hard Way]
 
I make a directory on my local machine named whatever I'm using for the GitHub URL (e.g., math-student-sim), and I create this structure inside it:
 
<pre>
docs/
<package>/
tests/
app.py
LICENSE.md
MANIFEST.in
README.md
setup.py
</pre>
 
My package name is usually the project name without the hyphens (e.g., mathstudentsim).
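
The layout above is what a project template generates, but as a rough illustration, here's a short standard-library sketch that builds the same skeleton by hand. It's a stand-in for illustration, not how cookiecutter works internally. The function name <code>create_skeleton</code> is hypothetical, and the empty <code>__init__.py</code> is my assumption about what the package directory needs, since it isn't in the listing.

```python
import tempfile
from pathlib import Path


def create_skeleton(root, package):
    """Create the directories and empty files of the project layout under root."""
    root = Path(root)
    for directory in ("docs", package, "tests"):
        (root / directory).mkdir(parents=True, exist_ok=True)
    for filename in ("app.py", "LICENSE.md", "MANIFEST.in", "README.md", "setup.py"):
        (root / filename).touch()
    # Assumption: an empty __init__.py so the package directory is importable.
    (root / package / "__init__.py").touch()
    return root


# Try it in a throwaway directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    root = create_skeleton(Path(tmp) / "math-student-sim", "mathstudentsim")
    created = sorted(path.name for path in root.iterdir())
    print(created)
```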
 
I'll fill in more of the details in later sections.
 
== Distribution ==
 
To distribute your code you need somewhere to host it, and I've chosen GitHub, since it's the one I hear the most about.
 
* [http://kbroman.org/github_tutorial/ git/github guide - Karl Broman]
 
== Installation ==
 
I haven't completely wrapped my mind around Python code distribution and installation, but for now I'm relying on people downloading the source from GitHub and running the setup script. Here are some more details on some of the issues that came up for me in the project structure guides above.
 
* [https://docs.python.org/3/distutils/setupscript.html Writing the Setup Script - Distributing Python Modules (Legacy version)]
* [http://stackoverflow.com/questions/9949420/how-to-configure-setup-py-to-have-pip-install-from-github-master How to configure setup.py to have pip install from GitHub master? - Stack Overflow]
* [https://caremad.io/posts/2013/07/setup-vs-requirement/ setup.py vs requirements.txt - caremad]
* [http://stackoverflow.com/questions/7923509/how-to-include-docs-directory-in-python-distribution How to include docs directory in python distribution - Stack Overflow]
 
== Metadata ==
 
Users of your project will need certain information about it, and you'll need to keep this information somewhere. For Python modules this is in the setup script.
 
* [https://www.python.org/dev/peps/pep-0345/ PEP 345 -- Metadata for Python Software Packages 1.2 - Python.org]
 
=== Version numbers ===
 
* [https://www.python.org/dev/peps/pep-0396/ PEP 396 -- Module Version Numbers - Python.org]
* [http://semver.org/ Semantic Versioning]
* [https://git-scm.com/book/en/v2/Git-Basics-Tagging Git Basics - Tagging - Git]
* [https://help.github.com/articles/creating-releases/ Creating Releases - GitHub Help]
 
I'm just following the guidelines in those articles.
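
To make the numbering scheme concrete, here's a small sketch of Semantic Versioning's core rule: a version is MAJOR.MINOR.PATCH, and incrementing one part resets the parts to its right. The helper names <code>parse_version</code> and <code>bump</code> are hypothetical, and the sketch deliberately ignores the pre-release and build metadata suffixes that the full specification allows.

```python
# A module-level version string, in the spirit of PEP 396.
__version__ = "0.1.0"


def parse_version(version):
    """Split a 'MAJOR.MINOR.PATCH' string into a tuple of three integers."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def bump(version, part):
    """Return version with the named part incremented and lower parts reset."""
    major, minor, patch = parse_version(version)
    if part == "major":
        return f"{major + 1}.0.0"
    if part == "minor":
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown version part: {part!r}")


next_feature_release = bump(__version__, "minor")
```

Comparing the parsed tuples rather than the raw strings matters once a part reaches two digits: as strings, "1.10.0" sorts before "1.9.3", but as tuples of integers the order comes out right.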
 
=== License ===
 
* [http://choosealicense.com/ Choose an open source license]
 
I've picked MIT as my default to give people freedom to use the code however they want and because I don't expect to have patents to license.
 
== Tests ==
 
Based on the advice in ''Learn Python the Hard Way'', I first chose nose as a test management tool, but since the [http://nose.readthedocs.io/en/latest/ future of nose] is uncertain, I'm using its successor, [https://github.com/nose-devs/nose2 nose2], instead.
 
== User interface ==
 
=== Command line ===
 
To make my projects usable quickly, I'm starting most of them with a command line interface.
 
* [https://wiki.python.org/moin/CmdModule CmdModule - The Python Wiki]
* [https://docs.python.org/3/library/cmd.html cmd - The Python Standard Library]
 
I'm placing the commands in <code>cli.py</code> within the package.
 
== Documentation ==
 
When you're coding for yourself, it's easy to forget about documentation, since you know how your code works and how to use it, and if you forget, you can reread it and figure it out. Deciphering your own code is annoying, but possibly less annoying than writing documentation. Usually when I'm in my lazy, non-documenting mode, I wait to comment pieces of the code during especially painful rereadings. I'd like to be nicer than that to people who aren't me.
 
Documentation comes in several different kinds, based on its location, format, and intended audience. It might help to think of documentation as falling into two basic genres: cookbooks and references. References tell you how the project works, and cookbooks tell you how to navigate through the project to reach particular objectives. Tutorials, for example, are a type of cookbook. Audiences fall into about three main categories: end users, API users who are developers using the code to write plugins or outside apps, and project developers.
 
As an overview, here's my take on the purpose and audience of documentation types you might find in Python projects, though most aren't limited to Python. Afterward I'll list links that discuss each type in more detail.
 
* '''README''' - In a document in the root directory of the project. A cookbook that tells users and developers the basics of working with the project.
* '''Docstrings''' - Mostly at the top of a function. A reference that tells API users how to use the function.
* '''Command line help''' - In the docstrings of command handling functions and at the top of a script. A reference that tells the user how to use the script and its commands.
* '''Comments''' - Interspersed among lines of code. A reference that tells project developers what the code does and why.
* '''Code self-documentation''' - Elements of coding style aimed at communicating the code's intentions. A reference that tells project developers and API users what the code does. Some developers try to substitute this technique for comments.
* '''Project documentation''' - In documents separate from the code. A cookbook and reference that gives users and developers thorough information for working with the project.
* '''Literate programming''' - Interspersed among chunks of code. A reference that gives developers detailed explanations of the reasoning behind the code. This type of documentation isn't common, but it could be a good idea for some projects.
 
=== General advice ===
 
* [http://docs.python-guide.org/en/latest/writing/documentation/ Documentation - The Hitchhiker's Guide to Python] - Covers a lot of these documentation types.
* [https://docs.python.org/devguide/documenting.html Documenting Python - Python Developer's Guide - Python.org] - A style guide for the Python source code, which could be adopted for other projects.
 
=== README ===
 
Here's a recommendation of what to put in a README:
 
* [http://www.writethedocs.org/guide/writing/beginners-guide-to-docs/ A beginner’s guide to writing documentation - Write the Docs]
 
Here are some example READMEs that have helped me plan mine:
 
* [https://gist.github.com/jxson/1784669 jxson/README.md]
* [https://gist.github.com/PurpleBooth/109311bb0361f32d87a2 PurpleBooth/README-Template.md]
* [https://gist.github.com/zenorocha/4526327 zenorocha/README.md]
 
Here are some other lists of READMEs:
 
* [https://changelog.com/posts/a-beginners-guide-to-creating-a-readme A Beginner's Guide to Creating a README]
* [https://github.com/matiassingers/awesome-readme matiassingers/awesome-readme]
* [https://github.com/repat/README-template repat/README-template]
 
The initial README I came up with is [https://github.com/thinkulum/math-student-sim/blob/master/README.md here].
 
=== Docstrings ===
 
Here are discussions of what to put in a docstring:
 
* [https://www.python.org/dev/peps/pep-0257/ PEP 257 -- Docstring Conventions - Python.org]
* [http://stackoverflow.com/questions/3898572/what-is-the-standard-python-docstring-format What is the standard Python docstring format? - Stack Overflow]
* [https://en.wikipedia.org/wiki/Javadoc Javadoc - Wikipedia] - A documentation generator for Java code. The conventions Javadoc understands can be adapted for other languages. I'm evaluating tools for processing Python docstrings that use this format.
 
My snippets for functions and modules will include docstring outlines.
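
For illustration, here's a sketch of a function docstring that follows PEP 257 (a one-line summary, a blank line, then details) and uses the field-list markup that Sphinx understands, which plays roughly the same role as Javadoc's tags. The function <code>average</code> is a hypothetical example, and this is only one of several docstring formats in use.

```python
def average(scores):
    """Return the arithmetic mean of a sequence of scores.

    :param scores: a non-empty sequence of numbers
    :returns: the mean as a float
    :raises ValueError: if scores is empty
    """
    if not scores:
        raise ValueError("scores must not be empty")
    return sum(scores) / len(scores)


# The docstring is available at runtime, which is how help() and
# documentation tools get at it.
summary_line = average.__doc__.splitlines()[0]
```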
 
Then you need a tool to present the docstrings to the reader. For now I'm going with Sphinx, since Python developers seem to prefer it. Since it formats more than just docstrings, I've put its links in the "Project documentation" section below.
 
=== Command line help ===
 
Traditional command-line programs are run from the OS command line, and their help statement is called a usage message. On Unix-like OSes they may also have a man (manual) page. I'll talk about man pages in the "Project documentation" section.
 
Python has the <code>cmd</code> module for creating a command-line interpreter within your program, and each of your program's commands can also have a help message, contained in the command function's docstring.
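
Here's a minimal sketch of how that works. The class <code>SimShell</code> and its <code>greet</code> command are hypothetical. Each <code>do_*</code> method defines a command, and the built-in <code>help</code> command prints that method's docstring.

```python
import cmd
import io


class SimShell(cmd.Cmd):
    """A tiny interpreter with two commands (hypothetical example)."""

    prompt = "(sim) "

    def do_greet(self, arg):
        """greet [NAME]
        Print a greeting for NAME, or for the world if NAME is omitted."""
        self.stdout.write(f"Hello, {arg or 'world'}!\n")

    def do_quit(self, arg):
        """quit
        Leave the interpreter."""
        # Returning a true value tells cmdloop() to stop.
        return True


# Capture the help text in a buffer to show where it comes from.
buffer = io.StringIO()
SimShell(stdout=buffer).onecmd("help greet")
print(buffer.getvalue())
```

Calling <code>SimShell().cmdloop()</code> instead would start the interactive prompt.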
 
My snippets for scripts and command functions will include outlines for usage and help messages.
 
==== Usage message ====
 
* [https://en.wikipedia.org/wiki/Usage_message Usage message - Wikipedia]
* [http://courses.cms.caltech.edu/cs11/material/general/usage.html How to write usage statements - CS 11 - Caltech]
* [http://stackoverflow.com/questions/17314872/shell-scripts-conventions-to-write-usage-text-for-parameters Shell scripts: conventions to write usage text for parameters? - Stack Overflow]
* [https://github.com/docopt/docopt docopt (Python implementation)] - A library that checks the user's command line arguments by parsing the script's usage message. Even if you're not using docopt, it gives you a convenient set of conventions for formatting usage messages.
* [https://docs.python.org/3/library/argparse.html argparse - Python.org] - Moves in the opposite direction from docopt, generating a usage message from a specification.
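
As a sketch of the argparse direction, the parser below is given a specification of the arguments and derives the usage message from it. The program name is borrowed from the package naming example earlier, and the arguments themselves are hypothetical.

```python
import argparse


def build_parser():
    """Describe the command line; argparse derives the usage message from it."""
    parser = argparse.ArgumentParser(
        prog="mathstudentsim",
        description="Run a simulation (hypothetical example).",
    )
    parser.add_argument("input_file", help="path to the input file")
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="print progress messages")
    return parser


parser = build_parser()
usage = parser.format_usage()  # the one-line usage message
print(usage)

# Parsing an explicit list instead of sys.argv keeps the example self-contained.
args = parser.parse_args(["data.txt", "--verbose"])
```

Running the script with <code>-h</code> would print the same usage line followed by the description and the per-argument help strings.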
 
==== Command help ====
 
I haven't found any conventions for <code>cmd</code> help text, and the examples I've seen have been free-form descriptions. Maybe it would make sense to treat each command as a separate program and use usage message conventions for its help.
 
=== Comments and self-documentation ===
 
Comments and self-documentation are the two sides of the code clarity coin.
 
Code comments are one of the [http://www.hacker-dictionary.com/terms/holy-wars holy wars] of programming--everyone has a strongly held view and debates it vigorously. I basically agree with the views in these articles:
 
* [https://visualstudiomagazine.com/articles/2013/06/01/roc-rocks.aspx Why You Shouldn't Comment (or Document) Code - Visual Studio Magazine]
* [https://sourcemaking.com/refactoring/smells/comments Comments - Source Making]
* [https://www.pluralsight.com/blog/software-development/code-comments-dos-and-donts Code comments: A quick guide on when (and when not) to use them - Pluralsight]
 
That is, comments are a liability, so for the most part write as few of them as you can, and update the ones you do write whenever you update the code. Make your code as clear as you can (see the self-documentation section), and comment only to communicate what the code can't on its own, such as its purpose and the reasoning behind your choices in implementing it. If nothing else, comments can be a good indication of places that need refactoring.
 
I don't think it's necessary to stick to one procedure for commenting, but generally when I've commented more, it's because I've paused after a chunk of coding time, such as when I've finished a function. I code in paragraphs, sets of related lines separated by a blank line. In the commenting phase I reread the code I've just written and try to summarize or justify each paragraph, if needed. Stepping back from the code to state things in English sometimes leads to improvements in the code--refactoring or bug fixes.
 
Here are some in-depth discussions on making code self-documenting:
 
* [http://wiki.c2.com/?SelfDocumentingCode Self Documenting Code - WikiWikiWeb]
* [http://wiki.c2.com/?SystemMetaphor System Metaphor - WikiWikiWeb]
* [https://m.signalvnoise.com/hunting-for-great-names-in-programming-16f624c8fc03 Hunting for great names in programming - Signal v. Noise]
 
Every solution has its own problems. Self-documentation can go wrong too, and in some of the same ways comments can. For example, nothing forces a function's name to match what the function actually does. Here are a bunch of tricks for wrecking your code's clarity:
 
* [http://mindprod.com/jgloss/unmain.html unmaintainable code - Java Glossary]
 
On the problem that documentation isn't executable, behavior-driven development might give us a partial solution. See the links in the test-driven development section above.
 
=== Project documentation ===
 
READMEs give the reader a basic intro to your project, but at some point they'll need more extensive information. I'm calling this in-depth treatment the project documentation.
 
==== Documentation generation ====
 
Some documentation is freeform writing, but with [https://en.wikipedia.org/wiki/Documentation_generator documentation generators], a lot of it can be collected from specialized comments and formatted automatically. You'd mainly use this to document your API. Sphinx is a popular Python tool for this purpose. It handles both freeform and generated documentation.
 
* [http://www.sphinx-doc.org/en/stable/ Sphinx]
 
My default Sphinx setup will be in my cookiecutter templates.
 
==== Content ====
 
I didn't find many guides to writing user or developer guides beyond the README, so I may have to write my own at some point, based on an examination of well-documented projects. But here's one guide that covers some of the information developers will need:
 
* [https://www.ctl.io/developers/blog/post/how-to-document-a-project How to Document a Project - CenturyLink Cloud Developer Center]
 
One source of insight for documenters might be the content of man pages. Man (manual) pages are the documentation for Unix tools, and they have a fairly standard format.
 
* [https://en.wikipedia.org/wiki/Man_page man page - Wikipedia]
* [https://liw.fi/manpages/ Writing manual pages - Lars Wirzenius]
* [http://www.schweikhardt.net/man_page_howto.html The Linux Man Page How-to - Jens Schweikhardt]
 
=== Literate programming ===
 
In spite of what I said about comments and self-documentation, sometimes I really want to expound on the meaning of my code, and literate programming is the way to do it.
 
* [https://en.wikipedia.org/wiki/Literate_programming Literate programming - Wikipedia]
* [http://akkartik.name/post/literate-programming Literate programming: Knuth is doing it wrong - Kartik Agaram]
 
Python tools:
 
* [https://sourceforge.net/projects/pywebtool/ pyWeb] - After glancing at a few, I chose pyWeb as probably the closest to what I was looking for.
* [http://mpastell.com/pweave/ Pweave] - A feature-rich tool I might try at some point.
 
My first attempt at literate programming is my [https://github.com/thinkulum/math-student-sim Math Student Simulator]. See the README for basic details on writing and processing the source files.
 
== Configuration ==
 
When you're coding for yourself, it's easy to keep all the variables inside the code files, since you can just open the files and change the values whenever you want. But when you have outside users, you don't necessarily want them changing the code, and if the code is compiled into binary executables, they can't change it anyway. Plus, separating code and data makes your program easier to think about. So you'll want to put the user-settable data in external files.
 
There are at least two questions to answer: what format to store the settings in and where to store them.
 
=== Format ===
 
Here are some discussions that cover several formats:
 
* [http://pyvideo.org/pycon-us-2009/pycon-2009--a-configuration-comparison-in-python-.html A Configuration Comparison in Python - PyCon 2009 - PyVideo] (video).
* [https://martin-thoma.com/configuration-files-in-python/ Configuration files in Python - Martin Thoma] - Examples of most of the formats I'm listing here.
* [http://stackoverflow.com/questions/8225954/python-configuration-file-any-file-format-recommendation-ini-format-still-appr Python configuration file: Any file format recommendation? INI format still appropriate? Seems quite old school - Stack Overflow]
* [http://stackoverflow.com/questions/5055042/whats-the-best-practice-using-a-settings-file-in-python What's the best practice using a settings file in Python? - Stack Overflow]
* [http://stackoverflow.com/questions/186916/configuration-file-with-list-of-key-value-pairs-in-python Configuration file with list of key-value pairs in python - Stack Overflow]
* [http://stackoverflow.com/questions/1726802/what-is-the-difference-between-yaml-and-json-when-to-prefer-one-over-the-other What is the difference between YAML and JSON? When to prefer one over the other - Stack Overflow]
* [http://www.robg3d.com/2012/06/why-bother-with-python-and-config-files/ Why bother with python and config files? - RobG3d]
 
I've sorted these formats roughly in order of increasing complexity.
 
==== CSV ====
 
* [https://docs.python.org/3/library/csv.html csv - The Python Standard Library]
 
Pros:
 
* It's simple.
* Some version of it is supported in many languages.
* It's fairly easy to import and export.
 
Cons:
 
* It's a little hard to read, especially when the lines get long.
* It's limited to simple data.
* It doesn't enforce data types.
* It doesn't have a standard format.
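
A minimal sketch of settings stored as CSV, assuming a two-column name/value layout that I made up for this example. It shows the data type limitation from the list above: every value comes back as a string.

```python
import csv
import io

# Hypothetical settings file contents; io.StringIO stands in for an open file.
CSV_TEXT = "name,value\nretries,3\nverbose,true\n"


def load_csv_settings(text):
    """Read name,value rows into a dict; every value stays a string."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["name"]: row["value"] for row in reader}


settings = load_csv_settings(CSV_TEXT)

# The program has to convert types itself.
retries = int(settings["retries"])
```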
 
==== INI ====
 
* [https://docs.python.org/3/library/configparser.html configparser - The Python Standard Library]
* [https://pypi.python.org/pypi/configobj/ ConfigObj - PyPI] - This one has a few more features than configparser.
 
Pros:
 
* It's easy to read.
* It's easy to avoid syntax errors (given a particular format).
* Some version of it is supported in many languages.
* It's easy to import and export.
 
Cons:
 
* It's limited to simple data.
* It doesn't enforce data types (unless you use ConfigObj).
* It doesn't have a standard format.
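
Here's a short sketch of reading an INI file with configparser. The section and setting names are hypothetical. Values are stored as strings, so the type is chosen by whichever getter the program calls, not by the file.

```python
import configparser

# Hypothetical settings; read_string() stands in for reading a file.
INI_TEXT = """
[general]
retries = 3
verbose = yes
"""

parser = configparser.ConfigParser()
parser.read_string(INI_TEXT)

raw = parser["general"]["retries"]                  # plain string access
retries = parser.getint("general", "retries")       # converted on request
verbose = parser.getboolean("general", "verbose")   # accepts yes/no, on/off, true/false, 1/0
```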
 
==== JSON ====
 
* [https://docs.python.org/3/library/json.html json - The Python Standard Library]
 
Pros:
 
* It enables multiple data types.
* It enables complex data structures.
* It's supported in many languages.
* It's easy to import and export.
 
Cons:
 
* It's easy to commit syntax errors, especially if you're not familiar with the format.
* It can be hard to read, especially if it's not pretty printed or if it uses complex data structures.
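
A minimal sketch of the same kind of settings in JSON, using the standard <code>json</code> module. The setting names are assumptions for the example. It also shows one common syntax slip, a trailing comma, being rejected by the parser.

```python
import json

# Hypothetical settings; the names are assumptions for this example.
SAMPLE = '{"output_dir": "reports", "max_items": 25, "formats": ["html", "pdf"], "verbose": true}'

settings = json.loads(SAMPLE)

# Pretty printing makes the file easier to read and edit by hand.
pretty = json.dumps(settings, indent=4, sort_keys=True)

# A trailing comma is a syntax error in JSON.
try:
    json.loads('{"max_items": 25,}')
    trailing_comma_ok = True
except json.JSONDecodeError:
    trailing_comma_ok = False
```

Unlike the CSV and INI cases, the number, list, and boolean come back as Python <code>int</code>, <code>list</code>, and <code>bool</code> values without any conversion code.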
 
==== YAML ====
 
* [http://pyyaml.org/ PyYAML]
 
Pros:
 
* It enables multiple data types.
* It enables complex data structures.
* It's supported in many languages.
* It's easy to import and export.
* It's easy to read if you stick to simple data structures.
 
Cons:
 
* It can be hard to write if you use its more complex features.
 
I'm trying out YAML as my default choice.
 
==== XML ====
 
There are a few major Python tools for working with XML:
 
* [http://lxml.de/tutorial.html lxml.etree Tutorial - lxml]
* [https://www.crummy.com/software/BeautifulSoup/ Beautiful Soup - Crummy: The Site] - It's mainly meant for HTML, but it parses XML too.
* [https://docs.python.org/3/library/xml.etree.elementtree.html xml.etree.ElementTree - The Python Standard Library]
 
Here's some discussion of the use of XML for configuration:
 
* [https://commons.apache.org/proper/commons-configuration/userguide/howto_hierarchical.html Hierarchical Configurations - Apache Commons Configuration User Guide]
 
Pros:
 
* It enables multiple data types.
* It enables complex data structures.
* It's supported in many languages.
* It's easy to import.
 
Cons:
 
* It doesn't have a standard format for representing configuration.
* It can be hard to read.
* It can be hard to export if the starting format isn't compatible.
* Enforcing data types requires an additional schema.
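
A minimal sketch of reading settings from XML with the standard library's <code>xml.etree.ElementTree</code>. Since XML has no standard layout for configuration, the element names here are purely an assumption for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical layout; XML doesn't define one for configuration.
SAMPLE = """\
<config>
    <output_dir>reports</output_dir>
    <max_items>25</max_items>
    <formats>
        <format>html</format>
        <format>pdf</format>
    </formats>
</config>
"""

root = ET.fromstring(SAMPLE)

output_dir = root.findtext("output_dir")

# Without a schema, element text is just a string, so convert it by hand.
max_items = int(root.findtext("max_items"))

# Nested elements can represent a list of values.
formats = [element.text for element in root.iter("format")]
```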
 
==== Python ====
 
You can use regular Python code to define settings, usually by putting them in their own module. I'd personally only use Python for configuration in special cases. Here are some recommendations and warnings:
 
* [http://stackoverflow.com/questions/10439486/loading-settings-py-config-file-into-a-dict Loading settings.py config file into a dict - Stack Overflow]
* [http://stackoverflow.com/questions/2259427/load-python-code-at-runtime load python code at runtime - Stack Overflow]
* [http://stackoverflow.com/questions/6198372/most-pythonic-way-to-provide-global-configuration-variables-in-config-py Most Pythonic way to provide global configuration variables in config.py? - Stack Overflow]
* [https://pypi.python.org/pypi/pyconfig pyconfig - PyPI]
* [https://ralsina.me/posts/python-is-not-a-configuration-file-format.html Python is Not a Configuration File Format - Lateral Opinion]
 
Pros:
 
* It enables multiple data types.
* It enables complex data structures.
* It enables using conditions to set values.
* If your code is already in Python, it doesn't require learning an additional language.
* Importing is very easy.
 
Cons:
 
* It mixes code and data, which makes the program harder to think about.
* It can create a security risk, since users can add arbitrary code to the configuration file for the program to run.
* Exporting can be hard or impossible.
* It can be hard to read, depending on the code.
* It doesn't have a standard format for representing configuration.
* It can be easy to commit syntax errors.
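
To make the trade-offs above concrete, here is a minimal sketch that loads a Python settings file into a dict with the standard library's <code>runpy.run_path</code>. The file contents, the uppercase naming convention, and the <code>load_python_settings</code> helper are assumptions for the example, not a recommended design. The settings file is written to a temporary directory only so the example is self-contained.

```python
import os
import runpy
import tempfile

# Hypothetical contents of a settings.py file.
SETTINGS_SOURCE = '''
import sys

OUTPUT_DIR = "reports"
MAX_ITEMS = 25
# A condition sets a value, which plain data formats can't express.
USE_COLOR = sys.platform != "win32"
'''


def load_python_settings(path):
    """Run the file and keep only its UPPERCASE names as settings."""
    # run_path executes whatever code the file contains, which is the
    # security risk listed above.
    namespace = runpy.run_path(path)
    return {name: value for name, value in namespace.items() if name.isupper()}


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "settings.py")
    with open(path, "w") as settings_file:
        settings_file.write(SETTINGS_SOURCE)
    settings = load_python_settings(path)
```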
 
==== Database ====
 
The formats I've already covered are plain text formats that a human can read and edit. Another option is putting the settings in a binary database and accessing them through a separate UI. The advantages are that you can enforce data types and the user isn't editing the settings file directly, which leaves it less open to syntax errors. For settings to be set by non-technical end users, this is probably the best option.
 
* [https://docs.python.org/3/library/sqlite3.html sqlite3 - The Python Standard Library]
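
As a minimal sketch of the idea, the following uses <code>sqlite3</code> with an in-memory database. The table layout and the <code>set_setting</code> and <code>get_setting</code> helpers are assumptions for the example. A real program would use a file on disk and put a UI in front of these functions. The <code>CHECK</code> constraint is one way to have the database reject a value of the wrong type.

```python
import sqlite3

# An in-memory database keeps the example self-contained.
conn = sqlite3.connect(":memory:")

# Hypothetical table of integer settings. The CHECK constraint rejects
# values that SQLite would not store as integers.
conn.execute(
    "CREATE TABLE settings ("
    " name TEXT PRIMARY KEY,"
    " value INTEGER NOT NULL CHECK (typeof(value) = 'integer'))"
)


def set_setting(conn, name, value):
    """Insert or update one setting inside a transaction."""
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO settings (name, value) VALUES (?, ?)",
            (name, value),
        )


def get_setting(conn, name):
    """Return the stored value, or None if the setting doesn't exist."""
    row = conn.execute(
        "SELECT value FROM settings WHERE name = ?", (name,)
    ).fetchone()
    return row[0] if row else None


set_setting(conn, "max_items", 25)

# A value of the wrong type is refused and the old value is kept.
try:
    set_setting(conn, "max_items", "lots")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
```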
 
Pros:
 
* It's hard for the user to commit syntax errors.
* It enables multiple data types.
* You can control access to the settings.
* It's supported in many languages.
* It can be easy to import and export.

Cons:

* Complex data structures are hard to represent, or they require an additional format.
* You have to code a UI for viewing and setting values.

=== Location ===

Where should you put your config file? Depending on the circumstances, you might need more than one file, and they might need to go in different locations. Generally speaking, you might separate global and user settings into different files. Global settings might go in the app's root directory or a config subdirectory, and user settings somewhere in a user directory.

* [https://pypi.python.org/pypi/appdirs appdirs - PyPI] - A Python module for finding various directories across platforms, such as the user data, a possible location for a config file.

== Logging ==

You need some kind of reporting even when you're coding for yourself, because to debug your program, you have to know what's going on as it runs. But when you're coding for yourself, you can usually afford to be minimal and undisciplined about it. When you might get bug reports from outside users, among other reasons, your program needs to write relevant state and event information to a file they can share.

* [https://docs.python.org/3/library/logging.html logging - The Python Standard Library]
* [http://python-guide-pt-br.readthedocs.io/en/latest/writing/logging/ Logging - The Hitchhiker's Guide to Python]
* [https://www.blog.pythonlibrary.org/2012/08/02/python-101-an-intro-to-logging/ Python 101: An Intro to logging - Mouse vs Python]
* [https://fangpenlin.com/posts/2012/08/26/good-logging-practice-in-python/ Good logging practice in Python - Fang's coding note]

[More sections to come]

This project will be oriented toward Python for now, but I imagine a lot of its suggestions apply to other languages too. At this point it also centers around desktop command-line programs, but I'm hoping to expand it into other interfaces and environments.

=== Feedback ===

Since this project is a place for me to learn, I welcome feedback, and I'm always open to arguments that my choices could be improved or suggestions for things I've missed.

== Topics ==

As a way to organize my notes and make sure I don't miss anything important, I'll use the structure of IEEE's ''Guide to the Software Engineering Body of Knowledge'' (SWEBOK). It's a standard that overviews the entire field and forms the basis for creating things like curricula and certifications. You can download it for free [https://www.computer.org/web/swebok/index as a PDF] or read it online [https://www.iso.org/obp/ui/#iso:std:iso-iec:tr:19759:ed-2:v2:en as HTML].

The items under the first level link to my notes on those topics.

* Software requirements
* Software design
* Software construction
** [[/Code Style/]]
** [[/Project Structure/]]
** [[/Installation/]]
** [[/Metadata/]]
** [[/Command Line Interface/]]
** [[/Documentation/]]
*** [[/Comments/]]
*** [[/Literate Programming/]]
*** [[/Docstrings/]]
*** [[/Project Documentation/]]
*** [[/READMEs/]]
*** [[/Command Line Help/]]
** [[/Application Configuration/]]
** [[/Logging/]]
** [[/Test-First Programming/]]
* Software testing
** [[/Testing/]]
* Software maintenance
** [[/Refactoring/]]
* Software configuration management
** [[/Version Control/]]
** [[/Version Numbers/]]
** [[/Distribution/]]
* Software engineering management
* Software engineering process
** [[/Software Development Methodology/]]
*** [[/Iterative and Incremental Development/]]
* Software engineering models and methods
* Software quality
* Software engineering professional practice
** [[/License/]]
* Software engineering economics
* Computing foundations
* Mathematical foundations
* Engineering foundations

== [[/Sources/]] ==

See the article linked in the heading for a list of code examples and other sources I'm drawing from.

<disqus/>

Revision as of 22:44, 23 February 2019

== Introduction ==

=== Purpose ===

This project is a growing set of notes on various aspects of software development. Its purpose is to define a set of standard operating procedures for my programming projects.

=== Background ===

For most of my coding life, I have been the only user of my programs. I wrote them to aid my work or personal life, and only rarely did anyone else even see my code or watch it run. This wasn't on purpose. It just turned out that way. In my current job I do belong to a small development team, but our programs are still only for in-house use, so the only needs we have to consider are our own.

This situation is starting to change. My programming projects are becoming more relevant to the outside world, and I want to share some of them as open source. This means I need to reshape some of my programming practices and learn some areas of software development I've had the luxury of ignoring all these years.

I haven't completely ignored these areas, though. I pick up on them here and there as I read about how other people code or as I see them in the software I use. Knowing that I'll probably need to know about them someday, I've kept a list. And now the time has come for me to fill in the details.

I'm filling in the details here in this project for two reasons. First, as I learn, writing helps me clarify my thoughts and keep them flowing. And second, posting my writing might help other people. So if you're like me and you've programmed mainly for yourself--what I call private coding--and you're starting to share your work with others--public coding--then maybe what I've learned so far can grease some of your own planning.

=== Method ===

This project consists of my reading other developers' advice, examining their code, deciding what to adopt and how to adapt it to my needs and preferences, and creating templates, procedures, reference materials, and anything else that might help me code well for a public user base.

To show how I'm implementing these ideas in my own code, I'll refer to my GitHub repositories throughout the project, mainly my templates for cookiecutter, a Python app that creates project skeletons for any language, and my snippets for Sublime Text 2, the text editor I use. Snippets are templates Sublime inserts wherever you type the corresponding trigger strings.

=== Limitations ===

To give you an idea of my background, I've worked for the past 15 or so years in the publishing industry doing mostly text processing and some web development. The languages I'm most familiar with are Perl, Python, XSLT, and some JavaScript, with a smattering of PHP and VBA. I've spent almost all my time in Windows. I hope to branch out into at least Linux in the future, if only to expand my mind.

This project will be oriented toward Python for now, but I imagine a lot of its suggestions apply to other languages too. At this point it also centers around desktop command-line programs, but I'm hoping to expand it into other interfaces and environments.

=== Feedback ===

Since this project is a place for me to learn, I welcome feedback, and I'm always open to arguments that my choices could be improved or suggestions for things I've missed.

== Topics ==

As a way to organize my notes and make sure I don't miss anything important, I'll use the structure of IEEE's ''Guide to the Software Engineering Body of Knowledge'' (SWEBOK). It's a standard that overviews the entire field and forms the basis for creating things like curricula and certifications. You can download it for free [https://www.computer.org/web/swebok/index as a PDF] or read it online [https://www.iso.org/obp/ui/#iso:std:iso-iec:tr:19759:ed-2:v2:en as HTML].

The items under the first level link to my notes on those topics.

== [[/Sources/]] ==

See the article linked in the heading for a list of code examples and other sources I'm drawing from.

<disqus/>