Ethereum Builder's Guide

Glossary

One of the things that cryptocurrency, and really any new genre of technology, is notorious for is the sheer quantity of vocabulary that gets generated to describe all of the new concepts. Anyone dealing with peer-to-peer internet software on anything more than a casual basis needs to deal with concepts of cryptography, including hashes, signatures and public and private keys, symmetric and asymmetric encryption, denial of service protection, as well as arcane constructions such as distributed hash tables and webs of trust. New Bitcoin users are forced to contend with learning not just the common basics of cryptography, but also additional internal jargon such as "blocks", "confirmations", "mining", "SPV clients" and "51% attacks", as well as economic concepts like incentive-compatibility and the fine nuances of centralization and decentralization. Ethereum, being a decentralized application development platform based on a generalization of a cryptocurrency, necessarily incorporates both of these sets of concepts, as well as adding many of its own. To help anyone new to Ethereum, whether they are in it as cryptocurrency enthusiasts, business owners, social or political visionaries, web developers or are simply ordinary people looking to see how the technology can improve their lives, the following list is intended to provide a basic summary of the vocabulary that Ethereum users often tend to use:

Cryptography

Computational infeasibility: a process is computationally infeasible if it would take an impracticably long time (eg. billions of years) to do it for anyone who might conceivably have an interest in carrying it out. Generally, 2⁸⁰ computational steps is considered the lower bound for computational infeasibility.
Hash: a hash function (or hash algorithm) is a process by which a document (ie. a piece of data or file) is processed into a small piece of data (usually 32 bytes) which looks completely random, and from which no meaningful data can be recovered about the document, but which has the important property that the result of hashing one particular document is always the same. Additionally, it is crucially important that it is computationally infeasible to find two documents that have the same hash. Generally, changing even one letter in a document will completely randomize the hash; for example, the SHA3 hash of "Saturday" is c38bbc8e93c09f6ed3fe39b5135da91ad1a99d397ef16948606cdcbd14929f9d, whereas the SHA3 hash of Caturday is b4013c0eed56d5a0b448b02ec1d10dd18c1b3832068fbbdc65b98fa9b14b6dbf. Hashes are usually used as a way of creating a globally agreed-upon identifier for a particular document that cannot be forged.
Encryption: encryption is a process by which a document (plaintext) is combined with a shorter string of data, called a key (eg. c85ef7d79691fe79573b1a7064c19c1a9819ebdbd1faaab1a8ec92344438aaf4), to produce an output (ciphertext) which can be "decrypted" back into the original plaintext by someone else who has the key, but which is incomprehensible and computationally infeasible to decrypt for anyone who does not have the key.
Public key encryption: a special kind of encryption where there is a process for generating two keys at the same time (typically called a private key and a public key), such that documents encrypted using one key can be decrypted with the other. Generally, as suggested by the name, individuals publish their public keys and keep their private keys to themselves.
Digital signature: a digital signing algorithm is a process by which a user can produce a short string of data called a "signature" of a document using a private key such that anyone with the corresponding public key, the signature and the document can verify that (1) the document was "signed" by the owner of that particular private key, and (2) the document was not changed after it was signed. Note that this differs from traditional signatures where you can scribble extra text onto a document after you sign it and there's no way to tell the difference; in a digital signature any change to the document will render the signature invalid.

Blockchains

Address: an address is essentially the representation of a public key belonging to a particular user; for example, the address associated with the private key given above is cd2a3d9f938e13cd947ec05abc7fe734df8dd826. Note that in practice, the address is technically the hash of a public key, but for simplicity it's better to ignore this distinction.
Transaction: a transaction is a document authorizing some particular action associated with the blockchain. In a currency, the dominant transaction type is sending currency units or tokens to someone else; in other systems actions like registering domain names, making and fulfilling trade offers and entering into contracts are also valid transaction types.
Block: a block is a package of data that contains zero or more transactions, the hash of the previous block ("parent"), and optionally other data. The total set of blocks, with every block except for the initial "genesis block" containing the hash of its parent, is called the blockchain and contains the entire transaction history of a network. Note that some blockchain-based cryptocurrencies instead use the word "ledger" for a blockchain; the two are roughly equivalent, although in systems that use the term "ledger" each block generally contains a full copy of the current state (eg. currency balances, partially fulfilled contracts, registrations) of every account allowing users to discard outdated historical data.
Account: an account is the entry in a ledger, indexed by its address, that contains the complete data about the state of that account. In a currency system, this involves currency balances and perhaps unfulfilled trade orders; in other cases more complex relationships may be stored inside of accounts.
Proof of work: one important property of a block in Bitcoin, Ethereum and many other crypto-ledgers is that the hash of the block must be smaller than some target value. The reason this is necessary is that in a decentralized system anyone can produce blocks, so in order to prevent the network from being flooded with blocks, and to provide a way of measuring how much consensus there is behind a particular version of the blockchain, it must in some way be hard to produce a block. Because hashes are pseudorandom, finding a block whose hash is less than 0000000100000000000000000000000000000000000000000000000000000000 takes an average of 4.3 billion attempts. In all such systems, the target value self-adjusts so that on average one node in the network finds a block every N minutes (eg. N = 10 for Bitcoin and 1 for Ethereum).
Nonce: a meaningless value in a block which can be adjusted in order to try to satisfy the proof of work condition
Mining: mining is the process of repeatedly aggregating transactions, constructing a block and trying difference nonces until a nonce is found that satisfies the proof of work condition. If a miner gets lucky and produces a valid block, they are granted a certain number of coins as a reward as well as all of the transaction fees in the block, and all miners start trying to create a new block containing the hash of the newly generated block as their parent.
Stale: a stale is a block that is created when there is already another block with the same parent out there; stales typically get discarded and are wasted effort.
Fork: a situation where two blocks are generated pointing to the same block as their parent, and some portion of miners see one block first and some see the other. This may lead to two blockchains growing at the same time. Generally, it is mathematically near-certain that a fork will resolve itself within four blocks as miners on one chain will eventually get lucky and that chain will grow longer and all miners switch to it; however, forks may last longer if miners disagree on whether or not a particular block is valid.
Double spend: a deliberate fork, where a user with a large amount of mining power sends a transaction to purchase some produce, then after receiving the product creates another transaction sending the same coins to themselves. The attacker then creates a block, at the same level as the block containing the original transaction but containing the second transaction instead, and starts mining on the fork. If the attacker has more than 50% of all mining power, the double spend is guaranteed to succeed eventually at any block depth. Below 50%, there is some probability of success, but it is usually only substantial at a depth up to about 2-5; for this reason, most cryptocurrency exchanges, gambling sites and financial services wait until six blocks have been produced ("six confirmations") before accepting a payment.
SPV client (or light client) - a client that downloads only a small part of the blockchain, allowing users of low-power or low-storage hardware like smartphones and laptops to maintain almost the same guarantee of security by sometimes selectively downloading small parts of the state without needing to spend megabytes of bandwidth and gigabytes of storage on full blockchain validation and maintennance.

Ethereum Blockchain

Serialization: the process of converting a data structure into a sequence of bytes. Ethereum internally uses an encoding format called recursive-length prefix encoding (RLP), described here
Patricia tree (or trie): a data structure which stores the state of every account. The trie is built by starting from each individual node, then splitting the nodes into groups of up to 16 and hashing each group, then making hashes of hashes and so forth until there is one final "root hash" for the entire trie. The trie has the important properties that (1) there is exactly one possible trie and therefore one possible root hash for each set of data, (2) it is very easy to update, add or remove nodes in the trie and generate the new root hash, (3) there is no way to modify any part of the tree without changing the root hash, so if the root hash is included in a signed document or a valid block the signature or proof of work secures the entire tree, and (4) one can provide just the "branch" of a tree going down to a particular node as cryptographic proof that that node is indeed in the tree with that exact content. Patricia trees are also used to store the internal storage of accounts as well as transactions and uncles. See here for a more detailed description.
GHOST: GHOST is a protocol by which blocks can contain a hash of not just their parent, but also hashes for stales that are other children of the parent's parent (called uncles). This ensures that stales still contribute to blockchain security, and mitigates a problem with fast blockchains that large miners have an advantage because they hear about their own blocks immediately and so are less likely to generate stales.
Uncle: a child of a parent of a parent of a block that is not the parent, or more generally a child of an ancestor that is not an ancestor. If A is an uncle of B, B is a nephew of A.
Account nonce: a transaction counter in each account. This prevents replay attacks where a transaction sending eg. 20 coins from A to B can be replayed by B over and over to continually drain A's balance.
EVM code: Ethereum virtual machine code, the programming language in which accounts on the Ethereum blockchain can contain code. The EVM code associated with an account is executed every time a message is sent to that account, and has the ability to read/write storage and itself send messages.
Message: a sort of "virtual transaction" sent by EVM code from one account to another. Note that "transactions" and "messages" in Ethereum are different; a "transaction" in Ethereum parlance specifically refers to a physical digitally signed piece of data that goes in the blockchain, and every transaction triggers an associated message, but messages can also be sent by EVM code, in which case they are never represented in data anywhere.
Storage: a key/value database contained in each account, where keys and values are both 32-byte strings but can otherwise contain anything.
Externally owned account: an account controlled by a private key. Externally owned accounts cannot contain EVM code.
Contract: an account which contains, and is controlled by, EVM code. Contracts cannot be controlled by private keys directly; unless built into the EVM code, a contract has no owner once released.
Ether: the primary internal cryptographic token of the Ethereum network. Ether is used to pay transaction and computation fees for Ethereum transactions.
Gas: a measurement roughly equivalent to computational steps. Every transaction is required to include a gas limit and a fee that it is willing to pay per gas; miners have the choice of including the transaction and collecting the fee or not. If the total number of gas used by the computation spawned by the transaction, including the original message and any sub-messages that may be triggered, is less than or equal to the gas limit, then the transaction processes. If the total gas exceeds the gas limit, then all changes are reverted, except that the transaction is still valid and the fee can still be collected by the miner. Every operation has a gas expenditure; for most operations it is 1, although some expensive operations fave expenditures up to 100 and a transaction itself has an expenditure of 500.

Non-blockchain

EtherBrowser: the upcoming primary client for Ethereum, which will exist in the form of a web browser that can be used to access both normal websites and applications built on top of the Ethereum platform
Whisper: an upcoming P2P messaging protocol that will be integrated into the EtherBrowser
Swarm: an upcoming P2P data storage protocol optimized for static web hosting that will be integrated into the EtherBrowser
Solidity, LLL, Serpent and Mutan: prorgamming languages for writing contract code which can be compiled into EVM code. Serpent can also be compiled into LLL.
PoC: proof-of-concept, another name for a pre-launch release

Surrounding concepts: applications and governance

Decentralized application: a program which is run by many people which either uses or creates a decentralized network for some specific purpose (eg. connecting buyers and sellers in some marketplace, sharing files, online file storage, maintaining a currency). Ethereum-based decentralized applications (also called Đapps, where the Đ is the Norse letter "eth") typically consist of an HTML/Javascript webpage, and if viewed inside the EtherBrowser the browser recognizes special Javascript APIs for sending transactions to the blockchain, reading data from the blockchain and interacting with Whisper and Swarm. A Đapp typically also has a specific associated contract on the blockchain, though Đapps that facilitate the creation of many contracts are quite possible.
Decentralized organization: an organization that has no centralized leadership, instead using a combination of formal democratic voting processes and stigmergic self-organization as their primary operating principles. A less impressive but sometimes confused concept is a "geographically distributed organization", an organization where people work far apart from each other and which may even have no office at all; GDOs can still have formal centralized leadership.
Theseus criterion: a test for determining how decentralized an organization is. The test is as follows: suppose the organization has N people, and aliens try to pick K people in the organization at a time (eg. once per week) and zap them out of existence, replacing each group with K new people who know nothing about the organization. For how high K can the organization function? Dictatorships fail at K = 1 once the dictator is zapped away; the US government does slightly better but would still be in serious trouble if all 638 members of the Senate and Congress were to disappear, where as something like Bitcoin or BitTorrent is resilient for extremely high K because new agents can simply follow their economic incentives to fill in the missing roles. A stricter test, the Byzantine Theseus criterion, involves randomly substituting K users at a time with malicious actors for some time before replacing them with new users.
Decentralized autonomous organization: decentralized organizations where the method of governance is in some fashion "autonomous", ie. it's not controlled by some form of discussion process or committee.
No common language criterion: a test for determining how autonomous an organization is. The test is as follows: suppose the N people in the organization speak N different languages, with none in common. Can the organization still function?
Delegative democracy (or liquid democracy): a governance mechanism for DOs and DAOs where everyone votes on everything by default, but where individuals can select specific individuals to vote for them on specific issues. The idea is to generalize the tradeoff between full direct democracy with every individual having equal power and the expert opinion and quick decision making ability provided by specific individuals, allowing people to defer to friends, politicians, subject matter experts or anyone else to exactly the extent that they want to.
Futarchy: a governance mechanism originally proposed by Robin Hanson for managing political organizations, but which is actually extremely applicable for DOs and DAOs: rule by prediction market. Essentially, some easily measurable success metrics are chosen, and tokens are released which will pay out some time in the future (eg. after 1 year) depending on the values of those success metrics, with one such token for each possible action to take. Each of these tokens are traded against a corresponding dollar token, which pays out exactly $1 if the corresponding measure is taken (if the corresponding measure is not taken, both types of tokens pay $0, so the probability of the action actually being taken does not affect the price). The action that the market predicts will have the best results, as judged by the token's high price on its market, is the action that will be taken. This provides another, autonomous, mechanism for selecting for and simultaneously rewarding expert opinion.

Economics

Token system: essentially, a fungible virtual good that can be traded. More formally, a token system is a database mapping addresses to numbers with the property that the primary allowed operation is a transfer of N tokens from A to B, with the conditions that N is non-negative, N is not greater than A's current balance, and a document authorizing the transfer is digitally signed by A. Secondary "issuance" and "consumption" operations may also exist, transaction fees may also be collected, and simultaneous multi-transfers with many parties may be possible. Typical use cases include currencies, cryptographic tokens inside of networks, company shares and digital gift cards.
Namespace: a database mapping names to values. In the simplest example anyone can register an entry if the name has not already been taken (perhaps after paying some fee). If a name has been taken then it can only be changed (if at all) by the account that made the original registration (in many systems, ownership can also be transferred). Namespaces can be used to store usernames, public keys, internet domain names, token systems or other namespaces, and many other applications.
Identity: a set of cryptographically verifiable interactions that have the property that they were all created by the same person
Unique identity: a set of cryptographically verifiable interactions that have the property that they were all created by the same person, with the added constraint that one person cannot have multiple unique identities
Incentive compatibility: a protocol is incentive-compatible if everyone is better off "following the rules" than attempting to cheat, at least unless a very large number of people agree to cheat together at the same time.
Basic income: the idea of issuing some quantity of tokens to every unique identity every period of time (eg. months), with the ultimate intent being that people who do not wish to work or cannot work can survive on this allowance alone. These tokens can simply be crafted out of thin air, or they can come from some revenue stream (eg. profit from some revenue-generating entity, or a government); in order for the BI to be sufficient for a person to live on it alone, a combination of revenue streams will likely have to be used.
Public good: a service which provides a very small benefit to a very large number of people, such that no individual has a large effect on whether or not the good is produced and so no one has the incentive to help pay for it.
Reputation: the property of an identity that other entities believe that identity to be either (1) competent at some specific task, or (2) trustworthy in some context, ie. not likely to betray others even if short-term profitable.
Web of trust: the idea that if A highly rates B, and B highly rates C, then A is likely to trust C. Complicated and powerful mechanisms for determining the reliability of specific individuals in specific concepts can theoretically be gathered from this principle.
Escrow: if two low-reputation entities are engaged in commerce, the payer may wish to leave the funds with a high-reputation third party and instruct that party to send the funds to the payee only when the product is delivered. This reduces the risk of the payer or payee committing fraud.
Deposit: digital property placed into a contract involving another party such that if certain conditions are not satisfied that property is automatically forfeited to the counterparty
Hostage: digital property placed into a contract such that if certain conditions are not satisfied that property is automatically either destroyed or donated to charity or basic income funds, perhaps with widely distributed benefit but necessarily with no significant benefit to any specific individual.