Introduction to Spreadsheet
Literal data, variables, value-, format- and parse-expressions (formulas) are stored in the cells of an HTML table. The user provides input for variables, the browser's scripting engine evaluates expressions and renders the results in the table cells.
TL;DR
- Getting Started
- Examples
Features
Expressions are valid JavaScript code and have full access to the entire JavaScript language, as well as many useful functions provided by the spreadsheet-core.js module. For example, the following expression converts an angle in row 1, column 1 from degree to radian value:
cell(1,1)/180*Math.PI
Angle | |
---|---|
degree | radian |
0 | ? |
90 | ? |
180 | ? |
270 | ? |
360 | ? |
Another example computes the length of the hypothenusis in a rectangular triangle after the rule of Pythagoras:
Math.sqrt(offsetCell(-2,0)**2 + offsetCell(-1,0)**2)
Triangle | |
---|---|
cathetus a | 3 |
cathetus b | 4 |
hypotenuse | ? |
To edit the cathetus values, click on the cells in column 1 and enter a numric value.
All expressions are evaluated by the browser's scripting engine. There is no home-grown expression interpreter, therefore, expression evaluation is fast and reliable.
Expression evaluation in spreadsheet-core is extensible – with ordinary JavaScript modules. So users and 3rd-parties can write their own function libraries in JavaScript and make them directly available to evaluation. This is dead-simple: Any regular JavaScript function can be used as user-defined function (UDF). Dynamic UDF module registration is a simple spreadsheet-core function call. Static UDF module import into spreadsheet-core is even simpler.
Modularity
spreadsheet-core.js is modular. Modules can be loaded statically and/or dynamically and on-demand. In combination with a fine-grained module structure, this ensures the smallest possible memory footprint and load times.
The modules:
- spreadsheet-core.js
- Core functionality (initialization, row and column label generation, expression evaluation across multiple spreadsheet tables, input parsing, output formatting, data input and event-triggered re-evaluation, static and dynamic JavaScript module import.
- This minimum feature set provides fully-functional interactive spreadsheets with predefined structure and expressions provided by the HTML author. Users can enter data, spreadsheets are recalculated autmatically.
- Expressions can address the cells of all other spreadsheets on the same HTML page. Relative and absolute addressing modes are available.
- Typical application: Calulators, from simple to complex.
- Example: Test 4 - Data Input and Expression Evaluation.
- Status: Beta. Initial tests passed, seems to be reliable, but still under construction.
- spreadsheet.css
- Not strictly a JavaScript module, but provides common styling attributes for spreadsheet tables.
- Of course, you may use your own stylesheet or override provided rules.
- spreadsheet-utility.js
- Useful functions to simplify coding.
- Status: Beta. Names may change at some time.
- spreadsheet-functions.js
- A static import module with user-defined functions, extends the set of functions provided by spreadsheet-core.js. These functions are avalable to all spreadsheets.
- Developers simply add functions to this module, or delete the ones they don't need. No configuration or registration needed.
- Status: Working. More functions will be added over time.
- spreadsheet-udf.js
- A dynamic import module template for more specialized functions. Typically used to package (free and paid) 3rd-party function libraries for a specific field, such as finance, statistics, simulation et cetera. Modules like this are dynamically imported into spreadsheet-core where needed.
- Status: Working. More functions will be added over time.
- spreadsheet-persistence.js
- Adds the funcionality to load spreadsheets from and store in the local file system.
- Spreadsheet files are stored as plain HTML text. When loaded from disk, the file is verified to be a valid HTML spreadsheet table. Value, format and parse expressions are removed by default, but the user can cancel a confirmation dialog to accept them.
- A HTML page developer may choose to load the spreadsheets dynamically into the page. This keeps the source code clean and minimizes the initial page load time.
- Status: Beta. Works, needs documentation.
- spreadsheet-data-exchange.js
- Import and export between commercial spreadsheet applications and spreadsheet-core.js.
- Static cell values should not be a problem. Expressions could be, because JavaScript is somewhat different from what is used in E***el and G**gle Sh**ts. Some sort of translator will be required.
- Status: Planned.
- spreadsheet-interactive.js
- Resize, hide and show rows and columns, select cells and cell ranges.
- Should be relatively small.
- Status: Implementation.
- spreadsheet-structure.js
- Adds features to allow the user to modify the spreadsheet structure (create, delete, cut, copy and delete rows, columns and cells), create and edit cell expressions, control cell value formatting and alignment.
- Status: Implementation.
- spreadsheet-graphics.js
- SVG is the keyword. Data visualization is an important aspect of spreadsheet applications. Currently just an idea, but we could either generate SVG dynamically, or incorporate a 3rd-party library like DDD.
- Status: Idea
Implementation Decisions
Object-oriented design has proven to be successful in large applications. However, spreadsheet expressions in commercial applications only use cell references and functions. So I guess they won't benefit much from an object-oriented design.
So I have chosen to go for a functional approach, (nearly) all functions are defined in the module scope. As a result, spreadsheet-core resembles very much an object-oriented singleton, which I may have chosen anyway.
Spreadsheet-0 is an alpha version, a design study, usuable, but breaking changes may occur. For spreadsheet-core-1, decisions might be changed, as well as a transition from JavaScript to Typescript.
Security Considerations
spreadsheet-core.js uses JavaScript eval() to evaluate expression in spreadsheet tables. A common mindset considers eval (and its siblings, Function() for example) to be evil, for security reasons. So, why is eval() still available in JavaScript after so many years? At least, any spreadsheet application is based on editable formulas that are evaluated by some sort of evaluator. So aren't they vulnerable too?
From the MDN: "If you run eval() with a string that could be affected by a malicious party, you may end up running malicious code..."
That's the point: The input string is the problem, not the eval function by itself.
In spreadsheet-core, expressions are stored in HTML table cell attributes, which are not accessible from (malicious) 3rd parties due to Cross Origin Resource Sharing (CORS) protection. Expressions are built by the HTML page author and/or the user. Both are considered to be trusted parties.
Loading Pages From a Malicious Site
Loading a web page from a malicious website that contains spreadsheet tables does not introduce an additional security risk: An attacker can store malicious code anywhere in a web page, there is no particular need or advantage in using spreadsheet tables.
Accidentally loading a page with malicious code into the browser is not automatically a catastrophe. Browsers run code in a sandbox environment, with very limited access to the system and its resources. Additionally, browsers "know" malicious sites, warn and prevent unintended access.
The general advice is to heed the browser warnings and stay safe.
Loading Spreadsheet Import Files
Users can be load stored spreadsheets from disk or an arbitrary URL.
Attention! Loading spreadsheets from untrusted 3rd parties can be dangerous!
As explained above, input to the eval function is the potential risk, and spreadsheet import files can contain malicious code. Therefore, the spreadsheet-persistence.js module takes a number of measures to keep a user safe:
- Spreadsheet import files are unencrypted plain text (html) files. They are fully accessible to virus scanners and the user.
- Upon import, all expressions are removed by default, before integration into the DOM of the page.
- All script elements are removed by default.
- Spreadsheet import file URLs different from the current document origin (= trusted site) are rejected by default.
This makes spreadsheet as similarly safe as regular spreadsheet applications, where macros are disabled by default.
If a user wants to reject the defaults (which is the equivalent of accepting macros in a spreadsheet application), he should follow these guidelines to stay on the safe side:
- Make sure to use the encrypted https:// protocol, not the unsave http://, to prevent man-in-the-middle attacks.
- Make sure the document into which you want to import the spreadsheet file is from a trusted source, such as your company webserver or a trusted 3rd-party webserver.
- Check the spreadsheet import file URL, and identify and only accept files from trusted parties.
Usually these two steps should be sufficient, but you always can and should go further:
- Download the spreadsheet and store it in the local file system. This is safe, because a spreadsheet file is plain text.
- If you are unsure, run a virus scanner on that file. Modern operating system will do that anyway before you can get hold of the file.
- Inspect the file with a plain text editor, such as Notepad, Nano or vi. This is safe because plain text editors don't execute text files.
- Search for data-xpr, data-format and data-parse attributes. These are the elements that may contain expressions. Have a close look on them, and when in doubt, delete the attribute, reject the file or ask someone with JavaScript expertise.
- Search for <script> tags in the file. There should be none. If you find one, inspect the code closely. Obfuscated code is suspicious and hard to understand. If you are in doubt, delete the script, ask some expert or reject the file.
- Delete scripts that link to spreadsheet modules. Spreadsheet import files are indended to be loaded into to the DOM of a trusted HTML file, which provides the necessary infrastructure for spreadsheet tables. Of course must be from a trusted source and safe, such as your company webserver or a trusted 3rd party webserver.
- Search for <a> tags. These are links. Inspect their href attribute. If they reference suspicious sites, delete the href attribute or reject the file entirely.
Row and Column Numbering
Worksheet rows and columns are logically numbered, starting from 0. Cell(0,0) is the first cell in the upper left corner.
0,0 | 0,1 | 0,2 | 0,3 |
1,0 | 1,1 | 1,2 | 1,3 |
2,0 | 2,1 | 2,2 | 2,3 |
Usually logial cell coordinates correspond to the physical row and cell indices in the table.rows and row.cells arrays. For example, cell(1,2) is addressed in code with table.rows[ 1 ].cells[ 2 ].. But cells can span multiple rows and/or columns. This has the effect that some row.cells arrays can have fewer cells than table columns:
0,0 | 0,1 | 0,2 | 0,3 |
1,0 | 1,1 | 1,3 | |
2,0 | 2,3 | ||
3,0 | 3,1 | 3,2 | 3,3 |
As a result, locial cell column numbers will differ from physical cell index values. In the example above, rows[ 1 ].cells has only three elements, rows[ 2 ].cells has only two elements. The logical cell(1,3) is accessed with table.rows[ 1 ].cells[ 2 ], and cell(2,3) is accessed in code with table.rows[ 2 ].cells[ 1 ].
Also note that row indices are not affected. Rows are always present, cells may not.
A spreadsheet may have row and column labels. They are implemented with regular HTML table cells, but they don't affect the logical numbering of workarea cells. As a result, programmatic row and cell index values will differ by 1 from locigal row and column numbers.
0 | 1 | 2 | 3 | |
---|---|---|---|---|
0 | 0,0 | 0,1 | 0,2 | 0,3 |
1 | 1,0 | 1,1 | 1,2 | 1,3 |
2 | 2,0 | 2,1 | 2,2 | 2,3 |
Formulas in the spreadsheet always use logial row and column numbers to address cells, because they don't change when a label row or column is added or removed. Additionally, this logical numbering is what we are used to from existing popular spreadsheet applications.
To make logical addressing possible, the initSpreadsheet method calculates logical row and column numbers and stores them in row.dataset.row and cell.dataset.col fields. Note that these dataset members reflect the data-row and data-col HTML element attributes, they are visible in the DOM inspector in the browser developer tools.
Anatomy of a Cell
The following list contains a concise description of cell elements that participate in cell value presentation, storing and editing.
- Cell.spreadheetCore.value
- Holds the cell value as native JavaScript object.
- To obtain a textual representation, the value.toString() method is used.
- I have favoured this method over a data attribute to prevent a loss of precision, minimize processing overhead for expression evaluation, and keep native type information.
- Cell.innerHTML
- determines the visual representation of the value, possibly decorated with formatting tags such as <i> and <b>.
- Cell.innerText
- A plain text representation of the value (Cell.spreadsheetCore.value.toString()) while editing. The value is set in the focusin event handler of the spreadheet table, and parsed in the focusout event handler. The parsing result determines the data type.
- data-parse Attribute
- The page author can demand a JavaScript data type for the cell value. The attribute usually contains a JavaScript constructor call like Date(value), Number(value) or Boolean(value), but actually can hold arbitrary script code.
- The script is executed with cell.innerText in the first argument when cell editing is finished, i. e. in the focusout event handler.
- data-format Attribute
- This attribute is used to create the string for cell.innerHTML. The attribute contains JavaScript code that is evaluated to a string. The variable name "value" is used to access the cell.spreadsheetCore.value.
- Some examples: toFixed( value, 2 ), value.toLocaleDateString(), value.toISOString(), value.toISOString().substr(0,10), "<b><i>"+value+</b></i>.
- More details in the Formatting Cell Values chapter.
- data-xpr Attribute
- This attribute contains an expression, the cell value is calculated from other cell values or literal values. When a cell value has been modified, all expressions on the page must be re-evaluated.
Formatting Cell Values
As already mentioned, the data-format attribute contains JavaScript instructions to convert the binary cell.spreadsheetCore.value into a text representation for cell.innerHTML. Whenever cell.spreadsheetCore.value is changed, the formatting result is written to cell.innerHTML.
This format/parse architecture was borrowed from the databinding of the .NET Framework where it works very well. The formatting code can leverage the full potential of JavaScript to get the value formatted. A few useful formatting support functions were implemented in spreadsheet-core.js. Others can be added by the page developer.
Spreadsheet Core Formatting Functions
At the time of writing, these formatting support functions are available.
- toFixed( numberValue, decimalPlaces )
- Convertes a number value into a floating point representation with the specified number of decimal places.
- italic( value )
- Surrounds value with <i> tags.
- bold( value )
- Surrounds value with <b> tags.
- underline( value )
- Surrounds value with <u> tags.
Author-Supplied Formatting Modules
The page author may supply his own JavaScript module(s) with custom formatting functions.
Formatting Examples
Numbers | ||
---|---|---|
123.456789 | toFixed( value, 2 ) | 123.46 |
12.3 | toFixed( value, 2 ) | 12.30 |
12345678 | value.toString( 16 ) | bc614e |
12345678 | value.toExponential(4) | 1.2346e+7 |
Date | ||
2023-01-27T10:22:19Z | value.toString() | Fri Jan 27 2023 11:22:19 GMT+0100 (Central European Standard Time) |
value.toISOString() | 2023-01-27T10:20:19.000Z | |
value.toISOString().substr(0, 10) | 2023-01-27 | |
value.toLocaleString( "en-US" ) | 1/27/2023, 11:22:19 AM | |
value.toLocaleString( "de-DE" ) | 27.1.2023, 11:22:19 | |
Decoration | ||
123 | italic(value) | <i>123</i> |
bold(value) | <b>123</b> | |
bold(italic(value)) | <b><i>123</i></b> |
User Input Parsing
The type of a cell value is defined by the user by entering some text or the evaluation of a formula, but the document author can provide the data-parse attribute to enforce that the cell value to be stored in cell.spreadsheetCore.value is an object of a specific kind, such as Number, BigInt, String, Boolean, or Date.
The result of a formula evaluation usually provides a specific data type, but user input is just plain text. If no data-parseattribute was provided, the cell data type will have to be guessed. In order to support the type guessing algorithm, the user can provide control characters in the first two characters of the input text:
- n$
- Number
- i$
- Big Integer
- s$
- String
- b$
- Boolean
- d$
- Date
- =$
- Formula
If no prefix was given, some attempts are mate to convert the input text into typed objects, in the following order:
- Number (because spreadsheets are about crunching numbers in the first place)
- Boolean (because that can easily be determined, only true and false are allowed)
- String (everything else)
Everything else cannot be guessed, because the type constructors for e. g. Date accept nearly everything that can remotely be converted to a Date.
Cell Editing
The cell, when not being edited, shows the formatted value in cell.innerHTML. Therefore the displayed value may have reduced precsion, e. g. due to rounding, and even may be decorated with HTML tags.
When a cell is edited, the unformatted, the conditionally prefixed raw value appears in the cell. The prefix indicates the data type:
- Single quote (')
- Indicates a string value
- Equal sign (=)
- Indicates an expression
- Otherwise
- The cell contains a number
When editing is finished, the raw value in the cell is parsed. The prefix character will determine the data type. If there is a no prefix, the data type is guessed. The cell.innerText is then passed to related type constructor (Number, BigInt, Boolean, Date or String), the result is then stored in cell.spreadsheetCore.value.
If there is an equal sign, the cell.innerText is stored in the cell.dataset.xpr member, the expression is evaluated and the result is stored in cell.spreadsheetCore.value as JavaScript object. Finally, this value is formatted according to cell.dataset.format and written to cell.innerHTML.
the cell.innerText is stored as innerHTML to allow custom text formatting. The cell value will then parsed and stored as
Cell Value Change and Re-Evaluation
The following HTML code shows user input fields and a computed cell with an xpr attribute:
<table class="flat-borders"><thead><tr> <th colspan="2">Equation Demo</th> </tr></thead><tbody><tr> <th>A</th> <tdcontenteditable>12</td> </tr><tr> <th>B</th> <tdcontenteditable>34</td> </tr><tr> <th>Sum</th> <tdxpr="cell(-2,0)+cell(-1,0)"></td> </tr></tbody></table>
Whenever the input cell value is changed, the result cell is updated. Effectively, it should work as in the following example:
Equation Demo | |
---|---|
A | 12 |
B | 34 |
Sum | 46 |
Although, the computation has been faked here with a hard-coded instruction rather than the evaluation of an equation. Here is the code:
// Find the spreadsheet tables const spreadsheets = document.getElementsByClassName( "spreadsheet" ); for ( let i = 0 ; i < spreadsheets.length ; i ++ ) {// Add input event handlers spreadsheets[ i ].addEventListener( "input", function ( evt ) {// Input event: Calculate the new sum and update the result cell this.rows[ 3 ].cells[ 1 ].innerText = parseInt( this.rows[ 1 ].cells[ 1 ].innerText ) + parseInt( this.rows[ 2 ].cells[ 1 ].innerText ); } ) ; }
The input event fires with every keystroke. The eventhandler performs the computation and updates the result cell. In reality, all equations of a spreadsheet must be evaluated. Or at least all equations that directly or indirectly reference the modified input cell. In a complex worksheet, this might take a considerable amount of time.
The change event occurs when the cell value is commited by pressing enter or when the input element looses focus. Unfortunately, this event is not fired by the td elements, so we will have to resort to a different stragety: A timeout event can be registered that updates the spreadsheet table after a short while of input inactivity.
Expression Syntax
Preparing Equations for Evaluation
A computed cell has a formula associated, like this:
<td equation="2 * (cell(-2,0) + cell(-1,0))"
This formula addresses cells in the two rows above the current cell, same column. This is relative addressing, the parameters of the cell function are considered offset values. This
<td equation="2 * (cell($0$1) + cell($1$1))"
The equation accesses cells in row 0, column 1 and row 1, column 0. Thi
Maintaining Cell References
Whenever rows or columns are inserted or deleted, cell references in equations will be broken. Without countermeasures, a spreadsheet will be rather useless. In this section we will develop a strategy how cell references can be maintained and kept valid.
Observations:
1. If we insert a row (column), references that go across this new row (column), are broken.
Differences to Spreadsheet Applications
- Cell References
- Rows and columns are addressed by numbers, starting from 0. Traditional spreadsheet applications use letters for columns and numbers for rows.
- Cells are addressed with functions such as cell(row, column), where other spreadsheets use letters, followed by numbers, i. e. in a column-row format.