Today we have a question about what they might have been doing with the R1 computer. It involves something (safe to say) I’d never expected to be writing about: Markov chain Monte Carlo sampling.

Many thanks to reader Bill Harris ’71 for raising this issue of the possible role of Zevi Salsburg (one of my heroes) and the R1 computer in the history of statistics. (Harris earlier sent in some great photographs of the R1, seen here.) He first asked it in an email to me. This is the relevant part, with the key question at the end:

When I was one of Dr. Salsburg’s student computer operators in the late 1960s, I recall being told that his programs did some sort of stochastic modeling of molecules, that they ran for 1,000 to 2,000 hours and that we had to run each program twice to be sure we had the same (or was it consistent?) answers. Reading that abstract and Betancourt’s article made me wonder if Dr. Salsburg was doing early MCMC work on the R1 and the two runs weren’t done because the hardware might be flaky, as I thought people had said, but because you typically run a program multiple times (each run called a “chain”) to be sure the Monte Carlo algorithm had “converged.”

Last weekend the discussion was amplified and new detail added here at Andrew Gelman’s blog, Statistical Modeling, Causal Inference, and Social Science. If this makes sense to you or if you know someone who might be able to shed some light on the matter please do let me know. I realize that this is a wee bit arcane but it’s also a chance to learn something important about computers at Rice.

Modern computing systems have extra hardware for detecting, and sometimes correcting, bit errors. These can be on internal connections, on storage devices, networks, lots of places. In this case “modern” is something like the most recent 50 or 60 years of computing. Each time a new, cheaper kind of computer arrived (minicomputers, PCs), the first models did without error checking to save money.

With a hand-built computer like the R1, errors were possible and there might not have been error-detecting circuitry. Running the program twice would have been a reasonable precaution. Starting the pseudo-random number generators with the same seeds should mean both runs would get the exact same results.
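That check is easy to sketch in a few lines of modern Python (a toy stand-in, of course, nothing like R1 code): seeding the pseudo-random generator makes a stochastic run exactly repeatable, so if two identically seeded runs disagree, the finger points at the hardware.

```python
import random

def noisy_sum(seed, n=1000):
    """Toy stand-in for a stochastic simulation: the sum of n pseudo-random draws."""
    rng = random.Random(seed)  # independent generator with a fixed seed
    return sum(rng.random() for _ in range(n))

# Two runs with the same seed reproduce each other bit for bit, so any
# disagreement would indicate a hardware error, not the algorithm.
run1 = noisy_sum(seed=42)
run2 = noisy_sum(seed=42)
assert run1 == run2
```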

The weekend after I started at HP in 1985, a memory board in my personal Unix computer failed. The “quarter pounder” board (256 kbytes) did not have error checking, so the computer kept using it. My computer was handling UUCP message traffic all weekend, so the bad memory wrote junk in the most-used message files. That was a sad machine on Monday morning. And a very annoyed engineer. I disconnected my disk drive and rolled it over to Don Bennett’s cube to fsck the thing.

Walter, that’s a credible explanation and the one I accepted at the time. The R1 was using core memory by then, though, and I don’t recall hearing as much about errors in core memory as I have with semiconductor RAM, or as I can imagine with Williams tubes.

The problem in MCMC is that there is no proof that the sampling has converged, though, and the conventional approach is to run multiple “chains” starting from different initialization states and compare the results. There is a Gelman-Rubin R-hat statistic that’s used to help in the assessment these days; in the past, I think one compared the variance of the sampled results between chains. If this really was MCMC, as Carlos Ungil supports on the linked blog posting, then it also seems plausible that Dr. Salsburg was running two chains and checking convergence. As flaky as hardware can be, an MCMC sampler can probably match or exceed it. Or perhaps he did it for both reasons.
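For the curious, here is a minimal sketch of the multiple-chains idea in Python: a random-walk Metropolis sampler started from two overdispersed points, with a hand-rolled Gelman-Rubin R-hat. The target distribution and all names here are illustrative, not anything from Salsburg’s actual programs.

```python
import math
import random

def metropolis_chain(seed, n=20000, start=0.0):
    """Random-walk Metropolis targeting a standard normal (a stand-in target)."""
    rng = random.Random(seed)
    x, out = start, []
    for _ in range(n):
        prop = x + rng.gauss(0, 1)
        # accept with probability min(1, pi(prop)/pi(x)) for pi = N(0,1)
        if math.log(rng.random()) < (x * x - prop * prop) / 2:
            x = prop
        out.append(x)
    return out

def r_hat(chains):
    """Gelman-Rubin potential scale reduction factor across chains."""
    n = len(chains[0])
    means = [sum(c) / n for c in chains]
    grand = sum(means) / len(chains)
    b = n * sum((m - grand) ** 2 for m in means) / (len(chains) - 1)  # between
    w = sum(sum((x - m) ** 2 for x in c) / (n - 1)
            for c, m in zip(chains, means)) / len(chains)             # within
    return math.sqrt(((n - 1) / n * w + b / n) / w)

# Two chains from deliberately overdispersed starting points.
chains = [metropolis_chain(seed=1, start=-10.0),
          metropolis_chain(seed=2, start=10.0)]
rhat = r_hat(chains)
print(rhat)  # close to 1.0 once the chains have mixed
```

Values of R-hat much larger than 1 signal that the chains are still exploring different regions; comparing between-chain and within-chain variance is exactly the older check described above.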

I don’t recall anything about ECC on the R1; maybe someone else has recollection about that.

Marty Graham had designed an error checking system for the R1 Radechon CRT memory using seven bits of each memory word. That system made it possible to use the CRT tubes long past the time they would otherwise have been replaced. Single bit errors were detected and corrected, two or more bit errors could be detected but not corrected. When we added the RCA and Ampex core memory systems I designed the interface so that the data path still went through the error checking logic, but as far as I remember we never detected a random bit error in the core memory, and of course by that time the Radechon tube memory was no longer used.
As far as we know, the R1 was the first computer to use an error-correcting system. Sometime in the early 1960s Richard Hamming visited to give a lecture on his work at Bell Labs, and when he was shown the R1 he expressed amazement that his code had been put to use.
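The principle behind that system can be illustrated with the smallest Hamming code, Hamming(7,4). The R1’s actual scheme used seven check bits on a full memory word; this is just a scaled-down sketch of how a syndrome locates, and therefore corrects, a single flipped bit.

```python
def hamming74_encode(d):
    """Encode 4 data bits (list of 0/1) into a 7-bit Hamming codeword."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4           # parity over positions 1,3,5,7
    p2 = d1 ^ d3 ^ d4           # parity over positions 2,3,6,7
    p3 = d2 ^ d3 ^ d4           # parity over positions 4,5,6,7
    return [p1, p2, d1, p3, d2, d3, d4]  # positions 1..7

def hamming74_correct(c):
    """Locate and flip a single bad bit; returns (corrected word, error position)."""
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    pos = s1 + 2 * s2 + 4 * s3  # syndrome = 1-based error position; 0 = clean
    if pos:
        c = c.copy()
        c[pos - 1] ^= 1
    return c, pos

word = hamming74_encode([1, 0, 1, 1])
corrupted = word.copy()
corrupted[4] ^= 1               # flip one bit in "memory"
fixed, pos = hamming74_correct(corrupted)
assert fixed == word and pos == 5
```

Adding one more overall parity bit turns this into the single-error-correct, double-error-detect behavior described above.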

I can’t “shed light on the matter”, but maybe a former Dean of Engineering can. She’s a statistician by trade, but of course she was not here in the 60s. She served prior to Dean Ned Thomas. I don’t know where she is today. This was her title back in the day.

Dr. Sallie Keller-McNulty, Professor of Statistics and Dean, George R. Brown School of Engineering, Rice University

Melissa, I tried to leave a comment about my thesis which I think would be of interest for this topic, especially to Bill Harris. I wrote the original set of programs for the Monte Carlo procedure using Markov chains, and ran many hundreds of hours for the calculations described in my thesis.

“This dissertation (63-7156) has been microfilmed exactly as received. CHESNUT, Dwayne Allen, 1936- A MONTE CARLO PROCEDURE FOR STATISTICAL MECHANICAL CALCULATIONS IN A RESTRICTED GRAND CANONICAL ENSEMBLE OF LATTICE SYSTEMS. Rice University, Ph.D., 1963. Chemistry, physical. University Microfilms, Inc., Ann Arbor, Michigan”

I can email a pdf of the entire 175 pages to anyone who is interested.



Many thanks!! This afternoon I managed to dig up a list of Zevi’s publications from 1968. I’m assuming this is you: Chestnut and Salsburg, “A Monte Carlo Procedure for the Computation of Averages in the Grand Canonical Ensemble,” J. Chem. Phys., 38, 2861 (1963). Yes?

That is correct. Yesterday I sent a long email to Rice History Corner containing the table of contents to my thesis, which has much more detail than the paper about the Markov Chain Monte Carlo method. I thought that might be of interest to you and Bill Harris. I can send a PDF to anyone who is interested in having the entire lengthy document.

There’s also an earlier paper: Salsburg, Jacobson, Fickett and Wood, “Application of the Monte Carlo Method to the Lattice-Gas Model. I. Two-dimensional Triangular Lattice,” J. Chem. Phys., 30, 65 (1959).

Dwayne, I would be interested in seeing it, although I don’t know how much I’d understand (while I liked chemistry, I stopped after CHEM 100 with Salsburg and Sass). Instead of emailing, I wonder if it would be of use for Melissa to host a copy on her domain so that others can see it, or is that overstepping what you’d like to see or what rights go along with the document?

I would be very happy if Melissa could host a copy of my thesis. I don’t know about the copyrights — it is possible that University Microfilms has some rights, but I suspect Melissa would know that.

Much of the thesis is a lot of detail on how the system was represented in the computer and the statistical issues, rather than chemistry. The math gets a bit tedious for connecting thermodynamic properties to the Markov chain averages, but you can sort of scan through that part, Bill.

I looked up the Chestnut and Salsburg paper from 1963. The first sentence of its second paragraph says it all.
“The Monte Carlo method as used in statistical mechanics consists of the generation of a Markov chain of successive states of the system being studied.”
So, they were using an MCMC (Markov chain Monte Carlo) method. MCMC goes back to the early 1950s, when the Metropolis algorithm for stochastic simulations was introduced. So MCMC was first used in physics, and it has become popular more recently in many areas with the rise of Bayesian methods, whose computations typically rely on Markov chain sampling.
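The heart of the Metropolis algorithm is a single acceptance rule: accept a proposed state with probability min(1, exp(-βΔE)). A toy sketch in Python, using a one-particle harmonic well rather than the lattice systems of the thesis (all names here are illustrative):

```python
import math
import random

def metropolis_step(energy, state, propose, beta, rng):
    """One Metropolis update: accept with probability min(1, exp(-beta * dE))."""
    new = propose(state, rng)
    dE = energy(new) - energy(state)
    if dE <= 0 or rng.random() < math.exp(-beta * dE):
        return new              # accept the move
    return state                # reject; the current state is counted again

# Toy system: one particle in a harmonic well, E(x) = x^2 / 2.
rng = random.Random(0)
x, samples = 5.0, []
for _ in range(50000):
    x = metropolis_step(lambda s: s * s / 2,
                        x,
                        lambda s, r: s + r.uniform(-1, 1),
                        beta=1.0, rng=rng)
    samples.append(x)

# Long-run average energy; equipartition says this should be near 1/2.
mean_E = sum(s * s / 2 for s in samples) / len(samples)
```

Averaging an observable over the chain, as above, is exactly how such simulations turn a Markov chain of states into thermodynamic quantities.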

The repetitive simulations were required because of the randomness of the method, not memory problems. One can get stuck at a potential answer only to discover with a different set of starting values that a better answer emerges.

>>> “The repetitive simulations were required because of the randomness of the method, not memory problems.” <<<

That's what I was thinking. If multiple runs of Monte Carlo simulations converge on the same answer, you have more confidence in the validity of that answer.

(Of course, uncorrected machine (hardware) errors are another sort of perturbation, which reliable error-correction aims to eliminate.)

BTW, there is no middle “t” in Chesnut. In my thesis there is a discussion of estimating the errors in the MC averages as a function of the number of times the system is sampled (i.e., the chain length). The small memory limited the size of the system that could be considered.
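One standard way to estimate that error as a function of chain length is the method of batch means, which handles the correlation between successive states of a chain. A sketch in Python on synthetic correlated data (illustrative only, not the specific method from the thesis):

```python
import math
import random

def batch_means_error(samples, n_batches=20):
    """Estimate the standard error of a correlated MC average via batch means."""
    m = len(samples) // n_batches
    batches = [sum(samples[i * m:(i + 1) * m]) / m for i in range(n_batches)]
    grand = sum(batches) / n_batches
    var_b = sum((b - grand) ** 2 for b in batches) / (n_batches - 1)
    return math.sqrt(var_b / n_batches)

# Synthetic correlated samples (an AR(1) process, standing in for a chain).
rng = random.Random(3)
x, chain = 0.0, []
for _ in range(40000):
    x = 0.9 * x + rng.gauss(0, 1)
    chain.append(x)

# The error estimate shrinks roughly as 1/sqrt(chain length).
err_full = batch_means_error(chain)
err_half = batch_means_error(chain[:20000])
```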
