In his *Discourse on the Method* (1637), Descartes proposed doubt as a systematic method for reaching the truth. He made the most of doubt: doubt every given notion, admitting as valid only the knowledge that resists every attempt to refute it.

Even though this method starts by demolishing pre-existing *truths*, it pursues a constructive goal: reaching those indubitable concepts that allow us to build our knowledge on top of them.

To put it shortly: *Destroy to build*. But to build with stronger, indeed nearly indestructible, blocks of knowledge.

In our daily work as programmers, we are constantly faced with existing knowledge: the source code. And the basic block of source code is a *method*. But this is not just a textual, static record of a particular definition – it is *computable*. Moreover, our development practices guarantee that every method is covered by several tests that ensure the expected invariants of that piece of code are honored.

As it happens with every logical corpus, the incorporation of a new concept usually makes us refute a previous one. In such cases we might decide to adapt our existing conceptions, or even to get rid of them, in order to accommodate the new ones.

But this is not the only situation in which we prune our tree of wisdom. The computable part forces us to pay special attention to performance: the same definition might be implemented in different ways, some faster than others. So we continually modify methods to improve execution performance.

In both cases we may hit code that we are simply not sure is really necessary. In such cases we proceed following a Cartesian approach: we just remove it. And we do it not only because we consider it unnecessary but also because our development infrastructure, and our tests in particular, ensures we can do so without worrying: if the code happens to be actually required, surely some test will fail and make us aware of it.

I find similarities not only in the act of removing code when in doubt about its use, but also in the more general Cartesian way of demolishing to construct. We often claim that *to program is to remove (code)*, and I find it more than just a work practice: I think it has philosophical implications. After all, we are constantly walking the path to knowledge.

Even in cases where appropriate big data is available, our models will only use it for the estimation of a few parameters. For others, not a single piece of recorded data even exists, because they are unique to our project. Unknowns and uncertainties always remain, no matter how big our data. And that’s OK; otherwise there wouldn’t be any project to begin with (projects only crystallize when risks and opportunities arise).

More importantly, models have to be of a human size. Models should allow us to make sense of complexity by exposing the underlying structure, much like Leonardo’s beautiful drawing where the complexity of the human body is framed within a circle and a square.

Notice how, using only two elementary geometric shapes, Leonardo conveys the idea that even complex natural structures can be better comprehended if we grasp their underlying model. Equally interesting is the idea that insightful relationships depend on few parameters (in this case, center, radius and side).

Now, at the end of the day, do we care about big data? Of course not! We just care about our project, its risks and opportunities, and how to model associated decisions. So then, let’s not forget that our models are made from Small Data: the parameters that matter because of their impact on our decisions.

Having clarity on which parameters are relevant is of paramount importance because these are the ones we have to focus on. That’s why we need flexible and agile features that would allow us to sense the importance of every knob in our model.

But let’s put some perspective here.

Nowadays everybody recognizes that we have to model our decisions. There is no discussion about that. The problem is how, right?

There are basically two ways of modeling decisions. The first consists of a series of calculations aimed at maximizing benefits. This is the approach taken by software packages that implement fairly sophisticated optimization algorithms. Super cool. Models of this kind are *optimizers*.

The second approach is more indirect: it aims at representing a wide range of structures. These are *simulations*. The strength of simulations resides in their ability to facilitate what-if questions, where the *if* part can embody a radical structural change, not just an adjustment of the inflation rate. Simulations are super cool too.

Optimizers seek *the* answer. Simulations facilitate hypothesis formulation and insight generation. If you are in a give-me-the-number situation, optimizers are for you. If you are faced with a hard decision and need to analyze and discover new possibilities, what you need is simulations.

While optimizers are better suited for consuming Big Data, simulators depend on a small set of key parameters that can be pondered in several scenarios, under different conditions.

One interesting distinction is that simulations *populate* big sets of simulated data. Two technical reasons support this claim. Firstly, simulations can populate thousands of objects from a small number of specifications (you only need a few prototypical wells to populate thousands in a field).

These objects are the *actors* of the *virtual world* created from the specs that make up our model. This generative capability is not limited to simple replicas (every automatically generated well will have its own behavior and evolve accordingly). Remember, you only need to model one molecule and a few natural laws to simulate the behavior of, say, an ideal gas.
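As a toy illustration of this populate-from-specs idea (a Python sketch, not PetroVR code; all names and numbers are hypothetical), two prototype definitions are enough to generate a whole field of wells, each with its own sampled behavior:

```python
import random

class WellPrototype:
    """A spec: ranges from which concrete wells are populated."""
    def __init__(self, name, rate_range, decline_range):
        self.name = name
        self.rate_range = rate_range        # initial rate, e.g. bbl/day
        self.decline_range = decline_range  # annual decline fraction

    def populate(self, count, rng):
        """Instantiate `count` wells, each with its own sampled traits."""
        return [
            {
                "prototype": self.name,
                "initial_rate": rng.uniform(*self.rate_range),
                "annual_decline": rng.uniform(*self.decline_range),
            }
            for _ in range(count)
        ]

rng = random.Random(7)
protos = [
    WellPrototype("vertical", (300, 600), (0.10, 0.20)),
    WellPrototype("horizontal", (800, 1500), (0.25, 0.45)),
]
# Two specs are enough to populate a whole simulated field.
field = [w for p in protos for w in p.populate(500, rng)]
print(len(field))  # 1000 wells from 2 prototypes
```

Each populated well carries its own parameters, so its simulated behavior diverges from its siblings even though they all come from the same spec.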

In addition to the population feature, simulations are especially well suited for mathematical techniques based on Small Data, Probability Distributions and Monte Carlo runs. Optimizers also use probabilities and Monte Carlo, but they are inherently restricted to changing values, not structure or behavior. Simulations naturally support uncertainties in timing and include conditional logic that dynamically creates radically different worlds. They explore plausible scenarios with different structures and traits.

Regarding Big Data, simulations have the property of generating a *continuum* of samples from which insights can be drawn. This is essential for exploring situations that are likely but haven’t happened yet.

Simulations and Probability Distributions enable an evergreen workflow where you iterate over a simple set of activities such as:

- Define
- Refine
- Correlate
- Generate

You *define* the uncertainties around key input parameters. As new evidence arrives, you *refine* the parameters of your distributions (Bayesian learning, etc.). Then you *correlate* these uncertainties to express dependencies. Finally, you (or rather the software) *generate* a consistent set of samples from a continuum of possibilities.
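To make the loop concrete, here is a minimal Python sketch (all distributions and numbers are hypothetical, and the shared-shock scheme is just one simple way to induce correlation, not how PetroVR does it):

```python
import random

def tri_inv(u, low, mode, high):
    """Inverse CDF of a triangular(low, mode, high) distribution."""
    cut = (mode - low) / (high - low)
    if u < cut:
        return low + (u * (high - low) * (mode - low)) ** 0.5
    return high - ((1 - u) * (high - low) * (high - mode)) ** 0.5

rng = random.Random(42)

# Define: triangular uncertainties on two inputs (price, well cost).
# Refine: say new cost evidence lowered the high case from 7.0 to 6.5.
PRICE = (40.0, 60.0, 90.0)   # $/bbl: low, mode, high
COST = (4.0, 5.0, 6.5)       # $MM:   low, mode, high

def generate(n, rho=0.6):
    """Correlate + Generate: mix a shared uniform shock into each
    marginal's quantile to induce positive dependence."""
    pairs = []
    for _ in range(n):
        shared = rng.random()
        u1 = rho * shared + (1 - rho) * rng.random()
        u2 = rho * shared + (1 - rho) * rng.random()
        pairs.append((tri_inv(u1, *PRICE), tri_inv(u2, *COST)))
    return pairs

samples = generate(1000)
```

Every iteration of the define/refine/correlate/generate loop just updates the distribution parameters and re-runs `generate`.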

By combining the spec-population feature with the continuum of data captured in probability distributions, the space of solutions at hand is bigger than any Big Data. It is infinite.

Surprisingly, all of this stems from small data: key parameters and conditional logic. Not just numbers but emergent structures, relationships, dependencies and consequences can be visualized using statistical techniques and graphical representations.

Of course, the combination of simulations and probability distributions is computationally expensive. The good news, however, is that they are suitable for parallelization and constitute excellent candidates for distributed multiprocessing. How is that? Well, because each of the scenarios to be explored is independent of the others, so we can analyze them all in parallel, using all the cores and processors at our disposal.
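The pattern is straightforward in any language; a Python sketch with a hypothetical `run_scenario` trial (here just a seeded toy computation standing in for a full model run) might look like this:

```python
import concurrent.futures
import random

def run_scenario(seed):
    """One independent Monte Carlo trial (a hypothetical stand-in:
    here, just an NPV-like number derived from a seeded RNG)."""
    rng = random.Random(seed)
    return sum(rng.uniform(-1.0, 3.0) for _ in range(100))

seeds = range(1000)

# Each trial is independent, so they can all run concurrently.
# For CPU-bound models, swap in ProcessPoolExecutor (guarded by
# `if __name__ == "__main__":` on platforms that spawn processes).
with concurrent.futures.ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(run_scenario, seeds))

print(len(results))  # 1000
```

Seeding each trial independently keeps the run reproducible no matter how the work is distributed across cores.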

So yes, Big Data is very useful, not for drawing immediate outcomes but for helping us quantify parameters to better model our decisions. Optimization techniques are important too, as long as their role remains limited to localized calculations that feed the specification of key parameters. However, since there is conditional logic in any model, our chances of getting relevant insights and visualizing emergent behavior increase when we can run dynamic simulations on them. With all these ingredients plus the availability of multi-core computing we are in a privileged position to make the most of any opportunity. Enjoy!

Let’s start by framing the problem more precisely. What we are looking for is a way to generate uniformly random splits of the EUR, given by percentages p_{1}, …, p_{n} such that 0 ≤ p_{i} ≤ 1 and p_{1} + ⋯ + p_{n} = 1. With these factors we would define the recoverable fluid of the i-th well w_{i} as p_{i}⋅EUR.

The interesting thing here is that all of this should happen at every Monte Carlo iteration: EUR is uncertain and will therefore adopt a new random value at each MC trial. For the same reason, we want all percentages p_{1}, …, p_{n} to be randomly generated by PetroVR at every iteration.

One approach that comes to mind is this: Generate n uniformly distributed random numbers u_{1}, …, u_{n} between 0 and 1 and then divide each of them by their sum s = u_{1} + ⋯ + u_{n} to get p_{i} = u_{i} / s for all i = 1, …, n.

This approach is called *normalization* because it normalizes all u_{i} so that they sum to 1.

The problem with this “solution” is that it is not uniform: the method tends to accumulate samples around the “center,” i.e., the n-dimensional point with p_{i} = 1/n for all i. This is so because the normalization is a sort of projection from the hypercube of side 1, which sends much more of its volume near the center of the simplex than near its corners (think 3D!)

So, what to do? Well, this problem is actually much harder than you might expect, in that there is no “obvious” solution to it. Fortunately, mathematicians solved it many years ago. The easiest solution is based on the Exponential distribution. In simple terms it works like the one discussed above, but with u_{1}, …, u_{n} *exponentially* rather than uniformly distributed. How do you do that? Quite simply! Take a uniform sample v_{1}, …, v_{n} between 0 and 1 and define u_{i} = -ln v_{i}. Now you can normalize these u_{i} as in the naïve solution and arrive at the desired p_{i}’s. Because of the way the Exponential distribution works, the u_{i}’s are exponentially distributed.
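Both methods fit in a few lines of Python (a sketch for illustration; `1 - random()` is used so the logarithm never sees zero):

```python
import math
import random

rng = random.Random(1)

def naive_split(n):
    """Normalize uniform samples: the p_i sum to 1, but they are NOT
    uniform on the simplex (they crowd around p_i = 1/n)."""
    u = [rng.random() for _ in range(n)]
    s = sum(u)
    return [x / s for x in u]

def uniform_split(n):
    """Normalize exponential samples u_i = -ln(v_i): the p_i are
    uniformly distributed on the simplex."""
    u = [-math.log(1.0 - rng.random()) for _ in range(n)]
    s = sum(u)
    return [x / s for x in u]

p = uniform_split(3)
print(abs(sum(p) - 1.0) < 1e-9)  # True
```

Cross-plotting many `naive_split` samples shows the central clump described above, while `uniform_split` fills the triangle evenly.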

In PetroVR this can be done very easily. Let’s assume you have 3 wells (so n = 3). We define three user variables “v1”, “v2” and “v3” with deterministic value ⅓ and a uniform distribution between 0 and 1.

Next, we add three other user variables “u1”, “u2” and “u3” using the following FML expressions:

“u1” ≡ -ln(“v1”)

“u2” ≡ -ln(“v2”)

“u3” ≡ -ln(“v3”).

Finally, we define “s” and the desired factors “p1”, “p2” and “p3” as:

“s” ≡ “u1” + “u2” + “u3”

“p1” ≡ “u1” / “s”

“p2” ≡ “u2” / “s”

“p3” ≡ “u3” / “s”.

Voilà! These are the quantities we were looking for.

Look. Here is a cross-plot between “p1” and “p2” obtained after running 1000 MC iterations. As you can see the distribution is uniform, i.e., the points do not accumulate around the center.

Of course, the same happens if you cross-plot any two of “p1”, “p2” and “p3”. One way to bring “p3” into the above plot is to add it as the Z-variable, which will drive the diameter of the bubbles around the points:

Not surprisingly the bubbles are larger near the origin because it is there that “p1” and “p2” are smaller (“p3” = 1 – “p1” – “p2”).

Let’s now compare the first plot with the one we would have obtained with the naïve approach of normalizing uniform samples:

You see? The plot clearly shows an accumulation around the center with lower density of points next to the corners. This is because when you normalize uniform samples you don’t get uniform samples again.

But, wait a minute. Who said that the “right” solution is the uniform one? Maybe in your case what I’ve called the naïve approach is the one you were looking for. It doesn’t matter. PetroVR will help you in either case. And not only that. In a forthcoming post we will see that there aren’t two solutions to this problem but a continuum between the two of them. Sounds too complex? It’s not! At least it’s not when you have the appropriate modeling system.

This is an international conference held in Argentina, mainly focused on Smalltalk programming but also on other Object Oriented technologies… and any other related stuff. Or not so related: we even had a talk about satellites last year.

We have been attending these conferences since the first edition in 2007. Every year the conference takes place at a different university around Argentina, and this year it was at UTN in the large city of Cordoba.

No doubt, a very interesting experience.

I think this was the edition with the widest variety of topics and abstraction levels. We had our share of low-level and VM (Virtual Machine) talks: Javier Pimás about the Bee project, and Boris Shingarov with ideas for multi-platform backtracking JIT.

But the talks that caught the most attention were those furthest removed from programming topics. Two by Rebecca Wirfs-Brock about design – I liked the first one better, which discussed the ideas of the architect Christopher Alexander (yes, the man on whose work the GoF design patterns were based) that can be applied to code analysis and software design; three by Allen Wirfs-Brock about the history, present and future of computing; Jannik Laval’s about robotics applications controlled from a Smalltalk environment (and why it is an interesting option); Richie (Gerardo Richarte) about privacy – nothing to do with Smalltalk but very thought-provoking; Peter Hatch plainly about critical thought; Andrés Valloud about mathematics and its different uses (original and not-so-original) in the problem-solving process, in and outside Smalltalk; Nicolás Papagna about ways to use technology as a medium to express yourself; Sebastián Sastre about startup ecology; Hernán Wilkinson about languages and forms of thought…

One could be tempted to think that all these talks have a “philosophical bias”. I prefer to think that they bind technical aspects with other areas. Both technology and philosophy permeate many aspects of our lives. And nowadays it is more and more difficult to be a good programmer if you don’t know anything outside programming. Progressively we can see the trend: programming is expressing, as Nicolás said. And in order to express yourself, you need to have something inside, something to say. But I’m digressing.

There were other talks that, while technical in nature and focus, exposed some unusual aspects. One was Jannik’s; another was Adrián Somá’s, with his very interesting visual programming environment VEO; of course Leandro Caniglia (our director and CTO) telling us about parallel programming on single-threaded machines; and John Sarkela (a remarkable character) and his students showing ModTalk, which I personally don’t like but can’t leave unmentioned – it is a Smalltalk with a declarative language used to define and deploy applications.

Beyond the talks themselves: Cordoba is nice but big; it doesn’t feel that different from being in Buenos Aires, although the university campus is very nice and large. The conference was very absorbing and we had no time left for tourism (we were there from 9 to 19 and even later every day). Those who stayed on Saturday could enjoy one more sightseeing activity, but I was not there, so I could only see the pictures. The social dinner was nice but it lacked the magic of the Concepción del Uruguay edition. That one will be hard to surpass.

I have exceeded the reasonable length for a post like this, but I find these conferences exciting and motivating. Even if I have the privilege of going to ESUG (a similar conference organized by the European Smalltalk User Group) from time to time – thanks to Caesar sponsoring me when I prepare something to show – the Argentine conference is better in many respects. For me it is more inspiring, wide, open and rewarding. Registration and attendance are free, which makes this conference extremely open. The things I see there always thrill me and give me energy for the rest of the year. They open my mind and put me in touch with the most interesting people.

So my advice is that you take advantage of these conferences, whenever you can and as much as you can. If you can attend, then go: it is unforgettable. Go to all the talks, talk to EVERYBODY in the halls and corridors. International guests are not rock stars, they don’t have bodyguards, they are very accessible and they all love to talk about what we like most.

If you couldn’t attend, then watch the videos of the presentations and discuss them later by email with the authors. Read about the things they do, download their projects, explore… and express yourselves.

In any case, it is easy to feel one is getting lost in a sea of names: the number of technologies that serve the development process is so huge and it seems to change so rapidly that it is easy to get overwhelmed, even regarding topics closely related to the technology we use in our daily work.

Faced with such situations, some questions may shed light: *Should we be aware of all these names (i.e., the technologies they refer to)? Do they come to solve problems that we have and of which we are not yet aware?* And more crucially, *Have the concepts related to these names changed in the same way as the technologies have?*

The last question can be reformulated thus: *Is there a real change in the concepts behind these names?* Or: *Have we (as a particular technical community) learned something new, which allows us to solve an existing problem and so we have developed this new technology?* I tend to believe that some problems remain unsolved and sadder yet, I feel that many of those technologies are here to solve problems that previous technologies have caused.

Definitely, concepts do not change at such a tremendous rate. Thus, it is likely that the majority of the technologies around form part of a great mess.

I remember one of the premises claimed by teachers at the great university I had the fortune to attend: *Here you will learn how to solve problems instead of how to deal with a particular technology.* This encouraged us to focus on concepts rather than on the tools we used to deal with them.

Now I have found that premise is present in my daily work: even when we deal with problems related to the development process, we do it conceptually. However, we are far from being disconnected from the community. We make use of several technologies (development frameworks, platforms that support our development and documenting processes, etc.), we attend conferences periodically, and we are constantly reviewing our development methodologies, all of which necessarily implies taking a look at the world outside.

On the other hand, it is unthinkable to develop software in isolation from the surrounding (and ever-changing) technology, both technically and commercially. But in front of the almost infinite ocean of names out there, most of them nothing but noise, the conceptual approach should be the way.

In short: in order to avoid getting lost we should keep traveling on the *concept ship*.

In one of last week’s requests the *thing* was the transition from a Hyperbolic decline to an Exponential one. Read carefully: the question wasn’t *how do I use the decline switch capability*; it was *how do I express the event that will trigger the transition*.

What this *model maker* was trying to express was the deceleration of the Hyperbolic curve below a certain threshold. In other words, when the negative derivative -dq/dv of the rate q with respect to cumulative production v fell below a given limit D_{min}, the decline had to become Exponential.

This is a well-known problem for which there is plenty of documentation in the literature, and the modeler mastered it. Still, a question remained: how do I express this event in PetroVR?

From our viewpoint as *software makers*, the question triggers another question: should we include a feature in the GUI to allow this? That’s a serious question too, because we do not want to turn PetroVR into a switchboard application. We want PetroVR to keep evolving as a *model composition environment*. So, instead of diverting our attention to a design question, we decided to help our user solve his problem. How? Simply by using FML, the Functional Modeling Language built into PetroVR.

In the well we created two user variables:

“Di” = 30 [%] / 1 [day]

“Dmin” = 0.09 [%] / 1 [day]

where “Di” stands for the initial (instantaneous) negative derivative -dq/dv|_{v=0} and “Dmin” is the one that will trigger the transition.

In the well performance we defined:

“Annual Decline” = 1 – raisedTo(“Di” * “exponent” * 365 + 1; -1 / “exponent”)

Then, in the Well Decline Manipulation job (or decline switch for short), we created two additional user variables:

“numerator” = “Initial Rate” * (raisedTo(“Di” / “Dmin”; 1 – 1 / “exponent”) – 1)

“Cum at Dmin” = “numerator” / (“Di” * (“exponent” – 1))

This variable “Cum at Dmin” is the cumulative production at which the Hyperbolic has to become Exponential. Thus the condition to trigger the decline switch is reduced to:

“Oil Cum” >= “Cum at Dmin”

which can be directly expressed in the decline switch GUI.

The beauty of the approach is that with only 5 user variables and three simple expressions (that anybody can deduce or find on the Internet) one can control the transition point from one single input: the limit parameter D_{min}.
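For readers outside PetroVR, the same computation is easy to reproduce; here is a Python transcription of the expressions above (with `b` standing for the “exponent” user variable; valid for b ≠ 1, and the example inputs are hypothetical):

```python
def cum_at_dmin(initial_rate, di, dmin, b):
    """Cumulative production at which a Hyperbolic decline with Arps
    exponent b and initial decline di decelerates to dmin, i.e. the
    point where the switch to an Exponential decline should occur."""
    numerator = initial_rate * ((di / dmin) ** (1 - 1 / b) - 1)
    return numerator / (di * (b - 1))

# Hypothetical inputs: rate in bbl/day, declines in fraction per day.
threshold = cum_at_dmin(initial_rate=1000.0, di=0.003, dmin=0.0009, b=0.5)
```

The trigger condition is then simply `cumulative_oil >= threshold`, exactly as expressed in the decline switch GUI above.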

Of course, the actual problem behind the question will likely require more work and knowledge on the part of the modeler. However, PetroVR gave him the expressiveness he was looking for. We can now step back and analyze whether it would be good to introduce some new decline deceleration control feature in the software. But that’s another story, isn’t it?

This might sound strange (right?), but it is not uncommon for two apparently very different ideas to suddenly become alike when looked at closer. And for a group of people constantly working and thinking in a given domain, all of the closer looks will possibly form a chain of ideas that relate many different topics with said domain, in this case, PetroVR – as you can probably notice from some of our craziest posts.*

As you might already know, the 2048 game consists of a big square divided into sixteen “empty” boxes to be filled with powers of 2 (numbers of the form 2^n), from 2 up to 2048, where you win the game. You can actually go on playing once the game is won. You can go to Gabriele Cirulli’s original site to try it out in case you haven’t yet.

When the game starts, two of the sixteen boxes are already occupied by numbers, 2s or 4s. The goal of the game is to sum equal numbers by moving them across the cells, and with every move a new number – again, either a 2 or a 4 – appears on one of the empty cells, until either all cells are filled and no moves are left, or you reach 2048 and win the game.

So, those first two values can be regarded as the first two *pieces of information*. It is what the game provides at first. In PetroVR these two pieces of information could represent, for example, wells, or maybe certain parameters from previous studies.

What happens from the beginning is a series of decisions to be made. Basically, which pieces of information to combine, swapping the values across the cells. This can be easily related to activity in the PetroVR compositional environment, where information is combined and modeled to find the most efficient scenario for that given information.

Of course, some of the decisions can be better than others, and until you figure out the best strategy for each move, many of the decisions you make are bad and lead to losing the game too soon. However, these lost games come in handy as experience for improving your strategy, which is true in PetroVR modeling as well: you can make bad decisions and learn from the given results to enhance your model and better work with the given information.

The most interesting part comes after each time you make a decision in the game: after every move, you know you will obtain new information – i.e. a new numeric value – but you do not know where in the board it will appear, and you are not certain of the value of such information since it could be a 2 or a 4.

In PetroVR, new decisions and simulation runs reveal uncertain information constantly. These pieces of information, brought by previously made decisions, come from managing information representing the *known unknown*,** and even the *unknown unknown* in those cases where you did not realize you would be getting new information.

Furthermore, all of this information comes with uncertainties of its own, and is now part of the new decisions to be made – as happens with new values appearing in the game.

What we are dealing with, both in the game and in PetroVR models, is strategies based on incomplete information. These strategies are often based on heuristic functions that measure how good a partial solution can be, considering the different aspects that give value to a decision. For example, in 2048 the goal is to sum values, but at the same time you have to take into account that free cells are valuable for future movement, as is the placement of cells of equal value after a move, etc. This means that information in itself is not the only, nor the most important, valuable aspect of the game; how you treat it and combine it in the long run is crucial for getting higher results.
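A toy 2048 heuristic (hypothetical weights, just to make the idea tangible) might combine the raw tile sum with free cells and merge potential:

```python
def heuristic(board):
    """Score a 4x4 2048 board (0 means empty). Rewards the raw tile
    sum, free cells (future mobility) and adjacent equal pairs
    (merge potential); the weights are arbitrary."""
    free = sum(1 for row in board for v in row if v == 0)
    merges = 0
    for r in range(4):
        for c in range(4):
            v = board[r][c]
            if v == 0:
                continue
            if c + 1 < 4 and board[r][c + 1] == v:
                merges += 1
            if r + 1 < 4 and board[r + 1][c] == v:
                merges += 1
    total = sum(v for row in board for v in row)
    return total + 50 * free + 100 * merges

board = [[2, 2, 0, 0],
         [0, 0, 0, 0],
         [0, 0, 0, 0],
         [0, 0, 0, 0]]
print(heuristic(board))  # 4 + 50*14 + 100*1 = 804
```

A player (or a search algorithm) would pick the move whose resulting board scores highest, which is exactly a decision under incomplete information: the heuristic values a position, not a guaranteed outcome.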

The analogy here is rather obvious for anyone using PetroVR: information or inputs by themselves only have value in the context of their usage in a decision-making environment – especially one where all uncertainties coming from new decisions are of such great importance.

The goal of the game, as I explained earlier, is to sum as many values as possible to get higher results. In the oil and gas industry this could be seen as getting a higher NPV as your decision-making process improves. However, many times good decisions can still decrease your resulting NPV, because there is a “luck” factor – both in the game and in the industry’s reality.

A “nicer” analogy comes to mind with PetroVR, where the highest result in 2048 could represent the scenario with the most efficient use of the information: the best modeled, again, with *strategy* and *simplicity*.

As a last “meta-thought” it is surprising how in almost every experience we humans can find something to inspire us to further understand something we are interested in. All ideas can meet when focused on one direction, almost in the same way that every set of two points in space can be united by a straight line.

* See especially Smalltalk & Cubism, Gaudí and Software Architecture, and Behind the Illusion.

** See Dynamic Decisions & Unknown Unknowns Part 1 and 2.

**Switch Well Declines on the Fly (Arps’ Equation)**: This new feature greatly expands the well decline manipulation job, which used to allow switching of declines only at a specified point in time. Now two conditions can be set: the first sets a point in time to start checking the second. This optional second trigger enables the switching of each well separately. The new option gives greater control over performance changes, allowing users to model “multiple segment” well performance variations, which is particularly useful in unconventional projects that use Arps’ equation to switch from, say, a hyperbolic decline to an exponential decline. The timing of this switch point is often critical to the planning of drilling schedules (speed) as well as to facility capacity planning.

In this example my wells switch to a new decline curve after reaching a given cumulative oil production, which happens after 4 months of production.

**Rig Count Now Impacts Facility Cloning**: The pace of facility cloning in autodevelopment and cloning jobs can now better account for mobilization and demobilization of rigs, providing more representative simulations of the actual needs of the field. This also works with the new multi-well functionality introduced in version 11.1. The improvements made to the cloning algorithm have also reduced simulation time.

**Enhanced Well Shut-In Policy**: The application of the sequential shut-in excess policy has been extended to include all wells upstream from the facility and not only those that are directly connected to it. For example when you have 3 wellhead platforms connected to a production platform, the policy set at the production platform now impacts all of the wells at the wellhead platforms in combination.

**Arrays as Parameters in interpolate2D Functions**: The interpolate2D function, which allows the use of a table as a two-entry matrix, now accepts arrays as parameters for defining columns and rows. This feature allows the configuration of tables whose values change over time.

In this example, I have created a table with a price per acre for a given water depth, and modified the price yearly by selecting an array for the <x> parameter.
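Outside PetroVR, the underlying idea is ordinary bilinear table lookup; a generic sketch (not the actual interpolate2D implementation, and with made-up numbers) could be:

```python
import bisect

def interp2d(xs, ys, table, x, y):
    """Bilinear interpolation on a table indexed by xs (columns) and
    ys (rows): table[i][j] is the value at (ys[i], xs[j]).
    Clamps queries outside the grid to the nearest edge."""
    def locate(vals, v):
        i = bisect.bisect_right(vals, v) - 1
        i = max(0, min(i, len(vals) - 2))
        t = (v - vals[i]) / (vals[i + 1] - vals[i])
        return i, min(max(t, 0.0), 1.0)
    j, tx = locate(xs, x)
    i, ty = locate(ys, y)
    top = table[i][j] * (1 - tx) + table[i][j + 1] * tx
    bot = table[i + 1][j] * (1 - tx) + table[i + 1][j + 1] * tx
    return top * (1 - ty) + bot * ty

# Price per acre by year (x) and water depth in ft (y), hypothetical:
xs = [2015, 2016, 2017]
ys = [100.0, 500.0]
table = [[1000, 1100, 1250],
         [1400, 1600, 1800]]
print(interp2d(xs, ys, table, 2016.5, 300.0))  # 1437.5
```

Passing an array for the x axis is what lets the looked-up value change over time, as in the yearly price example above.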

**InverseTriangular Distribution Function**: A new function for calculating inverse triangular distributions has been added to the set of probability functions of PetroVR. With this addition, all the main distribution types are available inside an FML expression.
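For reference, the standard inverse CDF of a triangular distribution (the generic textbook formula, not necessarily PetroVR’s exact signature) is short:

```python
def inverse_triangular(u, low, mode, high):
    """Inverse CDF of a triangular(low, mode, high) distribution:
    maps a uniform u in [0, 1] to a triangular sample."""
    if not 0.0 <= u <= 1.0:
        raise ValueError("u must lie in [0, 1]")
    cut = (mode - low) / (high - low)
    if u < cut:
        return low + (u * (high - low) * (mode - low)) ** 0.5
    return high - ((1.0 - u) * (high - low) * (high - mode)) ** 0.5
```

Feeding it uniform Monte Carlo draws produces triangular samples, which is what makes it useful inside an FML expression.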

**Set Drilling and Completion Costs as OpEx**: Well drilling and completion costs can now be computed as either CapEx or OpEx. This functionality enables PetroVR users to properly allocate the costs associated with well types such as exploration wells, appraisal wells or dry holes, so as to correctly represent their fiscal treatment.

**New PetroVR Plan Results**: Total Oil Developed and Total Gas Developed, which in previous versions were available at the reservoir level only, are now also computed for the whole project. These variables return the cumulative expected production of all wells completed so far at a given point of the project development.

These are the new features of this PetroVR version. But it also includes 47 major enhancements to existing features, new unit-tests, and new functional test cases, to make PetroVR more robust and agile than ever, so you can accelerate confidently.

The base case (“without information”) is in light blue, while the scenario where information is acquired (“buy information”) is in purple.

As we can see from the graph, the standard deviation of the NPV when buying information is higher than in the base case. *How can this be?* the user asked. *When we buy information, we are supposed to reduce uncertainty, not increase it!*

Is something wrong with the graph or the PetroVR model? Of course not. At the bottom of the graph, the “buy information” scenario shows cases where the information acquired caused larger losses than the base case because it was misleading (false positives and false negatives), as well as cases where the cost of information is not compensated by better decisions. And the upper part of the graph shows cases where buying information enabled better facility sizing and faster developments.

It is fair to say that in many Value of Information projects, as here, you can expect a larger dispersion of possible outcomes when you buy information, even while reducing key uncertainties. The fact that, according to the graph, the chance of failure decreases from 60% to 40% shows that information does reduce one key risk: the risk of losing money. At the same time, it introduces a larger variety of possible results.

This apparent paradox arises from identifying “less uncertainty in reservoir success/size/performance” with “less dispersion in NPV”. They are two different concepts, and PetroVR does not conflate them. The richness of the simulation lies in the interaction between objects: you cannot trace a result back to a single input variable. In a Monte Carlo run, many possible worlds are built, in which your decisions based on partial information interact with the physical conditions in a series of cause-effect chains over time. The result is often more enlightening because it is unexpected!
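A toy Monte Carlo model (pure Python, with entirely made-up numbers; not the user’s PetroVR model) reproduces the effect: buying imperfect information widens the NPV spread while cutting the chance of losing money:

```python
import random
import statistics

rng = random.Random(3)

NEEDED = {"big": 100, "small": 40}   # right-sized facility cost, $MM
REVENUE = {"big": 180, "small": 60}  # full revenue if facility fits

def trial(buy_info):
    """One Monte Carlo trial of a toy facility-sizing decision."""
    state = "big" if rng.random() < 0.5 else "small"
    if buy_info:
        # Noisy appraisal: right 80% of the time, costs 10 regardless.
        right = rng.random() < 0.8
        signal = state if right else ("small" if state == "big" else "big")
        capex, info_cost = NEEDED[signal], 10
    else:
        capex, info_cost = 70, 0     # one-size-fits-all compromise
    # Undersized facilities throttle revenue proportionally.
    revenue = REVENUE[state] * min(1.0, capex / NEEDED[state])
    return revenue - capex - info_cost

base = [trial(False) for _ in range(5000)]
voi = [trial(True) for _ in range(5000)]
```

With these numbers the “buy information” NPVs are more dispersed (false signals create an expensive tail) yet lose money far less often, which is exactly the pattern in the graph above.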
