[mdk-re] Виртуальные машины (Was: Re: CD eject in Nau, является огромным конфликтом с automount.)

Пт Янв 18 12:31:01 MSK 2002

On Fri, Jan 18, 2002 at 06:22:22AM +0300, Alexandre Prokoudine wrote:
> C# - достаточно интересный язык сам по себе. Я ни в коем случае не
> собираюсь начинать holy war по поводу языков. С моей стороны это
> было бы в высшей степени непрофессионально :-)

Я не удержусь и все-таки перешлю это старое письмо Дэвида Симмонса.
Дэвид известен как один из лучших экспертов в области проектирования
виртуальных машин для трансляторов с различных языков программирования.
Фактически, этот человек стоял у истоков всех современных реализаций
ускоряющих (just-in-time) трансляторов.

Я "вырезал" куски дискуссии, не относящиеся к теме, которую я хочу
продемонстрировать. Речь идет о вызове виртуальных методов и о
скорости/правильности реализации этого процесса в C++ и Java по сравнению
с динамическими языками вроде Smalltalk или Ruby. Особенно обратите
внимание на фрагмент его ответа в предыдущем письме дискуссии, отмеченный
ниже между символами (***). Собственно, все описание ниже есть
развертывание того тезиса. 

Дэвид сейчас работает по контракту с Микрософт над усовершенствованием IL,
внутреннего языка .NET, в который транслируются все остальные, в том числе
и C#. Основная его задача -- научить IL работать с динамическими языками,
в которых классы/объекты могут изменять себя в runtime.

----- Forwarded message from David Simmons <david.simmons на smallscript.net> -----

Date: Mon, 26 Nov 2001 08:05:26 +0900
From: "David Simmons" <david.simmons на smallscript.net>
To: ruby-talk на ruby-lang.org (ruby-talk ML)
Subject: [ruby-talk:26473] Re: Table: Ruby versus Smalltalk, Objective-C, C++, Java;

> same in statical typing: http://www.eptacom.net/pubblicazioni/pub_eng/mdisp.html

I was not aware that someone was "bothering" to use RTTI (or templates
[macros]) to address this issue in C++. Given the importance of C++ I should
hardly be surprised. However (in broad terms), this is the problem with
implementing genericity in languages which only support static typing [as I
alluded to in my previous post regarding the languages Java and C#].

As I said in my previous post, the ideal is a language which has both static
and dynamic typing with overloading/multi-methods -- lacking that numerous
issues crop up. This is just a "facet" of that generics issue.

[s/.*//skip]

> > > http://people.mandrakesoft.com/~prigaux/overloading2.java
> >
> > Sure, the example is clearly illustrating the problem. Static type binding
> > of dynamic type information does not work. I.e., languages which only have
> > static binding cannot provide proper semantics for method-implementation
> > selection based on argument types.
>
> hum. C++ and Java do not have "only static binding".

I think we both understand the issue. The problem here is that the
"semantics" of the term dynamic-binding have been polluted/overloaded. So,
as with many topics in comparing languages with regard to "static" and
"dynamic" subjects, we quickly get bogged down in terminology as we try to
use that terminology to explain/discuss concepts. If all parties don't have
a clear concensus on the definitions of the terminology the discussions
rapidly digress into areas that are the result of that lack of concensus.

So, ignoring the terminology:

C++ and Java "typically" (but not always) implement virtual functions using
vtables. Where a vtable is an array of function pointers to virtual methods
associated with a given type.

These languages are not only statically typed, they are based on the concept
of static binding of type information as well. In an effort to be "object
oriented" [whatever that means] they attempt to provide [OO] polymorphism
using vtables which are a weak/limited solution to providing polymorphism
over one-type (the message receiver).

For static binding based on type information, this means that at compile
time the decision is made as to what method to invoke based on the types of
the arguments. The partial-polymorphic-solution exception is built by
assuming that all types are known at compile time. And that, therefore,
tables can be constructed containing common methods for type-trees based on
the "type of the receiver <this>".

So one gets a binary-deployment-system that cannot break those compile-time
rule of all types known at runtime [it is not dynamic/extensible at runtime
and so is brittle], and which also cannot address the problem of
polymorphism over the arguments to methods [which leads problems in
implementing/supporting genericity, etc because the model is based entirely
on static knowledge]. Some languages, like ML, have taken great strides to
addressing this problem and ensuring correct behavior but I (currently)
think it requires supporting both a static and dynamic type and binding
model.

So, back to exploring issues in the vtable approach. As mentioned before it
is based/built-upon static knowledge of the available set of types. It's
indirection mechanism is therefore based on static knowledge, which makes it
a static (lookup table) binding mechanism. I.e., the vtable-index [a
first-degree binding abstraction approach -- as opposed to a selector-object
in dynamic language which is a second-degree binding abstraction] is
statically bound into source and thus [among other things] precludes
handling dynamic [binding impact] changes to the classes and methods. I.e.,
vtables are limited in two semantic ways and additionally in one technical
way.

1. Semantically it only works for whole-cloth programs where no schema
changes [in the binary/compiled-form] will occur to class, namespace, or
methods available within the program [that's what makes them static
languages]. Why? Because if such changes occur the vtable layouts may need
to be changed, the call-sites that use indexed-lookup will have to have the
indices changed [not even considering the issues of the methods themselves
which have optimized-away/unfolded encapsulation of object structure, etc].
None of which is possible because all the necessary meta-information was
thrown away at compile-time [many things are not objects, most things have
no self-describing information (i.e., nominal to no reflection
facilities) -- RTTI being C++'s only facility].

2. In a unified OO view, there are no "pure-functions", because classes [or
prototypes], namespaces, etc are all objects and functions are just methods
on those classes [or prototypes] and namespaces. All operations semantically
become about the messages that objects can understand (perform). The
concerns about physical-layout/structure of a type vanish leaving us
[humans] only concerned [in general] with the message vocabulary [i.e.,
behavior/interfaces].

The compilers are where the optimization and knowledge about physical
representation become paramount for performance and external inter-op [and
humans only care when they are meshing with the boundary points -- usually
to create optimized small algorithms; an act which requires greater
attention and focus than one would normally want to have to put into
describing a problem to the computer via a programming language -- generally
we want our programming language to be capable of being as transparent as
possible so we can focus on the problem domain itself rather than the
problem of describing the problem to the computer].

Parameterized types in object definitions are relevant to humans in terms of
contractual (behavioral) adherance. To the compiler(s) they allow
optimizations to be achieved for better performance and resource utilization
[potentially as guided by a human author]. Which gets us back to genericity
and best efforts at being informed "statically" about design time
behavior-binding (type) errors in our use of contracts as opposed to runtime
contractual behavior-binding (type) errors that can always be detected in a
well designed language execution architecture. Which leads us into the
discussion of genericity in-static-only-binding-type systems [which have the
potential to yield optimum performance at a price in terms of correct
behavior and/or expressiveness], in-dynamic-only-binding-type systems [which
have the potential to yield correct at a price in terms of performance], and
in systems that offer both forms of binding [which has the potential to
yield both correct behavior and optimum performance].

Proponents of static-typing and correctness argue the virtues of design-time
detection. Proponents of dynamic-typing argue the virtues of unit-testing
and the dangers of reliance on static-checking of type-bindings as a
substitute for verifying interop relations semantics inherent in any system
of basic complexity. Both parties have valid and good points and neither are
wrong in my view. Both are addressing the same problem space with different
techniques; and both have been shown to eliminate the large proportion of
defects [albeit leaving different types of defects undetected]. But, in my
view, the significance is in the human "level of effort" factor in
expressing an original problem, coming later to understand such a design,
and later still working with an existing design to maintain or extend it.

** sigh, I seriously digressed [and now I will be in a deep rathole as
people beat me up for errors and/or disagreements with my assertions] **

#2 is principally that given that it is desireable to discriminate a
function's implementation based on the types of the arguments, vtable as a
polymorphic solution offer no mechanism of support for dispatching to a
type-specific implementation based on polymorphism amongst the arguments to
a function/method. The general approach is to use hand-crafted secondary
dispatch [which for arity-1/1-arg methods is called double-dispatch].

3. The technical issue is that vtables are slower (on current hardware
technology) than using adaptive jitting techniques with self-modifying code.
Primarily due to the cost of indirectly accessing memory to obtain an
address from a vtable which precludes processors from using eager/optimistic
prediction; where accessing the memory also results in bouncing from
L1-cache to L2-cache to primary memory [and that is increasingly expensive
as the timing gap between those forms of memory increases -- similar in
nature to the impact on design that one observed in algorithms designed
based on the 50's-70's problem of tape, disk, ram gaps].

*. In a just-in-time binding approach pioneered in Smalltalk during the mid
80's, one uses self-modifying code and assumes that most call sites are not
polymorphic [which is generally true]. If they are polymorphic, then that
breaks down into the case of having a low-degree of polymorphism [say 2-10
types] and the case of having a large degree of polymorphism which
literature in this area popularly terms mega-morphism. See OOPSLA papers
from the mid-80's to early 90's. This technique was the basis for the design
of the self-language, which in turn led to the design of the Animorphic
HotSpot VM/Execution-Engine/Runtime technology for Smalltalk; which in turn
was acquired by Sun to try and speed Java [JVM] up.

What has not, to my knowledge, been applied is a more generalized solution
of the same technique multiple-dispatch (overloading/mult-methods). I have
done it for both my own general-purpose dynamic-language virtual machine
(AOS Platform), and the Microsoft .NET platform (less-optimally through IL).
I must presume that someone must have implemented my technique before;
almost certainly someone in the lisp family community. However, it is
unlikely that it has also been optimized for hi-performance execution with
important newer binding predicates including sandboxing and selector
namespaces.

Providing hi-performance dynamically-dispatched-multi-methods has
significant impact on scripting languages and their ability to evolve into
full fledged languages that compete in features and performance with
mainstream languages today such as C, C++, Java, [C# will soon be in this
group if it is not already], Pascal-derivatives, etc. This is especially
important because there are, by most accounts, 10-20 times more people and
programs using scripting languages and techniques and that number is likely
increasing as cost factors and training/experience come into play.

(***)
> > It is worse than just describing C++ like mechanism as "the vtable-trick".
> > VTables are actually (demonstrably) slower than "true" (receiver only)
> > dynamic-binding-dispatch mechanisms. This fact is (reasonably well)
> > understood today. It is a reality that will increasingly be the case as long
> > as the gap between processor core speeds and L2 cache and memory speeds
> > continues to widen.
> >
> > It is fairly easy to illustrate on the Intel processor family. I've posted
> > (on comp.lang.smalltalk within the last 12 months) at least two detailed
> > explanations showing the machine instructions, cycle-times, and benchmarks.
> > The originally published technique was developed by David Ungar and
> > published in OOPSLA papers in the mid-80's. It has been a standard part of
> > most jit-based Smalltalk implementations for the last ten years or so.
(***)

>
> are you talking about http://www.sun.com/research/self/papers/type-feedback.html ?
>
> the idea is quite simple: specialize for a given type of object to allow
> inlining. In that case, to know which specialization to have, run-time
> feedback is used. This applies well to JITs of course.
>
> I don't see why it prooves that vtable is bad/slower. vtable can also benefit
> from specialization. This is also why the default in C++ in no "virtual"
> methods, so that performance is the best.
>
> This is a well known, no?

I think, based on your comments, that you are not aware of what I have been
doing for the last ten years or so. One of my professional areas of
specialty is the design of virtual machines. Your comments sound like the
kind of things I would have written as responses to you <g>.

As to the bad/slower, in the last 12 months, I wrote two different threads
of discussion on this topic [in comp.lang.smalltalk] describing the
instructions, cycles, and benchmark information. I also made reference to
that fact somewhere in this current thread of discussion. Where I also
mentioned that I had not published information on the much more general
techniques (which I suspect I have pioneered) for hi-performance dynamic
dispatch of common/important predicates with extensibility for general
predicate dispatch through Dynamic-AOP/Managed-Object facilities in the
object-model of the AOS Platform [the VM architecture I designed and have
been evolving for the last ten years] as well as its related peer/work, an
enabler for the Microsoft .NET platform.

>
> In the ML family, the same happens when going from polymorphic functions
> (needing boxing) to monomorphic functions (with unboxed data)
> see for example:
> - the SPECIALIZE pragma in ghc http://www.haskell.org/ghc/docs/4.04/users_guide/users_guide-5.html)
> - type-based unboxing (Leroy) http://citeseer.nj.nec.com/88305.html
>
> >
> > However, I intentionally have not published information on techniques I have
> > developed (on the AOS Platform VM) for hi-performance predicate based
> > (incl - multi-method) dispatching. Especially with regard to implementation
> > on the .NET platform, where I am still exploring with Microsoft [who needs
> > this technology as much as Sun/Java does].
>
> "Where Do You Want To Go Today?" ;p
>
> Microsoft has a *lot* of people working on languages. Hopefully many are
> allowed to publish their work (and even release GPL apps)

<g> I know.

As a 3rd party, I and others have been been fortunate to have the
opportunity to interact [and influence] quite a few of them [it would be
nice to be able to observe the same attitude and opportunity to have an
influence on Sun folks]. Of course, Microsoft has a clear business
objective, and I both hope and greatly fear they will be very successful in
achieving based on their approach [the word trepidation has constantly been
in my thoughts since I first got involved with Microsoft on the .NET project
in 1999].

The real challenge for scripting and dynamic languages lies in the
predominance and momentum of ideas and beliefs regarding type-theory and
statically-typed languages; and the relative/disproproportionate lack
thereof for dynamic languages (especially OO languages like Smalltalk, as
opposed to functional dynamic languages like scheme).

-- Dave S. [www.smallscript.com]

----- End forwarded message -----
-- 
/ Alexander Bokovoy
$ cat /proc/identity >~/.signature
  `Senior software developer and analyst for SaM-Solutions Ltd.`
---
Nov 21 20:58:58 alconost kernel: VFS: Busy inodes after unmount. 
		    Self-destruct in 5 seconds.  Have a nice day...