I'm one of the few conversational Lojban speakers, and after years of extended study, I've found the language does not meet expectations. When you start deeply exploring the semantics, many constructs turn out to be only half-defined or nonsensical. Further, the self-appointed DFL tries to preserve the language in an unusable state and has alienated most of the proficient speakers.
There are other logical languages out there. Most of them are in development and are discussed in a Discord called the "Logical Languages Lodge". However, there is another called Toaq that is almost complete; it is at least as complete as Lojban (its vocabulary is still growing, but all of the essential words are there, and the grammar and semantics are complete). If you wish to learn a logical language in 2023, that's the one I'd suggest.
I didn’t take it that far but asked my parents for the Lojban book for Christmas and did find it fun to work through. Sometimes it’s fun to just nerd out about something totally different to normal :) The year after I got a lockpick tutorial set.
Lojban relies on implicit pragmatics just like any language. It's not that it's unambiguous; that's essentially impossible without also making the language impractical. It's precisely ambiguous, meaning that one can be precisely as ambiguous as one intends to be, and it provides facilities to this end that no other language has.
I like the idea of Lojban. I hate the implementation. It’s basically spoken math notation for predicate logic, which falls well short of the goal. There are much better logical frameworks that have been developed for computational semantics (e.g. Davidsonian event semantics). Lojban’s formalism was already out of date when it was proposed as Loglan half a century ago.
If there is anyone out there who likes the idea of Lojban but wants something more practical, and with a firmer foundation to boot, contact me. I’ve got ideas and I’d love to find some people to work on this with.
I believe there is a variation supported by the language that uses semantic predicates instead of argument order. But since argument order is the standard approach, it doesn’t much matter — I think this is the biggest flaw in the language.
I was also a little disappointed to learn that it is morphologically ambiguous without proper inflection (syllabic emphasis).
Other than that, it's an amazing language, as is Loglan.
I only wish it sounded a bit better. It sounds a bit Slavic, whereas I find languages like Arabic and Italian much more pleasing to the ear.
Semantic predicates don't solve the problem. It's an improvement for sure, as I believe the place structure of Lojban/Loglan is the single worst mistake that has ever been made in any conlang. But grammatically it wouldn't change anything. The parsed meaning of a Lojban sentence is still ambiguous. Lojban does not try to have monoparsing of sentences to logical statements, except in very limited contexts.
I wonder if you could get the intended result for any language with a simple ruleset. You don't need a language to reduce/eliminate miscommunications, you need intent and understanding.
Having a better language helps. Every language has some pain points that are responsible for more than their fair share of miscommunications. Just to use English as an example (although this is true of most natural languages), the overuse of the copula "is/are" is a consistent source of mistakes whether accidental or purposeful. So much so that computer programmers learning object-oriented programming have to be taught the difference between "is-a" vs "has-a" relationships.
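A minimal sketch of that is-a/has-a distinction (my own illustration, in Python, not anything from the thread):

    class Engine:
        def start(self):
            print("vroom")

    class Vehicle:
        pass

    # IS-A: a Car is a kind of Vehicle, so inheritance fits.
    class Car(Vehicle):
        def __init__(self):
            # HAS-A: a Car is not a kind of Engine; it merely
            # contains one, so composition fits.
            self.engine = Engine()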
If you say "My coworker Taro is a samurai", what does that mean? Did he dress up as a samurai for the office costume party? Does he study older forms of Japanese martial arts? Was he descended from a noble family from the Japanese feudal era and have claim to an actual samurai title?
There are controlled dialects of English, like E-Prime, which forbid the use of the copula "to be" as ungrammatical, or at least heavily restrict it. You can say your coworker "dressed as a samurai", "trains in the arts of a samurai", or "traces ancestral lineage from a samurai clan", but you can't say he "is a samurai" in E-Prime.
Would this make communication clearer, with fewer mistakes? I don't think this has been studied enough to say for certain. But at least in certain domains like military speech and air traffic control, we have examples of such enforced language simplification resulting in measurable decreases in miscommunications. I'm very curious to see if we can generalize those results to a full, general-purpose dialect or an entirely new language.
> the overuse of the copula "is/are" is a consistent source of mistakes whether accidental or purposeful
I’m not sure I’d characterise these as ‘mistakes’, per se. They’re just a consequence of the fact that the copula is multifunctional — just like every other English word. In fact, you could similarly criticise almost any common word: ‘go’, ‘from’, ‘good’, ‘like’, ‘not’…
But no, "is/are" is not multifunctional in the same way that "from" is. If I say "she came from school" it is clear that we're talking about relative motion today or in the recent past from one nearby location to another. If I say "her family came from Japan" it is clear I'm talking about ancestry and/or a long ago emigration, but also spatially oriented. If I say "the party is from 3pm to 5pm" then I am using "from" to indicate a temporal rather than a spatial motion. But it still conveys the same basic meaning in all these cases, as the origin point of a motion or interval, and it is not usually the case that a single usage of the word could be confused for more than one meaning.
The IS-A vs HAS-A relationship is entirely different. It is the difference between existential quantification (some aspect of this thing resembles/has an X) and universal quantification (the entirety of this thing is fully captured by the meaning of X). Rigorously analyzed these are very different claims. In a strongly typed language they couldn't be substituted for each other.
To make this concrete, if I say "he is bad" then it is unclear whether I am saying that person is an intrinsically bad man, or if I am just commenting that the thing that he is doing right now is not morally justified.
This may seem like splitting hairs, but that's rather the point. It's at the edge of what we are comfortable thinking about in everyday life. But how much of that is because the language that we use--English--doesn't have these distinctions built into its very foundation? If we were native speakers of E-Prime, maybe this distinction would be obvious and trivial. And maybe, just maybe, we wouldn't let politicians and con men off the hook so easily for equivocating language.
‘The copula can be used in many different situations.’
-----
Also, I think you underestimate the amount of variation in other words, e.g.:
> If I say "she came from school" it is clear that we're talking about relative motion today or in the recent past from one nearby location to another.
Not just recent past, but having any time near to the point of reference: ‘When she came from school after the bomb scare…’.
> If I say "her family came from Japan" it is clear I'm talking about ancestry and/or a long ago emigration, but also spatially oriented.
Or they might have arrived yesterday to visit her.
> If I say "the party is from 3pm to 5pm" then I am using "from" to indicate a temporal rather than a spatial motion.
But only because ‘3pm’ and ‘5pm’ themselves have clear temporal reference as opposed to spatial reference. For an ambiguous example, if I say ‘we drove from breakfast to lunch’, that might mean we drove starting at breakfast-time and finishing at lunch-time, or it might mean we started at a location where we had breakfast and drove to a location where we had lunch. (To more fully show that the latter is a valid interpretation, consider ‘we drove from breakfast to breakfast’: it makes no sense if ‘breakfast’ is a time, but it makes sense if treated with the sense of ‘place where we had breakfast’.)
-----
> The IS-A vs HAS-A relationship is entirely different. It is the difference between existential quantification (some aspect of this thing resembles/has an X) and universal quantification (the entirety of this thing is fully captured by the meaning of X).
This doesn’t sound quite right to me. In my view, both ‘has’ and ‘is’ are basically relationships, rather than quantifiers. Consider a sentence like ‘I have the keys’: this merely states a relationship between two objects, rather than quantifying over any set. A sentence like ‘The morning star is the evening star’ is similar in this regard. It is true that a sentence like ‘I have a key’ has existential meaning — but I suspect the quantification is linked to the indefinite article ‘a’ more than the verb.
> To make this concrete, if I say "he is bad" then it is unclear whether I am saying that person is an intrinsically bad man, or if I am just commenting that the thing that he is doing right now is not morally justified.
This ambiguity isn’t limited to the copula, though. ‘I like him’ has exactly the same kind of ambiguity.
> If I say "she came from school" it is clear that we're talking about relative motion today or in the recent past from one nearby location to another.
Unless we mean that she came from one school of philosophy or art to another. Or she came from the school long ago to move somewhere else.
> If I say "her family came from Japan" it is clear I'm talking about ancestry and/or a long ago emigration
Unless her family came from Japan as part of a cruise before going to Thailand. Or her family moved out of Japan because they were American military stationed in Okinawa.
Getting rid of the copula gets rid of only a small portion of the ambiguity in natural language.
Yup. One man's "overuse" and "source of mistakes" is another man's way to draw attention towards a region in the latent^H^H^H^H^H^H idea space.
This is useful and often good enough, because a) the rest of what's been said and situational context will help home in on more specific associations (and if not, one can always ask for clarification), and b) we often don't need to resolve specific ideas - "my friend is a samurai" may just be an invitation for the other person to reply "oh, fun, speaking of more ancient cultures, have you read about ...", and then continue jumping around big clusters of associations, without ever properly resolving any.
This is kinda what we call "legalese". It's a sort of formalized subset of English (or whatever language) that leans on standardized turns of phrase, it tends to set up definitions for terms that are then used throughout a document, etc. All in order to reduce misunderstanding and (hopefully) be easy to interpret in the event of a dispute.
However, we have whole judicial systems that spend a non-trivial fraction of their time interpreting legal verbiage. So clearly it falls short at least some of the time, otherwise courts could be, in part, automated away. Maybe that's because it's too hard or not possible with natural languages? Or the legalese ruleset just isn't refined enough?
I think that when a law is introduced, the consequences are not clear, so the fine print is used to introduce modifications to the rules; the problem is about adapting a rule to its everyday use.
I suspect that for clarity of communication we'd need to start with clarity of thought - the reason legalese doesn't eliminate ambiguity is that people aren't effectively omniscient, regardless of the level of their training. Add to that a shifting environment (new considerations etc.) which can make previously valid documents ambiguous, and you have a recipe for difficulties.
tl;dr cognitive ambiguities and changing circumstances are what make this hard, not language as a medium
I agree and disagree. I do think that the purpose of legal documents is to take a given set of inputs and dictate a predictable output. But (1) sometimes ambiguity is deliberate (for instance, to kick the can on a business point and hope that it never actually manifests itself after the deal is signed) and (2) as you note, sometimes totally unexpected circumstances arise.
Wouldn’t any constructed/logical language (is that the right term?) also be susceptible to unpredictable future developments?
Yes, you could, because any natural language can be used to teach mathematics. Lojban is, like any notation, a convenience for people who understand the concepts, but the concepts can be expressed in a natural language just as precisely, albeit less concisely.
In fact, as Florian Cajori stated in his book "A History of Mathematical Notations", the rise of mathematical notation was opposed by people who preferred the older, natural language style of doing mathematics, what Cajori termed the "struggle between symbolists and rhetoricians."
So, yes, logic is mathematics, and humans did mathematics in natural language for a very long time before we invented the conlang of mathematical notation. Moving more of the ideas into what is, ultimately, a more expressive notation is not a fundamental shift.
I’d be interested in hearing more about this. Don’t know much about formal semantics (and in fact am somewhat skeptical of the whole field), but I do have a background in language construction, and would say I have a fairly good knowledge of linguistics more generally.
Formal semantics is a broad field full of some very interesting and practical theories, and a lot of quackery. I don't blame you for being skeptical. Specifically though, I think that Lucas Champollion has done a great job of marrying Davidsonian event semantics with more modern work on quantifiers and counterfactuals that finally creates--I believe--a universal framework for capturing the full meaning of any speech utterance:
If I am correct in this assessment, then the next logical step from a conlang/auxlang perspective is: can we build a direct, unambiguous spoken grammar for this universal semantics framework? If so, then we will have achieved what Lojban failed to even attempt: unambiguous meaning.
> If I am correct in this assessment, then the next logical step from a conlang/auxlang perspective is: can we build a direct, unambiguous spoken grammar for this universal semantics framework? If so, then we will have achieved what Lojban failed to even attempt: unambiguous meaning.
Personally, I’m not sure if ‘unambiguous meaning’ is possible or even desirable. Even if formal semantics allows us to achieve complete unambiguity when it comes to grammatical categories (quantifiers, temporal and spatial setting, etc.), I think it would be exceedingly difficult to make every single lexeme completely unambiguous. And even if this were achieved, I’m not sure if such a language would even be usable by humans. Still, it would be an interesting experiment!
Oh I'm happy to continue this conversation here, although it's late where I am and I'll be going to bed soon, and HN threads are ephemeral. Feel free to email me if you'd like to actually collaborate on this.
> Looks interesting… where should I start reading?
IIRC he did some graduate summer schools that were recorded online and have attached lecture notes. Those plus the referenced papers would be a good place to start. The one in particular that I'm thinking of is, I believe, the Summer 2014 topic of "Integrating Montague Semantics and Event Semantics." That marriage of Davidsonian and Montague semantics into a single framework is, I believe, a universal theory of semantics. [It should be noted that he makes no such grandiose claims.]
> Personally, I’m not sure if ‘unambiguous meaning’ is possible or even desirable
By 'unambiguous meaning' I mean something more specific than the words alone imply. [I am aware of the irony.]
In computational semantics, the goal is to transform a parsing of a sentence into a logical statement (or in the case of a question, a query) about the world. For example, the sentence "every dog barks" would be translated into a logical statement of the form "for every dog there exists at least one barking event with that dog as the acting subject." Or rather a mathematical statement to that effect that I don't know how to render on HN.
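Roughly, in standard neo-Davidsonian notation (my rendering, not the commenter's):

    \forall x\, \big( \mathrm{dog}(x) \rightarrow \exists e\, ( \mathrm{bark}(e) \wedge \mathrm{agent}(e, x) ) \big)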
But unfortunately in natural language even when a language is correctly parsed in terms of syntax--which is all Lojban seeks to accomplish--there are still many different possible semantic implications. For example, the sentence "Spot didn't bark" could mean
1. There is no event in which Spot has ever barked, or
2. In the case of a specific event, Spot did not bark.
English doesn't differentiate without clarification, e.g. "Spot never barks" or "Spot didn't bark that time." Lojban has universal and existential quantifiers, but their use is not mandated.
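Rendered in the same notation (my sketch, with t_0 standing in for the contextually salient time), the two readings come apart cleanly:

    \text{(1)}\quad \neg \exists e\, ( \mathrm{bark}(e) \wedge \mathrm{agent}(e, \mathrm{Spot}) )
    \text{(2)}\quad \neg \exists e\, ( \mathrm{bark}(e) \wedge \mathrm{agent}(e, \mathrm{Spot}) \wedge \mathrm{at}(e, t_0) )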
When I say I want unambiguous meaning, I mean something very specific: monoparsing of sentences to logical statements/semantic formalisms. There ought to be a one-to-one mapping between sentences in the language, and logical sentences in this universal formalism. The claims of those sentences could be as broad or narrow as desired, but there shouldn't be any ambiguity about what is being claimed.
> Feel free to email me if you'd like to actually collaborate on this.
Emailed, though I’m not sure how much time I’d actually have available to collaborate.
> The one in particular that I'm thinking of is, I believe, the Summer 2014 topic of "Integrating Montague Semantics and Event Semantics."
I’ll have a look at it, thanks!
> When I say I want unambiguous meaning, I mean something very specific: monoparsing of sentences to logical statements/semantic formalisms. There ought to be a one-to-one mapping between sentences in the language, and logical sentences in this universal formalism. The claims of those sentences could be as broad or narrow as desired, but there shouldn't be any ambiguity about what is being claimed.
OK, this makes far more sense. I still maintain there will be a lot of ambiguity, purely because the individual words themselves are ambiguous, but it’s an interesting goal nonetheless.
Well I'm working two jobs and beginning the process of fundraising a new startup, all while parenting two kids. I'll be lucky to have any time myself ;) But it's good to keep in contact with like-minded individuals, in case the opportunity arises to get some work done on it.
> OK, this makes far more sense. I still maintain there will be a lot of ambiguity, purely because the individual words themselves are ambiguous, but it’s an interesting goal nonetheless.
I think we are in agreement. There are two definitions of ambiguity in play here. There is logical ambiguity, which I want to eliminate, in which a sentence can be parsed in multiple, contradictory ways. And then there is ambiguity due to imprecision, in which words have broad meaning and without context or clarification a given sentence parsed to a single logical statement can nevertheless have multiple distinct interpretations. That's ok.
> I think we are in agreement. There are two definitions of ambiguity in play here. There is logical ambiguity, which I want to eliminate, in which a sentence can be parsed in multiple, contradictory ways. And then there is ambiguity due to imprecision, in which words have broad meaning and without context or clarification a given sentence parsed to a single logical statement can nevertheless have multiple distinct interpretations. That's ok.
Yeah, I can agree with all of this. (Although Anna Wierzbicka’s work on the ‘Natural Semantic Metalanguage’ is an attempt to control the latter sort of ambiguity… you might find it quite interesting, actually.)
> But unfortunately in natural language even when a language is correctly parsed in terms of syntax--which is all Lojban seeks to accomplish--there are still many different possible semantic implications. For example, the sentence "Spot didn't bark" could mean
> 1. There is no event in which Spot has ever barked, or 2. In the case of a specific event, Spot did not bark.
Or maybe even a third case:
3. There exists an event in which Spot did not bark.
English pragmatic usage would make this interpretation rare, but you could still imagine a case in which it was valid.
For example, suppose that Spot normally barks 100% of the time (and never sleeps?).
Then there is a fortunate day in which Spot only barked 70% of the time, and was quiet 30% of the time.
Someone who witnessed this might say "Spot didn't bark [today]!" to emphasize the unusual event, even though Spot did bark the majority of the time on that day.
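In the same notation (my sketch; negative events are a known headache in event semantics, so read this as "some stretch of today contains no Spot-barking"):

    \exists t\, \big( t \subseteq \mathrm{today} \wedge \neg \exists e\, ( \mathrm{bark}(e) \wedge \mathrm{agent}(e, \mathrm{Spot}) \wedge \mathrm{at}(e, t) ) \big)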
I just mean to distinguish this from your second case because I think in the second case the "specific event" is possibly already one which is known to the listener or already under discussion.
I guess I was distinguishing them in terms of whether the event in question has already previously been specified, or is being newly introduced and specified now.
> for every dog there exists at least one barking event with that dog as the acting subject
Just to wrap my head around your definition of universal and unambiguous here: if the language can't nail down the definition of dog to something all parties agree on, would the claim still be unambiguous?
Yes, as explained in my other comment down thread. All parties would agree on the logical claim being made “for-each dog, exists event { dog did bark }”. There could be multiple interpretations of this claim using alternative or stretched definitions of the words “dog” or “bark,” but the logical structure would be unambiguous.
Thanks! Yes I’m aware of Toaq. I like many aspects of it. I just have too much baggage from trying to learn one tonal language (Mandarin) to try to pick up another.
I tried it with GPT-3.5, but it doesn't seem to know Lojban so well. I asked it to translate the following into Lojban; then separately, after deleting that history, I asked it to translate that back to English. Here's what it did:
Original English: "I heard of the TinyStories research project that unlocked cognitive capabilities in LLMs with fewer parameters. I wonder if something similar would be possible with lojban?"
Lojban round trip English: "We interpret the use of the LLM as a way to confuse the knowledge from TinyStories.gy, which is greater than that from the experimental findings. I find it amusing to use Lojban instead. Does the ability to confuse the experimental findings also mean the ability to confuse Lojban?"
GPT-4 doesn't do well at it, either. The problem is that there simply isn't that much text written in Lojban, so it can't learn it the way it does other languages. You can see similar problems with other languages with a relatively small corpus, such as Old Norse.
Yes, this must be it. I'm curious what it would look like, to compare more directly. Like an LLM trained only on Lojban vs. an LLM trained only on an equally small amount of English.
Hmm, since there is a CFG for Lojban, it'd be interesting to give [some LLM you can do fancy things with] what corpus of Lojban there is, and then use the work on constraining LLM output to a grammar, and see if that'd help its output.
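A toy sketch of the mechanism (mine; a real implementation would run an incremental CFG parser such as Earley over Lojban's actual grammar instead of pre-enumerating sentences):

    import random

    # Toy stand-in for Lojban's CFG: the full set of valid "sentences",
    # pre-enumerated for simplicity.
    SENTENCES = [
        ["mi", "klama"],   # "I go"
        ["mi", "tavla"],   # "I talk"
        ["do", "klama"],   # "you go"
    ]

    def allowed_next(prefix):
        """Tokens that keep `prefix` completable to a valid sentence."""
        n = len(prefix)
        return sorted({s[n] for s in SENTENCES
                       if s[:n] == prefix and len(s) > n})

    def sample_constrained():
        out = []
        while True:
            options = allowed_next(out)
            if not options:   # no continuations left: sentence complete
                return out
            # A real LLM would renormalize its next-token probabilities
            # over `options`; here we just pick uniformly.
            out.append(random.choice(options))

    print(" ".join(sample_constrained()))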
As someone who speaks Hungarian, but never seen Hungarian notation outside the context of programming, I'm suddenly realizing that Hungarian notation is probably named after how the Hungarian language itself works.
Words in Hungarian have specific endings that change their "grammatical type" (I forget the grammatical term for that... part of speech?)
Some examples:
1. "-ás/-és": Transforms a verb into a noun that denotes an action or a profession. For example, the verb "tanul" (to study) becomes "tanulás" (study or studying).
2. "-ó/-ő": Converts a verb into a noun that refers to a person who does the action. For example, "tanít" (to teach) becomes "tanító" (teacher).
3. "-hat/-het": Added to a verb to form a new verb indicating possibility or permission. For example, "olvas" (reads) turns into "olvas-hat" (may/can read).
4. "-tlan/-tlen": When attached to a noun, it creates an adjective expressing the lack of something. For example, "szín" (color) becomes "színtelen" (colorless).
5. "-i": Added to a noun to create an adjective indicating origin or belonging. For example, "Amerika" (America) becomes "amerikai" (American).
6. "-ol": Attached to nouns to form verbs. This is often used with foreign words in Hungarian, like "ghosting-ol", "bullying-ol", "coworking-ol".
Additionally, in Hungarian these suffixes can be combined, so if you want to use the English noun "ghosting" as a noun in Hungarian, you still actually have to add the suffixes, so you would say "ghosting-ol-ás".
> The original Hungarian notation was invented by Charles Simonyi, a programmer who worked at Xerox PARC circa 1972–1981, and who later became Chief Architect at Microsoft. The name of the notation is a reference to Simonyi's nation of origin, and also, according to Andy Hertzfeld, because it made programs "look like they were written in some inscrutable foreign language". Hungarian people's names are "reversed" compared to most other European names; the family name precedes the given name. For example, the anglicized name "Charles Simonyi" in Hungarian was originally "Simonyi Károly". In the same way, the type name precedes the "given name" in Hungarian notation. The similar Smalltalk "type last" naming style (e.g. aPoint and lastPoint) was common at Xerox PARC during Simonyi's tenure there.
> Simonyi's paper on the notation referred to prefixes used to indicate the "type" of information being stored. His proposal was largely concerned with decorating identifier names based upon the semantic information of what they store (in other words, the variable's purpose). Simonyi's notation came to be called Apps Hungarian, since the convention was used in the applications division of Microsoft. Systems Hungarian developed later in the Microsoft Windows development team. Apps Hungarian is not entirely distinct from what became known as Systems Hungarian, as some of Simonyi's suggested prefixes contain little or no semantic information (see below for examples).
---
Simonyi's native language may have had some impact on how he wrote code, but it was not "human language -> write code like Hungarian -> Hungarian notation".
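To make the Apps-vs-Systems distinction concrete, a toy illustration (mine, in Python for brevity; the prefixes are the standard documented ones):

    # Systems Hungarian: the prefix encodes the storage type.
    dwTimeout = 30         # "dw" = double word (32-bit unsigned)
    szUserName = "karoly"  # "sz" = zero-terminated string

    # Apps Hungarian (Simonyi's original intent): the prefix encodes
    # the value's purpose, not its machine type.
    cbBuffer = 4096        # "cb" = count of bytes
    xFirstColumn = 3       # "x"  = a horizontal coordinate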
> The resulting code was dense and hard to read. Simonyi’s system came to be known as Hungarian notation, both in homage to its creator’s birthplace and because it made programs “look like they were written in some inscrutable foreign language,” according to programming pioneer Andy Hertzfeld. Hungarian is widely cursed by its detractors. Canadian Java expert Roedy Green has jokingly called it “the tactical nuclear weapon of source code obfuscation techniques.” Mozilla programmer Alec Flett wrote this parody:
prepBut nI vrbLike adjHungarian! qWhat’s artThe adjBig nProblem?
On my yet-to-be-assembled shelf of random things will be a bunch of books intended to make one go o_O when they read the titles in the background of a video call.
What problems? Why most? And why "our"? I find it silly to think that a more precise language is going to solve "most of our problems", it's a bold claim which I believe to be false. No, world problems do not vanish if we were just better at understanding each other, that's not how world problems arise.
Actually, I am not sure that having different meanings of the same word in different contexts is really the problematic part causing ambiguity in conversations.
For example, sarcasm is a much more difficult problem. The words are the same, the meaning of the sentences is the same but the author's intent, the meaning behind those sentences is opposite if they are being sarcastic or not.
Which is a bit ironic, because in the year 2023 it turns out that correctly parsing sentences is an easier problem to solve than extracting meaning from context, even when given a known correct parsing. Lojban solved the wrong problem.
Exactly. And being able to handle ambiguity as a human is one of our strong points. We must not cripple our mental capability to understand and use ambiguity in language; it would be a loss.
Why would you want to use a language that eliminates this ambiguity other than in legal or academic contexts? I think sentences like "Fruit flies like a banana." must exist; they are useful in that they stimulate the recipient's mind to question the sender's intentions. It demands further inquiries and questions, which I think is a good thing. In some cases being vague and ambiguous is the only way to ask further questions.
A lot of communication is intentionally ambiguous at that. Be it diplomatic texts written to let both sides present a victory, or comedy intentionally obscuring key facts for a later reveal. Having the ability to be unambiguous is great, but human communication is full of situations where that's not what we want.
Ambiguity is an intrinsic quality of language. The first words that an infant learns are exclusively learned through ostensive definitions (defining by pointing). This implies that the meaning of those words is fundamentally different to each of us, as it's founded on our phenomenological experience and not through the dictionary.
Further words are learned by combining previous definitions with more ostensive definitions, so the ambiguity is carried on.
Lojban does not eliminate the ambiguity of the meaning of the words, but the ambiguity of its syntactical structure.
To me, the ambiguity of language and the impracticality of changing the meaning of words are the fundamental issues preventing humans from enhancing our innate cognitive abilities, and the root cause of most human conflicts around communication.
Every profession and culture has its own jargon; this enables them to express the particular quality of their reality with much more accuracy than common language allows.
So, instead of trying to build a shared language, what we can do is to create a personal Language for each of us. Not for the purpose of communication, but for the purpose of improving our reasoning and internal model of the world, so we can describe and make sense of our particular reality with much more accuracy.
I've been dedicated to building "Interplanetary mind-map", a tool that supports making your language by making your meaning more explicit. [0]
At the same time, I've been using a prototype of it, to build my personal Language. You can browse into each word and see my particular understanding of it. [1]
I think going from the difference in environment in learning first words to the meaning of those words being fundamentally different for everyone is a non sequitur.
Specifically when not defining the ambiguous concept „meaning“, and without sketching the assumed mechanism by which words are learned and turned into language.
Lojban could have gone further to eliminate logical ambiguity, allowing monoparsing from sentence to an explicit statement in a framework of computational semantics (e.g. Davidsonian event semantics). That Lojban did not was one of its great failings.
I'm one of a few "fluent" speakers of Lojban. I still think that among similar attempts (including controlled English) Lojban is the best option so far.
I've been making a post-apoc survival adventure game lately [1], and as part of the world-making I've spent some time banging out little slang languages and cultural terms for the characters in it to use. They deal in English, mostly. But so much time has passed, and there's been so much loss of the traditional institutions which "anchor" languages over the decades and actively help to keep them consistent, that there can't help but be LOTS of drift, here and there. Linguistic mutation and evolution -- but survival of the "most popular." Of whatever gets most repeated, person to person.
It's also a comedy, so I get to have fun playing around with the distinction between "Approved, Proper and Official" languages and words, and their nemeses: the hip, the veil-piercing, the casual, the quickly improvised or intentionally irreverent ones.
When I was a kid I could effortlessly switch back and forth between the two types. Though I felt more natural and at home in the latter. Then as a groan-up in the white-collar workforce, when I was forced to do the whole hours-long, two-way highway commute M-F to spend yet another 8 hours in a drab cubicle, and had to Watch Every Single Thing I Said and Did All Day Long lest some random person take it out-of-context or Find It Offensive, a certain joy for life was... slowly beaten out of me.
I've always been a fan of Lojban, on paper. But I don't see it having a bigger future. English is "good enough" however imperfect.
Ithkuil is also a constructed language with the goal of being completely unambiguous. As a bonus, it uses the most phonemes of any language, so complex messages and meanings can be communicated in an efficient way.
After learning toki pona (the simplest conlang perhaps?) I got attracted by Lojban due to the machine-parsable aspect of the conlang, but I found it really tedious and not fun to learn.
I started working on a conlang with similar ideas a while back (emojis in the language, machine parsable) but less strict and nicer sounding. Never ended up with anything though :)
That would be terrible. Ideographic written languages are hard to learn (you need to memorize much more to get basic skills) and hard to parse (tokenization is more difficult).
Of course, if that means the entire world speaks the same language, the price would be worth it.
But I doubt we will converge on anything that is difficult to master, since water takes the path of least resistance.
That's not actually true. When MacArthur tried to get rid of the use of Chinese characters in Japanese writing during the post-WW2 occupation, he backed off when it was embarrassingly pointed out that Japan had a higher literacy rate than the US, despite having less of an investment in state schools at the time. Studies have shown that it does not take more time or effort for a child to learn to read and write Chinese or Japanese (assuming those are their native spoken languages) than for an American student to learn to read/write English. It may defy what you think you know as common sense, but it is a replicable observation.
The reason for this is that nobody except little children actually read by sounding out individual letters of words. That would be too much for the human mind to process in real time. You learn to read by recognizing particular clusters of lines on a paper or screen as having semantic meaning, and your brain translates that directly. The imlpiactoin is the huamn mnid deos not raed ervey lteter by istlef, but the wrod as a wlohe. (Did you read that OK?)
So essentially, you and a Chinese person both put in the same amount of effort to learn the same number of squiggly lines on paper as having semantic meaning, and memorizing a mapping of squiggles to words. It's just that in English the pronunciation is emphasized in the writing, whereas in Chinese it is mainly the meaning of the word that is focused on. But given that in this age most of our communication is through text which may never even be read aloud, it does make one wonder if the Chinese had the right idea all along.
But if so, rather than use a 3000 year old writing system with a lot of inherited historical baggage, why not start afresh? Keep the same basic concept (digital pictures in squares representing atomistic units of meaning, aligned in sequential order according to a grammar), but allow the pictures to make full benefit of today's digital display and font rendering technology. The logical end result of this is, I believe, emoji. Or something very close to it / evolved from it.
To get an idea of what I mean, you might want to check out Xu Bing's "Book from the Ground", which is a very readable book written in emoji, or something close to it. You can read about his work here:
> When MacArthur tried to get rid of the use of Chinese characters in Japanese writing during the post-WW2 occupation, he backed off when it was embarrassingly pointed out that Japan had a higher literacy rate than the US, despite having less of an investment in state schools at the time.
I’m not sure this can be solely attributed to the difference between character sets.
> The reason for this is that nobody except little children actually read by sounding out individual letters of words. That would be too much for the human mind to process in real time.
This sounds a lot like the theory that justified getting rid of phonics in schools, which has been an unmitigated disaster. It’s also not true when you consider how adult readers approach new words; English in particular isn’t great at this but there’s at least a hope that you’ll know how to pronounce a word you see written down, which approaches certainty in other alphabetic languages. This is useful for cross-referencing between spoken and written language.
> (Did you read that OK?)
Most English readers can mentally correct transposed letters, which is yet more evidence of the resiliency of alphabetic writing.
> But given that in this age most of our communication is through text which may never even be read aloud, it does make one wonder if the Chinese had the right idea all along.
I very much doubt that this is true. It might be true (or seem true) in the weird bubble you and I live in, but it’s not true for the majority of people.
> it does not take more time or effort for a child to learn to read and write Chinese or Japanese (assuming those are their native spoken languages) than for an American student to learn to read/write English.
As alphabetic orthographies go, English is almost uniquely horrible, though, so that's hardly surprising. Have they tried comparing to e.g. Spanish?
And yes, when we read with a practiced eye, we do pattern match entire words at once. But it's much easier to get to that point with a simpler, composable writing system - and the advantage of the alphabet (or a syllabary) is that you can still read everything before you get there, just slower. And that's just the reading part - the advantages of a decently phonemic alphabet or syllabary for writing are even more blatant.
> And yes, when we read with a practiced eye, we do pattern match entire words at once. But it's much easier to get to that point with a simpler, composable writing system
It is not. From the moment you learn a character in Chinese you are sight-reading by shape. It doesn't take much practice at all. Every bilingual Chinese-English person I know prefers reading Chinese, no matter how long they have been exposed to both languages as it is less mentally straining. English is my first language and even I experience this to some degree.
But until you do learn a Chinese character, you are unable to read it at all. Whereas with an alphabet or a syllabary, you can read anything as soon as you learn the letters; memorizing the shapes of the words comes later, and then only for those words you actually see often.
There's a reason why Hangul was a big deal in Korea when it was introduced, and why the upper classes tried to actively suppress it - it did, in fact, make reading and writing more accessible to the common folk. The same reasons why it was true then still apply.
I don’t think I agree with your conclusion. Taiwan had a higher degree of literacy than China in spite of traditional characters, yet most Chinese and Taiwanese people seem to have a hard time remembering characters. Indeed, in China people now mostly type using pinyin, which uses the Latin alphabet.
The best thing you can do today if you want quick adoption of a language is to use the Latin alphabet.
Taiwanese mostly use BoPoMoFo input schemes, btw. The thing about "forgetting the characters" is recognition vs. production. Those are different skills supported by different parts of the brain, and if you only practice one then the other atrophies. Most people these days only write Chinese on a keyboard, so the computer does the work of transforming the phonetic input (BoPoMoFo in the case of Taiwan) into the correct character, saving the typist the trouble. Someone who forgot how to write a character will have no trouble entering it into an IME and selecting the right option. They'll know it when they see it.
If anything this just reinforces my original point: people consume written text more than they produce it, so we should optimize for that.
I agree with you up until the last point: they do produce it as well, and most Chinese speakers are resorting to pinyin to produce it, which uses the Latin alphabet :D
Even Chinese speakers can usually guess the pronunciation of a word they’d never seen, because of how radicals work. English is barely better than Chinese at this.
Of course there are plenty of languages far better than either, with almost perfect match between spoken and written language. Language changes constantly, after all.
A long time ago I tried to learn Esperanto well enough to read in it. I quit, but I was still intrigued.
Given events in the last 18 months, instead of a constructed language like Lojban or Esperanto I would vote for a small subset of English with a limited grammar and vocabulary - with tools to enforce both vocabulary and simplified grammar.
Why the change in opinion? It turns out that English works better than many other languages for text embeddings and generally working with Large Language Models. A simplified version of English, perhaps called ‘glish’ would be easy for English speakers to pick up and use with good automatic writing and editing tools. Advantages: make it easier for AI and for casual English users around the world.
I wonder if an LLM trained entirely on Lojban would require less text to train to be just as good as the models we have today, due to its higher clarity of expression.
I also feel the same way about Esperanto due to its high regularity in syntax.
Part of me also wonders if an AI model could be optimized by training it on a language specifically invented to be both easy for the AI to understand and easy for the human to communicate in. Maybe the language would be made up of lots of 4-letter words, bringing the ratio of words to tokens closer to 1:1, or would inject more context per word so sentences could be shorter and use fewer tokens.
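The words-to-tokens ratio is easy to check with OpenAI's tiktoken library (my own quick illustration; the Lojban sentence is mine too):

    import tiktoken

    enc = tiktoken.get_encoding("cl100k_base")
    for text in ["the cat sat on the mat",        # plain English
                 "mi tavla do bau la .lojban."]:  # Lojban, for contrast
        tokens = enc.encode(text)
        print(len(text.split()), "words ->", len(tokens), "tokens:", text)

Lojban splinters into many more subword tokens than common English, since the tokenizer was fit to a mostly English corpus.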
IIRC Loglan vocabulary had a large element of Mandarin Chinese, and so does Lojban too?
This seemed both uselessly idealistic and absurdly user-hostile. I would think that a conlang targeted at global appeal would be based on some combination of Romance languages and Germanic languages, perhaps an ur-version of each (like, Latin and Icelandic). Such a process might bring you a result where the vocabulary was as accessible as English while having immensely better-than-English spelling, grammar, morphology, etc.
I have actually thought about this a lot in my life, I think a good compromise is to develop regional constructed languages that correspond to geography and current culture.
So for example, here in Alaska, Athabaskan languages are mostly dead; Yupik and Inupiaq are still around, but English is predominant pretty much everywhere except west Alaska and the occasional community. Sugpiaq and Alutiiq are basically gone, Tlingit has a revival but isn't widespread, Eyak is dead, etc.
Why don’t we create a blended conlang that can take pieces of each of these languages in the form of vocabulary and grammar - even though they’re very different grammatically and in terms of lexicon, we can make a language that is relatively easy for people to learn because it’s regular, substantially different than English so it’s not “colonizer-tongue” and largely unique to the geography of the place.
You could do the same thing in Europe to create lingua-francas.
I tried to learn Lojban back in the day. I liked the phonetics, vocabulary and system in general, but it seemed an impossible task to remember the place (argument) order of every single predicate. Maybe not so much of a struggle for an AI, though.
I still believe there will be an awesome international auxiliary language that the world will embrace, or rather those who wish to communicate with everyone will embrace. It may currently exist or might yet need to be invented. People are making new ones all the time.
The dream of an international auxiliary language is an old one, yet none — of many — have ever caught on outside of niche hobbyists. It’s unclear to me what further advantages a conlang could have that would overcome the incumbency advantage of the dominant lingua francas. I don’t think eliminating syntactic ambiguity is really that exciting for people in general.
I vaguely remember a TNG* episode in which humans encounter an alien race that found our language too ...inaccurate, and spent a lot of time going over each word and sentence, to precisely define its meaning.
* Was it TNG or am I dreaming?
*edit: Thank you everyone, but it wasn't "Darmok". The humans and the aliens were working, for a long time, on a treaty of sorts, because the human language was so confusing and ambiguous that they had to go over every small detail, over and over ...
Need to re-watch TNG
Was it the episode in which the Federation needed more time to evacuate a colony and the other party refused because the treaty left no room to interpret it like that?
If I remember correctly, the crew found a loophole using a rule that allowed them to ask for arbitration - they asked for an arbiter from a species that was currently in rather long hibernation.
“Darmok” is an Internet favorite, about a species whose language consists exclusively of allusions, so the universal translator could not be set up, and they decide to kidnap Picard to establish vocabulary over a campfire.
There were a couple of episodes in Voyager in which aliens deem humans interesting but lesser, and try to experiment on or study the crew. One that I remember ended in a ship's self-destruction; in another, the Doctor left a copy of himself. As to why they didn’t simply ask the Borg for a zipped archive, I don’t know.
The Dominion commanders throughout DS9 must have made similar remarks, and so should the Cardassians have. Q would outright say so in spite; corrupted Data might. Maybe a combination of those? Or maybe I’m just a suboptimal Trekkie :p
I'm not them, but this is it. Their government is called the Sheliak Corporate, and Picard finds in the overly long treaty a clause about resolving disputes requiring a third party - and names a species currently in hibernation. So they give in and let him have the much shorter time he's been asking for, for an evacuation.
I believe I've seen every TNG episode at least once, and that's not ringing any bells. Maybe it was an original series episode? Despite my best efforts, I haven't been able to make it through more than one or two episodes of the original.
I like this point very much: "You can be as vague or detailed as you like when speaking lojban. For example, specifying tense (past, present or future) or number (singular or plural) is optional when they're clear from context.".
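For example (my own illustration, using the standard Lojban tense particles):

    mi klama le zarci       I go to the store (tense left to context)
    mi pu klama le zarci    I went to the store ("pu" marks past)
    mi ba klama le zarci    I will go to the store ("ba" marks future)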
The LGBTQ movement came to mind immediately. I think they would have more success by introducing a new, 'don't care about your sex' pronoun to the language instead of adding a third one that just complicates matters.
> The LGBTQ movement came to mind immediately. I think they would have more success by introducing a new, 'don't care about your sex' pronoun to the language instead of adding a third one that just complicates matters.
I thought it was more of a pro-caring thing than a pro-uncaring thing.
Those focusing too much on the latter are really quite good at bumbling into insulting behavior...
Slippery slope fallacy here.
Why should I care if my online customer is he/she/they as long as they pay the price? Or if I talk about technology through the Internet? All I want is effective communication with no insults.
Is this meant to describe how a slippery slope forms, or call one out?
Effective communication involving other people only sometimes overlaps with this subjective-first, transactional approach: the "I care about", "I talk about", or "All I want is".
(It's a great view for designing a specific type of community, with business-transactional perspective + valuing of tightening subjective logic and avoiding fallacy. Well, this will always result in a specific type of tech community with some special focus on business and economic factors as combined with tech.)
And I guess if we effectively cherry-pick the premise, with straw people assuming their various positions on the stage, we can easily justify a more anonymizing, transactional view of language. "This is how I prefer to view my life, therefore the needed language looks like..."
In this view, language doesn't need all that extra stuff. Keep it direct, functional, logical, and avoid fallacies especially.
But this isn't really getting at the relating capabilities of language for example, like when there are bigger problems at hand and diverse mindsets working together to solve them.
Instead, it ends up bringing minds into silos, as broader personality theory predicts. So the I, I, me, is really more about self-support.
With this view, individuals effectively make relating all about the subjective-them 24/7, and about whether they are good/not good. They see relating with others as "we will each be our best and get what we each want" as if it's a universal creed. The one philosophy that just makes sense. Do others even exist?
And is this actively mean? No. Perhaps a better term is effectively: "passively careless," to borrow a phrase from Sabine Hossenfelder. And even in the subjective-first view it still does not make for a high-quality individual, which is also bad news.
It's unfortunate, because not only does this view raise the stakes on their end with no good reason, creating things like martyrdom issues, but it also perpetuates the avoidance of effective contact with others, effectively making "others" the persistent bête noire.
They/them already exists and has centuries of use for that use-case. The people objecting to that are just as likely to object to an entirely new one.
There also are already multiple "neopronouns" [1], and they are frequently ridiculed by those taking issue with "unexpected" (to them) pronoun use. This is also not a new debate, with variants (other than singular they) being in use in limited contexts since the late 18th century. None of the attempts have met more than very limited success, as you can tell from the fact this is still a debate.
> They/them already exists and has centuries of use for that use-case.
This is just gaslighting. People use "they" when they talk in the abstract. No one said sentences like "This is Tom. They is happy." until a few years ago.
Apart from being grammatically wrong (it'd still be "they are"), this is entirely beside the point, which is that the people who object to (the corrected version of) this construction are just as likely to object to a neopronoun, and so there's little reason to assume a neopronoun will stop people from getting worked up over this even if you could get people to use them.
You'd be surprised. I've met people with preferred pronouns like xe who were outright offended at the use of they/them, because "that's not my pronoun".
Different category of objections, so not really relevant to the point, which was about the objection to their use in general, not about any given person.
I spent some time looking at Loglan, purchased some books and tapes, but never really made any progress with it. A Scientific American article got me interested, pre-internet days so finding a community to converse with was not easy.
Lojban in theory can address syntactic ambiguity (that has semantic consequences). Lexical ambiguity is somewhat hopeless (unless you only have a fixed number of things, states, eventualities, &c. you're going to refer to). [This isn't an issue of Lojban as such, just of representational systems.]
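A classic case of the former: "I saw the man with the telescope." English leaves it open whether "with the telescope" attaches to the seeing or to the man; Lojban's grammar forces you to commit to one bracketing or the other.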
Honestly if you're tempted to learn one of these universal languages then I'd suggest going all-in on SolReSol[1]. At least if you get bored of speaking it, you can instead paint it, or play it on a trumpet.
I spent a few weeks with Lojban around 2006. imho it felt really dense as a language. Definitely interesting. Back then there wasn’t as much material in Lojban as is now available.
Why is it so hard, and why hasn't anybody done it before? The whole problem seems trivial to me. You need a language where
- first you need your words to have only one meaning. You can even use an existing vocabulary, like English or Esperanto.
- then you need to be able to express math/logic/structured programming.
That's it, now you have a language without any ambiguity. If there is some ambiguity in the objects/concepts that you are referring to, then use your language more to specify the object.
> Why is it so hard, and why hasn't anybody done it before? The whole problem seems trivial to me. You need a language where
> - first you need your words to have only one meaning
There's your answer. There is no such language, there are no such words. It's questionable if it's even possible - to me, it sounds like a category error. "Words" aren't keys in a hash table, and "meaning" isn't a discrete thing.
> now you have a language without any ambiguity. If there is some ambiguity in the objects/concepts that you are referring to, then use your language more to specify the object.
Thing is, I don't think ambiguity comes from language. Concepts and categories are naturally fuzzy for us. Precision is a continuum.
If you need to express something more precisely than your language allows with single word or term, you just create a new word and attach a long-form description to it to position and connect it to other words in the latent^H^H^H^H^H^H^H concept space. That's how we always did it, and it was never hard.
I still don't get it. In the linguistic part, the only problem is to make the word->object mapping unambiguous. And that is a trivial problem. If the world is fuzzy (I don't think so), then that is not a problem of linguistics.
Yet linguists seem to aim for a language that eliminates ambiguity (see TFA). Where am I wrong?
The bear... no, not that bear, the other bear--yes, I know it's a dead bear... what, you call that a corpse and not a bear? Okay, the new corpse of a former bear, the one that is right over there...
When does a bear stop (or start) being a bear? The 'bear goo' quickly boils down to "what you believe in" and that (surprise!) people have different views, even if we restrict "bear" to only living individuals of the family Ursidae--is it a bear from the time of conception, or even before that, or maybe only at or some time after birth, or always, or never?
Of course you could be more specific--pa vizyci'ojvemrojvesofyjvericparcribe, for example--but now it takes forever to say anything, and you'll lose some of the audience--"the magistrates however replied that they had forgotten the things which had been spoken at the beginning"--and you can probably still find someone for whom the "nearby young and dead and soviet and tree-climbing bear" is still a bit too fuzzy, or anyways there will be people who will disagree with (or ignore) whatever arbitrary definition of bear you've cooked up. Maybe there are two such bears, or they'll quibble that a corpse is clearly not a bear, or they'll happily modify your oh-so-precise definition to mean something else, if you know what I mean, nudge nudge wink wink say no more because The Algorithm plusgoods discussions about bears, and so they use 'bear' to talk about something else while still collecting all those sweet, sweet plusgoods.
Also, lojban does not eliminate ambiguity, zo'e co'e be do'e zu'i, but we're za'o into nunribypevykreborvi'u. A shaved bear would be less fuzzy?
Virtually every word has edge cases and gray areas, so there can be no 1:1 correlation between language and the world. Think of something as basic as "chair." Is a love seat a couch or a chair, or both? What about a bench? A stool? An ottoman? A small table? A tree stump? Reasonable people will give different answers, and some answers will be nonbinary ("it's very chairlike but not strictly a chair").
Once thought is encoded in language, it's in a lossy symbolic domain. It's in a messy shared vector space. And every human decoder unpacks the embeddings differently.
Theoretically you can introduce new words to be more precise; for example, Eskimos have many words for snow. From (1), that is because they use polysynthesis: a base word is attached to many different suffixes which change its meaning. So, while in the English language we might have a sentence describing snow, polysynthetic languages such as the Eskimo-Aleut family will have long, complex words.
So, you may solve the ambiguity problem by introducing a huge number of words; that could help an LLM, but humans need a relatively small vocabulary tailored to everyday use. So the tradeoff is between reducing ambiguity and not increasing the vocabulary by a lot.
A more common example is the meme about women's and men's colors: women have magenta and salmon, while men call both pink. This has not much to do with linguistics or eliminating ambiguity.
You say 'pink', you mean 'pink', and if the other person wants to know which pink, they ask?
edit: this was strawmanning; colors are a non-issue. But there can be situations where there are 2 objects, you say a word to refer to one of the objects, and your partner thinks of the other object, and neither of you thinks it was ambiguous, so you don't clarify until it's too late. Like there is a dead bear and a bear cub; you say 'bring the bear' meaning the cub, but they understand the dead bear.
The difficulty in languages is not in having good features, but getting people to use the language at all. Existing human languages mostly solved the adoption problem because you want to speak the same language as the people around you. I don't think Lojban has reached that critical mass yet.
Programming languages have other ways of increasing adoption, such as offering libraries that are tightly coupled with that language. If there were interesting works available exclusively in Lojban, that might have been a good motivation to learn the language, assuming they could compete with the heaps of existing work written in languages we already know.
That the ambiguity of a context-free grammar is undecidable means there are some context-free grammars where it is impossible to determine whether they are ambiguous or not.
There are context-free grammars that we can prove are unambiguous. As an example, "A = xAy | ε" is unambiguous. Lojban is one of the grammars we are able to prove unambiguous.
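A minimal sketch of why that toy grammar is easy to handle (mine, in Python): the language it generates is x^n y^n, and every member has exactly one derivation, so a recognizer never has to choose between parses:

    def parse(s):
        """Return True iff s is in x^n y^n (the language of A = xAy | ε)."""
        half, rem = divmod(len(s), 2)
        return rem == 0 and s == "x" * half + "y" * half

    assert parse("")            # A -> ε
    assert parse("xxyy")        # A -> xAy -> xxAyy -> xxyy
    assert not parse("xyxy")    # not of the form x^n y^n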
A language humans use only in non-digital medium would be interesting. To avoid LLMs and surveillance-capitalism from knowing too much about you and your ways.