As a functional programmer myself, I have the same confusion. As best I can tell...

crimsonalucard · on Dec 25, 2019

Label the string as a functor and think about lifting the array into the string functor and doing string manipulation to change it to a dict, then lift it back into a dict type.

String manipulations of a lifted type in "string space" can be faster and more efficient then actually manipulating it in "type space"

edflsafoiewq · on Dec 25, 2019

[flagged]

zug_zug · on Dec 25, 2019

This is exactly what I mean. I can't tell if that's a joke or not.

crimsonalucard · on Dec 25, 2019

Not a joke.

Take a look at my response to the other guy. I clarified my point with a thorough example. Please note that I was somewhere else and on the phone when I typed my initial response so forgive me for the brevity and lack of clarity. Hopefully the new response will clear things up.

Anyway...

I think theres a good chance that you simply aren't grasping what the category theory fans are saying. Hopefully you grasped what I'm saying.

crimsonalucard · on Dec 25, 2019

Apologies. I was on the phone typing the above. Still, remarks like "What?" that provide no information or context are against the rules on HN. Just FYI.

The OP said that there's no point in labeling an arbitrary type as a functor or monadic value. I'm going to use an example to illustrate how there is a benefit in being aware of this concept. I will be using a "String Monad" in my example.

Imagine you have some type, like in the initial example.

And you want to convert that type into another type.

I'll give you an arbitrary example using python:

You want to convert this: {1:2 ,3:4, 5:6, 7:8}

To this: [1,2,3,4,5,6,7,8]

You can do this via loops or you can "lift" the dictionary type from "type space" to "string space" via the python functor str():

   lifted_dictionary = str({1:2 ,3:4, 5:6, 7:8})

This is similar to movement between two categories or objects via a functor. You see similar techniques in other fields of engineering when say someone "lifts" something from cartesian coordinates to polar coordinates. Which is why I'm using the word "space" it's similar to how I'm shifting representations of coordinates to have an easier time with certain calculations.

Then you do your type manipulations in "string space" rather then on the type itself. So it all ends up being string manipulation operations to get from the string "{1:2 ,3:4, 5:6, 7:8}" to the string "[1,2,3,4,5,6,7,8]". I didn't use regexp in the example below but you get a huge performance boost if you use that instead.

   lifted_array = "{1:2 ,3:4, 5:6, 7:8}".replace('{','[').replace('}',']').replace(':',', ')
   #computed value is a string of the form: "[1,2,3,4,5,6,7,8]"

Then you use the opposite functor from str() to lift the value back into "type space"

   array = eval(lifted_array)

Here eval() is the opposite functor to str()

Type conversion complete. The full code:

   original_dict = {1:2 ,3:4, 5:6, 7:8}
   array = eval(str(original_dict).replace('{','[').replace('}',']').replace(':',', '))
   #value of array is [1,2,3,4,5,6,7,8]

Essentially you use functors to lift your data structures into other categories for easier conversion, then you simply bring the stringified type back down into regular "type space".

So in short I took types and lifted it into the String Monad or "String Category" or whatever you want to call it, then brought back down into types.

It seems like an arbitrary way to do type conversion but using regexp to do string manipulation in place of a for loop that unrolls the dictionary is more performant and faster. Think about preserving order as well. When I convert the dict type into string space, ordinal properties of the string itself will be enforced on the dict. SO the keys in the dict will follow the order they appear in the string as will the array string that it is converted to.

This technique utilizes category theory. The categories you are lifting to must be isomorphic to the origin category, you must understand the notion of functors (or monadic values) to really get it.

throwawaymath · on Dec 25, 2019

> This technique utilizes category theory.

No, it doesn't. Category theory is not intrinsic to the development, use or explanation of this technique. Category theory is merely one possible framework for explaining why the technique works the way it does. That's an important distinction, and likewise it's not the most practical framework for understanding this technique by a long shot.

In particular, understanding why certain programming techniques are more performant than others virtually never requires the language and abstractions of category theory. Everything you're saying here is a further demonstration of the complaints others have voiced in this thread. Not because you're wrong - you're not wrong. Rather because it's just not the most practical way of understanding or leveraging most programming techniques, and it injects a significant amount of unnecessary abstraction and foreign mathematical terminology into discussion.

For what it's worth I have taken graduate math courses wholly and partly focused on category theory, so I have the background to follow what you're saying. It's essentially correct. But it's also obtuse and mostly inscrutable to other people. It reduces, rather than increases, the shared context software engineers leverage to understand each other. You can project a vast and magnificent theory of categories onto programming because it adheres to a variety of algebras, but at the end of the day it's just not clarifying anything further. Instead it's miring it in overcomplicated verbiage.

There's basically no reason for you to be explaining why a regular expression is faster than a for loop for this use case with the formalism of category theory, unless it's as an academic exercise just to show you can. Explaining this example with the language of functors and type spaces is like proving that Excel is Turing complete in order to explain how summing arbitrary rows in a column works.

crimsonalucard · on Dec 25, 2019

>Category theory is not intrinsic to the development, use or explanation of this technique.

No it's not intrinsic to the technique, you're right on this.

>There's basically no reason for you to be explaining why a regular expression is faster than a for loop for this use case with the formalism of category theory, unless it's as an academic exercise just to show you can. Explaining this example with the language of functors and type spaces is like proving that Excel is Turing complete in order to explain how summing arbitrary rows in a column works.

You're right on this as well. The category theory part I am using here is basically saying that you can "lift" the type into "string space" and do conversions in an isomorphic category. I am not trying to say that category theory will explain why regexp is faster. More like I'm just saying that categorical insight offers you alternative design choices to reach your goal. It is up to you to determine whether that path is performant.

>For what it's worth I have taken graduate math courses wholly and partly focused on category theory, so I have the background to follow what you're saying. It's essentially correct. But it's also obtuse and mostly inscrutable to other people. It reduces, rather than increases, the shared context software engineers leverage to understand each other. You can project a vast and magnificent theory of categories onto programming because it adheres to a variety of algebras, but at the end of the day it's just not clarifying anything further. Instead it's miring it in overcomplicated verbiage.

I'm not a mathematician, I'm the farthest thing from that, so hopefully my explanation is understandable to the layman as I, being a layman myself, can offer that perspective naturally. But you aren't wrong here. I can see how the vocabulary can confuse... BUT I am not trying to inject overblown concepts into what is otherwise a straightforward technique. This is not why I brought it up and it is not my intention. I'll explain. Read on.

>That's an important distinction, and likewise it's not the most practical framework for understanding this technique by a long shot.

Honestly, I found this technique by thinking in terms of category theory. I would not have come up with it had I not learned category theory. I stated that it uses category theory because that is what I myself used to come up with it.

That is the point I am trying to make. Categories personally helped me utilize a technique. And hopefully my anecdotal experience will lend some credibility to using category theory to help with programming. I am not trying to use the terminology to sound pedantic. Apologies if it seems that way.

And also note this notion of converting from one "space" to another "space" to have an easier time doing transformations is a common technique even outside of software. To cite another example: electrical engineers move between "signal space" and "frequency space" in order to better understand the properties of signals.

I know of no universal term for this conversion that covers the general idea other then the categorical term: "functor." At the very least, category theory introduces a vocabulary for Design techniques like this. Similar to "GoF Design Patterns" but more theoretical and formally defined.

AlEinstein · on Dec 25, 2019

Well it might use category theory and be buzzword compliant but the code you’ve written wouldn’t pass code review at my work. This code converts numeric data to a string and back to numeric data.

crimsonalucard · on Dec 25, 2019

Buzzwords are popular words. You put category theory on your resume it's not gonna get you anything. It's obscure.

Look the point isn't the python, the point is, the design pattern can be applied to many places and in some contexts it's faster in other contexts it is not, but the design pattern exists. My intent was to convey understanding of the pattern at a very general level. Not to argue about the validity of the python implementation.

visarga · on Dec 25, 2019

I would simply do:

    original_dict = {1:2 ,3:4, 5:6, 7:8}
    array = sum(map(list, original_dict.items()), [])

And it would have been simpler if items returned lists instead of tuples. In my benchmarks it runs 10x faster.

crimsonalucard · on Dec 25, 2019

This wasn't the point. Also map is not recommended for usage in python. Stylistically the convention says you should be using a list comprehension in place of filter and map.

Also did you do your tests with regular expressions? I specifically mentioned I didn't use regexps but it would be dramatically faster if you used it instead of the replace method.

I'll reiterate the point. The point is there is a pattern you can follow where you can move from one "type space" into another "type space" then back because the other type space may be easier to write transformations.

Obviously from the this thread, the serialized type space may not be the best way for python. But there are many other contexts where this it is better and my point is, category theory allowed me to derive this general pattern. That's it.

patrec · on Dec 25, 2019

Still needlessly complex and slow.

    [x for kv in d.items() for x in kv]

michael-ax · on Dec 26, 2019

aaah! thanks for explaining that its about fun things to do with isomorphic conversions and transforms -- which makes it such a fundamental thing that i can likely continue to invent the bits i need of it as i go.

crimsonalucard · on Dec 27, 2019

It's not just fun, there's a practicality to the method as well which I believe, judging from the responses, that my example failed to convey.

In another comment, I came up with another example that uses the same pattern that uses a more "practical" example. I'll paste it below:

the goal is to convert:

   {"1":"2", "3":"4", "5":"6"} to {6:1, 2:3, 4:5}

Or essentially the ordered dict, rotated. Which space is a rotation more intuitive? A list space. [1,2,3,4,5,6] is more readily rotated into [6,1,2,3,4,5] with one operation and converted back into a dict.

If you tried to do the above directly by manipulating the dictionary it would not be as straightforward. Use a functor to lift the dict into a list, do the rotation and use the opposite functor to lift it back down to a dict.

crimsonalucard · on Dec 27, 2019

Also totally don't appreciate the sarcasm. I hate people who have this type of attitude. What's your intention? Just to piss me off? I'd flag your post, but this thread is pretty much dead already.

patrec · on Dec 25, 2019

[flagged]

dang · on Dec 25, 2019

Please don't cross into personal attack, no matter how wrong or ignorant another comment is, or seems.

https://news.ycombinator.com/newsguidelines.html

nopurpose · on Dec 25, 2019

Author mentioned elsehwere that he used that trick in postgresql , not in python, where it allowed him to skip SQL acrobatics and sped up query.

crimsonalucard · on Dec 25, 2019

Thank you for this. This is indeed where I actually used the technique. You do get dramatic speed up if you do it in SQL this is provable.

I chose to use python as an example here mainly because the SQL example would get very convoluted and be much less concise.

The main point was the general technique of moving back and forth between isomorphic types, but people are getting too involved with implementation details of python. Yeah it might(keyword) be slower, yeah it's more convoluted than a traditional list comprehension in python, but that is not the point.

Lesson learned: don't use arbitrary examples to prove a point. Use an example where all metrics are dramatically better.

crimsonalucard · on Dec 25, 2019

First off, I don't like your sarcastic attitude. It's against the rules on HN to do that. Don't violate rules to be rude.

Second off, my example is just an example. It's not production level code. The intention of the example is to show "lifting" from from one type into another "category." The efficiency is less relevant here... in some contexts it can be faster in other contexts it could be slower or not even possible.

>Your gloss reveals further substantial misapprehensions about python basics (there are no regexps here, no it's not faster to do it that way, no it's not any more order preserving then iteration).

You sentence here reveals a failure in reading comprehension. I never mentioned what did was a regexp. I said you Could use a regexp to make it faster. A regexp if you know basic computer science is the one of the fastest possible string parsing operations. Take a course on Automata at a top university, this would be helpful in educating you, but you need to qualify first.

If there is a bottleneck it's in the casting functors str() and eval(). I didn't benchmark it and I very much doubt you did either. I suspect it is faster as the interpreter needs to "interpret" the data structure anyway regardless of whether or not it has string quotation marks around it so the parsing bottleneck is already there regardless of whether or not you have quotation marks. This is just a guess, and I'm pretty sure both of us are too lazy to generate actual specs to prove it either way. Stop throwing hard conclusions at things you have ZERO evidence for. If you do have evidence, show it, don't talk out of your ass... I'm willing to concede given evidence but without evidence you don't know what you're talking about.

The string does hold order preserving properties. You are absolutely wrong about this:

"{1:2, 3:,4}" during string manipulation can be made into "[1,2,3,4]" without sorting because the string itself is a List of ordered characters. Unserialized, the dictionary has no order. Again what's going on here is you misreading and not understanding my explanation.

>Assuming your problem is purely lack of experience, here's a word of advice: stay the hell away from abstractions for a year or two.

This is flagrantly offensive. Let me give you a word of advice. Keep your attitude in check, this kind of attitude can ruin your whole career after you get your ass fired. Any hiring manager would prefer a less technical person (which you have not demonstrated any dramatic ability in) over someone with a bad overly domineering attitude, and that is a fact.

>Instead try to form some basic mental model of a) how a computer and b) your main tools (such as python) work (have a very simplified model of how the CPU, memory network and SSD work, with numbers correct to 1 or 2 orders of magnitude; read a bit about the python implementation). Try to predict how much time/memory/etc something should take before you implement it. Then implement it in the most straightforward and direct way.

How does this have anything to do with the topic at hand. You are invoking OS and CPU architectural level concepts which are so far away from python it has basically nothing to do with it. An SSD? Are you serious? All of these low level operations are light years in distance away from python. If you want time and space analysis of both algorithms, at the python level/category theory level you just follow the standard model of Big Oh complexity. You want to get nitty gritty? get the bench marks or switch to something like C++ or Rust. Nowhere does anyone have a detailed enough mental model to map out how the python code will execute at the assembly instruction level.

Category theory is a top down approach to programming while computer architecture is more of a bottom up approach. Experts in both fields are unlikely to collide.

FWIW I double majored in Electrical/Computer Engineering and Computer Science 10 years ago from a top university. Not only do I have a general model of computing from semiconductor physics (this is below logic gates and non linear circuits and computer architecture, all said to patronize you.) all the way up to python, but I hate throwing down credentials for no reason... SO why did I do it here? because you decided to assume that I lack experience.

patrec · on Dec 26, 2019

I timed it -- 2 orders of magnitude slower across the board.

    In [19]: short, middle, long = [{2*i+1:2*i+2 for i in range(n)} for n in [2, 2**8, 2**16]]
    In [20]: timeit  [x for kv in d.items() for x in short]
    812 ns ± 6.86 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
    In [21]: timeit  [x for kv in d.items() for x in middle]
    30.1 µs ± 1.87 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
    In [22]: timeit  [x for kv in d.items() for x in long]
    9.77 ms ± 1.19 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)
    In [24]: timeit  eval(str(short).replace('{','[').replace('}',']').replace(':',', '))
    10.4 µs ± 240 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
    In [25]: timeit  eval(str(middle).replace('{','[').replace('}',']').replace(':',', '))
    592 µs ± 23.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
    In [26]: timeit  eval(str(long).replace('{','[').replace('}',']').replace(':',', '))
    264 ms ± 3.22 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Python dictionaries are ordered, and iterate in order (since 3.6, as guaranteed property since 3.7). In older versions of python where they are not ordered, str also does not preserve the order in the literal (except for one of all possible orderings).

   Python 3.7.2 
   Type "help", "copyright", "credits" or "license" for more information.
   >>> str({5:6, 4:5, 3:2})
   '{5: 6, 4: 5, 3: 2}'

   Python 2.7.15
   >>> str({5:6, 4:5, 3:2})
   '{3: 2, 4: 5, 5: 6}'

You might consider what I wrote offensive, but maybe you can still take the above as evidence that there are areas where your mental model is imperfect. I have the impression there is a puzzlingly common class of CS engineering failures stemming from misguided abstraction attempts because people are either unable or just unpredisposed to making extremely back of the envelope task breakdown of the most direct implementation in terms of primitive, costed operations. I also see this happen to rather smart but inexperienced engineers. Since you are not inexperienced my advice is unlikely to be helpful.

crimsonalucard · on Dec 26, 2019

>I timed it -- 2 orders of magnitude slower across the board.

Dude, didn't I tell you to use regular expressions? Replace is obviously slower. I used it because it's easier to read then regexp. Rerun your experiment on that. I specifically mentioned that the regexp is the faster implementation. You completely ignored it.

Again the implementation was not my point. I don't even understand why you would spend the effort to record something so trivial.

Also the "magnitude" difference in speed is not exponential meaning for most applications the difference is negligible. You should know if you're "experienced" that a C++ binary is a "magnitude" faster than python just running a for loop counting to 100. Most people still use python because in the end because it Doesn't even matter.

>Python dictionaries are ordered, and iterate in order (since 3.6, as guaranteed property since 3.7). In older versions of python where they are not ordered,

Good to know.

>str also does not preserve the order in the literal (except for one of all possible orderings).

Categorically wrong. You talk about mental model? Examine yours. The very nature of a string is an ordered list of characters. A string literal "abc" has order and preserves it by nature of having order as a property. A data structure that loses order is one that does not explicitly have ordinal properties hence it does not preserve order.

>You might consider what I wrote offensive, but maybe you can still take the above as evidence that there are areas where your mental model is imperfect.

No one has a perfect model. To understand the world and computing, what people do is they make predictions and guesses and abstractions. I'm just guessing that string space might be faster in python. If I really wanted to understand why one benchmark was faster than the other I basically have to understand the source code behind eval and str. That is something that I don't care too much about and is therefore not worth my effort. Overall that wasn't even my main point.

>I have the impression there is a puzzlingly common class of CS engineering failures stemming from misguided abstraction attempts because people are either unable or just unpredisposed to making extremely back of the envelope task breakdown of the most direct implementation in terms of primitive, costed operations

The back envelope calculation is that the upper bound for both algorithms is O(N) where N is the amount of characters in the serialized data structure; and thus benchmark times are negligible in almost all contexts. Think about it. We have back envelope calculations down to a science (complexity theory) and that science doesn't even reference actual timing because humans can't see the difference between 0.001 ms and 0.001 ns. But you're going around all of the science and throwing bench marks at me. You want to squeeze the maximum amount of time out of your python program? Stop using python... start writing your code in cisc.

>I also see this happen to rather smart but inexperienced engineers. Since you are not inexperienced my advice is unlikely to be helpful.

An experienced engineer is only concerned with performance when it actually makes a difference. Your advice is not helpful because it is wrong.

Understanding computing is in itself composed of layers of abstraction. To optimize a high level language I don't have to understand assembly language. I definitely don't need to understand CPU architecture which you seem to imply is required. I have an abstraction that will help me: Big Oh Notation.

The most common time a programmer will need to understand the low level aspects of a computer is if they are writing systems level code, not application level code. These specialists tend not to use python at all. I would imagine category theory is not applicable at this level because they are closer to the turing model of computing which is by nature non-functional. Assembly, C, C++ or Rust is what they usually work in.

patrec · on Dec 26, 2019

> Dude, didn't I tell you to use regular expressions? Replace is obviously slower. I used it because it's easier to read then regexp. Rerun your experiment on that. I specifically mentioned that the regexp is the faster implementation. You completely ignored it.

Well, I ignored it because I thought that a) the actual string manipulation would form a negligible part of the total runtime b) regexp would not be faster (and the actual fast way to do it would be str.translate). Let's try it:

    In [43]: pat = re.compile('[{}:]'); slong = str(long)
    In [44]: timeit slong.replace('\{','[').replace('}',']').replace(':',', ')
    8.06 ms ± 23.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
    In [45]: timeit repl = pat.sub(lambda m: {"{": "[", "}": "]", ":": ", "}[m.group()], slong)
    128 ms ± 5.94 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
    In [54]: timeit eval(repl)
    237 ms ± 3.92 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
    In [54]: timeit slong.translate({'{':'[', '}':']', ':': ','})
    753 µs ± 27.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

See how having a reasonable mental model can be nice?

crimsonalucard · on Dec 26, 2019

Please read the entire post and respond to that rather then the tidbits that help you gain argumentative advantage. I have repeatedly said that the implementation WAS not the point and that all performance metrics are negligible as the are under O(N). Either way here's a response:

Honestly reg expressions in all general contexts is one of the fastest implementation of string parsing available for this specific type of parsing. You would have to understand computing theory to see this. Since you didn't even bring it up the actual fact of the matter is, you don't have a good model of computing in general in your head.

The issue you're seeing here is python specific. So really the only way to see this kind of thing is through benchmarks OR knowing the python source.

> See how having a reasonable mental model can be nice?

The problem is your posts also indicate to me that you don't have a good mental model. From what I can make of the model is this: Abstractions are bad, use less of it for more speed, also know SSD's and CPU architecture that will help you write faster python code.

Then what you do is run benchmarks and rely on that in place of an actual working mental model. Believe it or not you CAN run benchmarks on every permutation of a code path you can forego a model altogether. Evidence is more reliable then a model, yet your genius advice was for me to learn CPU architecture.

>a) the actual string manipulation would form a negligible part of the total runtime b) regexp would not be faster (and the actual fast way to do it would be str.translate)

Two things about this, first... str.translate is python specific. No general mental model would assist you with this. You are using python specific knowledge here.

The second part is similar. How do you know eval would be non-negligible? Theoretically the interpreter interprets python code and eval interprets the same thing. Is eval/str accessing heap space or stack space? What is causing it to be slow or is it the extra parsing itself?

Likely you don't know.

Either way my example served one thing. If you understood it you would know that the theme was basically to say that moving through Another type space could have advantages over moving through the original type space. The string functor was just an example.

I could easily say that the goal was to convert:

   {"1":"2", "3":"4", "5":"6"} to {6:1, 2:3, 4:5}

Or essentially the ordered dict, rotated. Tell me which space is a rotation more intuitive? A list space. [1,2,3,4,5,6] is more readily rotated into [6,1,2,3,4,5] with one operation and converted back into a dict.

If you tried to do the above directly by manipulating the dictionary it would not be as straightforward. Use a functor to lift the dict into a list, do the rotation and lift it back down to a dict.

That is the point and the pattern I am trying to convey. We can argue about benchmarks all day it serves Nothing.