Posts in Odin (20 found)
Ginger Bill 6 days ago

Designing Odin's Casting Syntax

Odin's declaration syntax becomes second nature to everyone who uses the language, but I do sometimes get asked "Why are there two ways to do type conversions?" Enough that I had to make an FAQ entry. The reason that there are two ways to do type conversions is that one approach may feel better than the other in a given case. If you are converting a large expression, it is sometimes a lot easier to use the operator-style approach. The call syntax is commonly used to specify the type of an expression which may be relatively short, such as or . There are...

0 views
Anton Zhiyanov 2 weeks ago

Allocators from C to Zig

An allocator is a tool that reserves memory (typically on the heap) so a program can store its data structures there. Many C programs use the standard libc allocator, or at best, let you switch it out for another one like jemalloc or mimalloc. Unlike C, modern systems languages usually treat allocators as first-class citizens. Let's look at how they handle allocation and then create a C allocator following their approach. Rust • Zig • Odin • C3 • Hare • C • Final thoughts Rust is one of the older languages we'll be looking at, and it handles memory allocation in a more traditional way. Right now, it uses a global allocator, but there's an experimental Allocator API implemented behind a feature flag (issue #32838 ). We'll set the experimental API aside and focus on the stable one. The documentation begins with a clear statement: In a given program, the standard library has one "global" memory allocator that is used for example by and . Followed by a vague one: Currently the default global allocator is unspecified. It doesn't mean that a Rust program will abort an allocation, of course. In practice, Rust uses the system allocator as the global default (but the Rust developers don't want to commit to this, hence the "unspecified" note): The global allocator interface is defined by the trait in the module. It requires the implementor to provide two essential methods — and , and provides two more based on them — and : The struct describes a piece of memory we want to allocate — its size in bytes and alignment: Memory alignment Alignment restricts where a piece of data can start in memory. The memory address for the data has to be a multiple of a certain number, which is always a power of 2. Alignment depends on the type of data: CPUs are designed to read "aligned" memory efficiently. 
For example, if you read a 4-byte integer starting at address 0x03 (which is unaligned), the CPU has to do two memory reads — one for the first byte and another for the other three bytes — and then combine them. But if the integer starts at address 0x04 (which is aligned), the CPU can read all four bytes at once. Aligned memory is also needed for vectorized CPU operations (SIMD), where one processor instruction handles a group of values at once instead of just one. The compiler knows the size and alignment for each type, so we can use the constructor or helper functions to create a valid layout: Don't be surprised that a takes up 32 bytes. In Rust, the type can grow, so it stores a data pointer, a length, and a capacity (3 × 8 = 24 bytes). There's also 1 byte for the boolean and 7 bytes of padding (because of 8-byte alignment), making a total of 32 bytes. is the default memory allocator provided by the operating system. The exact implementation depends on the platform . It implements the trait and is used as the global allocator by default, but the documentation does not guarantee this (remember the "unspecified" note?). If you want to explicitly set as the global allocator, you can use the attribute: You can also set a custom allocator as global, like in this example: To use the global allocator directly, call the and functions: In practice, people rarely use or directly. Instead, they work with types like , or that handle allocation for them: The allocator doesn't abort if it can't allocate memory; instead, it returns (which is exactly what recommends): The documentation recommends using the function to signal out-of-memory errors. It immediately aborts the process, or panics if the binary isn't linked to the standard library. Unlike the low-level function, types like or call if allocation fails, so the program usually aborts if it runs out of memory: Allocator API • Memory allocation APIs Memory management in Zig is explicit. 
There is no default global allocator, and any function that needs to allocate memory accepts an allocator as a separate parameter. This makes the code a bit more verbose, but it matches Zig's goal of giving programmers as much control and transparency as possible. An allocator in Zig is a struct with an opaque self-pointer and a method table with four methods: Unlike Rust's allocator methods, which take a raw pointer and a size as arguments, Zig's allocator methods take a slice of bytes ( ) — a type that combines both a pointer and a length. Another interesting difference is the optional parameter, which is the first return address in the allocation call stack. Some allocators, like the , use it to keep track of which function requested memory. This helps with debugging issues related to memory allocation. Just like in Rust, allocator methods don't return errors. Instead, and return if they fail. Zig also provides type-safe wrappers that you can use instead of calling the allocator methods directly: Unlike the allocator methods, these allocation functions return an error if they fail. If a function or method allocates memory, it expects the developer to provide an allocator instance: Zig's standard library includes several built-in allocators in the namespace. asks the operating system for entire pages of memory; each allocation is a syscall: allocates memory into a fixed buffer and doesn't make any heap allocations: wraps a child allocator and allows you to allocate many times and only free once: The call frees all memory. Individual calls are no-ops. (aka ) is a safe allocator that can prevent double-free and use-after-free, and can detect leaks: is a general-purpose thread-safe allocator designed for maximum performance on multithreaded machines: is a wrapper around the libc allocator: Zig doesn't panic or abort when it can't allocate memory.
An allocation failure is just a regular error that you're expected to handle: Allocators • std.mem.Allocator • std.heap Odin supports explicit allocators, but, unlike Zig, it's not the only option. In Odin, every scope has an implicit variable that provides a default allocator: If you don't pass an allocator to a function, it uses the one currently set in the context. An allocator in Odin is a struct with an opaque self-pointer and a single function pointer: Unlike other languages, Odin's allocator uses a single procedure for all allocation tasks. The specific action — like allocating, resizing, or freeing memory — is decided by the parameter. The allocation procedure returns the allocated memory (for and operations) and an error ( on success). Odin provides low-level wrapper functions in the package that call the allocator procedure using a specific mode: There are also type-safe builtins like / (for a single object) and / (for multiple objects) that you can use instead of the low-level interface: By default, all builtins use the context allocator, but you can pass a custom allocator as an optional parameter: To use a different allocator for a specific block of code, you can reassign it in the context: Odin's provides two different allocators: When using the temp allocator, you only need a single call to clear all the allocated memory. Odin's standard library includes several allocators, found in the and packages. The procedure returns a general-purpose allocator: uses a single backing buffer for allocations, allowing you to allocate many times and only free once: detects leaks and invalid memory access, similar to in Zig: There are also others, such as or . Like Zig, Odin doesn't panic or abort when it can't allocate memory. Instead, it returns an error code as the second return value: Allocators • base:runtime • core:mem Like Zig and Odin, C3 supports explicit allocators. Like Odin, C3 provides two default allocators: heap and temp. 
An allocator in C3 is a interface with an additional option of zeroing or not zeroing the allocated memory: Unlike Zig and Odin, the and methods don't take the (old) size as a parameter — neither directly like Odin nor through a slice like Zig. This makes it a bit harder to create custom allocators because the allocator has to keep track of the size along with the allocated memory. On the other hand, this approach makes C interop easier (if you use the default C3 allocator): data allocated in C can be freed in C3 without needing to pass the size parameter from the C code. Like in Odin, allocator methods return an error if they fail. C3 provides low-level wrapper macros in the module that call allocator methods: These either return an error (the -suffix macros) or abort if they fail. There are also functions and macros with similar names in the module that use the global allocator instance: If a function or method allocates memory, it often expects the developer to provide an allocator instance: C3 provides two thread-local allocator instances: There are functions and macros in the module that use the temporary allocator: The macro releases all temporary allocations when leaving the scope: Some types, like or , use the temp allocator by default if they are not initialized: C3's standard library includes several built-in allocators, found in the module. is a wrapper around libc's malloc/free: uses a single backing buffer for allocations, allowing you to allocate many times and only free once: detects leaks and invalid memory access: There are also others, such as or . Like Zig and Odin, C3 can return an error in case of allocation failure: C3 can also abort in case of allocation failure: Since the functions and macros in the module use instead of , it looks like aborting on failure is the preferred approach. Memory Handling • core::mem::allocator • core::mem Unlike other languages, Hare doesn't support explicit allocators.
The standard library has multiple allocator implementations, but only one of them is used at runtime. Hare's compiler expects the runtime to provide and implementations: The programmer isn't supposed to access them directly (although it's possible by importing and calling or ). Instead, Hare uses them to provide higher-level allocation helpers. Hare offers two high-level allocation helpers that use the global allocator internally: and . can allocate individual objects. It takes a value, not a type: can also allocate slices if you provide a second parameter (the number of items): works correctly with both pointers to single objects (like ) and slices (like ). Hare's standard library has three built-in memory allocators: The allocator that's actually used is selected at compile time. Like other languages, Hare returns an error in case of allocation failure: You can abort on error with : Or propagate the error with : Dynamic memory allocation • malloc.ha Many C programs use the standard libc allocator, or at most, let you swap it out for another one using macros: Or using a simple setter: While this might work for switching the libc allocator to jemalloc or mimalloc, it's not very flexible. For example, trying to implement an arena allocator with this kind of API is almost impossible. Now that we've seen the modern allocator design in Zig, Odin, and C3 — let's try building something similar in C. There are a lot of small choices to make, and I'm going with what I personally prefer. I'm not saying this is the only way to design an allocator — it's just one way out of many. Our allocator should return an error instead of if it fails, so we'll need an error enum: The allocation function needs to return either a tagged union (value | error) or a tuple (value, error). Since C doesn't have these built in, let's use a custom tuple type: The next step is the allocator interface. 
I think Odin's approach of using a single function makes the implementation more complicated than it needs to be, so let's create separate methods like Zig does: This approach to interface design is explained in detail in a separate post: Interfaces in C . Zig uses byte slices ( ) instead of raw memory pointers. We could make our own byte slice type, but I don't see any real advantage to doing that in C — it would just mean more type casting. So let's keep it simple and stick with like our ancestors did. Now let's create generic and wrappers: I'm taking for granted here to keep things simple. A more robust implementation should properly check if it is available or pass the type to directly. We can even create a separate pair of helpers for collections: We could use some macro tricks to make and work for both a single object and a collection. But let's not do that — I prefer to avoid heavy-magic macros in this post. As for the custom allocators, let's start with a libc wrapper. It's not particularly interesting, since it ignores most of the parameters, but still: Usage example: Now let's use that field to implement an arena allocator backed by a fixed-size buffer: Usage example: As shown in the examples above, the allocation method returns an error if something goes wrong. While checking for errors might not be as convenient as it is in Zig or Odin, it's still pretty straightforward: Here's an informal table comparing allocation APIs in the languages we've discussed: In Zig, you always have to specify the allocator. In Odin, passing an allocator is optional. In C3, some functions require you to pass an allocator, while others just use the global one. In Hare, there's a single global allocator. As we've seen, there's nothing magical about the allocators used in modern languages. While they're definitely more ergonomic and safe than C, there's nothing stopping us from using the same techniques in plain C.

Footnotes:
- on Unix platforms; on Windows
- : alignment = 1. Can start at any address (0, 1, 2, 3...).
- : alignment = 4. Must start at addresses divisible by 4 (0, 4, 8, 12...).
- : alignment = 8. Must start at addresses divisible by 8 (0, 8, 16...).
- is for general-purpose allocations. It uses the operating system's heap allocator.
- is for short-lived allocations. It uses a scratch allocator (a kind of growing arena).
- is for general-purpose allocations. It uses the operating system's heap allocator (typically a libc wrapper).
- is for short-lived allocations. It uses an arena allocator.
- The default allocator is based on the algorithm from the "Verified sequential malloc/free" paper.
- The libc allocator uses the operating system's malloc and free functions from libc.
- The debug allocator uses a simple mmap-based method for memory allocation.

0 views
ava's blog 1 month ago

my theme for 2026

Last year, I made a post called "My theme for 2025". Inchwyrm's post about the year of the wizard reminded me I should do another one for this year. My 2025 theme was 'learning'. I think I have managed that pretty well, even if it wasn't exactly the things I mentioned in the post. I just cannot make time for The Odin Project and Rust, and to make little games; I have to prioritize my studies, my volunteer work, staying up to date on data protection law and writing about it. Maybe one day :) The rest fits though: I passed everything I enrolled in last year, and finished the certification process in just 6 months. I started summarizing and translating cases for noyb.eu, and I was creative with my notebook and some pixel art. I learned a lot and I tried new things. My theme for this year is 'rejection'. I'm collecting them! A little while ago, the concept of collecting 1,000 no's was picked up by some blogs, and it helped me view rejection, criticism and other feedback in a more positive light. I want to grow, I want to try new things, and I want to become (positively) hardened by challenge. It feels uncomfortable and a part of me doesn't want to, but in a way, I also want to be humbled in a constructive way. This year, I will send out a lot of applications for both new work and a new apartment. That will undoubtedly result in a lot of no's; the market for both is just incredibly tough right now, and there always seems to be someone better. I have already received one rejection this year just a week after I sent out the application for something I thought for sure I'd at least get an interview from, so there's that. Other ongoing things that produce rejections:
- I have submitted an idea to my workplace's idea management team and they are notorious for shooting down anything, but at least I tried.
- I'm sending out e-mails for a blog project I wanna do, and I have received no answer so far from the places/people I've messaged. I'll have to rethink my approach and then keep trying.
- Doing things that are a little embarrassing, like my post making known I am looking for work.

You can also help with something rejection-adjacent: this is your opportunity to give me constructive criticism on what has always bothered you about my blog's theme, writing, or my behavior. I want the pressure and polish to result in a version of me that is better. That's what I need right now. I have relied long enough on mostly gut feelings, learning by myself and my own assessments of myself, and always thought I had to do it all alone; but I need outside feedback now, especially from people who want to see me grow and do better. I want to know how I can improve. Reply via email. Published 23 Jan, 2026

0 views
Ginger Bill 1 month ago

The Metaprogramming Dilemma

Article was originally posted here: https://odin.handmade.network/blogs/p/1723-the_metaprogramming_dilemma

Designing this language has been difficult but fun. Two of the original goals of this language were simplicity and metaprogramming; however, these together could be an oxymoron. But before I explain why, I first need to explain what I mean by "metaprogramming". Metaprogramming is an "art" of writing programs that treat other programs as their data. This means that a program could generate, read, analyse, and transform code, or even itself...

2 views
Ginger Bill 1 month ago

On the Aesthetics of the Syntax of Declarations

Article was originally posted here: https://odin.handmade.network/blogs/p/2994-on_the_aesthetics_of_the_syntax_of_declarations

n.b. This is a philosophical article and not a technical article. There are no correct answers to the questions that I will pose -- only compromises. I'm considering what the "best" declaration syntax would be. Historically, there have been two categories: which I will call qualifier-focused and type-focused. An example of qualifier-focused would be the Pascal family. An example of type-focused would be the C family. Od...

1 view
Anton Zhiyanov 2 months ago

'Better C' playgrounds

I have a soft spot for the "better C" family of languages: C3, Hare, Odin, V, and Zig. I'm not saying these languages are actually better than C — they're just different. But I needed to come up with an umbrella term for them, and "better C" was the only thing that came to mind. I believe playgrounds and interactive documentation make programming languages easier for more people to learn. That's why I created online sandboxes for these langs. You can try them out below, embed them on your own website, or self-host and customize them. If you're already familiar with one of these languages, maybe you could even create an interactive guide for it? I'm happy to help if you want to give it a try. C3  • Hare  • Odin  • V  • Zig  • Editors An ergonomic, safe, and familiar evolution of C. ⛫  homepage • αω  tutorial • ⚘  community A systems programming language designed to be simple, stable, and robust. ⛫  homepage • αω  tutorial • ⚘  community A high-performance, data-oriented systems programming language. ⛫  homepage • αω  tutorial • ⚘  community A language with C-level performance and rapid compilation speeds. ⛫  homepage • αω  tutorial • ⚘  community A language designed for performance and explicit control with powerful metaprogramming. ⛫  homepage • αω  tutorial • ⚘  community If you want to do more than just "hello world," there are also full-size online editors . They're pretty basic, but still can be useful.

0 views
Ginger Bill 2 months ago

context—Odin's Most Misunderstood Feature

Even with the documentation on the topic, many people completely misunderstand what the system is for, and what problem it actually solves. For those not familiar with Odin, in each scope, there is an implicit value named . This context variable is local to each scope and is implicitly passed by pointer to any procedure call in that scope (if the procedure has the Odin calling convention). The main purpose of the implicit system is the ability to intercept third-party code and libraries and modify their functionality. One such case is modifying how a library allocates something or logs something. In C, this was usually achieved with the library defining macros which could be overridden so that the user could define what they wanted. However, not many libraries support this, in any language, by default, which means intercepting third-party code to see what it does and to change how it does it is generally not possible. The value has default values for its parameters, which are decided in the package runtime. These defaults are compiler-specific. To see what the implicit value contains, please see the definition of the struct in package runtime. Fundamentally, the entire point of the system is to intercept third-party code, and to change how it does things. By third-party, I just mean code not written by yourself or code that you cannot easily modify (which could even be your own past self's code). I expect most people to 100% ignore the because its existence is not for whatever preconceived reason they think it is for, be that minimizing typing/passing things around, or dynamic scoping, etc. It's just for interception of third-party code. Ironically, works because people misunderstand it, and thus generally leave it alone. That allows those who do understand it to work around less-than-ideal APIs.
I understand a lot of people may not understand why it exists when they might not currently need it, but it's fundamentally a solution to a specific problem which cannot really be solved in another way. A common misunderstanding usually arises when it is necessary to interact with third-party code and write callbacks which do not use the Odin calling convention. There is a general misunderstanding that because some procedure may not appear to use directly (or at least not obviously do so), people will say that they should be marked as or ; however, this misses the entire point. Because the default calling convention of Odin has this , you don't actually know if the code needs it or not (which is by design). The first common example of this interaction complaint usually happens when using Odin's printing procedures in . For most people, they just do to define a and then continue, but I have had a lot of people "complain" as to why that is even necessary. Arguing that other than the , why would you need the ? This complaint is due to not understanding what et al actually do. is a wrapper around which takes in a generic . Since other libraries can utilize these procedures, it might be necessary to intercept/override/track this behaviour. And from that, there is little choice but to require the . A good API offers a way to specify the allocator to use, e.g. or . An API that doesn't offer it can be worked around by overriding the for that call or series of calls, in the knowledge that the other programmer didn't hardcode the allocator they gave to , et al. There are two allocators on the and . I expect most people to never use custom allocators whatsoever (which is empirically true), but I do also want to encourage things like using the because it allows for many useful benefits, especially those that most people don't even realize are a thing.
For many people, they usually just want to do nothing with the (assuming they know about it) or set the to the and be done; that's pretty much it. You could argue that it is "better" to pass allocators 1 around explicitly, but from my own experience in C with this exact interface (made and used well before I even made Odin), I found that I got in a very, very lazy habit of not actually passing around allocators properly. This over-explicitness with a generalized interface led to more allocation bugs than if I had used specific allocators on a per-system basis. When explicit allocators are wanted, you rarely want the generic interface too, and usually a specific allocator instead, e.g. . As I have previously expressed in my Memory Allocation Strategies series, an allocator can be used to represent a specific set of lifetimes for a set of allocations—arenas being the most common kind, but other allocators such as pools, basic free lists, etc. may be useful. However, because most people will still default to a traditional / style of dynamic memory allocation, having a generic interface which can be overridden/intercepted/tracked is extremely useful, especially in third-party libraries/code. n.b. Odin's defaults to a heap-like allocator on most platforms, and defaults to a growing arena-like allocator. The field exists because you might honestly want a different way of asserting, with more information like stack traces. You might even want to use it as a mechanism to do a rudimentary sort of exception handling (similar to Go's & ). Having this overridable is extremely useful, again with third-party code. I understand it does default to something when it is not set, but that's still kind of the point. It does need to assert/panic, which means it cannot just do nothing. Logging is common throughout most applications and we wanted to provide a default approach.
I expect most people to default to this as they want a simple unified logging experience. Most people don't want their logs to be handled by different libraries in numerous different ways BY DEFAULT. But because it is on the , the default logging behaviour can now be overridden easily. If you need something more than this logger interface, then use what you want. The point, as I keep trying to reiterate, is: what is the default, and what will third-party libraries default to, so that you can then intercept it if necessary? This is the newest addition to the ; part of the reason for this being here is probably less than obvious. Sometimes a third-party library will do (pseudo-)random number generation, but controlling how it does that is very hard (if not impossible). Take C's for example. If you know the library is using , you can at least set a seed with if you want a deterministic controlled output. However, I have used libraries in C which use a different random number generator (thank goodness, because is dreadful), but I had no way of overriding it without modifying the source code directly (which is not always possible if it's a precompiled LIB/DLL). The counter is when you want a cryptographic-grade random number generator and don't want any determinism whatsoever. Having a random generator be on the allows for all of this kind of interception. n.b. Odin's default is based on ChaCha8 and is heavily optimized with SIMD. If you have used C, you've probably experienced having callbacks where there is no way to pass a custom user data pointer as part of it. The API designer has assumed that the callback is "pure". However, in reality, this is rarely the case, so how do you actually pass a callback (which is immediately used, not delayed) that you can pass user data to? The most obvious example of this is , and even in the "pure" case, it is common to sort on a key based on an external table. There are two approaches that some people use in languages without closures (e.g.
C and Odin) to get around these issues, neither of which is great: global variables and/or thread-local variables. Honestly, those are just dreadful solutions to the problem, and why the has the and fields, to allow you to intercept this poorly thought out API. Now you might be asking: Why both a pointer and an index? Why not just a pointer? From my experience of programming over many years, it is very common that I just want to pass the same value to many callbacks but access a different "element" inside of the data passed. Instead of creating a wrapper struct which has both this pointer and the index, I wanted to solve it by just having both already. It's an empirically derived solution, not anything from "first principles". I do recommend that an API should be designed to minimize the need for callbacks in the first place, but when necessary, to at least have a parameter for callbacks. For when people do not design good APIs, is there to get around the crap. This just exists for internal use within the core library, which no-one should ever use for any reason. Most of the time, it exists just for temporary things which will be improved in the future, or for passing things down the stack in a bodgy way. Again, this is not for the programmer, it's for the compiler/core library developers only. As I said in the and section, a lot of the impetus for making comes from the experience of using numerous C libraries. The GOOD C libraries that allow for a form of interception usually do it through a macro; at best, they only do and style things, and sometimes . However, those are not the norm, and are usually only written by people in a similar "sphere of influence" to myself. Sadly, the average C library doesn't allow for this. Even so, with the GOOD C libraries, this macro approach fails across a LIB/DLL boundary, which is part of the problem when interfacing with a foreign language (e.g. Odin). So even in the GOOD case for C, it's not that good in practice.
Now some library writers are REALLY GOOD and provide things like an allocation interface, but I probably know all of these library writers personally at this point, so I'd be preaching to the choir with my complaints. I've honestly had a few people effectively tell me that if it's a bad API then the user should put up with it—"Its API is bad? Oh well, tough luck". However, I've had a lot of the same people then ask "but why does the language need to solve that? Isn't it a library problem?". I'm sorry, but telling someone the API is at fault doesn't help them in the slightest, and if an API/library cannot be easily modified, then how can that be fixed in code? It's fundamentally only fixable at the language level. People rarely write things perfectly the first time—code evolves. That's what engineering is all about. Requirements change. The people change. The problem changes entirely. Expecting to never need to intercept third-party code is pie-in-the-sky thinking. As I've said numerous times before, third-party just means "stuff not written by you"; that's it. As I stress, it could even be your past self, which is not the same as your present self. Pointing out that shitty APIs exist is the entire point. Just saying "tough luck" doesn't solve anything; you're adding to the problem. This is why exists. One important aspect about the is that its memory layout is not user-modifiable, and this is another big design choice too. It allows for a consistent and well-understood ABI, which means you can—you guessed it—intercept third-party code even across LIB/DLL boundaries. If the user were allowed to add as many custom fields to the as desired, it would not be ABI-consistent, and thus not be stable for the use of its interception abilities across LIB/DLL boundaries. At best, allowing for custom fields lets you minimize passing/typing parameters to procedures. Typing is rarely—if ever—the bottleneck in programming.
Another common question I've gotten a few times is why the `context` is passed as an implicit pointer argument to a procedure, and not through something like a thread-local variable stack, the rationale being that there would then be no need for a difference in calling convention. Unfortunately, through a lot of experimentation and thought, there are a few reasons why it is implemented the way it is (listed below).

Odin's `context` also has copy-on-write semantics. This is done for two reasons: to keep things local, and to prevent back-propagation of "bad" data from a third-party library (be it malicious or just buggy). Not having an easily accessible stack of `context` values makes this back-propagation much harder.

The main inspiration for the implicit `context` system does come from Jonathan Blow's language; however, I believe the reasoning for its existence in his language is very different. As I have never used Jon's language, I am only going on what other people have told me and what I have seen from Jon's initial streams. From what I can tell, the equivalent in Jon's language behaves quite differently from Odin's, since it allows the ability to add custom fields and to back-propagate. I am not sure what Jon's initial rationale was for his form of it, but I do not believe he was thinking of third-party code interception when he designed/implemented it. I hypothesize it was something closer to a form of "static dynamic-scoping", but not exactly (I know that's an oxymoron of a statement). All I know is that when I saw it, I saw a brilliant solution to the third-party code interception problem.

I hope this clarifies a lot of the design rationale behind the implicit `context` system, and why it exists. If you have any more questions, or want me to clarify something further, please feel free to contact me! If you want to disallow such defaults in Odin on a per-file basis, you can do so with a file tag.
↩︎

- Easier to manage across LIB/DLL boundaries than trying to use a single thread-local stack
- Easier management of recovery from crashes, where the `context` might be hard to figure out
- Using the existing stack makes stack management easier already; you don't need a separate allocator for that stack
- Some platforms do not support thread-local variables (e.g. freestanding targets)
- Works better with async/fiber-based things, which would otherwise require a fiber-local stack instead of a thread-local one
- Prevents back-propagation, which would be trivial with a global/thread-local stack
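The copy-on-write semantics described above can be sketched in C. This is a minimal model of the idea, not Odin's actual implementation; `Context`, `third_party`, and `leaf` are illustrative names:

```c
#include <stddef.h>

/* A cut-down stand-in for Odin's context record (illustrative). */
typedef struct Context {
    void  *user_ptr;
    size_t user_index;
} Context;

static size_t observed_index = 0;

static void leaf(const Context *ctx) {
    observed_index = ctx->user_index;  /* sees whatever its caller passed */
}

static void third_party(const Context *ctx) {
    Context local = *ctx;   /* copy-on-write: modify a local copy only */
    local.user_index = 42;
    leaf(&local);           /* changes flow DOWN the call stack... */
}                           /* ...but never back up to the caller */
```

Because each modification lives in a local copy, a buggy or malicious third-party callee cannot back-propagate "bad" data into its caller's `context`.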

0 views
Ginger Bill 7 months ago

If Odin Had Macros

I sometimes get asked if Odin has any plans to add hygienic macros or some other similar construct. My general, and now (in)famous, answer to many such questions is: No . I am not against macros nor metaprogramming in general, and in fact I make metaprograms quite often (i.e. programs that make/analyse a program). However, my approach to the design of Odin has been extremely pragmatic . I commonly ask people who request such things what they are specifically trying to solve. Usually they are not trying to do anything whatsoever and are just thinking about non-existent hypotheticals. However, in the cases where people are trying to do something that they believe they need a macro for, Odin usually has a different approach to doing it, and it is usually much better and more suited to the specific need of the problem. This is the surprising thing about designing Odin: I was expecting to need some form of compile-time metaprogramming at the language level, be it hygienic macros, compile-time execution, or even complete compile-time AST modification. But for every problem I came across, I found a better solution using a different language construct or idiom. My original hypothesis has, in general, been shown to be wrong.

n.b. This is all hypothetical, and the construct is very unlikely to happen in Odin

One of the best cases for a macro-like construct that I can think of, which Odin does not support, would be push-based iterators. Go has had a way of doing push-based iterators since version 1.23. I've written on this topic before ( Why People are Angry over Go 1.23 Iterators ) and how they work. The main issue I have with them, which many other individuals complain about too, is that they effectively rely on closures nested three levels deep. You have a function that returns a closure, which in turn takes in a closure that is automatically and implicitly generated from the body of a loop.
This is honestly not going to be the best construct for performance by any stretch, because it relies so heavily on closures. Since Odin has no concept of a closure, being a language with manual memory management 1 , it would not be possible to take the same approach that Go does. However, there is a way of achieving this in a similar fashion which requires no closures, and which would produce very good code due to its imperative nature. Using the example from the previous Go article I wrote, consider the following pseudo-syntax: The internally expanded code in the backend would look something similar to this: This is of course a restricted form of a hygienic macro, only applicable to iterators, and it is the only way to achieve such a construct in the language. Yet the way you'd write the macro is still extremely imperative in nature. The push-iterator approach allows you to store state/data in the control flow itself, without the explicit state which a pull-iterator would require. A more common example would be iteration over a custom hash map: This approach to iterators is very imperative and not very "composable" in the functional sense. You cannot chain multiple iterators together using this approach. I personally don't have much need for composing iterators in practice; I usually just want the ability to iterate across a custom data structure, and that's it. I honestly don't think the composability of iterators is an actual need most programmers have, but rather something that "seems cool" 2 to use. I can't think of a case when I've actually wanted to use reusable composable iterators either, and when I've had something near to that, I've just written the code in-line since it was always a one-off. It's unlikely that I would ever add a construct like this to Odin. Not because I think it isn't useful (it obviously is), but rather because it is a slippery-slope construct.
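The closure-free push-iterator idea can be sketched in C: the loop body becomes an ordinary function, and per-iteration state travels through a user pointer, much like the pointer-plus-index scheme discussed in the `context` article above. All names here (`iterate_evens`, `yield_fn`, `sum_body`) are hypothetical, for illustration only:

```c
#include <stdbool.h>

/* The "loop body" is an ordinary function pointer: no closures needed. */
typedef bool (*yield_fn)(int value, void *user_ptr);

/* A push-based iterator: it drives the loop and pushes each value
   into the body, which can stop iteration by returning false. */
static void iterate_evens(int limit, yield_fn yield, void *user_ptr) {
    for (int i = 0; i < limit; i += 2)
        if (!yield(i, user_ptr))  /* returning false acts like `break` */
            return;
}

/* A body that sums the pushed values; its state lives behind user_ptr. */
static bool sum_body(int value, void *user_ptr) {
    *(int *)user_ptr += value;
    return true;  /* keep iterating */
}
```

A compiler expanding an iterator macro could inline this shape directly, which is why the approach generates good imperative code without any heap-allocated closure state.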
A lot of features and constructs that have been proposed to me over the years fall into this category. A slippery slope is rarely a fallacy, in my opinion 3 , when it comes to design: if you allow something at one point, there isn't much justification to stop it later on. Giving in to one whim does lead to giving in to another. In this case, I'd argue the slippery slope is the desire for more hygienic macros throughout the language, not just to modify pre-existing control flow but to add new control flow, or other constructs, to the language. This is why I have to say no . To keep Odin the language that is loved by so many people, I have to hold the reins and steer it with a clear vision in the right direction. If I allowed anyone's and everyone's desires to slip into the language, it would become worse than a design-by-committee language such as C++. The road to hell is not paved with good intentions but rather with the lack of intention. So my current answer, at the time of writing, to this construct is: No . I'd argue that actual closures, unified everywhere as a single procedure type with non-capturing procedure values, require some form of automatic memory management. That does not necessarily mean garbage collection nor ARC; it could be something akin to RAII. But it is all still automatic, and against the philosophy of Odin.  ↩︎ Remember, you're a bunch of programmers; you're not cool.  ↩︎ I'd easily argue that if it actually is a slippery slope, it cannot be a fallacy.  ↩︎

0 views
Ginger Bill 10 months ago

Unstructured Thoughts on the Problems of OSS/FOSS

Originally from replies to a Twitter thread: https://x.com/TheGingerBill/status/1914389352416993395 This is not a structured argument against FOSS/OSS, but my uncommon thoughts on the topic. I am not sure if I agree [that FOSS/OSS derives from the same thinking process as the ideology of communism], but I understand the sentiment. The fundamental issue is that software is trivially copyable. I have loads of issues with FOSS and OSS 1 . And part of this "ideology" (as presented in the original post) is naïvety coupled with only first-order thinking and a poor understanding of ownership . Software isn't property in the normal sense of that word, so trying to apply the same rules to it is why it starts to break down; coupled with the first-order thinking that many FOSS advocates engage in, it leads to unsustainable practices and single sources of dependency failure. However, the simplest argument (which is technically wrong) is what I already said: it's trivially copyable, and thus not really "property" in any traditional sense of that word. Software at the end of the day is a very complicated and complex implementation of an algorithm. It is neither physical property nor intellectual property , the latter of which is closer to an honour system than a property-rights system, even if it has the term "property" in its name. So in a weird sense, it fits into a different category (or maybe a subcategory), because it's not the algorithm/idea itself which is "precious" but rather the implementation of it: the software itself. The foundational assumptions FOSS/OSS makes about the "right" to "redistribute" blend a few different aspects together. The licence itself is an honour system applied to the "software". But the question is: is that even applicable in the first place? There are a lot of oddities when it comes to copyright and trademark law, which exist mostly due to practicality rather than principle.
A good example is that recipes 2 cannot be copyrighted, though from a principled standpoint there isn't any reason why not. Recipes have been passed around all over the place by numerous people over the years, so their origins are hard to trace, and even harder to enforce. This is why many industries have "trade secrets": to protect their place in the industry. Letting people know your "secrets" means that they are "secrets" no more. Even legally, (secret) recipes are classed the same as "trade secrets". You could argue that letting people have more knowledge is a "net benefit for society", but that is the first-order thinking I am talking about. The assumption that "the truth will set you free" carries the further assumption that everyone should know it in the first place. I am not making a pseudo- gnostic argument, but rather that some secrets are best kept, well, secret. It also makes sense from a business perspective not to let your competitors know how you do things if you want some sort of technical advantage. But this is still first-order thinking. To go second-order (which is still not that deep): it also means that people tend to rely on those "ideas" rather than evolving and generating more. It means that people just rely on the free things rather than trying to come up with other approaches. To clarify further what I mean by first-order thinking: it's thinking about the immediate results rather than the long-term, more complex and complicated results, which require thinking in higher orders. A good analogy would be a Taylor series . In this analogy, first-order is a linear approximation, whilst second-order would be quadratic, etc. As you add more terms, you get a more accurate approximation of the actual function, but this fitting approach still has numerous limitations and flaws. And in the case of thinking, more and more orders might not be easy (or even possible) to think about.
Ideas have virtually no cost to them, or even negative cost, and as such, when something is free, people cannot rationally perform a cost-benefit analysis of such ideas. It assumes that a "marketplace of ideas" is possible, when a market requires a price mechanism to work. A rational marketplace requires a way to rationally calculate costs from prices (as costs are determined by prices, not the other way around). There is a reason the XKCD 2347 meme exists. People will rely on something just because it is free and "forget" (I am using that term loosely) that everything has a cost to it. And I do run an Open Source (not FOSS) project: the Odin Programming Language . If there were a possibility of actually selling a compiler nowadays, I would, but because the expected price of a compiler for a new language is now free, that is nigh impossible. You have to either rely on charity or on companies that rely on the product to pay for support. I am grateful for the amount of bug reports, Pull Requests, and general usage of Odin. It is extremely surreal that I work with a company that uses my language for all of their products , and I get paid to do it. Some of my time is spent working on the Odin compiler and Odin core library, but a lot of it is actually just working on the products themselves. And that's what I made Odin for in the first place: a language I could actually program in to make things; a means to an end, not an end in itself. There does seem to be a common feeling of guilt among programmers that they should give their knowledge to the world freely. But why are they feeling guilty? Is that a correctly placed emotion? Is it even valid? And why should you give your knowledge and wisdom away for free? Just because you got it for free? I could also add, though I am not going to make this argument in general, that FOSS specifically "forces charity" on others, and that the act itself is not virtuous but vicious.
This is why I assume the original poster is saying it is similar to the "ideology of communism", if I am guessing: he is viewing it as the "forced charity" aspect of the "ideology". It is also a very specific conception of charity : the view that charity is free-as-in-beer rather than a virtue of the friendship of man (or, in the traditional theological conception, "the friendship of man for God"). A lot of charity is "free", but a lot is not. You can still be beneficial to others whilst making a profit. There is nothing intrinsically wrong with making a profit in itself 3 . I'd trivially argue that people who release their source code when you pay for it, when they don't have to, are still being charitable. Yes, it does have a monetary cost, but that does not mean it isn't a form of charity. But OSS/FOSS as a practice encourages, though does not force, people to work for free and to give their services away for free. To be clear, my position on this topic is far from a common one, and I don't think it's a " strawman " by any stretch of the imagination. There are many "definitions" of what OSS and FOSS are, but I'd argue most are idealistic forms which, on the whole (except in certain instances), do not bring forth the ideals that they set out. To use a common cybernetics phrase ( POSIWID ): The purpose of a system is what it does , and there is no point claiming that the purpose of a system is to do what it constantly fails to do. "Of course" you can hypothetically make money from OSS/FOSS, but that doesn't mean it is possible, sustainable, or even desirable. And I know this from first-hand experience. I am always grateful for all of the donations received for the Odin programming language through people's acts of charity, and all of that money does go towards the development of Odin. However, it's not a sustainable nor maintainable model, and I'd argue it has never been, nor could ever be.
And for every purported exception to this rule, I'd argue that none of them succeed because of the OSS/FOSS model, but purely because of the individual programmer/developer. The OSS/FOSS dream is just that: a dream that cannot live up to its "ideals". For every hypothetical benefit it has, it should be stated that it is a hypothesis and not a theory; theories require evidence to be classed as such. Most of the benefits and freedoms of OSS/FOSS are double-edged swords which are also huge attack vectors: vectors for security, sustainability, maintainability, and redistribution. Most of the industry is based on blind faith, without any way to verify that blind trust. Regardless of whether I am correctly or incorrectly "defining" OSS/FOSS relative to your preferred "definition", the multi-order effects are rarely considered. And to bastardize Lysander Spooner : this much is certain—that it has either authorized such a tech oligarchy as we have had, or has been powerless to prevent it. In either case, it is unfit to exist 4 . All of these lists of ideals of essential freedoms (and I'd argue they are not principles in the slightest) have not aided anything in the long run. To use GNU's list as an example: as per my previous statement, none of these are principles . "Freedom 0", the foundation for the rest, isn't even foundational; it pretty much relies on a time preference between using pre-made software and home-made software. Software could be a service, but it's also, again, an implementation of an algorithm/idea. Of course, I know these "ideals" only apply to some FOSS advocates, and not necessarily any OSS advocates, but it's still the general perception. To conclude this very unstructured brain-dump: I can understand the original sentiment that a lot of the mentality of advocating for OSS/FOSS comes from a standpoint similar to the "ideology of communism", but I do not conceptualize it that way.
I don't think OSS or FOSS has been a good thing for software, and it has probably been a huge accelerant of why software has been getting worse over the years. I am making a distinction between OSS (Open Source Software) and FOSS (Free [and] Open Source Software) in this article, but most of my critiques relate to both.  ↩︎ I'd argue patents are fundamentally of the same ilk as "recipes", and software patents even more so. I personally don't like patents that much except in a few cases, but I really do dislike software patents with a passion. However, that's a different topic for a different day.  ↩︎ Unless you are of course a communist, who views that it is, but that's a different discussion for a different day.  ↩︎ The original quote: "But whether the Constitution really be one thing, or another, this much is certain—that it has either authorized such a government as we have had, or has been powerless to prevent it. In either case it is unfit to exist." — Lysander Spooner, No Treason: The Constitution of No Authority.  ↩︎ Assuming you can even run it in the first place. This statement is also kind of vague, but I won't go into it too much. This is actually two "freedoms" combined together: the first is access to source code, and the second is that "secrets" should not exist (i.e. secret sauce/source). And why should that be free-as-in-beer in practice ? I understand the "ideal" here is not suggesting it ought to be free-as-in-beer, but that is the end result. And I'd argue the vast majority of FOSS advocates would say paying for source (i.e. Source Available) is not Open Source . Why?! Do we allow this for other forms of intellectual property? If software is intellectual property, why is it different? I know I've made the argument that it is in a separate category, but if it is actually a form of intellectual property, then our legal systems do not, and should not, allow for this.
This "freedom" is probably the most egregious with respect to the "ideology of communism" proclamation. The viral nature of licences like the GPL is a fundamentally pernicious aspect of the FOSS (not necessarily OSS) movement. It's this idea of "forcing" charity on others. Of course you are "free" not to use the software, but if there are virtually no other options, you either have to write the thing yourself, or accept the potential of having virtually no business model. This point is also an extension of (freedom 2) and, as such, does not help the case.

0 views
//pauls dev blog 11 months ago

5+2 Phenomenal FREE Platforms To Learn Coding

Some time ago, I was asked by new colleagues how they could learn to code without following the "old path" of attending a university or doing an apprenticeship. If you are also new to the world of coding, it makes sense to start by teaching yourself using one of the many free resources found online. By taking advantage of these free resources, you can learn what you want and not waste money upfront. Once you've gone through enough free coding platforms, you will know what you want and can specialize in that field of coding. One of the best websites to improve your coding is freeCodeCamp. It contains numerous exercises based on different topics and languages for practice. This website also provides a means to get certified based on your skills by taking their test. I would highly suggest having a look at  their website here  and starting to learn. Exercism is a platform that exists to help as many people as possible attain fluency in  ANY  programming language. It provides many concept exercises to improve your coding skills. The best thing about this website is that it publishes all information and all tutorials  for free . You are also able to keep track of your progress. Additionally, you can opt in as a mentor and share your knowledge with other people. Exercism contains tutorials/exercises for 55 different languages:  Python ,  JavaScript ,  TypeScript ,  Go , and  many more . Here is a link to the Exercism website . The Odin Project is a platform where everyone can learn coding in  Ruby on Rails  and  JavaScript.  The platform is designed for people who could not attend an intensive coding school or did not have access to a good computer science education. The Odin Project follows these beliefs: I personally think that this project is mandatory for every person who wants to learn either of the two programming languages.
Here is a link to the website Codecademy was started with the goal of giving anyone in the world the ability to learn the skills they need to succeed in the 21st century. To achieve this goal, Codecademy provides ~15 courses in different programming languages. Many of them are free, but some are only included within the Pro version, which costs $18 a month. At the moment, many free courses are available within the catalog. You can  start a quiz here  to find out which course is best suited for you. To be clear, this isn't a coding platform itself, but it's a great resource for community-curated programming courses. You can simply search for the programming language you want to learn, and you'll get a list of the best courses, tutorials, and books that are recommended by coders and available online. Here is a link to the website WarriorJS is a learning platform that teaches you JavaScript while you are playing a game. This game is designed for new or advanced JavaScript programmers and will put your skills to the test! Here is a link to the website Elevator Saga is a game where you have to use JavaScript to transport people with an elevator in an efficient manner. While progressing through the different stages, you have to complete increasingly difficult challenges. Only efficient programs will be able to complete all of them. Here is a link to the website In this article, I showed five coding platforms and two coding games that you can use to start coding. Taking advantage of any of the free coding resources out there is definitely the way to go when you are just starting.
If you want a gamification approach to learning to code, you should check out this article: Also, if you want a full path to follow, you can follow my guide to becoming a Full Stack developer: If JavaScript is not your thing and you maybe want to learn Python, I have gathered some resources (videos, books, and websites) and listed them in this article: I hope you find any of these coding platforms helpful and find a suitable platform to start your learning. I would love to hear your ideas and thoughts. If you can provide another free coding platform, don't hesitate to comment here. Also, if you have any questions, please jot them down below. I will try to answer them if possible. Feel free to connect with me on  Medium ,  LinkedIn ,  Twitter ,  BlueSky , and  GitHub .

The Odin Project's beliefs mentioned above:

- Education should be free and accessible
- You learn best by actually building
- Motivation is fueled by working with others
- Open source is best

0 views
Ahead of AI 1 year ago

LLM Research Papers: The 2024 List

It’s been a very eventful and exciting year in AI research. This is especially true if you are interested in LLMs. I had big plans for this December edition and was planning to publish a new article with a discussion of all my research highlights from 2024. I still plan to do so, but due to an accident and serious injury, I am currently unable to work at a computer and finish the draft. But I hope to recover in the upcoming weeks and be back on my feet soon. In the meantime, I want to share my running bookmark list of many fascinating (mostly LLM-related) papers I stumbled upon in 2024. It’s just a list, but maybe it will come in handy for those who are interested in finding some gems to read for the holidays. And if you are interested in more code-heavy reading and tinkering, My Build A Large Language Model (From Scratch) book is out on Amazon as of last month. In addition, I added a lot of bonus materials to the GitHub repository . Bonus materials in the GitHub repository (stars highlight my personal favorites) Thanks for your understanding and support, and I hope to make a full recovery soon and be back with the Research Highlights 2024 article in a few weeks! Ahead of AI is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber. 
1 Jan, Astraios: Parameter-Efficient Instruction Tuning Code Large Language Models, https://arxiv.org/abs/2401.00788
2 Jan, A Comprehensive Study of Knowledge Editing for Large Language Models, https://arxiv.org/abs/2401.01286
2 Jan, LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning, https://arxiv.org/abs/2401.01325
2 Jan, Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models, https://arxiv.org/abs/2401.01335
2 Jan, LLaMA Beyond English: An Empirical Study on Language Capability Transfer, https://arxiv.org/abs/2401.01055
3 Jan, A Mechanistic Understanding of Alignment Algorithms: A Case Study on DPO and Toxicity, https://arxiv.org/abs/2401.01967
4 Jan, LLaMA Pro: Progressive LLaMA with Block Expansion, https://arxiv.org/abs/2401.02415
4 Jan, LLM Augmented LLMs: Expanding Capabilities through Composition, https://arxiv.org/abs/2401.02412
4 Jan, Blending Is All You Need: Cheaper, Better Alternative to Trillion-Parameters LLM, https://arxiv.org/abs/2401.02994
5 Jan, DeepSeek LLM: Scaling Open-Source Language Models with Longtermism, https://arxiv.org/abs/2401.02954
5 Jan, Denoising Vision Transformers, https://arxiv.org/abs/2401.02957
7 Jan, Soaring from 4K to 400K: Extending LLM's Context with Activation Beacon, https://arxiv.org/abs/2401.03462
8 Jan, Mixtral of Experts, https://arxiv.org/abs/2401.04088
8 Jan, MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts, https://arxiv.org/abs/2401.04081
8 Jan, A Minimaximalist Approach to Reinforcement Learning from Human Feedback, https://arxiv.org/abs/2401.04056
9 Jan, RoSA: Accurate Parameter-Efficient Fine-Tuning via Robust Adaptation, https://arxiv.org/abs/2401.04679
10 Jan, Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training, https://arxiv.org/abs/2401.05566
11 Jan, Transformers are Multi-State RNNs, https://arxiv.org/abs/2401.06104
11 Jan, A Closer Look at AUROC and AUPRC under Class Imbalance, https://arxiv.org/abs/2401.06091
12 Jan, An Experimental Design Framework for Label-Efficient Supervised Finetuning of Large Language Models, https://arxiv.org/abs/2401.06692
16 Jan, Tuning Language Models by Proxy, https://arxiv.org/abs/2401.08565
16 Jan, Scalable Pre-training of Large Autoregressive Image Models, https://arxiv.org/abs/2401.08541
16 Jan, Code Generation with AlphaCodium: From Prompt Engineering to Flow Engineering, https://arxiv.org/abs/2401.08500
16 Jan, RAG vs Fine-tuning: Pipelines, Tradeoffs, and a Case Study on Agriculture, https://arxiv.org/abs/2401.08406
17 Jan, ReFT: Reasoning with Reinforced Fine-Tuning, https://arxiv.org/abs/2401.08967
18 Jan, DiffusionGPT: LLM-Driven Text-to-Image Generation System, https://arxiv.org/abs/2401.10061
18 Jan, Self-Rewarding Language Models, https://arxiv.org/abs/2401.10020
18 Jan, VMamba: Visual State Space Model, https://arxiv.org/abs/2401.10166
19 Jan, Knowledge Fusion of Large Language Models, https://arxiv.org/abs/2401.10491
22 Jan, SpatialVLM: Endowing Vision-Language Models with Spatial Reasoning Capabilities, https://arxiv.org/abs/2401.12168
22 Jan, WARM: On the Benefits of Weight Averaged Reward Models, https://arxiv.org/abs/2401.12187
22 Jan, Spotting LLMs With Binoculars: Zero-Shot Detection of Machine-Generated Text, https://arxiv.org/abs/2401.12070
24 Jan, MambaByte: Token-free Selective State Space Model, https://arxiv.org/abs/2401.13660
24 Jan, SpacTor-T5: Pre-training T5 Models with Span Corruption and Replaced Token Detection, https://arxiv.org/abs/2401.13160
25 Jan, Rethinking Patch Dependence for Masked Autoencoders, https://arxiv.org/abs/2401.14391
25 Jan, Pix2gestalt: Amodal Segmentation by Synthesizing Wholes, https://arxiv.org/abs/2401.14398
25 Jan, Multimodal Pathway: Improve Transformers with Irrelevant Data from Other Modalities, https://arxiv.org/abs/2401.14405
26 Jan, EAGLE: Speculative Sampling Requires Rethinking Feature Uncertainty, https://arxiv.org/abs/2401.15077
29 Jan, MoE-LLaVA: Mixture of Experts for Large Vision-Language Models, https://arxiv.org/abs/2401.15947
29 Jan, Rephrasing the Web: A Recipe for Compute and Data-Efficient Language Modeling, https://arxiv.org/abs/2401.16380
31 Jan, KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization, https://arxiv.org/abs/2401.18079
1 Feb, Efficient Exploration for LLMs, https://arxiv.org/abs/2402.00396
1 Feb, OLMo: Accelerating the Science of Language Models, https://arxiv.org/abs/2402.00838
1 Feb, Tiny Titans: Can Smaller Large Language Models Punch Above Their Weight in the Real World for Meeting Summarization?, https://arxiv.org/abs/2402.00841
1 Feb, Repeat After Me: Transformers are Better than State Space Models at Copying, https://arxiv.org/abs/2402.01032
2 Feb, LiPO: Listwise Preference Optimization through Learning-to-Rank, https://arxiv.org/abs/2402.01878
2 Feb, FindingEmo: An Image Dataset for Emotion Recognition in the Wild, https://arxiv.org/abs/2402.01355
3 Feb, More Agents Is All You Need, https://arxiv.org/abs/2402.05120
5 Feb, DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models, https://arxiv.org/abs/2402.03300
6 Feb, MobileVLM V2: Faster and Stronger Baseline for Vision Language Model, https://arxiv.org/abs/2402.03766
6 Feb, A Phase Transition Between Positional and Semantic Learning in a Solvable Model of Dot-Product Attention, https://arxiv.org/abs/2402.03902
6 Feb, Scaling Laws for Downstream Task Performance of Large Language Models, https://arxiv.org/abs/2402.04177
6 Feb, MOMENT: A Family of Open Time-series Foundation Models, https://arxiv.org/abs/2402.03885
6 Feb, Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models, https://arxiv.org/abs/2402.03749
6 Feb, Self-Discover: Large Language Models Self-Compose Reasoning Structures, https://arxiv.org/abs/2402.03620
7 Feb, Grandmaster-Level Chess Without Search, https://arxiv.org/abs/2402.04494
7 Feb, Direct Language Model Alignment from Online AI Feedback, https://arxiv.org/abs/2402.04792
8 Feb, Buffer Overflow in Mixture of Experts, https://arxiv.org/abs/2402.05526
9 Feb, The Boundary of Neural Network Trainability is Fractal, https://arxiv.org/abs/2402.06184
11 Feb, ODIN: Disentangled Reward Mitigates Hacking in RLHF, https://arxiv.org/abs/2402.07319
12 Feb, Policy Improvement using Language Feedback Models, https://arxiv.org/abs/2402.07876
12 Feb, Scaling Laws for Fine-Grained Mixture of Experts, https://arxiv.org/abs/2402.07871
12 Feb, Aya Model: An Instruction Finetuned Open-Access Multilingual Language Model, https://arxiv.org/abs/2402.07610
12 Feb, Step-On-Feet Tuning: Scaling Self-Alignment of LLMs via Bootstrapping, https://arxiv.org/abs/2402.07610
12 Feb, Suppressing Pink Elephants with Direct Principle Feedback, https://arxiv.org/abs/2402.07896
13 Feb, World Model on Million-Length Video And Language With RingAttention, https://arxiv.org/abs/2402.08268
13 Feb, Mixtures of Experts Unlock Parameter Scaling for Deep RL, https://arxiv.org/abs/2402.08609
14 Feb, DoRA: Weight-Decomposed Low-Rank Adaptation, https://arxiv.org/abs/2402.09353
14 Feb, Transformers Can Achieve Length Generalization But Not Robustly, https://arxiv.org/abs/2402.09371
15 Feb, BASE TTS: Lessons From Building a Billion-Parameter Text-to-Speech Model on 100K Hours of Data, https://arxiv.org/abs/2402.08093
15 Feb, Recovering the Pre-Fine-Tuning Weights of Generative Models, https://arxiv.org/abs/2402.10208
15 Feb, Generative Representational Instruction Tuning, https://arxiv.org/abs/2402.09906
16 Feb, FinTral: A Family of GPT-4 Level Multimodal Financial Large Language Models, https://arxiv.org/abs/2402.10986
17 Feb, OneBit: Towards Extremely Low-bit Large Language Models, https://arxiv.org/abs/2402.11295
18 Feb, LongAgent: Scaling Language Models to 128k Context through Multi-Agent Collaboration, https://arxiv.org/abs/2402.11550
19 Feb, Reformatted Alignment, https://arxiv.org/abs/2402.12219
19 Feb, AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling, https://arxiv.org/abs/2402.12226
19 Feb, Towards Cross-Tokenizer Distillation: the Universal Logit Distillation Loss for LLMs, https://arxiv.org/abs/2402.12030
19 Feb, LoRA+: Efficient Low Rank Adaptation of Large Models, https://arxiv.org/abs/2402.12354
20 Feb, Neural Network Diffusion, https://arxiv.org/abs/2402.13144
21 Feb, YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information, https://arxiv.org/abs/2402.13616
21 Feb, LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens, https://arxiv.org/abs/2402.13753
21 Feb, Large Language Models for Data Annotation: A Survey, https://arxiv.org/abs/2402.13446
22 Feb, TinyLLaVA: A Framework of Small-scale Large Multimodal Models, https://arxiv.org/abs/2402.14289
22 Feb, Back to Basics: Revisiting REINFORCE Style Optimization for Learning from Human Feedback in LLMs, https://arxiv.org/abs/2402.14740
23 Feb, Genie: Generative Interactive Environments, https://arxiv.org/abs/2402.15391
26 Feb, CARTE: Pretraining and Transfer for Tabular Learning, https://arxiv.org/abs/2402.16785
27 Feb, The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits, https://arxiv.org/abs/2402.17764
27 Feb, Sora Generates Videos with Stunning Geometrical Consistency, https://arxiv.org/abs/2402.17403
27 Feb, When Scaling Meets LLM Finetuning: The Effect of Data, Model and Finetuning Method, https://arxiv.org/abs/2402.17193
29 Feb, Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models, https://arxiv.org/abs/2402.19427
1 Mar, Learning and Leveraging World Models in Visual Representation Learning, https://arxiv.org/abs/2403.00504
3 Mar, Improving LLM Code Generation with Grammar Augmentation, https://arxiv.org/abs/2403.01632
3 Mar, The Hidden Attention of Mamba Models,
https://arxiv.org/abs/2403.01590 4 Mar, Training-Free Pretrained Model Merging , https://arxiv.org/abs/2403.01753 4 Mar, Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures , https://arxiv.org/abs/2403.02308 5 Mar, The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning , https://arxiv.org/abs/2403.03218 5 Mar, Evolution Transformer: In-Context Evolutionary Optimization , https://arxiv.org/abs/2403.02985 5 Mar, Enhancing Vision-Language Pre-training with Rich Supervisions , https://arxiv.org/abs/2403.03346 5 Mar, Scaling Rectified Flow Transformers for High-Resolution Image Synthesis , https://arxiv.org/abs/2403.03206 5 Mar, Design2Code: How Far Are We From Automating Front-End Engineering? , https://arxiv.org/abs/2403.03163 6 Mar, ShortGPT: Layers in Large Language Models are More Redundant Than You Expect , https://arxiv.org/abs/2403.03853 6 Mar, Backtracing: Retrieving the Cause of the Query , https://arxiv.org/abs/2403.03956 6 Mar, Learning to Decode Collaboratively with Multiple Language Models , https://arxiv.org/abs/2403.03870 6 Mar, SaulLM-7B: A pioneering Large Language Model for Law , https://arxiv.org/abs/2403.03883 6 Mar, Are Language Models Puzzle Prodigies? Algorithmic Puzzles Unveil Serious Challenges in Multimodal Reasoning , https://arxiv.org/abs/2403.03864 6 Mar, 3D Diffusion Policy , https://arxiv.org/abs/2403.03954 6 Mar, MedMamba: Vision Mamba for Medical Image Classification , https://arxiv.org/abs/2403.03849 6 Mar, GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection , https://arxiv.org/abs/2403.03507 6 Mar, Stop Regressing: Training Value Functions via Classification for Scalable Deep RL , https://arxiv.org/abs/2403.03950 7 Mar, How Far Are We from Intelligent Visual Deductive Reasoning? 
, https://arxiv.org/abs/2403.04732 7 Mar, Common 7B Language Models Already Possess Strong Math Capabilities , https://arxiv.org/abs/2403.04706 8 Mar, Gemini 1.5: Unlocking Multimodal Understanding Across Millions of Tokens of Context , https://arxiv.org/abs/2403.05530 8 Mar, Is Cosine-Similarity of Embeddings Really About Similarity? , https://arxiv.org/abs/2403.05440 8 Mar, LLM4Decompile: Decompiling Binary Code with Large Language Models , https://arxiv.org/abs/2403.05286 9 Mar, Algorithmic Progress in Language Models , https://arxiv.org/abs/2403.05812 11 Mar, Stealing Part of a Production Language Model , https://arxiv.org/abs/2403.06634 12 Mar, Chronos: Learning the Language of Time Series , https://arxiv.org/abs/2403.07815 13 Mar, Simple and Scalable Strategies to Continually Pre-train Large Language Models , https://arxiv.org/abs/2403.08763 13 Mar, Language Models Scale Reliably With Over-Training and on Downstream Tasks , https://arxiv.org/abs/2403.08540 14 Mar, BurstAttention: An Efficient Distributed Attention Framework for Extremely Long Sequences , https://arxiv.org/abs/2403.09347 14 Mar, LocalMamba: Visual State Space Model with Windowed Selective Scan , https://arxiv.org/abs/2403.09338 14 Mar, GiT: Towards Generalist Vision Transformer through Universal Language Interface , https://arxiv.org/abs/2403.09394 14 Mar, MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training , https://arxiv.org/abs/2403.09611 15 Mar, RAFT: Adapting Language Model to Domain Specific RAG , https://arxiv.org/abs/2403.10131 18 Mar, TnT-LLM: Text Mining at Scale with Large Language Models , https://arxiv.org/abs/2403.12173 18 Mar, Decoding Compressed Trust: Scrutinizing the Trustworthiness of Efficient LLMs Under Compression , https://arxiv.org/abs/2403.15447 19 Mar, PERL: Parameter Efficient Reinforcement Learning from Human Feedback , https://arxiv.org/abs/2403.10704 20 Mar, RewardBench: Evaluating Reward Models for Language Modeling , 
https://arxiv.org/abs/2403.13787 20 Mar, LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models , https://arxiv.org/abs/2403.13372 21 Mar, RakutenAI-7B: Extending Large Language Models for Japanese , https://arxiv.org/abs/2403.15484 22 Mar, SiMBA: Simplified Mamba-Based Architecture for Vision and Multivariate Time Series , https://arxiv.org/abs/2403.15360 22 Mar, Can Large Language Models Explore In-Context? , https://arxiv.org/abs/2403.15371 22 Mar, LLM2LLM: Boosting LLMs with Novel Iterative Data Enhancement , https://arxiv.org/abs/2403.15042 25 Mar, LLM Agent Operating System , https://arxiv.org/abs/2403.16971 26 Mar, The Unreasonable Ineffectiveness of the Deeper Layers , https://arxiv.org/abs/2403.17887 27 Mar, BioMedLM: A 2.7B Parameter Language Model Trained On Biomedical Text , https://arxiv.org/abs/2403.18421 27 Mar, ViTAR: Vision Transformer with Any Resolution , https://arxiv.org/abs/2403.18361 27 Mar, Long-form Factuality in Large Language Models , https://arxiv.org/abs/2403.18802 27 Mar, Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models , https://arxiv.org/abs/2403.18814 26 Mar, LISA: Layerwise Importance Sampling for Memory-Efficient Large Language Model Fine-Tuning , https://arxiv.org/abs/2403.17919 26 Mar, Mechanistic Design and Scaling of Hybrid Architectures , https://arxiv.org/abs/2403.17844 28 Mar, MagicLens: Self-Supervised Image Retrieval with Open-Ended Instructions , https://arxiv.org/abs/2403.19651 28 Mar, Model Stock: All We Need Is Just a Few Fine-Tuned Models , https://arxiv.org/abs/2403.19522 1 Apr, Do Language Models Plan Ahead for Future Tokens? 
, https://arxiv.org/abs/2404.00859 1 Apr, Bigger is not Always Better: Scaling Properties of Latent Diffusion Models , https://arxiv.org/abs/2404.01367 1 Apr, The Fine Line: Navigating Large Language Model Pretraining with Down-streaming Capability Analysis , https://arxiv.org/abs/2404.01204 1 Apr, Diffusion-RWKV: Scaling RWKV-Like Architectures for Diffusion Models , https://arxiv.org/abs/2404.04478 2 Apr, Mixture-of-Depths: Dynamically Allocating Compute in Transformer-Based Language Models , https://arxiv.org/abs/2404.02258 2 Apr, Long-context LLMs Struggle with Long In-context Learning , https://arxiv.org/abs/2404.02060 2 Apr, Emergent Abilities in Reduced-Scale Generative Language Models , https://arxiv.org/abs/2404.02204 2 Apr, Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks , https://arxiv.org/abs/2404.02151 3 Apr, On the Scalability of Diffusion-based Text-to-Image Generation , https://arxiv.org/abs/2404.02883 3 Apr, BAdam: A Memory Efficient Full Parameter Training Method for Large Language Models , https://arxiv.org/abs/2404.02827 3 Apr, Cross-Attention Makes Inference Cumbersome in Text-to-Image Diffusion Models , https://arxiv.org/abs/2404.02747 4 Apr, Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences , https://arxiv.org/abs/2404.02151 4 Apr, Training LLMs over Neurally Compressed Text , https://arxiv.org/abs/2404.03626 4 Apr, CantTalkAboutThis: Aligning Language Models to Stay on Topic in Dialogues , https://arxiv.org/abs/2404.03820 5 Apr, ReFT: Representation Finetuning for Language Models , https://arxiv.org/abs/2404.03592 5 Apr, Verifiable by Design: Aligning Language Models to Quote from Pre-Training Data , https://arxiv.org/abs/2404.03862 5 Apr, Sigma: Siamese Mamba Network for Multi-Modal Semantic Segmentation , https://arxiv.org/abs/2404.04256 8 Apr, AutoCodeRover: Autonomous Program Improvement , https://arxiv.org/abs/2404.05427 8 Apr, Eagle and Finch: RWKV with Matrix-Valued 
States and Dynamic Recurrence , https://arxiv.org/abs/2404.05892 8 Apr, CodecLM: Aligning Language Models with Tailored Synthetic Data , https://arxiv.org/abs/2404.05875 9 Apr, MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies , https://arxiv.org/abs/2404.06395 9 Apr, Elephants Never Forget: Memorization and Learning of Tabular Data in Large Language Models , https://arxiv.org/abs/2404.06209 9 Apr, LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders , https://arxiv.org/abs/2404.05961 10 Apr, Adapting LLaMA Decoder to Vision Transformer , https://arxiv.org/abs/2404.06773 10 Apr, Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention , https://arxiv.org/abs/2404.07143 11 Apr, LLoCO: Learning Long Contexts Offline , https://arxiv.org/abs/2404.07979 11 Apr, JetMoE: Reaching Llama2 Performance with 0.1M Dollars , https://arxiv.org/abs/2404.07413 11 Apr, Best Practices and Lessons Learned on Synthetic Data for Language Models , https://arxiv.org/abs/2404.07503 11 Apr, Rho-1: Not All Tokens Are What You Need , https://arxiv.org/abs/2404.07965 12 Apr, Pre-training Small Base LMs with Fewer Tokens , https://arxiv.org/abs/2404.08634 12 Apr, Dataset Reset Policy Optimization for RLHF , https://arxiv.org/abs/2404.08495 13 Apr, LLM In-Context Recall is Prompt Dependent , https://arxiv.org/abs/2404.08865 15 Apr, State Space Model for New-Generation Network Alternative to Transformers: A Survey , https://arxiv.org/abs/2404.09516 15 Apr, Chinchilla Scaling: A Replication Attempt , https://arxiv.org/abs/2404.10102 15 Apr, Learn Your Reference Model for Real Good Alignment , https://arxiv.org/abs/2404.09656 16 Apr, Is DPO Superior to PPO for LLM Alignment? A Comprehensive Study , https://arxiv.org/abs/2404.10719 16 Apr, Scaling (Down) CLIP: A Comprehensive Analysis of Data, Architecture, and Training Strategies , https://arxiv.org/abs/2404.08197 16 Apr, How Faithful Are RAG Models? 
Quantifying the Tug-of-War Between RAG and LLMs' Internal Prior , https://arxiv.org/abs/2404.10198 17 Apr, A Survey on Retrieval-Augmented Text Generation for Large Language Models , https://arxiv.org/abs/2404.10981 18 Apr, When LLMs are Unfit Use FastFit: Fast and Effective Text Classification with Many Classes , https://arxiv.org/abs/2404.12365 18 Apr, Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing , https://arxiv.org/abs/2404.12253 18 Apr, OpenBezoar: Small, Cost-Effective and Open Models Trained on Mixes of Instruction Data , https://arxiv.org/abs/2404.12195 19 Apr, The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions , https://arxiv.org/abs/2404.13208 22 Apr, How Good Are Low-bit Quantized LLaMA3 Models? An Empirical Study , https://arxiv.org/abs/2404.14047 22 Apr, Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone , https://arxiv.org/abs/2404.14219 22 Apr, OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework , https://arxiv.org/abs/2404.14619 22 Apr, A Survey on Self-Evolution of Large Language Models , https://arxiv.org/abs/2404.14662 23 Apr, Multi-Head Mixture-of-Experts , https://arxiv.org/abs/2404.15045 23 Apr, NExT: Teaching Large Language Models to Reason about Code Execution , https://arxiv.org/abs/2404.14662 23 Apr, Graph Machine Learning in the Era of Large Language Models (LLMs) , https://arxiv.org/abs/2404.14928 24 Apr, Retrieval Head Mechanistically Explains Long-Context Factuality , https://arxiv.org/abs/2404.15574 25 Apr, Layer Skip: Enabling Early Exit Inference and Self-Speculative Decoding , https://arxiv.org/abs/2404.16710 25 Apr, Make Your LLM Fully Utilize the Context , https://arxiv.org/abs/2404.16811 28 Apr, LoRA Land: 310 Fine-tuned LLMs that Rival GPT-4, A Technical Report , https://arxiv.org/abs/2405.00732 30 Apr, Better & Faster Large Language Models via Multi-token Prediction , https://arxiv.org/abs/2404.19737 
30 Apr, RAG and RAU: A Survey on Retrieval-Augmented Language Model in Natural Language Processing , https://arxiv.org/abs/2404.19543 30 Apr, A Primer on the Inner Workings of Transformer-based Language Models , https://arxiv.org/abs/2405.00208 30 Apr, When to Retrieve: Teaching LLMs to Utilize Information Retrieval Effectively , https://arxiv.org/abs/2404.19705 30 Apr, KAN: Kolmogorov–Arnold Networks , https://arxiv.org/abs/2404.19756 1 May, Is Bigger Edit Batch Size Always Better? An Empirical Study on Model Editing with Llama-3 , https://arxiv.org/abs/2405.00664 1 May, Self-Play Preference Optimization for Language Model Alignment , https://arxiv.org/abs/2405.00675 1 May, A Careful Examination of Large Language Model Performance on Grade School Arithmetic , https://arxiv.org/abs/2405.00332 2 May, Prometheus 2: An Open Source Language Model Specialized in Evaluating Other Language Models , https://arxiv.org/abs/2405.01535 3 May, What Matters When Building Vision-Language Models? , https://arxiv.org/abs/2405.02246 5 May, Is Flash Attention Stable? , https://arxiv.org/abs/2405.02803 7 May, vAttention: Dynamic Memory Management for Serving LLMs without PagedAttention , https://arxiv.org/abs/2405.04437 7 May, xLSTM: Extended Long Short-Term Memory , https://arxiv.org/abs/2405.04517 8 May, You Only Cache Once: Decoder-Decoder Architectures for Language Models , https://arxiv.org/abs/2405.05254 8 May, DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model , https://arxiv.org/abs/2405.04434 8 May, Fishing for Magikarp: Automatically Detecting Under-trained Tokens in Large Language Models , https://arxiv.org/abs/2405.05417 9 May, Does Fine-Tuning LLMs on New Knowledge Encourage Hallucinations? 
, https://arxiv.org/abs/2405.05904 10 May, Value Augmented Sampling for Language Model Alignment and Personalization , https://arxiv.org/abs/2405.06639 12 May, PHUDGE: Phi-3 as Scalable Judge , https://arxiv.org/abs/2405.08029 13 May, RLHF Workflow: From Reward Modeling to Online RLHF , https://arxiv.org/abs/2405.07863 15 May, LoRA Learns Less and Forgets Less , https://arxiv.org/abs/2405.09673 15 May, Xmodel-VLM: A Simple Baseline for Multimodal Vision Language Model , https://arxiv.org/abs/2405.09215 16 May, Chameleon: Mixed-Modal Early-Fusion Foundation Models , https://arxiv.org/abs/2405.09818 17 May, Towards Modular LLMs by Building and Reusing a Library of LoRAs , https://arxiv.org/abs/2405.11157 19 May, SLAB: Efficient Transformers with Simplified Linear Attention and Progressive Re-parameterized Batch Normalization , https://arxiv.org/abs/2405.11582 20 May, MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning , https://arxiv.org/abs/2405.12130 22 May, Attention as an RNN , https://arxiv.org/abs/2405.13956 22 May, Dense Connector for MLLMs , https://arxiv.org/abs/2405.13800 23 May, AlignGPT: Multi-modal Large Language Models with Adaptive Alignment Capability , https://arxiv.org/abs/2405.14129 23 May, SimPO: Simple Preference Optimization with a Reference-Free Reward , https://arxiv.org/abs/2405.14734 23 May, Instruction Tuning With Loss Over Instructions , https://arxiv.org/abs/2405.14394 24 May, The Road Less Scheduled , https://arxiv.org/abs/2405.15682 26 May, Stacking Your Transformers: A Closer Look at Model Growth for Efficient LLM Pre-Training , https://arxiv.org/abs/2405.15319 26 May, gzip Predicts Data-dependent Scaling Laws , https://arxiv.org/abs/2405.16684 27 May, Trans-LoRA: Towards Data-free Transferable Parameter Efficient Finetuning , https://arxiv.org/abs/2405.17258 28 May, VeLoRA: Memory Efficient Training using Rank-1 Sub-Token Projections , https://arxiv.org/abs/2405.17991 28 May, LLaMA-NAS: Efficient Neural Architecture Search 
for Large Language Models , https://arxiv.org/abs/2405.18377 29 May, Contextual Position Encoding: Learning to Count What's Important , https://arxiv.org/abs/2405.18719 2 Jun, Show, Don't Tell: Aligning Language Models with Demonstrated Feedback , https://arxiv.org/abs/2406.00888 3 Jun, Skywork-MoE: A Deep Dive into Training Techniques for Mixture-of-Experts Language Models , https://arxiv.org/abs/2406.06563 3 Jun, OLoRA: Orthonormal Low-Rank Adaptation of Large Language Models , https://arxiv.org/abs/2406.01775 3 Jun, The Geometry of Categorical and Hierarchical Concepts in Large Language Models , https://arxiv.org/abs/2406.01506 3 Jun, Towards Scalable Automated Alignment of LLMs: A Survey , https://arxiv.org/abs/2406.01252 4 Jun, Scalable MatMul-free Language Modeling , https://arxiv.org/abs/2406.02528 4 Jun, Block Transformer: Global-to-Local Language Modeling for Fast Inference , https://arxiv.org/abs/2406.02657 6 Jun, Buffer of Thoughts: Thought-Augmented Reasoning with Large Language Models , https://arxiv.org/abs/2406.04271 6 Jun, The Prompt Report: A Systematic Survey of Prompting Techniques , https://arxiv.org/abs/2406.06608 6 Jun, Transformers Need Glasses! Information Over-Squashing in Language Tasks , https://arxiv.org/abs/2406.04267 6 Jun, Are We Done with MMLU? 
, https://arxiv.org/abs/2406.04127 6 Jun, Step-aware Preference Optimization: Aligning Preference with Denoising Performance at Each Step , https://arxiv.org/abs/2406.04314 7 Jun, Boosting Large-scale Parallel Training Efficiency with C4: A Communication-Driven Approach , https://arxiv.org/abs/2406.04594 7 Jun, CRAG -- Comprehensive RAG Benchmark , https://arxiv.org/abs/2406.04744 7 Jun, WildBench: Benchmarking LLMs with Challenging Tasks from Real Users in the Wild , https://arxiv.org/abs/2406.04770 7 Jun, Mixture-of-Agents Enhances Large Language Model Capabilities , https://arxiv.org/abs/2406.04692 7 Jun, BERTs are Generative In-Context Learners , https://arxiv.org/abs/2406.04823 7 Jun, 3D-GRAND: A Million-Scale Dataset for 3D-LLMs with Better Grounding and Less Hallucination , https://arxiv.org/abs/2406.05132 8 Jun, Creativity Has Left the Chat: The Price of Debiasing Language Models , https://arxiv.org/abs/2406.05587 10 Jun, Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation , https://arxiv.org/abs/2406.06525 10 Jun, Margin-aware Preference Optimization for Aligning Diffusion Models Without Reference , https://arxiv.org/abs/2406.06424 10 Jun, Husky: A Unified, Open-Source Language Agent for Multi-Step Reasoning , https://arxiv.org/abs/2406.06469 10 Jun, Turbo Sparse: Achieving LLM SOTA Performance with Minimal Activated Parameters , https://arxiv.org/abs/2406.05955 10 Jun, Self-Tuning: Instructing LLMs to Effectively Acquire New Knowledge through Self-Teaching , https://arxiv.org/abs/2406.06326 11 Jun, An Image is Worth 32 Tokens for Reconstruction and Generation , https://arxiv.org/abs/2406.07550 11 Jun, TextGrad: Automatic "Differentiation" via Text , https://arxiv.org/abs/2406.07496 11 Jun, Simple and Effective Masked Diffusion Language Models , https://arxiv.org/abs/2406.07524 11 Jun, Never Miss A Beat: An Efficient Recipe for Context Window Extension of Large Language Models with Consistent "Middle" Enhancement , 
https://arxiv.org/abs/2406.07138 11 Jun, Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling , https://arxiv.org/abs/2406.07522 12 Jun, Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing , https://arxiv.org/abs/2406.08464 12 Jun, What If We Recaption Billions of Web Images with LLaMA-3? , https://arxiv.org/abs/2406.08478 12 Jun, Large Language Model Unlearning via Embedding-Corrupted Prompts , https://arxiv.org/abs/2406.07933 12 Jun, Large Language Models Must Be Taught to Know What They Don't Know , https://arxiv.org/abs/2406.08391 12 Jun, An Empirical Study of Mamba-based Language Models , https://arxiv.org/abs/2406.07887 12 Jun, Discovering Preference Optimization Algorithms with and for Large Language Models , https://arxiv.org/abs/2406.08414 13 Jun, Transformers Meet Neural Algorithmic Reasoners , https://arxiv.org/abs/2406.09308 13 Jun, MLKV: Multi-Layer Key-Value Heads for Memory Efficient Transformer Decoding , https://arxiv.org/abs/2406.09297 13 Jun, An Image is Worth More Than 16x16 Patches: Exploring Transformers on Individual Pixels , https://arxiv.org/abs/2406.09415 13 Jun, FouRA: Fourier Low Rank Adaptation , https://arxiv.org/abs/2406.08798 14 Jun, Bootstrapping Language Models with DPO Implicit Rewards , https://arxiv.org/abs/2406.09760 14 Jun, Be like a Goldfish, Don't Memorize! Mitigating Memorization in Generative LLMs , https://arxiv.org/abs/2406.10209 14 Jun, Regularizing Hidden States Enables Learning Generalizable Reward Model for LLMs , https://arxiv.org/abs/2406.10216 16 Jun, THEANINE: Revisiting Memory Management in Long-term Conversations with Timeline-augmented Response Generation , https://arxiv.org/abs/2406.10996 17 Jun, Task Me Anything , https://arxiv.org/abs/2406.11775 17 Jun, How Do Large Language Models Acquire Factual Knowledge During Pretraining? 
, https://arxiv.org/abs/2406.11813 17 Jun, mDPO: Conditional Preference Optimization for Multimodal Large Language Models , https://arxiv.org/abs/2406.11839 17 Jun, Nemotron-4 340B Technical Report , https://arxiv.org/abs/2406.11704 17 Jun, DataComp-LM: In Search of the Next Generation of Training Sets for Language Models , https://arxiv.org/abs/2406.11794 17 Jun, Tokenization Falling Short: The Curse of Tokenization , https://arxiv.org/abs/2406.11687 17 Jun, DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence , https://arxiv.org/abs/2406.11931 17 Jun, Unveiling Encoder-Free Vision-Language Models , https://arxiv.org/abs/2406.11832 17 Jun, Iterative Length-Regularized Direct Preference Optimization: A Case Study on Improving 7B Language Models to GPT-4 Level , https://arxiv.org/abs/2406.11817 17 Jun, HARE: HumAn pRiors, a key to small language model Efficiency , https://arxiv.org/abs/2406.11410 17 Jun, Measuring memorization in RLHF for code completion , https://arxiv.org/abs/2406.11715 17 Jun, Self-MoE: Towards Compositional Large Language Models with Self-Specialized Experts , https://arxiv.org/abs/2406.12034 18 Jun, From RAGs to Rich Parameters: Probing How Language Models Utilize External Knowledge Over Parametric Information for Factual Queries , https://arxiv.org/abs/2406.12824 18 Jun, Judging the Judges: Evaluating Alignment and Vulnerabilities in LLMs-as-Judges , https://arxiv.org/abs/2406.12624 19 Jun, Can Long-Context Language Models Subsume Retrieval, RAG, SQL, and More? , https://arxiv.org/abs/2406.13121 20 Jun, Instruction Pre-Training: Language Models are Supervised Multitask Learners , https://arxiv.org/abs/2406.14491 20 Jun, Can LLMs Learn by Teaching? A Preliminary Study , https://arxiv.org/abs/2406.14629 21 Jun, A Tale of Trust and Accuracy: Base vs. 
Instruct LLMs in RAG Systems , https://arxiv.org/abs/2406.14972 21 Jun, LongRAG: Enhancing Retrieval-Augmented Generation with Long-context LLMs , https://arxiv.org/abs/2406.15319 21 Jun, MoA: Mixture of Sparse Attention for Automatic Large Language Model Compression , https://arxiv.org/abs/2406.14909 21 Jun, Efficient Continual Pre-training by Mitigating the Stability Gap , https://arxiv.org/abs/2406.14833 24 Jun, Sparser is Faster and Less is More: Efficient Sparse Attention for Long-Range Transformers , https://arxiv.org/abs/2406.16747 24 Jun, WARP: On the Benefits of Weight Averaged Rewarded Policies , https://arxiv.org/abs/2406.16768 24 Jun, Adam-mini: Use Fewer Learning Rates To Gain More , https://arxiv.org/abs/2406.16793 25 Jun, The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale , https://arxiv.org/abs/2406.17557 25 Jun, LongIns: A Challenging Long-context Instruction-based Exam for LLMs , https://arxiv.org/abs/2406.17588 25 Jun, Following Length Constraints in Instructions , https://arxiv.org/abs/2406.17744 26 Jun, A Closer Look into Mixture-of-Experts in Large Language Models , https://arxiv.org/abs/2406.18219 26 Jun, RouteLLM: Learning to Route LLMs with Preference Data , https://arxiv.org/abs/2406.18665 26 Jun, Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs , https://arxiv.org/abs/2406.18629 27 Jun, Dataset Size Recovery from LoRA Weights , https://arxiv.org/abs/2406.19395 27 Jun, From Artificial Needles to Real Haystacks: Improving Retrieval Capabilities in LLMs by Finetuning on Synthetic Data , https://arxiv.org/abs/2406.19292 27 Jun, Changing Answer Order Can Decrease MMLU Accuracy , https://arxiv.org/abs/2406.19470 28 Jun, Direct Preference Knowledge Distillation for Large Language Models , https://arxiv.org/abs/2406.19774 28 Jun, LLM Critics Help Catch LLM Bugs , https://arxiv.org/abs/2407.00215 28 Jun, Scaling Synthetic Data Creation with 1,000,000,000 Personas , https://arxiv.org/abs/2406.20094 
1 Jul, LLM See, LLM Do: Guiding Data Generation to Target Non-Differentiable Objectives , https://arxiv.org/abs/2407.01490 1 Jul, Searching for Best Practices in Retrieval-Augmented Generation , https://arxiv.org/abs/2407.01219 1 Jul, Let the Expert Stick to His Last: Expert-Specialized Fine-Tuning for Sparse Architectural Large Language Models , https://arxiv.org/abs/2407.01906 1 Jul, Diffusion Forcing: Next-token Prediction Meets Full-Sequence Diffusion , https://arxiv.org/abs/2407.01392 1 Jul, Eliminating Position Bias of Language Models: A Mechanistic Approach , https://arxiv.org/abs/2407.01100 2 Jul, JMInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention , https://arxiv.org/abs/2407.02490 2 Jul, TokenPacker: Efficient Visual Projector for Multimodal LLM , https://arxiv.org/abs/2407.02392 2 Jul, Reasoning in Large Language Models: A Geometric Perspective , https://arxiv.org/abs/2407.02678 2 Jul, RankRAG: Unifying Context Ranking with Retrieval-Augmented Generation in LLMs , https://arxiv.org/abs/2407.02485 3 Jul, AgentInstruct: Toward Generative Teaching with Agentic Flows , https://arxiv.org/abs/2407.03502 3 Jul, HEMM: Holistic Evaluation of Multimodal Foundation Models , https://arxiv.org/abs/2407.03418 4 Jul, Mixture of A Million Experts , https://arxiv.org/abs/2407.04153 5 Jul, Learning to (Learn at Test Time): RNNs with Expressive Hidden States , https://arxiv.org/abs/2407.04620 9 Jul, Vision Language Models Are Blind , https://arxiv.org/abs/2407.06581 9 Jul, Self-Recognition in Language Models , https://arxiv.org/abs/2407.06946 10 Jul, Inference Performance Optimization for Large Language Models on CPUs , https://arxiv.org/abs/2407.07304 11 Jul, Gradient Boosting Reinforcement Learning , https://arxiv.org/abs/2407.08250 11 Jul, FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision , https://arxiv.org/abs/2407.08608 12 Jul, SpreadsheetLLM: Encoding Spreadsheets for Large Language Models , 
https://arxiv.org/abs/2407.09025 12 Jul, New Desiderata for Direct Preference Optimization , https://arxiv.org/abs/2407.09072 12 Jul, Context Embeddings for Efficient Answer Generation in RAG , https://arxiv.org/abs/2407.09252 15 Jul, Qwen2 Technical Report , https://arxiv.org/abs/2407.10671 15 Jul, The Good, The Bad, and The Greedy: Evaluation of LLMs Should Not Ignore Non-Determinism , https://arxiv.org/abs/2407.10457 15 Jul, From GaLore to WeLore: How Low-Rank Weights Non-uniformly Emerge from Low-Rank Gradients , https://arxiv.org/abs/2407.11239 16 Jul, GoldFinch: High Performance RWKV/Transformer Hybrid with Linear Pre-Fill and Extreme KV-Cache Compression , https://arxiv.org/abs/2407.12077 16 Jul, Scaling Diffusion Transformers to 16 Billion Parameters , https://arxiv.org/abs/2407.11633 16 Jul, NeedleBench: Can LLMs Do Retrieval and Reasoning in 1 Million Context Window? , https://arxiv.org/abs/2407.11963 17 Jul, Patch-Level Training for Large Language Models , https://arxiv.org/abs/2407.12665 17 Jul, LMMs-Eval: Reality Check on the Evaluation of Large Multimodal Models , https://arxiv.org/abs/2407.12772 17 Jul, A Survey of Prompt Engineering Methods in Large Language Models for Different NLP Tasks , https://arxiv.org/abs/2407.12994 17 Jul, Spectra: A Comprehensive Study of Ternary, Quantized, and FP16 Language Models , https://arxiv.org/abs/2407.12327 18 Jul, Attention Overflow: Language Model Input Blur during Long-Context Missing Items Recommendation , https://arxiv.org/abs/2407.13481 18 Jul, Weak-to-Strong Reasoning , https://arxiv.org/abs/2407.13647 18 Jul, Understanding Reference Policies in Direct Preference Optimization , https://arxiv.org/abs/2407.13709 18 Jul, Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies , https://arxiv.org/abs/2407.13623 19 Jul, BOND: Aligning LLMs with Best-of-N Distillation , https://arxiv.org/abs/2407.14622 19 Jul, Compact Language Models via Pruning and Knowledge Distillation , 
https://arxiv.org/abs/2407.14679 19 Jul, LazyLLM: Dynamic Token Pruning for Efficient Long Context LLM Inference , https://arxiv.org/abs/2407.14057 22 Jul, Mini-Sequence Transformer: Optimizing Intermediate Memory for Long Sequences Training , https://arxiv.org/abs/2407.15892 22 Jul, DDK: Distilling Domain Knowledge for Efficient Large Language Models , https://arxiv.org/abs/2407.16154 23 Jul, Generation Constraint Scaling Can Mitigate Hallucination , https://arxiv.org/abs/2407.16908 23 Jul, Retrieval Augmented Generation or Long-Context LLMs? A Comprehensive Study and Hybrid Approach , https://arxiv.org/abs/2407.16833 23 Jul, Course-Correction: Safety Alignment Using Synthetic Preferences , https://arxiv.org/abs/2407.16637 26 Jul, Data Mixture Inference: What do BPE Tokenizers Reveal about their Training Data? , https://arxiv.org/abs/2407.16607 28 Jul, Meta-Rewarding Language Models: Self-Improving Alignment with LLM-as-a-Meta-Judge , https://arxiv.org/abs/2407.19594 29 Jul, Improving Retrieval Augmented Language Model with Self-Reasoning , https://arxiv.org/abs/2407.19813 29 Jul, Apple Intelligence Foundation Language Models , https://arxiv.org/abs/2407.21075 30 Jul, ThinK: Thinner Key Cache by Query-Driven Pruning , https://arxiv.org/abs/2407.21018 31 Jul, The Llama 3 Herd of Models , https://arxiv.org/abs/2407.21783 31 Jul, Gemma 2: Improving Open Language Models at a Practical Size , https://arxiv.org/abs/2408.00118 1 Aug, SAM 2: Segment Anything in Images and Videos, https://arxiv.org/abs/2408.00714 2 Aug, POA: Pre-training Once for Models of All Sizes, https://arxiv.org/abs/2408.01031 2 Aug, RAGEval: Scenario Specific RAG Evaluation Dataset Generation Framework, https://arxiv.org/abs/2408.01262 2 Aug, A Survey of Mamba, https://arxiv.org/abs/2408.01129 3 Aug, MiniCPM-V: A GPT-4V Level MLLM on Your Phone, https://arxiv.org/abs/2408.01800 5 Aug, RAG Foundry: A Framework for Enhancing LLMs for Retrieval Augmented Generation, https://arxiv.org/abs/2408.02545 5 
Aug, Self-Taught Evaluators, https://arxiv.org/abs/2408.02666 5 Aug, BioMamba: A Pre-trained Biomedical Language Representation Model Leveraging Mamba, https://arxiv.org/abs/2408.02600 7 Aug, EXAONE 3.0 7.8B Instruction Tuned Language Model, https://arxiv.org/abs/2408.03541 7 Aug, 1.5-Pints Technical Report: Pretraining in Days, Not Months -- Your Language Model Thrives on Quality Data, https://arxiv.org/abs/2408.03506 8 Aug, Conversational Prompt Engineering, https://arxiv.org/abs/2408.04560 8 Aug, Trans-Tokenization and Cross-lingual Vocabulary Transfers: Language Adaptation of LLMs for Low-Resource NLP, https://arxiv.org/abs/2408.04303 12 Aug, The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery, https://arxiv.org/abs/2408.06292 15 Aug, Hermes 3 Technical Report, https://arxiv.org/abs/2408.11857 19 Aug, Customizing Language Models with Instance-wise LoRA for Sequential Recommendation, https://arxiv.org/abs/2408.10159 20 Aug, Enhancing Robustness in Large Language Models: Prompting for Mitigating the Impact of Irrelevant Information, https://arxiv.org/abs/2408.10615 20 Aug, To Code, or Not To Code? 
Exploring Impact of Code in Pre-training, https://arxiv.org/abs/2408.10914 21 Aug, LLM Pruning and Distillation in Practice: The Minitron Approach, https://arxiv.org/abs/2408.11796 22 Aug, Jamba-1.5: Hybrid Transformer-Mamba Models at Scale, https://arxiv.org/abs/2408.12570 22 Aug, Controllable Text Generation for Large Language Models: A Survey, https://arxiv.org/abs/2408.12599 23 Aug, Multi-Layer Transformers Gradient Can be Approximated in Almost Linear Time, https://arxiv.org/abs/2408.13233 26 Aug, A Practitioner's Guide to Continual Multimodal Pretraining, https://arxiv.org/abs/2408.14471 26 Aug, Building and better understanding vision-language models: insights and future directions, https://arxiv.org/abs/2408.12637 26 Aug, CURLoRA: Stable LLM Continual Fine-Tuning and Catastrophic Forgetting Mitigation, https://arxiv.org/abs/2408.14572 27 Aug, The Mamba in the Llama: Distilling and Accelerating Hybrid Models, https://arxiv.org/abs/2408.15237 28 Aug, ReMamba: Equip Mamba with Effective Long-Sequence Modeling, https://arxiv.org/abs/2408.15496 29 Aug, Smaller, Weaker, Yet Better: Training LLM Reasoners via Compute-Optimal Sampling, https://arxiv.org/abs/2408.16737 31 Aug, LongRecipe: Recipe for Efficient Long Context Generalization in Large Language Models, https://arxiv.org/abs/2409.00509 3 Sep, OLMoE: Open Mixture-of-Experts Language Models, https://arxiv.org/abs/2409.02060 3 Sep, In Defense of RAG in the Era of Long-Context Language Models, https://arxiv.org/abs/2409.01666 5 Sep, Attention Heads of Large Language Models: A Survey, https://arxiv.org/abs/2409.03752 5 Sep, LongCite: Enabling LLMs to Generate Fine-grained Citations in Long-context QA , https://arxiv.org/abs/2409.02897 5 Sep, How Do Your Code LLMs Perform? 
Empowering Code Instruction Tuning with High-Quality Data, https://arxiv.org/abs/2409.03810 6 Sep, Theory, Analysis, and Best Practices for Sigmoid Self-Attention, https://arxiv.org/abs/2409.04431 10 Sep, LLaMA-Omni: Seamless Speech Interaction with Large Language Models, https://arxiv.org/abs/2409.06666 10 Sep, What is the Role of Small Models in the LLM Era: A Survey, https://arxiv.org/abs/2409.06857 11 Sep, Policy Filtration in RLHF to Fine-Tune LLM for Code Generation, https://arxiv.org/abs/2409.06957 16 Sep, RetrievalAttention: Accelerating Long-Context LLM Inference via Vector Retrieval , https://arxiv.org/abs/2409.10516 18 Sep, Qwen2.5-Math Technical Report: Toward Mathematical Expert Model via Self-Improvement , https://arxiv.org/abs/2409.12122 18 Sep, Qwen2.5-Coder Technical Report , https://arxiv.org/abs/2409.12186 21 Sep, Instruction Following without Instruction Tuning, https://arxiv.org/abs/2409.14254 30 Sep, Is Preference Alignment Always the Best Option to Enhance LLM-Based Translation? An Empirical Analysis, https://arxiv.org/abs/2409.20059 30 Sep, The Perfect Blend: Redefining RLHF with Mixture of Judges, https://arxiv.org/abs/2409.20370 (New paper by Meta on how they did RLHF for Llama 3) 1 Oct, Addition is All You Need for Energy-efficient Language Models, https://arxiv.org/abs/2410.00907 2 Oct, Quantifying Generalization Complexity for Large Language Models, https://arxiv.org/abs/2410.01769 2 Oct, When a language model is optimized for reasoning, does it still show embers of autoregression? An analysis of OpenAI o1 , https://arxiv.org/abs/2410.01792 2 Oct, Were RNNs All We Needed? 
, https://arxiv.org/abs/2410.01201 3 Oct, Selective Attention Improves Transformer , https://arxiv.org/abs/2410.02703 3 Oct, LLMs Know More Than They Show: On the Intrinsic Representation of LLM Hallucinations , https://arxiv.org/abs/2410.02707 3 Oct, LLaVA-Critic: Learning to Evaluate Multimodal Models , https://arxiv.org/abs/2410.02712 7 Oct, Differential Transformer , https://arxiv.org/abs/2410.05258 7 Oct, GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models , https://arxiv.org/abs/2410.05229 8 Oct, ARIA: An Open Multimodal Native Mixture-of-Experts Model , https://arxiv.org/abs/2410.05993 8 Oct, O1 Replication Journey: A Strategic Progress Report -- Part 1 , https://arxiv.org/abs/2410.18982 8 Oct, Long-Context LLMs Meet RAG: Overcoming Challenges for Long Inputs in RAG, https://arxiv.org/abs/2410.05983 9 Oct, From Generalist to Specialist: Adapting Vision Language Models via Task-Specific Visual Instruction Tuning , https://arxiv.org/abs/2410.06456 10 Oct, KV Prediction for Improved Time to First Token , https://arxiv.org/abs/2410.08391 11 Oct, Baichuan-Omni Technical Report , https://arxiv.org/abs/2410.08565 13 Oct, MMIE: Massive Multimodal Interleaved Comprehension Benchmark for Large Vision-Language Models , https://arxiv.org/abs/2410.10139 13 Oct, LOKI: A Comprehensive Synthetic Data Detection Benchmark using Large Multimodal Models , https://arxiv.org/abs/2410.09732 15 Oct, AFlow: Automating Agentic Workflow Generation , https://arxiv.org/abs/2410.10762 15 Oct, Toward General Instruction-Following Alignment for Retrieval-Augmented Generation , https://arxiv.org/abs/2410.09584 21 Oct, Pre-training Distillation for Large Language Models: A Design Space Exploration , https://arxiv.org/abs/2410.16215 23 Oct, MIA-DPO: Multi-Image Augmented Direct Preference Optimization For Large Vision-Language Models , https://arxiv.org/abs/2410.17637 23 Oct, Scalable Ranked Preference Optimization for Text-to-Image Generation , 
https://arxiv.org/abs/2410.18013 23 Oct, Scaling Diffusion Language Models via Adaptation from Autoregressive Models , https://arxiv.org/abs/2410.17891 24 Oct, Hybrid Preferences: Learning to Route Instances for Human vs. AI Feedback , https://arxiv.org/abs/2410.19133 25 Oct, Counting Ability of Large Language Models and Impact of Tokenization , https://arxiv.org/abs/2410.19730 25 Oct, A Survey of Small Language Models , https://arxiv.org/abs/2410.20011 26 Oct, Accelerating Direct Preference Optimization with Prefix Sharing , https://arxiv.org/abs/2410.20305 27 Oct, Mind Your Step (by Step): Chain-of-Thought can Reduce Performance on Tasks where Thinking Makes Humans Worse , https://arxiv.org/abs/2410.21333 28 Oct, LongReward: Improving Long-context Large Language Models with AI Feedback , https://arxiv.org/abs/2410.21252 28 Oct, ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference , https://arxiv.org/abs/2410.21465 29 Oct, Beyond Text: Optimizing RAG with Multimodal Inputs for Industrial Applications , https://arxiv.org/abs/2410.21943 30 Oct, CORAL: Benchmarking Multi-turn Conversational Retrieval-Augmented Generation , https://arxiv.org/abs/2410.23090 31 Oct, What Happened in LLMs Layers when Trained for Fast vs. Slow Thinking: A Gradient Perspective , https://arxiv.org/abs/2410.23743 31 Oct, GPT or BERT: why not both? 
, https://arxiv.org/abs/2410.24159 31 Oct, Language Models can Self-Lengthen to Generate Long Texts , https://arxiv.org/abs/2410.23933 1 Nov, Adding Error Bars to Evals: A Statistical Approach to Language Model Evaluations , https://arxiv.org/abs/2411.00640 1 Nov, Adapting While Learning: Grounding LLMs for Scientific Problems with Intelligent Tool Usage Adaptation , https://arxiv.org/abs/2411.00412 1 Nov, Multi-expert Prompting Improves Reliability, Safety, and Usefulness of Large Language Models , https://arxiv.org/abs/2411.00492 3 Nov, Sample-Efficient Alignment for LLMs , https://arxiv.org/abs/2411.01493 4 Nov, A Comprehensive Survey of Small Language Models in the Era of Large Language Models: Techniques, Enhancements, Applications, Collaboration with LLMs, and Trustworthiness , https://arxiv.org/abs/2411.03350 4 Nov, "Give Me BF16 or Give Me Death"? Accuracy-Performance Trade-Offs in LLM Quantization , https://arxiv.org/abs/2411.02355 4 Nov, Parameter-Efficient Fine-Tuning of Large Language Models for Unit Test Generation: An Empirical Study , https://arxiv.org/abs/2411.02462 5 Nov, HtmlRAG: HTML is Better Than Plain Text for Modeling Retrieved Knowledge in RAG Systems , https://arxiv.org/abs/2411.02959 6 Nov, Both Text and Images Leaked! 
A Systematic Analysis of Multimodal LLM Data Contamination , https://arxiv.org/abs/2411.03823 6 Nov, Language Models are Hidden Reasoners: Unlocking Latent Reasoning Capabilities via Self-Rewarding , https://arxiv.org/abs/2411.04282 6 Nov, Number Cookbook: Number Understanding of Language Models and How to Improve It , https://arxiv.org/abs/2411.03766 7 Nov, Mixture-of-Transformers: A Sparse and Scalable Architecture for Multi-Modal Foundation Models , https://arxiv.org/abs/2411.04996 7 Nov, BitNet a4.8: 4-bit Activations for 1-bit LLMs , https://arxiv.org/abs/2411.04965 7 Nov, Scaling Laws for Precision , https://arxiv.org/abs/2411.04330 8 Nov, Energy Efficient Protein Language Models: Leveraging Small Language Models with LoRA for Controllable Protein Generation , https://arxiv.org/abs/2411.05966 8 Nov, Balancing Pipeline Parallelism with Vocabulary Parallelism , https://arxiv.org/abs/2411.05288 11 Nov, Toward Optimal Search and Retrieval for RAG , https://arxiv.org/abs/2411.07396 12 Nov, Large Language Models Can Self-Improve in Long-context Reasoning , https://arxiv.org/abs/2411.08147 12 Nov, Stronger Models are NOT Stronger Teachers for Instruction Tuning , https://arxiv.org/abs/2411.07133 12 Nov, Direct Preference Optimization Using Sparse Feature-Level Constraints , https://arxiv.org/abs/2411.07618 13 Nov, Cut Your Losses in Large-Vocabulary Language Models , https://arxiv.org/abs/2411.09009 15 Nov, Does Prompt Formatting Have Any Impact on LLM Performance? 
, https://arxiv.org/abs/2411.10541 17 Nov, SymDPO: Boosting In-Context Learning of Large Multimodal Models with Symbol Demonstration Direct Preference Optimization , https://arxiv.org/abs/2411.11909 17 Nov, SageAttention2 Technical Report: Accurate 4 Bit Attention for Plug-and-play Inference Acceleration , https://arxiv.org/abs/2411.10958 18 Nov, Bi-Mamba: Towards Accurate 1-Bit State Space Models , https://arxiv.org/abs/2411.11843 19 Nov, RedPajama: an Open Dataset for Training Large Language Models, https://arxiv.org/abs/2411.12372 20 Nov, Hymba: A Hybrid-head Architecture for Small Language Models , https://arxiv.org/abs/2411.13676 20 Nov, Loss-to-Loss Prediction: Scaling Laws for All Datasets , https://arxiv.org/abs/2411.12925 21 Nov, When Precision Meets Position: BFloat16 Breaks Down RoPE in Long-Context Training , https://arxiv.org/abs/2411.13476 21 Nov, Multimodal Autoregressive Pre-training of Large Vision Encoders , https://arxiv.org/abs/2411.14402 21 Nov, Natural Language Reinforcement Learning , https://arxiv.org/abs/2411.14251 22 Nov, Large Multi-modal Models Can Interpret Features in Large Multi-modal Models , https://arxiv.org/abs/2411.14982 22 Nov, TÜLU 3: Pushing Frontiers in Open Language Model Post-Training , https://arxiv.org/abs/2411.15124 23 Nov, MME-Survey: A Comprehensive Survey on Evaluation of Multimodal LLMs , https://arxiv.org/abs/2411.15296 24 Nov, LLMs Do Not Think Step-by-step In Implicit Reasoning , https://arxiv.org/abs/2411.15862 25 Nov, O1 Replication Journey -- Part 2: Surpassing O1-preview through Simple Distillation, Big Progress or Bitter Lesson? 
, https://arxiv.org/abs/2411.16489 26 Nov, Star Attention: Efficient LLM Inference over Long Sequences , https://arxiv.org/abs/2411.17116 27 Nov, Low-Bit Quantization Favors Undertrained LLMs: Scaling Laws for Quantized LLMs with 100T Training Tokens , https://arxiv.org/abs/2411.17691 27 Nov, Rethinking Token Reduction in MLLMs: Towards a Unified Paradigm for Training-Free Acceleration , https://arxiv.org/abs/2411.17686 29 Nov, Reverse Thinking Makes LLMs Stronger Reasoners , https://arxiv.org/abs/2411.19865 29 Nov, Critical Tokens Matter: Token-Level Contrastive Estimation Enhances LLM's Reasoning Capability , https://arxiv.org/abs/2411.19943 2 Dec, Switti: Designing Scale-Wise Transformers for Text-to-Image Synthesis , https://arxiv.org/abs/2412.01819 2 Dec, X-Prompt: Towards Universal In-Context Image Generation in Auto-Regressive Vision Language Foundation Models , https://arxiv.org/abs/2412.01824 2 Dec, Free Process Rewards without Process Labels , https://arxiv.org/abs/2412.01981 3 Dec, Scaling Image Tokenizers with Grouped Spherical Quantization , https://arxiv.org/abs/2412.02632 3 Dec, RARE: Retrieval-Augmented Reasoning Enhancement for Large Language Models , https://arxiv.org/abs/2412.02830 4 Dec, Perception Tokens Enhance Visual Reasoning in Multimodal Language Models , https://arxiv.org/abs/2412.03548 4 Dec, Evaluating Language Models as Synthetic Data Generators , https://arxiv.org/abs/2412.03679 4 Dec, Best-of-N Jailbreaking , https://arxiv.org/abs/2412.03556 4 Dec, PaliGemma 2: A Family of Versatile VLMs for Transfer , https://arxiv.org/abs/2412.03555 5 Dec, VisionZip: Longer is Better but Not Necessary in Vision Language Models , https://arxiv.org/abs/2412.04467 5 Dec, Evaluating and Aligning CodeLLMs on Human Preference , https://arxiv.org/abs/2412.05210 6 Dec, MAmmoTH-VL: Eliciting Multimodal Reasoning with Instruction Tuning at Scale , https://arxiv.org/abs/2412.05237 6 Dec, Expanding Performance Boundaries of Open-Source Multimodal Models with 
Model, Data, and Test-Time Scaling , https://arxiv.org/abs/2412.05271 7 Dec, LLMs-as-Judges: A Comprehensive Survey on LLM-based Evaluation Methods , https://arxiv.org/abs/2412.05579 8 Dec, Does RLHF Scale? Exploring the Impacts From Data, Model, and Method , https://arxiv.org/abs/2412.06000 9 Dec, Unraveling the Complexity of Memory in RL Agents: An Approach for Classification and Evaluation , https://arxiv.org/abs/2412.06531 9 Dec, Training Large Language Models to Reason in a Continuous Latent Space , https://arxiv.org/abs/2412.06769 9 Dec, AutoReason: Automatic Few-Shot Reasoning Decomposition , https://arxiv.org/abs/2412.06975 11 Dec, Large Concept Models: Language Modeling in a Sentence Representation Space , https://arxiv.org/abs/2412.08821 12 Dec, Phi-4 Technical Report , https://arxiv.org/abs/2412.08905 13 Dec, Byte Latent Transformer: Patches Scale Better Than Tokens , https://arxiv.org/abs/2412.09871 13 Dec, SCBench: A KV Cache-Centric Analysis of Long-Context Methods , https://arxiv.org/abs/2412.10319 13 Dec, Cultural Evolution of Cooperation among LLM Agents , https://arxiv.org/abs/2412.10270 13 Dec, DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding , https://arxiv.org/abs/2412.10302 16 Dec, No More Adam: Learning Rate Scaling at Initialization is All You Need , https://arxiv.org/abs/2412.11768 16 Dec, Precise Length Control in Large Language Models , https://arxiv.org/abs/2412.11937 16 Dec, The Open Source Advantage in Large Language Models (LLMs) , https://arxiv.org/abs/2412.12004 16 Dec, A Survey of Mathematical Reasoning in the Era of Multimodal Large Language Model: Benchmark, Method & Challenges , https://arxiv.org/abs/2412.11936 17 Dec, Are Your LLMs Capable of Stable Reasoning? 
, https://arxiv.org/abs/2412.13147 18 Dec, LLM Post-Training Recipes, Improving Reasoning in LLMs , https://arxiv.org/abs/2412.14135 18 Dec, Hansel: Output Length Controlling Framework for Large Language Models , https://arxiv.org/abs/2412.14033 18 Dec, Mind Your Theory: Theory of Mind Goes Deeper Than Reasoning , https://arxiv.org/abs/2412.13631 18 Dec, Alignment Faking in Large Language Models , https://arxiv.org/abs/2412.14093 18 Dec, SCOPE: Optimizing Key-Value Cache Compression in Long-Context Generation , https://arxiv.org/abs/2412.13649 19 Dec, LongBench v2: Towards Deeper Understanding and Reasoning on Realistic Long-Context Multitasks , https://arxiv.org/abs/2412.15204 20 Dec, Offline Reinforcement Learning for LLM Multi-Step Reasoning , https://arxiv.org/abs/2412.16145 24 Dec, Mulberry: Empowering MLLM with O1-like Reasoning and Reflection via Collective Monte Carlo Tree Search , https://arxiv.org/abs/2412.18319 31 Dec, Titans: Learning to Memorize at Test Time , https://arxiv.org/abs/2501.00663 This magazine is a personal passion project. For those who wish to support me, please consider purchasing a copy of my Build a Large Language Model (From Scratch) book. (I am confident that you'll get lots out of this book, as it explains how LLMs work in a level of detail that is not found anywhere else.) If you read the book and have a few minutes to spare, I'd really appreciate a brief review. It helps us authors a lot! Alternatively, I also recently enabled the paid subscription option on Substack to support this magazine directly. Ahead of AI is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber. Thanks for your understanding and support, and I hope to make a full recovery soon and be back with the Research Highlights 2024 article in a few weeks! 
January 2024 1 Jan, Astraios: Parameter-Efficient Instruction Tuning Code Large Language Models , https://arxiv.org/abs/2401.00788 2 Jan, A Comprehensive Study of Knowledge Editing for Large Language Models , https://arxiv.org/abs/2401.01286 2 Jan, LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning , https://arxiv.org/abs/2401.01325 2 Jan, Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models , https://arxiv.org/abs/2401.01335 2 Jan, LLaMA Beyond English: An Empirical Study on Language Capability Transfer , https://arxiv.org/abs/2401.01055 3 Jan, A Mechanistic Understanding of Alignment Algorithms: A Case Study on DPO and Toxicity , https://arxiv.org/abs/2401.01967 4 Jan, LLaMA Pro: Progressive LLaMA with Block Expansion , https://arxiv.org/abs/2401.02415 4 Jan, LLM Augmented LLMs: Expanding Capabilities through Composition , https://arxiv.org/abs/2401.02412 4 Jan, Blending Is All You Need: Cheaper, Better Alternative to Trillion-Parameters LLM , https://arxiv.org/abs/2401.02994 5 Jan, DeepSeek LLM: Scaling Open-Source Language Models with Longtermism , https://arxiv.org/abs/2401.02954 5 Jan, Denoising Vision Transformers , https://arxiv.org/abs/2401.02957 7 Jan, Soaring from 4K to 400K: Extending LLM’s Context with Activation Beacon , https://arxiv.org/abs/2401.03462 8 Jan, Mixtral of Experts , https://arxiv.org/abs/2401.04088 8 Jan, MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts , https://arxiv.org/abs/2401.04081 8 Jan, A Minimaximalist Approach to Reinforcement Learning from Human Feedback , https://arxiv.org/abs/2401.04056 9 Jan, RoSA: Accurate Parameter-Efficient Fine-Tuning via Robust Adaptation , https://arxiv.org/abs/2401.04679 10 Jan, Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training , https://arxiv.org/abs/2401.05566 11 Jan, 
Transformers are Multi-State RNNs , https://arxiv.org/abs/2401.06104 11 Jan, A Closer Look at AUROC and AUPRC under Class Imbalance , https://arxiv.org/abs/2401.06091 12 Jan, An Experimental Design Framework for Label-Efficient Supervised Finetuning of Large Language Models , https://arxiv.org/abs/2401.06692 16 Jan, Tuning Language Models by Proxy , https://arxiv.org/abs/2401.08565 16 Jan, Scalable Pre-training of Large Autoregressive Image Models , https://arxiv.org/abs/2401.08541 16 Jan, Code Generation with AlphaCodium: From Prompt Engineering to Flow Engineering , https://arxiv.org/abs/2401.08500 16 Jan, RAG vs Fine-tuning: Pipelines, Tradeoffs, and a Case Study on Agriculture , https://arxiv.org/abs/2401.08406 17 Jan, ReFT: Reasoning with Reinforced Fine-Tuning , https://arxiv.org/abs/2401.08967 18 Jan, DiffusionGPT: LLM-Driven Text-to-Image Generation System , https://arxiv.org/abs/2401.10061 18 Jan, Self-Rewarding Language Models , https://arxiv.org/abs/2401.10020 18 Jan, VMamba: Visual State Space Model , https://arxiv.org/abs/2401.10166 19 Jan, Knowledge Fusion of Large Language Models , https://arxiv.org/abs/2401.10491 22 Jan, SpatialVLM: Endowing Vision-Language Models with Spatial Reasoning Capabilities , https://arxiv.org/abs/2401.12168 22 Jan, WARM: On the Benefits of Weight Averaged Reward Models , https://arxiv.org/abs/2401.12187 22 Jan, Spotting LLMs With Binoculars: Zero-Shot Detection of Machine-Generated Text , https://arxiv.org/abs/2401.12070 24 Jan, MambaByte: Token-free Selective State Space Model , https://arxiv.org/abs/2401.13660 24 Jan, SpacTor-T5: Pre-training T5 Models with Span Corruption and Replaced Token Detection , https://arxiv.org/abs/2401.13160 25 Jan, Rethinking Patch Dependence for Masked Autoencoders , https://arxiv.org/abs/2401.14391 25 Jan, Pix2gestalt: Amodal Segmentation by Synthesizing Wholes , https://arxiv.org/abs/2401.14398 25 Jan, Multimodal Pathway: Improve Transformers with Irrelevant Data from Other Modalities , 
https://arxiv.org/abs/2401.14405 26 Jan, EAGLE: Speculative Sampling Requires Rethinking Feature Uncertainty , https://arxiv.org/abs/2401.15077 29 Jan, MoE-LLaVA: Mixture of Experts for Large Vision-Language Models , https://arxiv.org/abs/2401.15947 29 Jan, Rephrasing the Web: A Recipe for Compute and Data-Efficient Language Modeling , https://arxiv.org/abs/2401.16380 31 Jan, KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization , https://arxiv.org/abs/2401.18079 1 Feb, Efficient Exploration for LLMs , https://arxiv.org/abs/2402.00396 1 Feb, OLMo: Accelerating the Science of Language Models , https://arxiv.org/abs/2402.00838 1 Feb, Tiny Titans: Can Smaller Large Language Models Punch Above Their Weight in the Real World for Meeting Summarization? , https://arxiv.org/abs/2402.00841 1 Feb, Repeat After Me: Transformers are Better than State Space Models at Copying , https://arxiv.org/abs/2402.01032 2 Feb, LiPO: Listwise Preference Optimization through Learning-to-Rank , https://arxiv.org/abs/2402.01878 2 Feb, FindingEmo: An Image Dataset for Emotion Recognition in the Wild , https://arxiv.org/abs/2402.01355 3 Feb, More Agents Is All You Need , https://arxiv.org/abs/2402.05120 5 Feb, DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models , https://arxiv.org/abs/2402.03300 6 Feb, MobileVLM V2: Faster and Stronger Baseline for Vision Language Model , https://arxiv.org/abs/2402.03766 6 Feb, A Phase Transition Between Positional and Semantic Learning in a Solvable Model of Dot-Product Attention , https://arxiv.org/abs/2402.03902 6 Feb, Scaling Laws for Downstream Task Performance of Large Language Models , https://arxiv.org/abs/2402.04177 6 Feb, MOMENT: A Family of Open Time-series Foundation Models , https://arxiv.org/abs/2402.03885 6 Feb, Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models , https://arxiv.org/abs/2402.03749 6 Feb, Self-Discover: Large Language Models Self-Compose 
Reasoning Structures , https://arxiv.org/abs/2402.03620 7 Feb, Grandmaster-Level Chess Without Search , https://arxiv.org/abs/2402.04494 7 Feb, Direct Language Model Alignment from Online AI Feedback , https://arxiv.org/abs/2402.04792 8 Feb, Buffer Overflow in Mixture of Experts , https://arxiv.org/abs/2402.05526 9 Feb, The Boundary of Neural Network Trainability is Fractal , https://arxiv.org/abs/2402.06184 11 Feb, ODIN: Disentangled Reward Mitigates Hacking in RLHF , https://arxiv.org/abs/2402.07319 12 Feb, Policy Improvement using Language Feedback Models , https://arxiv.org/abs/2402.07876 12 Feb, Scaling Laws for Fine-Grained Mixture of Experts , https://arxiv.org/abs/2402.07871 12 Feb, Aya Model: An Instruction Finetuned Open-Access Multilingual Language Model , https://arxiv.org/abs/2402.07827 12 Feb, Step-On-Feet Tuning: Scaling Self-Alignment of LLMs via Bootstrapping , https://arxiv.org/abs/2402.07610 12 Feb, Suppressing Pink Elephants with Direct Principle Feedback , https://arxiv.org/abs/2402.07896 13 Feb, World Model on Million-Length Video And Language With RingAttention , https://arxiv.org/abs/2402.08268 13 Feb, Mixtures of Experts Unlock Parameter Scaling for Deep RL , https://arxiv.org/abs/2402.08609 14 Feb, DoRA: Weight-Decomposed Low-Rank Adaptation , https://arxiv.org/abs/2402.09353 14 Feb, Transformers Can Achieve Length Generalization But Not Robustly , https://arxiv.org/abs/2402.09371 15 Feb, BASE TTS: Lessons From Building a Billion-Parameter Text-to-Speech Model on 100K Hours of Data , https://arxiv.org/abs/2402.08093 15 Feb, Recovering the Pre-Fine-Tuning Weights of Generative Models , https://arxiv.org/abs/2402.10208 15 Feb, Generative Representational Instruction Tuning , https://arxiv.org/abs/2402.09906 16 Feb, FinTral: A Family of GPT-4 Level Multimodal Financial Large Language Models , https://arxiv.org/abs/2402.10986 17 Feb, OneBit: Towards Extremely Low-bit Large Language Models , https://arxiv.org/abs/2402.11295 18 Feb, LongAgent: 
Scaling Language Models to 128k Context through Multi-Agent Collaboration , https://arxiv.org/abs/2402.11550 19 Feb, Reformatted Alignment , https://arxiv.org/abs/2402.12219 19 Feb, AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling , https://arxiv.org/abs/2402.12226 19 Feb, Towards Cross-Tokenizer Distillation: the Universal Logit Distillation Loss for LLMs , https://arxiv.org/abs/2402.12030 19 Feb, LoRA+: Efficient Low Rank Adaptation of Large Models , https://arxiv.org/abs/2402.12354 20 Feb, Neural Network Diffusion , https://arxiv.org/abs/2402.13144 21 Feb, YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information , https://arxiv.org/abs/2402.13616 21 Feb, LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens , https://arxiv.org/abs/2402.13753 21 Feb, Large Language Models for Data Annotation: A Survey , https://arxiv.org/abs/2402.13446 22 Feb, TinyLLaVA: A Framework of Small-scale Large Multimodal Models , https://arxiv.org/abs/2402.14289 22 Feb, Back to Basics: Revisiting REINFORCE Style Optimization for Learning from Human Feedback in LLMs , https://arxiv.org/abs/2402.14740 23 Feb, Genie: Generative Interactive Environments , https://arxiv.org/abs/2402.15391 26 Feb, CARTE: Pretraining and Transfer for Tabular Learning , https://arxiv.org/abs/2402.16785 27 Feb, The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits , https://arxiv.org/abs/2402.17764 27 Feb, Sora Generates Videos with Stunning Geometrical Consistency , https://arxiv.org/abs/2402.17403 27 Feb, When Scaling Meets LLM Finetuning: The Effect of Data, Model and Finetuning Method , https://arxiv.org/abs/2402.17193 29 Feb, Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models , https://arxiv.org/abs/2402.19427 1 Mar, Learning and Leveraging World Models in Visual Representation Learning , https://arxiv.org/abs/2403.00504 3 Mar, Improving LLM Code Generation with Grammar Augmentation , 
https://arxiv.org/abs/2403.01632
3 Mar, The Hidden Attention of Mamba Models, https://arxiv.org/abs/2403.01590
4 Mar, Training-Free Pretrained Model Merging, https://arxiv.org/abs/2403.01753
4 Mar, Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures, https://arxiv.org/abs/2403.02308
5 Mar, The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning, https://arxiv.org/abs/2403.03218
5 Mar, Evolution Transformer: In-Context Evolutionary Optimization, https://arxiv.org/abs/2403.02985
5 Mar, Enhancing Vision-Language Pre-training with Rich Supervisions, https://arxiv.org/abs/2403.03346
5 Mar, Scaling Rectified Flow Transformers for High-Resolution Image Synthesis, https://arxiv.org/abs/2403.03206
5 Mar, Design2Code: How Far Are We From Automating Front-End Engineering?, https://arxiv.org/abs/2403.03163
6 Mar, ShortGPT: Layers in Large Language Models are More Redundant Than You Expect, https://arxiv.org/abs/2403.03853
6 Mar, Backtracing: Retrieving the Cause of the Query, https://arxiv.org/abs/2403.03956
6 Mar, Learning to Decode Collaboratively with Multiple Language Models, https://arxiv.org/abs/2403.03870
6 Mar, SaulLM-7B: A pioneering Large Language Model for Law, https://arxiv.org/abs/2403.03883
6 Mar, Are Language Models Puzzle Prodigies? Algorithmic Puzzles Unveil Serious Challenges in Multimodal Reasoning, https://arxiv.org/abs/2403.03864
6 Mar, 3D Diffusion Policy, https://arxiv.org/abs/2403.03954
6 Mar, MedMamba: Vision Mamba for Medical Image Classification, https://arxiv.org/abs/2403.03849
6 Mar, GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection, https://arxiv.org/abs/2403.03507
6 Mar, Stop Regressing: Training Value Functions via Classification for Scalable Deep RL, https://arxiv.org/abs/2403.03950
7 Mar, How Far Are We from Intelligent Visual Deductive Reasoning?, https://arxiv.org/abs/2403.04732
7 Mar, Common 7B Language Models Already Possess Strong Math Capabilities, https://arxiv.org/abs/2403.04706
8 Mar, Gemini 1.5: Unlocking Multimodal Understanding Across Millions of Tokens of Context, https://arxiv.org/abs/2403.05530
8 Mar, Is Cosine-Similarity of Embeddings Really About Similarity?, https://arxiv.org/abs/2403.05440
8 Mar, LLM4Decompile: Decompiling Binary Code with Large Language Models, https://arxiv.org/abs/2403.05286
9 Mar, Algorithmic Progress in Language Models, https://arxiv.org/abs/2403.05812
11 Mar, Stealing Part of a Production Language Model, https://arxiv.org/abs/2403.06634
12 Mar, Chronos: Learning the Language of Time Series, https://arxiv.org/abs/2403.07815
13 Mar, Simple and Scalable Strategies to Continually Pre-train Large Language Models, https://arxiv.org/abs/2403.08763
13 Mar, Language Models Scale Reliably With Over-Training and on Downstream Tasks, https://arxiv.org/abs/2403.08540
14 Mar, BurstAttention: An Efficient Distributed Attention Framework for Extremely Long Sequences, https://arxiv.org/abs/2403.09347
14 Mar, LocalMamba: Visual State Space Model with Windowed Selective Scan, https://arxiv.org/abs/2403.09338
14 Mar, GiT: Towards Generalist Vision Transformer through Universal Language Interface, https://arxiv.org/abs/2403.09394
14 Mar, MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training, https://arxiv.org/abs/2403.09611
15 Mar, RAFT: Adapting Language Model to Domain Specific RAG, https://arxiv.org/abs/2403.10131
18 Mar, TnT-LLM: Text Mining at Scale with Large Language Models, https://arxiv.org/abs/2403.12173
18 Mar, Decoding Compressed Trust: Scrutinizing the Trustworthiness of Efficient LLMs Under Compression, https://arxiv.org/abs/2403.15447
19 Mar, PERL: Parameter Efficient Reinforcement Learning from Human Feedback, https://arxiv.org/abs/2403.10704
20 Mar, RewardBench: Evaluating Reward Models for Language Modeling, https://arxiv.org/abs/2403.13787
20 Mar, LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models, https://arxiv.org/abs/2403.13372
21 Mar, RakutenAI-7B: Extending Large Language Models for Japanese, https://arxiv.org/abs/2403.15484
22 Mar, SiMBA: Simplified Mamba-Based Architecture for Vision and Multivariate Time Series, https://arxiv.org/abs/2403.15360
22 Mar, Can Large Language Models Explore In-Context?, https://arxiv.org/abs/2403.15371
22 Mar, LLM2LLM: Boosting LLMs with Novel Iterative Data Enhancement, https://arxiv.org/abs/2403.15042
25 Mar, LLM Agent Operating System, https://arxiv.org/abs/2403.16971
26 Mar, The Unreasonable Ineffectiveness of the Deeper Layers, https://arxiv.org/abs/2403.17887
27 Mar, BioMedLM: A 2.7B Parameter Language Model Trained On Biomedical Text, https://arxiv.org/abs/2403.18421
27 Mar, ViTAR: Vision Transformer with Any Resolution, https://arxiv.org/abs/2403.18361
27 Mar, Long-form Factuality in Large Language Models, https://arxiv.org/abs/2403.18802
27 Mar, Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models, https://arxiv.org/abs/2403.18814
26 Mar, LISA: Layerwise Importance Sampling for Memory-Efficient Large Language Model Fine-Tuning, https://arxiv.org/abs/2403.17919
26 Mar, Mechanistic Design and Scaling of Hybrid Architectures, https://arxiv.org/abs/2403.17844
28 Mar, MagicLens: Self-Supervised Image Retrieval with Open-Ended Instructions, https://arxiv.org/abs/2403.19651
28 Mar, Model Stock: All We Need Is Just a Few Fine-Tuned Models, https://arxiv.org/abs/2403.19522
1 Apr, Do Language Models Plan Ahead for Future Tokens?, https://arxiv.org/abs/2404.00859
1 Apr, Bigger is not Always Better: Scaling Properties of Latent Diffusion Models, https://arxiv.org/abs/2404.01367
1 Apr, The Fine Line: Navigating Large Language Model Pretraining with Down-streaming Capability Analysis, https://arxiv.org/abs/2404.01204
1 Apr, Diffusion-RWKV: Scaling RWKV-Like Architectures for Diffusion Models, https://arxiv.org/abs/2404.04478
2 Apr, Mixture-of-Depths: Dynamically Allocating Compute in Transformer-Based Language Models, https://arxiv.org/abs/2404.02258
2 Apr, Long-context LLMs Struggle with Long In-context Learning, https://arxiv.org/abs/2404.02060
2 Apr, Emergent Abilities in Reduced-Scale Generative Language Models, https://arxiv.org/abs/2404.02204
2 Apr, Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks, https://arxiv.org/abs/2404.02151
3 Apr, On the Scalability of Diffusion-based Text-to-Image Generation, https://arxiv.org/abs/2404.02883
3 Apr, BAdam: A Memory Efficient Full Parameter Training Method for Large Language Models, https://arxiv.org/abs/2404.02827
3 Apr, Cross-Attention Makes Inference Cumbersome in Text-to-Image Diffusion Models, https://arxiv.org/abs/2404.02747
4 Apr, Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences, https://arxiv.org/abs/2404.02151
4 Apr, Training LLMs over Neurally Compressed Text, https://arxiv.org/abs/2404.03626
4 Apr, CantTalkAboutThis: Aligning Language Models to Stay on Topic in Dialogues, https://arxiv.org/abs/2404.03820
5 Apr, ReFT: Representation Finetuning for Language Models, https://arxiv.org/abs/2404.03592
5 Apr, Verifiable by Design: Aligning Language Models to Quote from Pre-Training Data, https://arxiv.org/abs/2404.03862
5 Apr, Sigma: Siamese Mamba Network for Multi-Modal Semantic Segmentation, https://arxiv.org/abs/2404.04256
8 Apr, AutoCodeRover: Autonomous Program Improvement, https://arxiv.org/abs/2404.05427
8 Apr, Eagle and Finch: RWKV with Matrix-Valued States and Dynamic Recurrence, https://arxiv.org/abs/2404.05892
8 Apr, CodecLM: Aligning Language Models with Tailored Synthetic Data, https://arxiv.org/abs/2404.05875
9 Apr, MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies, https://arxiv.org/abs/2404.06395
9 Apr, Elephants Never Forget: Memorization and Learning of Tabular Data in Large Language Models, https://arxiv.org/abs/2404.06209
9 Apr, LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders, https://arxiv.org/abs/2404.05961
10 Apr, Adapting LLaMA Decoder to Vision Transformer, https://arxiv.org/abs/2404.06773
10 Apr, Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention, https://arxiv.org/abs/2404.07143
11 Apr, LLoCO: Learning Long Contexts Offline, https://arxiv.org/abs/2404.07979
11 Apr, JetMoE: Reaching Llama2 Performance with 0.1M Dollars, https://arxiv.org/abs/2404.07413
11 Apr, Best Practices and Lessons Learned on Synthetic Data for Language Models, https://arxiv.org/abs/2404.07503
11 Apr, Rho-1: Not All Tokens Are What You Need, https://arxiv.org/abs/2404.07965
12 Apr, Pre-training Small Base LMs with Fewer Tokens, https://arxiv.org/abs/2404.08634
12 Apr, Dataset Reset Policy Optimization for RLHF, https://arxiv.org/abs/2404.08495
13 Apr, LLM In-Context Recall is Prompt Dependent, https://arxiv.org/abs/2404.08865
15 Apr, State Space Model for New-Generation Network Alternative to Transformers: A Survey, https://arxiv.org/abs/2404.09516
15 Apr, Chinchilla Scaling: A Replication Attempt, https://arxiv.org/abs/2404.10102
15 Apr, Learn Your Reference Model for Real Good Alignment, https://arxiv.org/abs/2404.09656
16 Apr, Is DPO Superior to PPO for LLM Alignment? A Comprehensive Study, https://arxiv.org/abs/2404.10719
16 Apr, Scaling (Down) CLIP: A Comprehensive Analysis of Data, Architecture, and Training Strategies, https://arxiv.org/abs/2404.08197
16 Apr, How Faithful Are RAG Models? Quantifying the Tug-of-War Between RAG and LLMs' Internal Prior, https://arxiv.org/abs/2404.10198
17 Apr, A Survey on Retrieval-Augmented Text Generation for Large Language Models, https://arxiv.org/abs/2404.10981
18 Apr, When LLMs are Unfit Use FastFit: Fast and Effective Text Classification with Many Classes, https://arxiv.org/abs/2404.12365
18 Apr, Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing, https://arxiv.org/abs/2404.12253
18 Apr, OpenBezoar: Small, Cost-Effective and Open Models Trained on Mixes of Instruction Data, https://arxiv.org/abs/2404.12195
19 Apr, The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions, https://arxiv.org/abs/2404.13208
22 Apr, How Good Are Low-bit Quantized LLaMA3 Models? An Empirical Study, https://arxiv.org/abs/2404.14047
22 Apr, Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone, https://arxiv.org/abs/2404.14219
22 Apr, OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework, https://arxiv.org/abs/2404.14619
22 Apr, A Survey on Self-Evolution of Large Language Models, https://arxiv.org/abs/2404.14662
23 Apr, Multi-Head Mixture-of-Experts, https://arxiv.org/abs/2404.15045
23 Apr, NExT: Teaching Large Language Models to Reason about Code Execution, https://arxiv.org/abs/2404.14662
23 Apr, Graph Machine Learning in the Era of Large Language Models (LLMs), https://arxiv.org/abs/2404.14928
24 Apr, Retrieval Head Mechanistically Explains Long-Context Factuality, https://arxiv.org/abs/2404.15574
25 Apr, Layer Skip: Enabling Early Exit Inference and Self-Speculative Decoding, https://arxiv.org/abs/2404.16710
25 Apr, Make Your LLM Fully Utilize the Context, https://arxiv.org/abs/2404.16811
28 Apr, LoRA Land: 310 Fine-tuned LLMs that Rival GPT-4, A Technical Report, https://arxiv.org/abs/2405.00732
30 Apr, Better & Faster Large Language Models via Multi-token Prediction, https://arxiv.org/abs/2404.19737
30 Apr, RAG and RAU: A Survey on Retrieval-Augmented Language Model in Natural Language Processing, https://arxiv.org/abs/2404.19543
30 Apr, A Primer on the Inner Workings of Transformer-based Language Models, https://arxiv.org/abs/2405.00208
30 Apr, When to Retrieve: Teaching LLMs to Utilize Information Retrieval Effectively, https://arxiv.org/abs/2404.19705
30 Apr, KAN: Kolmogorov–Arnold Networks, https://arxiv.org/abs/2404.19756
1 May, Is Bigger Edit Batch Size Always Better? An Empirical Study on Model Editing with Llama-3, https://arxiv.org/abs/2405.00664
1 May, Self-Play Preference Optimization for Language Model Alignment, https://arxiv.org/abs/2405.00675
1 May, A Careful Examination of Large Language Model Performance on Grade School Arithmetic, https://arxiv.org/abs/2405.00332
2 May, Prometheus 2: An Open Source Language Model Specialized in Evaluating Other Language Models, https://arxiv.org/abs/2405.01535
3 May, What Matters When Building Vision-Language Models?, https://arxiv.org/abs/2405.02246
5 May, Is Flash Attention Stable?, https://arxiv.org/abs/2405.02803
7 May, vAttention: Dynamic Memory Management for Serving LLMs without PagedAttention, https://arxiv.org/abs/2405.04437
7 May, xLSTM: Extended Long Short-Term Memory, https://arxiv.org/abs/2405.04517
8 May, You Only Cache Once: Decoder-Decoder Architectures for Language Models, https://arxiv.org/abs/2405.05254
8 May, DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model, https://arxiv.org/abs/2405.04434
8 May, Fishing for Magikarp: Automatically Detecting Under-trained Tokens in Large Language Models, https://arxiv.org/abs/2405.05417
9 May, Does Fine-Tuning LLMs on New Knowledge Encourage Hallucinations?, https://arxiv.org/abs/2405.05904
10 May, Value Augmented Sampling for Language Model Alignment and Personalization, https://arxiv.org/abs/2405.06639
12 May, PHUDGE: Phi-3 as Scalable Judge, https://arxiv.org/abs/2405.08029
13 May, RLHF Workflow: From Reward Modeling to Online RLHF, https://arxiv.org/abs/2405.07863
15 May, LoRA Learns Less and Forgets Less, https://arxiv.org/abs/2405.09673
15 May, Xmodel-VLM: A Simple Baseline for Multimodal Vision Language Model, https://arxiv.org/abs/2405.09215
16 May, Chameleon: Mixed-Modal Early-Fusion Foundation Models, https://arxiv.org/abs/2405.09818
17 May, Towards Modular LLMs by Building and Reusing a Library of LoRAs, https://arxiv.org/abs/2405.11157
19 May, SLAB: Efficient Transformers with Simplified Linear Attention and Progressive Re-parameterized Batch Normalization, https://arxiv.org/abs/2405.11582
20 May, MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning, https://arxiv.org/abs/2405.12130
22 May, Attention as an RNN, https://arxiv.org/abs/2405.13956
22 May, Dense Connector for MLLMs, https://arxiv.org/abs/2405.13800
23 May, AlignGPT: Multi-modal Large Language Models with Adaptive Alignment Capability, https://arxiv.org/abs/2405.14129
23 May, SimPO: Simple Preference Optimization with a Reference-Free Reward, https://arxiv.org/abs/2405.14734
23 May, Instruction Tuning With Loss Over Instructions, https://arxiv.org/abs/2405.14394
24 May, The Road Less Scheduled, https://arxiv.org/abs/2405.15682
26 May, Stacking Your Transformers: A Closer Look at Model Growth for Efficient LLM Pre-Training, https://arxiv.org/abs/2405.15319
26 May, gzip Predicts Data-dependent Scaling Laws, https://arxiv.org/abs/2405.16684
27 May, Trans-LoRA: Towards Data-free Transferable Parameter Efficient Finetuning, https://arxiv.org/abs/2405.17258
28 May, VeLoRA: Memory Efficient Training using Rank-1 Sub-Token Projections, https://arxiv.org/abs/2405.17991
28 May, LLaMA-NAS: Efficient Neural Architecture Search for Large Language Models, https://arxiv.org/abs/2405.18377
29 May, Contextual Position Encoding: Learning to Count What's Important, https://arxiv.org/abs/2405.18719
2 Jun, Show, Don't Tell: Aligning Language Models with Demonstrated Feedback, https://arxiv.org/abs/2406.00888
3 Jun, Skywork-MoE: A Deep Dive into Training Techniques for Mixture-of-Experts Language Models, https://arxiv.org/abs/2406.06563
3 Jun, OLoRA: Orthonormal Low-Rank Adaptation of Large Language Models, https://arxiv.org/abs/2406.01775
3 Jun, The Geometry of Categorical and Hierarchical Concepts in Large Language Models, https://arxiv.org/abs/2406.01506
3 Jun, Towards Scalable Automated Alignment of LLMs: A Survey, https://arxiv.org/abs/2406.01252
4 Jun, Scalable MatMul-free Language Modeling, https://arxiv.org/abs/2406.02528
4 Jun, Block Transformer: Global-to-Local Language Modeling for Fast Inference, https://arxiv.org/abs/2406.02657
6 Jun, Buffer of Thoughts: Thought-Augmented Reasoning with Large Language Models, https://arxiv.org/abs/2406.04271
6 Jun, The Prompt Report: A Systematic Survey of Prompting Techniques, https://arxiv.org/abs/2406.06608
6 Jun, Transformers Need Glasses! Information Over-Squashing in Language Tasks, https://arxiv.org/abs/2406.04267
6 Jun, Are We Done with MMLU?, https://arxiv.org/abs/2406.04127
6 Jun, Step-aware Preference Optimization: Aligning Preference with Denoising Performance at Each Step, https://arxiv.org/abs/2406.04314
7 Jun, Boosting Large-scale Parallel Training Efficiency with C4: A Communication-Driven Approach, https://arxiv.org/abs/2406.04594
7 Jun, CRAG -- Comprehensive RAG Benchmark, https://arxiv.org/abs/2406.04744
7 Jun, WildBench: Benchmarking LLMs with Challenging Tasks from Real Users in the Wild, https://arxiv.org/abs/2406.04770
7 Jun, Mixture-of-Agents Enhances Large Language Model Capabilities, https://arxiv.org/abs/2406.04692
7 Jun, BERTs are Generative In-Context Learners, https://arxiv.org/abs/2406.04823
7 Jun, 3D-GRAND: A Million-Scale Dataset for 3D-LLMs with Better Grounding and Less Hallucination, https://arxiv.org/abs/2406.05132
8 Jun, Creativity Has Left the Chat: The Price of Debiasing Language Models, https://arxiv.org/abs/2406.05587
10 Jun, Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation, https://arxiv.org/abs/2406.06525
10 Jun, Margin-aware Preference Optimization for Aligning Diffusion Models Without Reference, https://arxiv.org/abs/2406.06424
10 Jun, Husky: A Unified, Open-Source Language Agent for Multi-Step Reasoning, https://arxiv.org/abs/2406.06469
10 Jun, Turbo Sparse: Achieving LLM SOTA Performance with Minimal Activated Parameters, https://arxiv.org/abs/2406.05955
10 Jun, Self-Tuning: Instructing LLMs to Effectively Acquire New Knowledge through Self-Teaching, https://arxiv.org/abs/2406.06326
11 Jun, An Image is Worth 32 Tokens for Reconstruction and Generation, https://arxiv.org/abs/2406.07550
11 Jun, TextGrad: Automatic "Differentiation" via Text, https://arxiv.org/abs/2406.07496
11 Jun, Simple and Effective Masked Diffusion Language Models, https://arxiv.org/abs/2406.07524
11 Jun, Never Miss A Beat: An Efficient Recipe for Context Window Extension of Large Language Models with Consistent "Middle" Enhancement, https://arxiv.org/abs/2406.07138
11 Jun, Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling, https://arxiv.org/abs/2406.07522
12 Jun, Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing, https://arxiv.org/abs/2406.08464
12 Jun, What If We Recaption Billions of Web Images with LLaMA-3?, https://arxiv.org/abs/2406.08478
12 Jun, Large Language Model Unlearning via Embedding-Corrupted Prompts, https://arxiv.org/abs/2406.07933
12 Jun, Large Language Models Must Be Taught to Know What They Don't Know, https://arxiv.org/abs/2406.08391
12 Jun, An Empirical Study of Mamba-based Language Models, https://arxiv.org/abs/2406.07887
12 Jun, Discovering Preference Optimization Algorithms with and for Large Language Models, https://arxiv.org/abs/2406.08414
13 Jun, Transformers Meet Neural Algorithmic Reasoners, https://arxiv.org/abs/2406.09308
13 Jun, MLKV: Multi-Layer Key-Value Heads for Memory Efficient Transformer Decoding, https://arxiv.org/abs/2406.09297
13 Jun, An Image is Worth More Than 16x16 Patches: Exploring Transformers on Individual Pixels, https://arxiv.org/abs/2406.09415
13 Jun, FouRA: Fourier Low Rank Adaptation, https://arxiv.org/abs/2406.08798
14 Jun, Bootstrapping Language Models with DPO Implicit Rewards, https://arxiv.org/abs/2406.09760
14 Jun, Be like a Goldfish, Don't Memorize! Mitigating Memorization in Generative LLMs, https://arxiv.org/abs/2406.10209
14 Jun, Regularizing Hidden States Enables Learning Generalizable Reward Model for LLMs, https://arxiv.org/abs/2406.10216
16 Jun, THEANINE: Revisiting Memory Management in Long-term Conversations with Timeline-augmented Response Generation, https://arxiv.org/abs/2406.10996
17 Jun, Task Me Anything, https://arxiv.org/abs/2406.11775
17 Jun, How Do Large Language Models Acquire Factual Knowledge During Pretraining?, https://arxiv.org/abs/2406.11813
17 Jun, mDPO: Conditional Preference Optimization for Multimodal Large Language Models, https://arxiv.org/abs/2406.11839
17 Jun, Nemotron-4 340B Technical Report, https://arxiv.org/abs/2406.11704
17 Jun, DataComp-LM: In Search of the Next Generation of Training Sets for Language Models, https://arxiv.org/abs/2406.11794
17 Jun, Tokenization Falling Short: The Curse of Tokenization, https://arxiv.org/abs/2406.11687
17 Jun, DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence, https://arxiv.org/abs/2406.11931
17 Jun, Unveiling Encoder-Free Vision-Language Models, https://arxiv.org/abs/2406.11832
17 Jun, Iterative Length-Regularized Direct Preference Optimization: A Case Study on Improving 7B Language Models to GPT-4 Level, https://arxiv.org/abs/2406.11817
17 Jun, HARE: HumAn pRiors, a key to small language model Efficiency, https://arxiv.org/abs/2406.11410
17 Jun, Measuring memorization in RLHF for code completion, https://arxiv.org/abs/2406.11715
17 Jun, Self-MoE: Towards Compositional Large Language Models with Self-Specialized Experts, https://arxiv.org/abs/2406.12034
18 Jun, From RAGs to Rich Parameters: Probing How Language Models Utilize External Knowledge Over Parametric Information for Factual Queries, https://arxiv.org/abs/2406.12824
18 Jun, Judging the Judges: Evaluating Alignment and Vulnerabilities in LLMs-as-Judges, https://arxiv.org/abs/2406.12624
19 Jun, Can Long-Context Language Models Subsume Retrieval, RAG, SQL, and More?, https://arxiv.org/abs/2406.13121
20 Jun, Instruction Pre-Training: Language Models are Supervised Multitask Learners, https://arxiv.org/abs/2406.14491
20 Jun, Can LLMs Learn by Teaching? A Preliminary Study, https://arxiv.org/abs/2406.14629
21 Jun, A Tale of Trust and Accuracy: Base vs. Instruct LLMs in RAG Systems, https://arxiv.org/abs/2406.14972
21 Jun, LongRAG: Enhancing Retrieval-Augmented Generation with Long-context LLMs, https://arxiv.org/abs/2406.15319
21 Jun, MoA: Mixture of Sparse Attention for Automatic Large Language Model Compression, https://arxiv.org/abs/2406.14909
21 Jun, Efficient Continual Pre-training by Mitigating the Stability Gap, https://arxiv.org/abs/2406.14833
24 Jun, Sparser is Faster and Less is More: Efficient Sparse Attention for Long-Range Transformers, https://arxiv.org/abs/2406.16747
24 Jun, WARP: On the Benefits of Weight Averaged Rewarded Policies, https://arxiv.org/abs/2406.16768
24 Jun, Adam-mini: Use Fewer Learning Rates To Gain More, https://arxiv.org/abs/2406.16793
25 Jun, The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale, https://arxiv.org/abs/2406.17557
25 Jun, LongIns: A Challenging Long-context Instruction-based Exam for LLMs, https://arxiv.org/abs/2406.17588
25 Jun, Following Length Constraints in Instructions, https://arxiv.org/abs/2406.17744
26 Jun, A Closer Look into Mixture-of-Experts in Large Language Models, https://arxiv.org/abs/2406.18219
26 Jun, RouteLLM: Learning to Route LLMs with Preference Data, https://arxiv.org/abs/2406.18665
26 Jun, Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs, https://arxiv.org/abs/2406.18629
27 Jun, Dataset Size Recovery from LoRA Weights, https://arxiv.org/abs/2406.19395
27 Jun, From Artificial Needles to Real Haystacks: Improving Retrieval Capabilities in LLMs by Finetuning on Synthetic Data, https://arxiv.org/abs/2406.19292
27 Jun, Changing Answer Order Can Decrease MMLU Accuracy, https://arxiv.org/abs/2406.19470
28 Jun, Direct Preference Knowledge Distillation for Large Language Models, https://arxiv.org/abs/2406.19774
28 Jun, LLM Critics Help Catch LLM Bugs, https://arxiv.org/abs/2407.00215
28 Jun, Scaling Synthetic Data Creation with 1,000,000,000 Personas, https://arxiv.org/abs/2406.20094
1 Jul, LLM See, LLM Do: Guiding Data Generation to Target Non-Differentiable Objectives , https://arxiv.org/abs/2407.01490 1 Jul, Searching for Best Practices in Retrieval-Augmented Generation , https://arxiv.org/abs/2407.01219 1 Jul, Let the Expert Stick to His Last: Expert-Specialized Fine-Tuning for Sparse Architectural Large Language Models , https://arxiv.org/abs/2407.01906 1 Jul, Diffusion Forcing: Next-token Prediction Meets Full-Sequence Diffusion , https://arxiv.org/abs/2407.01392 1 Jul, Eliminating Position Bias of Language Models: A Mechanistic Approach , https://arxiv.org/abs/2407.01100 2 Jul, JMInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention , https://arxiv.org/abs/2407.02490 2 Jul, TokenPacker: Efficient Visual Projector for Multimodal LLM , https://arxiv.org/abs/2407.02392 2 Jul, Reasoning in Large Language Models: A Geometric Perspective , https://arxiv.org/abs/2407.02678 2 Jul, RankRAG: Unifying Context Ranking with Retrieval-Augmented Generation in LLMs , https://arxiv.org/abs/2407.02485 3 Jul, AgentInstruct: Toward Generative Teaching with Agentic Flows , https://arxiv.org/abs/2407.03502 3 Jul, HEMM: Holistic Evaluation of Multimodal Foundation Models , https://arxiv.org/abs/2407.03418 4 Jul, Mixture of A Million Experts , https://arxiv.org/abs/2407.04153 5 Jul, Learning to (Learn at Test Time): RNNs with Expressive Hidden States , https://arxiv.org/abs/2407.04620 9 Jul, Vision Language Models Are Blind , https://arxiv.org/abs/2407.06581 9 Jul, Self-Recognition in Language Models , https://arxiv.org/abs/2407.06946 10 Jul, Inference Performance Optimization for Large Language Models on CPUs , https://arxiv.org/abs/2407.07304 11 Jul, Gradient Boosting Reinforcement Learning , https://arxiv.org/abs/2407.08250 11 Jul, FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision , https://arxiv.org/abs/2407.08608 12 Jul, SpreadsheetLLM: Encoding Spreadsheets for Large Language Models , 
https://arxiv.org/abs/2407.09025 12 Jul, New Desiderata for Direct Preference Optimization , https://arxiv.org/abs/2407.09072 12 Jul, Context Embeddings for Efficient Answer Generation in RAG , https://arxiv.org/abs/2407.09252 15 Jul, Qwen2 Technical Report , https://arxiv.org/abs/2407.10671 15 Jul, The Good, The Bad, and The Greedy: Evaluation of LLMs Should Not Ignore Non-Determinism , https://arxiv.org/abs/2407.10457 15 Jul, From GaLore to WeLore: How Low-Rank Weights Non-uniformly Emerge from Low-Rank Gradients , https://arxiv.org/abs/2407.11239 16 Jul, GoldFinch: High Performance RWKV/Transformer Hybrid with Linear Pre-Fill and Extreme KV-Cache Compression , https://arxiv.org/abs/2407.12077 16 Jul, Scaling Diffusion Transformers to 16 Billion Parameters , https://arxiv.org/abs/2407.11633 16 Jul, NeedleBench: Can LLMs Do Retrieval and Reasoning in 1 Million Context Window? , https://arxiv.org/abs/2407.11963 17 Jul, Patch-Level Training for Large Language Models , https://arxiv.org/abs/2407.12665 17 Jul, LMMs-Eval: Reality Check on the Evaluation of Large Multimodal Models , https://arxiv.org/abs/2407.12772 17 Jul, A Survey of Prompt Engineering Methods in Large Language Models for Different NLP Tasks , https://arxiv.org/abs/2407.12994 17 Jul, Spectra: A Comprehensive Study of Ternary, Quantized, and FP16 Language Models , https://arxiv.org/abs/2407.12327 18 Jul, Attention Overflow: Language Model Input Blur during Long-Context Missing Items Recommendation , https://arxiv.org/abs/2407.13481 18 Jul, Weak-to-Strong Reasoning , https://arxiv.org/abs/2407.13647 18 Jul, Understanding Reference Policies in Direct Preference Optimization , https://arxiv.org/abs/2407.13709 18 Jul, Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies , https://arxiv.org/abs/2407.13623 19 Jul, BOND: Aligning LLMs with Best-of-N Distillation , https://arxiv.org/abs/2407.14622 19 Jul, Compact Language Models via Pruning and Knowledge Distillation , 
https://arxiv.org/abs/2407.14679 19 Jul, LazyLLM: Dynamic Token Pruning for Efficient Long Context LLM Inference , https://arxiv.org/abs/2407.14057 22 Jul, Mini-Sequence Transformer: Optimizing Intermediate Memory for Long Sequences Training , https://arxiv.org/abs/2407.15892 22 Jul, DDK: Distilling Domain Knowledge for Efficient Large Language Models , https://arxiv.org/abs/2407.16154 23 Jul, Generation Constraint Scaling Can Mitigate Hallucination , https://arxiv.org/abs/2407.16908 23 Jul, Retrieval Augmented Generation or Long-Context LLMs? A Comprehensive Study and Hybrid Approach , https://arxiv.org/abs/2407.16833 23 Jul, Course-Correction: Safety Alignment Using Synthetic Preferences , https://arxiv.org/abs/2407.16637 26 Jul, Data Mixture Inference: What do BPE Tokenizers Reveal about their Training Data? , https://arxiv.org/abs/2407.16607 28 Jul, Meta-Rewarding Language Models: Self-Improving Alignment with LLM-as-a-Meta-Judge , https://arxiv.org/abs/2407.19594 29 Jul, Improving Retrieval Augmented Language Model with Self-Reasoning , https://arxiv.org/abs/2407.19813 29 Jul, Apple Intelligence Foundation Language Models , https://arxiv.org/abs/2407.21075 30 Jul, ThinK: Thinner Key Cache by Query-Driven Pruning , https://arxiv.org/abs/2407.21018 31 Jul, The Llama 3 Herd of Models , https://arxiv.org/abs/2407.21783 31 Jul, Gemma 2: Improving Open Language Models at a Practical Size , https://arxiv.org/abs/2408.00118 1 Aug, S AM 2: Segment Anything in Images and Videos, https://arxiv.org/abs/2408.00714 2 Aug, POA: Pre-training Once for Models of All Sizes, https://arxiv.org/abs/2408.01031 2 Aug, RAGEval: Scenario Specific RAG Evaluation Dataset Generation Framework, https://arxiv.org/abs/2408.01262 2 Aug, A Survey of Mamba, https://arxiv.org/abs/2408.01129 3 Aug, MiniCPM-V: A GPT-4V Level MLLM on Your Phone, https://arxiv.org/abs/2408.01800 5 Aug, RAG Foundry: A Framework for Enhancing LLMs for Retrieval Augmented Generation, https://arxiv.org/abs/2408.02545 5 
Aug, Self-Taught Evaluators, https://arxiv.org/abs/2408.02666 5 Aug, BioMamba: A Pre-trained Biomedical Language Representation Model Leveraging Mamba, https://arxiv.org/abs/2408.02600 5 Aug, Self-Taught Evaluators, https://arxiv.org/abs/2408.02666 7 Aug, EXAONE 3.0 7.8B Instruction Tuned Language Model, https://arxiv.org/abs/2408.03541 7 Aug, 1.5-Pints Technical Report: Pretraining in Days, Not Months -- Your Language Model Thrives on Quality Data, https://arxiv.org/abs/2408.03506 8 Aug, Conversational Prompt Engineering, https://arxiv.org/abs/2408.04560 8 Aug, Trans-Tokenization and Cross-lingual Vocabulary Transfers: Language Adaptation of LLMs for Low-Resource NLP, https://arxiv.org/abs/2408.04303 12 Aug, The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery, https://arxiv.org/abs/2408.06292 15 Aug, Hermes 3 Technical Report, https://arxiv.org/abs/2408.12570 19 Aug, Customizing Language Models with Instance-wise LoRA for Sequential Recommendation, https://arxiv.org/abs/2408.10159 20 Aug , Enhancing Robustness in Large Language Models: Prompting for Mitigating the Impact of Irrelevant Information, https://arxiv.org/abs/2408.10615 20 Aug, To Code, or Not To Code? 
Exploring Impact of Code in Pre-training, https://arxiv.org/abs/2408.10914 21 Aug , LLM Pruning and Distillation in Practice: The Minitron Approach, https://arxiv.org/abs/2408.11796 22 Aug, Jamba-1.5: Hybrid Transformer-Mamba Models at Scale, https://arxiv.org/abs/2408.12570 22 Aug, Controllable Text Generation for Large Language Models: A Survey, https://arxiv.org/abs/2408.12599 23 Aug, Multi-Layer Transformers Gradient Can be Approximated in Almost Linear Time, https://arxiv.org/abs/2408.13233 26 Aug, A Practitioner's Guide to Continual Multimodal Pretraining, https://arxiv.org/abs/2408.14471 26 Aug, Building and better understanding vision-language models: insights and future directions, https://arxiv.org/abs/2408.12637 26 Aug, CURLoRA: Stable LLM Continual Fine-Tuning and Catastrophic Forgetting Mitigation, https://arxiv.org/abs/2408.14572 27 Aug, The Mamba in the Llama: Distilling and Accelerating Hybrid Models, https://arxiv.org/abs/2408.15237 28 Aug, ReMamba: Equip Mamba with Effective Long-Sequence Modeling, https://arxiv.org/abs/2408.15496 29 Aug, Smaller, Weaker, Yet Better: Training LLM Reasoners via Compute-Optimal Sampling, https://arxiv.org/abs/2408.16737 31 Aug, LongRecipe: Recipe for Efficient Long Context Generalization in Large Languge Models, https://arxiv.org/abs/2409.00509 3 Sep, OLMoE: Open Mixture-of-Experts Language Models, https://arxiv.org/abs/2409.02060 3 Sep 2024, In Defense of RAG in the Era of Long-Context Language Models, https://arxiv.org/abs/2409.01666 5 Sep, Attention Heads of Large Language Models: A Survey, https://arxiv.org/abs/2409.03752 5 Sep, LongCite: Enabling LLMs to Generate Fine-grained Citations in Long-context QA , https://arxiv.org/abs/2409.02897 5 Sep, How Do Your Code LLMs Perform? 
Empowering Code Instruction Tuning with High-Quality Data, https://arxiv.org/abs/2409.03810 6 Sep, T heory, Analysis, and Best Practices for Sigmoid Self-Attention, https://arxiv.org/abs/2409.04431 10 Sep, LLaMA-Omni: Seamless Speech Interaction with Large Language Models, https://arxiv.org/abs/2409.06666 10 Sep, What is the Role of Small Models in the LLM Era: A Survey, https://arxiv.org/abs/2409.06857 11 Sep, Policy Filtration in RLHF to Fine-Tune LLM for Code Generation, https://arxiv.org/abs/2409.06957 16 Sep, RetrievalAttention: Accelerating Long-Context LLM Inference via Vector Retrieval , https://arxiv.org/abs/2409.10516 18 Sep, Qwen2.5-Math Technical Report: Toward Mathematical Expert Model via Self-Improvement , https://arxiv.org/abs/2409.12122 18 Sep, Qwen2.5-Coder Technical Report , https://arxiv.org/abs/2409.12186 21 Sep, Instruction Following without Instruction Tuning, https://arxiv.org/abs/2409.14254 30 Sep, I s Preference Alignment Always the Best Option to Enhance LLM-Based Translation? An Empirical Analysis, https://arxiv.org/abs/2409.20059 30 Sep, The Perfect Blend: Redefining RLHF with Mixture of Judges, https://arxiv.org/abs/2409.20370 (New paper by Meta on how they did RLHF for Llama 3) 1 Oct, Addition is All You Need for Energy-efficient Language Models, https://arxiv.org/abs/2410.00907 2 Oct Quantifying Generalization Complexity for Large Language Models, https://arxiv.org/abs/2410.01769 2 Oct, When a language model is optimized for reasoning, does it still show embers of autoregression? An analysis of OpenAI o1 , https://arxiv.org/abs/2410.01792 2 Oct, W ere RNNs All We Needed? 
, https://arxiv.org/abs/2410.01201 3 Oct, Selective Attention Improves Transformer , https://arxiv.org/abs/2410.02703 3 Oct, LLMs Know More Than They Show: On the Intrinsic Representation of LLM Hallucinations , https://arxiv.org/abs/2410.02707 3 Oct, LLaVA-Critic: Learning to Evaluate Multimodal Models , https://arxiv.org/abs/2410.02712 7 Oct, Differential Transformer , https://arxiv.org/abs/2410.05258 7 Oct, GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models , https://arxiv.org/abs/2410.05229 8 Oct, ARIA: An Open Multimodal Native Mixture-of-Experts Model , https://arxiv.org/abs/2410.05993 8 Oct, O1 Replication Journey: A Strategic Progress Report -- Part 1 , https://arxiv.org/abs/2410.18982 8 Oct, Long-Context LLMs Meet RAG: Overcoming Challenges for Long Inputs in RAG, https://arxiv.org/abs/2410.05983 9 Oct, From Generalist to Specialist: Adapting Vision Language Models via Task-Specific Visual Instruction Tuning , https://arxiv.org/abs/2410.06456 10 Oct, KV Prediction for Improved Time to First Token , https://arxiv.org/abs/2410.08391 11 Oct, Baichuan-Omni Technical Report , https://arxiv.org/abs/2410.08565 13 Oct, MMIE: Massive Multimodal Interleaved Comprehension Benchmark for Large Vision-Language Models , https://arxiv.org/abs/2410.10139 13 Oct, LOKI: A Comprehensive Synthetic Data Detection Benchmark using Large Multimodal Models , https://arxiv.org/abs/2410.09732 15 Oct, AFlow: Automating Agentic Workflow Generation , https://arxiv.org/abs/2410.10762 15 Oct, Toward General Instruction-Following Alignment for Retrieval-Augmented Generation , https://arxiv.org/abs/2410.09584 21 Oct, Pre-training Distillation for Large Language Models: A Design Space Exploration , https://arxiv.org/abs/2410.16215 23 Oct, MIA-DPO: Multi-Image Augmented Direct Preference Optimization For Large Vision-Language Models , https://arxiv.org/abs/2410.17637 23 Oct, Scalable Ranked Preference Optimization for Text-to-Image Generation , 
https://arxiv.org/abs/2410.18013 23 Oct, Scaling Diffusion Language Models via Adaptation from Autoregressive Models , https://arxiv.org/abs/2410.17891 24 Oct, Hybrid Preferences: Learning to Route Instances for Human vs. AI Feedback , https://arxiv.org/abs/2410.19133 25 Oct, Counting Ability of Large Language Models and Impact of Tokenization , https://arxiv.org/abs/2410.19730 25 Oct, A Survey of Small Language Models , https://arxiv.org/abs/2410.20011 26 Oct, Accelerating Direct Preference Optimization with Prefix Sharing , https://arxiv.org/abs/2410.20305 27 Oct, Mind Your Step (by Step): Chain-of-Thought can Reduce Performance on Tasks where Thinking Makes Humans Worse , https://arxiv.org/abs/2410.21333 28 Oct, LongReward: Improving Long-context Large Language Models with AI Feedback , https://arxiv.org/abs/2410.21252 28 Oct, ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference , https://arxiv.org/abs/2410.21465 29 Oct, Beyond Text: Optimizing RAG with Multimodal Inputs for Industrial Applications , https://arxiv.org/abs/2410.21943 30 Oct, CORAL: Benchmarking Multi-turn Conversational Retrieval-Augmentation Generation , https://arxiv.org/abs/2410.23090 31 Oct, What Happened in LLMs Layers when Trained for Fast vs. Slow Thinking: A Gradient Perspective , https://arxiv.org/abs/2410.23743 31 Oct, GPT or BERT: why not both? 
, https://arxiv.org/abs/2410.24159 31 Oct, Language Models can Self-Lengthen to Generate Long Texts , https://arxiv.org/abs/2410.23933 1 Nov, Adding Error Bars to Evals: A Statistical Approach to Language Model Evaluations , https://arxiv.org/abs/2411.00640 1 Nov 2024, Adapting While Learning: Grounding LLMs for Scientific Problems with Intelligent Tool Usage Adaptation , https://arxiv.org/abs/2411.00412 1 Nov 2024, Multi-expert Prompting Improves Reliability, Safety, and Usefulness of Large Language Models , https://arxiv.org/abs/2411.00492 3 Nov, Sample-Efficient Alignment for LLMs , https://arxiv.org/abs/2411.01493 4 Nov 2024, A Comprehensive Survey of Small Language Models in the Era of Large Language Models: Techniques, Enhancements, Applications, Collaboration with LLMs, and Trustworthiness , https://arxiv.org/abs/2411.03350 4 Nov, "Give Me BF16 or Give Me Death"? Accuracy-Performance Trade-Offs in LLM Quantization , https://arxiv.org/abs/2411.02355 4 Nov, Parameter-Efficient Fine-Tuning of Large Language Models for Unit Test Generation: An Empirical Study , https://arxiv.org/abs/2411.02462 5 Nov, HtmlRAG: HTML is Better Than Plain Text for Modeling Retrieved Knowledge in RAG Systems , https://arxiv.org/abs/2411.02959 6 Nov, Both Text and Images Leaked!
A Systematic Analysis of Multimodal LLM Data Contamination , https://arxiv.org/abs/2411.03823 6 Nov, Language Models are Hidden Reasoners: Unlocking Latent Reasoning Capabilities via Self-Rewarding , https://arxiv.org/abs/2411.04282 6 Nov, Number Cookbook: Number Understanding of Language Models and How to Improve It , https://arxiv.org/abs/2411.03766 7 Nov, Mixture-of-Transformers: A Sparse and Scalable Architecture for Multi-Modal Foundation Models , https://arxiv.org/abs/2411.04996 7 Nov, BitNet a4.8: 4-bit Activations for 1-bit LLMs , https://arxiv.org/abs/2411.04965 7 Nov, Scaling Laws for Precision , https://arxiv.org/abs/2411.04330 8 Nov, Energy Efficient Protein Language Models: Leveraging Small Language Models with LoRA for Controllable Protein Generation , https://arxiv.org/abs/2411.05966 8 Nov, Balancing Pipeline Parallelism with Vocabulary Parallelism , https://arxiv.org/abs/2411.05288 11 Nov, Toward Optimal Search and Retrieval for RAG , https://arxiv.org/abs/2411.07396 12 Nov, Large Language Models Can Self-Improve in Long-context Reasoning , https://arxiv.org/abs/2411.08147 12 Nov, Stronger Models are NOT Stronger Teachers for Instruction Tuning , https://arxiv.org/abs/2411.07133 12 Nov, Direct Preference Optimization Using Sparse Feature-Level Constraints , https://arxiv.org/abs/2411.07618 13 Nov, Cut Your Losses in Large-Vocabulary Language Models , https://arxiv.org/abs/2411.09009 15 Nov, Does Prompt Formatting Have Any Impact on LLM Performance? 
, https://arxiv.org/abs/2411.10541 17 Nov, SymDPO: Boosting In-Context Learning of Large Multimodal Models with Symbol Demonstration Direct Preference Optimization , https://arxiv.org/abs/2411.11909 17 Nov, SageAttention2 Technical Report: Accurate 4 Bit Attention for Plug-and-play Inference Acceleration , https://arxiv.org/abs/2411.10958 18 Nov, Bi-Mamba: Towards Accurate 1-Bit State Space Models , https://arxiv.org/abs/2411.11843 19 Nov, RedPajama: an Open Dataset for Training Large Language Models, https://arxiv.org/abs/2411.12372 20 Nov, Hymba: A Hybrid-head Architecture for Small Language Models , https://arxiv.org/abs/2411.13676 20 Nov, Loss-to-Loss Prediction: Scaling Laws for All Datasets , https://arxiv.org/abs/2411.12925 21 Nov, When Precision Meets Position: BFloat16 Breaks Down RoPE in Long-Context Training , https://arxiv.org/abs/2411.13476 21 Nov, Multimodal Autoregressive Pre-training of Large Vision Encoders , https://arxiv.org/abs/2411.14402 21 Nov, Natural Language Reinforcement Learning , https://arxiv.org/abs/2411.14251 22 Nov, Large Multi-modal Models Can Interpret Features in Large Multi-modal Models , https://arxiv.org/abs/2411.14982 22 Nov, TÜLU 3: Pushing Frontiers in Open Language Model Post-Training , https://arxiv.org/abs/2411.15124 23 Nov, MME-Survey: A Comprehensive Survey on Evaluation of Multimodal LLMs , https://arxiv.org/abs/2411.15296 24 Nov, LLMs Do Not Think Step-by-step In Implicit Reasoning , https://arxiv.org/abs/2411.15862 25 Nov, O1 Replication Journey -- Part 2: Surpassing O1-preview through Simple Distillation, Big Progress or Bitter Lesson? 
, https://arxiv.org/abs/2411.16489 26 Nov, Star Attention: Efficient LLM Inference over Long Sequences , https://arxiv.org/abs/2411.17116 27 Nov, Low-Bit Quantization Favors Undertrained LLMs: Scaling Laws for Quantized LLMs with 100T Training Tokens , https://arxiv.org/abs/2411.17691 27 Nov, Rethinking Token Reduction in MLLMs: Towards a Unified Paradigm for Training-Free Acceleration , https://arxiv.org/abs/2411.17686 29 Nov, Reverse Thinking Makes LLMs Stronger Reasoners , https://arxiv.org/abs/2411.19865 29 Nov, Critical Tokens Matter: Token-Level Contrastive Estimation Enhances LLM's Reasoning Capability , https://arxiv.org/abs/2411.19943 2 Dec, Switti: Designing Scale-Wise Transformers for Text-to-Image Synthesis , https://arxiv.org/abs/2412.01819 2 Dec, X-Prompt: Towards Universal In-Context Image Generation in Auto-Regressive Vision Language Foundation Models , https://arxiv.org/abs/2412.01824 2 Dec, Free Process Rewards without Process Labels , https://arxiv.org/abs/2412.01981 3 Dec, Scaling Image Tokenizers with Grouped Spherical Quantization , https://arxiv.org/abs/2412.02632 3 Dec, RARE: Retrieval-Augmented Reasoning Enhancement for Large Language Models , https://arxiv.org/abs/2412.02830 4 Dec, Perception Tokens Enhance Visual Reasoning in Multimodal Language Models , https://arxiv.org/abs/2412.03548 4 Dec, Evaluating Language Models as Synthetic Data Generators , https://arxiv.org/abs/2412.03679 4 Dec, Best-of-N Jailbreaking , https://arxiv.org/abs/2412.03556 4 Dec, PaliGemma 2: A Family of Versatile VLMs for Transfer , https://arxiv.org/abs/2412.03555 5 Dec, VisionZip: Longer is Better but Not Necessary in Vision Language Models , https://arxiv.org/abs/2412.04467 5 Dec, Evaluating and Aligning CodeLLMs on Human Preference , https://arxiv.org/abs/2412.05210 6 Dec, MAmmoTH-VL: Eliciting Multimodal Reasoning with Instruction Tuning at Scale , https://arxiv.org/abs/2412.05237 6 Dec, Expanding Performance Boundaries of Open-Source Multimodal Models with 
Model, Data, and Test-Time Scaling , https://arxiv.org/abs/2412.05271 7 Dec, LLMs-as-Judges: A Comprehensive Survey on LLM-based Evaluation Methods , https://arxiv.org/abs/2412.05579 8 Dec, Does RLHF Scale? Exploring the Impacts From Data, Model, and Method , https://arxiv.org/abs/2412.06000 9 Dec, Unraveling the Complexity of Memory in RL Agents: An Approach for Classification and Evaluation , https://arxiv.org/abs/2412.06531 9 Dec, Training Large Language Models to Reason in a Continuous Latent Space , https://arxiv.org/abs/2412.06769 9 Dec, AutoReason: Automatic Few-Shot Reasoning Decomposition , https://arxiv.org/abs/2412.06975 11 Dec, Large Concept Models: Language Modeling in a Sentence Representation Space , https://arxiv.org/abs/2412.08821 12 Dec, Phi-4 Technical Report , https://arxiv.org/abs/2412.08905 13 Dec, Byte Latent Transformer: Patches Scale Better Than Tokens , https://arxiv.org/abs/2412.09871 13 Dec, SCBench: A KV Cache-Centric Analysis of Long-Context Methods , https://arxiv.org/abs/2412.10319 13 Dec, Cultural Evolution of Cooperation among LLM Agents , https://arxiv.org/abs/2412.10270 13 Dec, DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding , https://arxiv.org/abs/2412.10302 16 Dec, No More Adam: Learning Rate Scaling at Initialization is All You Need , https://arxiv.org/abs/2412.11768 16 Dec, Precise Length Control in Large Language Models , https://arxiv.org/abs/2412.11937 16 Dec, The Open Source Advantage in Large Language Models (LLMs) , https://arxiv.org/abs/2412.12004 16 Dec, A Survey of Mathematical Reasoning in the Era of Multimodal Large Language Model: Benchmark, Method & Challenges , https://arxiv.org/abs/2412.11936 17 Dec, Are Your LLMs Capable of Stable Reasoning? 
, https://arxiv.org/abs/2412.13147 18 Dec, LLM Post-Training Recipes, Improving Reasoning in LLMs , https://arxiv.org/abs/2412.14135 18 Dec, Hansel: Output Length Controlling Framework for Large Language Models , https://arxiv.org/abs/2412.14033 18 Dec, Mind Your Theory: Theory of Mind Goes Deeper Than Reasoning , https://arxiv.org/abs/2412.13631 18 Dec, Alignment Faking in Large Language Models , https://arxiv.org/abs/2412.14093 18 Dec, SCOPE: Optimizing Key-Value Cache Compression in Long-Context Generation , https://arxiv.org/abs/2412.13649 19 Dec, LongBench v2: Towards Deeper Understanding and Reasoning on Realistic Long-Context Multitasks , https://arxiv.org/abs/2412.15204 20 Dec, Offline Reinforcement Learning for LLM Multi-Step Reasoning , https://arxiv.org/abs/2412.16145 24 Dec, Mulberry: Empowering MLLM with O1-like Reasoning and Reflection via Collective Monte Carlo Tree Search , https://arxiv.org/abs/2412.18319 31 Dec, Titans: Learning to Memorize at Test Time , https://arxiv.org/abs/2501.00663

0 views
Ginger Bill 1 year ago

OpenGL is not Right-Handed

The original Twitter thread: https://x.com/TheGingerBill/status/1508833104567414785 I have a huge gripe when I read articles/tutorials on OpenGL: most people have no idea what they are talking about when it comes to coordinate systems and matrices. Specifically: OpenGL is NOT right-handed; the confusion over column-major “matrices”. Let’s clear up the first point. Many people will say OpenGL uses a right-handed coordinate system, and loads of articles/tutorials keep repeating that view. So the question is, why do people think this? Modern OpenGL only knows about the Normalized Device Coordinates (NDC) , which is treated as if it is a left-handed 3D coordinate space. This means that OpenGL “is” left-handed, not right-handed as many articles will tell you. The origin of “right-handed” comes from the fixed-function days, where the Z entries in the functions and were negated, which flips the handedness. Back in those days, users were pretty much forced to use a right-handed world-space coordinate system. The Z-axis flip is to convert from the conventional world/entity space to the left-handed Normalized Device Coordinates (NDC). And modern OpenGL only knows the NDC. PLEASE READ THE SPEC BEFORE MAKING TUTORIALS! Now for the “column-major” part. This part is actually overloaded and means two things: what is the default vector kind (column-vector vs row-vector), and what is the internal memory layout of a matrix (array-of-column-vectors vs array-of-row-vectors). A good article on this has been written by @rygorous so I won’t repeat it too much. Row major vs. column major, row vectors vs. column vectors . It works because (A B)^T = B^T A^T (where T is the transpose). But where does this difference in convention come from? My best hypothesis is the physics vs mathematics distinction (just like everything else).
In physics you default to using column-vectors and in mathematics you default to using row-vectors. It’s just a convention. It’s just another distinction between OpenGL and Direct3D which makes very little difference at the end of the day, especially since in both GLSL and HLSL, you can specify whether a matrix is or if necessary. I personally prefer column-vectors because of my background in physics, but all that is important is that you are consistent/coherent in your codebase, and make sure that all conventions are made extremely clear: handedness, axis direction, “Euler”-angle conventions, units, etc. GLSL also doesn’t help in that the matrix types are written “backwards” compared to most other conventions. in GLSL means a 3-row by 2-column matrix. GLSL does not work with this well-known mnemonic: Then there is the other related issue of having matrices that are stored in the array-of-column-vectors layout: many programming languages don’t have a built-in concept of a matrix, and an array-of-row-vectors is easier to write than an array-of-column-vectors because code is text. It is common to write a matrix in a language like the following, and if you are trying to adhere to the “column-major” memory layout, then you will have to write it down transposed compared to how you would write it down on paper. The Odin programming language has both built-in array programming (vectors & swizzling) and matrices, and as a result, this textual issue is completely removed! It even allows you to treat vectors as if they are column-vectors or row-vectors, and things will “just work”! You can even specify the memory layout of the matrices if needed too with (the default) or . It “was” true. It distinguishes it from Direct3D. People just repeat things w/o understanding it. Most who use OpenGL don’t really know anything about the GPU nor what the NDC is. (A, B) x (B, C) = (A, C) e.g. (3 by 4) x (4 by 1) = (3 by 1)
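To illustrate how Odin removes the textual transposition issue, here is a small sketch (my own, not from the original post) using Odin’s built-in matrix type; the element values are arbitrary, and the `#row_major` directive is shown on the assumption that the default layout is column-major as the post describes:

```odin
package main

import "core:fmt"

main :: proc() {
	// Written textually exactly as on paper: 2 rows by 3 columns,
	// even though the default internal layout is column-major.
	m := matrix[2, 3]f32{
		1, 2, 3,
		4, 5, 6,
	}

	v := [3]f32{1, 1, 1}
	w := [2]f32{1, 1}

	// The same array type works as a column-vector on the right...
	fmt.println(m * v) // a [2]f32 result
	// ...or as a row-vector on the left; both "just work".
	fmt.println(w * m) // a [3]f32 result

	// The memory layout can be specified explicitly when it matters,
	// e.g. for GPU interop:
	r: #row_major matrix[2, 3]f32
	_ = r
}
```

The point is that the literal reads row-by-row regardless of the chosen memory layout, so the code matches the maths notation.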

0 views
Ginger Bill 1 year ago

String Type Distinctions

[Originally from a Twitter Thread] Original Twitter Post One thing many languages & API designers get wrong is the concept of a string. I try to make a firm distinction between: They are not equivalent even if you can theoretically use them as such, and so many garbage-collected languages use them as such. They have different use cases which don’t actually overlap in practice. Most of the issues with strings come from trying to merge the concepts into one. In Odin , the distinction between a string value and a byte array is very important. is semantically a string and not an array of 8-bit unsigned integers. There is an implied character encoding (UTF-8) as part of the value. is also an immutable value in Odin. Having a string be immutable allows for a lot of optimizations, but in practice, you never want to mutate the string value itself once it has been created. And when you do mutate it, it is most definitely a bug. This is why it is important to make a distinction between #1 and #3 and separate the concepts. Another way to conceptualize the ideas is as the following: Coupled with Run-Time Type Information (RTTI), having a distinction between []byte and string allows for a lot of really decent (de)serialization tooling, especially for “magical” printing (e.g. fmt.println). P.S. Even C makes a distinction between a string and an array of integers string value ( or ) string builder ( or ) Backing buffer for a string ( or ) (3) is the “backing data”, an arena of sorts (fixed or dynamic) (2) are the operations on that buffer (fixed or dynamic) (1) is the final value that points to (3) and is produced by (2)
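The three concepts can be sketched in Odin itself; this is a minimal illustration of my own (assuming the `core:strings` builder API), not code from the post:

```odin
package main

import "core:fmt"
import "core:strings"

main :: proc() {
	// (2) the builder: operations over (3) a backing buffer.
	b := strings.builder_make()
	defer strings.builder_destroy(&b)

	strings.write_string(&b, "Hello, ")
	strings.write_string(&b, "world!")

	// (1) the final value: an immutable string pointing into (3).
	s := strings.to_string(b)
	fmt.println(s)

	// string and []byte are distinct types; going between them is
	// always explicit, never implicit.
	bytes := transmute([]byte)s
	fmt.println(len(bytes))
}
```

Each of the three roles is a different type with a different API, which is exactly the separation the post argues for.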

0 views
Ginger Bill 1 year ago

The Video That Inspired Me To Create Odin

[Originally from a Twitter Thread] Original Twitter Post Many people may not know this but this video by Sean Barrett @nothings is partially the reason why I made the Odin programming language . And I’ll explain what insights it gave me in this thread 🧵. A lot of these seem “so obvious” but for some reason it never clicked to me before this video. A lot of the “higher level” “scripting” languages are not that much higher level than C. Those languages just have garbage collection and much better standard libraries than C. Those languages are “easier to use” than C just because they have loads of built-in data structures like arrays, hash maps, and built-in syntax for them (how “extravagant”, I know). I was already well aware of the STB libraries that Sean developed, and their single-header-file style too (which is mostly a cope for C’s flaws), but it never clicked to me that the library Sean used the most was the “stb.h” one. This led me to organize code I had used in previous projects and put it into a single and easy-to-use library which I could use in any of my C projects going forward in 2015/2016: https://github.com/gingerBill/gb/blob/master/gb.h However, by around March, I began experimenting with just making an “augmented” C compiler so that I could add constructs to C which I found to be some of the most annoying things C lacked. The first, and most important, two constructs were: I had already been using in C++ for many years. I really wished the concept was native to C. (My opinion has now changed for C for many reasons). P.S. I wrote an article about what I had been doing back in 2015 for defer in C++11 Eventually I realized I couldn’t just add constructs and features to C to “improve” my personal projects, and that any of my professional projects could not use them either. This then led me to start my own compiler to fix those issues I had with C.
Interestingly, “Odin” (which was the original codename that just stuck) started life as a Pascal-like language with begin/end too, but I was trying to make it “C-like” enough for what I wanted. About 3 months later, I had implemented ~70% of Odin; the other ~30% took 7 years. Odin wasn’t my first programming language that I had designed nor implemented, but it was the first where I actually went “you know what?—this will actually be useful for my general needs”. And now I use it for my living at JangaFX , where all of our products are written in it. Other than slices and , the other things I did for Odin were to solve a few practical issues I had in C: I won’t discuss the benefits of slices here but I will discuss the latter 3 things of that list. In C, pretty much my only use case for variadic parameters was -style things. Adding a decent form of that, coupled with RTTI, allowed me to have a runtime-typesafe . As for “actual libraries”, they need their own namespace. When the user imports it, they can change the import name itself, rather than relying on the library to prefix everything (or C++ namespaces) to minimize namespace collisions. This eventually led to Odin’s package system . I highly recommend everyone watch this talk by Sean Barrett! It was a huge inspiration for me, and just a general eye-opener for something which should have been obvious to me at the time but wasn’t. A slice-based type Decent variadic parameters (a slice-based one) RTTI (not bug-prone format verbs in printf, and not my janky generated stuff) Actual libraries
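The constructs listed above can be sketched in a few lines of Odin; this is my own illustrative example (the `print_all` helper is hypothetical), showing slice-based variadics, RTTI-driven printing, and defer together:

```odin
package main

import "core:fmt"

// Variadic parameters in Odin are slice-based: `args` is a []any,
// and RTTI is what lets each value be printed type-safely without
// printf-style format verbs.
print_all :: proc(args: ..any) {
	for arg in args {
		fmt.println(arg)
	}
}

main :: proc() {
	defer fmt.println("runs at scope exit") // the defer construct
	print_all(1, "two", 3.5)
}
```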

0 views
Ginger Bill 3 years ago

Reverse Engineering Alembic

For my work at JangaFX , we require the use of the Alembic interchange file format. We have been using other libraries which wrap reading the Alembic file format, but because they are not the native library, they have numerous issues due to the generic interface. I spent nearly 4 days trying to get the official Alembic C++ API, https://github.com/alembic/alembic/ , to compile correctly and then use the API itself. Numerous times the compilation would get corrupted (it compiled but none of the tests even ran) and when I got it to work (on another machine), the API itself was a labyrinth to navigate. After this ordeal I decided to see how difficult it would be to create my own reader, and then writer, for the format from scratch. I don’t really like to “reinvent the wheel” when I don’t have to, but in this case, the C++ API was an absolute nightmare and I’d spent more time trying to get that to work than actually solving the problem I had. Making my own library for Alembic turned out to be a little more work than I expected. Even though it is an “open source” format, it is effectively completely undocumented, and the bits that are documented refer mostly to specific schemas (not all of them) rather than the internal format. This article will be a basic document of the internals of the Alembic file format. Through my journey into “discovering” the Alembic format, I found out that Alembic is not actually a file format but masquerades as one. It is in fact two file formats with different memory layouts which can be determined based on the magic file signature : HDF5 is a hierarchical data format which is commonly used to store and organize large amounts of data in a hierarchical fashion. It’s commonly used in the scientific field rather than the visual effects industry, and as such, this internal format is rarely used for storing data that would be useful for our tasks for importing (e.g. meshes, cameras, animations, etc).
HDF5 is effectively a storage format for database-like data; it’s a very good format in terms of usage (I will admit I have no idea about its internal design). It appears that the vast majority of “Alembic” files are not in the HDF5 format but in Ogawa. The main format of concern is named Ogawa . It’s a little-endian binary format (thank goodness) which was designed to be readable in-place for efficient multi-threaded data reading. This part of the file format is luckily documented 2 , and small enough that I could write it in a single tweet . Similar to HDF5, Ogawa is a hierarchical data format that is simple to read, but differing from HDF5, it is completely uncompressed. Note: When decoding binary formats, it is very useful to have a hex-viewer to see the bytes in a readable way. On Windows, I would recommend either Hex Editor Neo or 010 Editor . Even though the Ogawa format is mostly documented on that GitHub Wiki page, I still required a hex-viewer from time to time to ensure everything was laid out as I expected, especially for the more complex embedded data. Its header is very simple: The magic signature is just 5 bytes containing the ASCII characters , followed by a byte-wide flag which states whether the file is open or closed to writing, followed by 2 bytes specifying the version of the format (which will always be ), and finally followed by an offset to the root “group”. The is there to allow writing to the file and to prevent other applications from trying to read it whilst it is not finished. It begins as and then once the file is finished, it becomes . For my Alembic writer, I was storing the data in a memory buffer and then writing the entire data to the file at once, meaning this byte-wide flag was not that useful. All group-based offsets in the Ogawa format are stored as a 64-bit unsigned little-endian integer ( ) representing the number of bytes from the base of the file 3 .
These group-based offsets come in two different flavours: an Ogawa Group or an Ogawa Data (byte stream) . The offset encodes a flag in its highest bit (63rd bit) to indicate which flavour: Group if 0 and Data if 1. The remaining 63 bits represent the offset to this specific kind of value; if the remaining 63 bits are all zero, it means it is a terminating value (not pointing to any actual memory). A Group begins with a representing the number of children following it in memory (of which all are another offset, ). If the value is 0, this represents an empty group that has no children. A Byte-Stream begins with a representing the number of bytes following it in memory. This basic file format is very simple to read but it has zero semantic information to make it useful for something like an interchange format. The semantic structure is what I am calling the Meta-Schema for Alembic . To begin decoding the format, I created a basic Ogawa to JSON viewer to see the tree structure of the file format. As I began to understand more of the format, the JSON structure became much more complex but richer in semantic data. Note: I highly recommend converting any hierarchical format into something like JSON just as a debug view alone, since there are many tools such as Dadroit that allow for easy viewing. But if you want to make your own format and viewer with your own tools, go ahead! After a bit of exploring, the meta-schema begins with the following data for the root group: From this point on, it can be assumed that a lot of the data is unaligned to its natural alignment and must be read accordingly. This data is only stored at the root group (child ) and referenced elsewhere by index. Each value containing the samples is stored in contiguous memory: This memory needs to be read in order to determine the number of samples. All metadata is stored as a string. Each entry is a key and value separated by the ASCII character , and each entry is separated by .
Deserializing this is a very simple operation in most languages. These strings are NOT NUL-terminated like in C because they are bounded by a length that usually prefixes them. As I am using Odin , its type is length-bounded, making this process a lot easier compared to C. The indexed metadata is only stored at the root group (child ) and referenced elsewhere by index. Each value is stored in contiguous memory: The maximum number of indexed metadata entries allowed is ( ). The root group stores an offset to the first group (child ) representing Object Data . Object data contains a list of the nested children as well as the properties for its “this” Object . The Object Headers are stored in contiguous memory for each entry, but depending on the value of its (see below), the metadata value is either stored inline or stored in the indexed metadata at the root group. There are three different forms of properties that may be stored within Alembic: Scalar, Array, and Compound. Child of the Object Data group must represent Compound property data. Compound property data is a group which stores children for properties. It’s effectively a hash map (like a JSON object) containing keys, properties, and the values of more property data. The compound property headers byte-stream contains a list of the nested of each property header. n.b. The property headers are the weirdest aspect of the entire format; they are not simply expressible in terms of basic data structures and require a lot of logic for the encoding. is an which encodes information about the property: For the property headers containing inline metadata: Scalar property data is constant if in the property header. Number of samples is equal to the . The indexed samples are stored in the children groups of the property data beginning at relative offset 0.
Depending on whether the samples are constant or not, the actual index needs to be mapped using the following procedure: The data for each sample contains both the key/hash of the data and the uncompressed data. The hash is stored at the beginning; it is represented as a 16-byte (128-bit) SpookyHash . The data itself is stored after this hash. Array property data is constant if in the property header. Number of samples is equal to the . Indexing into the samples is a little more complex than for scalar data. The number of groups stored for the sample data is doubled up, where each pair of groups represents the sample data and then the data for the dimensions. The sample data is similar to that of the scalar case, but the extra group that represents the dimensions is an array of values which represents the tensor shape of the data. Most of the time, the dimensions value is just a single element representing the number of elements in the array. The data group has the same kind of hash prefixing the data (using SpookyHash). I will discuss the schemas of Alembic at a later date since they could fill their own article. For the need we have, we only really care about reading in Cameras and Xforms (Transforms), and writing out Particles/Points. I think the general Ogawa format is absolutely fine, but because of its lack of semantic structure, it requires applying one to it. Even JSON has some semantic meaning by virtue of being a text format (which I have been using as a format to debug the Ogawa data). Note: I would never recommend using JSON when another file format is a better fit (especially if you have the luxury of using a custom format); JSON is too general for the vast majority of use cases. If you have the choice of using an interchange format, always use a binary one (even a custom one) since it is pretty much never the case that you will need to modify the file in a text editor. The design of the property header is very strange to me.
It seems like it is trying to be as compact as possible, but the rest of the file format is uncompressed. I cannot see why this bit would need to be compressed/compacted when the property data it refers to is uncompressed. The size hint bit is really bizarre and I personally would have just stuck with a single integer type (i.e. ) to keep things simpler. The use of indexed metadata seems a bit bizarre since, because the entire format is offset-based, the indexed stuff could have pointed to the same memory address to save memory. I know this means that the hierarchy is now not strict, but for metadata, I personally don’t think it would be much of an issue. Speaking of offset-based file formats, one potential issue for malicious files is forming cycles in the hierarchy, which could cause many readers to crash. I am not sure why SpookyHash was used. It is not really an easy hashing algorithm to implement (~330 LOC of Odin) and not necessarily better than other hashes out there. It is also not that commonly used compared to simpler (and maybe better) hashes. After a lot of work, I got a fully functioning Alembic reader and writer that was ~2000 LOC of Odin (including the necessary schemas we required at JangaFX ). This is a hell of a lot smaller and simpler than the official Alembic C++ API for the same amount of functionality. Technically it’s Ogawa-Flavour Alembic with its own specific meta-schema, which I will get to later in the article.  ↩︎ https://github.com/alembic/alembic/wiki/Ogawa-Specification   ↩︎ I’m a huge fan of binary file formats designed this way, where they effectively use internal relative pointers , and I should explain my preferred approach to designing binary file formats in the future.  
↩︎ 6 children minimum Every file that I tested always began with 6 children Child is a byte-stream which stores an containing something akin to the archive version (always appeared to be ) Child is a byte-stream which stores an containing the file version which must be From the reverse engineering, it appears that the value stored in decimal represents the file format: e.g. represents Child is a group to the first Objects based at the root Child is a byte-stream which stores the file’s metadata Child is a byte-stream which stores the time samples and max samples information Child is a byte-stream which stores indexed metadata which nested properties may use later In my Alembic writer, I never used any indexed metadata and just did everything inline since it was easier to handle Child stores the data for the properties in the form of Compound Property Data Children store the objects Child stores the data for the child Object Headers If is equal to , the metadata is stored directly after it If is any other value, it represents the index to the metadata stored at the root group (this value must be less than the number of indexed metadata entries, otherwise it is an invalid file) Children represent the nested property data which is denoted by the property headers Child is a byte-stream which contains the compound property headers Bits : represents the property type: = Bits : represents the size hint = Bits : represents the plain old data type: = (byte) = string I have yet to see this in the wild Bit states whether the and is set for a Compound type Bit if set, and equal If bits and are not set, , Bit states if the property data is homogenous Bits : represents the “extent” of the value. e.g. Bits : represents the .
This follows the same rules as Object Headers:

- If is equal to , the metadata is stored directly after it
- If is any other value, it represents the index to the metadata stored at the root group (this value must be less than the number of indexed metadata entries, otherwise it is an invalid file)
- Bits : I have no idea what these represent, if anything
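Bit-packed headers like the property headers described above are unpacked with simple shifts and masks. Here is a minimal sketch in Python; the bit positions, widths, and values below are hypothetical placeholders (the real layout is defined in the Ogawa specification, and the exact numbers were lost from this copy of the text):

```python
def bits(value, lo, width):
    """Extract `width` bits starting at bit position `lo` (LSB = bit 0)."""
    return (value >> lo) & ((1 << width) - 1)

# Hypothetical header byte: fields packed into a single integer.
header = 0b0011_0110

prop_type = bits(header, 0, 2)   # e.g. bits 0-1: property type
size_hint = bits(header, 2, 2)   # e.g. bits 2-3: size hint
pod_type  = bits(header, 4, 4)   # e.g. bits 4-7: plain-old-data type

assert (prop_type, size_hint, pod_type) == (2, 1, 3)
```

The same helper works for wider headers (e.g. a 32-bit compound property header); only the `lo`/`width` arguments change.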

Ginger Bill 4 years ago

Multiple Return Values Research

I have recently been thinking about multiple return values as a concept, and wondering if there has been any kind of literature on the topic of “true” multiple return values which are not emulated through tuples . My current working hypothesis (unless there is evidence to the contrary) is that Odin has invented a new kind of type system, something akin to polyadic expressions . It appears that Go and Odin are the only ones with what I call “true” multiple return values, but Odin’s approach is a lot more sophisticated than Go’s and extends the concept further. Odin’s idea is pretty simple: an expression may have n -values, where n is a natural number ≥0 . Meaning an expression may have 0, 1, 2, 3, 4, etc. values, of which each value has a type; it is not that each expression has a type. An expression with 0-values has no (final) type to it (not even like in some languages). A way of thinking about this is that Odin is fundamentally a polyadic language rather than a monadic language, like pretty much everything else. However, most languages are already polyadic in one aspect: input parameters to procedures/functions. In most languages, you have multiple (monadic) inputs but only one (monadic) output, e.g. ( ). Odin’s approach is to just extend this to the rest of the language’s type system, not just procedure inputs. In languages with tuples , they can emulate the ability to have multiple return values, especially if the language has sugar encouraging it. 
Let’s take Python as a basic example where tuples can be used to emulate multiple return values: Other languages may not use tuples to achieve multiple return values; languages such as Common Lisp and Lua do things slightly differently by pushing multiple values, BUT the handling of the number of values is done dynamically rather than statically: This is similar to in Common Lisp, Scheme, etc., which can suffer from exactly the same issues in terms of value-dropping bugs etc., especially due to its dynamic typing approach. A good example of this is to show that is the same as in Common Lisp. However, dynamically typed languages are not my area of research for this specific question; statically and strongly typed ones are. And in that case, all of the languages that allow for something like multiple return values (except for Odin and Go) use tuples. However, tuples can still be passed around as a singular/monadic intermediary value. Tuples are effectively a form of monad which wraps values of (possibly) different types (essentially an ad-hoc record/struct type), and some languages may allow for explicit or implicit tuple destructuring. Odin does differ from Go on multiple return values in one important aspect: Go’s approach is a bit of a hack which only applies in the context of variable assignments. The following is valid in Odin but not valid in Go: The reason is that Go’s approach to multiple returns only works in the context of assignment, and the assignment assumes that there are either N-expressions on the RHS or 1-expression on the RHS to match the N-variables on the LHS. Odin’s approach is a little more sophisticated here: the assignment assumes that there are N-values on the RHS for the N-variables on the LHS, where the N-values may be made from M-expressions (N ≥ M). 
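The Python tuple emulation mentioned above can be sketched as follows (my own minimal reconstruction of the idea, not the post's original sample):

```python
# Python "multiple return values" are really one tuple value:
# the function builds a single tuple object, and the caller destructures it.
def div_mod(a, b):
    return a // b, a % b  # constructs the tuple (a // b, a % b)

q, r = div_mod(17, 5)     # tuple destructuring at the call site
assert (q, r) == (3, 2)

# Crucially, the tuple is itself a first-class monadic value that can be
# passed around whole -- unlike "true" multiple return values in Odin/Go.
pair = div_mod(17, 5)
assert pair == (3, 2)
```

This is exactly the monadic intermediary described above: the destructuring is sugar over a single value, not genuinely polyadic.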
Another fun thing about Odin because of this fact is that you can do the following (which is also not possible in Go): It also means something like this is possible in Odin: Internally, Odin’s approach could be implemented identically to a procedure having an input tuple + an output tuple which is then automatically destructured, but semantically Odin does not support first-class tuples. n.b. For this research, implementation details are not of concern, only usage/grammar. Polyadic expressions are also used in other areas of the Odin programming language. One of the most common is the optional-ok idiom 1 : I am wondering if this approach to multiple return values is all Rob Pike ’s fault, since the only languages I can find that are like what I speak of are:

- Alef , Limbo , Go (Rob Pike related, the first two having non-destructuring tuples)
- Odin ( me , who has been influenced by many of Pike’s languages)

Alef and Limbo had multiple return values through tuples, but tuples were still not really first-class data types in those languages. In many ways, Alef and Limbo were the main precursors to Go (not just Oberon and Modula ). Limbo’s declaration syntax comes from Newsqueak , which is also the precursor to Odin’s very own declaration syntax 2 . So maybe the reason there is little literature (that I can find) on this topic is purely because it is limited to languages related-to/influenced-by Rob Pike. It might be that the approach taken by Odin is extremely unique to Odin; it is also one of my favourite things about Odin. And I am really glad I added it so early on, since it has shown its utility really quickly. Coupled with named return values , bare statements, and , it is an absolute pleasure to work with. If anyone has any more information about this topic, please feel free to contact me with it! P.S. If there is no research done into this area, it is a good sign, since there is so much left to discover in Programming Language Theory. 

This is technically 3 different addressing modes depending on the context ( map-index, optional-ok, or optional-ok-ptr )  ↩︎ https://www.gingerbill.org/article/2018/03/12/on-the-aesthetics-of-the-syntax-of-declarations/   ↩︎

Ginger Bill 4 years ago

The Value Propagation Experiment Part 2

[Originally from a Twitter Thread] I have revisited The Value Propagation Experiment and have come to a different conclusion as to why it failed and how I recovered it. The recovery work has been merged into master now with this PR: https://github.com/odin-lang/Odin/pull/1082 I think there were three things which were confusing and made it look like a failure of an experiment:

- was a confusing name for what the semantics were.
- as a prefix was the wrong place.
- Unifying with was a bad idea for many reasons.

Changes in the PR:

- Built-in procedure became a binary operator instead (meaning it became a keyword).
- was added as a keyword.
- became a suffix-operator/atom-expression.

Most people understood the purpose and semantics pretty intuitively, but many were very confused regarding the semantics of . My conclusion was that the keyword-name and positioning were the main culprits for that. Another aspect which I stated in the original article was that I thought the construct was not as common as I originally thought. And for my codebases (including most of Odin’s core library), this was true. However, it was false for many codebases, including the new core:math/big . In the core:math/big package alone, or_return was able to replace 350+ instances of idioms. The more C-like a library is in terms of design, the more it required this construct. It appears that when a package needs it, it REALLY needs it. When this correction was made, there were 350+ instances of in ~3900 LOC, which is ~9% of (non-blank) lines of code. That’s definitely a useful construct. Another (smaller) package that found or_return useful was the new core:encoding/hxa, which is an Odin-native implementation of @quelsolaar ’s HxA format. https://github.com/quelsolaar/HxA The entire implementation for reading and writing is 500 LOC, of which there are 32 s. I do believe that my general hypotheses are still correct regarding exception-like error value handling. The main points being:

- Error value propagation ACROSS library boundaries
- Degenerate states due to type erasure or automatic inference
- Cultural lack of partial success states

The most important one is the degenerate state issue, where all values can degenerate to a single type. It appears that you and many others pretty much only want to know if there was an error value or not and then pass that up the stack, writing your code as if it was purely the happy path and then handling any error value. Contrasting with Go: Go has a built-in concept of an interface, and all values degenerate to this interface. In practice, from what I have seen of many Go programmers, most people just don’t handle error values and pass them up the stack to a very high place, and then pretty much handle any error state as if they are all the same degenerate value: “error or not”. This is now equivalent to a fancy boolean. I do hope that this addition of does aid a lot of people when making projects and designing packages. It does appear to be very useful already for many of the core library package developers for Odin.
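The degenerate-state problem reads clearly in any language with a common error base type. A hypothetical Python sketch (names are mine, purely illustrative): once every failure can convert to one base type, callers tend to collapse all failures into a single "error or not" state.

```python
# Two semantically distinct failures, both degenerating to Exception.
class ParseError(Exception): pass
class DiskError(Exception): pass

def load(kind):
    if kind == "parse":
        raise ParseError("bad token")
    if kind == "disk":
        raise DiskError("disk gone")
    return "ok"

def caller(kind):
    try:
        return load(kind)
    except Exception:   # type-erased: which failure? we no longer know
        return None     # the degenerate "error or not" result

assert caller("good") == "ok"
assert caller("parse") is None
assert caller("disk") is None   # indistinguishable from the parse failure
```

The `except Exception` here is the "fancy boolean": the specific error information still exists at the raise site, but the degenerate type invites throwing it away.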

Ginger Bill 4 years ago

The Value Propagation Experiment

[Originally from a Twitter Thread] Part 2 of this Experiment I recently experimented with adding a feature into Odin which allowed for a way to propagate a value by early returning if that value was or not . It was in a similar vein to Rust’s which became , or Zig’s , etc. I have now removed it from Odin. But why? The hypothesis was that this idiom was common: where may be an enum, a (discriminated) union, or any other kind of value that has . And replace it with 1 This construct solves a very specific kind of error handling, one which optimizes for typing code rather than reading code. It also fails because Odin (and Go) are languages with multiple return values rather than single-type returns. The more I think about it, the more and similar stuff may be the least worst option, and the best in terms of readability. It’s a question of whether you are optimizing for reading or typing, and in Odin, it has usually been reading. And something like instead of does reduce typing but is a lot harder to catch (even with syntax highlighting). It happens that Go already declined such a proposal for numerous reasons, and the research done for it is directly applicable to Odin because both languages share the multiple return value semantics. The research has been fruitful however. I did experiment with a construct which has now become a built-in procedure , which can be used on things with an optional-ok check, e.g. map indices and type assertions. Some people may be a little surprised by my experimentation with this exception-like shorthand for error values, especially since I wrote an article (which was originally two GitHub comments) titled: Exceptions — And Why Odin Will Never Have Them . One thing I did not comment on in that article is the cause of the problem (other than the cultural issues). My hypothesis is that if you have a degenerate type ( type erasure or automatic inference), then if a value can convert to it implicitly (easily), people will (ab)use it. 
So in languages with exceptions, all exception values can degenerate to the “base type”. In Rust, it can either go to the base trait or be inferred parametrically. In Zig, it can either do or it will infer the error set from usage. Go has the built-in interface type which acts as the common degenerate value. As I discuss in the article, I am not against error value propagation within a library, but I am pretty much always against it across library boundaries. A degenerate state has high entropy and a lack of specific information. And due to this form of type erasure, “downcasting” (broad use of the term) is a way to recover the information, but it assumes implicit information which is not known in the type system itself. The other issue, when people pass the error up the stack for someone else to handle (something I criticized in the previous article already), is that it’s common to see this in many codebases that have such a type: Go, Rust, and Zig (public) codebases exhibit this a lot. And my hypothesis for this phenomenon is that it is due to the very nature of this “degenerate type”. Now a design judgement is to be made when designing a language: is such a concept worth it for the problems it intrinsically has? For Odin, I do not think it was worth it. In Odin, errors are just values, and not something special. For other languages? That’s another thing. For Odin, I have found that having an error value type defined per package is absolutely fine (and ergonomic too), and it minimizes, but cannot remove, the problem of value propagation across library boundaries. was a bad idea for Odin considering the rest of its semantics (multiple return values, lack of an error value type at the semantics level, optimizing for typing rather than reading); has now become , which is useful. n.b. I am not criticizing any particular language’s design for doing this, but rather saying that it does not work well for Odin’s semantics nor philosophy. 
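The idiom under discussion, explicit per-call error checks followed by an early return, can be sketched in Python with Go/Odin-style `(value, error)` pairs. This is my own illustrative reconstruction (the function names are hypothetical, not from the Odin core library):

```python
def step_a():
    return 1, None           # (value, error) pair, Go/Odin style

def step_b(x):
    if x > 10:
        return 0, "overflow" # error as a plain value, not an exception
    return x * 2, None

def pipeline():
    # The common idiom: check and early-return after every call.
    # An or_return-like construct collapses each check into a suffix
    # on the call itself.
    a, err = step_a()
    if err is not None:
        return 0, err
    b, err = step_b(a)
    if err is not None:
        return 0, err
    return b, None

assert pipeline() == (2, None)
```

Each check is explicit and readable, but three lines per call is what made the shorthand tempting in error-heavy, C-like packages.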
The concept of worked by popping off the end value in a multiple valued expression and checking whether it was or , and if so, setting the end return value to value if possible. If the procedure only had one return value, it did a simple return. If the procedure had multiple return values, required that they were all named so that the end value could be assigned to by name and then an empty could be called.  ↩︎

Ginger Bill 5 years ago

Structured Control Flow (Brain Dump)

Note: This is a “brain dump” article, and subject to be cleaned up. Examples in Odin:

- Procedure call
- Terminating
- Conditional , ,
- Looping
  - loop with initial statement, condition, post statement, and body
  - loop with a value to be iterated over
  - loop with condition then body
  - loop with body then condition
- Branching
  - go to end outside of the control statement
  - skip to the end of a loop
  - merge two switch case bodies, to have multiple entry points to the merged body
  - Labels on other control flow statements
- Deferred
- Structured Exception Handling (not specifically Microsoft’s SEH )
- Default (named) return values

- Procedure call (x86: Call instruction)
- Terminating (x86: Return instruction)
- Comparison (x86: Comparison instructions)
- Jump (x86: Unconditional jump instructions; Conditional jump instructions)
- Indirect Jump, goto by value (GNU C: Labels as values, jump tables)
- Stack Unwinding (exceptions): longjmp/setjmp

Ginger Bill 5 years ago

Relative Pointers

Pointers are a value type in programming languages that store a memory address. A pointer references a location in memory, and obtaining the value stored at that location in memory is known as dereferencing a pointer. Pointers are part and parcel of using languages like C, and are an extremely powerful tool. Pointers can be treated as a form of reference type, a value that refers to another typed value in memory. When most people think of a pointer, they usually treat it as if the pointer refers to a physical memory address. I will call this form of pointer an absolute pointer. On most modern machines with virtualized memory, this is rarely the case any more, and for good reasons 1 . In the case of virtualized memory spaces, these pointers do not refer to a physical memory address but rather an alias of a physical memory address, which the operating system usually handles on a per-page basis 2 . In a sense, these pointers that belong to virtualized memory spaces can be classed as a form of relative pointer. At the end of the day, a pointer can be treated/encoded as if it were an integer of some form. Knowing this fact is extremely useful, not just for pointer arithmetic in C/C++, but also for relative pointers. In general, there are four forms of relative pointers: MSVC supports a non-official extension to C and C++, based pointers , which allows the user to declare pointers based on other pointers. They can be declared using the keyword. A based pointer is still the same size as a regular pointer, but the value it points to is now relative to that base value. This form of relative pointer is useful, but because it is based on a globally defined variable, it muddles the type system by coupling it with values, which means it cannot be applied as generally. 
Based pointers can also be replicated in C++ through the use of templates and operator overloading; this also has the advantage of allowing the size of the user-defined based pointer to be different from a regular pointer: Another approach is very similar to based pointers, but the base is not explicitly defined by the type itself; I will refer to these as offset pointers . I use this approach a lot when designing file formats, with the base being the start of the file. I originally learnt about this trick from reading the Quake source code for the MD2/MD3/MD4 file formats, which used this trick a lot, especially for relative arrays . This shows one of the main advantages of relative pointers: serializeability. This means that the data structure can be ’d and still be valid. Offset pointers can also be replicated in C++ through the use of templates. Similar to the user-defined based pointers above, the size of the underlying integer can be any size. One of the cool tricks here is that the data structure’s fields do not contain any type information about what data type it points to; rather, that data type is encoded in the constant parameters of the templated data structure. The same form of data structure can be implemented in Odin too: In the above cases, the represents the number of bytes away from the base . It could be used to represent the number of typed-units instead, but this may require that the base pointer is correctly aligned to the referenced type on some systems. The last approach is to define the base to be the memory address of the offset itself. This approach has been used at Valve , and in some programming languages, such as Jonathan Blow’s programming language 3 Jai and my own language Odin . As with offset pointers , self-relative pointers have a similar main advantage: serializeability. This means that the data structure can be ’d and still be (mostly) valid. 
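The file-format use of offset pointers can be shown with a tiny sketch in Python (the article's own samples are in C++/Odin; this toy two-field format is hypothetical, not MD2 or Alembic). The "pointer" is an integer offset from the start of the file, so the whole blob can be written to disk or copied and remains valid:

```python
import struct

# Toy format: an 8-byte header = u32 offset (from start of file) to the
# payload + u32 payload length. Little-endian throughout.
payload = b"hello"
header = struct.pack("<II", 8, len(payload))  # payload begins at byte 8
blob = header + payload                       # the entire "file"

# Reading back: the base is the start of the blob, the stored offset is
# relative to that base -- no fixup needed regardless of where the blob
# lives in memory. This is the serializeability advantage.
offset, length = struct.unpack_from("<II", blob, 0)
assert blob[offset:offset + length] == b"hello"
```

Because no absolute addresses are stored, the same bytes round-trip through disk, sockets, or a straight memory copy without any pointer patching.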
One issue with self-relative pointers is that for the value itself to be valid, it must be in an addressable memory space (i.e. it must be an l-value and not in a register). If a language containing such a concept wanted to ensure the validity of such a data type, any other data type that contains it would also have to be addressable too. This adds a form of virality to the type system, introducing a new concept of addressability to the type system directly. However, even if this was enforced, there are cases when a self-relative pointer in non-addressable memory could be perfectly valid. Another issue is that smaller base integer types for the internal offset value mean that they can only be valid over a smaller range than regular pointers. Self-relative pointers can be implemented in C++ with the use of templates and operator overloading 4 : In the above cases, the represents the number of bytes away from its own memory address. It could be used to represent the number of typed-units instead, but this may require that the base pointer is correctly aligned to the referenced type on some systems. I personally find this approach less useful in practice, even if a stride greater than 1 does give a larger range to work with in theory ( bytes). It should also be noted that this implementation of a relative pointer does treat the value as a sentinel value, similar to regular pointers. Relatively recently, I added self-relative pointers to Odin because I found a decent use case for them compared to just using (explicit) offset pointers. I also extended the concept of relative pointers to relative slices , similar to the offset array example from above. Due to the fact that the underlying base integer of a self-relative pointer can be specified by the programmer, it can be exploited for a lot of useful purposes. One basic example could be a very small linked list where the next item in the list is defined to be within a certain range. 
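The self-relative scheme can be sketched in Python by using byte positions in a buffer to stand in for memory addresses (a simulation of the idea, not the C++/Odin implementation; the layout below is made up):

```python
import struct

# A 16-byte buffer standing in for addressable memory. The "pointer" at
# byte 0 stores a SIGNED offset relative to its OWN position, not an
# absolute address.
buf = bytearray(16)
ptr_pos, target_pos = 0, 12

struct.pack_into("<i", buf, ptr_pos, target_pos - ptr_pos)  # store +12
struct.pack_into("<I", buf, target_pos, 0xCAFE)             # the target

# Dereference: target address = pointer's own address + stored offset.
(off,) = struct.unpack_from("<i", buf, ptr_pos)
(value,) = struct.unpack_from("<I", buf, ptr_pos + off)
assert value == 0xCAFE

# The whole buffer can be copied verbatim and both sides still line up,
# since only the relative distance is stored.
copy = bytes(buf)
(off2,) = struct.unpack_from("<i", copy, ptr_pos)
assert struct.unpack_from("<I", copy, ptr_pos + off2)[0] == 0xCAFE
```

Shrinking the offset field (e.g. to a signed byte) trades addressable range for size, which is exactly the single-byte linked-list trick mentioned above.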
This means you could define a relative pointer to be stored in a single byte if necessary: Relative pointers allow for a lot more flexibility compared to “traditional” pointers, and are a godsend with regards to serialization. They have many advantages and disadvantages compared to “traditional” pointers, and the programmer must be absolutely aware of them before using them. Relative pointers can be easily serialized, making them ideal for file formats, and relative pointers can be as small as they need to be rather than taking up the same size as a regular pointer (e.g. on 64-bit platforms, a pointer is 64 bits, which might be a lot more than the user requires for a specific problem).

To recap, the four forms of relative pointers:

- Virtual memory pointers: relative to the virtualized memory space
- Global-based pointers: relative to a global variable
- Offset pointers: relative to an explicitly defined base, such as the beginning of a file
- Self-Relative/Auto-Relative pointers: relative to their own memory address

Virtualized memory has many security, safety, and practical benefits.  ↩︎ I will not discuss how virtual memory works or its benefits in this article.  ↩︎ Jai is also the first language I have seen implement self-relative pointers as part of the type system directly.  ↩︎ Please note that this is technically undefined behaviour, even if most compilers handle it as expected.  ↩︎
