LLM Research Papers: The 2024 List
It’s been a very eventful and exciting year in AI research, especially if you are interested in LLMs. I had big plans for this December edition and was going to publish a new article discussing all of my research highlights from 2024. I still plan to do so, but due to an accident and serious injury, I am currently unable to work at a computer and finish the draft. I hope to recover in the upcoming weeks and be back on my feet soon.

In the meantime, I want to share my running bookmark list of the many fascinating (mostly LLM-related) papers I stumbled upon in 2024. It’s just a list, but maybe it will come in handy for those who are interested in finding some gems to read over the holidays.

And if you are interested in more code-heavy reading and tinkering, my Build a Large Language Model (From Scratch) book came out on Amazon last month. In addition, I added a lot of bonus materials to the GitHub repository.

Bonus materials in the GitHub repository (stars highlight my personal favorites)

Thanks for your understanding and support, and I hope to make a full recovery soon and be back with the Research Highlights 2024 article in a few weeks!

Ahead of AI is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.

January 2024

1 Jan, Astraios: Parameter-Efficient Instruction Tuning Code Large Language Models, https://arxiv.org/abs/2401.00788
2 Jan, A Comprehensive Study of Knowledge Editing for Large Language Models, https://arxiv.org/abs/2401.01286
2 Jan, LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning, https://arxiv.org/abs/2401.01325
2 Jan, Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models, https://arxiv.org/abs/2401.01335
2 Jan, LLaMA Beyond English: An Empirical Study on Language Capability Transfer, https://arxiv.org/abs/2401.01055
3 Jan, A Mechanistic Understanding of Alignment Algorithms: A Case Study on DPO and Toxicity, https://arxiv.org/abs/2401.01967
4 Jan, LLaMA Pro: Progressive LLaMA with Block Expansion, https://arxiv.org/abs/2401.02415
4 Jan, LLM Augmented LLMs: Expanding Capabilities through Composition, https://arxiv.org/abs/2401.02412
4 Jan, Blending Is All You Need: Cheaper, Better Alternative to Trillion-Parameters LLM, https://arxiv.org/abs/2401.02994
5 Jan, DeepSeek LLM: Scaling Open-Source Language Models with Longtermism, https://arxiv.org/abs/2401.02954
5 Jan, Denoising Vision Transformers, https://arxiv.org/abs/2401.02957
7 Jan, Soaring from 4K to 400K: Extending LLM’s Context with Activation Beacon, https://arxiv.org/abs/2401.03462
8 Jan, Mixtral of Experts, https://arxiv.org/abs/2401.04088
8 Jan, MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts, https://arxiv.org/abs/2401.04081
8 Jan, A Minimaximalist Approach to Reinforcement Learning from Human Feedback, https://arxiv.org/abs/2401.04056
9 Jan, RoSA: Accurate Parameter-Efficient Fine-Tuning via Robust Adaptation, https://arxiv.org/abs/2401.04679
10 Jan, Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training, https://arxiv.org/abs/2401.05566
11 Jan, Transformers are Multi-State RNNs, https://arxiv.org/abs/2401.06104
11 Jan, A Closer Look at AUROC and AUPRC under Class Imbalance, https://arxiv.org/abs/2401.06091
12 Jan, An Experimental Design Framework for Label-Efficient Supervised Finetuning of Large Language Models, https://arxiv.org/abs/2401.06692
16 Jan, Tuning Language Models by Proxy, https://arxiv.org/abs/2401.08565
16 Jan, Scalable Pre-training of Large Autoregressive Image Models, https://arxiv.org/abs/2401.08541
16 Jan, Code Generation with AlphaCodium: From Prompt Engineering to Flow Engineering, https://arxiv.org/abs/2401.08500
16 Jan, RAG vs Fine-tuning: Pipelines, Tradeoffs, and a Case Study on Agriculture, https://arxiv.org/abs/2401.08406
17 Jan, ReFT: Reasoning with Reinforced Fine-Tuning, https://arxiv.org/abs/2401.08967
18 Jan, DiffusionGPT: LLM-Driven Text-to-Image Generation System, https://arxiv.org/abs/2401.10061
18 Jan, Self-Rewarding Language Models, https://arxiv.org/abs/2401.10020
18 Jan, VMamba: Visual State Space Model, https://arxiv.org/abs/2401.10166
19 Jan, Knowledge Fusion of Large Language Models, https://arxiv.org/abs/2401.10491
22 Jan, SpatialVLM: Endowing Vision-Language Models with Spatial Reasoning Capabilities, https://arxiv.org/abs/2401.12168
22 Jan, WARM: On the Benefits of Weight Averaged Reward Models, https://arxiv.org/abs/2401.12187
22 Jan, Spotting LLMs With Binoculars: Zero-Shot Detection of Machine-Generated Text, https://arxiv.org/abs/2401.12070
24 Jan, MambaByte: Token-free Selective State Space Model, https://arxiv.org/abs/2401.13660
24 Jan, SpacTor-T5: Pre-training T5 Models with Span Corruption and Replaced Token Detection, https://arxiv.org/abs/2401.13160
25 Jan, Rethinking Patch Dependence for Masked Autoencoders, https://arxiv.org/abs/2401.14391
25 Jan, Pix2gestalt: Amodal Segmentation by Synthesizing Wholes, https://arxiv.org/abs/2401.14398
25 Jan, Multimodal Pathway: Improve Transformers with Irrelevant Data from Other Modalities, https://arxiv.org/abs/2401.14405
26 Jan, EAGLE: Speculative Sampling Requires Rethinking Feature Uncertainty, https://arxiv.org/abs/2401.15077
29 Jan, MoE-LLaVA: Mixture of Experts for Large Vision-Language Models, https://arxiv.org/abs/2401.15947
29 Jan, Rephrasing the Web: A Recipe for Compute and Data-Efficient Language Modeling, https://arxiv.org/abs/2401.16380
31 Jan, KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization, https://arxiv.org/abs/2401.18079

February 2024

1 Feb, Efficient Exploration for LLMs, https://arxiv.org/abs/2402.00396
1 Feb, OLMo: Accelerating the Science of Language Models, https://arxiv.org/abs/2402.00838
1 Feb, Tiny Titans: Can Smaller Large Language Models Punch Above Their Weight in the Real World for Meeting Summarization?, https://arxiv.org/abs/2402.00841
1 Feb, Repeat After Me: Transformers are Better than State Space Models at Copying, https://arxiv.org/abs/2402.01032
2 Feb, LiPO: Listwise Preference Optimization through Learning-to-Rank, https://arxiv.org/abs/2402.01878
2 Feb, FindingEmo: An Image Dataset for Emotion Recognition in the Wild, https://arxiv.org/abs/2402.01355
3 Feb, More Agents Is All You Need, https://arxiv.org/abs/2402.05120
5 Feb, DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models, https://arxiv.org/abs/2402.03300
6 Feb, MobileVLM V2: Faster and Stronger Baseline for Vision Language Model, https://arxiv.org/abs/2402.03766
6 Feb, A Phase Transition Between Positional and Semantic Learning in a Solvable Model of Dot-Product Attention, https://arxiv.org/abs/2402.03902
6 Feb, Scaling Laws for Downstream Task Performance of Large Language Models, https://arxiv.org/abs/2402.04177
6 Feb, MOMENT: A Family of Open Time-series Foundation Models, https://arxiv.org/abs/2402.03885
6 Feb, Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models, https://arxiv.org/abs/2402.03749
6 Feb, Self-Discover: Large Language Models Self-Compose Reasoning Structures, https://arxiv.org/abs/2402.03620
7 Feb, Grandmaster-Level Chess Without Search, https://arxiv.org/abs/2402.04494
7 Feb, Direct Language Model Alignment from Online AI Feedback, https://arxiv.org/abs/2402.04792
8 Feb, Buffer Overflow in Mixture of Experts, https://arxiv.org/abs/2402.05526
9 Feb, The Boundary of Neural Network Trainability is Fractal, https://arxiv.org/abs/2402.06184
11 Feb, ODIN: Disentangled Reward Mitigates Hacking in RLHF, https://arxiv.org/abs/2402.07319
12 Feb, Policy Improvement using Language Feedback Models, https://arxiv.org/abs/2402.07876
12 Feb, Scaling Laws for Fine-Grained Mixture of Experts, https://arxiv.org/abs/2402.07871
12 Feb, Aya Model: An Instruction Finetuned Open-Access Multilingual Language Model, https://arxiv.org/abs/2402.07827
12 Feb, Step-On-Feet Tuning: Scaling Self-Alignment of LLMs via Bootstrapping, https://arxiv.org/abs/2402.07610
12 Feb, Suppressing Pink Elephants with Direct Principle Feedback, https://arxiv.org/abs/2402.07896
13 Feb, World Model on Million-Length Video And Language With RingAttention, https://arxiv.org/abs/2402.08268
13 Feb, Mixtures of Experts Unlock Parameter Scaling for Deep RL, https://arxiv.org/abs/2402.08609
14 Feb, DoRA: Weight-Decomposed Low-Rank Adaptation, https://arxiv.org/abs/2402.09353
14 Feb, Transformers Can Achieve Length Generalization But Not Robustly, https://arxiv.org/abs/2402.09371
15 Feb, BASE TTS: Lessons From Building a Billion-Parameter Text-to-Speech Model on 100K Hours of Data, https://arxiv.org/abs/2402.08093
15 Feb, Recovering the Pre-Fine-Tuning Weights of Generative Models, https://arxiv.org/abs/2402.10208
15 Feb, Generative Representational Instruction Tuning, https://arxiv.org/abs/2402.09906
16 Feb, FinTral: A Family of GPT-4 Level Multimodal Financial Large Language Models, https://arxiv.org/abs/2402.10986
17 Feb, OneBit: Towards Extremely Low-bit Large Language Models, https://arxiv.org/abs/2402.11295
18 Feb, LongAgent: Scaling Language Models to 128k Context through Multi-Agent Collaboration, https://arxiv.org/abs/2402.11550
19 Feb, Reformatted Alignment, https://arxiv.org/abs/2402.12219
19 Feb, AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling, https://arxiv.org/abs/2402.12226
19 Feb, Towards Cross-Tokenizer Distillation: the Universal Logit Distillation Loss for LLMs, https://arxiv.org/abs/2402.12030
19 Feb, LoRA+: Efficient Low Rank Adaptation of Large Models, https://arxiv.org/abs/2402.12354
20 Feb, Neural Network Diffusion, https://arxiv.org/abs/2402.13144
21 Feb, YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information, https://arxiv.org/abs/2402.13616
21 Feb, LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens, https://arxiv.org/abs/2402.13753
21 Feb, Large Language Models for Data Annotation: A Survey, https://arxiv.org/abs/2402.13446
22 Feb, TinyLLaVA: A Framework of Small-scale Large Multimodal Models, https://arxiv.org/abs/2402.14289
22 Feb, Back to Basics: Revisiting REINFORCE Style Optimization for Learning from Human Feedback in LLMs, https://arxiv.org/abs/2402.14740
23 Feb, Genie: Generative Interactive Environments, https://arxiv.org/abs/2402.15391
26 Feb, CARTE: Pretraining and Transfer for Tabular Learning, https://arxiv.org/abs/2402.16785
27 Feb, The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits, https://arxiv.org/abs/2402.17764
27 Feb, Sora Generates Videos with Stunning Geometrical Consistency, https://arxiv.org/abs/2402.17403
27 Feb, When Scaling Meets LLM Finetuning: The Effect of Data, Model and Finetuning Method, https://arxiv.org/abs/2402.17193
29 Feb, Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models, https://arxiv.org/abs/2402.19427

March 2024

1 Mar, Learning and Leveraging World Models in Visual Representation Learning, https://arxiv.org/abs/2403.00504
3 Mar, Improving LLM Code Generation with Grammar Augmentation, https://arxiv.org/abs/2403.01632
3 Mar, The Hidden Attention of Mamba Models, https://arxiv.org/abs/2403.01590
4 Mar, Training-Free Pretrained Model Merging, https://arxiv.org/abs/2403.01753
4 Mar, Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures, https://arxiv.org/abs/2403.02308
5 Mar, The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning, https://arxiv.org/abs/2403.03218
5 Mar, Evolution Transformer: In-Context Evolutionary Optimization, https://arxiv.org/abs/2403.02985
5 Mar, Enhancing Vision-Language Pre-training with Rich Supervisions, https://arxiv.org/abs/2403.03346
5 Mar, Scaling Rectified Flow Transformers for High-Resolution Image Synthesis, https://arxiv.org/abs/2403.03206
5 Mar, Design2Code: How Far Are We From Automating Front-End Engineering?, https://arxiv.org/abs/2403.03163
6 Mar, ShortGPT: Layers in Large Language Models are More Redundant Than You Expect, https://arxiv.org/abs/2403.03853
6 Mar, Backtracing: Retrieving the Cause of the Query, https://arxiv.org/abs/2403.03956
6 Mar, Learning to Decode Collaboratively with Multiple Language Models, https://arxiv.org/abs/2403.03870
6 Mar, SaulLM-7B: A pioneering Large Language Model for Law, https://arxiv.org/abs/2403.03883
6 Mar, Are Language Models Puzzle Prodigies? Algorithmic Puzzles Unveil Serious Challenges in Multimodal Reasoning, https://arxiv.org/abs/2403.03864
6 Mar, 3D Diffusion Policy, https://arxiv.org/abs/2403.03954
6 Mar, MedMamba: Vision Mamba for Medical Image Classification, https://arxiv.org/abs/2403.03849
6 Mar, GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection, https://arxiv.org/abs/2403.03507
6 Mar, Stop Regressing: Training Value Functions via Classification for Scalable Deep RL, https://arxiv.org/abs/2403.03950
7 Mar, How Far Are We from Intelligent Visual Deductive Reasoning?, https://arxiv.org/abs/2403.04732
7 Mar, Common 7B Language Models Already Possess Strong Math Capabilities, https://arxiv.org/abs/2403.04706
8 Mar, Gemini 1.5: Unlocking Multimodal Understanding Across Millions of Tokens of Context, https://arxiv.org/abs/2403.05530
8 Mar, Is Cosine-Similarity of Embeddings Really About Similarity?, https://arxiv.org/abs/2403.05440
8 Mar, LLM4Decompile: Decompiling Binary Code with Large Language Models, https://arxiv.org/abs/2403.05286
9 Mar, Algorithmic Progress in Language Models, https://arxiv.org/abs/2403.05812
11 Mar, Stealing Part of a Production Language Model, https://arxiv.org/abs/2403.06634
12 Mar, Chronos: Learning the Language of Time Series, https://arxiv.org/abs/2403.07815
13 Mar, Simple and Scalable Strategies to Continually Pre-train Large Language Models, https://arxiv.org/abs/2403.08763
13 Mar, Language Models Scale Reliably With Over-Training and on Downstream Tasks, https://arxiv.org/abs/2403.08540
14 Mar, BurstAttention: An Efficient Distributed Attention Framework for Extremely Long Sequences, https://arxiv.org/abs/2403.09347
14 Mar, LocalMamba: Visual State Space Model with Windowed Selective Scan, https://arxiv.org/abs/2403.09338
14 Mar, GiT: Towards Generalist Vision Transformer through Universal Language Interface, https://arxiv.org/abs/2403.09394
14 Mar, MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training, https://arxiv.org/abs/2403.09611
15 Mar, RAFT: Adapting Language Model to Domain Specific RAG, https://arxiv.org/abs/2403.10131
18 Mar, TnT-LLM: Text Mining at Scale with Large Language Models, https://arxiv.org/abs/2403.12173
18 Mar, Decoding Compressed Trust: Scrutinizing the Trustworthiness of Efficient LLMs Under Compression, https://arxiv.org/abs/2403.15447
19 Mar, PERL: Parameter Efficient Reinforcement Learning from Human Feedback, https://arxiv.org/abs/2403.10704
20 Mar, RewardBench: Evaluating Reward Models for Language Modeling, https://arxiv.org/abs/2403.13787
20 Mar, LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models, https://arxiv.org/abs/2403.13372
21 Mar, RakutenAI-7B: Extending Large Language Models for Japanese, https://arxiv.org/abs/2403.15484
22 Mar, SiMBA: Simplified Mamba-Based Architecture for Vision and Multivariate Time Series, https://arxiv.org/abs/2403.15360
22 Mar, Can Large Language Models Explore In-Context?, https://arxiv.org/abs/2403.15371
22 Mar, LLM2LLM: Boosting LLMs with Novel Iterative Data Enhancement, https://arxiv.org/abs/2403.15042
25 Mar, LLM Agent Operating System, https://arxiv.org/abs/2403.16971
26 Mar, The Unreasonable Ineffectiveness of the Deeper Layers, https://arxiv.org/abs/2403.17887
26 Mar, LISA: Layerwise Importance Sampling for Memory-Efficient Large Language Model Fine-Tuning, https://arxiv.org/abs/2403.17919
26 Mar, Mechanistic Design and Scaling of Hybrid Architectures, https://arxiv.org/abs/2403.17844
27 Mar, BioMedLM: A 2.7B Parameter Language Model Trained On Biomedical Text, https://arxiv.org/abs/2403.18421
27 Mar, ViTAR: Vision Transformer with Any Resolution, https://arxiv.org/abs/2403.18361
27 Mar, Long-form Factuality in Large Language Models, https://arxiv.org/abs/2403.18802
27 Mar, Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models, https://arxiv.org/abs/2403.18814
28 Mar, MagicLens: Self-Supervised Image Retrieval with Open-Ended Instructions, https://arxiv.org/abs/2403.19651
28 Mar, Model Stock: All We Need Is Just a Few Fine-Tuned Models, https://arxiv.org/abs/2403.19522

April 2024

1 Apr, Do Language Models Plan Ahead for Future Tokens?, https://arxiv.org/abs/2404.00859
1 Apr, Bigger is not Always Better: Scaling Properties of Latent Diffusion Models, https://arxiv.org/abs/2404.01367
1 Apr, The Fine Line: Navigating Large Language Model Pretraining with Down-streaming Capability Analysis, https://arxiv.org/abs/2404.01204
1 Apr, Diffusion-RWKV: Scaling RWKV-Like Architectures for Diffusion Models, https://arxiv.org/abs/2404.04478
2 Apr, Mixture-of-Depths: Dynamically Allocating Compute in Transformer-Based Language Models, https://arxiv.org/abs/2404.02258
2 Apr, Long-context LLMs Struggle with Long In-context Learning, https://arxiv.org/abs/2404.02060
2 Apr, Emergent Abilities in Reduced-Scale Generative Language Models, https://arxiv.org/abs/2404.02204
2 Apr, Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks, https://arxiv.org/abs/2404.02151
3 Apr, On the Scalability of Diffusion-based Text-to-Image Generation, https://arxiv.org/abs/2404.02883
3 Apr, BAdam: A Memory Efficient Full Parameter Training Method for Large Language Models, https://arxiv.org/abs/2404.02827
3 Apr, Cross-Attention Makes Inference Cumbersome in Text-to-Image Diffusion Models, https://arxiv.org/abs/2404.02747
4 Apr, Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences, https://arxiv.org/abs/2404.03715
4 Apr, Training LLMs over Neurally Compressed Text, https://arxiv.org/abs/2404.03626
4 Apr, CantTalkAboutThis: Aligning Language Models to Stay on Topic in Dialogues, https://arxiv.org/abs/2404.03820
5 Apr, ReFT: Representation Finetuning for Language Models, https://arxiv.org/abs/2404.03592
5 Apr, Verifiable by Design: Aligning Language Models to Quote from Pre-Training Data, https://arxiv.org/abs/2404.03862
5 Apr, Sigma: Siamese Mamba Network for Multi-Modal Semantic Segmentation, https://arxiv.org/abs/2404.04256
8 Apr, AutoCodeRover: Autonomous Program Improvement, https://arxiv.org/abs/2404.05427
8 Apr, Eagle and Finch: RWKV with Matrix-Valued States and Dynamic Recurrence, https://arxiv.org/abs/2404.05892
8 Apr, CodecLM: Aligning Language Models with Tailored Synthetic Data, https://arxiv.org/abs/2404.05875
9 Apr, MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies, https://arxiv.org/abs/2404.06395
9 Apr, Elephants Never Forget: Memorization and Learning of Tabular Data in Large Language Models, https://arxiv.org/abs/2404.06209
9 Apr, LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders, https://arxiv.org/abs/2404.05961
10 Apr, Adapting LLaMA Decoder to Vision Transformer, https://arxiv.org/abs/2404.06773
10 Apr, Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention, https://arxiv.org/abs/2404.07143
11 Apr, LLoCO: Learning Long Contexts Offline, https://arxiv.org/abs/2404.07979
11 Apr, JetMoE: Reaching Llama2 Performance with 0.1M Dollars, https://arxiv.org/abs/2404.07413
11 Apr, Best Practices and Lessons Learned on Synthetic Data for Language Models, https://arxiv.org/abs/2404.07503
11 Apr, Rho-1: Not All Tokens Are What You Need, https://arxiv.org/abs/2404.07965
12 Apr, Pre-training Small Base LMs with Fewer Tokens, https://arxiv.org/abs/2404.08634
12 Apr, Dataset Reset Policy Optimization for RLHF, https://arxiv.org/abs/2404.08495
13 Apr, LLM In-Context Recall is Prompt Dependent, https://arxiv.org/abs/2404.08865
15 Apr, State Space Model for New-Generation Network Alternative to Transformers: A Survey, https://arxiv.org/abs/2404.09516
15 Apr, Chinchilla Scaling: A Replication Attempt, https://arxiv.org/abs/2404.10102
15 Apr, Learn Your Reference Model for Real Good Alignment, https://arxiv.org/abs/2404.09656
16 Apr, Is DPO Superior to PPO for LLM Alignment? A Comprehensive Study, https://arxiv.org/abs/2404.10719
16 Apr, Scaling (Down) CLIP: A Comprehensive Analysis of Data, Architecture, and Training Strategies, https://arxiv.org/abs/2404.08197
16 Apr, How Faithful Are RAG Models? Quantifying the Tug-of-War Between RAG and LLMs' Internal Prior, https://arxiv.org/abs/2404.10198
17 Apr, A Survey on Retrieval-Augmented Text Generation for Large Language Models, https://arxiv.org/abs/2404.10981
18 Apr, When LLMs are Unfit Use FastFit: Fast and Effective Text Classification with Many Classes, https://arxiv.org/abs/2404.12365
18 Apr, Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing, https://arxiv.org/abs/2404.12253
18 Apr, OpenBezoar: Small, Cost-Effective and Open Models Trained on Mixes of Instruction Data, https://arxiv.org/abs/2404.12195
19 Apr, The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions, https://arxiv.org/abs/2404.13208
22 Apr, How Good Are Low-bit Quantized LLaMA3 Models? An Empirical Study, https://arxiv.org/abs/2404.14047
22 Apr, Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone, https://arxiv.org/abs/2404.14219
22 Apr, OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework, https://arxiv.org/abs/2404.14619
22 Apr, A Survey on Self-Evolution of Large Language Models, https://arxiv.org/abs/2404.14387
23 Apr, Multi-Head Mixture-of-Experts, https://arxiv.org/abs/2404.15045
23 Apr, NExT: Teaching Large Language Models to Reason about Code Execution, https://arxiv.org/abs/2404.14662
23 Apr, Graph Machine Learning in the Era of Large Language Models (LLMs), https://arxiv.org/abs/2404.14928
24 Apr, Retrieval Head Mechanistically Explains Long-Context Factuality, https://arxiv.org/abs/2404.15574
25 Apr, Layer Skip: Enabling Early Exit Inference and Self-Speculative Decoding, https://arxiv.org/abs/2404.16710
25 Apr, Make Your LLM Fully Utilize the Context, https://arxiv.org/abs/2404.16811
28 Apr, LoRA Land: 310 Fine-tuned LLMs that Rival GPT-4, A Technical Report, https://arxiv.org/abs/2405.00732
30 Apr, Better & Faster Large Language Models via Multi-token Prediction, https://arxiv.org/abs/2404.19737
30 Apr, RAG and RAU: A Survey on Retrieval-Augmented Language Model in Natural Language Processing, https://arxiv.org/abs/2404.19543
30 Apr, A Primer on the Inner Workings of Transformer-based Language Models, https://arxiv.org/abs/2405.00208
30 Apr, When to Retrieve: Teaching LLMs to Utilize Information Retrieval Effectively, https://arxiv.org/abs/2404.19705
30 Apr, KAN: Kolmogorov–Arnold Networks, https://arxiv.org/abs/2404.19756

May 2024

1 May, Is Bigger Edit Batch Size Always Better? An Empirical Study on Model Editing with Llama-3, https://arxiv.org/abs/2405.00664
1 May, Self-Play Preference Optimization for Language Model Alignment, https://arxiv.org/abs/2405.00675
1 May, A Careful Examination of Large Language Model Performance on Grade School Arithmetic, https://arxiv.org/abs/2405.00332
2 May, Prometheus 2: An Open Source Language Model Specialized in Evaluating Other Language Models, https://arxiv.org/abs/2405.01535
3 May, What Matters When Building Vision-Language Models?, https://arxiv.org/abs/2405.02246
5 May, Is Flash Attention Stable?, https://arxiv.org/abs/2405.02803
7 May, vAttention: Dynamic Memory Management for Serving LLMs without PagedAttention, https://arxiv.org/abs/2405.04437
7 May, xLSTM: Extended Long Short-Term Memory, https://arxiv.org/abs/2405.04517
8 May, You Only Cache Once: Decoder-Decoder Architectures for Language Models, https://arxiv.org/abs/2405.05254
8 May, DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model, https://arxiv.org/abs/2405.04434
8 May, Fishing for Magikarp: Automatically Detecting Under-trained Tokens in Large Language Models, https://arxiv.org/abs/2405.05417
9 May, Does Fine-Tuning LLMs on New Knowledge Encourage Hallucinations?, https://arxiv.org/abs/2405.05904
10 May, Value Augmented Sampling for Language Model Alignment and Personalization, https://arxiv.org/abs/2405.06639
12 May, PHUDGE: Phi-3 as Scalable Judge, https://arxiv.org/abs/2405.08029
13 May, RLHF Workflow: From Reward Modeling to Online RLHF, https://arxiv.org/abs/2405.07863
15 May, LoRA Learns Less and Forgets Less, https://arxiv.org/abs/2405.09673
15 May, Xmodel-VLM: A Simple Baseline for Multimodal Vision Language Model, https://arxiv.org/abs/2405.09215
16 May, Chameleon: Mixed-Modal Early-Fusion Foundation Models, https://arxiv.org/abs/2405.09818
17 May, Towards Modular LLMs by Building and Reusing a Library of LoRAs, https://arxiv.org/abs/2405.11157
19 May, SLAB: Efficient Transformers with Simplified Linear Attention and Progressive Re-parameterized Batch Normalization, https://arxiv.org/abs/2405.11582
20 May, MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning, https://arxiv.org/abs/2405.12130
22 May, Attention as an RNN, https://arxiv.org/abs/2405.13956
22 May, Dense Connector for MLLMs, https://arxiv.org/abs/2405.13800
23 May, AlignGPT: Multi-modal Large Language Models with Adaptive Alignment Capability, https://arxiv.org/abs/2405.14129
23 May, SimPO: Simple Preference Optimization with a Reference-Free Reward, https://arxiv.org/abs/2405.14734
23 May, Instruction Tuning With Loss Over Instructions, https://arxiv.org/abs/2405.14394
24 May, The Road Less Scheduled, https://arxiv.org/abs/2405.15682
26 May, Stacking Your Transformers: A Closer Look at Model Growth for Efficient LLM Pre-Training, https://arxiv.org/abs/2405.15319
26 May, gzip Predicts Data-dependent Scaling Laws, https://arxiv.org/abs/2405.16684
27 May, Trans-LoRA: Towards Data-free Transferable Parameter Efficient Finetuning, https://arxiv.org/abs/2405.17258
28 May, VeLoRA: Memory Efficient Training using Rank-1 Sub-Token Projections, https://arxiv.org/abs/2405.17991
28 May, LLaMA-NAS: Efficient Neural Architecture Search for Large Language Models, https://arxiv.org/abs/2405.18377
29 May, Contextual Position Encoding: Learning to Count What's Important, https://arxiv.org/abs/2405.18719

June 2024

2 Jun, Show, Don't Tell: Aligning Language Models with Demonstrated Feedback, https://arxiv.org/abs/2406.00888
3 Jun, Skywork-MoE: A Deep Dive into Training Techniques for Mixture-of-Experts Language Models, https://arxiv.org/abs/2406.06563
3 Jun, OLoRA: Orthonormal Low-Rank Adaptation of Large Language Models, https://arxiv.org/abs/2406.01775
3 Jun, The Geometry of Categorical and Hierarchical Concepts in Large Language Models, https://arxiv.org/abs/2406.01506
3 Jun, Towards Scalable Automated Alignment of LLMs: A Survey, https://arxiv.org/abs/2406.01252
4 Jun, Scalable MatMul-free Language Modeling, https://arxiv.org/abs/2406.02528
4 Jun, Block Transformer: Global-to-Local Language Modeling for Fast Inference, https://arxiv.org/abs/2406.02657
6 Jun, Buffer of Thoughts: Thought-Augmented Reasoning with Large Language Models, https://arxiv.org/abs/2406.04271
6 Jun, The Prompt Report: A Systematic Survey of Prompting Techniques, https://arxiv.org/abs/2406.06608
6 Jun, Transformers Need Glasses! Information Over-Squashing in Language Tasks, https://arxiv.org/abs/2406.04267
6 Jun, Are We Done with MMLU?, https://arxiv.org/abs/2406.04127
6 Jun, Step-aware Preference Optimization: Aligning Preference with Denoising Performance at Each Step, https://arxiv.org/abs/2406.04314
7 Jun, Boosting Large-scale Parallel Training Efficiency with C4: A Communication-Driven Approach, https://arxiv.org/abs/2406.04594
7 Jun, CRAG -- Comprehensive RAG Benchmark, https://arxiv.org/abs/2406.04744
7 Jun, WildBench: Benchmarking LLMs with Challenging Tasks from Real Users in the Wild, https://arxiv.org/abs/2406.04770
7 Jun, Mixture-of-Agents Enhances Large Language Model Capabilities, https://arxiv.org/abs/2406.04692
7 Jun, BERTs are Generative In-Context Learners, https://arxiv.org/abs/2406.04823
7 Jun, 3D-GRAND: A Million-Scale Dataset for 3D-LLMs with Better Grounding and Less Hallucination, https://arxiv.org/abs/2406.05132
8 Jun, Creativity Has Left the Chat: The Price of Debiasing Language Models, https://arxiv.org/abs/2406.05587
10 Jun, Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation, https://arxiv.org/abs/2406.06525
10 Jun, Margin-aware Preference Optimization for Aligning Diffusion Models Without Reference, https://arxiv.org/abs/2406.06424
10 Jun, Husky: A Unified, Open-Source Language Agent for Multi-Step Reasoning, https://arxiv.org/abs/2406.06469
10 Jun, Turbo Sparse: Achieving LLM SOTA Performance with Minimal Activated Parameters, https://arxiv.org/abs/2406.05955
10 Jun, Self-Tuning: Instructing LLMs to Effectively Acquire New Knowledge through Self-Teaching, https://arxiv.org/abs/2406.06326
11 Jun, An Image is Worth 32 Tokens for Reconstruction and Generation, https://arxiv.org/abs/2406.07550
11 Jun, TextGrad: Automatic "Differentiation" via Text, https://arxiv.org/abs/2406.07496
11 Jun, Simple and Effective Masked Diffusion Language Models, https://arxiv.org/abs/2406.07524
11 Jun, Never Miss A Beat: An Efficient Recipe for Context Window Extension of Large Language Models with Consistent "Middle" Enhancement, https://arxiv.org/abs/2406.07138
11 Jun, Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling, https://arxiv.org/abs/2406.07522
12 Jun, Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing, https://arxiv.org/abs/2406.08464
12 Jun, What If We Recaption Billions of Web Images with LLaMA-3?, https://arxiv.org/abs/2406.08478
12 Jun, Large Language Model Unlearning via Embedding-Corrupted Prompts, https://arxiv.org/abs/2406.07933
12 Jun, Large Language Models Must Be Taught to Know What They Don't Know, https://arxiv.org/abs/2406.08391
12 Jun, An Empirical Study of Mamba-based Language Models, https://arxiv.org/abs/2406.07887
12 Jun, Discovering Preference Optimization Algorithms with and for Large Language Models, https://arxiv.org/abs/2406.08414
13 Jun, Transformers Meet Neural Algorithmic Reasoners, https://arxiv.org/abs/2406.09308
13 Jun, MLKV: Multi-Layer Key-Value Heads for Memory Efficient Transformer Decoding, https://arxiv.org/abs/2406.09297
13 Jun, An Image is Worth More Than 16x16 Patches: Exploring Transformers on Individual Pixels, https://arxiv.org/abs/2406.09415
13 Jun, FouRA: Fourier Low Rank Adaptation, https://arxiv.org/abs/2406.08798
14 Jun, Bootstrapping Language Models with DPO Implicit Rewards, https://arxiv.org/abs/2406.09760
14 Jun, Be like a Goldfish, Don't Memorize! Mitigating Memorization in Generative LLMs, https://arxiv.org/abs/2406.10209
14 Jun, Regularizing Hidden States Enables Learning Generalizable Reward Model for LLMs, https://arxiv.org/abs/2406.10216
16 Jun, THEANINE: Revisiting Memory Management in Long-term Conversations with Timeline-augmented Response Generation, https://arxiv.org/abs/2406.10996
17 Jun, Task Me Anything, https://arxiv.org/abs/2406.11775
17 Jun, How Do Large Language Models Acquire Factual Knowledge During Pretraining?, https://arxiv.org/abs/2406.11813
17 Jun, mDPO: Conditional Preference Optimization for Multimodal Large Language Models, https://arxiv.org/abs/2406.11839
17 Jun, Nemotron-4 340B Technical Report, https://arxiv.org/abs/2406.11704
17 Jun, DataComp-LM: In Search of the Next Generation of Training Sets for Language Models, https://arxiv.org/abs/2406.11794
17 Jun, Tokenization Falling Short: The Curse of Tokenization, https://arxiv.org/abs/2406.11687
17 Jun, DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence, https://arxiv.org/abs/2406.11931
17 Jun, Unveiling Encoder-Free Vision-Language Models, https://arxiv.org/abs/2406.11832
17 Jun, Iterative Length-Regularized Direct Preference Optimization: A Case Study on Improving 7B Language Models to GPT-4 Level, https://arxiv.org/abs/2406.11817
17 Jun, HARE: HumAn pRiors, a key to small language model Efficiency, https://arxiv.org/abs/2406.11410
17 Jun, Measuring memorization in RLHF for code completion, https://arxiv.org/abs/2406.11715
17 Jun, Self-MoE: Towards Compositional Large Language Models with Self-Specialized Experts, https://arxiv.org/abs/2406.12034
18 Jun, From RAGs to Rich Parameters: Probing How Language Models Utilize External Knowledge Over Parametric Information for Factual Queries, https://arxiv.org/abs/2406.12824
18 Jun, Judging the Judges: Evaluating Alignment and Vulnerabilities in LLMs-as-Judges, https://arxiv.org/abs/2406.12624
19 Jun, Can Long-Context Language Models Subsume Retrieval, RAG, SQL, and More?, https://arxiv.org/abs/2406.13121
20 Jun, Instruction Pre-Training: Language Models are Supervised Multitask Learners, https://arxiv.org/abs/2406.14491
20 Jun, Can LLMs Learn by Teaching? A Preliminary Study, https://arxiv.org/abs/2406.14629
21 Jun, A Tale of Trust and Accuracy: Base vs. Instruct LLMs in RAG Systems, https://arxiv.org/abs/2406.14972
21 Jun, LongRAG: Enhancing Retrieval-Augmented Generation with Long-context LLMs, https://arxiv.org/abs/2406.15319
21 Jun, MoA: Mixture of Sparse Attention for Automatic Large Language Model Compression, https://arxiv.org/abs/2406.14909
21 Jun, Efficient Continual Pre-training by Mitigating the Stability Gap, https://arxiv.org/abs/2406.14833
24 Jun, Sparser is Faster and Less is More: Efficient Sparse Attention for Long-Range Transformers, https://arxiv.org/abs/2406.16747
24 Jun, WARP: On the Benefits of Weight Averaged Rewarded Policies, https://arxiv.org/abs/2406.16768
24 Jun, Adam-mini: Use Fewer Learning Rates To Gain More, https://arxiv.org/abs/2406.16793
25 Jun, The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale, https://arxiv.org/abs/2406.17557
25 Jun, LongIns: A Challenging Long-context Instruction-based Exam for LLMs, https://arxiv.org/abs/2406.17588
25 Jun, Following Length Constraints in Instructions, https://arxiv.org/abs/2406.17744
26 Jun, A Closer Look into Mixture-of-Experts in Large Language Models, https://arxiv.org/abs/2406.18219
26 Jun, RouteLLM: Learning to Route LLMs with Preference Data, https://arxiv.org/abs/2406.18665
26 Jun, Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs, https://arxiv.org/abs/2406.18629
27 Jun, Dataset Size Recovery from LoRA Weights, https://arxiv.org/abs/2406.19395
27 Jun, From Artificial Needles to Real Haystacks: Improving Retrieval Capabilities in LLMs by Finetuning on Synthetic Data, https://arxiv.org/abs/2406.19292
27 Jun, Changing Answer Order Can Decrease MMLU Accuracy, https://arxiv.org/abs/2406.19470
28 Jun, Direct Preference Knowledge Distillation for Large Language Models, https://arxiv.org/abs/2406.19774
28 Jun, LLM Critics Help Catch LLM Bugs, https://arxiv.org/abs/2407.00215
28 Jun, Scaling Synthetic Data Creation with 1,000,000,000 Personas, https://arxiv.org/abs/2406.20094

July 2024

1 Jul, LLM See, LLM Do: Guiding Data Generation to Target Non-Differentiable Objectives, https://arxiv.org/abs/2407.01490
1 Jul, Searching for Best Practices in Retrieval-Augmented Generation, https://arxiv.org/abs/2407.01219
1 Jul, Let the Expert Stick to His Last: Expert-Specialized Fine-Tuning for Sparse Architectural Large Language Models, https://arxiv.org/abs/2407.01906
1 Jul, Diffusion Forcing: Next-token Prediction Meets Full-Sequence Diffusion, https://arxiv.org/abs/2407.01392
1 Jul, Eliminating Position Bias of Language Models: A Mechanistic Approach, https://arxiv.org/abs/2407.01100
2 Jul, MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention, https://arxiv.org/abs/2407.02490
2 Jul, TokenPacker: Efficient Visual Projector for Multimodal LLM, https://arxiv.org/abs/2407.02392
2 Jul, Reasoning in Large Language Models: A Geometric Perspective, https://arxiv.org/abs/2407.02678
2 Jul, RankRAG: Unifying Context Ranking with Retrieval-Augmented Generation in LLMs, https://arxiv.org/abs/2407.02485
3 Jul, AgentInstruct: Toward Generative Teaching with Agentic Flows, https://arxiv.org/abs/2407.03502
3 Jul, HEMM: Holistic Evaluation of Multimodal Foundation Models, https://arxiv.org/abs/2407.03418
4 Jul, Mixture of A Million Experts, https://arxiv.org/abs/2407.04153
5 Jul, Learning to (Learn at Test Time): RNNs with Expressive Hidden States, https://arxiv.org/abs/2407.04620
9 Jul, Vision Language Models Are Blind, https://arxiv.org/abs/2407.06581
9 Jul, Self-Recognition in Language Models, https://arxiv.org/abs/2407.06946
10 Jul, Inference Performance Optimization for Large Language Models on CPUs, https://arxiv.org/abs/2407.07304
11 Jul, Gradient Boosting Reinforcement Learning, https://arxiv.org/abs/2407.08250
11 Jul, FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision, https://arxiv.org/abs/2407.08608
12 Jul, SpreadsheetLLM: Encoding Spreadsheets for Large Language Models, https://arxiv.org/abs/2407.09025
12 Jul, New Desiderata for Direct Preference Optimization, https://arxiv.org/abs/2407.09072
12 Jul, Context Embeddings for Efficient Answer Generation in RAG, https://arxiv.org/abs/2407.09252
15 Jul, Qwen2 Technical Report, https://arxiv.org/abs/2407.10671
15 Jul, The Good, The Bad, and The Greedy: Evaluation of LLMs Should Not Ignore Non-Determinism, https://arxiv.org/abs/2407.10457
15 Jul, From GaLore to WeLore: How Low-Rank Weights Non-uniformly Emerge from Low-Rank Gradients, https://arxiv.org/abs/2407.11239
16 Jul, GoldFinch: High Performance RWKV/Transformer Hybrid with Linear Pre-Fill and Extreme KV-Cache Compression, https://arxiv.org/abs/2407.12077
16 Jul, Scaling Diffusion Transformers to 16 Billion Parameters, https://arxiv.org/abs/2407.11633
16 Jul, NeedleBench: Can LLMs Do Retrieval and Reasoning in 1 Million Context Window?, https://arxiv.org/abs/2407.11963
17 Jul, Patch-Level Training for Large Language Models, https://arxiv.org/abs/2407.12665
17 Jul, LMMs-Eval: Reality Check on the Evaluation of Large Multimodal Models, https://arxiv.org/abs/2407.12772
17 Jul, A Survey of Prompt Engineering Methods in Large Language Models for Different NLP Tasks, https://arxiv.org/abs/2407.12994
17 Jul, Spectra: A Comprehensive Study of Ternary, Quantized, and FP16 Language Models, https://arxiv.org/abs/2407.12327
18 Jul, Attention Overflow: Language Model Input Blur during Long-Context Missing Items Recommendation, https://arxiv.org/abs/2407.13481
18 Jul, Weak-to-Strong Reasoning, https://arxiv.org/abs/2407.13647
18 Jul, Understanding Reference Policies in Direct Preference Optimization, https://arxiv.org/abs/2407.13709
18 Jul, Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies, https://arxiv.org/abs/2407.13623
19 Jul, BOND: Aligning LLMs with Best-of-N Distillation, https://arxiv.org/abs/2407.14622
19 Jul, Compact Language Models via Pruning and Knowledge Distillation, https://arxiv.org/abs/2407.14679
19 Jul, LazyLLM: Dynamic Token Pruning for Efficient Long Context LLM Inference, https://arxiv.org/abs/2407.14057
22 Jul, Mini-Sequence Transformer: Optimizing Intermediate Memory for Long Sequences Training, https://arxiv.org/abs/2407.15892
22 Jul, DDK: Distilling Domain Knowledge for Efficient Large Language Models, https://arxiv.org/abs/2407.16154
23 Jul, Generation Constraint Scaling Can Mitigate Hallucination, https://arxiv.org/abs/2407.16908
23 Jul, Retrieval Augmented Generation or Long-Context LLMs? A Comprehensive Study and Hybrid Approach, https://arxiv.org/abs/2407.16833
23 Jul, Course-Correction: Safety Alignment Using Synthetic Preferences, https://arxiv.org/abs/2407.16637
26 Jul, Data Mixture Inference: What do BPE Tokenizers Reveal about their Training Data?, https://arxiv.org/abs/2407.16607
28 Jul, Meta-Rewarding Language Models: Self-Improving Alignment with LLM-as-a-Meta-Judge, https://arxiv.org/abs/2407.19594
29 Jul, Improving Retrieval Augmented Language Model with Self-Reasoning, https://arxiv.org/abs/2407.19813
29 Jul, Apple Intelligence Foundation Language Models, https://arxiv.org/abs/2407.21075
30 Jul, ThinK: Thinner Key Cache by Query-Driven Pruning, https://arxiv.org/abs/2407.21018
31 Jul, The Llama 3 Herd of Models, https://arxiv.org/abs/2407.21783
31 Jul, Gemma 2: Improving Open Language Models at a Practical Size, https://arxiv.org/abs/2408.00118

August 2024

1 Aug, SAM 2: Segment Anything in Images and Videos, https://arxiv.org/abs/2408.00714
2 Aug, POA: Pre-training Once for Models of All Sizes, https://arxiv.org/abs/2408.01031
2 Aug, RAGEval: Scenario Specific RAG Evaluation Dataset Generation Framework, https://arxiv.org/abs/2408.01262
2 Aug, A Survey of Mamba, https://arxiv.org/abs/2408.01129
3 Aug, MiniCPM-V: A GPT-4V Level MLLM on Your Phone, https://arxiv.org/abs/2408.01800
5 Aug, RAG Foundry: A Framework for Enhancing LLMs for Retrieval Augmented Generation, https://arxiv.org/abs/2408.02545
5 Aug, Self-Taught Evaluators, https://arxiv.org/abs/2408.02666
5 Aug, BioMamba: A Pre-trained Biomedical Language Representation Model Leveraging Mamba, https://arxiv.org/abs/2408.02600
7 Aug, EXAONE 3.0 7.8B Instruction Tuned Language Model, https://arxiv.org/abs/2408.03541
7 Aug, 1.5-Pints Technical Report: Pretraining in Days, Not Months -- Your Language Model Thrives on Quality Data, https://arxiv.org/abs/2408.03506
8 Aug, Conversational Prompt Engineering, https://arxiv.org/abs/2408.04560
8 Aug, Trans-Tokenization and Cross-lingual Vocabulary Transfers: Language Adaptation of LLMs for Low-Resource NLP, https://arxiv.org/abs/2408.04303
12 Aug, The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery, https://arxiv.org/abs/2408.06292
15 Aug, Hermes 3 Technical Report, https://arxiv.org/abs/2408.11857
19 Aug, Customizing Language Models with Instance-wise LoRA for Sequential Recommendation, https://arxiv.org/abs/2408.10159
20 Aug, Enhancing Robustness in Large Language Models: Prompting for Mitigating the Impact of Irrelevant Information, https://arxiv.org/abs/2408.10615
20 Aug, To Code, or Not To Code? Exploring Impact of Code in Pre-training, https://arxiv.org/abs/2408.10914
21 Aug, LLM Pruning and Distillation in Practice: The Minitron Approach, https://arxiv.org/abs/2408.11796
22 Aug, Jamba-1.5: Hybrid Transformer-Mamba Models at Scale, https://arxiv.org/abs/2408.12570
22 Aug, Controllable Text Generation for Large Language Models: A Survey, https://arxiv.org/abs/2408.12599
23 Aug, Multi-Layer Transformers Gradient Can be Approximated in Almost Linear Time, https://arxiv.org/abs/2408.13233
26 Aug, A Practitioner's Guide to Continual Multimodal Pretraining, https://arxiv.org/abs/2408.14471
26 Aug, Building and Better Understanding Vision-Language Models: Insights and Future Directions, https://arxiv.org/abs/2408.12637
26 Aug, CURLoRA: Stable LLM Continual Fine-Tuning and Catastrophic Forgetting Mitigation, https://arxiv.org/abs/2408.14572
27 Aug, The Mamba in the Llama: Distilling and Accelerating Hybrid Models, https://arxiv.org/abs/2408.15237
28 Aug, ReMamba: Equip Mamba with Effective Long-Sequence Modeling, https://arxiv.org/abs/2408.15496
29 Aug, Smaller, Weaker, Yet Better: Training LLM Reasoners via Compute-Optimal Sampling, https://arxiv.org/abs/2408.16737
31 Aug, LongRecipe: Recipe for Efficient Long Context Generalization in Large Language Models, https://arxiv.org/abs/2409.00509

September 2024

3 Sep, OLMoE: Open Mixture-of-Experts Language Models, https://arxiv.org/abs/2409.02060
3 Sep, In Defense of RAG in the Era of Long-Context Language Models, https://arxiv.org/abs/2409.01666
5 Sep, Attention Heads of Large Language Models: A Survey, https://arxiv.org/abs/2409.03752
5 Sep, LongCite: Enabling LLMs to Generate Fine-grained Citations in Long-context QA, https://arxiv.org/abs/2409.02897
5 Sep, How Do Your Code LLMs Perform? Empowering Code Instruction Tuning with High-Quality Data, https://arxiv.org/abs/2409.03810
6 Sep, Theory, Analysis, and Best Practices for Sigmoid Self-Attention, https://arxiv.org/abs/2409.04431
10 Sep, LLaMA-Omni: Seamless Speech Interaction with Large Language Models, https://arxiv.org/abs/2409.06666
10 Sep, What is the Role of Small Models in the LLM Era: A Survey, https://arxiv.org/abs/2409.06857
11 Sep, Policy Filtration in RLHF to Fine-Tune LLM for Code Generation, https://arxiv.org/abs/2409.06957
16 Sep, RetrievalAttention: Accelerating Long-Context LLM Inference via Vector Retrieval, https://arxiv.org/abs/2409.10516
18 Sep, Qwen2.5-Math Technical Report: Toward Mathematical Expert Model via Self-Improvement, https://arxiv.org/abs/2409.12122
18 Sep, Qwen2.5-Coder Technical Report, https://arxiv.org/abs/2409.12186
21 Sep, Instruction Following without Instruction Tuning, https://arxiv.org/abs/2409.14254
30 Sep, Is Preference Alignment Always the Best Option to Enhance LLM-Based Translation? An Empirical Analysis, https://arxiv.org/abs/2409.20059
30 Sep, The Perfect Blend: Redefining RLHF with Mixture of Judges, https://arxiv.org/abs/2409.20370 (a new paper by Meta on how they did RLHF for Llama 3)

October 2024

1 Oct, Addition is All You Need for Energy-efficient Language Models, https://arxiv.org/abs/2410.00907
2 Oct, Quantifying Generalization Complexity for Large Language Models, https://arxiv.org/abs/2410.01769
2 Oct, When a language model is optimized for reasoning, does it still show embers of autoregression? An analysis of OpenAI o1, https://arxiv.org/abs/2410.01792
2 Oct, Were RNNs All We Needed?, https://arxiv.org/abs/2410.01201
3 Oct, Selective Attention Improves Transformer, https://arxiv.org/abs/2410.02703
3 Oct, LLMs Know More Than They Show: On the Intrinsic Representation of LLM Hallucinations, https://arxiv.org/abs/2410.02707
3 Oct, LLaVA-Critic: Learning to Evaluate Multimodal Models, https://arxiv.org/abs/2410.02712
7 Oct, Differential Transformer, https://arxiv.org/abs/2410.05258
7 Oct, GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models, https://arxiv.org/abs/2410.05229
8 Oct, ARIA: An Open Multimodal Native Mixture-of-Experts Model, https://arxiv.org/abs/2410.05993
8 Oct, O1 Replication Journey: A Strategic Progress Report -- Part 1, https://arxiv.org/abs/2410.18982
8 Oct, Long-Context LLMs Meet RAG: Overcoming Challenges for Long Inputs in RAG, https://arxiv.org/abs/2410.05983
9 Oct, From Generalist to Specialist: Adapting Vision Language Models via Task-Specific Visual Instruction Tuning, https://arxiv.org/abs/2410.06456
10 Oct, KV Prediction for Improved Time to First Token, https://arxiv.org/abs/2410.08391
11 Oct, Baichuan-Omni Technical Report, https://arxiv.org/abs/2410.08565
13 Oct, MMIE: Massive Multimodal Interleaved Comprehension Benchmark for Large Vision-Language Models, https://arxiv.org/abs/2410.10139
13 Oct, LOKI: A Comprehensive Synthetic Data Detection Benchmark using Large Multimodal Models, https://arxiv.org/abs/2410.09732
15 Oct, AFlow: Automating Agentic Workflow Generation, https://arxiv.org/abs/2410.10762
15 Oct, Toward General Instruction-Following Alignment for Retrieval-Augmented Generation, https://arxiv.org/abs/2410.09584
21 Oct, Pre-training Distillation for Large Language Models: A Design Space Exploration, https://arxiv.org/abs/2410.16215
23 Oct, MIA-DPO: Multi-Image Augmented Direct Preference Optimization For Large Vision-Language Models, https://arxiv.org/abs/2410.17637
23 Oct, Scalable Ranked Preference Optimization for Text-to-Image Generation, https://arxiv.org/abs/2410.18013
23 Oct, Scaling Diffusion Language Models via Adaptation from Autoregressive Models, https://arxiv.org/abs/2410.17891
24 Oct, Hybrid Preferences: Learning to Route Instances for Human vs. AI Feedback, https://arxiv.org/abs/2410.19133
25 Oct, Counting Ability of Large Language Models and Impact of Tokenization, https://arxiv.org/abs/2410.19730
25 Oct, A Survey of Small Language Models, https://arxiv.org/abs/2410.20011
26 Oct, Accelerating Direct Preference Optimization with Prefix Sharing, https://arxiv.org/abs/2410.20305
27 Oct, Mind Your Step (by Step): Chain-of-Thought can Reduce Performance on Tasks where Thinking Makes Humans Worse, https://arxiv.org/abs/2410.21333
28 Oct, LongReward: Improving Long-context Large Language Models with AI Feedback, https://arxiv.org/abs/2410.21252
28 Oct, ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference, https://arxiv.org/abs/2410.21465
29 Oct, Beyond Text: Optimizing RAG with Multimodal Inputs for Industrial Applications, https://arxiv.org/abs/2410.21943
30 Oct, CORAL: Benchmarking Multi-turn Conversational Retrieval-Augmentation Generation, https://arxiv.org/abs/2410.23090
31 Oct, What Happened in LLMs Layers when Trained for Fast vs. Slow Thinking: A Gradient Perspective, https://arxiv.org/abs/2410.23743
31 Oct, GPT or BERT: why not both?, https://arxiv.org/abs/2410.24159
31 Oct, Language Models can Self-Lengthen to Generate Long Texts, https://arxiv.org/abs/2410.23933

November 2024

1 Nov, Adding Error Bars to Evals: A Statistical Approach to Language Model Evaluations, https://arxiv.org/abs/2411.00640
1 Nov, Adapting While Learning: Grounding LLMs for Scientific Problems with Intelligent Tool Usage Adaptation, https://arxiv.org/abs/2411.00412
1 Nov, Multi-expert Prompting Improves Reliability, Safety, and Usefulness of Large Language Models, https://arxiv.org/abs/2411.00492
3 Nov, Sample-Efficient Alignment for LLMs, https://arxiv.org/abs/2411.01493
4 Nov, A Comprehensive Survey of Small Language Models in the Era of Large Language Models: Techniques, Enhancements, Applications, Collaboration with LLMs, and Trustworthiness, https://arxiv.org/abs/2411.03350
4 Nov, "Give Me BF16 or Give Me Death"? Accuracy-Performance Trade-Offs in LLM Quantization, https://arxiv.org/abs/2411.02355
4 Nov, Parameter-Efficient Fine-Tuning of Large Language Models for Unit Test Generation: An Empirical Study, https://arxiv.org/abs/2411.02462
5 Nov, HtmlRAG: HTML is Better Than Plain Text for Modeling Retrieved Knowledge in RAG Systems, https://arxiv.org/abs/2411.02959
6 Nov, Both Text and Images Leaked! A Systematic Analysis of Multimodal LLM Data Contamination, https://arxiv.org/abs/2411.03823
6 Nov, Language Models are Hidden Reasoners: Unlocking Latent Reasoning Capabilities via Self-Rewarding, https://arxiv.org/abs/2411.04282
6 Nov, Number Cookbook: Number Understanding of Language Models and How to Improve It, https://arxiv.org/abs/2411.03766
7 Nov, Mixture-of-Transformers: A Sparse and Scalable Architecture for Multi-Modal Foundation Models, https://arxiv.org/abs/2411.04996
7 Nov, BitNet a4.8: 4-bit Activations for 1-bit LLMs, https://arxiv.org/abs/2411.04965
7 Nov, Scaling Laws for Precision, https://arxiv.org/abs/2411.04330
8 Nov, Energy Efficient Protein Language Models: Leveraging Small Language Models with LoRA for Controllable Protein Generation, https://arxiv.org/abs/2411.05966
8 Nov, Balancing Pipeline Parallelism with Vocabulary Parallelism, https://arxiv.org/abs/2411.05288
11 Nov, Toward Optimal Search and Retrieval for RAG, https://arxiv.org/abs/2411.07396
12 Nov, Large Language Models Can Self-Improve in Long-context Reasoning, https://arxiv.org/abs/2411.08147
12 Nov, Stronger Models are NOT Stronger Teachers for Instruction Tuning, https://arxiv.org/abs/2411.07133
12 Nov, Direct Preference Optimization Using Sparse Feature-Level Constraints, https://arxiv.org/abs/2411.07618
13 Nov, Cut Your Losses in Large-Vocabulary Language Models, https://arxiv.org/abs/2411.09009
15 Nov, Does Prompt Formatting Have Any Impact on LLM Performance?, https://arxiv.org/abs/2411.10541
17 Nov, SymDPO: Boosting In-Context Learning of Large Multimodal Models with Symbol Demonstration Direct Preference Optimization, https://arxiv.org/abs/2411.11909
17 Nov, SageAttention2 Technical Report: Accurate 4 Bit Attention for Plug-and-play Inference Acceleration, https://arxiv.org/abs/2411.10958
18 Nov, Bi-Mamba: Towards Accurate 1-Bit State Space Models, https://arxiv.org/abs/2411.11843
19 Nov, RedPajama: an Open Dataset for Training Large Language Models, https://arxiv.org/abs/2411.12372
20 Nov, Hymba: A Hybrid-head Architecture for Small Language Models, https://arxiv.org/abs/2411.13676
20 Nov, Loss-to-Loss Prediction: Scaling Laws for All Datasets, https://arxiv.org/abs/2411.12925
21 Nov, When Precision Meets Position: BFloat16 Breaks Down RoPE in Long-Context Training, https://arxiv.org/abs/2411.13476
21 Nov, Multimodal Autoregressive Pre-training of Large Vision Encoders, https://arxiv.org/abs/2411.14402
21 Nov, Natural Language Reinforcement Learning, https://arxiv.org/abs/2411.14251
22 Nov, Large Multi-modal Models Can Interpret Features in Large Multi-modal Models, https://arxiv.org/abs/2411.14982
22 Nov, TÜLU 3: Pushing Frontiers in Open Language Model Post-Training, https://arxiv.org/abs/2411.15124
23 Nov, MME-Survey: A Comprehensive Survey on Evaluation of Multimodal LLMs, https://arxiv.org/abs/2411.15296
24 Nov, LLMs Do Not Think Step-by-step In Implicit Reasoning, https://arxiv.org/abs/2411.15862
25 Nov, O1 Replication Journey -- Part 2: Surpassing O1-preview through Simple Distillation, Big Progress or Bitter Lesson?, https://arxiv.org/abs/2411.16489
26 Nov, Star Attention: Efficient LLM Inference over Long Sequences, https://arxiv.org/abs/2411.17116
27 Nov, Low-Bit Quantization Favors Undertrained LLMs: Scaling Laws for Quantized LLMs with 100T Training Tokens, https://arxiv.org/abs/2411.17691
27 Nov, Rethinking Token Reduction in MLLMs: Towards a Unified Paradigm for Training-Free Acceleration, https://arxiv.org/abs/2411.17686
29 Nov, Reverse Thinking Makes LLMs Stronger Reasoners, https://arxiv.org/abs/2411.19865
29 Nov, Critical Tokens Matter: Token-Level Contrastive Estimation Enhances LLM's Reasoning Capability, https://arxiv.org/abs/2411.19943

December 2024

2 Dec, Switti: Designing Scale-Wise Transformers for Text-to-Image Synthesis, https://arxiv.org/abs/2412.01819
2 Dec, X-Prompt: Towards Universal In-Context Image Generation in Auto-Regressive Vision Language Foundation Models, https://arxiv.org/abs/2412.01824
2 Dec, Free Process Rewards without Process Labels, https://arxiv.org/abs/2412.01981
3 Dec, Scaling Image Tokenizers with Grouped Spherical Quantization, https://arxiv.org/abs/2412.02632
3 Dec, RARE: Retrieval-Augmented Reasoning Enhancement for Large Language Models, https://arxiv.org/abs/2412.02830
4 Dec, Perception Tokens Enhance Visual Reasoning in Multimodal Language Models, https://arxiv.org/abs/2412.03548
4 Dec, Evaluating Language Models as Synthetic Data Generators, https://arxiv.org/abs/2412.03679
4 Dec, Best-of-N Jailbreaking, https://arxiv.org/abs/2412.03556
4 Dec, PaliGemma 2: A Family of Versatile VLMs for Transfer, https://arxiv.org/abs/2412.03555
5 Dec, VisionZip: Longer is Better but Not Necessary in Vision Language Models, https://arxiv.org/abs/2412.04467
5 Dec, Evaluating and Aligning CodeLLMs on Human Preference, https://arxiv.org/abs/2412.05210
6 Dec, MAmmoTH-VL: Eliciting Multimodal Reasoning with Instruction Tuning at Scale, https://arxiv.org/abs/2412.05237
6 Dec, Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling, https://arxiv.org/abs/2412.05271
7 Dec, LLMs-as-Judges: A Comprehensive Survey on LLM-based Evaluation Methods, https://arxiv.org/abs/2412.05579
8 Dec, Does RLHF Scale? Exploring the Impacts From Data, Model, and Method, https://arxiv.org/abs/2412.06000
9 Dec, Unraveling the Complexity of Memory in RL Agents: An Approach for Classification and Evaluation, https://arxiv.org/abs/2412.06531
9 Dec, Training Large Language Models to Reason in a Continuous Latent Space, https://arxiv.org/abs/2412.06769
9 Dec, AutoReason: Automatic Few-Shot Reasoning Decomposition, https://arxiv.org/abs/2412.06975
11 Dec, Large Concept Models: Language Modeling in a Sentence Representation Space, https://arxiv.org/abs/2412.08821
12 Dec, Phi-4 Technical Report, https://arxiv.org/abs/2412.08905
13 Dec, Byte Latent Transformer: Patches Scale Better Than Tokens, https://arxiv.org/abs/2412.09871
13 Dec, SCBench: A KV Cache-Centric Analysis of Long-Context Methods, https://arxiv.org/abs/2412.10319
13 Dec, Cultural Evolution of Cooperation among LLM Agents, https://arxiv.org/abs/2412.10270
13 Dec, DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding, https://arxiv.org/abs/2412.10302
16 Dec, No More Adam: Learning Rate Scaling at Initialization is All You Need, https://arxiv.org/abs/2412.11768
16 Dec, Precise Length Control in Large Language Models, https://arxiv.org/abs/2412.11937
16 Dec, The Open Source Advantage in Large Language Models (LLMs), https://arxiv.org/abs/2412.12004
16 Dec, A Survey of Mathematical Reasoning in the Era of Multimodal Large Language Model: Benchmark, Method & Challenges, https://arxiv.org/abs/2412.11936
17 Dec, Are Your LLMs Capable of Stable Reasoning?, https://arxiv.org/abs/2412.13147
18 Dec, LLM Post-Training Recipes, Improving Reasoning in LLMs, https://arxiv.org/abs/2412.14135
18 Dec, Hansel: Output Length Controlling Framework for Large Language Models, https://arxiv.org/abs/2412.14033
18 Dec, Mind Your Theory: Theory of Mind Goes Deeper Than Reasoning, https://arxiv.org/abs/2412.13631
18 Dec, Alignment Faking in Large Language Models, https://arxiv.org/abs/2412.14093
18 Dec, SCOPE: Optimizing Key-Value Cache Compression in Long-Context Generation, https://arxiv.org/abs/2412.13649
19 Dec, LongBench v2: Towards Deeper Understanding and Reasoning on Realistic Long-Context Multitasks, https://arxiv.org/abs/2412.15204
20 Dec, Offline Reinforcement Learning for LLM Multi-Step Reasoning, https://arxiv.org/abs/2412.16145
24 Dec, Mulberry: Empowering MLLM with O1-like Reasoning and Reflection via Collective Monte Carlo Tree Search, https://arxiv.org/abs/2412.18319
31 Dec, Titans: Learning to Memorize at Test Time, https://arxiv.org/abs/2501.00663

This magazine is a personal passion project. For those who wish to support me, please consider purchasing a copy of my Build a Large Language Model (From Scratch) book. (I am confident that you'll get a lot out of this book, as it explains how LLMs work at a level of detail not found anywhere else.)

Build a Large Language Model (From Scratch), now available on Amazon

If you read the book and have a few minutes to spare, I'd really appreciate a brief review. It helps us authors a lot!

Alternatively, I also recently enabled the paid subscription option on Substack to support this magazine directly.
To receive new posts and support my work, consider becoming a free or paid subscriber.