CrypLogger.com
Meta Introduces Megabyte System to Solve GPT Issues

by Vaibhav
May 26, 2023

Author: Anna Kuznetsova · Published 05/26/2023 · Updated 05/26/2023

Meta AI recently published a preliminary study demonstrating a radical new “Megabyte” framework for building generative pre-trained transformer (GPT) systems.

Called “promising” by Andrej Karpathy of OpenAI, a former director of artificial intelligence at Tesla, the new architecture is designed to process large amounts of data such as images, novels and video files without using a process known as tokenization.


Promising. Everyone should hope that we can throw away tokenization in LLMs. Doing so naively creates (byte-level) sequences that are too long, so the devil is in the details.

Tokenization means that LLMs are not actually fully end-to-end. There is a whole separate stage with… https://t.co/t240ZPxPm7

— Andrej Karpathy (@karpathy) May 15, 2023

Tokenization is a lossy process comparable to file compression. To process large amounts of data, GPT models convert bytes into tokens. The tokens are then processed by the transformer and used to generate output tokens, which are decoded back into text.


The tokenization process allows the AI system to represent long strings of data as numbers. For example, when the words “my favorite color is red” are processed by OpenAI’s ChatGPT, they are converted to the token string “3666, 4004, 3124, 318, 2266, 13” for processing.
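The encode-then-decode round trip can be sketched with a toy tokenizer. This is not OpenAI’s actual tokenizer (which uses byte-pair encoding over a vocabulary of tens of thousands of tokens); the five-entry vocabulary below is invented purely for illustration:

```python
# Toy illustration of tokenization: text -> token ids -> text.
# NOT OpenAI's tokenizer; the vocabulary below is made up for this sketch.
vocab = {"my": 0, " favorite": 1, " color": 2, " is": 3, " red": 4}
inverse = {i: t for t, i in vocab.items()}

def encode(text, merges):
    # Greedily match the longest known token at each position.
    tokens, i = [], 0
    while i < len(text):
        match = max((t for t in merges if text.startswith(t, i)),
                    key=len, default=None)
        if match is None:
            raise ValueError(f"no token covers position {i}")
        tokens.append(merges[match])
        i += len(match)
    return tokens

def decode(tokens):
    return "".join(inverse[t] for t in tokens)

ids = encode("my favorite color is red", vocab)
print(ids)          # [0, 1, 2, 3, 4]
print(decode(ids))  # my favorite color is red
```

Note that the sentence becomes 5 tokens instead of 24 raw bytes: this compression is exactly what makes naive byte-level sequences “too long,” as Karpathy’s tweet above points out.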

OpenAI demo of the tokenization process. Source: OpenAI

Unfortunately, even with tokenization, the amount of data that today’s systems can handle has a hard limit. For GPT-3.5 the limit is just over 4,000 tokens, or around 3,000 words, while for GPT-4 the maximum is around 32,000 tokens, or around 24,000 words.
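The token-to-word figures above follow from the common rule of thumb of roughly 0.75 English words per token. A quick sanity check (the ratio is an approximation that varies with the text, not an exact property of the models):

```python
# Rough sanity check of the context-window figures quoted above,
# using the common heuristic of ~0.75 English words per token.
WORDS_PER_TOKEN = 0.75  # approximation; varies by text and tokenizer

def approx_words(token_limit):
    return int(token_limit * WORDS_PER_TOKEN)

print(approx_words(4096))   # ~3072 words for GPT-3.5's context window
print(approx_words(32768))  # ~24576 words for GPT-4's largest context window
```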

Meta’s new Megabyte system eschews tokenization in favor of a multi-scale prediction architecture capable of end-to-end modeling of sequences of more than 1 million bytes.
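The first step of that architecture is to cut the raw byte stream into fixed-size patches; a large “global” model then attends over the sequence of patches while a small “local” model predicts the bytes inside each patch. Only the patching step is sketched below, with an illustrative patch size (the Megabyte paper explores several):

```python
# Minimal sketch of Megabyte's first step: splitting a raw byte stream
# into fixed-size patches. The patch size here is illustrative only.
PATCH_SIZE = 8

def to_patches(data: bytes, patch_size: int = PATCH_SIZE):
    # Pad the stream so its length is a multiple of the patch size,
    # then cut it into equal-length patches.
    pad = (-len(data)) % patch_size
    data = data + b"\x00" * pad
    return [data[i:i + patch_size] for i in range(0, len(data), patch_size)]

patches = to_patches("War and Peace".encode("utf-8"))
print(len(patches))  # 2 patches for this 13-byte string
print(patches[0])    # b'War and '
```

Because the expensive global model sees one position per patch rather than one per byte, the effective sequence length it must handle shrinks by the patch size, which is what makes million-byte end-to-end modeling tractable.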


Standard English text encodings use 8 bits per character, so each character occupies one byte of data. Thus, an artificial intelligence system capable of processing 1 million bytes of data without tokenization could work with text documents containing 750,000 words, roughly 3,025% more than GPT-4.

By comparison, GPT-4 can currently process about 10 full-length news articles per prompt, while Megabyte would be able to analyze the whole of Leo Tolstoy’s War and Peace plus two more medium-length novels.

Meta’s Megabyte model also performed well in ImageNet tests and in benchmarks related to audio file processing, either equaling or outperforming existing byte-level transformer models such as DeepMind’s Perceiver AR in both cases:

“Megabyte matches the state-of-the-art performance of PerceiverAR using only half of the computation.”

The implications of this research could be far-reaching. Tokenization is considered a barrier in this field due to its severe data limitations and the amount of energy and time required to train systems.


Without tokenization, it should be possible to train AI models with stronger fundamental support for non-English languages, especially those that cannot be easily encoded with standard 8-bit characters.
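The point about languages that don’t fit an 8-bit encoding is visible directly in UTF-8, where Latin characters take one byte each but many other scripts take two to four. A byte-level model sees these longer sequences natively instead of depending on a tokenizer vocabulary that is typically skewed toward English:

```python
# UTF-8 byte lengths per character for different scripts: Latin text fits
# one byte per character, but many scripts need 2-4 bytes per character,
# which is part of why tokenizer vocabularies tend to favor English.
samples = {
    "English": "red",
    "Russian": "красный",
    "Japanese": "赤",
}
for lang, word in samples.items():
    encoded = word.encode("utf-8")
    print(lang, len(word), "chars ->", len(encoded), "bytes")
# English: 3 chars -> 3 bytes; Russian: 7 chars -> 14 bytes; Japanese: 1 char -> 3 bytes
```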

This could lead to further democratization of these technologies, allowing everything from cryptocurrency trading bots to decentralized autonomous organization tooling to be built with native-language support around the world.

Related: Sam Altman’s Worldcoin Secures $115M for Decentralized Identity

It would also enhance the ability of models like ChatGPT to work with images, video and audio files, generating media clips in roughly the same time and with roughly the same energy consumption as text.

Email: contact@cryplogger.com

© 2021-23 Cryplogger.com
CrypLogger is a cult magazine about bitcoin, blockchain technology and the digital economy. Every day we supply news and analytics on the cryptocurrency market since 2021.
