AIGC: I Am Not a Vassal of the Metaverse


"If productivity is mature, will imagination be far behind?"

Author | Chen Caixian Editor | Cen Feng

Source: Leiphone

Note: Babbitt is authorized to reprint this article. To reprint it elsewhere, please apply for authorization on the official Leiphone website.

Image source: generated with the Boundless Layout AI tool.

"When did you notice human beings?"

"When the first primitive man began to look up at the stars."

AI apes have long looked up to humans.

01 Examination from the machine

In just the past two years, Wang Chaoyue, an algorithm practitioner, has been stunned by AI twice.

The first time was when OpenAI presented the AI painting product DALL·E in March last year. Type a sentence into a computer, and DALL·E understands it and automatically generates an image with the corresponding meaning, an image that has never existed anywhere on the internet before.

All communication across "species" is a leap for civilization, and a response from an unfamiliar machine system produces the same shock and curiosity as a UFO sighting. In a modern society where people drift ever further apart, a machine seems able to read a person's heart.

"You can clearly feel its progress compared with GAN (a generative network introduced in 2014). The technology behind DALL·E is revolutionary," Wang Chaoyue told Leiphone.

The second time was when Google released PaLM, a 540-billion-parameter model, in April this year. As the parameter count grew, PaLM's ability to understand text and reason logically improved greatly. It can even explain a joke in writing and tell readers why the joke is funny.

Before that, the most common jibe at AI was that its reasoning ability was very weak, like that of a three-year-old child. With the arrival of large models, however, AI can do arithmetic and logical reasoning, and in some respects its intelligence has approached or even surpassed that of humans. Wang Chaoyue gave an example: "There are many jokes I can't understand right away, but the model can explain them to me, which shows that it does better than I do on some language understanding tasks."

Wang Chaoyue is a senior researcher in generative AI. He has followed AIGC-related research since GAN was released in 2014. At that time GAN was the hot topic in deep generative networks, but the excitement was far smaller than that around the AIGC breakthroughs of the past two years. The two technologies mentioned above became the "fuse" that ignited the AI revelry in the second half of this year:

CLIP, the key technology behind DALL·E, gives the two modalities of text and image a meeting point where they can "talk" to each other, and it became the cornerstone of DALL·E, DALL·E 2, Stable Diffusion and other breakthrough results. Meanwhile, although language models like PaLM burn money, their ability to understand human language has improved rapidly, which is the precondition for AI to understand people.
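To make the "meeting point" idea concrete, here is a minimal sketch, assuming that text and images have already been mapped into one shared vector space. The three-dimensional vectors, the file names, and the helper functions are all hypothetical toy values for illustration; a real CLIP model learns much larger embeddings from paired text and image data, and nothing below calls an actual CLIP library.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical toy embeddings; a trained model would produce these.
text_embedding = [0.9, 0.1, 0.0]  # stands in for "a photo of a flamingo"
image_embeddings = {
    "flamingo.jpg": [0.8, 0.2, 0.1],
    "pool.jpg": [0.1, 0.9, 0.2],
    "building.jpg": [0.0, 0.2, 0.9],
}

def best_match(text_vec, images):
    """Return the image whose embedding lies closest to the text embedding."""
    return max(images, key=lambda name: cosine(text_vec, images[name]))

best = best_match(text_embedding, image_embeddings)
print(best)
```

Because both modalities live in the same space, comparing a sentence with a picture reduces to comparing two vectors, which is what lets a text prompt steer image generation.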

"AI technology has advanced really fast in the past two years," said Lan Zhenzhong, founder of Xinchen Technology (the Inception Master team). He often reads papers with great excitement: "Not long after CLIP came out, MAE (an approach proposed by He Kaiming's team that carries a method that performs well on language tasks over to vision tasks) came out, followed by Stable Diffusion..."

After Stable Diffusion launched in August this year, Lan Zhenzhong and his team moved quickly. In less than a month they launched the AI painting product "Inception Master", which quickly became popular in China. It can produce a picture in as little as one second, with very high image quality. Its daily retention rate is close to 50% (higher than that of 90% of mini programs). Within two months, the team received a large B2B order.

Image works generated by "Inception Master"

China's first AIGC white paper was released at the World Artificial Intelligence Conference (WAIC) in Shanghai on September 1, the day after Inception Master went online. Wang Chaoyue took part in writing the white paper and led the survey of, and outlook for, the AIGC technology stack.

The release of the AIGC white paper attracted a great deal of attention, not only from artificial intelligence researchers but also from practitioners in the metaverse field:

"At that time, Sequoia Capital's article on generative AI had not yet come out, and most people did not know what AIGC was. This shows that the importance of digital content generation is an industry consensus."

Then everything moved fast: technological breakthroughs brought a boom in applications. Midjourney became popular overseas, and the text-to-image boom drew attention to previously overlooked AIGC branches such as text generation, video generation, and music generation. Only then did industry insiders realize that overseas companies such as Jasper.ai had already proven the business model. Following the previous generation of perceptual intelligence, which focused on recognition and detection, "creative intelligence" for generation and editing has become the new favorite of capital.

More unexpectedly, this AIGC craze has also drawn attention from outside the field, from self-media KOLs, illustrators, and graphic designers among others. Some people are panicking, and the criticism is endless; others are delighted and hope to embrace the cutting-edge technology.

But whether people accept it or not, an irreversible trend is already taking place.

02 AIGC's Age of Discovery has begun

In 1519, an expedition fleet sailed west out of Spain, opening humanity's Age of Discovery.

Later, when historians chronicled world civilization, they always mentioned an explorer named Magellan and the curiosity that first drove him to sea: is the earth flat or round? Magellan was an advocate of the round-earth theory. If the earth were flat, the voyage could not succeed; if it were round, the fleet would eventually return to its starting point.

In 1950, another explorer, the scientist Alan Turing, had a similar curiosity: can a machine respond to human behavior as if it were conscious? He proposed the famous test known as the "Turing Test", which opened the era of artificial intelligence research.

Nowadays, researchers in the AI field seem to have acquired similar desire and enthusiasm in the technological exploration of AIGC. They want to know: can machines read human thoughts and logic and create from 0 to 1?

The answer: after nearly ten years of technological development, they think machines can, and they believe that AIGC exploration has now reached the engineering stage.

Like Magellan's navigation, the purpose has been clear, and the navigation map (theory and framework) has taken shape. Next, it is to verify whether the technical route can reach the destination.

Take text-to-image generation as an example. AI's ability to draw from a text description is not yet perfect: image quality varies with the prompt, long texts are poorly understood, and images come out incomplete when key words are missed. But these are all specific research problems, and it is only a matter of time before they are solved.

Why has AIGC's map taken shape? Mainly because of three things: large models, multimodality, and controllability.

In 2020, OpenAI launched GPT-3, a pre-trained language model with 175 billion parameters, setting off a wave of research on 100-billion-parameter models in China and abroad. From then on, AI's ability to express and understand language improved rapidly, and AI has been able to write good articles in a short time.

In fact, a wave of commercial companies specializing in text generation appeared overseas at that time, such as Jasper.ai and Copy.ai. These companies built automated writing platforms: users enter keywords, and within a few minutes the AI writes a long article that is logical and expressive. This replaces a great deal of labor in the writing process and can be exchanged for commercial value.

However, because OpenAI does not open the GPT-3 API to mainland China and Hong Kong, it is hard for domestic AI researchers to use, and text generation applications have not taken off in China. Over the past two years, many large companies and universities in China have also been working on Chinese large models, but open-source progress is still slow. High training costs put off a large number of AI developers, which limits the development of AI applications for the Chinese language.

In this wave of AIGC, large AI models have played a key role in understanding human language. Thanks to the development of large models, not only is text generation good, but text-based image generation has also made great progress compared with the GAN era.

Wang Chaoyue told Leiphone that when they wrote the AIGC white paper, the team actually disagreed internally: should the title say "AIGC" or "Generative AI"? In the end Wang Chaoyue voted for AIGC, because "generative model" is a specific academic term that generally describes a model fitting a particular distribution, such as a GAN. What DALL·E 2 does, however, goes to some extent beyond fitting a particular data distribution and shows a general image generation ability.

For example, the best-known application of GAN is face generation: the model looks at a large number of face photos, learns how faces are distributed, and thereby learns the features of faces. In 2014, when there was no way to generate high-dimensional image data, GAN was a powerful generation method, but its limitations are fundamental:

First, it needs a specific dataset (such as faces), and it generalizes poorly. After GAN was released, for instance, it was used to train a variety of face effects, but a single GAN cannot produce several effects; each additional effect requires training a new GAN. Second, GAN is not good at controlling image generation through text descriptions, which largely prevents it from becoming a controllable, general-purpose architecture.

DALL·E (and later DALL·E 2), released by OpenAI, takes a general-purpose approach: a large language model that can handle multiple language tasks at once, a CLIP model that connects the text and image modalities, and a diffusion model that controls image generation. Together they can combine concepts and elements to generate more complex scenes while keeping them realistic.

For example, AI can edit images according to a text description, taking shadows, reflections, and surface textures into account when adding or moving image elements. If a person asks for flamingos at the position shown in Figure 3 below, the AI really does generate two flamingos by the glass outdoors, complete with shadows:

When flamingos are to be generated at the position specified in Figure 2 above (in the middle of the swimming pool), the AI automatically adapts to the pool environment and generates a flamingo swim ring:

Multimodal research on text and images can be roughly divided into three stages: 1. image captioning (the computer describes what is in a picture); 2. visual question answering (given a picture and a question such as what is on the table, the machine must understand both the question and the picture); 3. text-to-image generation (the machine paints from a one-sentence description).

An important contribution of multimodal research is the data: it provides well-paired text and image training data, which is important material for AIGC models to learn cognition.

Representative applications of the first and second stages were, respectively, AI-generated movie commentary on short video platforms and intelligent dialogue robots. In the third stage, the machine must understand human language, common sense, and the rules of the physical world; otherwise cross-modal creation under human control is impossible. DALL·E, Midjourney, Inception Master and other products have demonstrated breakthroughs in understanding people and the world.

A large body of experiments has shown that when the model is large enough and there is enough training data, AI can gradually grasp the abstract concepts in human language (such as common sense and rules). Wang Chaoyue studied under Tao Dacheng as a doctoral student. Starting from deep learning theory, their team showed more than once, through analysis of model capacity, that large models are better at learning general knowledge and at generalizing.

This is a capability that the previous generation of models did not show. It means that AIGC is not just generation, but an application ecosystem of models built on cognition and understanding. With AI that has basic cognition and understanding, machines that think and create like people are no longer a mirage, but a reality that is unfolding.

03 Commercialization: bursting in silence

The essence of activity in modern society is a stream of digital content: voice, text, image, video... AIGC can provide the basic elements for creating this content.

In fact, AIGC (Artificial Intelligence Generated Content) has existed for a long time, but only this year did it become popular with domestic capital. First, the technology matured; second, investors who had focused on the commercialization of visual AI discovered that overseas NLP companies such as Jasper.ai had begun to make large profits.

Because of its strength in creating digital content, AIGC has also been listed by metaverse enthusiasts over the past year as a tool for building the future metaverse. But behind the hype, more AIGC practitioners believe that AIGC can build the next-generation digital world faster than the metaverse can, on a new track that belongs entirely to AIGC.

The reason is the essential difference between AIGC and the technology the current metaverse relies on. Take computer graphics (the key technology for creating digital humans): in content generation, graphics focuses on simulation and reproduction, while AIGC focuses on originality and creation. Building a digital human with graphics requires a real person as a reference, whereas AIGC generates voice, text, and images from 0 to 1, producing things never seen before.

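Take the film "Project Gutenberg" (无双) as an analogy: AIGC is Zhang Jingchu's character, the original artist, and graphics is the character played by Guo Fucheng (Aaron Kwok), the master copyist.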

Because AIGC computes every word and every pixel when it writes and paints, the images and articles users create with an AIGC model are all unique in the world and entirely original.

Being digital content in essence, and being unique and original, means that AIGC's market is large enough. The former implies that it can be built into a standard product like an internet content platform; the latter means that it can win market recognition comparable to that of human creators.

Take text generation as an example. NLP companies such as Jasper.ai have given rise to a new profession overseas, called the "AI ghostwriter":

Human users enter a title and keywords on an AI text generation platform, and the AI generates a long article. They then revise the article and sell it to companies that need large numbers of high-quality articles for search engine optimization, earning the difference between the sale price and the subscription fee for the AI product.

The profit model for image generation is the same: overseas, for example, users subscribe to Midjourney, use the AI to generate attractive images, and then sell them to stock image libraries such as iStock to earn the difference.
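The arithmetic behind this model is simple enough to sketch. The function name and every number below are hypothetical, chosen only to show the calculation; the article gives no actual article prices or subscription fees.

```python
def monthly_margin(articles_sold, price_per_article, subscription_fee):
    """Income from selling revised AI articles minus the AI tool's subscription fee."""
    return articles_sold * price_per_article - subscription_fee

# Hypothetical figures: 20 articles sold at 50.0 each, with a 100.0 monthly subscription.
example = monthly_margin(articles_sold=20, price_per_article=50.0, subscription_fee=100.0)
print(example)  # 900.0
```

The margin grows with the number of articles sold while the subscription cost stays fixed, which is why the model rewards volume.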

Since Google Search favors original articles, and articles written by AI are unique on the web rather than pieced together from existing information, Google gives such articles more traffic and a higher search ranking.

This has also allowed companies such as Jasper.ai to gain market share quickly. According to Jasper.ai, as of September this year its revenue last year exceeded 40 million dollars and is expected to double this year. It currently has 70,000 paying users and a valuation of 1.5 billion dollars. Jasper.ai was founded only 18 months ago.

Lan Zhenzhong told Leiphone that articles written by AI are now highly readable. His team once opened a WeChat official account and wrote horoscopes with a Chinese-language model, which attracted a decent readership. Some readers even commented: "You must be a Capricorn, you understand me so well." Besides Inception Master, their text generation tool "HayFriday" launched only recently but already has thousands of paying users overseas.

Rapid growth is also happening in image generation. According to reported data, the overseas AI painting product Midjourney gained more than 3 million registered users within three months of launch. Leiphone has learned that less than two months after Inception Master launched, it had generated 10 million images.

Many people in the industry say: "To put it bluntly, the core of the internet is traffic, and the core of traffic is content. The essence of AIGC is to produce content."

This also means that, compared with the previous generation of visual AI, which had to be combined with terminal hardware, or the metaverse with its sprawling worldview, AIGC's commercialization is more concrete, with lower investment costs and faster returns. A more radical view is that AIGC could give rise to a traffic-driven "content generation platform" comparable to, or even larger than, existing internet content platforms (such as Xiaohongshu and TikTok).

Image works generated by "Inception Master" according to user description

With demand for content so strong, the change AIGC brings to how content is produced has begun to change how content is consumed. A market that respects originality is starting to respect AIGC.

The latest response comes from stock image companies:

At the end of October, Shutterstock, a well-known overseas stock image library, announced a partnership with OpenAI that lets users enter text to generate original pictures that meet their needs in real time. (In fact, many people in the industry also believe that in the AIGC wave, stock libraries and photo-editing software will be the first to be eliminated or replaced.)

This partnership is not only a timely response from a traditional industry; it also suggests that one vision of AIGC's commercial future is starting to come true: a new content platform built on generation.

Many people do not know what this means, but in the eyes of some, AIGC's influence has begun to shift from serving individual users to serving real industries. Today's content platforms are mainly based on keyword search and recommendation. Once AIGC is introduced, the content users consume comes from AI's understanding of those users. Recommended content comes from a finite library, while generated content is endless.

Although AIGC insiders are the creators of this field, when they witness the magic of AIGC they are still startled by machines whose speed and creativity rival or even surpass those of humans.

Zhang Shiying, founder of ZMO.AI, said: "Take today's short video platforms: recommendation means recommending to you the limited content that creators have made. But everyone who consumes content is also a creator. Consumers' feedback on content lets the AI learn more about what you want and what you like, and what the AI generates can be updated in real time and without limit."

ZMO.AI is one of the earliest AIGC companies in China. Unlike products such as Stable Diffusion that excel at artistic images, ZMO.AI has chosen real-world image generation, for fields such as design. Its imgcreator.ai, launched first in overseas markets, has grown rapidly, reaching a monthly figure of 320,000.

They believe that AIGC is not only a production tool for digital entertainment content, but also a great help to many real-world industries. The field is big enough to give researchers and entrepreneurs plenty to do. (Emad Mostaque, the founder of Stability AI, has said something similar: that the AIGC market is bigger than new energy.)

Take images: materials today mainly come from photo shoots, which are inefficient and expensive. When an e-commerce platform lists new products, for example, the shoot is done offline and requires makeup artists, stylists, photographers, models, and so on. In the future AIGC world, ZMO.AI wants to use AI to generate model images that display the clothing directly. Their AIGC product, the "YUAN Initial" mini program, already achieves impressive results in image editing:

Compared with artistic styles, realistic, photographic images are harder to generate, but they have a huge impact on real production and daily life. In the design industry, for example, AIGC has a place in posters, slide decks, web pages, product packaging, illustrations, and other applications that demand original material.

Replacement aside, many designers will use AIGC products to simplify the drafting stage before design. Zhang Shiying gave an example from architecture: they worked with an architect to design a symphony concert hall about 25 meters high.

Before AIGC, the architect first drew a pencil sketch, then, once the sketch looked right, redid it in colored pencil. Once satisfied with the colored version, the architect produced a 3D rendering for the client to review. Only when the client was satisfied did the architect design the building's internal engineering structure. With AIGC, they saved a great deal of time from the very first step, using AI to quickly generate the scheme in the designer's mind and send it to the client.

"When AI writes a few sentences or retouches a few pictures, you may feel nothing. But if AI starts designing buildings one day, you will have to rethink its value."

04 It's just a matter of time

In his story "Zhao Wen Dao" (朝闻道), Liu Cixin has the alien "risk eliminator" voice a truth about technological development:

"The starting point for mankind to obtain the ultimate mystery of the universe began with the first ape looking up at the sky."

Just as humans explore the universe, AI is constantly exploring humans. Today's AI apes (AIGC) have seen the vast sky. More and more researchers are joining the exploration of AIGC, and AIGC is getting ever closer to higher-level thinking and creation. Conquest seems to be only a matter of time.

The past ten years have seen rapid development in AI. Through a decade of ups and downs, interesting technologies have emerged one after another. Some have become new industries (such as recognition for security), while others proved short-lived or "stillborn" on the way to commercialization.

Amid these great waves, people are both hopeful and cautious about AIGC.

For example, some investors worry about whether AIGC can succeed commercially in the domestic market.

Take text generation: AIGC's path to revenue is highly user-driven. However, domestic Chinese language models currently lack high-quality open-source corpus data, so the quality of Chinese AI writing is uneven across topics. At the same time, labor costs for writers in China are generally lower than in developed markets in Europe and the United States, so the savings from using AIGC in place of human labor are also significantly smaller than overseas.

Wherever AI competes with human labor, the cost of the AI service must be clearly lower than the cost of people before existing industries will accept it. This is almost an unwritten law. Industrial quality inspection is a good reference: a traditional factory's first consideration in quality inspection is cost. When a quality inspector's monthly salary is generally 6,000 to 7,000 yuan, a visual AI solution that cannot match that cost while achieving high accuracy will struggle to convince the industry.
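This unwritten law can be written down as a two-condition check. The sketch below is only an illustration: the function name, the accuracy threshold, and the AI cost figures are hypothetical, and the 6,500 used for the human cost is simply the midpoint of the salary range quoted above.

```python
def ai_is_adoptable(ai_monthly_cost, human_monthly_cost, ai_accuracy, required_accuracy):
    """Return True only if the AI option is both cheaper and accurate enough."""
    cheaper = ai_monthly_cost < human_monthly_cost
    accurate_enough = ai_accuracy >= required_accuracy
    return cheaper and accurate_enough

HUMAN_MONTHLY_COST = 6500  # midpoint of the quoted 6-7k monthly salary (illustrative)

# Hypothetical AI solutions compared against one human inspector.
too_expensive = ai_is_adoptable(9000, HUMAN_MONTHLY_COST, 0.99, 0.98)
too_inaccurate = ai_is_adoptable(3000, HUMAN_MONTHLY_COST, 0.90, 0.98)
viable = ai_is_adoptable(3000, HUMAN_MONTHLY_COST, 0.99, 0.98)
print(too_expensive, too_inaccurate, viable)
```

Both conditions must hold at once: a cheap but inaccurate system fails the check just as an accurate but expensive one does, which is the difficulty the quality inspection example describes.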

Xinchen Technology said that its text generation tool is currently priced at one tenth of Jasper.ai, but acceptance among domestic users is still climbing, and it also depends on continued progress in Chinese large models (GPT-3, PaLM, and the like are English large models).

But more people believe that AIGC will change every aspect of modern production and life, because the problems AIGC solves are real rather than hypothetical. These problems are very specific, and in most scenarios AIGC can partly or completely replace heavy manual work. It not only cuts costs and raises efficiency, but also lowers the threshold for content creation and stimulates people's creativity and imagination.

Take painting. Skills that used to take more than ten years of training to acquire are now within reach of people with no background, who can create with AI and produce work no worse than a professional's hand-painted pieces. This also shows the essence of creation more clearly: ideas and viewpoints are the soul of creation, not methods and tools.

Although many companies have recently attached the AIGC label to their product promotion and positioning, Leiphone understands that real technical barriers remain in AIGC for both text generation and image creation.

In addition, the choice of algorithms and data determines how each company will later perform in different scenarios. For now, in the process of commercialization, AIGC practitioners urgently need to choose application scenarios with high technical barriers and a sufficiently secure moat.

AIGC entrepreneurs told Leiphone they believe AI technology may one day be able to rewrite the ending of Game of Thrones. A content consumption platform based entirely on generation is very likely to emerge. AIGC will become a key technology for the metaverse and Web 3.0, but it still has mountains and hills to cross before it gets there.

But at least, they already know the location of the hills.

Next, we will talk about the difficulties and opportunities of AIGC's entrepreneurship in the Chinese market. If you are an AIGC entrepreneur or you are also following AIGC, please add WeChat (Fiona190913) for communication.

Reference link:

  1. White Paper on Artificial Intelligence Generated Content (AIGC) (2022)
